Convert Date TimeZone in Apache Server Log

Recently, I helped a user who had been given the task of converting the dates in an Apache server log from GMT (Greenwich Mean Time) to US Pacific time: PST (Pacific Standard Time, UTC-8) or, while daylight saving time is in force, PDT (Pacific Daylight Time, UTC-7). In theory, all that is necessary is to extract the date/time string from each log entry, subtract 8 or 7 hours accordingly, and rebuild the log entry with the new date/time string.
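A quick way to see the "8 or 7 hours" rule in action is to convert two UTC timestamps with pytz, the library the script below relies on. This is a minimal sketch: the January date is an arbitrary example, and the March time is the first entry of the sample log. Since 2007, US daylight saving time has started on the second Sunday in March, so by 26 March 2007 Pacific time was already PDT (UTC-7).

```python
from datetime import datetime, timedelta
import pytz

pacific = pytz.timezone("US/Pacific")

# Mid-January: standard time, so 8 hours are subtracted (PST, -0800).
winter = datetime(2007, 1, 15, 12, 0, tzinfo=pytz.utc).astimezone(pacific)
# Late March 2007: daylight saving time, so 7 hours are subtracted (PDT, -0700).
summer = datetime(2007, 3, 26, 6, 0, tzinfo=pytz.utc).astimezone(pacific)

for dt in (winter, summer):
    # Prints the local timestamp, its offset and zone name, and the offset in hours.
    print(dt.strftime("%d/%b/%Y:%H:%M:%S %z %Z"), dt.utcoffset() // timedelta(hours=1))
# 15/Jan/2007:04:00:00 -0800 PST -8
# 25/Mar/2007:23:00:00 -0700 PDT -7
```

Note that the March conversion also rolls the date back a day, which is why a plain string edit of the hour field would not be enough.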

On first examination, this would seem to be an easy task for gawk, given that it has built-in time functions such as mktime and strftime. However, I quickly found out that mktime interprets its argument in the current timezone, and there is no way to specify an alternative timezone to mktime.
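Python's time.mktime behaves the same way as the gawk mktime described above: it reads a broken-down time as local time, so its result depends on the machine's TZ setting, whereas calendar.timegm reads the same fields as UTC. The sketch below, using the timestamp of the first sample log entry, only illustrates that pitfall; it is not part of the script that follows.

```python
import calendar
import time

# Timestamp from the first sample log entry, with the "+0000" offset removed.
tm = time.strptime("26/Mar/2007:06:00:00", "%d/%b/%Y:%H:%M:%S")

local_epoch = time.mktime(tm)    # fields read as *local* time: depends on TZ
utc_epoch = calendar.timegm(tm)  # fields read as UTC: independent of TZ

print(utc_epoch)  # 1174888800
# The difference is the local zone's UTC offset at that moment, in hours
# (0.0 on a machine set to UTC, -7.0 on one set to US/Pacific).
print((utc_epoch - local_epoch) / 3600)
```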

So I decided to use Python, which is one of my favourite scripting languages, to solve the problem. Here is the finished script:

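#!/usr/bin/env python3
# Convert the timestamps in an Apache server log from GMT/UTC to US Pacific time.
# Requires the third-party pytz package (pip install pytz).

from datetime import datetime
import re
import sys

import pytz

utc = pytz.utc
pacific = pytz.timezone("US/Pacific")

# IP address, two "-" fields, the [timestamp], then the rest of the line.
# A raw string keeps \d, \s and \[ intact; [^\]]+ stops the timestamp at the
# first "]" so a "]" later in the line cannot be swallowed.
log_re = r'(?P<ip>[.\d]+)(\s+)-(\s+)-(\s+)\[(?P<time>[^\]]+)\](?P<remainder>.*$)'
pattern = re.compile(log_re)

with open("infile") as infile:                     # change to suit
    for lineno, line in enumerate(infile, start=1):
        m = pattern.match(line)
        if m is None:                              # line does not fit log_re
            print("ERROR: At line number:", lineno, file=sys.stderr)
            continue
        res = m.groupdict()

        # Collapse the whitespace between time and offset: "26/Mar/2007:06:00:00 +0000"
        datestr = re.sub(r"\s+", " ", res["time"])

        # Strip the " +0000" offset and treat the remaining fields as UTC.
        utc_dt = datetime.strptime(datestr[:-6], "%d/%b/%Y:%H:%M:%S").replace(tzinfo=utc)
        loc_dt = utc_dt.astimezone(pacific)

        print(res["ip"] + "\t-\t -\t[" + loc_dt.strftime("%d/%b/%Y:%H:%M:%S %z") + "]"
              + res["remainder"])

sys.exit(0)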


The Python module pytz makes most of the Olson timezone database (also known as the zoneinfo or tz database) available to a script, and supports converting between timezones. Because the script builds each datetime in UTC and converts it with astimezone, I was able to avoid the need for pytz's localize and normalize methods. Those methods matter when you build a datetime from a local wall-clock time (localize) or do arithmetic on a localized datetime (normalize); in those cases they ensure the correct offset is used around daylight saving time transitions.
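The sketch below illustrates where localize and normalize do and do not matter. It recreates the script's utc and pacific objects and picks times around the 11 March 2007 spring-forward transition (2:00 a.m. PST, i.e. 10:00 UTC); the specific dates are illustrative choices, not taken from the log.

```python
from datetime import datetime, timedelta
import pytz

utc = pytz.utc
pacific = pytz.timezone("US/Pacific")

# 1. Converting from UTC with astimezone() needs no help, even at a transition.
before = datetime(2007, 3, 11, 9, 59, tzinfo=utc).astimezone(pacific)
after = datetime(2007, 3, 11, 10, 0, tzinfo=utc).astimezone(pacific)
print(before)  # 2007-03-11 01:59:00-08:00
print(after)   # 2007-03-11 03:00:00-07:00

# 2. Attaching a pytz zone to a local wall-clock time needs localize():
#    replace(tzinfo=pacific) picks up the zone's historical LMT offset instead.
wall = datetime(2007, 3, 26, 2, 0)
wrong = wall.replace(tzinfo=pacific)
right = pacific.localize(wall)
print(wrong.tzname())  # LMT
print(right)           # 2007-03-26 02:00:00-07:00

# 3. Arithmetic on a localized datetime keeps the old offset; normalize() fixes it.
start = pacific.localize(datetime(2007, 3, 11, 1, 30))  # 01:30 PST
stale = start + timedelta(hours=1)   # 02:30-08:00, a local time that never existed
fixed = pacific.normalize(stale)
print(fixed)  # 2007-03-11 03:30:00-07:00
```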

The other interesting thing about this script is its use of a compiled regular expression (log_re) to parse each log entry. There are a number of Python modules available for parsing Apache web server log files, but I prefer a regular expression. Named subgroups make it easy to work with the results of a match: groupdict returns a dictionary containing all the named subgroups of the match, keyed by subgroup name. Its optional default argument (None unless specified) is the value given to named subgroups that did not participate in the match. If a line does not match log_re at all, pattern.match returns None instead of a match object, hence the if m is None test.
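The following sketch shows groupdict and the None check on their own. It applies the script's log_re (written as a raw string) to a truncated copy of the first sample entry, assuming its fields are tab-separated; the tiny patterns in the last lines are made up purely to show the default argument.

```python
import re

log_re = r'(?P<ip>[.\d]+)(\s+)-(\s+)-(\s+)\[(?P<time>[^\]]+)\](?P<remainder>.*$)'
pattern = re.compile(log_re)

# Truncated first sample entry (tab separators assumed).
line = ("99.60.97.205\t-\t-\t[26/Mar/2007:06:00:00\t+0000]\tGET "
        "/world/impact.row.atlantic_1_rower-paul-ridley-cancer-research?_s=PM:WORLD "
        "HTTP/1.1\t200\t9386\n")

res = pattern.match(line).groupdict()
print(sorted(res))        # ['ip', 'remainder', 'time']  (unnamed groups are omitted)
print(res["ip"])          # 99.60.97.205
print(repr(res["time"]))  # '26/Mar/2007:06:00:00\t+0000'

# A line that does not fit log_re: match() returns None, not a match object.
print(pattern.match("garbage line"))  # None

# groupdict's default only fills in named groups that took no part in a match.
m = re.match(r"(?P<a>x)?(?P<b>y)", "y")
print(m.groupdict())      # {'a': None, 'b': 'y'}
print(m.groupdict("-"))   # {'a': '-', 'b': 'y'}
```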

Given the following input:

99.60.97.205    -       -       [26/Mar/2007:06:00:00   +0000]  GET /world/impact.row.atlantic_1_rower-paul-ridley-cancer-research?_s=PM:WORLD HTTP/1.1      200     9386    www.abc.com     Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13        TCP_MISS        Apache=-        -       1068000 -       -       -       deflate=-       rmt=-
72.234.67.132   -       -       [26/Mar/2007:09:00:00   +0000]  GET /ad-abc.php?f=medium_rectangle HTTP/1.1     200     869     www.abc.com     Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.16) Gecko/20101130 Firefox/3.5.16       TCP_HIT Apache=-        -       1000    -       -       -       deflate=-       rmt=-
68.12.178.167   -       -       [26/Mar/2007:09:30:00   +0000]  GET /ad-feedback.js.php?e3e999d9b79cf36c165f5b379a0e9f269be82344 HTTP/1.1       200     600     www.abc.com     Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) TCP_HIT Apache=-        -       1000    -       -       -       deflate=-       rmt=-
128.186.145.53  -       -       [26/Mar/2007:10:00:00   +0000]  GET /ad-feedback.js.php?e3e999d9b79cf36c165f5b379a0e9f269be82344 HTTP/1.1       200     628     www.abc.com     Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; MAAU; .NET4.0C)    TCP_HIT Apache=-        -       2000    -       -       -       deflate=-       rmt=-
174.253.212.250 -       -       [27/Mar/2007:05:00:00   +0000]  GET /css/ap-CN1-G02.css?e3e999d9b79cf36c165f5b379a0e9f269be82344 HTTP/1.1       200     7146    www.abc.com     Mozilla/5.0 (Linux; U; Android 2.2; en-us; ADR6300 Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1   TCP_HIT Apache=-        -       5000    -       -       -       deflate=-       rmt=-


here is the output produced by this script:

99.60.97.205	-	 -	[25/Mar/2007:23:00:00 -0700]  GET /world/impact.row.atlantic_1_rower-paul-ridley-cancer-research?_s=PM:WORLD HTTP/1.1      200     9386    www.abc.com     Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13        TCP_MISS        Apache=-        -       1068000 -       -       -       deflate=-       rmt=-
72.234.67.132	-	 -	[26/Mar/2007:02:00:00 -0700]  GET /ad-abc.php?f=medium_rectangle HTTP/1.1     200     869     www.abc.com     Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.16) Gecko/20101130 Firefox/3.5.16       TCP_HIT Apache=-        -       1000    -       -       -       deflate=-       rmt=-
68.12.178.167	-	 -	[26/Mar/2007:02:30:00 -0700]  GET /ad-feedback.js.php?e3e999d9b79cf36c165f5b379a0e9f269be82344 HTTP/1.1       200     600     www.abc.com     Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) TCP_HIT Apache=-        -       1000    -       -       -       deflate=-       rmt=-
128.186.145.53	-	 -	[26/Mar/2007:03:00:00 -0700]  GET /ad-feedback.js.php?e3e999d9b79cf36c165f5b379a0e9f269be82344 HTTP/1.1       200     628     www.abc.com     Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; MAAU; .NET4.0C)    TCP_HIT Apache=-        -       2000    -       -       -       deflate=-       rmt=-
174.253.212.250	-	 -	[26/Mar/2007:22:00:00 -0700]  GET /css/ap-CN1-G02.css?e3e999d9b79cf36c165f5b379a0e9f269be82344 HTTP/1.1       200     7146    www.abc.com     Mozilla/5.0 (Linux; U; Android 2.2; en-us; ADR6300 Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1   TCP_HIT Apache=-        -       5000    -       -       -       deflate=-       rmt=-


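Note that the entries in the above example are not in the standard Apache Common Log Format: they use a custom, tab-separated layout with extra fields, although the timestamp field follows the Common Log Format convention. Web server log formats vary, so if your logs are in a different format you will need to modify log_re to match it.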
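As a sketch of such a modification, the pattern below targets the standard Common Log Format (LogFormat "%h %l %u %t \"%r\" %>s %b") and is tried on the example entry from the Apache HTTP Server logging documentation. The pattern and its group names are illustrative assumptions, not part of the original script. It also parses the offset with %z instead of slicing it off, so logs written in a non-UTC zone would still be converted correctly.

```python
import re
from datetime import datetime, timezone

# Illustrative pattern for Common Log Format: host ident user [time] "request" status size
clf_re = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
res = clf_re.match(line).groupdict()

# %z turns "-0700" into a fixed UTC offset, giving an aware datetime, so the
# offset recorded in the log is honoured rather than assumed to be +0000.
logged = datetime.strptime(res["time"], "%d/%b/%Y:%H:%M:%S %z")
print(res["request"])                   # GET /apache_pb.gif HTTP/1.0
print(logged.astimezone(timezone.utc))  # 2000-10-10 20:55:36+00:00
```

An aware datetime such as logged can be passed straight to astimezone(pacific), just as utc_dt is in the script.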
