Day 11–101 Days of DevOps — Apache log parser using Python

Welcome to Day 11 of 101 Days of DevOps. Today's topic is an Apache log parser written in Python. The goal is a script that helps you identify offending IP addresses during a DDoS attack on your website. This is what we are going to do:

  • Read the Apache log file (access.log)
  • Count the number of requests to your website from each IP address
  • Save the output in a CSV file
  • Make the script user-friendly by using argparse

If you look at the content of access.log, this is how it looks

192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

Let’s break down these fields

192.168.0.1 --> IP address
[23/Apr/2017:05:54:36 -0400] --> Date, time and timezone
GET / HTTP/1.1 --> HTTP GET request to read the page
403 --> Server response code
3985 --> Number of bytes transferred
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36 --> Finally, data about the user's hardware, OS and browser
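As a rough illustration of the breakdown above, the sketch below splits the sample line into named fields with a single regular expression. The pattern and the names LOG_PATTERN and parse_line are assumptions made for this example, not part of the original script, and the pattern only targets the log format shown here.

```python
import re

# Hypothetical pattern for the log format shown above (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '       # client IP, identity, user
    r'\[(?P<time>[^\]]+)\] '      # date, time and timezone
    r'"(?P<request>[^"]*)" '      # request line
    r'(?P<status>\d{3}) '         # server response code
    r'(?P<size>\d+|-) '           # bytes transferred ("-" if none)
    r'"(?P<referer>[^"]*)" '      # referer
    r'"(?P<agent>[^"]*)"'         # user agent
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"'
)
fields = parse_line(sample)
print(fields["ip"], fields["status"], fields["size"])
```

Every value comes back as a string, so a caller that wants to do arithmetic on the status or size would need to convert it first.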

But the piece we are interested in is the IP address

So, as mentioned above, our first steps are:

  • Import the standard Python modules we need.
import re
from collections import Counter
import csv
import argparse
  • Read the Apache log file.
with open("access_log") as f:
    fread = f.read()

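To pick the IP addresses out of that text we use a regular expression built from these pieces:

\d: Any numeric digit [0-9]
{1,3}: Between one and three repetitions of the preceding item
\.: A literal dot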

Then we are going to use findall() to get the list of IP addresses.

# Raw string, so Python does not treat \d and \. as string escapes
logreg = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ip_list = re.findall(logreg, fread)
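To make the findall() step concrete, here is a small sketch that runs the same pattern over three made-up log lines. The sample text and its addresses are placeholders invented for illustration.

```python
import re

# Raw string so the backslashes reach the regex engine untouched.
logreg = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Made-up log lines; the addresses are placeholders for illustration.
sample = (
    '192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985\n'
    '10.0.0.7 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 512\n'
    '192.168.0.1 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 403 3985\n'
)

# findall() returns every non-overlapping match, in order of appearance.
ip_list = re.findall(logreg, sample)
print(ip_list)  # ['192.168.0.1', '10.0.0.7', '192.168.0.1']
```

Note that the pattern only checks the shape of the string: something like 999.1.1.1 would match too, because the octets are not range-checked.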
  • Now that we have extracted the IP addresses, the next step is to count how many times each one appears in the file. For that we use the Counter class from the collections module: we pass ip_list, the list we extracted with the regex, to Counter(ip_list) and print each address with its count.
for k, v in Counter(ip_list).items():
    print(k, v)
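During an attack you usually care about the busiest addresses first. As an optional variation (not part of the original script), Counter.most_common() returns the entries sorted by count. The list below is placeholder data standing in for the real ip_list.

```python
from collections import Counter

# Placeholder data standing in for the list returned by re.findall().
ip_list = ["35.208.242.202", "20.191.45.212", "35.208.242.202", "194.5.53.89"]

ip_counts = Counter(ip_list)

# most_common(n) returns the n highest counts as (ip, count) tuples.
for ip, count in ip_counts.most_common(2):
    print(ip, count)
```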
  • So far, our code will look like this.
import re
from collections import Counter
import csv
import argparse

logreg = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
with open("access_log") as f:
    fread = f.read()
ip_list = re.findall(logreg, fread)
for k, v in Counter(ip_list).items():
    print(k, v)
  • If we execute our code at this stage (store the code in a file named apache_parser.py and then run it), you will see each IP address along with its count.
python3 apache_parser.py 
194.5.53.89 1
20.191.45.212 4
35.208.242.202 60
88.150.188.171 1
5.0.375.99 1
185.191.171.37 1
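The 5.0.375.99 entry in the output above looks more like a browser version string than a client address. That is an assumption, but it is plausible because the pattern matches anywhere in the file, including the user agent field. A minimal sketch of one way to avoid this is to anchor the pattern to the start of each line. The sample lines and the name line_start_ip are invented for this example.

```python
import re
from collections import Counter

unanchored = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Same pattern, but with re.MULTILINE the ^ anchors it to the start of each line.
line_start_ip = re.compile(r"^" + unanchored, re.MULTILINE)

# Made-up log lines; the second carries a dotted version in its user agent.
sample = (
    '20.191.45.212 - - [11/Jul/2021:15:40:01 -0400] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"\n'
    '35.208.242.202 - - [11/Jul/2021:15:40:02 -0400] "GET / HTTP/1.1" 200 512 "-" "Chrome/5.0.375.99"\n'
)

print(Counter(re.findall(unanchored, sample)))   # also counts 5.0.375.99
print(Counter(line_start_ip.findall(sample)))    # only the two client addresses
```

This relies on the client address being the first field of every line, as in the sample access.log entry shown earlier.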
  • Our next requirement is to save the output in a CSV file. To achieve that, we open the ipnewcount.csv file in write mode and create a csv.writer object named fwritercsv. The next line, fwritercsv.writerow(["IP_Address", "Count"]), writes the header, i.e., the first row of the CSV file will be IP_Address,Count. Then we modify our code a little: rather than printing each value, we write it to the CSV file.
# newline="" is what the csv module documentation recommends for output files
with open("ipnewcount.csv", "w", newline="") as f:
    fwritercsv = csv.writer(f)
    fwritercsv.writerow(["IP_Address", "Count"])
    for k, v in Counter(ip_list).items():
        fwritercsv.writerow([k, v])
  • Our modified code will look like this.
import re
from collections import Counter
import csv
import argparse

logreg = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
with open("access_log") as f:
    fread = f.read()
ip_list = re.findall(logreg, fread)
with open("ipnewcount.csv", "w", newline="") as f:
    fwritercsv = csv.writer(f)
    fwritercsv.writerow(["IP_Address", "Count"])
    for k, v in Counter(ip_list).items():
        fwritercsv.writerow([k, v])
  • If we execute our code now, you will not get any output
(venv) ➜  python3 test_apache_parser.py
(venv) ➜
  • But you will see the file is created
-rw-r--r--   1 plakhera  plakhera     585 Jul 11 15:43 ipnewcount.csv
  • If you read the content of the file, you can see the (IP_Address, Count) header followed by each IP address and its count.
head ipnewcount.csv 
IP_Address,Count
194.5.53.89,1
20.191.45.212,4
35.208.242.202,60
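If you would rather have the heaviest hitters at the top of the CSV, one possible variation is to write the rows in most_common() order. This is a sketch, not the original script: the counts are placeholder data, and an in-memory io.StringIO buffer stands in for the real output file so the example can run anywhere.

```python
import csv
import io
from collections import Counter

# Placeholder counts standing in for Counter(ip_list).
ip_counts = Counter({"194.5.53.89": 1, "20.191.45.212": 4, "35.208.242.202": 60})

# StringIO stands in for open("ipnewcount.csv", "w", newline="") here.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["IP_Address", "Count"])
# most_common() with no argument yields every entry, highest count first.
writer.writerows(ip_counts.most_common())

print(buffer.getvalue())
```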
  • We have met all our requirements except the last one: making the code user-friendly, which we will do with the help of argparse. argparse generates friendly help output and lets us pass the logfile on the command line. First we create the parser, then we add an argument to it, and finally we run the parser with parse_args().
my_parser = argparse.ArgumentParser(description='Reading the log file')
my_parser.add_argument("logfile",
                       help='Please enter the logfile to parse',
                       type=argparse.FileType('r'))
args = my_parser.parse_args()
  • To use this parser, the only change needed in the script is to replace the hardcoded logfile name with the value read by the parser. Because of type=argparse.FileType('r'), args.logfile is already an open file object.
with args.logfile as f:
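To try the parser without touching the real command line, you can hand parse_args() an explicit list of arguments. The sketch below does that with a throwaway temporary file, which is an assumption added only so the example is self-contained.

```python
import argparse
import os
import tempfile

my_parser = argparse.ArgumentParser(description="Reading the log file")
my_parser.add_argument(
    "logfile",
    help="Please enter the logfile to parse",
    type=argparse.FileType("r"),
)

# Create a throwaway log file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985\n')

# Passing a list to parse_args() simulates: python3 apache_parser.py <path>
args = my_parser.parse_args([tmp.name])
with args.logfile as f:  # FileType has already opened the file for reading
    fread = f.read()

os.remove(tmp.name)
print(fread.split()[0])
```

An alternative design is to accept the path as a plain string and call open() yourself, which keeps the file handling in one place.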
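  • Now that we have met all the requirements, the final code will look like this. Note that this version takes the logfile through a required --l/--logfile option instead of a positional argument.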
import re
from collections import Counter
import csv
import argparse

my_parser = argparse.ArgumentParser(description='Reading the log file')
my_parser.add_argument("--l", "--logfile",
                       help='Please enter the logfile to parse',
                       dest="logfile",
                       type=argparse.FileType('r'),
                       required=True)
args = my_parser.parse_args()

logreg = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
with args.logfile as f:
    fread = f.read()
ip_list = re.findall(logreg, fread)
with open("ipnewcount.csv", "w", newline="") as f:
    fwritercsv = csv.writer(f)
    fwritercsv.writerow(["IP_Address", "Count"])
    for k, v in Counter(ip_list).items():
        fwritercsv.writerow([k, v])
  • If you try to execute the code now without passing any argument, you will get a nice friendly output.
python3 apache_parser.py                     
usage: apache_parser.py [-h] --l LOGFILE
apache_parser.py: error: the following arguments are required: --l/--logfile
  • If you use the -h option, you will get all the help output
python3 apache_parser.py -h                  
usage: apache_parser.py [-h] --l LOGFILE

Reading the log file

optional arguments:
  -h, --help            show this help message and exit
  --l LOGFILE, --logfile LOGFILE
                        Please enter the logfile to parse
  • Finally, it's time to execute the code
python3 apache_parser.py --logfile access_log
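During a real attack the access log can get large, and f.read() loads all of it into memory at once. As a hedged alternative (not what version 1 of the script does), the sketch below counts line by line. The function name count_ips and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter

logreg = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def count_ips(lines):
    """Count IP-like strings across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # update() adds one to the tally for every match on this line.
        counts.update(logreg.findall(line))
    return counts


# Any iterable of strings works, including an open file object.
sample_lines = [
    '192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985\n',
    '192.168.0.1 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 403 3985\n',
]
print(count_ips(sample_lines))
```

In the script itself this would be called as count_ips(args.logfile), since iterating over a file object yields one line at a time.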

This is version 1 of the code https://github.com/100daysofdevops/100daysofdevops/blob/master/apache_log_parsing/apache_log_parsing_v1.py

Assignment

  1. Try to break the code into functions, i.e., one function to run the regex, one to do the counting, and one to write the CSV file.

I am looking forward to you guys joining the amazing journey.

AWS Community Builder, Ex-Redhat, Author, Blogger, YouTuber, RHCA, RHCDS, RHCE, Docker Certified,4XAWS, CCNA, MCP, Certified Jenkins, Terraform Certified, 1XGCP