100 Days of DevOps — Day 79-Apache Log Parser Using Python

Prashant Lakhera
3 min read · Apr 30, 2019

To view the updated DevOps course (101DaysofDevOps):

Course Registration link: https://www.101daysofdevops.com/register/

Course Link: https://www.101daysofdevops.com/courses/101-days-of-devops/

YouTube link: https://www.youtube.com/user/laprashant/videos

Welcome to Day 79 of 100 Days of DevOps. The focus for today is an Apache log parser using Python.

The aim of this tutorial is to create an Apache log parser, which is really helpful in determining offending IP addresses during a DDoS attack on your website. This is what we are going to do:

  • Read the Apache log file (access.log)
  • Count the number of requests to your website from each IP address

If you look at the content of access.log, this is how an entry looks:

192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

Let’s break down these fields

192.168.0.1 --> IP address
[23/Apr/2017:05:54:36 -0400] --> Date, time and timezone
GET / HTTP/1.1 --> HTTP GET request to read the page
403 --> Server response code
3985 --> Number of bytes transferred
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36 --> Finally, the user agent, which describes the user's hardware, OS and browser

But the piece we are interested in is the IP address.
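The field breakdown above can be sketched in Python as a regular expression with named groups. This is only a minimal sketch, assuming the log uses Apache's combined log format; LOG_PATTERN and parse_line are illustrative names that do not appear in the original code.

```python
import re

# Assumed layout: Apache "combined" log format. LOG_PATTERN and parse_line
# are illustrative names introduced for this sketch.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '          # client IP, then identity and user
    r'\[(?P<timestamp>[^\]]+)\] '    # date, time and timezone
    r'"(?P<request>[^"]*)" '         # request line
    r'(?P<status>\d{3}) '            # server response code
    r'(?P<size>\d+|-) '              # bytes transferred ("-" if none)
    r'"(?P<referer>[^"]*)" '         # referer
    r'"(?P<user_agent>[^"]*)"'       # user agent
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"'
)
fields = parse_line(sample)
print(fields["ip"], fields["status"], fields["size"])
```

Every value comes back as a string, so a field such as the status code would need int() before any numeric comparison.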

So as mentioned above our first step is

  • Read the Apache log file
def apache_log_reader(logfile):
    # Open the file; f is a reference to the file object
    with open(logfile) as f:
        log = f.read()
        print(log)

# Create entry point of our code
if __name__ == '__main__':
    apache_log_reader("access_log")
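Note that f.read() loads the whole file into memory, which may matter for very large logs. As a minimal sketch of an alternative, a file object can also be iterated line by line. The function name iter_log_lines is hypothetical, and the demo writes a small temporary file with made-up entries so that it runs without a real access_log.

```python
import os
import tempfile


def iter_log_lines(logfile):
    """Yield log lines one at a time, without the trailing newline."""
    with open(logfile) as f:
        for line in f:
            yield line.rstrip("\n")


# Demo on a small temporary file (made-up entries) so the sketch runs
# without a real log file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985\n')
    tmp.write('192.168.0.2 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 512\n')
    demo_path = tmp.name

lines = list(iter_log_lines(demo_path))
os.remove(demo_path)
print(len(lines))
```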

Now let’s go to the second part

  • Count IP address
import re
from collections import Counter

def apache_log_reader(logfile):
    # Pattern for four dot-separated groups of 1 to 3 digits
    myregex = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    with open(logfile) as f:
        log = f.read()
        # Collect every match, then count how often each one appears
        my_iplist = re.findall(myregex,log)
        ipcount = Counter(my_iplist)
        for k, v in ipcount.items():
            print("IP Address " + "=> " + str(k) + " " + "Count " + "=> " + str(v))

# Create entry point of our code
if __name__ == '__main__':
    apache_log_reader("access_log")

Let’s break down this code

  • As we don’t need the entire entry, we need to do a pattern search, which we can do with the help of a regular expression
  • We import the re module and then write the regular expression pattern, where
\d: Any numeric digit [0-9]
{1,3}: The preceding token repeated one to three times
\.: A literal dot

For more info please refer to

r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
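  • The r prefix marks a raw string, so the backslashes reach the regular expression engine unchanged
  • Then we use Counter from the collections module to count how many times each IP address appears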
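To see the two steps in isolation, here is a small sketch that runs the same pattern over a made-up string standing in for the contents of a log file. The sample entries are invented for illustration.

```python
import re
from collections import Counter

# Same pattern as in the tutorial.
myregex = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Made-up sample text standing in for the contents of a log file.
log = (
    '192.168.0.4 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 200 100\n'
    '192.168.0.2 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 100\n'
    '192.168.0.4 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 200 100\n'
)

my_iplist = re.findall(myregex, log)  # list of every match, in order
ipcount = Counter(my_iplist)          # maps each IP to how often it appears

print(my_iplist)
print(ipcount["192.168.0.4"])
```

One caveat: the pattern matches any four dot-separated groups of one to three digits anywhere in the text, so a version string such as 1.2.3.4 inside a user agent would be counted as well.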

Output

IP Address => 192.168.0.2 Count => 1
IP Address => 192.168.0.3 Count => 1
IP Address => 192.168.0.4 Count => 3
IP Address => 192.168.0.5 Count => 1
IP Address => 192.168.0.6 Count => 1
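Since the goal is to spot offending IP addresses, it may help to list the busiest clients first. The sketch below is one possible variation, not the author's code. It assumes the client address is the first whitespace-separated field of each line, as it is in Apache's common and combined log formats, and it uses Counter.most_common to sort by count. The function name top_ips and the sample lines are made up.

```python
from collections import Counter


def top_ips(lines, n=3):
    """Return the n most frequent client addresses as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if parts:                    # skip blank lines
            counts[parts[0]] += 1
    return counts.most_common(n)


# Made-up sample entries for illustration.
sample_lines = [
    '192.168.0.4 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 200 100',
    '192.168.0.2 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 100',
    '192.168.0.4 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 200 100',
    '192.168.0.4 - - [23/Apr/2017:05:54:39 -0400] "GET / HTTP/1.1" 200 100',
]

for ip, count in top_ips(sample_lines):
    print("IP Address => " + ip + " Count => " + str(count))
```

Taking only the first field also avoids counting IP-like strings that appear later in an entry.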

This is not the only way to write this code. There is a much better way to write the same piece of code, so stay tuned ;-)

Now the better way to write the Apache log parser

GitHub Link

I look forward to you joining this journey, spending at least an hour every day for the next 100 days on DevOps work, and posting your progress using any of the media below.

Written by Prashant Lakhera

AWS Community Builder, Ex-Redhat, Author, Blogger, YouTuber, RHCA, RHCDS, RHCE, Docker Certified,4XAWS, CCNA, MCP, Certified Jenkins, Terraform Certified, 1XGCP
