Custom pfSense Firewall Log Analyzer

April 18, 2011
Tags: , ,

UPDATE: Just use Splunk!

Preamble: This was setup using a pfSense machine and a Debian based LAMP.

pfSense is an amazing open source firewall capable of protecting a network ranging from a home network to a large corporate network. The one thing I could not get it to do was show me statistics on blocked TCP connections at the firewall. This is an interesting metric to discover IP addresses that are probing/scanning an outward facing IP. After searching high and low, I was unable to find an easy plugin solution to do this, so I wrote my own in python on a LAMP machine. Here are the steps to implement it:

      1. Setup the firewall logs to be sent to another machine via SYSLOG. This is accomplished by changing the pfSense settings in Status -> Settings -> Enable syslogging to a remote server  and inputting the remote syslog server IP address. Then, check the box marked “firewall events” and save. This will forward just the firewall logs to the remote server in addition to showing them in pfSense.
      2. Setup the remote log server to accept the logs. On the remote machine we need to set it up to accept the logs. Start by replacing the default rsyslog with syslog-ng (apt-get install syslog-ng). In the /etc/syslog-ng/syslog-ng.conf file add the following lines:
# Brad's Custom listener 
source s_net { udp (); }; 
#Brad's Custom Destination 
destination df_pfsensefirewall { file("/var/log/pfsense/pfsensefirewall.log"); }; 
#Brad's Custom filter 
filter f_pfsensefirewall { host( "192.168.0.1" ); }; 
#Brad's Log 
log { source ( s_net ); filter( f_pfsensefirewall ); destination ( df_pfsensefirewall ); };

You may need to adjust the filter IP address for your network.

      1. Next, setup Logrotate. Logrotate will be used to keep our log files from taking over the hard drive. Add an entry in /etc/logrotate.d/syslog-ng:
/var/log/pfsense/pfsensefirewall.log {
daily
rotate 7
size 100k
notifempty
postrotate
 /home/user/pfsenseparser/parser.sh
endscript
}

The postrotate shell script (/home/user/pfsenseparser/parser.sh) is what will be parsing the logs. I placed the parser.sh script in a folder entitled pfsenseparser in my home directory. This Logrotate entry will run daily, keep 7 old logs, and run the parser.sh script after the logs have been rotated. Restart syslog-ng at this step to see if the logs are being written to /var/log/pfsense/pfsensefirewall.log by running /etc/init.d/syslog-ng restart.

      1. Create the parser.sh file. Parser.sh is the bash script that is called by logrotate after logrotate has done its business. Here is what mine looks like:
#!/bin/sh
#location of the rotated log file
FILE=/var/log/pfsense/pfsensefirewall.log.1
#get rid of identical lines (to speed things up) and run the python script
grep TCP $FILE | uniq > /home/user/pfsenseparser/grepped.log | python /home/user/pfsenseparser/pythonparser.py
#restart syslog-ng
/etc/init.d/syslog-ng restart > /dev/null
exit 0
      1. Create the pythonparser.py file. Pythonparser.py is the actual log parsing portion. It cuts up the logs and stacks them in the MySQL database. You may need to install addition python software, e.g. MySQLdb.  My version is setup to record the latitude and longitude of the IP addresses that were attacking/scanning my firewall. I get this information from ipinfodb.com. If you want this functionality you will have to get your own API key at ipinfodb.com. I use this information to display the location of the IP on a google map. Here is the code:
#!/usr/bin/python

import re,urllib2,MySQLdb,datetime,os
from urllib import urlopen
from xml.dom.minidom import parse, parseString
from xml.etree import ElementTree as ET

#API key for ipinfodb.com
apikey = "GET YOUR OWN KEY"
#import the file
input = open('/home/user/pfsenseparser/grepped.log', 'r')
#error log
error_output = open('/home/user/pfsenseparser/error.log', 'a')
#output files
output = open('final.txt', 'a')

# this allows for the IP to Lat/Long conversion
url = "http://api.ipinfodb.com/v2/ip_query.php?key="+apikey+"&ip="

#MySQL Connect
db = MySQLdb.connect("localhost","pfsenseparser","username","password")
cursor = db.cursor()

#what time is it?
now = datetime.datetime.now()
error_output.write('<------Started at: ' + now.strftime("%Y-%m-%d %H:%M") + '--->\n')

#number of new entries
num_new_data = 0
num_exist_data = 0

#testing variable no SQL or file write if set to 0
testing = 1

for line in input:

#www.txt2re.com
#################################################

   re1='((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Sept|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?))'    # Month 1
   re2='.*?'    # Non-greedy match on filler
   re3='((?:(?:[0-2]?\\d{1})|(?:[3][0,1]{1})))(?![\\d])'    # Day 1
   re4='.*?'    # Non-greedy match on filler
   re5='((?:(?:[0-1][0-9])|(?:[2][0-3])|(?:[0-9])):(?:[0-5][0-9])(?::[0-5][0-9])?(?:\\s?(?:am|AM|pm|PM))?)'    # HourMinuteSec 1
   re6='.*?'    # Non-greedy match on filler
   re7='(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?![\\d])'    # Uninteresting: ipaddress
   re8='.*?'    # Non-greedy match on filler
   re9='((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\\d])'    # IPv4 IP Address 1
   re10='.*?'    # Non-greedy match on filler
   re11='(\\d+)'    # Integer Number 1
   re12='.*?'    # Non-greedy match on filler
   re13='((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\\d])'    # IPv4 IP Address 2
   re14='.*?'    # Non-greedy match on filler
   re15='(\\d+)'    # Integer Number 2

   rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10+re11+re12+re13+re14+re15,re.IGNORECASE|re.DOTALL)
   m = rg.search(line)
   if m:
           month1=m.group(1)
           day1=m.group(2)
           time1=m.group(3)
           ipaddress1=m.group(4)
           int1=m.group(5)
           ipaddress2=m.group(6)
           int2=m.group(7)
           #print "("+month1+")"+"("+day1+")"+"("+time1+")"+"("+ipaddress1+")"+"("+int1+")"+"("+ipaddress2+")"+"("+int2+")"+"\n"

##############################

       #ipaddress1 is the one we want
       #lets see if it is in the database before we ask ipinfodb
       sqlipcheck = "SELECT * FROM pfsenseparser WHERE Ip_Address = ('%s')" % (ipaddress1)
       cursor.execute(sqlipcheck)
       data = cursor.fetchall()
       ipcheck = ""
       for row in data:
           daycheck = row[2]
           timecheck = row[3]
           ipcheck = row[4]
       if ipcheck != ipaddress1:
           #This is a new IP Address
           error_output.write(ipaddress1+' = New Data\n')
           url2 = ipaddress1+"&timezone=false"
               url3 = url+url2
               urlobj = urllib2.urlopen(url3)
               data = urlobj.read()
                  urlobj.close()
               dom = ET.XML(data)
           city = dom.findtext("City")
           country = dom.findtext("CountryName")
           region = dom.findtext("RegionName")
           region = region.replace("'", "")
                lat = dom.findtext("Latitude")
               long = dom.findtext("Longitude")
           if testing == 1:
               cursor.execute("INSERT INTO pfsenseparser (Month,Day,Time,Ip_Address, Port_Num,Lat,Longitude,City,Country_Name,Region,Num_Connect,Type) VALUES (%s, %s, %s,%s, %s, %s, %s, %s, %s, %s, 1, \"firewall\")", (month1,day1,time1,ipaddress1,int2,lat,long,city,country,region))
               db.commit()
                   output.write(month1+","+day1+","+time1+","+ipaddress1+","+int1+","+lat+","+long+","+ipaddress2+","+int2+" "+"\n")
                       #output2.write(month1+","+day1+","+int1+c1+int2+","+ipaddress1+","+int1+","+lat+","+long+","+ipaddress2+","+int2+" "+"\n")
           num_new_data = num_new_data+1
       elif timecheck != time1 and daycheck != day1:
           #This is an existing IP Address but not a duplicate
           sql2 = "SELECT Num_Connect FROM pfsenseparser WHERE Ip_Address = ('%s')" % (ipaddress1)
           cursor.execute(sql2)
           data = cursor.fetchall()
           for row in data:
               num = row[0]
               num_new = int(num)+1
               sql5 = "UPDATE pfsenseparser SET Num_Connect = ('%d') WHERE Ip_Address = ('%s')" % (num_new,ipaddress1)
               if testing == 1:
                   cursor.execute(sql5)
                   db.commit()
           num_exist_data = num_exist_data+1

now2 = datetime.datetime.now()
if testing == 1:
#send me a text
   os.system('echo \'parser.sh just ran with ' + str(num_new_data) + ' new entries and ' + str(num_exist_data) + ' existing entries\' | mailx youremailhere ')
   #write to error.log
   #insert timestamp and new entries into database
   if num_new_data != 0:
       cursor.execute("INSERT INTO pf_meta (Last_Run,New_Data) VALUES (NOW(),'Y')")
   else:
       cursor.execute("INSERT INTO pf_meta (Last_Run,New_Data) VALUES (NOW(),'N')")
   error_output.write('Ended at: ' + now2.strftime("%Y-%m-%d %H:%M") + ' with ' + str(num_new_data) + ' new entries and ' + str(num_exist_data) + ' existing entries\n')
else:
    os.system('echo \'Just Testing\' | mailx youremailhere ')
db.close()

      1. Create a place to stack the logs in the mysql database. The pythonparser.sh looks at the logs and inserts them into the database if they are new IP addresses. If the IP address is already in the database it will increment the number of times the IP address has attempted a connection. Here are the create statements for MySQL
CREATE TABLE `pfsenseparser` (
  `Log_Id` int(11) NOT NULL auto_increment,
  `Month` varchar(30) default NULL,
  `Day` varchar(30) default NULL,
  `Time` varchar(30) default NULL,
  `Ip_Address` varchar(30) default NULL,
  `Port_Num` varchar(30) default NULL,
  `Lat` varchar(30) default NULL,
  `Longitude` varchar(30) default NULL,
  `notes` varchar(200) default NULL,
  `City` varchar(50) default NULL,
  `Country_Name` varchar(75) default NULL,
  `Region` varchar(75) default NULL,
  `Num_Connect` mediumtext,
  `Type` varchar(50) default NULL,
  PRIMARY KEY  (`Log_Id`)
)

CREATE TABLE `pf_meta` (
  `Meta_Id` int(11) NOT NULL auto_increment,
  `Last_Run` datetime default NULL,
  `New_Data` varchar(30) default NULL,
  PRIMARY KEY  (`Meta_Id`)
)
      1. Pull the data from the MySQL Database. The great part about inserting the data into a database is the ability to query that data in a meaningful way. By using PHP we can create statistics to help us analyze the information. Here is a sample PHP file for pulling the data out that you can download PHP index.php. The code grabs the top 10 ports, IPs, and countries. I also have PHP files for the Google maps if requested in the comments.
      2. Obviously, this is a work in progress. Please offer suggestions to help improve this code or leave questions if you have any!

UPDATE: As requested here is the code for the Google map displaying where the firewall blocks are coming from.

WARNING! This has only been tested in Firefox! It will not work in Chrome! This is very custom code. When it does not work for you, leave a comment!

11 Responses to “Custom pfSense Firewall Log Analyzer”

  1. I’d definitely love to see the Google Maps scripts!

  2. Steve, I have updated the blog post with a link to the Google map code.

  3. Hi brad, i start to try you script, i dont know python, but in any case i have some results.
    Ciao,
    andrea

  4. Hey, I love this idea, but… when pythonparser.py is run, it returns a traceback call stating that ipaddress1 is not defined;

    Traceback (most recent call last):
    File “/home/user/scripts/firewallparser/pythonparser.py”, line 72, in
    sqlipcheck = “SELECT * FROM pfsenseparser WHERE Ip_Address = (‘%s’)” % (ipaddress1)
    NameError: name ‘ipaddress1’ is not defined

    Any info on this?

    Daniel

  5. Daniel,

    What version of python and pfSense are you running?

  6. Hello again,

    Thank you for your swift reponse!

    I am currently running pfsense 2.0-RELEASE (i386) nanobsd (512mb), and python 2.6.5.

  7. Daniel,

    Give me a few days to remember what the code is doing and I will be able to get you a better answer. 🙂 Try adding a print ipaddress1 right before line 72 and see if the ipaddress1 variable has anything in it. What might be happening is the log format changed from pfsense 1.2.3 to a new format in 2.0. The regex may need to be updated to reflect the new version. I will look into it over the next few days.

  8. Ahoy,

    I tried printing the ipaddress1 variable, but I get the same error message.
    Not sure if it helps, but here is a sample log entry coming from pfsense 2.0-RELEASE to a logging server:

    Apr 19 09:02:03 192.168.34.1 pf: 00:00:01.989361 rule 1/0(match): block in on vr1: (tos 0x0, ttl 49, id 47405, offset 0, flags [DF], proto TCP (6), length 60)
    Apr 19 09:02:03 192.168.34.1 pf: 215.129.198.124.28548 > 192.168.77.24.80: Flags [S], cksum 0xc2b6 (correct), seq 3307482584, win 3210, options [mss 1154,sackOK,TS val 183254574 ecr 0,nop,wscale 6], length 0

    I did also find this information on the gltail parser page on pfsense forums, while searching for logfile format change in version 2.0:
    “NOTE: This currently does NOT work with 2.0 snapshots based on FreeBSD 8.0. The log format has changed, and the parser has yet to catch up.”

    Many thanks for spending some time on this.

    All the best,

    Daniel

  9. Daniel,

    After doing some googling, I confirmed that freeBSD has changed their log format. The pfsense dev team created a parser just for the new log format. Just like in your example, the logs are now split into two lines. This is making it very difficult to parse the log. I have two options I can uncut up the log and put them back into a single line format or I can create a multi line regular expression. Are you good at either of those? Perhaps we could work together to figure it out?

    Or I could integrate splunk?

  10. Brad,

    Thank you. Well, I am afraid I am not goot at either of the two suggestions. However much I would like to be adept at programming, I must admit that my skills are somewhere between novice and nil. I am in the process of trying to land at a language to stick with to learn properly. RegEx is definitely not where I have started.

    If it is simple to integrate, then maybe splunk is the way to go?

    If there are any tedious, non-advanced tasks I could help with, please let me know.

    Enjoy!

  11. Splunk it is! I was leaning that way already. On the programming languages, it seems you already are able to read and execute python. If you have any questions let me know.

Leave a Reply