Logstash: how to get logs in a readable format (part 1 of 3: Parsing)


I am new to Logstash and have suffered mightily trying to understand how to get my logs into a readable format; it took a lot of googling around.

In addition, many of the posts assumed a level of experience with concepts that I just didn't have.

So I thought I would write up a summary of what I discovered for others who are capable but not yet deeply immersed in this subject: some “Logstash FAQs”, you could say.

This blog entry isn’t meant to be exhaustive – merely a rag bag of experience to complement what is already out there.

Our environment

We are using

  • Elasticsearch 1.7
  • Logstash 1.5
  • Kibana 4.1

All running on Windows (in AWS).

Logstash FAQs

This breaks down into 3 parts:

  • Part 1: Parsing (this post)
  • Part 2: Basic Principles
  • Part 3: coming soon (for notifications, follow us via our blog or on Twitter: https://twitter.com/trainlineEngine)

Part 1: Parsing

Conditional parsing of logfiles

We had an IIS logfile in which the client IP address is sometimes present and sometimes replaced by a ‘-’. Could I have one pattern that copes with both situations?

e.g.:

2015-12-14 11:10:53.771 11.131.36.2 Get / servicename

or:

2015-12-14 11:10:53.771 - Get / servicename

There is a way and it is:

(?:%{IP:c_ip}|-)

…where IP is the built-in pattern fragment that matches an IP address, c_ip is the field to read it into, and the final ‘-’ is the character that appears in my logs when no IP address is present (see above). The (?: … ) wrapper is a non-capturing group, and the | inside it means “or”.
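Here is a minimal grok filter sketch showing the fragment in place for the two sample lines above. The field names method, request and servicename are placeholders, not taken from our config, and the date uses TIMESTAMP_ISO8601, the predefined fragment for year-first timestamps like these. It is worth checking against your own lines in a grok debugger before relying on it.

```logstash
filter {
  grok {
    # The non-capturing group (?: ... | ... ) tries %{IP:c_ip} first and
    # falls back to a literal "-", so both sample lines match.
    # method, request and servicename are placeholder field names.
    match => { "message" => "%{TIMESTAMP_ISO8601:eventTime} (?:%{IP:c_ip}|-) %{WORD:method} %{URIPATH:request} %{NOTSPACE:servicename}" }
  }
}
```

On the first sample line this should give c_ip => "11.131.36.2". On the second, the "-" branch matches, and because grok discards empty captures by default (keep_empty_captures => false), the event simply has no c_ip field.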

Parsing an IIS logfile

You cannot just take something off the internet as gospel to parse your file: the right pattern depends on which values are being logged. For instance, if we are logging

#Fields:  date time c-ip xff-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query cs(User-Agent) sc-bytes sc-status time-taken cs-host cs(Cookie) cs(Referer)

Then we can use

pattern => "%{DATESTAMP:eventTime} %{IP:c_ip} (?:%{IP:xff_ip}|-) (?:%{NOTSPACE:cs_username}|-) %{IP:s_ip} %{URIPROTO:method} %{URIPATH:request} (?:%{NOTSPACE:queryparam}|-) (?:%{QUOTEDSTRING:user_agent}|-) %{NUMBER:sc_bytes} %{NUMBER:sc_status} %{NUMBER:time_taken} %{NOTSPACE:cs_host} (?:%{QUOTEDSTRING:cs_cookie}|-) (?:%{NOTSPACE:cs_referrer}|-)"

Note that I inserted the conditional parsing only in the places where I could see from my logs that what was logged varied.
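A sketch of how the IIS pattern might sit in a full filter section follows. Three things in it are assumptions rather than part of our setup. First, the conditional drops the #-prefixed header lines (#Fields and friends) that IIS writes at the top of each file, in case you would rather not index them. Second, the :int suffixes ask grok to store the numeric fields as integers instead of strings. Third, the date fragment is TIMESTAMP_ISO8601. Judging by its definition in grok-patterns, DATESTAMP is built from the month- or day-first DATE_US/DATE_EU fragments, so on a year-first line like 2015-12-14 it can end up capturing 15-12-14 instead.

```logstash
filter {
  # Optional: skip the #Software / #Fields style header lines IIS writes.
  if [message] =~ /^#/ {
    drop {}
  }
  grok {
    # Same field order as the #Fields line above.
    # :int stores sc_bytes, sc_status and time_taken as integers.
    match => { "message" => "%{TIMESTAMP_ISO8601:eventTime} %{IP:c_ip} (?:%{IP:xff_ip}|-) (?:%{NOTSPACE:cs_username}|-) %{IP:s_ip} %{URIPROTO:method} %{URIPATH:request} (?:%{NOTSPACE:queryparam}|-) (?:%{QUOTEDSTRING:user_agent}|-) %{NUMBER:sc_bytes:int} %{NUMBER:sc_status:int} %{NUMBER:time_taken:int} %{NOTSPACE:cs_host} (?:%{QUOTEDSTRING:cs_cookie}|-) (?:%{NOTSPACE:cs_referrer}|-)" }
  }
}
```

One side effect worth knowing: NOTSPACE is \S+, which also matches a lone '-', so in the (?:%{NOTSPACE:...}|-) groups the '-' branch is never reached and those fields will contain '-' when nothing was logged.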

Ensuring all log entries appear

However, if the above pattern doesn't match the log line exactly, it won't be applied, and the entry won't reach Elasticsearch in a usable form (by default grok only tags such an event with _grokparsefailure). I wanted every entry to be logged usefully, so after the first pattern I added

pattern => "%{DATESTAMP:eventTime} %{GREEDYDATA:logEntry}"
pattern => "%{GREEDYDATA:logEntry}"

Once a pattern has matched, no other patterns are attempted (by default). So if my first attempt at parsing the IIS log line fails, the second pattern treats the whole line as text apart from the datestamp. Extracting the datestamp means the log entry can be viewed at the time the line was written (not when it was read), provided a date filter then maps it onto @timestamp.

Finally, if the log line is so mangled that even the date is in an unexpected place or format, the last pattern simply captures the whole line as logEntry.

Note also that match and pattern do the same job in this case.
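Put together in the match syntax, the fallback chain and the date handling might look like the sketch below. The first entry is a shortened stand-in for the full IIS pattern, TIMESTAMP_ISO8601 replaces DATESTAMP because these dates are year-first, and the date filter settings (format string and UTC timezone) are assumptions based on the sample lines rather than our actual config.

```logstash
filter {
  grok {
    # Patterns are tried in order; with break_on_match => true (the
    # default) grok stops at the first one that matches.
    match => { "message" => [
      # 1. Stand-in for the full IIS pattern from earlier in this post.
      "%{TIMESTAMP_ISO8601:eventTime} %{IP:c_ip} %{GREEDYDATA:iisEntry}",
      # 2. Keep the timestamp, treat the rest as free text.
      "%{TIMESTAMP_ISO8601:eventTime} %{GREEDYDATA:logEntry}",
      # 3. Last resort: the whole line as free text.
      "%{GREEDYDATA:logEntry}"
    ] }
    break_on_match => true
  }
  # Copy eventTime into @timestamp so Kibana shows when the line was
  # written rather than when Logstash read it. Events that matched only
  # pattern 3 have no eventTime, so this filter leaves them alone.
  date {
    match => [ "eventTime", "yyyy-MM-dd HH:mm:ss.SSS" ]
    # IIS W3C logs are normally written in UTC; adjust if yours differ.
    timezone => "UTC"
  }
}
```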

Testing a pattern

A job can't be done unless you can test your progress, and the best way to map out a pattern iteratively is to use a grok debugger. I used http://grokdebug.herokuapp.com/

Simply take a line from the logfile you want to parse and build up the pattern from left to right. As soon as your work-in-progress pattern doesn't match the input, you will get nothing back.
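If the online debugger isn't an option (for example, if the log lines shouldn't leave your machine), a similar loop can be run locally. The sketch below, saved as a hypothetical test.conf and run with something like bin/logstash -f test.conf (bin\logstash.bat on Windows), reads lines pasted into the console and prints every parsed field.

```logstash
# test.conf: paste log lines into the console and inspect the result.
input {
  stdin {}
}
filter {
  grok {
    # Work-in-progress pattern. Ending it with %{GREEDYDATA:rest} keeps it
    # matching while you build it up; "rest" shows what is left to parse.
    match => { "message" => "%{TIMESTAMP_ISO8601:eventTime} (?:%{IP:c_ip}|-) %{GREEDYDATA:rest}" }
  }
}
output {
  # rubydebug prints all fields of each event, including a
  # _grokparsefailure tag when the pattern did not match.
  stdout { codec => rubydebug }
}
```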

Predefined pattern fragments

The nice people in Internet land have created a set of pattern fragments that cover just about all the common items you are likely to find in a logfile. You don't have to do anything special to use them. They can be found here: https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns and the following page is useful for working out what the regex patterns mean: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
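Each fragment is just a named regular expression, which is where the regex guide comes in. For example, grok-patterns defines NOTSPACE as \S+ and GREEDYDATA as .*, so a fragment can be swapped for a raw named capture, in the (?<field>regex) syntax grok accepts, when nothing predefined fits. The field names below are placeholders.

```logstash
filter {
  grok {
    # These two patterns should capture the same fields, because
    # NOTSPACE is defined as \S+ and GREEDYDATA as .* in grok-patterns:
    #   "%{NOTSPACE:cs_host} %{GREEDYDATA:logEntry}"
    #   "(?<cs_host>\S+) (?<logEntry>.*)"
    # Raw named captures and %{...} fragments can be mixed in one pattern.
    match => { "message" => "(?<cs_host>\S+) %{GREEDYDATA:logEntry}" }
  }
}
```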

Part 2: Basic Principles

Read on >>>
