This better be logged

One of the most difficult aspects of triaging any business situation is to get a complete picture of what is happening. Often in the technology world the response to this is "log everything", on the basis that if we at least have an audit trail we can piece together what occurred. The challenge of logging everything is that we then end up with an incredibly diverse collection of logs in numerous different formats scattered across many different machines.

Correlation required

To ask what actually happened at a given moment, we really need our logs in one place, in some kind of standard format that allows us to search across them in their entirety. We want to be able to say things like 'Show me everything that happened at 10:03pm' or 'Show me everything concerning customer id 1103'. This is where Logstash filters come in. If we push all of our logs to one central place we can "filter" them as they arrive and normalise them into a standard format, so that we can easily search across all our logs. Let's take a look at some Logstash filters and see when we might use them.
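Before diving in, it helps to see where filters sit. A minimal pipeline sketch, in which the Beats port and the Elasticsearch address are illustrative assumptions rather than requirements:

input {
    beats {
        port => 5044    # assumed: logs shipped in from Beats agents
    }
}

filter {
    # the filters discussed below all slot in here
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]    # assumed: your central log store
    }
}

Every event flows through the filter section on its way to the central store, which is what lets us normalise logs as they arrive.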

date filter

An incredibly important filter to make use of. We want the timestamp of the log to be based upon when the event occurred, not when it was received. Using this filter you can pick the timestamp out of your log like so:

date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}

In the absence of this filter the timestamp will be based on when Logstash received the event. Making use of this filter is especially important when backfilling old logs into your platform, as otherwise every event would appear to have happened at the moment of import.
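The match option will also accept several patterns, tried in order, which is handy when the same field arrives in more than one format. A small sketch; the ISO8601 fallback and the timezone value are assumptions for illustration:

date {
    # try the Apache-style format first, then fall back to ISO8601
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601" ]
    # assumed zone for timestamps that carry no offset of their own
    timezone => "Europe/London"
}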

grok filter

The go-to filter for unstructured logs. Many logs come simply as arbitrary text. To get the most visibility in our searching we'd like to break this text down into its constituent parts. Let's take an Apache access log for example, which looks something like this:

127.0.0.1 - frank [10/Oct/2016:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/32.08 [en] (Win10; I ;Nav)"

As there are many common, widely used log formats like this, Logstash comes with over 100 built-in grok patterns. We can break this one down very easily:

grok {
    match => ["message", "%{COMBINEDAPACHELOG}"]
}

grok works by combining text patterns into something that matches your logs. In this case the pattern that's actually being run is:

COMBINEDAPACHELOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}

Now when you come to search, instead of having one field containing the entire log, every part of the log sits in its own field. So, for example, we could easily visualise on a graph the number of 2xx responses returned in any given period, or highlight a particular client ip that was generating a disproportionate amount of traffic.
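The same approach works when no ready-made pattern fits: you compose your own from the built-in pieces. A sketch for a hypothetical application log line such as 2016-10-10T13:55:36Z ERROR payment declined:

grok {
    # timestamp, level and detail are hypothetical field names for this example
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:detail}"]
}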

geoip filter

An invaluable filter for understanding traffic sources. geoip can take an IP address and return a latitude and longitude. If we continue with our Apache access log example, we can add geographical information based on the clientip field we picked out earlier:

geoip {
    source => "clientip"
}

Now we can understand where our traffic is coming from and make our searches region-specific.
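If you only need part of the geographical data, the filter's fields option keeps just the pieces you will actually search on. A sketch in which the selection is an assumption about what matters to you:

geoip {
    source => "clientip"
    # keep only the fields we intend to search or plot on a map
    fields => ["country_name", "city_name", "location"]
}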

prune filter

The filter for cleansing data. Many logs contain information that shouldn't be stored. Whether it's financial data or personal information, we typically want to strip it out before the logs reach storage. Let's take the example of deleting passwords from our logs. With prune we can create a blacklist based on field name:

prune {
    blacklist_names => [ "password" ]
}

prune will now remove any field whose name matches password, as the entries in blacklist_names are treated as regular expressions. This is particularly powerful when our logs vary in shape and we don't know ahead of time exactly which fields will be present; note, though, that prune operates on top-level fields only.
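Since each entry is a regex, a single prune block can cover several kinds of sensitive field at once. A sketch with assumed field names:

prune {
    # each entry is a regex; credit_card and ssn are hypothetical field names
    blacklist_names => [ "password", "credit_card", "^ssn$" ]
}

Anchoring a pattern, as with ^ssn$, restricts it to that exact field name rather than any name containing the text.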

There's so much more

This was a basic introduction to some of the most common Logstash filters. There are many more available, and this is only the start of what you can do with them.

At Logit we regularly provide consultancy on writing Logstash filters for our customers to get their data into a format that allows them to run the searches they are interested in. If you’d like to work with us and tap into our expertise, do contact us on social media, via live chat, or talk to our incredible support team.