Sunday, June 2, 2013

Logstash Configuration

Most of the projects that we work on have a logging module that keeps writing data into log files. Over time these log files grow huge and are either archived or deleted. I am not sure how many people analyze their logs to see whether the same exception or error is thrown again, whether there is a pattern to when it is thrown, or when it was last thrown. In fact, this is not limited to exceptions: the logs also capture user actions.

As part of a project I was working on, I used an open source tool for managing events and logs called Logstash. You can use it to collect logs, parse them, and store them for later use. It is completely free and open source under the Apache 2.0 license, which means you are pretty much free to use it however you want.

In this post I am going to explain how to configure Logstash.

You need to have a JRE installed to run Logstash.

You need to have an Elasticsearch server running (this post uses an embedded server).

You can download the latest version of Logstash as of this post (1.1.13) from the Logstash website.

You need to define a config file (logstash.conf) to run Logstash.

The config file has the following format:

# This is a comment. You should use comments to describe
# parts of your configuration.
input {
  ...
}

filter {
  ...
}

output {
  ...
}
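The three sections form a pipeline: each event produced by an input passes through the filters in order and is then handed to every output. The Python sketch below is only a conceptual model of that flow, not how Logstash is implemented; `run_pipeline` and the dictionary-based events are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, str]  # hypothetical: an event modelled as a plain dict


def run_pipeline(
    inputs: Iterable[Event],
    filters: List[Callable[[Event], Event]],
    outputs: List[Callable[[Event], None]],
) -> None:
    for event in inputs:          # input { ... }: produces events
        for f in filters:         # filter { ... }: transforms each event, in order
            event = f(event)
        for out in outputs:       # output { ... }: every output receives the event
            out(event)


# Usage: one filter upper-cases the message, one output collects events in a list.
collected: List[Event] = []
run_pipeline(
    [{"message": "hello"}],
    [lambda e: {**e, "message": e["message"].upper()}],
    [collected.append],
)
print(collected)  # [{'message': 'HELLO'}]
```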

You define the input section by pointing it to the log file(s) that need to be monitored:

input {
  file {
        type => "tomcat"
        path => "/home/tomcat7/logs/catalina.out"
  }
  file {
        type => "application"
        path => "/home/apps/logs/app.log"
  }
}
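Each `file` block turns lines of one file into events and tags them with the configured `type`, which is how later sections can tell Tomcat records from application records. The following is a rough Python sketch of that tagging, under stated assumptions: `read_tagged_events` is a hypothetical helper, the field names are illustrative, and unlike the real file input it reads each file once from the start instead of following the file as new lines are written.

```python
import os
import tempfile
from typing import Dict, List


def read_tagged_events(sources: Dict[str, str]) -> List[Dict[str, str]]:
    """Read each file once and tag every line with the type from the config.

    `sources` maps a type (e.g. "tomcat") to a log file path, mirroring the
    type/path pairs of the file inputs. Field names are illustrative only.
    """
    events: List[Dict[str, str]] = []
    for log_type, path in sources.items():
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                events.append(
                    {"type": log_type, "path": path, "message": line.rstrip("\n")}
                )
    return events


# Usage with a throwaway file standing in for /home/apps/logs/app.log.
with tempfile.TemporaryDirectory() as tmp:
    app_log = os.path.join(tmp, "app.log")
    with open(app_log, "w", encoding="utf-8") as fh:
        fh.write("first line\nsecond line\n")
    events = read_tagged_events({"application": app_log})

print([(e["type"], e["message"]) for e in events])
# [('application', 'first line'), ('application', 'second line')]
```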

In the filter section you define the patterns that Logstash uses to identify and parse the records in a log file. There is a standard set of regular expression patterns, called grok patterns, for matching common fields such as dates, times, months and log levels. Using these, we can set up grok patterns that match the records in a log file.
For example, to match the log records in Tomcat's catalina.out, we can use this pattern:

filter {
  grok {
    pattern => ["(?m)(?<logdate>%{MONTH} %{MONTHDAY}, %{YEAR} %{DATA} [AP]{1}M{1}) %{NOTSPACE:package} %{WORD:method}.*%{LOGLEVEL:loglevel}: %{GREEDYDATA:message}"]
  }
}
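A catalina.out record spans two lines (a timestamp line followed by a "LEVEL: message" line), which is why the pattern starts with (?m): in the Ruby-style regex syntax grok uses, that flag lets "." match newlines. The file input, however, emits one event per line, so the lines of a record have to be joined before the pattern can match the whole record; Logstash has a multiline filter for this (check its documentation for your version). The Python sketch below only illustrates the joining idea, assuming every new record starts with a date such as "May 31, 2013"; `group_records` and `RECORD_START` are hypothetical names.

```python
import re
from typing import Iterable, List

# Assumption: a new catalina record starts with a date such as "May 31, 2013 ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} \d{1,2}, \d{4} ")


def group_records(lines: Iterable[str]) -> List[str]:
    """Join continuation lines onto the record that precedes them."""
    records: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)            # a date line starts a new record
        else:
            records[-1] += "\n" + line      # e.g. the "INFO: ..." line
    return records


lines = [
    "May 31, 2013 9:24:24 AM org.apache.catalina.core.StandardEngine startInternal",
    "INFO: Starting Servlet Engine: Apache Tomcat/7.0.35",
]
grouped = group_records(lines)
print(len(grouped))  # 1
```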

With the grok filter defined above, the following log message

May 31, 2013 9:24:24 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.35

is broken down into these fields:

logdate  => May 31, 2013 9:24:24 AM
package  => org.apache.catalina.core.StandardEngine
method   => startInternal
loglevel => INFO
message  => Starting Servlet Engine: Apache Tomcat/7.0.35
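Grok works by replacing each %{NAME:field} reference with a named regular expression and capturing the matched text under "field" (a bare %{NAME} is matched but not captured). The Python sketch below reproduces the breakdown for the sample record, with two labeled assumptions: `PATTERNS` is a small, simplified, hand-written stand-in for the real grok pattern definitions, and `expand_grok` is a hypothetical helper. It also translates two syntax differences: Ruby's (?<name>...) becomes Python's (?P<name>...), and Ruby's (?m) (dot matches newline) becomes Python's (?s).

```python
import re

# Simplified stand-ins for a few standard grok patterns (not the real definitions).
PATTERNS = {
    "MONTH": r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
             r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b",
    "MONTHDAY": r"(?:0?[1-9]|[12][0-9]|3[01])",
    "YEAR": r"\d{4}",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
    "NOTSPACE": r"\S+",
    "WORD": r"\b\w+\b",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|SEVERE|FATAL)",
}


def expand_grok(pattern: str) -> str:
    """Turn a grok expression into a Python regular expression."""
    def substitute(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        # %{NAME:field} -> named capture; %{NAME} -> non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    regex = re.sub(r"%\{(\w+)(?::(\w+))?\}", substitute, pattern)
    # Ruby/Oniguruma (?<name>...) is spelled (?P<name>...) in Python.
    regex = re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", regex)
    # Ruby's (?m) lets "." match newlines; Python calls that flag (?s).
    return regex.replace("(?m)", "(?s)", 1)


TOMCAT = (r"(?m)(?<logdate>%{MONTH} %{MONTHDAY}, %{YEAR} %{DATA} [AP]{1}M{1}) "
          r"%{NOTSPACE:package} %{WORD:method}.*%{LOGLEVEL:loglevel}: %{GREEDYDATA:message}")
record = ("May 31, 2013 9:24:24 AM org.apache.catalina.core.StandardEngine startInternal\n"
          "INFO: Starting Servlet Engine: Apache Tomcat/7.0.35")

fields = re.search(expand_grok(TOMCAT), record).groupdict()
for name, value in fields.items():
    print(f"{name} => {value}")
```

The two-line record is passed as one string here, which assumes the lines have already been joined into a single event.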

This message represents one log record, and a log file contains many such records. To push these records into Elasticsearch, define an output section. Here embedded => true runs an Elasticsearch server inside the Logstash process:

output {
  elasticsearch {
       embedded => true
  }
}

Once the logs have been moved to Elasticsearch, you can run queries to search them by keyword, log level, date range and so on.
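To make the kind of query mentioned above concrete, here is a minimal Python sketch that builds an Elasticsearch query body combining a keyword, a log level and a time range. By default Logstash writes to one index per day, named like logstash-2013.05.31, and the body would be sent to that index's _search endpoint with any HTTP client. Assumptions: `daily_index` and `build_query` are hypothetical helpers; the field names `message` and `loglevel` are the grok captures from this post, but depending on the Logstash version they may be stored under a nested object (such as @fields), so check an indexed document first. Since this config has no date filter, @timestamp is the time Logstash processed the event, not the logdate value.

```python
import json
from datetime import date


def daily_index(day: date) -> str:
    """Name of the per-day index Logstash writes to by default."""
    return "logstash-" + day.strftime("%Y.%m.%d")


def build_query(keyword: str, loglevel: str, start: str, end: str) -> dict:
    """Query DSL body: keyword in the message, a given log level, a time range."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"message": keyword}},      # full-text keyword search
                    {"match": {"loglevel": loglevel}},    # e.g. "INFO", "SEVERE"
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        }
    }


index = daily_index(date(2013, 5, 31))
body = build_query("Tomcat", "INFO", "2013-05-31T00:00:00", "2013-05-31T23:59:59")
print(index)  # logstash-2013.05.31
print(json.dumps(body, indent=2))
```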

Once you have finished editing the config file, move the Logstash jar and the config file into the same folder and start Logstash from that folder by running the following command:

java -jar logstash-1.1.13-flat.jar agent -f logstash.conf

To enable verbose output while Logstash is running, add -v or -vv (depending on how verbose you need it to be).



