Logstash provides an abstraction layer over complex regular expressions called grok. It lets you convert raw log lines ("streams") into fields (chunks of information) that can be queried later. In this post we are going to parse nginx access logs using grok.
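To make the "abstraction layer" idea concrete, here is a minimal Python sketch of what a grok-style expression boils down to: each `%{NAME:field}` reference is swapped for a named regex group. The tiny `PATTERNS` table, the `expand` helper, and the sample line are all hypothetical and only for illustration; the real grok library ships many more patterns and features.

```python
import re

# Tiny, hypothetical pattern library; real grok ships a much larger one.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+",
}


def expand(grok_expr: str) -> str:
    """Turn every %{NAME:field} reference into a named regex group."""
    def repl(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"

    return re.sub(r"%\{(\w+):(\w+)\}", repl, grok_expr)


# The readable expression on the left becomes a plain regex under the hood.
regex = re.compile(expand("%{IP:remote_addr} %{WORD:method} %{NUMBER:status}"))

# Made-up input line, only to show the fields that come out.
fields = regex.match("192.0.2.10 GET 200").groupdict()
print(fields)
```

The point is only that you write the short, readable expression and get named fields back, instead of maintaining the long regex by hand.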
Hands on
In my case, I had to learn grok (the grok debugger helped me a lot) in order to parse an nginx (0.7.67-3 on squeeze) access log. Here is the log file format (taken from slicehost):
If you need more detail on the nginx log format, check this.
Here is an example nginx log file:
Now create a directory called "patterns" and create a file "nginx.grok", and put the following pattern there:
I used the variable names from the nginx site as the field names, so it is easier to track the information in your output.
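As a rough illustration of that naming choice, the Python sketch below splits one access log line into fields named after the nginx variables. It assumes nginx's stock "combined" log format, which may differ from the format used in this post, and the sample line is made up (only the timestamp comes from the example further down).

```python
import re

# Assumes the stock nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
NGINX_COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# Hypothetical log line; not taken from a real server.
sample = (
    '192.0.2.10 - - [16/Feb/2013:12:30:20 -0430] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.21.0"'
)

# Each named group becomes a field, keyed by the nginx variable name.
event = NGINX_COMBINED.match(sample).groupdict()
for name, value in event.items():
    print(f"{name}: {value}")
```

Because the keys match the nginx variable names, a field in the output can be traced straight back to the `log_format` directive that produced it.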
Now let's put all this to work in logstash:
And here is the resulting JSON data as shown on stdout:
The date
Something really important to remember is that logstash stores every event's time in GMT, in the "@timestamp" field. We sent the event time in the "time_local" field, and with the "date" filter we told logstash to use that field as the event's timestamp.
You may ask: why does logstash change my event's time? The reason is simple: it lets us correlate events among boxes in different time zones.
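A small Python sketch of why a common time base helps with correlation. The first event uses the timestamp from this post; the second event and both host names are hypothetical. Once each timestamp is normalized to UTC, the events sort in the order they really happened, whatever each box's wall clock said.

```python
from datetime import datetime, timezone

# nginx $time_local layout; %b assumes an English/C locale for month names.
FMT = "%d/%b/%Y:%H:%M:%S %z"

events = [
    ("caracas-box", "16/Feb/2013:12:30:20 -0430"),  # timestamp from the post
    ("utc-box", "16/Feb/2013:17:00:05 +0000"),      # hypothetical second box
]


def utc_time(event):
    """Parse the local timestamp and normalize it to UTC."""
    return datetime.strptime(event[1], FMT).astimezone(timezone.utc)


# Sorting on the normalized time gives the true order of events.
ordered = [host for host, _ in sorted(events, key=utc_time)]
print(ordered)
```

Comparing the wall-clock values alone (12:30 versus 17:00) would suggest the Caracas event came hours earlier, while in UTC it actually happened 15 seconds after the other one.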
Here is what I sent to logstash in nginx's $time_local format:
"time_local":["16/Feb/2013:12:30:20 -0430"]
Here is what logstash stored in ISO8601 format:
"@timestamp":"2013-02-16T17:00:20.000Z"
As you may notice, there is a time difference of 4 hours and 30 minutes. The reason is that the Linux box's time zone is "America/Caracas", which is 4:30 behind GMT (offset -04:30).
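The arithmetic can be checked with a few lines of Python. This is only a sketch of the conversion, not how logstash implements its date filter; the helper name `to_logstash_timestamp` is made up for the example.

```python
from datetime import datetime, timezone


def to_logstash_timestamp(time_local: str) -> str:
    """Convert an nginx $time_local string to a UTC ISO8601 string."""
    # %b assumes an English/C locale for the month abbreviation.
    local = datetime.strptime(time_local, "%d/%b/%Y:%H:%M:%S %z")
    utc = local.astimezone(timezone.utc)
    # Millisecond precision with a trailing Z, like the "@timestamp" above.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


print(to_logstash_timestamp("16/Feb/2013:12:30:20 -0430"))
# prints 2013-02-16T17:00:20.000Z
```

Adding the 4:30 offset to 12:30:20 local time gives 17:00:20 UTC, which matches the stored "@timestamp".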
I hope this may be useful for you, enjoy!
How would this work for the error log?
Good day, where do you work? What would you recommend as introductory reading on logstash? I am quite interested. Thanks.
Hi David. I am Venezuelan and I work in the financial sector. My recommendation is to buy the logstash book written by James Turnbull. I have it and it has honestly helped me a lot:
http://www.logstashbook.com/
The basic tutorial on the logstash site is quite good:
http://logstash.net/docs/1.1.13/tutorials/getting-started-simple
The other option is to ask in the logstash IRC channel on freenode. It is always very active and the community is very willing to support people who are learning.
And of course, if I can help you with anything, write to me and I will see how I can lend a hand.
And an even more up-to-date one: http://logstash.net/docs/1.2.2/tutorials/getting-started-simple ...