Tips and tricks for converting unstructured data from logs to ELK Stack using GROK in Logstash

Structuring Unstructured Data with GROK

If you are using the Elastic Stack (ELK) and are interested in mapping custom Logstash logs to Elasticsearch, then this post is for you.

The ELK stack is an acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Together they form a log management platform.

  • Elasticsearch is a search and analytics engine.
  • Logstash is a server-side data-processing pipeline that ingests data from multiple sources at the same time, transforms it, and then sends it to a "stash" such as Elasticsearch.
  • Kibana lets users visualize the data in Elasticsearch with charts and graphs.

Beats came along later as a lightweight data shipper. The introduction of Beats turned the ELK Stack into the Elastic Stack, but that's beside the point.

This article is about Grok, which is a feature in Logstash that can transform your logs before they are sent to the stash. For our purposes, I will only talk about processing data from Logstash to Elasticsearch.

Grok is a filter within Logstash that is used to parse unstructured data into something structured and queryable. It builds on regular expressions (regex) and uses text patterns to match strings in log files.

As we'll see in the following sections, using Grok goes a long way when it comes to efficient log management.
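
Under the hood, each pattern name is just a named, reusable regular expression. As a rough illustration, this is roughly how two of the built-in patterns are defined in the grok-patterns file linked in the resources below (NUMBER is itself composed from BASE10NUM, which is why BASE10NUM shows up in the debugger output later on):

WORD \b\w+\b
NUMBER (?:%{BASE10NUM})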

Without Grok, your log data is unstructured

Without Grok, when logs are sent from Logstash to Elasticsearch and rendered in Kibana, the data appears only in the message field.

Querying for meaningful information in this situation is difficult because all of the log data is stored under a single key. It would be much easier if the log messages were organized into separate fields.
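
For illustration, a document from our example log would land in Elasticsearch looking roughly like this, with everything useful crammed into message (the exact metadata fields depend on your input configuration; the values here are made up):

{
  "@timestamp": "2019-05-29T14:12:00.000Z",
  "host": "my-server",
  "path": "/your_logs/app.log",
  "message": "localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0"
}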

Unstructured data from logs

localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0

If you take a closer look at the raw data, you will see that it actually consists of different parts, each separated by a space.

If you are a more experienced developer, you can probably guess what each of the parts means and that this is a log message from an API call. Each element is laid out below.

Structured view of our data

  • localhost == environment
  • GET == method
  • /v2/applink/5c2f4bb3e9fda1234edc64d == url
  • 400 == response_status
  • 46ms == response_time
  • 5bc6e716b5d6cb35fc9687c0 == user_id

As the structured view shows, there is an order to the unstructured log. The next step is to process the raw data programmatically. That is where Grok shines.

Grok Patterns

Built-in Grok Patterns

Logstash comes with over 100 built-in patterns for structuring unstructured data. You should definitely take advantage of them whenever possible for common log formats such as Apache, Linux, HAProxy, AWS, and so on.
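
For example, if you were shipping standard Apache access logs, a single built-in pattern would be enough. The sketch below uses COMBINEDAPACHELOG, one of the patterns that ships with Logstash, which splits an access-log line into fields such as clientip, verb, request, and response:

filter {
  grok {
    # One built-in pattern handles the whole Apache combined log format
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}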

However, what happens when you have custom logs like the one in the example above? Then you have to build your own Grok pattern.

Custom Grok patterns

Building your own Grok pattern takes some trial and error. I used the Grok Debugger and the Grok Patterns reference.

Note that the syntax for Grok patterns is %{SYNTAX:SEMANTIC}.
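
For example, in the snippet below NUMBER is the SYNTAX (the name of the built-in pattern to match) and response_status is the SEMANTIC (the field name the matched text will be stored under):

%{NUMBER:response_status}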

The first thing I tried was the Discover tab in the Grok Debugger. I thought it would be great if this tool could generate the Grok pattern automatically, but it wasn't very helpful, as it only found two matches.

Starting from those matches, I began building my own pattern in the Grok Debugger, using the syntax documented on the Elastic GitHub page.

After playing around with different syntaxes, I was finally able to structure the log data the way I wanted.

Grok Debugger: https://grokdebug.herokuapp.com/

Raw log line:

localhost GET /v2/applink/5c2f4bb3e9fda1234edc64d 400 46ms 5bc6e716b5d6cb35fc9687c0

Pattern:

%{WORD:environment} %{WORD:method} %{URIPATH:url} %{NUMBER:response_status} %{WORD:response_time} %{USERNAME:user_id}

The resulting output:

{
  "environment": [
    [
      "localhost"
    ]
  ],
  "method": [
    [
      "GET"
    ]
  ],
  "url": [
    [
      "/v2/applink/5c2f4bb3e9fda1234edc64d"
    ]
  ],
  "response_status": [
    [
      "400"
    ]
  ],
  "BASE10NUM": [
    [
      "400"
    ]
  ],
  "response_time": [
    [
      "46ms"
    ]
  ],
  "user_id": [
    [
      "5bc6e716b5d6cb35fc9687c0"
    ]
  ]
}
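
One thing worth noting: Grok captures every value as a string by default, so response_status above is stored as the text "400". If you want it indexed as a number (for range queries in Kibana, for example), Grok supports an optional type suffix of :int or :float on the SEMANTIC:

%{NUMBER:response_status:int}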

With the Grok pattern and mapped data in hand, the final step is to add it to Logstash.

Updating the logstash.conf configuration file

On the server where you installed the ELK stack, go to the Logstash configuration:

sudo vi /etc/logstash/conf.d/logstash.conf

Paste in your changes:

input {
  # Read raw log lines from your application's log files
  file {
    path => "/your_logs/*.log"
  }
}
filter {
  # Parse each line into named fields using our custom Grok pattern
  grok {
    match => { "message" => "%{WORD:environment} %{WORD:method} %{URIPATH:url} %{NUMBER:response_status} %{WORD:response_time} %{USERNAME:user_id}" }
  }
}
output {
  # Send the structured events to Elasticsearch
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
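
Before restarting, you can ask Logstash to validate the configuration file. The path below assumes a standard package install; adjust it if your install location differs:

sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf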

After saving the changes, restart Logstash and check its status to make sure it's still running.

sudo service logstash restart
sudo service logstash status
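
If Logstash refuses to start, its own log is the first place to look; on a package install it is usually written here:

sudo tail -f /var/log/logstash/logstash-plain.log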

Finally, to make sure the changes have taken effect, be sure to refresh the Logstash index pattern in Kibana.
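
You can also confirm that the new fields are really arriving in Elasticsearch by querying it directly. The sketch below assumes the default logstash-* index naming used by the elasticsearch output:

curl -XGET 'localhost:9200/logstash-*/_search?q=response_status:400&pretty'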

With Grok, your log data is structured!

With Grok in place, the log data is automatically mapped into separate fields in Elasticsearch. This makes logs easier to manage and faster to query. Instead of digging through log files to debug, you can simply filter on what you are looking for, such as an environment or a URL.
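
For instance, a query like the one below in the Kibana search bar (Lucene syntax) would narrow things down to failed requests in a particular environment; the field names are the ones we defined in the Grok pattern:

environment:localhost AND response_status:400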

Give Grok expressions a try! If you have another way of doing this, or run into any issues with the examples above, drop a comment below and let me know.

Thanks for reading - and please follow me here on Medium for more interesting software engineering articles!

Resources

https://www.elastic.co/blog/do-you-grok-grok
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
https://grokdebug.herokuapp.com/
