Practical application of ELK. Setting up logstash

Introduction

While deploying yet another system, we faced the need to process a large number of diverse logs. ELK was chosen as the tool. This article describes our experience in setting up this stack.

We do not aim to describe all of its capabilities here; instead, we want to concentrate on solving practical problems. The reason is that, despite a fairly large amount of documentation and ready-made images, there are plenty of pitfalls, at least we found some.

We deployed the stack through docker-compose. Moreover, we had a well-written docker-compose.yml that allowed us to bring up the stack with almost no problems. And it seemed to us that victory was already close: we would tweak it a little to fit our needs and that would be it.

Unfortunately, the attempt to tune the system to receive and process logs from our application did not succeed right away. So we decided it was worth studying each component separately and only then returning to how they are connected.

So let's start with logstash.

Environment, deployment, running Logstash in a container

For deployment, we use docker-compose; the experiments described here were carried out on macOS and Ubuntu 18.04.

The logstash image that we had in our original docker-compose.yml is docker.elastic.co/logstash/logstash:6.3.2

We will use it for experiments.

To run logstash, we wrote a separate docker-compose.yml. Of course, we could have launched the image from the command line, but we were solving a specific task, where everything is launched from docker-compose.

Briefly about configuration files

As follows from the description, logstash can be run either for a single channel, in which case it needs to be given a *.conf file, or for several channels, in which case it needs to be given a pipelines.yml file, which in turn references the .conf files for each channel.
We took the second path. It seemed more versatile and scalable to us. So we created pipelines.yml and made a pipelines directory, into which we will put a .conf file for each channel.

Inside the container there is another configuration file - logstash.yml. We do not touch it, we use it as is.
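
If you are curious what this file contains by default, one way to peek at it without starting the service is to override the image entrypoint; a small sketch using the same image tag we use throughout:

docker run --rm --entrypoint cat \
  docker.elastic.co/logstash/logstash:6.3.2 \
  /usr/share/logstash/config/logstash.yml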

So our directory structure is:

.
├── docker-compose.yml
└── config
    ├── pipelines.yml
    └── pipelines
        └── habr_pipeline.conf

For now, let's assume that the input is tcp on port 5046, and that we will use stdout for the output.

This is a deliberately simple configuration for the first run, because the initial task is simply to get it running.

So we have this docker-compose.yml

version: '3'

networks:
  elk:

volumes:
  elasticsearch:
    driver: local

services:

  logstash:
    container_name: logstash_one_channel
    image: docker.elastic.co/logstash/logstash:6.3.2
    networks:
      - elk
    ports:
      - 5046:5046
    volumes:
      - ./config/pipelines.yml:/usr/share/logstash/config/pipelines.yml:ro
      - ./config/pipelines:/usr/share/logstash/config/pipelines:ro

What do we see here?

  1. Networks and volumes were taken from the original docker-compose.yml (the one that launches the entire stack), and I think they do not greatly affect the overall picture here.
  2. We create one service (services), logstash, from the docker.elastic.co/logstash/logstash:6.3.2 image and name it logstash_one_channel.
  3. We forward port 5046 into the container, to the same port inside.
  4. We map our pipeline configuration file ./config/pipelines.yml to /usr/share/logstash/config/pipelines.yml inside the container, where logstash will pick it up, and mount it read-only just to be on the safe side.
  5. We map the ./config/pipelines directory, where we keep the pipeline configuration files, to the /usr/share/logstash/config/pipelines directory, also read-only.


pipelines.yml file

- pipeline.id: HABR
  pipeline.workers: 1
  pipeline.batch.size: 1
  path.config: "./config/pipelines/habr_pipeline.conf"

It describes one channel with the HABR identifier and the path to its configuration file.

And finally the file "./config/pipelines/habr_pipeline.conf"

input {
  tcp {
    port => "5046"
  }
}

filter {
  mutate {
    add_field => [ "habra_field", "Hello Habr" ]
  }
}

output {
  stdout { }
}

We will not go into its details for now; let's just try to run it:

docker-compose up

What do we see?

The container has started. We can check that it works:

echo '13123123123123123123123213123213' | nc localhost 5046

And we see the response event appear in the container console.


But at the same time, we also see:

logstash_one_channel | [2019-04-29T11:28:59,790][ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::ResolutionFailure] elasticsearch", ...

logstash_one_channel | [2019-04-29T11:28:59,894][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>".monitoring-logstash", :thread=>"#<Thread: ...>"}

logstash_one_channel | [2019-04-29T11:28:59,988][INFO ][logstash.agent ] Pipelines running {:count=>2, :running_pipelines=>[:HABR, :".monitoring-logstash"], :non_running_pipelines=>[ ]}
logstash_one_channel | [2019-04-29T11:29:00,015][ERROR][logstash.inputs.metrics ] X-Pack is installed on Logstash but not on Elasticsearch. Please install X-Pack on Elasticsearch to use the monitoring feature. Other features may be available.
logstash_one_channel | [2019-04-29T11:29:00,526][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
logstash_one_channel | [2019-04-29T11:29:04,478][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elasticsearch:9200/, :path=> "/"}
logstash_one_channel | [2019-04-29T11:29:04,487][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"elasticsearch:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::ResolutionFailure] elasticsearch"}
logstash_one_channel | [2019-04-29T11:29:04,704][INFO ][logstash.licensechecker.licensereader] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elasticsearch:9200/, :path=> "/"}
logstash_one_channel | [2019-04-29T11:29:04,710][WARN ][logstash.licensechecker.licensereader] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"elasticsearch:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::ResolutionFailure] elasticsearch"}

And these messages keep scrolling up our log.

Here I highlighted in green the message that the pipeline started successfully, in red the error message, and in yellow the message about trying to reach elasticsearch:9200.
This happens because the logstash.yml included in the image contains a check for the availability of elasticsearch. After all, logstash assumes that it runs as part of the ELK stack, while we have separated it out.

You can still work like this, but it is not convenient.

The solution is to disable this check via the XPACK_MONITORING_ENABLED environment variable (the official image translates such environment variables into the corresponding logstash.yml settings, in this case xpack.monitoring.enabled).

Let's make a change to docker-compose.yml and run it again:

version: '3'

networks:
  elk:

volumes:
  elasticsearch:
    driver: local

services:

  logstash:
    container_name: logstash_one_channel
    image: docker.elastic.co/logstash/logstash:6.3.2
    networks:
      - elk
    environment:
      XPACK_MONITORING_ENABLED: "false"
    ports:
      - 5046:5046
    volumes:
      - ./config/pipelines.yml:/usr/share/logstash/config/pipelines.yml:ro
      - ./config/pipelines:/usr/share/logstash/config/pipelines:ro

Now, everything is fine. The container is ready for experiments.

We can type again in the adjacent console:

echo '13123123123123123123123213123213' | nc localhost 5046

And see:

logstash_one_channel | {
logstash_one_channel |         "message" => "13123123123123123123123213123213",
logstash_one_channel |      "@timestamp" => 2019-04-29T11:43:44.582Z,
logstash_one_channel |        "@version" => "1",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |            "host" => "gateway",
logstash_one_channel |            "port" => 49418
logstash_one_channel | }
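
By the way, the same tcp input can be fed an entire file at once; for example (the file name here is arbitrary):

# send every line of an existing log file through the same tcp input
nc localhost 5046 < some_existing.log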

Work within one channel

So, we have started. Now we can actually take the time to configure logstash itself. Let's not touch the pipelines.yml file for now and see what we can get by working with a single channel.

I must say that the general principle of working with the channel configuration file is well described in the official manual.
If you want to read about it in Russian, we used this article (but the query syntax there is old, which needs to be taken into account).
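
While experimenting with the channel configuration, it is convenient to check the syntax without restarting the container. A possible sketch (the --path.data override is there only to avoid a lock conflict with the already running instance):

docker exec logstash_one_channel /usr/share/logstash/bin/logstash \
  --config.test_and_exit \
  --path.data /tmp/logstash-test \
  -f /usr/share/logstash/config/pipelines/habr_pipeline.conf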

Let's go sequentially from the Input section. We have already seen the work on tcp. What else can be interesting here?

Test messages using heartbeat

There is an interesting possibility to generate automatic test messages.
To do this, you need to enable the heartbeat plugin in the input section.

input {
  heartbeat {
    message => "HeartBeat!"
  }
}

We turn it on and start receiving a message once a minute:

logstash_one_channel | {
logstash_one_channel |      "@timestamp" => 2019-04-29T13:52:04.567Z,
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |         "message" => "HeartBeat!",
logstash_one_channel |        "@version" => "1",
logstash_one_channel |            "host" => "a0667e5c57ec"
logstash_one_channel | }

If we want to receive messages more often, we need to add the interval parameter.
This is how we will receive a message every 10 seconds:

input {
  heartbeat {
    message => "HeartBeat!"
    interval => 10
  }
}

Getting data from a file

We also decided to look at the file mode. If it handles files well, then it may be that no agent is required, at least for local use.

According to the description, the mode of operation should be similar to tail -f, i.e. it reads new lines as they appear or, optionally, reads the whole file from the beginning.

So what we want to get:

  1. We want to receive lines that are appended to one log file.
  2. We want to receive data that is written to several log files, while being able to separate what was received from where.
  3. We want to make sure that when logstash is restarted, it does not receive this data again.
  4. We want to check that if logstash is stopped while data continues to be written to the files, then when we run it again, we will receive that data.

To conduct the experiment, let's add one more line to docker-compose.yml, opening the directory where we put the files.

version: '3'

networks:
  elk:

volumes:
  elasticsearch:
    driver: local

services:

  logstash:
    container_name: logstash_one_channel
    image: docker.elastic.co/logstash/logstash:6.3.2
    networks:
      - elk
    environment:
      XPACK_MONITORING_ENABLED: "false"
    ports:
      - 5046:5046
    volumes:
      - ./config/pipelines.yml:/usr/share/logstash/config/pipelines.yml:ro
      - ./config/pipelines:/usr/share/logstash/config/pipelines:ro
      - ./logs:/usr/share/logstash/input

And change the input section in habr_pipeline.conf

input {
  file {
    path => "/usr/share/logstash/input/*.log"
  }
}

We start:

docker-compose up

To create and write log files, we will use the command:


echo '1' >> logs/number1.log

logstash_one_channel | {
logstash_one_channel |            "host" => "ac2d4e3ef70f",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |      "@timestamp" => 2019-04-29T14:28:53.876Z,
logstash_one_channel |        "@version" => "1",
logstash_one_channel |         "message" => "1",
logstash_one_channel |            "path" => "/usr/share/logstash/input/number1.log"
logstash_one_channel | }

Yep, it works!

At the same time, we see that we have automatically added the path field. So in the future, we will be able to filter records by it.
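
For example, a minimal sketch of how the path field could later be used for routing in the output section (the condition here is purely illustrative):

output {
  # print only events that came from number1.log, identified by the automatically added path field
  if [path] == "/usr/share/logstash/input/number1.log" {
    stdout { }
  }
}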

Let's try again:

echo '2' >> logs/number1.log

logstash_one_channel | {
logstash_one_channel |            "host" => "ac2d4e3ef70f",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |      "@timestamp" => 2019-04-29T14:28:59.906Z,
logstash_one_channel |        "@version" => "1",
logstash_one_channel |         "message" => "2",
logstash_one_channel |            "path" => "/usr/share/logstash/input/number1.log"
logstash_one_channel | }

And now to another file:

echo '1' >> logs/number2.log

logstash_one_channel | {
logstash_one_channel |            "host" => "ac2d4e3ef70f",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |      "@timestamp" => 2019-04-29T14:29:26.061Z,
logstash_one_channel |        "@version" => "1",
logstash_one_channel |         "message" => "1",
logstash_one_channel |            "path" => "/usr/share/logstash/input/number2.log"
logstash_one_channel | }

Great! The file was picked up, the path was specified correctly, everything is fine.

Let's stop logstash and start it again. Let's wait. Silence. That is, we do not receive these records again.

And now the most daring experiment.

We stop logstash and execute:

echo '3' >> logs/number2.log
echo '4' >> logs/number1.log

Run logstash again and see:

logstash_one_channel | {
logstash_one_channel |            "host" => "ac2d4e3ef70f",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |         "message" => "3",
logstash_one_channel |        "@version" => "1",
logstash_one_channel |            "path" => "/usr/share/logstash/input/number2.log",
logstash_one_channel |      "@timestamp" => 2019-04-29T14:48:50.589Z
logstash_one_channel | }
logstash_one_channel | {
logstash_one_channel |            "host" => "ac2d4e3ef70f",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |         "message" => "4",
logstash_one_channel |        "@version" => "1",
logstash_one_channel |            "path" => "/usr/share/logstash/input/number1.log",
logstash_one_channel |      "@timestamp" => 2019-04-29T14:48:50.856Z
logstash_one_channel | }

Hooray! Everything was picked up.

But we have to warn about the following. If the logstash container is removed (docker stop logstash_one_channel && docker rm logstash_one_channel), nothing will be picked up. The position up to which each file had been read was stored inside the container (in the file input's sincedb). If you start from scratch, it will only accept new lines.
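
If you want the read position to survive re-creating the container, one possible approach (a sketch we did not use here; the .habr_sincedb file name is our own choice) is to tell the file input to keep its sincedb in a directory that is already mounted from the host:

input {
  file {
    path => "/usr/share/logstash/input/*.log"
    # keep the read-position database in the mounted ./logs directory, so it survives container removal
    sincedb_path => "/usr/share/logstash/input/.habr_sincedb"
  }
}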

Reading existing files

Let's say we are running logstash for the first time, but we already have logs and we would like to process them.
If we run logstash with the input section we used above, we won't get anything. Only new lines will be processed by logstash.

In order to pull lines from existing files, add an additional line to the input section:

input {
  file {
    start_position => "beginning"
    path => "/usr/share/logstash/input/*.log"
  }
}

There is a nuance, though: this only affects new files that logstash has not yet seen. For files that were already in logstash's field of view, it has already remembered their position and will now take only new records from them.
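
Conversely, if for an experiment you want logstash to forget positions entirely and re-read all files on every start, a known trick is to point the sincedb at /dev/null (use with care, since every restart will then reprocess everything):

input {
  file {
    start_position => "beginning"
    path => "/usr/share/logstash/input/*.log"
    # nothing is persisted, so the files are re-read from the beginning on every start
    sincedb_path => "/dev/null"
  }
}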

Let's stop here with our study of the input section. There are many more options, but what we have is enough for further experiments.

Routing and data transformation

Let's try to solve the following problem. Suppose we have messages coming from one channel; some of them are informational, and some are error messages. They differ by a tag: some are INFO, others are ERROR.

We need to separate them at the output. That is, we write informational messages to one channel and error messages to another.

To do this, we move on from the input section to filter and output.

Using the filter section, we will parse the incoming message, obtaining a hash (a set of key-value pairs) from it that we can then work with, i.e. evaluate conditions against. And in the output section, we will select messages and send each one to its own channel.

Parsing a message with grok

In order to parse text strings and get a set of fields from them, there is a special plugin in the filter section - grok.

Without aiming to give a detailed description of it here (for that I refer to the official documentation), I will give my simple example.

To do this, you need to decide on the format of the input lines. I have them like this:

1 INFO message1
2 ERROR message2

That is, an identifier comes first, then INFO/ERROR, then some word without spaces.
Not difficult, but enough to understand the principle of operation.

So, in the filter section, in the grok plugin, we need to define a pattern for parsing our strings.

It will look like this:

filter {
  grok {
    match => { "message" => ["%{INT:message_id} %{LOGLEVEL:message_type} %{WORD:message_text}"] }
  }
}

Essentially, this is a regular expression. It uses ready-made patterns, such as INT, LOGLEVEL, WORD. Their description, as well as that of other patterns, can be found here.
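
It is also worth keeping in mind that if a line does not match the pattern, grok adds the _grokparsefailure tag to the event. A small sketch of how such events could be caught, for example to print them separately:

output {
  # events that grok could not parse carry the _grokparsefailure tag
  if "_grokparsefailure" in [tags] {
    stdout { }
  }
}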

Now, passing through this filter, our string will turn into a hash of three fields: message_id, message_type, message_text.

They will be displayed in the output section.
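
For example, for the input line 1 INFO message1, the event printed by stdout should look roughly like this (the timestamp, host and path values here are illustrative):

logstash_one_channel | {
logstash_one_channel |         "message" => "1 INFO message1",
logstash_one_channel |      "message_id" => "1",
logstash_one_channel |    "message_type" => "INFO",
logstash_one_channel |    "message_text" => "message1",
logstash_one_channel |     "habra_field" => "Hello Habr",
logstash_one_channel |        "@version" => "1",
logstash_one_channel |      "@timestamp" => 2019-04-29T15:00:00.000Z,
logstash_one_channel |            "path" => "/usr/share/logstash/input/number1.log"
logstash_one_channel | }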

Routing messages in the output section with the if command

In the output section, as we remember, we were going to split the messages into two streams. The ones that are INFO we will output to the console, and the ones with errors we will output to a file.

How do we separate these messages? The condition of the problem already suggests a solution: we already have a dedicated message_type field, which can take only two values, INFO and ERROR. It is on this field that we will make the choice, using the if statement.

if [message_type] == "ERROR" {
  # output to a file here
} else {
  # output to stdout here
}

A description of working with fields and operators can be found in this section of the official manual.
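
For reference, conditions are not limited to strict equality; for example, the same routing could be sketched with a regular expression match on the raw message (just an illustration, we stick to the equality check below):

output {
  # route by matching the raw message against a regular expression instead of comparing a parsed field
  if [message] =~ /ERROR/ {
    file { path => "/usr/share/logstash/output/test.log" }
  } else {
    stdout { }
  }
}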

Now, about the output itself.

Console output, everything is clear here - stdout {}

As for output to a file, remember that we are running all this in a container, and in order for the resulting file to be accessible from outside the container, we need to open this directory in docker-compose.yml.

As a result, the output section of our file looks like this:


output {
  if [message_type] == "ERROR" {
    file {
      path => "/usr/share/logstash/output/test.log"
      codec => line { format => "custom format: %{message}" }
    }
  } else {
    stdout { }
  }
}

Add one more volume to docker-compose.yml for output:

version: '3'

networks:
  elk:

volumes:
  elasticsearch:
    driver: local

services:

  logstash:
    container_name: logstash_one_channel
    image: docker.elastic.co/logstash/logstash:6.3.2
    networks:
      - elk
    environment:
      XPACK_MONITORING_ENABLED: "false"
    ports:
      - 5046:5046
    volumes:
      - ./config/pipelines.yml:/usr/share/logstash/config/pipelines.yml:ro
      - ./config/pipelines:/usr/share/logstash/config/pipelines:ro
      - ./logs:/usr/share/logstash/input
      - ./output:/usr/share/logstash/output

We start it, try it, and see the split into two streams.
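
For example, a quick check could look like this (assuming the file input from the previous section is still in place):

# an informational message should appear on the container console
echo '10 INFO something' >> logs/number1.log

# an error message should end up in the output file
echo '11 ERROR problem' >> logs/number1.log
cat output/test.log
# expected: custom format: 11 ERROR problem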

Source: habr.com
