Understanding Machine Learning in the Elastic Stack (aka Elasticsearch, aka ELK)

Recall that the Elastic Stack is built around the non-relational database Elasticsearch, the Kibana web interface, and data collectors (the best known being Logstash, plus the various Beats, APM, and others). One nice addition to this product stack is data analysis using machine learning algorithms. In this article we look at what these algorithms are.

Machine learning is a paid feature of the freemium Elastic Stack and is included in X-Pack. To start using it, it is enough to activate the 30-day trial after installation. When the trial expires, you can ask support to extend it or buy a subscription. The subscription price is based not on the volume of data but on the number of nodes used. Of course, the data volume affects how many nodes you need, but this licensing approach is still kinder to a company's budget: if you do not need high performance, you can save money.

ML in the Elastic Stack is written in C++ and runs outside the JVM that Elasticsearch itself runs in. That is, the process (called autodetect, by the way) consumes whatever memory the JVM does not take. On a demo stand this is not critical, but in a production environment it is important to allocate dedicated nodes for ML tasks.

Machine learning algorithms fall into two categories: supervised and unsupervised. The algorithms in the Elastic Stack are from the unsupervised category. The mathematical apparatus behind them is described in the Elastic documentation.

The machine learning algorithm uses data stored in Elasticsearch indexes for its analysis. You can create analysis jobs both from the Kibana interface and through the API. If you do it through Kibana, you do not need to know certain internals, for example the additional indexes the algorithm uses while it runs.

Additional indexes used in the analysis process:

.ml-state — information about statistical models (analysis settings);
.ml-anomalies-* — results of the ML algorithms;
.ml-notifications — notification settings based on analysis results.
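
Creating a job through the API can be sketched as building a JSON body for `PUT _ml/anomaly_detectors/<job_id>`. The job id, index, and field names below are made up for illustration; treat this as a minimal sketch, not the full job schema.

```python
import json

# Minimal anomaly-detection job definition, as it could be sent with
# PUT _ml/anomaly_detectors/<job_id> to the Elasticsearch ML API.
# "responsetime" and the description are illustrative, not from the article.
job = {
    "description": "Response time, single metric (sketch)",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {"function": "mean", "field_name": "responsetime"}
        ],
    },
    "data_description": {
        "time_field": "@timestamp",
    },
}

body = json.dumps(job, indent=2)
print(body)  # this JSON would be the request body
```

The same structure is what the Kibana wizards generate behind the scenes; the additional `.ml-*` indexes listed above are populated once such a job runs.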

The data structure in Elasticsearch consists of indexes and the documents stored in them. Compared to a relational database, an index is like a database schema, and a document like a row in a table. The comparison is loose and is given only to simplify the rest of the material for those who have merely heard of Elasticsearch.

The same functionality is available through the API as through the web interface, so for clarity we will show how to configure things through Kibana. The menu on the left has a Machine Learning section where you can create a new job. In the Kibana interface it looks like the picture below. Now let's go through each job type and show the kinds of analysis you can build here.

Single Metric analyzes one metric; Multi Metric analyzes two or more. In both cases each metric is analyzed in isolation, i.e. the algorithm does not take into account the behavior of metrics analyzed in parallel, as the name Multi Metric might suggest. To account for correlation between metrics, you can apply Population analysis. Advanced is fine-grained tuning of the algorithms with additional options for specific tasks.

Single Metric

Analyzing changes in a single metric is the simplest thing you can do here. After you click Create Job, the algorithm starts looking for anomalies.

In the Aggregation field you choose the approach to anomaly search. For example, with Min, values below the typical range will be considered anomalous. There are also Max, High Mean, Low, Mean, Distinct, and others. Descriptions of all the functions can be found in the documentation.

The Field setting specifies the numeric document field on which the analysis will be carried out.

Bucket span sets the granularity of the intervals on the timeline over which the analysis runs. You can trust the automatic choice or set it manually. The image below shows an example of granularity that is too coarse: you can miss an anomaly. This setting changes the algorithm's sensitivity to anomalies.
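
The effect of too coarse a bucket span can be illustrated with a toy aggregation (nothing Elastic-specific, just the averaging idea): a short spike that stands out in small buckets gets diluted into the mean of a large bucket.

```python
from collections import defaultdict

# Toy illustration of bucket span: the same event stream aggregated
# with two different bucket sizes.
events = [(t, 10) for t in range(0, 60)]  # steady rate, one point per minute
events[30] = (30, 200)                    # a one-minute spike

def mean_per_bucket(points, span_minutes):
    """Group points into fixed-size time buckets and return the mean of each."""
    buckets = defaultdict(list)
    for minute, value in points:
        buckets[minute // span_minutes].append(value)
    return {b: sum(v) / len(v) for b, v in buckets.items()}

fine = mean_per_bucket(events, 1)     # spike clearly visible: bucket 30 -> 200
coarse = mean_per_bucket(events, 60)  # spike diluted into the hourly mean
print(max(fine.values()), max(coarse.values()))
```

With a 1-minute span the spike dominates its bucket; averaged over an hour it almost disappears, which is exactly why an overly coarse bucket span can hide anomalies.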

The time span of the collected data is a key factor affecting the effectiveness of the analysis. During analysis the algorithm identifies repeating intervals, computes a confidence interval (the baseline), and detects anomalies: atypical deviations from the metric's usual behavior. For example:

Baselines for a small data segment:

When the algorithm has something to learn from, the baselines look like this:

After the job starts, the algorithm identifies anomalous deviations from the norm and ranks them by anomaly score (the color of the corresponding label is given in brackets):

Warning (blue): less than 25
Minor (yellow): 25-50
Major (orange): 50-75
Critical (red): 75-100
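
The mapping from score to label is simple enough to sketch as a function (a sketch of the thresholds listed above, not code from Elastic):

```python
def severity(anomaly_score: float) -> str:
    """Map an anomaly score in [0, 100] to Kibana's severity label."""
    if anomaly_score >= 75:
        return "critical"  # red
    if anomaly_score >= 50:
        return "major"     # orange
    if anomaly_score >= 25:
        return "minor"     # yellow
    return "warning"       # blue

print(severity(94))  # prints "critical"
```

A score of 94, like the one in the screenshot further down, lands in the critical band.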

The graph below shows an example with the found anomalies.

Here you can see the number 94, which indicates the anomaly score. Since the value is close to 100, we clearly have an anomaly. The column below the graph shows the negligibly small probability, 0.000063634%, of the metric value appearing there.

Besides searching for anomalies, Kibana can also run forecasting. It is trivially easy and done from the same anomaly view: the Forecast button in the upper right corner.

The forecast extends up to 8 weeks ahead. Even if you really want more, you cannot: that limit is by design.
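
The same forecast can be requested through the API instead of the Kibana button, via `POST _ml/anomaly_detectors/<job_id>/_forecast`. The job id below is made up; the duration must fit inside the 8-week cap mentioned above.

```python
import json

# Sketch of a forecast request to the Elasticsearch ML API.
# "requests-per-minute" is a hypothetical job id.
job_id = "requests-per-minute"
endpoint = f"_ml/anomaly_detectors/{job_id}/_forecast"
body = json.dumps({"duration": "7d"})  # forecast one week ahead

print("POST", endpoint, body)  # what you would send to the cluster
```

Sending it requires a running cluster with the job already opened; here we only build the request for illustration.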

In some situations the forecast can be very useful, for example when monitoring user load on the infrastructure.

Multi Metric

Let's move on to the next ML feature in the Elastic Stack: analyzing several metrics in one batch. This does not mean that the dependence of one metric on another is analyzed. It is the same as Single Metric, just with several metrics on one screen for easy visual comparison of their effect on one another. We will talk about analyzing the dependence of one metric on another in the Population section.

After clicking the Multi Metric tile, a window with settings appears. Let's go through them in more detail.

First, select the fields for analysis and the aggregation over them. The aggregation options are the same as for Single Metric (Max, High Mean, Low, Mean, Distinct, and others). Next, the data can optionally be split by one of the fields (the Split data field). In the example we split by OriginAirportID. Notice that the metrics graph on the right is now shown as a set of graphs.

The Key Fields (Influencers) setting directly affects the anomalies found. By default there is always at least one value, and you can add more. The algorithm takes these fields' influence into account during the analysis and shows the most "influential" values.
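
Under the hood, the Multi Metric wizard roughly produces one detector per selected metric, split by the chosen field, plus the influencers list. The sketch below uses the flight-data field names from the example; the exact job the wizard builds may differ in detail.

```python
import json

# Approximate analysis_config produced by the Multi Metric wizard (sketch).
analysis_config = {
    "bucket_span": "30m",
    "detectors": [
        # the Split data field becomes partition_field_name on each detector
        {"function": "mean", "field_name": "AvgTicketPrice",
         "partition_field_name": "OriginAirportID"},
        {"function": "max", "field_name": "DistanceKilometers",
         "partition_field_name": "OriginAirportID"},
    ],
    # Key Fields (Influencers) from the wizard
    "influencers": ["OriginAirportID"],
}

print(json.dumps(analysis_config, indent=2))
```

Each detector still gets its own isolated analysis, which is exactly the point made above: Multi Metric is several Single Metric analyses side by side, not a correlation analysis.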

After launch, the following picture will appear in the Kibana interface.

This is the so-called heat map of anomalies for each value of the OriginAirportID field that we specified in Split data. As with Single Metric, the color indicates the level of the deviation. A similar analysis is convenient, for example, across workstations, to track those with suspiciously many logons, etc. We have already written about suspicious events in the Windows Event Log, which can also be collected and analyzed here.

Below the heat map is a list of anomalies; from each one you can jump to the Single Metric view for detailed analysis.

Population

To look for anomalies among correlations between different metrics, the Elastic Stack has a dedicated Population analysis. With it you can, for example, find a server whose performance is anomalous compared to the rest when the number of requests to the target system grows.

In this illustration, the Population field specifies the value that the analyzed metrics relate to. In this case it is the process name. As a result, we will see how the CPU load of each process stands out against the others.
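
In job terms, the Population field maps to `over_field_name` on the detector: each entity is compared against the whole population rather than only against its own history. The field names below are illustrative, not from the article's screenshots.

```python
import json

# A population-style detector (sketch): CPU usage of each process is
# compared against the population of all processes in the same bucket.
detector = {
    "function": "high_mean",
    "field_name": "system.process.cpu.total.pct",
    "over_field_name": "process.name",  # the Population field in Kibana
}
analysis_config = {
    "bucket_span": "10m",
    "detectors": [detector],
}

print(json.dumps(analysis_config, indent=2))
```

Swapping `over_field_name` for `by_field_name` would turn this back into a per-entity analysis against its own history, which is the key difference from Single and Multi Metric.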

Note that the plot of the analyzed data differs from the Single Metric and Multi Metric cases. Kibana does this by design to improve the perception of the distribution of the analyzed values.

The graph shows that the stress process (generated, by the way, by a dedicated utility) behaved anomalously on the poipu server, which influenced (or turned out to be an influencer on) the occurrence of this anomaly.

Advanced

Analytics with fine tuning. Advanced analysis adds extra settings in Kibana. After clicking the Advanced tile in the creation menu, the following tabbed window appears. The Job Details tab is intentionally skipped; it contains basic settings not directly related to configuring the analysis.

In summary_count_field_name you can optionally specify the name of a document field containing aggregated values, in this example the number of events per minute. In categorization_field_name you specify the name of a document field that contains some variable value; by a mask on this field, you can split the analyzed data into subsets. Note the Add detector button in the previous illustration; below is the result of clicking it.

Here is an additional block of settings for configuring an anomaly detector for a specific task. We plan to cover specific use cases (especially security ones) in upcoming articles. As an example, let's look at one case: it involves searching for rarely occurring values and uses the rare function.

In function you select the specific function to search for anomalies with. Besides rare, there are a couple of other interesting functions: time_of_day and time_of_week. They detect anomalies in a metric's behavior over the day or over the week, respectively. The other analysis functions are described in the documentation.

In field_name you specify the document field on which the analysis is performed. by_field_name can be used to separate the analysis results for each individual value of the document field specified there. If you fill in over_field_name, you get the population analysis we considered above. If you specify a value in partition_field_name, separate baselines are computed for each value of that document field (the value could be, for example, a server name or the name of a process on the server). In exclude_frequent you can choose all or none, which excludes (or includes) frequently occurring document field values.
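
The rare case above can be sketched as a detector definition. Note that rare operates on the values of by_field_name and takes no field_name; the field names here are illustrative assumptions.

```python
import json

# Sketch of an Advanced-style detector: the rare function flags values
# of by_field_name that are rarely seen, with a separate baseline per
# host thanks to partition_field_name.
detector = {
    "function": "rare",
    "by_field_name": "process.name",      # flag rarely seen process names
    "partition_field_name": "host.name",  # a baseline per host
}

print(json.dumps(detector, indent=2))
```

A detector like this is a common security pattern: a process name that almost never appears on a given host gets a high anomaly score there, even if it is routine elsewhere.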

In this article we tried to give the most concise overview of the machine learning capabilities of the Elastic Stack; many details remain behind the scenes. Tell us in the comments which cases you have managed to solve with the Elastic Stack and what tasks you use it for. To contact us, you can use private messages on Habr or the feedback form on our site.

Source: habr.com