9 Approaches for Anomaly Detection

In the previous article we talked about time series forecasting. A logical continuation is an article on anomaly detection.

Application

Anomaly detection is used in areas such as:

1) Equipment breakdown prediction

For example, in 2010 Iranian centrifuges were attacked by the Stuxnet worm, which put the equipment into a non-optimal operating mode and disabled part of it through accelerated wear.

If anomaly detection algorithms had been running on the equipment, the failure might have been avoided.

Anomaly detection in equipment operation is used not only in the nuclear industry but also in metallurgy and in the operation of aircraft turbines, as well as in other areas where predictive diagnostics costs less than the potential losses from an unexpected breakdown.

2) Prediction of fraudulent activities

If money is withdrawn in Albania from a card that you normally use in Podolsk, the transactions may need additional verification.

3) Identification of abnormal consumer patterns

If some of the clients exhibit anomalous behavior, there may be a problem that you are not aware of.

4) Identification of abnormal demand and load

If sales in an FMCG store have fallen below the forecast confidence interval, it is worth finding out why.

Anomaly detection approaches

1) One-Class SVM (single-class support vector machine)

It is suitable when the training set consists almost entirely of normal observations, while the test set contains anomalies.

A one-class SVM learns a non-linear boundary that separates the bulk of the data from the origin in feature space. You can set a cutoff that determines which points are treated as anomalous.

Based on the experience of our DATA4 team, One-Class SVM is the most commonly used algorithm for solving the anomaly search problem.

2) Isolation Forest

When trees are built with random splits, outliers end up in leaves at an early stage (at a small tree depth), i.e. outliers are easier to "isolate". Anomalous values are therefore separated out in the first iterations of the algorithm.
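The idea can be illustrated with a small pure-Python sketch. It is not a production implementation: the function names, the synthetic data, and the parameter values are assumptions made for this example. Each tree splits on a random feature at a random threshold, and a point's score grows as its average leaf depth shrinks.

```python
import math
import random


def average_path_length(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands in a leaf, corrected for unsplit leaf size."""
    if "feature" not in node:
        return depth + average_path_length(node["size"])
    side = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(point, node[side], depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Return one score per point; values closer to 1 mean 'easier to isolate'."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in data]


# Synthetic data (an assumption for this sketch): a Gaussian cloud plus one far point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
data.append((8.0, 8.0))  # one obvious outlier, placed last

scores = isolation_scores(data)
most_anomalous = scores.index(max(scores))
print(most_anomalous, round(scores[most_anomalous], 2))
```

In practice a library implementation is preferable; the sketch only shows why a short path to a leaf is a sign of an anomaly.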

3) Elliptic envelope and statistical methods

This approach is used when the data is normally distributed. The closer a measurement lies to the tail of the mixture of distributions, the more anomalous it is.

Other statistical methods can also be attributed to this class.
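As a rough illustration of the statistical approach, the sketch below fits a two-dimensional Gaussian with the plain (non-robust) sample covariance and scores points by Mahalanobis distance. The helper names, the synthetic data, and the threshold of 3.0 are assumptions made for this example, not values from the article.

```python
import math
import random


def fit_gaussian_2d(points):
    """Estimate the mean and the sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Distance from the mean measured in units of the fitted ellipse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse written out
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Synthetic training data (an assumption): two positively correlated features.
rng = random.Random(0)
train = []
for _ in range(500):
    x = rng.gauss(0, 1)
    train.append((x, 0.8 * x + 0.6 * rng.gauss(0, 1)))

mean, cov = fit_gaussian_2d(train)
THRESHOLD = 3.0  # hypothetical cutoff; tune it for the task at hand

d_along = mahalanobis((2.0, 2.0), mean, cov)    # follows the correlation
d_across = mahalanobis((2.0, -2.0), mean, cov)  # goes against the correlation
print(d_along > THRESHOLD, d_across > THRESHOLD)
```

Both test points are equally far from the center in Euclidean terms, but only the one that contradicts the correlation between the features falls far into the tail of the fitted distribution.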

[Image from diakonov.org]

4) Metric methods

Metric methods include algorithms such as k nearest neighbors, k-th nearest neighbor, ABOD (angle-based outlier detection), and LOF (local outlier factor).

They are suitable when distances along the different features are comparable or the features have been normalized (so as not to measure a boa constrictor in parrots).

The k nearest neighbors algorithm assumes that normal values lie in a certain region of multidimensional space and that anomalies lie farther from their neighbors than normal points do, beyond a separating boundary.
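A minimal sketch of the k-th nearest neighbor variant: the anomaly score of a point is simply the distance to its k-th closest neighbor. The function name, the toy grid, and k = 3 are assumptions for this example, and the features are assumed to be on comparable scales already.

```python
import math


def kth_neighbor_scores(data, k=3):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, point in enumerate(data):
        distances = sorted(
            math.dist(point, other) for j, other in enumerate(data) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Toy data (an assumption): a regular 5x5 grid plus one distant point, placed last.
data = [(float(x), float(y)) for x in range(5) for y in range(5)]
data.append((10.0, 10.0))

scores = kth_neighbor_scores(data, k=3)
most_anomalous = scores.index(max(scores))
print(data[most_anomalous])
```

The brute-force search is quadratic in the number of points, so for larger datasets a spatial index or a library implementation would be the more realistic choice.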

5) Cluster methods

The essence of cluster methods is that a value lying farther from the cluster centers than a certain threshold can be considered anomalous.

The main thing is to use an algorithm that correctly clusters data, which depends on the specific task.
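The sketch below shows one possible version of this idea with a hand-written k-means. The choice of k-means, the initial centroids, the synthetic data, and the threshold of 4.0 are all assumptions for this example; another clustering algorithm may fit a given task better.

```python
import math
import random


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return centroids


# Synthetic data (an assumption): two Gaussian blobs and one point between them.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
outlier = (5.0, -5.0)
data.append(outlier)

centroids = kmeans(data, centroids=[data[0], data[50]])
THRESHOLD = 4.0  # hypothetical cutoff on the distance to the nearest centroid

distances = [min(math.dist(p, c) for c in centroids) for p in data]
flagged = [p for p, d in zip(data, distances) if d > THRESHOLD]
print(flagged)
```

The threshold plays the role of the "certain amount" mentioned above and normally has to be chosen from the distribution of distances in the data.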

6) Principal component analysis (PCA)

Suitable when the data has clearly pronounced directions of greatest variance.
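One common way to use principal components for anomaly detection is to measure how far a point lies from the dominant direction of variance. The sketch below finds the first principal component by power iteration and uses the reconstruction error as the score. The function names and the synthetic data are assumptions for this example.

```python
import math
import random


def first_principal_component(centered, n_iter=100):
    """Power iteration on X^T X; assumes the start vector is not orthogonal to the answer."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(n_iter):
        w = [0.0] * dim
        for x in centered:
            proj = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += proj * x[j]  # accumulates (X^T X) v without forming the matrix
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def reconstruction_errors(data):
    """Distance from each point to its projection on the first principal component."""
    mean = [sum(axis) / len(data) for axis in zip(*data)]
    centered = [[x - m for x, m in zip(p, mean)] for p in data]
    v = first_principal_component(centered)
    errors = []
    for x in centered:
        proj = sum(a * b for a, b in zip(x, v))
        residual = [a - proj * b for a, b in zip(x, v)]
        errors.append(math.sqrt(sum(r * r for r in residual)))
    return errors


# Synthetic data (an assumption): points close to the line y = 2x plus one point off it.
rng = random.Random(7)
data = []
for _ in range(100):
    t = rng.gauss(0, 3)
    data.append((t + rng.gauss(0, 0.1), 2 * t + rng.gauss(0, 0.1)))
data.append((2.0, -4.0))  # off the main direction, placed last

errors = reconstruction_errors(data)
most_anomalous = errors.index(max(errors))
print(most_anomalous, round(errors[most_anomalous], 2))
```

With more features, several components would usually be kept and the error measured against the subspace they span.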

7) Algorithms based on time series forecasting

The idea is that a value that falls outside the confidence interval of the forecast is considered anomalous. To forecast the time series, algorithms such as triple exponential smoothing, (S)ARIMA, and boosting are used.

Time series forecasting algorithms were discussed in the previous article.
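A minimal sketch of the forecast-based approach follows. To keep it short, it swaps in simple exponential smoothing instead of triple smoothing or (S)ARIMA, and it approximates the confidence interval as the forecast plus or minus z standard deviations of past one-step errors. The function name, the parameters alpha, z, and warmup, and the synthetic series are assumptions for this example.

```python
import math


def forecast_anomalies(series, alpha=0.3, z=3.0, warmup=10):
    """Flag indices whose one-step-ahead forecast error exceeds z * sigma."""
    level = series[0]
    residuals = []
    anomalies = []
    for t in range(1, len(series)):
        forecast = level  # simple exponential smoothing forecasts the current level
        error = series[t] - forecast
        if len(residuals) >= warmup:
            sigma = math.sqrt(sum(e * e for e in residuals) / len(residuals))
            if abs(error) > z * sigma:
                anomalies.append(t)
                continue  # do not let the anomalous value update the model
        residuals.append(error)
        level = alpha * series[t] + (1 - alpha) * level
    return anomalies


# Synthetic series (an assumption): a slow sine wave with one injected spike.
series = [10 + math.sin(t / 3) for t in range(60)]
series[40] += 8

anomalies = forecast_anomalies(series)
print(anomalies)
```

Skipping the model update for flagged points is a design choice: it keeps a single spike from widening the interval and masking the values that follow it.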

8) Supervised learning (regression, classification)

If the data allows, we use algorithms ranging from linear regression to recurrent networks. We measure the difference between the prediction and the actual value and conclude how far the data deviates from the norm. It is important that the algorithm generalizes well enough and that the training sample does not contain anomalous values.
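For the simplest case, linear regression, this can be sketched as follows: fit a line on clean training data, estimate the spread of the residuals, and flag new observations whose residual is too large. The helper name, the data, and the 3-sigma rule are assumptions for this example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Clean training data (an assumption): y = 3x + 2 with small alternating noise.
train_x = [float(i) for i in range(20)]
train_y = [3 * x + 2 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]

slope, intercept = fit_line(train_x, train_y)
train_residuals = [y - (slope * x + intercept) for x, y in zip(train_x, train_y)]
sigma = math.sqrt(sum(r * r for r in train_residuals) / len(train_residuals))

# New observations to check; the last one is far from the fitted line.
new_points = [(5.0, 17.3), (12.0, 38.2), (8.0, 40.0)]
flagged = [
    (x, y) for x, y in new_points
    if abs(y - (slope * x + intercept)) > 3 * sigma  # hypothetical 3-sigma rule
]
print(flagged)
```

The same pattern applies to more complex models: only the way the prediction is produced changes, while the residual check stays the same.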

9) Model tests

Let us approach the anomaly search problem as a recommendation problem. We decompose the feature matrix using SVD or factorization machines, reconstruct it from the decomposition, and treat the values in the reconstructed matrix that differ significantly from the original ones as anomalous.
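As a rough sketch of this idea, the code below builds a rank-1 approximation of a matrix (the leading term of its SVD, found here by power iteration) and looks for the cell where the reconstruction differs most from the original. The function name, the synthetic matrix, and the choice of rank 1 are assumptions for this example.

```python
import math


def rank1_approximation(matrix, n_iter=50):
    """Leading SVD term of the matrix, computed by alternating power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    for _ in range(n_iter):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v = A^T u keeps the singular value as its length, so u * v^T is the rank-1 term
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]


# Synthetic matrix (an assumption): an exact rank-1 table with one corrupted cell.
row_factors = [float(r) for r in range(1, 9)]
col_factors = [float(c) for c in range(1, 7)]
matrix = [[r * c for c in col_factors] for r in row_factors]
matrix[2][1] += 20.0  # the injected anomaly

approx = rank1_approximation(matrix)
residuals = {
    (i, j): abs(matrix[i][j] - approx[i][j])
    for i in range(len(matrix))
    for j in range(len(matrix[0]))
}
worst_cell = max(residuals, key=residuals.get)
print(worst_cell, round(residuals[worst_cell], 1))
```

Unlike the row-level scores of the other methods, this variant points at individual cells, which is what makes it resemble a recommender-style reconstruction.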

[Image from diakonov.org]

Conclusion

In this article, we reviewed the main approaches to anomaly detection.

Anomaly detection can in many ways be called an art. There is no ideal algorithm or approach that solves every problem. More often, a set of methods is combined to solve a specific case. Anomalies are detected with One-Class SVM, Isolation Forest, metric and cluster methods, as well as principal components and time series forecasting.

If you know other methods, write about them in the comments to the article.

Source: habr.com
