Analysis of encrypted traffic without decrypting it

A system for analyzing traffic without decrypting it. The method is, put simply, machine learning: it turns out that if a large enough volume of varied traffic is fed to a special classifier, the system can detect the actions of malicious code inside encrypted traffic with a very high degree of probability.


Network threats have changed and become smarter. Recently the very notions of attack and defense have shifted: the number of events on the network has grown significantly, attacks have become more sophisticated, and attackers have a far wider reach.

According to Cisco statistics, over the past year cybercriminals have tripled the number of malicious programs that use encryption for their activities, or rather, to hide them. Theory says that a "proper" encryption algorithm cannot be cracked. To understand what is hidden inside encrypted traffic, you must either decrypt it knowing the key, or try to break it by various tricks: head-on by brute force, or by exploiting vulnerabilities in the cryptographic protocols.

[Image: The current online threat landscape]

Machine learning

Get to know the technology! Before discussing how machine-learning-based analysis of encrypted traffic works, we need to understand how neural network technology works.

Machine learning is a broad subfield of artificial intelligence that studies methods for building algorithms capable of learning. This science aims to create mathematical models for "training" a computer. The purpose of learning is to predict something. In human terms, we call this process "wisdom". Wisdom shows in people who have lived a fair number of years (a two-year-old cannot be wise). When we turn to senior comrades for advice, we give them some information about an event (input data) and ask for help. They, in turn, recall all the situations from their lives that are somehow related to our problem (a knowledge base) and, based on this knowledge (data), give us a kind of prediction (advice). This kind of advice came to be called a prediction because the person giving it does not know for certain what will happen, but only assumes. Life experience shows that a person may be right or may be wrong.

Neural networks should not be confused with branching algorithms (if-else); these are different things with key differences between them. A branching algorithm has a clear "understanding" of what to do. Let me demonstrate with examples.

Task: determine a car's braking distance from its make and year of manufacture.

An example of a branching algorithm: if the car is make 1 and was released in 2012, its braking distance is 10 meters; otherwise, if the car is make 2 and was released in 2011, and so on.
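To make the contrast concrete, here is a minimal sketch of the branching approach in Python; the makes, years, and distances are invented for illustration.

```python
def braking_distance(make: str, year: int) -> float:
    """Hard-coded branching: every case must be written out by hand."""
    if make == "make1" and year == 2012:
        return 10.0
    elif make == "make2" and year == 2011:
        return 12.5
    # ... and so on, one branch per make/year combination we know about
    else:
        raise ValueError("unknown make/year - the algorithm simply has no answer")
```

The weakness is obvious: the algorithm knows exactly what it was told and nothing more.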

An example of a neural network: we collect data on car braking distances over the last 20 years and, by make and year, compile a table of the form "make - year of manufacture - braking distance". We hand this table to the neural network and start training it. Training proceeds as follows: we feed data into the network, but without the braking distance. The network tries to predict the braking distance based on the table loaded into it. It predicts something and asks, "Am I right?" Beforehand, it creates a fourth column, the guess column: if it was right, it writes 1 there; if wrong, 0. Then the network moves on to the next record (even if it was wrong). This is how the network learns, and when training is complete (a certain convergence criterion has been reached), we submit data about the car we care about and finally get an answer.
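And here is a minimal sketch of the learned approach, assuming a historical "make - year - braking distance" table. The data and model settings are invented; a real dataset would need far more rows, and the features would normally be scaled.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import OneHotEncoder

# The "make - year - braking distance" table (invented numbers)
makes = np.array([["make1"], ["make1"], ["make2"], ["make2"]])
years = np.array([[2012], [2013], [2011], [2012]])
distances = np.array([10.0, 9.5, 12.5, 12.0])  # braking distance, meters

enc = OneHotEncoder(sparse_output=False)       # sklearn >= 1.2
X = np.hstack([enc.fit_transform(makes), years])

# "Training": the network adjusts its weights until it converges
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, distances)

# Prediction for a combination that may not be in the table at all
query = np.hstack([enc.transform([["make2"]]), [[2013]]])
print(model.predict(query))
```

Unlike the if-else version, the trained model gives an answer (a prediction, possibly wrong) even for a make/year pair it has never seen.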

So that the question of the convergence criterion does not hang in the air: it is a mathematically derived statistical formula. A vivid example is two different distributions (see the figure below): red is the binomial distribution, blue is the normal one.

[Image: Binomial and normal probability distributions]

To make this clearer, ask the question: "What is the probability of meeting a dinosaur?" There are two possible answers here. Option 1: vanishingly small (the blue graph). Option 2: either you meet one or you don't (the red graph).
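For the curious, both distributions are easy to poke at with scipy; the parameters below are arbitrary.

```python
from scipy.stats import binom, norm

# "Either it happens or it doesn't": a binomial distribution over 10 trials
print(binom.pmf(k=1, n=10, p=0.5))     # probability of exactly one success

# "Vanishingly small, far out on the tail": a normal distribution
print(norm.pdf(x=3.0, loc=0.0, scale=1.0))  # density three sigmas from the mean
```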

Of course, a computer is not a person, and it learns differently. There are two ways to train our iron horse: learning from precedents and deductive learning.

Learning from precedents is learning by way of mathematical laws: mathematicians collect statistical tables, draw conclusions, and load the result, a formula for calculation, into the neural network.

Deductive learning: the learning happens entirely inside the network (from data collection to analysis). Here a table is formed without a formula, from raw statistics.

A thorough overview of the technology would take a couple dozen more articles; for now, this is enough for a general understanding.

Neuroplasticity

In biology there is a concept called neuroplasticity: the ability of neurons (brain cells) to act "according to the situation". For example, a person who has lost their sight hears sounds better and has a keener sense of smell and touch. This happens because the part of the brain (the neurons) responsible for vision redistributes its work to other functions.

A vivid example of neuroplasticity in life is the BrainPort lollipop.

In 2009, the University of Wisconsin-Madison announced the release of a new device developing the idea of a "tongue display"; it was called BrainPort. BrainPort works according to the following algorithm: the video signal travels from a camera to a processor, which controls zoom, brightness, and other picture parameters. It then converts the digital signals into electrical impulses, effectively taking over the functions of the retina.

[Image: BrainPort lollipop with glasses and camera]

[Image: BrainPort at work]

It is the same with a computer: if the neural network senses a change in the process, it adapts to it. This is the key advantage of neural networks over other algorithms: autonomy. A kind of humanity.

Encrypted Traffic Analytics

Encrypted Traffic Analytics is part of the Stealthwatch system. Stealthwatch is Cisco's pioneering security monitoring and analytics solution that leverages corporate telemetry data from existing network infrastructure.

Stealthwatch Enterprise is based on the Flow Rate License, Flow Collector, Management Console and Flow Sensor tools.

[Image: Cisco Stealthwatch interface]

The encryption problem has become so acute because far more traffic is now encrypted. Previously (for the most part) only the malicious code itself was encrypted; now all traffic is encrypted, and separating "clean" data from viruses has become much harder. A prime example is WannaCry, which used Tor to hide its presence on the network.

[Image: Visualization of the growth of encrypted traffic on the network]

[Image: Encryption in macroeconomics]

The Encrypted Traffic Analytics (ETA) system exists precisely to work with encrypted traffic without decrypting it. Attackers are smart and use strong cryptographic algorithms; breaking them is not only a problem but a huge cost for organizations.

The system works as follows. Traffic arrives at the company and falls under TLS (Transport Layer Security). Suppose the traffic is encrypted; we try to answer a series of questions about what kind of connection was established.

[Image: How the Encrypted Traffic Analytics (ETA) system works]

To answer these questions, the system uses machine learning. Cisco's research is taken, and on its basis a table of two resulting classes is built: malicious and "good" traffic. Of course, we do not know for certain what kind of traffic has entered the system right now, but we can trace the history of traffic both inside and outside the company using data from the world stage. At the end of this stage we get a huge table of data.
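As a rough illustration of what such a two-class table can be turned into, here is a sketch of training a classifier on invented flow features. This is in no way Cisco's actual pipeline; the column names, numbers, and model are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Invented flow records; in reality the table comes from Cisco's research data
flows = pd.DataFrame({
    "bytes_sent":  [1200, 48000, 900, 52000, 1500, 61000, 1100, 58000],
    "duration_s":  [0.4, 35.0, 0.2, 40.0, 0.5, 38.0, 0.3, 42.0],
    "dst_country": [0, 3, 0, 3, 1, 3, 0, 4],   # country encoded as an integer
    "label":       [0, 1, 0, 1, 0, 1, 0, 1],   # 0 = "good", 1 = malicious
})

X = flows.drop(columns="label")
y = flows["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict_proba(X_test))  # estimated probability that a flow is malicious
```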

Based on the results of the research, characteristic features are identified: specific rules that can be written in mathematical form. These rules vary greatly by criterion: the size of the transferred files, the type of connection, the country the traffic comes from, and so on. As a result, the huge table turns into a set of formulas. There are fewer of them, but this is still not enough for comfortable work.

Next, machine learning is applied: the formulas converge, and from the result of convergence we get a trigger - a switch whose output is a flag in either the raised or lowered position.

The resulting stage is a set of triggers that cover 99% of the traffic.
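The trigger idea itself is trivially simple; a sketch with a hypothetical threshold:

```python
def trigger(score: float, threshold: float = 0.8) -> bool:
    """Return True (flag raised) once the converged score crosses the threshold."""
    return score >= threshold

for score in (0.15, 0.92):
    print(score, "->", "flag raised" if trigger(score) else "flag lowered")
```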

[Image: Stages of checking traffic in ETA]

As a result, another problem is solved: the inside attack. There is no longer a person in the middle filtering traffic manually (at this point I am digging my own grave). First, you no longer need to spend a lot of money on a competent system administrator (I keep digging). Second, there is no danger of a hack from the inside (at least partially).

[Image: Outdated concept of Man-in-the-Middle]

Now, let's see what the system is based on.

The system works with four protocols: TCP/IP, the Internet data transfer protocol stack; DNS, the Domain Name System; TLS, the transport layer security protocol; and SPLT (Sequence of Packet Lengths and Times), the sequence of lengths and arrival times of a flow's first packets.

[Image: Protocols that work with ETA]

The analysis works by comparing data. The TCP/IP protocol is used to check site reputation (visit history, the purpose for which the site was created, etc.); thanks to the DNS protocol, we can discard "bad" site addresses; and the TLS protocol works with a site's fingerprint, verifying the site against its certificate. The final stage of checking the connection is the SPLT analysis. Cisco does not disclose the details of this stage, but the idea is roughly this: from the sequence of lengths and arrival times of a flow's first packets, i.e. from the structure of the exchange itself, the purpose of the connection is determined.
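While the details are not public, the general idea of SPLT-style features is easy to sketch: take the lengths and arrival times of a flow's first few packets and turn them into a fixed-order feature vector for a classifier. The packets below are invented.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp: float  # seconds since the flow started
    length: int       # bytes on the wire

def splt_features(packets: list[Packet], n: int = 5) -> list[float]:
    """Lengths of the first n packets plus the gaps between them."""
    first = sorted(packets, key=lambda p: p.timestamp)[:n]
    lengths = [float(p.length) for p in first]
    gaps = [b.timestamp - a.timestamp for a, b in zip(first, first[1:])]
    return lengths + gaps

flow = [Packet(0.000, 517), Packet(0.021, 1460), Packet(0.022, 1460),
        Packet(0.060, 120), Packet(0.410, 36)]
print(splt_features(flow))
```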

As a result, the system lets us extract data from encrypted traffic. By examining packets, we can read as much information as possible from the unencrypted fields of the packet itself. From the lengths and timing of a flow's packets we determine its characteristics (partially or completely). And do not forget about site reputation: if a request came from some .onion source, it should not be trusted. To make it easier to work with this kind of data, a risk map has been created.

[Image: Result of ETA]

And everything seems to be fine, but let's talk about network deployment.

Physical implementation of ETA

There are a number of nuances and subtleties here. First, building such a network with high-level software requires data collection, and collecting data entirely by hand is wild; implementing a response system on top of it is more interesting still. Second, there has to be a lot of data, which means the installed network sensors must work not only autonomously but also in a finely tuned mode, and that brings a number of difficulties.

[Image: Sensors and the Stealthwatch system]

Installing a sensor is one thing; configuring it is a completely different task. To configure the sensors there is a whole complex operating according to the following topology:

  • ISR = Cisco Integrated Services Router
  • ASR = Cisco Aggregation Services Router
  • CSR = Cisco Cloud Services Router
  • WLC = Cisco Wireless LAN Controller
  • IE = Cisco Industrial Ethernet switch
  • ASA = Cisco Adaptive Security Appliance
  • FTD = Cisco Firepower Threat Defense
  • WSA = Cisco Web Security Appliance
  • ISE = Cisco Identity Services Engine

[Image: Comprehensive monitoring taking into account any telemetry data]

Network administrators get arrhythmia from the number of times the word "Cisco" appears in the previous paragraph. The price of this miracle is considerable, but that is not today's topic...

Attacker behavior is modeled as follows. Stealthwatch carefully monitors the activity of every device on the network and can build a pattern of its normal behavior. In addition, the solution has deep knowledge of known inappropriate behaviors: it uses about 100 different analysis algorithms, or heuristics, covering different types of traffic behavior such as scanning, alarm-frame flooding from a host, brute-force logins, suspected data capture, suspected data leakage, and so on. The listed security events fall into the category of high-level logical alarms; some security events can also raise an alarm on their own. The system is thus able to correlate numerous isolated anomalous incidents, assemble them into a picture of a possible attack type, and tie it to a specific device and user (see the figure below). Later, the incident can be studied over time and in light of the associated telemetry data. This is contextual information at its best. Doctors examining a patient do not look at symptoms in isolation; they look at the big picture to make a diagnosis. Likewise, Stealthwatch captures every anomalous activity on the network and analyzes it holistically to send contextual alerts, helping security specialists prioritize risks.
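A toy sketch of that correlation step: isolated events accumulate into a per-device score, and an alarm fires once the score crosses a threshold. The event names, weights, and threshold are all hypothetical, not Stealthwatch's real heuristics.

```python
from collections import defaultdict

EVENT_WEIGHTS = {"port_scan": 2, "brute_force_login": 4, "suspected_data_exfil": 6}
ALARM_THRESHOLD = 8

def correlate(events: list[tuple[str, str]]) -> dict[str, int]:
    """events: (device, event_type) pairs -> cumulative score per device."""
    scores: dict[str, int] = defaultdict(int)
    for device, event in events:
        scores[device] += EVENT_WEIGHTS.get(event, 1)
    return scores

observed = [("host-17", "port_scan"), ("host-17", "brute_force_login"),
            ("host-17", "suspected_data_exfil"), ("host-42", "port_scan")]
for device, score in correlate(observed).items():
    if score >= ALARM_THRESHOLD:
        print(f"ALARM: {device} (score {score}) - possible multi-stage attack")
```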

[Image: Anomaly detection with behavior modeling]

The physical deployment of the network looks like this:

[Image: Branch network deployment option (simplified)]

[Image: Branch network deployment option]

The network is deployed, but the question of the neural network remains open. We organized a data transmission network, installed sensors at every threshold, and launched an information collection system, yet so far the neural network has taken no part in the matter.

Multilayer neural network

The system analyzes user and device behavior to detect malware infections, communications with command and control servers, data leaks, and potentially unwanted applications running on the organization's infrastructure. There are multiple layers of data processing where a combination of artificial intelligence, machine learning, and mathematical statistics help the network learn its normal activity so that it can detect malicious activity.

The network security analysis pipeline, which collects telemetry data from all parts of the extended network, including encrypted traffic, is a unique feature of Stealthwatch. It builds an understanding of what is "anomalous" in stages, then categorizes the actual individual elements of "threat activity" and finally makes a final decision as to whether the device or user was in fact compromised. The ability to piece together the small pieces that together form the evidence for the final decision to compromise an object comes from very careful analysis and correlation.

This ability is important because a typical enterprise can receive a huge number of alarms every day, and it is impossible to investigate each of them - the resources of security specialists are limited. The machine learning module processes a huge amount of information in near real time to identify critical incidents with a high level of confidence, and is also able to suggest a clear course of action for rapid resolution.

Let's take a closer look at the numerous machine learning methods used by Stealthwatch. When an incident is submitted to the Stealthwatch Machine Learning Module, it passes through a security analysis funnel that uses a combination of supervised and unsupervised machine learning techniques.

[Image: Capabilities of multilevel machine learning]

Level 1. Anomaly Detection and Trust Modeling

At this level, 99% of the traffic is discarded by statistical anomaly detectors. Together these detectors form complex models of what is normal and what, on the contrary, is anomalous. However, anomalous does not necessarily mean malicious. Much of what happens on your network has nothing to do with a threat - it is just weird, and it is important to classify such processes separately from genuinely threatening behavior. For this reason, the output of these detectors is analyzed further to capture strange behavior that can be explained and trusted. Ultimately, only a small fraction of the most important flows and requests move on to levels 2 and 3. Without such machine learning methods, the operational cost of separating signal from noise would be too high.

Anomaly detection. The first step in anomaly detection uses statistical machine learning techniques to separate statistically normal traffic from anomalous traffic. Over 70 separate detectors process the telemetry data collected by Stealthwatch about the traffic that passes through your network perimeter, separating internal Domain Name System (DNS) traffic and proxy server data, if any. Each request is processed by more than 70 detectors, with each detector using its own statistical algorithm, forming an estimate of the detected anomalies. These scores are combined and several statistical methods are used to produce a single score for each individual query. This cumulative score is then used to separate normal and abnormal traffic.
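A sketch of the idea with three toy detectors instead of 70: each scores a request against its own notion of "normal", and the per-detector scores are combined into one cumulative score. Everything here - the features, the z-score detectors, the median combination - is an assumption for illustration.

```python
import statistics

def zscore_detector(history):
    """One toy detector: how many standard deviations from this feature's norm."""
    mu, sigma = statistics.mean(history), statistics.stdev(history)
    return lambda x: abs(x - mu) / sigma if sigma else 0.0

# One detector per feature; Stealthwatch uses 70+, we sketch three
detectors = {
    "bytes":    zscore_detector([1200, 900, 1500, 1100, 1300]),
    "requests": zscore_detector([10, 12, 9, 11, 10]),
    "duration": zscore_detector([0.2, 0.4, 0.3, 0.25, 0.35]),
}

def anomaly_score(request: dict) -> float:
    scores = [det(request[feature]) for feature, det in detectors.items()]
    return statistics.median(scores)  # one simple way of combining the scores

print(anomaly_score({"bytes": 1250, "requests": 11, "duration": 0.3}))    # low
print(anomaly_score({"bytes": 90000, "requests": 400, "duration": 9.0}))  # high
```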

Trust Modeling. Further, similar requests are grouped, and the cumulative anomaly score for such groups is determined as a long-term average. Over time, more requests are analyzed to determine the long-term average, reducing false positives and false negatives. The results of the trust modeling are used to select a subset of traffic whose anomaly score exceeds some dynamically defined threshold to move it to the next level of processing.
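Trust modeling can be sketched as a long-term moving average per group of similar requests, with a dynamic threshold deciding what moves on to the next level; the grouping key and smoothing factor below are hypothetical.

```python
long_term: dict[str, float] = {}
ALPHA = 0.1  # smoothing factor: how quickly the long-term average adapts

def update_group(group_key: str, score: float) -> float:
    """Fold a new anomaly score into the group's long-term average."""
    prev = long_term.get(group_key, score)
    long_term[group_key] = (1 - ALPHA) * prev + ALPHA * score
    return long_term[group_key]

def escalate(group_key: str, dynamic_threshold: float) -> bool:
    """True if this group's long-term score warrants level-2 processing."""
    return long_term.get(group_key, 0.0) > dynamic_threshold

update_group("dns:client-group-7", 4.2)
print(escalate("dns:client-group-7", dynamic_threshold=3.0))  # True -> level 2
```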

Level 2. Event Classification and Object Modeling

At this level, the results of the previous stages are classified and attributed to specific malicious events. Events are classified according to the values assigned by machine learning classifiers, which ensure a consistent accuracy rate above 90%. Among them (a toy sketch follows the list):

  • linear models based on the Neyman-Pearson lemma (normal distribution law from the graph at the beginning of the article)
  • support vector machines using multivariate learning
  • neural networks and the "random forest" algorithm.
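Here is the toy sketch promised above: the listed model families combined by voting, trained on invented data. A logistic regression stands in for the Neyman-Pearson-style linear model; none of this is Cisco's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X = np.random.RandomState(0).rand(200, 5)   # invented flow features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # invented event labels

ensemble = VotingClassifier([
    ("linear", LogisticRegression()),        # stand-in for the linear model
    ("svm", SVC(probability=True)),          # support vector machine
    ("mlp", MLPClassifier(max_iter=2000)),   # small neural network
    ("forest", RandomForestClassifier()),    # "random forest"
], voting="soft")
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))
```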

These isolated security events are then associated with a single endpoint over time. It is at this stage that the description of the threat is formed, on the basis of which a complete picture of how the corresponding attacker managed to achieve certain results is created.

Event classification. The statistically anomalous subset from the previous level is distributed into 100 or more categories using classifiers. Most classifiers are based on individual behavior, group relationships, or behavior on a global or local scale, while others can be very specific. For example, the classifier might indicate C&C traffic, a suspicious extension, or an unauthorized software update. Based on the results of this stage, a set of anomalous events in the security system, classified into certain categories, is formed.

Object Modeling. If the amount of evidence in support of the hypothesis of the harmfulness of a particular object exceeds the materiality threshold, a threat is determined. Relevant events that influenced the definition of a threat are associated with such a threat and become part of a discrete long-term model of the object. As evidence accumulates over time, the system identifies new threats when the materiality threshold is reached. This threshold is dynamic and intelligently adjusted based on the risk level of the threat and other factors. After that, the threat appears on the information panel of the web interface and is transferred to the next level.
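A sketch of that evidence accumulation, with hypothetical weights and a fixed (rather than dynamic) materiality threshold:

```python
evidence: dict[str, float] = {}

def add_evidence(endpoint: str, weight: float, threshold: float = 10.0) -> None:
    """Accumulate evidence per endpoint; declare a threat past the threshold."""
    evidence[endpoint] = evidence.get(endpoint, 0.0) + weight
    if evidence[endpoint] >= threshold:
        print(f"THREAT: {endpoint} crossed the materiality threshold "
              f"({evidence[endpoint]:.1f} >= {threshold})")

add_evidence("10.0.0.5", 4.0)   # e.g. a suspicious extension
add_evidence("10.0.0.5", 3.5)   # e.g. C&C-like traffic
add_evidence("10.0.0.5", 3.0)   # e.g. unauthorized update -> threshold crossed
```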

Level 3. Relationship Modeling

The purpose of relationship modeling is to synthesize the results of the previous levels from a global point of view, taking into account not only the local but also the global context of the incident. It is at this stage that you can determine how many organizations have faced the same attack, in order to understand whether it was aimed specifically at you or is part of a global campaign in which you simply got caught up.

Incidents are either confirmed or detected. A confirmed incident implies 99 to 100% certainty, because the relevant methods and tools have already been observed in action on a larger (global) scale. Detected incidents are unique to you and form part of a highly targeted campaign. Confirmed findings come with a known course of action based on past results, saving you time and resources when responding; they are provided along with the investigation tools you need to understand who attacked you and how targeted the campaign was against your digital business. As you can imagine, the number of confirmed incidents far exceeds the number of detected ones, for the simple reason that confirmed incidents cost attackers little, while detected incidents are expensive: they have to be new and customized. By making it possible to identify confirmed incidents, the economics of the game has finally shifted in favor of the defenders, giving them a certain advantage.

[Image: Multilevel training of the neural network system based on ETA]

Global Risk Map

The global risk map is built by applying machine learning algorithms to one of the largest datasets of its kind in the industry. It provides extensive statistics on the behavior of servers on the Internet, even if they are unknown. Such servers are associated with attacks and may be enlisted or used as part of an attack in the future. This is not a "blacklist" but a comprehensive picture of the server in question from a security standpoint. This contextual information about the activity of these servers allows Stealthwatch's detectors and machine learning classifiers to accurately predict the level of risk associated with communicating with them.

The available maps can be viewed here.

[Image: World map showing 460 million IP addresses]

Now the system is learning and defending your network.

So, have we finally found a panacea?

Unfortunately, no. From my experience with the system, I can say that there are two global problems.

Problem 1: price. The entire network is deployed on Cisco equipment, which is both good and bad. The good side is that you don't have to bother assembling a zoo of stopgap devices from D-Link, MikroTik, and the like. The downside is the huge cost of the system. Given the economic state of Russian business, at the moment only a wealthy owner of a large company or bank can afford this miracle.

Problem 2: training. I did not mention a training period for the neural network in the article, not because there isn't one, but because it is training all the time, and we cannot predict when it will finish. Of course, there are tools of mathematical statistics (take the same Pearson goodness-of-fit criterion), but these are half measures. We get the probability of filtering traffic, and even then only on the condition that the attack has already been mastered and is known.

Despite these two problems, we have made a big leap in the development of information security in general and network protection in particular. This fact may well motivate the study of network technologies and neural networks, which are now a very promising direction.

Source: habr.com
