The evolution of Web Application Firewall: from firewalls to cloud-based protection systems with machine learning

In our previous post on cloud topics, we described how to protect IT resources in the public cloud and why traditional antivirus software is not well suited to the task. In this post, we continue the topic of cloud security: we trace the evolution of WAF and discuss which option to choose, hardware, software, or cloud.

What is a WAF

More than 75% of hacker attacks target vulnerabilities in web applications and websites, and such attacks usually go unnoticed by the rest of the information security infrastructure and services. Web application vulnerabilities, in turn, put user accounts, personal data, passwords, and credit card numbers at risk of compromise and fraud. In addition, a vulnerable website can serve as an attacker's entry point into the corporate network.

A Web Application Firewall (WAF) is a firewall that blocks attacks on web applications: SQL injection, cross-site scripting, remote code execution, brute force, and authorization bypass (auth bypass), including attacks that exploit zero-day vulnerabilities. Application firewalls provide protection by monitoring web page content, including HTML, DHTML, and CSS, and by filtering potentially malicious HTTP/HTTPS requests.

What were the first solutions?

The first attempts to create a web application firewall were made in the early 1990s. At least three engineers are known to have worked in this area. The first was Gene Spafford, a computer science professor at Purdue University. He described the architecture of a proxy application firewall and published it in 1991 in the book "Practical UNIX Security".

The second and third were security specialists William Cheswick of Bell Labs and Marcus Ranum. They developed one of the first prototype application firewalls, which DEC brought to market under the name SEAL (Secure External Access Link).

But SEAL was not a complete WAF solution. It was a classic network firewall with extended functionality: it could block attacks on FTP and RSH. For this reason, the first WAF solution is generally considered to be AppShield, which Perfecto Technologies (later Sanctum) introduced in 1999. At the time, Perfecto Technologies was developing information security solutions for e-commerce, and online stores became the target audience of the new product. AppShield could analyze HTTP requests and block attacks based on dynamic security policies.

A few years after AppShield, in 2002, the first open-source WAF appeared: ModSecurity. It was created to popularize WAF technologies and is still maintained by the community (its repository is on GitHub). ModSecurity blocks attacks on applications using a standard set of regular expressions (signatures) that check requests against patterns: the OWASP Core Rule Set.

In the end, the developers achieved their goal: new WAF solutions began to appear on the market, including ones built on ModSecurity.

Three generations: a brief history

It is customary to distinguish three generations of WAF systems, which evolved along with the underlying technology.

First generation. These systems work with regular expressions (or grammars); ModSecurity belongs to this generation. The vendor studies the types of attacks on applications and writes patterns that describe legitimate and potentially malicious requests. The WAF consults these lists and decides what to do in each case: block the traffic or let it through.
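A minimal sketch of first-generation, signature-based filtering may make this concrete. It assumes a handful of simplified, hypothetical signatures (DENY_SIGNATURES below is not taken from the real OWASP Core Rule Set, which also inspects headers, bodies, and encodings, and contains far more rules). The function checks one parameter value against each pattern and blocks on the first match:

```python
import re
from typing import Optional, Tuple

# Hypothetical, heavily simplified signatures in the spirit of a first-generation
# rule set. They are NOT actual OWASP Core Rule Set rules.
DENY_SIGNATURES = {
    "sqli-union": re.compile(r"union\s+select", re.IGNORECASE),
    "sqli-tautology": re.compile(r"'\s*or\s*'?\d+'?\s*=\s*'?\d+", re.IGNORECASE),
    "xss-script-tag": re.compile(r"<\s*script\b", re.IGNORECASE),
}

def match_signatures(value: str) -> Tuple[str, Optional[str]]:
    """Return ("block", rule_id) for the first matching signature, else ("allow", None)."""
    for rule_id, pattern in DENY_SIGNATURES.items():
        if pattern.search(value):
            return "block", rule_id
    return "allow", None

print(match_signatures("1' OR '1'='1"))  # ('block', 'sqli-tautology')
print(match_signatures("john.smith"))    # ('allow', None)
```

Every new attack variant means another hand-written entry in the dictionary, and every request has to be checked against all of them.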

One example of regular-expression-based detection is the open-source Core Rule Set project mentioned above; another is Naxsi, also open source. Regular-expression systems have a number of drawbacks. In particular, when a new vulnerability is discovered, the administrator has to write additional rules by hand. A large IT infrastructure can require several thousand rules. Managing that many regular expressions is difficult, and evaluating all of them can slow down request processing.

Regular expressions also have a fairly high false positive rate. The linguist Noam Chomsky proposed a classification of grammars that divides them into four levels of complexity. In this hierarchy, regular expressions sit at the simplest level (regular grammars), while languages such as SQL, HTML, and JavaScript require more expressive grammars. As a result, a regular-expression rule recognizes only requests that follow its pattern exactly and cannot account for variations. This means attackers can easily trick first-generation WAFs. One common technique is to insert special characters into a request that do not change the logic of the malicious payload but break the signature match.
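The evasion technique described above can be reproduced with a single naive signature. The sketch assumes a MySQL backend, where an empty inline comment /**/ separates tokens just like whitespace: the injected query still works, but the signature no longer matches.

```python
import re

# A naive first-generation signature: UNION, whitespace, SELECT.
NAIVE_UNION_SELECT = re.compile(r"union\s+select", re.IGNORECASE)

def is_blocked(payload: str) -> bool:
    return NAIVE_UNION_SELECT.search(payload) is not None

plain = "1 UNION SELECT password FROM users"
# Assumed MySQL target: /**/ separates tokens like a space, so the query
# logic is unchanged, but \s+ no longer matches between the keywords.
obfuscated = "1 UNION/**/SELECT password FROM users"

for payload in (plain, obfuscated):
    print(f"{'blocked' if is_blocked(payload) else 'missed'}: {payload}")
```

Widening the pattern to allow comments restores the match, but that is one more manual rule, and attackers can switch to other encodings or separators.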

Second generation. Second-generation application firewalls were developed to overcome the performance and accuracy problems of first-generation WAFs. They introduced parsers, each responsible for identifying a strictly defined type of attack (on HTML, JS, etc.). These parsers work with special tokens that describe requests (for example, variable, string, unknown, number). Potentially malicious token sequences are kept in a separate list, against which the WAF checks incoming requests. This approach was first presented at the Black Hat 2012 conference in the form of libinjection, a C/C++ library for detecting SQL injection.

Compared to first-generation WAFs, specialized parsers can run faster. However, they did not solve the problem of configuring the system by hand whenever new types of attacks appear.
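The token-based approach of the second generation can be sketched in a few dozen lines. This is a simplified illustration in the spirit of libinjection, not its actual API, token set, or fingerprint list: the token codes (s, k, n, o, v, c), the keyword set, and MALICIOUS_FINGERPRINTS are hypothetical. The sketch assumes the parameter value ends up inside a single-quoted SQL string, so it tokenizes the value as if an opening quote preceded it, reduces the tokens to a short fingerprint, and looks the fingerprint up in a list of known-malicious sequences:

```python
import re

# Hypothetical keyword set and fingerprint list; a real list is far larger.
SQL_KEYWORDS = {"SELECT", "UNION", "OR", "AND", "FROM", "WHERE", "DROP", "INSERT", "UPDATE", "DELETE"}
MALICIOUS_FINGERPRINTS = {"sksos", "sknon", "sc", "sksc"}

TOKEN_RE = re.compile(
    r"""
      (?P<comment>--.*|/\*.*?(?:\*/|$))   # SQL line comment or (possibly unterminated) block comment
    | (?P<string>'[^']*'?)                # single-quoted string, possibly unterminated
    | (?P<number>\d+(?:\.\d+)?)           # integer or decimal number
    | (?P<word>[A-Za-z_][A-Za-z0-9_]*)    # keyword or identifier
    | (?P<operator>[=<>!]+|[-+*/%])       # comparison or arithmetic operator
    | (?P<space>\s+)                      # whitespace (skipped)
    | (?P<other>.)                        # any other character
    """,
    re.VERBOSE | re.DOTALL,
)
TYPE_CODES = {"comment": "c", "string": "s", "number": "n", "operator": "o", "other": "?"}

def fingerprint(value: str) -> str:
    """Reduce a parameter value to a string of token-type codes."""
    codes = []
    # Assume the value lands inside '...' in the SQL query.
    for m in TOKEN_RE.finditer("'" + value):
        kind = m.lastgroup
        if kind == "space":
            continue
        if kind == "word":
            codes.append("k" if m.group().upper() in SQL_KEYWORDS else "v")
        else:
            codes.append(TYPE_CODES[kind])
    return "".join(codes)

def is_sqli(value: str) -> bool:
    return fingerprint(value) in MALICIOUS_FINGERPRINTS

print(fingerprint("1' OR '1'='1"), is_sqli("1' OR '1'='1"))  # sksos True
print(fingerprint("O'Brien"), is_sqli("O'Brien"))            # sv False
```

The surname O'Brien passes even though it contains a quote, because a string followed by an ordinary word is not on the list. The limitation remains, though: a new attack class still needs new fingerprints or a new tokenizer, written by hand.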

Third generation. The key change in third-generation detection logic is the use of machine learning, which makes it possible to bring the detection grammar as close as possible to the real SQL/HTML/JS grammar of the protected systems. Such detection logic can adapt a Turing machine to cover recursively enumerable grammars. Building an adaptable Turing machine was long considered an unsolved problem; that changed when the first studies of neural Turing machines were published.

Machine learning makes it possible to tailor the grammar to any type of attack, without manually creating signature lists as first-generation detection requires, and without developing new tokenizers/parsers for new attack types, such as attacks on Memcached, Redis, or Cassandra, or server-side request forgery (SSRF), as the second-generation methodology requires.
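To illustrate the difference in principle, here is a toy learned detector: a multinomial naive Bayes classifier over character trigrams with add-one (Laplace) smoothing, trained on a handful of labeled requests. It is a deliberately simple stand-in chosen for brevity, not the model used by any third-generation product (the source does not describe those models), and the eight training samples are hypothetical. The point is only that the decision boundary comes from labeled data rather than from hand-written signatures or parsers:

```python
import math
from collections import Counter

def trigrams(text: str) -> list:
    """Overlapping character trigrams of the lowercased text."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]

class NaiveBayesDetector:
    """Multinomial naive Bayes over character trigrams with add-one smoothing."""

    def __init__(self) -> None:
        self.counts = {"malicious": Counter(), "benign": Counter()}  # trigram counts per class
        self.docs = Counter()  # number of training samples per class
        self.vocab = set()     # all trigrams seen in training

    def fit(self, samples):
        for text, label in samples:
            grams = trigrams(text)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_score(self, text: str, label: str) -> float:
        prior = self.docs[label] / sum(self.docs.values())
        total = sum(self.counts[label].values())
        score = math.log(prior)
        for g in trigrams(text):
            # Add-one smoothing keeps unseen trigrams from zeroing the probability.
            score += math.log((self.counts[label][g] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        return max(self.counts, key=lambda label: self.log_score(text, label))

# Hypothetical, tiny training set (real systems learn from large traffic samples).
TRAINING = [
    ("' or 1=1 --", "malicious"),
    ("' union select password from users --", "malicious"),
    ("<script>alert(1)</script>", "malicious"),
    ("admin' --", "malicious"),
    ("john smith", "benign"),
    ("laptop bag", "benign"),
    ("blue jeans", "benign"),
    ("hello world", "benign"),
]

detector = NaiveBayesDetector().fit(TRAINING)
print(detector.predict("' union select email from users --"))  # malicious
print(detector.predict("hello john"))                          # benign
```

With so little data the classifier is easy to fool; its only purpose here is to show that adding a new attack type means adding labeled examples, not writing a new rule or parser.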

Combining all three generations of detection logic, we can draw a new diagram in which the third generation is outlined in red (Figure 3). One of the solutions we are deploying in the cloud together with Onsec, the developer of Wallarm, an adaptive security platform for web applications and APIs, belongs to this generation.

The detection logic now uses feedback from the protected application. In machine learning, this feedback loop is called "reinforcement". Typically, one or more of the following types of reinforcement are used:

  • Analysis of application response behavior (passive)
  • Scanner/fuzzer (active)
  • Report files / interceptor procedures / hooks (post factum)
  • Manual labeling (determined by a supervisor)
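The feedback loop can be sketched as follows. This is a simplified, hypothetical design, neither a full reinforcement-learning algorithm nor any vendor's implementation: each detected pattern (for example, a token fingerprint) accumulates weighted attack/benign votes from the four feedback sources listed above, and the WAF blocks the pattern only while its estimated attack probability exceeds a threshold. The weights in SOURCE_WEIGHTS are illustrative assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical trust weights for the four feedback sources.
SOURCE_WEIGHTS = {
    "passive": 0.5,      # inferred from application response behavior
    "active": 1.0,       # confirmed by a scanner/fuzzer
    "post_factum": 1.0,  # found later in report files, interceptors, hooks
    "manual": 2.0,       # explicit verdict from a supervisor
}

@dataclass
class Evidence:
    attack: float = 1.0  # pseudo-counts; 1/1 encodes a neutral starting point
    benign: float = 1.0

    @property
    def p_attack(self) -> float:
        return self.attack / (self.attack + self.benign)

class FeedbackLoop:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.evidence = defaultdict(Evidence)  # pattern -> accumulated votes

    def record(self, pattern: str, source: str, is_attack: bool) -> None:
        """Add one weighted feedback event for a detected pattern."""
        weight = SOURCE_WEIGHTS[source]
        if is_attack:
            self.evidence[pattern].attack += weight
        else:
            self.evidence[pattern].benign += weight

    def should_block(self, pattern: str) -> bool:
        """Block only while the estimated attack probability exceeds the threshold."""
        return self.evidence[pattern].p_attack > self.threshold

loop = FeedbackLoop()
loop.record("sksos", "active", True)   # a scanner confirmed the attack
loop.record("sv", "passive", True)     # response behavior looked suspicious...
loop.record("sv", "manual", False)     # ...but a supervisor marked it legitimate
print(loop.should_block("sksos"), loop.should_block("sv"))  # True False
```

Giving the supervisor the highest weight lets a single manual verdict override weaker passive signals, which is one simple way to turn feedback into fewer false positives.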

As a result, third-generation detection logic also addresses the important problem of accuracy. It can not only reduce false positives and false negatives, but also correctly recognize true negatives, such as the legitimate use of SQL command elements in a control panel, web page template loading, AJAX requests related to JavaScript errors, and others.

Next, let's look at the technical capabilities of the various WAF deployment options.

Hardware, software or cloud - what to choose?

One option for implementing an application firewall is a hardware solution. Such systems are specialized appliances that a company installs in its own data center. In this case, however, the company has to purchase its own equipment and pay integrators to configure and debug it (if it does not have its own IT department). In addition, any hardware eventually becomes obsolete, so customers have to budget for upgrades.

Another WAF deployment option is a software implementation. The solution is installed as an add-on to other software (for example, ModSecurity runs on top of Apache) and works on the same server. As a rule, such solutions can be deployed both on a physical server and in the cloud. Their drawbacks are limited scalability and limited vendor support.

The third option is to use a WAF from the cloud. Cloud providers offer such solutions as a subscription service. The company does not need to purchase and configure specialized hardware; these tasks fall to the service provider. An important point: a modern cloud WAF does not require migrating resources to the provider's platform. The site can be hosted anywhere, even on-premises.

Below, we explain why companies are increasingly turning to cloud WAFs.

What a cloud WAF can do

In terms of technological capabilities:

  • The provider is responsible for updates. The WAF is provided on a subscription basis, so the service provider keeps updates and licenses current. Updates cover not only software but also hardware: the provider upgrades and maintains its server fleet. It is also responsible for load balancing and redundancy. If a WAF server fails, traffic is immediately redirected to another machine. Sensible traffic distribution helps avoid situations in which the firewall goes into fail-open mode, that is, it cannot cope with the load and stops filtering requests.
  • Virtual patching. Virtual patches restrict access to vulnerable parts of the application until the developer fixes the vulnerability. As a result, the cloud provider's customer can calmly wait until the vendor of the affected software publishes an official patch; releasing it as quickly as possible is the vendor's priority. In the Wallarm platform, for example, a separate software module handles virtual patching. The administrator can add custom regular expressions to block malicious requests. The system can also mark certain requests with a "Confidential data" flag: their parameters are then masked and are never transferred outside the firewall's working area.
  • Built-in perimeter and vulnerability scanner. The WAF can independently determine the network boundaries of the IT infrastructure using DNS query data and the WHOIS protocol. It then automatically analyzes the services running inside the perimeter (performs a port scan). The firewall can detect all common vulnerability types (SQLi, XSS, XXE, etc.) as well as software misconfigurations, for example, unauthorized access to Git and Bitbucket repositories and anonymous access to Elasticsearch, Redis, and MongoDB.
  • Attacks are analyzed with cloud resources. Cloud providers usually have large amounts of computing power, which allows threats to be analyzed quickly and accurately. A cluster of filter nodes deployed in the cloud processes all traffic. These nodes block attacks on web applications and send statistics to an analytics center, which uses machine learning algorithms to update the blocking rules for all protected applications. This scheme is shown in Fig. 4. Security rules adapted in this way minimize the number of firewall false positives.
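Virtual patching and masking of confidential parameters can be illustrated with a small sketch. The rule format below (VirtualPatch, the /api/report endpoint, the CONFIDENTIAL_PARAMS set) is hypothetical and does not reflect the actual configuration of the Wallarm module; it only shows the idea: block requests that hit a known-vulnerable parameter with an administrator-supplied regular expression, and mask flagged parameters before any data leaves the filtering node.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class VirtualPatch:
    """Hypothetical rule: block requests to `path` whose `param` matches `pattern`."""
    path: str
    param: str
    pattern: re.Pattern

# Illustrative patch for a hypothetical path-traversal bug in /api/report,
# kept in place until the vendor ships an official fix.
PATCHES: List[VirtualPatch] = [
    VirtualPatch("/api/report", "file", re.compile(r"\.\./")),
]

# Parameters flagged as "confidential data" (hypothetical names).
CONFIDENTIAL_PARAMS = {"password", "card_number"}

def is_blocked(path: str, params: Dict[str, str]) -> bool:
    """True if any virtual patch covers this endpoint and parameter value."""
    return any(
        p.path == path and p.param in params and p.pattern.search(params[p.param]) is not None
        for p in PATCHES
    )

def mask_for_export(params: Dict[str, str]) -> Dict[str, str]:
    """Replace confidential values before any data leaves the filtering node."""
    return {k: "***" if k in CONFIDENTIAL_PARAMS else v for k, v in params.items()}

print(is_blocked("/api/report", {"file": "../../etc/passwd"}))   # True
print(is_blocked("/api/report", {"file": "q3.pdf"}))             # False
print(mask_for_export({"user": "alice", "password": "s3cret"}))  # {'user': 'alice', 'password': '***'}
```

Because the patch is scoped to one endpoint and one parameter, the rest of the application keeps working normally while the vulnerable spot stays closed.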

Now a little about the organizational and management aspects of cloud WAFs:

  • Transition to OpEx. With a cloud WAF, there are no upfront implementation costs: the provider has already paid for all the hardware and licenses, and the service is paid for by subscription.
  • Different tariff plans. The user of the cloud service can quickly enable or disable additional options. Functions are managed from a single control panel, which is itself protected: it is accessed over HTTPS, and two-factor authentication based on TOTP (Time-based One-Time Password algorithm) is available.
  • DNS connection. You can change DNS records and configure network routing yourself; there is no need to hire and train dedicated specialists for this. As a rule, the provider's technical support can help with the configuration.
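The control panel's second factor relies on TOTP, which RFC 6238 defines on top of HOTP (RFC 4226). Below is a minimal standard-library sketch of how such codes are generated and checked; the source does not describe the provider's implementation, and the verify helper's tolerance of one time step in each direction (for clock drift) is an assumption. The expected codes in the usage lines are the published test vectors from RFCs 4226 and 6238:

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of the 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte select the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF  # drop the sign bit
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP whose counter is the number of whole time steps since the Unix epoch."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)

def verify(key: bytes, submitted: str, at: Optional[float] = None,
           step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept a code from the current step or up to `window` steps before/after (assumed drift tolerance)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(key, now + i * step, step, digits), submitted)
        for i in range(-window, window + 1)
    )

key = b"12345678901234567890"       # RFC test secret
print(hotp(key, 1))                  # 287082 (RFC 4226, Appendix D)
print(totp(key, at=59, digits=8))    # 94287082 (RFC 6238, Appendix B)
print(verify(key, "287082", at=59))  # True
```

hmac.compare_digest is used so that the comparison time does not leak how many digits of a guessed code were correct.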

WAF technologies have evolved from simple firewalls with rules of thumb into complex protection systems with machine learning algorithms. Application firewalls now offer a wide range of features that were hard to implement in the 1990s. In many ways, this new functionality became possible thanks to cloud technologies. WAF solutions and their components continue to evolve, as do other areas of information security.

The text was prepared by Alexander Karpuzikov, information security product development manager at the #CloudMTS cloud provider.

Source: habr.com