Two weeks after the last global outage, yesterday, Cloudflare's content delivery network, which handles approximately 20% of all global web traffic, experienced a partial outage for 25 minutes. During the incident, approximately a third of Cloudflare requests returned a blank page with a 500 error code. This time, the cause was a long-overdue issue in the Lua code used by the WAF (Web Application Firewall) traffic filtering system to block malicious requests.

To protect customer systems from a critical vulnerability (CVE-2025-55182) in the server components of the React framework, following the public release of an exploit, Cloudflare engineers implemented protection at the WAF level. The implementation of the protection was not without its challenges: during the implementation, the buffer size for proxy traffic inspection was increased.серверах, but it turned out that the WAF testing toolkit didn't support the specified buffer size. Since this toolkit doesn't affect traffic, it was decided to disable it.
To disable the test, the engineers used the "killswitch" subsystem for quickly changing the configuration and disabling individual Lua handlers on proxy servers without changing the rules. This method of disabling rules is occasionally used to quickly debug bugs and results in the execution of some Lua code being skipped. However, the engineers failed to take into account that the Lua rules used the "execute" method to call the test toolkit being disabled, which runs an additional set of rules. The "killswitch" mode had never previously been used with rules that called "execute," and this combination had not been tested.
Using the "killswitch" disabled the code defining the additional test ruleset, but the invocation of this ruleset via "execute" remained. The code lacked additional checks for object existence and assumed that if the "execute" action was present in the ruleset, the "rule_result.execute" object would exist. As a result, an attempt was made to execute the "execute" method on an uninitialized object, which caused the handler to crash with the error "attempt to index field 'execute' (a nil value)". if rule_result.action == "execute" then rule_result.execute.results = ruleset_results[tonumber(rule_result.execute.results_index)] end
Source: opennet.ru
