BGP route leak at Rostelecom disrupted connectivity of the largest networks

As a result of an erroneous BGP announcement, more than 8,800 foreign network prefixes were redirected through the Rostelecom network, which led to a short-term routing collapse, disrupted network connectivity and caused problems accessing some services around the world. The problem affected more than 200 autonomous systems owned by major Internet companies and content delivery networks, including Akamai, Cloudflare, Digital Ocean, Amazon AWS, Hetzner, Level3, Facebook, Alibaba and Linode.

The erroneous announcement was made by Rostelecom (AS12389) on April 1 at 22:28 (MSK). It was then picked up by the provider Rascom (AS20764) and spread further along the chain to Cogent (AS174) and Level3 (AS3356), whose reach covers almost all first-level (Tier-1) Internet providers. BGP monitoring services promptly notified Rostelecom of the problem, so the incident lasted about 10 minutes (according to other data, the effects were observed for about an hour).

This is not the first incident caused by an error on Rostelecom's side. In 2017, the networks of the largest banks and financial services, including Visa and MasterCard, were redirected through Rostelecom for 5-7 minutes. In both incidents, the source of the problem appears to have been work related to traffic management: for example, route leaks could occur while setting up internal monitoring, prioritization or mirroring of the traffic of certain services and CDNs passing through Rostelecom (because of the increased network load caused by mass working from home, lowering the priority of foreign services' traffic in favor of domestic resources was discussed at the end of March). A similar case occurred a few years ago in Pakistan, where an attempt to route YouTube subnets to a null interface caused those subnets to appear in BGP announcements and drew all YouTube traffic to Pakistan.

Interestingly, the day before the Rostelecom incident, the small provider "New Reality" (AS50048) from the city of Shumerlya announced 2,658 prefixes through Transtelecom, affecting Orange, Akamai, Rostelecom and the networks of more than 300 companies. The route leak resulted in several waves of traffic redirection, each lasting several minutes. At its peak, the problem covered up to 13.5 million IP addresses. A noticeable global disruption was avoided because Transtelecom applies route restrictions to each client.

Similar incidents occur on the global network regularly and will continue until authorization of BGP announcements based on RPKI (BGP Origin Validation), which allows announcements to be accepted only from network owners, is implemented everywhere. Without authorization, any operator can announce a subnet with fictitious information about the route length and pull through itself part of the transit traffic from other systems that do not filter announcements.
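The origin check mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the valid / invalid / not-found decision in the spirit of BGP Origin Validation, not a real validator: the `Roa` class, the `validate_origin` function, and the documentation prefixes and AS numbers used here are assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable


@dataclass(frozen=True)
class Roa:
    """Hypothetical in-memory form of one ROA record."""
    prefix: str      # prefix registered by the address holder
    max_length: int  # longest prefix length the holder allows
    origin_as: int   # AS authorized to originate the prefix


def validate_origin(prefix: str, origin_as: int, roas: Iterable[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # Skip ROAs of the other address family or ones that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: likely a wrong origin.
    return "invalid" if covered else "not-found"


roas = [Roa("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/24", 64501, roas))     # invalid (wrong origin AS)
print(validate_origin("198.51.100.0/24", 64500, roas))  # not-found (no covering ROA)
```

Note that a route with no covering ROA comes out as not-found rather than invalid, and operators commonly still accept such routes, so the check only protects prefixes whose ROA records are actually present in the repository.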

At the same time, in the incident under consideration, a check against the RIPE RPKI repository turned out to be useless. By coincidence, three hours before the BGP route leak at Rostelecom, 4,100 ROA records (RPKI Route Origin Authorization) were accidentally deleted during an update of the RIPE software. The database was restored only on April 2, and for all that time the check was inoperable for RIPE clients (the problem did not affect the RPKI repositories of other registrars). Today RIPE had new problems, and its RPKI repository was unavailable for 7 hours.

Filtering based on the IRR (Internet Routing Registry), which defines the autonomous systems through which routing of given prefixes is allowed, can also be used to block leaks. When interacting with small operators, you can limit the maximum number of prefixes accepted on EBGP sessions (the maximum-prefix setting) to reduce the consequences of human error.
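The two safeguards described above can be modeled roughly as follows. This is a toy sketch, not router configuration or any vendor's API: the `EbgpSession` class, its parameters, and the choice to count prefixes before filtering are assumptions made for illustration (real implementations differ on what exactly the limit counts and on what happens when it is exceeded).

```python
class PrefixLimitExceeded(Exception):
    """Raised when a peer announces more prefixes than the agreed limit."""


class EbgpSession:
    """Toy model of one customer EBGP session (hypothetical)."""

    def __init__(self, peer_as, allowed_prefixes, max_prefixes):
        self.peer_as = peer_as
        # Assumed to be generated from the customer's IRR objects.
        self.allowed_prefixes = set(allowed_prefixes)
        self.max_prefixes = max_prefixes
        self.received = set()
        self.accepted = set()
        self.established = True

    def receive(self, prefix):
        """Return True if the announcement is accepted, False if filtered out."""
        if not self.established:
            raise PrefixLimitExceeded("session is down")
        self.received.add(prefix)
        if len(self.received) > self.max_prefixes:
            # Too many prefixes from a small peer: drop everything it sent
            # and shut the session down instead of propagating a leak.
            self.accepted.clear()
            self.established = False
            raise PrefixLimitExceeded(
                f"AS{self.peer_as} sent more than {self.max_prefixes} prefixes"
            )
        if prefix not in self.allowed_prefixes:
            return False  # not registered for this customer, ignore it
        self.accepted.add(prefix)
        return True


session = EbgpSession(64500, ["192.0.2.0/24"], max_prefixes=2)
print(session.receive("192.0.2.0/24"))     # True
print(session.receive("198.51.100.0/24"))  # False
```

A third distinct prefix on this session would exceed the limit of 2 and raise `PrefixLimitExceeded`, so a small client that suddenly announces thousands of foreign prefixes is cut off after the first few.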

In most cases, incidents are the result of accidental personnel errors, but recently there have also been targeted attacks in which attackers compromise providers' infrastructure to redirect and intercept traffic, substituting specific sites by mounting MiTM attacks that replace DNS responses. To make it more difficult to obtain TLS certificates during such attacks, the Let's Encrypt certification authority recently moved to validating domains from multiple locations in different subnets. To bypass this check, an attacker would need to achieve route redirection simultaneously for several autonomous systems of providers with different uplinks, which is much more difficult than redirecting a single route.
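Why checking from several locations raises the bar can be shown with a small model. The function name, the vantage point names and the quorum value below are hypothetical and do not reflect Let's Encrypt's actual configuration; the sketch only assumes that validation succeeds when enough independent checkers reach the server that answers the challenge.

```python
def attacker_passes_validation(vantage_points, hijacked, min_agree):
    """Return True if the attacker's server is reached by enough checkers.

    vantage_points: names of the locations the CA validates from
    hijacked:       locations whose route to the domain leads to the attacker
    min_agree:      how many locations must see the expected challenge response
    """
    reached = [vp for vp in vantage_points if vp in hijacked]
    return len(reached) >= min_agree


points = ["dc-primary", "remote-1", "remote-2", "remote-3"]

# One redirected route is enough against a single checker...
print(attacker_passes_validation(["dc-primary"], {"dc-primary"}, 1))  # True
# ...but not when three of four locations must agree.
print(attacker_passes_validation(points, {"dc-primary"}, 3))          # False
```

With a quorum, the attacker has to redirect the routes seen from several networks at once, which matches the argument above.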

Source: opennet.ru