Yandex implements RPKI

Hello, my name is Alexander Azimov. At Yandex, I develop various monitoring systems and work on the transport network architecture. But today we will talk about the BGP protocol.

A week ago, Yandex enabled ROV (Route Origin Validation) on its interconnections with all peering partners and at Internet exchange points. Read on to learn why this was done and how it will affect interaction with telecom operators.

BGP and what's wrong with it

You probably know that BGP was conceived as an inter-domain routing protocol. Along the way, however, the number of use cases has grown: today, thanks to numerous extensions, BGP has turned into a message bus that covers tasks from carrier VPNs to the now fashionable SD-WAN. It has even found use as a transport for an SDN-like controller, which turns distance-vector BGP into something resembling a link-state protocol.

Fig. 1. BGP SAFI

Why has BGP received (and continues to receive) so many uses? There are two main reasons:

  • BGP is the only protocol that works between autonomous systems (ASes);
  • BGP supports attributes in TLV (type-length-value) format. It is not the only protocol that does, but since nothing can replace it at the interconnections between telecom operators, it always turns out to be cheaper to bolt one more function onto it than to support an additional routing protocol.

What is wrong with it? In short, the protocol has no built-in mechanism for checking the correctness of the information it receives. That is, BGP is a protocol of a priori trust: if you want to tell the world that you now own the network of Rostelecom, MTS or Yandex, go ahead!

IRRDB-based filters: the best of the worst

The question arises: why does the Internet still work in such a situation? It does work most of the time, but periodically it blows up, making entire national segments unreachable. Although malicious activity in BGP is also on the rise, most anomalies are still caused by mistakes. This year's example is an error by a small operator in Belarus, which made a significant part of the Internet unreachable for MegaFon users for half an hour. Another example: a runaway BGP optimizer broke one of the largest CDN networks in the world.

Fig. 2. Intercepting Cloudflare traffic

But still, why do such anomalies occur once every six months rather than every day? Because carriers use external routing information databases to verify what they receive from their BGP neighbors. There are many such databases: some are run by the regional registries (RIPE, APNIC, ARIN, AFRINIC), some by independent players (the best known is RADB), and there is also a whole set of registries owned by large companies (Level3, NTT, etc.). It is thanks to these databases that inter-domain routing remains relatively stable.

However, there are nuances. Routing information is checked against route objects and AS-SET objects. While route objects require authorization in at least some IRRDBs, AS-SETs have no authorization at all. That is, anyone can add anyone to their sets and thereby bypass the filters of upstream providers. Moreover, AS-SET names are not guaranteed to be unique across different IRRs, which can lead to surprising effects, such as a sudden loss of connectivity for a telecom operator that did not change anything on its side.
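To make the AS-SET weakness concrete, here is a minimal sketch of how a prefix filter is typically derived from IRR data. Everything in it is an assumption for illustration: the AS-SET names, AS numbers (taken from the documentation range) and prefixes are hypothetical, and real tooling queries IRR servers instead of in-memory dictionaries.

```python
from typing import Dict, Optional, Set

# Hypothetical IRR contents: AS-SET name -> members (AS numbers or nested AS-SETs).
AS_SETS: Dict[str, Set[str]] = {
    "AS-CUSTOMER": {"AS64500", "AS-CUSTOMER-DOWNSTREAM"},
    # AS64510 never asked to be listed here, but nothing prevents it.
    "AS-CUSTOMER-DOWNSTREAM": {"AS64501", "AS64510"},
}

# Hypothetical route objects: origin AS -> registered prefixes.
ROUTE_OBJECTS: Dict[str, Set[str]] = {
    "AS64500": {"192.0.2.0/24"},
    "AS64501": {"198.51.100.0/24"},
    "AS64510": {"203.0.113.0/24"},
}


def expand_as_set(name: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Recursively expand an AS-SET into the AS numbers it contains."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against AS-SETs that include each other
        return set()
    seen.add(name)
    result: Set[str] = set()
    for member in AS_SETS.get(name, set()):
        if member in AS_SETS:
            result |= expand_as_set(member, seen)
        else:
            result.add(member)
    return result


def build_prefix_filter(as_set_name: str) -> Set[str]:
    """Collect every prefix registered by any AS inside the AS-SET."""
    prefixes: Set[str] = set()
    for asn in expand_as_set(as_set_name):
        prefixes |= ROUTE_OBJECTS.get(asn, set())
    return prefixes


if __name__ == "__main__":
    print(sorted(build_prefix_filter("AS-CUSTOMER")))
```

In this sketch the upstream's filter for AS-CUSTOMER ends up accepting 203.0.113.0/24 simply because someone listed AS64510 in a nested set; no step checks that AS64510 consented.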

An additional issue is the AS-SET usage model. There are two points here:

  • When an operator gets a new client, it adds the client to its AS-SET, but almost never removes it;
  • The filters themselves are configured only on the links to clients.

As a result, modern BGP filtering consists of gradually degrading filters on the links to clients and a priori trust in whatever comes from peering partners and IP transit providers.

What will replace prefix filters based on AS-SETs? The most interesting thing is that in the short term, nothing will. But there are additional mechanisms that complement IRRDB-based filters, and the first of them is, of course, RPKI.

RPKI

Put simply, the RPKI architecture can be thought of as a distributed database whose records can be cryptographically verified. In the case of a ROA (Route Origin Authorization), the signer is the owner of the address space, and the record itself is a triple (prefix, asn, max_length). In effect, the record states the following: the owner of the $prefix address space has allowed the AS with number $asn to advertise prefixes with a length no greater than $max_length. Routers, using an RPKI cache, can then check the pair (prefix, first AS in the path, that is, the origin) against these records.
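The check described above can be sketched in a few lines. This is a simplified illustration of the origin validation logic (the three outcomes follow RFC 6811), not the code any router or validator actually runs; the `Roa` type, the function name and the sample ROA are assumptions, and special cases such as AS 0 are ignored.

```python
import ipaddress
from typing import List, NamedTuple


class Roa(NamedTuple):
    """One validated ROA record: the (prefix, asn, max_length) triple."""
    prefix: str
    asn: int
    max_length: int


def validate_origin(route_prefix: str, origin_asn: int, roas: List[Roa]) -> str:
    """Return 'valid', 'invalid' or 'not-found' for a (prefix, origin AS) pair."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA is relevant only if its prefix covers the announced route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: the announcement is invalid.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    roas = [Roa("192.0.2.0/24", 64500, 24)]  # hypothetical ROA
    print(validate_origin("192.0.2.0/24", 64500, roas))     # authorized origin
    print(validate_origin("192.0.2.0/25", 64500, roas))     # longer than max_length
    print(validate_origin("192.0.2.0/24", 64501, roas))     # wrong origin AS
    print(validate_origin("198.51.100.0/24", 64500, roas))  # no covering ROA
```

Note the asymmetry: a route with no covering ROA is "not-found" rather than "invalid", so ROV only rejects announcements that contradict a published ROA.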

Fig. 3. RPKI architecture

ROA objects were standardized a long time ago, but until recently they existed essentially only on paper at the IETF. In my opinion, the reason sounds scary: bad marketing. Once standardization was complete, the selling point was that ROA protected against BGP hijacking, and this was not true. Attackers can easily bypass ROA-based filters by placing the correct AS number at the start of the path, that is, as its origin. As soon as people realized this, the natural next step was to stop using ROA. And really, why do we need a technology that doesn't work?
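The bypass is easy to see in a small sketch. It assumes a single hypothetical IPv4 ROA and documentation AS numbers, and it models the path as a list whose last element is the origin, as in the AS_PATH attribute; the function name is made up for this example.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical ROA set: (prefix, asn, max_length). IPv4 only in this sketch.
ROAS: List[Tuple[str, int, int]] = [("192.0.2.0/24", 64500, 24)]


def origin_is_authorized(prefix: str, as_path: List[int]) -> bool:
    """ROA-based check: only the origin (last AS in the path) is examined."""
    route = ipaddress.ip_network(prefix)
    origin = as_path[-1]
    return any(
        route.subnet_of(ipaddress.ip_network(roa_prefix))
        and origin == asn
        and route.prefixlen <= max_length
        for roa_prefix, asn, max_length in ROAS
    )


if __name__ == "__main__":
    legitimate = [64510, 64500]      # neighbor AS64510, real origin AS64500
    accidental = [64510, 64511]      # AS64511 originates the prefix by mistake
    forged = [64510, 64511, 64500]   # AS64511 appends the real origin to its path
    for path in (legitimate, accidental, forged):
        print(path, origin_is_authorized("192.0.2.0/24", path))
```

The accidental announcement is rejected, while the forged path passes: the check never looks at anything except the origin.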

Why is it time to change your mind? Because that is not the whole truth. ROA does not protect against deliberate attacks in BGP, but it does protect against accidental traffic hijacking, such as static routes leaking into BGP, which is becoming more common. Also, unlike IRR-based filters, ROV can be used not only on the links to clients, but also on the links to peers and upstream providers. That is, with the introduction of RPKI, a priori trust is gradually leaving BGP.

ROA-based route checking is now gradually being rolled out by key players: the largest European IX is already discarding invalid routes, and among Tier-1 operators AT&T stands out for having turned on filters on the links to its peering partners. The largest content providers are getting on board as well. And dozens of medium-sized transit operators have already quietly implemented it without telling anyone. Why are all these operators implementing RPKI? The answer is simple: to protect their outgoing traffic from other people's mistakes. That is why Yandex is one of the first in the Russian Federation to enable ROV at the edge of its network.

What will happen next?

We have now enabled routing information checks on the links to Internet exchange points and on private peerings. In the near future, validation will also be enabled on the links to upstream transit providers.

What does it change for you? If you want to improve the security of traffic routing between your network and Yandex, we recommend:

  • Sign your address space in the RIPE portal. It's easy and takes 5-10 minutes on average. This will protect our connectivity with you in case someone unwittingly hijacks your address space (and this will definitely happen sooner or later);
  • Install one of the open source RPKI caches (ripe-validator, routinator) and enable route checking at the network edge. This will take more time, but again, it should not cause technical difficulties.

Yandex also supports the development of a filtering system based on a new RPKI object, ASPA (Autonomous System Provider Authorization). Filters based on ASPA and ROA objects can not only replace "leaky" AS-SETs, but also close the issue of MiTM attacks using BGP.

I will be talking about ASPA in detail in a month at the Next Hop conference. Colleagues from Netflix, Facebook, Dropbox, Juniper, Mellanox and Yandex will also speak there. If you are interested in the network stack and its development in the future - come, registration is open.

Source: habr.com
