Attack of the week: voice calls in LTE (ReVoLTE)

From the translator, with a TL;DR

  1. TL;DR:

    It seems that VoLTE turned out to be even more poorly protected than the first Wi-Fi clients with WEP. An exceptional architectural blunder lets an attacker XOR some traffic together and recover the keystream. The attack is possible if you are near the caller and they make calls often.

  2. Thanks to Klukonin for the tip and the TL;DR.

  3. The researchers made an app that checks whether your carrier is vulnerable; learn more here. Share your results in the comments. In my region, VoLTE is disabled on Megafon.

About the author

Matthew Green.

I am a cryptographer and professor at Johns Hopkins University. I have designed and analyzed cryptographic systems used in wireless networks, payment systems and digital content protection platforms. In my research, I look at various ways in which cryptography can be used to improve user privacy.

I haven't written an "attack of the week" post in a while, and that has been upsetting me. Not because there were no attacks, but mainly because there was no attack on something widely used enough to get me out of my writer's block.

But today I came across an interesting attack called ReVoLTE, on protocols that I am particularly happy to see broken: the cellular network (voice over) LTE protocols. I'm excited about these particular protocols, and about this new attack, because it's very rare to see real cellular network protocols and implementations being broken. That is mainly because these standards are developed in smoke-filled rooms and written up in 12000-page documents that not every researcher can handle. Moreover, implementing these attacks forces researchers to work with complex radio protocols.

As a result, serious cryptographic vulnerabilities can spread all over the world, perhaps exploited only by governments, before any researcher notices them. But there are exceptions from time to time, and today's attack is one of them.

The authors of the attack: David Rupprecht, Katharina Kohls, Thorsten Holz and Christina Pöpper from Ruhr University Bochum and New York University Abu Dhabi. It is a great key reinstallation attack on a voice protocol that you're probably already using (assuming you're from an older generation that still makes phone calls with a cell phone).

First, a brief historical digression.

What is LTE and VoLTE?

The foundations of our modern cellular telephony standards were laid in Europe back in the 1980s with the Global System for Mobile Communications (GSM) standard. GSM was the first major digital cellular telephony standard, and it introduced a number of revolutionary features such as the use of encryption to protect phone calls. Early GSM was designed primarily for voice, although other data could be transmitted at a price.

As data transmission grew in importance for cellular communications, the Long Term Evolution (LTE) standards were developed to streamline this type of communication. LTE builds on a group of older standards such as GSM, EDGE and HSPA and is designed to increase data rates. There is a lot of branding and misleading labeling, but the TL;DR is that LTE is a data communications system that serves as a bridge between older packet data protocols and future cellular data technologies such as 5G.

Of course, history tells us that as soon as there is enough (IP) bandwidth, concepts like "voice" and "data" begin to blur. The same applies to modern cellular protocols. To make this transition smoother, the LTE standards define Voice over LTE (VoLTE), an IP-based standard for carrying voice calls directly over the data plane of an LTE system, completely bypassing the circuit-switched part of the cellular network. As with standard VoIP calls, VoLTE calls can be terminated by the mobile operator and connected to the regular telephone network. Or (as is becoming more common) they can be routed directly from one cellular client to another, even between different providers.

Like standard VoIP, VoLTE is based on two popular IP-based protocols: the Session Initiation Protocol (SIP) for call setup, and the Real-time Transport Protocol (which should be called RTTP but is actually called RTP) for carrying voice data. VoLTE also adds some bandwidth optimizations such as header compression.

Okay, what does this have to do with encryption?

LTE, like GSM, has a standard set of cryptographic protocols for encrypting packets as they are transmitted over the air. They are primarily designed to protect your data as it travels between the phone (called the "User Equipment", or UE) and the cell tower (or wherever your provider decides to terminate the connection). This is because cellular providers see outside eavesdroppers as the enemy. Well, of course.

(However, the fact that VoLTE connections can run directly between clients on different provider networks means that the VoLTE protocol itself has some additional, optional encryption protocols that can operate at higher network layers. These are not relevant to this article, except for the fact that they can mess things up; we'll talk about them briefly later.)

Historically, encryption in GSM has had many weaknesses: bad ciphers, protocols in which only the phone authenticated itself to the tower (meaning that an attacker could impersonate a tower, giving rise to devices like the Stingray), and so on. LTE fixed many of the obvious bugs while retaining much of the same structure.

Let's start with the encryption itself. Assuming that key generation has already taken place (we'll talk about that in a minute), each data packet is encrypted in a stream cipher mode using a cipher called "EEA" (which in practice can be implemented with something like AES). Essentially, the encryption mechanism here is CTR mode, as shown below:

Basic VoLTE packet encryption algorithm (source: ReVoLTE). EEA is a cipher, "COUNT" is a 32-bit counter, "BEARER" is a unique session identifier that separates VoLTE connections and regular Internet traffic. "DIRECTION" indicates in which direction the traffic goes - from the UE to the tower or vice versa.
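
To make the diagram concrete, here is a minimal Python sketch of a counter-style keystream generator that takes the inputs named in the figure. It is not the real EEA algorithm: the standard library has no AES, so HMAC-SHA256 stands in as the block function, and the byte layout of `COUNT`, `BEARER` and `DIRECTION`, as well as the names `keystream` and `encrypt_packet`, are assumptions made purely for illustration.

```python
import hashlib
import hmac
import struct


def keystream(key: bytes, count: int, bearer: int, direction: int, length: int) -> bytes:
    """Toy stand-in for EEA: derive `length` keystream bytes from the inputs."""
    # Assumed layout: 32-bit COUNT, 1-byte BEARER, 1-byte DIRECTION.
    header = struct.pack(">IBB", count, bearer, direction)
    blocks = -(-length // 32)  # ceil(length / 32); SHA-256 gives 32 bytes per block
    out = b"".join(
        # HMAC-SHA256 replaces the AES block cipher a real system would use.
        hmac.new(key, header + struct.pack(">I", i), hashlib.sha256).digest()
        for i in range(blocks)
    )
    return out[:length]


def encrypt_packet(key: bytes, count: int, bearer: int, direction: int, data: bytes) -> bytes:
    """XOR the data with the keystream; decryption is the same operation."""
    ks = keystream(key, count, bearer, direction, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


key = bytes(16)  # hypothetical all-zero key, for demonstration only
packet = b"voice frame 0001"
ct = encrypt_packet(key, count=0, bearer=3, direction=1, data=packet)

# Decrypting with the same inputs restores the packet.
assert encrypt_packet(key, 0, 3, 1, ct) == packet
# Changing only COUNT gives a different keystream.
assert keystream(key, 0, 3, 1, 16) != keystream(key, 1, 3, 1, 16)
# Identical inputs always give the identical keystream.
assert keystream(key, 0, 3, 1, 16) == keystream(key, 0, 3, 1, 16)
```

The last assertion is the important one: the function is deterministic, so the scheme is only safe as long as the combination of key, `COUNT`, `BEARER` and `DIRECTION` never repeats.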

Since the encryption algorithm (EEA) itself can be implemented using a strong cipher like AES, a direct attack on the cipher itself, as happened in the days of GSM, is unlikely. However, it is clear that even with a strong cipher, this encryption scheme is a great way to shoot yourself in the foot.

In particular: the LTE standard uses an (unauthenticated) stream cipher in a mode that becomes extremely vulnerable if the counter, or the other inputs such as "bearer" and "direction", is ever reused. In modern parlance, the term for this is a "nonce reuse attack", but the potential risks here are nothing new. They are well known and ancient, dating back to the days of glam metal and even disco.

Nonce reuse attacks on CTR mode have been around since Poison was famous

To be fair, the LTE standards say, "Please don't reuse these counters." But the LTE standards are about 7000 pages long, and anyway, it's like begging kids not to play with a gun. They inevitably will, and terrible things will happen. In this case, the gun going off is a keystream reuse attack, in which two different confidential messages are XORed with the same keystream bytes. This is known to be extremely damaging to the confidentiality of the messages.

What is ReVoLTE?

The ReVoLTE attack demonstrates that in practice this highly vulnerable encryption construction is misused by real hardware. In particular, the authors analyze real VoLTE calls made with commercial equipment and show that they can exploit something called a "key reinstallation attack". (Much of the credit for finding this problem goes to Raza & Lu, who were the first to point out the potential vulnerability. But the ReVoLTE research turns it into a practical attack.)

Let me briefly show you the essence of the attack, although you should also look at the original paper.

One might assume that once LTE establishes a packet data connection, voice over LTE becomes just a matter of routing voice packets over that connection along with all your other traffic. In other words, VoLTE would be a concept that exists only above layer 2 [of the OSI model; translator's note]. This is not entirely true.

In fact, the LTE link layer introduces the concept of a "bearer". Bearers are separate session identifiers that distinguish different kinds of packet traffic. Regular internet traffic (your Twitter and Snapchat) goes over one bearer. The SIP signaling for VoIP goes over another, and the voice traffic packets are handled on a third. I am not well versed in the mechanics of LTE radio links and network routing, but I believe this is done because LTE networks want QoS (quality of service) mechanisms to work, so that different packet streams are handled with different priority levels: i.e. your second-rate TCP connections to Facebook may get lower priority than your real-time voice calls.

This is generally not a problem, but it has the following consequence. Keys for LTE encryption are derived separately each time a new bearer is set up. In principle, this should happen afresh every time you make a new phone call. That would result in a different encryption key for each call, eliminating the possibility of reusing the same key to encrypt two different sets of voice call packets. Indeed, the LTE standard says something along the lines of "you must use a different key each time you set up a new bearer to handle a new phone call". But that doesn't mean it actually happens.

In fact, in real implementations, two different calls that occur in close temporal proximity will use the same key, even though new bearers (with the same identifier) are set up between them. The only practical change between these calls is that the encryption counter is reset to zero. In the literature, this is sometimes called a key reinstallation attack. One can argue that this is really an implementation error, although in this case the risks seem to stem largely from the standard itself.

In practice, this attack results in keystream reuse: the attacker obtains encrypted packets $C_1 = M_1 \oplus KS$ and $C_2 = M_2 \oplus KS$, which lets him compute $C_1 \oplus C_2 = M_1 \oplus M_2$. Even better, if the attacker knows one of $M_1$ or $M_2$, he can immediately recover the other. This gives him a strong incentive to learn one of the two plaintexts.
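
The algebra can be checked in a few lines of Python. This is only a sketch: the keystream is random bytes standing in for whatever the cipher would produce, the two messages are made-up placeholders of equal length, and `xor` is an illustrative helper.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


m1 = b"secret call audio"   # victim's packet, unknown to the attacker
m2 = b"attacker's audio!"   # packet whose content the attacker knows
ks = os.urandom(len(m1))    # keystream reused for both packets

c1 = xor(m1, ks)
c2 = xor(m2, ks)

# The keystream cancels out: C1 xor C2 equals M1 xor M2.
assert xor(c1, c2) == xor(m1, m2)

# Knowing M2, the attacker recovers M1 without ever learning the key.
recovered = xor(xor(c1, c2), m2)
assert recovered == m1
```

Note that the attacker never touches the key or the cipher; the only requirement is that both ciphertexts were produced with the same keystream bytes.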

This brings us to the complete and most effective attack scenario. Consider an attacker who can intercept radio traffic between a target phone and a cell tower, and who somehow gets "lucky" enough to record two different calls, where the second occurs immediately after the first. Now imagine that he can somehow guess the unencrypted content of one of the calls. With such a happy accident, our attacker can completely decrypt the first call using a simple XOR between the two sets of packets.

Of course, luck has nothing to do with it. Since phones are designed to receive calls, an attacker who can eavesdrop on the first call will be able to initiate a second call just as the first one ends. This second call, if the same encryption key is used again with the counter reset to zero, will allow the unencrypted data to be recovered. Moreover, since our attacker actually controls the data in the second call, he can recover the contents of the first call, thanks to a number of implementation details that play into his hands.

Here is an overview of the attack, taken from the original paper:

Overview of the attack from the ReVoLTE paper. This scheme assumes that two different calls are made using the same key. The attacker controls a passive sniffer (top left) as well as a second phone with which he can make a second call to the victim's phone.

So does the attack really work?

In a sense, this is really the central question of the ReVoLTE paper. All of the above ideas sound great in theory, but they leave a lot of questions open, such as:

  1. Is it possible (for academic researchers) to actually intercept a VoLTE connection?
  2. Do real LTE systems really reinstall keys?
  3. Can you actually initiate a second call quickly and reliably enough for the phone and tower to reuse the key?
  4. Even if the systems do reinstall keys, can you actually learn the unencrypted content of the second call, given that things like codecs and transcoding can completely change the (bitwise) content of that second call, even if you have access to the "bits" coming from your attacking phone?

The ReVoLTE paper answers some of these questions in the affirmative. The authors use a commercial software-defined radio sniffer called Airscope to intercept the downlink side of a VoLTE call. (I suspect that simply getting to grips with the software and a rough understanding of how it works took months off the lives of the poor graduate students, which is typical for this kind of academic research.)

The researchers found that for key reuse to work, the second call has to happen fairly soon after the first call ends, but not too soon: about ten seconds for the operators they experimented with. Fortunately, it doesn't matter whether the user answers the call within this time: the "ringing", i.e. the SIP exchange itself, forces the operator to reuse the same key.

Thus, the hardest problems revolve around question (4): obtaining the plaintext bits of a call initiated by the attacker. A lot can happen to your content as it travels from the attacker's phone to the victim's phone over the cellular network. For example, dirty tricks such as transcoding the encoded audio stream, which leaves the sound the same but completely changes its binary representation. LTE networks also use RTP header compression, which can significantly change a large part of the RTP packet.

Finally, the packets sent by the attacker need to line up roughly with the packets sent during the first phone call. This can be problematic, because silent stretches in a phone call produce shorter messages (so-called comfort noise) that may not overlap well with the original call.
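
A rough way to picture the alignment problem: a packet in the second call reveals only as many keystream bytes as it is long, so a short comfort-noise packet recovers just a prefix of the corresponding packet from the first call. The sketch below assumes that packets pair up one-to-one by index and that each pair shares a keystream starting at the same offset; both are simplifications, and all names and packet contents are invented.

```python
import os


def xor_prefix(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings over their common prefix (zip stops at the shorter)."""
    return bytes(x ^ y for x, y in zip(a, b))


# First call: the victim's packets, unknown to the attacker.
target_call = [b"hello, this is Alice", b"the code is 4711 ok?", b"bye"]
# Second call: attacker-known packets; the middle one is a short comfort-noise frame.
attacker_call = [b"A" * 20, b"\x00" * 6, b"B" * 20]

# One keystream per packet index, reused across both calls (the flaw).
keystreams = [os.urandom(20) for _ in target_call]

recovered_bytes = 0
total_bytes = 0
for target, known, ks in zip(target_call, attacker_call, keystreams):
    c1 = xor_prefix(target, ks)        # sniffed from the first call
    c2 = xor_prefix(known, ks)         # sniffed from the second call
    ks_known = xor_prefix(c2, known)   # keystream bytes the attacker learns
    plain = xor_prefix(c1, ks_known)   # recovered prefix of the target packet
    assert plain == target[: len(plain)]
    recovered_bytes += len(plain)
    total_bytes += len(target)

fraction = recovered_bytes / total_bytes
print(f"recovered {recovered_bytes} of {total_bytes} bytes ({fraction:.0%})")
```

In this toy run the second packet is only partly recovered because the attacker's matching packet was too short; in a real call, codec behaviour and header compression add further complications that the sketch ignores.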

The "real world attack" section is worth reading in detail. It addresses many of the issues above. In particular, the authors found that some codecs are not transcoded, and that approximately 89% of the binary representation of the target call can be recovered. This holds for at least two European operators that were tested.

This is a surprisingly high success rate, and frankly much higher than I expected when I started reading the paper.

So what can we do to fix it?

The immediate answer to this question is extremely simple: since the essence of the vulnerability is a key reuse (reinstallation) attack, just fix this problem. Make sure to get a new key for every phone call, and never let the packet counter reset back to zero with the same key. Problem solved!
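
As a sketch of what "never let the counter reset with the same key" could look like as an invariant in code, the hypothetical `BearerCipherState` class below remembers the next allowed counter for each combination of key, bearer and direction, and refuses to go backwards. It illustrates the rule only; it is not how any phone or base station is actually implemented.

```python
import hashlib


class CounterReuseError(Exception):
    """Raised when a counter value would repeat under the same key."""


class BearerCipherState:
    """Hypothetical guard enforcing strictly increasing counters per key."""

    def __init__(self):
        # Maps (key fingerprint, bearer, direction) to the next allowed counter.
        self._next_count = {}

    def reserve(self, key: bytes, bearer: int, direction: int, count: int) -> None:
        """Approve `count` for use, or raise if it was already consumed."""
        slot = (hashlib.sha256(key).hexdigest(), bearer, direction)
        if self._next_count.get(slot, 0) > count:
            raise CounterReuseError(
                f"counter {count} already used with this key; derive a fresh key"
            )
        self._next_count[slot] = count + 1


state = BearerCipherState()
old_key = b"\x01" * 16
new_key = b"\x02" * 16

# First call: counters 0..2 under the old key are fine.
for count in range(3):
    state.reserve(old_key, bearer=3, direction=1, count=count)

# Second call reusing the old key with the counter reset to zero is rejected.
try:
    state.reserve(old_key, bearer=3, direction=1, count=0)
    rejected = False
except CounterReuseError:
    rejected = True
assert rejected

# A second call with a freshly derived key may start from zero again.
state.reserve(new_key, bearer=3, direction=1, count=0)
```

The guard has to live on both ends of the link and survive across calls, which hints at why a fix that sounds simple still means touching a lot of deployed equipment.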

Or maybe not. This will require the modernization of a large amount of equipment, and, frankly, such a fix in itself is not super reliable. It would be nice if the standards could find a more secure way to implement their encryption modes that isn't catastrophically vulnerable by default to these kinds of key reuse problems.

One possible option is to use encryption modes in which nonce misuse does not lead to catastrophic consequences. This may be too expensive for some of today's hardware, but it's certainly a direction that designers should be thinking about for the future, especially as 5G standards are about to take over the world.
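
To illustrate the idea (and not any scheme actually proposed for LTE), here is a toy misuse-resistant construction in the synthetic-IV style: the IV is computed from the message itself, so encrypting two different messages under the same key and the same nonce still yields different keystreams. HMAC-SHA256 stands in for a block cipher, the function names are made up, and this sketch should not be used to protect real data.

```python
import hashlib
import hmac


def _prf_stream(key: bytes, iv: bytes, length: int) -> bytes:
    """Expand (key, iv) into `length` pseudo-random bytes, HMAC in counter mode."""
    blocks = -(-length // 32)  # ceil(length / 32)
    out = b"".join(
        hmac.new(key, iv + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range(blocks)
    )
    return out[:length]


def siv_encrypt(mac_key: bytes, enc_key: bytes, nonce: bytes, msg: bytes):
    """Return (tag, ciphertext); the tag doubles as the IV for the keystream."""
    # Assumes a fixed-length nonce so that nonce + msg is unambiguous.
    tag = hmac.new(mac_key, nonce + msg, hashlib.sha256).digest()[:16]
    ks = _prf_stream(enc_key, tag, len(msg))
    return tag, bytes(m ^ k for m, k in zip(msg, ks))


def siv_decrypt(mac_key: bytes, enc_key: bytes, nonce: bytes, tag: bytes, ct: bytes) -> bytes:
    """Decrypt and verify; raises ValueError if tag or ciphertext was altered."""
    ks = _prf_stream(enc_key, tag, len(ct))
    msg = bytes(c ^ k for c, k in zip(ct, ks))
    expected = hmac.new(mac_key, nonce + msg, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return msg


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


mac_key, enc_key = b"\x11" * 32, b"\x22" * 32
nonce = b"\x00" * 4             # deliberately reused for both messages
m1 = b"first call packet"
m2 = b"second call pkt!!"

tag1, c1 = siv_encrypt(mac_key, enc_key, nonce, m1)
tag2, c2 = siv_encrypt(mac_key, enc_key, nonce, m2)

# Despite the repeated nonce, the keystreams differ, so the XOR trick fails.
assert xor(c1, c2) != xor(m1, m2)
assert siv_decrypt(mac_key, enc_key, nonce, tag1, c1) == m1
```

The price is a second pass over every message (one to compute the tag, one to encrypt), and a repeated nonce still reveals whether two messages are identical. That is a far milder failure than handing over the XOR of two calls.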

This new study also raises the general question of why the same damn attacks keep popping up in one standard after another, many of which use very similar constructs and protocols. When you're faced with the problem of reinstalling the same key across multiple widely used protocols such as WPA2, don't you think it's time to make your specifications and testing procedures more robust? Stop treating standards implementers as thoughtful partners, heedful of your warnings. Treat them like (unintentional) adversaries who are inevitably going to get things wrong.

Alternatively, we can do what companies like Facebook and Apple are increasingly doing: make voice encryption happen at a higher layer of the OSI network stack, without relying on cellular equipment manufacturers. We could even promote end-to-end encrypted voice calls, as WhatsApp, Signal and FaceTime do, assuming the US government would just stop trying to trip us up. Then (with the exception of some metadata) many of these problems would simply disappear. This solution is especially relevant in a world where even governments are not sure whether they trust their equipment suppliers.

Or we can just do what our kids have already done: stop answering those annoying voice calls.

Source: habr.com
