Is it possible to generate random numbers if we don't trust each other? Part 1

Hey Habr!

In this article, I will talk about how participants who do not trust each other can generate pseudo-random numbers together. As we will see below, implementing an "almost" good generator is quite easy, but implementing a very good one is difficult.

Why would participants who do not trust each other need to generate random numbers? One area of application is decentralized applications. For example, an application that accepts a bet from a participant and either doubles it with a 49% chance or takes it with a 51% chance can only work if it receives random numbers without bias. If an attacker can influence the result of the random number generator, even slightly increasing his chance of getting paid in the application, he will easily drain it.

When we design a distributed random number generation protocol, we want it to have three properties:

  1. It must be unbiased. In other words, no participant should be able to influence the result of the random number generator in any way.

  2. It must be unpredictable. In other words, no participant should be able to predict what number will be generated (or infer any of its properties) before it is generated.

  3. The protocol must be viable, that is, resilient to a certain percentage of participants going offline or deliberately trying to stall the protocol.

In this article, we will look at two approaches: RANDAO + VDF and the erasure code approach. In the next part, we'll take a closer look at the Threshold Signatures approach.

But first, let's break down a simple and commonly used algorithm that is viable, unpredictable, but biased.

RANDAO

RANDAO is a very simple and therefore quite commonly used approach to obtaining randomness. All network participants first select a pseudo-random number locally, then each participant publishes a hash of the selected number. Next, the participants reveal their chosen numbers in turn, and the XOR of the revealed numbers becomes the output of the protocol.

The step of publishing the hashes before revealing the numbers is necessary so that the attacker cannot choose his number after he has seen the numbers of the other participants. This would allow him to virtually single-handedly determine the output of the random number generator.
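The commit-reveal flow described above can be sketched in a few lines. This is a minimal single-process simulation under the assumption of a simple broadcast channel, not a networked implementation:

```python
import hashlib
import secrets
from functools import reduce

def commit(value: bytes) -> bytes:
    """Commit phase: publish only the hash of the locally chosen number."""
    return hashlib.sha256(value).digest()

# Each of five simulated participants picks a random 32-byte value locally.
values = [secrets.token_bytes(32) for _ in range(5)]

# Everyone publishes commitments first...
commitments = [commit(v) for v in values]

# ...then reveals; each reveal is checked against the earlier commitment.
for v, c in zip(values, commitments):
    assert commit(v) == c, "revealed value does not match commitment"

# The protocol output is the XOR of all revealed values.
result = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), values)
```

Because the commitments are fixed before any value is revealed, a participant cannot pick his number after seeing the others' numbers.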

During the protocol, participants need to come to a common decision (so-called consensus) twice: when to start opening the selected numbers, and therefore stop accepting hashes, and when to stop accepting the selected numbers and calculate the resulting random number. Making such decisions between participants who do not trust each other is not an easy task in itself, and we will return to it in future articles, in this article we will assume that such a consensus algorithm is available to us.

Which of the properties we described above does RANDAO have? It is unpredictable, and it has the same viability as the underlying consensus protocol, but it is biased. In particular, an attacker can watch the network, and after other participants reveal their numbers, he can compute their XOR and decide whether or not to reveal his own number in order to influence the result. While this does not allow the attacker to single-handedly determine the output of the random number generator, it still gives him 1 bit of influence. And if attackers control several participants, then the number of bits they control is equal to the number of participants under their control.
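The one bit of influence is easy to demonstrate in a local simulation. The attacker strategy shown here, committing a value with its low bit set so that revealing it always flips the parity of the XOR, is one illustrative choice:

```python
import secrets
from functools import reduce

def xor_all(xs):
    return reduce(lambda a, b: a ^ b, xs)

TRIALS = 1000
wins = 0
for _ in range(TRIALS):
    # Nine honest participants reveal first; the attacker reveals (or not) last.
    honest = [secrets.randbits(64) for _ in range(9)]
    # Attacker committed a value whose low bit is 1, so revealing it
    # flips the parity of the final XOR.
    attacker = secrets.randbits(64) | 1
    partial = xor_all(honest)  # the attacker sees this before deciding
    # Exactly one of the two possible outcomes is odd; the attacker,
    # who wants an odd output, picks it by revealing or withholding.
    outcome = partial ^ attacker if (partial ^ attacker) & 1 else partial
    wins += outcome & 1

print(wins)  # 1000: the attacker forces an odd output in every trial
```

A fair generator would give the attacker an odd output in only about half the trials; controlling the reveal decision turns one bit of the output into a certainty.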


The influence of attackers can be greatly reduced by requiring participants to reveal their numbers in a fixed order. Then an attacker can influence the outcome only if his number is revealed last. While the impact is much smaller, the algorithm is still biased.

RANDAO+VDF

One way to make RANDAO unbiased is as follows: after all the numbers are revealed and the XOR is computed, the result is fed into a function that takes a very long time to compute, but whose output can be verified very quickly.

(vdf_output, vdf_proof) = VDF_compute(input) // this is very slow
correct = VDF_verify(input, vdf_output, vdf_proof) // this is very fast

Such a function is called a Verifiable Delay Function, or VDF. If the calculation of the final result takes more time than the stage of revealing the numbers, then the attacker will not be able to predict the effect of showing or hiding his number, and therefore he will lose the opportunity to influence the result.
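A toy sketch of the interface follows, using repeated squaring as the delay function (the same primitive real constructions build on). Note that verification here naively recomputes the whole chain, whereas a real VDF (e.g. Pietrzak's or Wesolowski's construction) produces a succinct proof that makes verification far cheaper than computation; the modulus below is an arbitrary stand-in:

```python
# Toy illustration of the VDF interface only: NOT a real VDF.
N = 100003 * 100019   # stand-in modulus; real schemes use an RSA group or class group
T = 10_000            # number of sequential squarings (the "delay" parameter)

def vdf_compute(x: int) -> int:
    """T sequential squarings: step i cannot start before step i-1 finishes."""
    y = x % N
    for _ in range(T):
        y = y * y % N
    return y

def vdf_verify(x: int, y: int) -> bool:
    """Placeholder check by recomputation; a real VDF verifies a short proof
    in time logarithmic (or constant) in T instead."""
    return vdf_compute(x) == y

randao_output = 123456789          # hypothetical XOR from the reveal phase
final = vdf_compute(randao_output)
assert vdf_verify(randao_output, final)
```

The security argument rests entirely on T being large enough that no one can finish the computation before the reveal phase ends.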

Designing good VDFs is extremely difficult. Recently there have been several breakthroughs, for example this and this, which have made VDFs more practical, and Ethereum 2.0 plans to use RANDAO with VDF as its source of random numbers in the long term. Besides being unpredictable and unbiased, this approach has the added benefit of remaining viable if at least two participants are available on the network (assuming the consensus protocol used is viable with such a small number of participants).

The biggest difficulty with this approach lies in tuning the VDF so that even a participant with very expensive specialized hardware cannot finish computing the VDF before the end of the reveal phase. Ideally, the algorithm should have a significant safety margin, say 10x. The figure below shows an attack by a participant who has a specialized ASIC that allows him to run the VDF faster than the time allotted for the RANDAO reveal phase. Such a participant can compute the final result both with and without his number, and then, based on those computations, choose whether to reveal it or not.


For the VDF family mentioned above, the performance of a dedicated ASIC can be 100+ times higher than that of conventional hardware. Thus, if the reveal phase lasts 10 seconds, then the VDF computed on such an ASIC must take more than 100 seconds to have a 10x safety margin, and the same VDF computed on conventional hardware must therefore take 100 × 100 seconds = 10,000 seconds, or roughly 3 hours.
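The arithmetic above, restated (the 10-second reveal phase, 100x speedup, and 10x margin are the figures from the text):

```python
reveal_phase = 10      # seconds allotted to the RANDAO reveal phase
asic_speedup = 100     # dedicated ASIC vs. conventional hardware
safety_margin = 10     # desired headroom over the reveal phase

asic_time = reveal_phase * safety_margin        # the ASIC must need > 100 s
commodity_time = asic_time * asic_speedup       # 10,000 s on ordinary hardware
print(commodity_time / 3600)                    # ~2.8 hours
```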

The Ethereum Foundation plans to solve this problem by creating their own publicly available free ASICs. Once this happens, all other protocols can also take advantage of this technology, but until then, the RANDAO + VDF approach will not be as viable for protocols that cannot invest in developing their own ASICs.

Many articles, videos and other information about VDF are collected on this site.

Using erasure codes

In this section, we'll look at a random number generation protocol that uses erasure codes. It can tolerate up to ⅓ of participants being attackers while remaining viable, and allows up to ⅔ of participants to be attackers before they can predict or influence the outcome.

The main idea of the protocol is as follows. For simplicity, let's assume there are exactly 100 participants. Let's also assume that every participant locally holds a private key, and the public keys of all participants are known to everyone:

  1. Each participant locally comes up with a long string, splits it into 67 parts, and generates erasure codes to obtain 100 shares, such that any 67 of them are sufficient to restore the string. He then assigns each of the 100 shares to one of the participants and encrypts it with that participant's public key. All encoded shares are then published.

  2. The participants use some consensus algorithm to agree on the encoded share sets of a specific 67 participants.

  3. Once consensus is reached, each participant takes, from each of the 67 agreed sets, the encoded share encrypted with his public key, decrypts all such shares, and publishes them.

  4. Once 67 participants have completed step (3), all the agreed sets can be fully decoded and reconstructed thanks to the properties of erasure codes, and the final number can be obtained as the XOR of the initial strings that the participants started from in step (1).
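The erasure-coding step can be sketched with a toy Reed-Solomon-style code over the prime field GF(257), shrunk from (n = 100, k = 67) down to (n = 5, k = 3) and omitting the per-participant encryption:

```python
# Toy Reed-Solomon-style erasure code over GF(257): data symbols are the
# interpolating polynomial's values at x = 0..k-1, shares its values at
# x = 0..n-1, so any k shares determine the polynomial and hence the data.
P = 257

def interp_eval(points, x):
    """Evaluate, at x, the polynomial interpolating the given (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Field division via Fermat's little theorem (P is prime).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Turn k data symbols into n shares, any k of which suffice."""
    points = list(enumerate(data))
    return [(x, interp_eval(points, x)) for x in range(n)]

def reconstruct(shares, k):
    """Recover the original k data symbols from any k (index, value) shares."""
    points = shares[:k]
    return [interp_eval(points, x) for x in range(k)]

secret = [42, 7, 200]                          # one participant's string, k = 3 symbols
shares = encode(secret, n=5)
surviving = [shares[1], shares[3], shares[4]]  # any 3 of the 5 shares
assert reconstruct(surviving, k=3) == secret
```

In the real protocol each of the 100 shares would additionally be encrypted with the receiving participant's public key before being published.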


This protocol can be shown to be unbiased and unpredictable. The resulting random number is determined as soon as consensus is reached, but it remains unknown to everyone until ⅔ of the participants decrypt the shares encrypted with their public keys. Thus, the random number is determined before the information sufficient to recover it is published.

What happens if, in step (1), one of the participants sends the other participants encoded shares that are not a valid erasure code of any string? Without additional changes, different participants would either fail to restore the string at all, or restore different strings, which would lead to different participants obtaining different random numbers. To prevent this, each participant, in addition to the encoded shares, also computes a Merkle tree over all such shares, and sends each participant both the encoded share itself and the root of the Merkle tree, together with a proof of the share's inclusion in the tree. In the consensus in step (2), the participants then agree not just on a set of share sets, but on a set of specific roots of such trees (if some participant deviated from the protocol and sent different Merkle roots to different participants, and two such roots are exhibited during consensus, his string is excluded from the result set). As a result of the consensus, we have 67 encoded strings and their corresponding Merkle roots, such that there are at least 67 participants (not necessarily the same ones who proposed the corresponding strings) who, for each of the 67 strings, hold a message with a share of the erasure code and a proof of that share's inclusion in the corresponding Merkle tree.
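The Merkle commitment over the shares might look like this minimal sketch, assuming SHA-256 as the hash and a power-of-two number of leaves for simplicity:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_tree(leaves):
    """Build all levels bottom-up; len(leaves) assumed to be a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def inclusion_proof(levels, index):
    """Sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the sibling at this level
        index //= 2
    return proof

def verify(leaf, index, proof, root):
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

shares = [b"share-0", b"share-1", b"share-2", b"share-3"]  # illustrative shares
levels = merkle_tree(shares)
root = levels[-1][0]           # this root is what consensus agrees on
proof = inclusion_proof(levels, 2)
assert verify(b"share-2", 2, proof, root)
assert not verify(b"tampered", 2, proof, root)
```

A participant who sends inconsistent shares cannot produce valid inclusion proofs against a single agreed root, which is exactly what the consensus in step (2) pins down.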

When, in step (4), a participant decrypts 67 shares of some string and tries to restore the original string from them, one of three outcomes is possible:

  1. The string is restored, and if it is re-encoded with the erasure code and a Merkle tree is computed over the locally computed shares, the root matches the one on which consensus was reached.

  2. The string is restored, but the locally computed root does not match the consensus root.

  3. The string cannot be restored.

It is easy to show that if outcome (1) occurs for at least one participant, it occurs for all participants, and vice versa: if outcome (2) or (3) occurs for at least one participant, it occurs for all of them. Thus, for each string in the set, either every participant successfully restores it, or every participant fails to. The resulting random number is then the XOR of only those strings that the participants were able to recover.

Threshold signatures

Another approach to randomness is to use so-called threshold BLS signatures. A random number generator based on threshold signatures has exactly the same guarantees as the erasure-code-based algorithm described above, but with much better asymptotic behavior in the number of messages sent over the network per generated number.

BLS signatures are a construction that allows multiple participants to create a single common signature for a message. Such signatures are often used to save space and bandwidth, since distributing many individual signatures is not required.

A common use for BLS signatures in blockchain protocols, besides random number generation, is block signing in BFT protocols. Say 100 participants create blocks, and a block is considered final if 67 of them sign it. They can all submit their parts of the BLS signature, use some consensus algorithm to agree on 67 of them, and then merge those parts into a single BLS signature. Any 67 (or more) parts can be used to create the final signature; it will depend on which 67 were combined and can therefore differ, but whichever 67 participants are chosen, the result is a valid signature for the block. The remaining participants then need to receive and verify only one signature per block instead of 67, which significantly reduces network load.

It turns out that if the private keys that participants use are generated in a certain way, then no matter which 67 signatures (or more, but no less) are aggregated, the resulting signature will be the same. This can be used as a source of randomness: the participants first agree on some message that they will sign (it could be a RANDAO output or just a hash of the last block, it doesn't really matter, as long as it changes each time and is consistent), and create a BLS signature for it. The result of the generation will be unpredictable until 67 participants provide their parts, after which the output is already predetermined and cannot depend on the actions of any participant.
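This is not BLS itself, but the key property the text relies on, that any k-of-n subset of shares recombines to the same value, can be illustrated with plain Shamir secret sharing, scaled down from 67-of-100 to 3-of-5:

```python
# Shamir-secret-sharing sketch of the any-k-subset-same-output property.
# Real threshold BLS works over pairing-friendly curves; the structure of
# "Lagrange-combine any k shares, always get the same result" is the same.
import secrets
from itertools import combinations

P = 2**127 - 1  # a Mersenne prime, used as the field modulus

def make_shares(secret, k, n):
    """Shares are points on a random degree-(k-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0: recovers f(0) from any k shares."""
    result = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        result = (result + yi * num * pow(den, P - 2, P)) % P
    return result

secret = secrets.randbelow(P)
shares = make_shares(secret, k=3, n=5)
# Every 3-of-5 subset of shares yields the same value.
outputs = {combine(list(subset)) for subset in combinations(shares, 3)}
assert outputs == {secret}
```

In threshold BLS the "shares" are partial signatures rather than raw field elements, so no single party ever learns the underlying secret key, but the combining step has exactly this shape.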

This approach to randomness is viable if at least ⅔ of the participants are online and following the protocol, and unbiased and unpredictable as long as at least ⅓ of the participants are following the protocol. It is important to note that an attacker who controls more than ⅓ but less than ⅔ of the participants can stop the protocol, but cannot predict or influence its output.

Threshold signatures themselves are a very interesting topic. In the second part of the article, we will analyze in detail how they work, and how exactly it is necessary to generate participant keys so that threshold signatures can be used as a random number generator.

In conclusion

This article is the first in a series of technical articles from the NEAR blog. NEAR is a blockchain protocol and a platform for developing decentralized applications, with a focus on ease of development and ease of use for end users.

The protocol code is open source; our implementation is written in Rust and can be found here.

You can see what development under NEAR looks like, and you can experiment in the online IDE here.

You can follow all the news in Russian in the telegram group and the VKontakte group, and in English on the official twitter.

See you soon!

Source: habr.com
