Understanding the Stellar Consensus Protocol

Understanding the Stellar Consensus Protocol

The Stellar consensus protocol was first described in scientific article David Mazier in 2015. It is a "federal byzantine agreement system" that allows leaderless decentralized computing networks to efficiently reach consensus on a decision. The Stellar Payment Network uses the Stellar Consensus Protocol (SCP) to maintain a consistent transaction history that is visible to all participants.

Consensus protocols are considered difficult to understand. SCP is simpler than most of them, but still shares that reputation, partly because of the misguided idea that the "federated vote" that the first half of the paper is devoted to is SCP. But it's not! This is just an important building block, which in the second half of the article is used to create actual the Stellar consensus protocol.

In this article, we will briefly describe what a "system of agreements" is, what can make it "Byzantine" and why make the Byzantine system "federal". We will then explain the federated voting procedure described in the SCP article, and finally explain the SCP protocol itself.

Agreement systems

The agreement system allows a group of participants to come to a consensus on a topic, such as what to order for lunch.

We at Interstellar have implemented our own lunch agreement system: we order what our operations manager John says. This is a simple and effective system of agreements. We all trust John and believe that every day he will find something interesting and nutritious.

But what if John abuses our trust? He can single-handedly decide that we should all go vegan. In a week or two, we'll probably overthrow him and hand him over to Elizabeth. But suddenly she loves avocados with anchovies and thinks that everyone should become like that. Power corrupts. So it’s better to find some more democratic method: some way to make sure that different preferences are taken into account, while ensuring a timely and unambiguous result, so that no one orders dinner, or five people place different orders, or the discussion drags on until the evening.

It would seem that the solution is simple: to vote! But this is a misleading impression. Who will collect the ballots and report the results? And why should the rest believe what he says? Maybe we can first vote for a leader whom we trust to lead the vote - but who will lead this first voting? What if we can't agree on a leader? Or if we agree, and this leader gets stuck at a meeting or goes on sick leave?

Similar problems are encountered in distributed computer networks. All participants or nodes must agree on some decision, such as whose turn it is to update a shared file or take a task from a processing queue. In a cryptocurrency network, nodes repeatedly have to choose what the full story looks like from several possible versions that sometimes conflict. This network agreement guarantees the recipient that the coin is (a) valid (not counterfeit) and (b) has not yet been spent elsewhere. It also ensures that he can spend the coins in the future, because the new recipient will have the same guarantees for the same reasons.

Any consensus system in a distributed computing network must be fault-tolerant: it must produce consistent results despite errors such as slow links, unresponsive nodes, and bad message order. Byzantine the convention system is additionally resistant to "Byzantine" errors: nodes that give false information, whether due to error or in a deliberate attempt to undermine the system or gain some advantage. "Byzantine" fault tolerance - the ability to trust a group decision even when some members of the group may be lying or otherwise not following the decision rules - is named after parable of the generals of the Byzantine Empirewho tried to coordinate the attack. good description by Anthony Stevens.

Consider the owner of a cryptocoin, Alice, who must choose between buying delicious ice cream from Bob or paying off Carol's debt. Perhaps Alice wants to pay both at once by fraudulently spending the same coin. To do this, she must convince Bob's computer that the coin was never paid to Carol, and convince Carol's computer that the coin was never paid to Bob. The Byzantine convention system makes this virtually impossible by using a form of majority rule called quorum. A node in such a network refuses to jump to a particular version of history until it sees that a sufficient number of peers - a quorum - agree to such a jump. Once this happens, they will form a voting bloc large enough to force the remaining nodes in the network to agree with their decision. Alice can get some nodes to lie on her behalf, but if the network is large enough, then her attempt will be overwhelmed by the votes of honest nodes.

How many nodes are required for quorum? At a minimum, a majority, or rather, a qualified majority to deal with errors and fraud. But to calculate the majority, you need to know the total number of participants. In an Interstellar office or at a district election, these numbers are easy to find. But if your group is a loosely defined network in which nodes can enter and leave at will without coordination with the center, then you need federal a Byzantine convention system capable of determining quorums not from a predetermined list of nodes, but dynamically, from an ever-changing and inevitably incomplete snapshot of nodes at a given point in time.

It may seem impossible to create a quorum from the point of view of a single node in a vast network, but it is possible. Such a quorum can even guarantee the results of decentralized voting. The SCP white paper shows how to do this using a procedure called federal vote.

For the impatient

The rest of the article describes federated voting and the Stellar consensus protocol in more detail. If you're not interested in the details, here's a general overview of the process.

  1. The nodes conduct rounds of federal voting on "nominees". A federal vote round means:
    • The node votes for some statement, for example, "I propose the value of V";
    • The node listens to the voices of the peers until it finds one that can "accept";
    • The node seeks a "quorum" for this assertion. The quorum "confirms" the nominee.
  2. Once a node can confirm one or more nominees, it attempts to "prepare" a "ballot" through several rounds of federated voting.
  3. Once a node is able to verify the ballot is ready, it attempts to commit it with even more rounds of federated voting.
  4. Once a node can commit a bulletin, it can "externalize" the value of that bulletin, using it as a consensus result.

These steps include several rounds of federal voting, which together form one SCP round. Let's take a closer look at what happens at each step.

federated vote

Federated voting is a procedure for determining whether a network can agree on a proposal. In a voting round, each node must choose one of potentially many possible values. It cannot do this until it is sure that other nodes in the network will not choose a different outcome. To be sure of this, the nodes exchange a flurry of messages back and forth so that each confirmed that quorum knots takes same decision. The rest of this section explains the terms in this sentence and how the whole procedure takes place.

Quorums and Quorum Slices

Let's start with defining a quorum. As we discussed above, in a decentralized network with dynamic membership, it is impossible to know in advance the number of nodes and therefore how many the majority needs. Federated voting solves this problem by introducing a new idea cutoff quorum (quorum slice): A small set of peers that a node trusts to relay voting state information to the rest of the network. Each node defines its own slice of the quorum (of which it actually becomes a member).

The formation of a quorum begins with a cutoff of the quorum. For each node, the nodes of its slice are added. Then slice members are added these nodes and so on. As you continue, you come across more and more nodes that you cannot add because they are already included in the slice. When there are no more nodes to add, the process stops: we have formed a quorum by "transitive closure" of the initial node's quorum slice.

Understanding the Stellar Consensus Protocol
To find a quorum from a given node…

Understanding the Stellar Consensus Protocol
… add members to its slice…

Understanding the Stellar Consensus Protocol
… then add the slice members of those nodes.

Understanding the Stellar Consensus Protocol
Continue until there are no nodes left to add.

Understanding the Stellar Consensus Protocol

Understanding the Stellar Consensus Protocol
No nodes left to add. This is a quorum.

In fact, each node can be included in more than one slice. To form a quorum, select only one of the slices and add members; then select any slice for each of the members and add the members it cut and so on. This means that each node is a member of the set of possible quorums.

Understanding the Stellar Consensus Protocol
Select only one slice of the quorum at each step.

Understanding the Stellar Consensus Protocol

Understanding the Stellar Consensus Protocol

Understanding the Stellar Consensus Protocol
One possible quorum. Or alternative...

Understanding the Stellar Consensus Protocol
… select other slices…

Understanding the Stellar Consensus Protocol

Understanding the Stellar Consensus Protocol
…(when it's possible)…

Understanding the Stellar Consensus Protocol
… creates another quorum.

How does a node know which slices other nodes are in? Just like other information about other nodes: from the transmissions that each node broadcasts to the network when its voting state changes. Each translation includes information about the slices of the sending host. The SCP white paper does not specify a communication mechanism. Implementations usually use gossip protocol for guaranteed broadcasting of messages throughout the network.

Recall that in the non-federated Byzantine convention system, a quorum is defined as a majority of all nodes. The Byzantine system of agreements is designed from the point of view of the question: how many dishonest knots can the system endure? In a system of N nodes, designed to survive f failures (spoofs), the node must be able to make progress by getting a response from N − f peers, since f of them may not be functioning. But having received a response from N − f peers, we can assume that all f peers (from which the node did not receive a response) are actually honest. Thus, f out of N−f peers (from which the response is received) are malicious. In order for the nodes to come to the same consensus, most of the other nodes must be honest, that is, we need N−f to be greater than 2f or N > 3f. So typically a system designed to survive f failures will have a total of N=3f+1 nodes and a quorum size of 2f+1. Once a proposal passes the quorum threshold, the rest of the network is convinced that any competing proposals will fail. So the network converges to the result.

But in a federated Byzantine system of agreements, not only can there be no majority (because no one knows the total size of the network), but the concept of the majority is generally useless! If membership in the system is open, then someone can gain a majority by simply performing a so-called Sibyl attack: repeatedly joining the network through several nodes. So why can a transitive slice closure be called quorum, and how is it able to suppress competing proposals?

Technically, no way! Imagine a network of six nodes where two triples are isolated in each other's quorum slices. The first subgroup may make a decision that the second will never hear about, and vice versa. There is no way for this network to reach consensus (except by chance).

Therefore, SCP requires that for federated voting (and for applying the important theorems of the article), the network must have a property called crossing quorums. In a network with this property, any two quorums that can be built always overlap in at least one node. For determining prevailing network sentiment, it's as good as having a majority. Intuitively, this means that if any quorum agrees with statement X, no other quorum can ever agree with anything else, because it will necessarily include some node from the first quorum that has already voted for X.

Understanding the Stellar Consensus Protocol
If there is a quorum intersection in the network...

Understanding the Stellar Consensus Protocol
…then any two quorums you can build…

Understanding the Stellar Consensus Protocol
… will always intersect.

Understanding the Stellar Consensus Protocol

Understanding the Stellar Consensus Protocol

(Of course, overlapping nodes can turn out to be Byzantine-deceitful or bad in other ways. In this case, crossing quorums does not help the network agree at all. For this reason, many of the results in the SCP whitepaper are based on explicit assumptions, such as what is left in the network quorum crossing even after removing bad nodes. For simplicity, we leave these assumptions implicit in the rest of the article).

It may seem unreasonable to expect that reliable quorum intersection is possible in a network of independent nodes. But there are two reasons why this is so.

The first reason is the existence of the Internet itself. The Internet is a perfect example of a network of independent nodes with quorum intersection. Most nodes on the Internet only connect to a few other local nodes, but these small sets overlap enough that every node is reachable from every other node along one route or another.

The second reason is specific to the Stellar payment network (the most common use of SCP). Every asset on the Stellar network has an issuer, and the Stellar guidelines require that each issuer designate one or more nodes on the network to process redemption requests. It is in your interest to directly or indirectly include these nodes in the quorum slices for each asset you are interested in. Then the quorums for all nodes interested in a given asset will overlap in at least those redemption nodes. Nodes interested in multiple assets will include in their quorum slices all the repayment nodes of the respective issuers, and they will seek to pool all the assets together. In addition, any assets that are not linked in this way to others on the network, and should not be related - it is designed so that there is no quorum intersection for this network (for example, banks from the dollar zone sometimes want to trade with banks from the euro zone and banks from the peso zone, so they are in the same network, but none of them cares about a separate network of children selling baseball cards).

Of course, expectation quorum intersection is not a guarantee. Other Byzantine systems of agreement owe much of their complexity to the guarantee of quorums. An important innovation of SCP is that it removes the responsibility for creating quorums from the consensus algorithm itself and brings it to the application layer. Thus, while federated voting is general enough to vote on any issue, in fact its reliability depends critically on the broader meaning of these values. Some hypothetical uses may not be as convenient for building well-connected networks as others.

Voting, acceptance and confirmation

In a federated voting round, the node optionally starts voting for some value of V. This means broadcasting the message to the network: “I am node N, my quorum slices are Q, and I vote for V.” When a node votes this way, it promises that it has never voted against V and never will.

In peer-to-peer broadcasts, each node sees how the others vote. Once a node has collected enough of these messages, it can track quorum slicing and try to find quorums. If it sees a quorum of peers that are also voting for V, then it can proceed to acceptance V and broadcast this new message to the network: "I am node N, my quorum slices are Q, and I accept V." Acceptance provides a stronger guarantee than a simple vote. When a node votes for V, it can never vote for other options. But if a node accepts V, no node in the Network will ever accept another option (Theorem 8 in the SCP white paper proves this).

Of course, there is a high chance that there will not be a quorum of nodes that agree on V right away. Other nodes may vote for other values. But there is another way for a node to move from simple voting to acceptance. N can take a different value of W, even if it did not vote for it, and even if it does not see a quorum for it. To decide to change your vote, it is enough to see blocking set nodes that have accepted W. A blocking set is one node from each of the slices of N quorums. As the name suggests, it is capable of block any other value. If all nodes in such a set accept W, then (by Theorem 8) it will never be possible to form a quorum that takes a different value, and therefore it is also safe for N to accept W.

Understanding the Stellar Consensus Protocol
Node N with three quorum slices.

Understanding the Stellar Consensus Protocol
BDF is a blocking set for N: it includes one node from each of N's slices.

Understanding the Stellar Consensus Protocol
BE is also a blocking set for N because E appears in two slices of N.

But a blocking set is not a quorum. It would be too easy to trick node N into accepting the desired value if it is enough to hack just one node in each of the N slices. Therefore, accepting the value is not the end of the vote. Instead, N must acknowledge the value, that is, see a quorum of nodes accepting it. If it goes that far, then as the SCP whitepaper (in Theorem 11) proves, the rest of the network will also end up confirming the same value, so N will end the federated vote with a certain value as the result.

Understanding the Stellar Consensus Protocol
federal vote.

The process of voting, acceptance and confirmation constitutes one full round of federated voting. The Stellar Consensus Protocol brings together many of these rounds to create a complete consensus system.

Stellar Consensus Protocol

The two most important properties of a consensus system are − safety и survivability. A consensus algorithm is "safe" if it can never give different results to different participants (Bob's copy of the story will never contradict Carol). “Liveness” means that the algorithm will always give a result, that is, it will not get stuck.

Described federal voting procedure safe in the sense that if a node asserts the value of V, no other node will assert another value. But “will not confirm another value” does not mean that it will necessarily confirm something. Members can vote for so many different values ​​that none will reach the acceptance threshold. This means that in federated voting there is no survivability.

The Stellar Consensus Protocol uses federated voting in a way that guarantees both security and survivability. (SCP's safety and survivability guarantees are theoretically limited. The design opts for a very strong safety guarantee, sacrificing a small reduction in survivability, but given enough time, consensus is highly likely to be reached.) In a nutshell, the idea is to have multiple federated votes on multiple values ​​until one of them goes all the way through all the SCP voting phases described below.

The values ​​on which SCP seeks consensus could be transaction history or lunch orders or whatever, but it's important to note that these are not values ​​that are accepted or confirmed. Instead, federal voting takes place according to statements about these values.

The first rounds of federal voting take place on nomination stage (nomination phase), on a set of statements like "I nominate V", possibly for many different values ​​of V. The purpose of the nomination is to find one or more statements that go through acceptance and confirmation.

After finding confirmed candidates, SCP proceeds to the voting stage, where the goal is to find some bulletin (i.e. a container for the proposed value) and a quorum that can declare commit for it (commit). If the quorum commits the ballot, its value is taken as consensus. But before a node can vote on a ballot commit, it must first commit cancellation all ballots with a lower counter value. These steps - canceling the ballots to find one to commit to - involve several rounds of federated voting on several ballot claims.

The following sections describe nomination and voting in more detail.

nomination

At the beginning of the nomination stage, each node can spontaneously choose a value for V and vote for the statement "I nominate V". The goal at this stage is to confirm the nomination of some value through a federated vote.

Perhaps enough nodes vote for sufficiently different statements that no nomination can ever reach the acceptance threshold. Therefore, in addition to broadcasting their own nomination votes, nodes “reflect” the nominations of their peers. Echo means that if a node votes to nominate V but sees a message from a neighbor voting to nominate W, it will now vote to nominate both V and W. (Not all peer votes are echoed during nomination because this can lead to an explosion of different nominees. SCP includes a mechanism to regulate these votes. In short, there is a formula to determine the "priority" of a peer from the point of view of the node, and only the votes of high priority nodes are reflected. The longer the nomination takes, the lower the threshold, so the node expands the set of peers whose votes it will represent.The priority formula includes the slot number as one of the inputs, so a high-priority peer for one slot may be a low-priority peer for another, and vice versa).

Conceptually, running in parallel, both V and W are separate federated votes, each individually capable of achieving acceptance or confirmation. In practice, SCP protocol messages pack these individual voices together.

Although voting for the V nomination is a promise never to vote against the V nomination, it is at the application level - in this case SCP - that defines what "against" means. SCP does not see a statement that conflicts with voting "I am nominating X", i.e. there is no "I am against nominating X" message, so the node can vote to nominate any values. Many of these nominations will go nowhere, but eventually the node will be able to accept or validate one or more values. Once a nominee is confirmed, it becomes candidate.

Understanding the Stellar Consensus Protocol
SCP nomination using a federated vote. There can be many "B" values ​​pushed by peers and "reflected" by the peer.

Candidate nominations may result in multiple confirmed candidates. Therefore, SCP requires that the application layer provide some method to combine candidates into one composite (composite). The join method can be anything. The main thing is that if this method is deterministic, then each node will combine the same candidates. In a dinner voting system, "union" can simply mean rejecting one of the two candidates. (But in a deterministic way: each node must choose the same value to reset. For example, an earlier choice in alphabetical order). In the Stellar payment network, where transaction history is voted on, joining two proposed nominees involves joining the transactions they contain and the last of their two timestamps.

The SCP technical description proves (Theorem 12) that by the end of the promotion phase, the network eventually converges to one composite. But there is a problem: federated voting is an asynchronous protocol (just like SCP). In other words, the nodes do not coordinate in time, but only in the messages they send. From the point of view of the node, it is not clear when ended extension phase. And although all nodes will eventually arrive at the same composite, they may take different routes along the way, creating different composite candidates along the way, and can never tell which one is the final one.

But it normal. Advancing is just preparation. The main thing is to limit the number of candidates to reach a consensus that occurs in the process running (balloting).

balloting

The newsletter is a couple , where counter is an integer that starts at 1 and value is a candidate from the nomination stage. This may be the node's own candidate, or a neighboring node's candidate accepted by that node. Roughly speaking, balloting involves repeated attempts to get the network to reach consensus on some candidate on some ballot by holding potentially many federated votes on ballot claims. Ballot counters keep track of attempts made, and ballots with higher counters take precedence over ballots with lower counters. If the ballot stuck, new voting starts, now on ballot .

It is important to distinguish meaning (for example, what should be the order for lunch: pizza or salads), bulletins (counter-value pair) and statements about the ballots. The SCP round includes several rounds of federal voting, in particular on the following statements:

  • "I'm ready to commit Bulletin B" and
  • "I'm announcing a commit to Bulletin B"

From a given node's point of view, consensus is achieved when it finds a ballot B for which it can confirm (that is, find a quorum that accepts) the statement "I announce a commit of ballot B". From now on, it is safe to act on the value specified in B - for example, place this order for lunch. It is called externalization values. Once the acceptance of the ballot is confirmed, a node can be sure that any other node has externalized the same value or will certainly externalize it in the future.

Although conceptually many federated votes are held over claims for many different ballots, they do not exchange as many messages because each message encapsulates a number of ballots. One message thus advances the state of many federated votes at once, for example: "I am accepting a commit of ballots ranging from before ".

What do the terms "prepared" and "commit" mean?

A node votes to commit a ballot when it is convinced that other nodes will not commit ballots with different values. Convincing this is the purpose of preparing the statement. A vote that says "I'm ready to commit Bulletin B" is a promise to never commit a Bulletin smaller than B, i.e. with a lower count (SCP requires the values ​​in Bulletins to be in a certain order. Thus, bulletin less if N1

Why does "I'm ready to commit bulletin B" mean "I promise never to commit bulletins smaller than B"? Because SCP defines abort as the opposite of commit. Voting to prepare a ballot also means voting to cancel some other ballots, and as we discussed earlier, voting for one thing is a promise never to vote against it.

Before broadcasting a commit, a node must first find a bulletin that it can acknowledge as prepared. In other words, it federates a vote on "I'm ready to commit ballot B" to possibly many different ballots until it finds one that receives a quorum.

Where do ballot papers come from to prepare the vote? The node first broadcasts preparations for voting for <1,C>, where C is the composite candidate produced during the nomination stage. However, even after ballot preparations have begun, a nomination may result in additional candidates appearing as new ballots. Meanwhile, peers can have different candidates and form a blocking set that accepts "I'm ready to commit Bulletin B2", which will convince the node to accept it too. Finally, there is a timeout mechanism that generates new rounds of federated voting on new ballots with higher counts if the current ballots get stuck.

As soon as the node finds a ballot B that it can acknowledge as prepared, it broadcasts a new message "Bulletin B Commit". This vote tells the peers that the node will never give up B. In fact, if B is a ballot , then "Bulletin Commit "means the unconditional consent to vote for the readiness of each ballot from to <∞, s>. This extra value helps other peers catch up with the committed peer if they are still in the earlier stages of the protocol.

At this stage, it is worth emphasizing once again that these are asynchronous protocols. Just because one node sends commit votes doesn't mean its peers do so too. Some of them may still be voting on pre-vote statements, others may have already externalized the meaning. The SCP explains how a node should process each type of peer-to-peer message regardless of its phase.

If the message "I announce a commit » cannot be accepted or acknowledged, i.e. the possibility of accepting or acknowledging the message or — or at any rate, any bulletin with value C and not anything else, since the node has already promised never to cancel . By the time the node broadcasts votes for the commit, it will be C or nothing, depending on how far the consensus goes. However, this is not yet enough for the node to externalize C. Some Byzantine peers (which are less than a quorum based on our security assumptions) may be lying to the node. Accepting and then confirming some ballot (or range of ballots) is what gives the node the confidence to finally externalize C.

Understanding the Stellar Consensus Protocol
Balloting SCP through a federated vote. Not shown: A timer can go off at any time, incrementing the counter on the ballot (and possibly producing a new composite from additional nominated candidates).

And it's all! Once the network has reached a consensus, it is ready to do it again and again. On the Stellar payment network, this happens about once every 5 seconds: a feat that requires both the security and survivability guaranteed by SCP.

SCP can achieve this by relying on multiple rounds of federal voting. Federated voting is made possible by the concept of quorum slices: sets of peers that each node chooses to trust as part of its (subjective) quorum. This configuration means that it is possible to reach consensus even in a network with open membership and Byzantine hoaxes.

Further reading

  • The original SCP white paper can be found here, here draft specifications for its implementation.
  • The original author of the SCP protocol, David Mazier, explains it in a simplistic (but still technical) way here.
  • You may have been surprised not to find the terms "mining" or "proof of work" in this article. SCP does not use these methods, but some other consensus algorithms do. Zane Witherspoon wrote accessible overview of consensus algorithms.
  • Step-by-step description a simple network that reaches consensus in one full round of SCP.
  • For readers interested in SCP implementations: see c++ codeused by the Stellar payment network, or Go code, which I wrote for a better understanding of SCP.

Source: habr.com

Add a comment