Utreexo: compressing lots of Bitcoin UTXOs

Utreexo: compressing lots of Bitcoin UTXOs

Hey Habr!

In the Bitcoin network, all nodes in the course of consensus agree on a set of UTXOs: how many coins are available for spending, to whom, and under what conditions. The UTXO set is the minimum data set required for the validator node, without which the node will not be able to verify the validity of incoming transactions and blocks containing them.

In this regard, attempts are being made in every possible way to reduce the stored representation of this set, to compress it without losing security guarantees. The smaller the amount of stored data, the lower the disk space requirements of the validator node, which makes running the validator node cheap, allows you to grow the network and thereby increase the stability of the network.

In this note, we'll file a Rust prototype of a recent proposal from a co-author Lightning Network Paper, Thaddeus Dryja β€” Utreexo: a dynamic hash-based accumulator optimized for the Bitcoin UTXO set, which reduces the disk space requirements for validator nodes.

What's the problem?

One of the eternal problems of Bitcoin was its scalability. The idea of ​​being a bank of your own requires network participants to keep a record of all funds available for use. In Bitcoin, available funds are expressed as a set of unspent outputs - UTXO-set. Although not a very intuitive representation, it is advantageous from an implementation performance point of view, compared to the representation in which each "wallet" has a "balance" in the form of a separate entry, and also adds privacy (for example, provides work the coinjo).

It is important to distinguish between the history of transactions (what is called a blockchain) and the current state of the system. The Bitcoin transaction history currently occupies about 200 GB of disk space, and continues to grow. However, the state of the system is much smaller, on the order of 4 GB, and only takes into account the fact that someone currently owns the coins. The volume of these data also increases over time, but at a much slower rate and sometimes even tends to decrease (see KDPV).

Light clients (SPV) trade security guarantees for the ability to store no minimum state (UTXO-set) other than private keys.

UTXO and UTXO-set

UTXO (Unspent Transaction Output) - unspent transaction output, the end point of the journey of each Satoshi transferred in transactions. Unspent outputs become inputs for new transactions and are spent (spend) and removed from the UTXO-set.

New UTXOs are always created by transactions:

  • coinbase transactions with no inputs: create new UTXOs as miners issue coins
  • normal transactions: create new UTXOs while spending some set of existing UTXOs

The process of working with UTXO:
Utreexo: compressing lots of Bitcoin UTXOs

Wallets count the number of coins available for spending (balance) based on the amount of UTXOs available for that wallet to spend.

Each validator node, in order to prevent double spend attempts, must keep track of the set all UTXO when checking each transactions each block.

The node must have logic:

  • Additions to UTXO-set
  • Removal from UTXO-set
  • Checks for the presence of a single UTXO in the set

There are ways to reduce the requirements for stored information about the set, while retaining the ability to add and remove elements, check and prove the existence of an element in the set using cryptographic accumulators.

Batteries for UTXO

The idea of ​​using batteries to store many UTXOs was discussed earlier.

UTXO-set is built on the fly, during the initial block download (IBD, initial block download), is stored in full and permanently, while its content changes after processing transactions from each new and correct network block. This process requires downloading approximately 200 GB of block data and verifying hundreds of millions of digital signatures. After the IBD process is completed, the bottom line is that the UTXO-set will take up about 4 GB.

However, with accumulators, the funds consensus rules are reduced to verification and generation of cryptographic proofs, and the burden of tracking available funds is shifted to the owner of these funds, who provides proof of their existence and ownership.

An accumulator can be called a compact representation of a set. The size of the stored representation, in this case, must be either constant Utreexo: compressing lots of Bitcoin UTXOs, or increase sublinearly with respect to the cardinality of the set and the size of the element itself, for example Utreexo: compressing lots of Bitcoin UTXOs, where n is the cardinality of the stored set.

At the same time, the accumulator should allow generating a proof of the inclusion of an element in the set (inclusion proof) and make it possible to effectively check this proof.

The battery is called dynamic if allows you to add elements and remove elements from the set.

An example of such a battery is RSA battery proposed by Boneh, Bunz, Fisch in December 2018. Such an accumulator has a constant size of the stored representation, but requires shared secret (trusted setup). This requirement negates the applicability of such an accumulator for trustless networks like Bitcoin, since data leakage during secret generation can allow attackers to create fake evidence of the existence of UTXO by deceiving nodes with a UTXO-set based on such an accumulator.

Utreexo

The Utreexo design proposed by Thaddeus Dryja makes it possible to create dynamic battery without trusted setup.

Utreexo is a forest of perfect binaries Merkle trees and is a development of the ideas presented in Efficient asynchronous accumulators for distributed pki, adding the ability to remove elements from a set.

The logical structure of the accumulator

The elements of the accumulator are arranged in the form of a forest of ideal binary trees. Trees are ordered by height. This representation is chosen as the most visual and allows you to visualize the merging of trees in the course of operations on the accumulator.

The author notes that since all trees in the forest are ideal, their height can be expressed as a power of two, just as any natural number can be represented as a sum of powers of two. Accordingly, any set of sheets can be grouped as binary trees, and in all cases, adding a new element requires knowledge only about root nodes of stored trees.

Thus, the stored representation of the Utreexo accumulator is a list of root nodes (Merkle root), not the entire forest of trees.

Let's represent the list of root elements as Vec<Option<Hash>>. Optional type Option<Hash> says that the root element may be absent, which means that there is no tree with the corresponding height in the accumulator.

/// SHA-256 Ρ…Π΅Ρˆ
#[derive(Copy, Clone, Hash, Eq, PartialEq)]
pub struct Hash(pub [u8; 32]);

#[derive(Debug, Clone)]
pub struct Utreexo {
    pub roots: Vec<Option<Hash>>,
}

impl Utreexo {
    pub fn new(capacity: usize) -> Self {
        Utreexo {
            roots: vec![None; capacity],
        }
    }
}

Adding items

First, let's describe the function parent(), which recognizes the parent node for the two given elements.

parent() function

Since we are using Merkle trees, the parent of each of the two nodes is one node that stores the hash of the concatenation of the hashes of the child nodes:

fn hash(bytes: &[u8]) -> Hash {
    let mut sha = Sha256::new();
    sha.input(bytes);
    let res = sha.result();
    let mut res_bytes = [0u8; 32];
    res_bytes.copy_from_slice(res.as_slice());

    Hash(res_bytes)
}

fn parent(left: &Hash, right: &Hash) -> Hash {
    let concat = left
        .0
        .into_iter()
        .chain(right.0.into_iter())
        .map(|b| *b)
        .collect::<Vec<_>>();

    hash(&concat[..])
}

The author notes that in order to prevent the attacks described by Charles Bouillaguet, Pierre-Alain Fouque, Adi Shamir, and Sebastien Zimmer in
Second preimage attacks on dithered hash functions, in addition to two hashes, the height inside the tree should also be added to the concatenation.

As you add items to the accumulator, you need to keep track of which root items get changed. By following the path of changing the root elements for each element that is added, a proof of the presence of these elements can be constructed later.

Track changes as you add

To track the changes made, we will declare the structure Update, which will store data about node changes.

#[derive(Debug)]
pub struct Update<'a> {
    pub utreexo: &'a mut Utreexo,
    // ProofStep Ρ…Ρ€Π°Π½ΠΈΡ‚ "сосСда" элСмСнта ΠΈ Π΅Π³ΠΎ ΠΏΠΎΠ»ΠΎΠΆΠ΅Π½ΠΈΠ΅
    pub updated: HashMap<Hash, ProofStep>,
}

To add an element to the accumulator, you need:

  • Create an array of buckets of root elements new_roots and put the existing root elements there, one for each bucket:

Code

let mut new_roots = Vec::new();

for root in self.roots.iter() {
    let mut vec = Vec::<Hash>::new();
    if let Some(hash) = root {
        vec.push(*hash);
    }

    new_roots.push(vec);
}

  • Append the added elements (array insertions) to the first basket new_roots[0]:

Utreexo: compressing lots of Bitcoin UTXOs

Code

new_roots[0].extend_from_slice(insertions);

  • Coalescing the elements added to the first basket with the rest:
    • For all carts with more than one item:
      1. We take two elements from the end of the basket, calculate their parent, remove both elements
      2. Add computed parent to next bucket

Utreexo: compressing lots of Bitcoin UTXOs

Code

for i in 0..new_roots.len() {
    while new_roots[i].len() > 1 {
        // ОбъСдиняСм Π΄Π²Π° элСмСнта Π² ΠΎΠ΄ΠΈΠ½ ΠΈ удаляСм ΠΈΡ…
        let a = new_roots[i][new_roots[i].len() - 2];
        let b = new_roots[i][new_roots[i].len() - 1];
        new_roots[i].pop();
        new_roots[i].pop();
        let hash = self.parent(&a, &b);

        // НаращиваСм количСство ΠΊΠΎΡ€Π·ΠΈΠ½ Ссли трСбуСтся
        if new_roots.len() <= i + 1 {
            new_roots.push(vec![]);
        }

        // ΠŸΠΎΠΌΠ΅Ρ‰Π°Π΅ΠΌ элСмСнт Π² ΡΠ»Π΅Π΄ΡƒΡŽΡ‰ΡƒΡŽ ΠΊΠΎΡ€Π·ΠΈΠ½Ρƒ
        new_roots[i + 1].push(hash);

        // НС Π·Π°Π±Ρ‹Π²Π°Π΅ΠΌ ΠΎΡ‚ΡΠ»Π΅ΠΆΠΈΠ²Π°Ρ‚ΡŒ измСнСния;
        // это пригодится для Π³Π΅Π½Π΅Ρ€Π°Ρ†ΠΈΠΈ Π΄ΠΎΠΊΠ°Π·Π°Ρ‚Π΅Π»ΡŒΡΡ‚Π²Π° добавлСния элСмСнтов
        updated.insert(a, ProofStep { hash: b, is_left: false });
        updated.insert(b, ProofStep {hash: a, is_left: true });
    }
}

  • Move root elements from bins to resulting accumulator array

Code

for (i, bucket) in new_roots.into_iter().enumerate() {
    // НаращиваСм аккумулятор Ссли трСбуСтся
    if self.roots.len() <= i {
        self.roots.push(None);
    }

    if bucket.is_empty() {
        self.roots[i] = None;
    } else {
        self.roots[i] = Some(bucket[0]);
    }
}

Creating evidence for added elements

Evidence of the inclusion of the element in the accumulator (Proof) will serve as the Merkle Path, consisting of the chain ProofStep. If the path leads nowhere, then the proof is wrong.

/// Π•Π΄ΠΈΠ½ΠΈΡ‡Π½Ρ‹ΠΉ шаг Π½Π° ΠΏΡƒΡ‚ΠΈ ΠΊ элСмСнту Π² Π΄Π΅Ρ€Π΅Π²Π΅ ΠœΠ΅Ρ€ΠΊΠ»Π°.
#[derive(Debug, Copy, Clone)]
pub struct ProofStep {
    pub hash: Hash,
    pub is_left: bool,
}

/// Π”ΠΎΠΊΠ°Π·Π°Ρ‚Π΅Π»ΡŒΡΡ‚Π²ΠΎ Π²ΠΊΠ»ΡŽΡ‡Π΅Π½ΠΈΡ элСмСнта. Π‘ΠΎΠ΄Π΅Ρ€ΠΆΠΈΡ‚ сам элСмСнт ΠΈ ΠΏΡƒΡ‚ΡŒ ΠΊ Π½Π΅ΠΌΡƒ.
#[derive(Debug, Clone)]
pub struct Proof {
    pub steps: Vec<ProofStep>,
    pub leaf: Hash,
}

Using the information obtained earlier during the addition of the element (structure Update), you can create proof that the element was added to the accumulator. To do this, we bypass the table of changes made and add each step to the Merkle path, which will later serve as proof:

Code

impl<'a> Update<'a> {
    pub fn prove(&self, leaf: &Hash) -> Proof {
        let mut proof = Proof {
            steps: vec![],
            leaf: *leaf,
        };

        let mut item = *leaf;
        while let Some(s) = self.updated.get(&item) {
            proof.steps.push(*s);
            item = parent(&item, &s);
        }

        proof
    }
}

Proof creation process

Utreexo: compressing lots of Bitcoin UTXOs

Checking the proof for an element

Checking the proof of element inclusion (inclusion proof) is reduced to following the Merkle path until it leads to an existing root element:

pub fn verify(&self, proof: &Proof) -> bool {
    let n = proof.steps.len();
    if n >= self.roots.len() {
        return false;
    }

    let expected = self.roots[n];
    if let Some(expected) = expected {
        let mut current_parent = proof.leaf;
        for s in proof.steps.iter() {
            current_parent = if s.is_left {
                parent(&s.hash, &current_parent)
            } else {
                parent(&current_parent, &s.hash)
            };
        }

        current_parent == expected
    } else {
        false
    }
}

Clearly:

Proof verification process for A

Utreexo: compressing lots of Bitcoin UTXOs

Removing elements

Removing an element from an accumulator requires providing valid evidence that the element is there. Using the data from the proof, it is possible to compute new root elements of the accumulator for which the given proof is no longer true.

The algorithm is as follows:

  1. As with adding, we organize a set of empty baskets corresponding to Merkle trees with a height equal to the power of two from the index of the basket
  2. We insert elements from the steps of the Merkle path into the baskets; the index of the basket is equal to the number of the current step
  3. Remove the root element that the path from the proof leads to
  4. As with adding, we calculate new root elements by combining elements from buckets in pairs and moving the result of the union to the next bucket

Code

fn delete(&self, proof: &Proof, new_roots: &mut Vec<Vec<Hash>>) -> Result<(), ()> {
    if self.roots.len() < proof.steps.len() || self.roots.get(proof.steps.len()).is_none() {
        return Err(());
    }

    let mut height = 0;
    let mut hash = proof.leaf;
    let mut s;

    loop {
        if height < new_roots.len() {
            let (index, ok) = self.find_root(&hash, &new_roots[height]);
            if ok {
                // Remove hash from new_roots
                new_roots[height].remove(index);

                loop {
                    if height >= proof.steps.len() {
                        if !self.roots[height]
                            .and_then(|h| Some(h == hash))
                            .unwrap_or(false)
                        {
                            return Err(());
                        }

                        return Ok(());
                    }

                    s = proof.steps[height];
                    hash = self.parent(&hash, &s);
                    height += 1;
                }
            }
        }

        if height >= proof.steps.len() {
            return Err(());
        }

        while height > new_roots.len() {
            new_roots.push(vec![]);
        }

        s = proof.steps[height];
        new_roots[height].push(s.hash);
        hash = self.parent(&hash, &s);
        height += 1;
    }
}

The process of removing element "A":
Utreexo: compressing lots of Bitcoin UTXOs

Integration into an existing network

By using the proposed accumulator, nodes can choose not to use a database to store all UTXOs while still being able to change the UTXO-set. However, there is a problem with the evidence.

Let's call the validator node that uses the accumulator UTXO compact (compact-state node) and the validator without accumulator is complete (full node). The existence of two classes of nodes creates the problem of integrating them into a single network, since compact nodes require proof of the existence of UTXOs that are spent in transactions, while full nodes do not. If all nodes in the network do not switch to Utreexo at the same time and in a coordinated manner, then compact nodes will be left behind and will not be able to operate on the Bitcoin network.

To solve the problem of integrating compact nodes into the network, it is proposed to introduce an additional class of nodes βˆ’ bridges. A bridge node is a complete node that also stores the Utreexo battery and power-on proofs for all UTXO from UTXO-set. Bridges compute new hashes and update the accumulator and proofs as new transaction blocks arrive. Maintaining and updating the accumulator and evidence does not impose additional computational burden on such nodes. Bridges sacrifice disk space: need to keep order Utreexo: compressing lots of Bitcoin UTXOs hashes compared to Utreexo: compressing lots of Bitcoin UTXOs hashes for compact nodes, where n is the cardinality of the UTXO set.

Network architecture

Utreexo: compressing lots of Bitcoin UTXOs

Bridges make it possible to gradually add compact nodes to the network without changing the software of existing nodes. Full nodes work as before, propagating transactions and blocks among themselves. Bridge nodes are full nodes that additionally store Utreexo accumulator data and a set of inclusion proofs for all UTXO at the moment. The bridge node does not advertise itself as such, pretending to be a full node for all full nodes and a compact node for all compact ones. Although bridges connect both networks together, they really only need to be connected in one direction: from existing full nodes to compact nodes. This is possible because the transaction format does not need to be changed, and the evidence for UTXO for compact nodes can be dropped, so any compact node can just as well broadcast transactions to all network members without the participation of bridge nodes.

Conclusion

We took a look at the Utreexo accumulator and implemented its prototype in Rust. We considered the network architecture, which will allow the integration of battery-based nodes. The advantage of compact catches is the size of the stored data, which depends logarithmically on the cardinality of the UTXO set, which greatly reduces the disk space and storage performance requirements for such nodes. The downside is the additional node traffic to pass evidence, but evidence aggregation (when one proof proves the existence of multiple elements) and caching techniques can help keep the traffic within acceptable limits.

references:

Source: habr.com

Add a comment