Indexed binary tree

Indexed binary tree

I got the following task. It is necessary to implement a data storage container that provides the following functionality:

  • insert new element
  • delete element by number
  • get element by index number
  • data is stored in sorted form

Data is constantly added and removed, the structure must provide a fast speed. At first I tried to implement such a thing using standard containers from std. This path was not successful and the understanding came that you need to implement something yourself. The only thing that came to mind is to use a binary search tree. Since it meets the requirement of fast insertion, removal and storage of data in sorted form. It remains only to figure out how to index all the elements and recalculate the indices when the tree changes.

struct node_s {    
    data_t data;

    uint64_t weight; // вСс ΡƒΠ·Π»Π°

    node_t *left;
    node_t *right;

    node_t *parent;
};

The article will have more pictures and theory than code. The code can be viewed at the link below.

The weight

For this, the tree has undergone a slight modification, additional information has been added about weight node. The knot weight is number of children of this node + 1 (weight of a single element).

Function to get node weight:

uint64_t bntree::get_child_weight(node_t *node) {
    if (node) {
        return node->weight;
    }

    return 0;
}

The sheet, respectively, has a weight equal to 0.

Next, let's move on to a visual representation of an example of such a tree. Black the node key will be shown in color in it (the value will not be shown, because there is no need for this), red - node weight, green - node index.

When the tree is empty, then its weight is 0. Let's add a root element to it:

Indexed binary tree

The weight of the tree becomes 1, the weight of the root element becomes 1. The weight of the root element is the weight of the tree.

Let's add a few more elements:

Indexed binary tree
Indexed binary tree
Indexed binary tree
Indexed binary tree

Every time a new element is added, we go down the nodes and increase the weight counter of each passed node. When a new node is created, it is assigned a weight 1. If a node with such a key already exists, then we will overwrite the value and go back to the root up, canceling the weight changes for all nodes that we passed.
If a node is being removed, then we go down and decrement the weights of the passed nodes.

Indexes

Now let's move on to how to index nodes. Nodes do not explicitly store their index, it is calculated based on the weight of the nodes. If they kept their index, then it would be required He) time to update the indexes of all nodes after each change to the tree.
Let's move on to visual representation. Our tree is empty, let's add the 1st node to it:

Indexed binary tree

The first node has an index 0, and now 2 cases are possible. In the first index of the root element will change, in the second it will not change.

Indexed binary tree

At the root, the left subtree has a weight of 1.

Second case:

Indexed binary tree

The index of the root has not changed because the weight of its left subtree remains 0.

How the index of a node is calculated is the weight of its left subtree + the number passed from the parent. What is this number?, This is the index counter, initially it is equal to 0, because the root has no parent. Then it all depends on where we go down to the left child or the right one. If to the left, then nothing is added to the counter. If to the right then add the index of the current node.

Indexed binary tree

For example, how the index of the element with key 8 (the right child of the root) is calculated. It is "Root index" + "weight of the left subtree of the node with key 8" + "1" == 3 + 2 + 1 == 6
The index of the element with key 6 will be "Root Index" + 1 == 3 + 1 == 4

Accordingly, to get, delete an element by index takes time O (log n), because in order to get the desired element, we must first find it (go down from the root to this element).

Depth

Based on the weight, you can also calculate the depth of the tree. needed for balancing.
To do this, the weight of the current node must be rounded up to the first number to the power of 2 that is greater than or exactly the given weight and take the binary logarithm from it. This way we get the depth of the tree, provided that it is balanced. The tree is balanced after a new element is inserted. I will not give a theory about how to balance trees. The source codes provide a balancing function.

Weight to depth conversion code.

/*
 * Π’ΠΎΠ·Π²Ρ€Π°Ρ‰Π°Π΅Ρ‚ ΠΏΠ΅Ρ€Π²ΠΎΠ΅ число Π² стСпСни 2, ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠ΅ большС ΠΈΠ»ΠΈ Ρ€ΠΎΠ²Π½ΠΎ x
 */
uint64_t bntree::cpl2(uint64_t x) {
    x = x - 1;
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
    x = x | (x >> 32);

    return x + 1;
}

/*
 * Π”Π²ΠΎΠΈΡ‡Π½Ρ‹ΠΉ Π»ΠΎΠ³Π°Ρ€ΠΈΡ„ΠΌ ΠΎΡ‚ числа
 */
long bntree::ilog2(long d) {
    int result;
    std::frexp(d, &result);
    return result - 1;
}

/*
 * ВСс ΠΊ Π³Π»ΡƒΠ±ΠΈΠ½Π΅
 */
uint64_t bntree::weight_to_depth(node_t *p) {
    if (p == NULL) {
        return 0;
    }

    if (p->weight == 1) {
        return 1;
    } else if (p->weight == 2) {
        return 2;
    }

    return this->ilog2(this->cpl2(p->weight));
}

Results

  • the insertion of a new element occurs after O (log n)
  • deleting an element by serial number occurs after O (log n)
  • retrieving an element by ordinal number occurs in O (log n)

speed O (log n) we pay for the fact that all data is stored in a sorted form.

Where such a structure can be useful - I do not know. Just a puzzle to figure out once again how trees work. Thank you for your attention.

references

The project contains test data to check the speed of work. The tree is filling up 1000000 elements. And there is a sequential removal, insertion and receipt of elements 1000000 once. That is 3000000 operations. The result was quite good ~ 8 seconds.

Source: habr.com

Add a comment