Globals are treasure swords for storing data. Trees. Part 1

Globals are treasure swords for storing data. Trees. Part 1 The real swords-treasurers of databases - globals - have long been known, but so far few people know how to use them effectively or do not own this superweapon at all.

If you use globals in solving those problems in which they are really good, then you can achieve outstanding results. Either in performance or in simplifying the solution of the problem (1, 2).

Globals are a special way of storing and manipulating data, completely different from tables in SQL. They appeared in 1966 in the language M(UMPS) (evolutionary development - Cache ObjectScript, hereinafter COS) in medical databases and still there actively used, and also penetrated into some other areas where reliability and high performance are required: finance, trading, etc.

Globals in modern DBMS support transactions, logging, replication, partitioning. Those. they can be used to build modern, reliable, distributed and fast systems.

Globals don't restrict you to the relational model. They give you the freedom to develop data structures that are optimized for specific tasks. For many applications, the judicious use of globals can be truly a secret weapon, delivering performance that relational application developers can only dream of.

Globals as a way to store data can be used in many modern programming languages, both high-level and low-level. Therefore, in this article, I will focus on globals, and not on the language from which they once came out.

2. How globals work

Let's first understand how globals work and what are their strengths. Globals can be viewed from different points of view. In this part of the article, we will look at them as trees. Or like hierarchical data stores.

Simply put, a global is a persistent array. An array that is automatically saved to disk.
It's hard to imagine something simpler for data storage. In code (in COS/M languages) it differs from a regular associative array only in the symbol ^ in front of the name.

To save data in the global, you do not need to learn the SQL query language, the commands for working with them are very simple. They can be learned in an hour.

Let's start with the simplest example. Single level tree with 2 branches. The examples are written in COS.

Globals are treasure swords for storing data. Trees. Part 1

Set ^a("+7926X") = "John Sidorov"
Set ^a("+7916Y") = "Sergey Smith"



When you insert information into a global (Set command), 3 things happen automatically:

  1. Saving data to disk.
  2. Indexing. What is in brackets is the key (in English literature - “subscript”), and to the right of equals is the value (“node value”).
  3. Sort. The data is sorted by key. In the future, when traversing the array, the first element will be “Sergey Smith”, and the second “John Sidorov”. When getting a list of users from the global, the database does not waste time sorting. Moreover, you can request the output of a sorted list, starting from any key, even non-existent (the output will start from the first real key that follows the non-existent one).

All of these operations are incredibly fast. On my home computer, I got values ​​up to 750 inserts/sec in a single process. On multi-core processors, values ​​can reach tens of millions inserts/sec.

Of course, the insertion speed itself says little. You can, for example, write information very quickly to text files - so rumored Visa processing works. But in the case of globals, we get a structured indexed storage at the output, with which we can easily and quickly work with in the future.

Globals are treasure swords for storing data. Trees. Part 1

  • The biggest strength of globals is the speed at which new nodes are inserted.
  • The data in the global is always indexed. Bypassing them both at the same level and deep into the tree is always fast.

Let's add a few more branches of the second and third levels to the global.

Set ^a("+7926X", "city") = "Moscow"
Set ^a("+7926X", "city", "street") = "Req Square"
Set ^a("+7926X", "age") = 25
Set ^a("+7916Y", "city") = "London"
Set ^a("+7916Y", "city", "street") = "Baker Street"
Set ^a("+7916Y", "age") = 36

Globals are treasure swords for storing data. Trees. Part 1

Obviously, multilevel trees can be built on the basis of globals. Moreover, access to any node is almost instantaneous due to auto-indexing upon insertion. And at any level of the tree, all branches are sorted by key.

As you can see, information can be stored both in the key and in the value. The total key length (the sum of the lengths of all indexes) can reach 511 bytes, and the values 3.6 MB for Cache. The number of levels in the tree (the number of dimensions) is 31.

Another interesting point. It is possible to build a tree without specifying the values ​​of the nodes of the upper levels.

Globals are treasure swords for storing data. Trees. Part 1

Set ^b("a", "b", "c", "d") = 1
Set ^b("a", "b", "c", "e") = 2
Set ^b("a", "b", "f", "g") = 3

Empty circles are nodes that have not been assigned a value.

In order to better understand globals, let's compare them with other trees: garden trees and file system name trees.

Let's compare trees on globals with the most familiar hierarchical structures: with ordinary trees that grow in gardens and fields, as well as with file systems.

Globals are treasure swords for storing data. Trees. Part 1

As we see in garden trees, leaves and fruits are found only at the ends of the branches.
Filesystems - information is stored only at the ends of branches, which are full filenames.

And here is the global data structure.

Globals are treasure swords for storing data. Trees. Part 1Differences:

  1. Internal nodes: information in the global can be stored in each node, not just at the ends of branches.
  2. External nodes: the global must necessarily have values ​​at the ends of the branches, while the FS and garden trees do not.



In terms of internal nodes, we can say that the structure of the global is a superset of the structure of name trees in file systems and garden trees. Those. more flexible.

In general, the global is an ordered tree with the ability to store data in each node.

To better understand how globals work, imagine what would happen if the creators of file systems used an approach similar to globals to store information?

  1. Deleting a single file in a directory would automatically delete the directory, as well as all directories above that contained only the one just deleted directory.
  2. The need for directories would disappear. There would simply be files with subfiles and files without subfiles. Compared to an ordinary tree, each branch would become a fruit.

    Globals are treasure swords for storing data. Trees. Part 1

  3. Things like README.txt files would probably be gone. Everything that needed to be said about the contents of a directory could be written into the directory file itself. In path space, a file name is indistinguishable from a directory name, so you could get by with just files.
  4. The speed of deleting directories with nested subdirectories and files would increase dramatically. Many times on Habré, articles slipped about how long and difficult it is to delete millions of small files (1, 2). However, if you make a pseudo-file system on the global, then it will take seconds or fractions of them. When I tested removing subtrees on my home computer, 1-96 million nodes were removed from a two-tiered tree on HDD (not SSD) in 341 second. And we are talking about deleting a part of the tree, and not just the entire file with globals.

Globals are treasure swords for storing data. Trees. Part 1
Removing subtrees is another strong point of globals. This does not require recursion. It happens incredibly fast.

In our tree, this could be done with the command Kill.

Kill ^a("+7926X")

Globals are treasure swords for storing data. Trees. Part 1

For a better understanding of what actions are available to us on globals, I will give a brief table.

Basic commands and functions for working with globals in COS

Set
Setting branches up to a node (if not already defined) and node values

Go
Subtree Copy

Kill
Removing a subtree

ZKill
Removing the value of a specific node. The subtree leaving the node is not touched

$query
Full tree bypass with going deep

$Order
Traverse the branches of a specific node

$Data
Checking if a node is defined

$increment
Atomically incrementing a node's value. Not to do reads and writes for ACID. Recently, it is recommended to change to $Sequence

Thank you for your attention, we are ready to answer your questions.

Disclaimer: This article and my comments on it are my opinion and do not represent the official position of InterSystems Corporation.

Extension Globals are treasure swords for storing data. Trees. Part 2. You will learn what types of data can be displayed on globals and on what tasks they give the maximum benefit.

Source: habr.com

Add a comment