Globals are treasure swords for storing data. Trees. Part 2

Globals are treasure swords for storing data. Trees. Part 2Beginning - see part 1.

3. Variants of structures when using globals

A structure such as an ordered tree has various special cases. Let's consider those that have practical value when working with globals.

3.1 Special case 1. One node without branches


Globals are treasure swords for storing data. Trees. Part 2Globals can be used not only like an array, but also like ordinary variables. For example, as a counter:

Set ^counter = 0  ; установка счётчика
Set id=$Increment(^counter) ;  атомарное инкрементирование

In this case, the global, in addition to the value, can also have branches. One does not exclude the other.

3.2 Special case 2. One node and many branches

In general, this is a classic key-value base. And if we store a tuple of values ​​as a value, we get the most ordinary table with a primary key.

Globals are treasure swords for storing data. Trees. Part 2

To implement a table on globals, we will have to form rows ourselves from the values ​​of the columns, and then save them to the global by the primary key. To make it possible to split the line again into columns when reading, you can use:

  1. separator characters.
    Set ^t(id1) = "col11/col21/col31"
    Set ^t(id2) = "col12/col22/col32"
  2. a rigid scheme, according to which each field occupies a predetermined number of bytes. As well as becomes in relational DB.
  3. a special $LB function (available in Cache) that makes up a string of values.
    Set ^t(id1) = $LB("col11", "col21", "col31")
    Set ^t(id2) = $LB("col12", "col22", "col32")

Interestingly, it is not difficult to do something similar to secondary indexes in relational databases on globals. Let's call such structures index globals. An index global is an auxiliary tree for quickly searching fields that are not part of the main global's primary key. To fill it and use it, you need to write additional code.

Let's create an index global on the first column.

Set ^i("col11", id1) = 1
Set ^i("col12", id2) = 1

Now, for a quick search for information on the first column, we have to look into the global ^i and find the primary keys (id) corresponding to the desired value of the first column.

When inserting a value, we can immediately create both the value and index globals for the required fields. And for reliability, let's wrap it all in a transaction.

TSTART
Set ^t(id1) = $LB("col11", "col21", "col31")
Set ^i("col11", id1) = 1
TCOMMIT

Details how to do on M tables on globals, emulation of secondary indexes.

Such tables will work as fast as in traditional databases (or even faster) if the functions of inserting/updating/deleting rows are written in COS/M and compiled.I checked this statement with tests for bulk INSERT and SELECT into one two-column table, including using the TSTART and TCOMMIT commands (transactions).

I have not tested more complex scenarios with concurrent access and parallel transactions.

Without the use of transactions, the insert rate was at a million values ​​of 778 inserts/second.
At 300 million values, 422 inserts/second.

When using transactions - 572 inserts / second for 082M inserts. All operations were performed from compiled M-code.
Hard drives are normal, not SSD. RAID5 with Write-back. Phenom II 1100T processor.

For similar testing of a SQL database, you need to write a stored procedure that will do inserts in a loop. When testing MySQL 5.5 (InnoDB storage), using this method, I got numbers no more than 11K inserts per second.
Yes, the implementation of tables on globals looks more complicated than in relational databases. Therefore, industrial databases on globals have SQL access to simplify work with tabular data.

Globals are treasure swords for storing data. Trees. Part 2In general, if the data schema does not change often, the insertion speed is not critical, and the entire database can be easily represented as normalized tables, then it is easier to work with SQL, since it provides a higher level of abstraction.

Globals are treasure swords for storing data. Trees. Part 2In this particular case, I wanted to show that globals can act as a constructor to create other databases. Like assembler in which other languages ​​can be written. And here are examples of how you can create analogues on globals key-value, lists, sets, tabular, document-oriented databases.

If you need to create some kind of non-standard database with minimal effort, then you should look towards globals.

3.3 Special case 3. Two-level tree, each node of the second level has a fixed number of branches

Globals are treasure swords for storing data. Trees. Part 2You probably guessed it: this is an alternative implementation of tables on globals. Let's compare this implementation with the previous one.

Tables on a two-level tree vs. on a single level tree.

Cons
pros

  1. Slower for insertion, since you need to set the number of nodes equal to the number of columns.
  2. More disk space consumption. Since global indexes (in the sense of array indexes) with column names take up disk space and are duplicated for each row.

  1. Faster access to the values ​​of individual columns, since there is no need to parse the string. 11,5% faster in my tests on 2 columns and more on more columns.
  2. Easier to change schema
  3. Clearer code

Conclusion: for an amateur. Since speed is one of the key benefits of globals, it makes little sense to use this implementation, as it is likely to be no faster than tables in relational databases.

3.4 General case. Trees and ordered trees

Any data structure that can be represented as a tree lends itself well to globals.

3.4.1 Objects with subobjects

Globals are treasure swords for storing data. Trees. Part 2

This is the area of ​​traditional application of globals. In the medical field, there is a huge number of diseases, drugs, symptoms, treatments. It is irrational to create a table with a million fields for each patient. Moreover, 99% of the fields will be empty.

Imagine a SQL database of tables: "patient" ~ 100 fields, "Medicine" - 000 fields, "Therapy" - 100 fields, "Complications" - 000 fields, etc. and so on. Or you can create a database of many thousands of tables, each for a certain type of patient (and they can overlap!), treatments, medicines, and thousands more tables for relationships between these tables.

Globals are ideal for medicine, as they allow you to create for each patient an accurate description of his medical history, various therapies, drug effects, in the form of a tree, without wasting extra disk space on empty columns, as it would be in the relational case.

Globals are treasure swords for storing data. Trees. Part 2It is convenient to create a database with data about people on globalswhen it is important to accumulate and systematize the maximum of various information about the client. This is in demand in medicine, banking, marketing, archives and other areas.

.
Of course, in SQL it is also possible to emulate a tree with just a few tables (EAV, 1,2,3,4,5,6,7,8,9,10), but this is much more complicated and will be slower. In fact, one would have to write a global working on tables and hide all work with tables under an abstraction layer. It is wrong to emulate a lower level technology (globals) by means of a higher level one (SQL). Inappropriate.

It's no secret that changing the data schema on giant tables (ALTER TABLE) can take a decent amount of time. MySQL, for example, makes ALTER TABLE ADD|DROP COLUMN a full copy of information from an old table to a new table (tested MyISAM, InnoDB engines). What can hang up a working database with billions of records for days, if not weeks.

Globals are treasure swords for storing data. Trees. Part 2Changing the data structure, if we use globals, does not cost us anything. At any time, we can add any new properties we need to any object, at any level of the hierarchy. Branch renaming changes can be run in the background on a running database.


Therefore, when it comes to storing objects with a huge number of optional properties, globals are a great choice.

And, let me remind you, access to any of the properties is instant, since in the global all paths are a B-tree.

Databases on globals, in the general case, are a kind of document-oriented databases with the ability to store hierarchical information. Therefore, in the field of storing medical records, document-oriented databases can compete with globals. But still it's not reallyTake for comparison, for example, MongoDB. In this domain it loses to globals for the following reasons:

  1. Document size. The storage unit is text in JSON format (more precisely BSON) with a maximum size of about 16MB. The restriction is made on purpose so that the JSON database does not slow down when parsing if a huge JSON document is stored in it, and then they access it by fields. This document should contain all information about the patient. We all know how thick patient charts can be. The maximum map size of 16MB immediately puts an end to patients whose disease map includes MRI files, x-ray scans and other studies. In one branch of the global, you can have information on gigabytes and terabytes. In principle, this can be put an end to, but I will continue.
  2. The time of consciousness / change / removal of new properties in the patient's chart. Such a database should read the entire map into memory (this is a large amount!), parse BSON, add / change / delete a new node, update indexes, pack in BSON, save to disk. The global only needs to refer to a specific property and manipulate it.
  3. Quick access to individual properties. With many properties in the document and its multi-level structure, access to individual properties will be faster due to the fact that each path in the global is a B-tree. In BSON, you have to linearly parse the document to find the desired property.

3.3.2 Associative arrays

Associative arrays (even with nested arrays) fit nicely with globals. For example, such an array from PHP will be displayed in the first picture 3.3.1.

$a = array(
  "name" => "Vince Medvedev",
  "city" => "Moscow",
  "threatments" => array(
    "surgeries" => array("apedicectomy", "biopsy"),
    "radiation" => array("gamma", "x-rays"),
    "physiotherapy" => array("knee", "shoulder")
  )
);

3.3.3 Hierarchical documents: XML, JSON

Also easily stored in globals. For storage, it can be decomposed in different ways.

XML
The easiest way to decompose XML into globals is when we store tag attributes in nodes. And if you need quick access to tag attributes, then we can move them to separate branches.

Globals are treasure swords for storing data. Trees. Part 2

<note id=5>
<to>Вася</to>
<from>Света</from>
<heading>Напоминание</heading>
<body>Позвони мне завтра!</body>
</note>

On COS, this will correspond to the code:

Set ^xml("note")="id=5"
Set ^xml("note","to")="Саша"
Set ^xml("note","from")="Света"
Set ^xml("note","heading")="Напоминание"
Set ^xml("note","body")="Позвони мне завтра!"

Note: For XML, JSON, associative arrays, you can come up with many different ways to display on globals. In this case, we have not reflected the order of nested tags in the note tag. In the global ^xml nested tags will be displayed in alphabetical order. To strictly reflect the order, you can use, for example, such a mapping:

Globals are treasure swords for storing data. Trees. Part 2
JSON.
The first picture from section 3.3.1 shows a reflection of this JSON document:

var document = {
  "name": "Vince Medvedev",
  "city": "Moscow",
  "threatments": {
    "surgeries": ["apedicectomy", "biopsy"],
    "radiation": ["gamma", "x-rays"],
    "physiotherapy": ["knee", "shoulder"]
  },
};

3.3.4 Identical structures connected by hierarchical relationships

Examples: the structure of sales offices, the location of people in the MLM structure, the base of openings in chess.

Debut base. You can use an estimate of the strength of the move as the value of the index of the global node. Then, to choose the strongest move, it will be enough to choose the branch with the highest weight. In the global, all branches at each level will be sorted by the strength of the move.

Globals are treasure swords for storing data. Trees. Part 2

The structure of sales offices, the structure of people in MLM. Nodes can store some caching values ​​that reflect the characteristics of the entire subtree. For example, the sales volume of a given subtree. At any time, we can get a figure reflecting the achievements of any branch.

Globals are treasure swords for storing data. Trees. Part 2

4. When is it most beneficial to use globals

The first column shows cases where you will get a significant speed gain when using globals, and the second when the development or data model is simplified.

Speed
Ease of data processing/presentation

  1. Insert [with automatic sorting at each level], [indexed by master key]
  2. Removing Subtrees
  3. Objects with a lot of nested properties that need individual access
  4. Hierarchical structure with the ability to bypass child branches from any, even non-existent
  5. Traversal of subtrees in depth
  1. Objects/entities with a huge number of optional [and/or nested] properties/entities
  2. Schema-less data. When new properties can often appear and old ones disappear.
  3. You need to create a custom database.
  4. Path bases and decision trees. When it is convenient to represent paths as a tree.
  5. Removing hierarchical structures without using recursion

Extension “Globals are treasure swords for storing data. Sparse arrays. Part 3".

Disclaimer: This article and my comments on it are my opinion and do not represent the official position of InterSystems Corporation.

Source: habr.com

Add a comment