How Relational Databases Work (Part 1)

Hey Habr! I present to your attention the translation of the article
"How does a relational database work".

When it comes to relational databases, I can't help but think that something is missing. They are used everywhere. There are many different databases, from the small and useful SQLite to the powerful Teradata. But there are only a few articles that explain how a database works. You can search for "how does a relational database work" yourself to see how few results there are. Moreover, those articles are short. If you're looking at the latest trendy technologies (Big Data, NoSQL or JavaScript), you'll find more in-depth articles explaining how they work.

Are relational databases too old and too boring to be explained outside of university courses, research papers, and books?


As a developer, I hate using things I don't understand. And if databases have been in use for more than 40 years, there must be a reason. Over the years I have spent hundreds of hours trying to truly understand these weird black boxes that I use every day. Relational databases are very interesting because they are based on useful and reusable concepts. If you are interested in understanding a database but have never had the time or inclination to dig into this broad topic, you should enjoy this article.

Although the title of this article is explicit, its purpose is not to teach you how to use a database. Consequently, you should already know how to write a simple join query and basic CRUD queries; otherwise you may not understand this article. That is the only thing you need to know; I will explain the rest.

I'll start with some basics of computer science, such as the time complexity of algorithms (Big O). I know some of you hate this concept, but without it you won't be able to understand the intricacies inside a database. Since this is a huge topic, I will focus on what I think is essential: how a database handles an SQL query. I will only present basic database concepts, so that at the end of the article you have an idea of what is happening under the hood.

Since this is a long and technical article that includes many algorithms and data structures, take your time to read it. Some concepts may be difficult to understand; you can skip them and still get the general idea.

For the more knowledgeable among you, this article is divided into 3 parts:

  • Overview of low-level and high-level database components
  • Overview of the query optimization process
  • Overview of transaction and buffer pool management

Back to basics

Many years ago (in a galaxy far, far away…), developers had to know exactly the number of operations they were coding. They knew their algorithms and data structures by heart because they couldn't afford to waste the CPU and memory of their slow computers.

In this part, I will remind you of some of these concepts, as they are essential to understanding a database. I will also introduce the notion of a database index.

O(1) vs O(n²)

Nowadays, many developers don't care about the time complexity of algorithms... and they are right!

But when you're dealing with a lot of data (I'm not talking about thousands) or if you're fighting for milliseconds, it becomes critical to understand this concept. And as you can imagine, databases have to deal with both situations! I won't make you spend more time than necessary to get the gist of it. This will help us understand the concept of cost-based optimization later.

Concept

The time complexity of an algorithm is used to see how long the algorithm will take to run for a given amount of data. To describe this complexity, the big O mathematical notation is used. This notation is used with a function that describes how many operations an algorithm needs for a given amount of input data.

For example, when I say "this algorithm has O(some_function()) complexity", it means that the algorithm needs some_function(a_certain_amount_of_data) operations to process a certain amount of data.

What matters here is not the amount of data itself, but how the number of operations grows as the amount of data increases. Time complexity doesn't give the exact number of operations, but it's a good way to estimate execution time.

[Figure: number of operations vs. amount of input data for different time complexities, on a logarithmic scale]

On this graph, you can see how the number of operations depends on the amount of input data for various types of algorithm time complexity. I used a logarithmic scale to display them. In other words, the amount of data grows quickly from 1 to 1 billion. We can see that:

  • O(1) or constant complexity stays constant (otherwise it wouldn't be called constant complexity).
  • O(log(n)) remains low even with billions of data.
  • The worst complexity is O(n²), where the number of operations grows rapidly.
  • The two other complexities grow quickly as well.

Examples

With a small amount of data, the difference between O(1) and O(n²) is negligible. For example, let's say you have an algorithm that needs to process 2,000 elements.

  • O(1) algorithm will cost you 1 operation
  • O(log(n)) algorithm will cost you 7 operations
  • The O(n) algorithm will cost you 2,000 operations
  • The O(n*log(n)) algorithm will cost you 14,000 operations
  • The O(n²) algorithm will cost you 4,000,000 operations

The difference between O(1) and O(n²) seems big (4 million operations), but you'll lose at most 2 ms, barely the time to blink your eyes. Indeed, modern processors can handle hundreds of millions of operations per second. This is why performance and optimization are not an issue in many IT projects.

As I said, it's still important to be aware of this concept when dealing with huge amounts of data. If this time the algorithm has to process 1,000,000 elements (which is not that much for a database):

  • O(1) algorithm will cost you 1 operation
  • O(log(n)) algorithm will cost you 14 operations
  • The O(n) algorithm will cost you 1,000,000 operations
  • The O(n*log(n)) algorithm will cost you 14,000,000 operations
  • The O(n²) algorithm will cost you 1,000,000,000,000 operations

I didn't do the calculations, but I would say that with the O(n²) algorithm you have time for a coffee (even a second one!). If you add another zero to the amount of data, you will have time for a long nap.
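To get a feel for where these figures come from, here is a minimal sketch of my own (not from the original article) that prints the operation counts for both amounts of data; the exact log values land near 7 and 14 depending on the logarithm base and rounding you choose:

import math

def operation_counts(n):
    # Rough operation counts for the common complexity classes.
    # Only the orders of magnitude matter, not the exact values.
    return {
        "O(1)": 1,
        "O(log n)": math.log(n),        # ~7.6 for n = 2,000; ~13.8 for n = 1,000,000
        "O(n)": n,
        "O(n log n)": n * math.log(n),  # ~15,200 and ~13,800,000
        "O(n^2)": n ** 2,               # 4,000,000 and 1,000,000,000,000
    }

for n in (2_000, 1_000_000):
    print(n, operation_counts(n))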

Going deeper

For your information:

  • A good hash table lookup finds an element in O(1).
  • Searching a well-balanced tree yields a result in O(log(n)).
  • Searching in an array gives a result in O(n).
  • The best sorting algorithms have complexity O(n*log(n)).
  • A bad sorting algorithm has O(n²) complexity.

Note: In the following parts, we will see these algorithms and data structures.

There are several types of algorithm time complexity:

  • average case scenario
  • best case scenario
  • and worst case scenario

Time complexity is often the worst case scenario.

I was only talking about the time complexity of the algorithm, but the complexity also applies to:

  • algorithm memory consumption
  • disk I/O consumption by the algorithm

Of course, there are complexities worse than n², for example:

  • n⁴: that's terrible! Some of the algorithms I'll mention have this complexity.
  • 3ⁿ: even worse! One of the algorithms we'll see later in this article has this complexity (and it really is used in many databases).
  • n! (factorial n): you will never get your results, even with a small amount of data.
  • nⁿ: if you run into this complexity, you should ask yourself whether IT is really your field...

Note: I didn't give you the real definition of "big O", just the idea. You can read the Wikipedia article for the real (asymptotic) definition.

Merge sort

What do you do when you need to sort a collection? What? You're calling the sort() function... Ok, good answer... But for a database, you need to understand how this sort() function works.

There are several good sorting algorithms, so I will focus on the most important one: merge sort. You may not understand right now why sorting data is useful, but you should after the query optimization part. Moreover, understanding merge sort will help us later understand a common database join operation called the merge join.

Merge

Like many useful algorithms, merge sort relies on a trick: Merging 2 sorted arrays of size N/2 into an N-element sorted array costs only N operations. This operation is called a merge.

Let's see what this means with a simple example:

[Figure: merging two sorted 4-element arrays into one sorted 8-element array]

In this figure, you can see that to build the final sorted array of 8 elements, you only need to iterate once through the two 4-element arrays. Because both 4-element arrays are already sorted:

  • 1) you compare the current elements of the two arrays (at the beginning, current = first)
  • 2) then take the smallest one and put it into the 8-element array
  • 3) then move to the next element in the array you took the smallest element from
  • and repeat 1, 2, 3 until you reach the last element of one of the arrays.
  • Then you take the rest of the elements of the other array and put them into the 8-element array.

This works because both 4-element arrays are sorted, and therefore you never have to "go back" inside them.

Now that we understand this trick, here is my merge sort pseudocode:

array mergeSort(array a)
   if(length(a)==1)
      return a;   //an array with a single element is already sorted
   end if

   //recursive calls
   [left_array right_array] := split_into_2_equally_sized_arrays(a);
   array new_left_array := mergeSort(left_array);
   array new_right_array := mergeSort(right_array);

   //merging the 2 small ordered arrays into a big one
   array result := merge(new_left_array,new_right_array);
   return result;
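The pseudocode above relies on a merge function that it doesn't show. Here is a runnable Python sketch of my own (not from the original article) of both the merge and the merge sort:

def merge(left, right):
    """Merge two already sorted lists into one sorted list in len(left) + len(right) operations."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # compare the current elements of the two lists and take the smallest one
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # one list is exhausted: append the rest of the other one
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(a):
    """Divide and conquer: split, sort each half recursively, then merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([8, 3, 5, 1, 9, 2, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]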

Merge sort breaks the problem into smaller problems and then finds the results of the smaller problems to get the result of the original problem (note: this kind of algorithm is called divide and conquer). If you don't understand this algorithm, don't worry; I didn't understand it the first time I saw it. If it can help you, I see this algorithm as a two-phase algorithm:

  • Division phase, where the array is divided into smaller arrays
  • The sorting phase, where the small arrays are merged together to form a larger array.

Division phase

[Figure: division phase, the 8-element array split down to unitary arrays in 3 steps]

In the division step, the array is divided into unitary arrays in 3 steps. The formal number of steps is log(N) (since N=8, log(N) = 3).

How do I know this?

I'm a genius! In one word: mathematics. The idea is that each step divides the size of the original array by 2. The number of steps is the number of times you can divide the original array by two. This is exactly the definition of the logarithm (with base 2).

Sorting phase

[Figure: sorting phase, merging the unitary arrays back into a sorted 8-element array in 3 steps]

In the sorting step, you start with unitary (one-element) arrays. During each step, you apply multiple merge operations, and the total cost is N = 8 operations:

  • In the first step you have 4 merges which cost 2 operations each
  • In the second step you have 2 merges which cost 4 operations each
  • In the third step you have 1 merge which costs 8 operations

Since there are log(N) steps, the total cost is N * log(N) operations.
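As a quick sanity check of this arithmetic (my own sketch, not part of the original article):

# For N = 8 there are log2(8) = 3 merge steps, and each step touches all 8 elements once.
N = 8
steps = [(4, 2), (2, 4), (1, 8)]                 # (number of merges, cost of each merge)
costs = [merges * cost for merges, cost in steps]
print(costs)        # [8, 8, 8] -> every step costs N operations
print(sum(costs))   # 24 = N * log2(N) = 8 * 3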

Benefits of merge sort

Why is this algorithm so powerful?

Because:

  • You can change it to reduce the amount of memory, so that you do not create new arrays, but directly modify the input array.

Note: this kind of algorithm is called in-place (sorting without additional memory).

  • You can change it to use disk space and a small amount of memory at the same time without significant disk I/O overhead. The idea is to only load into memory the parts that are currently being processed. This is important when you need to sort a multi-gigabyte table with only a 100-megabyte memory buffer.

Note: this kind of algorithm is called external sort.

  • You can change it to run on multiple processes/threads/servers.

For example, distributed merge sort is one of the key components of Hadoop (which is the Big Data framework).

  • This algorithm can turn lead into gold (really!).

This sorting algorithm is used in most (if not all) databases, but it is not the only one. If you want to know more, you can read this research paper, which discusses the pros and cons of the common sorting algorithms used in a database.

Array, Tree and Hash Table

Now that we understand the idea of time complexity and sorting, I have to tell you about 3 data structures. This is important because they are the basis of modern databases. I will also introduce the notion of a database index.

Array

A two-dimensional array is the simplest data structure. A table can be viewed as an array. For example:

[Figure: a table represented as a two-dimensional array of rows and columns]

This 2-dimensional array is a table with rows and columns:

  • Each row represents an entity
  • Columns store properties that describe an entity.
  • Each column stores data of a particular type (integer, string, date...).

While this is convenient for storing and visualizing data, it is terrible when you need to find a specific value.

For example, if you want to find everyone who works in the UK, you would need to look at each row to determine whether it belongs to the UK. This will cost you N operations, where N is the number of rows. That's not bad, but could there be a faster way? This is where trees come in.

Note: most modern databases provide advanced arrays for storing tables efficiently: heap-organized tables and index-organized tables. But that doesn't change the problem of quickly finding rows matching a specific condition on a group of columns.
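Before moving on to trees, here is a tiny sketch of my own (with made-up rows) of what that full scan looks like:

# A made-up table of employees stored as a list of rows (a 2D array).
rows = [
    ("Peter", "Smith", "UK"),
    ("Anna", "Martin", "France"),
    ("John", "Doe", "UK"),
    ("Marta", "Weber", "Germany"),
]

# Without an index, finding everyone who works in the UK is a full scan:
# one comparison per row, i.e. N operations for N rows.
uk_rows = [row for row in rows if row[2] == "UK"]
print(uk_rows)  # [('Peter', 'Smith', 'UK'), ('John', 'Doe', 'UK')]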

Tree and database index

A binary search tree is a binary tree with a special property: the key in each node must be:

  • greater than all keys stored in the left subtree
  • less than all keys stored in the right subtree

Let's see what this means visually

Idea

[Figure: a binary search tree with 15 nodes, rooted at 136]

This tree has N = 15 elements. Let's say I'm looking for 208:

  • I start at the root whose key is 136. Since 136<208, I'm looking at the right subtree of node 136.
  • 398>208 so I'm looking at the left subtree of node 398
  • 250>208 so I'm looking at the left subtree of node 250
  • 200<208, so I'm looking at the right subtree of node 200. But 200 doesn't have a right subtree, so the value does not exist (because if it did exist, it would be in the right subtree of 200).

Now let's say I'm looking for 40

  • I start at the root whose key is 136. Since 136 > 40, I look at the left subtree of node 136.
  • 80 > 40, so I'm looking at the left subtree of node 80
  • 40=40, the node exists. I extract the row id stored inside the node (it's not shown in the picture) and look in the table for the given row id.
  • Knowing the row ID allows me to know exactly where the data is in the table, so I can retrieve it instantly.

As a result, both searches cost me the number of levels inside the tree. If you read the part about merge sort carefully, you should see that there are log(N) levels here. So the search costs log(N). Not bad!
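Here is a minimal sketch of my own (the node layout and names are hypothetical, not taken from the article) of such an index lookup, where each node stores a key and the id of the row it points to:

class Node:
    def __init__(self, key, row_id, left=None, right=None):
        self.key = key        # the indexed value
        self.row_id = row_id  # where the corresponding row lives in the table
        self.left = left
        self.right = right

def bst_search(node, key):
    # Walk down the tree: roughly log(N) comparisons if the tree is balanced.
    while node is not None:
        if key == node.key:
            return node.row_id
        node = node.left if key < node.key else node.right
    return None  # the key is not in the index

# A tiny hand-built tree, using some of the keys from the example above.
root = Node(136, 0,
            left=Node(80, 1, Node(40, 2), Node(100, 3)),
            right=Node(398, 4, Node(250, 5), Node(500, 6)))
print(bst_search(root, 40))   # 2 (the row id stored in node 40)
print(bst_search(root, 208))  # None (the value doesn't exist)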

Back to our problem

But this is very abstract, so let's get back to our problem. Instead of a simple integer, imagine a string that represents the country of someone in the previous table. Suppose you have a tree that contains the field "country" (column 3) of the table:

  • If you want to know who works in the UK
  • you look in the tree for the node that represents the UK
  • inside the "UK node" you will find the locations of the rows of the UK employees.

This search costs log(N) operations instead of the N operations of a direct array scan. What you have just imagined is a database index.

You can build an index tree for any group of fields (a string, a number, 2 strings, a number and a string, a date...) as long as you have a function to compare the keys (i.e. the field groups), so that you can establish an order among the keys (which is the case for any basic type in a database).

B+Tree index

While this tree works well for getting a specific value, there is a BIG problem when you need to get multiple elements between two values. It will cost O(N), because you would have to look at each node in the tree and check whether it is between those two values (for example, with an in-order traversal of the tree). Moreover, this operation is not disk-I/O friendly, since you would have to read the entire tree. We need to find a way to perform a range query efficiently. To solve this problem, modern databases use a modified version of the previous tree called a B+Tree. In a B+Tree:

  • only the lowest nodes (the leaves) store information (the location of the rows in the associated table)
  • the other nodes are there only to route to the correct node during the search.

[Figure: a B+Tree, with routing nodes on top and linked leaf nodes at the bottom]

As you can see, there are more nodes here (twice as many). Indeed, you have additional nodes, the "decision nodes", that help you find the correct node (the one that stores the location of the rows in the associated table). But the search complexity is still O(log(N)) (there is just one more level). The big difference is that the lowest-level nodes are linked to their successors.

With this B+Tree, if you're looking for values between 40 and 100:

  • You just need to look up 40 (or the nearest value after 40 if 40 doesn't exist) as you did with the previous tree.
  • Then gather the successors of 40, using the direct links to the successors, until you reach 100.

Let's say you have found M successors and the tree has N nodes. Finding a particular node costs log(N), just like the previous tree. But once you have this node, you get the M successors in M operations, using the links to their successors. This search costs only M + log(N) operations, compared to N operations for the previous tree. Moreover, you don't have to read the full tree (only M + log(N) nodes), which means less disk usage. If M is small (e.g. 200 rows) and N is large (1,000,000 rows), it makes a BIG difference.
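Here is a simplified sketch of my own of the leaf-level scan (the Leaf class is hypothetical, and the log(N) descent from the root to the first matching leaf is assumed to have already happened):

class Leaf:
    def __init__(self, keys, row_ids, next_leaf=None):
        self.keys = keys        # sorted keys stored in this leaf
        self.row_ids = row_ids  # locations of the matching rows in the table
        self.next = next_leaf   # direct link to the successor leaf

def range_scan(start_leaf, low, high):
    # Yield (key, row_id) pairs with low <= key <= high by following the leaf links.
    leaf = start_leaf
    while leaf is not None:
        for key, row_id in zip(leaf.keys, leaf.row_ids):
            if key > high:
                return
            if key >= low:
                yield key, row_id
        leaf = leaf.next

# Usage: three linked leaves, scanning the range [40, 100].
leaf3 = Leaf([100, 120], ["row5", "row6"])
leaf2 = Leaf([60, 80], ["row3", "row4"], leaf3)
leaf1 = Leaf([20, 40], ["row1", "row2"], leaf2)
print(list(range_scan(leaf1, 40, 100)))
# [(40, 'row2'), (60, 'row3'), (80, 'row4'), (100, 'row5')]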

But there are new problems here (again!). If you are adding or removing a row in the database (and therefore in the associated B+Tree index):

  • you must maintain order between nodes within a B+Tree, otherwise you cannot find nodes within an unsorted tree.
  • you must keep the number of levels in B+Tree as low as possible, otherwise the time complexity in O(log(N)) becomes O(N).

In other words, the B+Tree needs to be self-ordered and self-balanced. Luckily, this is possible with smart deletion and insertion operations. But this comes at a cost: insertions and deletions in a B+Tree cost O(log(N)). That's why some of you have heard that using too many indexes is not a good idea. Indeed, you are slowing down fast inserts/updates/deletes of a row in a table, because the database needs to update the indexes of the table with an expensive O(log(N)) operation per index. Moreover, adding indexes means more work for the transaction manager (which will be described at the end of the article).

For more information, you can see the Wikipedia article on the B+Tree. If you want an example of a B+Tree implementation in a database, take a look at this article and this article from a leading MySQL developer. They both focus on how InnoDB (the MySQL engine) handles indexes.

Note: A reader told me that, due to low-level optimizations, the B+ tree should be fully balanced.

Hash table

Our last important data structure is the hash table. It is very useful when you want to look up values quickly. Moreover, understanding hash tables will help us later understand a common database join operation called the hash join. This data structure is also used by the database to store some internal things (for example, the lock table or the buffer pool; we will see both of these concepts later).

A hash table is a data structure that quickly finds an element by its key. To build a hash table, you need to define:

  • a key for your elements
  • a hash function for the keys. The computed hashes of the keys give the locations of the elements (called buckets).
  • a function to compare keys. Once you have found the correct bucket, you have to find the element you are looking for inside the bucket using this comparison.

A simple example

Let's take an illustrative example:

[Figure: a hash table with 10 buckets, where the bucket is given by the last digit of the key]

This hash table has 10 buckets. Because I'm lazy, I've only drawn 5 of them, but I know you're smart, so I'll let you imagine the other 5 yourself. I used a hash function that takes the key modulo 10. In other words, I keep only the last digit of the element's key to find its bucket:

  • if the last digit is 0, the element goes into bucket 0,
  • if the last digit is 1, the element goes into bucket 1,
  • if the last digit is 2, the element goes into bucket 2,
  • ...

The comparison function I used is just equality between two integers.

Let's say you want to get element 78:

  • The hash table calculates the hash code for 78, which is 8.
  • The hash table looks into bucket 8, and the first element it finds is 78.
  • It returns element 78.
  • The search costs only 2 operations (one to compute the hash value and the other to find the element inside the bucket).

Now let's say you want to get element 59:

  • The hash table calculates the hash code for 59, which is 9.
  • The hash table looks into bucket 9, and the first element it finds is 99. Since 99 != 59, element 99 is not the right element.
  • Using the same logic, it looks at the second element (9), the third (79), ..., and the last one (29).
  • Element not found.
  • The search cost 7 operations.
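Here is a minimal sketch of my own of this hash table: 10 buckets, and a hash function that keeps only the last digit of the key:

def build_hash_table(keys, num_buckets=10):
    # Bucket i holds every key whose value modulo num_buckets equals i.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[key % num_buckets].append(key)
    return buckets

def lookup(buckets, key):
    # 1 operation to compute the hash, then a scan inside the bucket.
    bucket = buckets[key % len(buckets)]
    for candidate in bucket:   # compare the keys inside the bucket
        if candidate == key:
            return True
    return False

table = build_hash_table([78, 99, 9, 79, 29, 40, 1, 2])
print(lookup(table, 78))  # True: found right away in bucket 8
print(lookup(table, 59))  # False: bucket 9 ([99, 9, 79, 29]) is scanned in vain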

A good hash function

As you can see, depending on the value you are looking for, the cost is not the same!

If I now change the hash function to the key modulo 1,000,000 (i.e. taking the last 6 digits), the second lookup costs only 1 operation because there are no elements in bucket 000059. The real challenge is to find a good hash function that creates buckets containing a very small number of elements.

In my example, finding a good hash function is easy. But this is a simple example; finding a good hash function is harder when the key is:

  • a string (for example, a last name)
  • 2 strings (for example, a last name and a first name)
  • 2 strings and a date (for example, a last name, a first name and a date of birth)
  • ...

With a good hash function, a hash table lookup is O(1).
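As an illustration (my own sketch; the field values are made up), a composite key can simply be hashed as a whole, for example as a tuple in Python:

# A composite key made of several fields, hashed as a single value.
key = ("Doe", "John", "1984-07-01")  # last name, first name, date of birth (made up)
bucket_index = hash(key) % 1_000     # pick one of 1,000 buckets
print(bucket_index)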

Array vs hash table

Why not use an array?

Hmm, good question.

  • A hash table can be partially loaded into memory, with the remaining buckets staying on disk.
  • With an array, you have to use contiguous space in memory. If you are loading a large table, it is very hard to have enough contiguous space.
  • For a hash table, you can choose the desired key (for example, country and last name of a person).

For more information, you can read the article on the Java HashMap, which is an efficient hash table implementation; you don't need to understand Java to understand the concepts in this article.
