What can weaken the transaction isolation level in databases

Hi all, Vladislav Rodin here. I currently lead the High Load Architect course at OTUS and also teach software architecture courses.

In addition to teaching, as you may have noticed, I write original material for the OTUS blog on Habr, and today's article is timed to the launch of the PostgreSQL course, for which enrollment is currently open.


Introduction

Last time we talked about the fact that transactions in databases solve two problems: ensuring fault tolerance and managing access to data in a concurrent environment. To fully perform these tasks, a transaction must have the ACID properties. Today we will look in detail at the letter I (isolation) in this acronym.

Isolation

Isolation solves the problem of data access in a concurrent environment, in effect providing protection against race conditions. Ideally, isolation means serializability, that is, the guarantee that the result of executing transactions in parallel is the same as if they had been executed sequentially. The main problem with this property is that it is technically very difficult to provide and, as a result, it hurts system performance. That is why isolation is often weakened, accepting the risk of certain anomalies, which are discussed below. The possibility of particular anomalies occurring is what characterizes the transaction isolation level.

The best-known anomalies are dirty read, non-repeatable read, and phantom read, but in fact there are five more: dirty write, cursor lost update, lost update, read skew, and write skew.
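In SQL this trade-off is exposed directly: the isolation level is chosen per transaction. A minimal PostgreSQL-flavoured sketch (the statements inside each transaction are omitted):

    -- the standard levels, from weakest to strongest
    BEGIN ISOLATION LEVEL READ UNCOMMITTED;  -- PostgreSQL silently treats this as READ COMMITTED
    -- ... statements ...
    COMMIT;

    BEGIN ISOLATION LEVEL REPEATABLE READ;   -- stronger, removes more anomalies
    -- ... statements ...
    COMMIT;

    BEGIN ISOLATION LEVEL SERIALIZABLE;      -- closest to serial execution, most expensive
    -- ... statements ...
    COMMIT;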

Dirty write

The essence of the anomaly is that transactions can overwrite uncommitted data.

[Figure: two transactions overwriting each other's uncommitted data]

This anomaly is dangerous not only because the data may end up inconsistent after both transactions commit (as in the picture), but also because atomicity is violated: once we allow uncommitted data to be overwritten, it is no longer clear how to roll back one transaction without affecting the other.

The anomaly is treated quite simply: we take a lock on the record before writing to it, preventing other transactions from changing the record until the lock is released.
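A minimal sketch of this lock in SQL; the goods table and its columns are assumed for illustration, and the behaviour shown is what most engines give an ordinary UPDATE:

    -- T1:
    BEGIN;
    UPDATE goods SET price = 100 WHERE id = 1;   -- T1 takes a write lock on the row

    -- T2, concurrently:
    BEGIN;
    UPDATE goods SET price = 200 WHERE id = 1;   -- blocks until T1 commits or rolls back

    -- T1:
    COMMIT;                                      -- only now does T2's update proceed, over committed data
    -- T2:
    COMMIT;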

Dirty read

Dirty read means reading uncommitted data.

[Figure: one transaction reading another's uncommitted data]

Problems arise when some action or decision has to be made on the basis of such a read.

To fix the anomaly, you could take a read lock, but that would hurt performance badly. It is easier to observe that, in order to roll back a transaction, the initial state of the data (before the write started) must be kept in the system anyway. Why not read from there? This is cheap enough that most databases eliminate dirty reads by default.
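A sketch of how this looks under PostgreSQL's default READ COMMITTED level; the goods table is again assumed:

    -- T1:
    BEGIN;
    UPDATE goods SET price = 200 WHERE id = 1;   -- not committed yet

    -- T2:
    BEGIN;
    SELECT price FROM goods WHERE id = 1;        -- returns the old committed value, not 200
    COMMIT;

    -- T1:
    ROLLBACK;                                    -- T2 never saw data that was later rolled back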

Lost update

Lost update means exactly what it says, and the name accurately reflects the essence of the problem:

[Figure: two transactions updating the same record; the earlier update is lost]

In effect, the result of transaction T2 has been cancelled. This situation is corrected by explicit or implicit write locks. That is, we either simply update the record, which takes an implicit lock, or we perform select for update, taking both a read and a write lock. Note that such an operation is quite dangerous: our "innocent" read now blocks other reads. Some databases offer a safer select for share, which allows the data to be read but not modified.
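A sketch of both the anomaly and the locking fix; the goods table is assumed:

    -- the anomaly: two read-modify-write transactions, the first increment is lost
    -- T1: SELECT price FROM goods WHERE id = 1;              -- reads 100
    -- T2: SELECT price FROM goods WHERE id = 1;              -- also reads 100
    -- T1: UPDATE goods SET price = 110 WHERE id = 1; COMMIT;
    -- T2: UPDATE goods SET price = 120 WHERE id = 1; COMMIT; -- T1's +10 has vanished

    -- the fix: an explicit write lock on the read
    BEGIN;
    SELECT price FROM goods WHERE id = 1 FOR UPDATE;  -- other writers (and locking readers) wait
    UPDATE goods SET price = price + 10 WHERE id = 1;
    COMMIT;

    -- the milder variant offered by some databases
    SELECT price FROM goods WHERE id = 1 FOR SHARE;   -- others may read the row but not modify it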

Cursor lost update

For finer-grained control, databases may offer other tools, such as a cursor. A cursor is a structure that holds a set of rows and allows you to iterate over them: declare cursor_name cursor for select_statement. The contents of the cursor are described by the select.

Why do you need a cursor? The point is that some databases offer either a lock on all records selected by the select (read stability) or a lock only on the record the cursor is currently positioned on (cursor stability). Cursor stability gives a short-lived lock, which reduces the number of locks held when we iterate over a large result set. That is why the lost update anomaly is singled out separately for cursors.
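A sketch of the cursor syntax in PostgreSQL; whether the engine locks the whole result set (read stability) or only the current row (cursor stability) depends on the database, so the locking comments are illustrative:

    BEGIN;
    DECLARE goods_cur CURSOR FOR
        SELECT id, price FROM goods WHERE price < 150;
    FETCH NEXT FROM goods_cur;   -- under cursor stability, only this row is locked
    FETCH NEXT FROM goods_cur;   -- the lock moves along with the cursor
    CLOSE goods_cur;
    COMMIT;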

Non-repeatable read

Non-repeatable read means that two consecutive reads of the same record within our transaction return different results, because another transaction intervened between the two reads, changed our data, and committed.

[Figure: two reads of the same record within one transaction return different values]

Why is this a problem at all? Imagine that the goal of transaction T2 in the picture is to select all goods whose price is less than 150 USD. Someone else updated the price to 200 USD. As a result, the filter no longer did its job.

These anomalies stop occurring with two-phase locking (2PL) or with the MVCC mechanism, which I would like to cover separately.
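A sketch of the anomaly and of the snapshot-based fix in PostgreSQL-flavoured SQL; the goods table is assumed:

    -- the anomaly at READ COMMITTED
    BEGIN;
    SELECT price FROM goods WHERE id = 1;   -- 100
    -- another transaction: UPDATE goods SET price = 200 WHERE id = 1; COMMIT;
    SELECT price FROM goods WHERE id = 1;   -- 200: the read did not repeat
    COMMIT;

    -- the same pair of reads at REPEATABLE READ (MVCC snapshot)
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT price FROM goods WHERE id = 1;   -- 100
    -- the same concurrent update commits here
    SELECT price FROM goods WHERE id = 1;   -- still 100: both reads come from one snapshot
    COMMIT;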

Phantom read

Phantom read means reading data that was added by another transaction.

[Figure: a row inserted by another transaction appears in a repeated query]

As an example, when this anomaly occurs we can end up selecting the wrong product as the cheapest one.

Getting rid of phantom reads is already quite difficult. Ordinary locking is not enough, because we cannot lock something that does not exist yet. 2PL systems use predicate locks, while MVCC systems have a transaction scheduler that rolls back transactions whose results could be invalidated by an insert. Both mechanisms are quite heavy.
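A sketch of the phantom itself; the goods table is assumed. At PostgreSQL's SERIALIZABLE level the scheduler mentioned above would instead abort one of the conflicting transactions with a serialization error:

    -- at READ COMMITTED
    BEGIN;
    SELECT min(price) FROM goods;   -- say 100
    -- another transaction: INSERT INTO goods (id, price) VALUES (42, 50); COMMIT;
    SELECT min(price) FROM goods;   -- 50: a row we had never seen has appeared
    COMMIT;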

Read skew

Read skew occurs when we work with several tables, the contents of which must change in concert.

Suppose we have tables representing posts and their meta-information:

[Figure: the posts table and its meta-information table]

One transaction reads from the tables, another changes them:

[Figure: T1 reads from both tables while T2 updates them and commits]

As a result, transaction T1 observes the post with title = Good and updated_by = T2, which is an inconsistent combination.

In essence, this is a non-repeatable read, only spread across several tables.

As a fix, T1 can hold locks on all rows it reads, which will prevent T2 from changing the information. In the case of MVCC, transaction T2 will be cancelled. Protecting against this anomaly can become important if we use cursors.
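The same interleaving sketched in SQL; the posts and posts_meta tables, their columns, and the concrete values are assumed from the figures:

    -- initial state: title = 'Good', updated_by = 'T0'
    -- T1:
    BEGIN;
    SELECT title FROM posts WHERE id = 1;                 -- 'Good' (the old value)

    -- T2:
    BEGIN;
    UPDATE posts      SET title      = 'Bad' WHERE id = 1;
    UPDATE posts_meta SET updated_by = 'T2'  WHERE post_id = 1;
    COMMIT;

    -- T1 continues (at READ COMMITTED each statement gets a fresh snapshot):
    SELECT updated_by FROM posts_meta WHERE post_id = 1;  -- 'T2' (the new value)
    COMMIT;
    -- T1 is left with title = 'Good' attributed to T2, a combination that never existed;
    -- SELECT ... FOR SHARE in T1, or running T1 at REPEATABLE READ, removes the skew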

Write skew

This anomaly is also easiest to explain with an example: suppose that in our system at least one doctor must always be on duty, but both doctors on duty decided to cancel their shifts:

[Figures: both doctors check that two are on duty, then each cancels their own shift]

The anomaly resulted in neither doctor being on duty. Why did it happen? Because each transaction checked a condition that the other transaction could violate, and due to isolation we did not see that change.

In essence, this is again a non-repeatable read. One option is for the selects to take locks on these records.
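A sketch of the doctors example in PostgreSQL-flavoured SQL; the doctors table with a boolean on_duty column and the doctors' names are assumed:

    -- T1 (first doctor):
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM doctors WHERE on_duty;   -- 2, so it seems safe to leave

    -- T2 (second doctor), starting before T1 commits:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM doctors WHERE on_duty;   -- also 2: T1's change is invisible

    -- T1:
    UPDATE doctors SET on_duty = false WHERE name = 'Alice';
    COMMIT;

    -- T2:
    UPDATE doctors SET on_duty = false WHERE name = 'Bob';
    COMMIT;
    -- nobody is left on duty, although each transaction checked the invariant;
    -- taking locks in the selects (e.g. SELECT ... FOR UPDATE) is one way out, as noted above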

Write skew and read skew are combinations of the previous anomalies. Consider another variant of write skew, which is essentially a phantom read. Take a table that holds employee names, their salaries, and the project they work on:

[Figures: two managers each check the project budget, then each adds an employee to the project]

As a result we get the following picture: each manager thought that his change alone would not exceed the budget, but together the personnel changes led to an overspend.

The cause of the problem is exactly the same as in the phantom read.
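The same pattern in SQL for the budget example; the employees table with name, salary, and project columns, and the concrete budget numbers, are assumed:

    -- T1 (first manager):
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT sum(salary) FROM employees WHERE project = 'X';  -- 900: another 100 still fits the budget of 1000

    -- T2 (second manager), starting before T1 commits:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT sum(salary) FROM employees WHERE project = 'X';  -- also 900

    -- T1:
    INSERT INTO employees (name, salary, project) VALUES ('Carol', 100, 'X');
    COMMIT;

    -- T2:
    INSERT INTO employees (name, salary, project) VALUES ('Dave', 100, 'X');
    COMMIT;
    -- the project is now at 1100: row locks cannot help, because the conflicting
    -- row did not exist at read time (a phantom); predicate locks or a
    -- serializable scheduler are needed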

Conclusions

Relaxing the transaction isolation level in a database is a trade-off between security and performance and should be chosen based on the potential risks to the business if an anomaly occurs.

Learn more about the course.

Source: habr.com
