Transactions and their control mechanisms

  Transactions

A transaction is a sequence of data operations that has a beginning and an end.

A transaction is a sequential execution of read and write operations. The end of a transaction can be either saving changes (commit) or canceling changes (rollback). In relation to the database, a transaction is several requests that are treated as a single request.

Transactions must satisfy the ACID properties

Atomicity. A transaction is either executed completely or not executed at all.

Consistency. At the end of the transaction, the restrictions imposed on the data (for example, constraints in the database) should not be violated. Consistency implies that the system will be transferred from one correct state to another correct one.

Isolation. Transactions running in parallel should not affect each other, for example, change the data that another transaction uses. The result of executing parallel transactions should be the same as if the transactions were performed sequentially.

Sustainability. Once committed, changes should not be lost.

Transaction log

The journal stores changes made by transactions, ensures data atomicity and resilience in case of system failure

The log contains the values ​​the data had before and after it was changed by the transaction. Write-ahead log strategy obliges to add to the log a record of the previous values ​​before the start, and about the end values ​​after the completion of the transaction. In the event of a sudden shutdown of the system, the database reads the log in reverse order and undoes the changes made by the transactions. Having met an interrupted transaction, the database executes it and makes changes about it in the log. While in the state at the time of the crash, the database reads the log in direct order and reverts the changes made by the transactions. Thus, the stability of transactions that have already been committed and the atomicity of an aborted transaction are preserved.

Simply re-executing erroneous transactions is not enough to recover.

Example. The user has $500 in his account and the user decides to withdraw them through an ATM. Two transactions are in progress. The first one reads the value of the balance and, if there are enough funds on the balance sheet, gives money to the user. The second subtracts the required amount from the balance. Let's say there was a system failure and the first operation was not executed, and the second was executed. In this case, we cannot reissue money to the user without returning the system to its original state with a positive balance.

Isolation levels

Reading Committed Data

The problem of dirty reading (Dirty Read) is that a transaction can read the intermediate result of another transaction.

Example. The initial value of the balance is $0. T1 adds $50 to the balance. T2 reads the balance value ($50). T1 cancels the changes and exits. T2 continues execution with incorrect balance data.

The solution is to read the fixed data (Read Committed), which prohibits reading the data changed by the transaction. If transaction A has changed some data set, then transaction B, when accessing this data, has to wait for transaction A to complete.

Repeatable Read

The problem of lost changes (Lost Updates). T1 saves changes on top of T2 changes.

Example. The initial value of the balance is $0 and two transactions simultaneously replenish the balance. T1 and T2 read the balance equal to $0. T2 then adds $200 to $0 and stores the result. T1 adds $100 to $0 and saves the result. The final result is $100 instead of $300.

Unrepeatable read problem. Repeated reading of the same data returns different values.

Example. T1 reads the balance value equal to $0. Then T2 adds $50 to the balance and ends. T1 re-reads the data and finds a discrepancy with the previous result.

Repeatable Read ensures that repeated reading will return the same result. Data read by one transaction is forbidden to be changed in others until the transaction is completed. If transaction A has read some data set, then transaction B, when accessing this data, has to wait for transaction A to complete.

Serializable Reading

Phantom Reads problem. Two queries that select data by some condition return different values.

Example. T1 requests the number of all users whose balance is greater than 0$ but less than 100$. T2 subtracts $1 from a user with a balance of $101. T1 resubmits the request.

Ordered reading (Serializable). Transactions are executed as fully sequential. It is forbidden to update and add records that fall under the conditions of the request. If transaction A has requested the data of the entire table, then the entire table is frozen for other transactions until transaction A completes.

Scheduler

Sets the order in which operations should be performed for concurrent transactions.

Provides a specified level of isolation. If the result of performing operations does not depend on their order, then such operations are commutative (Permutable). Read operations and operations on different data are commutative. The read-write and write-write operations are not commutative. The task of the scheduler is to interleave operations performed by parallel transactions so that the result of execution is equivalent to sequential execution of transactions.

Mechanisms for controlling parallel jobs (Concurrency Control)

Optimistic based on conflict detection and resolution, pessimistic based on conflict prevention

With an optimistic approach, multiple users have copies of the data at their disposal. The first one to complete the edit saves the changes, while the others must merge the changes. The optimistic algorithm allows the conflict to occur, but the system must recover from the conflict.

In a pessimistic approach, the first user to capture the data prevents others from receiving the data. If conflicts are rare, it is wise to choose the optimistic strategy, as it provides a higher level of concurrency.

Locking

If one transaction has locked the data, then other transactions must wait for the unlock when accessing the data.

A block can overlay a database, table, row, or attribute. Shared Lock can be imposed on the same data by several transactions, allows all transactions (including the imposed one) to read, prohibits modification and exclusive capture. Exclusive lock (Exclusive Lock) can be imposed by only one transaction, allows any actions of the imposed transaction, prohibits any actions of the others.

A deadlock is a situation where transactions are in a waiting mode that lasts indefinitely.

Example. The first transaction is waiting for the release of the data captured by the second, while the second is waiting for the release of the data captured by the first.

An optimistic solution to the deadlock problem allows the deadlock to occur, but then recovers the system by rolling back one of the transactions involved in the deadlock

Deadlocks are searched for at regular intervals. One of the detection methods is by time, that is, consider that a deadlock has occurred if the transaction is taking too long. When a deadlock is found, one of the transactions is rolled back, allowing other transactions involved in the deadlock to complete. Victim selection can be based on transaction value or seniority (Wait-Die and Wound-wait schemes).

Every transaction T timestamp is assigned TS containing the start time of the transaction.

Wait Die.

If TS(Ti) < TS(Tj)then Ti waiting otherwise Ti is rolled back and restarted with the same timestamp.

If a young transaction has acquired a resource and an older one is requesting the same resource, then the older transaction is allowed to wait. If an older transaction has acquired a resource, then the young transaction requesting that resource will be rolled back.

Wound-wait.

If TS(Ti) < TS(Tj)then Tj is rolled back and restarted with the same timestamp, otherwise Ti waiting.

If a younger transaction has acquired a resource and an older transaction requests the same resource, then the young transaction will be rolled back. If an older transaction has acquired a resource, then a younger transaction requesting that resource is allowed to wait. Victim selection based on seniority prevents deadlocks from occurring, but rolls back transactions that are not deadlocked. The problem is that transactions can be rolled back many times. an older transaction may hold onto the resource for a long time.

A pessimistic solution to the deadlock problem does not allow a transaction to start executing if there is a risk of a deadlock occurring.

To detect a deadlock, a graph is built (a waiting graph, wait-for-graph), the vertices of which are transactions, and the edges are directed from transactions awaiting the release of data to a transaction that captured this data. A deadlock is considered to have occurred if the graph has a loop. Building a wait graph, especially in distributed databases, is an expensive procedure.

Two-Phase Locking - Prevent deadlocks by capturing all resources used by a transaction at the beginning of the transaction and releasing them at the end

All blocking operations must precede the first unblocking one. It has two phases - Growing Phase, during which the grips are accumulated and Shrinking Phase, during which the grips are released. If it is impossible to capture one of the resources, the transaction starts over. It is possible that a transaction will not be able to capture the required resources, for example, if several transactions compete for the same resources.

Two-phase commit ensures that the commit is performed on all database replicas

Each database enters information about the data that will be changed in the log and responds to the OK coordinator (Voting Phase). After everyone has answered OK, the coordinator sends a signal obliging everyone to commit. After the commit, the servers respond OK, if at least one did not respond OK, then the coordinator sends a signal to cancel the changes to all servers (Completion Phase).

Timestamp method

An older transaction is rolled back when trying to access data that is involved in a younger transaction

Each transaction is assigned a timestamp TS corresponding to the start time of execution. If Ti older Tjthen TS(Ti) < TS(Tj).

When a transaction is rolled back, it is assigned a new timestamp. Each data object Q involved in a transaction is marked with two labels. W-TS(Q) is the timestamp of the youngest transaction that has successfully written to Q. R-TS(Q) is the timestamp of the youngest transaction that performed a read write over Q.

When the transaction T requests to read data Q two options are possible.

If TS(T) < W-TS(Q), that is, the data was updated by a younger transaction, then the transaction T rolls back.

If TS(T) >= W-TS(Q), then the reading is performed and R-TS(Q) becomes MAX(R-TS(Q), TS(T)).

When the transaction T requests data change Q two options are possible.

If TS(T) < R-TS(Q), that is, the data has already been read by a younger transaction, and if a change is made, a conflict will arise. transaction T rolls back.

If TS(T) < W-TS(Q), i.e. the transaction tries to overwrite a newer value, transaction T is rolled back. In other cases, the change is made and W-TS(Q) becomes equal TS(T).

No costly building of a waiting graph is required. Older transactions depend on newer ones, so there are no cycles in the wait graph. There are no deadlocks because transactions do not wait, but are immediately rolled back. Cascading rollbacks are possible. If Ti rolled back, and Tj read the data that changed Tithen Tj should roll back too. If at the same time Tj has already been committed, there will be a violation of the stability principle.

One solution to cascading rollbacks. The transaction performs all write operations at the end, and the remaining transactions must wait for this operation to complete. Transactions wait for a commit before being read.

Thomas write rule - a variation of the timestamp method in which data updated by a younger transaction is not allowed to be overwritten by an older one

Transaction T requests data change Q. If TS(T) < W-TS(Q), i.e. the transaction tries to overwrite a newer value, transaction T is not rolled back as in the timestamp method.

Source: habr.com

Add a comment