Why might you need semi-synchronous replication?

Hi all. Vladislav Rodin is in touch. I currently teach courses on Software Architecture and High-Stress Software Architecture at OTUS. In anticipation of the start of a new course flow "Architect of high loads" I decided to write a short piece of original material that I want to share with you.

Why might you need semi-synchronous replication?

Introduction

Due to the fact that the HDD can only perform about 400-700 operations per second (which is incomparable with typical rps on a high-load system), the classic disk database is the bottleneck of the architecture. Therefore, it is necessary to pay special attention to the scaling patterns of this storage.

Currently, there are 2 database scaling patterns: replication and sharding. Sharding allows you to scale the write operation and, as a result, reduce the rps per write per server in your cluster. Replication allows you to do the same thing, but with read operations. It is this pattern that this article is devoted to.

Replication

If you look at replication at a very high level, it’s a simple thing: you had one server, there was data on it, and then this server could no longer cope with the load of reading this data. You add a couple more servers, synchronize data across all servers, and the user can read from any server in your cluster.

Despite its apparent simplicity, there are several options for classifying various implementations of this scheme:

  • By roles in the cluster (master-master or master-slave)
  • By objects sent (row-based, statement-based or mixed)
  • According to the node synchronization mechanism

Today we will deal with point 3.

How does a transaction commit occur?

This topic is not directly related to replication; a separate article can be written on it, but since further reading is useless without understanding the transaction commit mechanism, let me remind you of the most basic things. A transaction commit occurs in 3 stages:

  1. Logging a transaction to the database log.
  2. Using a transaction in a database engine.
  3. Returning confirmation to the client that the transaction was successfully applied.

In different databases, this algorithm may have nuances: for example, in the InnoDB engine of the MySQL database there are 2 logs: one for replication (binary log), and the other for maintaining ACID (undo/redo log), while in PostgreSQL there is one log that performs both functions (write ahead log = WAL). But what is presented above is precisely the general concept, which allows such nuances not to be taken into account.

Synchronous (sync) replication

Let's add logic to replicate the received changes to the transaction commit algorithm:

  1. Logging a transaction to the database log.
  2. Using a transaction in a database engine.
  3. Sending data to all replicas.
  4. Receiving confirmation from all replicas that a transaction has been completed on them.
  5. Returning confirmation to the client that the transaction was successfully applied.

With this approach we get a number of disadvantages:

  • the client waits for the changes to be applied to all replicas.
  • as the number of nodes in the cluster increases, we decrease the likelihood that the write operation will be successful.

If everything is more or less clear with the 1st point, then the reasons for the 2nd point are worth explaining. If during synchronous replication we do not receive a response from at least one node, we roll back the transaction. Thus, by increasing the number of nodes in the cluster, you increase the likelihood that a write operation will fail.

Can we wait for confirmation from only a certain percentage of nodes, for example, from 51% (quorum)? Yes, we can, but in the classic version, confirmation from all nodes is required, because this is how we can ensure complete data consistency in the cluster, which is an undoubted advantage of this type of replication.

Asynchronous (async) replication

Let's modify the previous algorithm. We will send data to the replicas “sometime later”, and “sometime later” the changes will be applied to the replicas:

  1. Logging a transaction to the database log.
  2. Using a transaction in a database engine.
  3. Returning confirmation to the client that the transaction was successfully applied.
  4. Sending data to replicas and applying changes to them.

This approach leads to the fact that the cluster works quickly, because we do not keep the client waiting for the data to reach the replicas and even be committed.

But the condition of dumping data onto replicas “sometime later” can lead to the loss of a transaction, and to the loss of a transaction confirmed by the user, because if the data did not have time to be replicated, a confirmation to the client about the success of the operation was sent, and the node to which the changes arrived crashed HDD, we lose the transaction, which can lead to very unpleasant consequences.

Semisync replication

Finally we get to semi-synchronous replication. This type of replication is not very well known or very common, but it is of considerable interest because it can combine the advantages of both synchronous and asynchronous replication.

Let's try to combine the 2 previous approaches. We won’t keep the client for long, but we will require that the data be replicated:

  1. Logging a transaction to the database log.
  2. Using a transaction in a database engine.
  3. Sending data to replicas.
  4. Receiving confirmation from the replica that the changes have been received (they will be applied “sometime later”).
  5. Returning confirmation to the client that the transaction was successfully applied.

Please note that with this algorithm, transaction loss occurs only if both the node receiving the changes and the replica node fail. The probability of such a failure is considered low, and these risks are accepted.

But with this approach there is a possible risk of phantom reads. Let's imagine the following scenario: in step 4, we did not receive confirmation from any replica. We must roll back this transaction and not return a confirmation to the client. Since the data was applied in step 2, there is a time gap between the end of step 2 and the rollback of the transaction, during which parallel transactions can see changes that should not be in the database.

Lose-less semisync replication

If you think a little, you can just reverse the steps of the algorithm and fix the problem of phantom reads in this scenario:

  1. Logging a transaction to the database log.
  2. Sending replica data.
  3. Receiving confirmation from the replica that the changes have been received (they will be applied “sometime later”).
  4. Using a transaction in a database engine.
  5. Returning confirmation to the client that the transaction was successfully applied.

Now we commit changes only if they have been replicated.

Hack and predictor Aviator

As always, there are no ideal solutions; there is a set of solutions, each of which has its own advantages and disadvantages and is suitable for solving different classes of problems. This is absolutely true for choosing a mechanism for synchronizing data in a replicated database. The set of advantages that semi-synchronous replication has is sufficiently solid and interesting that it can be considered worthy of attention, despite its low prevalence.

That's all. See you at course!

Source: habr.com

Add a comment