PostgreSQL and connection-specific write consistency settings

The translation of the article was prepared specifically for the students of the course "Database". Are you interested in developing in this direction? We invite you to Open Day, where we talk in detail about the program, features of the online format, competencies and career prospects that await graduates after training.

PostgreSQL and connection-specific write consistency settings

PostgreSQL and connection-specific write consistency settings
We have to deal with many databases in Compose, which gives us the opportunity to get to know their functionality and shortcomings better. As we learn to love the features of new databases, we sometimes start to think how nice it would be if these features were present in more mature tools that we have been working with for a long time. One of the new features I wanted to see in PostgreSQL was configurable write-per-connection consistency across the entire cluster. And as it turns out, we already have it, and today we want to share with you information about how you can use it.

Why do I need it?

How the cluster should behave depends on your application. Take, for example, an application for paying bills. You will need XNUMX% consistency across the cluster, so you will have to enable synchronous commits so that your database waits for all changes to be made. However, if your application is a rapidly growing social network, then you will certainly prefer a fast response to XNUMX% consistency. To achieve this, you can use asynchronous commits on your cluster.

Meet Compromise

You have to make a trade-off between data consistency and performance. PostgreSQL moves away from consistency because the default configuration is predictable and free from unexpected surprises. And now let's get acquainted with the compromises.

Tradeoff 1: Performance

If a PostgreSQL cluster does not require consistency, it may well run asynchronously. The write is made to the cluster leader, and updates will be sent to its replicas after a few milliseconds. When a PostgreSQL cluster needs consistency, it must run synchronously. The write will be done at the cluster leader, which will send an update to the replicas and wait for confirmation that each has written before sending confirmation to the client that initiated the write that it was successful. The practical difference between these approaches is that the asynchronous method requires two network hops, while the synchronous method requires four.

Tradeoff 2: Consistency

The result in case of failure in the work of the leader in these two approaches will also be different. If the work is done asynchronously, then when such an error occurs, not all records will be committed by replicas. How much will be lost? Depends on the application itself and the efficiency of replication. Compose replication will prevent a replica from becoming a leader if the amount of information in it is 1 MB less than in the leader, that is, up to 1 MB of records can potentially be lost during asynchronous operation.

This does not happen in synchronous mode. If the leader fails, all replicas are updated, since any record acknowledged on the leader must be acknowledged in the replicas. Here it is - consistency.

Synchronous behavior makes sense in a bill-paying application where consistency has a distinct advantage in tradeoffs between consistency and performance. The most important thing for such an application is valid data. Now think about a social network, in which the main task is to keep the user's attention by responding to requests as quickly as possible. In such a case, performance with fewer network hops and fewer commit waits would be prioritized. However, the trade-off between performance and consistency is not the only one to think about.

Tradeoff 3: Failures

It is very important to understand how the cluster behaves during a failure. Consider a situation where one or more replicas fail. When commits are processed asynchronously, the leader will continue to function, that is, accept and process records without waiting for missing replicas. When the replicas return to the cluster, they catch up with the leader. With synchronous replication, if the replicas do not respond, then the leader has no choice but to continue waiting for the commit to be confirmed until the replica returns to the cluster and can accept and commit the write.

One connection per transaction?

Every application needs a specific type of combination of consistency and performance. Unless, of course, it's our bill-paying app, which we imagine to be completely consistent, or our near-ephemeral social network app. In all other cases, there will be times when some operations must be synchronous and some must be asynchronous. You may not want the system to wait until the message sent to the chat is committed, but if there is a payment in the same application, then you will have to wait.

All these decisions are, of course, made by the application developer. Making the right decisions about when to apply one or the other approach will help you get the most out of your cluster. It is important that the developer can switch between them at the SQL level for connections and for transactions.

Ensuring control in practice

PostgreSQL enforces consistency by default. This is controlled by the server setting synchronous_commit. By default it is in position on, but it has three other options: local, remote_write or off.

When setting the parameter to off stops all synchronous commits, even on the local system. The setting in local specifies synchronous mode for the local system, but writes to replicas are asynchronous. Remote_write goes even further: writes to replicas are made asynchronously, but are returned when the replica accepted the write but did not write it to disk.

Considering the available range of options, we choose a behavior and, keeping in mind that on are synchronous recordings, we will choose local for asynchronous network commits, while leaving local commits synchronous.

Now, we will tell you how to set it up in an instant, but imagine that we have set synchronous_commit Π² local for the server. We wondered if it was possible to change the parameter synchronous_commit on the fly, and it turned out that not only is it possible, there are even two whole ways for this. The first is to set your connection's session as follows:

SET SESSION synchronous_commit TO ON;  
// Your writes go here

All subsequent writes to the session will acknowledge writes to replicas before returning a positive result to the connected client. Unless of course you change the setting synchronous_commit again. You can skip part SESSION in the command as it will be in the default value.

The second way is good when you just want to make sure you get synchronous replication for a single transaction. In many databases of the "NoSQL" generation, the concept of transactions does not exist, but it does exist in PostgreSQL. In this case, you start a transaction and then set synchronous_commit Π² on before committing to a transaction. COMMIT will commit the transaction using any parameter value synchronous_commit, which was set at the time, although it's best to set the variable ahead of time to make sure other developers understand that writes are not asynchronous.

BEGIN;  
SET LOCAL synchronous_commit TO ON;  
// Your writes go here
COMMIT;  

All transaction commits will now be confirmed as written to replicas before the database returns a positive response to the connected client.

Configuring PostgreSQL

So far, we have imagined a PostgreSQL system with synchronous_commitset in local. For this to be realistic on the server side, you will need to set two server configuration options. Another option synchronous_standby_names will take effect when synchronous_commit will be in on. It determines which replicas are eligible for synchronous commits, and we will set it to *, which will mean that all replicas are involved. These values ​​are usually configured in configuration file by adding:

synchronous_commit = local  
synchronous_standby_names='*'

By setting the parameter synchronous_commit in value local, we are creating a system where local drives remain synchronous, but network replica commits are asynchronous by default. Unless, of course, we decide to make these commits synchronous, as shown above.

If you have been following the development Governor project, you may have noticed some recent changes (1, 2) that allowed Governor users to test these settings and monitor their consistency.

A few more words ...

Just a week ago, I would have told you that it is impossible to fine-tune PostgreSQL so finely. It was then that Kurt, a member of the Compose platform team, insisted that there was such a possibility. He calmed my objections and found in the PostgreSQL documentation following:

PostgreSQL and connection-specific write consistency settings

This setting can be changed at any time. The behavior for any transaction is determined by the setting in effect at commit. Therefore, it is possible and useful for some transactions to be committed synchronously and for others to be committed asynchronously. For example, to force one multistatement transaction to make commits asynchronously when the default value of the parameter is opposite, set SET LOCAL synchronous_commit TO OFF in a transaction.

With this small modification to the configuration file, we have given users the ability to control their consistency and performance.

Source: habr.com

Add a comment