Odyssey roadmap: what else do we want from a connection pooler. Andrey Borodin (2019)

In this talk, Andrey Borodin explains how the experience of scaling PgBouncer informed the design of the Odyssey connection pooler, and how Odyssey was rolled out in production. He also discusses which pooler features they would like to see in future versions: the goal is not only to cover Yandex's own needs, but also to grow the Odyssey user community.

Hi everyone! My name is Andrey.

At Yandex, I work on open source databases. And today's topic is connection poolers.

If you know a good way to say "connection pooler" in Russian, tell me: I would really like to find a good technical term that could become established in the technical literature.

The topic is quite tricky, because in many databases the connection pooler is built in and you don't even need to know about it. There are some settings everywhere, of course, but in Postgres that is not how it works. Meanwhile, in a parallel session (at HighLoad++ 2019) Nikolai Samokhvalov is giving a talk on query tuning in Postgres. So I assume the people who came here have already tuned their queries perfectly; these are people facing the rarer, system-level problems related to the network and resource utilization. And in places it can get quite difficult, in the sense that the problems are not obvious.

Yandex runs Postgres. Many Yandex services live in Yandex.Cloud, and we have several petabytes of data generating at least a million requests per second against Postgres.

And we provide a fairly typical cluster for all services: a primary node, the usual two replicas (synchronous and asynchronous), backups, and scaling of read queries across the replicas.

Each cluster node runs Postgres, and alongside Postgres and the monitoring systems, a connection pooler is installed. The pooler is used both for fencing and for its primary purpose.

What is the main purpose of a connection pooler?

Postgres uses a process model for database connections. This means that one connection is one process, one Postgres backend. And that backend holds a lot of different caches, which are quite expensive to maintain separately for every connection.

Also, there is an array in the Postgres code called procArray. It holds core data about the backends. Almost all algorithms that process procArray have linear complexity: they scan the entire array. It is a fairly fast loop, but every additional connection makes each scan a bit more expensive, and when everything gets a bit more expensive, you end up paying a very high price for a large number of connections.

There are 3 possible approaches:

  • On the application side.
  • On the database side.
  • And between, that is, all possible combinations.

Unfortunately, a built-in pooler is still under development, mostly by our friends at Postgres Professional, and it is hard to predict when it will appear. So in practice an architect has two solutions to choose from: an application-side pool and a proxy pool.

An application-side pool is the easiest way. Almost all client drivers give you one: a way to represent the millions of connections in your code as a few dozen connections to the database.
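
As a sketch of the idea (not any particular driver's API), an application-side pool boils down to a fixed set of real connections handed out to callers; `connect` here is a placeholder for whatever your driver's connect call is:

```python
import queue

class SimplePool:
    """A tiny application-side connection pool (illustrative sketch).

    `connect` is a stand-in for your driver's connect call,
    e.g. a function returning a real database connection.
    """
    def __init__(self, connect, size=10):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free: many callers end up
        # sharing `size` real backend connections.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)
```

When the pool is exhausted, further callers simply wait, which is exactly how millions of logical connections collapse into a few dozen physical ones.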

The problem is that at some point you want to scale the application: you want to deploy it across many virtual machines.

Then you realize you also have several availability zones, several data centers. And the client-side pooling approach leads to big numbers, on the order of 10 000 connections, which is right at the edge of what can still work fine.

If we talk about proxy poolers, there are two that can do a great deal. They are not just poolers; they are poolers plus a lot of extra functionality: pgpool and Crunchy-Proxy.

But, unfortunately, not everyone needs that extra functionality. And it comes at the cost that these poolers only support session pooling, i.e. one outgoing server connection per incoming client.

That does not suit our workload, so we use PgBouncer, which implements transaction pooling, i.e. server connections are mapped to client connections only for the duration of a transaction.
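
For reference, this is how that mode is selected in PgBouncer's configuration (an illustrative fragment; `pool_mode` is the real setting name):

```ini
[pgbouncer]
; tie a server connection to a client only for the duration
; of a transaction, then return it to the pool
pool_mode = transaction
```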

And under our load, it holds up. But there are several problems.

The problems start when you want to diagnose a session, because all incoming connections to the database are local. Everyone arrives over loopback, and it becomes difficult to trace a session.

Of course, you can use application_name_add_host. This is the Bouncer-side way to add the client's IP address to application_name. But application_name is set with an additional query.

On this chart, the yellow line is real queries, and the blue line is the queries that actually fly into the database. The difference between them is precisely the setting of application_name, which is needed only for tracing, yet is not at all free.

In addition, Bouncer cannot limit a single pool, i.e. the number of database connections per user, per database.

What does this lead to? You have a loaded service written in C++, and somewhere nearby a small Node.js service that does nothing bad to the database, but its driver goes crazy. It opens 20 connections, and everything else has to wait, even though your code is correct.

Of course, we wrote a small patch for Bouncer that adds this setting, i.e. a per-pool client limit.

It would also be possible to do this on the Postgres side, i.e. limit the number of connections per role in the database.
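
On the Postgres side that limit is a standard role attribute; `app_user` here is a hypothetical role name:

```sql
-- Cap the number of concurrent connections for one role
ALTER ROLE app_user CONNECTION LIMIT 20;
```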

But then you lose the ability to understand why you cannot connect to the server. PgBouncer does not forward the connection error; it always returns the same message. And you cannot tell: maybe your password has changed, maybe the database went down, maybe something else is wrong. There is no diagnosis: if the session cannot be established, you will not know why.

At a certain point, you look at the graphs of the application and see that the application is not working.

You look at top and see that Bouncer is single-threaded. This is a turning point in the life of the service. You realize you had been preparing to scale the database in a year and a half, yet it is the pooler you need to scale.

We have come to the conclusion that we need more PgBouncers.

https://lwn.net/Articles/542629/

Bouncer has been slightly patched.

And they made it possible to bring up several Bouncers reusing the same TCP port, with the operating system automatically distributing incoming TCP connections among them round-robin.
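
The mechanism behind this, SO_REUSEPORT (described in the LWN article linked above), can be sketched in a few lines. This is a generic illustration, not PgBouncer's actual code, and it requires Linux 3.9 or newer:

```python
import socket

def reuseport_listener(port=0):
    # With SO_REUSEPORT (Linux >= 3.9), several processes can bind
    # the same TCP port, and the kernel load-balances incoming
    # connections between them.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s

# Two independent listeners sharing one port, as two Bouncers would:
first = reuseport_listener()
second = reuseport_listener(first.getsockname()[1])
```

Without the socket option, the second `bind` on the same port would fail with "address already in use".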

This is transparent to clients: it looks as if you have a single Bouncer, but your idle connections are fragmented across the running Bouncers.

And at some point you may notice that these 3 Bouncers each eat 100% of their core, and that you need quite a few more Bouncers. Why?

Because you have TLS. You have an encrypted connection. And if you benchmark Postgres with and without TLS, you will find that the number of established connections drops by almost two orders of magnitude with encryption enabled, because the TLS handshake consumes CPU resources.

And in top you can see quite a few cryptographic functions executing during a wave of incoming connections. Since our primary can switch between availability zones, a wave of incoming connections is a fairly typical situation: the old primary became unavailable for some reason, the entire load was sent to another data center, and everyone comes to say hello over TLS at the same time.

And a large number of TLS handshakes may no longer shake Bouncer's hand so much as wring its neck. A wave of incoming connections can fail to damp out because of timeouts: if clients retry against the database without exponential backoff, they will come back again and again as one coherent wave.
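
The standard client-side remedy is exponential backoff with jitter; here is a minimal sketch (parameter values are illustrative):

```python
import random

def backoff_delays(attempts, base=0.1, cap=30.0):
    """Exponential backoff with full jitter: delays grow as
    base * 2**n, capped at `cap`, and randomized so that clients
    disconnected together do not all reconnect in one coherent wave."""
    return [random.uniform(0, min(cap, base * 2 ** n))
            for n in range(attempts)]
```

The jitter spreads a synchronized wave of reconnects over time, which is exactly what the un-backed-off retries above fail to do.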

Here is an example of 16 PgBouncers that load 16 cores at 100%.

We arrived at cascading PgBouncers. This is the best configuration we could achieve for our load with Bouncer: the outer Bouncers handle the handshakes of incoming connections, while the inner Bouncers do the actual pooling, so that external connections are not fragmented too much.

In this configuration, a soft restart is possible. You can restart all these 18 Bouncers one by one. But maintaining such a configuration is quite difficult. System administrators, DevOps, and the people who are really responsible for this server will not be very happy with this scheme.

It would seem that all our improvements could be pushed to open source, but Bouncer does not accept them well. For example, the ability to run multiple PgBouncers on the same port was committed only a month ago, while the pull request with this feature had been open for several years.

https://www.postgresql.org/docs/current/libpq-cancel.html

https://github.com/pgbouncer/pgbouncer/pull/79

Or one more example. In Postgres, you can cancel a running query by sending a secret over another connection, without additional authentication. But some clients simply send a TCP reset, i.e. they break the network connection. What will Bouncer do with that? Nothing. It will keep executing the query. If you have received a huge number of connections that have buried the database under small queries, then simply dropping the connections at Bouncer is not enough; you also need to cancel the queries still running in the database.

We patched this too, and the pull request has still not been merged into Bouncer's upstream.

And so we came to the conclusion that we need our own connection pooler: one we can actively develop and patch, in which problems can be fixed quickly, and which, of course, must be multithreaded.

We set multithreading as the main task: we need to handle a wave of incoming TLS connections well.

To do this, we developed a separate library called Machinarium, which lets you describe the state machine of a network connection as sequential code. If you look at the libpq source code, you will see fairly complex calls that can return a result and say: "Call me back a bit later. Right now I am waiting on IO, but when the IO completes I will need CPU." It is a multi-level scheme. Network interaction is usually described by a state machine: a pile of rules like "if I previously received a packet header of size N, I am now waiting for N bytes" or "if I sent a SYNC packet, I am now waiting for a packet with result metadata". The result is rather difficult, counter-intuitive code, as if a maze had been converted into a line-by-line scan. We made it so that instead of a state machine, the programmer describes the main interaction path as ordinary imperative code, and only has to mark the places where execution should be suspended to wait for data from the network, handing the execution context to another coroutine (green thread). It is as if we wrote down the most likely path through the maze in a straight line and then added branches to it.
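
The difference can be illustrated with a toy framing protocol (a 1-byte length prefix followed by a payload). This is a generic sketch using Python generators, not Machinarium's API: the parsing logic reads top to bottom, and each `yield` is a suspension point instead of an explicit state:

```python
def frame_parser(collected):
    """Coroutine: send() it arbitrary byte chunks; complete frames
    (a 1-byte length prefix plus payload) land in `collected`.
    The suspension points (`yield`) replace explicit
    WAIT_HEADER / WAIT_PAYLOAD states."""
    buf = b""
    while True:
        # Straight-line step 1: wait until the header byte arrives.
        while len(buf) < 1:
            buf += yield
        n = buf[0]
        # Straight-line step 2: wait until the payload is complete.
        while len(buf) < 1 + n:
            buf += yield
        collected.append(buf[1:1 + n])
        buf = buf[1 + n:]
```

Priming the coroutine with `next()` and then sending the chunks `b"\x03ab"`, `b"c\x01"`, `b"x"` produces the frames `b"abc"` and `b"x"`, however the bytes happen to be split across network reads.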

As a result, we have one thread that accepts TCP connections and passes each TCP connection round-robin to one of many workers.

In this case, each client connection always runs on one processor. And this allows you to make it cache-friendly.

And besides, we have slightly improved the collection of small packets into one large packet in order to offload the system TCP stack.

In addition, we improved transaction pooling: Odyssey, when configured to, can send CANCEL and ROLLBACK when a client network connection fails. That is, if nobody is waiting for the result anymore, Odyssey tells the database not to keep executing a query that would only waste precious resources.

And whenever possible, we give the same server connection back to the same client. This avoids re-applying application_name_add_host: when possible, we skip the extra reset of the parameters that are needed for diagnostics.

We work in the interests of Yandex.Cloud. If you use managed PostgreSQL there, even with the connection pooler in front, you can set up logical replication outward, i.e. leave us if you want to, via logical replication. Bouncer will not let a logical replication stream through.

This is an example of setting up logical replication.
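
A standard logical-replication setup looks roughly like this (publication and subscription names, host, and user are placeholders; port 6432 assumes the pooler's conventional listening port):

```sql
-- On the source database (reached through the pooler):
CREATE PUBLICATION outbound_pub FOR ALL TABLES;

-- On the receiving cluster:
CREATE SUBSCRIPTION outbound_sub
    CONNECTION 'host=pooler.example port=6432 dbname=mydb user=repl_user'
    PUBLICATION outbound_pub;
```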

In addition, we support physical replication outward. In the Cloud, of course, it is not allowed, because the cluster would reveal too much information about itself. But in your own installations, if you need physical replication through the connection pooler, Odyssey can do it.

Odyssey's monitoring is fully compatible with PgBouncer. We have the same console, which executes almost all the same commands. If something is missing, send a pull request, or at least open an issue on GitHub, and we will add the needed commands. But the main functionality of the PgBouncer console is already there.

And of course we have error forwarding. We return the error reported by the database, so you get the reason why you could not get into the database, not just the fact that you could not.

This feature is disabled in case you need 100% compatibility with PgBouncer. We can behave like Bouncer, just in case.

Development

A few words about the Odyssey source code.

https://github.com/yandex/odyssey/pull/66

For example, there are the "Pause / Resume" commands. They are usually used to upgrade the database: if you need to run pg_upgrade, you pause Postgres in the connection pooler, do the pg_upgrade, then resume. From the client's point of view, it just looks as if the database slowed down for a while. This functionality was contributed by people from the community. At the time of the talk it had not made it in yet, but seemed close. (The pull request has since died.)

https://github.com/yandex/odyssey/pull/73 (this pull request has since died as well)

In addition, one of the newer features, SCRAM authentication support (also new in PgBouncer), was likewise contributed by someone who does not work at Yandex.Cloud. Both are complex and important pieces of functionality.

Therefore, I would like to tell you what Odyssey is made of, in case you also want to write some code now.

There is the core Odyssey codebase, which relies on two main libraries. The Kiwi library is an implementation of the Postgres message protocol: Postgres' native protocol 3, the standard messages that frontends and backends exchange, is implemented in Kiwi.

The Machinarium library implements the coroutines and threads. A small fragment of Machinarium is written in assembly, but don't worry: it's only about 15 lines.

Odyssey's architecture: there is a main machine running coroutines. This machine accepts incoming TCP connections and distributes them among the workers.

Within one worker, handlers for several clients can run. The main thread also runs the console and the cron tasks that evict connections no longer needed from the pool.

Odyssey is tested with the standard Postgres test suite: we simply run installcheck through Bouncer and through Odyssey and get a zero diff. There are several tests related to date formatting that fail in exactly the same way under both Bouncer and Odyssey.

In addition, there are many drivers that have their own testing. And we use their tests to test Odyssey.

Also, due to our cascading configuration, we have to test various bundles: Postgres + Odyssey, PgBouncer + Odyssey, Odyssey + Odyssey in order to be sure that if Odyssey is in any of the parts in the cascade, it also still works as expected.

Rakes

We use Odyssey in production, and it would not be fair to claim that everything just works. No — that is, yes, but not always. For example, in production everything just worked, and then our friends from Postgres Professional came and said we had memory leaks. There really were leaks; we fixed them. But that part was simple.

Then we found that a connection pooler has incoming TLS connections and outgoing TLS connections, and those connections need client certificates and server certificates.

Server certificates in Bouncer and Odyssey are served from the page cache, but the client certificate has to be re-read every time, and so our scalable Odyssey eventually ran into the system's performance for reading that certificate. This came as a surprise, because it did not hit the limit immediately: at first it scaled linearly, and the problem only showed up beyond roughly 20 000 simultaneous incoming connections.

Pluggable Authentication Modules (PAM) is a way to authenticate with built-in Linux facilities. In PgBouncer it is implemented with a separate thread that waits for PAM's response, while the main PgBouncer thread serves the current connections and hands clients over to the PAM thread when needed.

We did not implement it this way for one simple reason: we already have many threads, why would we need a dedicated one?

As a result, this can create a problem: if you have both PAM and non-PAM authentication, a large wave of PAM authentications can significantly delay the non-PAM ones. This is one of the things we have not fixed; if you would like to fix it, you can.

Another rake was the fact that we have a single thread accepting all incoming connections, which are then handed to the worker pool, where the TLS handshake takes place.

As a result, if a coherent wave of, say, 20 000 network connections arrives, they will all be accepted, and on the client side libpq will start reporting timeouts once its connection timeout expires.

If they cannot all get into the database at once, they may never get in at all, because the whole thing gets buried under retries without exponential backoff.

We ended up copying the PgBouncer scheme here, so that we throttle the number of TCP connections we accept.

If we see that we are accepting connections but they do not manage to complete the handshake in time, we park them in a queue so that they do not consume CPU resources. This means the handshake may not run for all arriving connections simultaneously, but at least someone will get into the database, even under quite heavy load.
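
The accept-throttling idea can be sketched as a small admission controller (a generic illustration, not Odyssey's actual data structures):

```python
import collections

class HandshakeThrottle:
    """Admit at most `limit` concurrent handshakes; connections
    accepted beyond that wait in a queue instead of burning CPU."""
    def __init__(self, limit):
        self.limit = limit
        self.active = 0
        self.queue = collections.deque()

    def on_accept(self, conn):
        # Returns the connection if it may start handshaking now,
        # or None if it was parked in the queue.
        if self.active < self.limit:
            self.active += 1
            return conn
        self.queue.append(conn)
        return None

    def on_handshake_done(self):
        # A slot freed up: hand it to the next parked connection,
        # if any; otherwise just release the slot.
        if self.queue:
            return self.queue.popleft()
        self.active -= 1
        return None
```

Only `limit` handshakes burn CPU at any moment; everything else sits in a plain queue, which is what keeps a coherent wave from starving every connection at once.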

Roadmap

What would you like to see in the future in Odyssey? What are we ready to develop ourselves and what do we expect from the community?

For August 2019.

This is what the Odyssey roadmap looked like in August:

  • We wanted SCRAM and PAM authentication.
  • We wanted forwarding of read-only queries to standbys.
  • We wanted online restart.
  • And the ability to pause on the server side.

Half of this roadmap is done, and not by us. And this is good. So let's discuss what's left and add more.

What about forwarding read-only queries to a standby? We have replicas that, if they execute no queries, simply heat the air; we need them for failover and switchover. In case of problems in one data center, we would like to give them some useful work. And we cannot provision them with different CPUs or different amounts of memory, because otherwise replication would not keep up.

In principle, starting with Postgres 10, you can specify target_session_attrs in the connection string: you list all the database hosts and state what you are connecting for, read-write or not. The driver itself will pick the first host on the list that satisfies the requested session attributes.
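
An illustrative libpq connection string (host and database names are placeholders):

```text
postgresql://host1,host2,host3/mydb?target_session_attrs=read-write
```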

But the problem with this approach is that it does not take replication lag into account. You may have a replica lagging by an amount unacceptable for your service. To make read queries on a replica fully usable, Odyssey needs to be able to refuse to serve reads when reading is not acceptable.

Odyssey would periodically ask the database for its replication lag behind the primary, and once the lag reaches the limit, stop admitting new queries, telling the client to re-establish its connections and perhaps pick another host. This lets the database quickly catch up on replication and then go back to answering queries.
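
On the replica side, the lag check itself is standard SQL (an approximate measure, since it relies on the timestamp of the last replayed transaction):

```sql
-- How far is replay behind the primary, roughly?
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```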

It is hard to commit to implementation dates, because it is open source, but I hope it will not take 2.5 years as it did for our colleagues with PgBouncer. This is the feature I would most like to see in Odyssey.

In the community, people have asked about prepared statement support. A prepared statement can be created in two ways today. First, you can execute the SQL command, PREPARE. To understand that SQL command, we would need to learn to understand SQL on the pooler side, which would be overkill: we would need an entire parser, and we cannot parse every SQL command.

But prepared statements also exist at the message-protocol level, in proto 3. There, the information that a prepared statement is being created arrives in structured form. We could track that a client has asked to create prepared statements on some server connection, and then, even after the transaction closes, keep that server and client tied together.

But here opinions diverge: some say the pooler should understand which prepared statements each client created and share a server connection among all the clients that created the same prepared statement on it.

Andres Freund suggested that if a client arrives that has already created such a prepared statement on another server connection, the pooler should simply create it for them. Executing queries in the database on the client's behalf seems a bit wrong, but from the point of view of a developer writing a database driver, it would be convenient to simply be handed a connection that already has the prepared statement.

And one more feature we need to implement. Our monitoring is compatible with PgBouncer, and we can return the average query execution time. But the average is the average temperature across the hospital: one patient is cold, another is feverish, and on average everyone is healthy. It is not the truth.

We need percentile support, which would reveal the slow queries that consume resources and make the monitoring far more useful.
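
As a quick illustration of why the mean hides tail latency, here is a sketch using Python's statistics module on made-up timings:

```python
import statistics

# Hypothetical query durations: nine fast queries, one slow one.
durations_ms = [1, 1, 1, 1, 1, 1, 1, 1, 1, 200]

mean = statistics.mean(durations_ms)        # looks harmless
# quantiles(..., n=100) yields the 1st..99th percentiles;
# index 94 is the 95th percentile.
p95 = statistics.quantiles(durations_ms, n=100)[94]
# The p95 is several times the mean: the slow query stands out
# in the percentile but is averaged away in the mean.
```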

The most important thing: I want version 1.0. (Version 1.1 has since been released.) The point is that Odyssey is currently at 1.0rc, a release candidate, and all the rakes I listed were fixed in exactly this version, except for the memory leak.

What will version 1.0 mean for us? We are rolling Odyssey out onto our databases. It is already running on some of them, but once it reaches the mark of a million requests per second, we will be able to say that this is the release version, the one that can be called 1.0.

Several people in the community have asked for pause and SCRAM to make it into version 1.0 as well. But that would mean rolling the next version out to production, because neither SCRAM nor pause has been merged yet. Most likely, though, this question will be resolved fairly quickly.

I am waiting for your pull requests. And I would also like to hear what problems you have with Bouncer; let's discuss them. Maybe we can implement some features you need.

This concludes my part, I would like to hear from you. Thank you!

Questions

If I set my own application_name, will it be forwarded correctly, including under transaction pooling, in Odyssey?

Odyssey or Bouncer?

In Odyssey. In Bouncer it is forwarded.

We will issue a SET.

And if my session hops across other server connections, will it be carried over?

We will issue a SET of all the parameters that are on the list. I cannot say offhand whether application_name is on that list; I think I have seen it there. We set all the same parameters: with a single query, a SET of everything the client set during startup.

Thank you, Andrey, for the talk! A good talk! I am glad that Odyssey is developing faster and faster, and I would like that to continue. We have already asked you for multi data-source connections, so that Odyssey could connect to several databases at once, i.e. the master and the replicas, and then automatically connect to the new master after a failover.

Yes, I seem to recall that discussion. Right now there can be several storages, but there is no switching between them. On our side, we would have to poll the server to check that it is still alive and detect that a failover has occurred; someone has to call pg_is_in_recovery. That is the standard way to tell that we did not land on the master. Or should we somehow infer it from errors? The idea is interesting and is being discussed. Write more comments. And if you have working hands that know C, that is simply wonderful.

The issue of scaling across replicas is also of interest to us, because we want to make the adoption of replicated clusters as simple as possible for application developers. But here I would like more comments, that is, how to do it, how to do it well.

The question is also about replicas. It turns out you have a master and several replicas. And clearly connections go to a replica less often than to the master, because the replicas may lag. You said the lag may be such that your business cannot tolerate it, so you stop going there until it replicates. But then, if you have not gone there for a long time and suddenly start, the data you need will not be immediately available: when we constantly hit the master, its cache is warmed up, while on the replica the cache lags a little behind.

Yes, that's true. There will be no needed data blocks in its page cache, no information about your tables in the relcache, no parsed queries in the plan cache — nothing at all.

And when you have some kind of cluster and you add a new replica to it, then while it starts up everything is bad on it: it is still growing its caches.

I see the idea. The right approach would be to first run a small percentage of queries on the replica, which would warm up its cache. Roughly speaking, we have a condition that we must lag the master by no more than 10 seconds. And that condition should not kick in as a single wave, but come on gradually, for some fraction of the clients.

Yes, gradually increasing the weight.

That is a good idea. But first we need to implement the cut-off itself: first learn to turn off, and then we will think about how to turn back on. Turning on smoothly is a great follow-up feature.

nginx has such a slow-start option for servers in a cluster: it ramps the load up gradually.

Yes, great idea; we will give it a try when we get there.

Source: habr.com
