About moving from Redis to Redis-cluster

Coming into a product that has been in development for more than a decade, it is not at all surprising to find outdated technologies in it. But what if in six months you have to handle ten times the load, and the cost of an outage grows hundreds of times over? In that case you need a seasoned highload engineer. For lack of one, I was entrusted with solving the problem. In the first part of the article I will tell how we moved from Redis to Redis-cluster, and in the second part I will give advice on how to start using the cluster and what to watch out for in operation.

Technology Choice

Is standalone Redis, in a configuration of one master and N slaves, really so bad? Why do I call it an obsolete technology?

No, Redis is not that bad... However, there are some shortcomings that cannot be ignored.

  • First, standalone Redis offers no disaster-recovery mechanism after a master crash. To solve this problem, we used a setup that automatically moved a VIP to a new master, changed the role of one of the slaves and re-pointed the rest. The mechanism worked, but it could not be called reliable: there were false positives, and it was a one-shot device, so after it fired, manual steps were needed to re-arm it.

  • Second, having only one master created a sharding problem. We had to build several independent “one master, N slaves” setups, distribute the databases among those machines by hand, and hope that tomorrow one of the databases would not swell so much that it had to be moved to a separate instance.

What are my options?

  • The most expensive and richest solution is Redis-Enterprise. This is a boxed solution with full technical support. Despite the fact that it looks ideal from a technical point of view, it did not suit us for ideological reasons.
  • redis-cluster. Out of the box, there is support for master failover and sharding. The interface is almost the same as the regular version. Looks promising, let's talk about pitfalls later.
  • Tarantool, Memcache, Aerospike and others. All of these tools do pretty much the same thing. But each has its drawbacks. We decided not to put all our eggs in one basket. We use Memcache and Tarantool for other tasks, and looking ahead, I will say that in our practice there were more problems with them.

Specificity of use

Let's take a look at what tasks we historically solved with Redis and what functionality we used:

  • Cache in front of requests to remote services such as 2GIS | golang

    GET SET MGET MSET "SELECT DB"

  • Cache in front of MySQL | PHP

    GET SET MGET MSET SCAN "KEY BY PATTERN" "SELECT DB"

  • The main storage for the service of working with sessions and coordinates of drivers | golang

    GET SET MGET MSET "SELECT DB" "ADD GEO KEY" "GET GEO KEY" SCAN

As you can see, no higher mathematics. What then is the difficulty? Let's look at each method separately.

  • GET, SET: write/read a single key. Nothing changes in the cluster.

  • MGET, MSET: write/read multiple keys. Cluster specifics: the keys end up on different nodes, and ready-made libraries can do multi-operations only within one node. Solution: replace MGET with a pipeline of N GET operations.

  • SELECT DB: choose the database to work with. Cluster specifics: multiple databases are not supported. Solution: put everything into one database and add prefixes to the keys.

  • SCAN: walk over all keys in the database. Cluster specifics: since we now have a single database, walking over all keys in the cluster is too expensive. Solution: maintain an invariant within one key and do HSCAN on that key, or give the operation up entirely.

  • GEO: operations on a geo key. Cluster specifics: a geo key is not sharded.

  • KEY BY PATTERN: key lookup by pattern. Cluster specifics: since we have a single database, the search would go over all keys in the cluster, which is too costly. Solution: give the operation up, or maintain the invariant, as with SCAN.
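To make the first two workarounds more concrete, here is a minimal sketch in Go on top of go-redis (a v8-style API with contexts is assumed; the addresses, the prefix and the key names are purely illustrative). It replaces MGET with a pipeline of single GET operations and emulates SELECT DB with a key prefix.

    package main

    import (
        "context"
        "fmt"

        "github.com/go-redis/redis/v8" // go-redis v8-style API assumed
    )

    // mgetViaPipeline emulates MGET on the cluster: one GET per key inside a
    // single pipeline, since the keys may live on different nodes.
    func mgetViaPipeline(ctx context.Context, rdb *redis.ClusterClient, keys ...string) ([]interface{}, error) {
        pipe := rdb.Pipeline()
        cmds := make([]*redis.StringCmd, len(keys))
        for i, k := range keys {
            cmds[i] = pipe.Get(ctx, k)
        }
        // Exec reports redis.Nil for missing keys; that is not a real error here.
        if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
            return nil, err
        }
        out := make([]interface{}, len(keys))
        for i, cmd := range cmds {
            if v, err := cmd.Result(); err == nil {
                out[i] = v // missing keys stay nil, mirroring MGET semantics
            }
        }
        return out, nil
    }

    func main() {
        ctx := context.Background()
        rdb := redis.NewClusterClient(&redis.ClusterOptions{
            Addrs: []string{"127.0.0.1:7000"}, // hypothetical cluster node
        })
        // "SELECT DB" replacement: a per-service key prefix instead of a database number.
        const prefix = "sessions:"
        _ = rdb.Set(ctx, prefix+"user:1", "payload", 0).Err()
        vals, _ := mgetViaPipeline(ctx, rdb, prefix+"user:1", prefix+"user:2")
        fmt.Println(vals)
    }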

redis vs redis-cluster

What do we lose and what do we gain when switching to a cluster?

  • Disadvantages:
    • We lose support for multiple databases. If we want to store logically unrelated data in one cluster, we have to resort to crutches in the form of key prefixes.
    • We lose all whole-database operations, such as SCAN, DBSIZE, clearing the database, etc.
    • Multi-operations become much harder to implement, because they may require access to several nodes.
  • Advantages:
    • Fault tolerance in the form of automatic master failover.
    • Sharding on the Redis side.
    • Data can be moved between nodes atomically and without downtime.
    • Capacity can be added and load redistributed without downtime.

I would conclude that if you do not need a high level of fault tolerance, moving to the cluster is probably not worth it, since it can be a non-trivial task. But if you are choosing from scratch between the standalone version and the cluster, you should choose the cluster: it is no worse and, in addition, it will take some of the headache off you.

Preparing for relocation

Let's start with the requirements for moving:

  • It must be seamless. A full stop of the service for 5 minutes does not suit us.
  • It should be as safe and gradual as possible. I want to keep some control over the situation; we do not want to flip everything over at once and then hover over the rollback button, praying.
  • Minimal data loss when moving. We understand that it will be very difficult to move atomically, so we allow some desynchronization between data in regular and clustered Redis.

Cluster Maintenance

Before the move, it is worth thinking about whether we can actually support the cluster:

  • Graphs. We use Prometheus and Grafana to graph CPU load, memory usage, the number of clients, the number of GET, SET and AUTH operations, and so on.
  • Expertise. Imagine that tomorrow you are responsible for a huge cluster. If it breaks, no one but you can fix it. If it starts to slow down, everyone will come running to you. If resources need to be added or the load redistributed, that is also on you. In order not to go gray at 25, it is advisable to anticipate these cases and check in advance how the technology behaves under various actions. We will talk about this in more detail in the Expertise section.
  • Monitoring and alerts. When the cluster breaks, you want to be the first to know about it. Here we limited ourselves to an alert that checks that all nodes return the same information about the state of the cluster (yes, it does happen otherwise); a sketch of such a check is shown right after this list. The remaining problems can be noticed quickly enough through the alerts of the services that use Redis.
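As an illustration of that last check, here is a rough Go sketch on top of go-redis (v8-style API assumed; how exactly you turn the result into an alert is up to your monitoring): it asks every node for CLUSTER INFO and complains if the nodes disagree about cluster_state.

    package monitoring

    import (
        "context"
        "fmt"
        "strings"
        "sync"

        "github.com/go-redis/redis/v8" // go-redis v8-style API assumed
    )

    // CheckClusterConsensus asks every node for CLUSTER INFO and returns an error
    // if the nodes disagree about cluster_state (a sign of a split or a stuck node).
    func CheckClusterConsensus(ctx context.Context, rdb *redis.ClusterClient) error {
        var mu sync.Mutex
        states := map[string][]string{} // cluster_state line -> nodes reporting it

        err := rdb.ForEachShard(ctx, func(ctx context.Context, node *redis.Client) error {
            info, err := node.ClusterInfo(ctx).Result()
            if err != nil {
                return err
            }
            for _, line := range strings.Split(info, "\r\n") {
                if strings.HasPrefix(line, "cluster_state:") {
                    mu.Lock()
                    states[line] = append(states[line], node.String())
                    mu.Unlock()
                }
            }
            return nil
        })
        if err != nil {
            return err
        }
        if len(states) > 1 {
            return fmt.Errorf("nodes disagree about the cluster state: %v", states)
        }
        return nil
    }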

Move

How will we move:

  • First of all, you need to prepare the library for working with the cluster. As a basis for the Go version we took go-redis and adapted it a little for ourselves: we implemented multi-methods through pipelines and slightly adjusted the rules for retrying requests. The PHP version had more problems, but we eventually settled on php-redis. It recently gained cluster support, and in our opinion it looks good.
  • Next, you need to deploy the cluster itself. This is done literally in two commands based on the configuration file. The configuration will be discussed in more detail below.
  • For a gradual move we use dry-mode. Since we have two versions of the library with the same interface (one for the regular version, the other for the cluster), it costs nothing to write a wrapper that works with the standalone version and simultaneously duplicates all requests to the cluster, compares the responses and writes discrepancies to the logs (in our case, to NewRelic). A sketch of such a wrapper is shown after this list. Thus, even if the cluster version breaks during the rollout, our production is not affected.
  • Having rolled the cluster out in dry-mode, we can calmly watch the graph of response discrepancies. If the share of errors slowly but surely converges to some small constant, then everything is fine. Why are there discrepancies at all? Because the write to the standalone version happens slightly earlier than the write to the cluster, and because of that micro-lag the data may diverge. It remains only to look through the discrepancy logs, and if all of them can be explained by the non-atomicity of the write, we can move on.
  • Now you can switch dry-mode in the opposite direction: we write to and read from the cluster, and duplicate everything into the standalone version. Why? Over the next week I want to watch how the cluster behaves. If it suddenly turns out that there are problems at peak load, or we failed to account for something, we always have an emergency rollback to the old code and up-to-date data, thanks to dry-mode.
  • It remains to disable dry-mode and dismantle the standalone version.
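This is not our actual code, but a minimal sketch of what such a dry-mode wrapper can look like in Go (the interface, the names and the GET/SET subset are illustrative): the primary store serves the traffic, the shadow store receives a copy of every call, and mismatches only go to the log.

    package dualwrite

    import "context"

    // Store is the minimal interface both library versions expose in this sketch.
    type Store interface {
        Get(ctx context.Context, key string) (string, error)
        Set(ctx context.Context, key, value string) error
    }

    // DryModeStore serves traffic from Primary and mirrors every call to Shadow,
    // logging discrepancies instead of failing the request.
    type DryModeStore struct {
        Primary Store // what production actually relies on
        Shadow  Store // the candidate store (the cluster during the first phase)
        Logf    func(format string, args ...interface{})
    }

    func (s *DryModeStore) Set(ctx context.Context, key, value string) error {
        err := s.Primary.Set(ctx, key, value)
        if shadowErr := s.Shadow.Set(ctx, key, value); shadowErr != nil {
            s.Logf("dry-mode SET %q: shadow error: %v", key, shadowErr)
        }
        return err // only the primary result is visible to the caller
    }

    func (s *DryModeStore) Get(ctx context.Context, key string) (string, error) {
        val, err := s.Primary.Get(ctx, key)
        if shadowVal, shadowErr := s.Shadow.Get(ctx, key); shadowErr != nil || shadowVal != val {
            s.Logf("dry-mode GET %q: primary=%q shadow=%q shadowErr=%v", key, val, shadowVal, shadowErr)
        }
        return val, err
    }

Switching dry-mode "in the opposite direction" then amounts to swapping Primary and Shadow.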

Expertise

First, a brief overview of how the cluster is organized.

First of all, Redis is a key-value store. Arbitrary strings are used as keys, while values can be numbers, strings and entire data structures. There are a great many of the latter, but that is not important for understanding the overall design.
The next level of abstraction above keys is slots (SLOTS). Each key belongs to one of 16384 slots, and each slot can contain any number of keys. Thus, all keys are split into 16384 disjoint sets.

Further, there must be N master nodes in the cluster. Each node can be thought of as a separate Redis instance that knows everything about the other nodes in the cluster. Each master node holds a number of slots, and each slot belongs to exactly one master node. All slots need to be distributed between the nodes: if some slots are left unallocated, the keys stored in them will be unavailable. It makes sense to run each master node on a separate logical or physical machine. It is also worth remembering that each node runs on a single core, so if you want to run several Redis instances on the same logical machine, make sure they run on different cores (we have not tried this, but in theory everything should work). Essentially, master nodes provide ordinary sharding, and more master nodes let you scale both writes and reads.
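For reference, a key is mapped to a slot as CRC16(key) mod 16384, and the mapping can always be checked with the CLUSTER KEYSLOT command; the numbers below are illustrative rather than real output:

    127.0.0.1:7000> CLUSTER KEYSLOT driver:42
    (integer) 3999        <- some slot in the range 0..16383
    127.0.0.1:7000> CLUSTER KEYSLOT {drivers}:42
    (integer) 1242        <- only the part inside {} is hashed (a "hash tag")
    127.0.0.1:7000> CLUSTER KEYSLOT {drivers}:43
    (integer) 1242        <- same slot, so both keys end up on the same node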

After all the keys are distributed over the slots, and the slots are scattered over the master nodes, an arbitrary number of slave nodes can be added to each master node. Inside each such “master-slave” bundle, normal replication will work. Slaves are needed for scaling read requests and for failover in the event of a master failure.

Now let's talk about the operations that you should be able to perform.

We will access the system through redis-cli. Since Redis Cluster has no single entry point, the following operations can be performed on any of the nodes. For each item I separately note whether the operation can be performed under load.

  • The first and most important thing we need is the cluster nodes operation. It returns the state of the cluster: the list of nodes, their roles, the allocation of slots, and so on. More details can be obtained with cluster info and cluster slots. (A command cheat sheet follows this list.)
  • It would be nice to be able to add and remove nodes. The cluster meet and cluster forget operations exist for this. Note that cluster forget must be applied to EVERY node, both masters and replicas, while cluster meet only needs to be called on one node. This asymmetry can be an unpleasant surprise, so it is best to learn about it before you put your cluster into production. Adding a node is safe in production and does not affect the operation of the cluster in any way (which is logical). If you are going to remove a node from the cluster, make sure no slots are left on it (otherwise you risk losing access to all the keys on that node). Also, do not delete a master that still has slaves, otherwise an unnecessary election of a new master will take place. If there are no slots on the nodes this is a minor problem, but why trigger extra elections when we can remove the slaves first.
  • If you need to forcibly swap a master and a slave, the cluster failover command will do. When calling it in production, keep in mind that the master will be unavailable during the operation. The switch usually takes less than a second, but it is not atomic, so you can expect some requests to the master to fail during that time.
  • Before removing a node from the cluster, no slots should remain on it. It is better to redistribute them with the cluster reshard command: the slots will be transferred from one master to another. The whole operation can take several minutes, depending on the amount of data being moved, but the transfer itself is safe and does not affect the operation of the cluster in any way. So all the data can be moved from one node to another right under load, without worrying about its availability. There are subtleties, however. First, the transfer puts a certain load on both the sending and the receiving node; if the receiving node's CPU is already heavily loaded, you should not burden it with receiving new data as well. Second, as soon as not a single slot remains on the sending master, all of its slaves immediately re-attach to the master that received those slots. The problem is that all these slaves will want to synchronize their data at once, and you will be lucky if it turns out to be a partial rather than a full sync. Keep this in mind and combine the transfer of slots with disabling or moving the slaves. Or hope that you have a sufficient margin of safety.
  • What if during the transfer you discover that slots have been lost somewhere? I hope this problem never affects you, but just in case there is the cluster fix operation. At the very least, it will reassign the slots to nodes, albeit in a random order. I recommend checking how it works in advance by removing a node that still holds allocated slots from the cluster: the data in unallocated slots is already unavailable, so it is too late to worry about the availability of those slots, while the operation does not touch the slots that are allocated.
  • Another useful operation is monitor. It lets you see, in real time, the whole stream of requests going to the node; you can also grep it to check whether the traffic you expect is really there.
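A possible cheat sheet for the operations above (redis-cli --cluster is the Redis 5+ syntax, older versions used redis-trib.rb; the addresses and node ID are placeholders):

    redis-cli -h 10.0.0.1 -p 7000 cluster nodes              # roles, slot ranges, node IDs
    redis-cli -h 10.0.0.1 -p 7000 cluster info                # overall cluster state
    redis-cli -h 10.0.0.1 -p 7000 cluster meet 10.0.0.9 7000  # add a node (one call is enough)
    redis-cli -h 10.0.0.1 -p 7000 cluster forget <node-id>    # repeat on EVERY remaining node
    redis-cli -h 10.0.0.5 -p 7000 cluster failover            # run on the replica to be promoted
    redis-cli --cluster reshard 10.0.0.1:7000                 # interactive slot migration
    redis-cli --cluster fix 10.0.0.1:7000                     # reassign lost or unallocated slots
    redis-cli -h 10.0.0.1 -p 7000 monitor | grep GET          # live request stream on one node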

Also worth mentioning is the master failover procedure. In short, it exists and, in my opinion, it works well. However, do not think that if you pull the power cord on the machine with the master node, Redis will switch instantly and the clients will not notice the loss. In my practice the switch takes a few seconds. During this time part of the data is unavailable: the unavailability of the master is detected, the nodes vote for a new one, the slaves are switched over, the data is synchronized. The best way to convince yourself that the scheme works is to run local exercises: set up a cluster on your laptop, give it a minimal load, simulate a crash (for example, by blocking ports) and evaluate the switching time. In my opinion, only after playing like this for a day or two can you be confident the technology will work. Well, or you can hope that software used by half the Internet probably works.
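One possible way to run such an exercise locally (the ports, the timings and the failure method are illustrative; on versions before Redis 5 the cluster was assembled with redis-trib.rb instead of redis-cli --cluster):

    # start six nodes on one machine
    for port in 7000 7001 7002 7003 7004 7005; do
      redis-server --port $port --cluster-enabled yes \
                   --cluster-config-file nodes-$port.conf --daemonize yes
    done
    # assemble them into a cluster of three masters and three replicas
    redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
              127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1
    # simulate a master failure (longer than cluster-node-timeout) and watch the promotion
    redis-cli -p 7000 debug sleep 30
    watch -n1 "redis-cli -p 7001 cluster nodes"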

Configuration

Often, the configuration is the first thing you need in order to start working with a tool. And once everything works, you do not want to touch the config anymore. It takes some effort to force yourself to go back to the settings and go through them thoroughly. In my memory we had at least two serious outages caused by inattention to the configuration. Pay special attention to the following points:

  • timeout 0
    The time (in seconds) after which inactive connections are closed; 0 means never close them.
    Not every one of our libraries was able to close connections correctly. By disabling this timeout we risk hitting the limit on the number of clients; on the other hand, if such a problem exists, automatically killing lost connections would mask it, and we might not notice. Also, do not enable this setting if you use persistent connections.
  • save X Y & appendonly yes
    Saving an RDB snapshot and the append-only file.
    RDB/AOF issues are discussed in detail below.
  • stop-writes-on-bgsave-error no & slave-serve-stale-data yes
    If stop-writes-on-bgsave-error is enabled, the master stops accepting writes when the RDB snapshot fails. With slave-serve-stale-data, a slave that has lost the connection to its master either keeps answering (yes) or stops (no).
    We are not fond of situations in which Redis turns into a pumpkin.
  • repl-ping-slave-period 5
    After this period of time, we will begin to worry that the master has broken and it would be time to carry out the failover procedure.
    You will have to manually find a balance between false positives and the launch of a failover. In our practice, this is 5 seconds.
  • repl-backlog-size 1024mb & repl-backlog-ttl 0
    This is how much data we can keep in the buffer for a replica that has dropped off. If the buffer runs out, a full resynchronization will be needed.
    Practice suggests setting a larger value. There are plenty of reasons why a replica might start lagging behind, and if it does, your master is most likely already struggling, and a full synchronization will be the last straw.
  • maxclients 10000
    The maximum number of simultaneous clients.
    In our experience it is better to set a larger value: Redis handles 10,000 connections just fine; just make sure there are enough sockets on the system.
  • maxmemory-policy volatile-ttl
    The policy by which keys are evicted when the memory limit is reached.
    What matters is not the specific policy but an understanding of how eviction will happen. Redis deserves credit for behaving normally when the memory limit is reached.
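Pulled together, the settings above would look roughly like the following redis.conf fragment (cluster-enabled and cluster-config-file are the usual extra directives for a cluster node and were not discussed above; the persistence lines depend on the decision from the next section):

    cluster-enabled yes
    cluster-config-file nodes.conf
    timeout 0
    save ""                          # or, for example, "save 900 1" if RDB snapshots are wanted
    appendonly no                    # or yes, see the RDB and AOF section below
    stop-writes-on-bgsave-error no
    slave-serve-stale-data yes
    repl-ping-slave-period 5
    repl-backlog-size 1024mb
    repl-backlog-ttl 0
    maxclients 10000
    maxmemory-policy volatile-ttl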

RDB and AOF issues

Although Redis itself stores all information in RAM, there is also a mechanism for saving data to disk. More specifically, there are three mechanisms:

  • An RDB snapshot is a complete snapshot of all data. It is configured with the save X Y directive, which reads as "save a full snapshot of all data every X seconds if at least Y keys have changed."
  • The append-only file is a list of operations in the order they are performed. New incoming operations are appended to the file every X seconds or every Y operations.
  • RDB and AOF - a combination of the two previous ones.

All methods have their advantages and disadvantages, I will not list them all, I will only pay attention to points that are not obvious, in my opinion.

First, saving an RDB snapshot requires a fork() call. With a lot of data this can freeze the whole Redis for anywhere from a few milliseconds to a second. In addition, the system needs to allocate memory for the snapshot, which means keeping a double reserve of RAM on the logical machine: if 8 GB is allocated to Redis, then 16 GB should be available on the virtual machine hosting it.

Second, there are problems with partial synchronization. In AOF mode, when a slave reconnects, a full synchronization may be performed instead of a partial one. Why this happens, I could not figure out, but it is worth keeping in mind.

These two points alone make you wonder whether we really need this data on disk if everything is duplicated on the slaves anyway. Data can only be lost if all the slaves fail at once, and that is a "fire in the data center" level of problem. As a compromise, you can save data only on the slaves, but then you need to make sure those slaves never become masters during disaster recovery (there is a slave-priority setting in their config for that). For ourselves, in each specific case we decide whether the data needs to be saved to disk, and most often the answer is "no".
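In config terms that compromise could look roughly like this (the pre-Redis-5 "slave" spelling of the directives is used, to match the rest of the article):

    # on the masters: no persistence at all
    save ""
    appendonly no
    # on the designated "backup" replica: snapshots, but it must never be promoted
    save 900 1
    slave-priority 0        # priority 0 means this replica is never elected master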

Conclusion

In conclusion, I hope I was able to give those who had not heard of redis-cluster at all a general idea of how it works, and to draw the attention of those who have been using it for a long time to some non-obvious points.
Thank you for your time and, as usual, comments on the topic are welcome.

Source: habr.com
