Distributed locks using Redis

Hey Habr!

Today we bring to your attention the translation of an in-depth article on implementing distributed locks with Redis, and we invite you to discuss the prospects of Redis as a topic. An analysis of the Redlock algorithm considered here, written by Martin Kleppmann, author of the book "Designing Data-Intensive Applications", can be found here.

Distributed locks are a very useful primitive used in many environments where different processes need to work on shared resources in a mutually exclusive manner.

There are a number of libraries and posts that describe how to implement a DLM (Distributed Lock Manager) with Redis, but each library takes a different approach, and the guarantees many of them provide are rather weak compared to what is achievable with a slightly more complex design.

In this article, we try to describe a more or less canonical algorithm for implementing distributed locks with Redis. We will talk about an algorithm called Redlock; it implements a distributed lock manager and, in our opinion, is safer than the usual single-instance approach. We hope that the community will analyze it, give feedback and use it as a starting point for more complex or alternative designs.

Implementations

Before proceeding to the description of the algorithm, we give several links to ready-made implementations. They can be used for reference.

Safety and Liveness Guarantees

We are going to model our project with just three properties that we think provide the minimum guarantees needed to use distributed locks effectively.

  1. Safety property: Mutual exclusion. At any given time, only one client can hold a lock.
  2. Liveness property A: No deadlocks. It is always possible to acquire a lock eventually, even if the client that locked a resource crashes or gets partitioned from the rest of the network.
  3. Liveness property B: Fault tolerance. As long as the majority of Redis nodes are running, clients are able to acquire and release locks.

Why a failover-based implementation is not enough in this case
To understand what we are going to improve, let's analyze the current state of affairs that has developed with most libraries for distributed locks based on Redis.

The easiest way to lock a resource with Redis is to create a key in an instance. Usually the key is created with a limited time to live, using the key expiration feature of Redis, so that sooner or later it is released (property 2 in our list). When the client needs to release the resource, it deletes the key.

At first glance, this solution seems to work, but there is a problem: our architecture has a single point of failure. What happens if the Redis master fails? Let's add a replica, then, and fail over to it if the master is unavailable. Unfortunately, this option is not viable: with it, we cannot correctly implement the mutual exclusion property that we need for safety, because replication in Redis is asynchronous.

Obviously, a race condition occurs in such a model:

  1. Client A acquires a lock on the master.
  2. The master fails before the write of the key is propagated to the replica.
  3. The replica is promoted to master.
  4. Client B acquires a lock on the same resource that is already locked by A. SAFETY VIOLATION!

Sometimes it is perfectly normal that in special circumstances, such as a failure, many clients can hold a lock at the same time. In such cases, a replication-based solution can be applied. Otherwise, we recommend the solution described in this article.

Correct implementation with a single instance

Before trying to overcome the shortcomings of the single-instance configuration described above, let's look at how to do things correctly in this simple case, since a single-instance solution is actually acceptable in applications where an occasional race condition is tolerable, and because locking on a single instance is the building block of the distributed algorithm described here.

To acquire a lock, do the following:

SET resource_name my_random_value NX PX 30000

This command sets the key only if it does not already exist (NX option), with an expiration of 30000 milliseconds (PX option). The key is set to the value "my_random_value". This value must be unique across all clients and all lock requests.
Basically, the random value is used to release the lock safely, with a script telling Redis to delete the key only if it exists and the value stored in it is exactly the one we expect. This is achieved with the following Lua script:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

This is important to avoid removing a lock that was created by another client. For example, a client might acquire the lock, then get blocked on some operation that lasts longer than the lock's validity time (so that the key expires), and later remove a lock that has in the meantime been acquired by another client.
Using a simple DEL is unsafe, because a client could remove a lock held by another client. With the script above, by contrast, every lock is "signed" with a random string, so only the client that originally set it can remove it.

What should this random string be? We assume it is 20 bytes taken from /dev/urandom, but there are cheaper ways to make the string unique enough for your purposes. For example, seeding RC4 from /dev/urandom and generating a pseudo-random stream from it would work fine. A simpler solution is a combination of the unix timestamp at microsecond resolution plus a client ID; it is not as safe, but is probably up to the task in most contexts.

The time we use as the key's time to live is called the "lock validity time". It is both the time after which the lock is automatically released and the time a client has to complete its operation before another client may be able to acquire the lock, without technically violating the mutual exclusion guarantee, which is limited to a given window of time starting from the moment the lock is acquired.
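
To tie these pieces together, here is a minimal single-instance sketch in Python using the redis-py client. The connection parameters and helper names are our own assumptions for illustration; they are not part of the original article.

import uuid
import redis  # redis-py client, assumed to be installed

r = redis.Redis(host="localhost", port=6379)  # assumed local instance

RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire(resource, ttl_ms=30000):
    # The random token "signs" the lock so only its owner can release it.
    token = uuid.uuid4().hex
    # Equivalent to: SET resource token NX PX 30000
    if r.set(resource, token, nx=True, px=ttl_ms):
        return token
    return None

def release(resource, token):
    # Delete the key only if it still holds our token (the Lua script above).
    return r.eval(RELEASE_SCRIPT, 1, resource, token) == 1

A client that gets a token back from acquire() holds the lock until either the TTL expires or it calls release() with that same token.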

So we've discussed a good way to acquire and release a lock. The system (if we are talking about a non-distributed system consisting of a single and always available instance) is secure. Let's extend this concept to a distributed system where we don't have such guarantees.

Redlock algorithm

The distributed version of the algorithm assumes that we have N Redis masters. These nodes are completely independent of each other, so we do not use replication or any other implicit coordination system. We have already covered how to safely acquire and release a lock on a single instance, and we take for granted that the algorithm uses exactly this method on each single instance. In our examples we set N to 5, which is a perfectly reasonable value, so we will need to run 5 Redis masters on different machines or virtual machines to ensure that they fail largely independently of each other.

To acquire a lock, the client performs the following operations (a rough code sketch of these steps follows the list):

  1. Gets the current time in milliseconds.
  2. Sequentially attempts to acquire the lock on all N instances, using the same key name and random value in all cases. During this step, when setting the lock on each instance, the client uses a timeout that is small compared to the total time after which the lock is automatically released. For example, if the auto-release time is 10 seconds, the timeout could be in the ~5-50 millisecond range. This prevents the client from staying blocked for a long time trying to reach a failed Redis node: if an instance is unavailable, we should move on to the next instance as soon as possible.
  3. Computes how much time elapsed in order to acquire the lock, by subtracting the timestamp obtained in step 1 from the current time. If and only if the client was able to acquire the lock on the majority of instances (at least 3), and the total time elapsed to acquire the lock is less than the lock validity time, the lock is considered acquired.
  4. If the lock was acquired, its validity time is taken to be the initial validity time minus the time elapsed, as computed in step 3.
  5. If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances, or the validity time turned out to be negative), it tries to unlock all the instances (even the instances it believed it was not able to lock).
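
As a rough illustration of these five steps, here is a sketch in Python built on the single-instance primitives above. The hostnames, the clock-drift allowance and the timeout values are assumptions made for the example, not prescriptions from the article.

import time
import uuid
import redis  # redis-py, assumed

# Five hypothetical independent masters; the hostnames are placeholders.
SERVERS = [redis.Redis(host="redis%d.example.internal" % i, port=6379,
                       socket_timeout=0.05)  # short per-instance timeout (step 2)
           for i in range(1, 6)]
QUORUM = len(SERVERS) // 2 + 1
CLOCK_DRIFT_FACTOR = 0.01  # assumed allowance for clock drift between processes

# RELEASE_SCRIPT below is the Lua release script from the single-instance sketch above.

def unlock_all(resource, token):
    # Step 5: try to release on every instance, even ones we think we failed to lock.
    for server in SERVERS:
        try:
            server.eval(RELEASE_SCRIPT, 1, resource, token)
        except redis.RedisError:
            pass

def redlock_acquire(resource, ttl_ms=10000):
    token = uuid.uuid4().hex
    start = time.monotonic()                              # step 1: sample the current time
    locked = 0
    for server in SERVERS:                                # step 2: try each instance in turn
        try:
            if server.set(resource, token, nx=True, px=ttl_ms):
                locked += 1
        except redis.RedisError:
            pass  # unreachable instance: move on to the next one
    elapsed_ms = (time.monotonic() - start) * 1000        # step 3: time spent acquiring
    drift_ms = ttl_ms * CLOCK_DRIFT_FACTOR + 2
    validity_ms = ttl_ms - elapsed_ms - drift_ms          # step 4: remaining validity
    if locked >= QUORUM and validity_ms > 0:
        return token, validity_ms
    unlock_all(resource, token)                           # step 5: failed, unlock everywhere
    return None, 0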

Is the algorithm asynchronous?

This algorithm relies on the assumption that, although there is no synchronized clock shared by all processes, the local time in each process still flows at approximately the same rate, with an error that is small compared to the total time after which the lock is automatically released. This assumption closely resembles ordinary computers in the real world: every computer has a local clock, and we can usually expect the clock drift between computers to be small.

At this point we need to state our mutual exclusion rule more precisely: mutual exclusion is guaranteed only if the client holding the lock finishes its work within the lock validity time (the value obtained in step 3), minus some additional time (just a few milliseconds in total, to compensate for clock drift between processes).

The following interesting article tells more about such systems that require the coordination of a time difference: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.

Retry on failure

When a client fails to acquire the lock, it must retry after a random delay; this desynchronizes multiple clients trying to acquire the lock for the same resource at the same time (which could otherwise lead to a "split-brain" situation in which nobody wins). Also, the faster a client attempts to acquire the lock on the majority of Redis instances, the narrower the window in which a split-brain situation can occur (and the less need for retries). Therefore, ideally, the client should try to send the SET commands to the N instances at the same time, using multiplexing.
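
For example, using the redlock_acquire sketch above, a retry loop with random jitter might look like this (the retry count and delay bounds are arbitrary assumptions):

import random
import time

def acquire_with_retry(resource, ttl_ms=10000, retries=3):
    # Retry after a random delay so that competing clients desynchronize.
    for _ in range(retries):
        token, validity_ms = redlock_acquire(resource, ttl_ms)
        if token is not None:
            return token, validity_ms
        time.sleep(random.uniform(0.05, 0.2))
    return None, 0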

It is worth emphasizing how important it is that clients that fail to acquire the majority of locks release the (partially) acquired locks, so that there is no need to wait for key expiry before the lock on the resource can be acquired again (though if a network partition happens and the client can no longer reach the Redis instances, an availability penalty has to be paid while waiting for the keys to expire).

Lock release

Releasing the lock is a simple operation: just release it on all instances, regardless of whether the client believes it was able to successfully lock a given instance.

Safety Considerations

Is the algorithm safe? Let's try to examine what happens in different scenarios.

First, let's assume that a client was able to acquire the lock on the majority of instances. Each of the instances will contain a key with the same time to live. However, each key was set at a different moment, so they will expire at different times. But if the first key was set at worst at time T1 (the time we sample before contacting the first server) and the last key was set at worst at time T2 (the time at which the reply from the last server was obtained), then we are sure that the first key to expire will exist for at least MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT. All the other keys will expire later, so we are sure that all the keys will be simultaneously valid for at least this time.

During the time that most of the keys remain valid, no other client will be able to acquire the lock, since N/2+1 SET NX operations cannot succeed if N/2+1 keys already exist. Therefore, once a lock has been acquired, it is not possible to reacquire it at the same time (this would violate the mutual exclusion property).
However, we want to make sure that multiple clients trying to acquire a lock at the same time cannot succeed at the same time.

If a client locked the majority of instances using a time close to, or greater than, the maximum lock validity time, it will consider the lock invalid and will unlock the instances. So we only need to consider the case in which a client was able to lock the majority of instances in less than the validity time. In this case, by the argument above, for the duration MIN_VALIDITY no client should be able to re-acquire the lock. Thus, multiple clients could lock N/2+1 instances at the same time (where "time" is the end of step 2) only if the time needed to lock the majority was greater than the TTL, which would make the lock invalid.

Can you provide a formal proof of safety, point out existing similar algorithms, or find a bug in the above?

Liveness Considerations

The availability of a system depends on three main characteristics:

  1. Automatic release of the lock (because the keys expire): eventually the keys will be available again to be used for locks.
  2. The fact that clients usually help each other by removing locks when the desired lock was not acquired, or was acquired and the work was completed; so it's likely that we won't have to wait for the keys to expire to reacquire the lock.
  3. The fact that when a client needs to retry to acquire a lock, it waits for a comparatively longer time than the period required to acquire most locks. This reduces the likelihood of a split-brain situation in competition for resources.

However, we pay an availability penalty equal to the TTL on network partitions, so if partitions occur back to back, this penalty can be paid indefinitely. This happens every time a client acquires a lock and then gets partitioned away from the instances before being able to release it.

In principle, if network partitions keep occurring without interruption, the system can remain unavailable for an unbounded period of time.

Performance, crash recovery and fsync

Many people use Redis because they need high performance from the lock server, both in terms of the latency of acquiring and releasing locks and in terms of the number of acquire/release operations that can be performed per second. To meet this requirement, the strategy for talking to the N Redis servers in order to reduce latency is definitely multiplexing (or "poor man's multiplexing": putting the sockets in non-blocking mode, sending all the commands, and reading all the replies later, assuming that the round-trip time between the client and each instance is similar).
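
As a sketch of this idea using redis-py's asyncio support (the hostnames and timeouts are again placeholder assumptions), the SET commands can be sent to all instances concurrently and the replies gathered afterwards:

import asyncio
import redis.asyncio as aioredis  # asyncio API of redis-py, assumed available

# Placeholder hostnames for the N masters.
CLIENTS = [aioredis.Redis(host="redis%d.example.internal" % i, port=6379,
                          socket_timeout=0.05) for i in range(1, 6)]

async def set_everywhere(resource, token, ttl_ms):
    # Send SET ... NX PX to all instances concurrently; count successful locks.
    results = await asyncio.gather(
        *(c.set(resource, token, nx=True, px=ttl_ms) for c in CLIENTS),
        return_exceptions=True)
    return sum(1 for r in results if r is True)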

However, there is also the matter of data persistence if we want a model that is robust against crashes and recovery.

Basically, to see the problem, let's assume we configure Redis with no persistence at all. A client manages to lock 3 out of 5 instances. One of the instances the client locked is then restarted; at this point there are again 3 instances that can be locked for the same resource, and another client can lock it again, violating the safety property of lock exclusivity.

If we enable append-only file persistence (AOF), the situation improves considerably. For example, we can upgrade a server by sending it the SHUTDOWN command and restarting it. Because expiration in Redis is semantically implemented so that time effectively keeps flowing even while the server is off, all our requirements hold, but only as long as the shutdown is a clean one. What about a power outage? If Redis is configured, as it is by default, to fsync to disk every second, it is possible that after a restart our key will be missing. In theory, if we want to guarantee lock safety across any kind of instance restart, we should enable fsync=always in the persistence settings. This in turn will ruin performance down to the level of the CP systems that are traditionally used to implement distributed locks safely.
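
The relevant persistence settings are the appendonly and appendfsync directives in redis.conf. As a sketch, they can also be changed at runtime through redis-py (on the local test instance we assume here):

import redis

r = redis.Redis(host="localhost", port=6379)
# Equivalent to "appendonly yes" and "appendfsync always" in redis.conf:
# fsync on every write keeps lock keys across a power loss, at a large
# cost in write performance.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "always")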

But the situation is better than it seems at first glance. In principle, the safety of the algorithm is preserved as long as an instance that restarts after a crash no longer participates in any lock that is currently active.

To ensure this, we just need to make sure that after a crash the instance stays unavailable for a period slightly longer than the maximum TTL we use, so that all the keys that were active at the time of the crash have time to expire and be released automatically.

With delayed restarts it is basically possible to achieve safety even without any kind of persistence in Redis. Note, however, that this may translate into an availability penalty. For example, if the majority of instances crash, the system will become globally unavailable for the TTL time (and no resource can be locked during this time).

Making the algorithm more available: extending the lock

If the work performed by clients consists of small steps, it is possible to use a smaller default lock validity time and implement a mechanism to extend locks. In principle, if the client is in the middle of a computation and the lock validity is getting dangerously low, it can send a Lua script to all instances that extends the TTL of the key if the key still exists and its value is still the random value obtained when the lock was acquired.
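
A sketch of such an extension script, sent to each instance via redis-py (our own illustration; the helper name is an assumption):

import redis

EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""

def extend(server, resource, token, ttl_ms):
    # Extend the TTL only if the key still exists and still holds our token.
    try:
        return server.eval(EXTEND_SCRIPT, 1, resource, token, ttl_ms) == 1
    except redis.RedisError:
        return False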

The client should only consider the lock re-acquired if it was able to extend it on the majority of instances within the validity time (the algorithm is basically very similar to the one used when acquiring the lock).

However, this does not technically change the algorithm, so the maximum number of lock extension attempts should be limited, otherwise one of the liveness properties is violated.

Source: habr.com
