Hadoop. ZooKeeper" from the Mail.Ru Group Technostrim series "Methods for distributed processing of large amounts of data in Hadoop"

I suggest reading the transcript of the lecture "Hadoop. ZooKeeper" from the series "Methods for distributed processing of large amounts of data in Hadoop".

What ZooKeeper is and its place in the Hadoop ecosystem. The fallacies of distributed computing. The layout of a standard distributed system. The difficulty of coordinating distributed systems. Typical coordination problems. The principles behind the design of ZooKeeper. The ZooKeeper data model. Znode flags. Sessions. The client API. Primitives (configuration, group membership, simple locks, leader election, locking without the herd effect). ZooKeeper architecture. The ZooKeeper DB. ZAB. The Request Processor.


Today we will talk about ZooKeeper. It is a very useful thing. Like any Apache Hadoop product, it has a logo; it depicts a person.

So far we have mainly talked about how data can be processed and stored, that is, how to use it and work with it. Today I would like to talk a little about building distributed applications. ZooKeeper is one of the things that makes this easier: it is a service designed to coordinate the interaction of processes in distributed systems and distributed applications.

The need for such applications grows every day; that is what our course is about. On the one hand, MapReduce as a ready-made framework hides this complexity and frees the programmer from writing primitives for process interaction and coordination. On the other hand, nobody guarantees that you will never have to do it anyway: MapReduce and other ready-made frameworks do not cover every case. Besides, MapReduce itself and a number of other Apache projects are, in fact, distributed applications too. To make them easier to write, ZooKeeper was created.

Like many Hadoop-related projects, it was developed at Yahoo! and is now an official Apache project. It is not as actively developed as HBase: if you go to the HBase JIRA, every day there are lots of bug reports and optimization proposals, i.e., life in the project never stops. ZooKeeper, on the other hand, is a relatively simple product, and that simplicity is part of what makes it reliable. It is also quite easy to use, which is why it has become a standard for applications within the Hadoop ecosystem. So I thought it would be useful to review it to understand how it works and how to use it.


This is a slide from one of our earlier lectures. You could say it is orthogonal to everything we have considered so far. Everything shown here works with ZooKeeper to one degree or another, i.e., it is a service that all these products use. Neither HDFS nor MapReduce writes its own similar service that would work specifically for it; they use ZooKeeper. It simplifies development and a number of things related to error handling.


Where does all this come from? It would seem that you launch two applications in parallel on different computers, connect them with a cable or put them on a network, and everything works. But the problem is that the network is unreliable. If you sniff the traffic or look at what happens at a low level, at how clients interact over the network, you can often see packets being lost and resent. It is not for nothing that protocols like TCP were invented, which establish a session and guarantee message delivery. But even TCP cannot always save you. Everything has a timeout. The network may simply drop out for a while, or just blink. All this means you cannot rely on the network being reliable. This is the main difference from writing parallel applications that run on one computer or on a single supercomputer, where there is no network and where there is a more reliable data exchange bus in memory. This is the fundamental difference.

Among other things, when using the network there is always some latency. A disk has it too, but the network has more. Latency is a delay, which can be small or quite significant.

The network topology changes. Topology is how our network equipment is laid out: there are data centers, racks standing in them, switches in the racks. All of this can be reconnected, moved, and so on, and all of it must be taken into account. IP addresses change, the routing our traffic goes through changes. This also needs to be taken into account.

The network can also change in terms of equipment. From practice I can say that our network engineers are very fond of periodically updating something on the switches. Suddenly new firmware comes out, and they are not particularly interested in some Hadoop cluster. They have their own work; for them the main thing is that the network works. Accordingly, they want to re-upload something, to reflash their hardware, and the hardware itself also changes periodically. All of this must be taken into account somehow, because it all affects our distributed application.

Usually, people who start working with large amounts of data believe for some reason that the network is unlimited. If there is a file of several terabytes, then surely you can pull it to your server or computer, open it with cat and look through it. Another mistake is viewing logs in Vim. Never do that, because it is bad: Vim tries to buffer everything and load it all into memory, especially when we start navigating through the log and searching for something. These are things that get forgotten but are worth keeping in mind.


It's easier to write one program that runs on one computer with one processor.

When our system grows, we want to parallelize all of this, and not only on one computer but across a cluster. The question arises: how do we coordinate it? Our applications may not even interact with each other; we just launched several processes in parallel on several servers. How do we monitor that everything is going well for them? For example, they send something over the network. They must write their state somewhere, for example, to a database or to a log, and then that log has to be aggregated and analyzed somewhere. Plus, we need to consider that a process was working and working, then suddenly an error appeared in it or it crashed. How quickly will we find out about it?

It is clear that all of this can be monitored quickly. That is also good, but monitoring is limited: it only lets you observe things at a fairly high level.

When we want our processes to start interacting with each other, for example, send some data to each other, then the question also arises - how will this happen? Will there be some kind of race condition, will they overwrite each other, will the data arrive correctly, is something lost along the way? It is necessary to develop some kind of protocol, etc.

Coordinating all these processes is not trivial. It forces the developer to drop down a level and write these mechanisms either from scratch or almost from scratch, and that is not easy.

If you came up with a cryptographic algorithm, or even implemented one, throw it away right away, because it most likely will not work for you. It will most likely contain a bunch of errors you forgot to foresee. Never use it for anything serious, because it will most likely be unstable. The algorithms that do exist have been tested by time for a very long time, and the community has been hunting their bugs. That is a separate topic, but the same applies here: if you can avoid implementing process synchronization yourself, it is better not to do it, because it is quite difficult and leads you down the shaky path of endlessly hunting for errors.

Today we are talking about ZooKeeper. On the one hand, it is a framework, on the other hand, it is a service that makes life easier for the developer and makes it as easy as possible to implement the logic and coordinate our processes.


Recall what a standard distributed system might look like. This is what we talked about: HDFS, HBase. There is a Master process that manages the workers, the slave processes. It is responsible for coordinating and distributing tasks, restarting workers, launching new ones, and balancing the load.


A more advanced option is a Coordination Service, i.e., moving the coordination task itself into a separate process, plus running some kind of backup or standby Master in parallel, because the Master may crash. If the Master goes down, our system stops working, so we run a backup. The state the Master holds needs to be replicated to the backup, and this can also be entrusted to the Coordination Service. But in this scheme the Master itself still coordinates the workers, while the service coordinates data replication.


A more advanced option is when the service does all the coordination, as is usually done. It takes responsibility for making sure everything works, and if something does not work, we find out about it and try to work around the situation. In any case, we still have the Master, which interacts with the slaves in some way and can send data, information, messages, and so on through this service.


There is an even more advanced scheme, where we have no dedicated Master: all nodes are both master and worker, differing only in behavior. They still need to interact with each other, so there is still a certain service coordinating these actions. Cassandra, which works on this principle, probably fits this scheme.

It is difficult to say which of these schemes works best. Each has its pros and cons.


And there is no need to be afraid of schemes with a Master, because, as practice shows, it is not actually so prone to failing all the time. The main thing is to host this process on a separate, sufficiently powerful node with enough resources, and, if possible, keep users away from it so that they do not accidentally kill the process. At the same time, in such a scheme it is much easier to manage the workers from the Master process, i.e., this scheme is simpler in terms of implementation.


And this scheme (above) is probably more complicated, but more reliable.


The main problem is partial failures. For example, we send a message over the network and a failure occurs: the sender will not know whether the message arrived and what happened on the receiving side, nor whether it was processed correctly, i.e., it will not receive any confirmation.

Accordingly, we must handle this situation. The simplest thing is to resend the message and wait until we receive a response. In this case, nothing takes into account whether the receiver's state has changed: we may resend the message and end up adding the same data twice.

ZooKeeper offers ways to deal with such failures, which also makes life easier for us.


As mentioned earlier, this is similar to writing multi-threaded programs, but the main difference is that in distributed applications that we build on different machines, the only way to interact is the Network. In essence, this is a shared-nothing architecture. Each process or service that runs on one machine has its own memory, its own disk, its own processor, which it does not share with anyone.

If we are writing a multi-threaded program on one computer, then we can use shared memory to exchange data. We have a context switch there, processes can switch. This affects performance. On the one hand, there is no such thing in the program on the cluster, but there are problems with the Network.


Accordingly, one of the main problems that arises when writing distributed systems is configuration. We are writing some application. If it is simple, we hardcode all sorts of numbers in the code, but this is inconvenient: if we decide that instead of a half-second timeout we want a one-second timeout, we need to recompile the application and roll it all out again. It is one thing when it is on one machine, where you can just restart it, and another when we have many machines and have to constantly copy everything. We should try to make the application configurable.

Here we are talking about static configuration for our processes. It is static not exactly in the operating-system sense, but static for our processes: configuration that cannot simply be taken and updated on the fly.

And there is also dynamic configuration: parameters that we want to change on the fly and have picked up immediately.

What is the problem here? We updated the config and rolled it out, so what? The problem may be that we rolled out the config, but forgot about one node, where the old config remained. Or, while we were rolling it out, the configuration got updated in some places but not in others, and some processes of our application restarted with the new config while others still run the old one. As a result, our distributed application is not in a consistent state in terms of configuration. This problem is common, and it is even more relevant for dynamic configuration, since that is by definition supposed to change on the fly.

Another issue is group membership. We always have some set of workers, and we always want to know which of them are alive and which are dead. If there is a Master, it must know which workers clients can be directed to, so that they start computations or work with data, and which cannot. The problem of knowing who is currently working in our cluster arises constantly.

Another typical problem is leader election, when we want to know who is in charge. One example is replication, when we have a process that accepts write operations and then replicates them to other processes. That process is the leader, and all the others follow it. The leader must be chosen so that everyone agrees on who it is, and so that it never happens that two leaders emerge at once.

There is also mutually exclusive access. Here the problem is more complex. There is such a thing as a mutex, when you write multi-threaded programs and want access to some resource, for example, to a memory cell, to be limited and carried out by only one thread. Here, the resource could be something more abstract. And different applications from different nodes of our Network should only receive exclusive access to this resource, and not so that everyone can change it or write something there. These are the so-called locks.

ZooKeeper allows you to solve all these problems to one degree or another. And I will show with examples how it allows you to do this.


ZooKeeper has no blocking primitives. When we use a primitive, it does not block waiting for some event to occur; most operations work asynchronously, so processes do not hang around waiting for something. This is a very useful property.

All client requests are processed in the order of the general queue.

And clients can receive a notification about a change in some state, about data changes, before they see the changed data itself.


ZooKeeper can work in two modes. The first is standalone, on a single node; this is handy for testing. It can also work in cluster mode, on any number of servers. If we have a cluster of 100 machines, it does not have to run on all 100: it is enough to pick a few machines to run ZooKeeper on. It follows the high-availability principle: ZooKeeper keeps a full copy of the data on every running instance (I will explain later how it does this). It does not shard or partition the data. On the one hand, this is a minus, because we cannot store a lot; on the other hand, we do not need to. It is not designed for that, it is not a database.

Data can be cached on the client side. This is a standard principle, so that we do not hammer the service with the same requests. A smart client usually knows about this and caches data locally.

For example, something has changed. There is some application; we elected a new leader that is responsible, say, for processing write operations, and we want to replicate the data. One option is to poll in a loop, constantly asking the service whether anything has changed. The second option is more optimal: the watch mechanism, which notifies clients that something has changed. It is cheaper in terms of resources and more convenient for clients.


A client is whoever uses ZooKeeper.

Server is the ZooKeeper process itself.

Znode is the core feature of ZooKeeper. All znodes are stored in memory by ZooKeeper and organized in a hierarchical scheme, in the form of a tree.

There are two types of operations. The first is update/write, when an operation changes the state of our tree. The tree is shared by all clients.

Also, a client does not just execute a single request and disconnect: it establishes a session, through which it interacts with ZooKeeper.


ZooKeeper's data model resembles a file system. There is a standard root, and under it something like directories descending from the root: first level, second level, and so on. All of these are znodes.

Each znode can store some data, usually not very large, for example, 10 kilobytes. And each znode can have some number of children.


Znodes come in several types. When creating a znode, we specify which type it should be.

There are two flags. The first is the ephemeral flag: the znode lives only within a session. For example, a client has established a session, and as long as that session is alive, the znode exists. This is useful so as not to leave stale nodes behind, and for cases when we only need to store data for primitives within the session.

The second is the sequential flag: ZooKeeper appends an incrementing counter to the znode's path. For example, under an application node, the first node we create receives the name p_1, the second p_2, and so on. Every time we call this method we pass the path with only the name prefix, and the number is appended and incremented automatically, because we specified the sequential node type.

A regular (persistent) znode lives until it is explicitly deleted and has exactly the name we give it.
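As an illustration, here is a minimal sketch of creating znodes of each type with the standard Java client. The connection string and the paths (/app, /app/alive, /app/p_) are hypothetical, and a real client would wait for the session to reach the connected state before issuing requests:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTypesExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string; in a real client we would wait for
        // the SyncConnected event before issuing any requests.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });

        // Regular (persistent) znode: lives until explicitly deleted
        // (throws NodeExistsException if it is already there).
        zk.create("/app", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Ephemeral znode: removed automatically when this session ends.
        zk.create("/app/alive", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Sequential znode: ZooKeeper appends an incrementing counter,
        // so repeated calls yield /app/p_0000000000, /app/p_0000000001, ...
        String first = zk.create("/app/p_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        String second = zk.create("/app/p_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println(first + " " + second);

        zk.close();
    }
}
```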


Another useful thing is the watch flag. If we set it, the client can subscribe to certain events for a particular node; I will show later with an example how this is done. ZooKeeper itself notifies the client that the data on the node has changed. Notifications do not carry the new data, they only say that something has changed, so to get the new data you will still have to make a separate call.

And as I said, data sizes are on the order of kilobytes. There is no need to store large text data there, because this is not a database; it is a server for coordinating actions.


I'll tell you a little about the sessions. If we have several servers, then we can transparently move from server to server using the session ID. It's pretty convenient.

Each session has a timeout. A session stays alive as long as the client sends something to the server; if it transmitted nothing within the timeout, the session expires. The client can also close the session itself.


The API does not have that many calls, but you can do a lot with it. The create call we saw creates a znode and takes three parameters: the path to the znode, which must be fully qualified from the root; the data we want to put there; and the type flag. After creation, it returns the path to the znode.

The second call is delete. The trick here is that as the second parameter, in addition to the path to the znode, you can specify a version. The znode will only be deleted if the version we passed matches the one it actually has.

If we do not want to check the version, we simply pass "-1" as the argument.
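A small sketch of both variants of delete with the Java API; the path /app/config and the expected version are hypothetical, and zk is assumed to be an already connected handle:

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DeleteExample {
    // Delete only if the node is still at expectedVersion.
    static void deleteIfUnchanged(ZooKeeper zk, int expectedVersion)
            throws KeeperException, InterruptedException {
        try {
            zk.delete("/app/config", expectedVersion);
        } catch (KeeperException.BadVersionException e) {
            // Someone modified the node since we last read it; handle the conflict here.
        }
    }

    // Passing -1 as the version skips the check entirely.
    static void deleteUnconditionally(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        zk.delete("/app/config", -1);
    }
}
```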


The third call, exists, checks whether a znode exists. It returns true if the node exists and false otherwise.

It also takes a watch flag, which lets you set a watch on this node.

You can set this flag even on a non-existent node and get a notification when it appears. This is also useful.

Another call is getData. Naturally, it returns the data stored in a znode. It can also take the watch flag, but in this case the watch will not be set if the node does not exist, so you first need to make sure it exists and only then fetch the data.
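A sketch of that pattern, assuming a connected handle and a hypothetical /app/config node: check existence with a watch first, then read the data with its own watch:

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ReadExample {
    static byte[] readIfPresent(ZooKeeper zk) throws KeeperException, InterruptedException {
        // exists() works even for nodes that are not there yet: with watch=true
        // the default watcher fires a NodeCreated event once the node appears.
        Stat stat = zk.exists("/app/config", true);
        if (stat == null) {
            return null; // not there yet; we will be notified when it is created
        }
        // getData() requires the node to exist; the watch set here fires on the
        // next data change (NodeDataChanged) or on deletion.
        return zk.getData("/app/config", true, stat);
    }
}
```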


There is also setData. Here we also pass a version, and if we do, the data is updated only if the znode is at that version.

You can also specify "-1" to exclude this check.

Another useful method is getChildren: it returns the list of all child znodes of a given node. It can also be watched by setting the watch flag.

And the sync method waits for pending changes to propagate, thereby ensuring that they have been applied and that subsequent reads see fully up-to-date data.
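A short sketch of setData, getChildren and sync with the Java client; note that in the Java API sync is asynchronous and takes a callback. The paths and the config value are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class WriteAndListExample {
    static void update(ZooKeeper zk) throws KeeperException, InterruptedException {
        byte[] value = "timeout=1000".getBytes(StandardCharsets.UTF_8);

        // Version -1 disables the optimistic-concurrency check, as in the lecture.
        Stat stat = zk.setData("/app/config", value, -1);
        System.out.println("new version: " + stat.getVersion());

        // List children and leave a watch that fires on NodeChildrenChanged.
        List<String> children = zk.getChildren("/app", true);
        System.out.println("children: " + children);

        // sync() is asynchronous in the Java API: it asks the server we are
        // connected to to catch up with the leader before we read further.
        zk.sync("/app", (rc, path, ctx) -> { /* called when the sync completes */ }, null);
    }
}
```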

If you draw an analogy with conventional programming: when you use a method such as write, which writes something to disk, and it returns a response, there is no guarantee that the data is actually on disk. Even when the operating system is sure everything has been written, the disk itself has mechanisms where the data passes through levels of buffers, and only after that is it placed on the platters.


Mostly asynchronous calls are used. This allows the client to work on different requests in parallel. You can use the synchronous approach, but it is less productive.

The two classes of operations we talked about are update/write operations, which change data (create, setData, sync, delete), and read operations (exists, getData, getChildren).


Now a few examples of how you can build primitives for a distributed system, for example ones related to configuration. A new worker has appeared: we added a machine and started the process. Three questions arise. How does it request its configuration from ZooKeeper? If we want to change the configuration, how do we change it? And after we have changed it, how do the workers we already have pick it up?

In ZooKeeper, this can be done relatively easily. For example, there is our znode tree. There is a node for our application, and inside it we create an additional node that contains the configuration data. These may or may not be separate parameters. Since znodes are small and configurations are usually small too, it is quite possible to store the configuration here.

The worker uses the getData method to read its configuration from this node, with the watch flag set to true. If this node does not exist for some reason, we will be informed when it appears, or when it changes. If we want to know that something has changed, we set the flag to true, and when the data in this node changes, we will find out about it.

To update the configuration, we call setData with version "-1", i.e., we do not check the version: we assume there is always exactly one configuration and we do not need to store many of them. If you did need to store many, you would have to add another level of nodes. Here we consider it the only one, so we simply overwrite the latest and skip the version check. At that moment, all clients that subscribed earlier receive a notification that something has changed in this node. The notification does not carry the data itself, only the fact of the change, so after receiving it they must request the new data again.
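Putting this recipe together, here is a rough sketch of a watcher-based configuration reader, assuming a hypothetical /app/config znode and an already connected handle; the watch is re-armed every time the data is re-read:

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConfigWatcher implements Watcher {
    // Hypothetical znode holding the whole configuration as one small blob.
    private static final String CONFIG_PATH = "/app/config";
    private final ZooKeeper zk;

    ConfigWatcher(ZooKeeper zk) { this.zk = zk; }

    // Worker side: read the config and re-arm the watch in one call.
    String readConfig() throws KeeperException, InterruptedException {
        byte[] data = zk.getData(CONFIG_PATH, this, null);
        return new String(data, StandardCharsets.UTF_8);
    }

    // Administrator side: push a new config; version -1 means "no version check".
    void updateConfig(String value) throws KeeperException, InterruptedException {
        zk.setData(CONFIG_PATH, value.getBytes(StandardCharsets.UTF_8), -1);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                // The notification carries no payload, only the fact of a change,
                // so we must read the data again (which also re-registers the watch).
                System.out.println("config changed: " + readConfig());
            } catch (KeeperException e) {
                // log and retry in a real application
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```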


The second use of this primitive is group membership. We have a distributed application with a bunch of workers, and we want to know that they are all in place. So they must register themselves as working in our application, and we also want to learn about all the currently active workers, either from the Master process or from somewhere else.

How do we do it? For the application we create a workers node, and each worker adds a child under it using the create method. (There is an error on my slide: you need to specify the sequential flag here, then each worker will be created with its own number.) Then the application, by requesting the children of this node, gets all the workers that are currently active. A sketch of this recipe follows below.
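Here is a minimal sketch, assuming a hypothetical /app/workers parent node. It uses the EPHEMERAL_SEQUENTIAL mode, which combines the sequential numbering mentioned above with automatic removal of a worker's node when its session ends:

```java
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class GroupMembership {
    // Hypothetical parent node under which every live worker registers itself.
    private static final String WORKERS_PATH = "/app/workers";

    // Called by each worker on startup: the ephemeral node disappears together
    // with the worker's session, so the list below always reflects live workers.
    static String register(ZooKeeper zk, String workerName)
            throws KeeperException, InterruptedException {
        return zk.create(WORKERS_PATH + "/" + workerName + "-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }

    // Called by the Master (or anyone else): list the members and leave a watch
    // that fires when the set of children changes.
    static List<String> listWorkers(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        return zk.getChildren(WORKERS_PATH, true);
    }
}
```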


Here is a rough implementation of how this can be done in Java. Let's start from the end, with the main method. This is our class; we create an instance and call its methods. The first argument is the host we connect to, passed on the command line, and the second argument is the name of the group.

How does the connection work? This is a simple example of the API in use, and everything is relatively straightforward. There is the standard ZooKeeper class: we pass it the hosts and set a timeout, for example 5 seconds. We also have a member called connectedSignal. Then we essentially create the group at the path passed in. We do not write any data there, although we could. The node here is of type persistent, i.e., an ordinary regular node that will exist all the time. This is where the session is created; it is handled by the client implementation itself. Our client will periodically report that the session is alive, and when we finish we call close and the session ends. This way, if something crashes on our side, ZooKeeper finds out about it and terminates the session.
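The slide itself is not reproduced here, so below is a rough reconstruction of the code being described: a class that connects with a 5-second timeout, waits on a connectedSignal latch until the session is established, creates a persistent group node, and closes the session. The class and method names are assumptions:

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateGroup implements Watcher {
    private static final int SESSION_TIMEOUT = 5000; // 5 seconds, as in the lecture
    private ZooKeeper zk;
    private final CountDownLatch connectedSignal = new CountDownLatch(1);

    void connect(String hosts) throws IOException, InterruptedException {
        // The constructor returns immediately; the session is established
        // asynchronously, so we block until we see SyncConnected.
        zk = new ZooKeeper(hosts, SESSION_TIMEOUT, this);
        connectedSignal.await();
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.SyncConnected) {
            connectedSignal.countDown();
        }
    }

    void create(String groupName) throws KeeperException, InterruptedException {
        // A plain persistent node with no data: it will outlive this session.
        String path = zk.create("/" + groupName, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println("Created " + path);
    }

    void close() throws InterruptedException {
        zk.close(); // ends the session explicitly
    }

    public static void main(String[] args) throws Exception {
        CreateGroup group = new CreateGroup();
        group.connect(args[0]);   // first argument: ZooKeeper host(s)
        group.create(args[1]);    // second argument: group name
        group.close();
    }
}
```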


How do we lock a resource? Here things are a bit more complicated. We have a set of workers and some resource we want to lock. To do this, we create a separate node, for example called lock1. If we managed to create it, we have acquired the lock. If we could not create it, the worker tries to read it with getData, and since the node already exists, we put a watcher on it; the moment the state of this node changes, we will know about it and can try to recreate it in time. If we took the lock, then once we no longer need it we simply give it up, because the node exists only within the session and will disappear with it. Then another client, within another session, will be able to take a lock on this node, or rather it will receive a notification that something has changed and can try to grab it in time.
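A sketch of this simple lock, with a hypothetical /app/lock1 node. It arms the watch with exists() rather than getData(), which avoids a race if the node disappears between the failed create and the read; otherwise the logic follows the description above:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleLock {
    // Hypothetical lock node; the name lock1 follows the lecture's example.
    private static final String LOCK_PATH = "/app/lock1";

    // Returns true if we obtained the lock. The node is ephemeral, so the lock
    // is released automatically if our session ends (or when we delete it).
    static boolean tryLock(ZooKeeper zk) throws KeeperException, InterruptedException {
        try {
            zk.create(LOCK_PATH, new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Someone else holds the lock: leave a watch so we are notified
            // when the node is deleted and we can race for the lock again.
            zk.exists(LOCK_PATH, true);
            return false;
        }
    }

    static void unlock(ZooKeeper zk) throws KeeperException, InterruptedException {
        zk.delete(LOCK_PATH, -1);
    }
}
```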


Another example is how to elect a leader. This is a little more complicated, but still relatively simple. What is happening here? There is a main node that aggregates all the workers. We try to read the data about the leader. If this succeeded, i.e., we received some data, then our worker starts following that leader: it considers that a leader already exists.

If the leader has died for some reason, for example dropped off, then we try to create a new leader node ourselves. If we succeed, our worker becomes the leader. If someone else managed to create the new leader at that moment, we figure out who it is and then follow it.

Here the so-called herd effect arises: when the leader dies, everyone rushes at once, and whoever gets there first becomes the leader.
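A rough sketch of this election logic under a hypothetical /app/leader node. Re-running it from the watch callback when the leader node disappears is left to the caller, and that simultaneous re-run by every follower is exactly the herd effect described above:

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElection {
    // Hypothetical znode whose creator is considered the current leader.
    private static final String LEADER_PATH = "/app/leader";

    // Returns true if this process became the leader, false if it should follow.
    static boolean tryBecomeLeader(ZooKeeper zk, String myId)
            throws KeeperException, InterruptedException {
        try {
            // Check whether a leader already exists and, if so, follow it
            // (the watch fires when the leader node changes or disappears).
            byte[] current = zk.getData(LEADER_PATH, true, null);
            System.out.println("following leader " + new String(current, StandardCharsets.UTF_8));
            return false;
        } catch (KeeperException.NoNodeException noLeader) {
            try {
                // No leader: try to create the (ephemeral) leader node ourselves.
                zk.create(LEADER_PATH, myId.getBytes(StandardCharsets.UTF_8),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException lostRace) {
                // Someone else won the race in the meantime; find out who and follow.
                byte[] winner = zk.getData(LEADER_PATH, true, null);
                System.out.println("following leader " + new String(winner, StandardCharsets.UTF_8));
                return false;
            }
        }
    }
}
```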


When acquiring a resource, you can use a slightly different approach, which works as follows. Suppose we want a lock, but without the herd effect. Our application requests the list of all node ids under an already existing lock node. If the node we created for our lock has the minimum id in the set we received, this means we have captured the lock. We check that we got the lock: the condition is that the id we received when creating our lock node is the minimum one. If it is, we carry on working.

If there is an id smaller than our lock's, we put a watcher on that node and wait for a notification that something has changed, for example that the other lock has been released. Until it goes away, we will not have the minimum id, will not get the lock, and thus cannot lock the resource. And if the node we wanted to watch is already gone, we immediately go back and try to take the lock again, because something could have changed in the meantime.
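Here is a sketch of that recipe, assuming a hypothetical /app/lock parent node: each contender creates an ephemeral sequential child, and only the node with the next-smaller id is watched, so a release wakes up a single waiter instead of the whole herd:

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class HerdFreeLock {
    // Hypothetical parent node; each contender creates a sequential child under it.
    private static final String LOCK_DIR = "/app/lock";

    static String acquire(ZooKeeper zk) throws KeeperException, InterruptedException {
        // Our own candidate node, e.g. /app/lock/lock-0000000042.
        String myPath = zk.create(LOCK_DIR + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myPath.substring(LOCK_DIR.length() + 1);

        while (true) {
            List<String> children = zk.getChildren(LOCK_DIR, false);
            Collections.sort(children);

            int myIndex = children.indexOf(myName);
            if (myIndex == 0) {
                return myPath; // smallest sequence number: we hold the lock
            }

            // Watch only the node immediately before ours; when it is deleted
            // we re-check, so only one waiter wakes up per release (no herd).
            String previous = LOCK_DIR + "/" + children.get(myIndex - 1);
            final Object barrier = new Object();
            synchronized (barrier) {
                if (zk.exists(previous, event -> {
                        synchronized (barrier) { barrier.notifyAll(); }
                    }) != null) {
                    barrier.wait();
                }
            }
        }
    }

    static void release(ZooKeeper zk, String myPath)
            throws KeeperException, InterruptedException {
        zk.delete(myPath, -1);
    }
}
```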


What is ZooKeeper made of? There are four main parts: the Request Processor, which handles requests; ZooKeeper Atomic Broadcast (ZAB); the Commit Log, where all operations are recorded; and the In-memory Replicated DB itself, i.e., the database where this entire tree is stored.

It is worth noting that all write operations go through the Request Processor. And read operations go directly to the In-memory database.


The database itself is fully replicated. All instances of ZooKeeper store a complete copy of the data.

In order to restore the database after a crash, there is the Commit Log. It is standard practice to write data there before it gets into memory, so that in the event of a crash this log can be replayed and the state of the system restored. Periodic snapshots of the database are also taken.


ZooKeeper Atomic Broadcast (ZAB) is the mechanism used to maintain the replicated data.

Internally, ZAB elects a leader among the ZooKeeper nodes. The other nodes become its followers and wait for actions from it. If write requests arrive at followers, they forward them to the leader. The leader performs the write operation first and then broadcasts a message about what has changed to its followers. This must, in fact, happen atomically, i.e., the write and the broadcast of the whole thing must be performed as one atomic operation, thereby guaranteeing data consistency.

The Request Processor handles only write requests. Its main task is to transform an operation into a transactional update, which is a specially crafted request.

It is worth noting that idempotency of updates is guaranteed for the same operation. What does that mean? If you apply the same update twice, the system ends up in the same state, i.e., the request does not change the result when replayed. This is needed so that, in the event of a crash, the operation can be restarted and the changes that were lost at that moment can be rolled forward; the state of the system will end up the same either way. It must not happen that replaying the same series of update operations leads to different final states of the system.
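A toy illustration of the difference, not ZooKeeper code: an update expressed as a delta is not idempotent, while a transactional update that carries the resulting state can be replayed safely:

```java
public class IdempotencyExample {
    // A toy in-memory "znode" value to illustrate the point; this is not
    // ZooKeeper code, just the idea behind ZAB's transactional updates.
    static int value = 10;

    // Non-idempotent: replaying it after a crash changes the result.
    static void applyDelta() {
        value = value + 5; // run once: 15, run twice: 20
    }

    // Idempotent transactional update: the request carries the final state,
    // so replaying it any number of times leaves the system in the same state.
    static void applyTransactionalUpdate() {
        value = 15; // run once or twice: always 15
    }
}
```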


Source: habr.com
