Do databases live in Kubernetes?

It somehow happened historically that the IT industry splits, over any topic at all, into two camps: those who are “for” and those who are “against”. And the subject of the dispute can be completely arbitrary. Which OS is better, Windows or Linux? Android or iOS on your phone? Keep everything in the cloud, or dump it onto cold RAID storage and lock the drives in a safe? Do PHP developers have the right to be called programmers? These disputes are, at times, purely existential in nature and have no basis beyond sporting interest.

It just so happened that with the advent of containers and the whole beloved kitchen around Docker and the proverbial k8s, the “for” and “against” disputes moved on to the use of these new capabilities in various areas of the backend. (Let's make a reservation in advance: although Kubernetes will most often be named as the orchestrator in this discussion, the choice of this particular tool is of no fundamental importance. You can substitute any other that seems most convenient and familiar to you.)

It would seem to be a simple dispute between two sides of the same coin, as senseless and merciless as the eternal Windows vs Linux confrontation, in which adequate people sit somewhere in the middle. But with containerization not everything is so simple. Usually there is no right side in such disputes, but with “containerize the database” versus “don't”, everything turns upside down, because in a certain sense both the supporters and the opponents of this approach are right.

Bright side

The argument of the bright side can be summed up in one phrase: “Hello, it's 2k19 outside the window!” It sounds like populism, of course, but if you dig into the situation in detail, there are real advantages. Let's take a look at them now.

Let's say you have a large web project. It may have been built on a microservice approach from the start, or it may have arrived there in an evolutionary way at some point; that is not very important, in fact. You have split the project into separate microservices and set up orchestration, load balancing, and scaling. And now, with a clear conscience, you sip a mojito in a hammock during Habr-effect traffic spikes instead of reviving fallen servers. But you have to be consistent in everything. Very often only the application itself, the code, is containerized. What else do we have besides the code?

That's right, data. The heart of any project is its data: it can be a typical DBMS (MySQL, PostgreSQL, MongoDB), storage used for search (Elasticsearch), a key-value store for caching such as Redis, and so on. We are not going to talk about crooked backend implementations where the database falls over because of poorly written queries; instead, let's talk about ensuring the fault tolerance of that very database under client load. After all, when we containerize our application and let it scale freely to handle any number of incoming requests, the load on the database naturally grows.

In effect, the channel to the database and the server it runs on become the eye of the needle of our beautiful containerized backend. Meanwhile, the main motive of container virtualization is the mobility and plasticity of the structure, which makes it possible to spread peak load across the entire infrastructure available to us as efficiently as possible. That is, if we do not containerize and roll out all the elements of the system across the cluster, we are making a very serious mistake.

It is much more logical to cluster not only the application itself, but also the services responsible for storing its data. By clustering and deploying independently working web servers that share the load in k8s, we already solve the problem of data synchronization (the same comments on posts, if we take some media or blogging platform as an example). In any case, we already have an in-cluster, even if virtual, representation of the database as an external Service. The problem is that the database itself is not yet clustered: the web servers deployed in Kubernetes pull information about changes from our single static production database, which runs separately.
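For illustration, a minimal sketch of such an in-cluster representation of an external database (the service name, namespace, and hostname here are made up):

```yaml
# A Service of type ExternalName: pods inside the cluster resolve "db"
# to the hostname of the standalone database server running outside k8s.
# Name, namespace, and hostname are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: db
  namespace: production
spec:
  type: ExternalName
  externalName: db01.example.internal
```

Pods connect to `db` as if it were a cluster-local service, while the data itself still lives on that one separate machine.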

See the catch? We use k8s or Swarm to spread the load and keep the main web servers from collapsing, but we don't do the same for the database. Yet if the database falls, our entire clustered infrastructure is pointless: what use are empty web pages that return a database access error?

That is why it is necessary to cluster not only the web servers, as is usually done, but the database infrastructure as well. Only then do we get a structure whose parts work as a single team yet remain independent of each other. Then, even if half of our backend “collapses” under load, the rest survives, while in-cluster synchronization of the databases with each other and the ability to scale out and deploy new clusters without limit help us quickly reach the required capacity - as long as there are racks left in the data center.

In addition, a database model distributed across clusters allows you to take the database to exactly where it is needed; if we are talking about a global service, it is rather illogical to spin up a web cluster somewhere around San Francisco while driving packets to a database on the other side of the world and back every time it is accessed.

Also, containerization of the database allows you to bring all the elements of the system to the same level of abstraction. Which, in turn, makes it possible to manage that system directly from code, by the developers, without active involvement of the admins. The developers decided they need a separate DBMS for a new subproject? Easy! They write a yaml file, apply it to the cluster, and it's done.
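Roughly speaking, that yaml file might look something like the sketch below: a small single-instance PostgreSQL with its own volume. All names, the image tag, and the sizes are illustrative assumptions, not a production-ready setup.

```yaml
# A minimal dedicated PostgreSQL instance for a subproject.
# Names, image, password handling, and sizes are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: subproject-db
spec:
  serviceName: subproject-db
  replicas: 1
  selector:
    matchLabels:
      app: subproject-db
  template:
    metadata:
      labels:
        app: subproject-db
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_PASSWORD
              value: change-me        # in real life, take this from a Secret
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

One `kubectl apply -f` later the subproject has its own database, and nobody had to touch the shared one.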

And of course, day-to-day operations become much simpler. Tell me, how many times have you winced when a new team member put his hands into the production database at work? The one which, in fact, you have only one of and which is running right now? Of course, we are all adults here, and somewhere we have a fresh backup, and even further away, behind the shelf with grandma's pickles and the old skis, another backup, perhaps even in cold storage, because your office already caught fire once. But still, every introduction of a new team member who gets access to the production infrastructure, and of course to the production database, is a box of sedatives for everyone around. Who knows the newcomer: maybe he's all thumbs? Scary, admit it.

Containerization, and in fact the distributed physical topology of your project's database, helps to avoid such nerve-wracking moments. Don't trust the newbie? OK! We spin up a separate cluster for him to work in and disconnect it from the rest of the database clusters: synchronization only by manual push and by turning two keys at once (one for the team lead, the other for the admin). And everyone is happy.
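Within a single cluster, a crude sketch of such isolation could be a dedicated namespace plus a NetworkPolicy that only lets pods inside it talk to each other (the names are invented; a genuinely separate cluster, as described above, isolates even harder):

```yaml
# Hypothetical sandbox namespace for the newcomer's own database copy.
apiVersion: v1
kind: Namespace
metadata:
  name: db-sandbox
---
# Allow ingress only from pods within db-sandbox itself;
# traffic from every other namespace is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-sandbox
  namespace: db-sandbox
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```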

And now it's time to switch sides and speak for the opponents of database clustering.

Dark side

Arguing why it is not worth containerizing the database and why it should keep running on one central server, we will not stoop to orthodox rhetoric and statements like “our grandfathers ran databases on bare metal, and so will we!” Instead, let's try to come up with a situation in which containerization would really bring tangible dividends.

Admit it, the projects that really need their database in a container can be counted on the fingers of one hand of a not particularly skilled milling machine operator. For the most part, even the use of k8s or Docker Swarm itself is redundant: quite often these tools are reached for because of the general hype around the technology and because the “almighty”, in the person of top management, has decreed that everything must be driven into clouds and containers. Well, because it's fashionable now and everyone does it.

In at least half of the cases, using Kubernetes or even plain Docker on a project is redundant. The issue is that not every team, or every outsourcing company hired to maintain a client's infrastructure, is aware of this. Worse is when containers are imposed on the client, because that costs the client a certain amount of coin.

In general, there is an opinion that the Docker/Kubernetes mafia simply steamrolls the clients who outsource these infrastructure issues. Indeed, working with clusters requires engineers who are capable of it and who understand the architecture of the implemented solution as a whole. We already described our case with the Republic publication: there we trained the client's team to work in the realities of Kubernetes, and everyone was satisfied. And it was done decently. More often, though, the “implementers” of k8s take the client's infrastructure hostage, because afterwards only they understand how everything works there, and there are no specialists on the client's side.

Now imagine that in this way we hand over to outsourcing not only the web server part, but also the maintenance of the database. We said that the database is the heart, and the loss of the heart is fatal for any living organism. In short, the prospects are not the best. So instead of hyped-up Kubernetes, many projects should simply stop skimping on a normal AWS plan, which would solve all the problems with the load on their site or project. But AWS is no longer fashionable, and showing off is worth more than money - unfortunately, in the IT environment too.

OK. Perhaps the project really does need clustering. With stateless applications everything is clear, but how do you organize decent network connectivity for a clustered database?

If we are talking about a coherent engineering solution, which is what a transition to k8s should be, then our main headache is data replication in a clustered database. Some DBMSs are quite tolerant from the start of distributing data between their individual instances. Many others are not so welcoming. And quite often the main argument in choosing a DBMS for our project is not at all its ability to replicate with minimal resource and engineering costs - especially if the project was not originally planned as a microservice system, but simply evolved in that direction.
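To make the headache concrete: simply scaling a database workload the way you scale a stateless one does not give you replication. A sketch of that naive trap, with hypothetical names and deliberately without persistent volumes, is below.

```yaml
# Three pods from the same image are three INDEPENDENT databases,
# not a cluster: without DBMS-level replication (streaming replication,
# an operator, etc.) writes to one pod never reach the other two.
# Names and image are hypothetical; volumes are omitted to keep it short.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: naive-db
spec:
  serviceName: naive-db
  replicas: 3                      # three copies of the process, not one database
  selector:
    matchLabels:
      app: naive-db
  template:
    metadata:
      labels:
        app: naive-db
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_PASSWORD
              value: change-me     # illustration only
```

Turning those three instances into an actual replicated cluster is exactly the engineering cost the paragraph above talks about, and it has to be paid either in the DBMS configuration or in an operator that does it for you.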

We think there is no need to talk about the speed of network drives: they are slow. Which means we still have no real opportunity, should the need arise, to re-raise a DBMS instance somewhere else, where there happens to be more CPU power or free RAM; we would very quickly hit the performance ceiling of the virtualized disk subsystem. Accordingly, the DBMS has to be nailed to its own personal set of machines standing in close proximity, or we have to carve out some sufficiently fast way of synchronizing the data to the presumed standby capacity.
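A sketch of that “nailing down” (the node name, path, and size are invented for illustration): a local PersistentVolume is tied to one specific machine with fast local disks, and whatever claims it will always be scheduled onto that machine.

```yaml
# A local PersistentVolume bound to one concrete node with fast local disks.
# Node name, path, size, and storage class are hypothetical.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/nvme/pgdata
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["db-node-01"]
```

A database claiming this volume is pinned to db-node-01 for life, which is precisely the opposite of the mobility that container virtualization was supposed to buy us.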

Continuing the topic of virtual file systems: Docker volumes, unfortunately, are not problem-free either. In a matter like long-term, reliable data storage, we would like to get by with the simplest possible technical schemes. And adding a new layer of abstraction between the container's file system and the file system of the parent host is a risk in itself. But when even the containerization machinery itself has difficulties passing data between these layers, it is a real disaster. At the moment, most of the problems known to progressive mankind seem to have been eradicated. But you understand yourself: the more complex the mechanism, the more easily it breaks.
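To make the layering visible (the service and volume names are invented), here is what it looks like in a typical Compose file: the DBMS writes to a path inside the container, that path is mapped onto a Docker-managed named volume, and the volume in turn lives somewhere under the host's own file system.

```yaml
# docker-compose.yml: every write travels container FS -> named volume -> host FS.
# Each hop is one more layer between the DBMS and the physical disk.
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: change-me      # illustration only
    volumes:
      - pgdata:/var/lib/postgresql/data # path as seen inside the container

volumes:
  pgdata: {}   # Docker keeps the actual data under /var/lib/docker/volumes on the host
```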

In light of all these “adventures”, it is much more profitable and simpler to keep the database in one place; even if you need to containerize the application, let it run on its own and receive a connection to the database through a distribution gateway, so that the data is read and written only once and in one place. This approach reduces the likelihood of errors and desynchronization to a minimum.

What are we driving at? That database containerization is appropriate where there is a real need for it. You cannot cram the database of a full-blown application into a container and spin it around as if you had two dozen microservices - it does not work that way. And this must be clearly understood.

Instead of a conclusion

If you are waiting for a clear verdict of “virtualize the database or not”, then we are sorry: there won't be one here. Because when creating any infrastructure solution, one must be guided not by fashion and progress, but, first of all, by common sense.

There are projects for which the principles and tools that come with Kubernetes are a perfect fit, and in such projects peace sets in, at least in the backend area. And there are projects that do not need containerization, but a normal server infrastructure, because they fundamentally cannot be rescaled onto the microservice cluster model - they would simply fall over.

Source: habr.com
