Designing Kubernetes Clusters: How Many Should There Be?
Translator's note: this material comes from the educational project learnk8s and answers a popular question that arises when designing Kubernetes-based infrastructure. We hope that the detailed descriptions of the pros and cons of each option will help you make the best choice for your project.
TL;DR: the same set of workloads can be run on a few large clusters (each holding many workloads) or on many small ones (with only a few workloads in each).
Below is a table that evaluates the pros and cons of each approach:
When using Kubernetes as a platform for running applications, there are often several fundamental questions about the intricacies of configuring clusters:
How many clusters to use?
How big should you make them?
What should each cluster include?
In this article, I will try to answer all these questions by analyzing the pros and cons of each approach.
Statement of a question
As a software developer, you are likely to develop and operate many applications in parallel.
In addition, many instances of these applications are likely to run in different environments - for example, dev, test, and prod.
The result is a whole matrix of applications and environments:
Applications and environments
In the example above, there are 3 applications and 3 environments, resulting in a total of 9 possible combinations.
Each application instance is a self-contained deployment unit that you can work with independently of the others.
Note that an application instance may consist of many components, such as a frontend, backend, database, etc. In the case of a microservice application, the instance will include all of its microservices.
As a result, Kubernetes users have several questions:
Should all application instances be hosted on the same cluster?
Is it worth starting a separate cluster for each application instance?
Or perhaps a combination of the above approaches should be used?
All of these options are viable, since Kubernetes is a flexible system that does not limit the user's options.
Here are some of the possible ways:
one large common cluster;
many small highly specialized clusters;
one cluster per application;
one cluster per environment.
As shown below, the first two approaches are at opposite ends of the scale of options:
From a few large clusters (left) to many small ones (right)
In general, one cluster is considered "bigger" than another if it has more nodes and pods. For example, a cluster with 10 nodes and 100 pods is bigger than a cluster with 1 node and 10 pods.
Well, let's get started!
1. One big shared cluster
The first option is to place all workloads in one cluster:
One big cluster
Within this approach, the cluster is used as a universal infrastructure platform - you simply deploy everything you need in an existing Kubernetes cluster.
Kubernetes namespaces allow you to logically separate parts of a cluster from each other, so that each application instance can use its own namespace.
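For instance, one namespace per application instance might look like this minimal sketch (the application and environment names are illustrative):

```yaml
# One namespace per application instance (names are illustrative)
apiVersion: v1
kind: Namespace
metadata:
  name: app1-dev
  labels:
    app: app1
    env: dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: app1-prod
  labels:
    app: app1
    env: prod
```

Workloads are then deployed into the matching namespace, e.g. `kubectl apply -n app1-dev -f deployment.yaml`.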
Let's look at the pros and cons of this approach.
+ Efficient use of resources
In the case of a single cluster, only one copy of all the resources needed to run and manage the Kubernetes cluster will be needed.
For example, this is true for master nodes. Typically, there are 3 master nodes per Kubernetes cluster, so with a single cluster you need only those 3 (for comparison, 10 clusters would need 30 master nodes).
The above subtlety also applies to other services that operate across the entire cluster, such as load balancers, Ingress controllers, authentication, logging and monitoring systems.
In a single cluster, all these services can be used for all workloads at once (no need to create copies of them, as in the case of multiple clusters).
+ Cheap
As a consequence of the above, fewer clusters usually cost less, since there is no spending on redundant resources.
This is especially true for master nodes, which can cost a significant amount whether hosted on-premises or in the cloud.
There are also managed services that charge a fixed fee for the operation of each Kubernetes cluster (for example, Amazon Elastic Kubernetes Service, EKS).
+ Efficient administration
It is easier to manage one cluster than several.
Administration may include the following tasks:
updating the version of Kubernetes;
setting up a CI/CD pipeline;
installing the CNI plugin;
setting up a user authentication system;
installing admission controllers;
and many othersβ¦
In the case of a single cluster, you will have to do all this only once.
With many clusters, these operations have to be repeated many times, which will probably require automation and tooling to keep the process systematic and uniform.
And now a few words about the cons.
− Single point of failure
If the only cluster fails, all workloads stop working at once!
There are many ways something can go wrong:
a Kubernetes upgrade causes unexpected side effects;
a cluster-wide component (for example, a CNI plugin) does not start working as expected;
one of the cluster components is configured incorrectly;
failure in the underlying infrastructure.
One such incident can cause serious damage to all workloads hosted in a common cluster.
− Lack of hard isolation
Running in a shared cluster means that applications share the hardware, networking capabilities, and operating system on the cluster nodes.
In a sense, two containers with two different applications running on the same host are like two processes running on the same machine under the same OS kernel.
Linux containers provide some form of isolation, but it's nowhere near as strong as what, say, virtual machines provide. In essence, a process in a container is the same process running on the host operating system.
This can be a security problem: this organization theoretically allows unrelated applications to communicate with each other (intentionally or accidentally).
In addition, all workloads in a Kubernetes cluster share certain cluster-wide services, such as DNS, which allows applications to find the Services of other applications in the cluster.
All of the above points may matter to different degrees depending on the application's security requirements.
Kubernetes provides various tools to prevent security issues, such as PodSecurityPolicies and NetworkPolicies. However, they require some experience to configure properly, and they cannot close absolutely every security hole.
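For example, a NetworkPolicy along these lines restricts ingress to pods in the same namespace, cutting off traffic from unrelated applications; a minimal sketch (the namespace name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: app1-prod        # illustrative namespace
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}     # only pods from this same namespace
```

Note that enforcement requires a CNI plugin that supports NetworkPolicies.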
It is important to always remember that Kubernetes was originally designed for sharing, not for isolation and security.
− Lack of hard multi-tenancy
Given the abundance of shared resources in a Kubernetes cluster, there are many ways in which different applications can step on each other's toes.
For example, an application can monopolize a shared resource (such as a processor or memory) and deny other applications running on the same host access to it.
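One common mitigation is to cap each namespace with a ResourceQuota, so that no single application can claim all of the cluster's CPU or memory; a sketch with illustrative namespace and limits:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: app1-dev         # illustrative namespace
spec:
  hard:
    requests.cpu: "4"         # total CPU all pods in the namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"           # total CPU limits across all pods
    limits.memory: 16Gi
```

With a quota in place, pods in the namespace must declare resource requests and limits; a LimitRange can supply defaults for pods that omit them.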
In the case of a single cluster, you have to open access to it to many people. And the larger their number, the higher the risk that they will βbreakβ something.
Cluster size itself is also a factor: officially, Kubernetes supports clusters of up to 5,000 nodes, but in real life problems can begin much earlier - for example, at just 500 nodes.
The fact is that large clusters put a high load on the Kubernetes control plane. In other words, careful tuning is required to keep the cluster up and running efficiently.
But let's consider the opposite approach: many small clusters.
2. Many small, specialized clusters
With this approach, you use a separate cluster for each item you deploy:
Many small clusters
For the purposes of this article, a deployable element is understood to be an instance of an application - for example, the dev version of a particular application.
In this strategy, Kubernetes is used as a specialized runtime environment for individual application instances.
Let's look at the pros and cons of this approach.
+ Limited blast radius
When a cluster "breaks", the negative consequences are limited to only those workloads that were deployed in this cluster. All other workloads remain untouched.
+ Isolation
Workloads hosted in individual clusters do not share common resources such as processor, memory, operating system, network, or other services.
As a result, we get a strong isolation between unrelated applications, which can be beneficial for their security.
+ Small number of users
Given that each cluster contains only a limited set of workloads, the number of users with access to it is reduced.
The fewer people who have access to the cluster, the lower the risk that something will "break".
Let's look at the cons.
− Inefficient use of resources
As mentioned earlier, each Kubernetes cluster requires a certain set of management resources: master nodes, control plane components, monitoring and logging solutions.
In the case of a large number of small clusters, a larger share of resources must be allocated to management.
− Expensive
Inefficient use of resources automatically entails high costs.
For example, maintaining 30 master nodes instead of three with the same total computing power will definitely affect costs.
− Difficulties of administration
Administering multiple Kubernetes clusters is much more difficult than managing one.
For example, you will have to configure authentication and authorization for each cluster. Upgrading the Kubernetes version will also have to be done several times.
Most likely, you will need to apply automation to increase the efficiency of all these tasks.
Now consider less extreme scenarios.
3. One cluster per application
With this approach, you create a separate cluster for all instances of a particular application:
Cluster per application
This approach can be seen as a generalization of the "separate cluster per team" principle, since usually a team of engineers develops one or more applications.
Let's look at the pros and cons of this approach.
+ Cluster can be customized to the application
If an application has special needs, they can be implemented in a cluster without affecting other clusters.
Such needs may include GPU workers, certain CNI plugins, a service mesh, or some other service.
Each cluster can be tailored to the application running on it to contain only what is needed.
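For example, if only this application needs GPUs, its cluster can run GPU workers, and the pods can request them via the extended resource exposed by the device plugin; a sketch assuming the NVIDIA device plugin is installed (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload          # placeholder name
spec:
  containers:
    - name: training
      image: registry.example.com/gpu-app:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # schedules the pod onto a node with a free GPU
```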
− Different environments in one cluster
The disadvantage of this approach is that application instances from different environments coexist in the same cluster.
For example, the prod version of an application runs on the same cluster as the dev version. This also means that the developers operate in the same cluster as the production version of the application.
If the cluster fails due to the actions of developers or glitches in the dev version, then the prod version can potentially suffer as well - a huge disadvantage of this approach.
And finally, the last scenario on our list.
4. One cluster per environment
This scenario provides for the allocation of a separate cluster for each environment:
One cluster per environment
For example, you may have dev, test, and prod clusters, each running all application instances intended for that environment.
Here are the pros and cons of this approach.
+ Isolation of the prod environment
Within this approach, all environments are isolated from each other. However, in practice, this is especially important for the prod environment.
Production versions of the application are now independent of what is happening in other clusters and environments.
Thus, if a problem suddenly arises in the dev cluster, the prod versions of the applications will continue to work as if nothing had happened.
+ Cluster can be adjusted to the environment
Each cluster can be tailored to its environment. For example, you can:
install development and debugging tools in the dev cluster;
install test frameworks and tools in the test cluster;
use more powerful hardware and network links in the prod cluster.
This improves the efficiency of both application development and operation.
+ Restricting access to the production cluster
The need to work directly with a prod cluster does not often arise, so you can significantly limit the circle of people who have access to it.
You can go even further and deprive people of access to this cluster altogether, doing all deployments through an automated CI/CD tool. Such an approach minimizes the risk of human error exactly where it matters most.
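A minimal sketch of such a setup: the CI/CD tool runs under its own service account, which is bound to the built-in edit ClusterRole so that only the pipeline can deploy (all names are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer           # illustrative name
  namespace: ci
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ci-deployer
subjects:
  - kind: ServiceAccount
    name: ci-deployer
    namespace: ci
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                  # built-in role with write access to most resources
```

Human users then keep read-only access (e.g. the built-in view role) or none at all.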
And now a few words about the cons.
− Lack of isolation between applications
The main disadvantage of the approach is the lack of hardware and resource isolation between applications.
Unrelated applications share cluster resources: the OS kernel, processor, memory, and some other services.
As already mentioned, this can be potentially dangerous.
− Inability to localize application dependencies
If the application has special requirements, then they must be satisfied in all clusters.
For example, if an application needs a GPU, then each cluster must contain at least one worker with a GPU (even if it is used only by this application).
As a result, we risk higher costs and inefficient use of resources.
Conclusion
If you have a certain set of applications, they can be placed in several large clusters or in many small ones.
The article discusses the pros and cons of various approaches, ranging from one large shared cluster to many small, highly specialized ones:
one large common cluster;
many small highly specialized clusters;
one cluster per application;
one cluster per environment.
So which approach should you choose?
As usual, the answer depends on the use case: you need to weigh the pros and cons of different approaches and choose the best option.
However, the choice is not limited to the examples above - you can use any combination of them!
For example, you can organize a pair of clusters for each team: a development cluster (hosting the dev and test environments) and a production cluster (hosting the prod environment).
Based on the information in this article, you will be able to optimize the pros and cons accordingly for a specific scenario. Good luck!