Designing Kubernetes Clusters: How Many Should There Be?

Translator's note: this material comes from the educational project learnk8s and answers a popular question that arises when designing Kubernetes-based infrastructure. We hope that the detailed descriptions of the pros and cons of each option will help you make the best choice for your project.

TL;DR: the same set of workloads can be run on a few large clusters (each with many workloads) or on many small ones (each with only a few workloads).

Below is a table that evaluates the pros and cons of each approach:

[Table: pros and cons of each approach]

When using Kubernetes as a platform for running applications, there are often several fundamental questions about the intricacies of configuring clusters:

  • How many clusters to use?
  • How big should you make them?
  • What should each cluster include?

In this article, I will try to answer all these questions by analyzing the pros and cons of each approach.

Framing the question

As a software developer, you are likely to develop and operate many applications in parallel.

In addition, instances of these applications are likely to run in several environments - for example, dev, test, and prod.

The result is a whole matrix of applications and environments:

[Figure: Applications and environments]

In the example above, there are 3 applications and 3 environments, resulting in 9 application instances in total.

Each application instance is a self-contained deployment unit that you can work with independently of the others.

Note that an application instance may consist of many components, such as a frontend, backend, database, and so on. In the case of a microservice application, an instance includes all of its microservices.

As a result, Kubernetes users have several questions:

  • Should all application instances be hosted on the same cluster?
  • Is it worth starting a separate cluster for each application instance?
  • Or perhaps a combination of the above approaches should be used?

All of these options are viable, since Kubernetes is a flexible system that does not limit the user's options.

Here are some of the possible ways:

  • one large common cluster;
  • many small highly specialized clusters;
  • one cluster per application;
  • one cluster per environment.

As shown below, the first two approaches are at opposite ends of the scale of options:

[Figure: From a few large clusters (left) to many small ones (right)]

In general, one cluster is considered "bigger" than another if it has a larger total of nodes and pods. For example, a cluster with 10 nodes and 100 pods is bigger than a cluster with 1 node and 10 pods.

Well, let's get started!

1. One big shared cluster

The first option is to place all workloads in one cluster:

[Figure: One big cluster]

Within this approach, the cluster is used as a universal infrastructure platform - you simply deploy everything you need in an existing Kubernetes cluster.

Kubernetes namespaces allow you to logically separate parts of a cluster from each other, so each application instance can run in its own namespace.
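As a minimal sketch (the names app1-dev and app1-prod are hypothetical), each application instance gets a namespace of its own:

```yaml
# One namespace per application instance (illustrative names)
apiVersion: v1
kind: Namespace
metadata:
  name: app1-dev
---
apiVersion: v1
kind: Namespace
metadata:
  name: app1-prod
```

Deploying into a particular instance is then just a matter of passing -n app1-dev (or another namespace) to kubectl.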

Let's look at the pros and cons of this approach.

+ Efficient use of resources

In the case of a single cluster, only one copy of all the resources needed to run and manage the Kubernetes cluster will be needed.

For example, this is true for master nodes. Typically, a Kubernetes cluster has 3 master nodes, so a single cluster needs just those 3 (for comparison, 10 clusters would need 30 master nodes).

The above subtlety also applies to other services that operate across the entire cluster, such as load balancers, Ingress controllers, authentication, logging and monitoring systems.

In a single cluster, all these services can be used for all workloads at once (no need to create copies of them, as in the case of multiple clusters).
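For instance, a single shared Ingress controller can route traffic to every application in the cluster. A sketch, with hypothetical hostnames and Service names:

```yaml
# Illustrative Ingress: the cluster's one shared ingress controller routes
# app1.example.com to app1's frontend Service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app1
  namespace: app1-prod
spec:
  rules:
    - host: app1.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1-frontend
                port:
                  number: 80
```

Every other application gets the same treatment with its own host rule; the controller itself exists only once.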

+ Cheap

As a consequence of the above, fewer clusters are usually cheaper because there is no cost for excess resources.

This is especially true for master nodes, which can cost a significant amount whether hosted on-premises or in the cloud.

Some managed Kubernetes services, such as Google Kubernetes Engine (GKE) or Azure Kubernetes Service (AKS), provide the control plane for free. In that case, the cost issue is less acute.

There are also managed services that charge a fixed fee for the operation of each Kubernetes cluster (for example, Amazon Elastic Kubernetes Service, EKS).

+ Efficient administration

It is easier to manage one cluster than several.

Administration may include the following tasks:

  • upgrading the Kubernetes version;
  • setting up a CI/CD pipeline;
  • installing a CNI plugin;
  • setting up user authentication;
  • installing admission controllers;

and many others…

In the case of a single cluster, you will have to do all this only once.

For many clusters, these operations have to be repeated over and over, which will probably require automation and tooling to keep the process systematic and uniform.

And now a few words about the cons.

βˆ’ Single point of failure

If the only cluster fails, all workloads stop working at once!

There are many ways something can go wrong:

  • a Kubernetes upgrade leads to unexpected side effects;
  • a cluster-wide component (for example, a CNI plugin) does not start working as expected;
  • one of the cluster components is configured incorrectly;
  • failure in the underlying infrastructure.

One such incident can cause serious damage to all workloads hosted in a common cluster.

βˆ’ Lack of strict isolation

Running in a shared cluster means that applications share the hardware, networking capabilities, and operating system on the cluster nodes.

In a sense, two containers with two different applications running on the same host are like two processes running on the same machine under the same OS kernel.

Linux containers provide some form of isolation, but it's nowhere near as strong as what, say, virtual machines provide. In essence, a process in a container is the same process running on the host operating system.

This can be a security problem: such an arrangement theoretically allows unrelated applications to interact with each other (intentionally or accidentally).

In addition, all workloads in a Kubernetes cluster share certain cluster-wide services, such as DNS - this allows applications to find the Services of other applications in the cluster.

How much the above matters depends on your applications' security requirements.

Kubernetes provides various tools to mitigate security issues, such as PodSecurityPolicies and NetworkPolicies. However, configuring them properly takes experience, and even then they cannot close absolutely every security hole.

It is important to always remember that Kubernetes was originally designed for sharing, not for isolation and security.
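For illustration, here is a minimal "default deny" NetworkPolicy. It blocks all ingress traffic to the pods in a namespace unless another policy explicitly allows it (the namespace name is hypothetical, and a CNI plugin that enforces NetworkPolicies is assumed):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: app1-prod   # hypothetical namespace
spec:
  podSelector: {}        # an empty selector matches every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules are listed, so all ingress is denied
```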

βˆ’ Lack of hard multi-tenancy

Given the abundance of shared resources in a Kubernetes cluster, there are many ways in which different applications can step on each other's toes.

For example, an application can monopolize a shared resource (such as a processor or memory) and deny other applications running on the same host access to it.

Kubernetes provides various mechanisms to control this behavior, such as resource requests and limits (see also the article "CPU limits and aggressive throttling in Kubernetes" - translator's note), ResourceQuotas, and LimitRanges. However, as with security, configuring them is non-trivial, and they cannot prevent absolutely every unforeseen side effect.
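A sketch of both mechanisms (namespace and image names are hypothetical): per-container requests and limits cap a single workload, while a ResourceQuota caps the namespace as a whole:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend                                # hypothetical workload
  namespace: app1-prod
spec:
  containers:
    - name: backend
      image: registry.example.com/backend:1.0  # hypothetical image
      resources:
        requests:        # what the scheduler reserves for the container
          cpu: 250m
          memory: 256Mi
        limits:          # hard ceiling enforced at runtime
          cpu: "1"
          memory: 512Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: app1-prod
spec:
  hard:                  # totals across all pods in the namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```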

βˆ’ Large number of users

In the case of a single cluster, you have to open access to it to many people. And the larger their number, the higher the risk that they will β€œbreak” something.

Within the cluster, you can control who can do what with role-based access control (RBAC) (see the article "Users and Authorization RBAC in Kubernetes" - translator's note). However, RBAC will not prevent users from "breaking" something within their own area of responsibility.
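As a sketch of how far that scoping goes (role, namespace, and user names are hypothetical), a Role plus a RoleBinding confines a user to a single namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app1-developer      # hypothetical role
  namespace: app1-dev
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app1-developer-binding
  namespace: app1-dev
subjects:
  - kind: User
    name: alice             # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app1-developer
  apiGroup: rbac.authorization.k8s.io
```

Within app1-dev, though, nothing stops this user from deleting a Deployment they legitimately own.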

βˆ’ Clusters cannot grow indefinitely

The cluster that is used for all workloads is likely to be quite large (in terms of number of nodes and pods).

But here another problem arises: clusters in Kubernetes cannot grow indefinitely.

There is a theoretical limit on cluster size: in Kubernetes, it is approximately 5,000 nodes, 150,000 pods, and 300,000 containers.

However, in real life, problems can begin much earlier - for example, with as few as 500 nodes.

The fact is that large clusters put a high load on the Kubernetes control plane. In other words, careful tuning is required to keep the cluster running efficiently.

This issue is explored in a related article on the original blog: "Architecting Kubernetes clusters - choosing a worker node size".

But let's consider the opposite approach: many small clusters.

2. Many small, specialized clusters

With this approach, you use a separate cluster for each item you deploy:

[Figure: Many small clusters]

For the purposes of this article, a deployable element means an instance of an application - for example, the dev version of a particular application.

In this strategy, Kubernetes is used as a specialized runtime environment for individual application instances.

Let's look at the pros and cons of this approach.

+ Limited "explosion radius"

When a cluster "breaks", the negative consequences are limited to the workloads deployed in that cluster. All other workloads remain untouched.

+ Isolation

Workloads hosted in individual clusters do not share common resources such as processor, memory, operating system, network, or other services.

As a result, we get strong isolation between unrelated applications, which can be beneficial for their security.

+ Small number of users

Given that each cluster contains only a limited set of workloads, the number of users with access to it is reduced.

The fewer people who have access to the cluster, the lower the risk that something will "break".

Let's look at the cons.

βˆ’ Inefficient use of resources

As mentioned earlier, each Kubernetes cluster requires a certain set of management resources: master nodes, control plane components, monitoring and logging solutions.

In the case of a large number of small clusters, a larger share of resources must be allocated to management.

βˆ’ Expensive

Inefficient use of resources automatically entails high costs.

For example, maintaining 30 master nodes instead of three for the same computing power will definitely affect costs.

βˆ’ Difficulties of administration

Administering multiple Kubernetes clusters is much more difficult than managing one.

For example, you will have to configure authentication and authorization for each cluster. Upgrading the Kubernetes version will also have to be done several times.

Most likely, you will need to apply automation to increase the efficiency of all these tasks.

Now consider less extreme scenarios.

3. One cluster per application

With this approach, you create a separate cluster for all instances of a particular application:

[Figure: One cluster per application]

This approach can be seen as a generalization of the "separate cluster per team" principle, since a team of engineers usually develops one or more applications.

Let's look at the pros and cons of this approach.

+ Cluster can be customized to the application

If an application has special needs, they can be implemented in a cluster without affecting other clusters.

Such needs may include GPU workers, certain CNI plugins, a service mesh, or some other service.

Each cluster can be tailored to the application running on it to contain only what is needed.
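As an illustration, a workload that needs a GPU simply requests one, and only a cluster whose workers have GPUs (plus the corresponding device plugin) can schedule it; names below are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job                           # hypothetical workload
spec:
  containers:
    - name: trainer
      image: registry.example.com/trainer:1.0  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1  # assumes the NVIDIA device plugin is installed on the nodes
```

Only the cluster serving this application needs to provision GPU workers at all.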

βˆ’ Different environments in one cluster

The disadvantage of this approach is that application instances from different environments coexist in the same cluster.

For example, the prod version of an application runs on the same cluster as the dev version. This also means that the developers operate in the same cluster as the production version of the application.

If the cluster fails due to the actions of developers or glitches in the dev version, then the prod version can potentially suffer as well - a huge disadvantage of this approach.

And finally, the last scenario on our list.

4. One cluster per environment

This scenario provides for the allocation of a separate cluster for each environment:

[Figure: One cluster per environment]

For example, you might have dev, test, and prod clusters, each running all the application instances intended for that environment.
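In day-to-day work, this usually means one kubeconfig context per cluster; a sketch with hypothetical endpoints and user names:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: dev-cluster
    cluster:
      server: https://dev.example.com    # hypothetical API server endpoints
  - name: prod-cluster
    cluster:
      server: https://prod.example.com
users:
  - name: developer
    user: {}    # credentials omitted for brevity
  - name: ci-bot
    user: {}
contexts:
  - name: dev
    context:
      cluster: dev-cluster
      user: developer
  - name: prod
    context:
      cluster: prod-cluster
      user: ci-bot
current-context: dev
```

Switching environments is then a matter of kubectl config use-context prod.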

Here are the pros and cons of this approach.

+ Isolation of the prod environment

Within this approach, all environments are isolated from each other. However, in practice, this is especially important for the prod environment.

Production versions of the application are now independent of what is happening in other clusters and environments.

Thus, if a problem suddenly arises in the dev cluster, the prod versions of the applications will continue to work as if nothing had happened.

+ Cluster can be adjusted to the environment

Each cluster can be tailored to its environment. For example, you can:

  • install development and debugging tools in the dev cluster;
  • install test frameworks and tools in the test cluster;
  • use more powerful hardware and network links in the prod cluster.

This improves the efficiency of both application development and operation.

+ Restricting access to the production cluster

The need to work directly with a prod cluster does not often arise, so you can significantly limit the circle of people who have access to it.

You can go even further and remove human access to this cluster altogether, doing all deployments through an automated CI/CD tool. Such an approach minimizes the risk of human error exactly where it matters most.
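As a sketch of that setup (GitHub Actions syntax; the secret name and manifest path are assumptions), the pipeline becomes the only identity that ever touches the prod cluster:

```yaml
name: deploy-prod
on:
  push:
    tags: ["v*"]             # deploy only on release tags
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Write cluster credentials from a CI secret
        run: echo "${{ secrets.PROD_KUBECONFIG }}" > kubeconfig   # hypothetical secret
      - name: Apply manifests to the prod cluster
        run: kubectl apply -f manifests/ --kubeconfig kubeconfig
```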

And now a few words about the cons.

βˆ’ Lack of isolation between applications

The main disadvantage of the approach is the lack of hardware and resource isolation between applications.

Unrelated applications share cluster resources: the OS kernel, CPU, memory, and various other services.

As already mentioned, this can be potentially dangerous.

βˆ’ Inability to localize application dependencies

If the application has special requirements, then they must be satisfied in all clusters.

For example, if an application needs a GPU, then each cluster must contain at least one worker with a GPU (even if it is used only by this application).

As a result, we risk higher costs and inefficient use of resources.

Conclusion

If you have a certain set of applications, they can be placed in several large clusters or in many small ones.

The article discusses the pros and cons of various approaches, ranging from one large shared cluster to many small, highly specialized ones:

  • one large common cluster;
  • many small highly specialized clusters;
  • one cluster per application;
  • one cluster per environment.

So which approach should you choose?

As usual, the answer depends on the use case: you need to weigh the pros and cons of different approaches and choose the best option.

However, the choice is not limited to the examples above - you can use any combination of them!

For example, you can set up a pair of clusters for each team: a development cluster (hosting the dev and test environments) and a production cluster (hosting the prod environment).

Based on the information in this article, you will be able to weigh the pros and cons for your specific scenario. Good luck!

Source: habr.com
