When creating a Kubernetes cluster, questions arise: how many worker nodes should there be, and of what type? What is better for an on-premises cluster: a few powerful servers, or a dozen older machines in your data center? And in the cloud, is it better to take eight single-core instances or two quad-core ones?
The answers to these questions are in the article.
Cluster capacity
In general, a Kubernetes cluster can be thought of as one large "supernode" whose total computing capacity is the sum of the capacities of its constituent nodes.
There are several ways to reach a given capacity target. Suppose we need a cluster with a total capacity of 8 CPU cores and 32 GB of RAM, because that is what our set of applications requires. Then we can build it, for example, from two nodes with 4 cores and 16 GB of memory each, or from four nodes with 2 cores and 8 GB each.
Here are just two possible ways to create a cluster:
Both options result in a cluster with the same total capacity; one consists of two large nodes, the other of four smaller ones.
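The arithmetic behind these configurations can be sketched in a few lines (a toy illustration; the helper function is ours, not part of any Kubernetes tooling):

```python
# Toy arithmetic from the text: two node configurations that add up to
# the same cluster capacity target of 8 CPU cores and 32 GB of RAM.

def cluster_capacity(node_count: int, cores_per_node: int, gb_per_node: int):
    """Total (cores, GB of RAM) of a cluster of identical nodes."""
    return (node_count * cores_per_node, node_count * gb_per_node)

# Two large nodes: 4 cores and 16 GB each.
print(cluster_capacity(2, 4, 16))  # (8, 32)
# Four small nodes: 2 cores and 8 GB each.
print(cluster_capacity(4, 2, 8))   # (8, 32)
```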
Which option is better?
To answer this question, consider the advantages of both options. We have summarized them in a table.
A few large nodes:
- Easier cluster management (if on-premises)
- Cheaper (if on-premises)
- Can run resource-intensive applications
- Resources are used more efficiently (less overhead from system daemons)

Many small nodes:
- Smoother autoscaling
- Price differs little from large nodes (in the cloud)
- Full replication
- Higher resilience to node failures
Note that we are only talking about worker nodes here. Choosing the number and size of master nodes is a completely different topic.
So, let's discuss each item from the table in more detail.
First option: several large nodes
The most extreme option is a single worker node providing the entire cluster capacity. In the example above, this would be one worker node with 8 CPU cores and 32 GB of RAM.
Pros
Pro #1: Easier management
It is easier to manage a few machines than a whole fleet: updates and fixes roll out faster and are easier to keep in sync, and the absolute number of failures is lower.
Note that all of the above applies to your own hardware, your own servers, and not to cloud instances.
In the cloud, the situation is different: the infrastructure is managed by the cloud provider, so managing ten nodes in the cloud is not much different from managing one.
Pro #2: Lower cost (on-premises)
A powerful machine is more expensive, but the price does not necessarily grow linearly with capacity. In other words, one ten-core server with 10 GB of memory is usually cheaper than ten single-core servers with 1 GB of memory each.
But note that this rule usually does not hold in the cloud: in the current pricing schemes of all major cloud providers, prices increase linearly with capacity.
Thus, in the cloud, you usually cannot save on more powerful servers.
Pro #3: Can run resource-hungry apps
Some applications require powerful servers in the cluster. For example, if a machine learning system needs 8 GB of memory, it cannot run on nodes with 1 GB of memory; the cluster must have at least one sufficiently large worker node.
Cons
Con #1: Lots of pods per node
If the same workload runs on fewer nodes, each node will naturally carry more pods.
This could be a problem.
The reason is that each pod adds some overhead for the container runtime (such as Docker), as well as for the kubelet and cAdvisor.
For example, the kubelet regularly runs liveness probes on every container on the node: the more containers, the more work the kubelet has to do.
cAdvisor collects resource usage statistics for all containers on the node, and the kubelet regularly queries this information and exposes it via its API. Again, more containers means more work for both cAdvisor and the kubelet.
If the number of pods grows large, all of this can slow the system down and even undermine its reliability.
In the Kubernetes repository, there are issue reports of nodes flipping between Ready and NotReady states because the regular health checks of all the containers on a node take too long.
For this reason, Kubernetes recommends running no more than 110 pods per node.
Con #2: Replication limit
Too few nodes limits the effective degree of application replication. For example, if you have a high-availability application with five replicas but only two nodes, the effective degree of replication is reduced to two.
The five replicas can only be distributed across two nodes, and if one of those nodes fails, several replicas go down at once.
If you have five nodes or more, each replica will run on a separate node, and a single node failure will remove at most one replica.
Thus, high availability requirements may require a certain minimum number of nodes in the cluster.
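The effect is easy to quantify with a toy model (the helper below is illustrative and assumes the scheduler spreads replicas as evenly as possible across nodes):

```python
def replicas_lost(replicas: int, nodes: int) -> int:
    """Worst-case number of replicas lost when one node fails,
    assuming replicas are spread as evenly as possible."""
    # The most loaded node holds ceil(replicas / nodes) replicas.
    return -(-replicas // nodes)  # ceiling division

# Five replicas on two nodes: a single node failure can take out three.
print(replicas_lost(5, 2))   # 3
# Five replicas on five or more nodes: at most one replica is lost.
print(replicas_lost(5, 5))   # 1
print(replicas_lost(5, 10))  # 1
```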
Con #3: Worse consequences of failure
With a small number of nodes, each failure has more serious consequences. For example, if you have only two nodes and one of them fails, half of your pods disappear at once.
Of course, Kubernetes will reschedule the workload from the failed node onto the others. But if there are few of them, there may not be enough spare capacity, and some of your applications will be unavailable until you bring the failed node back up.
Thus, the more nodes, the less the impact of hardware failures.
Con #4: Coarser autoscaling steps
Kubernetes offers cluster autoscaling for cloud infrastructure, which automatically adds or removes nodes depending on current demand. With large nodes, autoscaling becomes abrupt and clunky: on a two-node cluster, adding one more node increases capacity by 50% at once, and you have to pay for those resources even if you don't need all of them.
Thus, if you plan to use automatic cluster scaling, then the smaller the nodes, the more flexible and cost-effective scaling you will get.
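The scaling-step argument is easy to put into numbers (an illustrative sketch, not part of the cluster autoscaler itself):

```python
def scale_step_percent(nodes: int) -> float:
    """Capacity increase, in percent, from adding one node to a
    cluster of identically sized nodes."""
    return 100.0 / nodes

# Two large nodes: each autoscaling step adds 50% capacity at once.
print(scale_step_percent(2))   # 50.0
# Ten small nodes: each step adds only 10%.
print(scale_step_percent(10))  # 10.0
```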
Now consider the advantages and disadvantages of a large number of small nodes.
Second option: many small nodes
The advantages of this approach, in fact, follow from the disadvantages of the opposite option with several large nodes.
Pros
Pro #1: Smaller failure impact
The more nodes, the fewer pods on each of them. For example, with one hundred pods on ten nodes, each node will carry ten pods on average.
Thus, if one of the nodes fails, you lose only 10% of the workload. Most likely only a few replicas will be affected, and the applications as a whole will remain operational.
In addition, the remaining nodes will likely have enough free resources for the failed node's workload, so Kubernetes is free to reschedule pods and your applications will return to a functional state relatively quickly.
Pro #2: Full replication
If there are enough nodes, then the Kubernetes scheduler can assign different nodes to all replicas. This way, in the event of a node failure, only one replica will be affected and the application will remain available.
Cons
Con #1: Harder to manage
A large number of nodes is harder to manage. For example, every Kubernetes node must be able to communicate with every other node, so the number of possible connections grows quadratically, and all of them have to be tracked.
The node controller in the Kubernetes Controller Manager regularly traverses all nodes in the cluster to check their health: the more nodes, the greater the load on the controller.
The load on the etcd database also grows: each kubelet and kube-proxy maintains watch connections to the API server, and every object update must be delivered to all of these watchers.
In general, each worker node imposes additional load on the system components of the master nodes.
Kubernetes officially supports clusters with up to 5,000 nodes.
To manage a large number of worker nodes, you need more powerful master nodes. For example, kube-up automatically selects a larger master instance size as the number of worker nodes grows.
To address these scaling problems, there are dedicated projects and community efforts.
Con #2: More overhead
On each worker node, Kubernetes runs a set of system daemons: the container runtime (such as Docker), kube-proxy, and the kubelet, which includes cAdvisor. Together they consume a certain fixed amount of resources.
If you have many small nodes, the share of this overhead is larger. For example, suppose the system daemons on one node together use 0.1 CPU cores and 0.1 GB of memory. On one ten-core node with 10 GB of memory, the daemons consume about 1% of the cluster capacity; on ten single-core nodes with 1 GB of memory each, they take 10%.
Thus, the fewer nodes, the more efficiently the infrastructure is used.
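The overhead figures from the example can be reproduced with a quick calculation (using the text's assumed 0.1 CPU cores of daemon overhead per node):

```python
# Illustrative per-node system-daemon overhead from the text.
DAEMON_CPU = 0.1  # cores consumed by daemons on every node

def overhead_percent(nodes: int, cores_per_node: float) -> float:
    """Share of total cluster CPU consumed by per-node system daemons."""
    return round(100.0 * nodes * DAEMON_CPU / (nodes * cores_per_node), 2)

print(overhead_percent(1, 10))  # 1.0  -- one ten-core node
print(overhead_percent(10, 1))  # 10.0 -- ten single-core nodes
```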
Con #3: Inefficient use of resources
On small nodes, it may happen that the leftover resource fragments are too small for any workload to fit, so they remain unused.
For example, suppose each pod requires 0.75 GB of memory. If you have ten nodes with 1 GB of memory each, you can run ten pods, and each node will be left with 0.25 GB of unused memory.
This means that 25% of the entire cluster's memory is wasted.
On one large node with 10 GB of memory, you can run 13 such pods, leaving only a single unused fragment of 0.25 GB.
In this case, only 2.5% of the memory is wasted.
Thus, resources are more optimally spent on large nodes.
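The fragmentation example works out as follows (a toy calculation with the figures from the text, assuming greedy per-node placement of identical pods):

```python
def wasted_memory_percent(gb_per_node: float, pod_gb: float) -> float:
    """Percent of memory stranded in fragments too small to fit
    another pod. For identical nodes the per-node fraction equals
    the cluster-wide fraction."""
    pods_per_node = int(gb_per_node // pod_gb)
    leftover_gb = gb_per_node - pods_per_node * pod_gb
    return round(100.0 * leftover_gb / gb_per_node, 2)

# Nodes with 1 GB each, pods of 0.75 GB: 0.25 GB stranded per node.
print(wasted_memory_percent(1.0, 0.75))   # 25.0
# One 10 GB node: thirteen pods fit, one 0.25 GB fragment remains.
print(wasted_memory_percent(10.0, 0.75))  # 2.5
```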
A few large nodes or many small ones?
So, which is better: a few large nodes in a cluster or many small ones? As always, there is no clear answer. Much depends on the type of application.
For example, if an application needs 10 GB of memory, the obvious choice is large nodes. And if the application requires a replication factor of ten for high availability, it is hardly worth risking placing the replicas on only two nodes: the cluster should have at least ten nodes.
In intermediate situations, make a choice based on the advantages and disadvantages of each option. Perhaps some arguments are more relevant to your situation than others.
And it is not at all necessary to make all nodes the same size. Nothing prevents you from first experimenting with nodes of one size and then adding nodes of a different size to the cluster: the worker nodes of a Kubernetes cluster can be completely heterogeneous. That way you can try to combine the advantages of both approaches.
There is no single recipe; each situation has its own nuances, and only production will show the truth.
Translation prepared by the cloud platform team
Source: habr.com