Monitor Kubernetes Cluster Resources

I created Kube Eagle, a Prometheus exporter. It turned out to be a useful tool for understanding the resources of small and medium-sized clusters. In the end it saved me hundreds of dollars, because I was able to pick the right machine types and tune application resource limits to the actual workloads.

I'll talk about the benefits of Kube Eagle, but first I'll explain what caused all the trouble and why we needed good monitoring in the first place.

I managed several clusters of 4-50 nodes. Each cluster hosted up to 200 microservices and applications. To make better use of the available hardware, most deployments were configured with burstable RAM and CPU resources, i.e. resource requests lower than the limits. This lets pods take spare resources when they need them, without interfering with other applications on that node. Sounds great, doesn't it?
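For context, a burstable configuration like that looks roughly like the sketch below in a pod spec (the pod name, image, and values are made up for illustration): the requests are kept low, so the scheduler reserves little, while the limits leave headroom to burst.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-api            # hypothetical pod name
spec:
  containers:
    - name: api
      image: example/api:1.0   # hypothetical image
      resources:
        requests:
          cpu: "100m"          # what the scheduler reserves on the node
          memory: "256Mi"
        limits:
          cpu: "500m"          # the pod may burst up to these values
          memory: "1Gi"
```

Because the limits exceed the requests, the pod lands in the Burstable QoS class and can use whatever spare capacity the node happens to have.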

Even though the cluster consumed relatively little CPU (8%) and RAM (40%), we constantly ran into pods being evicted when they tried to allocate more memory than was available on the node. Back then we had only one dashboard for monitoring Kubernetes resources. Here it is:

[Screenshot: Grafana dashboard with cAdvisor metrics only]

With a dashboard like this, it is easy to spot nodes that eat a lot of memory and CPU; the hard part is figuring out why. To keep pods from being evicted, one could of course set guaranteed resources on all pods (requests equal to limits), but that is not the smartest use of hardware. The cluster had several hundred gigabytes of memory, yet some nodes were starving while others had 4-10 GB sitting idle.
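The guaranteed setup mentioned above simply pins the requests to the limits; in the illustrative pod spec from earlier, the resources block would become something like this:

```yaml
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "500m"          # requests == limits: Guaranteed QoS class
          memory: "1Gi"
```

The scheduler then reserves the full amount whether the pod uses it or not, which is exactly the inefficiency described above.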

It turned out that the Kubernetes scheduler was distributing workloads across the available resources unevenly. The scheduler takes various constraints into account: affinity rules, taints and tolerations, and node selectors that can restrict the set of eligible nodes. In my case there was nothing like that, and pods were scheduled based on the resources requested on each node.

For each pod, the scheduler picked the node that had the most free (unrequested) resources and satisfied the pod's request. We found that the resources requested on the nodes did not match the actual usage, and this is where Kube Eagle and its resource monitoring came to the rescue.

Almost all of my Kubernetes clusters were monitored only with Node Exporter and Kube State Metrics. Node Exporter provides statistics on I/O, disk, CPU, and RAM usage, while Kube State Metrics exposes Kubernetes object metrics such as CPU and memory resource requests and limits.
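To give a sense of what correlating these two sources involves, here is a minimal sketch of Prometheus recording rules that put per-node memory requests (from Kube State Metrics) next to per-node available memory (from Node Exporter). The metric names assume kube-state-metrics v2 and a standard node-exporter deployment, and the label_replace regex assumes the instance label has the form <node>:<port>; both are assumptions, not something the article prescribes.

```yaml
groups:
  - name: node-resource-overview       # hypothetical rule group
    rules:
      # Memory requested by all pods scheduled on each node.
      # (kube-state-metrics v2 naming; older releases exposed
      #  kube_pod_container_resource_requests_memory_bytes instead.)
      - record: node:memory_requested_bytes:sum
        expr: sum by (node) (kube_pod_container_resource_requests{resource="memory"})

      # Memory actually available on each node according to node-exporter.
      # node-exporter series only carry an "instance" label, so we copy it
      # into a "node" label to make the two series comparable.
      - record: node:memory_available_bytes
        expr: label_replace(node_memory_MemAvailable_bytes, "node", "$1", "instance", "(.*):.+")
```

Graphing these two series side by side already reveals nodes where far more memory is requested than is ever used; Kube Eagle does this kind of correlation for you.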

We needed to combine the usage metrics with the request and limit metrics in Grafana to get the full picture of the problem. It sounds simple, but the two tools actually name their labels differently, and some metrics have no metadata labels at all. Kube Eagle does all of this by itself, and its dashboard looks like this:

[Screenshot: Kube Eagle dashboard]

With it, we managed to solve a number of resource problems and save on hardware:

  1. Some developers did not know how many resources their microservices needed (or simply did not bother to find out). We had no way to spot incorrect resource requests, because that requires seeing actual consumption alongside the requests and limits. Now they can look at the Prometheus metrics, monitor actual usage, and adjust requests and limits accordingly.
  2. JVM applications grab as much RAM as they can carry. The garbage collector only releases memory back when more than 75% of it is in use, and since most services had burstable memory, the JVM simply held on to it. As a result, all these Java services ate much more RAM than expected (see the sketch after this list).
  3. Some applications requested far too much memory, so the Kubernetes scheduler would not place other applications on those nodes, even though they were actually less loaded than the rest. One developer accidentally added an extra digit to a request and reserved a huge chunk of RAM: 20 GB instead of 2. Nobody noticed. The application had 3 replicas, so 3 nodes were affected.
  4. We introduced resource limits, rescheduled the pods with correct requests, and achieved a near-perfect balance of hardware usage across all nodes. A couple of nodes could be shut down entirely. Then we saw that we were running the wrong machine type (CPU-optimized instead of memory-optimized), changed it, and removed a few more nodes.
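On the JVM memory issue from point 2: one common mitigation, sketched below with hypothetical names and values, is to cap the JVM heap relative to the container's memory limit, for example with -XX:MaxRAMPercentage (available since Java 10 and 8u191), so that burstable memory is not held by the heap indefinitely. This is a general technique, not something the original setup necessarily used.

```yaml
# Fragment of a Deployment spec for a Java service (names and values are
# illustrative). JAVA_TOOL_OPTIONS is picked up automatically by the JVM.
containers:
  - name: orders-service                     # hypothetical service
    image: example/orders:1.0                # hypothetical image
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75.0"   # heap capped at 75% of the limit
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "1Gi"                        # MaxRAMPercentage is computed from this
```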

Results

Burstable resources let you use the available hardware more efficiently, but the Kubernetes scheduler places pods based on resource requests, and that combination can cause trouble. To kill two birds with one stone, avoiding the problems while using resources to the fullest, you need good monitoring. That is what Kube Eagle (a Prometheus exporter plus a Grafana dashboard) is useful for.

Source: habr.com
