Five mistakes when deploying your first application on Kubernetes


Many people assume that it is enough to move an application to Kubernetes (with Helm or by hand) and everything will just work. In practice it is not that simple.

The Mail.ru Cloud Solutions team translated an article by DevOps engineer Julian Gindy, who describes the pitfalls his company ran into during the migration so that you don't repeat the same mistakes.

Step One: Set Up Pod Requests and Limits

Let's start by setting up a clean environment for our pods to run in. Kubernetes does a great job of scheduling pods and handling failover. But it turned out that the scheduler sometimes cannot place a pod when it is hard to estimate how many resources that pod needs to run successfully. This is where resource requests and limits come in. There is a lot of debate about the best approach to setting requests and limits; sometimes it really does feel more like an art than a science. Here is our approach.

The pod request is the main value the scheduler uses to place a pod optimally.

From the Kubernetes documentation: the filtering step determines the set of nodes where a Pod can be scheduled. For example, the PodFitsResources filter checks whether a node has enough resources to satisfy a Pod's specific resource requests.

We set requests so that they estimate how many resources the application actually needs to run properly; that way the scheduler can place pods realistically. Initially we wanted to inflate the requests to guarantee enough resources for every pod, but we noticed that scheduling times grew significantly and some pods never got fully scheduled, much as if no resource requests had been set for them at all.

In that case the scheduler would often "squeeze out" pods and be unable to reschedule them, because the control plane had no idea how many resources the application would need, and that estimate is a key input to the scheduling algorithm.

The pod limit is a much clearer boundary for a pod: it is the maximum amount of resources the cluster will allocate to the container.

Again, from the official documentation: if a container has a memory limit of 4 GiB, then the kubelet (and the container runtime) will enforce it. The runtime prevents the container from using more than the specified resource limit. For example, when a process in the container tries to use more than the allowed amount of memory, the system kernel terminates the process with an "out of memory" (OOM) error.

A container can always use more resources than the resource request specifies, but it can never use more than the limit. This value is difficult to set correctly, but it is very important.

Ideally, we want a pod's resource needs to be able to change over the lifetime of the process without interfering with other processes in the system; that is what limits are for.

Unfortunately, I cannot give specific guidance on which values to set, but we follow these rules ourselves:

  1. Using a load testing tool, we simulate a baseline level of traffic and watch the pod's resource usage (memory and CPU).
  2. We set the pod requests to some fairly low value (with a resource limit of roughly 5 times the request) and observe; see the example manifest after this list. When requests are too low, the process cannot start, often producing cryptic Go runtime errors.
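
As an illustration of that rough 5x rule, here is a minimal sketch of a container's resources block (the specific numbers are arbitrary examples, not values from the original article):

resources:
  requests:
    cpu: 100m        # what the scheduler uses to place the pod
    memory: 128Mi
  limits:
    cpu: 500m        # roughly 5x the request
    memory: 640Mi    # exceeding this gets the container OOM-killed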

I note that higher resource limits make scheduling more difficult because the pod needs a target node with enough resources available.

Imagine a situation where you have a lightweight web server with a very high resource limit, say 4 GB of memory. This process will probably need to be scaled out horizontally, and each new pod will have to be scheduled onto a node with at least 4 GB of memory available. If no such node exists, the cluster has to provision a new node to handle the pod, which can take a while. It is important to keep the gap between resource requests and limits as small as possible to ensure fast and smooth scaling.

Step Two: Set Up Liveness and Readiness Probes

This is another subtle topic that is often discussed in the Kubernetes community. It is important to understand liveness and readiness probes well, since they provide a mechanism for keeping software running reliably and minimizing downtime. However, they can seriously hurt your application's performance if configured incorrectly. Below is a summary of what both probes are.

The liveness probe shows whether the container is running. If it fails, the kubelet kills the container and its restart policy takes effect. If the container has no liveness probe, the default state is success, as stated in the Kubernetes documentation.

Liveness probes should be cheap, i.e. not consume a lot of resources, because they run frequently and should inform Kubernetes that the application is running.

If the probe is set to run every second, it adds one extra request per second, so keep in mind that additional resources are needed to handle this traffic.

In our company, liveness probes check the core components of an application even if the data it depends on (for example, a remote database or cache) is not fully available.

We've set up a "health" endpoint in the applications that simply returns a 200 response code. This is an indication that the process is running and capable of handling requests (but not traffic yet).

The readiness probe indicates whether the container is ready to serve requests. If a readiness probe fails, the endpoints controller removes the pod's IP address from the endpoints of all Services that match the pod. This, too, is stated in the Kubernetes documentation.

Readiness probes are more expensive, since they have to hit the backend in a way that shows the application really is ready to accept requests.

There is a lot of debate in the community about whether to hit the database directly. Given the overhead (the checks are frequent, but their frequency can be controlled), we decided that for some applications readiness to serve traffic only counts once we have verified that records come back from the database. Well-designed readiness probes gave us higher availability and eliminated downtime during deployments.

If you decide to query the database to test the readiness of your application, make sure it's as cheap as possible. Let's take this query:

SELECT small_item FROM table LIMIT 1
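
The article's approach is to run such a check inside the application's own readiness endpoint, but as an illustration, a check like this could also be wired up as an exec probe; a minimal sketch, assuming a PostgreSQL database, a psql client available in the image, and a DATABASE_URL environment variable (all three are assumptions for this example):

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - psql "$DATABASE_URL" -c "SELECT small_item FROM table LIMIT 1"
  periodSeconds: 5
  timeoutSeconds: 2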

Here is an example of how we configure these two values in Kubernetes:

livenessProbe:
  httpGet:
    path: /api/liveness
    port: http
readinessProbe:
  httpGet:
    path: /api/readiness
    port: http
  periodSeconds: 2

You can also add a few additional configuration options (a combined example follows this list):

  • initialDelaySeconds - how many seconds to wait after the container starts before running the first probe.
  • periodSeconds - the interval between probe runs.
  • timeoutSeconds - the number of seconds after which the probe times out and is considered failed. A normal timeout.
  • failureThreshold - the number of failed probes before a restart signal is sent to the pod.
  • successThreshold - the number of successful probes required before the pod transitions to the ready state (after a failure, when the pod starts up or recovers).
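
A minimal sketch of a readiness probe with these options combined (the numbers are arbitrary examples, not recommendations from the original article):

readinessProbe:
  httpGet:
    path: /api/readiness
    port: http
  initialDelaySeconds: 5   # wait 5 s after the container starts
  periodSeconds: 2         # run the probe every 2 s
  timeoutSeconds: 1        # a probe taking longer than 1 s counts as a failure
  failureThreshold: 3      # 3 consecutive failures mark the pod not ready
  successThreshold: 1      # 1 success marks it ready again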

Step Three: Set Default Network Policies for Pods

Kubernetes has a "flat" network topology: by default, all pods can talk to each other directly. In some cases this is undesirable.

A potential security issue is that an attacker could use a single vulnerable application to send traffic to all pods on the network. As in many areas of security, the principle of least privilege applies here. Ideally, network policies should explicitly state which connections between pods are allowed and which are not.

For example, the following is a simple policy that denies all incoming traffic for a particular namespace:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress

A visualization of this configuration (animation): https://miro.medium.com/max/875/1*-eiVw43azgzYzyN1th7cZg.gif
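
With a default-deny policy in place, any traffic that should be allowed has to be opened up explicitly. A minimal sketch of such an allow rule (the label names and port are hypothetical examples, not from the original article):

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend           # pods this policy applies to
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # only pods with this label may connect
      ports:
        - protocol: TCP
          port: 8080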

Step Four: Custom Behavior with Hooks and Init Containers

One of our main goals was to give developers deployments to Kubernetes with no downtime. This is difficult because there are many ways applications can shut down and release the resources they hold.

Nginx was particularly troublesome. We noticed that during rolling deployments of these pods, active connections were dropped before they could complete.

After extensive research on the Internet, it turned out that Kubernetes does not wait for Nginx connections to drain before shutting down the pod. Using a preStop hook, we implemented the following behavior and got rid of the downtime entirely:

lifecycle:
  preStop:
    exec:
      command: ["/usr/local/bin/nginx-killer.sh"]

And here is nginx-killer.sh:

#!/bin/bash
sleep 3                       # short pause before starting the shutdown
PID=$(cat /run/nginx.pid)     # remember the nginx master process PID
nginx -s quit                 # ask nginx to finish open connections and exit gracefully
while [ -d /proc/$PID ]; do   # wait until the master process is actually gone
   echo "Waiting while shutting down nginx..."
   sleep 10
done
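
Keep in mind that a preStop hook runs within the pod's termination grace period, so that period has to be long enough to cover the connection drain; a minimal sketch (the 120-second value is an arbitrary example, not from the original article):

spec:
  terminationGracePeriodSeconds: 120   # must exceed the time nginx-killer.sh may need
  containers:
    - name: nginx
      lifecycle:
        preStop:
          exec:
            command: ["/usr/local/bin/nginx-killer.sh"]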

Another extremely useful pattern is using init containers to handle application startup. This is especially helpful if you have a resource-intensive database migration that must run before the application starts. You can also give this process a higher resource limit without applying that limit to the main application.
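
A minimal sketch of such a migration init container with its own, higher limits (the image name and migration command are hypothetical examples):

initContainers:
  - name: db-migrate
    image: registry.example.com/myapp:1.0   # hypothetical image
    command: ["./migrate", "up"]            # hypothetical migration command
    resources:
      limits:
        cpu: "2"             # the migration may use more CPU and memory...
        memory: 2Gi
containers:
  - name: myapp
    image: registry.example.com/myapp:1.0
    resources:
      limits:
        cpu: 500m            # ...than the main container is allowed
        memory: 512Mi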

Another common scheme is to access secrets from an init container, which then provides those credentials to the main container; this prevents unauthorized access to the secrets from the main application container itself.
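
One way to implement this is an init container that writes the credential into a shared in-memory volume that the main container then reads; a minimal sketch, where the fetcher image and its command are hypothetical:

volumes:
  - name: credentials
    emptyDir:
      medium: Memory       # keep the secret off disk
initContainers:
  - name: fetch-secrets
    image: registry.example.com/secret-fetcher:1.0   # hypothetical image
    command: ["sh", "-c", "fetch-secret db-password > /credentials/db-password"]   # hypothetical command
    volumeMounts:
      - name: credentials
        mountPath: /credentials
containers:
  - name: myapp
    image: registry.example.com/myapp:1.0
    volumeMounts:
      - name: credentials
        mountPath: /credentials
        readOnly: true     # the main container only reads the credential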

As usual, a quote from the documentation: init containers safely run user code or utilities that would otherwise compromise the security of the application's container image. By keeping unnecessary tools separate, you limit the attack surface of the application's container image.

Step Five: Kernel Configuration

Finally, let's talk about a more advanced technique.

Kubernetes is an extremely flexible platform that allows you to run workloads however you see fit. We have a number of highly efficient applications that are extremely resource intensive. After doing extensive load testing, we found that one of the applications was having a hard time keeping up with the expected traffic load when the default Kubernetes settings were in effect.

However, Kubernetes lets you run a privileged container that changes kernel parameters applying only to a specific pod. Here is what we used to change the maximum number of open connections:

initContainers:
  - name: sysctl
    image: alpine:3.10
    securityContext:
      privileged: true
    command: ['sh', '-c', "sysctl -w net.core.somaxconn=32768"]

This is a more advanced technique that is often unnecessary. But if your application struggles to cope with heavy load, you can try tuning some of these settings. More about this process and about setting different values can be found, as always, in the official documentation.

In conclusion

While Kubernetes may seem like an out-of-the-box solution, there are a few key steps that must be taken to keep applications running smoothly.

Throughout the migration to Kubernetes, it is important to follow the “load testing cycle”: run the application, test it under load, observe the metrics and scaling behavior, adjust the configuration based on this data, then repeat this cycle again.

Be realistic about the traffic you expect and try to push past it to see which components break first. With this iterative approach, just a few of the recommendations listed here may be enough, or deeper customization may be required.

Always ask yourself these questions:

  1. How many resources do applications consume and how will this amount change?
  2. What are the real scaling requirements? How much traffic will the app handle on average? What about peak traffic?
  3. How often will the service need to scale out? How quickly do new pods need to be up and running to receive traffic?
  4. How gracefully do pods shut down? Is it necessary at all? Is it possible to achieve deployment without downtime?
  5. How do you minimize security risks and limit the damage from any compromised pods? Do any services have permissions or access they don't need?

Kubernetes provides an incredible platform that allows you to use best practices to deploy thousands of services in a cluster. However, all applications are different. Sometimes implementation requires a little more work.

Fortunately, Kubernetes provides the necessary settings to achieve all technical goals. By using a combination of resource requests and limits, Liveness and Readiness probes, init containers, network policies, and custom kernel tuning, you can achieve high performance along with fault tolerance and rapid scalability.

What else to read:

  1. Best Practices for Running Containers and Kubernetes in Production Environments.
  2. 90+ Useful Tools for Kubernetes: Deployment, Management, Monitoring, Security and More.
  3. Our channel Around Kubernetes in Telegram.

Source: habr.com
