A Visual Guide to Troubleshooting Kubernetes

Translator's note: This article is part of the freely available materials of the learnk8s project, which teaches companies and individual administrators how to work with Kubernetes. In it, Daniele Polencic, the project's leader, shares a visual guide on what to do when applications running on a K8s cluster misbehave.

TL;DR: Here's a diagram to help you debug your Kubernetes deployment:

Flowchart for finding and fixing errors in a cluster. The original (in English) is available as a PDF and as an image.

When deploying an application to Kubernetes, there are usually three components that need to be defined:

  • Deployment - a recipe for creating copies of the application, called pods;
  • Service - an internal load balancer that distributes traffic across pods;
  • Ingress - a description of how traffic gets from the outside world to the Service.
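
Once all three objects are applied, you can, for example, list them with a single command and make sure each one was created:

kubectl get deployments,services,ingress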

Here is a short graphical summary:

1) In Kubernetes, applications receive traffic from the outside world through two layers of load balancers: internal and external.

2) The internal balancer is called Service, the external is Ingress.

3) Deployment creates pods and monitors them (they are not created manually).

Suppose you want to deploy a simple Hello World-style application. Its YAML configuration would look like this:

apiVersion: apps/v1
kind: Deployment # <<<
metadata:
  name: my-deployment
  labels:
    track: canary
spec:
  selector:
    matchLabels:
      any-name: my-app
  template:
    metadata:
      labels:
        any-name: my-app
    spec:
      containers:
      - name: cont1
        image: learnk8s/app:1.0.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service # <<<
metadata:
  name: my-service
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    any-name: my-app
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress # <<<
metadata:
  name: my-ingress
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: my-service
          servicePort: 80
        path: /

The definition is quite long and it's easy to get confused about how the components are related to each other.

For example:

  • When should I use port 80 and when should I use 8080?
  • Should a new port be created for each service so they don't conflict?
  • Do label names matter? Should they be the same everywhere?

Before we focus on debugging, let's recap how the three components relate to each other. Let's start with Deployment and Service.

Relationship between Deployment and Service

You might be surprised, but Deployment and Service are not connected in any way. Instead, the Service points directly to the pods, bypassing the Deployment.

Thus, we are interested in how Pods and Services are related to each other. Three things to remember:

  1. The Service's selector must match at least one label on the Pod.
  2. The Service's targetPort must match the containerPort of the container inside the Pod.
  3. The Service's port can be anything. Different services can use the same port because they have different IP addresses.
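
To double-check points 1-3 in practice, you can print the service's selector and compare it with the pod labels (my-service and the any-name label come from the example above):

kubectl get service my-service -o jsonpath='{.spec.selector}'
kubectl get pods --show-labels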

The following diagram represents all of the above in graphical form:

1) Imagine that the service directs traffic to a certain pod:

2) When creating a pod, you must specify the containerPort for each container in it:

3) When creating a service, you must specify port and targetPort. But which of them connects to the container?

4) Through targetPort. It must match the containerPort.

5) Suppose port 3000 is open in the container. Then targetPort must have the same value.

In the YAML file, the labels and ports / targetPort must match:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    track: canary
spec:
  selector:
    matchLabels:
      any-name: my-app
  template:
    metadata:
      labels:  # <<<
        any-name: my-app  # <<<
    spec:
      containers:
      - name: cont1
        image: learnk8s/app:1.0.0
        ports:
        - containerPort: 8080  # <<<
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - port: 80
    targetPort: 8080  # <<<
  selector:  # <<<
    any-name: my-app  # <<<

What about the label track: canary at the top of the Deployment section? Should it match?

This label is deployment specific and is not used by the service to route traffic. In other words, it can be removed or assigned a different value.

What about the matchLabels selector?

It must always match the Pod's labels, since the Deployment uses it to keep track of its pods.

Let's assume you've made the correct edits. How do you verify them?

You can check the label of pods with the following command:

kubectl get pods --show-labels

Or, if pods from several applications are running:

kubectl get pods --selector any-name=my-app --show-labels

Here any-name=my-app corresponds to the label any-name: my-app.

Want to make sure everything works end to end?

You can connect to the application! To do this, use the port-forward command in kubectl. It lets you connect to the service and check the connection.

kubectl port-forward service/<service name> 3000:80

Here:

  • service/<service name> is the name of the service; in our case it is my-service;
  • 3000 is the port to open on your computer;
  • 80 is the port specified in the service's port field.

If the connection is established successfully, the settings are correct.

If the connection cannot be established, there is a problem with the labels, or the ports do not match.
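
While the port-forward is running, you can test the connection from a second terminal, for example with curl:

curl http://localhost:3000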

Relationship between Service and Ingress

The next step in exposing the application is configuring the Ingress. The Ingress needs to know how to find the service, so that it can then find the pods and route traffic to them. The Ingress finds the service by its name and open port.

In the description of Ingress and Service, two parameters must match:

  1. servicePort in the Ingress must match port in the Service;
  2. serviceName in the Ingress must match the name field in the Service.
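
If you'd rather not scan the full YAML, both values can be pulled out with jsonpath queries (the paths below follow the networking.k8s.io/v1beta1 schema used in this article):

kubectl get ingress my-ingress -o jsonpath='{.spec.rules[0].http.paths[0].backend.serviceName}'
kubectl get service my-service -o jsonpath='{.spec.ports[0].port}'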

The following diagram summarizes the port connections:

1) As you already know, the Service listens on a certain port:

2) Ingress has a parameter called servicePort:

3) This parameter (servicePort) must always match port in the Service definition:

4) If port 80 is specified in the Service, then servicePort must also be equal to 80:

In practice, you need to pay attention to the following lines:

apiVersion: v1
kind: Service
metadata:
  name: my-service  # <<<
spec:
  ports:
  - port: 80  # <<<
    targetPort: 8080
  selector:
    any-name: my-app
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: my-service  # <<<
          servicePort: 80  # <<<
        path: /

How to check if Ingress is working?

You can use the same kubectl port-forward approach, but instead of the service you connect to the Ingress controller.

First you need to find out the name of the pod with the Ingress controller:

kubectl get pods --all-namespaces
NAMESPACE   NAME                              READY STATUS
kube-system coredns-5644d7b6d9-jn7cq          1/1   Running
kube-system etcd-minikube                     1/1   Running
kube-system kube-apiserver-minikube           1/1   Running
kube-system kube-controller-manager-minikube  1/1   Running
kube-system kube-proxy-zvf2h                  1/1   Running
kube-system kube-scheduler-minikube           1/1   Running
kube-system nginx-ingress-controller-6fc5bcc  1/1   Running

Find the Ingress pod (it may be in a different namespace) and run describe on it to find out the port numbers:

kubectl describe pod nginx-ingress-controller-6fc5bcc \
  --namespace kube-system \
  | grep Ports
Ports:         80/TCP, 443/TCP, 18080/TCP

Finally, connect to the pod:

kubectl port-forward nginx-ingress-controller-6fc5bcc 3000:80 --namespace kube-system

Now every time you send a request to port 3000 on the machine, it will be redirected to port 80 of the Ingress pod. Going to http://localhost:3000, you should see the page generated by the application.

Summary by ports

Let's remember again which ports and labels must match:

  1. The selector in the Service definition must match the pods' label;
  2. targetPort in the Service definition must match the containerPort of a container inside the pod;
  3. port in the Service definition can be anything. Different services can use the same port because they have different IP addresses;
  4. servicePort in the Ingress must match port in the Service definition;
  5. The service name must match the serviceName field in the Ingress.

Alas, it is not enough to know how to properly structure a YAML configuration.

What happens when something goes wrong?

The pod may fail to start, or it may keep crashing.

3 steps to troubleshoot applications in Kubernetes

Before you start debugging a deployment, you need to have a good understanding of how Kubernetes works.

Since every application deployed to K8s consists of the three components above, they should be debugged in a certain order, starting from the bottom.

  1. First you need to make sure that the pods are working, then ...
  2. Check if the service is delivering traffic to the pods, and then…
  3. Check if Ingress is configured correctly.

Visual representation:

1) Start looking for problems from the bottom. First check that the pods have the Ready and Running statuses:

2) If the pods are ready (Ready), you should find out if the service distributes traffic between pods:

3) Finally, you need to analyze the connection between the service and Ingress:

1. Pod diagnostics

In most cases, the problem lies with the pod. Make sure the pods are listed as Ready and Running. You can check this with the command:

kubectl get pods
NAME                    READY STATUS            RESTARTS  AGE
app1                    0/1   ImagePullBackOff  0         47h
app2                    0/1   Error             0         47h
app3-76f9fcd46b-xbv4k   1/1   Running           1         47h

In the command output above, the last pod is listed as Running and Ready, but this is not the case for the other two.

How to understand what went wrong?

There are four useful commands for diagnosing pods:

  1. kubectl logs <pod-name> lets you extract the logs from the pod's containers;
  2. kubectl describe pod <pod-name> lets you view the list of events associated with the pod;
  3. kubectl get pod <pod-name> lets you retrieve the pod's YAML configuration as stored in Kubernetes;
  4. kubectl exec -ti <pod-name> bash lets you run an interactive shell in one of the pod's containers.
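
For example, applied to the failing pod app1 from the output above, a first pass might look like this:

kubectl describe pod app1   # check the Events section for the failure reason
kubectl logs app1           # may be empty if the container never started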

Which one to choose?

The fact is that there is no universal command; a combination of them should be used.

Common pod problems

There are two main types of pod errors: startup errors and runtime errors.

Startup errors:

  • ImagePullBackOff
  • ImageInspectError
  • ErrImagePull
  • ErrImageNeverPull
  • RegistryUnavailable
  • InvalidImageName

Runtime errors:

  • CrashLoopBackOff
  • RunContainerError
  • KillContainerError
  • VerifyNonRootError
  • RunInitContainerError
  • CreatePodSandboxError
  • ConfigPodSandboxError
  • KillPodSandboxError
  • SetupNetworkError
  • TeardownNetworkError

Some errors are more common than others. Here are some of the most common errors and how to fix them.

ImagePullBackOff

This error occurs when Kubernetes is unable to get an image for one of the pod's containers. Here are the three most common reasons for this:

  1. The name of the image is incorrect - for example, you made a mistake in it, or the image does not exist;
  2. An invalid tag was specified for the image;
  3. The image is stored in a private registry and Kubernetes does not have permission to access it.

The first two causes are easy to fix: just correct the image name and tag. In the case of the last one, you need to put the credentials for the private registry into a Secret and reference it in your pods. The Kubernetes documentation has an example of how to do this.
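
For the private registry case, a minimal sketch might look like this (the registry address, credentials, and the secret name my-registry-key are placeholders):

kubectl create secret docker-registry my-registry-key \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>

The pod template then references the secret:

spec:
  imagePullSecrets:
  - name: my-registry-key  # credentials created above
  containers:
  - name: cont1
    image: registry.example.com/app:1.0.0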

CrashLoopBackOff

Kubernetes throws a CrashLoopBackOff error if the container cannot start. This usually happens when:

  1. The application has a bug that prevents it from starting;
  2. The container is configured incorrectly;
  3. The liveness probe has failed too many times.

Try to get at the container's logs to find out the reason for the failure. If accessing the logs is difficult because the container restarts too quickly, you can use the following command:

kubectl logs <pod-name> --previous

It prints the error messages from the container's previous incarnation.

RunContainerError

This error occurs when the container is unable to start, i.e. before the application inside it even launches. It is usually caused by a misconfiguration, such as:

  • trying to mount a non-existent volume, such as a ConfigMap or Secret;
  • trying to mount a read-only volume as read-write.
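
For illustration, here is a sketch of the first failure mode: a pod that mounts a ConfigMap which was never created (the name missing-config is hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: broken-pod
spec:
  containers:
  - name: cont1
    image: learnk8s/app:1.0.0
    volumeMounts:
    - name: config
      mountPath: /etc/config
  volumes:
  - name: config
    configMap:
      name: missing-config  # this ConfigMap does not exist, so the container cannot start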

The command kubectl describe pod <pod-name> is well suited for analyzing such errors.

Pods in the Pending state

After creation, the pod remains in the Pending state.

Why is this happening?

Here are the possible causes (I'm assuming the scheduler works fine):

  1. The cluster does not have enough resources, such as processing power and memory, to run the pod.
  2. A ResourceQuota object is set in the namespace, and creating the pod would push the namespace over the quota.
  3. The pod is bound to a PersistentVolumeClaim that is stuck in Pending.

In this case, it is recommended to use the kubectl describe command and check the Events section:

kubectl describe pod <pod name>

In case of errors related to ResourceQuotas, it is also worth viewing the cluster events with the command

kubectl get events --sort-by=.metadata.creationTimestamp

Pods Not Ready

If a pod is listed as Running but is not Ready, it means its readiness probe is failing.

When this happens, the pod is not attached to the service and no traffic is routed to it. A failing readiness probe is an application-specific problem, so to find the cause you need to analyze the Events section in the output of kubectl describe.
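
For reference, the readiness probe lives in the pod spec. A minimal sketch, assuming the application exposes a /healthz endpoint on port 8080:

containers:
- name: cont1
  image: learnk8s/app:1.0.0
  readinessProbe:
    httpGet:
      path: /healthz  # assumed health endpoint
      port: 8080
    initialDelaySeconds: 5   # wait before the first check
    periodSeconds: 10        # check every 10 seconds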

2. Service diagnostics

If the pods are listed as Running and Ready, but there is still no response from the application, you should check the service's settings.

Services route traffic to pods based on their labels. So the first thing to do is check how many pods the service is actually serving, by inspecting its endpoints:

kubectl describe service <service-name> | grep Endpoints

An endpoint is a pair of the form <IP address:port>, and at least one such pair must be present in the output (that is, at least one pod is served by the service).
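
Alternatively, you can query the Endpoints object directly:

kubectl get endpoints <service-name>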

If the Endpoints section is empty, there are two options:

  1. there are no pods with the correct label (hint: check whether you are in the right namespace);
  2. there is a typo in the labels of the service's selector.

If you see a list of endpoints but still cannot reach the application, the likely culprit is an incorrect targetPort in the service description.

How to check if the service is working?

Regardless of the service type, you can use the kubectl port-forward command to connect to it:

kubectl port-forward service/<service-name> 3000:80

Here:

  • <service-name> is the name of the service;
  • 3000 is the port you open on your computer;
  • 80 is the port on the service side.

3. Ingress Diagnostics

If you've read this far, then:

  • pods are listed as Running and Ready;
  • the service successfully distributes traffic among pods.

However, you still cannot "get through" to the application.

This means that the Ingress controller is most likely misconfigured. Since the Ingress controller is a third-party component in the cluster, the debugging methods differ depending on its type.

But before resorting to specialized Ingress tooling, you can check something very simple. The Ingress uses serviceName and servicePort to connect to the service. You need to check that they are configured correctly. You can do this with the command:

kubectl describe ingress <ingress-name>

If the Backend column is empty, there is a high probability of a configuration error. If the backends are in place but the application is still not reachable, the problem may be related to:

  • Ingress accessibility settings from the public Internet;
  • cluster accessibility settings from the public Internet.

You can identify problems with the infrastructure by connecting directly to the Ingress pod. To do this, first find the pod of the Ingress controller (it may be in a different namespace):

kubectl get pods --all-namespaces
NAMESPACE   NAME                              READY STATUS
kube-system coredns-5644d7b6d9-jn7cq          1/1   Running
kube-system etcd-minikube                     1/1   Running
kube-system kube-apiserver-minikube           1/1   Running
kube-system kube-controller-manager-minikube  1/1   Running
kube-system kube-proxy-zvf2h                  1/1   Running
kube-system kube-scheduler-minikube           1/1   Running
kube-system nginx-ingress-controller-6fc5bcc  1/1   Running

Use the describe command to find out the port:

kubectl describe pod nginx-ingress-controller-6fc5bcc \
  --namespace kube-system \
  | grep Ports

Finally, connect to the pod:

kubectl port-forward nginx-ingress-controller-6fc5bcc 3000:80 --namespace kube-system

Now all requests to port 3000 on your computer will be redirected to port 80 of the pod.

Does it work now?

  • If so, then the problem is with the infrastructure. It is necessary to find out exactly how traffic is routed to the cluster.
  • If not, then the problem is with the Ingress controller.

If you can't get the Ingress controller to work, you'll have to debug it.

There are many varieties of Ingress controllers; the most popular are Nginx, HAProxy, Traefik, etc. (for more on the existing solutions, see our overview - translator's note). Refer to the troubleshooting guide in the documentation for your controller. Since Ingress Nginx is the most popular Ingress controller, this article includes a few tips on how to deal with it.

Debugging an Nginx Ingress Controller

The Ingress-nginx project has an official plugin for kubectl. The kubectl ingress-nginx command can be used for:

  • analyzing logs, backends, certificates, etc.;
  • connecting to the Ingress;
  • examining the current configuration.

The following three commands will help you with this:

  • kubectl ingress-nginx lint - checks nginx.conf;
  • kubectl ingress-nginx backend - inspects the backend (similar to kubectl describe ingress <ingress-name>);
  • kubectl ingress-nginx logs - checks the logs.

Note that in some cases you may need to specify the correct namespace for the Ingress controller using the --namespace <name> flag.
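
If the plugin is not installed yet, it is distributed via krew (this assumes krew itself is already set up):

kubectl krew install ingress-nginx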

Summary

Troubleshooting Kubernetes can be tricky if you don't know where to start. The problem should always be approached from the bottom up: start with pods, and then move on to the service and Ingress. The debugging methods described in the article can be applied to other objects, such as:

  • Jobs and CronJobs that fail to run;
  • StatefulSets and DaemonSets.

Thanks to Gergely Risko, Daniel Weibel and Charles Christyraj for their valuable comments and additions.


Source: habr.com
