Translator's note: This article is part of the freely available materials of the learnk8s project, which teaches companies and individual administrators how to work with Kubernetes. In it, Daniele Polencic, the project's leader, shares a visual guide to troubleshooting common problems with applications running on a K8s cluster.
TL;DR: Here's a diagram to help you debug your Kubernetes deployment:
Flowchart for finding and fixing errors in a cluster. The original (in English) is available as a PDF and as an image.
When deploying an application to Kubernetes, there are usually three components that need to be defined:
Deployment - a recipe for creating copies of the application, called pods;
Service - an internal load balancer that distributes traffic across the pods;
Ingress - a description of how traffic gets from the outside world to the Service.
Here is a short graphical summary:
1) In Kubernetes, applications receive traffic from the outside world through two layers of load balancers: internal and external.
2) The internal balancer is called Service, the external is Ingress.
3) Deployment creates pods and monitors them (they are not created manually).
Suppose you want to deploy a simple Hello World-style application. The YAML configuration for it would look like this:
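A minimal sketch of such a configuration (the names, image, and the legacy extensions/v1beta1 Ingress API are illustrative assumptions, chosen to match the labels and ports discussed below):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    track: canary
spec:
  selector:
    matchLabels:
      any-name: my-app
  template:
    metadata:
      labels:
        any-name: my-app
    spec:
      containers:
        - name: app
          image: my-registry/hello-world:1.0.0   # illustrative image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    any-name: my-app   # must match the pod label above
  ports:
    - port: 80
      targetPort: 8080 # must match containerPort
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - http:
        paths:
          - path: /
            backend:
              serviceName: my-service  # must match the Service name
              servicePort: 80          # must match the Service port
```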
The definition is quite long and it's easy to get confused about how the components are related to each other.
For example:
When should I use port 80 and when should I use 8080?
Should a new port be created for each service so they don't conflict?
Do label names matter? Should they be the same everywhere?
Before we focus on debugging, let's recap how the three components relate to each other. Let's start with Deployment and Service.
Relationship between Deployment and Service
You will be surprised, but Deployments and Services are not connected in any way. Instead, the Service directly points to the Pods, bypassing the Deployment.
Thus, we are interested in how Pods and Services are related to each other. Three things to remember:
The selector of a Service must match at least one of the Pod's labels.
The Service's targetPort must match the containerPort of a container inside the Pod.
The Service's port can be anything. Different Services can use the same port because each has its own IP address.
The following diagram represents all of the above in graphical form:
1) Imagine that the service directs traffic to a certain pod:
2) When creating a pod, you must set a containerPort for each container in the pod:
3) When creating a service, you must specify port and targetPort. But which of them is used to connect to the container?
4) targetPort. It must match the containerPort.
5) Suppose port 3000 is open in the container. Then targetPort must have the same value.
In the YAML file, the labels and ports / targetPort must match:
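As a sketch, the matching pairs look like this (abbreviated excerpts, not complete manifests):

```yaml
# Excerpt from the Deployment's pod template
metadata:
  labels:
    any-name: my-app          # must match the Service selector
spec:
  containers:
    - name: app
      ports:
        - containerPort: 8080 # must match the Service's targetPort
---
# Excerpt from the Service
spec:
  selector:
    any-name: my-app          # must match the pod label
  ports:
    - port: 80                # can be anything
      targetPort: 8080        # must equal containerPort
```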
What about the label track: canary at the top of the Deployment section? Should it match?
This label is deployment specific and is not used by the service to route traffic. In other words, it can be removed or assigned a different value.
What about the selector matchLabels?
It should always match the Pod's labels, as it is used by Deployment to keep track of pods.
Let's assume you've made the correct edits. How to check them?
You can check the label of pods with the following command:
kubectl get pods --show-labels
Or, if the pods are owned by multiple applications:
kubectl get pods --selector any-name=my-app --show-labels
Here the selector any-name=my-app corresponds to the label any-name: my-app.
Still having trouble?
You can also connect to the pod! To do so, use the kubectl port-forward command. It lets you connect to the service and check the connection:
kubectl port-forward service/my-service 3000:80
Here:
service/<service name> is the name of the service; in our case it is my-service;
3000 is the port you want to open on your computer;
80 is the port specified in the service's port field.
If the connection was successfully established, then the settings are correct.
If the connection could not be established, then there is a problem with the labels or the ports do not match.
Relationship between Service and Ingress
The next step in providing access to the application is related to setting up the Ingress. Ingress needs to know how to find the service, then find the pods and send traffic to them. Ingress finds the required service by name and open port.
In the description of Ingress and Service, two parameters must match:
servicePort in the Ingress must match port in the Service;
serviceName in the Ingress must match the name field of the Service.
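A sketch of a matching Service/Ingress pair (using the legacy extensions/v1beta1 Ingress API, where the serviceName and servicePort fields live; the names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service      # must match serviceName in the Ingress
spec:
  selector:
    any-name: my-app
  ports:
    - port: 80          # must match servicePort in the Ingress
      targetPort: 8080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - http:
        paths:
          - path: /
            backend:
              serviceName: my-service
              servicePort: 80
```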
The following diagram summarizes the port connections:
1) As you already know, Service listens to some port:
2) Ingress has a parameter called servicePort:
3) This parameter (servicePort) must always match port in the Service definition:
4) If port 80 is specified in the Service, then servicePort must also be equal to 80:
In practice, you need to pay attention to the following lines:
Now every time you send a request to port 3000 on the machine, it will be redirected to port 80 of the Ingress pod. Going to http://localhost:3000, you should see the page generated by the application.
Summary by ports
Let's remember again which ports and labels must match:
The selector in the Service definition must match one of the pod's labels;
targetPort in the Service definition must match the containerPort of a container inside the pod;
port in the Service definition can be anything. Different Services can use the same port because each has its own IP address;
servicePort in the Ingress must match port in the Service definition;
The Service's name must match serviceName in the Ingress.
Alas, it is not enough to know how to properly structure a YAML configuration.
What happens when something goes wrong?
The pod may not start or it may be crashing.
3 steps to troubleshoot applications in Kubernetes
Before you start debugging your deployment, you need to have a good understanding of how Kubernetes works.
Since every application deployed in K8s consists of these three components, they should be debugged in a certain order, starting from the bottom.
First you need to make sure that the pods are working, then ...
Check if the service is delivering traffic to the pods, and thenβ¦
Check if Ingress is configured correctly.
Visual representation:
1) Start looking for problems from the bottom. First check that the pods have the Ready and Running statuses:
2) If the pods are ready (Ready), you should find out if the service distributes traffic between pods:
3) Finally, you need to analyze the connection between the service and Ingress:
1. Pod diagnostics
In most cases, the problem is related to the pod. Make sure the pods are listed as Ready and Running. You can check this with the command:
kubectl get pods
NAME READY STATUS RESTARTS AGE
app1 0/1 ImagePullBackOff 0 47h
app2 0/1 Error 0 47h
app3-76f9fcd46b-xbv4k 1/1 Running 1 47h
In the command output above, the last pod is listed as Running and Ready, but this is not the case for the other two.
How to understand what went wrong?
There are four useful commands for diagnosing pods:
kubectl logs <pod-name> allows you to extract the logs of the pod's containers;
kubectl describe pod <pod-name> allows you to view the list of events associated with the pod;
kubectl get pod <pod-name> -o yaml allows you to get the YAML configuration of the pod as stored in Kubernetes;
kubectl exec -ti <pod-name> -- bash allows you to run an interactive shell in one of the pod's containers.
Which one to choose?
The point is that there is no universal command; you should use a combination of them.
Common pod problems
There are two main types of pod errors: startup errors and runtime errors.
Launch errors:
ImagePullBackoff
ImageInspectError
ErrImagePull
ErrImageNeverPull
RegistryUnavailable
InvalidImageName
Runtime errors:
CrashLoopBackOff
RunContainerError
KillContainerError
VerifyNonRootError
RunInitContainerError
CreatePodSandboxError
ConfigPodSandboxError
KillPodSandboxError
SetupNetworkError
TeardownNetworkError
Some errors are more common than others. Here are some of the most common errors and how to fix them.
ImagePullBackOff
This error occurs when Kubernetes is unable to get an image for one of the pod's containers. Here are the three most common reasons for this:
The name of the image is incorrect - for example, you made a mistake in it, or the image does not exist;
An invalid tag was specified for the image;
The image is stored in a private registry and Kubernetes does not have permission to access it.
The first two causes are easy to fix: just correct the image name and tag. In the case of the last one, you need to put the credentials for the private registry in a Secret and reference it in your pods. The Kubernetes documentation contains an example of how to do this.
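For the private-registry case, a sketch of what this looks like: a docker-registry Secret (created, for example, with kubectl create secret docker-registry) is referenced from the pod via imagePullSecrets. The registry and secret names here are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0.0  # image in a private registry
  imagePullSecrets:
    - name: my-registry-secret  # Secret holding the registry credentials
```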
CrashLoopBackOff
Kubernetes throws a CrashLoopBackOff error if the container cannot start. This usually happens when:
The application has a bug that prevents it from starting;
The container is misconfigured;
The liveness probe has failed too many times.
Try to get at the container's logs to find out why it is failing. If this is difficult because the container restarts too quickly, you can use the following command:
kubectl logs <pod-name> --previous
It prints the error messages from the previous instance of the container.
RunContainerError
This error occurs when the container is unable to start. It corresponds to the moment before the application starts. It is usually caused by a misconfiguration, such as:
trying to mount a non-existent volume, such as a ConfigMap or Secret;
trying to mount a read-only volume as read-write.
The kubectl describe pod <pod-name> command is well suited for analyzing such errors.
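For example, a pod that mounts a ConfigMap as a volume looks roughly like this; if the referenced ConfigMap does not exist, the container cannot start (the names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-pod
spec:
  containers:
    - name: app
      image: my-app:1.0.0
      volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true   # mounting it read-write would also fail
  volumes:
    - name: config
      configMap:
        name: my-config    # must already exist in the same namespace
```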
Pods in the Pending state
After creation, the pod remains in the state Pending.
Why is this happening?
Here are the possible causes (I'm assuming the scheduler works fine):
The cluster does not have enough resources, such as processing power and memory, to run the pod.
A ResourceQuota object is defined in the namespace, and creating the pod would push the namespace over its quota;
The pod is bound to a PersistentVolumeClaim that is stuck in the Pending state.
In this case, it is recommended to use the kubectl describe command and check the Events section:
kubectl describe pod <pod name>
In case of errors related to ResourceQuotas, it is recommended to view the cluster events with the command:
kubectl get events --sort-by=.metadata.creationTimestamp
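For reference, a sketch of a ResourceQuota of the kind mentioned above; once the namespace's limits are reached, new pods can no longer be created (the values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: my-namespace
spec:
  hard:
    pods: "10"            # at most 10 pods in the namespace
    requests.cpu: "4"     # total CPU requests across all pods
    requests.memory: 8Gi  # total memory requests across all pods
```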
Pods Not Ready
If a pod is listed as Running but is not Ready, its readiness probe is failing.
When this happens, the pod is not attached to the service and receives no traffic. A failing readiness probe is an application-specific problem, so to find the cause you need to examine the Events section in the output of kubectl describe.
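For reference, a readiness probe is declared on the container. A sketch (the /healthz path is a hypothetical health-check endpoint):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ready-pod
spec:
  containers:
    - name: app
      image: my-app:1.0.0
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /healthz        # hypothetical health-check endpoint
          port: 8080
        initialDelaySeconds: 5  # wait before the first check
        periodSeconds: 10       # repeat the check every 10 seconds
```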
2. Service diagnostics
If the pods are listed as Running and Ready but there is still no response from the application, you should check the service settings.
Services route traffic to pods based on their labels. So the first thing to do is check how many pods the service is targeting, by inspecting its endpoints:
kubectl describe service <service-name> | grep Endpoints
An endpoint is a pair of the form <IP address:port>, and at least one such pair must be present in the output (that is, at least one pod is connected to the service).
If the Endpoints section is empty, there are two possibilities:
there are no pods with the correct label (hint: check whether you are in the right namespace);
the service's selector labels contain an error.
If you see a list of endpoints but still cannot access the application, the likely culprit is an incorrect targetPort in the service definition.
How to check if the service is working?
Regardless of the service type, you can use the kubectl port-forward command to connect to it:
kubectl port-forward service/<service-name> 3000:80
Here 3000 is a local port and 80 is the port of the service. If the connection is established, the service successfully distributes traffic among the pods.
3. Ingress diagnostics
However, suppose the service works but you still cannot "get through" to the application.
This means that the Ingress controller is most likely misconfigured. Since the Ingress controller is a third party component in the cluster, there are different debugging methods depending on its type.
But before resorting to special tools for debugging the Ingress, you can do something very simple. Ingress uses serviceName and servicePort to connect to the service. Check that they are configured correctly with the command:
kubectl describe ingress <ingress-name>
If the Backend column is empty, a configuration error is likely. If the backends are in place but the application is still not accessible, the problem may be related to:
Ingress accessibility settings from the public Internet;
cluster accessibility settings from the public Internet.
You can identify problems with the infrastructure by connecting directly to the Ingress pod. To do this, first find the Ingress controller's pod (it may live in a different namespace):
kubectl get pods --all-namespaces
Then forward a local port to it:
kubectl port-forward <ingress-pod-name> 3000:80 --namespace <ingress-namespace>
Now all requests on port 3000 on the computer will be redirected to port 80 of the pod.
Does it work now?
If so, then the problem is with the infrastructure. It is necessary to find out exactly how traffic is routed to the cluster.
If not, then the problem is with the Ingress controller.
If you can't get the Ingress controller to work, you'll have to debug it.
There are many varieties of Ingress controllers. The most popular are Nginx, HAProxy, Traefik, etc. (for more on the existing solutions, see our review - translator's note). Refer to the troubleshooting guide in the documentation for your controller. Since ingress-nginx is the most popular Ingress controller, we have included a few tips for it in this article.
Debugging an Nginx Ingress Controller
The ingress-nginx project has an official plugin for kubectl. The kubectl ingress-nginx command can be used for:
analyzing logs, backends, certificates, etc.;
connecting to the Ingress;
examining the current configuration.
The following three commands will help you with this:
kubectl ingress-nginx lint - checks nginx.conf;
kubectl ingress-nginx backend - explores the backend (similar to kubectl describe ingress <ingress-name>);
kubectl ingress-nginx logs - checks the logs.
Please note that in some cases it may be necessary to specify the correct namespace for the Ingress controller using the flag --namespace <name>.
Summary
Troubleshooting in Kubernetes can be tricky if you don't know where to start. The problem should always be approached from the bottom up: start with the pods, then move on to the service and the Ingress. The debugging techniques described in this article can also be applied to other objects, such as ReplicaSets, DaemonSets, StatefulSets, and Jobs.