Kubernetes tips & tricks: graceful shutdown features in NGINX and PHP-FPM

A typical requirement when implementing CI/CD in Kubernetes: before an application stops completely, it must stop accepting new client requests and, most importantly, finish processing the existing ones.

Meeting this requirement allows you to achieve zero downtime during deployments. However, even with a very popular stack such as NGINX plus PHP-FPM, you can run into difficulties that lead to a burst of errors on every deployment…

Theory. How a pod lives

We have already covered the pod life cycle in detail in this article. In the context of this topic, we are interested in the following: when a pod enters the Terminating state, new requests are no longer sent to it (the pod is removed from the list of endpoints for the service). So, to avoid downtime during a deployment, it is enough for us to stop the application correctly.

Also keep in mind that the default grace period is 30 seconds: after it expires, the pod is killed, so the application must manage to process all remaining requests within that time. Note that any request taking longer than 5-10 seconds is already problematic in itself, and graceful shutdown will not help it…
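
If requests in your application legitimately take longer, the grace period can be raised in the pod spec. A minimal sketch (the Deployment name, image, and the 60-second value are illustrative assumptions, not taken from the setup described in this article):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                     # illustrative name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # allow more than the default 30 seconds for in-flight requests to finish
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: my-app:1.0          # illustrative image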

To better understand what happens when a pod terminates, just look at the following diagram:

[Diagram: what happens when a pod is terminated]

A1, B1 - receiving the change in the pod's state
A2 - sending SIGTERM
B2 - removing the pod from endpoints
B3 - receiving the change (the list of endpoints has changed)
B4 - updating iptables rules

Note that removing the pod from the endpoints and sending SIGTERM happen in parallel, not sequentially. And because the Ingress does not receive the updated list of Endpoints immediately, new client requests keep being sent to the pod, which causes 500 errors while the pod is terminating (we have translated a more detailed article on this issue). The problem can be addressed in the following ways:

  • Send Connection: close in response headers (if we are talking about an HTTP application).
  • If it is not possible to change the code, the following article describes a solution that lets you keep serving requests until the end of the grace period (see the sketch after this list).
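
A minimal sketch of that second approach, holding SIGTERM back so the pod keeps serving traffic while the endpoint removal propagates (the 5-second value is an assumption; it only needs to exceed the propagation delay in your cluster):

        lifecycle:
          preStop:
            exec:
              command:
              # keep serving requests; SIGTERM is sent only after this hook returns
              - /bin/sleep
              - "5"

Keep in mind that the preStop hook and the application's own shutdown together must still fit within terminationGracePeriodSeconds.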

Theory. How NGINX and PHP-FPM terminate their processes

Nginx

Let's start with NGINX, since everything is more or less obvious with it. Diving into the theory, we learn that NGINX has one master process and several "workers" - child processes that handle client requests. There is a convenient option: with the command nginx -s <SIGNAL> you can terminate the processes in either fast shutdown or graceful shutdown mode. Obviously, we are interested in the latter.

Then everything is simple: you need to add a command to the preStop hook that sends the graceful shutdown signal. This can be done in the Deployment, in the container block:

       lifecycle:
          preStop:
            exec:
              command:
              - /usr/sbin/nginx
              - -s
              - quit

Now, at the moment of shutting down the pod, we will see the following in the NGINX container logs:

2018/01/25 13:58:31 [notice] 1#1: signal 3 (SIGQUIT) received, shutting down
2018/01/25 13:58:31 [notice] 11#11: gracefully shutting down

And that means exactly what we need: NGINX waits for the requests to complete and only then terminates the process. However, below we will look at a common problem where, even with the nginx -s quit command in place, the process is terminated incorrectly.

And at this stage, we have finished with NGINX: at least from the logs, you can understand that everything is working as it should.

How are things with PHP-FPM? How does it handle graceful shutdown? Let's figure it out.

PHP-FPM

In the case of PHP-FPM, there is a bit less information. If you rely on the official PHP-FPM manual, it will tell you that the following POSIX signals are accepted:

  1. SIGINT, SIGTERM - fast shutdown;
  2. SIGQUIT - graceful shutdown (what we need).

The remaining signals are not needed for this task, so we will skip them. To terminate the process correctly, you will need the following preStop hook:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/kill
              - -SIGQUIT
              - "1"

At first glance, this is all that is needed for a graceful shutdown in both containers. However, the task is trickier than it seems. Below we analyze two cases where graceful shutdown did not work and caused a short-term unavailability of the project during deployment.

Practice. Possible issues with graceful shutdown

Nginx

First of all, it is worth remembering: besides running the nginx -s quit command, there is one more thing to pay attention to. We ran into an issue where NGINX still sent SIGTERM instead of the SIGQUIT signal, so requests were not completed correctly. Similar cases can be found, for example, here. Unfortunately, we could not determine the exact reason for this behavior: we suspected the NGINX version, but that was not confirmed. The symptom was that the NGINX container logs showed messages like "open socket #10 left in connection 5", after which the pod stopped.

This problem could be observed, for example, in the responses from the Ingress we use:

[Chart: status codes at the time of deployment]

Here we get a 503 error code straight from the Ingress itself: it cannot reach the NGINX container because it is no longer available. The NGINX container logs contain the following:

[alert] 13939#0: *154 open socket #3 left in connection 16
[alert] 13939#0: *168 open socket #6 left in connection 13

After changing the stop signal, the container starts to stop correctly: this is confirmed by the fact that the 503 error is no longer observed.

If you run into a similar problem, it makes sense to find out which stop signal the container actually uses and what exactly the preStop hook looks like. It is quite possible that this is exactly where the cause lies.
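
A couple of hedged checks that can help here (the image and Deployment names are placeholders, and the first command assumes you have the image available to a local Docker daemon):

# which stop signal the image declares via STOPSIGNAL (empty output means the Docker default, SIGTERM)
docker inspect --format '{{.Config.StopSignal}}' my-nginx-image:1.0

# what the preStop hook actually looks like in the running Deployment
kubectl get deployment my-app \
  -o jsonpath='{.spec.template.spec.containers[0].lifecycle.preStop}'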

PHP-FPM... and more

The problem with PHP-FPM is described simply: it does not wait for its child processes to finish; it just terminates them, which causes 502 errors during deployments and other operations. There are several bug reports on bugs.php.net dating back to 2005 (for example, here and here) that describe this issue. In the logs, though, you will most likely see nothing: PHP-FPM reports the end of its process without any errors or extra notifications.

It is worth clarifying that how strongly the problem manifests itself depends on the application; it may, for example, not show up in monitoring at all. If you do run into it, a simple workaround comes to mind first: add a preStop hook with sleep(30). It lets all the existing requests complete (no new ones arrive, since the pod is already in the Terminating state), and after 30 seconds the pod is terminated with a SIGTERM signal.

The resulting lifecycle for the container will look like this:

    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sleep
          - "30"

However, a 30-second sleep significantly increases the deployment time, since every pod now takes at least 30 seconds to terminate, which is bad. What can be done about it?

Let's turn to the component that is directly responsible for running the application. In our case it is PHP-FPM, which by default does not track the execution of its child processes: the master process terminates immediately. You can change this behavior with the process_control_timeout directive, which specifies how long the child processes may wait for signals from the master. Setting it to 20 seconds covers most of the requests running in the container and stops the master process once they are done.
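
For reference, process_control_timeout is a global directive in php-fpm.conf (not a per-pool setting); a minimal sketch with the 20-second value from the example above (the config file path depends on your image and is an assumption):

; /usr/local/etc/php-fpm.conf  (global section; path is an assumption)
; how long child processes may wait for signals from the master,
; giving in-flight requests a chance to finish before the master exits
process_control_timeout = 20s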

With this knowledge, let's return to our last problem. As already mentioned, Kubernetes is not a monolithic platform: the interaction between its various components takes some time. This is especially relevant for Ingresses and other related components, because such a delay at deployment time easily produces a surge of 500 errors. For example, an error may occur when a request is sent to an upstream that is already gone, although this "time lag" between the components is quite short - less than a second.

So, together with the already mentioned process_control_timeout directive, you can use the following construction for lifecycle:

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash","-c","/bin/sleep 1; kill -QUIT 1"]

Here the sleep command compensates for the delay without noticeably increasing the deployment time: after all, is there a real difference between 30 seconds and one?.. In fact, it is process_control_timeout that does the actual work, while lifecycle is used only as a "safety net" in case of a lag.

Generally speaking, the described behavior and the corresponding workaround are not limited to PHP-FPM. A similar situation may arise with other languages and frameworks. If you cannot fix graceful shutdown in another way - for example, by rewriting the code so that the application handles termination signals correctly - you can apply the described method. It may not be the prettiest, but it works.

Practice. Load testing to check the operation of the pod

Load testing is one way to check how a container behaves, since it brings the conditions closer to real production traffic from users visiting the site. To verify the recommendations above, you can use Yandex.Tank: it covers all our needs perfectly. Below are tips and tricks for such testing, illustrated with Grafana charts and Yandex.Tank output from our own experience.

The most important thing here is to check changes step by step. After adding each new fix, run the test and see whether the results have changed compared to the previous run. Otherwise it will be hard to identify ineffective solutions, and in the long run they can do more harm than good (for example, by increasing the deployment time).
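
If a full Yandex.Tank profile is too heavy for quick iterations, a rough smoke check between runs can be done with a simple curl loop during a rollout; a sketch (the URL and the Deployment name are placeholders, and this is in no way a replacement for a real load test):

# terminal 1: poll the application and print only non-2xx responses
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' https://my-app.example.com/)  # placeholder URL
  [ "${code:0:1}" != "2" ] && echo "$(date +%T) $code"
  sleep 0.2
done

# terminal 2: trigger a new rollout and watch its progress
kubectl rollout restart deployment/my-app   # placeholder name
kubectl rollout status deployment/my-app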

Another nuance is watching the container logs during termination. Is the graceful shutdown recorded there? Are there errors in the logs when accessing other resources (for example, the neighboring PHP-FPM container)? Are there errors from the application itself (as in the NGINX case described above)? Hopefully the introductory information from this article will help you better understand what happens to the container during termination.

So, the first test run was performed without lifecycle hooks and without additional directives for the application server (process_control_timeout in PHP-FPM). The goal of this test was to estimate the approximate number of errors (and whether there are any at all). For context: the average deployment time for each pod was about 5-10 seconds until full readiness. The results:

[Chart: Yandex.Tank panel for the first run, without lifecycle hooks or process_control_timeout]

The Yandex.Tank panel shows a surge of 502 errors at the time of deployment, lasting up to 5 seconds on average. These were presumably existing requests to the old pod being cut off when it terminated. After that, 503 errors appeared, caused by the stopped NGINX container, which had dropped its connections on the backend side (so the Ingress could not connect to it).

Let's see how process_control_timeout in PHP-FPM helps us wait for child processes to finish, i.e. fixes such errors. We redeploy with this directive enabled:

[Chart: Yandex.Tank panel for the run with process_control_timeout enabled]

No more 500 errors during deployment! The deploy is successful, and graceful shutdown works.

However, remember the issue with the Ingress mentioned earlier, because of which we can still get a small percentage of errors due to the time lag. To avoid them, all that remains is to add the construction with sleep and repeat the deployment. In our particular case, though, no changes were visible (there were no errors either way).

Conclusion

To terminate the process gracefully, we expect the following behavior from the application:

  1. Wait a few seconds and then stop accepting new connections.
  2. Wait for all requests to complete and close all keepalive connections that are not serving requests.
  3. Terminate its process.

However, not all applications can work this way. One solution to the problem in Kubernetes realities is:

  • adding a preStop hook that waits a few seconds;
  • checking the configuration of our backend for the appropriate parameters.

The NGINX example shows that even an application which should, in theory, handle termination signals correctly may fail to do so, which is why it is critical to check for 500 errors during application deployments. Doing so also lets you look at the problem more broadly: instead of focusing on an individual pod or container, you look at the infrastructure as a whole.

As a testing tool, you can use Yandex.Tank together with any monitoring system (in our case we took data from Grafana backed by Prometheus). Graceful shutdown problems are clearly visible under the heavy load the benchmark generates, and monitoring helps analyze the situation in more detail during or after the test.

Responding to feedback on the article: it is worth mentioning that the problems and solutions described here relate to NGINX Ingress. For other cases there are other solutions, which we may cover in future articles of this series.

PS

Others from the K8s tips & tricks series:

Source: habr.com
