Local files when migrating an application to Kubernetes

When building a CI/CD process with Kubernetes, you sometimes run into an incompatibility between the requirements of the new infrastructure and the application being migrated to it. In particular, at the application build stage it is important to produce a single image that is used in all of the project's environments and clusters. This principle underlies container management that Google considers correct (our technical director has spoken about this more than once).

However, you may also run into situations where the site's code uses a ready-made framework that imposes restrictions on how it can be operated. While this is easy to deal with in a "normal environment", in Kubernetes such behavior can become a problem, especially when you encounter it for the first time. And although an inventive mind can come up with infrastructure solutions that seem obvious and even good at first glance, it is important to remember that most such situations can and should be solved architecturally.

We will analyze popular workarounds for storing files that can lead to unpleasant consequences during cluster operation, and also point to a more correct way.

Storage of statics

To illustrate, consider a web application that uses some kind of static generator to produce a set of images, styles, and so on. For example, the Yii PHP framework has a built-in asset manager that generates unique directory names. Accordingly, the output is a set of obviously non-intersecting paths for the site's statics (this was done for several reasons, for example, to eliminate duplicates when the same resource is used by many components). So, out of the box, on the first request to a web resource module, the statics are generated and laid out (in fact, often as symlinks, but more on that later) under a common root directory unique to this deployment:

  • webroot/assets/2072c2df/css/…
  • webroot/assets/2072c2df/images/…
  • webroot/assets/2072c2df/js/…

What does this mean for a cluster?

Simplest example

Let's take a fairly common case, where PHP is fronted by nginx to serve the statics and process simple requests. The easiest way is a Deployment with two containers:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: site
spec:
  selector:
    matchLabels:
      component: backend
  template:
    metadata:
      labels:
        component: backend
    spec:
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-configmap
      containers:
      - name: php
        image: own-image-with-php-backend:v1.0
        command: ["/usr/local/sbin/php-fpm","-F"]
        workingDir: /var/www
      - name: nginx
        image: nginx:1.16.0
        command: ["/usr/sbin/nginx", "-g", "daemon off;"]
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf

In a simplified form, the nginx config boils down to the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: "nginx-configmap"
data:
  nginx.conf: |
    server {
        listen 80;
        server_name _;
        charset utf-8;
        root  /var/www;

        access_log /dev/stdout;
        error_log /dev/stderr;

        location / {
            index index.php;
            try_files $uri $uri/ /index.php?$args;
        }

        location ~ \.php$ {
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi_params;
        }
    }

On the first request to the site, the assets appear in the PHP container. But in the case of two containers within one pod, nginx knows nothing about these static files, which (according to the configuration) it is supposed to serve. As a result, the client will see a 404 error for all requests to CSS and JS files. The easiest solution here is to organize a shared directory for the containers. A primitive option is a common emptyDir:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: site
spec:
  selector:
    matchLabels:
      component: backend
  template:
    metadata:
      labels:
        component: backend
    spec:
      volumes:
        - name: assets
          emptyDir: {}
        - name: nginx-config
          configMap:
            name: nginx-configmap
      containers:
      - name: php
        image: own-image-with-php-backend:v1.0
        command: ["/usr/local/sbin/php-fpm","-F"]
        workingDir: /var/www
        volumeMounts:
        - name: assets
          mountPath: /var/www/assets
      - name: nginx
        image: nginx:1.16.0
        command: ["/usr/sbin/nginx", "-g", "daemon off;"]
        volumeMounts:
        - name: assets
          mountPath: /var/www/assets
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf

Now the static files generated in the PHP container are served by nginx correctly. But let me remind you that this is a primitive solution, which means it is far from ideal and has its own nuances and shortcomings, discussed below.

More advanced storage

Now let's imagine a situation where a user opened the site and loaded a page with the styles available in the container, and while he was reading that page, we re-deployed the container. The asset directory becomes empty, and a request to PHP is needed to start generating new assets. However, even after that, links to the old statics will be stale, which results in errors when displaying the statics.

In addition, we most likely have a more or less loaded project, which means that one copy of the application will not be enough:

  • Let's scale Deployment up to two replicas.
  • At the first access to the site, assets were created in one replica.
  • At some point, ingress decided (for load balancing purposes) to send a request to the second replica, and these assets are not there yet. Or maybe they are no longer there, because we use RollingUpdate and at the moment we are doing a deployment.

In general, the result is again errors.

In order not to lose the old assets, you can replace emptyDir with hostPath, placing the statics physically on a cluster node. This approach is bad because we effectively have to bind the application to a specific cluster node: if it moves to other nodes, the directory will not contain the necessary files. Either that, or some kind of background synchronization of the directory between nodes is required.
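For illustration, here is a minimal sketch of how the assets volume from the example above could be switched to hostPath (the /mnt/site-assets path on the node is an assumption):

      volumes:
        - name: assets
          hostPath:
            # Hypothetical directory on the node; DirectoryOrCreate creates it if missing.
            # Every node the pod can be scheduled to needs its own copy of the files.
            path: /mnt/site-assets
            type: DirectoryOrCreate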

What are the solutions?

  1. If hardware and resources allow, you can use cephfs to organize a directory equally accessible to all nodes for the needs of the statics. The official documentation recommends SSD drives, replication, and a stable, high-bandwidth connection between the cluster nodes.
  2. A less demanding option would be to organize an NFS server (see the sketch after this list). However, you then need to account for a possible increase in the web server's response time, and fault tolerance will leave much to be desired. The consequences of a failure are catastrophic: losing the mount dooms the cluster to death under the onslaught of a load average rushing into the sky.
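
As a sketch of the NFS option (the server address and export path are assumptions), the shared directory can be wired in via a PersistentVolume with ReadWriteMany access and a matching PersistentVolumeClaim, which is then mounted in place of emptyDir:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: assets-nfs
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # Hypothetical NFS server and export
    server: nfs.internal
    path: /exports/assets
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: assets-nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi

In the Deployment, the assets volume then becomes:

      volumes:
        - name: assets
          persistentVolumeClaim:
            claimName: assets-nfs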

Among other things, all of the persistent storage options will require background cleanup of the obsolete sets of files that accumulate over time. A DaemonSet of caching nginx can be placed in front of the PHP containers to store copies of the assets for a limited time. This behavior is easily configured with proxy_cache, with storage depth measured in days or gigabytes of disk space.
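To illustrate, a minimal sketch of such a caching layer, in the same ConfigMap style as above (the site-backend Service name, cache size and retention are assumptions):

apiVersion: v1
kind: ConfigMap
metadata:
  name: "nginx-cache-configmap"
data:
  nginx.conf: |
    # Keep up to 10 GB of cached responses, evicting entries unused for 7 days
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=assets:10m
                     max_size=10g inactive=7d use_temp_path=off;

    server {
        listen 80;
        server_name _;

        location /assets/ {
            proxy_cache       assets;
            proxy_cache_valid 200 7d;
            # Hypothetical Service in front of the PHP/nginx pods
            proxy_pass        http://site-backend;
        }

        location / {
            proxy_pass http://site-backend;
        }
    }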

Combining this method with the distributed file systems mentioned above opens up a huge field for imagination; the only limits are the budget and the technical potential of those who will implement and maintain it all. From experience we can say: the simpler the system, the more stable it works. Adding such layers makes the infrastructure much harder to maintain, and at the same time increases the time spent on diagnosing and recovering from any failures.

Recommendation

If implementing the proposed storage options also seems unjustified to you (too complicated, expensive...), then it is worth looking at the situation from the other side: dig into the architecture of the project and fix the problem in the code, by binding to a static data structure in the image, unambiguously defining the content, or adding a procedure for "warming up" and/or precompiling the assets at the image build stage. This way we get absolutely predictable behavior and the same set of files for all environments and replicas of the running application.

If we return to a specific example with the Yii framework and do not delve into its structure (which is not the purpose of the article), it is enough to point out two popular approaches:

  1. Modify the image build process so that assets are placed in a predictable location. This is what extensions like yii2-static-assets offer and implement.
  2. Define specific hashes for the asset directories, as explained, for example, in this presentation (starting from slide #35). By the way, the author of the talk ultimately (and not without reason!) advises building the assets on the build server and uploading them to a central storage (like S3) with a CDN in front of it.

Downloads

Another case that is sure to come up when migrating an application to a Kubernetes cluster is storing user files in the file system. For example, we again have a PHP application that receives files through an upload form, does something with them while it runs, and serves them back.

In the realities of Kubernetes, the place where these files are stored must be common to all replicas of the application. Depending on the complexity of the application and the need for persistence of these files, such a place can be one of the shared devices mentioned above, but, as we have seen, they have their drawbacks.

Recommendation

One of the solutions is to use S3-compatible storage (even some sort of self-hosted option like minio). Switching to S3 will require changes at the code level, and we have already written about how the content can be served on the frontend.
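As a rough sketch of the self-hosted route, a minimal MinIO Deployment and Service might look like this (the minio-credentials Secret and the minio-data PersistentVolumeClaim are assumed to exist, and the environment variable names can differ between MinIO versions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio
        args: ["server", "/data"]
        env:
        # Credentials are taken from a pre-created Secret (hypothetical name)
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: minio-credentials
              key: access-key
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: minio-credentials
              key: secret-key
        ports:
        - containerPort: 9000
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: minio-data
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  selector:
    app: minio
  ports:
  - port: 9000
    targetPort: 9000

The application then talks to the storage over the standard S3 API (here at http://minio:9000), so the same code can work against AWS S3 or any other compatible storage.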

User sessions

Separately, it is worth noting the organization of user session storage. Often these are also files on disk, which in the context of Kubernetes leads to constant authorization requests from the user whenever his request lands in another container.

Part of the problem is solved by enabling sticky sessions on the Ingress (the feature is supported in all popular ingress controllers; see details in our review) to bind the user to a specific application pod:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: nginx-test
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"

spec:
  rules:
  - host: stickyingress.example.com
    http:
      paths:
      - backend:
          serviceName: http-svc
          servicePort: 80
        path: /

But this will not get rid of the problems with repeated deployments.

Recommendation

A better way would be to switch the application to storing sessions in memcached, Redis, or similar solutions: in other words, to completely abandon the file-based options.
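For example, for PHP with the phpredis extension this can boil down to a couple of ini settings delivered via a ConfigMap (the redis Service name and the mount path inside the PHP image are assumptions):

apiVersion: v1
kind: ConfigMap
metadata:
  name: "php-sessions-configmap"
data:
  zz-sessions.ini: |
    ; Requires the phpredis extension in the PHP image;
    ; mount this file into the PHP configuration directory,
    ; e.g. /usr/local/etc/php/conf.d/ for the official images.
    session.save_handler = redis
    session.save_path = "tcp://redis:6379"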

Conclusion

The infrastructure solutions discussed in this article are worthy of use only as temporary workarounds ("crutches"). They may be relevant in the early stages of migrating an application to Kubernetes, but they should not take root.

The general recommended way is to get rid of them in favor of architectural refinement of the application in accordance with the well-known 12 Factor App. However, bringing the application to a stateless form inevitably means that code changes will be required, and here it is important to find a balance between the capabilities and requirements of the business and the prospects of implementing and maintaining the chosen path.

Source: habr.com
