Production-ready images for k8s

This article is about how we use containers in production, primarily under Kubernetes. It covers building images and collecting metrics and logs from containers.


We are from Exness, a fintech company that develops online trading services and fintech products for B2B and B2C. Our R&D includes many different teams, with 100+ employees in the development department.

We are the team responsible for the platform that builds and runs our developers' code. In particular, we collect, store, and serve metrics, logs, and events from applications. We currently operate approximately 50 Docker containers in production, maintain our big data storage, and provide architectural solutions built around our infrastructure: Kubernetes, Rancher, and various public cloud providers.

Our motivation

What is on fire? Nobody can say. Where did the fire start? Hard to tell. When did it ignite? You can find out, but not right away.


Why do some containers keep standing while others fall over? Which container was the culprit? From the outside the containers all look the same, but inside each one lives its own Neo.


Our developers are smart folks. They build good services that bring profit to the company. But there are failures when application containers go rogue: one container consumes too much CPU, another saturates the network, a third hammers I/O, and a fourth does who-knows-what with sockets. Everything falls over, and the ship sinks.

Agents

To understand what is happening inside, we decided to put agents directly into containers.


These agents are support programs that keep containers in a state where they do not break each other. The agents are standardized, which lets us standardize the way containers are operated.

In our case, the agents must provide logs in a standard format, tagged and rate-limited. They must also give us standardized metrics that can be extended for business-application needs.

Agents also include utilities for operation and maintenance that work across different orchestration systems and support different base images (Debian, Alpine, CentOS, etc.).

Finally, the agents must support a simple CI/CD flow, Dockerfiles included. Otherwise the ship falls apart, because the containers would be delivered along crooked rails.

Build process and the layout of the target image

For everything to be standardized and manageable, we need to follow a standard build process. So we decided to build containers with containers: recursion in action.


We also decided to put the distribution packages inside these build containers; why we did this is explained below.
 
The result is a build tool: a fully versioned container that references specific distribution and script versions.

How do we apply it? We take a base container from Docker Hub and mirror it inside our system to get rid of external dependencies; this gives us our own base container. Then we create a template that installs all the distributions and scripts we need into that container. Finally, we assemble a ready-to-use image: developers put their code and any extra dependencies into it.

Why is this approach good? 

  • First, full version control of the build tooling: build container, script, and distribution versions. 
  • Second, standardization: templates, intermediate images, and ready-to-use images are all created the same way. 
  • Third, containers give us portability. Today we use GitLab; tomorrow we could switch to TeamCity or Jenkins and still run our containers the same way. 
  • Fourth, minimal dependencies. We put the distributions into the container precisely so that we do not have to download them from the Internet every time. 
  • Fifth, faster builds: local copies of the images mean no time is wasted on downloads. 

In other words, we have a controlled and flexible build process, and we use the same tools to build any fully versioned container.

How our build process works


The build is launched with a single command, and the whole process runs inside the build container. The developer provides a Dockerfile template; we render it, substituting values for the variables, and along the way add a header and a footer - this is how our agents get in.

The header adds distributions from the corresponding images. The footer installs our services, configures the launch of the workload, the logging and other agents, replaces the entrypoint, and so on.
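To make this concrete, here is a rough sketch of what a rendered Dockerfile might look like; the registry names, paths, and versions are illustrative, not our actual internals.

```dockerfile
# --- header (generated): mirrored base image plus pre-downloaded distributions ---
FROM registry.internal.example/mirrors/debian:buster-slim
COPY --from=registry.internal.example/distros/debian-bundle:1.4.2 /dist /opt/dist
RUN dpkg -i /opt/dist/*.deb && rm -rf /opt/dist

# --- developer part: rendered from the developer's Dockerfile template ---
COPY ./app /srv/app
RUN /srv/app/install-deps.sh          # whatever the service itself needs

# --- footer (generated): agents, supervisor, entrypoint replacement ---
COPY --from=registry.internal.example/agents/collector:2.1.0 / /
COPY s6-services/ /etc/services.d/    # service definitions for the supervisor
ENTRYPOINT ["/init"]                  # supervisor init becomes PID 1
```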


We debated for a long time whether to install a supervisor, and in the end decided we needed one. We chose s6. The supervisor gives us container management: it lets us connect to the container if the main process crashes and manage the container manually without recreating it. Logs and metrics are produced by processes running inside the container; they also need to be managed somehow, and the supervisor does that for us. Finally, s6 handles housekeeping, signal processing, and other chores.
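For reference, with the s6-overlay convention each supervised process is just a service directory with a run script; a minimal sketch (the service name and agent command are illustrative) looks like this:

```sh
#!/bin/sh
# /etc/services.d/collector/run
# s6 starts this process and restarts it if it dies; the workload itself
# lives in its own service directory, e.g. /etc/services.d/app/run.
exec telegraf --config /etc/telegraf/telegraf.conf
```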

Since we use different orchestration systems, after being built and started the container has to figure out which environment it is in and act accordingly. For example, it detects whether it is running as a plain Docker container or inside a Kubernetes pod, and decides which agents it needs to start itself. This lets us build one image and run it in different orchestration systems: it launches taking the specifics of that orchestration system into account.
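A common way to do this kind of detection (a sketch of the idea, not necessarily our exact check) is to look at what Kubernetes injects into every pod:

```sh
#!/bin/sh
# Sourced by the entrypoint: decide which agents this container should start.
if [ -n "$KUBERNETES_SERVICE_HOST" ] || \
   [ -f /var/run/secrets/kubernetes.io/serviceaccount/token ]; then
    # Kubernetes: the collector agent arrives as a sidecar container,
    # so here we only start the workload.
    export START_LOCAL_AGENTS=0
else
    # Plain Docker or another orchestrator without the sidecar:
    # start the log and metrics agents inside this container too.
    export START_LOCAL_AGENTS=1
fi
```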


For the same container, we get different process trees in Docker and Kubernetes:


In both cases the payload runs under the s6 supervisor. Note the collector and events processes: these are our agents responsible for logs and metrics. The Kubernetes container does not have them, but the Docker one does. Why?

If you look at the pod specification (hereinafter, the Kubernetes pod), you will see that the pod contains a separate collector container whose job is to gather metrics and logs. Here we take advantage of Kubernetes features: containers in one pod share a single process and/or network space, so we can implement our agents as a sidecar and offload those functions to it. If the same container is launched in plain Docker, it ends up with the same capabilities: it can still deliver logs and metrics, because the agents start inside it.
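Schematically, such a pod looks something like the sketch below; container names and images are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api
spec:
  shareProcessNamespace: true     # containers in the pod share one process space
  containers:
    - name: app                   # the developer's workload, s6 as PID 1
      image: registry.internal.example/payments-api:1.0.0
    - name: collector             # the injected agent: gathers logs and metrics
      image: registry.internal.example/agents/collector:2.1.0
```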

Metrics and logs

Delivering metrics and logs is a difficult task, and there are several aspects to solving it.

The first aspect: the infrastructure exists to run the payload, not to mass-deliver logs, so this process must run with minimal resource requirements inside the container. We want to make life easy for our developers: "Take a container from Docker Hub, run it, and we will take care of delivering the logs."

The second aspect is limiting log volume. If several containers see a surge in log volume at once (say, an application prints a stack trace in a loop), the load on the CPU, the network, and the log processing system grows; this affects the host as a whole and the other containers on it, and can sometimes bring the host down.

The third aspect is supporting as many metrics collection methods as possible out of the box: from reading files and polling a Prometheus endpoint to application-specific protocols.

And the last aspect: resource consumption must be kept to a minimum.

We chose Telegraf, an open-source solution written in Go. It is a universal connector that supports more than 140 input plugins and 30 output plugins. We extended it for our needs, and below we describe how we use it, with Kubernetes as the example.


Say a developer deploys a workload and Kubernetes receives a request to create a pod. At this point a container called Collector is automatically added to each pod (we use a mutating webhook). Collector is our agent. On startup, this container configures itself to work with Prometheus and the logging system.

  • To do this, it reads the pod annotations and, depending on their contents, exposes, for example, a Prometheus endpoint (see the sketch below); 
  • Based on the pod specification and per-container settings, it decides how to deliver the logs.
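Our own annotation keys are not shown here, so as an illustration of the idea here is the widely used Prometheus-style annotation convention:

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"    # Collector should expose metrics for scraping
    prometheus.io/port: "9102"      # port of the aggregated metrics endpoint
    prometheus.io/path: "/metrics"
```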

We collect logs through the Docker API: developers just write them to stdout or stderr, and Collector takes it from there. Logs are collected in chunks with a small delay to avoid overloading the host.

Metrics are collected from the workload instances (processes) in the containers. Everything is tagged - namespace, pod, and so on - then converted to the Prometheus format and made available for scraping (logs excepted). We also send logs, metrics, and events to Kafka and onward:

  • Logs are available in Graylog (for visual analysis);
  • Logs, metrics, and events are sent to ClickHouse for long-term storage.
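Put together, a Collector setup expressed as a Telegraf configuration could look roughly like the sketch below; the plugin choices follow what is described above, while the paths, ports, and topic names are illustrative.

```toml
# Tags attached to everything this agent collects
[global_tags]
  namespace = "$POD_NAMESPACE"
  pod = "$POD_NAME"

# Logs: read container stdout/stderr via the Docker API
[[inputs.docker_log]]
  endpoint = "unix:///var/run/docker.sock"

# Metrics: poll the workload's own Prometheus endpoint
[[inputs.prometheus]]
  urls = ["http://localhost:9090/metrics"]

# Re-expose the aggregated metrics for the cluster Prometheus to scrape
[[outputs.prometheus_client]]
  listen = ":9102"

# Ship logs, metrics, and events to Kafka, from where they go to Graylog / ClickHouse
[[outputs.kafka]]
  brokers = ["kafka-1:9092", "kafka-2:9092"]
  topic = "telemetry"
```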

Everything works the same way in AWS, except that we replace Graylog (fed from Kafka) with CloudWatch. We send the logs there, and it turns out very convenient: it is immediately clear which cluster and container they belong to. The same goes for Google Stackdriver. So the scheme works both on-premise with Kafka and in the cloud.

Where we do not have Kubernetes and pods, the scheme is a little more complicated, but it works on the same principles.


In that case the same processes all run inside a single container, supervised by s6.

Summing up

We have created a complete solution for building images and running them in production, with options for collecting and delivering logs and metrics:

  • We developed a standardized approach to building images and, based on it, a set of CI templates;
  • The data collection agents are our Telegraf extensions, well tested in production;
  • We use a mutating webhook to inject agent containers into pods; 
  • The solution is integrated into the Kubernetes/Rancher ecosystem;
  • We can run the same containers in different orchestration systems and get the result we expect;
  • We created a fully dynamic container management configuration. 

Co-author: Ilya Prudnikov

Source: habr.com
