Istio and Kubernetes in production. Part 2. Tracing

In the previous article we reviewed the basic components of the Istio Service Mesh, got acquainted with the system, and answered the main questions that usually arise when starting to work with Istio. In this part, we will look at how to organize the collection of tracing information across the network.


The first thing that comes to mind for many developers and system administrators when they hear the words Service Mesh is tracing. Indeed, we add a special proxy server next to every node of our network, and all TCP traffic passes through it. It seems that it should now be easy to collect information about all network interactions in the system. Unfortunately, in reality there are many nuances that must be taken into account. Let's take a look at them.

Misconception #1: we can get network request data for free

In fact, what we get relatively for free is only a picture of the nodes of our system connected by arrows, plus the rate of data passing between services (in fact, only the number of bytes per unit of time). However, in most cases our services communicate over some application layer protocol such as HTTP, gRPC, Redis, and so on. And, of course, we want to see tracing information for these protocols; we want to see the request rate, not the data rate. We want to understand request latency in terms of our protocol. Finally, we want to see the full path that a request takes from the moment it enters our system until a response is returned to the user. This task is no longer so easy to solve.

First, let's look at what sending tracing spans looks like from an architectural point of view in Istio. As we remember from the first part, Istio has a separate component for collecting telemetry called Mixer. However, in the current version 1.0.*, spans are sent directly from the proxy servers, namely from envoy proxy. Envoy proxy supports sending tracing spans via the zipkin protocol out of the box. Other protocols can be connected, but only through a plugin. Istio ships with a pre-built and pre-configured envoy proxy in which only the zipkin protocol is supported. If we want to use, for example, the Jaeger protocol and send tracing spans over UDP, we will need to build our own istio-proxy image. There is support for custom plugins in istio-proxy, but it is still in alpha. Therefore, if we want to avoid a large number of custom settings, the range of technologies we can use for storing and receiving tracing spans is reduced. Of the main systems, in fact, we can now use either Zipkin itself or Jaeger, but in the latter case everything has to be sent to it over the zipkin-compatible protocol (which is much less efficient). The zipkin protocol itself involves sending all tracing information to collectors over HTTP, which is quite expensive.
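As an illustration of that second option, here is a minimal sketch of a Jaeger deployment that accepts spans over the zipkin-compatible protocol. The namespace, image tag and the COLLECTOR_ZIPKIN_HTTP_PORT setting are assumptions based on the jaegertracing/all-in-one image of that period, not something Istio itself requires; the Service name and namespace are chosen to match the tracing-collector.tracing:9411 endpoint used in the envoy flags later in this article:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tracing-collector
  namespace: tracing
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tracing-collector
  template:
    metadata:
      labels:
        app: tracing-collector
    spec:
      containers:
      - name: jaeger
        image: jaegertracing/all-in-one:1.8
        env:
        # Ask Jaeger to also listen for spans in the zipkin format,
        # on the port that envoy proxy will send them to.
        - name: COLLECTOR_ZIPKIN_HTTP_PORT
          value: "9411"
        ports:
        - containerPort: 9411   # zipkin-compatible span ingestion
        - containerPort: 16686  # Jaeger UI
---
apiVersion: v1
kind: Service
metadata:
  name: tracing-collector
  namespace: tracing
spec:
  ports:
  - port: 9411
    targetPort: 9411
    name: http-zipkin
  selector:
    app: tracing-collector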

As I said, we want to trace application layer protocols. And this means that the proxy servers standing next to each service must understand what kind of interaction is happening. By default, Istio treats all ports as plain TCP, which means no traces will be sent. In order for traces to be sent, you must first enable this option in the main mesh config and, which is very important, name all the ports of your kubernetes Service entities in accordance with the protocol used by the service. That is, for example, like this:

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
    targetPort: 80
    name: http
  selector:
    app: nginx

You can also use compound names like http-magic (Istio will see http and recognize this port as an http endpoint). The format is: proto-extra.
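As for the first prerequisite, enabling tracing in the mesh config, here is a minimal sketch of what it might look like in the istio ConfigMap. The ConfigMap name, namespace and data key are assumptions based on a default Helm installation of that era, so check your own installation before applying anything like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
data:
  mesh: |
    # Tell the mesh to generate and send tracing spans at all;
    # the rest of the mesh config is omitted here.
    enableTracing: true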

In order to avoid patching a huge number of configurations just to define the protocol, you can use a dirty workaround: patch the Pilot component at the moment when it executes its protocol detection logic. In the end, of course, you will need to revert this logic to the standard behaviour and switch to the naming convention for all ports.

In order to understand whether the protocol really is detected correctly, you need to exec into any of the sidecar containers with envoy proxy and request the /config_dump location of the envoy admin interface. In the resulting configuration, look at the operation field of the desired service. In Istio it is used as an identifier of where the request is going. In order to customize the value of this parameter (we will later see it in our tracing system), you must specify the serviceCluster flag when starting the sidecar container. For example, it can be calculated like this from variables obtained from the kubernetes downward API:

--serviceCluster ${POD_NAMESPACE}.$(echo ${POD_NAME} | sed -e 's/-[a-z0-9]*-[a-z0-9]*$//g')
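Returning to the verification step itself, here is a rough sketch of how such a check might look from the command line. $POD_NAME is a placeholder, port 15000 is the usual envoy admin port in Istio's sidecar, and curl is assumed to be present in the proxy image:

kubectl exec $POD_NAME -c istio-proxy -- curl -s http://localhost:15000/config_dump | grep '"operation"'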

A good example to understand how tracing works in envoy is here.

The endpoint itself for sending tracing spans must also be specified in the envoy proxy launch flags, for example: --zipkinAddress tracing-collector.tracing:9411
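Putting the two flags together, here is a rough sketch of how the relevant fragment of an injected istio-proxy container spec might look. The image tag, the serviceCluster value and the surrounding (trimmed) arguments are illustrative; the exact set of flags depends on your sidecar injection template:

      - name: istio-proxy
        image: docker.io/istio/proxyv2:1.0.5
        args:
        - proxy
        - sidecar
        # Identifier of this workload as it will appear in traces.
        - --serviceCluster
        - my-namespace.my-app
        # Zipkin-compatible collector that envoy will send spans to.
        - --zipkinAddress
        - tracing-collector.tracing:9411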

Misconception #2: we can inexpensively get full request traces through the system out of the box

Unfortunately, it is not so. The complexity of the implementation depends on how the interaction between your services is already implemented. Why is that?

The fact is that in order for istio-proxy to be able to match the requests coming into a service with the requests leaving the same service, it is not enough to simply intercept all traffic. You need some kind of correlation identifier. For HTTP, envoy proxy uses special headers by which it understands which particular request to the service generated which specific requests to other services. The list of such headers:

  • x-request-id
  • x-b3-traceid
  • x-b3-spanid
  • x-b3-parentspanid
  • x-b3-sampled
  • x-b3-flags
  • x-ot-span-context

If you have a single point, for example a base client library used by all services, in which you can add such logic, then everything is fine: you just need to wait for this library to be updated in all clients. But if you have a very heterogeneous system and there is no unification in how services call other services over the network, then this is likely to be a big problem. Without adding such logic, all tracing information will only be "single-level". That is, we will receive all interservice interactions, but they will not be glued together into single chains of a request's passage through the network.
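To make this more concrete, here is a minimal sketch in Go of what such forwarding logic might look like inside a single service: the handler copies the tracing headers from the incoming request into the outgoing one, so that the sidecar can link the two. The header list matches the one above; everything else (the downstream URL, handler and function names) is purely illustrative:

package main

import (
    "io"
    "net/http"
)

// Headers that envoy uses to stitch requests into a single trace.
var tracingHeaders = []string{
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
}

// forwardTracingHeaders copies the tracing headers from the incoming
// request to the outgoing one, if they are present.
func forwardTracingHeaders(in *http.Request, out *http.Request) {
    for _, h := range tracingHeaders {
        if v := in.Header.Get(h); v != "" {
            out.Header.Set(h, v)
        }
    }
}

func handler(w http.ResponseWriter, r *http.Request) {
    // Illustrative downstream call; the URL is a placeholder.
    req, err := http.NewRequest(http.MethodGet, "http://backend.default.svc/api", nil)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    forwardTracingHeaders(r, req)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }
    defer resp.Body.Close()
    io.Copy(w, resp.Body)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}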

Conclusion

Istio provides a convenient tool for collecting tracing information over the network, but you need to understand that to implement it you will have to adapt your system and take the specifics of the Istio implementation into account. As a result, two main points need to be resolved: defining the application layer protocol (which must be supported by envoy proxy) and setting up the forwarding of the information that links requests coming into a service with the requests leaving the same service (using headers, in the case of HTTP). When these issues are resolved, we get a powerful tool that allows us to transparently collect information from the network, even in very heterogeneous systems written in many different languages and frameworks.

In the next article about Service Mesh, we will look at one of the biggest problems of Istio - the high consumption of RAM by each sidecar proxy container and discuss how to deal with it.

Source: habr.com
