Hey Habr! I present to your attention a translation of an article by Matt Klein.

This time I wanted to translate a description of both components of a service mesh: the data plane and the control plane. This description seemed to me the most understandable and interesting, and most importantly, it leads straight to the question: "Do you need this at all?"
As the idea of a "Service mesh" has become more popular over the last two years (Original article Oct 10, 2017) and the number of contributors in the space has grown, I've seen a commensurate increase in confusion throughout the tech community as to how to compare and contrast different solutions.
The situation is best described by the following series of tweets I wrote in July:
Service mesh confusion #1: Linkerd ~= Nginx ~= Haproxy ~= Envoy. None of them is equal to Istio. Istio is something completely different. 1/
The former are just data planes. On their own, they do nothing. They must be configured by something more. 2/
Istio is an example of a control plane that ties the pieces together. It is another layer. /end
The tweets above mention several different projects (Linkerd, NGINX, HAProxy, Envoy, and Istio), but more importantly they introduce the general concepts of the data plane, the service mesh, and the control plane. In this post I will take a step back and explain what I mean by "data plane" and "control plane" at a very high level, and then explain how those terms apply to the projects mentioned in the tweets.
What is a service mesh, really?

Figure 1: Service mesh overview
Figure 1 illustrates the concept of a service mesh at its most basic level. There are four service clusters (A–D). Each service instance is colocated with a local proxy. All network traffic (HTTP, REST, gRPC, Redis, etc.) from an individual application instance flows through its local proxy to the appropriate external service clusters. Thus, the application instance is not aware of the network at large and knows only about its local proxy. In effect, the network of the distributed system has been abstracted away from the service.
Data plane
In a service mesh, a proxy server located locally to the application performs the following tasks:
- Service discovery. Which services/applications are available to your application?
- Health checking. Are the service instances returned by service discovery healthy and ready to accept network traffic? This can include both active (e.g., pinging a /healthcheck endpoint) and passive (e.g., treating 3 consecutive 5xx responses as an indication of an unhealthy instance) health checking.
- Routing. Given a REST request for "/foo", to which service cluster should the request be sent?
- Load balancing. Once a service cluster has been selected during routing, to which service instance should the request be sent? With what timeout? What are the circuit breaking settings? If the request fails, should it be retried?
- Authentication and authorization. For incoming requests, can the calling service be cryptographically authenticated/authorized using mTLS or some other mechanism? If it is authenticated/authorized, is it allowed to invoke the requested operation (endpoint) on the service, or should an unauthenticated response be returned?
- Observability. Detailed statistics, logs, and distributed traces should be generated for each request so that operators can understand distributed traffic flow and debug problems as they occur.
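To make the list above concrete, here is a minimal sketch in Python of the per-request pipeline a sidecar proxy runs: routing, service discovery, passive health checking, load balancing, and a retry. All class, method, and instance names are invented for this illustration; no real proxy exposes this API.

```python
import random

# Toy model of a sidecar proxy's per-request work. Everything here
# (names, thresholds, the retry count) is illustrative only.

class SidecarProxy:
    def __init__(self, routes, discovery, passive_threshold=3):
        self.routes = routes                  # path prefix -> cluster name
        self.discovery = discovery            # cluster name -> instance list
        self.errors = {}                      # instance -> consecutive 5xx count
        self.passive_threshold = passive_threshold

    def handle(self, path, send):
        # Routing: map the request path to an upstream cluster.
        cluster = next(c for prefix, c in self.routes.items()
                       if path.startswith(prefix))
        # Service discovery + passive health checking: skip instances
        # that have returned too many consecutive 5xx responses.
        healthy = [i for i in self.discovery[cluster]
                   if self.errors.get(i, 0) < self.passive_threshold]
        # Load balancing (random pick) with one retry on failure.
        for attempt in range(2):
            instance = random.choice(healthy)
            status = send(instance, path)     # observability hook would go here
            if status < 500:
                self.errors[instance] = 0     # success resets the 5xx counter
                return status
            self.errors[instance] = self.errors.get(instance, 0) + 1
        return status
```

A real data plane does all of this per packet/request, at wire speed, and emits stats and traces along the way; the sketch only shows the control flow.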
All of the above are the responsibility of the service mesh's data plane. In effect, the sidecar proxy local to the service is the data plane. In other words, the data plane is responsible for conditionally translating, forwarding, and observing every network packet that flows to and from a service instance.
The control plane
The network abstraction that the local proxy provides in the data plane is magical. However, how does the proxy actually know to route "/foo" to service B? How is the service discovery data that the proxy queries populated? How are the load balancing, timeout, circuit breaking, etc. settings specified? How are deployments accomplished using blue/green or gradual traffic-shifting methods? Who configures system-wide authentication and authorization settings?
All of the above is the domain of the service mesh's control plane. The control plane takes a set of isolated, stateless sidecar proxies and turns them into a distributed system.
I think the reason many technologists find the separation of the data plane and the control plane confusing is that for most people the data plane is familiar, while the control plane is foreign. We have been working with physical network routers and switches for a long time. We understand that packets/requests must get from point A to point B, and that we can use hardware and software to make that happen. The new generation of software proxies are just fancier versions of tools we have been using for a long time.

Figure 2: Human control plane
However, we have been using the control plane for a long time, although most network operators may not associate this part of the system with any technological component. The reason is simple:
Most of the control planes in use today are… us.
Figure 2 shows what I call the "human control plane". In this type of deployment, which is still very common, a (probably grumpy) human operator crafts static configurations, potentially with the help of some scripts, and deploys them through some bespoke process to all of the proxies. The proxies then consume the configuration and proceed to process data plane traffic using the updated settings.
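What the operator pushes in this scheme is just a static document plus a distribution step. The sketch below, in Python, models that: a hand-written config and a "push the same blob to every proxy" deploy. The field names and the `deploy` helper are hypothetical, not from any real proxy or tool.

```python
import json

# A hypothetical static proxy configuration that a human operator might
# write by hand (or generate with a script). All field names are invented.
STATIC_CONFIG = {
    "routes": {"/foo": "service-b"},
    "clusters": {
        "service-b": {
            "instances": ["10.0.0.5:8080", "10.0.0.6:8080"],
            "timeout_ms": 250,
            "retries": 1,
        }
    },
}

def deploy(proxies):
    """Model the bespoke push process: every proxy receives the same
    static config blob and reloads with the updated settings."""
    payload = json.dumps(STATIC_CONFIG)
    return {proxy: payload for proxy in proxies}
```

The weakness of this approach is visible in the model itself: nothing updates the instance list or settings after the push, so every change requires another grumpy-human deploy cycle.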

Figure 3: Advanced service mesh control plane
Figure 3 shows an "advanced" service mesh control plane. It consists of the following pieces:
- The human: There is still a person (hopefully less angry) who makes high-level decisions regarding the entire system as a whole.
- Control plane UI: A person interacts with some type of user interface to control the system. This could be a web portal, a command line application (CLI), or some other interface. Through the user interface, the operator has access to such global system configuration parameters as:
- Deployment control, including blue/green deploys and/or gradual traffic shifting
- Authentication and Authorization Options
- Routing table specification, e.g., when service A requests "/foo", what should happen?
- Load balancer settings such as timeouts, retries, circuit breaking options, etc.
- Workload scheduler: Services are run on the infrastructure via some type of scheduling/orchestration system, such as Kubernetes or Nomad. The scheduler is responsible for bootstrapping a service along with its local proxy.
- Service discovery: As the scheduler starts and stops service instances, it reports liveness state to the service discovery system.
- Sidecar proxy configuration APIs: Local proxies dynamically fetch state from the various system components in an eventually consistent fashion, without operator intervention. The entire system, composed of all currently running service instances and local proxies, eventually converges into one ecosystem. Envoy's universal data plane API is one example of how this works in practice.
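The eventually consistent configuration flow from the last bullet can be sketched as a simple versioned pull loop. This is a toy model, not Envoy's actual xDS protocol (which streams versioned updates over gRPC/REST); the `ControlPlane` and `Sidecar` classes and their methods are invented for illustration.

```python
# Toy model of sidecars converging on control-plane state. Names are
# invented; real systems (e.g. Envoy's xDS APIs) work over gRPC/REST.

class ControlPlane:
    def __init__(self):
        self.version = 0
        self.config = {}

    def set_policy(self, config):
        # The operator (via UI/CLI) writes new desired state,
        # bumping the version so proxies can detect the change.
        self.version += 1
        self.config = dict(config)

    def fetch(self):
        return self.version, self.config


class Sidecar:
    def __init__(self, control_plane):
        self.cp = control_plane
        self.version = -1
        self.config = {}

    def sync(self):
        # Periodically poll the control plane; apply the config
        # only if its version changed since the last sync.
        version, config = self.cp.fetch()
        if version != self.version:
            self.version, self.config = version, config
        return self.version
```

Each proxy syncs on its own schedule, so at any instant different proxies may hold different versions; given time, all of them converge on the latest policy, which is exactly what "eventually consistent" means here.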
In essence, the purpose of the control plane is to set policy that the data plane will eventually enact. More advanced control planes abstract more details away from the operator and require less manual intervention (assuming they are working correctly!).
Data plane vs. control plane summary
- Service mesh data plane: Touches every packet/request in the system. Responsible for service discovery, health checking, routing, load balancing, authentication/authorization, and observability.
- Service mesh control plane: Provides policy and configuration for all of the running data planes in the mesh. Does not touch any packets/requests in the system. The control plane turns all of the data planes into a distributed system.
Current project landscape
With the above explanation out of the way, let's take a look at the current service mesh landscape.
- Data planes: Linkerd, NGINX, HAProxy, Envoy, Traefik
- Control planes: Istio, Nelson, SmartStack
Instead of doing an in-depth analysis of each of the above solutions, I'll briefly touch on some of the points that I think are causing most of the confusion in the ecosystem right now.
In early 2016, Linkerd was one of the first service mesh data plane proxies and did a fantastic job of raising awareness of and excitement about the service mesh design pattern. Envoy followed about 6 months later (though it had been in production at Lyft since late 2015). Linkerd and Envoy are the two projects most often mentioned when service meshes are discussed.
Istio was announced in May 2017. The goals of the Istio project look very much like the advanced control plane shown in Figure 3. Istio's default proxy is Envoy; thus, Istio is the control plane and Envoy is the data plane. In a short time Istio has generated a lot of excitement, and other data planes have begun integrations as a replacement for Envoy (both Linkerd and NGINX have demonstrated Istio integration). The fact that a single control plane can use different data planes means that the control plane and the data plane are not necessarily tightly coupled. An API such as Envoy's universal data plane API can form a bridge between the two parts of the system.
Nelson and SmartStack help further illustrate the separation of the control plane and the data plane. Nelson uses Envoy as its proxy and builds a robust service mesh control plane on top of the HashiCorp stack (i.e., Nomad, etc.). SmartStack was perhaps the first of the new wave of service meshes. SmartStack forms a control plane around HAProxy or NGINX, demonstrating that it is possible to decouple the service mesh control plane from the data plane.
Microservice architectures built around a service mesh are attracting more and more attention (rightly so!), and more projects and vendors are moving in this direction. Over the next several years we will see a lot of innovation in both data planes and control planes, and further mixing and matching of the different components. The end result should be a microservice architecture that is ever more transparent and magical to the operator.
And hopefully an operator who is less and less grumpy.
Key takeaways
- The service mesh consists of two different parts: the data plane and the control plane. Both components are required and the system will not work without them.
- Everyone is familiar with the control plane, even if for now the control plane may be you!
- All of the data planes compete with each other on features, performance, configurability, and extensibility.
- All of the control planes compete with each other on features, configurability, extensibility, and usability.
- A single control plane can contain the correct abstractions and APIs so that multiple data planes can be used.
Source: habr.com
