Containers, microservices and service meshes

The Internet is piled high with articles about service meshes, and here's another one. Hooray! But why? Because I want to argue that service meshes would have been more useful 10 years ago, before the advent of container platforms such as Docker and Kubernetes. I'm not saying my point of view is better or worse than anyone else's, but since service meshes are rather complex animals, multiple points of view help to understand them better.

I will talk about the dotCloud platform, which was built on over a hundred microservices and supported thousands of applications in containers. I'll explain the challenges we encountered in developing and launching it, and how service meshes might (or might not) help.

dotCloud history

I have already written about the history of dotCloud and the choice of architecture for this platform, but I haven't said much about the network layer. If you don't want to read the previous article about dotCloud, here's the gist: it was a PaaS (platform-as-a-service) that let customers run a wide range of applications (Java, PHP, Python...), with support for a wide range of data services (MongoDB, MySQL, Redis...) and a Heroku-like workflow: you upload your code to the platform, it builds container images and deploys them.

I will describe how traffic was routed on the dotCloud platform. Not because it was particularly cool (although the system worked well for its time!), but primarily because, with modern tools, such a design can easily be implemented in a short time by a modest team if they need a way to route traffic between a bunch of microservices or a bunch of applications. That way you can compare the options: what you get if you develop everything yourself versus using an existing service mesh. The classic choice: build or buy.

Traffic routing for hosted applications

Applications on dotCloud could expose HTTP and TCP endpoints.

HTTP endpoints were dynamically added to the configuration of the Hipache load balancer cluster. This is similar to what Ingress resources and a load balancer like Traefik do in Kubernetes today.
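
To make this concrete, here is a rough sketch of what such a dynamic registration could look like, assuming Hipache's Redis-backed configuration format (a list per domain, holding an identifier followed by backend URLs); the domain and backend addresses are invented for illustration.

    import redis

    # Hipache reads its routing table from Redis: one list per domain,
    # whose first element is an app identifier and the rest are backend URLs.
    r = redis.Redis(host="localhost", port=6379)

    domain = "frontend:www.example.com"      # hypothetical domain
    r.delete(domain)
    r.rpush(domain, "myapp")                 # identifier
    r.rpush(domain, "http://10.0.0.5:8080")  # backend added when a container starts
    r.rpush(domain, "http://10.0.0.6:8080")  # more backends mean round-robin balancing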

Clients connected to HTTP endpoints through the appropriate domains, provided the domain name pointed to the dotCloud load balancers. Nothing special.

TCP endpoints were associated with a port number, which was then passed to all containers of that stack via environment variables.

Clients could connect to TCP endpoints using the appropriate hostname (something like gateway-X.dotcloud.com) and port number.

That hostname resolved to a cluster of "nats" servers (no relation to NATS) that routed incoming TCP connections to the correct container (or, in the case of load-balanced services, to the correct containers).

If you are familiar with Kubernetes, this will probably remind you of NodePort services.

The dotCloud platform had no equivalent of ClusterIP services: for simplicity, services were accessed the same way from both inside and outside the platform.

Everything was organized quite simply: the original implementations of the HTTP and TCP routing meshes were probably only a few hundred lines of Python. Simple (I would say naive) algorithms that were refined as the platform grew and additional requirements appeared.
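
To give a feel for how little code such a naive router needs, here is a minimal sketch of a TCP routing proxy in the same spirit (not the actual dotCloud code): a static routing table maps listening ports to backends, and the proxy just shuttles bytes in both directions.

    import socket
    import threading

    # Hypothetical routing table: listening port -> (backend host, backend port).
    ROUTES = {10042: ("10.0.0.5", 5432), 10043: ("10.0.0.6", 3306)}

    def pipe(src, dst):
        # Copy bytes in one direction until the connection closes.
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        except OSError:
            pass
        finally:
            src.close()
            dst.close()

    def serve(port, backend):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("0.0.0.0", port))
        listener.listen(128)
        while True:
            client, _ = listener.accept()
            upstream = socket.create_connection(backend)
            # One thread per direction: client -> backend and backend -> client.
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    for port, backend in ROUTES.items():
        threading.Thread(target=serve, args=(port, backend), daemon=True).start()
    threading.Event().wait()  # keep the routing process running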

Extensive refactoring of existing code was not required. In particular, 12-factor apps could directly use the address obtained through environment variables.
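
For example, such an application could pick up its TCP endpoint roughly like this (the variable names here are hypothetical, not the actual ones dotCloud injected):

    import os
    import socket

    # Hypothetical variable names; the platform injected the gateway host and port.
    host = os.environ["DB_HOST"]       # e.g. gateway-X.dotcloud.com
    port = int(os.environ["DB_PORT"])  # the TCP endpoint's port number

    conn = socket.create_connection((host, port))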

How is this different from a modern service mesh?

Limited visibility. We had no metrics at all for the TCP routing mesh. As for HTTP routing, later versions introduced detailed HTTP metrics with error codes and response times, but modern service meshes go even further, providing integration with metrics collection systems like Prometheus, for example.

Visibility is important not only from an operational point of view (to help troubleshoot problems), but also when releasing new features. Think of safe blue-green deployments and canary deployments.

Routing efficiency was also limited. In the dotCloud routing mesh, all traffic had to pass through a cluster of dedicated routing nodes. This meant potentially crossing several AZ (availability zone) boundaries and a significant increase in latency. I remember troubleshooting code that made over a hundred SQL queries per page and opened a new connection to the SQL server for each query. When run locally, the page loaded instantly, but on dotCloud it took a few seconds, because each TCP connection (and each subsequent SQL query) took tens of milliseconds. In this particular case, persistent connections solved the problem.
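
A sketch of that kind of fix, assuming a MySQL client such as PyMySQL (the host, credentials, and query are made up): open one connection and reuse it for all queries, instead of reconnecting and paying the cross-AZ round trip for every single query.

    import pymysql

    # One persistent connection, opened once per process...
    conn = pymysql.connect(host="db.example.com", user="app",
                           password="secret", database="app")

    def fetch_user(user_id):
        # ...and reused for every query, instead of reconnecting each time.
        with conn.cursor() as cur:
            cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()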

Modern service meshes deal with such problems better. First of all, they make sure that connections are routed at the source. The logical flow is the same: client β†’ mesh β†’ service, but now the mesh runs locally rather than on remote nodes, so the client β†’ mesh connection is local and very fast (microseconds instead of milliseconds).

Modern service meshes also implement smarter load balancing algorithms. By monitoring the health of the backends, they can send more traffic to faster backends, resulting in better overall performance.

Security is also better. The dotCloud routing mesh ran entirely on EC2 Classic and did not encrypt traffic (on the assumption that if someone managed to sniff EC2 network traffic, you were already in big trouble). Modern service meshes transparently protect all our traffic, for example with mutual TLS authentication and subsequent encryption.

Traffic routing for platform services

Okay, we've discussed traffic between applications, but what about the dotCloud platform itself?

The platform itself consisted of about a hundred microservices responsible for various functions. Some accepted requests from others, and some were background workers that connected to other services but did not accept connections themselves. Either way, each service needed to know the addresses of the endpoints it had to connect to.

Many high-level services could use the routing mesh described above. In fact, many of the hundred-plus dotCloud microservices were deployed as regular applications on the dotCloud platform itself. But a small number of low-level services (in particular, those implementing this routing mesh) needed something simpler, with fewer dependencies (because they could not depend on themselves to work: the good old chicken-and-egg problem).

These low-level, essential services were deployed by running containers directly on a few key nodes, bypassing the platform's standard builder, scheduler, and runner services. If you want a comparison with modern container platforms, it's like launching the control plane with docker run directly on the nodes, instead of delegating the task to Kubernetes. It's similar in concept to the static pods that kubeadm or bootkube use when bootstrapping a standalone cluster.

These services were exposed in a simple and crude way: a YAML file listed their names and addresses, and each client had to get a copy of this YAML file as part of its deployment.
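
The real service names and file layout aren't shown here, but conceptually it boiled down to something like the following sketch: a small registry that every client loads and looks addresses up in.

    import yaml  # PyYAML

    # Hypothetical registry; the actual dotCloud service names and format differed.
    REGISTRY = """
    scheduler: {host: 10.1.2.3, port: 4242}
    builder:   {host: 10.1.2.4, port: 4243}
    """

    services = yaml.safe_load(REGISTRY)
    addr = (services["scheduler"]["host"], services["scheduler"]["port"])
    print(addr)  # ('10.1.2.3', 4242)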

On the one hand, this was extremely reliable, because it did not require running an external key/value store such as Zookeeper (remember, etcd and Consul did not exist at that time). On the other hand, it made it difficult to move services. Every time a service moved, all clients had to get an updated YAML file (and potentially be restarted). Not very convenient!

Later, we started rolling out a new scheme where each client connected to a local proxy server. Instead of the full address and port, a client only needed to know the port number of the service and connect via localhost. The local proxy handled the connection and forwarded it to the actual backend. Now, when moving a backend to another machine or scaling it, instead of updating all clients, only these local proxies needed to be updated; and a restart was no longer required.

(It was also planned to encapsulate traffic in TLS connections and run another proxy on the receiving side, verifying TLS certificates without involving the receiving service, which would be configured to accept connections only on localhost. More on that later.)

This is very similar to Airbnb's SmartStack, with one significant difference: SmartStack was implemented and deployed to production, while dotCloud's internal routing system was shelved when dotCloud turned into Docker.

I personally consider SmartStack one of the predecessors of systems like Istio, Linkerd and Consul Connect because they all follow the same pattern:

  • Run a proxy on each node.
  • Clients connect to the proxy.
  • The control plane updates the proxy configuration when backends change.
  • … Profit!

Modern Service Mesh Implementation

If we needed to implement a similar mesh today, we could use similar principles. For example, set up an internal DNS zone mapping service names to addresses in the 127.0.0.0/8 space. Then run HAProxy on each cluster node, accepting connections on each service address (in that 127.0.0.0/8 subnet) and redirecting/balancing the load to the appropriate backends. The HAProxy configuration can be managed with confd, which lets you store backend information in etcd or Consul and automatically push the updated configuration to HAProxy when needed.
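
The generated HAProxy configuration might look roughly like this (the service name, loopback address, and backends are illustrative); confd would re-render such a section from the data in etcd or Consul and reload HAProxy whenever the list of backends changes.

    listen service_users
        bind 127.0.0.3:80
        mode tcp
        balance roundrobin
        server backend-1 10.0.0.11:8080 check
        server backend-2 10.0.0.12:8080 check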

This is how Istio works! But with some differences:

  • Uses Envoy Proxy instead of HAProxy.
  • Stores the backend configuration via the Kubernetes API instead of etcd or Consul.
  • Services are allocated addresses on the internal subnet (Kubernetes ClusterIP addresses) instead of 127.0.0.0/8.
  • Has an additional component (Citadel) to add mutual TLS authentication between client and servers.
  • Supports new features such as circuit breaking, distributed tracing, canary deployment, etc.

Let's take a quick look at some of the differences.

Envoy Proxy

Envoy Proxy was written by Lyft [Uber's competitor in the ride-hailing market; translator's note]. It is similar in many ways to other proxies (e.g. HAProxy, Nginx, Traefik...), but Lyft wrote their own because they needed features that other proxies lacked, and it seemed more sensible to build a new one than to extend an existing one.

Envoy can be used on its own. If I have a specific service that needs to connect to other services, I can set it up to connect to Envoy, and then dynamically configure and reconfigure Envoy with the locations of those other services, while getting a lot of great extras, such as visibility. Instead of a custom client library or injecting call tracing into the code, we send traffic through Envoy, and it collects metrics for us.

But Envoy can also work as the data plane of a service mesh. This means that Envoy is then configured by that service mesh's control plane.

Control plane

For its control plane, Istio relies on the Kubernetes API. This is not very different from using confd, which relies on etcd or Consul to look up the set of keys in the data store. Istio watches a set of Kubernetes resources through the Kubernetes API.

As an aside: I personally found this description of the Kubernetes API helpful; it reads:

The Kubernetes API Server is a "dumb server" which offers storage, versioning, validation, updating, and semantics of API resources.

Istio is designed to work with Kubernetes; if you want to use it outside of Kubernetes, you need to run an instance of the Kubernetes API server (and the etcd backing service).

Service Addresses

Istio relies on the ClusterIP addresses that Kubernetes allocates, so Istio services get an internal address (not in the 127.0.0.0/8 range).

In a Kubernetes cluster without Istio, traffic to the ClusterIP address of a specific service is intercepted by kube-proxy and sent to one of that service's backends. If you're interested in the technical details: kube-proxy sets up iptables rules (or IPVS load balancers, depending on how it's configured) to rewrite the destination IP addresses of connections going to the ClusterIP address.

Once Istio is installed in a Kubernetes cluster, nothing changes until it is explicitly enabled for a given pod (or even an entire namespace) by injecting a sidecar container into the pods. That container runs an Envoy instance and sets up a set of iptables rules to intercept traffic going to other services and redirect it to Envoy.

When integrated with Kubernetes DNS, this means our code can connect by service name and everything "just works". In other words, our code issues a request like http://api/v1/users/4242, the name api resolves to 10.97.105.48, the iptables rules intercept the connection to 10.97.105.48 and redirect it to the local Envoy proxy, which forwards the request to the actual API backend. Phew!

Additional frills

Istio also provides end-to-end encryption and authentication via mTLS (mutual TLS). A component called Citadel is responsible for this.

There is also a component called Mixer, which Envoy can query for every single request to make a special decision about that request depending on various factors such as headers, backend load, and so on (don't worry: there are many ways to keep Mixer up, and even if it crashes, Envoy will continue to work as a proxy).

And, of course, we mentioned visibility: Envoy collects a huge amount of metrics while providing distributed tracing. In a microservices architecture, if a single API request has to pass through microservices A, B, C, and D, then distributed tracing will add a unique identifier to the request when it enters the system and carry that identifier across the sub-requests to all of those microservices, letting you capture all related calls, their latencies, and so on.
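
For the propagation part, about the only thing application code usually has to do is copy the tracing headers from the incoming request into its outgoing sub-requests; Envoy handles the rest. A minimal sketch, assuming B3/Envoy-style header names and a hypothetical request object from your web framework (check your mesh's documentation for the exact set of headers):

    import requests

    # Headers used by Envoy/Zipkin-style tracing; the exact set depends on the mesh setup.
    TRACE_HEADERS = ["x-request-id", "x-b3-traceid", "x-b3-spanid",
                     "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags"]

    def forward_trace_headers(incoming_headers):
        # Copy only the tracing headers so sub-requests join the same trace.
        return {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}

    def handle(request):
        # The sub-request to microservice B keeps the identifiers of the incoming request.
        return requests.get("http://service-b/v1/things",
                            headers=forward_trace_headers(request.headers)).json()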

Develop or buy

Istio has a reputation for being a complex system. In contrast, building a routing mesh like the one I described at the beginning of this post is relatively easy with existing tools. So, does it make sense to build your own service mesh instead?

If our needs are modest (no visibility, circuit breaking, or other niceties required), then building our own tool starts to look attractive. But if we're using Kubernetes, it may not even be necessary, because Kubernetes already provides basic tools for service discovery and load balancing.

But if we have advanced requirements, then "buying" a service mesh seems like a much better option. (This isn't always a "purchase" since Istio is open source, but we still need to invest engineering time to understand, deploy, and manage it.)

What to choose: Istio, Linkerd or Consul Connect?

So far we have only talked about Istio, but it is not the only service mesh. A popular alternative is Linkerd, and there is also Consul Connect.

What to choose?

To be honest, I don't know. At the moment I don't consider myself competent enough to answer this question. There are a few interesting articles comparing these tools, and even some benchmarks.

One promising approach is to use a tool like SuperGloo. It implements an abstraction layer to simplify and unify the APIs exposed by service meshes. Instead of learning the specific (and, in my opinion, relatively complex) APIs of the various service meshes, we can use SuperGloo's simpler constructs and easily switch from one mesh to another, much as if we had an intermediate configuration format describing HTTP interfaces and backends that could generate the actual configuration for Nginx, HAProxy, Traefik, Apache...

I've played around with Istio and SuperGloo a little, and in the next article I want to show how to add Istio or Linkerd to an existing cluster using SuperGloo, and how the latter does its job: it lets you switch from one service mesh to another without rewriting configurations.

Source: habr.com
