Envoy. 1. Introduction

Greetings! This is a short article that answers the questions "what is Envoy?", "why is it needed?" and "where to start?".

What is it

Envoy is an L4-L7 balancer written in C++, focused on high performance and availability. On the one hand, it is in some ways an analogue of nginx and haproxy, comparable to them in performance. On the other hand, it is more oriented toward microservice architecture and offers functionality on par with Java and Go balancers such as Zuul or Traefik.

A comparison table of nginx / haproxy / envoy / traefik; it does not claim to be the absolute truth, but it gives the general picture.


                     nginx            haproxy           envoy           traefik
stars on github      11.2k (mirror)   1.1k (mirror)     12.4k           27.6k
written in           C                C                 C++             Go
API                  no               socket only/push  dataplane/pull  pull
active healthcheck   no               yes               yes             yes
open tracing         external plugin  no                yes             yes
JWT                  external plugin  no                yes             no
extensibility        Lua/C            Lua/C             Lua/C++         no

Why

This is a young project; a lot is still missing, and some things are in early alpha. But Envoy, partly thanks to its youth, is developing rapidly and already has many interesting features: dynamic configuration, many ready-made filters, and a simple interface for writing your own filters.
This gives rise to its areas of application, but first, two anti-patterns:

  • Serving static content.

The point is that Envoy currently has no caching support. The folks at Google are trying to fix this. The idea is to implement once in Envoy all the subtleties (the zoo of headers) of RFC compliance, and to expose an interface for specific cache implementations. But for now it is not even alpha: the architecture is under discussion and PRs are open (while I was writing this article the PRs stalled, but the point is still relevant).

In the meantime, use nginx for static content.

  • Static configuration.

You can use it, but Envoy was not created for this, and its capabilities will not show in a static configuration. A few points:

When editing the configuration in YAML you will make mistakes, curse the developers for the verbosity, and think that nginx/haproxy configs are less structured but more concise. That is the point: the nginx and haproxy configurations were designed for manual editing, while Envoy's is meant to be generated from code. The entire configuration is described in protobuf, and generating it from the proto files makes it much harder to make a mistake.
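
For illustration, here is a sketch of what generation from code can look like, using the Go protobuf bindings from go-control-plane (the v2 API; import paths and helpers vary between library versions, so treat the specifics as assumptions). A misspelled field or a wrong type here fails at compile time rather than in production:

package main

import (
	"fmt"
	"time"

	"github.com/golang/protobuf/jsonpb"
	"github.com/golang/protobuf/ptypes"

	v2 "github.com/envoyproxy/go-control-plane/envoy/api/v2"
)

func main() {
	// The same cluster as in the static config below, built as a typed
	// protobuf object instead of hand-written YAML.
	cluster := &v2.Cluster{
		Name:                 "service_google",
		ConnectTimeout:       ptypes.DurationProto(250 * time.Millisecond),
		ClusterDiscoveryType: &v2.Cluster_Type{Type: v2.Cluster_LOGICAL_DNS},
		LbPolicy:             v2.Cluster_ROUND_ROBIN,
	}

	// Envoy accepts JSON as well as YAML, so protobuf's canonical JSON
	// form is directly usable as a config fragment.
	out, _ := (&jsonpb.Marshaler{Indent: "  "}).MarshalToString(cluster)
	fmt.Println(out)
}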

Canary, b/g deployment scenarios and much more can be implemented properly only with a dynamic configuration. I am not saying they cannot be done statically, we all do it. But for that you have to build crutches, in any of the balancers, Envoy included. A sketch of what a canary amounts to follows below.
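
In Envoy terms a canary is just a route with weighted clusters, and a control-plane can shift the weights on the fly. A minimal sketch (the cluster names are made up):

routes:
- match:
    prefix: "/"
  route:
    weighted_clusters:
      clusters:
      - name: service_v1
        weight: 95
      - name: service_v2  # the canary gets 5% of the traffic
        weight: 5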

Tasks in which Envoy is indispensable:

  • Balancing traffic in complex and dynamic systems. Service mesh falls into this category, but it is not necessarily the only case.
  • The need for distributed tracing, complex authorization, or other functionality that Envoy has out of the box or implements conveniently, while in nginx/haproxy you would have to pile on Lua and dubious plugins.

Both while still delivering high performance where it is needed.

How it works

Envoy binaries are distributed only as a docker image. The image already contains an example static configuration, but here we are only interested in understanding its structure.

envoy.yaml static configuration

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite: www.google.com
                  cluster: service_google
          http_filters:
          - name: envoy.router
  clusters:
  - name: service_google
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_google
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: www.google.com
                port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
        sni: www.google.com
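
To try it out, it should be enough to mount this file over the default config in the official image (the path /etc/envoy/envoy.yaml and the tag are assumptions, check them against your image):

docker run --rm -p 10000:10000 \
  -v $(pwd)/envoy.yaml:/etc/envoy/envoy.yaml \
  envoyproxy/envoy:v1.14.1

After that, curl http://localhost:10000 returns a response proxied from www.google.com.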

Dynamic Configuration

What problem are we trying to solve? You can't just reload the balancer configuration under load; "small" problems arise:

  • Configuration validation.

The config can be large, it can be very large, and if we reload it all at once, the chances that there is an error somewhere increase.

  • Long-term connections.

When initializing a new listener, you need to take care of the connections running on the old one; if changes happen frequently and there are long-lived connections, you will have to find a compromise. Hi there, kubernetes ingress on nginx.

  • Active healthchecks.

If we have active healthchecks, we have to re-check all of them on the new config before sending traffic there. With many upstreams this takes time. Hi there, haproxy.

How is this solved in Envoy? By loading the config dynamically, following the pull model, you can split it into separate parts and not reinitialize the parts that have not changed. For example, a listener, which is expensive to reinitialize and rarely changes.

The Envoy configuration (from the file above) has the following entities:

  • listener - a listener hanging on a specific ip/port
  • virtual host - a virtual host by domain name
  • route - a balancing rule
  • cluster - a group of upstreams with balancing parameters
  • endpoint - the address of an upstream instance

Each of these entities, plus some others, can be filled in dynamically; for this, the configuration specifies the address of the service from which the config will be received. The service can be REST or gRPC; gRPC is preferable.

The services are named accordingly: LDS, VHDS, RDS, CDS and EDS. You can combine static and dynamic configuration, with the limitation that a dynamic resource cannot be referenced from a static one.

For most tasks it is enough to implement the last three services, together called ADS (Aggregated Discovery Service). For Java and Go there are ready-made gRPC data-plane implementations in which you only need to fill in the objects from your own source.
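
A minimal sketch of such a control-plane in Go on top of go-control-plane (v2 packages; the import paths and the NewSnapshot signature differ between library versions, so this is an approximation, not a reference implementation):

package main

import (
	"context"
	"net"

	"google.golang.org/grpc"

	discovery "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v2"
	cache "github.com/envoyproxy/go-control-plane/pkg/cache/v2"
	server "github.com/envoyproxy/go-control-plane/pkg/server/v2"
)

func main() {
	// Snapshots are versioned and stored per node id; Envoy instances
	// are matched by the node.id they send in DiscoveryRequests.
	snapshotCache := cache.NewSnapshotCache(true /* ads */, cache.IDHash{}, nil)

	// Fill in endpoints/clusters/routes/listeners built from your own
	// source (service registry, database, ...); nil slices for brevity.
	snapshot := cache.NewSnapshot("v1", nil, nil, nil, nil, nil)
	if err := snapshotCache.SetSnapshot("node-id", snapshot); err != nil {
		panic(err)
	}

	// One gRPC service serves the aggregated CDS/EDS/RDS stream.
	grpcServer := grpc.NewServer()
	xds := server.NewServer(context.Background(), snapshotCache, nil)
	discovery.RegisterAggregatedDiscoveryServiceServer(grpcServer, xds)

	lis, err := net.Listen("tcp", ":6565") // the xds cluster in the config below
	if err != nil {
		panic(err)
	}
	_ = grpcServer.Serve(lis)
}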

On the Envoy side, the configuration then takes the following form:

envoy.yaml dynamic configuration

dynamic_resources:
  ads_config:
    api_type: GRPC
    grpc_services:
      envoy_grpc:
        cluster_name: xds_clr
  cds_config:
    ads: {}
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: ingress_http
          rds:
            route_config_name: local_route
            config_source:
              ads: {}
          http_filters:
          - name: envoy.router
  clusters:
  - name: xds_clr
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: xds_clr
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: xds
                port_value: 6565

When Envoy starts with this config, it connects to the control-plane and tries to request the RDS, CDS and EDS configuration. The interaction process is described here.

In short, Envoy sends a request specifying the type of the requested resource, the version it already has, and the node parameters. In response it receives a resource and a version; if the version on the control-plane has not changed, it does not respond.
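
Roughly, the first request on a stream looks like this (Go types from the v2 API; the values are illustrative):

package main

import (
	"fmt"

	api "github.com/envoyproxy/go-control-plane/envoy/api/v2"
	core "github.com/envoyproxy/go-control-plane/envoy/api/v2/core"
)

func main() {
	// First DiscoveryRequest on a stream: no accepted version yet, the
	// node identity, and the type URL of the requested resource (CDS here).
	req := &api.DiscoveryRequest{
		VersionInfo:   "", // empty: nothing has been accepted yet
		Node:          &core.Node{Id: "node-1", Cluster: "web"},
		TypeUrl:       "type.googleapis.com/envoy.api.v2.Cluster",
		ResourceNames: nil, // empty list: subscribe to all resources of this type
	}
	fmt.Println(req.String())
}
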
There are four interaction options:

  • One gRPC stream for all resource types; the full state of each resource is sent.
  • A separate stream per resource type, full state.
  • One stream, incremental state.
  • A separate stream per resource type, incremental state.

Incremental xDS reduces the traffic between the control-plane and Envoy, which matters for large configurations. But it complicates the interaction: the request carries lists of resources to subscribe to and unsubscribe from.

Our example uses ADS: one stream for RDS, CDS and EDS in non-incremental mode. To enable incremental mode, specify api_type: DELTA_GRPC.
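
That is, in the config above:

dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC  # instead of GRPC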

Since the request contains the node parameters, the control-plane can send different resources to different Envoy instances, which is convenient for building a service mesh.
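
The node parameters are set in the bootstrap config of each instance; for example (the id and cluster values here are made up):

node:
  id: front-proxy-1     # unique instance id visible to the control-plane
  cluster: front-proxy  # logical group of identical instances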

Warm up

At startup, or when Envoy receives a new configuration from the control-plane, the resource warmup process starts. It is divided into listener warmup and cluster warmup; the first is triggered by changes in RDS/LDS, the second by CDS/EDS. This means that if only upstreams change, the listener is not recreated.

While warming up, dependent resources are awaited from the control-plane with a timeout. If the timeout expires, initialization fails and the new listener will not start listening on its port.
Initialization order: EDS, CDS, active healthcheck, RDS, LDS. With active healthchecks enabled, traffic goes upstream only after one successful healthcheck.
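
For reference, a minimal sketch of an active healthcheck on a cluster (the path and thresholds are arbitrary):

clusters:
- name: service_google
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3
    healthy_threshold: 2
    http_health_check:
      path: /healthz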

If the listener was recreated, the old one goes into the DRAIN state and is deleted after all its connections are closed or the --drain-time-s timeout expires (10 minutes by default).
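
The timeout is a startup flag, for example (the config path is illustrative):

envoy -c /etc/envoy/envoy.yaml --drain-time-s 60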

To be continued.

Source: habr.com
