Performance Orchestra

It would hardly be wrong to say that the best of people
find joy through suffering.
Ludwig van Beethoven

I'm Sergey, and I work on the performance research team at Yandex.Money. I want to tell you the beginning of the story of how we came to use orchestration: how we chose our tools and what we took into account. Everything in this article is happening in real time, so you, dear readers, are following the situation almost live.

Why do we need a conductor in the team?

Who is a conductor? The word comes from the French diriger (to manage, to direct, to lead). In music, it is the person who leads an ensemble in rehearsing and performing a piece. In our case, this place is occupied by orchestration and automation systems.

Their role is no different from the role of a conductor in music: they are needed to help the team, and to direct and organize its performance.

As a rule, a team has a certain set of capacities (let's call them servers) on which it implements its projects.

Approaches to obtaining and operating these servers vary. A few examples:

  • The team makes a request, for example to the operations team, to provide it with resources with certain parameters.
  • The operations team provides the required amount of resources, cloud or bare metal, and undertakes to keep them in good condition according to the SLA. Tuning is also done by the operations team.
  • The team receives only cloud or bare metal resources from the operations team and configures them on its own.
  • The team itself "purchases" the resources and maintains and configures them completely independently.

Our team uses servers that need to be maintained: updating the OS, installing new packages, and so on.

For ourselves, we divide them into two main groups:

  • tank group,
  • service group.

The tank group consists of hosts with Yandex.Tank.

The service group includes everything related to maintenance: various services that support the release cycle, generate automatic reports, and so on.

At some point, all this became inconvenient to manage manually, and we started thinking about automating the entire process, from provisioning the servers to developing, releasing and launching our internal services.

Why is a conductor needed, even if the orchestra itself can play?

To begin with, we mastered Ansible and began to provision ("pour") our bare metal servers ourselves in order to depend less on the system administrators. Everyone wins here: we gain new skills, and the administrators are spared part of the work, of which they always have plenty without us. We strive to grow beyond our specialty and to make the team as autonomous as possible.

The company has long had an established and regulated way of working with Ansible, so we easily integrated our solution into this process.

Currently, host provisioning consists of three Ansible roles:

  • the first role installs the OS,
  • the second applies the basic host settings, for example LDAP authorization,
  • and the third installs Yandex.Tank and related dependencies in a docker container.

Let's move on to the services that we use within the team.

For our tasks, we use Kotlin and Python in equal measure, plus a little Golang. To unify the development and deployment of our services, we decided to package them in Docker containers. This gives the freedom to choose a programming language and at the same time enforces a single delivery format for an application.

A small remark about IPv6 in Docker

Some of the services we interact with are only available via IPv6, so we had to figure out how to enable IPv6 for containers.

According to the IPv6 documentation on the official Docker website, IPv6 is enabled by adding parameters to daemon.json:

{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}

In this case, the provider must issue an IPv6 subnet, which you then specify in fixed-cidr-v6.
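As an illustration only, here is a minimal Python sketch that validates a subnet before writing it into the configuration. The function name build_daemon_config is hypothetical and not part of Docker; the sketch only generates the same two keys shown above.

```python
import ipaddress
import json


def build_daemon_config(subnet: str) -> str:
    """Return daemon.json content that enables IPv6 with the given subnet."""
    # ip_network raises ValueError for malformed input or for an address
    # that has host bits set.
    network = ipaddress.ip_network(subnet)
    if network.version != 6:
        raise ValueError(f"{subnet} is not an IPv6 subnet")
    config = {"ipv6": True, "fixed-cidr-v6": str(network)}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # 2001:db8::/32 is the documentation prefix; replace it with the
    # subnet actually issued by your provider.
    print(build_daemon_config("2001:db8:1::/64"))
```

Validating the subnet up front catches a typo before the Docker daemon is restarted with a broken configuration.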
However, we chose another option, IPv6 NAT, and here's why:

  • Docker currently cannot work in an IPv6-only mode.
  • Having a globally routable address in each container means that all ports (even unpublished ones) are exposed to everyone unless additional filtering is performed.
  • Ports are published through the userland proxy, and Docker manages iptables rules for IPv4 only.

IPv6 NAT runs as a Docker container that manages the ip6tables rules itself and updates them when a new container is added.

For this solution to work correctly, we had to take a few extra steps. Make sure the ip6table_nat module is loaded on the system: having a module installed does not guarantee that it will be loaded into the kernel at startup. We ran into this when starting the NAT container on a fresh host produced the following error:

2019/01/22 14:59:54 running [/sbin/ip6tables -t filter -N DOCKER --wait]: exit status 3: modprobe: can't change directory to '/lib/modules': No such file or directory
ip6tables v1.6.2: can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?)

The problem was solved after adding initialization to the Ansible role using the modprobe module and loading at OS startup using lineinfile:

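# Load the ip6table_nat kernel module right away
- name: Add ip6table_nat module
  modprobe:
    name: ip6table_nat
    state: present

# Make sure the module is also loaded on every boot
- name: Add ip6table_nat to boot
  lineinfile:
    path: /etc/modules
    line: 'ip6table_nat'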
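
To check on a host that the module really ended up in the kernel, one could look at /proc/modules. Below is a minimal Python sketch of such a check; the helper names are hypothetical, and the sketch assumes a Linux host where each line of /proc/modules starts with the module name.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Parse the content of /proc/modules into a set of module names."""
    # The first whitespace-separated field of every line is the module name.
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if the named module appears in the given listing."""
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    # The file exists only on Linux, so skip the check elsewhere.
    if path.exists():
        loaded = is_module_loaded("ip6table_nat", path.read_text())
        print("ip6table_nat loaded:", loaded)
```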

By the way, there is a good article on Habr that briefly and clearly describes the advantages and disadvantages of each method of running IPv6 in Docker.

But back to our question from the beginning:
Why is a conductor needed, even if the orchestra itself can play?

By now everyone has a picture of how our team plays:

  • a server provisioning process is in place,
  • development and deployment of services are unified.

A reasonable question arises - how to deploy, update, and control our services in docker containers efficiently and as automatically as possible?

Even though each member of the orchestra knows their part, they can go astray and deviate from the original idea. This brings us to the conclusion that without a conductor our orchestra will not rehearse effectively or play smoothly. The conductor is responsible for all the parameters of the performance and for making sure that everything is united by a single tempo and mood.

How to get a good conductor with minimal investment?

The topic of orchestration is quite well developed in the market. But first, let's talk about auxiliary tools that can help the conductor.

Consul is a system that provides two main functions:

  • service discovery,
  • distributed key-value storage.

In our orchestra, Consul will be responsible for registering services and storing their configurations. There are two registration options:

  • Active: the service registers itself using the HTTP API;
  • Passive: the service has to be registered manually.
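
To illustrate active registration, here is a minimal Python sketch in which a service registers itself with a local Consul agent through the agent's HTTP API (PUT /v1/agent/service/register). The service name, the port, and the /health check path are assumptions made for the example and do not describe our actual services.

```python
import json
import urllib.request


def build_registration(name: str, port: int, service_id: str = "") -> dict:
    """Build the JSON body for Consul's agent service registration."""
    return {
        "ID": service_id or name,
        "Name": name,
        "Port": port,
        # Optional HTTP health check; the /health path is an assumption.
        "Check": {
            "HTTP": f"http://localhost:{port}/health",
            "Interval": "10s",
        },
    }


def register(payload: dict, agent_url: str = "http://localhost:8500") -> int:
    """Send the registration to the Consul agent, return the HTTP status."""
    request = urllib.request.Request(
        url=f"{agent_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only print the payload here; calling register() needs a running agent.
    print(json.dumps(build_registration("report-service", 8080), indent=2))
```

A service would call register() once at startup. With passive registration, the same payload is prepared and submitted by a person or a deployment script instead.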

Vault is a storage system that standardizes and unifies the secure storage and handling of secrets such as passwords and certificates.
Here are the benefits we expect from using this tool:

  • A single center for creating and storing secrets, managing their life cycle through the HTTP API.
  • Transit Secrets Engine - encryption and decryption of data without saving it. The ability to transmit data in encrypted form over unsecured communication channels.
  • Access policies that are easy to configure.
  • Auditing of access to secrets.
  • The ability to create your own CA (Certificate Authority) to manage self-signed certificates within your infrastructure.
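
As an illustration of the Transit Secrets Engine, here is a minimal Python sketch that prepares an encryption request for Vault's HTTP API. It assumes that the engine is mounted at the default transit path and that an encryption key already exists; the key name orchestra, the address, and the token are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_encrypt_request(
    vault_url: str, token: str, key_name: str, plaintext: bytes
) -> urllib.request.Request:
    """Prepare a POST request to the transit encrypt endpoint."""
    # Transit expects the plaintext to be base64-encoded.
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        url=f"{vault_url}/v1/transit/encrypt/{key_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def encrypt(request: urllib.request.Request) -> str:
    """Send the request and return the ciphertext from the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["ciphertext"]


if __name__ == "__main__":
    # Placeholder values; sending the request needs a running Vault server.
    req = build_encrypt_request(
        "http://127.0.0.1:8200", "example-token", "orchestra", b"secret"
    )
    print(req.get_method(), req.full_url)
```

Vault returns the ciphertext without storing the data, so the encrypted value can then be passed over an unsecured channel and decrypted on the other side through the matching decrypt endpoint.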

Given all our requirements, two options were suitable for the role of the conductor - Kubernetes and Nomad.

Kubernetes

So many articles and books have been written about it (here is one, for example) and so many talks given that I will be brief: it is a universal combine that can do almost everything. The price for this is that a Kubernetes cluster is not always easy to set up and maintain.

Nomad

A tool from HashiCorp, the company known for the Consul and Vault mentioned above.

Nomad seemed to us simpler to install and configure than Kubernetes. A single binary works in both server and client mode. At the same time, Nomad covers the entire list of tasks we want it to solve: cluster management, a fast scheduler, multi-datacenter support. In addition, by using it together with Consul and Vault, we get tighter integration for orchestrating our services.

What's in the works now:

  • servers have been prepared for the Consul deployment,
  • the Nomad cluster configuration will be stored in Consul and used to deploy Nomad automatically,
  • in parallel, we will install Vault to store secrets.

A question to the audience: is it worth getting a conductor for such tasks, or does the orchestra do well without one? Tell us what you think in the comments.

Subscribe to our blog and stay in touch: soon we will tell you how it all turned out and whether we managed to set up the Nomad cluster the way we wanted.

Come join our cozy Telegram chat, where you can always ask for advice, help colleagues and just chat about performance research and more.

Source: habr.com