Operators for Kubernetes: how to run stateful applications

The problem of stateful applications in Kubernetes

Configuring, launching, and scaling applications and services is simple when they are stateless, i.e. they do not persist data. Such services are convenient to run in Kubernetes using its standard APIs, because everything works out of the box with standard configurations, without any application-specific tweaks or magic.

Simply put, to run five more copies of a PHP/Ruby/Python backend in a cluster of containers, you only need to bring up a new server five times and copy the sources. Since both the source code and the init script are in the image, scaling a stateless application becomes quite elementary. As fans of containers and microservice architectures know well, the difficulties begin with stateful applications, i.e. those that persist data, such as databases and caches (MySQL, PostgreSQL, Redis, Elasticsearch, Cassandra…). This applies both to software that implements a quorum cluster on its own (for example, Percona XtraDB and Cassandra) and to software that requires separate management utilities (such as Redis, MySQL, PostgreSQL…).

The difficulty is that having the source code and launching the service is not enough: additional actions are required. At a minimum, you have to copy the data and/or join the cluster. More precisely, these services require an understanding of how to scale, update, and reconfigure them properly without data loss or temporary unavailability. Accounting for these needs is called "operational knowledge".

CoreOS Operators

To "program" operational knowledge, at the end of last year the CoreOS project introduced "a new class of software" for the Kubernetes platform: Operators.

Operators use and extend the basic capabilities of Kubernetes (including StatefulSets; see below for the differences) and allow DevOps engineers to add operational knowledge to application code.

The purpose of an Operator is to give the user an API for managing the many entities of a stateful application in a Kubernetes cluster without having to think about what is under the hood (what data there is and what to do with it, which commands still need to be run to maintain the cluster). In effect, an Operator is designed to make working with the application inside the cluster as simple as possible by automating operational tasks that previously had to be done manually.

How Operators work

ReplicaSets in Kubernetes let you specify the desired number of running pods, and controllers ensure that this number is maintained (by creating and deleting pods). An Operator works in a similar way: it adds a set of operational knowledge to a standard Kubernetes resource and controller, which allows it to perform additional actions to maintain the required number of application entities.
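To make the control-loop idea concrete, here is a minimal Python sketch of one reconciliation pass. It is not Kubernetes code: `FakeCluster` and `reconcile_replicas` are hypothetical names, and an in-memory list stands in for the API server. The pass compares the desired number of pods with the observed one and creates or deletes pods until they match.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API server."""
    pods: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def create_pod(self, prefix: str) -> str:
        name = f"{prefix}-{next(self._ids):04d}"
        self.pods.append(name)
        return name

    def delete_pod(self, name: str) -> None:
        self.pods.remove(name)


def reconcile_replicas(cluster: FakeCluster, prefix: str, desired: int) -> list:
    """One pass of a ReplicaSet-style control loop (hypothetical helper).

    Compares the desired pod count with the observed one and creates or
    deletes pods so that they match. Returns the actions it performed.
    """
    actions = []
    current = [p for p in cluster.pods if p.startswith(prefix + "-")]
    # Too few pods: create the missing ones.
    for _ in range(desired - len(current)):
        actions.append("create " + cluster.create_pod(prefix))
    # Too many pods: delete the most recently created extras.
    for name in current[desired:]:
        cluster.delete_pod(name)
        actions.append("delete " + name)
    return actions


cluster = FakeCluster()
print(reconcile_replicas(cluster, "web", 3))  # → ['create web-0000', 'create web-0001', 'create web-0002']
print(reconcile_replicas(cluster, "web", 1))  # → ['delete web-0001', 'delete web-0002']
print(reconcile_replicas(cluster, "web", 1))  # → [] (already converged)
```

Because the function acts only on the difference between the desired and actual state, calling it again with the same target does nothing. This idempotence is what lets a controller, or an Operator, run the same loop over and over.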

How does this differ from StatefulSets, which are meant for applications that require the cluster to provide them with stateful resources such as data storage or static IPs? For such applications, Operators can use StatefulSets (instead of ReplicaSets) as a basis and add further automation: taking the necessary actions after crashes, making backups, updating the configuration, etc.

So how does it all work? An Operator is a control daemon that:

  1. subscribes to the event API in Kubernetes;
  2. receives data about the system from it (its ReplicaSets, Pods, Services, etc.);
  3. receives information about Third Party Resources (see the examples below);
  4. reacts to the creation or modification of Third Party Resources (for example, a change of size or version);
  5. reacts to changes in the state of the system (its ReplicaSets, Pods, Services, etc.);
  6. most importantly:
    1. calls the Kubernetes API to create everything it needs (again, its own ReplicaSets, Pods, Services...),
    2. performs some magic (for simplicity, you can think of it as the Operator going into the pods themselves and running commands, for example, to join the cluster or to upgrade the data format when the version is updated).
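The daemon's behavior can be sketched as an event handler. This is an illustration under stated assumptions: events arrive as plain Python objects rather than from a real Kubernetes watch, `ToyOperator` is a hypothetical name, pod events are assumed to carry the name of their cluster, and `reconcile` only records what a real Operator would do (create ReplicaSets, Pods and Services, run commands in pods). The event types ADDED, MODIFIED and DELETED mirror those of Kubernetes watch events; the `Cluster` kind matches the etcd example resource shown later in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str   # resource kind, e.g. "Cluster" (third-party resource) or "Pod"
    type: str   # "ADDED", "MODIFIED" or "DELETED", as in Kubernetes watch events
    name: str
    spec: dict = field(default_factory=dict)  # for pods we assume it holds {"cluster": owner}


class ToyOperator:
    """Hypothetical operator skeleton; a real one consumes a Kubernetes watch stream (steps 1-2)."""

    def __init__(self) -> None:
        self.desired = {}  # cluster name -> spec requested by the user
        self.log = []      # what the operator decided to do, in order

    def handle(self, event: Event) -> None:
        if event.kind == "Cluster":
            # Steps 3-4: a third-party resource appeared, changed or was deleted.
            if event.type == "DELETED":
                self.desired.pop(event.name, None)
            else:
                self.desired[event.name] = event.spec
            self.reconcile(event.name)
        elif event.kind == "Pod":
            # Step 5: the state of the system changed (e.g. a member pod died).
            owner = event.spec.get("cluster")
            if owner in self.desired:
                self.reconcile(owner)

    def reconcile(self, name: str) -> None:
        # Step 6: a real operator would call the Kubernetes API here and run
        # commands inside pods; this sketch only records its intent.
        spec = self.desired.get(name)
        if spec is None:
            self.log.append(f"tear down {name}")
        else:
            self.log.append(f"ensure {name}: size={spec['size']}, version={spec['version']}")


op = ToyOperator()
op.handle(Event("Cluster", "ADDED", "example-etcd-cluster", {"size": 3, "version": "3.1.0"}))
op.handle(Event("Pod", "DELETED", "example-etcd-cluster-0001", {"cluster": "example-etcd-cluster"}))
op.handle(Event("Cluster", "MODIFIED", "example-etcd-cluster", {"size": 5, "version": "3.1.0"}))
print("\n".join(op.log))
```

Note that both a user's change to the resource and an unrelated pod failure end up in the same `reconcile` call: the Operator does not care why the state drifted, only that it did.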

In fact, as the picture shows, a separate application called the Operator is simply added to Kubernetes (an ordinary Deployment with a ReplicaSet). It lives in an ordinary pod (usually just one) and, as a rule, is responsible only for its own namespace. The Operator application implements its own API, though not directly: it does so through Third Party Resources in Kubernetes.

Thus, once the Operator has been created in a namespace, we can add Third Party Resources to that namespace.

Example for etcd (see details below):

apiVersion: etcd.coreos.com/v1beta1
kind: Cluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3
  version: 3.1.0
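Inside the Operator, such a resource is just structured data that it validates and interprets. As a minimal sketch, assuming the manifest has already been parsed into a Python dict (a YAML library would normally do that), the `spec` section could be turned into a typed object. The class name `EtcdClusterSpec`, the validation rules and the zero-padded member naming are assumptions for illustration, not the etcd Operator's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EtcdClusterSpec:
    """Hypothetical typed view of the etcd Cluster resource's `spec` section."""
    size: int
    version: str

    @classmethod
    def from_manifest(cls, manifest: dict) -> "EtcdClusterSpec":
        # Reject resources this (hypothetical) operator does not own.
        if manifest.get("kind") != "Cluster":
            raise ValueError(f"unexpected kind: {manifest.get('kind')!r}")
        spec = manifest["spec"]
        size = int(spec["size"])
        if size < 1:
            raise ValueError("size must be at least 1")
        return cls(size=size, version=str(spec["version"]))

    def member_names(self, cluster_name: str) -> list:
        # Assumed naming scheme: cluster name plus a zero-padded index.
        return [f"{cluster_name}-{i:04d}" for i in range(self.size)]


# The manifest above, as a YAML parser would return it.
manifest = {
    "apiVersion": "etcd.coreos.com/v1beta1",
    "kind": "Cluster",
    "metadata": {"name": "example-etcd-cluster"},
    "spec": {"size": 3, "version": "3.1.0"},
}
spec = EtcdClusterSpec.from_manifest(manifest)
print(spec)
print(spec.member_names(manifest["metadata"]["name"]))
```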

Example for Elasticsearch:

apiVersion: enterprises.upmc.com/v1
kind: ElasticsearchCluster
metadata:
  name: example-es-cluster
spec:
  client-node-replicas: 3
  master-node-replicas: 2
  data-node-replicas: 3
  zones:
  - us-east-1c
  - us-east-1d
  - us-east-1e
  data-volume-size: 10Gi
  java-options: "-Xms1024m -Xmx1024m"
  snapshot:
    scheduler-enabled: true
    bucket-name: elasticsnapshots99
    cron-schedule: "@every 2m"
  storage:
    type: gp2
    storage-class-provisioner: kubernetes.io/aws-ebs

Requirements for Operators

CoreOS formulated the main patterns its engineers arrived at while working on Operators. Although every Operator is unique (each is built for a specific application with its own characteristics and needs), Operators should be built on a common framework that imposes the following requirements:

  1. Installation must be done through a single Deployment (kubectl create -f SOME_OPERATOR_URL/deployment.yaml) and require no additional actions.
  2. Installing an Operator in Kubernetes should create a new third-party type (ThirdPartyResource). Users will use this type to launch application instances (cluster instances) and to manage them afterwards (update versions, resize, etc.).
  3. Whenever possible, built-in Kubernetes primitives such as Services and ReplicaSets should be used, to rely on well-tested and well-understood code.
  4. Operators must be backward compatible and support older versions of user-created resources.
  5. When the Operator is removed, the application itself should continue to function without changes.
  6. Users should be able to specify the desired application version and orchestrate version updates. Failing to update software is a frequent source of operational and security problems, and Operators must help users with this.
  7. Operators should be tested with a tool like Chaos Monkey, which injects failures into pods, configurations, and the network to expose potential problems.
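Requirement 6 is easy to underestimate, because some applications cannot jump straight to an arbitrary version. As a sketch, assuming a rule like etcd's (an upgrade must not skip a minor version), an Operator could compute the intermediate versions it has to roll through. The function `upgrade_path` is hypothetical, and using patch version 0 for intermediate steps is a simplification; in practice you would pick the latest patch release of each minor version.

```python
def upgrade_path(current: str, target: str) -> list:
    """Versions to roll through, one minor version at a time (hypothetical helper).

    Assumes an etcd-like rule that upgrades must not skip a minor version, and
    stays within one major version. Intermediate steps use patch 0 for
    simplicity; in practice you would pick the latest patch of each minor.
    """
    cur = tuple(int(x) for x in current.split("."))
    tgt = tuple(int(x) for x in target.split("."))
    if cur[0] != tgt[0]:
        raise ValueError("major-version upgrades are out of scope for this sketch")
    if tgt < cur:
        raise ValueError("downgrades are not supported")
    steps = [f"{cur[0]}.{minor}.0" for minor in range(cur[1] + 1, tgt[1])]
    if tgt != cur:
        steps.append(target)
    return steps


print(upgrade_path("3.0.4", "3.2.1"))  # → ['3.1.0', '3.2.1']
```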

etcd Operator

One example of an Operator implementation is the etcd Operator, which was prepared in time for the announcement of the concept. Configuring an etcd cluster can be complex because of the need to maintain quorum, reconfigure cluster membership, create backups, and so on. For example, manually scaling an etcd cluster means creating a DNS name for the new cluster member, starting a new etcd instance, and notifying the cluster about the new member (etcdctl member add). With the Operator, the user only needs to change the cluster size; everything else happens automatically.
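The quorum mentioned above is simple arithmetic: an etcd cluster of n members needs a majority, floor(n/2) + 1, to agree, so it tolerates the failure of the remaining members. A small Python sketch:

```python
def quorum(size: int) -> int:
    """Members that must agree for an etcd cluster of `size` to make progress."""
    return size // 2 + 1


def fault_tolerance(size: int) -> int:
    """Members that may fail while the cluster still has a quorum."""
    return size - quorum(size)


for n in range(1, 8):
    print(f"size={n} quorum={quorum(n)} tolerates={fault_tolerance(n)} failure(s)")
```

Growing from 3 to 4 members raises the quorum from 2 to 3 while the tolerance stays at 1, which is why odd sizes are usually preferred and why resizing is one of the things an Operator should handle carefully.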

Since etcd was also created at CoreOS, it was only logical that its Operator appeared first. How does it work? The logic of the etcd Operator is defined by three components:

  1. Observe. The Operator monitors the state of the cluster using the Kubernetes API.
  2. Analyze. It finds differences between the current state and the desired one (defined by the user configuration).
  3. Act. It eliminates the detected differences using the etcd and/or Kubernetes service APIs.
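The Observe / Analyze / Act cycle can be sketched as three small functions. This is an illustrative Python model, not the etcd Operator's code: pods are a hypothetical `name -> phase` mapping, and `act` only returns the steps as strings (named after the real `etcdctl member remove` / `member add` subcommands) instead of calling etcd or Kubernetes. In this sketch, failed members are removed from the membership first and replaced by new members with fresh names.

```python
from dataclasses import dataclass


@dataclass
class Diff:
    dead: list    # member pods that are no longer running
    missing: int  # how many new members are needed to reach the desired size
    extra: list   # healthy members to remove when the cluster shrinks


def observe(pods: dict) -> tuple:
    """Observe: split member pods (name -> phase) into healthy and dead ones."""
    healthy = sorted(name for name, phase in pods.items() if phase == "Running")
    dead = sorted(name for name, phase in pods.items() if phase != "Running")
    return healthy, dead


def analyze(healthy: list, dead: list, size: int) -> Diff:
    """Analyze: compare the observed state with the desired size."""
    return Diff(dead=dead, missing=max(size - len(healthy), 0), extra=healthy[size:])


def act(diff: Diff, prefix: str, next_id: int) -> list:
    """Act: ordered steps; a real operator would call etcd and Kubernetes here.

    Steps are named after the etcdctl `member remove` / `member add`
    subcommands but are only returned as strings.
    """
    steps = [f"member remove {m}" for m in diff.dead + diff.extra]
    steps += [f"member add {prefix}-{next_id + i:04d}" for i in range(diff.missing)]
    return steps


pods = {"etcd-cluster-0000": "Running", "etcd-cluster-0001": "Failed", "etcd-cluster-0002": "Running"}
healthy, dead = observe(pods)
print(act(analyze(healthy, dead, size=3), "etcd-cluster", next_id=3))
# → ['member remove etcd-cluster-0001', 'member add etcd-cluster-0003']
```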


To implement this logic, the Operator provides the functions Create/Destroy (creating and removing etcd cluster members) and Resize (changing the number of cluster members). Its correctness was verified with a utility modeled on Netflix's Chaos Monkey, i.e. one that kills etcd pods at random.
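A Chaos Monkey-style check can be sketched the same way: repeatedly kill a random pod and verify that the reconcile step restores the desired size. The sketch below models pods as a set of names in memory; `reconcile` and `chaos_test` are hypothetical helpers, and a real test would delete pods through the Kubernetes API and also verify that the etcd data survived.

```python
import random


def reconcile(pods: set, size: int, next_id: int) -> int:
    """Hypothetical reconcile step: restore `size` pods, returning the next free id."""
    while len(pods) < size:
        pods.add(f"etcd-{next_id:04d}")  # replacements always get fresh names
        next_id += 1
    while len(pods) > size:
        pods.remove(max(pods))
    return next_id


def chaos_test(size: int, rounds: int, seed: int) -> tuple:
    """Kill one random pod per round and check the cluster is restored each time.

    Returns (passed, pods_created).
    """
    rng = random.Random(seed)
    pods = set()
    next_id = reconcile(pods, size, 0)
    for _ in range(rounds):
        victim = rng.choice(sorted(pods))  # the "monkey" picks a pod at random
        pods.remove(victim)
        next_id = reconcile(pods, size, next_id)
        if len(pods) != size or victim in pods:
            return False, next_id
    return True, next_id


print(chaos_test(size=3, rounds=100, seed=42))  # → (True, 103)
```

Seeding the random generator makes a failing run reproducible, which matters when the point of the test is to find rare failure sequences.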

For full-fledged operation of etcd, the Operator provides additional features: Backup (creating backups automatically and transparently to users; in the config you only need to specify how often to make them and how many to keep, and data can later be restored from them) and Upgrade (updating etcd installations without downtime).
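The backup policy described here boils down to two decisions: is a new backup due, and which old backups should be deleted. A minimal sketch, assuming the policy is expressed as an interval and a maximum number of stored backups (the function and parameter names are hypothetical, not the etcd Operator's actual configuration fields):

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def backup_due(last: Optional[datetime], now: datetime, interval: timedelta) -> bool:
    """A new backup is due if none exists yet or `interval` has passed since the last one."""
    return last is None or now - last >= interval


def prune(backups: List[datetime], max_backups: int) -> Tuple[List[datetime], List[datetime]]:
    """Keep the newest `max_backups` backups; return (kept, to_delete)."""
    newest_first = sorted(backups, reverse=True)
    return newest_first[:max_backups], newest_first[max_backups:]


# Hypothetical policy: back up every 30 minutes and keep at most 3 backups.
interval, max_backups = timedelta(minutes=30), 3
start = datetime(2017, 3, 1, 12, 0)
backups = []
for minute in range(0, 181, 10):  # the operator wakes up every 10 minutes for 3 hours
    now = start + timedelta(minutes=minute)
    last = max(backups) if backups else None
    if backup_due(last, now, interval):
        backups.append(now)
        backups, _deleted = prune(backups, max_backups)
print(sorted(b.strftime("%H:%M") for b in backups))  # → ['14:00', '14:30', '15:00']
```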

What does working with an Operator look like?

$ kubectl create -f https://coreos.com/operators/etcd/latest/deployment.yaml
$ kubectl create -f https://coreos.com/operators/etcd/latest/example-etcd-cluster.yaml
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0000                1/1       Running   0          23s
etcd-cluster-0001                1/1       Running   0          16s
etcd-cluster-0002                1/1       Running   0          8s
etcd-cluster-backup-tool-rhygq   1/1       Running   0          18s

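When scripting this check, you could parse the table printed by `kubectl get pods` and confirm that every pod is Running with all of its containers ready. The helper below is a hypothetical sketch that relies on the default whitespace-separated column layout shown above; for anything beyond a quick check, machine-readable output (`kubectl get pods -o json`) is more robust.

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """True if every row of `kubectl get pods` output is Running and fully ready.

    Hypothetical helper that relies on the default whitespace-separated
    columns; it does not handle other output formats.
    """
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    ready_col, status_col = header.index("READY"), header.index("STATUS")
    for line in lines[1:]:
        cols = line.split()
        ready, total = cols[ready_col].split("/")
        if cols[status_col] != "Running" or ready != total:
            return False
    return True


sample = """\
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0000                1/1       Running   0          23s
etcd-cluster-0001                1/1       Running   0          16s
etcd-cluster-0002                1/1       Running   0          8s
etcd-cluster-backup-tool-rhygq   1/1       Running   0          18s
"""
print(all_pods_ready(sample))  # → True
```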
The etcd Operator is currently in beta and requires Kubernetes 1.5.3+ and etcd 3.0+. Source code and documentation (including usage instructions) are available on GitHub.

CoreOS has also created another implementation, the Prometheus Operator, but it is still in alpha (not all planned features have been implemented yet).

Status and prospects

Five months have passed since Kubernetes Operators were announced. So far, only two implementations are available in the official CoreOS repository (for etcd and Prometheus). Neither has reached a stable version yet, but both receive commits daily.

The developers envision "a future where users install Postgres Operators, Cassandra Operators or Redis Operators in their Kubernetes clusters and work with the scalable entities of these applications as easily as deploying replicas of stateless applications for the web today." The first third-party Operators have indeed started to appear.

At FOSDEM, the largest European free software conference, held in Brussels in February 2017, Josh Wood of CoreOS presented Operators in a talk (a video of it is available), which should help popularize the concept in the wider open source community.

P.S. Thanks for your interest in the article! Subscribe to our hub so you don't miss new materials and recipes on DevOps and GNU/Linux system administration; we publish them regularly!

Source: habr.com
