Deploying Applications to Multiple Kubernetes Clusters with Helm

How Dailymotion Uses Kubernetes: Application Deployment

We at Dailymotion started using Kubernetes in production 3 years ago. But deploying applications across multiple clusters is still a challenge, so over the past few years we have kept improving our tools and workflows.

How it started

Here we will explain how we deploy our applications on multiple Kubernetes clusters around the world.

To deploy multiple Kubernetes objects at once, we use Helm, and all of our charts are stored in the same git repository. To deploy a complete application stack made up of multiple services, we use what is known as an umbrella chart: essentially a chart that declares dependencies and lets you bring up the API and all of its services with a single command.

We also wrote a small Python script on top of Helm to run checks, create charts, add secrets, and deploy applications. All of these tasks are performed on a central CI platform using a Docker image.
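
The script itself is internal, but a minimal sketch of the idea looks something like this (chart paths, release names, and helper functions here are illustrative, not our actual tooling):

import subprocess

def run(cmd):
    # Run a shell command and fail loudly, as a CI step would.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def lint_chart(chart_dir):
    # Basic sanity check before a chart is published anywhere.
    run(["helm", "lint", chart_dir])

def deploy(release, chart, context, values_file):
    # One Helm invocation per target cluster, selected via kube context.
    run([
        "helm", "upgrade", "--install", release, chart,
        "--kube-context", context,
        "-f", values_file,
    ])

if __name__ == "__main__":
    lint_chart("./charts/foo-foobar")
    deploy("foo.world", "./charts/foo-foobar", "staging-us-central1", "values-staging.yaml")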

Let's get to the point.

Note: at the time of writing, the first release candidate of Helm 3 has already been announced. The upcoming major release contains a whole host of improvements that address some of the issues we have run into in the past.

Chart Development Workflow

We use a branch-based workflow for our applications, and we decided to apply the same approach to our charts.

  • The dev branch is used to create charts that will be tested on development clusters.
  • When a pull request is opened against master, the charts are validated in staging.
  • Finally, we open a pull request to move the changes to the prod branch and apply them in production.

Each environment has its own private repository storing our charts, and for this we use Chartmuseum, which exposes very useful APIs. This way we guarantee strict isolation between environments and can test charts in real conditions before using them in production.

Chart repositories in different environments

It's worth noting that when developers push to a dev branch, a version of their chart is automatically pushed to the dev Chartmuseum. All developers use the same dev repository, so you have to be careful to specify your own chart version so that you don't accidentally pick up someone else's changes.

Moreover, our little Python script validates Kubernetes objects against the Kubernetes OpenAPI specifications with Kubeval before publishing them to Chartmuseum.
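
A rough sketch of that step, assuming the chart is rendered with helm template, validated by piping the output through kubeval, and then pushed to Chartmuseum's standard upload API (URLs and paths are illustrative):

import glob
import subprocess

import requests

def validate_chart(chart_dir):
    # Render the chart into plain Kubernetes YAML, then check it
    # against the OpenAPI schemas with kubeval.
    rendered = subprocess.run(
        ["helm", "template", chart_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    subprocess.run(["kubeval", "--strict"], input=rendered, text=True, check=True)

def publish_chart(chart_dir, museum_url):
    # Package the chart and push the resulting .tgz to Chartmuseum.
    subprocess.run(["helm", "package", chart_dir, "-d", "/tmp"], check=True)
    for package in glob.glob("/tmp/*.tgz"):
        with open(package, "rb") as f:
            requests.post(f"{museum_url}/api/charts", data=f).raise_for_status()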

General description of the chart development workflow

  1. Configuring pipeline tasks according to the gazr.io specification for quality control (lint, unit tests).
  2. Pushing a Docker image with the Python tools that deploy our applications.
  3. Setting the environment based on the branch name.
  4. Validating the Kubernetes YAML files with Kubeval.
  5. Automatically bumping the chart version and that of its parent charts (charts that depend on the chart being changed); see the sketch after this list.
  6. Pushing the chart to the Chartmuseum that matches its environment.
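
A minimal sketch of the version-bump step (5), assuming Helm 2-style charts where dependencies are declared in requirements.yaml; the file layout and function names are illustrative:

import os

import yaml

def bump_patch(version):
    # "1.2.3" -> "1.2.4"
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def bump_chart(chart_dir):
    chart_file = os.path.join(chart_dir, "Chart.yaml")
    with open(chart_file) as f:
        chart = yaml.safe_load(f)
    chart["version"] = bump_patch(chart["version"])
    with open(chart_file, "w") as f:
        yaml.safe_dump(chart, f)
    return chart["name"]

def bump_parents(charts_root, changed_chart):
    # Any chart that declares the changed chart as a dependency is bumped as well.
    for chart_dir in os.listdir(charts_root):
        req_file = os.path.join(charts_root, chart_dir, "requirements.yaml")
        if not os.path.exists(req_file):
            continue
        with open(req_file) as f:
            deps = yaml.safe_load(f).get("dependencies", [])
        if any(dep.get("name") == changed_chart for dep in deps):
            bump_chart(os.path.join(charts_root, chart_dir))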

Managing Cluster Differences

Federation of Clusters

There was a time when we used Kubernetes cluster federation, which lets you declare Kubernetes objects through a single API endpoint. But problems arose: for example, some Kubernetes objects could not be created on the federation endpoint, which made it hard to maintain federated objects alongside per-cluster objects.

To solve this, we started managing the clusters independently, which greatly simplified the process (we used the first version of federation; things may have changed in the second).

Geo-distributed platform

Now our platform is distributed across 6 regions: 3 on-premises and 3 in the cloud.


Distributed Deployment

Helm Global Values

Four global Helm values allow us to differentiate between clusters. All of our charts define bare-minimum defaults for them:

global:
  cloud: True
  env: staging
  region: us-central1
  clusterName: staging-us-central1

Global values

These values help define the context for our applications and are used for various purposes: monitoring, tracing, logging, making external calls, scaling, etc.

  • "cloud": We have a hybrid Kubernetes platform. For example, our API is deployed in GCP zones and in our data centers.
  • "env": some values ​​may change for non-working environments. For example, resource definitions and autoscale configurations.
  • "region": This information helps determine the location of the cluster and can be used to determine the closest endpoints for external services.
  • "clusterName": if and when we want to define a value for an individual cluster.

Here is a specific example of how these values are used in a chart:

{{/* Returns Horizontal Pod Autoscaler replicas for GraphQL*/}}
{{- define "graphql.hpaReplicas" -}}
{{- if eq .Values.global.env "prod" }}
{{- if eq .Values.global.region "europe-west1" }}
minReplicas: 40
{{- else }}
minReplicas: 150
{{- end }}
maxReplicas: 1400
{{- else }}
minReplicas: 4
maxReplicas: 20
{{- end }}
{{- end -}}

Helm template example

This logic is defined in a helper template so as not to pollute the Kubernetes YAML.

Application Declaration

Our deployment tools are based on several YAML files. Below is an example of how we declare a service and its scaling topology (number of replicas) in a cluster.

releases:
  - foo.world

foo.world:                # Release name
  services:               # List of dailymotion's apps/projects
    foobar:
      chart_name: foo-foobar
      repo: [email protected]:dailymotion/foobar
      contexts:
        prod-europe-west1:
          deployments:
            - name: foo-bar-baz
              replicas: 18
            - name: another-deployment
              replicas: 3

Service definition
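
How such a declaration becomes actual Helm releases lives in our Python tooling; below is a simplified sketch of the idea (the values layout passed via --set and the helper names are illustrative, not our real implementation):

import subprocess

import yaml

def deploy_release(declaration_path, env):
    with open(declaration_path) as f:
        config = yaml.safe_load(f)

    for release_name in config["releases"]:
        release = config[release_name]
        for service in release["services"].values():
            for context, settings in service["contexts"].items():
                cmd = [
                    "helm", "upgrade", "--install", release_name,
                    f"{env}/{service['chart_name']}",  # chart from this environment's repository
                    "--kube-context", context,
                ]
                # Only the scaling topology is overridden per cluster.
                for i, deployment in enumerate(settings["deployments"]):
                    cmd += ["--set", f"deployments[{i}].replicas={deployment['replicas']}"]
                subprocess.run(cmd, check=True)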

This is a diagram of all the steps that define our deployment workflow. The last step deploys the application to multiple production clusters simultaneously.


Deployment steps in Jenkins

What about secrets?

On the security side, we collect all secrets from different locations and store them in a single Vault instance in Paris.

Our deployment tools pull the secret values from Vault and, when it's time to deploy, inject them into Helm.

To do this, we defined a mapping between the secrets in Vault and the secrets our applications need:

secrets:
  - secret_id: "stack1-app1-password"
    contexts:
      - name: "default"
        vaultPath: "/kv/dev/stack1/app1/test"
        vaultKey: "password"
      - name: "cluster1"
        vaultPath: "/kv/dev/stack1/app1/test"
        vaultKey: "password"

  • We have defined general rules to follow when writing secrets to Vault.
  • If a secret applies to a specific context or cluster, you need to add a dedicated entry. (Here, the cluster1 context has its own value for the stack1-app1-password secret.)
  • Otherwise, the default value is used.
  • For each item in this list, a key-value pair is inserted into the Kubernetes Secret. Therefore, the secret template in our charts is very simple.

apiVersion: v1
data:
{{- range $key,$value := .Values.secrets }}
  {{ $key }}: {{ $value | b64enc | quote }}
{{ end }}
kind: Secret
metadata:
  name: "{{ .Chart.Name }}"
  labels:
    chartVersion: "{{ .Chart.Version }}"
    tillerVersion: "{{ .Capabilities.TillerVersion.SemVer }}"
type: Opaque
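
A minimal sketch of how the deployment tools might resolve the mapping above against Vault's KV API and hand the result to Helm as .Values.secrets (the Vault address, token handling, and function names are illustrative):

import os

import requests

VAULT_ADDR = os.environ["VAULT_ADDR"]
VAULT_TOKEN = os.environ["VAULT_TOKEN"]

def read_vault_key(path, key):
    # Plain KV (v1) read: GET /v1/<path>; the secret data sits under "data".
    resp = requests.get(
        f"{VAULT_ADDR}/v1{path}",
        headers={"X-Vault-Token": VAULT_TOKEN},
    )
    resp.raise_for_status()
    return resp.json()["data"][key]

def resolve_secrets(secret_mappings, context):
    # Pick the context-specific entry when one exists, otherwise fall back to "default".
    resolved = {}
    for mapping in secret_mappings:
        entries = {entry["name"]: entry for entry in mapping["contexts"]}
        entry = entries.get(context, entries["default"])
        resolved[mapping["secret_id"]] = read_vault_key(entry["vaultPath"], entry["vaultKey"])
    return resolved

# The resolved dictionary is then passed to Helm so that the Secret template
# above can iterate over .Values.secrets.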

Challenges and limitations

Working with multiple repositories

Today, chart development is separated from application development. This means that developers have to work in two git repositories: one for the application and one for defining its deployment to Kubernetes. Two git repositories mean two workflows, and it's easy for a newcomer to get confused.

Managing umbrella charts is troublesome

As we said, umbrella charts are very handy for declaring dependencies and quickly deploying multiple applications. But we use --reuse-values to avoid passing all the values every time we deploy an application that is part of such an umbrella chart.

In the continuous delivery workflow, only two values change regularly: the number of replicas and the image tag (version). Other, more stable values are changed manually, and this is quite difficult. What's more, a single mistake when deploying an umbrella chart can lead to serious failures, as we have learned from our own experience.
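
In practice, a routine continuous-delivery deploy under this model boils down to something like the following sketch (the release, chart, and value paths are illustrative):

import subprocess

def redeploy(release, chart, context, service, image_tag, replicas):
    # Only the image tag and replica count change; everything else is
    # carried over from the previous release thanks to --reuse-values.
    subprocess.run([
        "helm", "upgrade", release, chart,
        "--kube-context", context,
        "--reuse-values",
        "--set", f"{service}.image.tag={image_tag}",
        "--set", f"{service}.replicas={replicas}",
    ], check=True)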

Update multiple configuration files

When a developer adds a new application, they have to change several files: the application declaration, the list of secrets, and, if the application is part of an umbrella chart, its list of dependencies.

Jenkins permissions in Vault are too broad

Right now we have a single AppRole that can read all secrets in Vault.

The rollback process is not automated

To roll back, you have to run the command on several clusters, which is error-prone. We perform this operation manually to make sure the correct revision is specified.
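
What we run by hand boils down to something like the following sketch (the release name, revision, and contexts are illustrative):

import subprocess

def rollback(release, revision, contexts):
    # The same release has to be rolled back to the same revision on every
    # cluster; getting the revision wrong on one of them is the risky part.
    for context in contexts:
        subprocess.run(
            ["helm", "rollback", release, str(revision), "--kube-context", context],
            check=True,
        )

rollback("foo.world", 42, ["prod-europe-west1", "prod-us-central1"])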

We are moving towards GitOps

Our Goal

We want to move each chart into the git repository of the application it deploys.

The workflow will be the same as for development: for example, when changes are pushed to master, the deployment will be triggered automatically. The main difference between this approach and the current workflow is that everything will be managed in git (the application itself and the way it is deployed to Kubernetes).

There are several advantages:

  • Much clearer for developers: it's easier to learn how to apply changes in a local chart.
  • The service's deployment definition lives in the same place as the service's code.
  • No more umbrella charts to manage. Each service will have its own Helm release, which allows the application lifecycle (rollback, upgrade) to be managed at the finest level, without affecting other services.
  • The benefits of git for managing charts: reverting changes, an audit log, and so on. If you need to undo a change to a chart, you can do it with git, and the deployment runs automatically.
  • We can also consider improving the development workflow with tools such as Skaffold, which lets developers test changes in a production-like context.

Two-Step Migration

Our developers have been using this workflow for 2 years now, so we want the migration to be as painless as possible. Therefore, we decided to add an intermediate stage on the way to the goal.
The first step is simple:

  • We keep a similar structure for configuring app deployments, but in a single object named DailymotionRelease.

apiVersion: "v1"
kind: "DailymotionRelease"
metadata:
  name: "app1.ns1"
  environment: "dev"
  branch: "mybranch"
spec:
  slack_channel: "#admin"
  chart_name: "app1"
  scaling:
    - context: "dev-us-central1-0"
      replicas:
        - name: "hermes"
          count: 2
    - context: "dev-europe-west1-0"
      replicas:
        - name: "app1-deploy"
          count: 2
  secrets:
    - secret_id: "app1"
      contexts:
        - name: "default"
          vaultPath: "/kv/dev/ns1/app1/test"
          vaultKey: "password"
        - name: "dev-europe-west1-0"
          vaultPath: "/kv/dev/ns1/app1/test"
          vaultKey: "password"

  • 1 release per application (no more umbrella charts).
  • Charts in the application's git repository.

We've talked to all the developers, so the migration has already begun. The first stage is still controlled by the CI platform. I will write another post soon about the second step: how we migrated to a GitOps workflow with Flux. I'll describe how we set everything up and what difficulties we encountered (multiple repositories, secrets, and so on). Stay tuned.

Here we have tried to describe how our application deployment workflow has evolved over the past years and what led us to think about the GitOps approach. We haven't reached the goal yet and will report on the results later, but we are already convinced that we did the right thing when we decided to simplify everything and bring it closer to developers' habits.

Source: habr.com
