Deploy Applications with Docker Swarm

The online video content recommendation system we are working on is a closed commercial development, technically a multi-component cluster of proprietary and open-source components. The purpose of this article is to describe how we introduced Docker Swarm clustering for a staging environment without disrupting the established workflow of our processes, and within a limited time. The story is divided into two parts. The first describes CI/CD before Docker Swarm, and the second describes the process of introducing it. Those who are not interested in the first part can safely skip to the second.

Part I

Once upon a time, we needed to set up a CI/CD process as quickly as possible. One of the conditions was not to use Docker for deploying the components under development, for several reasons:

  • more reliable and stable operation of the components in production (in effect, a requirement not to use virtualization)
  • the lead developers did not want to work with Docker (odd, but that is how it was)
  • ideological considerations on the part of R&D management

The infrastructure, stack, and approximate initial requirements for the MVP looked like this:

  • 4 Intel® X5650 servers running Debian (plus one more powerful machine dedicated entirely to development)
  • In-house components are developed in C++ and Python3
  • Main third-party tools used: Kafka, ClickHouse, Airflow, Redis, Grafana, PostgreSQL, MySQL, …
  • Separate pipelines for building and testing components in debug and release configurations

One of the first questions to address at the initial stage was how the custom components would be deployed to any given environment (CI/CD).

We decided to install third-party components system-wide and update them the same way. Custom applications written in C++ or Python can be deployed in several ways, for example by building system packages, pushing them to a repository of built images, and then installing them on the servers. For reasons now unknown, a different method was chosen: using CI, the application executables are compiled, a project virtual environment is created, the Python modules from requirements.txt are installed, and all of these artifacts are shipped to the servers together with configs, scripts, and the accompanying application environment. The applications are then launched under a dedicated user without administrator rights.
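Schematically, deploying a single component this way boils down to something like the sketch below. It is only an illustration of the approach, not the actual scripts: the target host, paths, and the component name are hypothetical placeholders:

# build the C++ binaries in CI (simplified)
cd builds/release && ./build.sh && cd -

# create the project virtual environment and install the Python dependencies
python3 -m venv venv
./venv/bin/pip install -r requirements.txt

# ship binaries, the venv, configs and scripts to the target server (placeholder host and path)
rsync -a bin venv configs scripts deploy@target-host:/opt/apps/mycomponent/

# restart the component under an unprivileged user (placeholder init script)
ssh deploy@target-host "/opt/apps/mycomponent/init.d/mycomponent restart"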

GitLab CI was chosen as the CI/CD system. The resulting pipeline looked something like this:

(screenshot: the resulting pipeline in GitLab)
Structurally, .gitlab-ci.yml looked like this:

---
variables:
  # minimum CPU generation on the servers where the cluster is deployed
  CMAKE_CPUTYPE: "westmere"

  DEBIAN: "MYREGISTRY:5000/debian:latest"

before_script:
  - eval $(ssh-agent -s)
  - ssh-add <(echo "$SSH_PRIVATE_KEY")
  - mkdir -p ~/.ssh && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config

stages:
  - build
  - testing
  - deploy

debug.debian:
  stage: build
  image: $DEBIAN
  script:
    - cd builds/release && ./build.sh
  artifacts:
    paths:
      - bin/
      - builds/release/bin/
    when: always

release.debian:
  stage: build
  image: $DEBIAN
  script:
    - cd builds/release && ./build.sh
  artifacts:
    paths:
      - bin/
      - builds/release/bin/
    when: always

## testing stage
tests.codestyle:
  stage: testing
  image: $DEBIAN
  dependencies:
    - release.debian
  script:
    - /bin/bash run_tests.sh -t codestyle -b "${CI_COMMIT_REF_NAME}_codestyle"
tests.debug.debian:
  stage: testing
  image: $DEBIAN
  dependencies:
    - debug.debian
  script:
    - /bin/bash run_tests.sh -e codestyle/test_pylint.py -b "${CI_COMMIT_REF_NAME}_debian_debug"
  artifacts:
    paths:
      - run_tests/username/
    when: always
    expire_in: 1 week
tests.release.debian:
  stage: testing
  image: $DEBIAN
  dependencies:
    - release.debian
  script:
    - /bin/bash run_tests.sh -e codestyle/test_pylint.py -b "${CI_COMMIT_REF_NAME}_debian_release"
  artifacts:
    paths:
      - run_tests/username/
    when: always
    expire_in: 1 week

## staging stage
deploy_staging:
  stage: deploy
  environment: staging
  image: $DEBIAN
  dependencies:
    - release.debian
  script:
    - cd scripts/deploy/ &&
        python3 createconfig.py -s $CI_ENVIRONMENT_NAME &&
        /bin/bash install_venv.sh -d -r ../../requirements.txt &&
        python3 prepare_init.d.py &&
        python3 deploy.py -s $CI_ENVIRONMENT_NAME
  when: manual

It is worth noting that building and testing run on our own image, in which all the necessary system packages are already installed and the other settings have already been made.

Each of the scripts in these jobs is interesting in its own way, but of course I will not go through them here: describing each of them would take a lot of time, and that is not the purpose of this article. I will only point out that the deployment stage consists of a sequence of script calls:

  1. createconfig.py - creates a settings.ini file with component settings for the various environments, used for subsequent deployment (Preproduction, Production, Testing, ...)
  2. install_venv.sh - creates a virtual environment for the Python components in a specific directory and copies it to the remote servers
  3. prepare_init.d.py - prepares start/stop scripts for the components from a template (a sketch of what such a script might look like follows this list)
  4. deploy.py - lays out the new components on the servers and restarts them
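For illustration only, a generated start/stop script might look roughly like the following; the component name, paths, and layout are hypothetical placeholders rather than the real template output:

#!/bin/bash
# hypothetical start/stop script for a component called "mycomponent"
APP_DIR=/home/myusername/mycomponent
PIDFILE=$APP_DIR/mycomponent.pid

case "$1" in
  start)
    # launch the component from its virtualenv, without root privileges
    cd "$APP_DIR" && ./venv/bin/python3 bin/mycomponent.py --config settings.ini &
    echo $! > "$PIDFILE"
    ;;
  stop)
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    ;;
esac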

Time passed. The staging stage was replaced by preproduction and production stages. Support for the product was added on one more distribution (CentOS), along with 5 more powerful physical servers and a dozen virtual ones. Meanwhile, it was becoming harder and harder for developers and testers to try out their tasks in an environment at least roughly resembling the production state. At that point it became clear that we could not do without Docker...

Part II


So, our cluster is a sizable system of a couple of dozen separate components that are not described by Dockerfiles. All you can do is configure it as a whole for deployment to a specific environment. Our task was to deploy the cluster to a staging environment in order to test it before pre-release testing.

In theory, several clusters can be running at the same time: as many as there are tasks that are completed or close to completion. The capacity of the servers at our disposal allows us to run several clusters on each server. Each staging cluster must be isolated (there must be no overlap in ports, directories, and so on).

Our most valuable resource is our time, and we didn't have much of it.

For a faster start, we chose Docker Swarm because of its simplicity and flexible architecture. The first thing we did was create a manager and several nodes on the remote servers:

$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
kilqc94pi2upzvabttikrfr5d     nop-test-1     Ready               Active                                  19.03.2
jilwe56pl2zvabupryuosdj78     nop-test-2     Ready               Active                                  19.03.2
j5a4yz1kr2xke6b1ohoqlnbq5 *   nop-test-3     Ready               Active              Leader              19.03.2
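For reference, bootstrapping such a swarm comes down to a couple of standard commands; the manager address and the join token below are placeholders, not values from our setup:

# on the future manager node
$ docker swarm init --advertise-addr <manager-ip>

# on each worker node, using the join token printed by "docker swarm init"
$ docker swarm join --token <worker-token> <manager-ip>:2377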

Next, create a network:


$ docker network create --driver overlay --subnet 10.10.10.0/24 nw_swarm

Next, we connected GitLab CI and the Swarm nodes so that the nodes can be controlled remotely from CI: installing certificates, setting secret variables, and configuring the Docker service on the control server. An article on this topic saved us a lot of time.
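Once the certificates are in place, remote access from a CI runner can be verified with an ordinary Docker CLI call over TLS (the manager address here is a placeholder):

$ docker --tlsverify \
    --tlscacert=/certs/ca.pem --tlscert=/certs/cert.pem --tlskey=/certs/key.pem \
    -H tcp://<manager-ip>:2376 info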

Next, we added jobs for creating and destroying the stack to .gitlab-ci.yml.

A few more jobs were added to .gitlab-ci.yml:

## staging stage
deploy_staging:
  stage: testing
  before_script:
    - echo "override global 'before_script'"
  image: "REGISTRY:5000/docker:latest"
  environment: staging
  dependencies: []
  variables:
    DOCKER_CERT_PATH: "/certs"
    DOCKER_HOST: tcp://10.50.173.107:2376
    DOCKER_TLS_VERIFY: 1
    CI_BIN_DEPENDENCIES_JOB: "release.centos.7"
  script:
    - mkdir -p $DOCKER_CERT_PATH
    - echo "$TLSCACERT" > $DOCKER_CERT_PATH/ca.pem
    - echo "$TLSCERT" > $DOCKER_CERT_PATH/cert.pem
    - echo "$TLSKEY" > $DOCKER_CERT_PATH/key.pem
    - docker stack deploy -c docker-compose.yml ${CI_ENVIRONMENT_NAME}_${CI_COMMIT_REF_NAME} --with-registry-auth
    - rm -rf $DOCKER_CERT_PATH
  when: manual

## stop staging stage
stop_staging:
  stage: testing
  before_script:
    - echo "override global 'before_script'"
  image: "REGISTRY:5000/docker:latest"
  environment: staging
  dependencies: []
  variables:
    DOCKER_CERT_PATH: "/certs"
    DOCKER_HOST: tcp://10.50.173.107:2376
    DOCKER_TLS_VERIFY: 1
  script:
    - mkdir -p $DOCKER_CERT_PATH
    - echo "$TLSCACERT" > $DOCKER_CERT_PATH/ca.pem
    - echo "$TLSCERT" > $DOCKER_CERT_PATH/cert.pem
    - echo "$TLSKEY" > $DOCKER_CERT_PATH/key.pem
    - docker stack rm ${CI_ENVIRONMENT_NAME}_${CI_COMMIT_REF_NAME}
    # TODO: need check that stopped
  when: manual

From the snippet above you can see that two jobs requiring a manual action (deploy_staging, stop_staging) have been added to the pipeline as buttons.

(screenshot: the pipeline with the manual deploy_staging and stop_staging buttons)
The stack name corresponds to the branch name, and this uniqueness should be sufficient. Services in the stack receive unique IP addresses, while ports, directories, and so on are isolated but identical from stack to stack (because the configuration file is the same for all stacks), which is exactly what we wanted. We deploy the stack (cluster) using docker-compose.yml, which describes our cluster.

docker-compose.yml

---
version: '3'

services:
  userprop:
    image: redis:alpine
    deploy:
      replicas: 1
      placement:
        constraints: [node.id == kilqc94pi2upzvabttikrfr5d]
      restart_policy:
        condition: none
    networks:
      nw_swarm:
  celery_bcd:
    image: redis:alpine
    deploy:
      replicas: 1
      placement:
        constraints: [node.id == kilqc94pi2upzvabttikrfr5d]
      restart_policy:
        condition: none
    networks:
      nw_swarm:

  schedulerdb:
    image: mariadb:latest
    environment:
      MYSQL_ALLOW_EMPTY_PASSWORD: 'yes'
      MYSQL_DATABASE: schedulerdb
      MYSQL_USER: ****
      MYSQL_PASSWORD: ****
    command: ['--character-set-server=utf8mb4', '--collation-server=utf8mb4_unicode_ci', '--explicit_defaults_for_timestamp=1']
    deploy:
      replicas: 1
      placement:
        constraints: [node.id == kilqc94pi2upzvabttikrfr5d]
      restart_policy:
        condition: none
    networks:
      nw_swarm:

  celerydb:
    image: mariadb:latest
    environment:
      MYSQL_ALLOW_EMPTY_PASSWORD: 'yes'
      MYSQL_DATABASE: celerydb
      MYSQL_USER: ****
      MYSQL_PASSWORD: ****
    deploy:
      replicas: 1
      placement:
        constraints: [node.id == kilqc94pi2upzvabttikrfr5d]
      restart_policy:
        condition: none
    networks:
      nw_swarm:

  cluster:
    image: $CENTOS7
    environment:
      - CENTOS
      - CI_ENVIRONMENT_NAME
      - CI_API_V4_URL
      - CI_REPOSITORY_URL
      - CI_PROJECT_ID
      - CI_PROJECT_URL
      - CI_PROJECT_PATH
      - CI_PROJECT_NAME
      - CI_COMMIT_REF_NAME
      - CI_BIN_DEPENDENCIES_JOB
    command: >
      sudo -u myusername -H /bin/bash -c ". /etc/profile &&
        mkdir -p /storage1/$CI_COMMIT_REF_NAME/$CI_PROJECT_NAME &&
        cd /storage1/$CI_COMMIT_REF_NAME/$CI_PROJECT_NAME &&
            git clone -b $CI_COMMIT_REF_NAME $CI_REPOSITORY_URL . &&
            curl $CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/$CI_COMMIT_REF_NAME/download?job=$CI_BIN_DEPENDENCIES_JOB -o artifacts.zip &&
            unzip artifacts.zip ;
        cd /storage1/$CI_COMMIT_REF_NAME/$CI_PROJECT_NAME/scripts/deploy/ &&
            python3 createconfig.py -s $CI_ENVIRONMENT_NAME &&
            /bin/bash install_venv.sh -d -r ../../requirements.txt &&
            python3 prepare_init.d.py &&
            python3 deploy.py -s $CI_ENVIRONMENT_NAME"
    deploy:
      replicas: 1
      placement:
        constraints: [node.id == kilqc94pi2upzvabttikrfr5d]
      restart_policy:
        condition: none
    tty: true
    stdin_open: true
    networks:
      nw_swarm:

networks:
  nw_swarm:
    external: true

Here you can see that the components are connected by a single network (nw_swarm) and are reachable by one another.

System components (based on Redis, MySQL) are separated from the general pool of custom components (we plan to split the custom components into separate services as well). The deployment stage of our cluster boils down to passing a CMD into our one large pre-configured image and, on the whole, hardly differs from the deployment described in Part I. I will highlight the differences:

  • git clone... - fetches the files needed for deployment (createconfig.py, install_venv.sh, etc.)
  • curl... && unzip... - downloads and unpacks the build artifacts (the compiled utilities)
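After a stack is deployed, its state can be checked on the Swarm manager with the standard commands; the stack name below is simply the example used in this article:

$ docker stack ls
$ docker stack services staging_BRANCH-1831
$ docker stack ps staging_BRANCH-1831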

There is only one problem left undescribed: components with a web interface are not accessible from developers' browsers. We solve this with a reverse proxy, as follows:

In .gitlab-ci.yml, after deploying the cluster stack, we add a line that deploys the balancer. On subsequent commits it only updates its configuration, creating a new nginx configuration file from the template /etc/nginx/conf.d/${CI_COMMIT_REF_NAME}.conf (see the docker-compose-nginx.yml code below):

    - docker stack deploy -c docker-compose-nginx.yml ${CI_ENVIRONMENT_NAME} --with-registry-auth

docker-compose-nginx.yml

---
version: '3'

services:
  nginx:
    image: nginx:latest
    environment:
      CI_COMMIT_REF_NAME: ${CI_COMMIT_REF_NAME}
      NGINX_CONFIG: |-
            server {
                listen 8080;
                server_name staging_${CI_COMMIT_REF_NAME}_cluster.dev;

                location / {
                    proxy_pass http://staging_${CI_COMMIT_REF_NAME}_cluster:8080;
                }
            }
            server {
                listen 5555;
                server_name staging_${CI_COMMIT_REF_NAME}_cluster.dev;

                location / {
                    proxy_pass http://staging_${CI_COMMIT_REF_NAME}_cluster:5555;
                }
            }
    volumes:
      - /tmp/staging/nginx:/etc/nginx/conf.d
    command:
      /bin/bash -c 'echo -e "$$NGINX_CONFIG" > /etc/nginx/conf.d/${CI_COMMIT_REF_NAME}.conf;
        nginx -g "daemon off;";
        /etc/init.d/nginx reload'
    ports:
      - 8080:8080
      - 5555:5555
      - 3000:3000
      - 443:443
      - 80:80
    deploy:
      replicas: 1
      placement:
        constraints: [node.id == kilqc94pi2upzvabttikrfr5d]
      restart_policy:
        condition: none
    networks:
      nw_swarm:

networks:
  nw_swarm:
    external: true

On the developers' machines, update /etc/hosts, adding an entry that points the URL at nginx:

10.50.173.106 staging_BRANCH-1831_cluster.dev
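After that, the web interfaces are reachable by name; a quick check from a developer machine might look like this (using the example entry above):

$ curl -I http://staging_BRANCH-1831_cluster.dev:8080/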

So, the deployment of isolated staging clusters is now in place, and developers can run as many of them as they need to verify their tasks.

Future plans:

  • Split our components into separate services
  • Have a Dockerfile for each of them
  • Automatically detect less loaded nodes in the stack
  • Specify nodes by name pattern (rather than using id as in the article)
  • Add a check that the stack is destroyed
  • ...

Special thanks for the article mentioned above.

Source: habr.com
