Dynamic build and deployment of Docker images with werf: the case of a versioned documentation site

We have already written about our GitOps tool, werf, more than once, and this time we would like to share our experience of building the site with the project's own documentation, werf.io (its Russian version is ru.werf.io). It is an ordinary static site, but its build is interesting because it uses a dynamically determined number of artifacts.


We will not go into the nuances of the site's structure (generating a menu shared by all versions, release information pages, and so on). Instead, let's focus on the issues and specifics of the dynamic build, and a bit on the related CI/CD processes.

Introduction: how the site works

Let's start with the fact that the werf documentation is stored alongside its code. This imposes certain requirements on development that are largely beyond the scope of this article, but at a minimum we can say that:

  • New werf features are not released without updating the documentation and, conversely, any change in the documentation implies a new werf version;
  • The project is developed quite intensively: new versions can be released several times a day;
  • Any manual operations to deploy the site with a new documentation version are tedious at best;
  • The project follows semantic versioning, with 5 stability channels. The release process moves versions sequentially through the channels in order of increasing stability: from alpha all the way to rock-solid;
  • The site has a Russian version that "lives and develops" (i.e., whose content is updated) in parallel with the main (English) version.

To hide all this "inner kitchen" from users and offer them something that "just works", we made a separate tool for installing and updating werf: multiwerf. You only need to specify the release number and the stability channel you are ready to use, and multiwerf will check whether there is a new version in the channel and download it if necessary.

The latest werf version in each channel is available in the version selection menu on the site. By default, werf.io/documentation opens the version from the most stable channel of the latest release; this is also the version indexed by search engines. Documentation for each channel is available at a separate address (for example, werf.io/v1.0-beta/documentation for the beta channel of release 1.0).

In total, the site has the following versions available:

  1. root (opens by default),
  2. for each active update channel of each release (for example, werf.io/v1.0-beta).

In the general case, generating a specific version of the site boils down to compiling it with Jekyll: switch to the Git tag of the required version and run the corresponding command (jekyll build) in the /docs directory of the werf repository.
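Concretely, building one version by hand boils down to a sequence like the following (a sketch; the tag here is just an example version from this article, and /tmp/site is an arbitrary output directory):

```shell
# Sketch: manually building the docs for one specific werf version
git clone https://github.com/flant/werf.git
cd werf
git checkout v1.0.4-beta.20     # Git tag of the required version
cd docs
jekyll build -s . -d /tmp/site  # compile the static site into /tmp/site
```

Doing this for every active channel of every release, on every version change, is exactly the manual toil the rest of the article automates away.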

It only remains to add that:

  • werf itself is used for the build;
  • CI/CD processes are built on top of GitLab CI;
  • and all this, of course, works in Kubernetes.

Tasks

Let us now formulate the tasks, taking into account all the specifics described above:

  1. After the werf version changes in any update channel, the documentation on the site should be updated automatically.
  2. For development, it should occasionally be possible to preview the site.

The site must be recompiled from the corresponding Git tags whenever the version changes in any channel, which gives the image build the following peculiarities:

  • Since the list of versions in the channels changes, only the documentation for the channels whose version has changed needs to be rebuilt. Rebuilding everything from scratch would be wasteful.
  • The set of channels for a release may change. At some point in time, for example, there may be no version in channels more stable than early-access for release 1.1, but over time they will appear, and we do not want to modify the build by hand every time this happens.

In other words, the build depends on changing external data.

Implementation

Choice of approach

One alternative is to run each required version as a separate pod in Kubernetes. This option implies a larger number of objects in the cluster, growing with the number of stable werf releases. And that, in turn, implies more complex maintenance: each version gets its own HTTP server, each under a small load. Of course, it also entails higher resource costs.

We took the path of building all required versions into a single image. The compiled static content of all site versions lives in a container with NGINX, and traffic to the corresponding Deployment arrives through NGINX Ingress. The simple structure, a stateless application, makes it easy to scale the Deployment (depending on the load) using Kubernetes itself.

To be more precise, we build two images: one for the production environment and an additional one for the dev environment. The additional image is used (run) only in the dev environment alongside the main one: it contains the site version from the review commit, and routing between the two is handled with Ingress resources.

werf vs git clone and artifacts

As already mentioned, to generate the site's static content for a specific documentation version, we need to build from the corresponding repository tag. One way would be to clone the repository on every build, checking out the appropriate tags from the list. However, that is a rather resource-intensive operation which, moreover, requires non-trivial build instructions. Another serious drawback is that this approach gives no way to cache anything during the build.

Here werf itself comes to the rescue: it implements smart caching and can work with external repositories. Using werf to add code from the repository greatly speeds up the build, since werf essentially clones the repository once and then only runs fetch when necessary. Moreover, when adding data from the repository, we can select only the directories we need (the docs directory in our case), which significantly reduces the amount of data added.

Since Jekyll is a tool for compiling static content and is not needed in the final image, it is logical to compile in a werf artifact and import only the compilation result into the final image.

Writing werf.yaml

So, we have decided to compile each version in a separate werf artifact. However, we do not know how many such artifacts there will be at build time, so we cannot write a fixed build configuration (strictly speaking, we could, but it would not be very efficient).

werf allows using Go templates in its configuration file (werf.yaml), which makes it possible to generate the config on the fly depending on external data (just what we need!). In our case, the external data is the information about versions and releases: based on it, we build the required number of artifacts and end up with two images, werf-doc and werf-dev, to be run in different environments.

The external data is passed in via environment variables. Here they are:

  • RELEASES - a string listing releases with the corresponding current werf version: a space-separated list of values in the format <RELEASE_NUMBER>%<VERSION_NUMBER>. Example: 1.0%v1.0.4-beta.20
  • CHANNELS - a string listing channels with the corresponding current werf version: a space-separated list of values in the format <CHANNEL>%<VERSION_NUMBER>. Example: 1.0-beta%v1.0.4-beta.20 1.0-alpha%v1.0.5-alpha.22
  • ROOT_VERSION - the werf release version to show on the site by default (the documentation does not always need to correspond to the highest release number). Example: v1.0.4-beta.20
  • REVIEW_SHA - the hash of the review commit from which the version for the test environment should be built.
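To make the pair format concrete, here is a small illustrative Bash sketch (not part of the project's code) of how such channel%version values can be split:

```shell
# Illustrative only: splitting the space-separated channel%version pairs
CHANNELS='1.0-beta%v1.0.4-beta.20 1.0-alpha%v1.0.5-alpha.22'
for pair in $CHANNELS; do
  channel="${pair%\%*}"   # everything before the '%'
  version="${pair#*%}"    # everything after the '%'
  echo "$channel -> $version"
done
```

werf itself does this splitting with the Sprig splitn function inside werf.yaml, as shown later in the article; the shell version above is only to clarify the data format.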

These variables are filled in by the GitLab CI pipeline; exactly how is described below.

First of all, for convenience, we define Go template variables in werf.yaml, assigning them values from the environment variables:

{{ $_ := set . "WerfVersions" (cat (env "CHANNELS") (env "RELEASES") | splitList " ") }}
{{ $Root := . }}
{{ $_ := set . "WerfRootVersion" (env "ROOT_VERSION") }}
{{ $_ := set . "WerfReviewCommit" (env "REVIEW_SHA") }}

The artifact description for compiling a static version of the site is largely the same for all the cases we need (including the root version and the version for the dev environment). Therefore, we extract it into a separate block with the define function, for later reuse via include. We will pass the following arguments to the template:

  • Version - the version to generate (the tag name);
  • Channel - the name of the update channel the artifact is generated for;
  • Commit - the commit hash, if the artifact is generated for a review commit;
  • the context.

Artifact template description

{{- define "doc_artifact" -}}
{{- $Root := index . "Root" -}}
artifact: doc-{{ .Channel }}
from: jekyll/builder:3
mount:
- from: build_dir
  to: /usr/local/bundle
ansible:
  install:
  - shell: |
      export PATH=/usr/jekyll/bin/:$PATH
  - name: "Install Dependencies"
    shell: bundle install
    args:
      executable: /bin/bash
      chdir: /app/docs
  beforeSetup:
{{- if .Commit }}
  - shell: echo "Review SHA - {{ .Commit }}."
{{- end }}
{{- if eq .Channel "root" }}
  - name: "releases.yml HASH: {{ $Root.Files.Get "releases.yml" | sha256sum }}"
    copy:
      content: |
{{ $Root.Files.Get "releases.yml" | indent 8 }}
      dest:  /app/docs/_data/releases.yml
{{- else }}
  - file:
      path: /app/docs/_data/releases.yml
      state: touch
{{- end }}
  - file:
      path: "{{`{{ item }}`}}"
      state: directory
      mode: 0777
    with_items:
    - /app/main_site/
    - /app/ru_site/
  - file:
      dest: /app/docs/pages_ru/cli
      state: link
      src: /app/docs/pages/cli
  - shell: |
      echo -e "werfVersion: {{ .Version }}\nwerfChannel: {{ .Channel }}" > /tmp/_config_additional.yml
      export PATH=/usr/jekyll/bin/:$PATH
{{- if and (ne .Version "review") (ne .Channel "root") }}
{{- $_ := set . "BaseURL" ( printf "v%s" .Channel ) }}
{{- else if ne .Channel "root" }}
{{- $_ := set . "BaseURL" .Channel }}
{{- end }}
      jekyll build -s /app/docs  -d /app/_main_site/{{ if .BaseURL }} --baseurl /{{ .BaseURL }}{{ end }} --config /app/docs/_config.yml,/tmp/_config_additional.yml
      jekyll build -s /app/docs  -d /app/_ru_site/{{ if .BaseURL }} --baseurl /{{ .BaseURL }}{{ end }} --config /app/docs/_config.yml,/app/docs/_config_ru.yml,/tmp/_config_additional.yml
    args:
      executable: /bin/bash
      chdir: /app/docs
git:
- url: https://github.com/flant/werf.git
  to: /app/
  owner: jekyll
  group: jekyll
{{- if .Commit }}
  commit: {{ .Commit }}
{{- else }}
  tag: {{ .Version }}
{{- end }}
  stageDependencies:
    install: ['docs/Gemfile','docs/Gemfile.lock']
    beforeSetup: '**/*'
  includePaths: 'docs'
  excludePaths: '**/*.sh'
{{- end }}

The artifact name must be unique. We achieve this, for example, by adding the channel name (the value of the .Channel variable) as a suffix to the artifact name: artifact: doc-{{ .Channel }}. Keep in mind that the imports from the artifacts will have to refer to these same names.

The artifact description uses the werf feature known as mounting. Mounting the service directory build_dir preserves the Jekyll cache between pipeline runs, which significantly speeds up rebuilds.

You may also have noticed the use of the releases.yml file, a YAML file with release data fetched from github.com (an artifact produced by the pipeline run). It is needed to compile the site, but in the context of this article it is interesting because only one artifact depends on its state: the artifact for the root version of the site (the other artifacts do not need it).

This is implemented using the Go template if conditional and the {{ $Root.Files.Get "releases.yml" | sha256sum }} construct in the beforeSetup stage. It works as follows: when building the artifact for the root version (the .Channel variable equals root), the hash of the releases.yml file affects the signature of the whole stage, since it is part of the Ansible task name (the name parameter). Thus, whenever the contents of releases.yml change, the corresponding artifact is rebuilt.

Also pay attention to the handling of the external repository. Only the /docs directory from the werf repository is added to the artifact image, and, depending on the parameters passed, the data of exactly the required tag or review commit is used.

To generate artifact descriptions for all the supplied channel and release versions from this template, we loop over the .WerfVersions variable in werf.yaml:

{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ dict "Version" $VersionsDict._1 "Channel" $VersionsDict._0 "Root" $Root | include "doc_artifact" }}
---
{{ end -}}

Since the loop will generate several artifacts (at least we hope so), we must account for the separator between them: the sequence --- (for details on the configuration file syntax, see the documentation). As defined earlier, when calling the template in the loop we pass the version and channel parameters along with the root context.
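For illustration, assuming CHANNELS contains two entries, the loop above would render a configuration roughly like this (the template bodies are abridged here; the real output contains the full doc_artifact block for each channel):

```yaml
# Abridged sketch of the rendered config for
# CHANNELS='1.0-alpha%v1.0.5-alpha.22 1.0-beta%v1.0.4-beta.20'
artifact: doc-1.0-alpha
from: jekyll/builder:3
# ...the rest of the doc_artifact template with Version=v1.0.5-alpha.22...
---
artifact: doc-1.0-beta
from: jekyll/builder:3
# ...the rest of the doc_artifact template with Version=v1.0.4-beta.20...
---
```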

Similarly, but without a loop, we call the artifact template for the special cases: the root version and the version from the review commit:

{{ dict "Version" .WerfRootVersion "Channel" "root" "Root" $Root  | include "doc_artifact" }}
---
{{- if .WerfReviewCommit }}
{{ dict "Version" "review" "Channel" "review" "Commit" .WerfReviewCommit "Root" $Root  | include "doc_artifact" }}
{{- end }}

Note that the artifact for the review commit is built only if the .WerfReviewCommit variable is set.

The artifacts are ready: time to import!

The final image, meant to run in Kubernetes, is a regular NGINX image with the nginx.conf server configuration file and the static content from the artifacts added to it. Besides the artifact with the root version of the site, we need to repeat the loop over the .WerfVersions variable to import the channel and release version artifacts, following the artifact naming rule we adopted earlier. Since each artifact stores site versions for two languages, we import them into the locations expected by the configuration.

Description of the final werf-doc image

image: werf-doc
from: nginx:stable-alpine
ansible:
  setup:
  - name: "Setup /etc/nginx/nginx.conf"
    copy:
      content: |
{{ .Files.Get ".werf/nginx.conf" | indent 8 }}
      dest: /etc/nginx/nginx.conf
  - file:
      path: "{{`{{ item }}`}}"
      state: directory
      mode: 0777
    with_items:
    - /app/main_site/assets
    - /app/ru_site/assets
import:
- artifact: doc-root
  add: /app/_main_site
  to: /app/main_site
  before: setup
- artifact: doc-root
  add: /app/_ru_site
  to: /app/ru_site
  before: setup
{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ $Channel := $VersionsDict._0 -}}
{{ $Version := $VersionsDict._1 -}}
- artifact: doc-{{ $Channel }}
  add: /app/_main_site
  to: /app/main_site/v{{ $Channel }}
  before: setup
{{ end -}}
{{ range .WerfVersions -}}
{{ $VersionsDict := splitn "%" 2 . -}}
{{ $Channel := $VersionsDict._0 -}}
{{ $Version := $VersionsDict._1 -}}
- artifact: doc-{{ $Channel }}
  add: /app/_ru_site
  to: /app/ru_site/v{{ $Channel }}
  before: setup
{{ end -}}

The additional image, which runs alongside the main one in the dev environment, contains only two site versions: the version from the review commit and the root version (which holds the shared assets and, as you remember, the release data). Thus, the additional image differs from the main one only in the import section (and, of course, in the name):

image: werf-dev
...
import:
- artifact: doc-root
  add: /app/_main_site
  to: /app/main_site
  before: setup
- artifact: doc-root
  add: /app/_ru_site
  to: /app/ru_site
  before: setup
{{- if .WerfReviewCommit  }}
- artifact: doc-review
  add: /app/_main_site
  to: /app/main_site/review
  before: setup
- artifact: doc-review
  add: /app/_ru_site
  to: /app/ru_site/review
  before: setup
{{- end }}

As noted above, the artifact for the review commit is generated only when the REVIEW_SHA environment variable is set. We could skip generating the werf-dev image entirely when REVIEW_SHA is absent, but to keep werf's policy-based cleanup of Docker images working for werf-dev, and to simplify the pipeline structure, we build it anyway with only the root version artifact (which is already built).

The build is ready! Time to move on to CI/CD and its important nuances.

The GitLab CI pipeline and the specifics of the dynamic build

When running the build, we need to set the environment variables used in werf.yaml. This does not apply to REVIEW_SHA, which we set when the pipeline is triggered by the GitHub hook.

We delegate the generation of the required external data to the Bash script generate_artifacts, which produces two GitLab pipeline artifacts:

  • the releases.yml file with release data;
  • the common_envs.sh file containing the environment variables to export.

You will find the contents of generate_artifacts in our repositories with examples. Obtaining the data itself is not the subject of this article, but the common_envs.sh file is important to us, since werf's operation depends on it. An example of its contents:

export RELEASES='1.0%v1.0.6-4'
export CHANNELS='1.0-alpha%v1.0.7-1 1.0-beta%v1.0.7-1 1.0-ea%v1.0.6-4 1.0-stable%v1.0.6-4 1.0-rock-solid%v1.0.6-4'
export ROOT_VERSION='v1.0.6-4'

You can load the output of such a script with, for example, the Bash source builtin.
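A minimal sketch of this, assuming common_envs.sh has contents like the example above:

```shell
# Sketch: write common_envs.sh (normally done by generate_artifacts),
# load it into the current shell, and use the exported variables
cat > common_envs.sh <<'EOF'
export RELEASES='1.0%v1.0.6-4'
export ROOT_VERSION='v1.0.6-4'
EOF

source common_envs.sh
echo "Building docs with root version ${ROOT_VERSION}"
```

Because source runs the file in the current shell, the exported variables are visible to every subsequent command of the job, including werf reading werf.yaml.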

And now the most interesting part. For both the build and the deployment of the application to work correctly, werf.yaml must stay the same at least within a single pipeline. If this condition is violated, the stage signatures that werf computes during build and, say, deploy will differ. This leads to a deployment error, since the image required for deployment will be missing.

In other words, if the release and version information is one thing while the site image is being built, but a new version comes out before deployment and the environment variables take different values, the deployment will fail: the artifact for the new version has not been built yet.

If the generation of werf.yaml depends on external data (such as the list of current versions in our case), the composition and values of this data must be fixed for the duration of the pipeline. This is especially important when the external parameters change frequently.

We will obtain and fix the external data at the first stage of the pipeline in GitLab (Prebuild) and pass it on as a GitLab CI artifact. This lets us run and re-run pipeline jobs (build, deploy, cleanup) with the same configuration in werf.yaml.

The contents of the Prebuild stage in the .gitlab-ci.yml file:

Prebuild:
  stage: prebuild
  script:
    - bash ./generate_artifacts 1> common_envs.sh
    - cat ./common_envs.sh
  artifacts:
    paths:
      - releases.yml
      - common_envs.sh
    expire_in: 2 week

Having fixed the external data in an artifact, we can build and deploy using the standard GitLab CI pipeline stages: Build and Deploy. We trigger the pipeline itself with hooks from the werf GitHub repository (that is, on changes in the GitHub repository). The data for them can be found in the GitLab project properties under CI/CD Settings -> Pipeline triggers; the corresponding webhook is then created in GitHub (Settings -> Webhooks).
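For reference, a GitHub webhook handler can start such a pipeline through the GitLab pipeline triggers API. A sketch (the host, project ID, and token below are placeholders, not real values from the project):

```shell
# Placeholders: substitute your GitLab host, project ID, and trigger token
GITLAB_URL="https://gitlab.example.com"
PROJECT_ID=42
TRIGGER_URL="${GITLAB_URL}/api/v4/projects/${PROJECT_ID}/trigger/pipeline"

# The actual request (GitLab pipeline triggers API); the variables[...] form
# field is how REVIEW_SHA reaches the pipeline and, from there, werf.yaml:
# curl -s -X POST \
#   -F "token=<TRIGGER_TOKEN>" \
#   -F "ref=master" \
#   -F "variables[REVIEW_SHA]=<review commit SHA>" \
#   "$TRIGGER_URL"
echo "$TRIGGER_URL"
```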

The build stage will look like this:

Build:
  stage: build
  script:
    - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
    - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
    - source common_envs.sh
    - werf build-and-publish --stages-storage :local
  except:
    refs:
      - schedules
  dependencies:
    - Prebuild

GitLab adds the two artifacts from the Prebuild stage to the Build stage, which is why we export the prepared input variables with the source common_envs.sh construct. The Build stage runs in all cases except scheduled pipeline runs: on a schedule we run the cleanup pipeline, which does not need a build.

At the deployment stage we describe two jobs, one for each environment (production and dev), using a YAML template:

.base_deploy: &base_deploy
  stage: deploy
  script:
    - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
    - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
    - source common_envs.sh
    - werf deploy --stages-storage :local
  dependencies:
    - Prebuild
  except:
    refs:
      - schedules

Deploy to Production:
  <<: *base_deploy
  variables:
    WERF_KUBE_CONTEXT: prod
  environment:
    name: production
    url: werf.io
  only:
    refs:
      - master
  except:
    variables:
      - $REVIEW_SHA
    refs:
      - schedules

Deploy to Test:
  <<: *base_deploy
  variables:
    WERF_KUBE_CONTEXT: dev
  environment:
    name: test
    url: werf.test.flant.com
  except:
    refs:
      - schedules
  only:
    variables:
      - $REVIEW_SHA

The jobs essentially differ only in the cluster context werf deploys to (WERF_KUBE_CONTEXT) and in the environment settings (environment.name and environment.url), which are then used in the Helm chart templates. We will not show the templates' contents here, as they hold nothing of interest for the topic at hand, but you can find them in the repositories for this article.

Final touch

Since werf versions are released quite often, new images are built frequently and the Docker Registry grows constantly. Therefore it is essential to configure automatic, policy-based image cleanup. It is very easy to do.

This requires:

  • adding a cleanup stage to .gitlab-ci.yml;
  • setting up a scheduled run of the cleanup job;
  • adding an environment variable with a token that has deletion rights.

Adding the cleanup stage to .gitlab-ci.yml:

Cleanup:
  stage: cleanup
  script:
    - type multiwerf && . $(multiwerf use 1.0 alpha --as-file)
    - type werf && source <(werf ci-env gitlab --tagging-strategy tag-or-branch --verbose)
    - source common_envs.sh
    - docker login -u nobody -p ${WERF_IMAGES_CLEANUP_PASSWORD} ${WERF_IMAGES_REPO}
    - werf cleanup --stages-storage :local
  only:
    refs:
      - schedules

We have already seen almost all of this a little earlier; the only addition for cleanup is the preliminary login to the Docker Registry with a token that has the rights to delete images there (the job token automatically issued by GitLab CI does not have such rights). The token must be created in GitLab in advance and its value specified in the project's WERF_IMAGES_CLEANUP_PASSWORD environment variable (CI/CD Settings -> Variables).

A cleanup job with the required schedule is added in CI/CD -> Schedules.

That's it: the project in the Docker Registry will no longer grow endlessly with unused images.

To wrap up the practical part, let me remind you that the full listings from the article are available in the repositories with examples.

Results

  1. We got a logical build structure: one artifact per version.
  2. The build is universal and requires no manual changes when new werf versions are released: the documentation on the site is updated automatically.
  3. Two images are built for the two environments.
  4. It is fast, because caching is used to the maximum: when a new werf version is released or a GitHub hook fires for a review commit, only the artifact with the changed version is rebuilt.
  5. There is no need to think about deleting unused images: cleanup by werf policies keeps the Docker Registry in order.

Conclusions

  • Using werf makes the build fast thanks to caching, both of the build itself and of the work with external repositories.
  • Working with external Git repositories removes the need to clone the whole repository every time or to reinvent the wheel with tricky optimization logic: werf uses a cache, clones once, and afterwards only fetches, and only when needed.
  • The ability to use Go templates in the werf.yaml build configuration file makes it possible to describe a build whose result depends on external data.
  • Using mounts in werf significantly speeds up artifact builds thanks to a cache shared by all pipelines.
  • werf makes it easy to configure cleanup, which is especially relevant for dynamic builds.


Source: habr.com
