Content-based tagging in the werf builder: why and how does it work?

werf is our open source GitOps CLI utility for building and delivering applications to Kubernetes. Release v1.1 introduced a new feature in the image builder: tagging images by their content, or content-based tagging. Until now, the typical tagging scheme in werf has been to tag Docker images by Git tag, Git branch, or Git commit. All of these schemes have drawbacks that the new tagging strategy resolves. Details about it, and why it is so good, follow below.

Rolling out a set of microservices from one Git repository

An application is often split into many more or less independent services. These services can be released independently: one or more services are released at a time, while the rest must keep running unchanged. From the point of view of code storage and project management, however, it is more convenient to keep all of the application's services in a single repository.

There are situations when services are truly independent and not tied to one application. In this case, they live in separate projects and are released through separate CI/CD processes in each of those projects.

In reality, however, developers often break a single application into several microservices, and creating a separate repository and project for each one is obvious overkill. This is the situation discussed below: several such microservices live in a single project repository and are released through a single CI/CD process.

Tagging by Git branch and Git tag

Let's say the most common tagging strategy is used: tag-or-branch. For Git branches, images are tagged with the branch name, and at any given time only one published image exists under that branch name. For Git tags, images are tagged with the tag name.

When a new Git tag is created (for example, when a new version is released), a new Docker tag is created for all project images in the Docker Registry:

  • myregistry.org/myproject/frontend:v1.1.10
  • myregistry.org/myproject/myservice1:v1.1.10
  • myregistry.org/myproject/myservice2:v1.1.10
  • myregistry.org/myproject/myservice3:v1.1.10
  • myregistry.org/myproject/myservice4:v1.1.10
  • myregistry.org/myproject/myservice5:v1.1.10
  • myregistry.org/myproject/database:v1.1.10

These new image names are passed through Helm templates into the Kubernetes configuration. When the deployment is started with the werf deploy command, the image field in the manifests of the Kubernetes resources is updated, and the corresponding resources are restarted because the image name has changed.

Problem: when the content of an image has not actually changed since the previous rollout (Git tag) and only its Docker tag has, the application is restarted needlessly and some downtime is possible, even though there was no real reason for the restart.

As a result, with this tagging scheme you end up maintaining several separate Git repositories, and then the problem of coordinating rollouts across them arises. Overall, such a setup turns out to be overloaded and complex. It is better to keep many services in a single repository and generate Docker tags in such a way that no unnecessary restarts occur.
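To make the problem concrete, here is a minimal sketch of how image names change under tag-or-branch. The helper image_ref and the next version v1.1.11 are illustrative assumptions and are not part of werf; the point is only that a new Git tag renames every image, whether or not its files changed.

```shell
# Illustrative only: image_ref is a hypothetical helper, not a werf command.
# It composes a Docker image reference from a repo, an image name, and a tag.
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

repo="myregistry.org/myproject"

# With tag-or-branch, the Docker tag is the Git tag, so releasing a
# (hypothetical) v1.1.11 renames every image, including unchanged ones.
for service in frontend myservice1 database; do
  old="$(image_ref "$repo" "$service" v1.1.10)"
  new="$(image_ref "$repo" "$service" v1.1.11)"
  if [ "$old" != "$new" ]; then
    echo "$service: image field changes, pods restart"
  fi
done
```

Every service in the loop reports a restart, because the comparison looks only at the name and knows nothing about the image content.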

Tagging by Git commit

werf also has a tagging strategy related to Git commits.

A Git commit is an identifier of the content of a Git repository and depends on the revision history of its files, so it seems logical to use it to tag images in the Docker Registry.

However, tagging by Git commit has the same disadvantages as tagging by Git branch or Git tag:

  • An empty commit that does not change any files may be created, and the image's Docker tag will change.
  • A merge commit that does not change any files may be created, and the image's Docker tag will change.
  • A commit may change files in Git that are not imported into the image, and the image's Docker tag will change again.
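The first case in the list above can be sketched with a toy model. This is not Git's real object format: here a "commit ID" is simply a checksum over the parent ID, the file contents, and the commit message, and the helper names checksum, content_id, and commit_id are assumptions made for the example.

```shell
# Toy model, not Git's real object format. All helper names are hypothetical.
checksum() {
  sha256sum | cut -d' ' -f1
}

content_id() {   # depends only on the files that go into the image
  printf '%s' "$1" | checksum
}

commit_id() {    # depends on the parent, the files, and the message
  printf '%s\n%s\n%s' "$1" "$2" "$3" | checksum
}

files='app.py: print("hello")'

first="$(commit_id root "$files" "initial commit")"
# An empty commit: same files, but a new parent and a new message.
second="$(commit_id "$first" "$files" "empty commit")"

[ "$first" != "$second" ] && echo "commit ID changed"
[ "$(content_id "$files")" = "$(content_id "$files")" ] && echo "content ID unchanged"
```

A tag derived from the commit ID changes with every commit, while a tag derived from the files alone stays the same until the files change.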

Tagging by Git branch name does not reflect image version

There is another problem with the Git branch tagging strategy.

Tagging by branch name works as long as the commits of that branch are built sequentially in chronological order.

If, under the current scheme, the user rebuilds an old commit associated with some branch, werf overwrites the image at the corresponding Docker tag with the newly built image for the old commit. From then on, Deployments that use this tag risk pulling a different version of the image when their pods restart. As a result, the application loses its link to the CI system and falls out of sync with it.

In addition, when pushes to the same branch follow each other at short intervals, the old commit may finish building later than the newer one, and the old version of the image will overwrite the new one under the Git branch tag. A CI/CD system can solve such problems (for example, in GitLab CI only the pipeline of the latest commit in a series is launched). However, not all systems support this, and there should be a more reliable way to prevent such a fundamental problem.
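The race described above can be reproduced with a small simulation. A temporary directory stands in for a Docker registry and each file name is a tag; the helpers publish and content_tag and the branch name master are assumptions for this sketch and have nothing to do with werf's real commands.

```shell
# Illustrative model: a directory plays the role of a registry, file names
# are tags. publish and content_tag are hypothetical helpers.
registry="$(mktemp -d)"

publish() {      # publish TAG CONTENT: overwrites whatever the tag pointed to
  printf '%s' "$2" > "$registry/$1"
}

content_tag() {  # a tag derived from the image content itself
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

new_build="files at commit 2"
old_build="files at commit 1"

# The build for the newer commit finishes first, the older one finishes later.
publish master "$new_build"
publish master "$old_build"
echo "master now points to: $(cat "$registry/master")"

# With content-based tags the two builds never share a tag.
publish "$(content_tag "$new_build")" "$new_build"
publish "$(content_tag "$old_build")" "$old_build"
echo "new build still available: $(cat "$registry/$(content_tag "$new_build")")"
```

Under the branch tag the older build wins simply because it was published last. Under content-derived tags both builds coexist, so the order in which they finish no longer matters.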

What is content-based tagging?

So, what is content-based tagging, that is, tagging images by content?

Docker tags are created not from Git primitives (Git branch, Git tag, ...) but from a checksum tied to:

  • the image content. The image's tag ID reflects its content. When a new version is built, this identifier does not change if the files in the image have not changed;
  • the history of how this image was created in Git. Images associated with different Git branches and different build histories in werf will have different tag IDs.

This tag identifier is the so-called image stages signature.

Each image consists of a set of stages: from, before-install, git-archive, install, imports-after-install, before-setup, ..., git-latest-patch, and so on. Each stage has an identifier that reflects its contents: the stage signature.

The final image, made up of these stages, is tagged with the signature of the whole set of stages, the stages signature, which generalizes over all stages of the image.

In the general case, each image in the werf.yaml configuration has its own such signature and, accordingly, its own Docker tag.
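The idea of a signature built from stage signatures can be sketched as follows. This is a simplified model and not werf's actual algorithm: the stage names come from the list above, but the stage inputs, the helper names, and the hashing scheme (SHA-256, chained through the previous stage) are assumptions chosen for illustration.

```shell
# Simplified model, not werf's actual algorithm. Helper names, stage inputs,
# and the hashing scheme are assumptions.
checksum() {
  sha256sum | cut -d' ' -f1
}

# stages_signature INPUT...: one argument per stage, in build order.
# The body runs in a subshell so its variables do not leak to the caller.
stages_signature() (
  prev=""
  all=""
  for input in "$@"; do
    # A stage signature covers its own input and the previous stage.
    prev="$(printf '%s\n%s' "$prev" "$input" | checksum)"
    all="$all$prev"
  done
  # The stages signature is a checksum over every stage signature.
  printf '%s' "$all" | checksum
)

a="$(stages_signature "from: alpine" "before-install: apk add curl" "git-archive: src v1")"
b="$(stages_signature "from: alpine" "before-install: apk add curl" "git-archive: src v1")"
c="$(stages_signature "from: alpine" "before-install: apk add curl" "git-archive: src v2")"

[ "$a" = "$b" ] && echo "same inputs, same tag"
[ "$a" != "$c" ] && echo "changed files, new tag"
```

Rebuilding with identical inputs yields the same value, so the Docker tag stays put; changing the input of any single stage changes the final value.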

The stages signature solves all of these problems:

  • It is resistant to empty Git commits.
  • It is resistant to Git commits that change files irrelevant to the image.
  • It does not overwrite the current version of the image when builds are rerun for old commits of a Git branch.

This is now the recommended tagging strategy and is the default in werf for all CI systems.

How to enable and use it in werf

The corresponding option has been added to the werf publish command: --tag-by-stages-signature=true|false

In a CI system, the tagging strategy is set by the werf ci-env command. Previously, it was given the parameter werf ci-env --tagging-strategy=tag-or-branch. Now, if you specify werf ci-env --tagging-strategy=stages-signature or omit the option, werf uses the default tagging strategy, stages-signature. The werf ci-env command automatically sets the necessary flags for the werf build-and-publish (or werf publish) command, so no additional options need to be specified for these commands.

For example, the command:

werf publish --stages-storage :local --images-repo registry.hello.com/web/core/system --tag-by-stages-signature

... can create the following images:

  • registry.hello.com/web/core/system/backend:4ef339f84ca22247f01fb335bb19f46c4434014d8daa3d5d6f0e386d
  • registry.hello.com/web/core/system/frontend:f44206457e0a4c8a54655543f749799d10a9fe945896dab1c16996c6

Here, 4ef339f84ca22247f01fb335bb19f46c4434014d8daa3d5d6f0e386d is the stages signature of the backend image, and f44206457e0a4c8a54655543f749799d10a9fe945896dab1c16996c6 is the stages signature of the frontend image.

When the special functions werf_container_image and werf_container_env are used, nothing needs to be changed in the Helm templates: these functions automatically generate the correct image names.

Configuration example for a CI system:

type multiwerf && source <(multiwerf use 1.1 beta)
type werf && source <(werf ci-env gitlab)
werf build-and-publish   # in the build job
werf deploy              # in the deploy job

More information on configuration is available in the werf documentation.

Summary

  • New option: werf publish --tag-by-stages-signature=true|false.
  • New option value: werf ci-env --tagging-strategy=stages-signature|tag-or-branch (if not specified, the default is stages-signature).
  • If you previously used the options for tagging by Git commit (WERF_TAG_GIT_COMMIT or the werf publish --tag-git-commit COMMIT option), be sure to switch to the stages-signature tagging strategy.
  • It is better to switch new projects to the new tagging scheme right away.
  • When migrating old projects to werf 1.1, it is advisable to switch to the new tagging scheme, but the old tag-or-branch strategy is still supported.

Content-based tagging solves all the problems covered in the article:

  • The Docker tag name is resistant to empty Git commits.
  • The Docker tag name is resistant to Git commits that change files irrelevant to the image.
  • The current version of the image is not overwritten when builds are rerun for old commits of a Git branch.

Enjoy! And don't forget to visit us on GitHub to create an issue or find an existing one and upvote it, create a PR, or just watch the development of the project.

Source: habr.com

Add a comment