What is GitOps?

Translator's note: After a recent post about the pull and push methods in GitOps, we saw interest in the model as a whole, but there were very few Russian-language publications on the topic (and none at all on Habré). So we are pleased to offer a translation of another article - albeit almost a year old - from Weaveworks, whose CEO coined the term "GitOps". The text explains the essence of the approach and its key differences from existing ones.

A year ago we published an introduction to GitOps. Back then we described how the Weaveworks team launched a SaaS built entirely on Kubernetes and developed a set of prescriptive best practices for deploying, managing and monitoring applications in a cloud native environment.

The article proved popular. Other people started talking about GitOps and publishing new tools for git push, development, secrets, functions, continuous integration and so on. Our website now has a large number of publications and GitOps use cases. But some people still have questions. How does the model differ from traditional infrastructure as code and continuous delivery? Is it mandatory to use Kubernetes?

We soon realized that a new description was needed, offering:

  1. Lots of examples and stories;
  2. Specific definition of GitOps;
  3. Comparison with traditional continuous delivery.

In this article, we have tried to cover all these topics. It provides an updated introduction to GitOps and a developer and CI/CD perspective on it. We mainly focus on Kubernetes, although the model can be generalized.

Meet GitOps

Imagine Alice. She runs Family Insurance, which offers health, auto, real estate and travel insurance to people who are too busy to figure out the nuances of contracts on their own. Her business started as a side project while Alice was working at a bank as a data scientist. One day she realized that she could use advanced computer algorithms to analyze data more effectively and put together insurance packages. Investors funded the project, and now her company brings in more than $20 million a year and is growing rapidly. It currently employs 180 people in various positions, among them a technology team that develops and maintains the website and database and analyzes the customer base. That 60-person team is led by Bob, the company's CTO.

Bob's team deploys its production systems in the cloud. Their core applications run on GKE, taking advantage of Kubernetes on Google Cloud. In addition, they use various data and analytics tools in their work.

Family Insurance had no intention of using containers, but got caught up in the enthusiasm around Docker. The company soon discovered that GKE made it easy and effortless to spin up clusters to test new features. Jenkins was added for CI and Quay as a container registry, and Jenkins scripts were written to push new containers and configuration to GKE.

Some time passed. Alice and Bob became frustrated with the performance of the chosen approach and its impact on the business. The introduction of containers had not improved productivity as much as the team had hoped. Sometimes deployments would break, and it was not clear whether code changes were to blame. It also turned out to be difficult to keep track of config changes. Often it was necessary to create a new cluster and move the applications to it, since this was the easiest way to clean up the mess the system had become. Alice was afraid the situation would only get worse as the application grew (a new machine learning project was also brewing). Bob had automated most of the work and could not understand why the pipeline was still unstable, scaled poorly and periodically required manual intervention.

Then they learned about GitOps. This solution turned out to be exactly what they needed to move forward with confidence.

Alice and Bob had been hearing about Git-based workflows, DevOps and infrastructure as code for years. What is unique about GitOps is that it brings a set of best practices - opinionated and prescriptive - for implementing these ideas in the context of Kubernetes. This topic has come up repeatedly, including on the Weaveworks blog.

Family Insurance decides to implement GitOps. The company now has an automated operations model that is compatible with Kubernetes and combines speed with stability, because they:

  • found that their productivity doubled without anyone burning out;
  • stopped maintaining scripts. Instead, they can now focus on new features and on improving engineering practices - for example, introducing canary rollouts and better testing;
  • improved the deployment process - now it rarely breaks;
  • got the ability to restore deployments after partial failures without manual intervention;
  • gained greater confidence in their delivery systems. Alice and Bob discovered that the team could be split into microservice teams working in parallel;
  • can make 30-50 changes to the project per day in each group and try out new techniques;
  • easily onboard new developers, who are able to roll out updates to production via pull requests within a few hours;
  • easily pass SOC2 audits (on service providers' compliance with requirements for secure data management; read more, for example, here - translator's note).

What happened?

GitOps is two things:

  1. An operational model for Kubernetes and cloud native. It provides a set of best practices for deploying, managing and monitoring containerized clusters and applications. An elegant one-slide definition comes from Luis Faceira.
  2. A path to creating a developer-centric environment for managing applications. We apply the Git workflow to both production and development. Note that this is not just about Git push, but about organizing the entire set of CI/CD and UI/UX tools.

A few words about Git

If you are unfamiliar with version control systems and Git-based workflows, we highly recommend learning them. Working with branches and pull requests may seem like black magic at first, but the benefits are well worth the effort. Here is a good article to start with.

How Kubernetes Works

In our story, Alice and Bob turned to GitOps after working with Kubernetes for a while. Indeed, GitOps is closely related to Kubernetes - it is an operating model for infrastructure and applications based on Kubernetes.

What does Kubernetes bring to users?

Here are some of its main features (a minimal example of such a declaration follows the list):

  1. In the Kubernetes model, everything can be described in a declarative form.
  2. The Kubernetes API Server accepts such a declaration as input and then continuously attempts to bring the cluster into the state described in the declaration.
  3. Declarations are sufficient to describe and manage a wide variety of workloads - "applications".
  4. As a result, changes to the application and cluster are due to:
    • changes in container images;
    • changes in the declarative specification;
    • errors in the environment, such as container crashes.
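To make the declarative idea concrete, here is a minimal sketch of such a declaration, written as a plain Python dict purely for illustration (in a real setup it would be a YAML manifest kept in Git); the application name, namespace and image tag are hypothetical:

    # Desired state as data: a Deployment described declaratively. Shown here as a
    # plain Python dict for illustration; in practice it would be a YAML manifest
    # stored in Git. The names, namespace and image below are hypothetical.
    desired_deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "quotes-api", "namespace": "family-insurance"},
        "spec": {
            "replicas": 3,  # how many pods we want, not how to start them
            "selector": {"matchLabels": {"app": "quotes-api"}},
            "template": {
                "metadata": {"labels": {"app": "quotes-api"}},
                "spec": {
                    "containers": [{
                        "name": "quotes-api",
                        "image": "quay.io/family-insurance/quotes-api:1.4.2",
                        "ports": [{"containerPort": 8080}],
                    }]
                },
            },
        },
    }
    # The API server stores this declaration; controllers then continuously work to
    # make the observed state (the running pods) match it - no imperative steps.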

Excellent convergence capabilities of Kubernetes

When an administrator makes configuration changes, the Kubernetes orchestrator applies them to the cluster until its state converges to the new configuration. This model works for any Kubernetes resource and is extensible with Custom Resource Definitions (CRDs). As a result, Kubernetes deployments have the following wonderful properties (a schematic reconciliation loop is sketched after the list):

  • Automation: Kubernetes updates provide a mechanism to automate the process of applying changes correctly and in a timely manner.
  • Convergence: Kubernetes will keep trying updates until successful.
  • Idempotency: repeated applications of convergence lead to the same result.
  • Determinism: When resources are sufficient, the state of the upgraded cluster depends only on the desired state.
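The following schematic Python sketch (not actual Kubernetes controller code) shows where these properties come from: the desired state is re-applied until the observed state matches it, and re-running the loop on an already converged cluster changes nothing. The observe/apply functions are placeholders.

    import time

    def observe_cluster():
        """Placeholder: read the currently observed state from the API server."""
        ...

    def apply_changes(desired, observed):
        """Placeholder: take whatever actions move the cluster toward `desired`."""
        ...

    def reconcile(desired, interval_seconds=30):
        # Convergence: keep retrying until observed == desired.
        # Idempotency: once converged, further iterations are no-ops.
        while True:
            observed = observe_cluster()
            if observed != desired:
                apply_changes(desired, observed)
            time.sleep(interval_seconds)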

How GitOps Works

We have learned enough about Kubernetes to explain how GitOps works.

Let's get back to the Family Insurance microservices teams. What do they usually have to do? Take a look at the list below (if any of its points seem strange or unfamiliar, please hold off on criticism and bear with us). These are just examples of Jenkins-based workflows; there are many other processes when working with other tools.

The main thing to notice is that each update ends with changes to configuration files and Git repositories. It is these Git changes that cause the "GitOps operator" to update the cluster (a sketch of the bucket-to-Git copy step appears after the list):

1. Workflow: "Jenkins build - master branch".
Task list:

  • Jenkins pushes tagged images to Quay;
  • Jenkins pushes config and Helm charts to the master storage bucket;
  • The cloud function copies the config and charts from the master storage bucket to the master Git repository;
  • The GitOps operator updates the cluster.

2. Jenkins build - release or hotfix branch:

  • Jenkins pushes untagged images to Quay;
  • Jenkins pushes config and Helm charts to the staging storage bucket;
  • The cloud function copies the config and charts from the staging bucket to the staging Git repository;
  • The GitOps operator updates the cluster.

3. Jenkins build - develop or feature branch:

  • Jenkins pushes untagged images to Quay;
  • Jenkins pushes config and Helm charts to the develop storage bucket;
  • The cloud function copies the config and charts from the develop storage bucket to the develop Git repository;
  • The GitOps operator updates the cluster.

4. Adding a new client:

  • The manager or administrator (LCM/ops) invokes Gradle to initially deploy and configure network load balancers (NLBs);
  • LCM/ops commits a new config to prepare the deployment for updates;
  • The GitOps operator updates the cluster.
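As an illustration of the "cloud function copies the config and charts from the bucket to the Git repository" step in the workflows above, here is a rough Python sketch of such a function. The bucket name, repository URL and branch are hypothetical, and a production version would at least skip the commit when nothing has changed:

    import subprocess
    import tempfile
    from pathlib import Path

    from google.cloud import storage  # assumes the google-cloud-storage package

    BUCKET = "family-insurance-config-master"                         # hypothetical
    GIT_REPO = "git@github.com:family-insurance/cluster-config.git"   # hypothetical
    BRANCH = "master"

    def mirror_bucket_to_git(event, context):
        """Copy config and Helm charts from the storage bucket into the Git
        repository; the resulting commit is what the GitOps operator reacts to."""
        workdir = Path(tempfile.mkdtemp())
        subprocess.run(["git", "clone", "--branch", BRANCH, GIT_REPO, str(workdir)],
                       check=True)

        client = storage.Client()
        for blob in client.list_blobs(BUCKET):
            target = workdir / blob.name
            target.parent.mkdir(parents=True, exist_ok=True)
            blob.download_to_filename(str(target))

        # A real function would skip the commit when nothing has changed.
        subprocess.run(["git", "-C", str(workdir), "add", "."], check=True)
        subprocess.run(["git", "-C", str(workdir), "commit", "-m",
                        f"Sync config from {BUCKET}"], check=True)
        subprocess.run(["git", "-C", str(workdir), "push", "origin", BRANCH],
                       check=True)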

Brief description of GitOps

  1. Describe the desired state of the entire system using declarative specifications for each environment (in our story, Bob's team defines the entire system configuration in Git).
    • The git repository is the only source of truth about the desired state of the entire system.
    • All changes to the desired state are made by Git commits.
    • All desired cluster parameters are also observable in the cluster itself, so we can determine whether the desired and observed states coincide (converge) or differ (diverge).
  2. If the desired and observed states are different, then:
    • There is a convergence mechanism that, sooner or later, automatically synchronizes the target and observed states. Inside the cluster, Kubernetes does this.
    • The process starts immediately with a "change committed" alert.
    • After a configurable amount of time, a "diff" alert can be sent if the states are different.
  3. Thus, all commits in Git cause verifiable and idempotent updates in the cluster.
    • A rollback is a convergence to a previously desired state.
  4. Convergence is eventual. Its occurrence is evidenced by:
    • No "diff" alerts for a certain amount of time.
    • "converged" alert (e.g. webhook, Git writeback event).

What is divergence?

Let's repeat again: all desired cluster properties must be observable in the cluster itself.

A few examples of divergence:

  • A change in a config file due to a branch merge in Git.
  • A change in a config file due to a Git commit made by a GUI client.
  • Multiple changes to desired state due to Git PR followed by container image build and config changes.
  • A change in the state of the cluster due to an error, a resource conflict resulting in "bad behavior", or simply an accidental deviation from the original state.

What is the convergence mechanism?

A few examples:

  • For containers and clusters, the convergence mechanism is provided by Kubernetes.
  • The same mechanism can be used to manage Kubernetes-based applications and constructs (such as Istio and Kubeflow).
  • The mechanism for managing the interaction between Kubernetes, image repositories and Git is provided by the GitOps operator Weave Flux, which is part of Weave Cloud.
  • For the underlying machines, the convergence mechanism must be declarative and self-contained. From our experience we can say that Terraform comes closest to this definition, but it still requires human control. In this sense, GitOps extends the tradition of Infrastructure as Code.

GitOps combines Git with the excellent Kubernetes convergence engine to offer an operations model.

GitOps allows us to declare: only those systems that can be described and observed can be automated and controlled.

GitOps is for the entire cloud native stack (Terraform and so on)

GitOps is not just about Kubernetes. We want the whole system to be managed declaratively and to use convergence. By the whole system we mean a set of environments working with Kubernetes - for example, "dev cluster 1", "production" and so on. Each environment includes machines, clusters, applications, as well as interfaces to external services that provide data, monitoring and so on.

Notice how important Terraform is to the bootstrapping problem here. Kubernetes has to be deployed somewhere, and using Terraform means we can apply the same GitOps workflows to build the governance layer that underpins Kubernetes and applications. This is a useful best practice.

There is a lot of focus on applying GitOps concepts to the layers on top of Kubernetes. At the moment there are GitOps-style solutions for Istio, Helm, Ksonnet, OpenFaaS and Kubeflow, as well as, for example, for Pulumi, which creates a layer for developing cloud native applications.

Kubernetes CI/CD: Comparing GitOps to Other Approaches

As mentioned, GitOps is two things:

  1. The operating model for Kubernetes and cloud native described above.
  2. The path to a developer-centric environment for managing applications.

For many, GitOps is primarily a workflow based on Git pushes. We like it too. But that's not all: let's now look at CI/CD pipelines.

GitOps enables continuous deployment (CD) under Kubernetes

GitOps offers a continuous deployment mechanism that eliminates the need for separate "deployment management systems". Kubernetes does all the work for you.

  • Updating an application requires an update in Git. This is a transactional update to the desired state. The "deployment" is then done inside the cluster by Kubernetes itself based on the updated description.
  • Due to the way Kubernetes works, these updates are convergent. This provides a mechanism for continuous deployment in which all updates are atomic.
  • Note: Weave Cloud offers a GitOps operator that integrates Git and Kubernetes and allows you to perform CD by reconciling the desired and current state of the cluster. A minimal sketch of such a Git-only deployment step is shown below.
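Here is a minimal sketch of what "updating an application requires an update in Git" looks like from the developer's (or pipeline's) side: bump the image tag in a manifest and push a commit - no kubectl involved. The repository path, file name and image are hypothetical, and a real pipeline would edit the YAML with a proper parser rather than line by line.

    import subprocess
    from pathlib import Path

    REPO = Path("cluster-config")            # local clone of the config repository
    MANIFEST = REPO / "prod" / "quotes-api-deployment.yaml"
    NEW_IMAGE = "quay.io/family-insurance/quotes-api:1.4.3"

    def release_new_version():
        # Change the desired state: point the Deployment at the new image tag.
        lines = MANIFEST.read_text().splitlines()
        updated = [
            line.split("image:")[0] + f"image: {NEW_IMAGE}"
            if line.strip().startswith("image:")
            else line
            for line in lines
        ]
        MANIFEST.write_text("\n".join(updated) + "\n")

        # The commit *is* the deployment request; the GitOps operator notices it
        # and Kubernetes converges the cluster to the updated desired state.
        subprocess.run(["git", "-C", str(REPO), "commit", "-am",
                        f"Deploy {NEW_IMAGE}"], check=True)
        subprocess.run(["git", "-C", str(REPO), "push"], check=True)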

Without kubectl and scripts

You should avoid using kubectl to update your cluster, and especially avoid scripts that group kubectl commands. Instead, with a GitOps pipeline, a user can update their Kubernetes cluster via Git.

Benefits include:

  1. Correctness. A group of updates can be applied, converged and finally validated, bringing us closer to the goal of atomic deployment. In contrast, the use of scripts provides no guarantee of convergence (more on this below).
  2. Security. Quoting Kelsey Hightower: "Restrict access to the Kubernetes cluster to automation tools and the administrators who are responsible for debugging or maintaining it." See also my publication on security and compliance, as well as the article on the Homebrew hack carried out by stealing credentials from a sloppy Jenkins script.
  3. User experience. kubectl exposes the mechanics of the Kubernetes object model, which is quite complex. Ideally, users should interact with the system at a higher level of abstraction. Here I will again refer to Kelsey and recommend looking at this summary.

Difference between CI and CD

GitOps improves existing CI/CD models.

A modern CI server is an orchestration tool. In particular, it is a tool for orchestrating CI pipelines. They include build, test, merge to trunk, etc. CI servers automate the management of complex multi-step pipelines. A common temptation is to script a set of Kubernetes updates and run it as part of a pipeline to push changes to the cluster. Indeed, many professionals do this. However, this is not optimal, and here's why.

CI should be used to make updates to the trunk, and the Kubernetes cluster should then change itself based on those updates, managing CD "internally". We call this the pull model for CD, as opposed to the CI push model. CD is part of runtime orchestration.

Why CI Servers Shouldn't CD Through Direct Kubernetes Updates

Don't use a CI server to orchestrate direct updates to Kubernetes as a set of CI jobs. This is an anti-pattern that we have already written about on our blog.

Let's go back to Alice and Bob.

What problems did they face? Bob's CI server applies changes to the cluster, but if it crashes in the process, Bob won't know what state the cluster is in (or should be in) or how to fix it. The same is true in case of success.

Let's assume that Bob's team built a new image and then patched their deployments to deploy the image (all from the CI pipeline).

If the image builds fine but the pipeline fails, the team will have to figure out:

  • Has the update rolled out?
  • Should we start a new build? Will that lead to unnecessary side effects - with the possibility of getting two builds of the same immutable image?
  • Should we wait for the next update before starting the build?
  • What exactly went wrong? Which steps need to be repeated (and which ones are safe to repeat)?

Establishing a Git-based workflow does not guarantee that Bob's team will never run into these problems. They can still make a mistake when pushing a commit, with a tag, or with some other parameter; however, this approach is still much closer to an explicit all-or-nothing behavior.

To sum it up, here's why CI servers shouldn't deal with CD:

  • Update scripts are not always deterministic; it is easy to make mistakes in them.
  • CI servers do not converge to a declarative cluster model.
  • It is difficult to guarantee idempotency. Users must understand the deep semantics of the system.
  • It is more difficult to recover from a partial failure.

A note about Helm: if you want to use Helm, we recommend combining it with a GitOps operator such as Flux Helm. This will help ensure convergence. By itself, Helm is neither deterministic nor atomic.

GitOps as the best way to implement Continuous Delivery for Kubernetes

Alice and Bob's team implemented GitOps and found that it became much easier to work on software products and to maintain high performance and stability. Let's end this article with illustrations of what their new approach looks like. Keep in mind that we are mainly talking about applications and services here, although GitOps can be used to manage an entire platform.

An operations model for Kubernetes

Look at the following diagram. It shows Git and the container image repository as shared resources for two orchestrated life cycles:

  • A continuous integration pipeline that reads and writes files to Git and can update the container image repository.
  • A Runtime GitOps pipeline that combines deployment with management and observability. It reads and writes files to Git and can download container images.

What are the main findings?

  1. Separation of concerns: note that both pipelines can only communicate by updating Git or the image repository. In other words, there is a firewall between CI and the runtime environment. We call it the "immutability firewall", because all repository updates create new versions. For more information on this topic, see slides 72-87 of this presentation.
  2. You can use any CI and Git server: GitOps works with any component. You can continue to use your favorite CI and Git servers, image repositories and test suites. Almost every other Continuous Delivery tool on the market requires its own CI/Git server or image repository. This can become a limiting factor in the development of cloud native. With GitOps, you can use familiar tools.
  3. Events as an integration tool: as soon as the data in Git is updated, Weave Flux (or the Weave Cloud operator) notifies the runtime about it. Whenever Kubernetes accepts a set of changes, Git is updated. This provides a simple integration model for organizing GitOps workflows; a minimal sketch of such an event hook follows the list.
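As a rough illustration of events as an integration tool, here is a generic Python sketch of a webhook receiver that a Git server could call on each push to the config repository, triggering a sync instead of waiting for the next polling interval. This is only a schematic stand-in, not the actual Weave Flux or Weave Cloud mechanism; the sync trigger is a placeholder.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def trigger_sync():
        """Placeholder: ask the GitOps operator to reconcile the cluster with Git."""
        ...

    class GitWebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The Git server posts push-event details here; the payload itself is
            # not needed for the sketch - any push means "something changed in Git".
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            trigger_sync()
            self.send_response(204)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), GitWebhookHandler).serve_forever()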

Conclusion

GitOps provides strong update guarantees that any modern CI/CD tool needs:

  • automation;
  • convergence;
  • idempotence;
  • determinism.

This is important because it offers an operating model for cloud native developers.

  • Traditional tools for managing and monitoring systems are tied to operations teams working within a runbook (a set of routine procedures and operations - translator's note) that is bound to a specific deployment.
  • In cloud native management, monitoring tools are the best way to evaluate the results of deployments so that the development team can quickly respond to them.

Imagine many clusters scattered across different clouds and many services with their own teams and deployment plans. GitOps offers a scale-invariant model for managing all this abundance.


Source: habr.com
