Building a Kubernetes platform at Pinterest

Over the years, Pinterest's 300 million users have created more than 200 billion Pins on over 4 billion boards. To serve this army of users and vast content base, the company has developed thousands of services, ranging from microservices that run on a few CPUs to giant monoliths that occupy whole fleets of virtual machines. And then the company's eyes fell on Kubernetes. What did the "cube" look like from Pinterest's side? You can find out from our translation of a recent post from the Pinterest Engineering blog.


So: hundreds of millions of users and hundreds of billions of Pins. To serve this army of users and vast content base, we have developed thousands of services, ranging from microservices that run on a few CPUs to giant monoliths that occupy whole fleets of virtual machines. On top of that, we run a variety of frameworks with their own demands on CPU, memory, and I/O.

While maintaining this zoo of tools, the development team faces a number of challenges:

  • Engineers have no unified way to launch a production environment. Stateless services, stateful services, and projects under active development sit on completely different technology stacks. This forces engineers through a steep learning curve and makes our infrastructure team's job much harder.
  • Developers who manage their own fleets of virtual machines place a huge burden on internal administrators. As a result, simple operations such as an OS or AMI upgrade stretch out over weeks or months, and load grows in situations that should be perfectly routine.
  • It is difficult to build global infrastructure-management tools on top of the existing solutions. The situation is further complicated by the fact that finding the owners of virtual machines is hard: we often do not know whether a given capacity can be safely reclaimed for other parts of our infrastructure.

Container orchestration systems are a way to unify workload management. They open the way to faster development and simpler infrastructure management, since all the resources involved in a project are handled by one centralized system.


Figure 1: Infrastructure priorities (reliability, developer productivity, and efficiency).

The Cloud Management Platform team at Pinterest first met Kubernetes in 2017. By the first half of 2017 we had Dockerized most of our production workloads, including the API and all of our web servers. We then carefully evaluated various systems for orchestrating containerized workloads, building clusters, and operating them. Toward the end of 2017 we decided on Kubernetes: it was flexible and broadly supported by the developer community.

So far, we have built our own kops-based cluster-bootstrapping tools and migrated existing infrastructure components such as networking, security, metrics, logging, identity management, and traffic into Kubernetes. We have also implemented a workload-modeling system for our resources whose complexity is hidden from developers. Our focus now is on cluster stability, scaling, and onboarding new clients.

Kubernetes: The Pinterest Way

Getting Kubernetes to work at Pinterest scale, as a platform our engineers would love, took a lot of hard work.

As a large company, we have invested heavily in infrastructure tools. Examples include security tools that handle certificates and distribute keys, traffic-control components, service-discovery systems, and visibility components for logs and metrics. None of this was built by accident: we went down the usual path of trial and error, and so we wanted to integrate all of it into the new Kubernetes-based infrastructure instead of reinventing the wheel on a new platform. This approach also simplified migration, since all the application support was already in place and did not have to be built from scratch.

On the other hand, the workload models in Kubernetes itself (such as Deployments, Jobs, and DaemonSets) are not enough for our project. These usability gaps are huge roadblocks to moving to Kubernetes. For example, we heard service developers complain about missing or misconfigured ingress. We also ran into misuse of templating engines that produced hundreds of jobs as copies of the same spec, resulting in nightmarish debugging.

It was also very difficult to support different runtime versions in the same cluster. Imagine the complexity of customer support when you have to deal with multiple versions of the same runtime at once, with all their problems, bugs, and updates.

Pinterest Custom Resources and Controllers

To make Kubernetes easier for our engineers to adopt, and to make our infrastructure simpler and faster, we developed our own Custom Resource Definitions (CRDs).

CRDs provide the following functionality:

  1. Consolidation of various native Kubernetes resources so that they work as a single workload. For example, the PinterestService resource combines a Deployment, a Kubernetes Service, and a ConfigMap, which means developers do not have to worry about setting up DNS.
  2. Implementation of the necessary application support. The user only needs to focus on the container specification for their business logic, while the CRD controller injects all the necessary init containers, environment variables, and pod specifications. This gives developers a fundamentally different level of comfort.
  3. CRD controllers also manage the lifecycle of their own resources and make debugging easier. This includes reconciling desired and actual state, updating the CRD status, logging events, and more. Without CRDs, developers would have to manage a much larger set of resources, which would only increase the chance of error.
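To make the idea concrete, here is a sketch of what such a custom resource might look like. The API group, version, and every field name below are illustrative assumptions on our part; Pinterest's actual schema is not public.

```yaml
# Hypothetical PinterestService custom resource (illustrative only).
apiVersion: pinterest.com/v1
kind: PinterestService
metadata:
  name: example-api
spec:
  replicas: 3
  container:                 # the only part the developer has to write
    image: registry.example.com/example-api:1.42.0
    ports:
      - containerPort: 8080
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
```

From a spec of this size, the controller would generate the Deployment, Service, and ConfigMap and wire up DNS, as described above.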

Here is an example of a PinterestService and an internal resource that is managed by our controller:

[Figure: an example PinterestService specification and the internal resource managed by our controller]

As you can see above, to support a custom container we need to inject an init container and several add-ons into it to provide security, visibility, and network traffic. We also built ConfigMap templates, implemented support for PVC templates for batch jobs, and added tracking of many environment variables covering identity, resource consumption, and garbage collection.

It is hard to imagine that developers would want to write these configuration files by hand without CRD support, let alone maintain and debug them afterwards.
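For a sense of that boilerplate, here is an illustrative fragment of the pod template such a controller might generate from the short user spec; every container name and image here is hypothetical, not Pinterest's actual configuration.

```yaml
# Illustrative fragment of a controller-generated pod template:
# init containers, sidecars, and environment variables are injected
# automatically around the single container the developer wrote.
spec:
  initContainers:
    - name: dependency-downloader   # fetches artifacts before app start
      image: registry.example.com/init-downloader:latest
  containers:
    - name: example-api             # the developer's container
      image: registry.example.com/example-api:1.42.0
      env:
        - name: SERVICE_IDENTITY
          value: example-api
    - name: traffic-sidecar         # registers the pod IP for discovery
      image: registry.example.com/traffic-sidecar:latest
    - name: metrics-sidecar         # ships logs and metrics
      image: registry.example.com/metrics-sidecar:latest
```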

Application deployment workflow

[Figure: application deployment workflow on the Kubernetes platform]

The image above shows how a Pinterest custom resource is deployed to a Kubernetes cluster:

  1. Developers interact with our Kubernetes cluster through the CLI and user interface.
  2. The CLI/UI tools pull the YAML workload configuration files and other build properties (such as the version ID) from Artifactory and then submit them to the Job Submission Service (JSS). This step ensures that only production-approved versions are delivered to the cluster.
  3. JSS is the gateway to various platforms, including Kubernetes. It authenticates the user, issues quotas, and partially validates our CRD configuration.
  4. After checking the CRD on the JSS side, the information is sent to the k8s platform API.
  5. Our CRD controller watches for events on all custom resources. It converts CRs into native k8s resources, injects the necessary sidecar containers, sets the appropriate environment variables, and performs other support work to ensure that containerized user applications have sufficient infrastructure support.
  6. The CRD controller then submits the resulting resources to the Kubernetes API so they can be scheduled and run.
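Steps 5 and 6 can be sketched as a pure function: given a custom resource, the controller emits the native Kubernetes objects. This minimal Python sketch is illustrative only; the real controller is far more involved, and the field names follow the hypothetical schema above, not Pinterest's actual one.

```python
def reconcile(pinterest_service: dict) -> list[dict]:
    """Expand a hypothetical PinterestService CR into native k8s objects."""
    name = pinterest_service["metadata"]["name"]
    spec = pinterest_service["spec"]

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": spec["replicas"],
            # a real controller would also inject init containers,
            # sidecars, and environment variables here
            "template": {"spec": {"containers": [spec["container"]]}},
        },
    }
    service = {  # gives the workload a stable DNS name
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"selector": {"app": name}},
    }
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{name}-config"},
        "data": spec.get("config", {}),
    }
    return [deployment, service, configmap]


cr = {
    "metadata": {"name": "example-api"},
    "spec": {"replicas": 3,
             "container": {"name": "example-api", "image": "example:1.0"}},
}
print([o["kind"] for o in reconcile(cr)])  # ['Deployment', 'Service', 'ConfigMap']
```

The real controller would also watch these generated objects and re-reconcile on any drift between desired and actual state.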

Note: this pre-release deployment workflow was created for early adopters of the new k8s platform. We are currently refining it to integrate fully with our new CI/CD, so we will not cover everything CI/CD-related here. We look forward to sharing our experience and the team's progress on that front in the next blog post, "Building a CI/CD platform for Pinterest".

Types of Special Resources

Based on the specific needs of Pinterest, we have developed the following CRDs to suit a variety of workflows:

  • PinterestService is a long-running stateless service. Many of our core systems are based on sets of such services.
  • PinterestJobSet models run-to-completion batch jobs. A common scenario at Pinterest is several jobs running the same containers in parallel, independently of other similar processes.
  • PinterestCronJob is widely used for small periodic loads. It is a wrapper around the native CronJob with Pinterest support mechanisms for security, traffic, logs, and metrics.
  • PinterestDaemon models infrastructure daemons. This family keeps growing as we add more support to our clusters.
  • PinterestTrainingJob wraps TensorFlow and PyTorch jobs, providing the same level of runtime support as all the other CRDs. Since Pinterest makes heavy use of TensorFlow and other machine-learning systems, building a dedicated CRD around them was well justified.

We are also working on PinterestStatefulSet, which will soon be adapted for data warehouses and other stateful systems.
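To illustrate the PinterestJobSet scenario above, many independent copies of the same container running in parallel, a spec might look roughly like this. As before, the API group and field names are our illustrative assumptions, not the real schema.

```yaml
# Hypothetical PinterestJobSet custom resource (illustrative only).
apiVersion: pinterest.com/v1
kind: PinterestJobSet
metadata:
  name: image-resize-batch
spec:
  parallelism: 50              # run 50 independent copies of the job
  template:
    container:
      image: registry.example.com/image-resize:2.3.1
      command: ["python", "resize.py", "--shard=$(JOB_INDEX)"]
```

The controller would expand this into the corresponding native Jobs, again with all the injected security, logging, and metrics support.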

Runtime support

When an application pod runs on Kubernetes, it automatically receives a certificate that identifies it. The certificate is used to access the secret store or to talk to other services over mTLS. Meanwhile, the init container and daemons download all the necessary dependencies before the containerized application launches. When everything is ready, the traffic sidecar and daemons register the pod's IP address in our ZooKeeper so that clients can discover it. All of this works because the network module is configured before the application starts.
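The ZooKeeper registration step can be sketched as follows. This is a guess at the mechanics, not Pinterest's actual discovery protocol: the sidecar writes an ephemeral znode whose path encodes the service name and whose payload carries the pod's address.

```python
import json

def discovery_record(service: str, pod_ip: str, port: int) -> tuple[str, bytes]:
    """Build the (znode path, payload) a traffic sidecar might register.

    The path layout and payload shape are illustrative assumptions,
    not Pinterest's actual service-discovery format.
    """
    path = f"/discovery/{service}/{pod_ip}:{port}"
    payload = json.dumps({"ip": pod_ip, "port": port}).encode()
    return path, payload

# With a real ZooKeeper client (e.g. kazoo), the sidecar would create the
# node as *ephemeral*, so the entry disappears automatically when the pod
# dies and its session expires:
#   zk.create(path, payload, ephemeral=True, makepath=True)

path, payload = discovery_record("example-api", "10.1.2.3", 8080)
print(path)  # /discovery/example-api/10.1.2.3:8080
```

The ephemeral-node property is what makes this pattern attractive: deregistration on pod death is automatic rather than a cleanup step that can be missed.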

The above are typical examples of runtime workload support. Other workload types may need slightly different support, but it all comes in the form of pod-level sidecars or node-level (VM-level) daemons. We make sure all of it is deployed by the management infrastructure and is consistent across applications, which ultimately reduces the burden of technical work and customer support.

Testing and QA

We have built an end-to-end test pipeline on top of the existing Kubernetes test infrastructure. These tests run on all of our clusters, and changes pass through this pipeline many times before they reach a production cluster.

In addition to testing systems, we have monitoring and alerting systems that constantly monitor the status of system components, resource consumption and other important indicators, notifying us only when human intervention is needed.

Alternatives

We considered some alternatives to custom resources, such as mutating admission controllers and templating systems. However, all of them come with serious operational difficulties, so we chose the CRD path.

We used a mutating admission controller to inject sidecars, environment variables, and other runtime support. However, it ran into various problems, for example with resource binding and lifecycle management, that simply do not arise with CRDs.

Templating systems such as Helm charts are also widely used to run applications with similar configurations. However, our workloads are too diverse to be managed by templates, and during continuous deployment templates generate too many errors.
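For concreteness, a Helm-style chart parameterizes a single YAML skeleton with values, which works well only when workloads differ in a handful of values; this fragment is a generic Helm example, not one of Pinterest's charts:

```yaml
# templates/deployment.yaml (generic Helm chart fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicas }}
  template:
    spec:
      containers:
        - name: app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

When workloads diverge structurally rather than just in values, such templates sprout conditionals and copies, which is the maintenance problem described above.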

Future work

We now run mixed workloads on all of our clusters. To support processes of various types and sizes, we are working in the following areas:

  • Cluster federation that distributes large applications across different clusters for scalability and stability.
  • Cluster stability, scalability, and visibility, to tie application availability to its SLA.
  • Resource and quota management, so that applications do not conflict with one another and cluster scale stays under our control.
  • A new CI/CD platform to support and deploy applications on Kubernetes.
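Kubernetes itself provides a building block for the quota part of this list: a ResourceQuota object caps aggregate resource requests per namespace. A minimal example (the namespace and limits are, of course, made up):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "100"      # total CPU requests across the namespace
    requests.memory: 200Gi   # total memory requests
    pods: "500"              # total number of pods
```

Pods whose requests would push the namespace past these limits are rejected at admission time, keeping one team's workloads from starving another's.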

Source: habr.com
