Our experience in developing a CSI driver in Kubernetes for Yandex.Cloud

We are glad to announce that Flant is expanding its contribution to Open Source tools for Kubernetes by releasing an alpha version of a CSI (Container Storage Interface) driver for Yandex.Cloud.

But before moving on to implementation details, let's answer the question of why this is needed at all when Yandex already offers its Managed Service for Kubernetes.

Introduction

Why do this at all?

Since the very beginning of running Kubernetes in production inside our company (that is, for several years now), we have been developing our own tool, deckhouse, which, by the way, we also plan to release as an Open Source project soon. With its help we uniformly configure and tune all our clusters, and there are already more than 100 of them, running on a wide variety of hardware configurations and in all available cloud services.

Clusters that use deckhouse have all the components necessary for operation: load balancers, monitoring with convenient charts, metrics and alerts, user authentication through external providers for access to all dashboards, and so on. There is no point in deploying such a feature-packed cluster on top of a managed solution, since that is often either impossible or would force us to disable half of the components.

NB: This is our experience, and it is quite specific. We are by no means suggesting that everyone should deploy Kubernetes clusters on their own instead of using ready-made solutions. By the way, we have no real experience operating Kubernetes from Yandex, and we will not give any assessment of that service in this article.

What is it and for whom?

So, we have already written about the modern approach to storage in Kubernetes: how CSI works and how the community came to this approach.

Currently, many major cloud service providers have developed drivers for using their "cloud" disks as Persistent Volumes in Kubernetes. If a provider does not have such a driver but exposes all the necessary functions through its API, nothing prevents you from implementing the driver on your own. This is what we did with Yandex.Cloud.

We took the CSI driver for DigitalOcean as a basis for development, plus a couple of ideas from the GCP driver, since interaction with the APIs of these clouds (Google and Yandex) has a number of similarities. In particular, both the GCP and Yandex APIs return an Operation object for tracking the status of long-running operations (for example, creating a new disk). To interact with the Yandex.Cloud API, we use the Yandex.Cloud Go SDK.
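For illustration, here is a rough sketch of this pattern with the Yandex.Cloud Go SDK. It is not the driver's actual code: the createDisk helper and the request field names are assumptions for the example; only the Operation handling (WrapOperation + Wait) follows the SDK's documented approach to long-running operations.

package yandex

import (
    "context"

    compute "github.com/yandex-cloud/go-genproto/yandex/cloud/compute/v1"
    ycsdk "github.com/yandex-cloud/go-sdk"
)

// createDisk is a sketch: request field names are approximations, but the
// Operation handling mirrors the SDK's pattern for long-running calls.
func createDisk(ctx context.Context, sdk *ycsdk.SDK, folderID, zone string, sizeBytes int64) (string, error) {
    // Disk().Create() returns an Operation object, not the finished disk.
    op, err := sdk.WrapOperation(sdk.Compute().Disk().Create(ctx, &compute.CreateDiskRequest{
        FolderId: folderID,
        ZoneId:   zone,
        Size:     sizeBytes,
    }))
    if err != nil {
        return "", err
    }
    // Wait polls the Operation until the cloud reports that it has finished.
    if err := op.Wait(ctx); err != nil {
        return "", err
    }
    resp, err := op.Response()
    if err != nil {
        return "", err
    }
    return resp.(*compute.Disk).Id, nil
}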

The result of this work is published on GitHub and may be useful to those who, for some reason, run their own Kubernetes installation on Yandex.Cloud virtual machines (but not the ready-made managed cluster) and would like to order disks via CSI.

Implementation

Main Features

The driver currently supports the following features:

  • Ordering disks in all zones of the cluster according to the topology of its nodes (see the sketch after this list);
  • Deleting previously ordered disks;
  • Offline resize for disks (Yandex.Cloud does not support expanding disks that are mounted to a virtual machine). See below for how the driver had to be modified to make resizing as painless as possible.
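To make the first item more concrete, here is a hedged sketch of how a CSI driver can pick the availability zone for a new disk from the topology information that the external-provisioner sidecar passes to CreateVolume. The function name, the topology key, and the fallback logic are assumptions for illustration, not the driver's actual code.

package yandex

import (
    "github.com/container-storage-interface/spec/lib/go/csi"
)

// The topology key is an assumption for this example; the real driver may use another one.
const topologyZoneKey = "failure-domain.beta.kubernetes.io/zone"

// pickZone chooses a zone for a new disk based on the CreateVolume topology requirements.
func pickZone(req *csi.CreateVolumeRequest, defaultZone string) string {
    top := req.GetAccessibilityRequirements()
    if top == nil {
        return defaultZone
    }
    // Preferred topologies come from the scheduler (e.g. the zone of the node the
    // pod landed on); fall back to the requisite list, then to the default zone.
    for _, t := range append(top.GetPreferred(), top.GetRequisite()...) {
        if zone, ok := t.GetSegments()[topologyZoneKey]; ok {
            return zone
        }
    }
    return defaultZone
}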

In the future, we plan to implement support for creating and deleting disk snapshots.

The main difficulty and how we overcame it

The inability to expand disks on the fly via the Yandex.Cloud API is a limitation that complicates the resize operation for a PV (Persistent Volume): in this case, the application pod that uses the disk must be stopped, and that can cause application downtime.

According to the CSI specification, if the CSI controller reports that it can only resize disks "offline" (VolumeExpansion.OFFLINE), the disk expansion process should go like this:

If the plugin has only VolumeExpansion.OFFLINE expansion capability and volume is currently published or available on a node then ControllerExpandVolume MUST be called ONLY after either:

  • The plugin has controller PUBLISH_UNPUBLISH_VOLUME capability and ControllerUnpublishVolume has been invoked successfully.

OR ELSE

  • The plugin does NOT have controller PUBLISH_UNPUBLISH_VOLUME capability, the plugin has node STAGE_UNSTAGE_VOLUME capability, and NodeUnstageVolume has been successfully completed.

OR ELSE

  • The plugin does NOT have controller PUBLISH_UNPUBLISH_VOLUME capability, nor node STAGE_UNSTAGE_VOLUME capability, and NodeUnpublishVolume has successfully completed.

In essence, this means that the disk must be detached from the virtual machine before it can be expanded.

However, unfortunately, the implementation of the CSI specification via sidecar containers does not meet these requirements:

  • The csi-attacher sidecar container, which should be responsible for providing the required gap between mounts, simply does not implement this behavior for offline resizing. We initiated a discussion about it here.
  • What is a sidecar container in this context, anyway? The CSI plugin itself does not interact with the Kubernetes API at all; it only responds to gRPC calls that the sidecar containers send to it (see the sketch below). The latter are developed by the Kubernetes community.
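To illustrate this division of labor, here is a minimal sketch (with assumed names, not the actual driver code) of the plugin side: it only serves the CSI gRPC services on a local unix socket, while the sidecars (external-provisioner, external-attacher, external-resizer, node-driver-registrar) watch the Kubernetes API and translate events into gRPC calls to that socket.

package yandex

import (
    "net"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "google.golang.org/grpc"
)

// serveCSI exposes the CSI Identity/Controller/Node services over a unix socket.
// The sidecar containers connect to this socket; the plugin never talks to the
// Kubernetes API itself. Function name and socket handling are illustrative.
func serveCSI(socket string, is csi.IdentityServer, cs csi.ControllerServer, ns csi.NodeServer) error {
    lis, err := net.Listen("unix", socket)
    if err != nil {
        return err
    }
    srv := grpc.NewServer()
    csi.RegisterIdentityServer(srv, is)
    csi.RegisterControllerServer(srv, cs)
    csi.RegisterNodeServer(srv, ns)
    return srv.Serve(lis)
}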

In our case (the CSI plugin), the disk expansion operation looks like this (a simplified code sketch follows the list):

  1. We receive a ControllerExpandVolume gRPC call;
  2. We try to expand the disk via the API but get an error saying the operation cannot be performed because the disk is mounted;
  3. We store the disk identifier in a map that holds the disks for which the resize operation still has to be performed. For brevity, we will refer to this map as volumeResizeRequired below;
  4. We manually delete the pod that uses the disk; Kubernetes will then restart it. To prevent the disk from being mounted (ControllerPublishVolume) before the resize is completed, on each mount attempt we check whether the disk is still in volumeResizeRequired and return an error if it is;
  5. The CSI driver tries to perform the resize operation again. If it succeeds, we remove the disk from volumeResizeRequired;
  6. Since the disk ID is no longer in volumeResizeRequired, ControllerPublishVolume succeeds, the disk is mounted, and the pod starts.
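Here is a simplified sketch of this scheme in Go. It is not the driver's actual source code: the type, map handling, and helper names are assumptions made for illustration; only the general flow (fail the resize, remember the disk, refuse to publish it until the resize succeeds) reflects what is described above.

package yandex

import (
    "context"
    "sync"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// controllerService is a simplified stand-in for the driver's controller.
type controllerService struct {
    mu                   sync.Mutex
    volumeResizeRequired map[string]struct{} // disks still waiting for an offline resize
}

// ControllerExpandVolume covers steps 1-3 and 5 of the list above.
func (s *controllerService) ControllerExpandVolume(ctx context.Context, req *csi.ControllerExpandVolumeRequest) (*csi.ControllerExpandVolumeResponse, error) {
    diskID := req.GetVolumeId()
    newSize := req.GetCapacityRange().GetRequiredBytes()

    if err := s.resizeDiskViaCloudAPI(ctx, diskID, newSize); err != nil {
        // The cloud API refuses to resize an attached disk: remember the disk
        // and return an error so that external-resizer retries later.
        s.mu.Lock()
        s.volumeResizeRequired[diskID] = struct{}{}
        s.mu.Unlock()
        return nil, status.Errorf(codes.Internal, "disk %s is attached, resize postponed: %v", diskID, err)
    }

    // The resize went through: the disk no longer blocks mounting.
    s.mu.Lock()
    delete(s.volumeResizeRequired, diskID)
    s.mu.Unlock()

    return &csi.ControllerExpandVolumeResponse{CapacityBytes: newSize}, nil
}

// ControllerPublishVolume covers step 4: refuse to attach the disk while a resize is pending.
func (s *controllerService) ControllerPublishVolume(ctx context.Context, req *csi.ControllerPublishVolumeRequest) (*csi.ControllerPublishVolumeResponse, error) {
    diskID := req.GetVolumeId()

    s.mu.Lock()
    _, pending := s.volumeResizeRequired[diskID]
    s.mu.Unlock()
    if pending {
        return nil, status.Errorf(codes.FailedPrecondition, "disk %s is waiting for an offline resize", diskID)
    }

    // ... attach the disk to the node via the cloud API (omitted) ...
    return &csi.ControllerPublishVolumeResponse{}, nil
}

// resizeDiskViaCloudAPI is a placeholder for the real Yandex.Cloud API call.
func (s *controllerService) resizeDiskViaCloudAPI(ctx context.Context, diskID string, sizeBytes int64) error {
    return nil
}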

Everything looks simple enough, but as always there are pitfalls. Disk expansion is handled by external-resizer, which, in case of an error during the operation, uses a queue with an exponential backoff that grows up to 1000 seconds:

func DefaultControllerRateLimiter() RateLimiter {
    return NewMaxOfRateLimiter(
        NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
        // 10 qps, 100 bucket size.  This is only for retry speed and its only the overall factor (not per item)
        &BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
    )
}

This can occasionally result in the disk expansion operation stretching out for 15+ minutes, with the corresponding pod unavailable all that time.

The only option that allowed us to reduce the potential downtime easily and painlessly was to use our own version of external-resizer with the maximum backoff capped at 5 seconds:

workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 5*time.Second)
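For context, here is a hedged sketch of how such a rate limiter plugs into a client-go work queue. It is only an illustration of the effect of the change, not the actual external-resizer patch: the function names here are assumptions.

package main

import (
    "time"

    "golang.org/x/time/rate"
    "k8s.io/client-go/util/workqueue"
)

// patchedRateLimiter mirrors client-go's DefaultControllerRateLimiter but caps
// the per-item exponential backoff at 5 seconds instead of 1000 seconds.
func patchedRateLimiter() workqueue.RateLimiter {
    return workqueue.NewMaxOfRateLimiter(
        workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 5*time.Second),
        // The overall limit stays the same as upstream: 10 qps, burst of 100.
        &workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
    )
}

func main() {
    // A controller built on this queue retries a failed resize at most 5 seconds later.
    queue := workqueue.NewRateLimitingQueue(patchedRateLimiter())
    defer queue.ShutDown()
}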

We did not consider it necessary to urgently start a discussion and patch external-resizer upstream, because offline disk resizing is an atavism that will soon disappear from all cloud providers.

How to start using it?

The driver is supported in Kubernetes version 1.15 and above. The following requirements must be met for the driver to work:

  • The --allow-privileged flag set to true for both the API server and the kubelet;
  • The feature gates --feature-gates=VolumeSnapshotDataSource=true,KubeletPluginsWatcher=true,CSINodeInfo=true,CSIDriverRegistry=true enabled for both the API server and the kubelet;
  • Mount propagation must be enabled in the cluster. When using Docker, the daemon must be configured to allow shared mounts.

All the steps necessary for the installation itself are described in the README. Installation boils down to creating objects in Kubernetes from manifests.

For the driver to work, you will need the following:

  • Specify the Yandex.Cloud directory identifier (folder-id) in the manifest (see the documentation);
  • The CSI driver uses a service account to interact with the Yandex.Cloud API. In the Secret manifest, you need to pass the authorized keys of that service account. The documentation describes how to create a service account and obtain the keys.

All in all: give it a try, and we will be glad to get your feedback and new issues if you run into any problems!

Further support

As a result, we would like to note that we implemented this CSI driver not out of a great desire to have fun writing applications in Go, but out of an urgent need within the company. Maintaining our own implementation does not seem appropriate to us, so if Yandex shows interest and decides to continue supporting the driver, we will be happy to transfer the repository to them.

In addition, Yandex probably has its own implementation of a CSI driver in its managed Kubernetes cluster, which it could release as Open Source. We also see this development path as favorable: the community would be able to use a proven driver from the service provider rather than from a third-party company.

Source: habr.com
