Kubernetes storage volume plugins: from Flexvolume to CSI

Back when Kubernetes was still at v1.0.0, it already had volume plugins. They were needed to connect storage systems to Kubernetes so that containers could keep persistent (permanent) data. There were only a few of them, and among the first were such storage providers as GCE PD, Ceph, AWS EBS and others.

The plugins were shipped together with Kubernetes itself, which is how they got their name: in-tree. For many users, however, the existing set of plugins turned out to be insufficient. Enterprising engineers patched simple plugins into the Kubernetes core, built their own Kubernetes from that source and ran it on their servers. Over time, the Kubernetes developers realized that handing out fish would not solve the problem: people needed a fishing rod. And with the release of Kubernetes v1.2.0, it appeared...

The Flexvolume plugin: a bare-minimum fishing rod

The Kubernetes developers created the Flexvolume plugin: a logical wrapper of variables and methods for working with Flexvolume drivers implemented by third-party developers.

Let's pause and take a closer look at what a Flexvolume driver actually is. It is an executable file (a binary, a Python script, a Bash script, etc.) that takes command-line arguments as input and, when executed, returns a message with well-known fields in JSON format. By convention, the first command-line argument is always the method name, and the remaining arguments are its parameters.
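
To make the convention concrete, here is a rough sketch of how such a driver might be invoked by hand; the vendor/driver names, the mount path and the JSON options are placeholders for illustration only:

# Hypothetical manual call to a Flexvolume driver; all names and paths are made up
/usr/libexec/kubernetes/kubelet-plugins/volume/exec/example.com~nfs/nfs mount \
    /var/lib/kubelet/pods/<pod-uid>/volumes/example.com~nfs/myvol \
    '{"server": "10.0.0.1", "share": "exports"}'

# The driver answers on stdout, for example:
# {"status": "Success"}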

Connection scheme for CIFS shares in OpenShift; the Flexvolume driver is right in the middle

The minimum set of methods looks like this:

flexvolume_driver mount # responsible for attaching the volume to the pod
# Format of the returned message:
{
  "status": "Success"/"Failure"/"Not supported",
  "message": "The reason why this particular status was returned",
}

flexvolume_driver unmount # responsible for detaching the volume from the pod
# Format of the returned message:
{
  "status": "Success"/"Failure"/"Not supported",
  "message": "The reason why this particular status was returned",
}

flexvolume_driver init # responsible for initializing the plugin
# Format of the returned message:
{
  "status": "Success"/"Failure"/"Not supported",
  "message": "The reason why this particular status was returned",
  // Specifies whether the driver uses the attach/detach methods
  "capabilities": {"attach": true/false}
}

Whether the attach and detach methods are used determines how the kubelet will behave when it calls the driver later on. There are also the special methods expandvolume and expandfs, which are responsible for dynamically resizing a volume.

For an example of the changes introduced by the expandvolume method, and with it the ability to resize volumes on the fly, see our pull request to the Rook Ceph Operator.

And here is an example implementation of a Flexvolume driver for working with NFS:

#!/bin/bash

usage() {
    err "Invalid usage. Usage: "
    err "\t$0 init"
    err "\t$0 mount <mount dir> <json params>"
    err "\t$0 unmount <mount dir>"
    exit 1
}

# Diagnostics go to stderr
err() {
    echo -ne $* 1>&2
}

# JSON results for the kubelet go to stdout
log() {
    echo -ne $* >&1
}

# Prints 1 if ${MNTPATH} is currently a mount point, 0 otherwise
ismounted() {
    MOUNT=`findmnt -n ${MNTPATH} 2>/dev/null | cut -d' ' -f1`
    if [ "${MOUNT}" == "${MNTPATH}" ]; then
        echo "1"
    else
        echo "0"
    fi
}

# mount <mount dir> <json options>
domount() {
    MNTPATH=$1

    NFS_SERVER=$(echo $2 | jq -r '.server')
    SHARE=$(echo $2 | jq -r '.share')

    if [ $(ismounted) -eq 1 ] ; then
        log '{"status": "Success"}'
        exit 0
    fi

    mkdir -p ${MNTPATH} &> /dev/null

    mount -t nfs ${NFS_SERVER}:/${SHARE} ${MNTPATH} &> /dev/null
    if [ $? -ne 0 ]; then
        err "{ "status": "Failure", "message": "Failed to mount ${NFS_SERVER}:${SHARE} at ${MNTPATH}"}"
        exit 1
    fi
    log '{"status": "Success"}'
    exit 0
}

# unmount <mount dir>
unmount() {
    MNTPATH=$1
    if [ $(ismounted) -eq 0 ] ; then
        log '{"status": "Success"}'
        exit 0
    fi

    umount ${MNTPATH} &> /dev/null
    if [ $? -ne 0 ]; then
        err "{ "status": "Failed", "message": "Failed to unmount volume at ${MNTPATH}"}"
        exit 1
    fi

    log '{"status": "Success"}'
    exit 0
}

op=$1

if [ "$op" = "init" ]; then
    log '{"status": "Success", "capabilities": {"attach": false}}'
    exit 0
fi

if [ $# -lt 2 ]; then
    usage
fi

shift

case "$op" in
    mount)
        domount $*
        ;;
    unmount)
        unmount $*
        ;;
    *)
        log '{"status": "Not supported"}'
        exit 0
esac

exit 1

So, once the executable file itself is ready, the driver has to be deployed to the Kubernetes cluster. It must be placed on every node of the cluster at a predefined path. The default path was:

/usr/libexec/kubernetes/kubelet-plugins/volume/exec/<storage_vendor_name>~<driver_name>/

…but in other Kubernetes distributions (OpenShift, Rancher…) the path may be different.
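
For completeness, here is a minimal sketch of how such a driver is then referenced from the Kubernetes side, via the flexVolume field of a PersistentVolume; the vendor prefix example.com, the server address and the share name are assumptions for illustration:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-flex-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  flexVolume:
    # <vendor>/<driver> must match the <vendor>~<driver> directory on the nodes
    driver: "example.com/nfs"
    options:
      server: "10.0.0.1"
      share: "exports"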

Flexvolume problems: how to cast a rod correctly?

Distributing the Flexvolume driver to the cluster nodes turned out to be a non-trivial task. Having done the operation manually once, it is easy to run into a situation where new nodes appear in the cluster: because a node was added, because of automatic horizontal scaling or, worse, because a node was replaced after a failure. In that case, storage simply cannot be used on these nodes until you manually put the Flexvolume driver onto them.

One of the Kubernetes primitives, the DaemonSet, solves this problem. When a new node appears in the cluster, it automatically gets a pod from our DaemonSet, with a local volume attached at the path where Flexvolume drivers are looked up. Once the pod is created successfully, it copies the files the driver needs onto the disk.

Here is an example of such a DaemonSet for distributing a Flexvolume driver:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: flex-set
spec:
  selector:
    matchLabels:
      app: flex-deploy
  template:
    metadata:
      name: flex-deploy
      labels:
        app: flex-deploy
    spec:
      containers:
        - image: <deployment_image>
          name: flex-deploy
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /flexmnt
              name: flexvolume-mount
      volumes:
        - name: flexvolume-mount
          hostPath:
            path: <host_driver_directory>

…and an example of a Bash script that copies the Flexvolume driver into place:

#!/bin/sh

set -o errexit
set -o pipefail

VENDOR=k8s.io
DRIVER=nfs

driver_dir=$VENDOR${VENDOR:+"~"}${DRIVER}
if [ ! -d "/flexmnt/$driver_dir" ]; then
  mkdir "/flexmnt/$driver_dir"
fi

cp "/$DRIVER" "/flexmnt/$driver_dir/.$DRIVER"
mv -f "/flexmnt/$driver_dir/.$DRIVER" "/flexmnt/$driver_dir/$DRIVER"

# Keep the container running so the DaemonSet pod stays alive
while : ; do
  sleep 3600
done

It is important to remember that the copy operation itself is not atomic. There is a good chance the kubelet would start using the driver before it has been fully written, breaking the system. The correct approach, used in the script above, is to first copy the driver file under a different name and then rename it with an atomic operation.

Diagram of working with Ceph in the Rook operator: the Flexvolume driver is inside the Rook agent

The next problem with Flexvolume drivers is that for most storage backends the required software has to be installed on the cluster node (for example, the ceph-common package for Ceph). The Flexvolume plugin was not originally designed with such complex systems in mind.

An original solution to this problem can be seen in the Rook operator's Flexvolume driver implementation:

The driver itself is implemented as an RPC client. The IPC socket it uses for communication lives in the same directory as the driver. Remember that it is a good idea to copy the driver files with a DaemonSet, which mounts the directory containing the driver as a volume. After copying the necessary Rook driver files, this pod does not die: it connects to the IPC socket through the mounted volume and acts as a full-fledged RPC server. The ceph-common package is already installed inside the pod's container. The IPC socket guarantees that the kubelet talks to the pod located on the same node as itself. Everything ingenious is simple!..

Goodbye our sweet… in-tree plugins!

At some point, the Kubernetes developers noticed that there were twenty storage plugins inside the core, and that every change to any of them had to go through the full Kubernetes release cycle.

It turns out that to use a new version of a storage plugin, you have to update the whole cluster. On top of that, you may be unpleasantly surprised to find that the new Kubernetes version is suddenly incompatible with the Linux kernel you are running... So you wipe away your tears, grit your teeth, and negotiate with management and with users about when to update the Linux kernel and the Kubernetes cluster. With possible downtime for the services you provide.

The situation is more than comical, don't you think? It became clear to the whole community that the approach was not working. So the Kubernetes developers made a firm decision: new storage plugins would no longer be accepted into the core. On top of that, as we already know, a number of flaws had been found in the Flexvolume plugin implementation...

The last volume plugin added to Kubernetes, CSI, was called upon to close the issue of persistent data storage once and for all. Its alpha version, fully referred to as Out-of-Tree CSI Volume Plugins, was announced in the Kubernetes 1.9 release.

Container Storage Interface, or the CSI 3000 spinning rod!

First of all, it is worth noting that CSI is not just a volume plugin but a genuine standard for creating custom components that work with storage systems. Container orchestration systems such as Kubernetes and Mesos were expected to "learn" how to work with components implemented to this standard, and Kubernetes has now done so.

So how is the CSI plugin structured in Kubernetes? The CSI plugin works with special drivers (CSI drivers) written by third parties. A CSI driver in Kubernetes must consist of at least two components (pods):

  • Controller - manages external persistent storage. It is implemented as a gRPC server and deployed using the StatefulSet primitive.
  • Node - responsible for mounting persistent storage on cluster nodes. It is also implemented as a gRPC server, but deployed with the DaemonSet primitive (a simplified sketch of such a DaemonSet follows below).
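
To give a feel for the Node component, here is a heavily simplified sketch of such a DaemonSet. The driver name nfs.csi.example.com and the driver image are hypothetical; the node-driver-registrar sidecar and the kubelet paths follow the common CSI deployment pattern, but a real driver ships its own manifests:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-nfs-node
spec:
  selector:
    matchLabels:
      app: csi-nfs-node
  template:
    metadata:
      labels:
        app: csi-nfs-node
    spec:
      containers:
        # Registers the driver's socket with the kubelet
        - name: node-driver-registrar
          image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.0
          args:
            - "--csi-address=/csi/csi.sock"
            - "--kubelet-registration-path=/var/lib/kubelet/plugins/nfs.csi.example.com/csi.sock"
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
        # The driver's gRPC server implementing the CSI Node service (hypothetical image)
        - name: nfs-csi-driver
          image: example.com/nfs-csi-driver:latest
          args:
            - "--endpoint=unix:///csi/csi.sock"
          securityContext:
            privileged: true
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/pods
              mountPropagation: Bidirectional
      volumes:
        - name: plugin-dir
          hostPath:
            path: /var/lib/kubelet/plugins/nfs.csi.example.com
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: Directory
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/pods
            type: Directory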

Scheme of the CSI plugin in Kubernetes

You can learn more details about how CSI works, for example, from the article "Understanding the C.S.I.", whose translation we published a year ago.
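
On the user side, a deployed CSI driver is consumed through the ordinary Kubernetes storage objects. Here is a minimal sketch, again assuming the hypothetical nfs.csi.example.com driver from above (the real provisioner name comes from whichever driver you install):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-nfs
provisioner: nfs.csi.example.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: csi-nfs
  resources:
    requests:
      storage: 5Gi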

Advantages of this implementation

  • For basic things such as registering a driver on a node, the Kubernetes developers have implemented a set of ready-made containers. You no longer need to generate a JSON response with capabilities yourself, as you had to for the Flexvolume plugin.
  • Instead of "slipping" executable files onto the nodes, we now deploy pods into the cluster. This is what we expect from Kubernetes in the first place: all processes run inside containers deployed with Kubernetes primitives.
  • You no longer have to develop an RPC server and an RPC client to implement complex drivers. The client has already been implemented for us by the Kubernetes developers.
  • Passing arguments over the gRPC protocol is much more convenient, flexible and reliable than passing them as command-line arguments. To see how volume usage metrics support can be added to CSI via a standardized gRPC method, take a look at our pull request for the vsphere-csi driver.
  • Communication happens over IPC sockets, so there is no confusion about whether the kubelet sent a request to the right pod.

Does this list remind you of anything? The advantages of CSI address exactly the problems that were not taken into account when the Flexvolume plugin was designed.

Conclusions

CSI, as a standard for implementing custom plugins that interact with storage systems, was received very warmly by the community. Moreover, thanks to their advantages and versatility, CSI drivers are being created even for storage systems like Ceph or AWS EBS, whose plugins were added in the very first versions of Kubernetes.

In early 2019, the in-tree plugins were declared deprecated. The Flexvolume plugin will continue to be supported, but no new functionality will be developed for it.

We ourselves already have experience using ceph-csi and vsphere-csi, and we are ready to add to this list! So far CSI has been handling the tasks assigned to it with flying colors, but we'll wait and see.

And don't forget that everything new is just the old, thoroughly rethought!


Source: habr.com
