A practical example of connecting Ceph-based storage to a Kubernetes cluster

Container Storage Interface (CSI) is a unified interface between Kubernetes and storage systems. We have already covered it briefly; today we will take a closer look at the combination of CSI and Ceph and show how to mount Ceph storage into a Kubernetes cluster.
The article contains real examples, slightly simplified for readability. We do not cover the installation and configuration of the Ceph and Kubernetes clusters themselves.

Are you wondering how it works?

So, you have a Kubernetes cluster at hand, deployed, for example, with Kubespray. A Ceph cluster is running nearby; it can also be deployed with a ready-made set of playbooks. I hope I don't need to mention that for production there must be a network between them with a bandwidth of at least 10 Gbit/s.

If you have all this, let's go!

First, let's go to one of the nodes of the Ceph cluster and check that everything is in order:

ceph health
ceph -s

Next, we will immediately create a pool for RBD disks:

ceph osd pool create kube 32
ceph osd pool application enable kube rbd
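
If you want to double-check, the new pool and its application tag can be inspected right away (an optional sanity check):

ceph osd pool ls detail | grep kube
ceph osd pool application get kube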

Let's go to the Kubernetes cluster. There, first of all, we will install the Ceph CSI driver for RBD. As you would expect, we install it with Helm.
Add the repository with the chart and fetch the default values of the ceph-csi-rbd chart:

helm repo add ceph-csi https://ceph.github.io/csi-charts
helm inspect values ceph-csi/ceph-csi-rbd > cephrbd.yml

Now you need to fill in the cephrbd.yml file. To do this, find out the cluster ID and the monitor IP addresses in Ceph:

ceph fsid  # this gives us the clusterID
ceph mon dump  # and this shows the monitor IP addresses

Enter the resulting values into the cephrbd.yml file. Along the way, we enable the creation of PSPs (Pod Security Policies). The options for the nodeplugin and provisioner sections are already in the file; adjust them as shown below:

csiConfig:
  - clusterID: "bcd0d202-fba8-4352-b25d-75c89258d5ab"
    monitors:
      - "v2:172.18.8.5:3300/0,v1:172.18.8.5:6789/0"
      - "v2:172.18.8.6:3300/0,v1:172.18.8.6:6789/0"
      - "v2:172.18.8.7:3300/0,v1:172.18.8.7:6789/0"

nodeplugin:
  podSecurityPolicy:
    enabled: true

provisioner:
  podSecurityPolicy:
    enabled: true

Then all that remains is to install the chart into the Kubernetes cluster:

helm upgrade -i ceph-csi-rbd ceph-csi/ceph-csi-rbd -f cephrbd.yml -n ceph-csi-rbd --create-namespace
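
Before moving on, it is worth making sure that the driver pods (the provisioner and the nodeplugin DaemonSet) have started:

kubectl -n ceph-csi-rbd get pods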

Great, the RBD driver works!
Let's create a new StorageClass in Kubernetes. This again requires some work with Ceph.

Create a new user in Ceph and grant it permission to write to the kube pool:

ceph auth get-or-create client.rbdkube mon 'profile rbd' osd 'profile rbd pool=kube'
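
Optionally, you can review the capabilities that were just granted; ceph auth get prints the whole keyring entry, while the get-key command below prints only the key itself:

ceph auth get client.rbdkube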

Now let's look at its access key:

ceph auth get-key client.rbdkube

The command will output something like this:

AQCO9NJbhYipKRAAMqZsnqqS/T8OYQX20xIa9A==

Let's put this value into a Secret in the Kubernetes cluster, in the userKey field:

---
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: ceph-csi-rbd
stringData:
  # The key values correspond to the user name and its key, as specified in
  # the Ceph cluster. The user ID must have access to the pool
  # specified in the storage class
  userID: rbdkube
  userKey: <user-key>

And we create our secret:

kubectl apply -f secret.yaml
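
You can verify that the Secret has appeared in the driver's namespace:

kubectl -n ceph-csi-rbd get secret csi-rbd-secret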

Next, we need a StorageClass manifest that looks something like this:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  pool: kube

  imageFeatures: layering

  # These secrets must contain the credentials for accessing
  # your pool.
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-rbd

  csi.storage.k8s.io/fstype: ext4

reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard

Fill in the clusterID, which we have already learned with the ceph fsid command, and apply the manifest to the Kubernetes cluster:

kubectl apply -f storageclass.yaml
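
The new StorageClass should now appear in the list:

kubectl get storageclass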

To check that the two clusters work together, let's create the following PVC (Persistent Volume Claim):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
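
Assuming the manifest above is saved as pvc.yaml (the same file name is used later when resizing), apply it:

kubectl apply -f pvc.yaml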

Let's immediately check that Kubernetes has created and bound the requested volume:

kubectl get pvc
kubectl get pv

Everything seems to be great! And what does it look like on the Ceph side?
We get a list of volumes in the pool and view information about our volume:

rbd ls -p kube
rbd -p kube info csi-vol-eb3d257d-8c6c-11ea-bff5-6235e7640653  # your volume ID will, of course, be different; use the one printed by the previous command

Now let's see how RBD volume resizing works.
Change the volume size in the pvc.yaml manifest to 2Gi and apply it:

kubectl apply -f pvc.yaml

Let's wait for the changes to take effect and take another look at the volume size.

rbd -p kube info csi-vol-eb3d257d-8c6c-11ea-bff5-6235e7640653

kubectl get pv
kubectl get pvc

We see that the size of the PVC has not changed. You can ask Kubernetes for a description of the PVC in YAML format to find out why:

kubectl get pvc rbd-pvc -o yaml

And here is the problem:

message: Waiting for user to (re-)start a pod to finish file system resize of volume on node.
type: FileSystemResizePending

That is, the disk has grown, but the file system on it has not.
To expand the file system, the volume must be mounted. In our case, the created PVC/PV is not used by anything at the moment.

We can create a test Pod like this:

---
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbd-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx:1.17.6
      volumeMounts:
        - name: mypvc
          mountPath: /data
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
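
Assuming this manifest is saved as pod.yaml, apply it; once the pod is Running, you can also check the file system size from inside the container:

kubectl apply -f pod.yaml
kubectl exec -it csi-rbd-demo-pod -- df -h /data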

Now let's look at the PVC again:

kubectl get pvc

The size has changed, everything is in order.

In the first part we worked with RBD (RADOS Block Device), but that approach will not work if several microservices need to use the same disk at once. CephFS is much better suited for working with files rather than with a disk image.
Using our Ceph and Kubernetes clusters as an example, we will configure CSI and the other entities needed to work with CephFS.

Let's get the values we need from the new Helm chart:

helm inspect values ceph-csi/ceph-csi-cephfs > cephfs.yml

Again, you need to fill in the cephfs.yml file. As before, the Ceph commands will help:

ceph fsid
ceph mon dump

We fill the file with values like this:

csiConfig:
  - clusterID: "bcd0d202-fba8-4352-b25d-75c89258d5ab"
    monitors:
      - "172.18.8.5:6789"
      - "172.18.8.6:6789"
      - "172.18.8.7:6789"

nodeplugin:
  httpMetrics:
    enabled: true
    containerPort: 8091
  podSecurityPolicy:
    enabled: true

provisioner:
  replicaCount: 1
  podSecurityPolicy:
    enabled: true

Note that the monitor addresses are specified in the simple address:port form. To mount CephFS on a node, these addresses are passed to the kernel module, which does not yet know how to work with the v2 monitor protocol.
We change the port for httpMetrics (Prometheus will scrape metrics there) so that it does not conflict with nginx-proxy, which is installed by Kubespray. You may not need this.

Install the Helm chart into the Kubernetes cluster:

helm upgrade -i ceph-csi-cephfs ceph-csi/ceph-csi-cephfs -f cephfs.yml -n ceph-csi-cephfs --create-namespace
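
As with the RBD driver, check that the CephFS driver pods have started:

kubectl -n ceph-csi-cephfs get pods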

Let's move on to the Ceph storage cluster and create a separate user there. The documentation states that the CephFS provisioner needs cluster administrator access rights, but we will create a separate user, fs, with limited rights:

ceph auth get-or-create client.fs mon 'allow r' mgr 'allow rw' mds 'allow rws' osd 'allow rw pool=cephfs_data, allow rw pool=cephfs_metadata'

Let's immediately look at its access key; we will need it later:

ceph auth get-key client.fs

Let's create a separate Secret and StorageClass.
There is nothing new here; we already saw this in the RBD example:

---
apiVersion: v1
kind: Secret
metadata:
  name: csi-cephfs-secret
  namespace: ceph-csi-cephfs
stringData:
  # Required for dynamically provisioned volumes
  adminID: fs
  adminKey: <output of the previous command>

Apply the manifest:

kubectl apply -f secret.yaml

And now - a separate StorageClass:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs-sc
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: <cluster-id>

  # Name of the CephFS file system in which the volume will be created
  fsName: cephfs

  # (optional) The Ceph pool in which the volume data will be stored
  # pool: cephfs_data

  # (optional) Comma-separated mount options for ceph-fuse,
  # for example:
  # fuseMountOptions: debug

  # (optional) Comma-separated kernel mount options for CephFS.
  # See man mount.ceph for the list of these options. For example:
  # kernelMountOptions: readdir_max_bytes=1048576,norbytes

  # The secrets must contain the credentials of the Ceph admin and/or user.
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-cephfs

  # (optional) The driver can use either ceph-fuse (fuse)
  # or the ceph kernel client (kernel).
  # If not specified, the default mounter is used,
  # which is determined by probing for ceph-fuse and mount.ceph.
  # mounter: kernel
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - debug

Fill in the clusterID here and apply the manifest in Kubernetes:

kubectl apply -f storageclass.yaml

Verification

To check, as in the previous example, let's create a PVC:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: csi-cephfs-sc
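
Assuming this manifest is saved as pvc.yaml, apply it:

kubectl apply -f pvc.yaml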

And check for PVC/PV:

kubectl get pvc
kubectl get pv

If you want to look at the files and directories in CephFS, you can mount the file system somewhere, for example as shown below.

We go to one of the nodes of the Ceph cluster and perform the following actions:

# Mount point
mkdir -p /mnt/cephfs

# Create a file with the administrator key
ceph auth get-key client.admin >/etc/ceph/secret.key

# Add an entry to /etc/fstab
# !! Change the IP address to the address of our node
echo "172.18.8.6:6789:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/secret.key,noatime,_netdev    0       2" >> /etc/fstab

mount /mnt/cephfs
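
Now the file system contents are visible. With a typical ceph-csi setup, dynamically provisioned subvolumes usually end up under a volumes/ directory (the exact layout depends on the driver configuration, so treat this as an assumption):

df -h /mnt/cephfs
ls /mnt/cephfs/volumes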

Of course, mounting the file system like this on a Ceph node is only suitable for learning purposes, which is what we do in our Slurm courses. I don't think anyone would do this in production: there is too big a risk of accidentally wiping out important files.

Finally, let's check how volume resizing works in the case of CephFS. Return to Kubernetes and edit the PVC manifest, increasing the size, for example, to 7Gi.

Apply the edited file:

kubectl apply -f pvc.yaml

Let's see how the quota has changed on the mounted directory:

getfattr -n ceph.quota.max_bytes <data-directory>

For this command to work, you may need to install the attr package.
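
For example, on Debian/Ubuntu-based nodes (adjust for your distribution):

apt-get install -y attr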

The eyes fear, but the hands do

On the surface, all these incantations and long YAML manifests may seem complicated, but in practice Slurm students get the hang of them quite quickly.
In this article we did not go deep into the weeds; there is official documentation for that. If you are interested in the details of setting up Ceph storage with a Kubernetes cluster, these links will help:

General principles of Kubernetes with volumes
RBD Documentation
RBD and Kubernetes integration from Ceph point of view
RBD and Kubernetes integration from a CSI point of view
General CephFS Documentation
CephFS and Kubernetes integration from a CSI point of view

In the Slurm Kubernetes Base course, you can go a step further and deploy a real application in Kubernetes that will use CephFS as file storage. Through GET/POST requests, you will be able to send files to Ceph and receive them back.

And if you are more interested in data storage, sign up for the new Ceph course. While the beta test is running, you can get the course at a discount and influence its content.

Article author: Alexander Shvalov, practicing engineer at Southbridge, Certified Kubernetes Administrator, author and developer of Slurm courses.

Source: habr.com