Rook or not Rook, that is the question

At the beginning of this month, on May 3, a major release of the "management system for distributed data storage in Kubernetes" was announced: Rook 1.0.0. Over a year ago we published a general overview of Rook. Back then we were asked to tell about our experience of using it in practice, and now, just in time for such a significant milestone in the project's history, we are happy to share our impressions.

In short, Rook is a set of operators for Kubernetes that take full control of the deployment, management, and automatic recovery of storage solutions such as Ceph, EdgeFS, Minio, Cassandra, and CockroachDB.

Currently the most developed solution (and the only one in the stable stage) is rook-ceph-operator.

Note: Among the significant Ceph-related changes in the Rook 1.0.0 release, we can note support for Ceph Nautilus and the ability to use NFS for CephFS or RGW buckets. Another highlight is the "maturation" of EdgeFS support to the beta level.

So, in this article we:

  • answer the question of what advantages we see in using Rook to deploy Ceph in a Kubernetes cluster;
  • share our experience and impressions from using Rook in production;
  • tell you why we say "Yes!" to Rook and about our plans for it.

Let's start with general concepts and theory.

"I have one Rook advantage!" (unknown chess player)

One of the main advantages of Rook is that interaction with data stores is carried out through Kubernetes mechanisms. This means that you no longer need to copy Ceph configuration commands from a piece of paper into the console.

- Do you want to deploy CephFS in a cluster? Just write a YAML file!
- What? Do you want to deploy an object store with S3 API as well? Just write a second YAML file!
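
For illustration, here is what such a file might look like for CephFS: a minimal sketch of a CephFilesystem resource (the name myfs and the pool sizes are ours and purely illustrative; see the Rook documentation for the full set of fields):

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: kube-rook
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
  - replicated:
      size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true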

Rook was created according to all the rules of a typical operator. It interacts with CRDs (Custom Resource Definitions), in which we describe the characteristics of the Ceph entities we need (since this is the only stable implementation, the rest of the article talks about Ceph by default, unless otherwise specified). Based on the given parameters, the operator automatically executes the commands needed for the setup.

Let's look at the specifics using the example of creating an Object Store, or rather, a CephObjectStoreUser.

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: {{ .Values.s3.crdName }}
  namespace: kube-rook
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    erasureCoded:
      dataChunks: 2
      codingChunks: 1
  gateway:
    type: s3
    sslCertificateRef:
    port: 80
    securePort:
    instances: 1
    allNodes: false
---
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: {{ .Values.s3.crdName }}
  namespace: kube-rook
spec:
  store: {{ .Values.s3.crdName }}
  displayName: {{ .Values.s3.username }}

The parameters given in the listing are fairly standard and hardly need comments, but it is worth paying special attention to the ones rendered from template variables.

The general scheme of work boils down to the following: through a YAML file we "order" resources, the operator executes the commands needed for them and returns us a real Kubernetes secret that we can continue to work with (see below). The variables shown above are used to compose both the command and the name of that secret.

What is this command? When creating a user for the object storage, the Rook operator inside the pod does the following:

radosgw-admin user create --uid="rook-user" --display-name="{{ .Values.s3.username }}"

The result of this command will be a JSON structure:

{
    "user_id": "rook-user",
    "display_name": "{{ .Values.s3.username }}",
    "keys": [
        {
           "user": "rook-user",
           "access_key": "NRWGT19TWMYOB1YDBV1Y",
           "secret_key": "gr1VEGIV7rxcP3xvXDFCo4UDwwl2YoNrmtRlIAty"
        }
    ],
    ...
}

The keys are what applications will need in the future to access the object storage via the S3 API. The Rook operator kindly collects them and puts them into its namespace as a secret named rook-ceph-object-user-{{ $.Values.s3.crdName }}-{{ $.Values.s3.username }}.
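
To check what ended up in this secret (the AccessKey and SecretKey fields that the Job below consumes), you can read it directly; a small sketch, with the secret name shown for hypothetical rendered values my-store and my-user:

$ kubectl -n kube-rook get secret rook-ceph-object-user-my-store-my-user \
    -o jsonpath='{.data.AccessKey}' | base64 -d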

To use the data from this secret, it is enough to add it to the container as environment variables. As an example, here is a template for a Job that automatically creates buckets for each user environment:

{{- range $bucket := $.Values.s3.bucketNames }}
apiVersion: batch/v1
kind: Job
metadata:
  name: create-{{ $bucket }}-bucket-job
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "2"
spec:
  template:
    metadata:
      name: create-{{ $bucket }}-bucket-job
    spec:
      restartPolicy: Never
      initContainers:
      - name: waitdns
        image: alpine:3.6
        command: ["/bin/sh", "-c", "while ! getent ahostsv4 rook-ceph-rgw-{{ $.Values.s3.crdName }}; do sleep 1; done" ]
      - name: config
        image: rook/ceph:v1.0.0
        command: ["/bin/sh", "-c"]
        args: ["s3cmd --configure --access_key=$(ACCESS-KEY) --secret_key=$(SECRET-KEY) -s --no-ssl --dump-config | tee /config/.s3cfg"]
        volumeMounts:
        - name: config
          mountPath: /config
        env:
        - name: ACCESS-KEY
          valueFrom:
            secretKeyRef:
              name: rook-ceph-object-user-{{ $.Values.s3.crdName }}-{{ $.Values.s3.username }}
              key: AccessKey
        - name: SECRET-KEY
          valueFrom:
            secretKeyRef:
              name: rook-ceph-object-user-{{ $.Values.s3.crdName }}-{{ $.Values.s3.username }}
              key: SecretKey
      containers:
      - name: create-bucket
        image: rook/ceph:v1.0.0
        command: 
        - "s3cmd"
        - "mb"
        - "--host=rook-ceph-rgw-{{ $.Values.s3.crdName }}"
        - "--host-bucket= "
        - "s3://{{ $bucket }}"
        ports:
        - name: s3-no-ssl
          containerPort: 80
        volumeMounts:
        - name: config
          mountPath: /root
      volumes:
      - name: config
        emptyDir: {}
---
{{- end }}

Everything listed in this Job was done without stepping outside of Kubernetes. The structures described in the YAML files are put into a Git repository and reused many times. We see this as a huge plus for DevOps engineers and for the CI/CD process as a whole.

With Rook and Rados in joy

Using the Ceph + RBD bundle imposes certain restrictions on mounting volumes to pods.

In particular, the namespace must contain a secret for accessing Ceph so that stateful applications can function. That is fine if you have 2-3 environments, each in its own namespace: you can go and copy the secret by hand. But what if a separate environment with its own namespace is created for each developer feature?

We solved this problem with shell-operator, which automatically copies the secrets into new namespaces (an example of such a hook is described in this article).

#! /bin/bash

if [[ $1 == "--config" ]]; then
   cat <<EOF
{"onKubernetesEvent":[
 {"name": "OnNewNamespace",
  "kind": "namespace",
  "event": ["add"]
  }
]}
EOF
else
    NAMESPACE=$(kubectl get namespace -o json | jq '.items | max_by( .metadata.creationTimestamp ) | .metadata.name')
    kubectl -n ${CEPH_SECRET_NAMESPACE} get secret ${CEPH_SECRET_NAME} -o json | jq ".metadata.namespace="${NAMESPACE}"" | kubectl apply -f -
fi

However, when using Rook this problem simply does not exist. The mounting process uses Rook's own drivers based on Flexvolume or CSI (the latter is still in beta) and therefore does not require secrets.
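
For example, with the Flexvolume-based driver a block volume is requested through an ordinary StorageClass and PVC. A minimal sketch, assuming a pool named replicapool, the kube-rook namespace, and the ceph.rook.io/block provisioner used by Rook's Flexvolume driver (check the Rook documentation for your version):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: kube-rook
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  clusterNamespace: kube-rook
  fstype: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: rook-ceph-block
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi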

Rook automatically solves many problems, which is what encourages us to use it in new projects.

Siege of Rook

Let's finish the practical part by deploying Rook and Ceph so that you can conduct your own experiments. To make it easier to storm this impregnable tower, the developers have prepared a Helm chart. Let's download it:
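
If the rook-master chart repository has not been added yet, register it first (the repository name and URL here follow the Rook project's chart hosting at the time of writing; double-check them against the current documentation):

$ helm repo add rook-master https://charts.rook.io/master
$ helm repo update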

$ helm fetch rook-master/rook-ceph --untar --version 1.0.0

In the file rook-ceph/values.yaml you can find many different settings. The most important thing is to specify tolerations for the agent and discover pods. We described in detail what the taints/tolerations mechanism can be used for in this article.

In short, we do not want the client application pods to land on the same nodes as the storage disks. The reason is simple: this way the work of the Rook agents will not affect the application itself.

So, let's open rook-ceph/values.yaml in your favorite editor and add the following block to the end:

discover:
  toleration: NoExecute
  tolerationKey: node-role/storage
agent:
  toleration: NoExecute
  tolerationKey: node-role/storage
  mountSecurityMode: Any

For each node reserved for data storage, add the corresponding taint:

$ kubectl taint node ${NODE_NAME} node-role/storage="":NoExecute

Then install the Helm-chart with the command:

$ helm install --namespace ${ROOK_NAMESPACE} ./rook-ceph
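
Before moving on, it is worth making sure that the operator itself, as well as the rook-ceph-agent and rook-discover pods, are up and running:

$ kubectl -n ${ROOK_NAMESPACE} get pods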

Now you need to create a cluster and specify the location of the OSDs:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  clusterName: "ceph"
  finalizers:
  - cephcluster.ceph.rook.io
  generation: 1
  name: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13
  dashboard:
    enabled: true
  dataDirHostPath: /var/lib/rook/osd
  mon:
    allowMultiplePerNode: false
    count: 3
  network:
    hostNetwork: true
  rbdMirroring:
    workers: 1
  placement:
    all:
      tolerations:
      - key: node-role/storage
        operator: Exists
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "1"
      storeType: filestore
    resources:
      limits:
        memory: "1024Mi"
      requests:
        memory: "1024Mi"
    nodes:
    - name: host-1
      directories:
      - path: "/mnt/osd"
    - name: host-2
      directories:
      - path: "/mnt/osd"
    - name: host-3
      directories:
      - path: "/mnt/osd"

Checking the status of Ceph - we expect to see HEALTH_OK:

$ kubectl -n ${ROOK_NAMESPACE} exec $(kubectl -n ${ROOK_NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}') -- ceph -s

At the same time, let's check that the pods of the client application do not end up on the nodes reserved for Ceph:

$ kubectl -n ${APPLICATION_NAMESPACE} get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

Further optional components can be configured as desired; more details about them are in the documentation. For administration, we strongly recommend installing the dashboard and the toolbox.
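
As a sketch of day-to-day access (the rook-ceph-mgr-dashboard service and the app=rook-ceph-tools label correspond to the standard Rook manifests; the dashboard port and SSL settings depend on the Ceph version, and the toolbox is deployed separately from the example manifest in the Rook repository), the dashboard can be reached via port-forward, and the toolbox gives a ready-made Ceph CLI:

$ kubectl -n ${ROOK_NAMESPACE} port-forward svc/rook-ceph-mgr-dashboard 8443:8443
$ kubectl -n ${ROOK_NAMESPACE} exec -it $(kubectl -n ${ROOK_NAMESPACE} get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- ceph osd df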

Rook and hooks: is Rook enough for everything?

As you can see, the development of Rook is in full swing. But there are still problems that do not allow us to completely abandon the manual configuration of Ceph:

  • None of Rook's drivers can export metrics on the usage of mounted volumes, which deprives us of monitoring.
  • Flexvolume and CSI do not know how to resize volumes (unlike plain RBD), so Rook loses a useful (and sometimes critically needed!) tool.
  • Rook is still not as flexible as Ceph itself. If we want the pool for CephFS metadata to live on SSDs and the data itself on HDDs, we will have to register separate device groups in the CRUSH map manually (see the sketch after this list).
  • Although rook-ceph-operator is considered stable, there are currently some issues when upgrading Ceph from version 13 to 14.
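
For reference, this kind of split is currently done directly via the Ceph CLI; a sketch, assuming the OSDs already carry the ssd/hdd device classes and a file system named myfs created by Rook (pool names follow Rook's <fsName>-metadata / <fsName>-data0 convention):

$ ceph osd crush rule create-replicated fast-ssd default host ssd
$ ceph osd crush rule create-replicated slow-hdd default host hdd
$ ceph osd pool set myfs-metadata crush_rule fast-ssd
$ ceph osd pool set myfs-data0 crush_rule slow-hdd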

Conclusions

"Now the Rook is closed off from the outside world by pawns, but we believe that one day it will play a decisive role in the game!" (quote made specifically for this article)

The Rook project has undoubtedly won our hearts - we believe that [with all its pluses and minuses] it definitely deserves your attention.

Our future plans are to make rook-ceph a module for addon-operator, which will make its use in our numerous Kubernetes clusters even simpler and more convenient.

PS

Read also on our blog:
