Rook is a "self-service" data store for Kubernetes

On January 29, the technical committee of the CNCF (Cloud Native Computing Foundation), the organization behind Kubernetes, Prometheus and other open source products from the world of containers and cloud native, announced that the Rook project had been accepted into its ranks. A great opportunity to get to know this “distributed storage orchestrator in Kubernetes” better.

What is Rook?

Rook is software written in Go (distributed under the free Apache License 2.0) designed to give storage systems automated features that make them self-managing, self-scaling and self-healing. To do this, Rook automates the following for storage systems used in a Kubernetes environment: deployment, bootstrapping, configuration, provisioning, scaling, upgrades, migrations, failover, monitoring and resource management.

The project is in the alpha stage and specializes in orchestrating the Ceph distributed storage system in Kubernetes clusters. The authors also announce plans to support other storage systems, but this will not happen in the next releases.

Components and technical design

Rook inside Kubernetes is based on a special operator (we wrote more about Kubernetes Operators in this article), which automates storage configuration and implements storage monitoring.

So, the Rook operator is a container that contains everything necessary for the deployment and subsequent maintenance of the storage. The operator's responsibilities include:

  • creating a DaemonSet for the Ceph storage daemons (ceph-osd) with a simple RADOS cluster;
  • creating pods for the Ceph monitors (ceph-mon), which check the state of the cluster (for quorum, three instances are deployed in most cases, and when any one of them fails, a new one is started);
  • managing CRDs (Custom Resource Definitions) for the cluster itself, storage pools, object stores (sets of resources and services for serving HTTP requests that PUT/GET objects; they are compatible with the S3 and Swift APIs), and file systems;
  • initializing pods to start all the necessary services;
  • creating Rook agents.
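
The monitor handling described in the list above (three instances for quorum, a replacement when one of them fails) can be sketched as a small reconcile step. This is only an illustration in Python under stated assumptions, not Rook's actual Go code: the function names and the rook-ceph-mon<N> naming scheme used for replacement instances are hypothetical.

```python
from typing import Dict, List, Tuple

DESIRED_MONS = 3  # the article mentions three monitors in most cases


def quorum_size(total: int) -> int:
    """Smallest strict majority of `total` monitors."""
    return total // 2 + 1


def has_quorum(healthy: int, total: int = DESIRED_MONS) -> bool:
    """True if enough monitors are healthy to form a majority."""
    return healthy >= quorum_size(total)


def reconcile_mons(
    mons: Dict[str, bool], desired: int = DESIRED_MONS
) -> Tuple[Dict[str, bool], List[str]]:
    """Drop failed monitors and add replacements until `desired` are running.

    `mons` maps a monitor name to its health (True = healthy).
    Returns the new state and the names of the monitors that were created.
    """
    state = {name: ok for name, ok in mons.items() if ok}
    created: List[str] = []
    next_id = 0
    while len(state) < desired:
        name = f"rook-ceph-mon{next_id}"  # hypothetical naming scheme
        if name not in mons:  # never reuse the name of a known monitor
            state[name] = True
            created.append(name)
        next_id += 1
    return state, created


if __name__ == "__main__":
    current = {"rook-ceph-mon0": True, "rook-ceph-mon1": False, "rook-ceph-mon2": True}
    healthy = sum(current.values())
    print("quorum before reconcile:", has_quorum(healthy))
    new_state, created = reconcile_mons(current)
    print("created:", created)
```

With three monitors the majority is two, so the cluster keeps its quorum while a single failed monitor is being replaced.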

Rook agents are separate pods deployed on each Kubernetes node. The agent's purpose is to configure the FlexVolume plugin, which provides support for storage volumes in Kubernetes. The agent implements storage operations: it attaches network storage devices, mounts volumes, formats the file system, etc.

The place and role of Rook components in the general scheme of the Kubernetes cluster

Rook offers three types of storage:

  1. block (Block, StorageClass) - mounts storage to a single pod;
  2. object (Object, ObjectStore) - available inside and outside the Kubernetes cluster (via the S3 API);
  3. shared file system (Shared File System, Filesystem) - a file system that can be mounted read/write from multiple pods.

Internally, a Rook cluster includes:

  • Mons - pods for monitoring Ceph (with the already mentioned ceph-mon);
  • OSDs - pods with ceph-osd daemons (Object Storage Daemons);
  • MGR - pods with the ceph-mgr daemon (Ceph Manager), which provides additional monitoring capabilities and interfaces for external systems (monitoring / management);
  • RGW (optional) - pods with object storage;
  • MDS (optional) - pods with a shared file system.

All Rook daemons (Mons, OSDs, MGR, RGW, MDS) are compiled into a single binary (rook) running in a container.

For a brief introduction to the Rook project, this short (12 slides) presentation by Bassam Tabbara (CTO at Quantum Corp) may also be useful.

Rook operation

The Rook operator fully supports Kubernetes version 1.6 and higher (and, in part, the older K8s release 1.5.2). Its installation in the simplest scenario looks like this:

cd cluster/examples/kubernetes
kubectl create -f rook-operator.yaml
kubectl create -f rook-cluster.yaml

In addition, a Helm chart has been prepared for the Rook operator, so the installation can also be carried out like this:

helm repo add rook-alpha https://charts.rook.io/alpha
helm install rook-alpha/rook

There are a few configuration options (for example, you can disable RBAC support if this feature is not used in your cluster), which are passed to helm install via the --set key=value[,key=value] parameter (or stored in a separate YAML file and passed via -f values.yaml).

After installing the Rook operator and starting the pods with its agents, all that remains is to create the Rook cluster itself. Its simplest configuration looks as follows (rook-cluster.yaml):

apiVersion: v1
kind: Namespace
metadata:
  name: rook
---
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook
spec:
  dataDirHostPath: /var/lib/rook
  storage:
    useAllNodes: true
    useAllDevices: false
    storeConfig:
      storeType: bluestore
      databaseSizeMB: 1024
      journalSizeMB: 1024

Note: pay special attention to the dataDirHostPath attribute: a correct value is required for the cluster to survive reboots. Where this directory serves as a permanent storage location for Rook data on Kubernetes hosts, the authors recommend having at least 5 GB of free disk space in it.
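
The free-space recommendation from the note can be checked on a host before the cluster is created. Below is a minimal Python sketch under stated assumptions: the helper names are hypothetical, 5 GB is interpreted as 5 * 1024^3 bytes, and if the directory does not exist yet, the check is done against its nearest existing parent.

```python
import os
import shutil

# Assumption: "5 GB" from the note is treated as 5 * 1024**3 bytes.
MIN_FREE_BYTES = 5 * 1024**3


def nearest_existing(path: str) -> str:
    """Return `path` or its closest ancestor that exists on this host."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def has_enough_space(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free` bytes free."""
    usage = shutil.disk_usage(nearest_existing(path))
    return usage.free >= min_free


if __name__ == "__main__":
    data_dir = "/var/lib/rook"  # dataDirHostPath from the cluster configuration
    if has_enough_space(data_dir):
        print(f"{data_dir}: enough free space")
    else:
        print(f"{data_dir}: less than 5 GB free")
```

The script has to run on each Kubernetes host that will store Rook data, since dataDirHostPath refers to a directory on the host itself.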

It remains to actually create the cluster from the configuration and make sure that the pods have been created (in the rook namespace):

kubectl create -f rook-cluster.yaml
kubectl -n rook get pod
NAME                              READY     STATUS    RESTARTS   AGE
rook-api-1511082791-7qs0m         1/1       Running   0          5m
rook-ceph-mgr0-1279756402-wc4vt   1/1       Running   0          5m
rook-ceph-mon0-jflt5              1/1       Running   0          6m
rook-ceph-mon1-wkc8p              1/1       Running   0          6m
rook-ceph-mon2-p31dj              1/1       Running   0          6m
rook-ceph-osd-0h6nb               1/1       Running   0          5m
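
Checking such a listing by eye can be automated. The following Python sketch parses the text output of kubectl get pod in the format shown above and reports pods that are not fully up. The function names are hypothetical, and the parser assumes the default column layout (NAME, READY, STATUS first, with a header line).

```python
from typing import List, NamedTuple


class PodStatus(NamedTuple):
    name: str
    ready: int   # containers ready
    total: int   # containers expected
    status: str


def parse_pods(output: str) -> List[PodStatus]:
    """Parse the table printed by `kubectl get pod` (first line is the header)."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    pods = []
    for line in lines[1:]:
        name, ready, status = line.split()[:3]
        ready_n, total_n = ready.split("/")
        pods.append(PodStatus(name, int(ready_n), int(total_n), status))
    return pods


def not_ready(pods: List[PodStatus]) -> List[str]:
    """Names of pods that are not Running or have containers that are not ready."""
    return [p.name for p in pods if p.status != "Running" or p.ready != p.total]


SAMPLE = """\
NAME                              READY     STATUS    RESTARTS   AGE
rook-api-1511082791-7qs0m         1/1       Running   0          5m
rook-ceph-mgr0-1279756402-wc4vt   1/1       Running   0          5m
rook-ceph-mon0-jflt5              1/1       Running   0          6m
rook-ceph-mon1-wkc8p              1/1       Running   0          6m
rook-ceph-mon2-p31dj              1/1       Running   0          6m
rook-ceph-osd-0h6nb               1/1       Running   0          5m
"""

if __name__ == "__main__":
    problems = not_ready(parse_pods(SAMPLE))
    print("all pods are up" if not problems else f"not ready: {problems}")
```

The same check is a reasonable precondition before any maintenance on the cluster, although it only looks at pod state and says nothing about the health of Ceph itself.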

Upgrading a Rook cluster (to a new version) is a procedure that at this stage requires updating all of its components one by one in a specific order, and it should be started only after you have made sure that the current Rook installation is in a completely “healthy” state. Detailed step-by-step instructions, using an upgrade of Rook from version 0.5.0 to 0.5.1 as an example, can be found in the project documentation.

Last November, the Rook blog published a performance comparison with EBS. Its results are noteworthy; in brief, they are as follows:

[Two figures with the results of the performance comparison between Rook and EBS]

Prospects

Rook's current status is alpha, and the latest major release to date is version 0.6, released in November 2017 (the latest patch release, v0.6.2, came out on December 14). More mature versions, beta and stable (officially ready for production use), are expected as early as the first half of 2018.

According to the project roadmap, the developers have a detailed vision for Rook's development for at least the next two releases: 0.7 (its readiness in the GitHub tracker is estimated at 60%) and 0.8. The expected changes include moving Ceph Block and Ceph Object support to beta status, dynamic volume provisioning for CephFS, an advanced logging system, automated cluster upgrades, and volume snapshot support.

Rook's acceptance into the ranks of CNCF projects (so far at the earliest stage, “inception-level”, on a par with linkerd and CoreDNS) is a kind of guarantee of growing interest in the product. How firmly it will take hold in the world of cloud applications will become clearer once stable versions appear, which will certainly bring new “testers” and users to Rook.

Source: habr.com
