
Update: in the comments, one of the readers suggested trying Linstor (perhaps he is working on it himself), so I have added a section about that solution. I will also write about its installation separately, because the process is very different from the rest.
To be honest, I have given up (at least for now). I will use Heroku. Why? Because of storage! Who would have thought that I would spend more time messing with storage than with Kubernetes itself. I use Hetzner Cloud, because it is inexpensive and the performance is good, and from the very beginning I deployed my clusters with Rancher. I haven't tried the managed Kubernetes services from Google/Amazon/Microsoft/DigitalOcean and so on, because I wanted to learn everything myself. I'm also frugal.
So, yes: I spent a lot of time deciding which storage to choose when I was planning a possible stack on Kubernetes. I prefer open source solutions, and not only because of the price, but I also looked at a couple of paid options out of curiosity, since they have free versions with restrictions. I jotted down a few numbers from recent benchmarks while comparing the different options, and they may be of interest to those studying storage in Kubernetes, even though I personally have said goodbye to Kubernetes for now. I also want to mention the Hetzner Cloud CSI driver, with which you can provision Hetzner Cloud volumes directly, but I have not tried it yet. I was looking at cloud software-defined storage because I needed replication and the ability to quickly mount persistent volumes on any node, especially in case of node failures and similar situations. Some solutions also offer point-in-time snapshots and off-site backups, which is handy.
I tested 6-7 storage solutions:

OpenEBS
Rook + Ceph
Rancher Longhorn
StorageOS
Robin
Portworx
Linstor (added after publication)
OpenEBS

As I already said, having tested most of the options from the list, I initially settled on OpenEBS. OpenEBS is very easy to install and use, but to be honest, after testing with real data under load, its performance disappointed me. It is open source, and the developers themselves were always very helpful when I needed support. Unfortunately, its performance is very poor compared to the other options, so I had to re-run the tests. OpenEBS currently has 3 storage engines, but here I'm posting benchmark results for cStor. I don't have numbers for Jiva and LocalPV yet.
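To give an idea of how cStor is consumed, a StorageClass for it looks roughly like this (a sketch based on the OpenEBS 1.x conventions; the pool claim name here is a placeholder for a pool you have already created):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-cstor
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk-pool"   # placeholder: pre-created cStor pool claim
      - name: ReplicaCount
        value: "3"                 # number of volume replicas
provisioner: openebs.io/provisioner-iscsi

A PVC that references this class gets a cStor volume with three replicas.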
In a nutshell: Jiva is slightly faster, and LocalPV is downright fast, no worse than benchmarking the drive directly. The problem with LocalPV is that the volume can only be accessed on the node where it was provisioned, and there is no replication at all. I had some problems restoring a backup with Velero onto a new cluster because the node names were different. Speaking of backups: cStor has a plugin for Velero, with which you can make off-site point-in-time snapshot backups, which is more convenient than file-level backups with Velero-Restic. I wrote a tool to make it easier to manage backups and restores with this plugin. Overall, I really like OpenEBS, but its performance...
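The plugin is wired up through a Velero VolumeSnapshotLocation, roughly like this (the bucket, region, and names are placeholders):

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: cstor-snapshots
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: velero-backups    # placeholder: bucket for off-site snapshots
    prefix: cstor
    provider: aws             # object-store provider (aws or gcp)
    region: us-east-1         # placeholder region

Backups taken against this snapshot location store point-in-time cStor snapshots in the bucket instead of file-level copies.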
Rook + Ceph

Rook is also open source, and it differs from the rest of the options on the list in that it is a storage orchestrator: it performs the complex storage-management tasks for different backends, for example Ceph and EdgeFS, among others, which greatly simplifies the work. I had problems with EdgeFS when I tried it a few months ago, so I tested mainly with Ceph. Ceph offers not only block storage, but also object storage compatible with S3/Swift, and a distributed file system. What I like about Ceph is the ability to spread a volume's data across multiple disks, so that a volume can use more disk space than fits on a single disk. That's convenient. Another cool feature is that when you add disks to the cluster, it automatically redistributes data across all of them.
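In Rook, that replication across disks and nodes is declared in a CephBlockPool, which pods then consume through a StorageClass. A trimmed sketch (a real setup also needs the CSI secret parameters, omitted here):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # place replicas on different nodes
  replicated:
    size: 3             # keep three copies of every object
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4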
Ceph has snapshots, but as far as I know they can't be used directly in Rook/Kubernetes. Admittedly, I didn't dig into it. There are no off-site backups either, so you have to use something like Velero with Restic, but those are only file-level backups, not point-in-time snapshots. What I really like about Rook, though, is how easy it makes working with Ceph: it hides almost all the complexity and offers tools for talking directly to Ceph for troubleshooting. Unfortunately, when stress testing Ceph volumes I kept running into memory-related problems that make Ceph unstable. It is not yet clear whether this is a bug in Ceph itself or a problem in how Rook manages Ceph. I fiddled with the memory settings, and it got better, but the problem was never fully resolved. Ceph has good performance, as seen in the benchmarks below. It also has a good dashboard.
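With Velero-Restic, those file-level backups are opt-in per pod: you annotate the pod with the names of the volumes Restic should copy, roughly like this (the pod, image, and claim names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    backup.velero.io/backup-volumes: data   # volumes to include in Restic backups
spec:
  containers:
  - name: app
    image: nginx   # placeholder workload
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data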
Rancher Longhorn

I really like Longhorn. I think it is a promising solution. True, the developers themselves (Rancher Labs) admit that it is not yet suitable for production, and it shows. It's open source and has decent performance (although they haven't optimized it yet), but volumes take a very long time to attach to a pod - in the worst cases 15-16 minutes, especially after restoring a large backup or upgrading a workload. It has snapshots and off-site backups of those snapshots, but they only cover volumes, so you still need something like Velero to back up the rest of your resources. Backups and restores are very reliable, but indecently slow. Seriously, just prohibitively slow. CPU usage and system load often spike when working with even an average amount of data in Longhorn. There is a handy dashboard for managing Longhorn. As I said, I like Longhorn, but it still needs serious work.
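For what it's worth, replication in Longhorn is configured per StorageClass; a minimal sketch (the provisioner name depends on the Longhorn version - older Flexvolume-based installs use rancher.io/longhorn instead):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"       # copies kept on different nodes
  staleReplicaTimeout: "30"   # minutes before a failed replica is rebuilt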
StorageOS

StorageOS is the first paid product on the list. It has a developer version limited to 500 GB of managed storage, but I don't think there is a limit on the number of nodes. The sales department told me the cost starts at $125 per month for 1 TB, if I remember correctly. There is a basic dashboard and a handy CLI, but something strange is going on with performance: in some benchmarks it is quite decent, but in the volume stress test I did not like the speed at all. In general, I don't know what to make of it. There are no off-site backups here either, so you will again have to use Velero with Restic to back up volumes, which is odd for a paid product. And the developers were not eager to communicate in Slack.
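For completeness, dynamic provisioning with StorageOS goes through the in-tree provisioner; a sketch (the pool and secret names are placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storageos-fast
provisioner: kubernetes.io/storageos
parameters:
  pool: default                    # StorageOS pool to provision from
  fsType: ext4
  adminSecretNamespace: default    # placeholder: namespace of the API secret
  adminSecretName: storageos-api   # placeholder: secret with API credentials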
Robin

I learned about Robin on Reddit from their CTO. I had never heard of it before, maybe because I was looking for free solutions and Robin is a paid product. They do have a pretty generous free version with 10 TB of storage and three nodes. Overall, the product is quite decent, with nice features. There is a great CLI, but the coolest thing is that you can snapshot and back up an entire application (in the resource selector these are called Helm releases or "flex apps"), including volumes and other resources, so you can do without Velero.

And everything would be wonderful if not for one small detail: if you restore (or "import", as Robin calls it) an application onto a new cluster - for example, in the event of disaster recovery - the restore, of course, works, but you cannot continue backing up the application afterwards. In this release it is simply not possible, and the developers have confirmed it. This is strange, to say the least, especially considering the other advantages (for example, incredibly fast backups and restores). The developers promise to fix everything by the next release.

The performance is generally good, but I noticed a strange thing: if you run the benchmark directly on a volume attached to the host, the read speed is much higher than on the same volume from inside a pod. All other results are identical, but in theory there should be no difference. They are working on it, but I was frustrated by the restore-and-backup problem: it seemed to me that I had finally found a suitable solution, and I was even ready to pay for it when I needed more space or more servers.
Portworx

I don't have much to say here. It's a paid product, equally cool and expensive. The performance is simply amazing - the best in this comparison. In Slack I was told that prices start at $205 per month per node, as listed on Google's GKE Marketplace. I don't know if it's cheaper if you buy directly. In any case, I can't afford it, so I was very, very disappointed that the developer license (up to 1 TB and 3 nodes) is practically useless with Kubernetes unless you are content with static provisioning. I was hoping the volume license would automatically downgrade to a developer license at the end of the trial period, but that didn't happen. The developer license can only be used directly with Docker, and the setup in Kubernetes is very cumbersome and limited. Of course, I prefer open source, but if I had the money, I would definitely choose Portworx. So far, its performance is simply in a different league from the other options.
Linstor

I added this section after the post was published, when a reader suggested trying Linstor. I tried it, and I liked it! But there is still digging to do. Now I can say that the performance is not bad (benchmark results are added below). In fact, I got the same performance as the disk directly, with no overhead at all. (Don't ask why Portworx's numbers are better than a benchmark of the drive itself. I have no idea. Magic, I guess.) So far, Linstor seems very efficient.

Installing it is not that difficult, but not as easy as the other options. I first had to install Linstor (the kernel module and the tools/services) and set up LVM for thin provisioning and snapshot support outside of Kubernetes, directly on the hosts, and then create the resources needed to use the storage from Kubernetes. I didn't like that it didn't work on CentOS and I had to use Ubuntu. Not terrible, of course, but a little annoying, because the (otherwise excellent) documentation mentions several packages that cannot be found in the specified Epel repositories.

Linstor has snapshots, but no off-site backups, so here again I had to use Velero with Restic to back up the volumes. I would prefer snapshots to file-level backups, but this can be tolerated if the solution is both performant and reliable. Linstor is open source but has paid support. If I understand correctly, it can be used without restrictions even if you don't have a support contract, but this needs to be clarified. I don't know how well Linstor is tested with Kubernetes, but the storage layer itself sits outside of Kubernetes, and the solution clearly didn't appear yesterday, so it has probably been proven in real conditions.

Is there a solution here that will make me change my mind and return to Kubernetes? I don't know. I still need to dig deeper and study replication. We'll see. But the first impression is good. I would definitely prefer to use my own Kubernetes clusters instead of Heroku, to have more freedom and learn new things. Since Linstor is not as easy to install as the others, I will write a separate post about it soon.
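Once the host side is ready, Kubernetes consumes Linstor through its CSI driver with a StorageClass along these lines (a sketch: parameter names have changed between linstor-csi versions, and the pool name is a placeholder for whatever LVM thin pool you created on the hosts):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-thin
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"          # number of replicas Linstor places automatically
  storagePool: lvm-thin   # placeholder: LVM thin pool set up on the hosts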
Benchmarks
Unfortunately, I kept few records of the comparison, because I did not think I would be writing about it. I only have the fio baseline benchmark results, and only for single-node clusters, so I don't have numbers for replicated configurations yet. But these results give a rough idea of what to expect from each option, because I compared them on identical cloud servers: 4 cores, 16 GB of RAM, with an additional 100 GB disk for the tested volumes. I ran the benchmarks three times for each solution and averaged the results, resetting the server settings for each product. All this is completely unscientific, just to give you the general picture. In other tests, I copied 38 GB of photos and videos to and from the volume to test reading and writing, but, alas, I did not save the numbers. In short: Portworx was much faster.
For the volume benchmark, I used this manifest:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench
spec:
  storageClassName: ...
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
      - name: dbench
        image: sotoaster/dbench:latest
        imagePullPolicy: IfNotPresent
        env:
          - name: DBENCH_MOUNTPOINT
            value: /data
          - name: FIO_SIZE
            value: 1G
        volumeMounts:
        - name: dbench-pv
          mountPath: /data
      restartPolicy: Never
      volumes:
      - name: dbench-pv
        persistentVolumeClaim:
          claimName: dbench
  backoffLimit: 4

I first created a volume with the appropriate storage class, and then ran the job, which runs fio behind the scenes. I used 1 GB to estimate the performance without waiting too long. Here are the results:
I've highlighted the best value for each metric in green and the worst in red.
Conclusion
As you can see, in most cases Portworx performed better than the others. But for me it is too expensive. I don't know how much Robin costs, but its free version is very generous, so if you need a paid product, you can try it (I hope they fix the problem with restores and backups soon). Of the three free ones, I had the fewest problems with OpenEBS, but its performance is abysmal. I'm sorry I didn't save more results, but I hope the numbers and my comments help you.
Source: habr.com
