
Update: in the comments, one of the readers suggested trying Linstor (perhaps he is working on it himself), so I have added a section about that solution. I will also write about its installation separately, because the process is very different from the rest.
To be honest, I have given up (at least for now). I will use Heroku. Why? Because of storage! Who would have thought that I would spend more time messing with storage than with Kubernetes itself. I use Hetzner Cloud, because it is inexpensive and the performance is good, and from the very beginning I deployed my clusters with Rancher. I haven't tried the managed Kubernetes services from Google/Amazon/Microsoft/DigitalOcean and so on, because I wanted to learn everything myself. I'm also frugal.
So, yes: I spent a lot of time deciding which storage to choose when I was planning a possible stack on Kubernetes. I prefer open source solutions, and not only because of the price, but I also looked at a couple of paid options out of curiosity, since they have free versions with restrictions. I jotted down a few numbers from recent benchmarks while comparing the different options, and they may be of interest to those studying storage in Kubernetes, even though I personally have said goodbye to Kubernetes for now. I also want to mention the Hetzner Cloud CSI driver, with which you can provision Hetzner Cloud volumes directly, but I have not tried it yet. I was looking at cloud software-defined storage because I needed replication and the ability to quickly mount persistent volumes on any node, especially in case of node failures and similar situations. Some solutions also offer point-in-time snapshots and off-site backups, which is handy.
I tested 6-7 storage solutions:

OpenEBS
Rook + Ceph
Rancher Longhorn
StorageOS
Robin
Portworx
Linstor (added after publication)
OpenEBS

As I already said, having tested most of the options from the list, I initially settled on OpenEBS. OpenEBS is very easy to install and use, but to be honest, after testing with real data under load, its performance disappointed me. It is open source, and the developers themselves were always very helpful when I needed support. Unfortunately, its performance is very poor compared to the other options, so I had to re-run the tests. OpenEBS currently has 3 storage engines, but here I'm posting benchmark results for cStor. I don't have numbers for Jiva and LocalPV yet.
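To give an idea of how cStor is consumed, a StorageClass for it looks roughly like this (a sketch based on the OpenEBS 1.x conventions; the pool claim name here is a placeholder for a pool you have already created):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-cstor
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk-pool"   # placeholder: pre-created cStor pool claim
      - name: ReplicaCount
        value: "3"                 # number of volume replicas
provisioner: openebs.io/provisioner-iscsi

A PVC that references this class gets a cStor volume with three replicas.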
In a nutshell: Jiva is slightly faster, and LocalPV is downright fast, no worse than benchmarking the drive directly. The problem with LocalPV is that the volume can only be accessed on the node where it was provisioned, and there is no replication at all. I had some problems restoring a backup with Velero onto a new cluster because the node names were different. Speaking of backups: cStor has a plugin for Velero, with which you can make off-site point-in-time snapshot backups, which is more convenient than file-level backups with Velero-Restic. I wrote a tool to make it easier to manage backups and restores with this plugin. Overall, I really like OpenEBS, but its performance...
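The plugin is wired up through a Velero VolumeSnapshotLocation, roughly like this (the bucket, region, and names are placeholders):

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: cstor-snapshots
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: velero-backups    # placeholder: bucket for off-site snapshots
    prefix: cstor
    provider: aws             # object-store provider (aws or gcp)
    region: us-east-1         # placeholder region

Backups taken against this snapshot location store point-in-time cStor snapshots in the bucket instead of file-level copies.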
Rook + Ceph

Rook is also open source, and it differs from the rest of the options on the list in that it is a storage orchestrator: it performs the complex storage-management tasks for different backends, for example Ceph and EdgeFS, among others, which greatly simplifies the work. I had problems with EdgeFS when I tried it a few months ago, so I tested mainly with Ceph. Ceph offers not only block storage, but also object storage compatible with S3/Swift, and a distributed file system. What I like about Ceph is the ability to spread a volume's data across multiple disks, so that a volume can use more disk space than fits on a single disk. That's convenient. Another cool feature is that when you add disks to the cluster, it automatically redistributes data across all of them.
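In Rook, that replication across disks and nodes is declared in a CephBlockPool, which pods then consume through a StorageClass. A trimmed sketch (a real setup also needs the CSI secret parameters, omitted here):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # place replicas on different nodes
  replicated:
    size: 3             # keep three copies of every object
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4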
Ceph has snapshots, but as far as I know they can't be used directly in Rook/Kubernetes. Admittedly, I didn't dig into it. There are no off-site backups either, so you have to use something like Velero with Restic, but those are only file-level backups, not point-in-time snapshots. What I really like about Rook, though, is how easy it makes working with Ceph: it hides almost all the complexity and offers tools for talking directly to Ceph for troubleshooting. Unfortunately, when stress testing Ceph volumes I kept running into memory-related problems that make Ceph unstable. It is not yet clear whether this is a bug in Ceph itself or a problem in how Rook manages Ceph. I fiddled with the memory settings, and it got better, but the problem was never fully resolved. Ceph has good performance, as seen in the benchmarks below. It also has a good dashboard.
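With Velero-Restic, those file-level backups are opt-in per pod: you annotate the pod with the names of the volumes Restic should copy, roughly like this (the pod, image, and claim names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    backup.velero.io/backup-volumes: data   # volumes to include in Restic backups
spec:
  containers:
  - name: app
    image: nginx   # placeholder workload
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data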
Rancher Longhorn

I really like Longhorn. I think it is a promising solution. True, the developers themselves (Rancher Labs) admit that it is not yet suitable for production, and it shows. It's open source and has decent performance (although they haven't optimized it yet), but volumes take a very long time to attach to a pod - in the worst cases 15-16 minutes, especially after restoring a large backup or upgrading a workload. It has snapshots and off-site backups of those snapshots, but they only cover volumes, so you still need something like Velero to back up the rest of your resources. Backups and restores are very reliable, but indecently slow. Seriously, just prohibitively slow. CPU usage and system load often spike when working with even an average amount of data in Longhorn. There is a handy dashboard for managing Longhorn. As I said, I like Longhorn, but it still needs serious work.
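For what it's worth, replication in Longhorn is configured per StorageClass; a minimal sketch (the provisioner name depends on the Longhorn version - older Flexvolume-based installs use rancher.io/longhorn instead):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"       # copies kept on different nodes
  staleReplicaTimeout: "30"   # minutes before a failed replica is rebuilt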
StorageOS

StorageOS is the first paid product on the list. It has a developer version limited to 500 GB of managed storage, but I don't think there is a limit on the number of nodes. The sales department told me the cost starts at $125 per month for 1 TB, if I remember correctly. There is a basic dashboard and a handy CLI, but something strange is going on with performance: in some benchmarks it is quite decent, but in the volume stress test I did not like the speed at all. In general, I don't know what to make of it. There are no off-site backups here either, so you will again have to use Velero with Restic to back up volumes, which is odd for a paid product. And the developers were not eager to communicate in Slack.
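For completeness, dynamic provisioning with StorageOS goes through the in-tree provisioner; a sketch (the pool and secret names are placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storageos-fast
provisioner: kubernetes.io/storageos
parameters:
  pool: default                    # StorageOS pool to provision from
  fsType: ext4
  adminSecretNamespace: default    # placeholder: namespace of the API secret
  adminSecretName: storageos-api   # placeholder: secret with API credentials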
Robin

I learned about Robin on Reddit from their CTO. I had never heard of it before, maybe because I was looking for free solutions and Robin is a paid product. They do have a pretty generous free version with 10 TB of storage and three nodes. Overall, the product is quite decent, with nice features. There is a great CLI, but the coolest thing is that you can snapshot and back up an entire application (in the resource selector these are called Helm releases or "flex apps"), including volumes and other resources, so you can do without Velero.

And everything would be wonderful if not for one small detail: if you restore (or "import", as Robin calls it) an application onto a new cluster - for example, in the event of disaster recovery - the restore, of course, works, but you cannot continue backing up the application afterwards. In this release it is simply not possible, and the developers have confirmed it. This is strange, to say the least, especially considering the other advantages (for example, incredibly fast backups and restores). The developers promise to fix everything by the next release.

The performance is generally good, but I noticed a strange thing: if you run the benchmark directly on a volume attached to the host, the read speed is much higher than on the same volume from inside a pod. All other results are identical, but in theory there should be no difference. They are working on it, but I was frustrated by the restore-and-backup problem: it seemed to me that I had finally found a suitable solution, and I was even ready to pay for it when I needed more space or more servers.
Portworx

I don't have much to say here. It's a paid product, equally cool and expensive. The performance is simply amazing - the best in this comparison. In Slack I was told that prices start at $205 per month per node, as listed on Google's GKE Marketplace. I don't know if it's cheaper if you buy directly. In any case, I can't afford it, so I was very, very disappointed that the developer license (up to 1 TB and 3 nodes) is practically useless with Kubernetes unless you are content with static provisioning. I was hoping the volume license would automatically downgrade to a developer license at the end of the trial period, but that didn't happen. The developer license can only be used directly with Docker, and the setup in Kubernetes is very cumbersome and limited. Of course, I prefer open source, but if I had the money, I would definitely choose Portworx. So far, its performance is simply in a different league from the other options.
Linstor

I added this section after the post was published, when a reader suggested trying Linstor. I tried it, and I liked it! But there is still digging to do. Now I can say that the performance is not bad (benchmark results are added below). In fact, I got the same performance as the disk directly, with no overhead at all. (Don't ask why Portworx's numbers are better than a benchmark of the drive itself. I have no idea. Magic, I guess.) So far, Linstor seems very efficient.

Installing it is not that difficult, but not as easy as the other options. I first had to install Linstor (the kernel module and the tools/services) and set up LVM for thin provisioning and snapshot support outside of Kubernetes, directly on the hosts, and then create the resources needed to use the storage from Kubernetes. I didn't like that it didn't work on CentOS and I had to use Ubuntu. Not terrible, of course, but a little annoying, because the (otherwise excellent) documentation mentions several packages that cannot be found in the specified Epel repositories.

Linstor has snapshots, but no off-site backups, so here again I had to use Velero with Restic to back up the volumes. I would prefer snapshots to file-level backups, but this can be tolerated if the solution is both performant and reliable. Linstor is open source but has paid support. If I understand correctly, it can be used without restrictions even if you don't have a support contract, but this needs to be clarified. I don't know how well Linstor is tested with Kubernetes, but the storage layer itself sits outside of Kubernetes, and the solution clearly didn't appear yesterday, so it has probably been proven in real conditions.

Is there a solution here that will make me change my mind and return to Kubernetes? I don't know. I still need to dig deeper and study replication. We'll see. But the first impression is good. I would definitely prefer to use my own Kubernetes clusters instead of Heroku, to have more freedom and learn new things. Since Linstor is not as easy to install as the others, I will write a separate post about it soon.
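Once the host side is ready, Kubernetes consumes Linstor through its CSI driver with a StorageClass along these lines (a sketch: parameter names have changed between linstor-csi versions, and the pool name is a placeholder for whatever LVM thin pool you created on the hosts):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-thin
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"          # number of replicas Linstor places automatically
  storagePool: lvm-thin   # placeholder: LVM thin pool set up on the hosts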
Benchmarks
Unfortunately, I kept few records of the comparison, because I did not think I would be writing about it. I only have the fio baseline benchmark results, and only for single-node clusters, so I don't have numbers for replicated configurations yet. But these results give a rough idea of what to expect from each option, because I compared them on identical cloud servers: 4 cores, 16 GB of RAM, with an additional 100 GB disk for the tested volumes. I ran the benchmarks three times for each solution and averaged the results, resetting the server settings for each product. All this is completely unscientific, just to give you the general picture. In other tests, I copied 38 GB of photos and videos to and from the volume to test reading and writing, but, alas, I did not save the numbers. In short: Portworx was much faster.
For the volume benchmark, I used this manifest:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench
spec:
  storageClassName: ...
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
      - name: dbench
        image: sotoaster/dbench:latest
        imagePullPolicy: IfNotPresent
        env:
          - name: DBENCH_MOUNTPOINT
            value: /data
          - name: FIO_SIZE
            value: 1G
        volumeMounts:
        - name: dbench-pv
          mountPath: /data
      restartPolicy: Never
      volumes:
      - name: dbench-pv
        persistentVolumeClaim:
          claimName: dbench
  backoffLimit: 4

I first created a volume with the appropriate storage class, and then ran the job, which runs fio behind the scenes. I used 1 GB to estimate the performance without waiting too long. Here are the results:
I've highlighted the best value for each metric in green and the worst in red.
Conclusion
As you can see, in most cases Portworx performed better than the others. But for me it is too expensive. I don't know how much Robin costs, but its free version is very generous, so if you need a paid product, you can try it (I hope they fix the problem with restores and backups soon). Of the three free ones, I had the fewest problems with OpenEBS, but its performance is abysmal. I'm sorry I didn't save more results, but I hope the numbers and my comments help you.
Source: habr.com
