In this post, I’ll give you my first impressions of OpenEBS: how it works, how to get started with it, and what I like about it. OpenEBS provides storage for stateful applications running on Kubernetes, including dynamic local persistent volumes (similar to the Rancher local path provisioner) and replicated volumes using various “data engines”. Similar to Prometheus, which can be deployed on a Raspberry Pi to monitor the temperature of the beer or yeast cultures in your basement, but can also scale to monitor hundreds of thousands of servers, OpenEBS works for simple projects and quick demos as well as for large clusters with sophisticated storage needs.

OpenEBS supports many different “data engines”, and this can be a bit overwhelming at first. But these data engines are exactly what make OpenEBS so flexible. There are “local PV” engines that typically require little or no configuration and offer good performance, but live on a single node and become unavailable if that node goes down. And there are replicated engines that offer resiliency against node failures. Some of these replicated engines are super easy to set up, but the ones that offer the best performance and features take a little more work. Let’s start with a quick review of all these data engines. The following doesn’t replace the excellent OpenEBS documentation; it’s rather my own way of explaining these concepts.

Local PV data engines

Persistent volumes using one of the “local PV” engines are not replicated across multiple nodes; OpenEBS uses the node’s local storage. Many variations of local PV engines are available. They can use local directories (exposed as hostPath volumes), existing block devices (disks, partitions, or otherwise), raw files, ZFS volumes (which allow advanced features like snapshots and clones), or Linux LVM volumes (in which case OpenEBS works similarly to TopoLVM). The obvious disadvantage of local PV data engines is that a node failure makes the volumes on that node unavailable, and if the node is lost, so is the data that existed on it. However, these engines have excellent performance: since there is no overhead in the data path, read/write performance is the same as if we were using the storage directly, without containers. Another advantage is that the hostPath local PV works out of the box — without any additional configuration required — when installing OpenEBS, just like the Rancher local path provisioner. Super handy when I need a storage class “right now” for a quick test!
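To illustrate how little configuration this takes, here is what a claim against the out-of-the-box hostPath storage class can look like. (The PVC name is mine; `openebs-hostpath` is the storage class created by a default OpenEBS install, but check `kubectl get storageclass` on your cluster.)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: quick-test-pvc        # hypothetical name for this example
spec:
  storageClassName: openebs-hostpath   # created by the default OpenEBS install
  accessModes:
    - ReadWriteOnce            # local PVs are single-node by nature
  resources:
    requests:
      storage: 5Gi
```

Binding a pod to this PVC ends up using a directory on the node’s local disk, with no extra setup.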

Replicated engines

OpenEBS also offers multiple replicated engines: Jiva, cStor, and Mayastor. I’ll be honest, at first I was quite confused: why do we need not one, not two, but three replicated engines? Let’s find out!

Jiva engine

The Jiva engine is the simplest. Its main advantage is that it doesn’t require any additional configuration: like the local hostPath PV engine, Jiva works out of the box when you install OpenEBS, and it gives us replicated storage. With the default settings, each time we provision a Jiva volume, three replica pods are created, with a scheduling placement constraint to ensure they land on different nodes. That way, a node outage won’t take down more than one copy of the volume at a time. The Jiva engine is simple to operate, but lacks the advanced features of the other engines (such as snapshots, clones, or adding capacity on the fly), and the OpenEBS docs state that Jiva is suitable when “capacity requirements are low” (say, below 50 GB). In other words, it’s fantastic for tests, labs, or demos, but maybe not for that giant production database.
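Provisioning a Jiva volume looks just like the hostPath case, only with a different storage class. (The class name below is the one I saw in my install; it varies across OpenEBS versions, so list yours with `kubectl get storageclass` first.)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jiva-demo-pvc          # hypothetical name for this example
spec:
  storageClassName: openebs-jiva-default   # check the exact name in your install
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi            # Jiva is recommended for small volumes
```

With the defaults, this single PVC is what triggers the creation of the three replica pods mentioned above.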

cStor engine

Next on the list is the cStor engine. It brings us the extra features mentioned earlier (snapshots, clones, and adding capacity on the fly), but requires a bit more work to set up. Specifically, you need to enable NDM, the Node Disk Manager component of OpenEBS, and tell it which available block devices to use. This means you should have some free partitions (or even whole disks) to dedicate to cStor. If you don’t have a spare disk or partition available, you may be able to use loop devices; however, since loop devices have a significant performance overhead, you might as well use the Jiva engine in that case, since it achieves similar results and is much easier to configure.
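Here is a sketch of what “telling cStor which block devices to use” looks like in practice: a pool definition referencing the devices that NDM discovered. (The node hostname and block device name below are placeholders; list the real ones with `kubectl get blockdevices -n openebs`, and treat this as an outline rather than a copy-paste recipe — the exact schema depends on your OpenEBS version.)

```yaml
apiVersion: cstor.openebs.io/v1
kind: CStorPoolCluster
metadata:
  name: cstor-pool-cluster     # hypothetical name for this example
  namespace: openebs
spec:
  pools:
    - nodeSelector:
        kubernetes.io/hostname: node-1          # placeholder node name
      dataRaidGroups:
        - blockDevices:
            - blockDeviceName: blockdevice-xxxx  # placeholder; from NDM
      poolConfig:
        dataRaidGroupType: stripe
```

Once a pool cluster like this exists, a cStor storage class can point at it, and volumes get provisioned from the pool.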

Mayastor engine

Finally, there is the Mayastor engine. It is designed to work closely with NVMe (Non-Volatile Memory Express) drives and protocols (it can, however, use non-NVMe drives). I was wondering why this was a big deal, so I did some digging.

In old storage systems, you could only send one command at a time: read this block, or write this block. Then you had to wait until the command completed before you could submit another one. Later, it became possible to submit multiple commands and let the disk reorder them to execute them faster, for example, to reduce the number of head seeks using an elevator algorithm. In the late 90s, the ATA-4 standard introduced TCQ (Tagged Command Queuing) to the ATA specification. This was later greatly improved by NCQ (Native Command Queuing) with SATA drives. SCSI disks had deeper command queues, but they were more expensive and more likely to be found in high-end servers and storage systems.

Over time, queuing systems have evolved a lot. Early standards allowed a few dozen commands in a single queue; now we are talking about thousands of commands in thousands of queues. This makes multi-core systems more efficient, as queues can be bound to specific cores to reduce contention. We can also have priorities between queues, which can ensure fair disk access between them. This is great for virtualized workloads, to make sure one VM doesn’t starve the others. Most importantly, NVMe also reduces the CPU usage associated with disk access, because it’s designed to require less back-and-forth between the operating system and the disk controller.

While there is certainly much more to NVMe, this queuing business alone makes a big difference, and I can see why Mayastor would be relevant to people who want to build the highest-performance storage systems. If you’re having trouble figuring out which engine is best for your needs, you’re not alone, and the OpenEBS documentation has a great page about it.
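To make the reordering idea concrete, here is a toy sketch of an elevator (SCAN) scheduler — my own illustration, not actual driver or OpenEBS code. With a queue of pending requests, the disk can serve everything in one sweep direction, then the rest on the way back, instead of seeking back and forth in arrival order.

```python
# Illustrative sketch of elevator (SCAN) scheduling: the kind of request
# reordering that command queuing makes possible on rotational disks.

def elevator_order(pending, head):
    """Return pending block numbers in SCAN order: one upward sweep
    from the current head position, then a downward return sweep."""
    up = sorted(b for b in pending if b >= head)                   # upward sweep
    down = sorted((b for b in pending if b < head), reverse=True)  # return sweep
    return up + down

# With the head at block 53, eight queued requests are served in two sweeps:
print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# → [65, 67, 98, 122, 124, 183, 37, 14]
```

Served in arrival order, those same requests would bounce the head across the disk eight times; the two-sweep order keeps each move short.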

Container Attached Storage

Another interesting thing in OpenEBS is the concept of CAS, or Container Attached Storage. The wording made me raise an eyebrow at first: is it a marketing ploy? Not exactly. When using the Jiva replicated engine, I noticed that for each Jiva volume, I would get four pods and a service:

- a “controller” pod (with “-ctrl-” in its name)
- three “data replica” pods (with “-rep-” in their names)
- a service that exposes (on different ports) an iSCSI target, a Prometheus metrics endpoint, and an API server

This is interesting because it mimics what you get when you deploy a SAN: multiple disks (the data replica containers) and a controller (to interface between a storage protocol like iSCSI and the disks themselves). These components are implemented by containers and pods, and the storage actually resides in the containers, so the term “container-attached storage” makes a lot of sense. (Note that the storage doesn’t necessarily use copy-on-write container storage; my setup, by default, uses a hostPath volume, but this can be configured.)

I mentioned iSCSI above. I found it reassuring that OpenEBS uses iSCSI with Jiva and cStor, because it’s a stable, proven protocol widely used in the storage industry. It means that OpenEBS doesn’t require a custom kernel module or anything like that. I believe it does, however, require some userland tools to be installed on the nodes. I say “I believe” because on my Ubuntu test nodes, running a fairly bare cloud image, I didn’t need to install or configure anything extra. After this quick tour of OpenEBS, the most important question is: does it fit my needs? I found that its wide range of options means it can handle almost anything I throw at it. For training,…