Cluster storage for small web clusters based on drbd+ocfs2

What we will talk about:
How to quickly deploy shared storage for two servers based on drbd+ocfs2 solutions.

For whom it will be useful:
The tutorial will be useful for system administrators and anyone who chooses a storage implementation method or wants to try a solution.

What decisions have we abandoned and why?

Often we are faced with a situation where we need to implement shared storage with good read-write performance on a small web cluster. We tried various shared storage implementations for our projects, but few were able to satisfy us in several ways at once. Now we'll tell you why.

  • Glusterfs did not satisfy us in terms of read and write performance: there were problems with simultaneous reads of a large number of files, and CPU load was high. The read problem could be worked around by requesting files directly from the bricks, but that is not always applicable and is generally wrong.

  • With Ceph, we did not like the extra complexity, which can hurt on projects with 2-4 servers, especially when the project has to be maintained afterwards. Again, there are serious performance limitations that force you to build separate storage clusters, just as with glusterfs.

  • Using a single nfs server to implement shared storage raises fault-tolerance concerns.

  • s3 is an excellent and popular solution for a certain range of tasks, but it is not a file system, which narrows its scope.

  • lsyncd. Since we have already started talking about "non-file systems", it is worth covering this popular solution as well. Not only is it unsuitable for two-way exchange (though if you really want to, you can make it work), it also does not behave stably with a large number of files. A nice bonus on top of everything is that it is single-threaded. The reason lies in the program's architecture: it uses inotify watches, which it attaches to the monitored objects at startup and on rescans, and rsync as the transfer mechanism.

Tutorial: How to deploy shared storage based on drbd+ocfs2

One of the most convenient solutions for us turned out to be the ocfs2+drbd bundle. Now we will show you how to quickly deploy shared storage for two servers based on these solutions. But first, a little about the components:

DRBD - a distributed replicated block device included in the mainline Linux kernel, which allows you to replicate data between servers at the block-device level. Its main application is building fault-tolerant storage.

OCFS2 - a file system that allows several systems to share the same storage. It is also shipped with the Linux kernel and consists of a kernel module and userspace tools for working with the FS. OCFS2 can be used not only on top of DRBD, but also on top of iSCSI with multiple connections. In our example, we are using DRBD.

All actions are performed on Ubuntu Server 18.04 in a minimal configuration.
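
The drbd-utils and ocfs2-tools packages from the standard Ubuntu repositories are all we need below; a minimal installation sketch, assuming those package names:

# on both nodes: DRBD userspace tools and the OCFS2 tools
apt update
apt install -y drbd-utils ocfs2-tools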

Step 1. Set up DRBD:

In the file /etc/drbd.d/drbd0.res we describe our virtual block device /dev/drbd0:

resource drbd0 {
    syncer { rate 1000M; }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    startup { become-primary-on both; }
    on drbd1 {
        meta-disk internal;
        device /dev/drbd0;
        disk /dev/vdb1;
        address 10.10.10.192:7789;
    }
    on drbd2 {
        meta-disk internal;
        device /dev/drbd0;
        disk /dev/vdb1;
        address 10.10.10.193:7789;
    }
}

meta-disk internal - store the metadata on the same block device as the data
device /dev/drbd0 - use /dev/drbd0 as the path to the drbd volume.
disk /dev/vdb1 - use /dev/vdb1 as the backing disk
syncer { rate 1000M; } - allow resynchronization to use up to 1000 MB/s, roughly the full bandwidth of our link
allow-two-primaries - an important option that allows writes to be accepted on two primary servers
after-sb-0pri, after-sb-1pri, after-sb-2pri - options responsible for the node's actions when a split-brain is detected. More details can be found in the documentation; a manual recovery sketch is shown right after this list.
become-primary-on both - promotes both nodes to primary.
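
If a split-brain does happen and cannot be resolved automatically, the manual recovery procedure from the DRBD documentation looks roughly like this (a sketch; the node whose changes we discard is the "victim"):

# on the split-brain victim: discard local changes and reconnect
drbdadm disconnect drbd0
drbdadm secondary drbd0
drbdadm connect --discard-my-data drbd0

# on the surviving node (only if it is in the StandAlone state)
drbdadm connect drbd0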

In our case, we have two identical VMs connected by a dedicated virtual network with 10-gigabit bandwidth.

In our example, the network names of the two cluster nodes are drbd1 and drbd2. For everything to work correctly, the host names must resolve to the corresponding IP addresses, for example via /etc/hosts:

10.10.10.192 drbd1
10.10.10.193 drbd2
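
Note that drbdadm matches the "on drbd1 { ... }" sections against the local hostname, so uname -n must return drbd1 and drbd2 respectively. A quick sanity check (a sketch, assuming the config above):

# the hostname must match the corresponding "on" section
uname -n

# verify that the resource definition parses correctly
drbdadm dump drbd0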

Step 2. Set up the nodes:

On both servers, run:

drbdadm create-md drbd0

modprobe drbd
drbdadm up drbd0
cat /proc/drbd

We get the following:

[screenshot: cat /proc/drbd output]

Now you can start the synchronization. On the first node, run:

drbdadm primary --force drbd0

Let's see the status:

cat /proc/drbd

[screenshot: cat /proc/drbd output, synchronization in progress]

Great, the sync has started. We wait for it to finish and see the following picture:

[screenshot: cat /proc/drbd output after synchronization has finished]

Step 3. Make the second node primary as well:

drbdadm primary --force drbd0

We get the following:

[screenshot: cat /proc/drbd output with both nodes primary]

Now we can write to drbd from two servers.
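
Before creating the filesystem, it is worth confirming that both nodes really are primary; a quick check (a sketch):

# should report Primary/Primary on both nodes
drbdadm role drbd0

# the disk state should be UpToDate/UpToDate
cat /proc/drbd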

Step 4. Install and configure ocfs2:

We will use a fairly trivial configuration:

cluster:
     node_count = 2
     name = ocfs2cluster

node:
     number = 1
     cluster = ocfs2cluster
     ip_port = 7777
     ip_address = 10.10.10.192
     name = drbd1

node:
     number = 2
     cluster = ocfs2cluster
     ip_port = 7777
     ip_address = 10.10.10.193
     name = drbd2

This configuration must be written to /etc/ocfs2/cluster.conf on both nodes.

Create FS on drbd0 on any node:

mkfs.ocfs2 -L "testVol" /dev/drbd0

Here we have created a filesystem labeled testVol on drbd0 using the default parameters.

[screenshot: mkfs.ocfs2 output]
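
If the defaults are not what you want, mkfs.ocfs2 also lets you set the block size, cluster size and number of node slots explicitly; for our two-node cluster, something like the following would do (a sketch, the values are just an example):

# 4K blocks, 32K clusters, 2 node slots, same label as above
mkfs.ocfs2 -b 4K -C 32K -N 2 -L "testVol" /dev/drbd0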

In /etc/default/o2cb, the following must be set (to match our cluster configuration):

O2CB_ENABLED=true 
O2CB_BOOTCLUSTER=ocfs2cluster 

and execute on each node:

o2cb register-cluster ocfs2cluster

Then we enable (add to autostart) and start all the units we need:

systemctl enable drbd o2cb ocfs2
systemctl start drbd o2cb ocfs2

Some of these will already be running after the setup steps above.
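
To verify that the cluster stack is actually up, something like this can be used (a sketch; o2cb cluster-status is available in recent ocfs2-tools, adjust to your version):

# the services should be active on both nodes
systemctl status drbd o2cb ocfs2

# the o2cb cluster should be registered and online
o2cb cluster-status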

Step 5. Add mount points to fstab on both nodes:

/dev/drbd0 /media/shared ocfs2 defaults,noauto,heartbeat=local 0 0

The /media/shared directory must be created in advance on both nodes.

Here we use the noauto option, which means the filesystem will not be mounted automatically at boot (I prefer to mount network filesystems via systemd; a unit sketch is shown below), and heartbeat=local, which means the heartbeat service runs on each node. There is also global heartbeat, which is better suited for large clusters.
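
For reference, a systemd mount unit for this share could look roughly like the sketch below (saved as /etc/systemd/system/media-shared.mount; the unit name must match the mount path, and the dependencies on drbd and o2cb are an assumption about our setup):

[Unit]
Description=OCFS2 shared storage on DRBD
# assumption: the drbd and o2cb services must be up before mounting
Requires=drbd.service o2cb.service
After=drbd.service o2cb.service

[Mount]
What=/dev/drbd0
Where=/media/shared
Type=ocfs2
Options=defaults,heartbeat=local

[Install]
WantedBy=multi-user.target

It is enabled in the usual way:

systemctl daemon-reload
systemctl enable --now media-shared.mount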

Next, you can mount /media/shared and check that the contents stay in sync.
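
A quick way to check the synchronization (a sketch; the test file name is arbitrary):

# on both nodes
mkdir -p /media/shared
mount /media/shared

# on the first node
touch /media/shared/test-from-drbd1

# on the second node: the file should appear immediately
ls -l /media/shared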

Done! As a result, we get more or less fault-tolerant storage with scalability and decent performance.

Source: habr.com
