Increasing the density of containers on a node using PFCACHE technology

One of the goals of a hosting provider is to utilize existing hardware as fully as possible while still providing quality service to end users. The resources of a server are always limited, but the number of hosted client services (in our case, VPS) can vary significantly. Read on to find out how we manage to have our cake and eat it too.

Packing VPS onto a node densely enough that customers do not notice it at all goes a long way toward improving the economics of any hosting provider. Of course, the node must not burst at the seams: if containers are stuffed into it to the brim, every client will immediately feel any surge in load.

How many VPS can be hosted on one node depends on many factors, the most obvious being:

1. Characteristics of the hardware of the node itself
2. VPS size
3. The nature of the load on the VPS
4. Software technologies to help optimize density

In this article we will share our experience of using pfcache technology with Virtuozzo.
We use branch 6, but everything said here also holds for branch 7.

pfcache is a Virtuozzo engine that helps deduplicate IOPS and RAM across containers by moving identical files from different containers into a separate common area.

In fact, it consists of:
1. Kernel code
2. User-space daemon
3. User-space utilities

On the node side, we allocate a dedicated partition in which the files directly used by all VPS on the node will be created. A ploop block device is mounted there. When a container starts, it receives a reference to this partition:

[root@pcs13 ~]# cat /proc/mounts
...
/dev/ploop62124p1 /vz/pfcache ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12 0 0
...
/dev/ploop22927p1 /vz/root/418 ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12,pfcache_csum,pfcache=/vz/pfcache 0 0
/dev/ploop29642p1 /vz/root/264 ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12,pfcache_csum,pfcache=/vz/pfcache 0 0
...

Here are approximate statistics for the number of files on one of our nodes:

[root@pcs13 ~]# find /vz/pfcache -type f | wc -l
45851
[root@pcs13 ~]# du -sck -h /vz/pfcache
2.4G    /vz/pfcache
2.4G    total

pfcache works as follows:
• The user-space pfcached daemon writes the SHA-1 hash of a file into the file's extended attribute (xattr). Not all files are processed, only those in the directories /usr, /bin, /usr/sbin, /sbin, /lib, /lib64;

• Files in these directories are the most likely to be "shared" and used by several containers;

• pfcached periodically collects file-read statistics from the kernel, analyzes them, and adds files to the cache if they are read frequently;

• The set of directories may differ and is configured in the configuration files;

• When a file is read, the kernel checks whether its extended attributes contain the hash described above. If they do, the "common" file is opened instead of the container's own file. This substitution happens inside the kernel and is invisible to the container code;

• When a file is written to, its hash is invalidated, so the next time it is opened, the container's own file is used directly rather than its cached copy.

By keeping the common files from /vz/pfcache in the page cache, we save on this cache itself as well as on IOPS: instead of reading ten copies of a file from disk, we read one, and it goes straight into the page cache. In the kernel, the link between a container file and its common peer is represented by the following fields:

struct inode {
...
        struct file             *i_peer_file;   /* the "common" peer file in /vz/pfcache */
...
};
struct address_space {
...
        struct list_head        i_peer_list;    /* links mappings that share the same peer */
...
};

The VMA list for the file remains shared (deduplicating memory), and the file is read from disk less often (saving IOPS). Our "common pool" is hosted on an SSD, which gives an additional gain in speed.

An example with the cached file /bin/bash:

[root@pcs13 ~]# ls -li /vz/root/2388/bin/bash
524650 -rwxr-xr-x 1 root root 1021112 Oct  7  2018 /vz/root/2388/bin/bash
[root@pcs13 ~]# pfcache dump /vz/root/2388 | grep 524650
8e3aa19fdc42e87659746f6dc8ea3af74ab30362 i:524650      g:1357611108  f:CP
[root@pcs13 ~]# sha1sum /vz/root/2388/bin/bash
8e3aa19fdc42e87659746f6dc8ea3af74ab30362  /vz/root/2388/bin/bash
[root@pcs13 /]# getfattr -ntrusted.pfcache /vz/root/2388/bin/bash
# file: vz/root/2388/bin/bash
trusted.pfcache="8e3aa19fdc42e87659746f6dc8ea3af74ab30362"
[root@pcs13 ~]# sha1sum /vz/pfcache/8e/3aa19fdc42e87659746f6dc8ea3af74ab30362
8e3aa19fdc42e87659746f6dc8ea3af74ab30362  /vz/pfcache/8e/3aa19fdc42e87659746f6dc8ea3af74ab30362
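
As the transcript shows, the common area stores each file under a path derived from its SHA-1 hash: the first two hex characters become a subdirectory and the rest become the file name. A minimal sketch of that mapping, assuming a POSIX shell (the helper name is ours, for illustration only):

```shell
# Map a SHA-1 hash to its path in the common cache area:
# first two hex characters form a subdirectory, the remaining
# 38 characters form the file name.
hash_to_cache_path() {
    prefix=$(printf '%s' "$1" | cut -c1-2)
    rest=$(printf '%s' "$1" | cut -c3-)
    printf '/vz/pfcache/%s/%s\n' "$prefix" "$rest"
}
```

For the /bin/bash hash shown above, this yields /vz/pfcache/8e/3aa19fdc42e87659746f6dc8ea3af74ab30362, matching the path in the transcript.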

Usage efficiency is calculated by a ready-made script, which goes through all the containers on the node and tallies the cached files for each of them.

[root@pcs16 ~]# /pcs/distr/pfcache-examine.pl
...
Pfcache cache uses 831 MB of memory
Total use of pfcached files in containers is 39837 MB of memory
Pfcache effectiveness: 39006 MB

Thus, we save about 40 gigabytes of memory on files in containers: instead of each container holding its own copy, they are served from the shared cache.
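
The "effectiveness" number reported by the script is simply the difference between the total in-container usage of cached files and the single shared copy kept in the cache. A trivial sketch of that arithmetic (the function name and interface are ours, for illustration only):

```shell
# Effectiveness = memory the files would occupy across all containers,
# minus the one shared copy actually kept in the common cache.
pfcache_effectiveness_mb() {
    total_mb=$1    # total use of cached files in containers, MB
    cache_mb=$2    # size of the common cache, MB
    echo $(( total_mb - cache_mb ))
}
```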

For this mechanism to work even better, you should place the most "identical" VPS on a node: for example, those to which the user has no root access and which run an environment configured from a deployed image.

You can tune pfcache through its config file,
/etc/vz/pfcache.conf:

MINSIZE, MAXSIZE - minimum/maximum file size for caching
TIMEOUT - timeout between caching attempts
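
A hedged example of what such a config might look like; the option names come from the list above, but the values below are illustrative, not the shipped defaults:

```shell
# /etc/vz/pfcache.conf (illustrative values, not Virtuozzo defaults)
MINSIZE=4096        # do not cache files smaller than 4 KiB
MAXSIZE=16777216    # do not cache files larger than 16 MiB
TIMEOUT=3600        # seconds between caching attempts
```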

A complete list of options can be found in the Virtuozzo documentation.

Source: habr.com
