Linux has many faces: how to work on any distribution

Creating a backup application that works on any distribution is not an easy task. To get Veeam Agent for Linux working on distributions ranging from Red Hat 6 and Debian 6 to openSUSE 15.1 and Ubuntu 19.04, we had to deal with a whole range of issues, especially since the product includes a kernel module.

This article is based on a talk given at the Linux Piter 2019 conference.

Linux is not just one of the most popular operating systems. In essence, it is a platform on which you can build something unique, something of your own. Because of this, Linux has many distributions that differ in their set of software components. And here a problem arises: for a software product to work on any distribution, you have to take the peculiarities of each one into account.

Package managers: .deb vs .rpm

Let's start with the obvious problem of distributing the product across different distributions.
The most common way to distribute software is to put a package in a repository so that the package manager built into the system can install it from there.
However, there are two popular package formats: rpm and deb. So both have to be supported.

In the world of deb packages, the level of compatibility is amazing: the same package installs and works equally well on Debian 6 and on Ubuntu 19.04. The standards for building and handling packages laid down in older Debian distributions remain relevant in the newer Linux Mint and elementary OS. Therefore, in the case of Veeam Agent for Linux, one deb package per hardware platform is enough.

But in the world of rpm packages, the differences are great. First, because there are two completely independent distributors, Red Hat and SUSE, which have no need to be compatible with each other. Second, each of these distributors has both supported (enterprise) and experimental distributions, which do not need to be compatible either. As a result, el6, el7 and el8 each have their own packages, there is a separate package for Fedora, packages for SLES 11 and SLES 12, and a separate one for openSUSE. The main problems are dependencies and package names.

Dependency problem

Alas, the same packages often end up under different names in different distributions. Below is a partial list of veeam package dependencies.

For EL7:

  • libblkid
  • libgcc
  • libstdc++
  • ncurses-libs
  • fuse-libs
  • file-libs
  • veeamsnap=3.0.2.1185

For SLES 12:

  • libblkid1
  • libgcc_s1
  • libstdc++6
  • libmagic1
  • libfuse2
  • veeamsnap-kmp=3.0.2.1185

As a result, the list of dependencies is unique to the distribution.

It is worse when an updated version of a library starts hiding under the old package name.

Example:

In Fedora 24, the ncurses package was updated from version 5 to version 6. Our product was built against version 5 to stay compatible with older distributions. To use the old version 5 of the library on Fedora 24, we had to use the ncurses-compat-libs package.

The result is two packages for Fedora, with different dependencies.

It gets even more interesting. After the next update of the distribution, the ncurses-compat-libs package with version 5 of the library disappeared: it is costly for a distributor to drag old libraries into a new release. After a while, the same problem recurred in SUSE distributions.

As a result, for some distributions we had to drop the explicit dependency on ncurses-libs and fix the product so that it could work with any version of the library.
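
One common way to achieve that kind of version independence is to resolve the library at run time instead of linking it at build time. Below is a minimal sketch of the idea, assuming dlopen is used; the soname list and the probed symbol are illustrative and not taken from the actual product.

    /* build with: gcc probe_ncurses.c -ldl */
    #include <stddef.h>
    #include <stdio.h>
    #include <dlfcn.h>

    /* Try whichever ncurses/tinfo version is installed.
     * The candidate sonames below are an assumption for this sketch. */
    static void *open_ncurses(void)
    {
        const char *candidates[] = {
            "libncurses.so.6", "libncurses.so.5",
            "libtinfo.so.6",   "libtinfo.so.5",
        };
        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
            void *handle = dlopen(candidates[i], RTLD_NOW | RTLD_GLOBAL);
            if (handle)
                return handle;
        }
        return NULL;
    }

    int main(void)
    {
        void *h = open_ncurses();
        if (!h) {
            fprintf(stderr, "no usable ncurses/tinfo library found\n");
            return 1;
        }
        /* Resolve the needed symbols by name instead of linking against them. */
        void *setupterm_fn = dlsym(h, "setupterm");
        printf("setupterm %s\n", setupterm_fn ? "found" : "missing");
        dlclose(h);
        return 0;
    }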

By the way, Red Hat 8 no longer has the python meta-package that referred to the good old python 2.7. There are only python2 and python3.

Alternative to package managers

The dependency problem is old and well known; recall Dependency Hell.
Combining various libraries and applications so that they all work stably and do not conflict is, in fact, the task that every Linux distributor tries to solve.

Snappy from Canonical tries to solve this problem in a completely different way. The main idea: the application runs in a sandbox, isolated and protected from the main system. If the application needs libraries, they are shipped together with the application itself.

Flatpak also allows you to run applications in a sandbox using Linux containers. The idea of a sandbox is also used by AppImage.

These solutions make it possible to create a single package for any distribution. With Flatpak, the application can even be installed and launched without the administrator's knowledge.

The main problem is that not all applications can run in a sandbox. Some need direct access to the platform, to say nothing of kernel modules, which depend rigidly on the kernel and do not fit the sandbox concept at all.

The second problem is that the Red Hat and SUSE distributions popular in the enterprise environment do not yet ship support for Snappy and Flatpak.

That is why Veeam Agent for Linux is available neither on snapcraft.io nor on www.flathub.org.

To wrap up the topic of package managers: there is also the option of abandoning them entirely by combining the binaries and an installation script into one bundle.

Such a bundle allows you to create one common package for different distributions and platforms, run an interactive installation process and perform the necessary customization. I have only come across such Linux packages from VMware.

Update problem

Even if all dependency issues are resolved, the program may behave differently on the same distribution. The reason is updates.

There are 3 upgrade strategies:

  • The simplest is to never update: set up a server and forget about it. Why update if everything works? Problems begin the first time you contact support, because the distribution vendor only supports an updated release.
  • You can trust the distributor and set up automatic updates. In this case, a call to the support service is likely immediately after an unsuccessful update.
  • The option of manually updating only after it has been tested on a test infrastructure is the most reliable, but expensive and time-consuming. Not everyone is able to afford it.

Since different users follow different update strategies, you have to support both the latest release and all previously released ones. This complicates both development and testing and adds headaches for the support team.

Variety of hardware platforms

A variety of hardware platforms is a problem that is largely specific to native code. At a minimum, you have to build binaries for each supported platform.

In the Veeam Agent for Linux project, we cannot yet afford to support anything like RISC.

I will not dwell on this issue in detail. I will only outline the main problems: platform-specific types, such as size_t, structure alignment and byte order.
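
To make this concrete, here is a small, purely illustrative example of the kind of code this forces you to write; the snapshot_record structure is invented for the sketch and is not a real structure from the product, and the htobe32/htobe64 helpers are glibc-specific.

    #include <stdint.h>
    #include <stdio.h>
    #include <endian.h>   /* htobe32()/htobe64(): glibc-specific byte-order helpers */

    /* An invented on-disk record: fixed-width types and explicit packing keep
     * the layout identical on every platform, instead of relying on size_t,
     * long or the default structure alignment of the target architecture. */
    struct __attribute__((packed)) snapshot_record {
        uint64_t offset;      /* always 8 bytes, unlike long or size_t */
        uint32_t length;
        uint32_t checksum;
    };

    /* Convert the fields to a fixed (big-endian) byte order before writing,
     * so the data is readable regardless of the host CPU's endianness. */
    static void record_to_disk(struct snapshot_record *r)
    {
        r->offset   = htobe64(r->offset);
        r->length   = htobe32(r->length);
        r->checksum = htobe32(r->checksum);
    }

    int main(void)
    {
        struct snapshot_record r = { .offset = 4096, .length = 512, .checksum = 0 };
        record_to_disk(&r);
        printf("record size: %zu bytes\n", sizeof(r));  /* 16 on every platform */
        return 0;
    }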

Static and/or dynamic linking

But the question "How should we link with libraries: dynamically or statically?" is worth discussing.

As a rule, C/C++ applications under Linux use dynamic linking. This works great if the application is built for a specific distribution.

If the task is to cover various distributions with a single binary, you have to target the oldest supported distribution. For us, that is Red Hat 6. It ships gcc 4.4, which does not even fully support the C++11 standard.

We build our project with gcc 6.3, which fully supports C++14. Naturally, in this case libstdc++ and boost have to be brought along to Red Hat 6, and the easiest way is to link them statically.

But alas, not all libraries can be linked statically.

First, system libraries such as libfuse and libblkid need to be linked dynamically to make sure they stay compatible with the kernel and its modules.

Secondly, there is a subtlety with licenses.

The GPL, in principle, only allows linking libraries with open-source code. MIT and BSD allow static linking and allow libraries to be included in a project. The LGPL does not seem to forbid static linking, but it does require sharing the files necessary for relinking.

In general, using dynamic linking saves you from having to provide anything extra.

Building C/C++ Applications

To build C/C++ applications for different platforms and distributions, it is enough to pick or build a suitable version of gcc, use cross compilers for the specific architectures and build the whole set of libraries. This work is quite feasible but rather troublesome, and there is no guarantee that the chosen compiler and libraries will produce a workable result.

An obvious plus: the infrastructure is greatly simplified, since the entire build process can be performed on a single machine. In addition, it is enough to build one set of binaries per architecture, and they can then be packaged for different distributions. This is how the veeam packages for Veeam Agent for Linux are built.

The opposite approach is to prepare a build farm: several build machines, each of which compiles the application and builds the package for a specific distribution and a specific architecture. In this case, compilation is performed with the tools prepared by the distributor, so the stage of preparing the compiler and selecting libraries disappears. In addition, the build process is easy to parallelize.

There is, however, a downside to this approach: for each distribution within the same architecture you have to build a separate set of binaries. Another downside is that all those machines have to be maintained and provided with a large amount of disk space and RAM.

This is how KMOD packages of the veeamsnap kernel module are built for Red Hat distributions.

Open Build Service

Colleagues from SUSE tried to implement a middle ground in the form of a special service for compiling applications and building packages: the Open Build Service.

In essence, it is a hypervisor that creates a virtual machine, installs all the necessary packages into it, compiles the application and builds the package in this isolated environment, after which the virtual machine is released.

The scheduler built into the Open Build Service determines how many virtual machines it can run for optimal build speed. The built-in signing mechanism signs the packages and pushes them to the built-in repository. The built-in version control system keeps the history of changes and builds. All that remains is to add your sources to the system. You do not even have to set up your own server; you can use the public one.

Here, however, there is a problem: such an all-in-one system is difficult to fit into an existing infrastructure. For example, we do not need its version control, since we already have our own for the source code. Our signing mechanism is different: a dedicated server is used. And we do not need the repository either.

In addition, support for other distributions - for example, Red Hat - is rather poorly implemented, which is understandable.

The advantage of the service is fast support for the next version of a SUSE distribution. Before the official release announcement, the packages needed for the build are published in the public repository, and the new distribution appears in the list of those available on the Open Build Service. We tick a checkbox, and it is added to the build plan. Adding a new version of a distribution thus takes almost one click.

In our infrastructure, the Open Build Service is used to build the whole variety of KMP packages of the veeamsnap kernel module for SUSE distributions.

Next, I would like to dwell on issues specific to kernel modules.

Kernel ABI

Linux kernel modules have historically been distributed as source code. The thing is, the kernel developers do not bother maintaining a stable API for kernel modules, let alone a stable binary interface, hereinafter referred to as kABI.

To build a module for a vanilla kernel, you need the headers of that particular kernel, and the module will only work on that kernel.
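
For reference, even a textbook minimal module illustrates this. The example below is a generic "hello world" module, not part of veeamsnap; building it requires a Kbuild file with "obj-m += hello.o" and the headers of the running kernel (make -C /lib/modules/$(uname -r)/build M=$PWD modules).

    /* hello.c: a textbook minimal kernel module (not related to veeamsnap).
     * The resulting hello.ko records the vermagic and symbol versions of the
     * kernel whose headers it was built against, so other kernels refuse to
     * load it without recompilation. */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>

    static int __init hello_init(void)
    {
        pr_info("hello: module loaded\n");
        return 0;
    }

    static void __exit hello_exit(void)
    {
        pr_info("hello: module unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Minimal example module");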

DKMS allows you to automate the process of building modules when updating the kernel. As a result, users of the Debian repository (and its many relatives) use kernel modules either from the distributor's repository or built from source using DKMS.

However, this situation does not particularly suit the Enterprise segment. Proprietary code distributors want to distribute the product as compiled binaries.

Administrators don't want to keep development tools on production servers for security reasons. Enterprise Linux distributors such as Red Hat and SUSE decided that they could support a stable kABI for their users. The result is KMOD packages for Red Hat and KMP packages for SUSE.

The essence of this decision is quite simple: for a particular distribution version, the kernel API is frozen. The distributor declares that it uses, for example, kernel 3.10 and makes only fixes and improvements that do not affect the kernel interfaces in any way, so modules compiled for the very first kernel can be used with all subsequent kernels without recompilation.

Red Hat claims kABI compatibility for a distribution throughout its entire life cycle. That is, a module compiled for RHEL 6.0 (released in November 2010) should also work on 6.10 (released in June 2018), almost 8 years later. Naturally, this is quite a difficult task.
We have dealt with several cases where the veeamsnap module stopped working because of kABI compatibility problems.

After a veeamsnap module built for RHEL 7.0 turned out to be incompatible with the kernel from RHEL 7.5, yet still loaded and was guaranteed to crash the server, we stopped relying on kABI compatibility for RHEL 7 altogether.

Currently, the KMOD package for RHEL 7 contains a build for each release version and a script that makes sure the right module is loaded.

SUSE approached the task of kABI compatibility more cautiously. They guarantee kABI compatibility only within a single service pack.

For example, SLES 12 was released in September 2014, and SLES 12 SP1 already in December 2015, that is, a little more than a year later. Although both releases use the 3.12 kernel, they are not kABI compatible. Obviously, maintaining kABI compatibility for just a year is much easier, and an annual kernel module update cycle should not cause problems for module creators.

As a result of this SUSE policy, we haven't seen any kABI compatibility issues with our veeamsnap module. True, the number of packages for SUSE is almost an order of magnitude larger.

Patches and backports

Although distributors try to ensure kABI compatibility and stability of the kernel, they also try to improve the performance and fix the defects of this stable kernel.

At the same time, in addition to their own bug fixes, the developers of enterprise Linux kernels track changes in the vanilla kernel and backport them into their "stable" one.

Sometimes this introduces new bugs.

In the latest release of Red Hat 6, a bug crept into one of the minor updates: it caused the veeamsnap module to reliably crash the system when a snapshot was released. Comparing the kernel sources before and after the update, we found out that a backport was to blame. A similar fix had been made in vanilla kernel 4.19, where it worked fine, but when it was transferred to the "stable" 2.6.32, a problem with a spinlock appeared.

Of course, everyone has bugs, always, but was it worth dragging the code from 4.19 back to 2.6.32 and risking stability? I am not sure.

The worst thing is when marketing gets involved in the tug-of-war between "stability" and "modernization". The marketing department needs the kernel of the updated distribution to be stable, yet at the same time perform better and have new features. This leads to strange compromises.

When I tried to build the module against the 4.4 kernel from SLES 12 SP3, I was surprised to find functionality from vanilla 4.8 in it. In my opinion, the block I/O implementation in the 4.4 kernel of SLES 12 SP3 looks more like the 4.8 kernel than like the previous stable 4.4 kernel of SLES 12 SP2. I cannot judge what percentage of code was ported from kernel 4.8 into the 4.4 kernel for SLES 12 SP3, but I can hardly call it the same stable 4.4.

The most annoying thing is that when you write a module that should work equally well on different kernels, you can no longer rely on the kernel version alone; you also have to take the distribution into account. It helps that sometimes you can latch onto a define that appears together with the new functionality, but such a define does not always exist.

As a result, the code becomes overgrown with elaborate conditional compilation directives.
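
Here is a condensed sketch of what such directives look like. LINUX_VERSION_CODE and KERNEL_VERSION are standard, and RHEL_RELEASE_CODE with RHEL_RELEASE_VERSION come from Red Hat's patched kernel headers; the CONFIG_SUSE_KERNEL check, the HAVE_NEW_BIO_API macro and the cut-off versions are assumptions made up purely for illustration.

    #include <linux/version.h>

    #if defined(RHEL_RELEASE_CODE)
        /* Red Hat may backport a newer block-layer API into its "stable"
         * kernel starting from some minor release, regardless of the
         * upstream version. The 7.5 cut-off here is illustrative. */
        #if RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5)
            #define HAVE_NEW_BIO_API 1
        #endif
    #elif defined(CONFIG_SUSE_KERNEL)
        /* SUSE kernels can carry features from a newer vanilla kernel,
         * so a plain version check is not sufficient on its own. */
        #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 73)
            #define HAVE_NEW_BIO_API 1
        #endif
    #else
        /* Vanilla and Debian-family kernels: trust the upstream version. */
        #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 8, 0)
            #define HAVE_NEW_BIO_API 1
        #endif
    #endif

    #ifdef HAVE_NEW_BIO_API
        /* ... call the newer block I/O interface ... */
    #else
        /* ... fall back to the older interface ... */
    #endif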

There are also patches that change the documented kernel API.
I came across the KDE neon 5.16 distribution and was very surprised to see that in its kernel the lookup_bdev call had a different list of input parameters.

To get the build working, I had to add a script to the makefile that checks whether the lookup_bdev function has a mask parameter.
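
On the C side, the result of that makefile check can then be consumed roughly as shown below. The HAVE_LOOKUP_BDEV_MASK macro name is a convention invented for this sketch, not something defined by the kernel, and passing 0 as the extra mask argument is an assumption.

    #include <linux/fs.h>   /* lookup_bdev() is declared here */

    static struct block_device *snap_lookup_bdev(const char *path)
    {
    #ifdef HAVE_LOOKUP_BDEV_MASK
        /* Patched kernels (such as the one shipped with KDE neon 5.16)
         * take an additional mask argument. */
        return lookup_bdev(path, 0);
    #else
        /* Documented mainline signature. */
        return lookup_bdev(path);
    #endif
    }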

Signing kernel modules

But back to the issue of package distribution.

One of the advantages of stable kABI is that kernel modules can be signed as a binary file. In this case, the developer can be sure that the module has not been accidentally corrupted or intentionally changed. You can check this with the modinfo command.

Red Hat and SUSE distributions can verify a module's signature and load the module only if the corresponding certificate is registered in the system. The certificate contains the public key matching the key the module was signed with, and we distribute it as a separate package.

The problem here is that certificates can either be built into the kernel (this is what distributors use) or have to be written into EFI non-volatile memory with the mokutil utility. When installing a certificate, mokutil requires a system reboot and, even before the operating system kernel is loaded, prompts the administrator to allow loading the new certificate.

Thus, adding a certificate requires physical administrator access to the system. If the machine is located somewhere in the cloud or simply in a remote server room and is accessible only over the network (for example, via ssh), adding a certificate is impossible.

EFI in virtual machines

Although almost all motherboard manufacturers have supported EFI for a long time, the administrator installing a system may not think about whether EFI is needed, and it may be disabled.

Not all hypervisors support EFI. VMware vSphere has supported EFI since version 5.
Microsoft Hyper-V also received EFI support starting with Hyper-V for Windows Server 2012 R2.

However, in the default configuration, this functionality is disabled for Linux machines, which means that the certificate cannot be installed.

In vSphere 6.5, the Secure Boot option can be set only in the old version of the web interface, the one that runs on Flash; the HTML5 web UI still lags far behind.

Experimental distributions

And finally, let us consider the issue of experimental distributions and distributions without official support. On the one hand, such distributions are unlikely to be found on the servers of serious organizations. There is no official support for them, so it is impossible to provide technical support for the product on such a distribution.

However, such distributions are a convenient platform for trying out new experimental solutions: for example, Fedora, openSUSE Tumbleweed or Debian Unstable. They are fairly stable, always have fresh versions of programs and always a new kernel. In a year, this experimental functionality may end up in an updated RHEL, SLES or Ubuntu.

So if something does not work on an experimental distribution, this is a reason to look into the problem and solve it: you need to be prepared for this functionality to appear on users' production servers soon.

You can find the current list of officially supported distributions for version 3.0 here, but the real list of distributions on which our product is able to work is much wider.

Personally, I found the experiment with Elbrus OS interesting: after the veeam package was adjusted, our product installed and worked. I wrote about this experiment on Habré in a separate article.

Well, work on supporting new distributions continues. We are waiting for the release of version 4.0; the beta is about to arrive, so keep an eye on what's new!

Source: habr.com
