Guidelines for running Buildah inside a container

What is the beauty of splitting the container runtime into separate tools? Above all, the fact that these tools can be combined so that they protect each other.


Many people are attracted by the idea of building OCI container images within Kubernetes or a similar system. Say we have a CI/CD system that constantly builds images; then something like Red Hat OpenShift/Kubernetes would be very useful for load-balancing the builds. Until recently, most people simply gave containers access to the Docker socket and let them run the docker build command. We showed a few years ago that this is very insecure; in fact, it is even worse than giving out passwordless root or sudo.

That is why people keep trying to run Buildah in a container. To help, we have put together an example of how, in our opinion, Buildah is best run inside a container, and published the corresponding images on quay.io/buildah. Let's get started...

Setup

These images are built from Dockerfiles, which can be found in the buildahimage folder of the Buildah repository. Here we will look at the stable version of the Dockerfile.

# stable/Dockerfile
#
# Build a Buildah container image from the latest
# stable version of Buildah on the Fedora Updates System.
# https://bodhi.fedoraproject.org/updates/?search=buildah
# This image can be used to create a secured container
# that runs safely with privileges within the container.
#
FROM fedora:latest

# Don't include container-selinux and remove
# directories used by dnf that are just taking
# up space.
RUN yum -y install buildah fuse-overlayfs --exclude container-selinux; rm -rf /var/cache /var/log/dnf* /var/log/yum.*

# Adjust storage.conf to enable Fuse storage.
RUN sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' /etc/containers/storage.conf

Instead of the kernel-level OverlayFS on the host, we use the fuse-overlayfs program inside the container, because OverlayFS can currently be mounted only if the container is given the SYS_ADMIN capability, and we want to run our Buildah containers without any root privileges. fuse-overlayfs is quite fast and performs better than the VFS storage driver. Note that when running a Buildah container that uses FUSE, the /dev/fuse device must be provided:

podman run --device /dev/fuse quay.io/buildah/stable ...

# Create an empty additional image store with its lock files.
RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock

Next, we create a directory for an additional image store. containers/storage supports the concept of additional read-only image stores. For example, you can set up an overlay storage area on one machine, then mount that storage on another machine over NFS and use its images without pulling them. We need this directory so that an image store from the host can be mounted into the container as a volume and used from inside it.

# Set up environment variables to note that this is
# not starting with user namespace and default to
# isolate the filesystem with chroot.
ENV _BUILDAH_STARTED_IN_USERNS="" BUILDAH_ISOLATION=chroot

Finally, the BUILDAH_ISOLATION environment variable tells Buildah inside the container to use chroot isolation by default. Additional isolation is not needed here, because we are already running in a container. For Buildah to create its own namespace-separated containers, it would need the SYS_ADMIN capability, which in turn would require loosening the container's SELinux and seccomp rules; that would defeat our goal of building from a locked-down container.

Run Buildah inside a container

The Buildah image described above gives you a lot of flexibility in how such containers are launched.

Speed versus security

Computer security is always a trade-off between the speed of a process and how much protection is wrapped around it. Building containers is no exception, so below we consider several points along this trade-off.

The container image described above keeps its storage in /var/lib/containers. We therefore need to mount storage at this directory, and how we do it greatly affects how fast container images are built.

Let's consider three options.

Option 1. If maximum security is required, you can create a separate containers/storage directory for each container and volume-mount it into the container. The build context directory is also volume-mounted into the container, at /build:

# mkdir /var/lib/containers1
# podman run -v ./build:/build:z -v /var/lib/containers1:/var/lib/containers:Z quay.io/buildah/stable buildah bud -t image1 /build
# podman run -v /var/lib/containers1:/var/lib/containers:Z quay.io/buildah/stable buildah push image1 registry.company.com/myuser
# rm -rf /var/lib/containers1
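The Option 1 cycle (create a private store, build, push, delete) is easy to get wrong by hand, for example by forgetting the final rm -rf after a failed build. Below is a minimal POSIX shell sketch that automates it. The helper names run and build_isolated, the DRY_RUN and STORE_BASE variables, and the docker://REGISTRY/IMAGE push destination are assumptions made for illustration. It also passes --device /dev/fuse, because the image uses fuse-overlayfs, which needs that device.

```shell
#!/bin/sh
# Option 1 sketch: every build gets a private, throw-away container store.
# run, build_isolated, DRY_RUN and STORE_BASE are hypothetical names.
set -eu

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_isolated IMAGE CONTEXT_DIR REGISTRY_PATH
build_isolated() {
    image=$1 context=$2 dest=$3
    # A fresh storage directory used by this build only.
    store=$(mktemp -d "${STORE_BASE:-/var/lib}/containers-build.XXXXXX")
    rc=0
    {
        run podman run --device /dev/fuse \
            -v "$context:/build:z" -v "$store:/var/lib/containers:Z" \
            quay.io/buildah/stable buildah bud -t "$image" /build &&
        run podman run --device /dev/fuse \
            -v "$store:/var/lib/containers:Z" \
            quay.io/buildah/stable buildah push "$image" "docker://$dest/$image"
    } || rc=$?
    # Always destroy the store, even if the build or push failed.
    rm -rf "$store"
    return "$rc"
}
```

For example, build_isolated image1 ./build registry.company.com/myuser runs the build and push with a fresh store and removes it afterwards; prefixing DRY_RUN=1 only prints the podman commands.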

Security. Buildah running in such a container has maximum security: it is not given any root privileges through capabilities, and all seccomp and SELinux restrictions apply to it. Such a container can even be run with user namespace isolation by adding an option like --uidmap 0:100000:10000.

Performance. Performance here is minimal, because images have to be pulled from container registries every time and caching does not work at all. When it finishes its work, the Buildah container must push the image to the registry, after which the content on the host is destroyed. The next time the container image is built, everything has to be pulled from the registry again, since nothing will be left on the host by then.

Option 2. If you need Docker-level performance, you can mount the host's containers/storage directly into the container.

# podman run -v ./build:/build:z -v /var/lib/containers:/var/lib/containers --security-opt label=disable quay.io/buildah/stable buildah bud -t image2 /build
# podman run -v /var/lib/containers:/var/lib/containers --security-opt label=disable quay.io/buildah/stable buildah push image2 registry.company.com/myuser

Security. This is the least secure way to build containers: it allows the container to modify the host's storage and could potentially slip a malicious image into Podman or CRI-O. In addition, you have to disable SELinux separation so that processes in the Buildah container can access the storage on the host. Note that this option is still better than the Docker socket, because the container remains confined by the other security features and cannot simply run any container it likes on the host.

Performance. Here it is maximum, because caching is fully used. If Podman or CRI-O has already pulled the required image to the host, Buildah inside the container does not have to pull it again, and subsequent builds based on this image can also take it from the cache.

Option 3. The idea here is to group several builds into a single project that shares one container storage directory.

# mkdir /var/lib/project3
# podman run --security-opt label=level:s0:c100,c200 -v ./build:/build:z \
    -v /var/lib/project3:/var/lib/containers:Z quay.io/buildah/stable buildah bud -t image3 /build
# podman run --security-opt label=level:s0:c100,c200 \
    -v /var/lib/project3:/var/lib/containers quay.io/buildah/stable buildah push image3 registry.company.com/myuser

In this example, we don't delete the project folder (/var/lib/project3) between runs, so all subsequent builds within the project take advantage of caching.
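Because the project store survives between runs, a series of builds in the same project can share cached images. A minimal sketch, assuming a hypothetical build_in_project helper and DRY_RUN switch; the store path and MCS level are the ones from the example above. Every container in the project runs at the same MCS level, so each of them can use the store that :Z labeled for that level. --device /dev/fuse is added because the image relies on fuse-overlayfs.

```shell
#!/bin/sh
# Option 3 sketch: several images built into one persistent project store.
# run, build_in_project and DRY_RUN are hypothetical names.
set -eu

PROJECT_STORE=${PROJECT_STORE:-/var/lib/project3}
PROJECT_LEVEL=${PROJECT_LEVEL:-s0:c100,c200}

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_in_project IMAGE CONTEXT_DIR REGISTRY_PATH
build_in_project() {
    image=$1 context=$2 dest=$3
    run mkdir -p "$PROJECT_STORE"
    # Same store and MCS level for every build, and the store is never
    # removed, so images pulled earlier in the project are reused.
    run podman run --device /dev/fuse \
        --security-opt "label=level:$PROJECT_LEVEL" \
        -v "$context:/build:z" -v "$PROJECT_STORE:/var/lib/containers:Z" \
        quay.io/buildah/stable buildah bud -t "$image" /build
    run podman run --device /dev/fuse \
        --security-opt "label=level:$PROJECT_LEVEL" \
        -v "$PROJECT_STORE:/var/lib/containers:Z" \
        quay.io/buildah/stable buildah push "$image" "docker://$dest/$image"
}
```

Calling build_in_project once per image in the project (for instance in a for loop over context directories) keeps all of them on the same cached store.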

Security. Somewhere between Options 1 and 2. On the one hand, the containers have no access to content on the host and therefore cannot slip anything bad into the Podman/CRI-O image storage. On the other hand, within its own project a container can interfere with the builds of other containers.

Performance. It is lower than with a shared cache at the host level, because images already pulled by Podman/CRI-O cannot be used. However, once Buildah has pulled an image, it can be reused by all subsequent builds within the project.

Additional storage

containers/storage has a handy feature called additional stores, which lets container engines use external image stores in read-only overlay mode when running and building containers. You add one or more read-only stores to the storage.conf file, and when a container starts, the container engine looks for the required image in them. It pulls the image from the registry only if it does not find it in any of these stores. The container engine writes only to its own writable storage.

If we scroll up and look at the Dockerfile we use to build the quay.io/buildah/stable image, there are lines like this:

# Adjust storage.conf to enable Fuse storage.
RUN sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' /etc/containers/storage.conf
RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock

In the first line, we modify /etc/containers/storage.conf inside the container image: we uncomment mount_program to enable fuse-overlayfs and add /var/lib/shared to the "additionalimagestores" list. In the next line, we create the shared directory and add a couple of lock files so that containers/storage does not complain. Essentially, we are just creating an empty container image store.
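To see exactly what these two RUN lines do, you can replay them against a scratch directory instead of /etc/containers and /var/lib/shared. This is a sketch under two assumptions: the storage.conf content below is an abbreviated, assumed excerpt (the stock file is longer and may differ between versions), and GNU sed is available, as it is in the Fedora base image, since the one-line a command is a GNU extension.

```shell
#!/bin/sh
# Replays the two storage-related Dockerfile RUN lines in a temp directory.
set -eu

work=$(mktemp -d)
conf="$work/storage.conf"
shared="$work/shared"

# Abbreviated, assumed excerpt of a stock storage.conf.
cat > "$conf" <<'EOF'
[storage]
driver = "overlay"
graphroot = "/var/lib/containers/storage"

[storage.options]
additionalimagestores = [
]

[storage.options.overlay]
#mount_program = "/usr/bin/fuse-overlayfs"
EOF

# Line 1: uncomment mount_program and append /var/lib/shared
# right after the additionalimagestores line.
sed -i -e 's|^#mount_program|mount_program|g' \
       -e '/additionalimage.*/a "/var/lib/shared",' "$conf"

# Line 2: an empty store layout with the lock files containers/storage expects.
mkdir -p "$shared/overlay-images" "$shared/overlay-layers"
touch "$shared/overlay-images/images.lock" "$shared/overlay-layers/layers.lock"

cat "$conf"
```

In the printed file, mount_program is no longer commented out and "/var/lib/shared", appears as the only entry inside the additionalimagestores list.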

If you mount a containers/storage directory at /var/lib/shared (that is, one level above overlay-images and overlay-layers), Buildah will be able to use its images.
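A quick way to check this is to mount the host's store read-only at /var/lib/shared and ask Buildah to list its images; images already pulled on the host by Podman or CRI-O should show up without any pull. A minimal sketch: the helper names and DRY_RUN switch are hypothetical, and /var/lib/containers/storage is assumed to be the host's default store, as in the Option 4 example that follows.

```shell
#!/bin/sh
# Sketch: list the images Buildah sees through the read-only additional store.
# run, list_shared_images and DRY_RUN are hypothetical names.
set -eu

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# list_shared_images [HOST_STORAGE_DIR]
list_shared_images() {
    host_store=${1:-/var/lib/containers/storage}
    # Mount the host store read-only at the path registered in
    # additionalimagestores; --rm discards the container afterwards.
    run podman run --rm --device /dev/fuse \
        -v "$host_store:/var/lib/shared:ro" \
        quay.io/buildah/stable buildah images
}
```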

Now let's return to Option 2, where the Buildah container can read and write the host's containers/storage and therefore gets maximum performance from image caching at the Podman/CRI-O level, but minimum security, since it can write directly to that storage. Adding an additional store here gives us the best of both worlds.

# mkdir /var/lib/containers4
# podman run -v ./build:/build:z -v /var/lib/containers/storage:/var/lib/shared:ro -v /var/lib/containers4:/var/lib/containers:Z quay.io/buildah/stable \
    buildah bud -t image4 /build
# podman run -v /var/lib/containers/storage:/var/lib/shared:ro \
    -v /var/lib/containers4:/var/lib/containers:Z quay.io/buildah/stable buildah push image4 registry.company.com/myuser
# rm -rf /var/lib/containers4

Note that the host's /var/lib/containers/storage is mounted to /var/lib/shared inside the container in read-only mode. Therefore, working in a container, Buildah can use any images that have already been downloaded using Podman / CRI-O (hello, speed), but can only write to its own storage (hello, security). Also note that this is done without disabling SELinux separation for the container.

An important caveat

Under no circumstances should images be deleted from the underlying read-only store. Otherwise, the Buildah container may crash.

And that is not all.

The uses of additional stores are not limited to the scenario above. For example, you can place all container images on shared network storage and give all Buildah containers access to it. Say we have hundreds of images that our CI/CD system regularly uses as bases for building container images. We collect all these images on a single storage host and then, using the preferred network storage tools (NFS, Gluster, Ceph, iSCSI, S3...), share this storage with all Buildah or Kubernetes nodes.

Now it is enough to mount this network storage into the Buildah container at /var/lib/shared, and that's it: Buildah containers no longer have to pull images at all. We skip the pre-population phase entirely and are immediately ready to roll out containers.
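As an illustration with NFS, a node could mount the shared store read-only once and then pass it to each Buildah container at /var/lib/shared, next to a node-local writable store as in Option 4. This is a minimal sketch: the export name storage.example.com:/exports/containers-storage, the mount point /mnt/shared-images, the helper names and DRY_RUN are all hypothetical, and a real setup may also need SELinux and NFS permission adjustments not shown here.

```shell
#!/bin/sh
# Sketch: share one image store over NFS and use it from Buildah containers.
# Export name, mount point and helper names are hypothetical.
set -eu

NFS_EXPORT=${NFS_EXPORT:-storage.example.com:/exports/containers-storage}
SHARED_MNT=${SHARED_MNT:-/mnt/shared-images}

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run once per node: mount the shared store read-only.
mount_shared_store() {
    run mkdir -p "$SHARED_MNT"
    run mount -t nfs -o ro "$NFS_EXPORT" "$SHARED_MNT"
}

# build_from_network_store IMAGE CONTEXT_DIR WRITABLE_STORE
build_from_network_store() {
    image=$1 context=$2 store=$3
    # Base images are read from the NFS store; only new layers are
    # written to the node-local writable store.
    run podman run --device /dev/fuse \
        -v "$context:/build:z" \
        -v "$SHARED_MNT:/var/lib/shared:ro" \
        -v "$store:/var/lib/containers:Z" \
        quay.io/buildah/stable buildah bud -t "$image" /build
}
```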

And of course, this can be used within a live Kubernetes system or container infrastructure to launch and run containers anywhere without any image pull. Moreover, when a container registry receives a push request to upload an updated image to it, it can automatically send this image to a shared network storage, where it is instantly available to all nodes.

Container images can be many gigabytes in size. Additional stores remove the need to copy such images to every node and make container startup almost instantaneous.

In addition, we are currently working on a new overlay volume mounts feature that will make building containers even faster.

Conclusion

Running Buildah inside a container in a Kubernetes/CRI-O, Podman, or even Docker environment is entirely possible, simple, and much safer than using the Docker socket. We have greatly increased the flexibility of working with images, and you can now run Buildah containers in a variety of ways to get the best balance between security and performance.

The functionality of additional storages allows you to speed up or even completely eliminate the download of images to the nodes.

Source: habr.com
