What makes splitting the container runtime into separate tools so appealing? In particular, the fact that these tools can be combined so that they protect each other.
Many people are attracted by the idea of building OCI container images within
So people are constantly trying to run Buildah in a container. In short, we have created
Setting
These images are built from Dockerfiles, which can be found in the Buildah repository in the folder
Here we will consider
# stable/Dockerfile
#
# Build a Buildah container image from the latest
# stable version of Buildah on the Fedora Updates System.
# https://bodhi.fedoraproject.org/updates/?search=buildah
# This image can be used to create a secured container
# that runs safely with privileges within the container.
#
FROM fedora:latest
# Don't include container-selinux and remove
# directories used by dnf that are just taking
# up space.
RUN yum -y install buildah fuse-overlayfs --exclude container-selinux; rm -rf /var/cache /var/log/dnf* /var/log/yum.*
# Adjust storage.conf to enable Fuse storage.
RUN sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' /etc/containers/storage.conf
Instead of OverlayFS implemented at the level of the Linux kernel of the host, we use the program inside the container
podman run --device /dev/fuse quay.io/buildahctr ...
RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock
Next, we create a directory for additional image stores.
# Set up environment variables to note that this is
# not starting with user namespace and default to
# isolate the filesystem with chroot.
ENV _BUILDAH_STARTED_IN_USERNS="" BUILDAH_ISOLATION=chroot
Finally, the BUILDAH_ISOLATION environment variable tells Buildah inside the container to use chroot isolation by default. Additional isolation is not needed here, since we are already running in a container. For Buildah to create its own namespace-separated containers, it would need the SYS_ADMIN privilege, which would mean loosening the container's SELinux and seccomp rules and would defeat the purpose of building from a locked-down container.
Running Buildah inside a container
The Buildah container image described above gives you flexibility in how such containers are launched.
Speed versus safety
Computer security is always a trade-off between the speed of a process and how much protection is wrapped around it. The same holds when building containers, so below we look at the options for this trade-off.
The container image discussed above keeps its storage in /var/lib/containers. We therefore need to mount content into this folder, and how we do so greatly affects the speed of building container images.
Let's consider three options.
Option 1. If maximum security is required, you can create a separate containers/storage folder for each container and volume-mount it into the container. In addition, make the context directory available inside the container itself, at /build:
# mkdir /var/lib/containers1
# podman run -v ./build:/build:z -v /var/lib/containers1:/var/lib/containers:Z quay.io/buildah/stable \
buildah bud -t image1 /build
# podman run -v /var/lib/containers1:/var/lib/containers:Z quay.io/buildah/stable buildah push image1 registry.company.com/myuser
# rm -rf /var/lib/containers1
Security. Buildah running in such a container has maximum security: it is not granted any root privileges through capabilities, and all seccomp and SELinux restrictions apply to it. Such a container can even be run with user namespace isolation by adding an option like --uidmap 0:100000:10000.
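To make the --uidmap 0:100000:10000 option concrete: its three fields are the first UID inside the container, the first UID on the host, and the size of the range. The sketch below only does that arithmetic in shell. The function name host_uid is hypothetical and is not part of Podman.

```shell
#!/bin/sh
# host_uid CONTAINER_UID MAP
# MAP has the form container_start:host_start:size, as passed to --uidmap.
# Prints the host UID that CONTAINER_UID maps to, or fails if the UID lies
# outside the mapped range. (Illustrative helper, not a Podman API.)
host_uid() {
    uid=$1
    c_start=${2%%:*}          # field 1: first UID inside the container
    rest=${2#*:}
    h_start=${rest%%:*}       # field 2: first UID on the host
    size=${rest#*:}           # field 3: number of UIDs in the range

    if [ "$uid" -lt "$c_start" ] || [ "$uid" -ge $((c_start + size)) ]; then
        echo "uid $uid is not mapped" >&2
        return 1
    fi
    echo $((h_start + uid - c_start))
}

host_uid 0 0:100000:10000      # root inside the container -> 100000
host_uid 1000 0:100000:10000   # an ordinary container user -> 101000
```

With this mapping, root inside the build container is the unprivileged UID 100000 on the host.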
Performance. Performance here is the lowest, since every image is pulled from the container registry to the host each time, and caching does not work at all. When it finishes, the Buildah container must push the image to the registry, and the content on the host is destroyed. The next time the container image is built, everything has to be downloaded from the registry again, since nothing is left on the host.
Option 2. If you need Docker-level performance, you can mount the host's containers/storage directly into the container.
# podman run -v ./build:/build:z -v /var/lib/containers:/var/lib/containers --security-opt label=disable quay.io/buildah/stable buildah bud -t image2 /build
# podman run -v /var/lib/containers:/var/lib/containers --security-opt label=disable quay.io/buildah/stable buildah push image2 registry.company.com/myuser
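The whole Option 1 cycle (create a private store, build, push, clean up) can be wrapped in a small script. This is a minimal sketch, not the authors' tooling: the function name build_isolated, the RUN switch, and the per-build path /var/lib/containers-NAME are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of Option 1: one throwaway storage directory per build.
# RUN is a hypothetical switch: left unset it defaults to "echo" and the
# commands are only printed; set RUN= (empty) to execute them as root.
RUN=${RUN-echo}

build_isolated() {
    name=$1      # image name, e.g. image1
    context=$2   # build context directory on the host
    target=$3    # registry destination
    store=/var/lib/containers-$name   # per-build storage (assumed path)

    $RUN mkdir -p "$store"
    $RUN podman run -v "$context":/build:z -v "$store":/var/lib/containers:Z \
        quay.io/buildah/stable buildah bud -t "$name" /build &&
    $RUN podman run -v "$store":/var/lib/containers:Z \
        quay.io/buildah/stable buildah push "$name" "$target"
    status=$?
    # Remove the private store whether or not the build succeeded.
    $RUN rm -rf "$store"
    return $status
}

build_isolated image1 ./build registry.company.com/myuser
```

Because the store is removed at the end of every call, nothing is cached between builds, which is exactly the performance cost described above.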
Security. This is the least secure way to build containers, since it lets the container modify the storage on the host and could potentially slip a malicious image into Podman or CRI-O. In addition, you have to disable SELinux separation so that the processes in the Buildah container can interact with the storage on the host. Note that this option is still better than exposing the Docker socket, because the container remains confined by the other security features and cannot simply start an arbitrary container on the host.
Performance. Here it is at its maximum, since caching is fully used. If Podman or CRI-O has already pulled the required image to the host, the Buildah process inside the container does not have to pull it again, and subsequent builds based on that image can also take what they need from the cache.
Option 3. The idea of this method is to combine several images into one project with a shared folder for container images.
# mkdir /var/lib/project3
# podman run --security-opt label=level:s0:c100,c200 -v ./build:/build:z \
-v /var/lib/project3:/var/lib/containers:Z quay.io/buildah/stable buildah bud -t image3 /build
# podman run --security-opt label=level:s0:c100,c200 \
-v /var/lib/project3:/var/lib/containers quay.io/buildah/stable buildah push image3 registry.company.com/myuser
In this example, we don't delete the project folder (/var/lib/project3) between runs, so all subsequent builds within the project take advantage of caching.
Security. Somewhere between options 1 and 2. On the one hand, the containers have no access to content on the host and therefore cannot slip anything malicious into the Podman / CRI-O image storage. On the other hand, within its project a container can interfere with the builds of other containers.
Performance. It is worse than with a shared host-level cache, since images already pulled by Podman / CRI-O cannot be reused. However, once Buildah has pulled an image, it can be used in all subsequent builds within the project.
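To illustrate how several builds share one project store in Option 3, here is a minimal sketch. The image names frontend and backend, their context directories, the function name build_in_project, and the RUN switch are all hypothetical. The script uses Podman's label=level:... form of --security-opt, and by default it only prints the commands.

```shell
#!/bin/sh
# Sketch of Option 3: several images built against one project store.
# RUN defaults to "echo" (dry run); set RUN= (empty) to execute as root.
RUN=${RUN-echo}
PROJECT_STORE=/var/lib/project3
LEVEL=s0:c100,c200     # one MCS level shared by every build in the project

build_in_project() {
    # $1: image name, $2: build context directory on the host
    $RUN podman run --security-opt label=level:"$LEVEL" \
        -v "$2":/build:z -v "$PROJECT_STORE":/var/lib/containers:Z \
        quay.io/buildah/stable buildah bud -t "$1" /build
}

$RUN mkdir -p "$PROJECT_STORE"
# Image names and context paths below are made up for the example.
for image in frontend backend; do
    build_in_project "$image" "./$image" || break
done
```

The project store is never removed here, so the second build can reuse any base image the first one already pulled.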
Additional storage
If we scroll up and look at the Dockerfile we use to build the quay.io/buildah/stable image, there are lines like this:
# Adjust storage.conf to enable Fuse storage.
RUN sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' /etc/containers/storage.conf
RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock
The first line modifies /etc/containers/storage.conf inside the container image, telling the storage driver to use /var/lib/shared as an entry in additionalimagestores. The next line creates the shared folder and adds a couple of lock files so that containers/storage does not complain. Essentially, we are just creating an empty container image store.
If you mount a containers/storage directory over this folder (/var/lib/shared), Buildah will be able to use the images in it.
Now let's return to Option 2 above, where the Buildah container can read and write containers/storage on the host. That gives maximum performance, thanks to image caching at the Podman / CRI-O level, but minimum security, since the container can write directly to the host's storage. Now we add an additional store to this setup and get the best of both worlds.
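To see what the sed command from the Dockerfile actually changes, it can be run against a scratch copy of the file. The storage.conf content below is a hypothetical, heavily simplified excerpt written for this demo, not the real file shipped in Fedora, and the one-line form of the a command assumes GNU sed.

```shell
#!/bin/sh
# Apply the Dockerfile's sed expressions to a throwaway sample file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
conf="$workdir/storage.conf"

# Hypothetical, simplified excerpt of storage.conf (demo input only).
cat > "$conf" <<'EOF'
[storage]
driver = "overlay"

[storage.options]
additionalimagestores = [
]
#mount_program = "/usr/bin/fuse-overlayfs"
EOF

# Same two expressions as in the Dockerfile (GNU sed syntax):
#  1. uncomment the mount_program line to enable fuse-overlayfs;
#  2. append "/var/lib/shared", after the line opening additionalimagestores.
sed -i -e 's|^#mount_program|mount_program|g' \
    -e '/additionalimage.*/a "/var/lib/shared",' "$conf"

cat "$conf"
```

After the edit, the mount_program line is no longer commented out and the additionalimagestores list contains "/var/lib/shared", which is the empty store that a host directory can later be mounted over.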
# mkdir /var/lib/containers4
# podman run -v ./build:/build:z -v /var/lib/containers/storage:/var/lib/shared:ro -v /var/lib/containers4:/var/lib/containers:Z quay.io/buildah/stable \
buildah bud -t image4 /build
# podman run -v /var/lib/containers/storage:/var/lib/shared:ro \
-v /var/lib/containers4:/var/lib/containers:Z quay.io/buildah/stable buildah push image4 registry.company.com/myuser
# rm -rf /var/lib/containers4
Note that the host's /var/lib/containers/storage is mounted at /var/lib/shared inside the container in read-only mode. Buildah in the container can therefore use any images that Podman / CRI-O has already pulled (hello, speed), but can write only to its own storage (hello, security). Also note that this works without disabling SELinux separation for the container.
Important nuance
Under no circumstances should any images be deleted from the underlying store. Otherwise, the Buildah container may crash.
And that's not all the benefits.
The possibilities of additional stores are not limited to the scenario above. For example, you can place all container images on shared network storage and give every Buildah container access to it. Say we have hundreds of images that our CI/CD system regularly uses to build container images. We put all of these images on a single storage host and then, using the network storage tool of choice (NFS, Gluster, Ceph, iSCSI, S3 ...), share this storage with all Buildah or Kubernetes nodes.
Now it is enough to mount this network storage into the Buildah container at /var/lib/shared, and Buildah containers no longer have to pull images at all. We thus drop the pre-population phase and are immediately ready to roll out containers.
And of course, this can be used within a live Kubernetes system or container infrastructure to launch and run containers anywhere without any image pull. Moreover, when a container registry receives a push request to upload an updated image to it, it can automatically send this image to a shared network storage, where it is instantly available to all nodes.
Container images can sometimes be many gigabytes in size. Additional stores remove the need to copy such images to every node and make container startup almost instantaneous.
In addition, we are currently working on a new overlay volume mounts feature that will make building containers even faster.
Conclusion
Running Buildah inside a container in a Kubernetes/CRI-O, Podman, or even Docker environment is entirely possible, simple, and much safer than exposing the Docker socket. We have greatly increased the flexibility of working with images, so you can now run Buildah in a variety of ways to get the best balance between security and performance.
Additional stores let you speed up or even completely eliminate the download of images to the nodes.
Source: habr.com