Tiny Docker images that believed in themselves*

* A reference to the American children's story "The Little Engine That Could" (translator's note).


How to automagically create tiny Docker images for your needs

Unusual obsession

For the last couple of months, I've been obsessed with one question: how small can I make a Docker image and still have the application work?

I know it's a strange idea.

Before I get into the details and technicalities, I'd like to explain why this question hooked me so much, and how it affects you.

Why Size Matters

By reducing the contents of a Docker image, we also reduce its list of potential vulnerabilities: fewer packages means less that can be exploited. We also make the images cleaner, because they contain only what is needed to run the application.

There is one more small advantage: the images download a little faster. Personally, though, I don't consider that very important.

Please note: if size is your main concern, Alpine-based images are small on their own and will probably suit you.

Distroless images

The Distroless project offers a selection of base "distroless" images. They contain no package managers, shells or other utilities you are used to seeing on the command line. As a result, package managers like pip and apt will not work:

FROM gcr.io/distroless/python3
RUN  pip3 install numpy

Dockerfile using the Python 3 distroless image

Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM gcr.io/distroless/python3
 ---> 556d570d5c53
Step 2/2 : RUN  pip3 install numpy
 ---> Running in dbfe5623f125
/bin/sh: 1: pip3: not found

Pip is not in the image

Usually such a problem is solved by a multi-stage build:

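FROM python:3.5 AS builder
RUN pip3 install numpy

FROM gcr.io/distroless/python3
# The builder's Python must match the distroless one (3.5 here): numpy ships compiled
# extension modules built for one CPython version. Debian's python3.5 reads dist-packages.
COPY --from=builder /usr/local/lib/python3.5/site-packages /usr/local/lib/python3.5/dist-packages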

Multi-stage build

The result is a 130MB image. Not bad! For comparison: the default Python image weighs 929MB, the "slim" variant (3.7-slim) weighs 179MB and the Alpine variant (3.7-alpine) weighs 98.6MB. The base distroless image used in the example is 50.9MB.
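To put these numbers side by side, here is a small Python calculation. It uses only the sizes quoted above, which were measured at the time of writing. Current tags will have different sizes, and the dictionary keys are just descriptive labels.

```python
# Sizes in MB as quoted in the text; a snapshot from the time of writing.
SIZES_MB = {
    "python:3 (default)": 929.0,
    "python:3.7-slim": 179.0,
    "distroless + numpy (multi-stage)": 130.0,
    "python:3.7-alpine": 98.6,
    "gcr.io/distroless/python3 (base)": 50.9,
}


def percent_of(baseline: str, sizes: dict[str, float]) -> dict[str, float]:
    """Express every image size as a percentage of the baseline image, to one decimal."""
    base = sizes[baseline]
    return {name: round(100 * size / base, 1) for name, size in sizes.items()}


if __name__ == "__main__":
    for name, pct in percent_of("python:3 (default)", SIZES_MB).items():
        print(f"{name:34} {SIZES_MB[name]:6.1f} MB {pct:5.1f}%")
```

On these numbers, the distroless base is about 5.5% of the default image, and the multi-stage numpy image is about 14%.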

One could rightly point out that the multi-stage Dockerfile above copies the builder's entire site-packages directory, which may contain dependencies we do not need. Also keep in mind that the sizes of all the Python base images change over time, so these numbers are only a snapshot.
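Here is one way to avoid copying the whole directory, sketched under assumptions: copy only the files a package recorded when it was installed. Wheel-based installs list those files in a RECORD file inside the package's *.dist-info directory, one "path,hash,size" CSV row per file. The helper below extracts the paths that live inside site-packages. The sample RECORD, with its version number and hashes, is invented for illustration. A real setup would also need to repeat this for every requirement the package declares.

```python
import csv
import io


def site_packages_paths(record_text: str) -> list[str]:
    """Return the paths listed in a RECORD file that live inside site-packages.

    Rows look like "path,hash,size" (hash and size may be empty). Paths starting
    with ".." point outside site-packages (e.g. console scripts in bin/) and are skipped.
    """
    paths = []
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue  # ignore blank lines
        path = row[0]
        if path.startswith(".."):
            continue  # installed outside site-packages
        paths.append(path)
    return paths


# Invented RECORD excerpt: version number and hashes are placeholders.
SAMPLE_RECORD = """\
numpy/__init__.py,sha256=abc,1000
numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so,sha256=def,2000
numpy-1.17.0.dist-info/RECORD,,
../../../bin/f2py,sha256=ghi,300
"""

if __name__ == "__main__":
    for p in site_packages_paths(SAMPLE_RECORD):
        print(p)
```

The resulting list could feed individual COPY steps or a tar archive, instead of copying the whole directory.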

At the time of writing, Google's distroless project offers only a few images: Java and Python support is still experimental, and Python is available only for versions 2.7 and 3.5.

Tiny images

Back to my obsession with creating small images.

At first, I just wanted to see how distroless images are put together. The distroless project uses Google's build tool, Bazel. However, installing Bazel and writing your own images takes a lot of work (and, to be completely honest, reinventing the wheel is fun and educational). I wanted to simplify the creation of small images: building one should be as simple as possible, even trivial. No configuration files, just one line in the console: "build an image for <application>".

If you want to create your own images, you should know about a special Docker image called scratch. Scratch is an "empty" image with no files in it, and yet by default it weighs (wow!) 77 bytes.

FROM scratch

scratch image

The idea behind scratch is that you copy whatever dependencies you need from the host machine into it. You can then either use them during the Dockerfile build (much like installing packages with apt, except that you start from nothing) or later, when a container runs from the image. This gives you full control over the contents of the container, and therefore full control over the size of the image.
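The first step is to find out which host files a program actually needs. For dynamically linked binaries, one common approach is to read the output of ldd and copy every resolved shared library into the image. This is not necessarily what the author's tool does. The sketch below parses a sample of ldd-style output so it runs anywhere. On a Linux host, you would pass it the text that ldd prints for your binary.

```python
import re

# "libc.so.6 => /lib/.../libc.so.6 (0x...)" lines and bare "/lib64/ld-...so.2 (0x...)" lines.
_ARROW = re.compile(r"^\s*\S+\s+=>\s+(/\S+)")
_BARE = re.compile(r"^\s*(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def shared_libraries(ldd_output: str) -> list[str]:
    """Return absolute paths of the shared libraries listed in ldd output.

    Virtual objects such as linux-vdso.so.1 have no file on disk and are skipped;
    an unresolved "=> not found" entry raises FileNotFoundError.
    """
    libs = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            raise FileNotFoundError(line.strip())
        match = _ARROW.match(line) or _BARE.match(line)
        if match:
            libs.append(match.group(1))
    return libs


# Sample output in ldd's format (addresses are arbitrary).
SAMPLE_LDD = """\
\tlinux-vdso.so.1 (0x00007ffc8d5f2000)
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3a1c000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3a1be00000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f3a1c200000)
"""

if __name__ == "__main__":
    for lib in shared_libraries(SAMPLE_LDD):
        print(lib)  # each of these would be copied into the scratch image
```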

Now we need a way to collect these dependencies. Existing tools like apt can download packages, but they are tied to the machine they run on and, on top of that, don't work on Windows or macOS.
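Working without apt means reading the repository metadata yourself. Debian and Ubuntu mirrors publish a Packages index under paths like dists/<codename>/main/binary-<arch>/. The index consists of blank-line-separated stanzas with Package, Depends and Pre-Depends fields. The sketch below turns such text into a map from each package to its direct dependencies. It parses an inline sample with invented package names rather than a downloaded index. It also simplifies the format in ways apt does not: it keeps only the first alternative of "a | b" and drops version constraints and ":any" qualifiers.

```python
import re


def parse_packages_index(text: str) -> dict[str, list[str]]:
    """Map each package name to the names of its direct dependencies.

    Simplifications (assumptions of this sketch): only Pre-Depends and Depends
    are read, the first alternative of "a | b" wins, and version constraints
    and architecture qualifiers such as ":any" are dropped.
    """
    deps = {}
    for stanza in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += " " + line.strip()  # continuation of previous field
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key] = value.strip()
        name = fields.get("Package")
        if not name:
            continue
        names = []
        for field in ("Pre-Depends", "Depends"):
            clauses = (c.strip() for c in fields.get(field, "").split(","))
            for clause in filter(None, clauses):
                first = clause.split("|")[0].strip()
                names.append(re.split(r"[\s(:]", first, maxsplit=1)[0])
        deps[name] = names
    return deps


# Invented stanzas in Packages-index format, not real Debian metadata.
SAMPLE_INDEX = """\
Package: myapp
Version: 1.0-1
Depends: libfoo (>= 2.0), libbar:any, tool-a | tool-b
Description: example application
 This continuation line belongs to Description.

Package: libfoo
Pre-Depends: libc-example
Depends: libbar
"""

if __name__ == "__main__":
    for pkg, direct in parse_packages_index(SAMPLE_INDEX).items():
        print(pkg, "->", ", ".join(direct))
```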

So I set out to build my own tool that would automatically assemble the smallest possible base image capable of running a given application. It uses Ubuntu/Debian packages: it fetches them straight from the repositories and recursively resolves their dependencies. It is designed to download the latest stable version of each package automatically, to keep security risks as low as possible.
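The recursive resolution step can be sketched as a transitive closure over a dependency map, like the one a Packages index provides. This minimal illustration assumes the map is already loaded, and the package names in it are invented. It walks the graph with an explicit stack so that long dependency chains don't hit Python's recursion limit. It also tolerates cycles, which real package graphs can contain.

```python
def resolve(root: str, depends: dict[str, list[str]]) -> list[str]:
    """Return root and all of its transitive dependencies, in discovery order.

    A name missing from the map (for example a virtual package) raises KeyError.
    """
    seen = set()
    order = []
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already handled; this also breaks dependency cycles
        if name not in depends:
            raise KeyError(f"unknown package: {name}")
        seen.add(name)
        order.append(name)
        # Push in reverse so dependencies are visited in the order they are listed.
        stack.extend(reversed(depends[name]))
    return order


# Invented example graph; note the cycle between libx and liby.
DEPENDS = {
    "app": ["libx", "runtime"],
    "libx": ["liby"],
    "liby": ["libx", "libc"],
    "runtime": ["libc"],
    "libc": [],
}

if __name__ == "__main__":
    print(resolve("app", DEPENDS))
```

Each name in the result would then be downloaded and unpacked into the image's root filesystem.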

I named the tool fetchy because it... fetches what you need. fetchy is used through a command-line interface, but it also exposes an API.

To build an image with fetchy (let's take a Python image again), just run the CLI like this: fetchy dockerize python. You may be prompted for the target operating system and its codename, because fetchy currently supports only Debian- and Ubuntu-based packages.
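The article doesn't describe what fetchy dockerize generates internally. As a purely hypothetical sketch of a final step, a tool could unpack the resolved packages into a directory and emit a Dockerfile that copies that directory into scratch. The helper name, the "rootfs" directory and the entrypoint below are all assumptions made for illustration.

```python
import json


def scratch_dockerfile(rootfs_dir: str, entrypoint: list[str]) -> str:
    """Build Dockerfile text that copies an unpacked root filesystem into scratch.

    Hypothetical helper: rootfs_dir is assumed to already contain the files of
    every resolved package, extracted at their final paths.
    """
    return "\n".join([
        "FROM scratch",
        f"COPY {rootfs_dir}/ /",  # copies the directory's contents into /
        f"ENTRYPOINT {json.dumps(entrypoint)}",  # exec form needs a JSON array
        "",
    ])


if __name__ == "__main__":
    print(scratch_dockerfile("rootfs", ["/usr/bin/python3.5"]), end="")
```

Exec form matters here: the image has no /bin/sh, so a shell-form ENTRYPOINT would fail to start.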

You can also choose which dependencies are not needed at all (in our context) and exclude them. For example, Python depends on Perl, although it works fine without Perl installed.
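Exclusions fit naturally into a dependency graph walk. An excluded package is simply never entered, so anything reachable only through it drops out as well. The sketch below reports what excluding a package removes. Its dependency map is invented and loosely modeled on the Perl example; the edges are not real Debian metadata.

```python
def closure(root: str, depends: dict[str, list[str]],
            exclude: frozenset[str] = frozenset()) -> set[str]:
    """Return root plus every package reachable from it, never entering excluded ones."""
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen or name in exclude:
            continue  # skip packages already visited or explicitly excluded
        seen.add(name)
        stack.extend(depends.get(name, []))
    return seen


def dropped_by(root: str, depends: dict[str, list[str]], exclude: set[str]) -> set[str]:
    """Packages that disappear from the image when `exclude` is pruned."""
    return closure(root, depends) - closure(root, depends, frozenset(exclude))


# Invented graph for illustration only.
DEPENDS = {
    "python": ["libc", "perl"],
    "perl": ["perl-modules", "libc"],
    "perl-modules": [],
    "libc": [],
}

if __name__ == "__main__":
    print(sorted(dropped_by("python", DEPENDS, {"perl"})))
```

libc stays in the image because python still needs it directly. Only packages whose sole path runs through perl are dropped.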

The results

The Python image built with the command fetchy dockerize python3.5 weighs only 35MB (and I'm sure it can be slimmed down even further in the future). So it was possible to “shave off” roughly another 15MB compared to the distroless image.

You can see all the images built so far here.

The project itself is here.

If a feature is missing, just open a ticket; I'll be happy to help 🙂 I'm also working on integrating other package managers into fetchy, so that multi-stage builds will no longer be needed.

Source: habr.com
