*A reference to the American children's story "The Little Engine That Could". (Translator's note.)
How to automagically create tiny Docker images for your needs
Unusual obsession
For the last couple of months, I've been obsessed with one question: how small can I make a Docker image while the application inside still works?
I know it's a strange idea.
Before I get into the details and technicalities, I'd like to explain why this question has me so hooked, and why it matters to you.
Why Size Matters
Reducing the contents of a Docker image shrinks its list of vulnerabilities: every package left out is one less thing that can contain a security hole. The images also become cleaner, because they contain only what is needed to run the application.
There is one more small advantage: smaller images download a little faster, although for me this is not so important.
Please note: if size is your main concern, Alpine-based images are already small on their own and will probably suit you.
Distroless images
Distroless images do not ship a package manager, so pip and apt will not work:
FROM gcr.io/distroless/python3
RUN pip3 install numpy
Dockerfile using Python 3 distroless image
Sending build context to Docker daemon 2.048kB
Step 1/2 : FROM gcr.io/distroless/python3
---> 556d570d5c53
Step 2/2 : RUN pip3 install numpy
---> Running in dbfe5623f125
/bin/sh: 1: pip3: not found
Pip is not in the image
This is usually solved with a multi-stage build:
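# Build stage: a full Python image that includes pip
FROM python:3 AS builder
RUN pip3 install numpy
# Final stage: only the installed packages are copied into the distroless image.
# Caution: compiled extensions such as numpy's load only under the Python version they were built for, so the builder's Python (3.7 here) should match the distroless one (3.5).
FROM gcr.io/distroless/python3
COPY --from=builder /usr/local/lib/python3.7/site-packages /usr/local/lib/python3.5/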
Multi-stage build
The result is a 130MB image. Not bad! For comparison: the default Python image weighs 929MB, the “slim” variant (3.7-slim) 179MB, the Alpine image (3.7-alpine) 98.6MB, and the base distroless image used in the example 50.9MB.
One could rightly point out that the previous example copies the entire /usr/local/lib/python3.7/site-packages directory, which may contain dependencies we do not need. Even so, the overall picture is clear: the sizes of the available Python base images vary widely.
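To see what such a copied site-packages directory actually consists of before deciding what to copy, one can add up the file sizes per top-level entry. A minimal sketch using only the Python standard library; the function name sizes_by_entry is hypothetical and not part of any tool mentioned here:

```python
import os
from pathlib import Path


def sizes_by_entry(site_packages):
    """Return {top-level entry: total bytes}, largest entries first.

    Hypothetical helper for illustration only.
    """
    totals = {}
    for entry in Path(site_packages).iterdir():
        if entry.is_file():
            # Single-file modules (e.g. six.py) count as-is.
            totals[entry.name] = entry.stat().st_size
            continue
        size = 0
        # Walk the package directory and add up every regular file in it.
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):  # don't count symlink targets twice
                    size += os.path.getsize(path)
        totals[entry.name] = size
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Running it in the builder stage on /usr/local/lib/python3.7/site-packages shows whether numpy dominates the copied directory or whether tooling such as pip, which the official image installs there, adds noticeable weight.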
As of this writing, Google's distroless project offers only a few images: the Java and Python ones are still experimental, and Python is available only in versions 2.7 and 3.5.
Tiny images
Back to my obsession with creating small images.
First of all, I wanted to see how distroless images are put together. The distroless project uses Google's build tool, Bazel. However, installing Bazel and writing your own images took a lot of work (and, to be completely honest, reinventing the wheel is fun and educational). I wanted to simplify the creation of small images: building an image should be as simple as possible, even mundane. No configuration files, just one line in the console: build image for <application>.
So, if you want to create your own images, you should know about one special Docker image: scratch. Scratch is an "empty" image with no files in it, although by default it still weighs (wow!) 77 bytes.
FROM scratch
scratch image
The idea behind the scratch image is that you can copy any dependencies from the host machine into it and use them either inside the Dockerfile (much like installing them with apt, except that you start from nothing) or later, when the image materializes as a running container. This gives you full control over the contents of the Docker container, and therefore full control over the size of the image.
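The workflow described above (collect the files on the host, then copy them into scratch) can be illustrated with a small Python helper that writes the Dockerfile for a directory of collected files. The helper name scratch_dockerfile, the rootfs directory layout and the entrypoint are illustrative assumptions, not taken from any tool mentioned here:

```python
import json


def scratch_dockerfile(rootfs_dir, entrypoint):
    """Return the text of a Dockerfile that copies rootfs_dir into an empty image.

    Hypothetical helper: rootfs_dir is a directory in the build context laid
    out like a filesystem root (e.g. rootfs/usr/bin/python3.5), and
    entrypoint is a list of program arguments.
    """
    lines = [
        "FROM scratch",
        # Copy the contents of the collected tree onto / in the empty image.
        f"COPY {rootfs_dir.rstrip('/')}/ /",
        # Exec form (a JSON array) runs the program directly, without a shell.
        f"ENTRYPOINT {json.dumps(entrypoint)}",
    ]
    return "\n".join(lines) + "\n"
```

Because scratch contains no shell, the entrypoint has to use the exec form, and every binary, shared library and configuration file the application needs must already be present in the copied directory.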
Now we need some way to collect these dependencies. Existing tools such as apt let you download packages, but they are tied to the current machine and, on top of that, do not run on Windows or macOS.
So I set out to build my own tool that would automatically create the smallest possible base image capable of running any application. It uses Ubuntu/Debian packages, fetching them straight from the repositories, and recursively resolves their dependencies. The program was meant to automatically download the latest stable version of each package, minimizing security risks as much as possible.
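When a package is available in more than one version (for example, in a release and in its security updates), picking the newest one means comparing Debian version strings, which is not plain text comparison: 1.10 is newer than 1.9, and a tilde sorts before anything, even the end of the string, so 1.0~rc1 is older than 1.0. Below is a minimal Python sketch of dpkg's comparison rules, assuming well-formed [epoch:]upstream[-revision] strings; the function names are hypothetical and this is not fetchy's code:

```python
from functools import cmp_to_key


def _is_digit(ch):
    return "0" <= ch <= "9"


def _order(ch):
    """Sort weight of one character in a non-digit run (dpkg's rules)."""
    if ch == "~":
        return -1                  # tilde sorts before everything, even the end
    if ch == "" or _is_digit(ch):
        return 0                   # end of string or start of a digit run
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return ord(ch)             # letters sort before other symbols
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream versions (or two revisions): -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the leading non-digit runs character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # 2. Compare the following digit runs as numbers (empty counts as 0).
        start_i, start_j = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        num_a, num_b = int(a[start_i:i] or 0), int(b[start_j:j] or 0)
        if num_a != num_b:
            return -1 if num_a < num_b else 1
    return 0


def _split(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(v1, v2):
    """Return -1, 0 or 1 if v1 is older than, equal to, or newer than v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_part(u1, u2) or _compare_part(r1, r2)


def latest(versions):
    """Pick the newest version string from an iterable of versions."""
    return max(versions, key=cmp_to_key(compare_versions))
```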
I named the tool fetchy because it… well, fetches what you need. The tool works through a command-line interface but also offers an API.
To build an image with fetchy (let's take a Python image this time), you just run the CLI like this: fetchy dockerize python. You may be asked for the target operating system and its codename, because fetchy currently supports only Debian- and Ubuntu-based packages.
You can now choose which dependencies are not needed at all (in our context) and exclude them. For example, Python depends on Perl, although it works fine without Perl installed.
The results
The Python image created with the command fetchy dockerize python3.5 weighs only 35MB (I'm quite sure it can be slimmed down even further in the future). In other words, it was possible to “shave off” another 15MB compared with the distroless image.
You can see all the images built so far.
Project -
If a feature you need is missing, just open an issue; I'll be happy to help 🙂 What's more, I'm currently working on integrating other package managers into fetchy, so that multi-stage builds will no longer be needed.
Source: habr.com