A few tips on how to speed up building Docker images (down to 30 seconds, for example)

In today's world of complex orchestrators and CI/CD, a feature travels a long road from commit through tests and delivery before it reaches production. Previously, you could upload new files over FTP (nobody does that anymore, right?), and a "deploy" took seconds. Now you have to create a merge request and wait a long time for the feature to reach users.

Part of that road is building the Docker image. Sometimes the build takes minutes, sometimes tens of minutes, which can hardly be called normal. In this article we will take a simple application, pack it into an image, apply several methods to speed up the build, and look at the nuances of how they work.


We have solid experience creating and supporting media sites: TASS, The Bell, Novaya Gazeta, Republic… Not long ago we added to that portfolio by launching a website for the publication Reminder. And while new features were being added quickly and old bugs fixed, slow deployment became a big problem.

We deploy on GitLab: we build images, push them to the GitLab Registry, and roll them out to production. The longest item on that list is building the images. For example, without optimization every backend build took 14 minutes.


In the end it became clear that we could not go on living like this, so we sat down to figure out why the images took so long to build. As a result, we cut the build time to 30 seconds!


For this article, so as not to be tied to the Reminder environment, let's look at building an empty Angular application. So, let's create it:

ng n app

Add PWA to it (we are progressive):

ng add @angular/pwa --project app

While a million npm packages download, let's take a look at how a Docker image works. Docker lets you package an application and run it in an isolated environment called a container. Thanks to isolation, many containers can run on the same server at once, and containers are much lighter than virtual machines because they run directly on the host kernel. To run a container with our application, we first need to create an image into which we pack everything the application needs to work. In essence, an image is a snapshot of the file system. For example, take this Dockerfile:

FROM node:12.16.2
WORKDIR /app
COPY . .
RUN npm ci
RUN npm run build --prod

A Dockerfile is a set of instructions; executing each of them, Docker saves the changes to the file system and overlays them on the previous ones. Each instruction creates its own layer, and the finished image is all those layers combined.
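To make the layer structure tangible, here is a small hedged sketch (not from the original article) that prints each layer of a built image together with the instruction that created it; the tag app is an assumption matching the builds below:

```shell
# Print the layers of an image: one row per Dockerfile instruction,
# newest first, with the size each instruction added.
show_layers() {
  local tag="${1:-app}"   # "app" is an assumed tag for illustration
  docker history --format 'table {{.CreatedBy}}\t{{.Size}}' "$tag"
}
```

Running show_layers app after a build makes it easy to see which instructions produce the heavy layers.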

What is important to know: each Docker layer can be cached. If nothing has changed since the last build, then instead of executing an instruction Docker takes the ready-made layer. Since the main gain in build speed comes from using the cache, when measuring build speed we will pay attention to building an image with a warm cache. Step by step:

  1. We remove the images locally so that previous runs do not affect the test.
    docker rmi $(docker images -q)
  2. We launch the build for the first time.
    time docker build -t app .
  3. We change the src/index.html file - we imitate the work of a programmer.
  4. We run the build for the second time.
    time docker build -t app .
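The four steps above can be wrapped in a small helper; a hedged sketch (the image tag and the touched file are assumptions for this example):

```shell
# Measure a cold build and then a warm (cached) build of the same image.
benchmark_build() {
  local tag="${1:-app}"
  # 1. Remove local images so previous runs do not affect the test.
  local ids
  ids="$(docker images -q)"
  if [ -n "$ids" ]; then
    docker rmi -f $ids
  fi
  # 2. First build: the cache is empty.
  time docker build -t "$tag" .
  # 3. Imitate the work of a programmer.
  printf '\n<!-- touch -->\n' >> src/index.html
  # 4. Second build: this is the time we actually care about.
  time docker build -t "$tag" .
}
```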

If the environment for building images is set up correctly (more on that below), Docker will already have a pile of caches on board when the build starts. Our task is to learn to use that cache so the build goes as fast as possible. Since we assume a cache-less build happens only once, the very first time, we can ignore how slow that first run is. What matters in the tests is the second run, when the caches are already warm and we are ready to bake our pie. Nevertheless, some of the tips will improve the first build too.

Let's put the Dockerfile described above in the project folder and run the build. All listings have been abbreviated for readability.

$ time docker build -t app .
Sending build context to Docker daemon 409MB
Step 1/5 : FROM node:12.16.2
Status: Downloaded newer image for node:12.16.2
Step 2/5 : WORKDIR /app
Step 3/5 : COPY . .
Step 4/5 : RUN npm ci
added 1357 packages in 22.47s
Step 5/5 : RUN npm run build --prod
Date: 2020-04-16T19:20:09.664Z - Hash: fffa0fddaa3425c55dd3 - Time: 37581ms
Successfully built c8c279335f46
Successfully tagged app:latest

real 5m4.541s
user 0m0.000s
sys 0m0.000s

Change the content of src/index.html and run it a second time.

$ time docker build -t app .
Sending build context to Docker daemon 409MB
Step 1/5 : FROM node:12.16.2
Step 2/5 : WORKDIR /app
 ---> Using cache
Step 3/5 : COPY . .
Step 4/5 : RUN npm ci
added 1357 packages in 22.47s
Step 5/5 : RUN npm run build --prod
Date: 2020-04-16T19:26:26.587Z - Hash: fffa0fddaa3425c55dd3 - Time: 37902ms
Successfully built 79f335df92d3
Successfully tagged app:latest

real 3m33.262s
user 0m0.000s
sys 0m0.000s

To check that the image was created, run docker images:

REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
app          latest   79f335df92d3   About a minute ago   1.74GB

Before building, Docker takes all the files in the current context and sends them to its daemon: Sending build context to Docker daemon 409MB. The build context is given as the last argument of the build command; in our case it is the current directory, ".", so Docker drags in everything we have in this folder. 409 MB is a lot; let's think about how to fix that.

Reducing the context

There are two ways to reduce the context. Either put all the files needed for the build into a separate folder and point the Docker context at that folder, which is not always convenient, or specify exclusions: what should not be dragged into the context. To do that, put a .dockerignore file in the project and list what the build does not need:

.git
/node_modules

and run the build again:

$ time docker build -t app .
Sending build context to Docker daemon 607.2kB
Step 1/5 : FROM node:12.16.2
Step 2/5 : WORKDIR /app
 ---> Using cache
Step 3/5 : COPY . .
Step 4/5 : RUN npm ci
added 1357 packages in 22.47s
Step 5/5 : RUN npm run build --prod
Date: 2020-04-16T19:33:54.338Z - Hash: fffa0fddaa3425c55dd3 - Time: 37313ms
Successfully built 4942f010792a
Successfully tagged app:latest

real 1m47.763s
user 0m0.000s
sys 0m0.000s

607.2 kB is much better than 409 MB. We have also shrunk the image from 1.74 GB to 1.38 GB, since node_modules and .git no longer end up in it via COPY . .:

REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
app          latest   4942f010792a   3 minutes ago   1.38GB
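If it is not obvious what was bloating the context in the first place, a quick look at the biggest entries in the project folder helps before writing .dockerignore; a hedged sketch (plain du, nothing Docker-specific):

```shell
# List the ten largest entries of a directory, biggest first.
context_heavies() {
  local dir="${1:-.}"
  # -s: one total per entry, -h: human-readable sizes, sort -rh: biggest first.
  du -sh "$dir"/* 2>/dev/null | sort -rh | head -n 10
}
```

Note that the plain glob skips dot-entries such as .git, so check those separately.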

Let's try to reduce the size of the image even more.

Using Alpine

Another way to save on image size is to use a small parent image. The parent image is the one our image is based on; it is set by the FROM instruction in the Dockerfile and forms the bottom layers. In our case we use a Debian-based image that already has Node.js installed. And it weighs...

$ docker images -a | grep node
node 12.16.2 406aa3abbc6c 17 minutes ago 916MB

... almost a gigabyte. You can cut that down significantly with an image based on Alpine Linux, a very small Linux distribution: the Node.js image based on Alpine is only 88.5 MB. So let's replace our mansion of an image:

FROM node:12.16.2-alpine3.11
RUN apk --no-cache --update --virtual build-dependencies add \
    python \
    make \
    g++
WORKDIR /app
COPY . .
RUN npm ci
RUN npm run build --prod

We had to install a few things the application build needs. Yes, Angular will not build without Python ¯\_(ツ)_/¯: node-gyp needs python, make and g++ to compile native npm dependencies.

But on the other hand, the image shrank from 1.38 GB to 761 MB:

REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
app          latest   aa031edc315a   22 minutes ago   761MB

Let's go even further.

Multi-stage builds

Not everything that is in the image is what we need in production.

$ docker run app ls -lah
total 576K
drwxr-xr-x 1 root root 4.0K Apr 16 19:54 .
drwxr-xr-x 1 root root 4.0K Apr 16 20:00 ..
-rwxr-xr-x 1 root root 19 Apr 17 2020 .dockerignore
-rwxr-xr-x 1 root root 246 Apr 17 2020 .editorconfig
-rwxr-xr-x 1 root root 631 Apr 17 2020 .gitignore
-rwxr-xr-x 1 root root 181 Apr 17 2020 Dockerfile
-rwxr-xr-x 1 root root 1020 Apr 17 2020 README.md
-rwxr-xr-x 1 root root 3.6K Apr 17 2020 angular.json
-rwxr-xr-x 1 root root 429 Apr 17 2020 browserslist
drwxr-xr-x 3 root root 4.0K Apr 16 19:54 dist
drwxr-xr-x 3 root root 4.0K Apr 17 2020 e2e
-rwxr-xr-x 1 root root 1015 Apr 17 2020 karma.conf.js
-rwxr-xr-x 1 root root 620 Apr 17 2020 ngsw-config.json
drwxr-xr-x 1 root root 4.0K Apr 16 19:54 node_modules
-rwxr-xr-x 1 root root 494.9K Apr 17 2020 package-lock.json
-rwxr-xr-x 1 root root 1.3K Apr 17 2020 package.json
drwxr-xr-x 5 root root 4.0K Apr 17 2020 src
-rwxr-xr-x 1 root root 210 Apr 17 2020 tsconfig.app.json
-rwxr-xr-x 1 root root 489 Apr 17 2020 tsconfig.json
-rwxr-xr-x 1 root root 270 Apr 17 2020 tsconfig.spec.json
-rwxr-xr-x 1 root root 1.9K Apr 17 2020 tslint.json

With docker run app ls -lah we started a container from our app image, ran the ls -lah command inside it, and then the container finished its work.

In production we only need the dist folder, but the files still have to be served somehow. We could run some HTTP server on Node.js, but we will make it simpler and take nginx: put the dist folder into an nginx image along with a small config:

server {
    listen 80 default_server;
    server_name localhost;
    charset utf-8;
    root /app;

    location / {
        try_files $uri $uri/ /index.html;
    }
}
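Before baking the config in, it can be linted with nginx -t inside the same image we are about to use; a hedged sketch (the path nginx/static.conf matches the COPY in the Dockerfile below):

```shell
# Run "nginx -t" against the config, mounted read-only into the stock image.
check_nginx_conf() {
  local conf="${1:-nginx/static.conf}"
  docker run --rm \
    -v "$PWD/$conf:/etc/nginx/conf.d/static.conf:ro" \
    nginx:1.17.10-alpine nginx -t
}
```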

Multi-stage build will help us to do all this. Let's change our Dockerfile:

FROM node:12.16.2-alpine3.11 as builder
RUN apk --no-cache --update --virtual build-dependencies add \
    python \
    make \
    g++
WORKDIR /app
COPY . .
RUN npm ci
RUN npm run build --prod

FROM nginx:1.17.10-alpine
WORKDIR /app
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx/static.conf /etc/nginx/conf.d
COPY --from=builder /app/dist/app .

Now we have two FROM instructions in the Dockerfile, and each of them starts its own build stage. We named the first one builder, while the final image is built starting from the last FROM; its last step copies the artifact of the previous stage into the final image with nginx. The image size has dropped significantly:

REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
app          latest   2c6c5da07802   29 minutes ago   36MB

Let's start the container with our image and make sure everything works:

docker run -p8080:80 app

With the -p8080:80 option we forwarded port 8080 of the host machine to port 80 inside the container, where nginx runs. Open http://localhost:8080/ in a browser and you will see the application. Everything works!
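A browser check is hard to script; for CI a curl smoke test does the same job. A hedged sketch, assuming the container above is listening on port 8080:

```shell
# Print the HTTP status code of the root page; -f makes curl exit
# non-zero on HTTP errors, so this doubles as a pass/fail check.
check_app() {
  local url="${1:-http://localhost:8080/}"
  curl -fsS -o /dev/null -w '%{http_code}\n' "$url"
}
```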

A few tips on how to speed up building Docker images. For example, up to 30 seconds

Reducing the image size from 1.74 GB to 36 MB significantly reduces the time it takes to deliver your application to production. But let's get back to build time.

$ time docker build -t app .
Sending build context to Docker daemon 608.8kB
Step 1/11 : FROM node:12.16.2-alpine3.11 as builder
Step 2/11 : RUN apk --no-cache --update --virtual build-dependencies add python make g++
 ---> Using cache
Step 3/11 : WORKDIR /app
 ---> Using cache
Step 4/11 : COPY . .
Step 5/11 : RUN npm ci
added 1357 packages in 47.338s
Step 6/11 : RUN npm run build --prod
Date: 2020-04-16T21:16:03.899Z - Hash: fffa0fddaa3425c55dd3 - Time: 39948ms
 ---> 27f1479221e4
Step 7/11 : FROM nginx:stable-alpine
Step 8/11 : WORKDIR /app
 ---> Using cache
Step 9/11 : RUN rm /etc/nginx/conf.d/default.conf
 ---> Using cache
Step 10/11 : COPY nginx/static.conf /etc/nginx/conf.d
 ---> Using cache
Step 11/11 : COPY --from=builder /app/dist/app .
Successfully built d201471c91ad
Successfully tagged app:latest

real 2m17.700s
user 0m0.000s
sys 0m0.000s

Changing the order of layers

We have cached the first three steps (note the Using cache hints). In the fourth step all project files are copied, and in the fifth step dependencies are installed: RUN npm ci takes a whole 47.338s. Why reinstall dependencies every time if they change so rarely? Let's see why they are not cached. Docker checks layer by layer whether the instruction and the files associated with it have changed. In the fourth step we copy all of the project's files, and of course there are changes among them, so Docker not only refuses to take this layer from the cache, it invalidates all the subsequent ones as well! Let's make a few small changes to the Dockerfile.

FROM node:12.16.2-alpine3.11 as builder
RUN apk --no-cache --update --virtual build-dependencies add \
    python \
    make \
    g++
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build --prod

FROM nginx:1.17.10-alpine
WORKDIR /app
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx/static.conf /etc/nginx/conf.d
COPY --from=builder /app/dist/app .

First, package.json and package-lock.json are copied, then dependencies are installed, and only after that the entire project is copied. As a result:

$ time docker build -t app .
Sending build context to Docker daemon 608.8kB
Step 1/12 : FROM node:12.16.2-alpine3.11 as builder
Step 2/12 : RUN apk --no-cache --update --virtual build-dependencies add python make g++
 ---> Using cache
Step 3/12 : WORKDIR /app
 ---> Using cache
Step 4/12 : COPY package*.json ./
 ---> Using cache
Step 5/12 : RUN npm ci
 ---> Using cache
Step 6/12 : COPY . .
Step 7/12 : RUN npm run build --prod
Date: 2020-04-16T21:29:44.770Z - Hash: fffa0fddaa3425c55dd3 - Time: 38287ms
 ---> 1b9448c73558
Step 8/12 : FROM nginx:stable-alpine
Step 9/12 : WORKDIR /app
 ---> Using cache
Step 10/12 : RUN rm /etc/nginx/conf.d/default.conf
 ---> Using cache
Step 11/12 : COPY nginx/static.conf /etc/nginx/conf.d
 ---> Using cache
Step 12/12 : COPY --from=builder /app/dist/app .
Successfully built a44dd7c217c3
Successfully tagged app:latest

real 0m46.497s
user 0m0.000s
sys 0m0.000s

46 seconds instead of 3 minutes is much better! The order of the layers matters: first we copy what does not change, then what changes rarely, and at the very end what changes often.

Next, a few words about building images in CI/CD systems.

Using previous images for cache

If we use some SaaS solution for builds, the local Docker cache may be clean and fresh. To give Docker a place to take the baked layers from, pass it the previously built image with the --cache-from flag.
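The same idea can be tried locally before wiring it into CI; a hedged sketch, where the registry path is a placeholder:

```shell
# Pull the last pushed image (if any) and use it as a cache source.
warm_and_build() {
  local image="$1" tag="$2"   # e.g. registry.example.com/group/app and a git sha
  # The very first build has nothing to pull, so a failed pull must not abort.
  docker pull "$image:latest" || true
  docker build \
    --cache-from "$image:latest" \
    -t "$image:$tag" \
    -t "$image:latest" \
    .
}
```

Note that with BuildKit enabled, --cache-from only helps if the pushed image was built with inline cache metadata (--build-arg BUILDKIT_INLINE_CACHE=1); the classic builder used in the article behaves as shown.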

For example, consider building our application in GitHub Actions, using this config:

on:
  push:
    branches:
      - master

name: Test docker build

jobs:
  deploy:
    name: Build
    runs-on: ubuntu-latest
    env:
      IMAGE_NAME: docker.pkg.github.com/${{ github.repository }}/app
      IMAGE_TAG: ${{ github.sha }}

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Login to GitHub Packages
      env:
        TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        docker login docker.pkg.github.com -u $GITHUB_ACTOR -p $TOKEN

    - name: Build
      run: |
        docker build \
          -t $IMAGE_NAME:$IMAGE_TAG \
          -t $IMAGE_NAME:latest \
          .

    - name: Push image to GitHub Packages
      run: |
        docker push $IMAGE_NAME:latest
        docker push $IMAGE_NAME:$IMAGE_TAG

    - name: Logout
      run: |
        docker logout docker.pkg.github.com

The image is built and pushed to GitHub Packages in two minutes and 20 seconds.


Now let's change the build to use the cache based on the previous built images:

on:
  push:
    branches:
      - master

name: Test docker build

jobs:
  deploy:
    name: Build
    runs-on: ubuntu-latest
    env:
      IMAGE_NAME: docker.pkg.github.com/${{ github.repository }}/app
      IMAGE_TAG: ${{ github.sha }}

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Login to GitHub Packages
      env:
        TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        docker login docker.pkg.github.com -u $GITHUB_ACTOR -p $TOKEN

    - name: Pull latest images
      run: |
        docker pull $IMAGE_NAME:latest || true
        docker pull $IMAGE_NAME-builder-stage:latest || true

    - name: Images list
      run: |
        docker images

    - name: Build
      run: |
        docker build \
          --target builder \
          --cache-from $IMAGE_NAME-builder-stage:latest \
          -t $IMAGE_NAME-builder-stage \
          .
        docker build \
          --cache-from $IMAGE_NAME-builder-stage:latest \
          --cache-from $IMAGE_NAME:latest \
          -t $IMAGE_NAME:$IMAGE_TAG \
          -t $IMAGE_NAME:latest \
          .

    - name: Push image to GitHub Packages
      run: |
        docker push $IMAGE_NAME-builder-stage:latest
        docker push $IMAGE_NAME:latest
        docker push $IMAGE_NAME:$IMAGE_TAG

    - name: Logout
      run: |
        docker logout docker.pkg.github.com

First, let's explain why two build commands are run. The point is that in a multi-stage build, the resulting image is the set of layers of the last stage; layers from earlier stages are not included in it. So, given only the final image of the previous build, Docker cannot find the ready-made layers for the Node.js build stage (builder). To solve this, an intermediate image $IMAGE_NAME-builder-stage is built and pushed to GitHub Packages so that the next build can use it as a cache source.


The total build time was reduced to about a minute and a half, half a minute of which is spent pulling the previous images.

Preimaging

Another way to deal with a clean Docker cache is to move some of the layers into a separate Dockerfile, build it on its own, push it to a Container Registry, and use it as the parent image.

Let's create our own Node.js image for building Angular applications. Add a Dockerfile.node to the project:

FROM node:12.16.2-alpine3.11
RUN apk --no-cache --update --virtual build-dependencies add \
    python \
    make \
    g++

Build it and push the public image to Docker Hub:

docker build -t exsmund/node-for-angular -f Dockerfile.node .
docker push exsmund/node-for-angular:latest

Now in our main Dockerfile we use the finished image:

FROM exsmund/node-for-angular:latest as builder
...

In our example the build time did not decrease, but pre-built images come in handy when you have many projects and each of them needs the same dependencies.


We have looked at several ways to speed up Docker image builds. If you want fast deployments, try applying these in your project:

  • reducing the build context;
  • using small parent images;
  • multi-stage builds;
  • reordering Dockerfile instructions for efficient cache use;
  • setting up caching in CI/CD systems;
  • pre-built parent images (pre-imaging).

I hope this example has made it clearer how Docker works and that you will be able to configure your deployment optimally. A repository with the examples from the article has been created for you to play with: https://github.com/devopsprodigy/test-docker-build.

Source: habr.com
