Docker Tips: Clean your machine of junk

Hey Habr! Here is a translation of the article "Docker Tips: Clean Up Your Local Machine" by Luc Juggery.

Today we'll talk about how Docker uses the disk space of the host machine, and also figure out how to free this space from the scraps of unused images and containers.


Total consumption

Docker is a cool thing; few people doubt that today. Just a few years ago it gave us a completely new way to build, ship and run any environment, while saving a significant amount of CPU and RAM. On top of that (and for some this is the most important part), Docker made managing the lifecycle of our working environments incredibly simple and uniform.

However, all these conveniences of modern life come at a price. When we run containers, download or build our own images and deploy complex ecosystems, we pay for it, and one of the currencies is disk space.

If you have never thought about how much space is actually taken up by Docker on your machine, then you may be unpleasantly surprised by the output of this command:

$ docker system df

This command breaks down Docker's disk usage into several categories:

  • Images: the total size of the images that have been pulled from registries or built on your system;
  • Containers: the total disk space used by your containers (that is, the combined size of the read-write layers of all containers);
  • Local Volumes: the size of local volumes mounted into containers;
  • Build Cache: temporary files generated by the image build process (when using BuildKit, available since Docker 18.09).
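If you need a more detailed, per-object report (every image, container and volume listed individually), docker system df also accepts a verbose flag; the exact output of course depends on what is on your machine:

# Per-image, per-container and per-volume breakdown
$ docker system df -v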

I bet that after this short rundown you are burning with the desire to clean the garbage off your disk and bring the precious gigabytes back to life.

Disk usage by containers

Each time a container is created on the host machine, several files and directories are created in the /var/lib/docker directory, among which the following are worth noting:

  • The /var/lib/docker/containers/container_ID directory: when the default logging driver is used, this is where the event logs are stored in JSON format. Overly verbose logs, as well as logs that nobody reads or processes in any way, often cause disks to fill up; a quick way to check and limit their size is sketched right after this list.
  • The /var/lib/docker/overlay2 directory contains the containers' read-write layers (overlay2 is the preferred storage driver on most Linux distributions). If a container saves data to its own file system, this is where it ends up.
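If you suspect that logs are the culprit, you can measure a specific container's log file and, if needed, cap its size with the json-file logging driver options. A minimal sketch (the container name and the 10m/3 limits are just examples):

# Find and measure the log file of a container
$ du -h $(docker inspect --format='{{.LogPath}}' <container_name>)

# Start a container with log rotation: at most three files of 10 MB each
$ docker run -d --log-opt max-size=10m --log-opt max-file=3 nginx:1.16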

Let's imagine a system with a pristine Docker installation that has never run a container or built an image. Its disk usage report looks like this:

$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         0          0          0B         0B
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

Let's start some container, for example, NGINX:

$ docker container run --name www -d -p 8000:80 nginx:1.16

What happens on the disk:

  • Images occupy 126 MB; this is the NGINX image we just ran in a container;
  • Containers take up a ridiculous 2 bytes.

$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          1          2B         0B (0%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

Judging by this output, there is nothing to free up yet. Since 2 bytes is hardly worth the effort, let's imagine that our NGINX unexpectedly wrote 100 megabytes of data and created a test.img file of exactly that size inside the container.

$ docker exec -ti www \
  dd if=/dev/zero of=test.img bs=1024 count=0 seek=$[1024*100]

Let's examine the host's disk usage again. We will see that the container now occupies about 100 megabytes.

$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          1          104.9MB    0B (0%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B
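By the way, you don't have to run docker system df to see which container is responsible: docker ps has a --size (-s) flag that shows the size of each container's read-write layer (the value in parentheses is the virtual size, which also includes the image layers):

# Show the size of every running container's read-write layer
$ docker ps -s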

I think your inquisitive brain is already wondering where our test.img file actually lives. Let's go find it:

$ find /var/lib/docker -type f -name test.img
/var/lib/docker/overlay2/83f177...630078/merged/test.img
/var/lib/docker/overlay2/83f177...630078/diff/test.img

Without going into details, we can note that the test.img file sits in the read-write layer managed by the overlay2 driver. If we stop the container, the host will tell us that this space can, in principle, be reclaimed:

# Stopping the www container
$ docker stop www

# Visualizing the impact on the disk usage
$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          1          126M       0B (0%)
Containers     1          0          104.9MB    104.9MB (100%)
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

How do we do that? By deleting the container, which removes its read-write layer and frees the corresponding space.
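If we only wanted to get rid of this one container, a single command would be enough (www is the container from our example):

# Remove the stopped www container together with its read-write layer
$ docker container rm www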

With the following command you can remove all stopped containers in one fell swoop and reclaim the disk space taken up by their read-write layers:

$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
5e7f8e5097ace9ef5518ebf0c6fc2062ff024efb495f11ccc89df21ec9b4dcc2

Total reclaimed space: 104.9MB

So, we freed up 104.9 megabytes by deleting the container. But since we no longer use the previously downloaded image, it too becomes a candidate for deletion, ready to give our resources back:

$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         1          0          126M       126M (100%)
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

Warning: as long as the image is being used by at least one container, you won't be able to use this trick.

The prune subcommand we used above only has an effect on stopped containers. If we want to remove not only stopped, but also running containers, we should use one of these commands:

# Historical command
$ docker rm -f $(docker ps -aq)

# More recent command
$ docker container rm -f $(docker container ls -aq)

Side note: If you use the --rm option when starting a container, its read-write layer is removed as soon as it stops, freeing all the disk space it occupied.
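For example, our NGINX container from the beginning of the article could have been started this way; it is the same run command, just with --rm added:

# The container is removed automatically as soon as it stops
$ docker container run --rm --name www -d -p 8000:80 nginx:1.16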

Disk usage by images

Several years ago, an image size of several hundred megabytes was perfectly normal: an Ubuntu image weighed 600 megabytes and a Microsoft .NET image several gigabytes. Back in those days, downloading even a single image could take a serious bite out of your free disk space, even though layers are shared between images. Today images weigh much less, but even so you can quickly exhaust the available space if you don't take precautions.

There are several types of images that are not directly visible to the end user:

  • intermediate images, on top of which other images are built; they cannot be deleted while there are containers based on those other images;
  • dangling images, i.e. intermediate images that are no longer referenced by anything; they can safely be removed.

You can check whether your system has dangling images with the following command:

$ docker image ls -f dangling=true
REPOSITORY  TAG      IMAGE ID         CREATED             SIZE
<none>      <none>   21e658fe5351     12 minutes ago      71.3MB

You can remove them in the following way:

$ docker image rm $(docker image ls -f dangling=true -q)

We can also use the prune subcommand:

$ docker image prune
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N] y
Deleted Images:
deleted: sha256:143407a3cb7efa6e95761b8cd6cea25e3f41455be6d5e7cda
deleted: sha256:738010bda9dd34896bac9bbc77b2d60addd7738ad1a95e5cc
deleted: sha256:fa4f0194a1eb829523ecf3bad04b4a7bdce089c8361e2c347
deleted: sha256:c5041938bcb46f78bf2f2a7f0a0df0eea74c4555097cc9197
deleted: sha256:5945bb6e12888cf320828e0fd00728947104da82e3eb4452f

Total reclaimed space: 12.9kB

If we want to remove all images (not just the dangling ones) with a single command, we can do this:

$ docker image rm $(docker image ls -q)
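A slightly less radical option is the prune subcommand with the -a flag, which removes every image not used by at least one container; combined with a filter it lets you keep recently used images (the 24h value below is just an example):

# Remove all unused images created more than 24 hours ago
$ docker image prune -a --filter "until=24h"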

Disk usage by volumes

Volumes are used to store data outside the container's file system, for example when we want to persist what an application produces and reuse it elsewhere. Databases are a typical example.

Let's start the MongoDB container, mount a volume external to the container, and restore the database backup from it (we have it in the bck.json file):

# Running a mongo container
$ docker run --name db -v $PWD:/tmp -p 27017:27017 -d mongo:4.0

# Importing an existing backup (from a huge bck.json file)
$ docker exec -ti db mongoimport \
  --db 'test' \
  --collection 'demo' \
  --file /tmp/bck.json \
  --jsonArray

The data will end up on the host machine in the /var/lib/docker/volumes directory. But why not in the container's read-write layer? Because the Dockerfile of the MongoDB image defines the /data/db directory (where MongoDB stores its data by default) as a volume.
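You can list the existing volumes and see exactly where one of them lives on the host with docker volume ls and docker volume inspect; the volume name below is a placeholder, take it from the ls output:

# List volumes and print the host directory behind one of them
$ docker volume ls
$ docker volume inspect <volume_name> --format '{{ .Mountpoint }}'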

Side note: Many images that are supposed to produce data use volumes to store that data.

When we have played enough with MongoDB and stop (or even delete) the container, the volume is not removed. It keeps taking up our precious disk space until we explicitly delete it with this command:

$ docker volume rm $(docker volume ls -q)

Well, or we can use the prune subcommand we already know:

$ docker volume prune
WARNING! This will remove all local volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Volumes:
d50b6402eb75d09ec17a5f57df4ed7b520c448429f70725fc5707334e5ded4d5
8f7a16e1cf117cdfddb6a38d1f4f02b18d21a485b49037e2670753fa34d115fc
599c3dd48d529b2e105eec38537cd16dac1ae6f899a123e2a62ffac6168b2f5f
...
732e610e435c24f6acae827cd340a60ce4132387cfc512452994bc0728dd66df
9a3f39cc8bd0f9ce54dea3421193f752bda4b8846841b6d36f8ee24358a85bae
045a9b534259ec6c0318cb162b7b4fca75b553d4e86fc93faafd0e7c77c79799
c6283fe9f8d2ca105d30ecaad31868410e809aba0909b3e60d68a26e92a094da

Total reclaimed space: 25.82GB

Disk usage by the build cache

In Docker 18.09 the image build process changed with the arrival of BuildKit. This tool speeds up builds and improves data storage and security management. We won't go into all the details of this wonderful tool here; we will only look at how it affects disk usage.

Let's say we have a perfectly simple Node.js application:

  • the index.js file starts a simple HTTP server that responds with a string to every request it receives;
  • the package.json file defines the dependencies, of which only expressjs is used to run the HTTP server:

$ cat index.js
var express = require('express');
var util    = require('util');
var app = express();
app.get('/', function(req, res) {
  res.setHeader('Content-Type', 'text/plain');
  res.end(util.format("%s - %s", new Date(), 'Got Request'));
});
app.listen(process.env.PORT || 80);

$ cat package.json
{
  "name": "testnode",
  "version": "0.0.1",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.14.0"
  }
}

The Dockerfile to build the image looks like this:

FROM node:13-alpine
COPY package.json /app/package.json
RUN cd /app && npm install
COPY . /app/
WORKDIR /app
EXPOSE 80
CMD ["npm", "start"]

Let's build the image in the usual way, without using BuildKit:

$ docker build -t app:1.0 .

If we check the disk space usage, we can see that only the base image (node:13-alpine) and the target image (app:1.0) take up space:

TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         2          0          109.3MB    109.3MB (100%)
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    0          0          0B         0B

Let's build the second version of our application, this time using BuildKit. To do this, we just need to set the DOCKER_BUILDKIT variable to 1:

$ DOCKER_BUILDKIT=1 docker build -t app:2.0 .
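Side note: setting the variable for every single build gets tedious. BuildKit can also be enabled permanently in the Docker daemon configuration; a minimal sketch of /etc/docker/daemon.json, after which the daemon needs to be restarted (the systemctl command assumes a systemd-based distribution):

# Enable BuildKit for all builds via the daemon configuration
$ cat /etc/docker/daemon.json
{
  "features": {
    "buildkit": true
  }
}
$ sudo systemctl restart docker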

If we now check the disk usage, we will see that the build cache is now involved:

$ docker system df
TYPE           TOTAL      ACTIVE     SIZE       RECLAIMABLE
Images         2          0          109.3MB    109.3MB (100%)
Containers     0          0          0B         0B
Local Volumes  0          0          0B         0B
Build Cache    11         0          8.949kB    8.949kB

To clear it, use the following command:

$ docker builder prune
WARNING! This will remove all dangling build cache.
Are you sure you want to continue? [y/N] y
Deleted build cache objects:
rffq7b06h9t09xe584rn4f91e
ztexgsz949ci8mx8p5tzgdzhe
3z9jeoqbbmj3eftltawvkiayi

Total reclaimed space: 8.949kB
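Like the other prune commands, docker builder prune only removes dangling cache entries by default. The -a flag wipes the whole build cache, and in recent versions there is also a --keep-storage option that prunes the cache down to a given size (1GB below is just an example):

# Remove the entire build cache, not just the dangling part
$ docker builder prune -a

# Or prune the cache while keeping up to 1 GB of it
$ docker builder prune --keep-storage 1GB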

Clean everything up!

So, we have looked at cleaning up the disk space occupied by containers, images and volumes, and the prune subcommand helps us with each of them. But it can also be used at the level of the whole Docker system, and it will clean up everything it can:

$ docker system prune
WARNING! This will remove:
  - all stopped containers
  - all networks not used by at least one container
  - all dangling images
  - all dangling build cache

Are you sure you want to continue? [y/N]

If you're saving disk space on your Docker machine for any reason, it's worth getting into the habit of running this command periodically.
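If you want a more aggressive cleanup, the same command accepts extra flags; the combination below also removes all unused images (not only dangling ones) and local volumes, so make sure nothing valuable lives in them:

# Maximum cleanup: stopped containers, unused networks and images,
# build cache and local volumes
$ docker system prune -a --volumes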

Source: habr.com
