The story of a Docker storage (Docker root) migration problem

A couple of days ago we decided to move the Docker storage on one of our servers (the directory where Docker keeps all container files and images) to a separate partition with more capacity. The task seemed trivial and nothing foreshadowed trouble...

Let's proceed:

1. Stop and remove all containers of our application:

docker-compose down

If there are many containers and they belong to different compose projects, you can do this instead:

docker rm -f $(docker ps -q)

2. Stop the docker daemon:

systemctl stop docker

3. Copy the directory to the new location:

cp -r /var/lib/docker /docker/data/storage

4. Tell the Docker daemon to look in the new directory. There are several options: point the daemon at the new path with the -g flag (--data-root in newer releases), use the systemd configs, which is what we did, or create a symlink. I won't go into detail; the Internet is full of tutorials on moving the Docker root to a new location.
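
For the configuration-file variant, a minimal sketch might look like the following. It assumes the target path /docker/data/storage from step 3 and writes the result to a local file (the name daemon.json.new is arbitrary) so it can be reviewed before it is copied to /etc/docker/daemon.json, the daemon's default config file on Linux. The data-root key is the config-file counterpart of the --data-root flag.

```shell
# Assumption: the new storage path is the one used in step 3.
new_root=/docker/data/storage
# Written to a local file first; the file name is arbitrary.
out=daemon.json.new

cat > "$out" <<EOF
{
  "data-root": "$new_root"
}
EOF

# Show the result for review. Once checked, it would be copied into place:
#   cp daemon.json.new /etc/docker/daemon.json && systemctl start docker
cat "$out"
```

If /etc/docker/daemon.json already exists, merge the key into it rather than overwriting the file. Set the path in one place only: when the same option is given both as a command-line flag and in the config file, dockerd refuses to start.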

5. Start the Docker daemon and make sure it looks where it should:

systemctl status docker

In one of the output lines we should see:

├─19493 /usr/bin/dockerd --data-root=/docker/data/storage

We have made sure that the option was passed to the daemon; now let's check that it was actually applied (thanks, inkvizitor68sl)!

docker info | awk '/Root Dir/ {print $NF}' 
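
The same check can be wrapped in a small helper that fails loudly when the daemon reports a different directory. This is only a sketch: check_docker_root is a hypothetical name, and it reads the text output of docker info on stdin so it can be tried without a running daemon.

```shell
# Hypothetical helper: reads `docker info` text on stdin and compares the
# reported "Docker Root Dir" with the expected path given as $1.
check_docker_root() {
    expected=$1
    actual=$(awk -F': ' '/Docker Root Dir/ {print $2}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual"
    else
        echo "MISMATCH: expected $expected, got ${actual:-nothing}" >&2
        return 1
    fi
}

# Usage on the server (requires a running daemon):
#   docker info 2>/dev/null | check_docker_root /docker/data/storage
```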

6. Start our application:

docker-compose up -d

7. Check the result:

And here the fun begins. The DBMS, the MQ, everything is fine! The database is intact, everything works... except nginx. We have our own nginx build with Kerberos and other bells and whistles. The container's logs showed that it could not write to /var/tmp: Permission denied. I rubbed my temples and tried to analyze the situation... How come? The Docker image had not changed. We had only moved the directory. It had always worked, and now this... As an experiment, I went into the container by hand and changed the permissions on that directory: it was root:root 755, and I set it to root:root 777. And everything started up... A thought crossed my mind: this is some kind of nonsense... I thought, well, maybe I had missed something...

I decided that we had broken the file permissions during the transfer. We stopped the application and the Docker daemon, deleted the new directory, and copied /var/lib/docker again, this time with rsync -a.
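
The suspicion about the plain recursive copy can be illustrated in a scratch directory. The sketch below assumes GNU coreutils (cp, stat -c) and a umask of 022; the directory names are made up for the demo. It shows that cp -r creates the copy with the source mode minus the umask bits, while cp -a keeps the mode as it was. It is an illustration of that suspicion only, not a confirmed diagnosis of what happened on the server.

```shell
# Scratch demo; nothing here touches /var/lib/docker.
umask 022                          # assumption: a typical default for root
work=$(mktemp -d)
mkdir "$work/src"
chmod 777 "$work/src"              # world-writable, like /var/tmp in the image

cp -r "$work/src" "$work/copy_r"   # plain recursive copy
cp -a "$work/src" "$work/copy_a"   # archive mode, preserves mode bits

mode_r=$(stat -c '%a' "$work/copy_r")
mode_a=$(stat -c '%a' "$work/copy_a")
echo "cp -r: $mode_r"
echo "cp -a: $mode_a"
rm -rf "$work"
```

Under these assumptions the cp -r copy ends up with mode 755 and the cp -a copy with 777, the same pair of values seen on /var/tmp in this story.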

I thought everything would be fine now, so we brought Docker and the application back up.

Eee… the problem remained… My eye twitched. I rushed to the console of my virtual machine, where I run various tests. It had the same nginx image, so I went inside the container, and there the permissions on /var/tmp were root:root 777, that is, exactly what I had had to set manually on the server. But the images are identical!

XFS was used as the filesystem everywhere.

I compared the images with the command

docker inspect my-nginx:12345

All hashes are identical, one to one, both on the server and on my virtual machine. I deleted the local nginx image and pulled it again from the registry, which for a number of reasons lives on the same machine. And the problem was the same... Now my other eye twitched.

I no longer remember what thoughts were in my head, apart from screaming "AAAAAA" and the like. It was 4 a.m. outside, and I had turned to the Docker source code to understand how image layers are hashed. I opened my third can of energy drink. And in the end it dawned on me that the hashing, as far as I could tell, takes into account only the files and their contents, but NOT THE ACCESS RIGHTS! That is, in some mysterious way our permissions had been broken, even though SELinux is disabled, ACLs are not used, and there is no sticky bit.

I removed the local image, also removed the image from the Docker registry, and pushed it again. And everything worked. It turns out that during the transfer the permissions were broken both inside the local image and inside the image stored in the registry. As I said, for a number of reasons the registry lived on the same machine and, as a result, in the same /var/lib/docker directory.
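
Since the image hashes did not reveal the difference, one way to look for changed permissions directly is to compare the metadata of the two directory trees. The sketch below assumes GNU find (for -printf); perm_listing and compare_perms are hypothetical helper names. Whether such a comparison would have shown a difference on this server is unknown, because it was not run at the time.

```shell
# Hypothetical helper: print "path mode owner group" for every entry under
# a directory, sorted by path so that two listings line up in a diff.
perm_listing() {
    (cd "$1" && find . -printf '%p %m %u %g\n' | LC_ALL=C sort)
}

# Hypothetical helper: diff the metadata listings of two trees.
# Returns 0 when modes and ownership match everywhere, non-zero otherwise.
compare_perms() {
    a=$(mktemp)
    b=$(mktemp)
    perm_listing "$1" > "$a"
    perm_listing "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# Usage (as root, with the daemon stopped):
#   compare_perms /var/lib/docker /docker/data/storage
```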

Anticipating the question of whether we tried pointing Docker back at the old directory: no, we did not; alas, the circumstances did not allow it. And yes, I really wanted to figure it out.

Now that I have written this article, the solution seems obvious to me, but during the analysis it did not. Honestly, I googled and did not find similar cases.

Conclusion: I solved the problem, but I did not understand the cause =(

If anyone knows or has an idea about the possible causes of this problem, I will be extremely glad to see you in the comments!

Source: habr.com
