CentOS System Failure | A Comparison of Container Storage Drivers Triggered by a "Blood Case"

Foreword:

Given Red Hat's influence in the Linux world, I believe many of you run RedHat or CentOS for both test and production systems. I recently hit a very interesting failure on a CentOS system, and after analyzing and resolving its root cause I wrote this article to share the experience with you.


We deployed Docker on a CentOS host. After it had been running for a while, all of the containers suddenly started misbehaving, and the host kernel reported disk I/O errors:

kernel: Buffer I/O error on device
kernel: EXT4-fs warning: ext4_end_bio:332: I/O error
kernel: EXT4-fs: Remounting filesystem read-only
My first reaction was to check the disk status and space usage, which showed that the system's root filesystem was completely full:

Filesystem               Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  xfs        50G   50G  134M 100% /
devtmpfs                 devtmpfs  1.9G     0  1.9G   0% /dev
tmpfs                    tmpfs     1.9G     0  1.9G   0% /dev/shm
tmpfs                    tmpfs     1.9G  8.7M  1.9G   1% /run
tmpfs                    tmpfs     1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda1                xfs       497M  123M  375M  25% /boot
tmpfs                    tmpfs     378M     0  378M   0% /run/user/0
We know that Docker's default storage directory is /var/lib/docker/, and that this path can be changed with the daemon's -g / --graph="/var/lib/docker" option. With that in mind, we can attach a large disk to the host and move Docker's directory onto the newly mounted disk:

Filesystem               Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  xfs        50G   50G  134M 100% /
devtmpfs                 devtmpfs  1.9G     0  1.9G   0% /dev
tmpfs                    tmpfs     1.9G     0  1.9G   0% /dev/shm
tmpfs                    tmpfs     1.9G  8.7M  1.9G   1% /run
tmpfs                    tmpfs     1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda1                xfs       497M  123M  375M  25% /boot
/dev/vdb                 xfs       300G    7G  293G   3% /data
tmpfs                    tmpfs     378M     0  378M   0% /run/user/0
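One way to make the new path take effect is through the daemon's startup options; here is a minimal sketch assuming a systemd-managed daemon (the drop-in file name and binary path are illustrative, adjust them to your packaging):

# /etc/systemd/system/docker.service.d/graph.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -g /data/docker

systemctl daemon-reload
systemctl restart docker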
I pointed Docker's storage directory at the newly added /data directory, but the original images and containers could not be found because the path had changed; the old data lives in /var/lib/docker/devicemapper/devicemapper/{data, metadata}. After moving those files over and starting the Docker service again, Docker had a roomy 300G house to live in.
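A hedged sketch of the migration itself (stop the daemon first so the loopback devices are released; paths follow the devicemapper layout mentioned above):

systemctl stop docker
mkdir -p /data/docker
cp -a /var/lib/docker/. /data/docker/   # carries over devicemapper/devicemapper/{data,metadata}
systemctl start docker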
Do you think the story ends here? So did I, but I kept tinkering, and the problem came right back. Call me a glutton for punishment: after importing a pile of container images and starting a pile of containers, the system cheerfully informed me that every container's root filesystem had become read-only, and the host kernel was again reporting disk I/O errors. At first I assumed the data directory had filled up again, but df -Th showed plenty of free space:

Filesystem               Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  xfs        50G  4.7G 45.3G   9% /
devtmpfs                 devtmpfs  1.9G     0  1.9G   0% /dev
tmpfs                    tmpfs     1.9G     0  1.9G   0% /dev/shm
tmpfs                    tmpfs     1.9G  8.7M  1.9G   1% /run
tmpfs                    tmpfs     1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda1                xfs       497M  123M  375M  25% /boot
/dev/vdb                 xfs       300G  145G  155G  48% /data
tmpfs                    tmpfs     378M     0  378M   0% /run/user/0
The cruel reality: with less than half of the space used, every container was malfunctioning. Out came my classic three-step fix: restart the containers, restart the Docker service, restart the server. The containers still misbehaved. After digging through a pile of material online, I found the answer at http://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/: CentOS uses Device Mapper as the container storage driver by default (you can confirm this with docker info). When the Docker service starts, it creates a 100G data file under /var/lib/docker/devicemapper/devicemapper/ by default (because of the 1000-vs-1024 conversion, the system actually reports it as 107.4G; the other figures differ the same way), and all changes made by running containers are written into this data file. In other words, once the data generated inside the containers exceeds 100G, there is no space left for them, and every container's root filesystem becomes read-only! On top of that, each container is limited to at most 10G. What a trap: we have a 300G house, and only 100G of it is usable!
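For reference, docker info on a host using this default loop-lvm devicemapper setup reports something along these lines (the values below are illustrative of a default installation, not copied from the failed machine):

Storage Driver: devicemapper
 Pool Name: docker-253:0-xxxxxxx-pool
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Total: 107.4 GB
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata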
To get to the root cause, we need to understand how the devicemapper storage driver works. Device Mapper operates on a thin-provisioning (allocate-on-demand) model and is essentially a snapshot mechanism over a target block device. When Docker starts, it sets up a 100G sparse file (/var/lib/docker/devicemapper/devicemapper/data, with metadata in /var/lib/docker/devicemapper/devicemapper/metadata) and uses it as the Device Mapper storage pool; every container is then allocated its default 10G of storage from this pool, as shown in the following figure:

Storage blocks are only marked as used in the pool (or, put differently, taken out of the pool) when actual reads and writes occur. Once the blocks actually written exceed the pool's capacity, the containers run out of space and the kernel reports I/O errors.
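Because the data file is sparse, its apparent size and the disk space it actually occupies differ greatly; a quick way to watch the pool being consumed is to compare the two (paths assume the default layout described above):

ls -lh /var/lib/docker/devicemapper/devicemapper/data   # apparent size: ~107.4G from the start
du -sh /var/lib/docker/devicemapper/devicemapper/data   # blocks actually allocated; grows as containers write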
The Device Mapper storage driver is convenient because it works without any extra installation or setup, such as creating a dedicated partition for Docker containers or configuring LVM. However, in this default form it has two drawbacks:
• The storage pool has a default capacity of 100GB, which cannot satisfy larger storage needs.
• It is backed by sparse files (thin provisioning: they occupy almost no space at first and only consume disk blocks as data is actually written), whose performance is poor.
There are two workarounds for these problems (a combined sketch follows this list):
1. Back the pool with a larger file, disk, or logical volume.
Using a file: dd if=/dev/zero of=/var/lib/docker/devicemapper/devicemapper/data bs=1G count=0 seek=1000
This creates a sparse ("virtual") 1000G data file; if you used count=1000 without the seek parameter, a fully allocated 1000G file would be created instead.
Using a disk: ln -s /dev/sdb /var/lib/docker/devicemapper/devicemapper/data
Using a logical volume: ln -s /dev/mapper/centos-dockerdata /var/lib/docker/devicemapper/devicemapper/data
2. Use the Docker daemon's --storage-opt option to change the initial disk size of each container, for example --storage-opt dm.basesize=80G; every container then starts with an 80G root filesystem.
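A hedged sketch of both workarounds on a systemd-managed host (paths follow the default loop-lvm layout above; the flag form follows the dockerd reference linked below; if the thin pool has already been created, the pool device itself must also be grown, as the jpetazzo article explains):

# Workaround 1: pre-create a larger sparse data file before the pool is first built
systemctl stop docker
mkdir -p /var/lib/docker/devicemapper/devicemapper
dd if=/dev/zero of=/var/lib/docker/devicemapper/devicemapper/data bs=1G count=0 seek=1000
systemctl start docker
docker info | grep "Data Space Total"   # should now report roughly 1 TB

# Workaround 2: the equivalent daemon flags (dm.loopdatasize grows the loop-backed
# pool, dm.basesize raises each container's root filesystem size)
dockerd --storage-driver=devicemapper \
        --storage-opt dm.loopdatasize=1000G \
        --storage-opt dm.basesize=80G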
But this approach never felt elegant to me: it takes several manual steps, and container space is still capped, just at a different limit. Is there a better way? Let's keep digging, this time in Docker's official documentation: https://docs.docker.com/engine/reference/commandline/dockerd/
In terms of storage drivers, Docker supports AUFS, Device Mapper, Btrfs, ZFS, Overlay, Overlay2 and several others. Because AUFS was never merged into the mainline kernel, in practice only Ubuntu systems use aufs as Docker's storage engine, while CentOS defaults to Device Mapper. Fortunately, the Overlay driver is natively supported by Linux kernels 3.18.0 and above. OverlayFS is similar to AUFS, but it performs better and uses memory more efficiently.
Docker selects its storage driver with the -s (--storage-driver) flag; by passing -s overlay we set the storage driver to Overlay and then restart the Docker daemon.
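A sketch of the switch on a kernel that supports overlay (adapt the daemon launch to your packaging, e.g. a systemd drop-in or /etc/sysconfig/docker):

uname -r                        # needs to be 3.18 or newer for the overlay driver
modprobe overlay
grep overlay /proc/filesystems  # "nodev overlay" means the filesystem is available
dockerd -g /data/docker -s overlay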
As you can see, Docker is now using OverlayFS. (Note: if the system already holds images and running containers, they will not be visible after the storage driver is changed, so back them up first.)
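After the restart, docker info should confirm the new driver:

docker info | grep -i "storage driver"
# Storage Driver: overlay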
With the switch to OverlayFS, both the Device Mapper storage pool capacity limit and the per-container maximum size are gone, and Overlay's read/write performance is better than Device Mapper's as well. A single -s overlay flag is all it takes to run containers on a better filesystem.
At this point, the cause of the containers' I/O errors has been fully resolved. I hope this article helps anyone who runs into the same problem.
Source: http://mp.weixin.qq.com/s?__biz=MzA3NzUwMDg1Mg==&mid=2651291797&idx=1&sn=178f6d7d9a390d588c817994659c095d&mpshare=1&scene=23&
