06 | Write the best Dockerfile

In production practice, you should build images from a Dockerfile whenever possible, because doing so brings several benefits:

  • Easy version management: a Dockerfile is a plain text file, so it can be stored in a code repository under version control, which makes it easy to track the change history between versions;

  • Traceable build process: each instruction in the Dockerfile corresponds to one build step (RUN, COPY and ADD each produce a new image layer), so the complete build process of the image can be read directly from the Dockerfile;

  • Insulation from heterogeneous build environments: when building from a Dockerfile you do not need to worry about the build machine, because the same Dockerfile produces consistent results wherever it runs.

Despite these benefits, a poorly written Dockerfile can cause many problems. Image builds may take too long or even fail, and too many image layers can make the image file too large.

Dockerfile writing principles

(1) Single responsibility

A container is essentially a process. Applications with different functions should therefore be split into different containers as far as possible, with each container responsible for a single business process.

(2) Add comments

A Dockerfile is code too, so we should keep good coding habits. For example, use # comments to explain what non-obvious instructions do and why they are there.
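Here is a small sketch of a commented Dockerfile. The maintainer address and port are hypothetical placeholders. Comments explain the intent of each step, and LABEL and EXPOSE record metadata that documents the image itself.

```dockerfile
# Base image: CentOS 7, matching the other examples in this article
FROM centos:7

# Hypothetical maintainer contact, stored as image metadata
LABEL maintainer="ops@example.com"

# Port the (hypothetical) service listens on.
# EXPOSE only documents the port; it does not publish it on the host.
EXPOSE 8080
```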

(3) Keep the container minimal

Avoid installing unnecessary packages. For example, an nginx image does not need vim, gcc or other development tools. Leaving them out speeds up the image build and keeps the image from growing too large.
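Here is a minimal sketch on CentOS 7, where net-tools stands in for whatever the service actually needs. It installs only runtime packages, uses yum's tsflags=nodocs option to skip documentation files, and drops the package cache in the same step.

```dockerfile
FROM centos:7

# Install only what the service needs at run time (no vim, gcc, ...).
# tsflags=nodocs tells rpm not to install man pages and other docs.
RUN yum install -y --setopt=tsflags=nodocs net-tools \
    && yum clean all
```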

(4) Choose the base image sensibly

The core of a container is the application, so the base image only needs to provide the environment the application requires at run time. For example, a Java application only needs a JRE to run, not a JDK, so the image only needs the JRE installed.
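For the Java case, here is a sketch that assumes the Eclipse Temurin images published on Docker Hub (the 17-jre tag) and a hypothetical pre-built app.jar in the build context. The JDK is only needed wherever the jar is compiled, not in this runtime image.

```dockerfile
# JRE-only base image: enough to run a jar, no compiler included
FROM eclipse-temurin:17-jre

# app.jar is a hypothetical artifact built elsewhere
COPY app.jar /opt/app/app.jar

CMD ["java", "-jar", "/opt/app/app.jar"]
```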

(5) Use a .dockerignore file

With git, we use a .gitignore file to exclude files that should not be under version control. In the same way, a .dockerignore file lets us exclude files that are not needed for the build. Those files are then not sent to the Docker daemon as part of the build context, which makes the build faster. The syntax of .dockerignore is similar to that of .gitignore.

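A .dockerignore file is plain text. During a build, Docker reads it line by line. Each line is a pattern naming files or directories to exclude from the build context, matched relative to the root of the context. Lines starting with # are comments. A line starting with ! makes an exception, re-including files that an earlier pattern excluded.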

(6) Make the most of the build cache

During a Docker build, each Dockerfile instruction is committed as an image layer, and each instruction builds on the result of the one before it. At each step, Docker looks for an existing layer with the same parent that was built by the same instruction. If one exists, the build cache is hit and that layer is reused instead of being rebuilt.

The rules for judging whether to use the cache when Docker builds are as follows:

Starting from a parent image that is already in the cache, Docker compares the next instruction with all child images derived from that parent. It checks whether one of them was built with exactly the same instruction; if none was, the cache is invalidated;

For most instructions, comparing the instruction text alone decides whether the cache is used. For RUN, for example, only the command string is compared, so RUN yum install -y make still hits the cache even if the repository has since published a newer make. ADD and COPY are the exceptions;

For ADD and COPY, Docker checks that the instruction is the same and also computes a checksum of each file being copied into the image. The checksum is derived from the file's content: if two files have the same checksum, their contents are the same, and modification and access times are ignored. The cache is hit only when both the instruction and the checksums match.

Once a step misses the cache, every step after it is rebuilt as well. We can take advantage of this by putting instructions that rarely change (such as installing packages) near the top of the Dockerfile, and instructions that change often (such as compiling the application) near the bottom.

For example, if we want to set some environment variables and install some packages, we can order the Dockerfile like this:

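FROM centos:7
# Put environment variable instructions near the top
ENV PATH=/usr/local/bin:$PATH
# Put package installation instructions near the top
RUN yum install -y make
# Put frequently changing steps, such as the application's configuration and version, last
...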

When a Dockerfile follows this principle, the earlier steps are much more likely to hit the cache, which can greatly shorten image build time.
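The checksum rule for COPY applies the same ordering idea at the file level. The sketch below assumes a hypothetical Python project with requirements.txt and main.py. Copying the dependency list on its own first means the slow pip install layer is reused until requirements.txt itself changes. Everyday source edits only invalidate the final COPY.

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Rarely changes: this layer and the pip layer below stay cached
# as long as requirements.txt has the same checksum
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Changes often: copied last, so edits only rebuild from here on
COPY . .

CMD ["python", "main.py"]
```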

(7) Set the time zone correctly

Most official operating system images pulled from Docker Hub use UTC (Coordinated Universal Time). If you want the container to use China Standard Time (UTC+8), change the time zone setting in the way that suits the operating system you are using. Here is how to do it on several common operating systems:

  • Ubuntu and Debian systems
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo "Asia/Shanghai" > /etc/timezone
  • CentOS system
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
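On Ubuntu, the official image is minimized and, as far as I know, does not include the tzdata package, so /usr/share/zoneinfo may be missing. The sketch below installs tzdata first without its interactive prompt, then sets the zone, all in a single layer. The 22.04 tag is an assumption, and the TZ variable is also read by many programs at run time.

```dockerfile
FROM ubuntu:22.04

ENV TZ=Asia/Shanghai

# Install tzdata non-interactively, then point /etc/localtime and
# /etc/timezone at the chosen zone; clean apt lists in the same layer
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends tzdata \
    && ln -sf /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo "$TZ" > /etc/timezone \
    && rm -rf /var/lib/apt/lists/*
```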

(8) Use package mirrors in mainland China to speed up image builds

Taking CentOS 7 as an example, here is how to use the 163 mirror to speed up image builds. Many companies in China, such as Alibaba, Tencent and NetEase, provide free package mirrors.

First, create a file named CentOS7-Base-163.repo in the build context directory (the directory containing the Dockerfile), with the following content:

# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client.  You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the 
# remarked out baseurl= line instead.
#
#
[base]
name=CentOS-$releasever - Base - 163.com
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
baseurl=http://mirrors.163.com/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-7
#released updates
[updates]
name=CentOS-$releasever - Updates - 163.com
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
baseurl=http://mirrors.163.com/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-7
#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras - 163.com
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
baseurl=http://mirrors.163.com/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-7
#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus - 163.com
baseurl=http://mirrors.163.com/centos/$releasever/centosplus/$basearch/
gpgcheck=1
enabled=0
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-7

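Then add the following instruction to the Dockerfile. It overwrites the image's default CentOS-Base.repo, so the base, updates and other repository IDs are not defined twice:

COPY CentOS7-Base-163.repo /etc/yum.repos.d/CentOS-Base.repo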

After these steps, yum install fetches packages from the 163 mirror by default. On networks in mainland China this can greatly improve build speed.
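Here is a sketch of the relevant part of the Dockerfile; make is just an example package. The COPY has to come before the first RUN that calls yum, otherwise that step still uses the default mirrors.

```dockerfile
FROM centos:7

# Replace the default repo definitions with the 163 mirror
COPY CentOS7-Base-163.repo /etc/yum.repos.d/CentOS-Base.repo

# Every yum call from here on resolves packages through the 163 mirror
RUN yum install -y make \
    && yum clean all
```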

(9) Minimize the number of image layers

Keep the number of Dockerfile instructions as small as possible when building an image. For example, to install make and net-tools on CentOS, use a single instruction in the Dockerfile:

RUN yum install -y make net-tools

It should not be written like this:

RUN yum install -y make
RUN yum install -y net-tools
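The same reasoning applies to cleanup. Each RUN is its own layer, so files removed in a later RUN still take up space in the earlier layer. Here is a sketch that keeps the yum cache out of the image by cleaning it in the same RUN that created it:

```dockerfile
FROM centos:7

# Avoid: the cache written by the first RUN stays in that layer;
# deleting it in a second RUN only hides it, the image is not smaller.
# RUN yum install -y make net-tools
# RUN yum clean all

# Prefer: install and clean up in one RUN, producing a single layer
RUN yum install -y make net-tools \
    && yum clean all \
    && rm -rf /var/cache/yum
```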

Docker instruction writing advice

(1) RUN

The RUN instruction executes the command that follows it during the build and commits the result as a new image layer.

The following principles should be followed when using the RUN instruction:

  • When the command after RUN is complex, split it across multiple lines, ending each line with a backslash (\);

  • Where possible, sort the items after RUN (for example, a list of packages) alphabetically. This improves readability and makes duplicates easy to spot.

For example, to install some software on the official CentOS image, a suggested Dockerfile is:

FROM centos:7
RUN yum install -y automake \
                   curl \
                   python \
                   vim

(2) CMD and ENTRYPOINT

CMD and ENTRYPOINT both define the command that runs when a container starts. The two instructions are used in similar ways, but they also differ in important respects.

What they have in common is that both support two forms.

  • The first is CMD/ENTRYPOINT ["command", "param"]. This form runs the command directly via exec, without a shell, and is usually called exec form. The instruction is followed by a JSON array, and this is the form Docker recommends.

  • The other form is CMD/ENTRYPOINT command param. It runs through a shell and is usually called shell form: Docker executes the command as /bin/sh -c command.

When a container starts from exec form, its PID 1 is the command specified in CMD/ENTRYPOINT. When it starts from shell form, the command runs inside a shell process, as if you had executed /bin/sh -c "command". In shell form, therefore, the process you started is typically not PID 1 in the container; the shell is. Signals such as the SIGTERM sent by docker stop then go to the shell rather than to your application.
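Here is a sketch contrasting the two forms on centos:7. It uses the Python 2 interpreter that ships in that image to run a stand-in long-running process; the HTTP server and port 8000 are arbitrary choices.

```dockerfile
FROM centos:7

# Shell form (avoid): Docker runs /bin/sh -c "python -m SimpleHTTPServer 8000",
# so the shell may become PID 1 and the signals from docker stop go to it.
# CMD python -m SimpleHTTPServer 8000

# Exec form (preferred): python itself is PID 1 and signals are delivered to it.
CMD ["python", "-m", "SimpleHTTPServer", "8000"]
```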

The differences between the two instructions:

  • If the Dockerfile uses ENTRYPOINT, you must pass the --entrypoint option to docker run to override it. The command set by CMD, by contrast, is overridden simply by adding arguments after the image name in docker run.

  • ENTRYPOINT can be used on its own or together with CMD. When both are present, CMD supplies the default arguments to ENTRYPOINT rather than acting as a separate command.

So when should you use ENTRYPOINT, and when should you use CMD?

If you want your image to be flexible, use CMD.
If your image runs only one specific program and you don't want users to replace that program when they execute docker run, use ENTRYPOINT.
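Here is a sketch of ENTRYPOINT and CMD working together. The image name my-ls is hypothetical, and ls is just a harmless stand-in program. ENTRYPOINT fixes the executable, and CMD provides default arguments that docker run can replace.

```dockerfile
FROM centos:7

# The executable is fixed by ENTRYPOINT...
ENTRYPOINT ["ls"]
# ...and CMD only supplies its default arguments
CMD ["-l", "/"]

# docker build -t my-ls .
# docker run --rm my-ls                            -> runs: ls -l /
# docker run --rm my-ls -a /etc                    -> runs: ls -a /etc  (replaces CMD only)
# docker run --rm --entrypoint cat my-ls /etc/hosts -> runs: cat /etc/hosts
```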

(3) ADD and COPY

ADD and COPY have similar functions: both add files from outside into the image. COPY only supports basic copying of files and directories. ADD supports more source types, such as automatically extracting local tar archives and fetching source files from URLs.

So which instruction should you use day to day? Since ADD does more, you might think ADD is the better choice. On the contrary, I recommend COPY, because it is more transparent: it only copies local files into the image. Relying on COPY, and fetching remote files in a RUN step where they can be deleted in the same layer, also makes better use of the build cache and keeps the image smaller.
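Here is a sketch in which both source paths are hypothetical files in the build context. COPY covers plain files. ADD is kept for the one case where its extra behaviour is wanted: a local compressed tar archive that should be unpacked into the image.

```dockerfile
FROM centos:7

# COPY: copies a local file from the build context as-is
COPY conf/app.conf /etc/app/app.conf

# ADD with a local .tar.gz: the archive is extracted into /opt/vendor/
ADD vendor.tar.gz /opt/vendor/
```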

If you are tempted to use ADD to fetch a file from a URL, consider other approaches instead. For example, to install memtester (a memory stress-testing tool) in a container, avoid writing:

ADD http://pyropus.ca/software/memtester/old-versions/memtester-4.3.0.tar.gz /tmp/
RUN tar -xvf /tmp/memtester-4.3.0.tar.gz -C /tmp
RUN make -C /tmp/memtester-4.3.0 && make -C /tmp/memtester-4.3.0 install

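The recommended way downloads, builds, installs and cleans up in a single RUN, so the archive and source tree never persist in an image layer:

# Assumes wget, make and gcc are already installed in the image
RUN wget -O /tmp/memtester-4.3.0.tar.gz http://pyropus.ca/software/memtester/old-versions/memtester-4.3.0.tar.gz \
    && tar -xvf /tmp/memtester-4.3.0.tar.gz -C /tmp \
    && make -C /tmp/memtester-4.3.0 && make -C /tmp/memtester-4.3.0 install \
    && rm -rf /tmp/memtester-4.3.0 /tmp/memtester-4.3.0.tar.gz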

(4) WORKDIR

To keep the build process clear, use WORKDIR to set the working directory inside the container. Avoid instructions such as RUN cd /work/path && do some work, because that cd only applies within that single RUN.
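Here is a sketch of the WORKDIR approach; start.sh is a hypothetical script in the build context. Each RUN starts a fresh shell, so a cd inside one RUN does not carry over to the next instruction, whereas WORKDIR does:

```dockerfile
FROM centos:7

# Creates /opt/app if needed; applies to every later RUN, CMD,
# ENTRYPOINT, COPY and ADD instruction
WORKDIR /opt/app

# "." now means /opt/app
COPY start.sh .
RUN chmod +x start.sh

CMD ["./start.sh"]
```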

Finally, here are some links to official Dockerfile examples for commonly used software. I hope they are helpful.

Summary

  • Multiple ENV instructions can simply be written on separate lines. ENV only changes image metadata, so this adds practically nothing to the image size. The recommended form is ENV key=value.

  • Package the build environment separately and ship only the compiled binary (pay attention to the CPU architecture it will run on). Build the runtime environment as a separate image that contains only the basic runtime configuration. The benefits are clear: the application is released and versioned on its own, and the runtime environment is maintained separately. Patching or upgrading the runtime environment therefore does not affect application releases.

  • Separating the build environment from the runtime environment is very important in production.

  • CMD ["nginx", "-g", "daemon off;"]
    runs nginx in the foreground, so the container's main process keeps running instead of exiting right after startup.
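The points above can be combined in a multi-stage build. This sketch assumes a hypothetical Go program at the root of the build context (with its go.mod) and the official golang and alpine images. GOARCH=amd64 is an assumption; set it to the architecture the container will actually run on. Only the compiled binary reaches the final image, and the ENV settings use the key=value form.

```dockerfile
# Build stage: full Go toolchain, discarded after the build
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# Static binary (no cgo) for the target OS/architecture
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o app .

# Runtime stage: minimal base, no compiler
FROM alpine:3.19
# Several settings in one ENV, key=value form (values are hypothetical)
ENV APP_HOME=/opt/app \
    APP_PORT=8080
# Separate ENV: inside a single ENV, $APP_HOME would not be set yet
ENV APP_CONFIG=$APP_HOME/config.yaml
COPY --from=build /src/app $APP_HOME/app
WORKDIR $APP_HOME
CMD ["/opt/app/app"]
```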


Origin blog.csdn.net/Cirtus/article/details/108982176