Docker installation, deployment and usage guide

1. Introduction

2. Concept

3. Installation

3.1 Installation preparation

3.1.1 Install prerequisites (curl and X11 utilities)

sudo apt-get install libcurl3-gnutls=7.47.0-1ubuntu2   # pinned libcurl version
sudo apt install curl
sudo apt-get install x11-xserver-utils                 # provides xhost
sudo apt-get remove docker docker-engine docker.io containerd runc   # remove old docker packages
xhost +    # allow X11 connections (needed later for GUI display from containers)

3.2 Install docker

3.2.1 Install via the convenience script

curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

Check that the installation succeeded:

sudo docker help

If `docker help` lists all of Docker's commands, the installation succeeded.

3.2.2 Install nvidia-docker2

# 1. Add the GPG key for the nvidia-docker repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
# 2. Detect the distribution (e.g. ubuntu18.04)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# 3. Add the nvidia-docker apt source
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# 4. Update the package index
sudo apt-get update
# 5. Install nvidia-docker2
sudo apt-get install -y nvidia-docker2

Modify file:

sudo vim /etc/docker/daemon.json

The `runtimes` block is the parameter to add:

{
  "registry-mirrors": ["https://xxxxxxx.mirror.aliyuncs.com"],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

3.2.3 Mirror acceleration

In mainland China, pulling images from DockerHub can be slow. In that case, configure an image accelerator (mirror), for example the USTC mirror or Alibaba Cloud. Taking Alibaba Cloud as an example: log in to the Alibaba Cloud console, then select Image Accelerator in the left menu to see your personal accelerator address:
Then write the following content in /etc/docker/daemon.json (if the file does not exist, please create a new file):

{"registry-mirrors":["https://XXX.mirror.aliyuncs.com/"]}

Then restart the service:

sudo systemctl daemon-reload
sudo systemctl restart docker
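A malformed daemon.json will stop the Docker daemon from starting, so it is worth validating the file before the restart. A minimal sketch, using a temporary path and `python3 -m json.tool` (assumes python3 is available); point `cfg` at /etc/docker/daemon.json to check the real file:

```shell
# Sketch: validate a daemon.json before restarting Docker.
# A temporary file stands in for /etc/docker/daemon.json so this is safe to run.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{"registry-mirrors":["https://XXX.mirror.aliyuncs.com/"]}
EOF
valid=no
# python3 -m json.tool exits non-zero on malformed JSON
python3 -m json.tool "$cfg" > /dev/null && valid=yes
echo "daemon.json valid: $valid"
rm -f "$cfg"
```

If validation fails, fix the file first; otherwise `systemctl restart docker` will leave the daemon stopped.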

3.2.4 Local login

Docker officially maintains a public registry, Docker Hub, which hosts most of the base images we need.

First register an account, then log in locally. Note: log in with your username and password, not your email address:

sudo docker login

3.2.5 Add user permissions

  1. Create a group named docker (if the group already exists, an error is reported; it can be ignored):
sudo groupadd docker
  2. Add the current user to the docker group:
sudo gpasswd -a ${USER} docker
  3. Restart the docker service (use with caution in production):
sudo systemctl restart docker
  4. Add read/write permissions on the docker socket (a quick workaround; logging out and back in after the group change is the cleaner fix):
sudo chmod a+rw /var/run/docker.sock
  5. Verify: docker should now work without sudo:
docker info
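The group change above only takes effect in new login sessions. A small sketch to check whether the current session already has the group (uses only `id` and `grep`):

```shell
# Sketch: check whether the current user's session is in the docker group.
if id -nG | grep -qw docker; then
    status="in group"
else
    status="not yet in group"
fi
echo "docker group status: $status"
```

If the status is "not yet in group", log out and back in, or run `newgrp docker`.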

4. Operation

4.1 Pull the image

sudo docker pull lingjunlh/torch1.9.1-cuda11.1

4.2 Containers

4.2.1 Create container

4.2.1.1 Configure docker so the container can display GUI windows
  1. On the host terminal, run:
DISPLAY=:0.0
xhost +
  2. Check the environment variable:
echo ${DISPLAY}

4.2.1.2 Run the container

sudo nvidia-docker run -it --privileged=true -p 7777:8888 --gpus all --ipc=host -v /data:/data  -e DISPLAY=unix$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -e GDK_SCALE -e GDK_DPI_SCALE --name test1  b7a4c  /bin/bash
  • -i: interactive mode
  • -t: allocate a pseudo-terminal
  • -p 7777:8888: map port 7777 of the host to port 8888 of the container
  • --privileged=true: give the container extended privileges (used here so it can access host devices such as the GPU)
  • --ipc=host: let the container share IPC (shared memory) with the host
  • --name xxxxx: give the container a custom name
  • -v /data:/data: mount the host's /data directory into the container at /data
    • The contents of this folder are shared between the container and the host.
    • Changes inside a container are lost when the container is removed, so mounting a host directory like this persists the container's data locally.
  • b7a4c: the ID (prefix) of the image you installed.
    • You can find it with the docker images command; you can also write the full image name instead, e.g. ufoym/deepo:all-py36-jupyter
  • /bin/bash: the command placed after the image name. Here we want an interactive shell, so we use /bin/bash

4.2.2 Enter the running container

docker attach d6a0f155273a
  • d6a0f155273a: container id

4.2.3 Exit the container

Ctrl+D (this also stops the container; to detach while leaving it running, press Ctrl+P then Ctrl+Q)

4.2.4 Delete container

1) First you need to stop all containers

docker stop $(docker ps -a -q)

2) Delete all containers (to delete just one, replace the substitution with that container's id):

docker rm $(docker ps -a -q)
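`$(docker ps -a -q)` expands to a whitespace-separated list of container IDs. A sketch of the pattern with a guard against an empty list, so `docker rm` is never called with no arguments (`remove_all` is a hypothetical helper and "id1 id2" stand in for real container IDs; docker itself is not invoked):

```shell
# Sketch: guard the command substitution so nothing runs when there are no containers.
remove_all() {
    if [ -n "$1" ]; then
        echo "would run: docker rm $1"
    else
        echo "nothing to remove"
    fi
}
remove_all "id1 id2"   # stand-in for: remove_all "$(docker ps -a -q)"
remove_all ""          # no containers -> no docker rm call
```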

4.2.5 Start container

sudo docker start <container-id>

Other commands

View a container's runtime logs by container ID:

docker logs -tf <container-id>
docker logs --tail <num> <container-id>  # num is the number of log lines to show

View process information inside a container:

docker top <container-id>

View a container's metadata:

docker inspect <container-id>

4.3 Uninstall

sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
sudo apt-get purge -y nvidia-docker2

5. Commonly used commands

5.1 Commands


docker ps                         # list running containers
docker ps -a                      # list all containers
docker images                     # list images
docker rm <container-id>          # remove the container with the given id
docker stop/start <container-id>  # stop/start the container with the given id
docker rmi <image-id>             # remove the image with the given id
docker volume ls                  # list volumes
docker network ls                 # list networks
docker ps -s                      # show container sizes

5.2 View container and image sizes

  • View overall size
docker system df
  • View the detailed size of each image and container
docker system df -v

5.3 Stopping and killing containers

  • docker stop first sends a SIGTERM to the container, giving it a chance to perform any protective shutdown and cleanup work, and then lets it stop on its own. If the container has not stopped within the timeout (10 seconds by default), Docker sends SIGKILL to force-terminate it.
sudo docker stop test
  • test: container name

  • docker kill sends SIGKILL immediately, force-terminating the container regardless of its state or what program is running.

5.4 Delete image

docker rmi image-id

5.5 Copy files with docker cp

Function: copy files between the host and a docker container

# Copy a file from inside the container to the host
docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH

# Copy a file from the host into the container
docker cp [OPTIONS] SRC_PATH CONTAINER:DEST_PATH

Example

  1. Copy the host's /www/runoob directory into container 96f7f14e99ab as /www:
    docker cp /www/runoob 96f7f14e99ab:/www

  2. Copy the container 96f7f14e99ab's /www directory to the host's /tmp directory:
    docker cp  96f7f14e99ab:/www /tmp/
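The path semantics in the two examples above can be illustrated with plain `cp` in a temporary directory (docker cp follows the same convention: copying to a non-existent path creates a renamed copy, while copying into an existing directory nests the source inside it). The directory names below just mirror the examples:

```shell
# Sketch: renamed-copy vs nested-copy semantics, no docker needed.
work=$(mktemp -d)
mkdir -p "$work/runoob" "$work/tmp"
echo hello > "$work/runoob/index.html"
cp -r "$work/runoob" "$work/www"    # www does not exist -> renamed copy
cp -r "$work/runoob" "$work/tmp/"   # tmp exists -> copied inside as tmp/runoob
renamed_ok=$([ -f "$work/www/index.html" ] && echo yes)
nested_ok=$([ -f "$work/tmp/runoob/index.html" ] && echo yes)
echo "renamed: $renamed_ok, nested: $nested_ok"
rm -rf "$work"
```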
    

Build an image

Example

  1. Download the required pytorch wheels (see the 1.8 version URL, or the page for other versions). Download the following three:

torch-1.8.2+cu111-cp38-cp38-linux_x86_64.whl
torchaudio-0.8.2-cp38-cp38-linux_x86_64.whl
torchvision-0.9.2+cu111-cp38-cp38-linux_x86_64.whl
  2. Create the Dockerfile:
gedit Dockerfile
  3. Dockerfile contents:
#Install the python runtime environment
#
################################################

#Which image the new image is based on
FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04

RUN rm /etc/apt/sources.list.d/cuda.list


#Author
MAINTAINER SunPengfei

#Set environment variables
ENV TZ Asia/Shanghai
ENV LANG zh_CN.UTF-8
# Copy the downloaded whl files into the image
#COPY torch-1.10.1+cu111-cp38-cp38-linux_x86_64.whl /tmp
#COPY torchaudio-0.10.0+cu111-cp38-cp38-linux_x86_64.whl /tmp
#COPY torchvision-0.11.0+cu111-cp38-cp38-linux_x86_64.whl /tmp


#Run commands
#Switch the apt sources to the Aliyun mirror
RUN sed -i 's#http://archive.ubuntu.com/#http://mirrors.aliyun.com/#' /etc/apt/sources.list \
    && sed -i 's#http://security.ubuntu.com/#http://mirrors.aliyun.com/#' /etc/apt/sources.list

#Update the package sources and install software
RUN apt-get update -y \
    && apt-get -y install iputils-ping \
    && apt-get -y install wget \
    && apt-get -y install net-tools \
    && apt-get -y install vim \
    && apt-get -y install openssh-server \
    && apt-get -y install python3.8 \
    && apt-get -y install python3-pip python3-dev python3.8-dev \
    && apt-get -y install libgl1 \
    && apt-get -y install git \
    && cd /usr/local/bin \
    && rm -f python \
    && rm -f python3 \
    && rm -f pip \
    && rm -f pip3 \
    && ln -s /usr/bin/python3.8 python \
    && ln -s /usr/bin/python3.8 python3 \
    && ln -s /usr/bin/pip3 pip \
    && ln -s /usr/bin/pip3 pip3 \
    && python -m pip install --upgrade pip \
    && cd /tmp \
    && pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html \
    && apt-get clean \
    && rm -rf /tmp/* /var/lib/apt/lists/* /var/tmp/*

  4. Build:
sudo docker build -t ubuntu18:v0 .
  • -t: tag the image as ImageName:TagName (here ubuntu18:v0)
  • The final argument (.) is the directory containing the Dockerfile (here the current directory).
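The build-context idea can be sketched without invoking docker: the directory passed to `docker build` just needs to contain a Dockerfile. The Dockerfile content below is illustrative only:

```shell
# Sketch: lay out a minimal build context; docker itself is not invoked here.
ctx=$(mktemp -d)
cat > "$ctx/Dockerfile" <<'EOF'
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y python3.8
EOF
# The build command would then be:
#   sudo docker build -t ubuntu18:v0 "$ctx"
first_line=$(head -n1 "$ctx/Dockerfile")
echo "$first_line"
rm -rf "$ctx"
```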

Commit and save

Committing a container generates an image:

docker commit -m="description" -a="author" <container-id> target-image-name:[TAG]
  • -a: author of the committed image

  • -m: commit message

  • -p: pause the container while committing

Save

There are two approaches: commit the container and save the resulting image, or export the container directly.

Save an image:

docker save <image-id> > xxx.tar

docker load < xxx.tar

Save a container:

docker export <container-id> > xxx.tar

docker import xxx.tar container:v1
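The round trip above is conceptually a tar archive of the image or container contents (`docker save` keeps layers and tags; `docker export` flattens to a bare filesystem). A sketch of the round trip using plain tar, with temporary paths standing in for the image:

```shell
# Sketch: archive and restore, standing in for save/load (no docker needed).
src=$(mktemp -d); dst=$(mktemp -d); tarball=$(mktemp)
echo layer-data > "$src/rootfs.txt"
tar -C "$src" -cf "$tarball" .   # stands in for: docker save ID > xxx.tar
tar -C "$dst" -xf "$tarball"     # stands in for: docker load < xxx.tar
restored=$(cat "$dst/rootfs.txt")
echo "restored: $restored"
rm -rf "$src" "$dst" "$tarball"
```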

How to solve insufficient disk space in Docker

1. Check the usage of all disks on the server:

df -h

The system disk here (188G in total) is much smaller than the data disks; it had filled up, which is why the Docker storage is migrated below.

2. Check the size of the docker image and container storage directory

du -sh /var/lib/docker/

3. Stop the docker service

service docker stop

4. Migrate docker to a large-capacity disk

4.1 Method 1: create a soft link (recommended)

  1. Switch to root:
su root
  2. Move the files:
cp -a /var/lib/docker /data/
  3. Create a soft link:
sudo ln -fs /data/docker /var/lib/docker
  4. Reload the configuration and restart:
systemctl daemon-reload
systemctl restart docker
service docker start
  5. Verify: if the images are still listed, the migration worked:
docker images
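The move-then-symlink steps can be rehearsed on throwaway paths before touching the real directories. In this sketch, `$old` and `$new` stand in for /var/lib/docker and /data:

```shell
# Sketch of the symlink migration, safe to run anywhere.
old=$(mktemp -d)/docker
new=$(mktemp -d)
mkdir -p "$old"
echo overlay2 > "$old/layers.txt"
cp -a "$old" "$new/"         # cp -a preserves permissions/ownership/timestamps
rm -rf "$old"
ln -fs "$new/docker" "$old"  # the old path is now a link to the new location
survived=$(cat "$old/layers.txt")
echo "data via symlink: $survived"
```

Reading the file through the old path confirms the link works before docker is restarted.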

4.2 Method 2

  1. First create the directory:
mkdir -p <big-disk-dir>/docker/lib/
  2. Migrate:
rsync -avz /var/lib/docker /mnt/docker/lib/

5. Edit /etc/docker/daemon.json, add parameters, and bind the docker directory migration

Modify file:

sudo vim /etc/docker/daemon.json

The `data-root` entry is the parameter to add:

{
  "registry-mirrors": ["https://xxxxxxx.mirror.aliyuncs.com"],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "data-root": "/data/docker/lib/docker"
}


6. Reload and restart the docker service

systemctl daemon-reload && systemctl restart docker

If systemctl still fails, restart docker with:

service docker restart

7. Check whether docker is bound to the new directory

docker info

If the Docker Root Dir changes from /var/lib/docker to the directory you specified, the migration is successful.

8. Delete the old docker directory (only after the verification in step 7; note that with the soft-link method, /var/lib/docker is now the link itself)

rm -rf /var/lib/docker

9. Set up proxy

9.1 Method 1 (verified working)

Configure the host's /etc/default/docker:

export http_proxy="http://127.0.0.1:8889/"
export https_proxy="http://127.0.0.1:8889/"
export HTTP_PROXY="http://127.0.0.1:8889/"
export HTTPS_PROXY="http://127.0.0.1:8889/"
export all_proxy="socks5h://localhost:1089"
export ALL_PROXY="socks5h://localhost:1089"

Restart docker

sudo systemctl daemon-reload
sudo systemctl restart docker

9.2 Method 2 (did not work in testing)

Open this file on the host machine:

sudo vim ~/.docker/config.json

and add a proxy entry.

9.3 Method 3 (did not work in testing)

export ALL_PROXY='socks5://127.0.0.1:1080'

The IP address here is the host machine's IP.
9.4 Shared network

When sharing the host's network, the proxy can be used directly inside the container. Create the container with the --network=host parameter:

sudo nvidia-docker run -it --privileged=true -p 7777:8888 --network=host --gpus all --ipc=host -v /data:/data --name test1  b7a4c  /bin/bash

Then set the proxy within docker, such as global proxy

export ALL_PROXY='socks5://127.0.0.1:1080'
Alternatively, map the proxy port into the container with -p when running docker run, and use it inside the container, for example:

docker run  -p 1080:1080 .....
export ALL_PROXY='socks5://127.0.0.1:1080'

10. Docker library configuration

1. libGL

Error:

ImportError: libGL.so.1: cannot open shared object file

Install

apt-get update && apt-get install libgl1

2. ping installation

apt-get install -y iputils-ping

3. Apex installation

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext

If the install fails with RuntimeError: Error compiling objects for extension, check out the commit below and compile again:

git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0

4. boost

Error:

fatal error: boost/geometry.hpp: No such file or directory

Solution

apt-get update
apt-get install libboost-all-dev

5. cuDNN

  1. Download the matching cuDNN version from the corresponding NVIDIA page (for a Linux machine, choose x86_64).
  2. Unzip:
tar -xzvf cudnn-10.1-linux-x64-v8.0.5.39.tgz  # XXX.tgz is the downloaded cudnn archive
  3. Move the files into place:
sudo cp cuda/include/cudnn.h    /usr/local/cuda/include
sudo cp cuda/include/cudnn_version.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn*    /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h   /usr/local/cuda/lib64/libcudnn*

11. Docker cache cleanup

The docker system prune command:

  • cleans the disk by deleting stopped containers, unused volumes and networks, and dangling images (i.e. untagged images):
    • stopped containers
    • volumes not used by any container
    • networks not associated with any container
    • all dangling images

The docker system prune -a command:

  • for a more thorough cleanup, also deletes all images not used by any container.

Note that these two commands delete containers you have merely stopped and images that are currently unused, so think carefully before running them.

12. Docker remote debugging

12.1 vscode plug-in installation

  1. Remote - SSH
  2. Remote Development

12.2 Docker container configuration

  1. Start the container and install ssh:
apt-get update
apt-get install openssh-server
  2. Set the password for remote login. To log in to the container directly as root, set the root password:
passwd
  3. Allow root login: edit the config file
    vim /etc/ssh/sshd_config
    and make the following changes:
#Comment out:
PermitRootLogin prohibit-password
#Add:
PasswordAuthentication yes
PermitRootLogin yes
#Port: use the container's port
Port 9901

Restart ssh

service ssh restart
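The sshd_config edits above can also be scripted. A sketch with sed on a scratch copy (`cfg` is a temporary file standing in for /etc/ssh/sshd_config; point it at the real file, keeping a backup, to apply for real):

```shell
# Sketch: apply the PermitRootLogin / Port / PasswordAuthentication edits with sed.
cfg=$(mktemp)
printf '%s\n' 'PermitRootLogin prohibit-password' '#Port 22' > "$cfg"
sed -i 's/^PermitRootLogin .*/PermitRootLogin yes/' "$cfg"
sed -i 's/^#Port .*/Port 9901/' "$cfg"
echo 'PasswordAuthentication yes' >> "$cfg"
root_line=$(grep '^PermitRootLogin' "$cfg")
port_line=$(grep '^Port' "$cfg")
echo "$root_line / $port_line"
rm -f "$cfg"
```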

12.3 vscode configuration

  1. Ctrl+Shift+P
  2. Open the SSH configuration file
  3. Add a host entry:

# Any name you like
Host 2080Ti
    # Host IP
    HostName 10.119.XXX.XXX
    # Docker root user
    User root
    # User ubuntu
    # Docker ssh port
    Port 9901

12.4 Connection

Ctrl+Shift+P, then connect to the host with the matching name.

12.5 Remote visualization (e.g. open3d)

12.5.1 Container installation

  1. Install inside the container:
apt-get install x11-xserver-utils
apt-get install x11-apps
  2. Before logging in to docker, run in the current host terminal (if it doesn't work, restart the container, then run these again):
DISPLAY=:0.0
xhost +
  3. After logging into the container, run again:
DISPLAY=:0.0
xhost +
  4. Check the environment variable:
echo ${DISPLAY}
  5. Modify the configuration file on the ubuntu server (see Reference 1).

I have run into remote connection failures that manifested as:

  • the client vscode connection to the container hangs
  • but connecting to the remote host itself works

The following configuration change resolved it:

  • open the file:
vim /etc/ssh/sshd_config
  • replace
AllowTcpForwarding no
AllowAgentForwarding no
  with
AllowTcpForwarding yes
AllowAgentForwarding yes
  • after saving, restart the sshd service:
systemctl restart sshd

12.5.2 Local installation

  1. Install vcxsrv locally (free download available).
  • When customizing the install path, watch for permission issues and pick a path that is easy to find later.
  • Click [Next] until the installation completes.
  2. Start the service: open XLaunch, note the Display number 0, and accept the defaults ([Next]) for the remaining steps.
  3. Modify the vcxsrv configuration: find the config files in the installation directory.
  • 0 refers to DISPLAY:0 mentioned above.
  • Add the IP of the remote server and save.

12.5.3 vscode configuration

  1. Open the file C:\Users\<username>\.ssh\config and add the following 3 lines:
    ForwardX11 yes
    ForwardX11Trusted yes
    ForwardAgent yes

  2. Launch configuration: in .vscode/launch.json, set:

"env": {
    "DISPLAY": ":0.0"
}

12.6 tensorboard

  1. Open a vscode terminal
  2. conda activate env_name
  3. Enter the tf_log directory and run: tensorboard --logdir=work_dirs_name --port='6009'
  4. Click the printed URL to view it in the browser

13. Package a local image and copy it to other hosts to run

Besides pull, the other way to obtain an image is to package a local image and copy it to other hosts to run. If the connection between the local and remote registries is broken in a production environment, distributing a pre-packaged image to the other docker nodes is also a viable solution.

The specific steps are as follows:

  1. Find the name and version number (version number = TAG) of the image to package:
docker images
  2. Package the image in either of two ways (pick one):
docker save image-name:tag > /root/archive-name.tar
docker save -o /root/archive-name.tar image-name:tag
  3. Distribute the packaged tarball to the /root/ directory of the other hosts.

  4. Load the image from the tarball:

docker load < /root/archive-name.tar
  5. Check the ID of the loaded image:
docker images
  6. The loaded image's name and tag show as none; use docker tag to set them:
docker tag <image-id> image-name:tag


Origin blog.csdn.net/tiger_panda/article/details/130338413