Detailed explanation of Docker configuration: Ubuntu 16.04 + CUDA 10.1 + cuDNN 7

1. Install docker

  • Uninstall any old versions of Docker with apt:
sudo apt-get remove docker docker-engine docker-ce docker.io
  • Update the apt package index:
sudo apt-get update
  • Install the following packages so that apt can use a repository over HTTPS:
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
  • Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  • Set up the stable repository:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  • Update the apt package again:
sudo apt-get update
  • Install latest docker CE:
sudo apt-get install -y docker-ce
  • Check whether the docker service is started:
systemctl status docker
  • If it is not running, start Docker:
sudo systemctl start docker
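
The last two steps can be combined into a single check. The following is a minimal sketch, assuming a systemd-based host as in the steps above; the function name ensure_docker_running is made up for illustration and is not part of Docker.

```shell
# Start the docker service only if systemd reports it as inactive.
# ensure_docker_running is a hypothetical helper name.
ensure_docker_running() {
    if systemctl is-active --quiet docker; then
        echo "docker is already running"
    else
        echo "docker is not running, starting it"
        sudo systemctl start docker
    fi
}
```

Calling ensure_docker_running after the installation either reports that the service is up or starts it.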

2. Install nvidia-docker

Plain Docker is enough if you only want to run CPU programs in containers. If you want to use the host GPU, you need to install nvidia-docker, which is officially provided by NVIDIA.

Official address: https://github.com/NVIDIA/nvidia-docker

If the Docker version is 19.03 or later, you don't need to install nvidia-docker; you only need to install nvidia-container-toolkit.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
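
Which of the two packages applies depends on the Docker version, so it can help to check the 19.03 threshold in a script. This is a rough sketch, assuming GNU sort (for its -V version ordering) is available; version_ge is a hypothetical helper name.

```shell
# Return success if version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on GNU sort's -V option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
    # Ask the daemon for its version, e.g. "19.03.8".
    server_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -z "$server_version" ]; then
        echo "could not query the Docker daemon (is it running?)"
    elif version_ge "$server_version" 19.03; then
        echo "Docker $server_version: nvidia-container-toolkit is enough"
    else
        echo "Docker $server_version: follow the nvidia-docker instructions instead"
    fi
fi
```

The comparison sorts the two version strings and checks that the required one comes first, which avoids parsing the dotted numbers by hand.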

Test whether the installation succeeded. The image will be downloaded from the official Docker registry.

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi

If the GPU information is printed, the installation succeeded.

Tue Apr 24 18:58:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   53C    P5    27W / 280W |      0MiB / 11177MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

 

Downloads from the official registry are very slow from mainland China (skip the following part if you already have unrestricted access), so you need to configure a domestic registry mirror.

sudo vim /etc/docker/daemon.json

The file initially looks like this:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Change it to the following. (The example uses an Alibaba Cloud mirror, which worked in my own testing; other mirrors such as https://registry.docker-cn.com and http://hub-mirror.c.163.com can also be used.)

{
    "registry-mirrors": ["https://3laho3y3.mirror.aliyuncs.com"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
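
A misplaced quote or comma in daemon.json will stop the Docker daemon from starting, so it is worth validating the file before restarting. The sketch below assumes python3 is installed on the host and uses its bundled json.tool module; validate_daemon_json is a hypothetical helper name.

```shell
# Check that a daemon.json file is well-formed JSON.
# validate_daemon_json is a hypothetical helper; it assumes python3 is
# installed and uses the standard json.tool module as the parser.
validate_daemon_json() {
    # $1: path to the config file (defaults to the path edited above)
    file="${1:-/etc/docker/daemon.json}"
    if python3 -m json.tool "$file" >/dev/null 2>&1; then
        echo "valid"
    else
        echo "invalid"
        return 1
    fi
}
```

If the file is reported as valid, restart the daemon with sudo systemctl restart docker so the new settings take effect; docker info should then list the mirror under Registry Mirrors.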

3. Download the nvidia/cuda Ubuntu image

Docker image official website: https://hub.docker.com/

Search for nvidia/cuda on the site.

Select Tags and find 10.1-cudnn7-devel-ubuntu16.04 (it includes the Ubuntu system libraries, CUDA 10.1 and cuDNN 7). If you don't want the system libraries included, you can choose another image.

Pull the image.

sudo docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04

Wait for the download to complete, then run docker images to check that the image is listed.
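
The check can also be scripted. This is a minimal sketch that looks for the image locally and, if it is present, runs the CUDA compiler inside it to confirm the toolkit version; image_present is a hypothetical helper name, and depending on your setup the docker commands may need sudo.

```shell
IMAGE="nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04"

# Succeed if the given image reference exists in the local image store.
# image_present is a hypothetical helper built on `docker images -q`,
# which prints only the IDs of matching images.
image_present() {
    [ -n "$(docker images -q "$1" 2>/dev/null)" ]
}

if command -v docker >/dev/null 2>&1 && image_present "$IMAGE"; then
    # The devel image ships nvcc, so this prints the CUDA toolkit version.
    docker run --rm --gpus all "$IMAGE" nvcc --version
else
    echo "$IMAGE not found locally; run docker pull first"
fi
```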

Because the image may be too large, you may need to increase the size of the local Docker image storage, which is configured in docker.service.

Generally, docker.service is in the /usr/lib/systemd/system/ directory, but when I tested it, it was in the /lib/systemd/system/ directory, so watch out for this pitfall.
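
Since the location differs between systems, a small search over the usual directories avoids guessing. This is a sketch only; find_docker_unit is a hypothetical helper, and the directory list is an assumption based on the two paths mentioned above plus /etc/systemd/system.

```shell
# Print the first docker.service found in the given directories.
# find_docker_unit is a hypothetical helper name.
find_docker_unit() {
    for dir in "$@"; do
        if [ -f "$dir/docker.service" ]; then
            echo "$dir/docker.service"
            return 0
        fi
    done
    return 1
}

# On a systemd host, `systemctl show -p FragmentPath docker` reports the
# same information directly.
find_docker_unit /usr/lib/systemd/system /lib/systemd/system /etc/systemd/system \
    || echo "docker.service not found in the usual directories"
```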

Open docker.service.

# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/containers/registries.conf
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
          --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
          --default-runtime=docker-runc \
          --exec-opt native.cgroupdriver=systemd \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
KillMode=process

[Install]
WantedBy=multi-user.target

Change the container size:

[Service]
...
ExecStart=/usr/bin/dockerd \
          --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...

The maximum space for Docker is 100G, and the maximum space for each container is 30G.

After the change, reload the unit file and restart Docker:

systemctl daemon-reload

# Restart docker
service docker restart

Modify the docker image storage path

sudo docker info

The output is as follows:


Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: df5c38a9167e87f53a9894d77c0950e178a745e7 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 1
Total Memory: 991.7 MiB
Name: fuqiang
ID: F2MD:SKQC:HSZG:LN7H:L3KI:7SN2:JHRP:HMQI:3KK2:4RTO:TPTJ:UCYZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)

You can see that Docker Root Dir: /var/lib/docker is the default storage location for images and container instances. When images are large, this directory often runs out of space, and the storage directory needs to be changed.

Target location for the images: /home/docker
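
Before migrating, it may be worth confirming that the target filesystem has more free space than the current Docker directory occupies. The following is a rough sketch using du and df; has_room is a hypothetical helper name, and the comparison is done in 1K blocks.

```shell
# Succeed if the filesystem holding $2 has more free space than $1 occupies.
# has_room is a hypothetical helper; sizes are compared in 1K blocks.
has_room() {
    needed="$(du -sk "$1" | cut -f1)"
    available="$(df -Pk "$2" | awk 'NR==2 {print $4}')"
    [ "$available" -gt "$needed" ]
}

# Example (run as root, since /var/lib/docker is not world-readable):
#   has_room /var/lib/docker /home && echo "enough space under /home"
```

The second argument is an existing directory on the target filesystem (here /home), because /home/docker itself may not exist yet.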

Stop the docker service:

systemctl stop docker

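Migrate the data (the -a flag preserves file ownership, permissions and symbolic links, which the image layers depend on):

sudo cp -a /var/lib/docker/ /home/docker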

 

Add --graph to docker.service (newer Docker releases name this option --data-root):

[Service]
...
ExecStart=/usr/bin/dockerd --graph=your_docker_image_path \
          --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...

 

Start the docker service:

systemctl start docker
systemctl status docker

The storage path has now been changed.
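
To confirm the change rather than assume it, you can ask the daemon which directory it is actually using. This is a minimal sketch; current_root_dir and root_dir_is are hypothetical helper names, and /home/docker is the target location chosen above.

```shell
# Print the data directory the running daemon actually uses.
# current_root_dir is a hypothetical helper around `docker info --format`.
current_root_dir() {
    docker info --format '{{.DockerRootDir}}' 2>/dev/null
}

# Succeed if the daemon's data directory equals the expected path in $1.
root_dir_is() {
    [ "$(current_root_dir)" = "$1" ]
}

# Example:
#   root_dir_is /home/docker && echo "storage path changed"
```

This reads the same Docker Root Dir field that appears in the full docker info output. Depending on your setup, docker info may need sudo.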

 

4. Display the Docker container's graphical interface on the Ubuntu host

This works over the network, and the host needs to have an X server installed.

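A. On the host
Check the host IP
$ ifconfig                          ## assume it is xxx.xxx.xxx.xx
Check the current value of the DISPLAY environment variable
$ echo $DISPLAY   (run this on the machine's own display; it does not work from an ssh terminal)  ## assume it is :0
Or work it out from the socket file:
$ ll /tmp/.X11-unix/                            ## assume X0, which corresponds to :0

Install xserver
$ sudo apt install x11-xserver-utils
$ sudo vim /etc/lightdm/lightdm.conf
Add permission for network connections
[SeatDefaults]
xserver-allow-tcp=true
Restart xserver
$ sudo systemctl restart lightdm
Allow all users to access xserver
xhost +


B. Inside the docker container
# export DISPLAY=xxx.xxx.xxx.xx:0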
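
The mapping from the socket name to the DISPLAY value can be written down explicitly. This is a small sketch; socket_to_display and container_display are hypothetical helper names, and 192.0.2.10 is a placeholder address standing in for the host IP found with ifconfig.

```shell
# Map an X socket name such as "X0" to its display string ":0".
# socket_to_display is a hypothetical helper name.
socket_to_display() {
    echo ":${1#X}"
}

# Build the DISPLAY value a container should use.
# $1: host IP, $2: socket name from /tmp/.X11-unix (e.g. X0)
container_display() {
    echo "$1$(socket_to_display "$2")"
}

# Example: pass the value when starting the container instead of
# exporting it afterwards (192.0.2.10 is a placeholder host IP):
#   docker run -it -e DISPLAY="$(container_display 192.0.2.10 X0)" \
#       nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04 bash
```

Setting the variable with -e at docker run time has the same effect as the export command inside the container, but it applies to every process started in it.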

Summary of pitfalls:

1. I built a custom Ubuntu image and installed CUDA and cuDNN successfully, but C++ code failed to call the cuDNN API. After switching to the downloaded nvidia/cuda image, the calls succeeded. The reason is unknown.

2. The container size was insufficient and needed to be increased.

3. The local image storage was too small, so the storage directory had to be changed. Before changing it, all files in the source directory need to be copied to the target directory.

 

Comments and private messages are welcome.

 

 

 

 

 

Origin blog.csdn.net/a454193977/article/details/106383605