Part 1, Installation

1.1, Hardware support in Kubernetes

Kubernetes natively supports discovery of only CPU and memory, and only to a limited extent. The kubelet itself handles very few devices.
For other hardware, Kubernetes relies on hardware vendors to develop their own Kubernetes plug-ins. These vendor plug-ins are what give Kubernetes support for the hardware.

[Figure: device plug-in implementation logic]

1.2, About the NVIDIA device plug-in for Kubernetes

The NVIDIA device plug-in for Kubernetes is a DaemonSet that allows you to automatically:

- expose the number of GPUs on each node of the cluster
- keep track of the health of the GPUs
- run GPU-enabled containers in the Kubernetes cluster

The repository contains NVIDIA's official implementation of the Kubernetes device plug-in.

1.3, Prerequisites for the NVIDIA device plug-in for Kubernetes (official)

The prerequisites for running the NVIDIA device plug-in for Kubernetes are:

- NVIDIA drivers ~= 361.93
- nvidia-docker version > 2.0 (see how to install it and its prerequisites)
- docker configured with nvidia as the default runtime
- Kubernetes version >= 1.11

The prerequisites for running nvidia-docker 2.0 are:

- GNU/Linux x86_64 with kernel version > 3.10
- Docker >= 1.12
- NVIDIA GPU with architecture > Fermi (2.1)
- NVIDIA drivers ~= 361.93 (untested on older versions)
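
Minimum-version requirements such as "Docker >= 1.12" can be checked with a small script. The following is a minimal sketch, assuming GNU sort is available for its -V (version sort) option. version_ge and check_min are hypothetical helper names, not part of any NVIDIA or Docker tool.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper; relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_min NAME ACTUAL MINIMUM: prints one OK/FAIL line for a requirement
# and returns non-zero when the requirement is not met.
check_min() {
    if version_ge "$2" "$3"; then
        echo "OK   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
        return 1
    fi
}
```

For example, check_min docker "$(docker version --format '{{.Server.Version}}')" 1.12 would report whether the local Docker daemon satisfies the Docker requirement. The helper only covers ">=" comparisons, so the strict "> 3.10" kernel requirement and the "~=" driver requirement still need to be checked by hand.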

1.4, Removing nvidia-docker 1.0

Before you proceed, you must completely remove the nvidia-docker 1.0 package.
You must also stop and remove all containers that were started with nvidia-docker 1.0.

1.4.1, Removing nvidia-docker 1.0 on Ubuntu

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge nvidia-docker

1.4.2, Removing nvidia-docker 1.0 on CentOS

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker
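
Before running the removal pipeline, it can help to preview which containers it would delete. The following is a minimal sketch that uses the same docker volume ls and docker ps filters as the commands above but stops short of docker rm. list_nvidia_docker1_containers is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_nvidia_docker1_containers: prints the IDs of all containers (running or
# stopped) that use a volume created by the nvidia-docker 1.0 volume driver.
# Hypothetical helper; it only lists containers and removes nothing.
list_nvidia_docker1_containers() {
    docker volume ls -q -f driver=nvidia-docker | while read -r vol; do
        docker ps -q -a -f "volume=${vol}"
    done | sort -u
}
```

If the function prints nothing, no containers depend on nvidia-docker 1.0 volumes and only the package removal step is needed.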

1.5, Installing nvidia-docker 2.0

Make sure you have installed the NVIDIA driver and a version of Docker that is supported for your distribution (see the prerequisites).
If you have a custom /etc/docker/daemon.json, the nvidia-docker2 package may overwrite it, so back it up first.
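
One way to take that backup is a timestamped copy next to the original file. This is a minimal sketch. backup_file is a hypothetical helper name, and the .bak.<timestamp> suffix is an arbitrary choice.

```shell
#!/usr/bin/env bash
# backup_file PATH: copies PATH to PATH.bak.<timestamp> and prints the backup path.
# Hypothetical helper; prints nothing and succeeds if PATH does not exist.
backup_file() {
    local src="$1"
    [ -f "$src" ] || return 0
    local dst
    dst="${src}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && echo "$dst"
}
```

From a root shell, backup_file /etc/docker/daemon.json can be run just before installing nvidia-docker2. If the package overwrites the file, the custom settings can then be merged back by hand.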

1.5.1, Installing nvidia-docker 2.0 on Ubuntu

Install the nvidia-docker2 package and reload the Docker daemon configuration:

sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd

1.5.2, Installing nvidia-docker 2.0 on CentOS

Install the nvidia-docker2 package and reload the Docker daemon configuration:

sudo yum install nvidia-docker2
sudo pkill -SIGHUP dockerd

1.5.3, Installing nvidia-docker 2.0 with an older version of Docker (not recommended)

If you have to use an older version of Docker, you must pin the versions of both nvidia-docker2 and nvidia-container-runtime when installing, for example:

sudo apt-get install -y nvidia-docker2=2.0.1+docker1.12.6-1 nvidia-container-runtime=1.1.0+docker1.12.6-1

To list the available versions, use the following on Ubuntu:

apt-cache madison nvidia-docker2 nvidia-container-runtime

or the following on CentOS:

yum search --showduplicates nvidia-docker2 nvidia-container-runtime

Basic usage

nvidia-docker registers a new container runtime with the Docker daemon. You must select the nvidia runtime when using docker run:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

The installation and usage of nvidia-docker 2.0 are described in detail in "Fast Learning in Ubuntu Docker 1 hour".

Part 2, Configuration

2.1, Configuring Docker

You need to enable the nvidia runtime as the default runtime on your nodes. Edit the Docker daemon configuration file, usually located at /etc/docker/daemon.json, so that it reads as follows:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

If the runtimes entry does not exist, reinstall nvidia-docker or refer to the official nvidia-docker page.
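
A quick way to confirm that the configuration file contains both settings is a pair of text matches. This is a minimal sketch, not a JSON parser: it assumes the file is laid out roughly like the example above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# has_nvidia_default_runtime FILE: succeeds if FILE sets "default-runtime" to "nvidia".
# Hypothetical helper; a plain text match, so unusual formatting may defeat it.
has_nvidia_default_runtime() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# has_nvidia_runtime_entry FILE: succeeds if FILE contains an "nvidia": { ... } object,
# which is how the runtime is declared under "runtimes".
has_nvidia_runtime_entry() {
    grep -Eq '"nvidia"[[:space:]]*:[[:space:]]*\{' "$1"
}
```

For example, has_nvidia_default_runtime /etc/docker/daemon.json && has_nvidia_runtime_entry /etc/docker/daemon.json && echo "nvidia runtime configured". After reloading the daemon with sudo pkill -SIGHUP dockerd, the output of docker info | grep -i runtime should also mention nvidia.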

2.2, Enabling GPU support in Kubernetes

Once you have enabled the nvidia default runtime on all the GPU nodes you want to use, you can enable GPU support in the cluster by deploying the following DaemonSet:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
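
Once the DaemonSet is running, each GPU node should advertise an nvidia.com/gpu resource. The following is a minimal sketch for checking that, assuming kubectl is configured for the cluster. list_node_gpus and gpu_node_count are hypothetical helper names.

```shell
#!/usr/bin/env bash
# list_node_gpus: prints each node name with its allocatable nvidia.com/gpu count.
# Nodes without the resource show <none>. The dot in "nvidia.com" is escaped so
# that kubectl treats it as part of the key rather than a path separator.
list_node_gpus() {
    kubectl get nodes \
        -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# gpu_node_count: prints how many nodes advertise at least one GPU.
gpu_node_count() {
    list_node_gpus |
        awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { n++ } END { print n + 0 }'
}
```

If gpu_node_count prints 0 shortly after deployment, the plug-in pods may still be starting. If it stays at 0, the Docker default runtime on the nodes is a reasonable first thing to recheck.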

2.3, Running GPU jobs

NVIDIA GPUs can be consumed through container-level resource requirements, using the resource name nvidia.com/gpu:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          nvidia.com/gpu: 2 # request 2 GPUs
    - name: digits-container
      image: nvidia/digits:6.0
      resources:
        limits:
          nvidia.com/gpu: 2 # request 2 GPUs

Warning: If you use an NVIDIA image with the device plug-in and do not request any GPUs, all of the GPUs on the host will be exposed inside the container.
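
The pod in this section requests two GPUs in each of its two containers, so it can only be scheduled on a node with four free GPUs. A rough way to total the GPU requests in a manifest, and to spot a manifest that requests none, is sketched below. manifest_gpu_total is a hypothetical helper name, and the sketch assumes each nvidia.com/gpu line appears once per container under limits.

```shell
#!/usr/bin/env bash
# manifest_gpu_total FILE: prints the sum of all "nvidia.com/gpu: N" values in FILE.
# Hypothetical helper; a line-based scan rather than a YAML parse, so a manifest
# that repeats the value under both requests and limits would be double-counted.
manifest_gpu_total() {
    awk '$1 == "nvidia.com/gpu:" { total += $2 } END { print total + 0 }' "$1"
}
```

A result of 0 means the manifest sets no GPU limit, which is the situation the warning above describes.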

2.4, Building and running the Kubernetes GPU device plug-in

2.4.1, Usage notes

The NVIDIA GPU device plug-in feature is beta as of Kubernetes v1.11.
The NVIDIA device plug-in is still considered beta and is missing:
- more comprehensive GPU health checking features
- GPU cleanup features
- ...
Support is provided only for the official NVIDIA device plug-in.

2.4.2, Running the device plug-in with Docker

1, Get the image

Method 1, pull the pre-built image from Docker Hub:

docker pull nvidia/k8s-device-plugin:1.11

Method 2, build the image directly from the official repository instead of pulling the pre-built one:

docker build -t nvidia/k8s-device-plugin:1.11 https://github.com/NVIDIA/k8s-device-plugin.git#v1.11

Method 3, build from a local clone, which lets you customize the build files:

git clone https://github.com/NVIDIA/k8s-device-plugin.git && cd k8s-device-plugin
docker build -t nvidia/k8s-device-plugin:1.11 .

2, Run locally

docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.11

3, Deploy to Kubernetes as a DaemonSet:

kubectl create -f nvidia-device-plugin.yml

2.4.3, Running the device plug-in without Docker

1, Build

C_INCLUDE_PATH=/usr/local/cuda/include LIBRARY_PATH=/usr/local/cuda/lib64 go build