Running TensorFlow in Docker for Isolated Deep Learning

Having to reconfigure your development environment every time you move to a new machine is a real pain, and it's even worse when you need to run things on a server: accidentally breaking the environment there is a disaster.

Pull the GPU build of the TensorFlow image

docker pull daocloud.io/daocloud/tensorflow:0.11.0-gpu

Write a convenience script to start Docker

First, check which GPU devices you have:

[root@XXX ~]# ls -la /dev | grep nvidia
crw-rw-rw-.  1 root root    195,   0 Sep 16 13:49 nvidia0
crw-rw-rw-.  1 root root    195, 255 Sep 16 13:49 nvidiactl
crw-rw-rw-.  1 root root    247,   0 Sep 16 13:54 nvidia-uvm

Then check your Docker images:

y@y:~$ sudo docker images
[sudo] password for y: 
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
daocloud.io/daocloud/tensorflow   0.11.0-gpu          dd645f420f1d        8 weeks ago         2.713 GB
daocloud.io/daocloud/tensorflow   0.10.0-devel-gpu    fa886c09638d        3 months ago        5.014 GB
hello-world                    

Then you can start the container:

sudo docker run -ti -v /home/:/mnt/home --privileged=true --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm daocloud.io/daocloud/tensorflow:0.11.0-gpu /bin/bash

That command is a bit long, so put it in a docker.sh file and then run

sh docker.sh

Done. The command above maps the host's /home directory to /mnt/home inside the container, and also maps the NVIDIA device files into it.
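For reference, a minimal docker.sh is just that command wrapped in a file (same image tag and paths as above; adjust them to your own machine):

#!/bin/bash
# Start the TensorFlow GPU container: mount /home and pass through the NVIDIA device files
sudo docker run -ti \
    -v /home/:/mnt/home \
    --privileged=true \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    daocloud.io/daocloud/tensorflow:0.11.0-gpu /bin/bash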

Once inside the container

Don't rush to use TensorFlow; it may throw an error, because I found the LD_LIBRARY_PATH environment variable was set incorrectly. There is no vim in the container, though, so first update the apt sources: write the sources file below to /home on the host, then, inside the container, copy it from /mnt/home to /etc/apt/sources.list:

deb http://mirrors.aliyun.com/ubuntu/ trusty main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ trusty-backports main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ trusty-proposed main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ trusty-security main multiverse restricted universe
deb http://mirrors.aliyun.com/ubuntu/ trusty-updates main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ trusty main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-backports main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-proposed main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-security main multiverse restricted universe
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-updates main multiverse restricted universe
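Inside the container, copying the file over is then a single command (assuming you saved it on the host as /home/sources.list, which shows up in the container under /mnt/home):

cp /mnt/home/sources.list /etc/apt/sources.list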

Then run

apt-get update
apt-get install <whatever you need>

to install the common tools you need (vim, for example).

Add the following at the end of ~/.bashrc (the variable is LD_LIBRARY_PATH, not LD_LIBLABRARY_PATH, and there must be no spaces around the =):
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
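To apply and check the change without leaving the container, something like the following works (assuming the CUDA 8.0 libraries really do live under /usr/local/cuda-8.0/lib64 in this image):

source ~/.bashrc                               # reload .bashrc so the new variable takes effect
echo $LD_LIBRARY_PATH                          # should now start with /usr/local/cuda-8.0/lib64
ls /usr/local/cuda-8.0/lib64/libcublas.so*     # the CUDA libraries should be listed here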

Then exit the container's shell.

On the host, save the container as a new image

y@y:~$ sudo docker ps -l
CONTAINER ID        IMAGE                                        COMMAND             CREATED             STATUS              PORTS                NAMES
a1f2ac36a2c9        daocloud.io/daocloud/tensorflow:0.11.0-gpu   "/bin/bash"         10 minutes ago      Up 10 minutes       6006/tcp, 8888/tcp   

Note down the container ID a1f2ac36a2c9, then run

docker commit a1f2ac36a2c9 <new image name>
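For example, with a made-up image name, followed by a quick check that the new image actually shows up:

sudo docker commit a1f2ac36a2c9 my-tensorflow:0.11.0-gpu-configured   # "my-tensorflow:..." is a hypothetical name
sudo docker images | grep my-tensorflow                               # the committed image should be listed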

And that's it.

Save the image to an external hard drive

sudo docker save -o <path to save to> daocloud.io/daocloud/tensorflow:0.11.0-gpu

Load it back from a local file

sudo docker load --input <local path>

Delete an image

docker rmi <image ID>
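Putting the three together, a round trip might look like this (/media/usb is a hypothetical mount point for the external drive; the image ID comes from the docker images listing above):

sudo docker save -o /media/usb/tensorflow-0.11.0-gpu.tar daocloud.io/daocloud/tensorflow:0.11.0-gpu
sudo docker load --input /media/usb/tensorflow-0.11.0-gpu.tar
sudo docker images                     # look up the IMAGE ID you want to remove
sudo docker rmi dd645f420f1d           # remove it by IMAGE ID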

Using TensorFlow

Inside the container, import tensorflow throws an error:

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: ad8b0d82bec1
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.44  Wed Aug 17 22:24:07 PDT 2016

Presumably libcuda.so.1 was not bundled into the image, so the simplest fix is to map the host's copy straight into the container:

-v /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/

Add this option back into the docker run command and restart the container, and the problem goes away.
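Putting it all together, the updated start command (or docker.sh) is the same as before plus the extra -v for the host's library directory:

sudo docker run -ti \
    -v /home/:/mnt/home \
    -v /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/ \
    --privileged=true \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    daocloud.io/daocloud/tensorflow:0.11.0-gpu /bin/bash

After restarting the container this way, the import succeeds: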

root@282f3d4a2193:/notebooks# python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
>>> 

Test that it works:

...
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42
>>>
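To confirm the GPU is actually being picked up, a quick extra check in the same session is a small matmul with device-placement logging turned on (a sketch using the 0.x-era tf.ConfigProto API; if the GPU is visible, the placement lines the session prints should mention gpu:0):

>>> sess2 = tf.Session(config=tf.ConfigProto(log_device_placement=True))
>>> a2 = tf.constant([[1.0, 2.0], [3.0, 4.0]])
>>> b2 = tf.constant([[2.0, 0.0], [0.0, 2.0]])
>>> print(sess2.run(tf.matmul(a2, b2)))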


Reposted from blog.csdn.net/cq361106306/article/details/54094517