Installing tensorflow-gpu on Ubuntu 16.04

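After falling into countless pitfalls (I hit nearly every problem reported online more than once: login loops, degraded screen resolution, black screens, and so on), I finally got all the software needed for GPU-accelerated tensorflow installed. The process was painful: I reinstalled Linux twice, and each reinstall was demoralizing. So I want to write up the installation properly, in the hope that it helps someone else.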

Software versions on my machine:

linux: Ubuntu 16.04

uname -m && cat /etc/*release

anaconda:  conda 4.5.11

conda --version

python:    python3.7

Graphics cards: there are usually two, an Intel integrated card and an NVIDIA card (reference: https://blog.csdn.net/yan_chou/article/details/72847943)

My NVIDIA card model: GeForce GTX 950M

GPU driver: 410.48

cuda: cuda10.0

cudnn:  cudnn 7.3.0 (for cuda10.0)

tensorflow:  1.11.0
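
The version checks above can be collected by one small script. This is only a sketch, not part of my original setup: it runs the same `uname -m` and `conda --version` commands, plus an `nvidia-smi` driver query, and reports a tool as missing instead of crashing. The helper name `tool_version` is hypothetical.

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Run a command (list of arguments) and return its first output line.

    Returns None when the executable is not on PATH or the call fails.
    """
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    # Version of the interpreter running this script, not necessarily the conda one.
    print("python:", sys.version.split()[0])
    checks = [
        ("arch", ["uname", "-m"]),
        ("conda", ["conda", "--version"]),
        ("driver", ["nvidia-smi", "--query-gpu=driver_version",
                    "--format=csv,noheader"]),
    ]
    for name, command in checks:
        print(name + ":", tool_version(command) or "not found")
```

Run it from the same terminal (and the same conda environment) you plan to use for tensorflow, since PATH differs between environments.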

About CUDA:

CUDA® is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

CUDA was developed with several design goals in mind:

‣ Provide a small set of extensions to standard programming languages, like C, that enable a straightforward implementation of parallel algorithms. With CUDA C/C++, programmers can focus on the task of parallelization of the algorithms rather than spending time on their implementation.

‣ Support heterogeneous computation where applications use both the CPU and GPU. Serial portions of applications are run on the CPU, and parallel portions are offloaded to the GPU. As such, CUDA can be incrementally applied to existing applications. The CPU and GPU are treated as separate devices that have their own memory spaces. This configuration also allows simultaneous computation on the CPU and GPU without contention for memory resources.

About cuDNN:

The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.

Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe, Caffe2, Chainer, Keras, MATLAB, MxNet, TensorFlow, and PyTorch. For access to NVIDIA optimized deep learning framework containers, that has cuDNN integrated into the frameworks, visit NVIDIA GPU CLOUD to learn more and get started.

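A simple way to think about it: cuda is the platform for parallel computing on the GPU; cudnn is a library on top of cuda that is optimized specifically for deep learning computations; and the GPU driver is the low-level hardware driver underneath both.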

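That's the end of the introduction. Below is a brief summary of the kinds of installation methods found online (these are not concrete steps, only a rough overview of the existing approaches):

1. Install the NVIDIA driver first, then cuda, then cudnn, then tensorflow-gpu:

With this approach, the driver installation step itself comes in several variants: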

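(1). Pick a new driver directly in Ubuntu under System Settings, Software & Updates, Additional Drivers (did not work for me, because nothing could be selected at all);

(2). Add a PPA source in the terminal, then install with apt-get install. This caused a login loop, and the fix is to uninstall the driver again (so not viable);

(3). Download the driver file from the NVIDIA website and install the driver on its own with sh **.run (the driver installs successfully, but problems show up later).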

I spent a long time on the third variant. There are many guides online; I found these two the most helpful:

https://blog.csdn.net/Zafir_410/article/details/73188228?utm_source=blogxgwz0

https://blog.csdn.net/ksws0292756/article/details/79160742

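But they did not fully solve my problem at the driver installation step either. In the end I pasted the error messages straight into Google, which mostly solved the driver installation,

meaning that running nvidia-smi in the terminal gives the correct output: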

Wed Nov  7 16:05:20 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   34C    P0    N/A /  N/A |      0MiB /  4046MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
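
If you want to check the driver version from a script rather than by eye, the header line of that output can be parsed. The sketch below is an illustration only: the function names are hypothetical, and the threshold used is simply the 410.48 driver from my own setup, so check NVIDIA's release notes for the minimum your cuda version actually needs.

```python
import re

# Header line copied from the nvidia-smi output shown above.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 410.48                 "
    "Driver Version: 410.48                    |"
)


def parse_driver_version(smi_output):
    """Return the driver version string found in nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    return match.group(1) if match else None


def driver_at_least(version, minimum):
    """Compare dotted version strings numerically, component by component."""
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return as_tuple(version) >= as_tuple(minimum)


if __name__ == "__main__":
    found = parse_driver_version(SAMPLE_HEADER)
    print("driver:", found)
    # "410.48" is the version used in this post, taken here as the threshold.
    print("new enough:", found is not None and driver_at_least(found, "410.48"))
```

To use it on a live system, pass the text captured from running `nvidia-smi` to `parse_driver_version` instead of `SAMPLE_HEADER`.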

Before that, I was stuck at:

Error:Unable to load the 'nvidia-drm' kernel module .

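Following the second of those two blog posts did not fix it. Through Google I found a fix that involved changing two lines in a .h file, but I can no longer find it in my browser history, so I can't give the details.

Even with the driver installed successfully, there was still one big problem: the installed driver would not load, and opening NVIDIA X Server Settings always popped up: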

You do not appear to be using the NVIDIA X driver on Linux Ubuntu

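There are two main ways to solve this:

(1). Additionally install nvidia-prime, the GPU switching tool (did not work for me);

(2). Google suggests installing Bumblebee. After I installed it, the problem above did go away, but then a mistake of my own put me back to square one, and I don't know whether this route causes errors later on.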

https://blog.csdn.net/geange/article/details/79284727

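2. The ultimate correct approach: install the GPU driver as part of installing cuda. This is the real answer; installing the driver separately leads to one problem after another. I used to wonder why there was no official installation guide for the NVIDIA driver. It turns out they only provide an installation guide for cuda, and the cuda installer asks during installation whether to install the NVIDIA driver as well. With nobody experienced to point this out, I took a lot of detours; otherwise I could have saved a great deal of time.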

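Here is the blog post that pointed me in the right direction: https://blog.csdn.net/QLULIBIN/article/details/78714596

That post is essentially a translation of the official cuda installation guide, but the author added emphasis in several places, which makes it very useful.

As the comments under the post show, following that method goes smoothly for most people.

Of course, if you are willing to read the official installation guide yourself, that helps a lot too. Here is the link: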

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

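One reminder: be sure to check this table in the official guide:

Table 1. Native Linux Distribution Support in CUDA 10.0

It lists which combinations of Ubuntu release, gcc version, and kernel version support cuda10.0, and your system must match one of them.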
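
The three values that table asks about can be read off the machine with a short script. This is a rough sketch under assumptions: the function names are hypothetical, and the `EXPECTED` values are deliberately left empty because they must be copied from Table 1 yourself rather than trusted from a blog post.

```python
import platform
import re
import shutil
import subprocess

# Fill these in from Table 1 of the official guide, e.g. a kernel prefix
# and a gcc version string. They are intentionally not pre-filled here.
EXPECTED = {"kernel": None, "gcc": None}


def read_os_release(path="/etc/os-release"):
    """Parse KEY=VALUE lines from an os-release style file into a dict."""
    info = {}
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if "=" in line and not line.startswith("#"):
                    key, value = line.split("=", 1)
                    info[key] = value.strip('"')
    except OSError:
        pass
    return info


def gcc_version():
    """Return the output of `gcc -dumpversion`, or None if gcc is missing.

    Note: newer gcc releases may print only the major version here.
    """
    if shutil.which("gcc") is None:
        return None
    try:
        result = subprocess.run(
            ["gcc", "-dumpversion"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


def matches_prefix(actual, expected_prefix):
    """True when `actual` begins with `expected_prefix` on a version boundary."""
    if actual is None:
        return False
    return re.match(re.escape(expected_prefix) + r"(?![0-9])", actual) is not None


if __name__ == "__main__":
    found = {"kernel": platform.release(), "gcc": gcc_version()}
    print("distro:", read_os_release().get("PRETTY_NAME", "unknown"))
    for key, value in found.items():
        wanted = EXPECTED[key]
        verdict = "not checked" if wanted is None else matches_prefix(value, wanted)
        print(key + ":", value, "| matches Table 1:", verdict)
```

The prefix match stops at a version boundary, so a table entry for one kernel series does not accidentally accept a different series whose number merely starts with the same digits.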

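Make sure you know the whole procedure thoroughly before you start, and try hard to avoid slips; otherwise you will be taking a lot of detours again.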

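Installing cudnn is fairly simple. You only need to pay attention to which cuda version it corresponds to, and the NVIDIA website states this clearly.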
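
To confirm which cudnn version actually ended up on disk, you can read the version macros from the header. A sketch, with assumptions stated up front: the path `/usr/local/cuda/include/cudnn.h` is only the common location for a manual install and may differ on your machine, and the macros `CUDNN_MAJOR`, `CUDNN_MINOR`, `CUDNN_PATCHLEVEL` live in `cudnn.h` for the 7.x series used here.

```python
import re

# Assumed location; adjust if cudnn was copied somewhere else.
DEFAULT_HEADER = "/usr/local/cuda/include/cudnn.h"


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the #define lines of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    try:
        with open(DEFAULT_HEADER, encoding="utf-8", errors="replace") as handle:
            print("cudnn version:", parse_cudnn_version(handle.read()))
    except OSError:
        print("cudnn.h not found at", DEFAULT_HEADER)
```

For the setup described in this post, the result you would hope to see is the 7.3.0 release listed in the version table at the top.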

I installed tensorflow-gpu through anaconda. There are many tutorials online and this part is fairly simple. Here is one I used:

https://blog.csdn.net/lukaslong/article/details/81092032
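
Once the install finishes, it is worth checking that tensorflow can actually see the GPU. A minimal sketch, assuming a TensorFlow 1.x install like the 1.11.0 used here: `tf.test.is_gpu_available()` is the 1.x call and is deprecated in later releases, so the code looks it up defensively. The function name `tensorflow_gpu_status` is hypothetical.

```python
def tensorflow_gpu_status():
    """Return (version, gpu_visible); both are None if tensorflow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None, None
    # TensorFlow 1.x API; looked up with getattr in case a release drops it.
    check = getattr(tf.test, "is_gpu_available", None)
    gpu_visible = bool(check()) if callable(check) else None
    return tf.__version__, gpu_visible


if __name__ == "__main__":
    version, gpu_visible = tensorflow_gpu_status()
    if version is None:
        print("tensorflow is not installed in this environment")
    else:
        print("tensorflow", version, "| GPU visible:", gpu_visible)
```

Run it inside the conda environment where tensorflow-gpu was installed. If it reports that the GPU is not visible, the console log that tensorflow prints during the check usually names the library it failed to load.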

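That's it. I hope everyone's installation goes smoothly!

ps: Finally, a word on how I ended up reinstalling Linux. The first time, I carelessly deleted the python3.5 folder that ships with Linux, which caused all sorts of system problems; after searching online, the most efficient fix was to reinstall the system;

The second time, the resolution got worse after I installed the nvidia driver, and I edited the /etc/X11/xorg.conf configuration file incorrectly without having backed it up first. After that the terminal would not open, the desktop would not load, and even ctrl+alt+F1 did not work. I hope nobody repeats my mistakes: always back up a configuration file before editing it, think things through, and guard against slips.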
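
The backup habit is easy to script. Here is a small sketch: the helper name `backup_file` is hypothetical, and the demo works on a throwaway temporary file rather than the real /etc/X11/xorg.conf, which would need root permissions.

```python
import os
import shutil
import tempfile
import time


def backup_file(path):
    """Copy `path` to `path.bak-YYYYmmdd-HHMMSS` beside it; return the backup path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "{}.bak-{}".format(path, stamp)
    shutil.copy2(path, backup_path)  # copy2 also preserves timestamps and mode
    return backup_path


if __name__ == "__main__":
    # Demo on a temporary stand-in file instead of a real system config.
    with tempfile.TemporaryDirectory() as workdir:
        config = os.path.join(workdir, "xorg.conf")
        with open(config, "w", encoding="utf-8") as handle:
            handle.write("# stand-in config for the demo\n")
        saved = backup_file(config)
        print("backup written to", os.path.basename(saved))
```

For the real file you would run the script with sufficient privileges and call `backup_file("/etc/X11/xorg.conf")` before touching anything, so that a broken edit can be undone from a recovery shell by copying the backup back.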


Reposted from www.cnblogs.com/nanjingli/p/9923431.html