Installing the GPU version of TensorFlow from source on Ubuntu 16.04

Update on February 16, 2017:
TensorFlow 1.0 has been released. I simply updated CUDA and cuDNN, but afterwards the device could not be found, with a message that libcuda.so.1 could not be loaded!
Running CUDA's deviceQuery sample then showed that no CUDA device could be found either. The cause was an incompatible graphics driver. (I had updated the driver through System Settings; rolling back to the previous driver and restarting fixed it.)
To update CUDA, download the .run installer from the official website and run "sudo sh cuda_8.0.61_375.26_linux.run". (The multi-page text it displays can be skipped directly to the end with +2200.)

Compiling TensorFlow 1.0 goes very smoothly; just follow the steps on the official website in order. Note, however, that bazel must be updated to 0.4.4 first.

Now for the pitfalls I ran into:
1. After installing CUDA, running the test sample reported that the graphics driver version was incorrect and that no CUDA device could be found. Restarting fixed it.

2. After installing, don't forget to set the environment variables.
Edit or create the ~/.bash_profile file and add the following two lines (check your own CUDA installation directory):
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda

I am not sure whether settings added to ~/.bash_profile only take effect on the command line.
When running from Eclipse, the output showed that CUDA could be loaded but cuDNN could not be found. In the end I added these two lines to the end of /etc/profile, and after a restart cuDNN could be loaded.
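To see which of these variables a given process actually receives (a terminal session, or a program launched from Eclipse), a small check along the following lines can be run from inside that process. This is only a sketch: the helper name check_cuda_env is made up for illustration, and /usr/local/cuda/lib64 is simply the directory used above, so adjust it to your own installation.

```python
import os


def check_cuda_env(environ=None, lib_dir="/usr/local/cuda/lib64"):
    """Report whether the CUDA-related variables are visible to this process.

    check_cuda_env is a hypothetical helper, not part of TensorFlow or CUDA.
    """
    env = os.environ if environ is None else environ
    # LD_LIBRARY_PATH is a colon-separated list of directories.
    ld_dirs = [d.rstrip("/") for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    return {
        "cuda_home_set": bool(env.get("CUDA_HOME")),
        "lib_dir_on_ld_path": lib_dir.rstrip("/") in ld_dirs,
    }


if __name__ == "__main__":
    print(check_cuda_env())
```

If both values are True in a terminal but False when the same script is started from the IDE, the IDE is probably not reading ~/.bash_profile, which would match the behaviour described above.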

3. Bazel can be installed with apt-get, which is very simple.
For details, see steps 2 and 3 in the "Using Bazel custom API repository (recommended)" section of http://bazel.io/docs/install.html . Bazel is a make-like build tool from Google. It is said to be fast, though I really didn't notice it.

4. Due to well-known network reasons, steps that fetch source code from GitHub may need several retries. TensorFlow can only be downloaded with git clone; downloading the .zip file directly from the web page leaves out the dependent projects.

5. The various warnings during compilation can be ignored. Compile times are long.

6. The installed tensorflow is not under python/dist-packages, which is different from the CPU version! You can find the tensorboard installation directory with "which tensorboard", but I really don't know where tensorflow itself is installed.

7. CUDA 8 and cuDNN 4 work well together (fine on a GTX 960M, but there is a problem with the calculation results on a GTX 1080).
Note: cuDNN is only called when computing convolutions! It is not called at all when computing a fully connected network, so in that case it doesn't matter whether it is installed correctly! (But it must be installed.)
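On the question in item 6 of where tensorflow ends up, one way to ask Python itself is sketched below. It assumes Python 3 (importlib.util is not available in Python 2), and module_location is a name invented for this example. It only looks the module up and does not import it.

```python
import importlib.util


def module_location(name):
    """Return the file or directory a module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    # Packages list their directories in submodule_search_locations;
    # plain modules only have an origin (the .py or .so file).
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


if __name__ == "__main__":
    # Prints None if tensorflow is not installed for this interpreter.
    print(module_location("tensorflow"))
```

Run it with the same interpreter that you use for tensorflow, since each Python installation has its own package directories.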


8. The CUDA header files cannot be found
Symptom: undeclared inclusion(s) in rule '//tensorflow/core/kernels:depth_space_ops_gpu'
Solution: edit CROSSTOOL.tpl in tensorflow/third_party/gpus/crosstool and add a line that explicitly specifies the CUDA location (I guess the CUDA version is filled in automatically when it is not set):
  cxx_builtin_include_directory: "/usr/local/cuda%{cuda_version}/include"
  cxx_builtin_include_directory: "/usr/local/cuda-8.0/include"


9. _objs/batchtospace_op_gpu/tensorflow/core/kernels/batchtospace_op_gpu.cu.pic.d (No such file or directory)
Solution:
Modify tensorflow/third_party/gpus/crosstool/CROSSTOOL:
after each cxx_flag: "-std=c++11" line, add the line:
cxx_flag: "-D_MWAITXINTRIN_H_INCLUDED"

See: https://github.com/tensorflow/tensorflow/issues/2143
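If the CROSSTOOL file contains several such cxx_flag lines, the edit from item 9 can be scripted. The following is a rough sketch that works on the file contents as a string; the function name add_flag_after_cxx11 is invented here, and the sample text in the usage part is only an illustration, not the real file.

```python
TARGET = 'cxx_flag: "-std=c++11"'
NEW_FLAG = 'cxx_flag: "-D_MWAITXINTRIN_H_INCLUDED"'


def add_flag_after_cxx11(text, new_flag=NEW_FLAG):
    """Insert new_flag after every '-std=c++11' cxx_flag line in text."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if line.strip() != TARGET:
            continue
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if following == new_flag:
            continue  # already patched, do not add a duplicate
        # Reuse the indentation of the matched line.
        indent = line[: len(line) - len(line.lstrip())]
        out.append(indent + new_flag)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = '  cxx_flag: "-std=c++11"\n  linker_flag: "-lstdc++"\n'
    print(add_flag_after_cxx11(sample), end="")
```

To apply it for real you would read the CROSSTOOL file, pass its text through the function and write the result back, after keeping a backup copy.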


10. nvcc does not support gcc 5.4
Symptom: error -- unsupported GNU version! gcc versions later than 5.3 are not supported!

Solution: comment out the corresponding line in /usr/local/cuda/include/host_config.h with a double slash (//).

See (this tutorial also covers theano and caffe configuration): http://blog.csdn.net/hjimce/article/details/51999566
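The edit in item 10 can also be done by a script instead of by hand. Below is a minimal sketch that comments out the #error line carrying that message; comment_out_gcc_check is a hypothetical helper name, and the three-line sample in the usage part is an illustrative stand-in, not the exact contents of host_config.h.

```python
def comment_out_gcc_check(header_text):
    """Prefix the 'unsupported GNU version' #error line with a double slash."""
    out = []
    for line in header_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#error") and "unsupported GNU version" in stripped:
            line = "//" + line
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative sample only; the real header may be laid out differently.
    sample = (
        "#if __GNUC__ > 5\n"
        "#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!\n"
        "#endif\n"
    )
    print(comment_out_gcc_check(sample), end="")
```

Since the real file lives under /usr/local/cuda, writing the result back needs root permissions, and it is worth keeping a copy of the original header.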


11. NaN appears when running the test case
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

Reportedly this can be ignored; it may be fine when running only one session. See:

https://github.com/tensorflow/tensorflow/issues/2037

Quote:
Please try "--num_concurrent_sessions=1" and "--num_concurrent_steps=1" for your experiments. If you don't see any exceptions with those, then everything is good.



12. No GPU kernel for XXX
This is not a bug. Just remove the "with tf.device('/gpu:0'):" statement that explicitly specifies the GPU device from your own code (or use it only for certain statements, such as convolutions, that should explicitly run on the GPU). The reason is that certain operations can only be performed on the CPU.
See: http://stackoverflow.com/questions/37439299/no-gpu-kernel-for-an-int32-variable-op

13. Reinstalling tensorflow
Uninstall the original one first:
sudo pip uninstall tensorflow


For details, please refer to the following articles:
Deep Learning Host Environment Configuration: Ubuntu16.04+Nvidia GTX 1080+CUDA8.0: http://www.tuicool.com/articles/JvUvQjZ

Install Tensorflow(GPU) under Ubuntu 16.04: http://blog.sina.com.cn/s/blog_672f698e0102wavp.html


Nvidia GTX 1080 on Ubuntu 16.04 for Deep Learning http://yangcha.github.io/GTX-1080/

Note the --override option when installing cuda8:
http://cn.soulmachine.me/2016-08-17-deep-learning-cuda-development-environment/

Official installation documentation: https://www.tensorflow.org/versions/master/get_started/os_setup.html


This tutorial also covers theano and caffe configuration: http://blog.csdn.net/hjimce/article/details/51999566

GPU builds of tensorflow compiled by others: https://github.com/tensorflow/tensorflow/issues/4030
