CentOS 7.2 + Python 3.7 + CUDA 10.0 + cuDNN 7.6 + TensorFlow 1.13.1: Pitfalls Hit and Fixed

These notes record the pitfalls I hit while installing tensorflow-gpu on CentOS. If you have the choice, follow the official TensorFlow recommendation and install directly on Ubuntu.

The notes cover the following main parts:

  • Install Python 3
  • Graphics Driver
  • CUDA / cuDNN
  • tensorflow-gpu
  • glibc and gcc

Installing Python 3.7 on CentOS 7.2

The default Python on CentOS 7.2 is 2.7.5, so Python 3.7 is compiled and installed from source here.

$ ll /usr/bin/ | grep python
-rwxr-xr-x    1 root root      11312 Nov 14  2018 abrt-action-analyze-python
-rwxr-xr-x    1 root root       7280 Nov  3  2018 pmpython
lrwxrwxrwx    1 root root          7 May 24 12:39 python -> python2
lrwxrwxrwx    1 root root          9 May 24 12:39 python2 -> python2.7
-rwxr-xr-x    1 root root       7216 Oct 31  2018 python2.7

$ python
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Install the packages needed to compile Python 3:

$ sudo yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make libffi-devel

Compile and install Python:

$ tar -zxvf Python-3.7.3.tgz
$ cd Python-3.7.3
$ ./configure --prefix=/data1/python3
$ make && make install

Add symbolic links:

$ sudo mv /usr/bin/python /usr/bin/python.bak
$ sudo ln -s /data1/python3/bin/python3.7 /usr/bin/python3.7
$ sudo ln -s /usr/bin/python3.7 /usr/bin/python3
$ sudo ln -s /usr/bin/python3 /usr/bin/python
# Verify the installation
$ python -V
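
The links above form a chain (python -> python3 -> python3.7 -> /data1/python3/bin/python3.7), and a mistake in any hop is easy to miss. The following is a minimal sketch for printing every hop of such a chain. The helper name link_chain and the max_hops limit are illustrative and not part of the original steps.

```python
import os


def link_chain(path, max_hops=16):
    """Follow a symlink hop by hop and return every path visited, in order."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # A relative target is resolved against the directory that holds the link.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain
```

Calling link_chain("/usr/bin/python") on the machine set up above should list each link in turn and end at the real interpreter binary.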

Change the yum configuration. yum needs Python 2 to run, so it stops working once /usr/bin/python points to Python 3.

$ sudo vi /usr/bin/yum
Change #!/usr/bin/python to #!/usr/bin/python2.7

$ sudo vi /usr/libexec/urlgrabber-ext-down
Change #!/usr/bin/python to #!/usr/bin/python2.7
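
The same first-line change can be scripted instead of made by hand in vi. This is a rough sketch that works on the text of a script. The function name pin_shebang is hypothetical, and it only rewrites a first line that matches exactly.

```python
def pin_shebang(script_text, old="#!/usr/bin/python", new="#!/usr/bin/python2.7"):
    """Return script_text with its first line changed from old to new.

    Only an exact match on the first line is rewritten, so a script that is
    already pinned (or that uses another interpreter) is returned unchanged.
    """
    first, sep, rest = script_text.partition("\n")
    if first.rstrip() == old:
        return new + sep + rest
    return script_text
```

Writing the result back to /usr/bin/yum or /usr/libexec/urlgrabber-ext-down requires root, and keeping a backup copy of each file first is a sensible precaution.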

Nvidia graphics driver

[Figure: CUDA Toolkit and Compatible Driver Versions]
[Figure: Meta Packages Available for CUDA 10.0]
  • If you accidentally installed the driver with NVIDIA-Linux-x86_64-410.104.run, you can uninstall it by running NVIDIA-Linux-x86_64-410.104.run --uninstall.

CUDA

$ sh cuda_10.0.130_410.48_linux.run
  • Set the environment variables in .bashrc:
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  • Verify the installation:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
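
The release number in the last line of that output is what has to match the cuDNN and TensorFlow builds chosen later. Here is a small sketch for extracting it, assuming the nvcc -V output has already been captured as text. The helper name parse_nvcc_release is illustrative.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release as (major, minor), or None if no release line is found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

The text can come from subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE, universal_newlines=True).stdout once nvcc is on the PATH.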

cuDNN

Download the cuDNN release that matches your CUDA version [cuDNN v7.6.0 (May 20, 2019), for CUDA 10.0] and copy the cuDNN files into the CUDA directory. The download page is:
https://developer.nvidia.com/rdp/cudnn-download

$ tar -zxvf cudnn-10.0-linux-x64-v7.6.0.64.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
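
After copying, it can help to confirm which cuDNN version the installed header actually declares. This sketch assumes that cudnn.h defines the macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 7.x headers do. The function name parse_cudnn_version is hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of a cudnn.h header."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % macro
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            # Assumption violated: this header does not define the macro.
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Reading /usr/local/cuda/include/cudnn.h and passing its contents to parse_cudnn_version gives a tuple to compare against the release that was downloaded.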

tensorflow-gpu

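TensorFlow depends on many other packages. Install the underlying dependencies first and TensorFlow last. The direct and indirect dependencies can be found in TensorFlow's setup.py; for 1.13.1 the dependencies listed there are as follows: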

_VERSION = '1.13.1'

REQUIRED_PACKAGES = [
    'absl-py >= 0.7.0',
    'astor >= 0.6.0',
    'gast >= 0.2.0',
    'google_pasta >= 0.1.6',
    'keras_applications >= 1.0.6',
    'keras_preprocessing >= 1.0.5',
    'numpy >= 1.14.5, < 2.0',
    'six >= 1.10.0',
    'protobuf >= 3.6.1',
    'tensorboard >= 1.13.0, < 1.14.0',
    'tensorflow_estimator >= 1.13.0rc0, < 1.14.0rc0',
    'termcolor >= 1.1.0',
    'wrapt >= 1.11.1',
]

if sys.byteorder == 'little':
  # grpcio does not build correctly on big-endian machines due to lack of
  # BoringSSL support.
  # See https://github.com/tensorflow/tensorflow/issues/17882.
  REQUIRED_PACKAGES.append('grpcio >= 1.8.6')

# python3 requires wheel 0.26
if sys.version_info.major == 3:
  REQUIRED_PACKAGES.append('wheel >= 0.26')
else:
  REQUIRED_PACKAGES.append('wheel')
  # mock comes with unittest.mock for python3, need to install for python2
  REQUIRED_PACKAGES.append('mock >= 2.0.0')
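
When packages are downloaded by hand, each version has to be checked against these requirement strings. The following is a simplified sketch of that check. It only compares the numeric parts of a version and ignores pre-release suffixes such as rc0, so it is an approximation of the rules pip applies, not a replacement for them. All function names are illustrative.

```python
import operator
import re

OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def version_key(version):
    """Turn '1.14.5' or '1.13.0rc0' into a tuple of ints (suffix ignored, trailing zeros dropped)."""
    release = re.split(r"[A-Za-z]", version, maxsplit=1)[0]
    numbers = [int(n) for n in re.findall(r"\d+", release)]
    while numbers and numbers[-1] == 0:
        numbers.pop()
    return tuple(numbers)


def parse_requirement(requirement):
    """Split 'numpy >= 1.14.5, < 2.0' into ('numpy', [('>=', '1.14.5'), ('<', '2.0')])."""
    match = re.match(r"\s*([A-Za-z0-9_.\-]+)\s*(.*)", requirement)
    name, rest = match.group(1), match.group(2)
    clauses = re.findall(r"(>=|<=|==|!=|>|<)\s*([0-9A-Za-z.]+)", rest)
    return name, clauses


def satisfies(installed_version, requirement):
    """Return True if installed_version meets every clause of the requirement string."""
    _, clauses = parse_requirement(requirement)
    installed = version_key(installed_version)
    return all(OPS[op](installed, version_key(wanted)) for op, wanted in clauses)
```

For example, satisfies("1.12.2", "tensorboard >= 1.13.0, < 1.14.0") flags a tensorboard build that is too old before pip gets a chance to complain.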

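During the actual installation, pip also reported that h5py, Markdown and Werkzeug were missing. Download suitable versions from https://pypi.org/.

  • Files ending in .whl can be installed with pip:
$ pip install xxx.whl
  • For files ending in .tar.gz, extract the archive, enter the extracted directory, and install with python setup.py install, for example:
$ tar zxvf gast-0.2.0.tar.gz
$ cd gast-0.2.0
$ python setup.py install

After all direct and indirect dependencies are installed, download the GPU build of TensorFlow and install it: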

$ pip install tensorflow_gpu-1.13.1-cp37-cp37m-manylinux1_x86_64.whl
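
The wheel filename itself says which interpreter it was built for: cp37 means CPython 3.7, which is why Python 3.7 was installed earlier. Here is a minimal sketch that splits a wheel filename into its fields and checks the CPython tag. The function names are hypothetical, and pure-Python tags such as py3 are deliberately not handled.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %s" % filename)
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_interpreter(filename, version_info):
    """Return True if the wheel carries the CPython tag for version_info, e.g. (3, 7)."""
    tag = "cp%d%d" % (version_info[0], version_info[1])
    return tag in parse_wheel_filename(filename)["python"].split(".")
```

Passing sys.version_info as the second argument checks a downloaded wheel against the interpreter that will run pip.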

Verify the installation:

>>> import tensorflow as tf
ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found 

My heart sank. The revolution has not yet succeeded; comrades must keep working.
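
The ImportError names the exact symbol version that the system library lacks, which tells you how far the library has to be upgraded. A small sketch for pulling that out of the message follows. The helper name missing_symbol_version is illustrative.

```python
import re


def missing_symbol_version(error_message):
    """Return (family, version tuple) from a loader error, e.g. ('GLIBC', (2, 23))."""
    match = re.search(r"version `([A-Z]+)_([0-9]+(?:\.[0-9]+)*)' not found", error_message)
    if match is None:
        return None
    family = match.group(1)
    version = tuple(int(part) for part in match.group(2).split("."))
    return family, version
```

The same helper applies to libstdc++ errors, where the family is GLIBCXX instead of GLIBC.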

Upgrading glibc

First check the current state:

$ ll /lib64/libc.so.6

$ strings /lib64/libc.so.6 | grep GLIBC

libc.so.6 is a symbolic link. The current glibc is version 2.19 and the error says GLIBC_2.23 was not found, so glibc must be upgraded to at least 2.23.

$ tar -xvf glibc-2.23.tar.gz
$ mkdir glibc-build-2.23
$ cd glibc-build-2.23
$ ../glibc-2.23/configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
$ make
$ sudo make install

$ ll -lt /lib64/libc*

After make succeeds, the glibc-build-2.23 directory contains a newly built libc.so.6 (glibc-build-2.23/libc.so.6). This is also a symbolic link; the real library file is libc.so.

$ ll libc.so.6
$ strings libc.so | grep GLIBC

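Next, update the system library. Be careful here: updating the system link (/lib64/libc.so.6 in my case) is error-prone. The usual approach is to delete the old link and create a new one, but once the old link is gone many commands stop working because the system can no longer find the glibc library. At that point you need to specify a glibc library temporarily, as follows (libc.so is renamed so that it can be told apart from other versions installed later):

$ sudo cp glibc-build-2.23/libc.so /lib64/libc-2.23.so

# Run the next two commands in a root shell. LD_PRELOAD must prefix the ln command itself so that ln can still load a libc.
$ rm -f /lib64/libc.so.6
$ LD_PRELOAD=/lib64/libc-2.23.so ln -s /lib64/libc-2.23.so /lib64/libc.so.6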

$ strings /lib64/libc.so.6 | grep GLIBC

The link is updated successfully.
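
The steps above delete the old link before creating the new one, which is the moment other commands can break. A different technique, not the one used in these notes, is to create the new link under a temporary name and rename it over the old one, so the link never goes missing. This is a rough sketch under that assumption. The function name replace_symlink and the .new suffix are hypothetical, and it must run as root to touch /lib64.

```python
import os


def replace_symlink(target, link_path):
    """Point link_path at target without ever leaving link_path missing."""
    temp_path = link_path + ".new"
    # Clear a leftover temporary link from an earlier, interrupted attempt.
    if os.path.lexists(temp_path):
        os.remove(temp_path)
    os.symlink(target, temp_path)
    # On POSIX systems the rename replaces the old link in a single step.
    os.replace(temp_path, link_path)
```

For the case in these notes the call would be replace_symlink("/lib64/libc-2.23.so", "/lib64/libc.so.6"). Try it on a scratch directory first, because a wrong target for libc.so.6 still leaves the system unusable.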

Verifying the TensorFlow installation again shows that gcc also needs to be upgraded:

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found

Upgrading gcc

The default GCC version on CentOS 7.2 is 4.8.5:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)

At first I found that gcc-4.9.x provides GLIBCXX_3.4.20, so I compiled and installed gcc-4.9.4 first, only to be met with:

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found

So I decided to go straight to a 7.x release. First, some preparation:

$ tar xjvf gcc-7.4.0.tar.bz2 && cd gcc-7.4.0
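  • Building gcc requires other components. Look up the required versions in ./contrib/download_prerequisites, download those components yourself into the gcc-7.4.0 directory, and comment out every wget in download_prerequisites.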
gmp='gmp-6.1.0.tar.bz2'
mpfr='mpfr-3.1.4.tar.bz2'
mpc='mpc-1.0.3.tar.gz'
isl='isl-0.16.1.tar.bz2'

Then run:

$ ./contrib/download_prerequisites
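  • Now you can run configure. Create a new directory to hold the build files. The default install directory is /usr/local; use --prefix to choose a custom path. For the configure options, refer to the options of the default gcc-4.8.5. I recommend against building with make -j: it can shorten the build time, but the build may fail.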
$ mkdir gcc-7.4.0-build && cd gcc-7.4.0-build
$ ../configure --prefix=/usr/local/gcc-7.4.0 --enable-bootstrap --enable-build-with-cxx --enable-cloog-backend=isl --disable-libjava-multilib --enable-checking=release --enable-gold --enable-ld --enable-libada --enable-libssp --enable-lto --enable-objc-gc --enable-vtable-verify --enable-checking=release --enable-languages=c,c++,objc,obj-c++,fortran --disable-multilib
$ make -j8
$ make install

Configure the library and header search paths in .bashrc:

export PATH=/usr/local/gcc-7.4.0/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64/:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/usr/local/include/:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/include/:$CPLUS_INCLUDE_PATH
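
These exports append :$VARIABLE unconditionally, so a variable that was empty beforehand ends up with a trailing colon, which is an empty entry that these tools treat as the current directory. The CUDA exports earlier avoid this with the ${PATH:+:${PATH}} form. The same idea as a minimal sketch, with a hypothetical helper name prepend_path:

```python
def prepend_path(directory, current):
    """Return directory placed in front of current, a colon-separated list that may be empty."""
    entries = [entry for entry in (current or "").split(":") if entry]
    # Drop an existing copy so the directory is not listed twice.
    wanted = directory.rstrip("/")
    entries = [entry for entry in entries if entry.rstrip("/") != wanted]
    return ":".join([directory] + entries)
```

For instance, os.environ["LD_LIBRARY_PATH"] = prepend_path("/usr/local/lib64", os.environ.get("LD_LIBRARY_PATH")) behaves the same whether or not the variable was set before.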

Verify the gcc version:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.4.0/libexec/gcc/x86_64-redhat-linux/7.4.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr/local/gcc-7.4.0 --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 7.4.0 (GCC) 

The next problem is that 'GLIBCXX_3.4.21' is still not found. After gcc is upgraded, the newly built dynamic library does not replace the one that belongs to the old gcc. Replacing the system's old dynamic library with the one from the new gcc solves the problem.
Run the following command to check the dynamic library:

$ strings /lib64/libstdc++.so.6 | grep GLIBC

GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBC_2.3
GLIBC_2.2.5
GLIBC_2.14
GLIBC_2.4
GLIBC_2.3.2

The output does not contain 'GLIBCXX_3.4.21', so the dynamic library loaded at run time is still the old one. The link needs to point to the newest library file:

$ cp /usr/local/gcc-7.4.0/lib64/libstdc++.so.6.0.24 /lib64
$ cd /lib64
$ rm -rf libstdc++.so.6
$ ln -s libstdc++.so.6.0.24 libstdc++.so.6

Verify again; this time it succeeds.

$ strings /lib64/libstdc++.so.6 | grep GLIBC
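
Scanning that list by eye for one version is error-prone. As a sketch, the captured output of strings can be checked for a specific symbol version programmatically. The function names below are illustrative, and the input is assumed to be the text printed by the command above.

```python
import re


def symbol_versions(strings_output, family):
    """Collect version tuples such as (3, 4, 19) from lines like 'GLIBCXX_3.4.19'."""
    pattern = re.compile(r"%s_(\d+(?:\.\d+)*)" % re.escape(family))
    versions = set()
    for line in strings_output.splitlines():
        match = pattern.fullmatch(line.strip())
        if match:
            versions.add(tuple(int(part) for part in match.group(1).split(".")))
    return versions


def provides(strings_output, family, wanted):
    """Return True if the library output lists the wanted version of the family."""
    return wanted in symbol_versions(strings_output, family)
```

With the old library, provides(output, "GLIBCXX", (3, 4, 21)) is False; after the link is switched to libstdc++.so.6.0.24 it should become True.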

Finally, verify that TensorFlow runs its computation on the GPU:

>>> import tensorflow as tf
>>> print(tf.__version__)
1.13.1
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
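
Creating the session only prints device placement once an operation runs. A complementary check is to ask TensorFlow which local devices it sees. This is a minimal sketch assuming the TensorFlow 1.x device_lib module; the helper names gpu_device_names and local_gpu_names are hypothetical.

```python
def gpu_device_names(devices):
    """Return the names of the entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def local_gpu_names():
    """List the GPUs TensorFlow can see (TensorFlow is imported only when this is called)."""
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

An empty list from local_gpu_names() means TensorFlow fell back to the CPU, which usually points back to the driver, CUDA or cuDNN steps above.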

WeChat public account "padluo": sharing the self-cultivation of a data scientist. Now that we have met, let us grow together.


Origin blog.csdn.net/weixin_34146986/article/details/91033512