Installing tensorflow-gpu on CentOS means falling into quite a few pitfalls; if you have the choice, follow TensorFlow's official recommendation and use Ubuntu instead.
The process consists of the following main parts:
- Install Python 3
- Graphics Driver
- CUDA / cuDNN
- tensorflow-gpu
- glibc and gcc
Installing Python 3.7 on CentOS 7.2
The default Python on CentOS 7.2 is 2.7.5, so Python 3.7 is compiled and installed from source here.
$ ll /usr/bin/ | grep python
-rwxr-xr-x 1 root root 11312 Nov 14 2018 abrt-action-analyze-python
-rwxr-xr-x 1 root root 7280 Nov 3 2018 pmpython
lrwxrwxrwx 1 root root 7 May 24 12:39 python -> python2
lrwxrwxrwx 1 root root 9 May 24 12:39 python2 -> python2.7
-rwxr-xr-x 1 root root 7216 Oct 31 2018 python2.7
$ python
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Install the packages needed to compile Python 3:
$ sudo yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make libffi-devel
Compile and install Python:
$ tar -zxvf Python-3.7.3.tgz
$ cd Python-3.7.3
$ ./configure --prefix=/data1/python3
$ make && make install
Add symbolic links:
$ sudo mv /usr/bin/python /usr/bin/python.bak
$ sudo ln -s /data1/python3/bin/python3.7 /usr/bin/python3.7
$ sudo ln -s /usr/bin/python3.7 /usr/bin/python3
$ sudo ln -s /usr/bin/python3 /usr/bin/python
# Verify the installation
$ python -V
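Because /usr/bin/python now reaches the real interpreter through several links, it can help to print each hop. Below is a minimal Python sketch using only the standard library. `link_chain` is a hypothetical helper, not part of any tool mentioned here:

```python
import os


def link_chain(path):
    """Follow symbolic links one hop at a time and return every path visited."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative target is resolved against the directory holding the link;
        # os.path.join returns `target` unchanged when it is already absolute.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:
            raise RuntimeError("symbolic link loop at " + path)
        seen.add(path)
        chain.append(path)
    return chain
```

For example, `print(" -> ".join(link_chain("/usr/bin/python")))`. With the links created above, the chain should pass through /usr/bin/python3 and /usr/bin/python3.7 and end at /data1/python3/bin/python3.7.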
Update the yum configuration.
yum must run under Python 2. Now that /usr/bin/python points to Python 3, yum stops working unless its scripts call Python 2.7 explicitly.
$ sudo vi /usr/bin/yum
Change the first line #!/usr/bin/python to #!/usr/bin/python2.7
$ sudo vi /usr/libexec/urlgrabber-ext-down
Change the first line #!/usr/bin/python to #!/usr/bin/python2.7
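The two shebang edits above can also be scripted. Below is a minimal Python sketch, assuming the first line of each file is exactly #!/usr/bin/python. `fix_shebang` is a hypothetical helper. It rewrites the file in place, so permissions are kept. It leaves the file alone if the first line is anything else, which also makes it safe to run twice:

```python
def fix_shebang(path, old="#!/usr/bin/python", new="#!/usr/bin/python2.7"):
    """Replace the first line of `path` if it is exactly `old`.

    Returns True when the file was changed, False when it was left untouched.
    """
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    if not lines or lines[0].rstrip("\r\n") != old:
        return False
    lines[0] = new + "\n"
    # Writing into the existing file (rather than replacing it) keeps its mode bits.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return True
```

Run it as root on both files, e.g. `for script in ("/usr/bin/yum", "/usr/libexec/urlgrabber-ext-down"): fix_shebang(script)`.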
NVIDIA graphics driver
- The NVIDIA graphics driver can be downloaded from the official website and installed. For the driver version each CUDA release requires, see https://docs.nvidia.com/cuda/archive/10.0/cuda-toolkit-release-notes/index.html
- The CUDA installation package already bundles the driver it depends on, so installing CUDA directly also takes care of the driver. Reference: https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html
- If you accidentally installed the driver with NVIDIA-Linux-x86_64-410.104.run, uninstall it by running NVIDIA-Linux-x86_64-410.104.run --uninstall.
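When checking an installed driver against the release notes, compare the versions numerically: as strings, "410.104" sorts before "410.48". Below is a minimal Python sketch. The minimum of 410.48 for CUDA 10.0.130 is taken from the runfile name used below (cuda_10.0.130_410.48_linux.run). The installed version would come from a tool such as `nvidia-smi --query-gpu=driver_version --format=csv,noheader`:

```python
def parse_version(text):
    """'410.104' -> (410, 104), so that components compare as integers."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(installed, minimum):
    """True if the installed driver version is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Plain string comparison gets this backwards: "410.104" < "410.48" is True.
print(driver_supports("410.104", "410.48"))  # True
```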
CUDA
For how to install CUDA, see https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html
The RPM method is described under Package Manager Installation and the runfile method under Runfile Installation. The runfile method is used here.
$ sh cuda_10.0.130_410.48_linux.run
- Set the environment variables in .bashrc:
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
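In `${PATH:+:${PATH}}`, the `:+` expansion produces `:` followed by the old value only when PATH is set and non-empty. An unset variable therefore does not leave a trailing empty entry, which the shell treats as the current directory. The same logic written out in Python, purely as an illustration (`prepend_path` is a hypothetical name):

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' and the old value only if it is non-empty."""
    return new_dir + (":" + current if current else "")


env = dict(os.environ)
env["PATH"] = prepend_path("/usr/local/cuda-10.0/bin", env.get("PATH", ""))
env["LD_LIBRARY_PATH"] = prepend_path(
    "/usr/local/cuda-10.0/lib64", env.get("LD_LIBRARY_PATH", ""))
```

With an empty LD_LIBRARY_PATH the result is just /usr/local/cuda-10.0/lib64. A naive "NEW:" + old would instead end in a stray colon.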
- Verify the installation:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
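If this check needs to be scripted, for example in a provisioning step, the release can be extracted from the `nvcc -V` output. Below is a minimal Python sketch, assuming the output format shown above. `cuda_release` and `installed_cuda_release` are hypothetical helpers:

```python
import re
import subprocess


def cuda_release(nvcc_output):
    """Return (release, full_version), e.g. ('10.0', '10.0.130'), from `nvcc -V` text."""
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y, VX.Y.Z' line in nvcc output")
    return match.group(1), match.group(2)


def installed_cuda_release():
    """Run nvcc (which must be on PATH) and parse its version banner."""
    result = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return cuda_release(result.stdout)
```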
cuDNN
Download the cuDNN release that matches the CUDA version [cuDNN v7.6.0 (May 20, 2019), for CUDA 10.0] and copy the cuDNN files into the CUDA directory. Download page:
https://developer.nvidia.com/rdp/cudnn-download
$ tar -zxvf cudnn-10.0-linux-x64-v7.6.0.64.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
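Below is the same copy step sketched in Python, plus a check of which cuDNN version ended up under /usr/local/cuda. It assumes the archive was extracted to ./cuda as above. It also assumes that, as in cuDNN 7.x, the version macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h. A plain `cp` follows the libcudnn.so symbolic links and copies the library several times; this sketch recreates them as links instead. `install_cudnn` and `cudnn_version` are hypothetical names:

```python
import glob
import os
import re
import shutil


def install_cudnn(extracted="cuda", cuda_home="/usr/local/cuda"):
    """Copy cudnn.h and libcudnn* from the extracted archive into the CUDA tree."""
    shutil.copy2(os.path.join(extracted, "include", "cudnn.h"),
                 os.path.join(cuda_home, "include"))
    for lib in glob.glob(os.path.join(extracted, "lib64", "libcudnn*")):
        target = os.path.join(cuda_home, "lib64", os.path.basename(lib))
        if os.path.lexists(target):
            os.remove(target)  # replace an old file or link of the same name
        if os.path.islink(lib):
            os.symlink(os.readlink(lib), target)  # keep links as links, like cp -P
        else:
            shutil.copy2(lib, target)


def cudnn_version(header="/usr/local/cuda/include/cudnn.h"):
    """Read (major, minor, patch) from the #define lines in cudnn.h."""
    with open(header) as f:
        text = f.read()
    version = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + macro + r"\s+(\d+)", text)
        if match is None:
            raise ValueError(macro + " not found in " + header)
        version.append(int(match.group(1)))
    return tuple(version)
```

Run `install_cudnn()` as root from the directory holding ./cuda. `cudnn_version()` should then report (7, 6, 0) for the release named above.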
tensorflow-gpu
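TensorFlow depends on many other packages, so install the low-level dependencies first and TensorFlow last. The direct and indirect dependencies are listed in TensorFlow's setup.py; for 1.13.1 they are as follows: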
import sys

_VERSION = '1.13.1'

REQUIRED_PACKAGES = [
    'absl-py >= 0.7.0',
    'astor >= 0.6.0',
    'gast >= 0.2.0',
    'google_pasta >= 0.1.6',
    'keras_applications >= 1.0.6',
    'keras_preprocessing >= 1.0.5',
    'numpy >= 1.14.5, < 2.0',
    'six >= 1.10.0',
    'protobuf >= 3.6.1',
    'tensorboard >= 1.13.0, < 1.14.0',
    'tensorflow_estimator >= 1.13.0rc0, < 1.14.0rc0',
    'termcolor >= 1.1.0',
    'wrapt >= 1.11.1',
]

if sys.byteorder == 'little':
    # grpcio does not build correctly on big-endian machines due to lack of
    # BoringSSL support.
    # See https://github.com/tensorflow/tensorflow/issues/17882.
    REQUIRED_PACKAGES.append('grpcio >= 1.8.6')

# python3 requires wheel 0.26
if sys.version_info.major == 3:
    REQUIRED_PACKAGES.append('wheel >= 0.26')
else:
    REQUIRED_PACKAGES.append('wheel')
    # mock comes with unittest.mock for python3, need to install for python2
    REQUIRED_PACKAGES.append('mock >= 2.0.0')
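Because every dependency is installed by hand here, it helps to compare what is installed against REQUIRED_PACKAGES before installing the TensorFlow wheel. Below is a minimal, simplified Python sketch. The `installed` mapping is a hypothetical input; in practice it could be built from `pip freeze` output. Only the operators that appear in the list above are handled, and pre-release suffixes such as rc0 are ignored. It is not a substitute for pip's own resolver:

```python
import operator
import re

# The comparison operators that appear in REQUIRED_PACKAGES.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


def normalize(name):
    """PEP 503 style names: 'google_pasta' and 'Google-Pasta' become 'google-pasta'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def numeric(version, width=4):
    """'1.13.0rc0' -> (1, 13, 0, 0); anything after the first non-digit is dropped."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
        if digits.group() != piece:  # e.g. '0rc0': keep the 0, ignore the rest
            break
    return tuple(parts + [0] * (width - len(parts)))


def parse_requirement(req):
    """'numpy >= 1.14.5, < 2.0' -> ('numpy', [('>=', '1.14.5'), ('<', '2.0')])."""
    name, spec = re.match(r"\s*([A-Za-z0-9_.\-]+)\s*(.*)$", req).groups()
    clauses = []
    for clause in (c.strip() for c in spec.split(",")):
        if not clause:
            continue
        match = re.match(r"(>=|<=|==|>|<)\s*(\S+)$", clause)
        if match is None:
            raise ValueError("unsupported clause: " + clause)
        clauses.append(match.groups())
    return name, clauses


def unmet(requirements, installed):
    """Requirements missing from, or not satisfied by, installed ({name: version})."""
    have = {normalize(name): version for name, version in installed.items()}
    problems = []
    for req in requirements:
        name, clauses = parse_requirement(req)
        version = have.get(normalize(name))
        if version is None:
            problems.append(req + " (missing)")
        elif not all(_OPS[op](numeric(version), numeric(wanted))
                     for op, wanted in clauses):
            problems.append(req + " (installed " + version + ")")
    return problems
```

Packages reported as missing this way, such as h5py, Markdown or Werkzeug, can then be fetched from PyPI one by one.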
During the actual installation, h5py, Markdown and Werkzeug were also reported as missing; download suitable versions of them from https://pypi.org/.
- Files ending in .whl can be installed with pip:
$ pip install xxx.whl
- For files ending in .tar.gz, extract the archive, enter the extracted directory and install with python setup.py install, for example:
$ tar zxvf gast-0.2.0.tar.gz
$ cd gast-0.2.0
$ python setup.py install
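Reading the top-level directory name out of the archive, rather than typing it, avoids typos in the directory name (gast-0.2.0 is easy to mistype as gast-0.20). Below is a minimal Python sketch of the same extract-then-`setup.py install` routine. `unpack_sdist` and `install_sdist` are hypothetical helpers. The installation runs with whichever interpreter executes the script:

```python
import os
import subprocess
import sys
import tarfile


def unpack_sdist(archive, dest="."):
    """Extract a .tar.gz source package and return the directory containing setup.py."""
    with tarfile.open(archive, "r:gz") as tar:
        top_dirs = {member.name.split("/")[0] for member in tar.getmembers()}
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    for name in sorted(top_dirs):
        candidate = os.path.join(dest, name)
        if os.path.isfile(os.path.join(candidate, "setup.py")):
            return candidate
    raise FileNotFoundError("no top-level directory with setup.py in " + archive)


def install_sdist(archive, dest="."):
    """Equivalent of: tar zxvf ARCHIVE && cd DIR && python setup.py install."""
    source_dir = unpack_sdist(archive, dest)
    subprocess.check_call([sys.executable, "setup.py", "install"], cwd=source_dir)
```

For example, `install_sdist("gast-0.2.0.tar.gz")` run with the Python 3.7 interpreter installed earlier.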
After all direct and indirect dependencies are installed, download the GPU build of TensorFlow and install it:
$ pip install tensorflow_gpu-1.13.1-cp37-cp37m-manylinux1_x86_64.whl
Verify the installation:
>>> import tensorflow as tf
ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
I was crushed. As the saying goes, the revolution has not yet succeeded, and comrades must keep working.
Upgrading glibc
First, check the current state:
$ ll /lib64/libc.so.6
$ strings /lib64/libc.so.6 | grep GLIBC
libc.so.6 is a symbolic link. The current glibc is version 2.19 and the error says GLIBC_2.23 cannot be found, so glibc has to be upgraded to at least 2.23.
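The `strings ... | grep GLIBC` check can also be done from Python, which is handy when testing for one specific required tag. Below is a minimal sketch. Like `strings`, it only reports which version tags appear in the file. `version_tags` and `has_version` are hypothetical helpers. Passing prefix=b"GLIBCXX_" applies the same check to libstdc++:

```python
import re


def version_tags(path, prefix=b"GLIBC_"):
    """Roughly `strings PATH | grep PREFIX`: sorted version tuples found in a binary."""
    with open(path, "rb") as f:
        data = f.read()
    found = re.findall(re.escape(prefix) + rb"(\d+(?:\.\d+)+)", data)
    return sorted({tuple(int(p) for p in v.decode().split(".")) for v in found})


def has_version(path, needed, prefix=b"GLIBC_"):
    """True if, e.g., needed='2.23' appears as PREFIX2.23 in the file."""
    want = tuple(int(p) for p in needed.split("."))
    return want in version_tags(path, prefix)
```

For example, `has_version("/lib64/libc.so.6", "2.23")` or `has_version("/lib64/libstdc++.so.6", "3.4.21", prefix=b"GLIBCXX_")`.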
$ tar -xvf glibc-2.23.tar.gz
$ mkdir glibc-build-2.23
$ cd glibc-build-2.23
$ ../glibc-2.23/configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
$ make
$ sudo make install
$ ll -lt /lib64/libc*
After make succeeds, a new libc.so.6 has been built in the glibc-build-2.23 directory (glibc-build-2.23/libc.so.6). It turns out that this is also a symbolic link; the real library file is libc.so.
$ ll libc.so.6
$ strings libc.so | grep GLIBC
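Next, update the system library. Be careful here: updating the system link (/lib64/libc.so.6 in my case) easily goes wrong. The usual approach is to delete the old link and create a new one. However, as soon as the old link is deleted, many commands stop working, because the system can no longer find the glibc library. A glibc library then has to be specified temporarily, as follows. libc.so is renamed so that it can be told apart from other versions installed later: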
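# Run as root (e.g. from a `sudo -i` shell): once libc.so.6 is gone, sudo itself can no longer start
$ cp glibc-build-2.23/libc.so /lib64/libc-2.23.so
$ rm -f /lib64/libc.so.6
$ export LD_PRELOAD=/lib64/libc-2.23.so
$ ln -s /lib64/libc-2.23.so /lib64/libc.so.6
$ unset LD_PRELOAD
$ strings /lib64/libc.so.6 | grep GLIBC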
The link has been updated successfully.
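Removing the old link before creating the new one leaves a window in which libc.so.6 does not exist, and that window is what breaks every command. One common way to avoid it is to create the new link under a temporary name and rename it over the old one, because on POSIX systems rename(2) replaces the destination atomically. Below is a minimal Python sketch of that technique. It is meant to run as root from an already-started Python process. `swap_symlink` is a hypothetical helper:

```python
import os


def swap_symlink(link, target):
    """Point `link` at `target` without a moment where `link` is missing.

    The new link is created under a temporary name in the same directory and then
    renamed over the old one; os.replace maps to rename(), which is atomic on POSIX.
    """
    tmp = link + ".tmp-swap"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)
```

For example, `swap_symlink("/lib64/libc.so.6", "/lib64/libc-2.23.so")`. The same approach applies to /lib64/libstdc++.so.6.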
Verifying the TensorFlow installation again shows that gcc must be upgraded as well:
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found
Upgrading gcc
The default GCC version on CentOS 7.2 is 4.8.5:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
I first found that gcc-4.9.x provides GLIBCXX_3.4.20, so my first attempt compiled and installed gcc-4.9.4, only to discover, to my dismay:
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found
So I decided to go straight to a newer 7.x release. First, some preparation:
- Download gcc from the official site, https://ftp.gnu.org/gnu/gcc/, and extract it:
$ tar xjvf gcc-7.4.0.tar.bz2 && cd gcc-7.4.0
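- Building gcc requires additional components. Look up the required versions in ./contrib/download_prerequisites and download these components into the gcc-7.4.0 directory yourself. Then comment out every wget line in download_prerequisites: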
gmp='gmp-6.1.0.tar.bz2'
mpfr='mpfr-3.1.4.tar.bz2'
mpc='mpc-1.0.3.tar.gz'
isl='isl-0.16.1.tar.bz2'
Then run:
$ ./contrib/download_prerequisites
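- Now configure can be run. Create a separate directory for the build files. The default installation prefix is /usr/local; use --prefix to choose a custom path. For the build options, the defaults of gcc-4.8.5 shown above can serve as a reference. A bare make -j (no job limit) can shorten the build time but may make the build fail, so it is not recommended; a fixed job count is used below.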
$ mkdir gcc-7.4.0-build && cd gcc-7.4.0-build
$ ../configure --prefix=/usr/local/gcc-7.4.0 --enable-bootstrap --enable-build-with-cxx --enable-cloog-backend=isl --disable-libjava-multilib --enable-checking=release --enable-gold --enable-ld --enable-libada --enable-libssp --enable-lto --enable-objc-gc --enable-vtable-verify --enable-languages=c,c++,objc,obj-c++,fortran --disable-multilib
$ make -j8
$ sudo make install
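Because download_prerequisites is run here with its wget lines commented out, it only works if all four archives are already in the gcc-7.4.0 directory. Below is a minimal Python sketch that lists the missing ones. It assumes the script declares them in the name='archive' form shown above. `prerequisite_archives` and `missing_archives` are hypothetical helpers:

```python
import os
import re


def prerequisite_archives(script_path):
    """Read lines such as gmp='gmp-6.1.0.tar.bz2' from contrib/download_prerequisites."""
    archives = {}
    with open(script_path) as f:
        for line in f:
            match = re.match(r"\s*(gmp|mpfr|mpc|isl)='([^']+)'", line)
            if match:
                archives[match.group(1)] = match.group(2)
    return archives


def missing_archives(src_dir):
    """Archive names that still have to be downloaded into the gcc source directory."""
    script = os.path.join(src_dir, "contrib", "download_prerequisites")
    return sorted(name for name in prerequisite_archives(script).values()
                  if not os.path.isfile(os.path.join(src_dir, name)))
```

For example, `missing_archives("gcc-7.4.0")` should return an empty list before ./contrib/download_prerequisites is run.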
Configure the library and header search paths in .bashrc:
export PATH=/usr/local/gcc-7.4.0/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64/:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/usr/local/include/:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/include/:$CPLUS_INCLUDE_PATH
Verify the gcc version:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.4.0/libexec/gcc/x86_64-redhat-linux/7.4.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr/local/gcc-7.4.0 --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 7.4.0 (GCC)
The next problem is that 'GLIBCXX_3.4.21' is still not found. Upgrading gcc does not replace the system's older libstdc++ dynamic library with the newly built one. Replacing the old system library with the one from the latest gcc solves the problem.
Run the following command to check the dynamic library:
$ strings /lib64/libstdc++.so.6 | grep GLIBC
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBC_2.3
GLIBC_2.2.5
GLIBC_2.14
GLIBC_2.4
GLIBC_2.3.2
The output contains no 'GLIBCXX_3.4.21': the library loaded at runtime is still the old one, so the current link must be pointed at the newest library:
$ sudo cp /usr/local/gcc-7.4.0/lib64/libstdc++.so.6.0.24 /lib64
$ cd /lib64
$ sudo rm -f libstdc++.so.6
$ sudo ln -s libstdc++.so.6.0.24 libstdc++.so.6
Verify again; this time the required version is found:
$ strings /lib64/libstdc++.so.6 | grep GLIBC
Finally, verify that TensorFlow performs its computations on the GPU:
>>> import tensorflow as tf
>>> print(tf.__version__)
1.13.1
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
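The Session above logs device placement only once an operation is actually run. Below is a minimal TensorFlow 1.x sketch that pins a small matrix product to /gpu:0 and runs it. `check_gpu` is a hypothetical helper. Without allow_soft_placement, sess.run raises an error when no GPU device is available. A successful run, with MatMul logged on a GPU device, therefore indicates that the GPU build is working:

```python
def check_gpu():
    """Run a small matmul pinned to the first GPU (TensorFlow 1.x graph/Session API)."""
    import tensorflow as tf  # imported here so this module loads even without TensorFlow

    print("GPU available:", tf.test.is_gpu_available())
    with tf.device("/gpu:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
        b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name="b")
        product = tf.matmul(a, b)
    # log_device_placement prints the device assigned to each op when the graph runs.
    config = tf.ConfigProto(log_device_placement=True)
    with tf.Session(config=config) as sess:
        return sess.run(product)
```

Call `check_gpu()` from the same Python 3.7 interpreter. Since b is the identity matrix, the returned array should equal a.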