Setting up a TensorFlow compute environment on CentOS 7 (CUDA + cuDNN + JupyterLab (Anaconda3) + PyTorch + TensorFlow)

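Note: this article was written from memory after the installation, so omissions are inevitable. It is for reference only and will be tested later.

0. Hardware and software preparation

  • One server (500 GB disk, 100 GB RAM, one K40c GPU; RAM, disk, and GPU can be chosen as needed)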
  • CentOS-7-x86_64-DVD-1804.iso
  • Anaconda3-2019.03-Linux-x86_64.sh
  • cuda_9.0.176_384.81_linux.run
  • cudnn-9.0-linux-x64-v7.6.5.32.solitairetheme8
  • kernel-devel-3.10.0-862.3.2.el7.x86_64.rpm
  • kernel-headers-3.10.0-862.el7.x86_64.rpm
  • NVIDIA-Linux-x86_64-384.183.run
  • tensorflow_gpu-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl
  • torch-1.3.1-cp37-cp37m-manylinux1_x86_64.whl
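
The two wheel file names in the list above encode which Python they target: under the wheel naming convention (PEP 427) the last three dash-separated fields are the Python tag, ABI tag, and platform tag, so cp37 means CPython 3.7. A minimal sketch for checking a wheel against the running interpreter follows; the helper names wheel_tags and matches_running_python are made up for illustration, and compressed tag sets such as py2.py3 are not handled.

```python
import sys


def wheel_tags(filename):
    """Split a wheel file name into its PEP 427 fields (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_running_python(filename):
    """True if the wheel's Python tag equals cpXY of the current interpreter."""
    expected = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return wheel_tags(filename)["python"] == expected


tags = wheel_tags("tensorflow_gpu-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl")
print(tags["python"], tags["platform"])
```

Running matches_running_python on both wheels inside the Anaconda environment before calling pip gives an early warning if the interpreter is not Python 3.7.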

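1. Install CentOS 7

  1. When creating the virtual machine, add a PCI device, i.e. the K40c GPU
  2. When partitioning, make the root partition large, 350 GB or more
  3. After the installation completes, set a static IP address and gateway
    vi /etc/sysconfig/network-scripts/ifcfg-ens192
# Change the file contents to
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno192 # keep the original NAME
UUID=ae0965e7-22b9-45aa-8ec9-3f0a20a85d11 # keep the original UUID
ONBOOT=yes
IPADDR0=192.168.1.30  # set as needed
PREFIX0=24
GATEWAY0=192.168.1.1
DNS1=8.8.8.8
DNS2=8.8.4.4
  4. Some preparation before installing the rest
  • Connect to the virtual machine over the internal network with Xshell or a similar tool (optional)
  • Copy the required software from another machine to the target machine
scp -r software/ 192.168.1.30:/root/
  • Detect the GPU
lspci
  • Download the driver that matches the GPU (already prepared)
    https://www.nvidia.cn/Download/index.aspx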
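
A static configuration only works if the gateway lies inside the subnet defined by the address and prefix length. As a small sanity check before restarting the network, the values from the ifcfg file can be tested with Python's standard ipaddress module. The function name gateway_in_subnet is made up for this sketch.

```python
import ipaddress


def gateway_in_subnet(ipaddr, prefix, gateway):
    """Return True if the gateway belongs to the subnet of ipaddr/prefix."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network("{}/{}".format(ipaddr, prefix), strict=False)
    return ipaddress.ip_address(gateway) in network


# Values taken from the ifcfg example in step 3.
print(gateway_in_subnet("192.168.1.30", 24, "192.168.1.1"))  # True
print(gateway_in_subnet("192.168.1.30", 24, "192.168.2.1"))  # False
```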

2. Install Anaconda3

  1. Install the bzip2 dependency
yum install -y bzip2
  2. Install Anaconda3
bash Anaconda3-2019.03-Linux-x86_64.sh
  3. Edit and source the .bashrc file
vi /root/.bashrc
# Add: export PATH=/root/anaconda3/bin:$PATH
source ~/.bashrc
python   # verify the Python version
  4. Configure remote access to Jupyter Notebook
  • Generate the config file
jupyter notebook --generate-config
  • Open ipython and create a hashed password, using 123456 as an example
ipython
from notebook.auth import passwd
passwd()
Enter password: 123456
Verify password: 123456
'sha1:e00ee9ab9a42:22e8c0dc771612348eeee698cde8ec77fba42e7f'
exit()
  • Copy the generated hash 'sha1:xx...' and edit the default config file
    vi ~/.jupyter/jupyter_notebook_config.py
# Set ip to *, which allows access from any IP address
c.NotebookApp.ip = '*'
# The password is the hash generated above
c.NotebookApp.password = 'sha1:xx...'
# The server has no browser for Jupyter to open
c.NotebookApp.open_browser = False
# Listen on port 8888, or any other port you prefer
c.NotebookApp.port = 8888
# Allow remote access
c.NotebookApp.allow_remote_access = True
# Notebook directory
c.NotebookApp.notebook_dir = u'/root/notebooks/'
  5. Start JupyterLab:
nohup jupyter lab --allow-root &
  6. Log in to JupyterLab
http://<server IP address>:8888/lab
  7. Map the internal address to the external network through the router so that JupyterLab can be reached from outside
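
The password string stored in the config file has three colon-separated parts: algorithm, salt, and digest. To my understanding, the classic notebook server of that era computes the digest as the hash of the UTF-8 passphrase followed by the ASCII salt. The sketch below reproduces that scheme with hashlib only, so a stored value can be checked without starting Jupyter. The function names are made up here, and newer notebook releases may default to a different algorithm.

```python
import hashlib


def hash_password(passphrase, salt, algorithm="sha1"):
    """Build an 'algorithm:salt:digest' string in the classic notebook style."""
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password(hashed, passphrase):
    """Return True if passphrase reproduces the stored hash string."""
    try:
        algorithm, salt, _digest = hashed.split(":", 2)
    except ValueError:
        # Not in algorithm:salt:digest form.
        return False
    return hash_password(passphrase, salt, algorithm) == hashed


stored = hash_password("123456", "e00ee9ab9a42")  # salt reused from the example
print(check_password(stored, "123456"))
print(check_password(stored, "wrong-password"))
```

Since the salt is stored in clear text next to the digest, a weak password such as 123456 stays easy to guess; the example password should be replaced on any server reachable from outside.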

3. Install the GPU driver

  1. Check the kernel version:
[root@host8 ~]# uname -r
3.10.0-862.el7.x86_64
  2. Download the dependency packages that match the kernel version
  • kernel-devel-3.10.0-862.3.2.el7.x86_64.rpm
  • kernel-headers-3.10.0-862.el7.x86_64.rpm
    Note: el7 stands for CentOS 7; 3.10.0-862 must match the kernel version
rpm -ivh kernel-devel-3.10.0-862.3.2.el7.x86_64.rpm
rpm -ivh kernel-headers-3.10.0-862.el7.x86_64.rpm
  3. Prevent the nouveau module from loading
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
  4. Rebuild the initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
  5. Run the installer script
chmod u+x NVIDIA-Linux-x86_64-384.183.run
./NVIDIA-Linux-x86_64-384.183.run --kernel-source-path=/usr/src/kernels/3.10.0-862.el7.x86_64

  6. Check the installation result

nvidia-smi
Sun Nov 24 21:25:10 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.183      Driver Version: 384.183      CUDA Version: 9.0      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:0B:00.0 Off |                    0 |
| 31%   68C    P0   135W / 235W |  10378MiB / 11439MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      9632      C   python                                     10365MiB |
+-----------------------------------------------------------------------------+
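
The table above is meant for humans. For scripted checks, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The following sketch wraps that call and parses the result; the field selection and the function names are choices made for this example only.

```python
import subprocess

# Fields requested from nvidia-smi; the order defines the CSV column order.
QUERY_FIELDS = ["name", "driver_version", "memory.total", "memory.used"]


def parse_gpu_csv(text):
    """Turn 'csv,noheader' output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list (needs the driver installed)."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        universal_newlines=True,
    )
    return parse_gpu_csv(output)


# Sample line shaped like the output for the card shown above.
print(parse_gpu_csv("Tesla K40c, 384.183, 11439 MiB, 10378 MiB"))
```

Calling query_gpus() on the server returns the same structure from the live driver. If the driver did not load, subprocess raises an error instead.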

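4. Install CUDA

  1. Install CUDA
cd %filepath   # directory that contains the installer
sh cuda_9.0.176_384.81_linux.run
  2. Follow the prompts
# q jumps straight to the end of the license agreement
# Install the NVIDIA driver? no
# Answer yes or accept the default for everything else
  3. Add environment variables
    vi ~/.bashrc # edit
Append at the end of the file:
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# save and exit
  4. Apply the environment variables
    $ source ~/.bashrc
  5. Verify the CUDA installation
cuda-install-samples-9.0.sh ~
cd ~/NVIDIA_CUDA-9.0_Samples/5_Simulations/nbody
make
./nbody
  6. Verify the driver and CUDA versions:
cat /proc/driver/nvidia/version
nvcc -V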
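
The last line printed by nvcc -V carries the toolkit release, in the form "Cuda compilation tools, release 9.0, V9.0.176". The sketch below extracts both numbers with a regular expression so that a script can confirm the expected toolkit is on PATH. The function names are made up, and the sample string is written from the expected format rather than captured from this machine.

```python
import re
import subprocess


def parse_nvcc_version(text):
    """Return (release, full_version) from nvcc -V output, or None if absent."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def installed_cuda_release():
    """Run nvcc -V and parse its output (requires nvcc on PATH)."""
    output = subprocess.check_output(["nvcc", "-V"], universal_newlines=True)
    return parse_nvcc_version(output)


sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(parse_nvcc_version(sample))  # ('9.0', '9.0.176')
```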

5. Install cuDNN

  1. Download cuDNN (already prepared)
  2. Install cuDNN
cd %filepath
mv cudnn-9.0-linux-x64-v7.6.5.32.solitairetheme8 cudnn-9.0-linux-x64-v7.6.5.32.tgz
tar -xzvf cudnn-9.0-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
  3. Verify cuDNN
... there does not seem to have been a verification step? I could not find it again
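
One common way to confirm which cuDNN was copied is to read the version macros from the header. In cuDNN 7.x, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. The sketch below parses them; the function names are made up, and the default path assumes the header was copied to /usr/local/cuda/include as in step 2.

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    numbers = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + macro + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        numbers.append(int(match.group(1)))
    return tuple(numbers)


def installed_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read the header from disk and parse it (path is an assumption)."""
    with open(path) as header:
        return cudnn_version_from_header(header.read())


sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 5\n"
print(cudnn_version_from_header(sample))  # (7, 6, 5)
```

This only shows that the header is in place. It does not prove that the shared library loads, which the framework checks in the next section exercise indirectly.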

6. Install TensorFlow and PyTorch

  1. Prepare the packages
  • tensorflow_gpu-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl
  • torch-1.3.1-cp37-cp37m-manylinux1_x86_64.whl
  2. Install TensorFlow
pip install tensorflow_gpu-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl
  3. Verify TensorFlow
# Open a new terminal if not done yet
python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> print(hello.numpy())  # TensorFlow 2.0 runs eagerly; tf.Session is no longer in the top-level API
  4. Install PyTorch
pip install torch-1.3.1-cp37-cp37m-manylinux1_x86_64.whl
  5. Verify PyTorch
$ python
>>> import torch
>>> torch.__version__
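
Importing a framework only shows that the package is installed, not that it can see the GPU. A slightly stronger check asks each framework for its visible CUDA devices. The sketch below does this through torch.cuda.is_available() and tf.config.experimental.list_physical_devices("GPU"), and reports a missing package instead of raising. The function name gpu_status is made up for this example.

```python
import importlib


def gpu_status(framework):
    """Report whether a framework is importable and, if known, sees a GPU."""
    try:
        module = importlib.import_module(framework)
    except ImportError:
        return {"framework": framework, "installed": False}

    status = {
        "framework": framework,
        "installed": True,
        "version": getattr(module, "__version__", None),
    }
    if framework == "torch":
        status["gpu"] = module.cuda.is_available()
    elif framework == "tensorflow":
        gpus = module.config.experimental.list_physical_devices("GPU")
        status["gpu"] = len(gpus) > 0
    return status
```

Calling gpu_status("torch") and gpu_status("tensorflow") in the Anaconda environment prints one dictionary per framework. A "gpu" value of False usually points at a mismatch between the driver, CUDA, and cuDNN versions and the versions the wheel was built for.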

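Appendix 1: Adjusting LVM partitions on CentOS 7 (resize2fs: Bad magic number in super-block while trying to open ...)

  1. Approach:
    ① Confirm that the partitions are LVM
    ② The home partition has a lot of unused space, so give home's space to /
    Unmount home >> remove home >> add home's space to "/" >> recreate home >> format home >> done

  2. Commands used:
df -h                     # show disk usage
lsblk                    # show block device details
fdisk -l                        # show partition details
lvremove / lvcreate             # remove / create a logical volume
lvdisplay / vgdisplay / pvdisplay   # show logical volumes / volume groups / physical volumes
xfs_growfs                      # grow an XFS filesystem (resize2fs only handles ext2/3/4, hence the error in the title)

  3. Procedure:

  • Analysis: the partition details show that sda2 holds the LVM logical volumes, so space can be moved from home to the root partition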
# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
fd0               2:0    1     4K  0 disk
sda               8:0    0   300G  0 disk
├─sda1            8:1    0   500M  0 part /boot
└─sda2            8:2    0 299.5G  0 part
  ├─centos-root 253:0    0    50G  0 lvm  /
  ├─centos-swap 253:1    0   9.8G  0 lvm  [SWAP]
  └─centos-home 253:2    0 239.6G  0 lvm  /home
sr0              11:0    1  1024M  0 rom
  • Back up /home
# mkdir /tmp/home
# cp -r /home/* /tmp/home
  • Unmount /home
# umount /home
umount: /home: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
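# If umount reports that the target is busy, use fuser to kill the processes holding it
# fuser -m -v -i -k /home
  • Remove the home logical volume (LV), which returns its space to the volume group (VG)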
# lvremove /dev/mapper/centos-home
Do you really want to remove active logical volume home? [y/n]: y
  Logical volume "home" successfully removed
  • Resize /
# lvextend -L 250G /dev/mapper/centos-root  # grow to 250G
Size of logical volume centos/root changed from 50.00 GiB (12800 extents) to 250.00 GiB (64000 extents).
Logical volume root successfully resized.
  • Grow the filesystem with xfs_growfs so that it uses the new space
# xfs_growfs /dev/mapper/centos-root
meta-data=/dev/mapper/centos-root isize=256    agcount=4, agsize=3276800 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=6400, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 13107200 to 65536000
  • Allocate the remaining space to home again
# lvcreate -l +100%free -n home centos      # -n sets the LV name; centos is the VG name
  Logical volume "home" created.
  • Do not forget to format the new volume after creating it
# mkfs.xfs /dev/centos/home
meta-data=/dev/centos/home       isize=256    agcount=4, agsize=2601472 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=10405888, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=5081, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
  • Remount and check
# mount /dev/mapper/centos-home /home
  • Done
(base) [root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  400G  114G  287G  29% /
devtmpfs                  50G     0   50G   0% /dev
tmpfs                     50G     0   50G   0% /dev/shm
tmpfs                     50G  9.0M   50G   1% /run
tmpfs                     50G     0   50G   0% /sys/fs/cgroup
/dev/sda1                950M  164M  787M  18% /boot
tmpfs                    9.9G   36K  9.9G   1% /run/user/0
/dev/mapper/centos-home   20G   33M   20G   1% /home
  • Restore the /home backup
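
The numbers printed by lvextend and xfs_growfs in the procedure can be cross-checked with a little arithmetic. 50 GiB over 12800 extents implies a 4 MiB extent size, and xfs_growfs reports its sizes in 4096-byte blocks (bsize=4096). The sketch below converts both units to GiB; the function names are made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def extents_to_gib(extents, extent_size_mib=4):
    """Convert LVM extents to GiB (4 MiB extents, as implied by the lvextend output)."""
    return extents * extent_size_mib * MIB / GIB


def xfs_blocks_to_gib(blocks, block_size=4096):
    """Convert XFS data blocks to GiB using the bsize printed by xfs_growfs."""
    return blocks * block_size / GIB


print(extents_to_gib(12800))        # 50.0  (root before lvextend)
print(extents_to_gib(64000))        # 250.0 (root after lvextend)
print(xfs_blocks_to_gib(13107200))  # 50.0  (filesystem before xfs_growfs)
print(xfs_blocks_to_gib(65536000))  # 250.0 (filesystem after xfs_growfs)
```

Both tools agree on 50 GiB before and 250 GiB after, which confirms that the filesystem was grown to fill the whole logical volume.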

Appendix 2:

  1. Download TensorFlow: https://pypi.org
  2. Download cuDNN: https://developer.nvidia.com/cudnn
  3. Download CUDA: https://developer.nvidia.com/cuda-toolkit
    ...

Reposted from www.cnblogs.com/xianyuxianyuxian/p/11982831.html