cmake ..
fails with the following error:
-- CUDA detected: 9.0
-- Added CUDA NVCC flags for: sm_61
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARIES)
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
cmake/Modules/FindNCCL.cmake:21 (find_package_handle_standard_args)
cmake/Dependencies.cmake:89 (find_package)
CMakeLists.txt:46 (include)
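0. Optionally set the environment variable CUDA_VISIBLE_DEVICES to specify which CUDA devices are visible
Method 1: set the variable in /etc/profile or ~/.bashrc (/etc/profile affects all users; ~/.bashrc affects only the current user's bash shell)
Append the following line to ~/.bashrc:
export CUDA_VISIBLE_DEVICES=0,1,2,3 ## only GPUs 0, 1, 2 and 3 are visible; list the available GPUs with nvidia-smi -L
Save and quit with :wq
Run source ~/.bashrc to apply the change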
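To confirm that the variable took effect, it can help to print it and count the listed device IDs, then compare that count with the output of nvidia-smi -L. The following is only a minimal sketch; count_visible_devices is a hypothetical helper written for this note, not part of CUDA or NCCL.

```shell
# Hypothetical helper: count the device IDs in a CUDA_VISIBLE_DEVICES-style list.
count_visible_devices() {
    local list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    # Split the list on commas and count the resulting fields.
    local IFS=','
    set -- $list
    echo "$#"
}

export CUDA_VISIBLE_DEVICES=0,1,2,3
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
echo "visible device count: $(count_visible_devices "$CUDA_VISIBLE_DEVICES")"
```

The variable can also be set for a single command only, for example CUDA_VISIBLE_DEVICES=0,1 followed by the command on the same line, which leaves ~/.bashrc untouched.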
1. Install NCCL as a regular server user (no root)
1. Download and build
https://github.com/NVIDIA/nccl
cd nccl
make CUDA_HOME=/usr/local/cuda test # adjust to your own CUDA path
2. Test and configure environment variables
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./build/lib
./build/test/single/all_reduce_test
./build/test/single/all_reduce_test 10000000
# Check how many GPUs the test reports; if some are missing, set CUDA_VISIBLE_DEVICES to specify the visible CUDA devices (see step 0)
Append the following line to ~/.bashrc:
export LD_LIBRARY_PATH=/home/neu105/nccl/build/lib:$LD_LIBRARY_PATH
Save and quit with :wq
Run source ~/.bashrc to apply the change
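Because ~/.bashrc may be sourced many times, a guarded variant avoids adding the same directory to LD_LIBRARY_PATH repeatedly and also checks that the library was actually built. This is a sketch under assumptions: NCCL_LIB_DIR is a name introduced here for illustration, and its default of $HOME/nccl/build/lib should be replaced with your own build location.

```shell
# Assumed location of the user-local NCCL build; adjust to your own checkout.
NCCL_LIB_DIR="${NCCL_LIB_DIR:-$HOME/nccl/build/lib}"

# Prepend the directory only if it is not already on LD_LIBRARY_PATH.
case ":${LD_LIBRARY_PATH:-}:" in
    *":$NCCL_LIB_DIR:"*) ;;
    *) export LD_LIBRARY_PATH="$NCCL_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac

# Report whether a libnccl shared library is present in that directory.
if ls "$NCCL_LIB_DIR"/libnccl.so* >/dev/null 2>&1; then
    echo "found NCCL library in $NCCL_LIB_DIR"
else
    echo "no libnccl.so in $NCCL_LIB_DIR; check the build step"
fi
```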
3. Configure caffe/Makefile.config
1. Uncomment
USE_NCCL := 1
2. The most important step
If you installed NCCL under your user directory, you need to edit "caffe-master/cmake/Modules/FindNCCL.cmake":
set(NCCL_INC_PATHS
/usr/include
/usr/local/include
/home/neu105/nccl/build/include # added: user-local NCCL include path
$ENV{NCCL_DIR}/include
)
set(NCCL_LIB_PATHS
/lib
/lib64
/usr/lib
/usr/lib64
/usr/local/lib
/usr/local/lib64
/home/neu105/nccl/build/lib # added: user-local NCCL library path
$ENV{NCCL_DIR}/lib
)
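The search lists above already contain $ENV{NCCL_DIR}/include and $ENV{NCCL_DIR}/lib, so exporting NCCL_DIR before running cmake may be enough without hard-coding a home directory in FindNCCL.cmake. This is a sketch that assumes the rest of FindNCCL.cmake passes these lists to its find calls unchanged; the default path below is an assumption and should point at your own NCCL build directory.

```shell
# Assumed build location of the user-local NCCL; adjust to your own path.
export NCCL_DIR="${NCCL_DIR:-$HOME/nccl/build}"

# These are the two directories derived from NCCL_DIR in the search lists.
echo "include search path: $NCCL_DIR/include"
echo "library search path: $NCCL_DIR/lib"

# Sanity-check that the header and library are where CMake will look.
[ -f "$NCCL_DIR/include/nccl.h" ] || echo "warning: nccl.h not found under $NCCL_DIR/include"
ls "$NCCL_DIR"/lib/libnccl.so* >/dev/null 2>&1 || echo "warning: libnccl.so not found under $NCCL_DIR/lib"
```

The export has to be in effect in the same shell that runs cmake, so it can go in ~/.bashrc next to the LD_LIBRARY_PATH line.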
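3. Changing USE_NCCL in Makefile.config does not change the configuration in CMakeLists.txt. Manually change the USE_NCCL option there from OFF to ON, save the file, and then build caffe with cmake again.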
https://blog.csdn.net/u011394059/article/details/73732707
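As an alternative to editing CMakeLists.txt by hand, a CMake option can normally be overridden on the command line with -D. The sketch below assumes that USE_NCCL is declared as an ordinary cache option in caffe's CMakeLists.txt, as the step above suggests, and that it is run from a build directory one level below the caffe source tree; CMAKE_ARGS is just a local variable name chosen for this example.

```shell
# Option override passed to cmake instead of editing CMakeLists.txt.
CMAKE_ARGS="-DUSE_NCCL=ON"

# Only invoke cmake when we are in a build directory and cmake is installed.
if [ -f ../CMakeLists.txt ] && command -v cmake >/dev/null 2>&1; then
    cmake $CMAKE_ARGS ..
else
    echo "would run: cmake $CMAKE_ARGS .. (from the caffe build directory)"
fi
```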
4. Rebuild and reinstall caffe
References:
https://blog.csdn.net/u012235003/article/details/54576840
https://blog.csdn.net/u011394059/article/details/73732707