Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED. Possibly insufficient driver version: 3

1 CUDA 9.1 is installed (matching cuDNN package: 7.1.2.21-1+cuda9.1), but the installed cuDNN is too new (7.1.4.18-1+cuda9.2) and needs to be downgraded.

2 The error:
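The mismatch is visible in the package version string itself: the part after `+cuda` names the CUDA release the cuDNN build targets. A minimal sketch of that check, assuming the `<cudnn>-1+cuda<major.minor>` naming seen in this post; `cudnn_cuda_target` and `check_cudnn_match` are hypothetical helper names, not part of any NVIDIA tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the CUDA release a libcudnn7 package version targets.
# "7.1.4.18-1+cuda9.2" -> "9.2"
cudnn_cuda_target() {
    printf '%s\n' "${1##*+cuda}"
}

# Hypothetical helper: compare a cuDNN package version against the installed CUDA release.
# $1 = cuDNN package version, $2 = CUDA version (major.minor)
check_cudnn_match() {
    if [ "$(cudnn_cuda_target "$1")" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

check_cudnn_match "7.1.4.18-1+cuda9.2" "9.1"   # the version installed before the downgrade
check_cudnn_match "7.1.2.21-1+cuda9.1" "9.1"   # the version pinned for CUDA 9.1
```

On a live system the first argument could come from `dpkg-query -W -f='${Version}' libcudnn7` instead of a literal string.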

root@0d4:~/net# ./run_and_time.sh 2 | tee benchmark-`date "+%F-%T"`.log
STARTING TIMING RUN AT 2018-06-21 03:47:10 AM
running benchmark with seed 2
INFO:tensorflow:Using config: {'_master': '', '_global_id_in_cluster': 0, '_log_step_count_steps': 100, '_num_worker_replicas': 1, '_is_chief': True, '_service': None, '_tf_random_seed': 2, '_save_checkpoints_secs': 600, '_train_distribute': <tensorflow.contrib.distribute.python.one_device_strategy.OneDeviceStrategy object at 0x7f0f934da080>, '_session_config': allow_soft_placement: true, '_task_type': 'worker', '_save_summary_steps': 100, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_keep_checkpoint_max': 5, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0f7f8d7240>, '_save_checkpoints_steps': None, '_evaluation_master': '', '_device_fn': None, '_model_dir': '/tmp/imn_example'}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2018-06-21 03:47:25.068465: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-21 03:47:25.785164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.562
pciBusID: 0000:86:00.0
totalMemory: 11.17GiB freeMemory: 10.99GiB
2018-06-21 03:47:25.785244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-06-21 03:47:26.224662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-21 03:47:26.224748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-06-21 03:47:26.224776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-06-21 03:47:26.225463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10647 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:86:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /tmp/imn_example/model.ckpt-2674
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 2674 into /tmp/imn_example/model.ckpt.
2018-06-21 03:47:35.910664: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-06-21 03:47:35.910895: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 387.26.0
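Only the two `E` lines at the end of this log matter. A small sketch for pulling the reported driver version out of a saved log such as the `benchmark-*.log` file written by `tee`; `reported_driver_version` is a hypothetical helper name, and the sample input is just the two error lines from the run above. Note that the message only says "possibly", so the driver is a suspect rather than a confirmed cause.

```shell
#!/bin/sh
# Hypothetical helper: print the driver version that TensorFlow reports
# next to a cuDNN handle failure.
# $1 = path to a TensorFlow log file
reported_driver_version() {
    sed -n 's/.*Possibly insufficient driver version: \([0-9][0-9.]*\).*/\1/p' "$1"
}

# Sample input: the two error lines from the run above, written to a temp file.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2018-06-21 03:47:35.910664: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-06-21 03:47:35.910895: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 387.26.0
EOF

DRIVER=$(reported_driver_version "$LOG")
echo "driver reported in log: $DRIVER"
rm -f "$LOG"
```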

Following the suggestions in "Doesn't work with cudnn v7.1.1.5" and "Upgrade to latest cuDNN v7 (7.1.3.16)":

ARG repository
FROM ${repository}:9.1-devel-ubuntu16.04
LABEL maintainer "NVIDIA CORPORATION <[email protected]>"

ENV CUDNN_VERSION 7.1.2.21
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
            libcudnn7=$CUDNN_VERSION-1+cuda9.1 \
            libcudnn7-dev=$CUDNN_VERSION-1+cuda9.1 && \
    rm -rf /var/lib/apt/lists/*

So I ran:

root@0d4660a33475:~/resnet# apt-get update && apt-get install -y --allow-downgrades --no-install-recommends libcudnn7=7.1.2.21-1+cuda9.1 libcudnn7-dev=7.1.2.21-1+cuda9.1 && rm -rf /var/lib/apt/lists/*
Hit:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Hit:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
Ign:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease
Hit:5 http://archive.ubuntu.com/ubuntu xenial-backports InRelease
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  InRelease
Hit:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release
Hit:9 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  libcudnn7-dev
The following packages will be DOWNGRADED:
  libcudnn7
0 upgraded, 1 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
Need to get 256 MB of archives.
After this operation, 340 MB of additional disk space will be used.
Get:1 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  libcudnn7 7.1.2.21-1+cuda9.1 [133 MB]
Get:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  libcudnn7-dev 7.1.2.21-1+cuda9.1 [123 MB]                              
Fetched 256 MB in 1min 59s (2143 kB/s)                                                                                                                               
debconf: delaying package configuration, since apt-utils is not installed
dpkg: warning: downgrading libcudnn7 from 7.1.4.18-1+cuda9.2 to 7.1.2.21-1+cuda9.1
(Reading database ... 14749 files and directories currently installed.)
Preparing to unpack .../libcudnn7_7.1.2.21-1+cuda9.1_amd64.deb ...
Unpacking libcudnn7 (7.1.2.21-1+cuda9.1) over (7.1.4.18-1+cuda9.2) ...
Selecting previously unselected package libcudnn7-dev.
Preparing to unpack .../libcudnn7-dev_7.1.2.21-1+cuda9.1_amd64.deb ...
Unpacking libcudnn7-dev (7.1.2.21-1+cuda9.1) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Setting up libcudnn7 (7.1.2.21-1+cuda9.1) ...
Setting up libcudnn7-dev (7.1.2.21-1+cuda9.1) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
Processing triggers for libc-bin (2.23-0ubuntu10) ...

done!
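The downgrade command pins the runtime and dev packages to the same version string. A minimal sketch that builds both specs from one pair of variables so the two cannot drift apart; `cudnn_pkg_spec` is a hypothetical helper name, and the `-1` package revision is assumed from the versions shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: build an apt "package=version" spec for a cuDNN 7 package.
# $1 = package name, $2 = cuDNN version, $3 = CUDA version (major.minor)
# Assumes the "-1" package revision used by the versions in this post.
cudnn_pkg_spec() {
    printf '%s=%s-1+cuda%s\n' "$1" "$2" "$3"
}

CUDNN_VERSION=7.1.2.21
CUDA_VERSION=9.1

RUNTIME_SPEC=$(cudnn_pkg_spec libcudnn7 "$CUDNN_VERSION" "$CUDA_VERSION")
DEV_SPEC=$(cudnn_pkg_spec libcudnn7-dev "$CUDNN_VERSION" "$CUDA_VERSION")

# Print the command instead of running it, so the sketch is safe to execute anywhere.
echo "apt-get install -y --allow-downgrades --no-install-recommends $RUNTIME_SPEC $DEV_SPEC"
```

After the install, `apt-mark hold libcudnn7 libcudnn7-dev` should keep a later `apt-get upgrade` from pulling the cuda9.2 build back in.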
