First, the main content of this article
1. Upgrade from cuda9.2 to cuda10.2 under Windows10
2. Installation of cudnn under Windows 10
3. Performance comparison of cuda9.2, cuda10.2, cudnn7.2.1, and cudnn7.6.5 in DeepLearning4j
Second, the installation process
1. Machine environment description:
CPU: i7 8700 6 cores 12 threads
GPU: GTX 1070Ti
Memory: 16G
Note: cuda and cudnn were already installed on this machine; the versions are cuda9.2.148 and cudnn7.2.1.
2. Preparation of installation package
(1), cuda download
cuda download address: https://developer.nvidia.com/cuda-toolkit-archive. The version downloaded here is 10.2, because dl4j 1.0.0-beta6 only supports cuda up to 10.2.
The target machine runs 64-bit Windows 10, so select the 64-bit Windows 10 version of cuda, choose the local installer type, and download the installation package to the local disk.
(2), cuDNN download
Download address: https://developer.nvidia.com/rdp/cudnn-archive. Select the latest cudnn version that matches cuda10.2, which is 7.6.5. Version 8.0.2 is not selected because dl4j-beta6 does not support 8.x versions.
The downloaded installation package is as follows:
3. Install cuda10.2
Run the installer and keep the default installation path: click Next, agree and continue, then choose the Express (streamlined) installation.
The installer then enters the installation phase.
After the installation, under the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA directory, the v10.2 folder appears, as shown below:
Note: the v9.2 folder here belongs to the previously installed cuda version. By changing the environment variables, you can still switch back to version 9.2.
Open the cmd console, type nvcc --version, and press Enter. As shown in the figure below, the output reports version 10.2, indicating that the installation succeeded.
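If you would rather check the version programmatically than read it off the console, the release number can be pulled out of the nvcc banner with a regular expression. The following is only a minimal sketch: the class name NvccVersionParser is hypothetical, and the sample banner text is an assumption about what nvcc prints for 10.2, so paste in your own output when trying it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cuda or dl4j: extracts the toolkit release
// from the text printed by `nvcc --version`.
class NvccVersionParser {
    // Matches a fragment such as "release 10.2" anywhere in the banner.
    private static final Pattern RELEASE = Pattern.compile("release\\s+(\\d+\\.\\d+)");

    // Returns the release (for example "10.2"), or empty if the text has no such fragment.
    static Optional<String> parseRelease(String nvccOutput) {
        Matcher m = RELEASE.matcher(nvccOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample line; replace it with the real output of your console.
        String sample = "Cuda compilation tools, release 10.2, V10.2.89";
        System.out.println("Detected release: " + parseRelease(sample).orElse("unknown"));
    }
}
```

If the detected release is still 9.2 after the upgrade, the environment variables most likely still point at the v9.2 folder.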
In addition, use the latest dl4j example project to verify that cuda10.2 is usable. Example address: https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template
Modify the maven configuration as follows: set dl4j-master.version to 1.0.0-beta6, and add the nd4j-cuda-10.2-platform and deeplearning4j-cuda-10.2 dependencies.
<properties>
    <dl4j-master.version>1.0.0-beta6</dl4j-master.version>
    <logback.version>1.2.3</logback.version>
    <java.version>1.8</java.version>
    <maven-shade-plugin.version>2.4.3</maven-shade-plugin.version>
</properties>
<dependencies>
    <!-- deeplearning4j-core: contains main functionality and neural networks -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j-master.version}</version>
    </dependency>
    <!--
    ND4J backend: every project needs one of these. The backend defines the hardware on which network training
    will occur. "nd4j-native-platform" is for CPUs only (for running on all operating systems).
    -->
    <!-- <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${dl4j-master.version}</version>
    </dependency>
    -->
    <!-- CUDA: to use GPU for training (CUDA) instead of CPU, uncomment this, and remove nd4j-native-platform -->
    <!-- Requires CUDA to be installed to use. Change the version (8.0, 9.0, 9.1) to change the CUDA version -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-10.2-platform</artifactId>
        <version>${dl4j-master.version}</version>
    </dependency>
    <!-- Optional, but recommended: if you use CUDA, also use CuDNN. To use this, CuDNN must also be installed -->
    <!-- See: https://deeplearning4j.konduit.ai/config/backends/config-cudnn#using-deeplearning-4-j-with-cudnn -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-10.2</artifactId>
        <version>${dl4j-master.version}</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>${logback.version}</version>
    </dependency>
</dependencies>
Run the main method of the example. The log is printed as follows; the line "ND4J CUDA build version: 10.2.89" indicates that cuda10.2 has taken effect.
o.d.e.s.LeNetMNIST - Load data....
o.d.e.s.LeNetMNIST - Build model....
o.n.l.f.Nd4jBackend - Loaded [JCublasBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 32
o.n.n.Nd4jBlas - Number of threads used for OpenMP BLAS: 0
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [12]; Memory: [3.5GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89
o.n.l.j.JCublasBackend - CUDA device 0: [GeForce GTX 1070 Ti]; cc: [6.1]; Total memory: [8589934592]
o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
o.d.n.l.c.ConvolutionLayer - Could not initialize CudnnConvolutionHelper
java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.initializeHelper(ConvolutionLayer.java:78)
Caused by: java.lang.UnsatisfiedLinkError: no jnicudnn in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1543)
at org.bytedeco.javacpp.Loader.load(Loader.java:1192)
The log above contains an exception: cudnn cannot be initialized, because cudnn has not yet been installed for cuda10.2. The next step is to install cudnn.
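An UnsatisfiedLinkError like the one above usually means that a native library could not be found when the JVM tried to load it. One way to narrow this down is to check which directories on PATH actually contain the cudnn DLL. The sketch below uses only the Java standard library; the class name NativeLibraryLocator is hypothetical, and the file name cudnn64_7.dll is an assumption about how the cudnn 7.x DLL is named on Windows, so pass a different name as the first argument if yours differs.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of dl4j or JavaCPP.
class NativeLibraryLocator {
    // Returns every directory in searchPath (separated like PATH) that contains fileName.
    static List<Path> findOnPath(String searchPath, String fileName) {
        List<Path> hits = new ArrayList<>();
        if (searchPath == null) {
            return hits;
        }
        for (String entry : searchPath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                Path dir = Paths.get(entry);
                if (Files.isRegularFile(dir.resolve(fileName))) {
                    hits.add(dir);
                }
            } catch (InvalidPathException e) {
                // Skip malformed PATH entries (for example, entries wrapped in quotes).
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed DLL name for cudnn 7.x on Windows; override it via the first argument.
        String dll = args.length > 0 ? args[0] : "cudnn64_7.dll";
        List<Path> hits = findOnPath(System.getenv("PATH"), dll);
        System.out.println(hits.isEmpty() ? dll + " was not found on PATH" : dll + " found in " + hits);
    }
}
```

An empty result here is consistent with the exception in the log; a hit inside the v9.2 folder only would suggest that just the old cudnn is visible.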
4. cuDNN installation
Unzip cudnn-10.2-windows10-x64-v7.6.5.32.zip; the three folders shown in the figure below appear.
Copy these three folders into the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2 directory, so that the cudnn files are merged into the cuda installation directory.
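The copy step can also be scripted instead of being done in Explorer. The following is a rough sketch using only the Java standard library: it merges one directory tree into another and overwrites files that already exist. The class name CudnnFolderCopy is hypothetical, both paths are supplied as arguments rather than hard-coded, and writing under C:\Program Files will typically require an elevated console.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of cuda, cudnn, or dl4j.
class CudnnFolderCopy {
    // Copies everything under sourceRoot into targetRoot, keeping relative paths.
    // Existing directories are reused and existing files are overwritten.
    // Returns the number of files copied.
    static int copyTree(Path sourceRoot, Path targetRoot) {
        int copied = 0;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            // Files.walk lists a directory before its contents, so parents exist first.
            for (Path src : (Iterable<Path>) walk::iterator) {
                Path dst = targetRoot.resolve(sourceRoot.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: CudnnFolderCopy <unzipped cudnn folder> <cuda v10.2 folder>");
            return;
        }
        int n = copyTree(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + n + " files");
    }
}
```

Pass the folder that directly contains the three unzipped folders as the first argument, so that they land side by side with the existing folders of the same names under v10.2.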
Run the test program again: the exception disappears, and CudnnSubsamplingHelper and CudnnConvolutionHelper are initialized successfully.
o.d.e.s.LeNetMNIST - Load data....
o.d.e.s.LeNetMNIST - Build model....
o.n.l.f.Nd4jBackend - Loaded [JCublasBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 32
o.n.n.Nd4jBlas - Number of threads used for OpenMP BLAS: 0
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [12]; Memory: [3.5GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89
o.n.l.j.JCublasBackend - CUDA device 0: [GeForce GTX 1070 Ti]; cc: [6.1]; Total memory: [8589934592]
o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
o.d.n.l.c.ConvolutionLayer - CudnnConvolutionHelper successfully initialized
o.n.j.h.i.CudaZeroHandler - Creating bucketID: 5
o.d.n.l.c.s.SubsamplingLayer - CudnnSubsamplingHelper successfully initialized
o.d.n.l.c.ConvolutionLayer - CudnnConvolutionHelper successfully initialized
o.d.n.l.c.s.SubsamplingLayer - CudnnSubsamplingHelper successfully initialized
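When the log is long, it can be easier to let a small program decide whether the cudnn helpers came up. The sketch below classifies a captured log using only the two message fragments quoted in this article ("Could not initialize Cudnn..." and "...Helper successfully initialized"). The class name CudnnLogCheck and the three-way status are hypothetical, and other dl4j versions may word these messages differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of dl4j: inspects captured log lines.
class CudnnLogCheck {
    enum Status { INITIALIZED, FAILED, NOT_MENTIONED }

    // FAILED if any helper failed to initialize, INITIALIZED if at least one
    // helper reported success and none failed, NOT_MENTIONED otherwise.
    static Status check(List<String> logLines) {
        boolean sawSuccess = false;
        for (String line : logLines) {
            if (line.contains("Could not initialize Cudnn")) {
                return Status.FAILED;
            }
            if (line.contains("Helper successfully initialized")) {
                sawSuccess = true;
            }
        }
        return sawSuccess ? Status.INITIALIZED : Status.NOT_MENTIONED;
    }

    public static void main(String[] args) {
        // Two lines taken from the log output shown in this article.
        List<String> log = Arrays.asList(
                "o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89",
                "o.d.n.l.c.ConvolutionLayer - CudnnConvolutionHelper successfully initialized");
        System.out.println("cudnn status: " + check(log));
    }
}
```

NOT_MENTIONED is kept separate from FAILED because a network without convolution or subsampling layers may never log either message.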
So far, cuda10.2 and cudnn7.6.5 have been successfully installed, and dl4j beta6 can run normally.
Third, performance comparison
Test program address: https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template, the network structure is LeNet
Environment Description:
Operating system: Windows 10
CPU: i7 8700 3.2GHz 6 cores 12 threads
GPU: GTX 1070Ti
Memory: 16G
dl4j: 1.0.0-beta6
Comparison results:
Operating environment | Time (ms) |
CPU | 26566 |
cuda9.2 | 20725 |
cuda9.2+cudnn7.2.1 | 12575 |
cuda10.2 | 19953 |
cuda10.2+cudnn7.6.5 | 12574 |
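Result description:
1. Without cudnn, cuda9.2 and cuda10.2 perform about the same (20725 ms vs. 19953 ms, a difference of under 4%)
2. cuda9.2+cudnn7.2.1 and cuda10.2+cudnn7.6.5 are practically identical (12575 ms vs. 12574 ms)
3. Enabling cudnn improves efficiency significantly: the run time drops by roughly 37-39% compared with the same cuda version without cudnn
4. GPU with cudnn is about 2.1 times as fast as the CPU (26566 ms vs. about 12575 ms)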
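The ratios behind these observations can be recomputed directly from the table. The sketch below does nothing more than divide the CPU time by each measured time; the class name SpeedupTable is hypothetical, and the numbers are copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: recomputes speedups from the timings in the table.
class SpeedupTable {
    // Speedup of a candidate relative to a baseline; values above 1 mean the candidate is faster.
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        long cpuMs = 26566; // CPU row of the table
        Map<String, Long> timesMs = new LinkedHashMap<>();
        timesMs.put("cuda9.2", 20725L);
        timesMs.put("cuda9.2+cudnn7.2.1", 12575L);
        timesMs.put("cuda10.2", 19953L);
        timesMs.put("cuda10.2+cudnn7.6.5", 12574L);

        for (Map.Entry<String, Long> e : timesMs.entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%-22s %.2fx faster than CPU",
                    e.getKey(), speedup(cpuMs, e.getValue())));
        }
    }
}
```

Both cudnn rows come out at a little over 2.1x, while the rows without cudnn stay around 1.3x, which matches points 3 and 4 above.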
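Special note: dl4j provides cudnn-based optimizations only for certain layer types (the log above shows the convolution and subsampling helpers being initialized); the supported structures are shown in the following figure: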
Happiness comes from sharing.
This blog post is the author's original work; please credit the source when reprinting.