Upgrading from CUDA 9.2 to CUDA 10.2 on Windows 10

1. What this article covers

    1. Upgrading from CUDA 9.2 to CUDA 10.2 on Windows 10

    2. Installing cuDNN on Windows 10

    3. Performance comparison of CUDA 9.2, CUDA 10.2, cuDNN 7.2.1, and cuDNN 7.6.5 in DeepLearning4j

2. Installation process

    1. Machine environment:

     CPU: Intel Core i7-8700, 6 cores / 12 threads

     GPU: GTX 1070 Ti

     Memory: 16 GB

    Note: CUDA 9.2.148 and cuDNN 7.2.1 were already installed on this machine.

     

    2. Preparing the installation packages

    (1) Downloading CUDA

    CUDA download page: https://developer.nvidia.com/cuda-toolkit-archive. Version 10.2 is downloaded here because DL4J supports CUDA only up to 10.2.

    

    The target machine runs 64-bit Windows 10, so select the Windows 10 x86_64 build of CUDA, choose the local installer type, and download the full installation package for a local installation.

    (2) Downloading cuDNN

    Download page: https://developer.nvidia.com/rdp/cudnn-archive. Select the latest cuDNN 7.x release built for CUDA 10.2, which is 7.6.5. The newer 8.0.2 release is not used because DL4J 1.0.0-beta6 does not support cuDNN 8.x.
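    The two version constraints above (CUDA no newer than 10.2, and cuDNN 7.x only, for DL4J 1.0.0-beta6) can be encoded in a small check. This is a sketch: the class and method names are hypothetical, and it encodes only the two rules stated in this article, not DL4J's full support matrix (in particular, no lower bound on CUDA is checked). The point worth noting is that version strings must be compared numerically, because as plain strings "10.2" sorts before "9.2".

```java
/**
 * Hypothetical helper: encodes only the two rules stated in this article for
 * DL4J 1.0.0-beta6 (CUDA no newer than 10.2, cuDNN 7.x). No lower CUDA bound is checked.
 */
final class Dl4jBeta6Compat {

    /** Compares dotted version strings numerically, so "9.2" < "10.2". */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "10" equals "10.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if the pair satisfies both rules from the text. */
    static boolean isSupported(String cuda, String cudnn) {
        boolean cudaOk = compareVersions(cuda, "10.2") <= 0;
        boolean cudnnOk = cudnn.startsWith("7.");
        return cudaOk && cudnnOk;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"10.2", "7.6.5"}, {"10.2", "8.0.2"}, {"11.0", "7.6.5"}};
        for (String[] p : pairs) {
            System.out.println("CUDA " + p[0] + " + cuDNN " + p[1] + ": "
                    + (isSupported(p[0], p[1]) ? "OK" : "not supported"));
        }
        // Plain string comparison gets this wrong: "10.2".compareTo("9.2") is negative.
        System.out.println("10.2 vs 9.2 numerically: " + compareVersions("10.2", "9.2"));
    }
}
```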

    

    The downloaded installation packages are shown below:

    

    3. Installing CUDA 10.2

    Keep the default installation path and click through the installer: Agree and Continue, then choose the Express installation option.

    

    The installation then starts.

    

    After installation finishes, a v10.2 folder appears under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA, as shown below:

    

    Note: the v9.2 folder here is the previously installed CUDA version. The two versions coexist, and you can switch back to 9.2 by pointing the environment variables (CUDA_PATH and the CUDA entries in PATH) at v9.2.
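    Because each toolkit lives in its own vX.Y folder, it is easy to list which versions are installed and see which one CUDA_PATH currently selects. A minimal sketch, assuming the default root directory shown above and that the CUDA installer has set the CUDA_PATH environment variable; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative helper: lists the vX.Y toolkit folders under a CUDA root directory. */
final class CudaInstalls {

    static List<String> installedVersions(Path cudaRoot) {
        List<String> versions = new ArrayList<>();
        // Toolkits are installed side by side as v9.2, v10.2, ...
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(cudaRoot, "v*")) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) { // skip plain files that happen to match v*
                    versions.add(entry.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(versions); // lexicographic order, only for stable output
        return versions;
    }

    public static void main(String[] args) {
        Path root = Paths.get("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA");
        if (Files.isDirectory(root)) {
            System.out.println("Installed toolkits: " + installedVersions(root));
        }
        // Assumption: CUDA_PATH was set by the installer and points at the default toolkit.
        System.out.println("CUDA_PATH = " + System.getenv("CUDA_PATH"));
    }
}
```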

    Open a cmd console, type nvcc --version, and press Enter. If the output reports release 10.2, as in the figure below, CUDA 10.2 is installed successfully.
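    The same check can be scripted, for example as part of a build sanity check. The sketch below runs nvcc --version through ProcessBuilder and extracts the number after "release". It assumes nvcc is on PATH and that its output contains a line of the form "Cuda compilation tools, release 10.2, V10.2.89"; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the toolkit release reported by `nvcc --version`. */
final class NvccVersion {

    private static final Pattern RELEASE = Pattern.compile("release (\\d+\\.\\d+)");

    /** Returns e.g. "10.2", or null if no "release X.Y" token is present. */
    static String parseRelease(String nvccOutput) {
        Matcher m = RELEASE.matcher(nvccOutput);
        return m.find() ? m.group(1) : null;
    }

    /** Runs nvcc (assumed to be on PATH) and returns its combined stdout/stderr. */
    static String runNvcc() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("nvcc", "--version").redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String release = parseRelease(runNvcc());
        System.out.println("nvcc reports CUDA release " + release);
        if (!"10.2".equals(release)) {
            // A console opened before the install does not see the updated PATH.
            System.out.println("Expected 10.2: check PATH, or open a new console.");
        }
    }
}
```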

    

    Next, use the latest DL4J example project to verify that CUDA 10.2 is usable from DL4J. Example project: https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template

    Change the Maven configuration as follows: set dl4j-master.version to 1.0.0-beta6 and add the nd4j-cuda-10.2-platform and deeplearning4j-cuda-10.2 dependencies.

    <properties>
        <dl4j-master.version>1.0.0-beta6</dl4j-master.version>
        <logback.version>1.2.3</logback.version>
        <java.version>1.8</java.version>
        <maven-shade-plugin.version>2.4.3</maven-shade-plugin.version>
    </properties>

    <dependencies>
        <!-- deeplearning4j-core: contains main functionality and neural networks -->
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-core</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>

        <!--
        ND4J backend: every project needs one of these. The backend defines the hardware on which network training
        will occur. "nd4j-native-platform" is for CPUs only (for running on all operating systems).
        -->
        <!--
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-native-platform</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>
        -->

        <!-- CUDA: to train on the GPU instead of the CPU, use this dependency and remove nd4j-native-platform -->
        <!-- Requires CUDA to be installed. The version in the artifactId (here 10.2) must match the installed CUDA version -->
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-10.2-platform</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>

        <!-- Optional, but recommended: if you use CUDA, also use cuDNN. To use this, cuDNN must also be installed -->
        <!-- See: https://deeplearning4j.konduit.ai/config/backends/config-cudnn#using-deeplearning-4-j-with-cudnn -->
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-cuda-10.2</artifactId>
            <version>${dl4j-master.version}</version>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>${logback.version}</version>
        </dependency>
    </dependencies>

     Run the example's main method. The log below contains "ND4J CUDA build version: 10.2.89", which shows that CUDA 10.2 is in use.

o.d.e.s.LeNetMNIST - Load data....
o.d.e.s.LeNetMNIST - Build model....
o.n.l.f.Nd4jBackend - Loaded [JCublasBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 32
o.n.n.Nd4jBlas - Number of threads used for OpenMP BLAS: 0
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [12]; Memory: [3.5GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89
o.n.l.j.JCublasBackend - CUDA device 0: [GeForce GTX 1070 Ti]; cc: [6.1]; Total memory: [8589934592]
o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
o.d.n.l.c.ConvolutionLayer - Could not initialize CudnnConvolutionHelper
java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.initializeHelper(ConvolutionLayer.java:78)
Caused by: java.lang.UnsatisfiedLinkError: no jnicudnn in java.library.path
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
	at java.lang.Runtime.loadLibrary0(Runtime.java:870)
	at java.lang.System.loadLibrary(System.java:1122)
	at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1543)
	at org.bytedeco.javacpp.Loader.load(Loader.java:1192)

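    The log above contains an exception: CudnnConvolutionHelper could not be initialized (UnsatisfiedLinkError: no jnicudnn in java.library.path) because cuDNN has not yet been installed for CUDA 10.2. DL4J still runs, but without cuDNN acceleration. The next step installs cuDNN.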
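    Before running the example, a quick file check can tell whether cuDNN is in place. This sketch assumes that cuDNN 7.x on Windows ships its runtime library as bin\cudnn64_7.dll and that it is placed in the CUDA 10.2 bin folder, as done in the next step; the class name is hypothetical, and the check only looks for the file, it does not load it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-check: is a cuDNN 7 DLL present in the CUDA toolkit's bin folder? */
final class CudnnCheck {

    // Assumption: cuDNN 7.x on Windows names its runtime library cudnn64_7.dll.
    static final String CUDNN7_DLL = "cudnn64_7.dll";

    static boolean hasCudnn7(Path cudaHome) {
        return Files.isRegularFile(cudaHome.resolve("bin").resolve(CUDNN7_DLL));
    }

    public static void main(String[] args) {
        Path cudaHome = Paths.get("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2");
        System.out.println(hasCudnn7(cudaHome)
                ? "cuDNN 7 found in " + cudaHome.resolve("bin")
                : "cuDNN 7 not found: expect 'Could not initialize CudnnConvolutionHelper'");
    }
}
```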

    4. Installing cuDNN

    Unzip cudnn-10.2-windows10-x64-v7.6.5.32.zip; it contains the three folders (bin, include, lib) shown in the figure below.

    

    Copy these three folders into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2, merging them with the existing folders of the same name. This adds the cuDNN files to the CUDA installation directory.
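    The copy step can also be done programmatically, which makes it repeatable on other machines. The sketch below merges the bin, include and lib folders of the unzipped package into the CUDA directory: existing files are kept, and files with the same name are overwritten. It assumes the path passed in is the directory that directly contains those three folders, and writing under C:\Program Files normally requires an elevated (administrator) process. The class name is illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Illustrative: merge the unzipped cuDNN folders into a CUDA toolkit directory. */
final class CudnnInstaller {

    static void mergeInto(Path cudnnDir, Path cudaHome) {
        for (String folder : new String[] {"bin", "include", "lib"}) {
            Path src = cudnnDir.resolve(folder); // each of the three folders must exist
            Path dst = cudaHome.resolve(folder);
            // Files.walk visits a directory before its contents, so parents are created first.
            try (Stream<Path> paths = Files.walk(src)) {
                for (Path p : (Iterable<Path>) paths::iterator) {
                    Path target = dst.resolve(src.relativize(p));
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(target); // keep existing folders, add missing ones
                    } else {
                        Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: CudnnInstaller <dir containing bin, include, lib>");
            return;
        }
        Path cudaHome = Paths.get("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2");
        mergeInto(Paths.get(args[0]), cudaHome);
        System.out.println("cuDNN files merged into " + cudaHome);
    }
}
```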

    

    Run the test program again: the exception is gone, and CudnnConvolutionHelper and CudnnSubsamplingHelper both initialize successfully:

o.d.e.s.LeNetMNIST - Load data....
o.d.e.s.LeNetMNIST - Build model....
o.n.l.f.Nd4jBackend - Loaded [JCublasBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 32
o.n.n.Nd4jBlas - Number of threads used for OpenMP BLAS: 0
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [12]; Memory: [3.5GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89
o.n.l.j.JCublasBackend - CUDA device 0: [GeForce GTX 1070 Ti]; cc: [6.1]; Total memory: [8589934592]
o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
o.d.n.l.c.ConvolutionLayer - CudnnConvolutionHelper successfully initialized
o.n.j.h.i.CudaZeroHandler - Creating bucketID: 5
o.d.n.l.c.s.SubsamplingLayer - CudnnSubsamplingHelper successfully initialized
o.d.n.l.c.ConvolutionLayer - CudnnConvolutionHelper successfully initialized
o.d.n.l.c.s.SubsamplingLayer - CudnnSubsamplingHelper successfully initialized

    CUDA 10.2 and cuDNN 7.6.5 are now installed, and DL4J 1.0.0-beta6 runs normally on them.
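    The two log signals used above (the "ND4J CUDA build version" line and the cuDNN helper messages) can be checked automatically, for example over captured output from several machines. A sketch over log text; it only looks for the message fragments quoted in the logs above, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for the DL4J startup messages quoted in this article. */
final class Dl4jLogCheck {

    private static final Pattern CUDA_BUILD = Pattern.compile("ND4J CUDA build version: (\\S+)");
    private static final Pattern HELPER_FAILED = Pattern.compile("Could not initialize (\\w+)");

    /** Returns the CUDA build version printed by JCublasBackend, or null if absent. */
    static String cudaBuildVersion(String log) {
        Matcher m = CUDA_BUILD.matcher(log);
        return m.find() ? m.group(1) : null;
    }

    /** Collects helpers reported as failed, e.g. "CudnnConvolutionHelper". */
    static List<String> failedHelpers(String log) {
        List<String> failed = new ArrayList<>();
        Matcher m = HELPER_FAILED.matcher(log);
        while (m.find()) {
            failed.add(m.group(1));
        }
        return failed;
    }

    /** True if at least one cuDNN helper initialized and none failed. */
    static boolean cudnnActive(String log) {
        return log.contains("Helper successfully initialized") && failedHelpers(log).isEmpty();
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.2.89",
                "o.d.n.l.c.ConvolutionLayer - CudnnConvolutionHelper successfully initialized",
                "o.d.n.l.c.s.SubsamplingLayer - CudnnSubsamplingHelper successfully initialized");
        System.out.println("CUDA build: " + cudaBuildVersion(log));
        System.out.println("cuDNN active: " + cudnnActive(log));
    }
}
```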

3. Performance comparison

    Test program: https://github.com/eclipse/deeplearning4j-examples/tree/master/mvn-project-template (the LeNetMNIST example, which uses a LeNet network)

    Test environment:

       Operating system: Windows 10

       CPU: Intel Core i7-8700, 3.2 GHz, 6 cores / 12 threads

       GPU: GTX 1070 Ti

       Memory: 16 GB

       DL4J: 1.0.0-beta6

     Results (running time of the LeNet example):

       Environment                  Time (ms)
       CPU                              26566
       CUDA 9.2                         20725
       CUDA 9.2 + cuDNN 7.2.1           12575
       CUDA 10.2                        19953
       CUDA 10.2 + cuDNN 7.6.5          12574

     Observations:

     1. Without cuDNN, CUDA 9.2 and CUDA 10.2 perform about the same (20725 ms vs. 19953 ms).

     2. CUDA 9.2 + cuDNN 7.2.1 and CUDA 10.2 + cuDNN 7.6.5 are practically identical (12575 ms vs. 12574 ms).

     3. cuDNN brings a significant speedup: the same CUDA version runs about 1.6x faster with it (20725 / 12575 ≈ 1.65, 19953 / 12574 ≈ 1.59).

     4. The GPU with cuDNN is about twice as fast as the CPU (26566 / 12574 ≈ 2.11).
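    The ratios behind these observations can be recomputed directly from the table. A small sketch (times copied from the table above; the class name is illustrative) that prints each configuration's speedup over the CPU run, plus the gain cuDNN adds on top of CUDA 10.2:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Recomputes speedups from the LeNet timings measured in this article. */
final class Speedups {

    /** Speedup of a run relative to a baseline: baselineMs / runMs. */
    static double speedup(long baselineMs, long runMs) {
        return (double) baselineMs / runMs;
    }

    public static void main(String[] args) {
        Map<String, Long> timesMs = new LinkedHashMap<>(); // keeps the table's row order
        timesMs.put("CPU", 26566L);
        timesMs.put("CUDA 9.2", 20725L);
        timesMs.put("CUDA 9.2 + cuDNN 7.2.1", 12575L);
        timesMs.put("CUDA 10.2", 19953L);
        timesMs.put("CUDA 10.2 + cuDNN 7.6.5", 12574L);

        long cpu = timesMs.get("CPU");
        for (Map.Entry<String, Long> e : timesMs.entrySet()) {
            System.out.printf("%-24s %6d ms  %.2fx vs CPU%n",
                    e.getKey(), e.getValue(), speedup(cpu, e.getValue()));
        }
        // Effect of cuDNN on top of the same CUDA version
        System.out.printf("cuDNN gain on CUDA 10.2: %.2fx%n", speedup(19953, 12574));
    }
}
```

    With the table's numbers this gives about 2.11x over the CPU for either GPU + cuDNN configuration and about 1.59x for cuDNN on top of CUDA 10.2.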

   Special note: DL4J uses cuDNN to optimize only certain layer types, listed in the figure below:

    

 

Happiness comes from sharing.

   This post is the author's original work; please credit the source when reposting.

 

 
