Installing CUDA and cuDNN on Ubuntu and verifying installation steps

Installing CUDA and cuDNN on Ubuntu and verifying installation steps

This tutorial details how to install CUDA (NVIDIA's parallel computing platform) and cuDNN (deep neural network library) on the Ubuntu operating system, and how to verify whether the installation was successful. By following these steps, you will be able to configure your system to take advantage of GPU acceleration for deep learning and other compute-intensive tasks. In addition, it also includes how to set environment variables and compile and run sample code to verify the normal operation of CUDA and cuDNN.

Install CUDA

Before installing CUDA, we need to perform some pre-installation operations. First, you need to install the header files and development packages for the currently running kernel. Open a terminal and execute the following command:

sudo apt-get install linux-headers-$(uname -r)

Next, you need to delete the outdated signing key:

sudo apt-key del 7fa2af80

Install CUDA via web repository (for Ubuntu)

The GPG public key for the new CUDA repository is 3bf863cc. You can cuda-keyringadd it to your system via a package or manually, using apt-keycommands is not recommended. Follow these steps:

  1. Install new cuda-keyringpackages. Replace according to your system version $distro/$arch:
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

$distro/$archReplacement should be based on one of the following options:

  • ubuntu1604/x86_64: Applicable to Ubuntu 16.04 64-bit version.
  • ubuntu1804/cross-linux-sbsa: Applicable to Ubuntu 18.04 cross-compiled version (SBSA architecture).
  • ubuntu1804/ppc64el: Applicable to Ubuntu 18.04 64-bit PowerPC architecture version.
    * ubuntu1804/sbsa: Applicable to Ubuntu 18.04 SBSA architecture version.
  • ubuntu1804/x86_64: Applicable to Ubuntu 18.04 64-bit version.
  • ubuntu2004/cross-linux-aarch64: Applicable to Ubuntu 20.04 cross-compiled version (AArch64 architecture).
  • ubuntu2004/arm64: Applicable to Ubuntu 20.04 64-bit ARM architecture version.
  • ubuntu2004/cross-linux-sbsa: Applicable to Ubuntu 20.04 cross-compiled version (SBSA architecture).
  • ubuntu2004/sbsa: Applicable to Ubuntu 20.04 SBSA architecture version.
  • ubuntu2004/x86_64: Applicable to Ubuntu 20.04 64-bit version.
  • ubuntu2204/sbsa: Applicable to Ubuntu 22.04 SBSA architecture version.
  • ubuntu2204/x86_64: Applicable to Ubuntu 22.04 64-bit version.
    Select the appropriate alternative based on your Ubuntu version and architecture to perform the corresponding installation steps.
  1. Update Apt repository cache:
sudo apt-get update
  1. Install CUDA SDK:
    You can get the list of available CUDA packages using the following command:
cat /var/lib/apt/lists/*cuda*Packages | grep "Package:"

Or check out the list below:

Meta Package Purpose
cuda Installs all CUDA Toolkit and Driver packages. Handles upgrading to the next version of the cuda package when it’s released.
cuda-12-2 Installs all CUDA Toolkit and Driver packages. Remains at version 12.1 until an additional version of CUDA is installed.
cuda-toolkit-12-2 Installs all CUDA Toolkit packages required to develop CUDA applications. Does not include the driver.
cuda-toolkit-12 Installs all CUDA Toolkit packages required to develop applications. Will not upgrade beyond the 12.x series toolkits. Does not include the driver.
cuda-toolkit Installs all CUDA Toolkit packages required to develop applications. Handles upgrading to the next 12.x version of CUDA when it’s released. Does not include the driver.
cuda-tools-12-2 Installs all CUDA command line and visual tools.
cuda-runtime-12-2 Installs all CUDA Toolkit packages required to run CUDA applications, as well as the Driver packages.
cuda-compiler-12-2 Installs all CUDA compiler packages.
cuda-libraries-12-2 Installs all runtime CUDA Library packages.
cuda-libraries-dev-12-2 Installs all development CUDA Library packages.
cuda-drivers Installs all Driver packages. Handles upgrading to the next version of the Driver packages when they’re released.

Select the package you need to install, here select cuda-11.8

sudo apt-get install cuda-11-8

This installation package contains the graphics card driver. During the installation process, you will be asked to enter a password. Please remember this password, which will be used later when you restart the computer and enter the Perform MOK managment .

  1. After the installation is complete, restart the system:
sudo reboot

Configure Perform MOK managment
MOK management
Select Enroll MOK(register) -> Select Continue-> Select Enroll the key-> Select Yes-> Type the password entered in step 3 -> Select RebootRestart the computer to complete the NVIDIA graphics card driver installation.

Configure environment variables

  1. Use to vimedit ~/.bashrcthe file.
sudo vim ~/.bashrc
  1. Add the following at the end of the file:
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64\${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

${PATH:+:${PATH}}Is a special syntax in the Bash Shell used to set environment variables. What it does is that when adding a new path to an environment variable, it ensures that if the original variable (in this case $PATH) already contains some paths, then the new path is added at the end of the original path and they are separated by a colon: .
Specifically, ${PATH:+:${PATH}}the meaning is:
if $PATHis already defined (non-empty), then it prepends the new path with a colon: before adding the new path.
If $PATHis undefined or empty, it will just add the new path without adding a colon.
The purpose of this syntax is to ensure that when $PATHadding new paths to , you keep the paths separated by colons to ensure proper formatting of environment variables. This is useful in setting many environment variables, as it avoids errors caused by missing separators between paths.

LD_LIBRARY_PATH is an environment variable that specifies the path where the dynamic linker searches for shared library files (dynamic link libraries or shared object files) when running an executable file. In Linux and Unix-like systems, shared library files are included in various programs, allowing multiple programs to share the same library, thereby reducing memory usage and improving the efficiency of the system.

  1. Refresh the configuration
    Run the following command in the terminal to make the new environment variable settings take effect:
source ~/.bashrc

Verify installation

First, we need to install some third-party libraries required by the CUDA examples. These examples will usually detect the required libraries during the build process, but if they are not detected, you will need to install them manually. Open a terminal and execute the following command:

sudo apt-get install g++ freeglut3-dev build-essential libx11-dev \
    libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev

After completing the installation of third-party library dependencies, download the https://github.com/nvidia/cuda-samples source code from github.

After the download is complete, you can compile it using the following command:

cd cuda-sample
sudo make

Pay attention to switch to the branch where you installed the cuda version, here it is v11.8.

If the entire compilation can be completed, it means there is no problem in the installation process.

Execute ./bin/x86_64/linux/release/deviceQuerythe command in the source code directory, the results are as follows:

cheungxiongwei@root:~/Source/cuda-samples$ ./bin/x86_64/linux/release/deviceQuery
./bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
  CUDA Driver Version / Runtime Version          12.2 / 11.8
  CUDA Capability Major/Minor version number:    8.9
  Total amount of global memory:                 7940 MBytes (8325824512 bytes)
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
  (024) Multiprocessors, (128) CUDA Cores/MP:    3072 CUDA Cores
  GPU Max Clock rate:                            2250 MHz (2.25 GHz)
  Memory Clock rate:                             8001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 33554432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1
Result = PASS

Install cuDNN

Install cuDNN library and cuDNN example

sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-samples=${cudnn_version}-1+${cuda_version}

Replace according to:
${cudnn_version}is 8.9.4.*
${cuda_version}is cuda12.2 or cuda11.8

Use the following command to find package information related to cuDNN version "libcudnn8"

cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"

The output is as follows:

cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
Filename: ./libcudnn8_8.5.0.96-1+cuda11.7_amd64.deb
Filename: ./libcudnn8-dev_8.5.0.96-1+cuda11.7_amd64.deb
Filename: ./libcudnn8_8.6.0.163-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.6.0.163-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.7.0.84-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.7.0.84-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.0.121-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.0.121-1+cuda12.0_amd64.deb
Filename: ./libcudnn8-dev_8.8.0.121-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.8.0.121-1+cuda12.0_amd64.deb
Filename: ./libcudnn8_8.8.1.3-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
Filename: ./libcudnn8-dev_8.8.1.3-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
Filename: ./libcudnn8_8.9.0.131-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.0.131-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.0.131-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.0.131-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda12.2_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda12.2_amd64.deb
Filename: ./libcudnn8-samples_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.4.25-1+cuda12.2_amd64.deb

Select the latest one here cudnn 8.9.4.25and cuda 11.8replace with and . The complete instruction after replacement is as follows:

sudo apt-get install libcudnn8=8.9.4.25-1+cuda11.8
sudo apt-get install libcudnn8-dev=8.9.4.25-1+cuda11.8
sudo apt-get install libcudnn8-samples=8.9.4.25-1+cuda11.8

Verify cuDNN

To verify that cuDNN is installed and functioning properly, compile the mnistCUDNN samples in the `/usr/src/cudnn_samples_v8` directory.

  1. Copy the cuDNN example to the current user directory
cp -r /usr/src/cudnn_samples_v8/ $HOME
  1. Move to cuDNN examples directory
cd  $HOME/cudnn_samples_v8/mnistCUDNN
  1. Compile cuDNN mnisiCUDNN example
$make clean && make

If an error is reported and the FreeImage.h file is not found, please execute the `sudo apt-get install libfreeimage-dev` command to install the dependency.

  1. Run the mnistCUDNN example
 ./mnistCUDNN

If cuDNN is correctly installed and compiled & run on your Linux system, you will see a message similar to the following:

heungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8904 , CUDNN_VERSION from cudnn.h : 8904 (8.9.4)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 24  Capabilities 8.9, SmClock 2250.0 Mhz, MemSize (Mb) 7940, MemClock 8001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.032992 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.047104 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.051200 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049152 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.058368 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063648 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.130112 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.007328 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.024576 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.026624 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025376 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.030720 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.036864 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063488 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.021504 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.022592 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.025600 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.033792 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.074752 time requiring 4608 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.031744 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.040960 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.069632 time requiring 1536 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.009216 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.021312 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.023552 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.024352 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.029696 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065536 time requiring 1536 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Guess you like

Origin blog.csdn.net/cheungxiongwei/article/details/132655076