Installing CUDA and cuDNN on Ubuntu and verifying installation steps
This tutorial details how to install CUDA (NVIDIA's parallel computing platform) and cuDNN (deep neural network library) on the Ubuntu operating system, and how to verify whether the installation was successful. By following these steps, you will be able to configure your system to take advantage of GPU acceleration for deep learning and other compute-intensive tasks. In addition, it also includes how to set environment variables and compile and run sample code to verify the normal operation of CUDA and cuDNN.
- Install CUDA
- Install CUDA via web repository (for Ubuntu)
- Configure environment variables
- Verify installation
- Install cuDNN
- Verify cuDNN
Install CUDA
Before installing CUDA, we need to perform some pre-installation operations. First, you need to install the header files and development packages for the currently running kernel. Open a terminal and execute the following command:
sudo apt-get install linux-headers-$(uname -r)
Next, you need to delete the outdated signing key:
sudo apt-key del 7fa2af80
Install CUDA via web repository (for Ubuntu)
The GPG public key for the new CUDA repository is 3bf863cc
. You can cuda-keyring
add it to your system via a package or manually, using apt-key
commands is not recommended. Follow these steps:
- Install new
cuda-keyring
packages. Replace according to your system version$distro/$arch
:
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
$distro/$arch
Replacement should be based on one of the following options:
ubuntu1604/x86_64
: Applicable to Ubuntu 16.04 64-bit version.ubuntu1804/cross-linux-sbsa
: Applicable to Ubuntu 18.04 cross-compiled version (SBSA architecture).ubuntu1804/ppc64el
: Applicable to Ubuntu 18.04 64-bit PowerPC architecture version.
*ubuntu1804/sbsa
: Applicable to Ubuntu 18.04 SBSA architecture version.ubuntu1804/x86_64
: Applicable to Ubuntu 18.04 64-bit version.ubuntu2004/cross-linux-aarch64
: Applicable to Ubuntu 20.04 cross-compiled version (AArch64 architecture).ubuntu2004/arm64
: Applicable to Ubuntu 20.04 64-bit ARM architecture version.ubuntu2004/cross-linux-sbsa
: Applicable to Ubuntu 20.04 cross-compiled version (SBSA architecture).ubuntu2004/sbsa
: Applicable to Ubuntu 20.04 SBSA architecture version.ubuntu2004/x86_64
: Applicable to Ubuntu 20.04 64-bit version.ubuntu2204/sbsa
: Applicable to Ubuntu 22.04 SBSA architecture version.ubuntu2204/x86_64
: Applicable to Ubuntu 22.04 64-bit version.
Select the appropriate alternative based on your Ubuntu version and architecture to perform the corresponding installation steps.
- Update Apt repository cache:
sudo apt-get update
- Install CUDA SDK:
You can get the list of available CUDA packages using the following command:
cat /var/lib/apt/lists/*cuda*Packages | grep "Package:"
Or check out the list below:
Meta Package | Purpose |
---|---|
cuda | Installs all CUDA Toolkit and Driver packages. Handles upgrading to the next version of the cuda package when it’s released. |
cuda-12-2 | Installs all CUDA Toolkit and Driver packages. Remains at version 12.1 until an additional version of CUDA is installed. |
cuda-toolkit-12-2 | Installs all CUDA Toolkit packages required to develop CUDA applications. Does not include the driver. |
cuda-toolkit-12 | Installs all CUDA Toolkit packages required to develop applications. Will not upgrade beyond the 12.x series toolkits. Does not include the driver. |
cuda-toolkit | Installs all CUDA Toolkit packages required to develop applications. Handles upgrading to the next 12.x version of CUDA when it’s released. Does not include the driver. |
cuda-tools-12-2 | Installs all CUDA command line and visual tools. |
cuda-runtime-12-2 | Installs all CUDA Toolkit packages required to run CUDA applications, as well as the Driver packages. |
cuda-compiler-12-2 | Installs all CUDA compiler packages. |
cuda-libraries-12-2 | Installs all runtime CUDA Library packages. |
cuda-libraries-dev-12-2 | Installs all development CUDA Library packages. |
cuda-drivers | Installs all Driver packages. Handles upgrading to the next version of the Driver packages when they’re released. |
Select the package you need to install, here select cuda-11.8
sudo apt-get install cuda-11-8
This installation package contains the graphics card driver. During the installation process, you will be asked to enter a password. Please remember this password, which will be used later when you restart the computer and enter the Perform MOK managment .
- After the installation is complete, restart the system:
sudo reboot
Configure Perform MOK managment
Select Enroll MOK
(register) -> Select Continue
-> Select Enroll the key
-> Select Yes
-> Type the password entered in step 3 -> Select Reboot
Restart the computer to complete the NVIDIA graphics card driver installation.
Configure environment variables
- Use to
vim
edit~/.bashrc
the file.
sudo vim ~/.bashrc
- Add the following at the end of the file:
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64\${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
${PATH:+:${PATH}}
Is a special syntax in the Bash Shell used to set environment variables. What it does is that when adding a new path to an environment variable, it ensures that if the original variable (in this case$PATH
) already contains some paths, then the new path is added at the end of the original path and they are separated by a colon: .
Specifically,${PATH:+:${PATH}}
the meaning is:
if$PATH
is already defined (non-empty), then it prepends the new path with a colon: before adding the new path.
If$PATH
is undefined or empty, it will just add the new path without adding a colon.
The purpose of this syntax is to ensure that when$PATH
adding new paths to , you keep the paths separated by colons to ensure proper formatting of environment variables. This is useful in setting many environment variables, as it avoids errors caused by missing separators between paths.
LD_LIBRARY_PATH is an environment variable that specifies the path where the dynamic linker searches for shared library files (dynamic link libraries or shared object files) when running an executable file. In Linux and Unix-like systems, shared library files are included in various programs, allowing multiple programs to share the same library, thereby reducing memory usage and improving the efficiency of the system.
- Refresh the configuration
Run the following command in the terminal to make the new environment variable settings take effect:
source ~/.bashrc
Verify installation
First, we need to install some third-party libraries required by the CUDA examples. These examples will usually detect the required libraries during the build process, but if they are not detected, you will need to install them manually. Open a terminal and execute the following command:
sudo apt-get install g++ freeglut3-dev build-essential libx11-dev \
libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev
After completing the installation of third-party library dependencies, download the https://github.com/nvidia/cuda-samples source code from github.
After the download is complete, you can compile it using the following command:
cd cuda-sample
sudo make
Pay attention to switch to the branch where you installed the cuda version, here it is v11.8.
If the entire compilation can be completed, it means there is no problem in the installation process.
Execute ./bin/x86_64/linux/release/deviceQuery
the command in the source code directory, the results are as follows:
cheungxiongwei@root:~/Source/cuda-samples$ ./bin/x86_64/linux/release/deviceQuery
./bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
CUDA Driver Version / Runtime Version 12.2 / 11.8
CUDA Capability Major/Minor version number: 8.9
Total amount of global memory: 7940 MBytes (8325824512 bytes)
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
(024) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
GPU Max Clock rate: 2250 MHz (2.25 GHz)
Memory Clock rate: 8001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 33554432 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 102400 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1
Result = PASS
Install cuDNN
Install cuDNN library and cuDNN example
sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-samples=${cudnn_version}-1+${cuda_version}
Replace according to:
${cudnn_version}
is 8.9.4.*
${cuda_version}
is cuda12.2 or cuda11.8
Use the following command to find package information related to cuDNN version "libcudnn8"
cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
The output is as follows:
cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
Filename: ./libcudnn8_8.5.0.96-1+cuda11.7_amd64.deb
Filename: ./libcudnn8-dev_8.5.0.96-1+cuda11.7_amd64.deb
Filename: ./libcudnn8_8.6.0.163-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.6.0.163-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.7.0.84-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.7.0.84-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.0.121-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.0.121-1+cuda12.0_amd64.deb
Filename: ./libcudnn8-dev_8.8.0.121-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.8.0.121-1+cuda12.0_amd64.deb
Filename: ./libcudnn8_8.8.1.3-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
Filename: ./libcudnn8-dev_8.8.1.3-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
Filename: ./libcudnn8_8.9.0.131-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.0.131-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.0.131-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.0.131-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda12.2_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda12.2_amd64.deb
Filename: ./libcudnn8-samples_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.4.25-1+cuda12.2_amd64.deb
Select the latest one here cudnn 8.9.4.25
and cuda 11.8
replace with and . The complete instruction after replacement is as follows:
sudo apt-get install libcudnn8=8.9.4.25-1+cuda11.8
sudo apt-get install libcudnn8-dev=8.9.4.25-1+cuda11.8
sudo apt-get install libcudnn8-samples=8.9.4.25-1+cuda11.8
Verify cuDNN
To verify that cuDNN is installed and functioning properly, compile the mnistCUDNN samples in the `/usr/src/cudnn_samples_v8` directory.
- Copy the cuDNN example to the current user directory
cp -r /usr/src/cudnn_samples_v8/ $HOME
- Move to cuDNN examples directory
cd $HOME/cudnn_samples_v8/mnistCUDNN
- Compile cuDNN mnisiCUDNN example
$make clean && make
If an error is reported and the FreeImage.h file is not found, please execute the `sudo apt-get install libfreeimage-dev` command to install the dependency.
- Run the mnistCUDNN example
./mnistCUDNN
If cuDNN is correctly installed and compiled & run on your Linux system, you will see a message similar to the following:
heungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8904 , CUDNN_VERSION from cudnn.h : 8904 (8.9.4)
Host compiler version : GCC 11.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 24 Capabilities 8.9, SmClock 2250.0 Mhz, MemSize (Mb) 7940, MemClock 8001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.032992 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.047104 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.051200 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049152 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.058368 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063648 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.130112 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.007328 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.024576 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.026624 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025376 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.030720 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.036864 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063488 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.021504 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.022592 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.025600 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.033792 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.074752 time requiring 4608 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.031744 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.040960 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.069632 time requiring 1536 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.009216 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.021312 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.023552 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.024352 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.029696 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065536 time requiring 1536 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!