VS2022 CUDA environment configuration

Installation preparation

Configuring the CUDA environment involves the following steps:

  1. Install VS. This should go without saying; just install the latest version.
  2. Install CUDA. Download address: CUDA Toolkit
  3. Install cuDNN. Download address: cuDNN Archive

The installation order matters: VS must be installed first and the CUDA Toolkit second; otherwise the CUDA installer cannot add its project templates to the VS directory.

To decide which CUDA version to install, run nvidia-smi on the command line. The output looks like the following. The CUDA Version field in the header is the highest CUDA version that the installed driver supports, not the version of an installed toolkit. Mine shows 12.1, so I chose CUDA Toolkit 12.1.0, and for cuDNN the latest release at the time, v8.8.1 for CUDA 12.x.

>nvidia-smi
Tue Apr 25 11:52:50 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 531.14                 Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                      TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 L...  WDDM | 00000000:01:00.0  On |                  N/A |
| N/A   36C    P8                3W /  N/A|    250MiB /  8188MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     10556    C+G   ...auncher\PowerToys.PowerLauncher.exe    N/A      |
|    0   N/A  N/A     10980    C+G   ...rPicker\PowerToys.ColorPickerUI.exe    N/A      |
+---------------------------------------------------------------------------------------+

The first step of the CUDA Toolkit installation is extraction, for which you can choose a temporary location. The actual installation does not start until the NVIDIA installer window appears. During installation you can choose the Custom option and pick the installation path yourself; there is nothing else worth noting. After the installation succeeds, run nvcc -V on the command line to check the version information.

>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
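
The same check can be made from code. The following is a minimal sketch, assuming the toolkit is installed and nvcc can find the VS compiler (for example from a VS developer command prompt). The helpers versionMajor and versionMinor are illustrative names, not part of CUDA; they decode the integer that the runtime reports, which is 1000 * major + 10 * minor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helpers: CUDA encodes versions as 1000 * major + 10 * minor.
static int versionMajor(int v) { return v / 1000; }
static int versionMinor(int v) { return (v % 1000) / 10; }

int main()
{
    int driverVersion = 0;   // highest CUDA version the installed driver supports
    int runtimeVersion = 0;  // version of the CUDA runtime this program uses

    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Driver supports CUDA %d.%d\n",
           versionMajor(driverVersion), versionMinor(driverVersion));
    printf("Runtime is CUDA %d.%d\n",
           versionMajor(runtimeVersion), versionMinor(runtimeVersion));
    return 0;
}
```

Saved as, say, version_check.cu (a hypothetical file name), it can be built with nvcc version_check.cu -o version_check. The driver value corresponds to what nvidia-smi shows and the runtime value to what nvcc -V shows, so the first should be at least as large as the second.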

Downloading cuDNN requires registration. After the download is complete, unzip the three folders in the archive into the CUDA installation directory, and cuDNN is ready to use.
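
One way to confirm that the cuDNN files landed in the right place is to compile a tiny program against them. This is only a sketch, assuming the header cudnn.h and the cudnn import library are now visible to nvcc through the CUDA installation directory; the file name used below is hypothetical.

```cuda
#include <cstdio>
#include <cudnn.h>

int main()
{
    // CUDNN_VERSION comes from the header that was copied in;
    // cudnnGetVersion() comes from the library that is loaded at run time.
    printf("cuDNN header version:  %zu\n", (size_t)CUDNN_VERSION);
    printf("cuDNN library version: %zu\n", cudnnGetVersion());

    // Creating a handle exercises the library and the GPU driver together.
    cudnnHandle_t handle;
    cudnnStatus_t status = cudnnCreate(&handle);
    if (status != CUDNN_STATUS_SUCCESS) {
        fprintf(stderr, "cudnnCreate failed: %s\n", cudnnGetErrorString(status));
        return 1;
    }
    cudnnDestroy(handle);
    printf("cuDNN handle created and destroyed successfully\n");
    return 0;
}
```

A build command along the lines of nvcc check_cudnn.cu -o check_cudnn -lcudnn should work if the three folders were copied correctly. A missing-header or missing-library error at this point usually means they were not, and a mismatch between the two printed versions suggests that files from different cuDNN releases are mixed.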

New Project

After the installation succeeds, open VS2022 and you will see the CUDA 12.1 project template. Select it to create a CUDA program.

The basic programming logic of CUDA, and of the GPU in general, is to first copy arrays from host memory into video memory and then perform the computation there.

The built-in template program is very simple: it sums two arrays in parallel using a custom function.
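
The copy step on its own can be sketched as follows. This is a minimal illustration, not code from the template: roundTrip is a hypothetical helper that allocates video memory, copies an array to the GPU, and copies it straight back, with the computation step left out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies `count` ints to the GPU and back into `dst`.
// Returns true only if every CUDA call succeeds.
bool roundTrip(const int *src, int *dst, int count)
{
    int *dev = nullptr;
    size_t bytes = count * sizeof(int);

    // Reserve space in video (device) memory.
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) {
        return false;
    }

    // Host memory -> video memory, then video memory -> host memory.
    bool ok = cudaMemcpy(dev, src, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dst, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dev);
    return ok;
}

int main()
{
    const int hostIn[5] = {1, 2, 3, 4, 5};
    int hostOut[5] = {0};

    if (!roundTrip(hostIn, hostOut, 5)) {
        fprintf(stderr, "round trip through video memory failed\n");
        return 1;
    }

    for (int i = 0; i < 5; ++i) {
        printf("%d ", hostOut[i]);
    }
    printf("\n");
    return 0;
}
```

In a real program a kernel launch would sit between the two cudaMemcpy calls and operate on the device pointer.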

__global__ void addKernel(int *c, const int *a, const int *b)
{
    // Each thread handles one element, selected by its index within the block.
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

When the kernel is called, however, the <<<>>> syntax is used to specify the thread blocks launched on the GPU. The sample program calls the custom function addKernel with the following code, which launches 1 thread block containing 5 threads.

//size=5
addKernel<<<1, size>>>(dev_c, dev_a, dev_b);
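
Putting the pieces together, a complete program around this launch might look like the sketch below. It follows the same flow as described above (copy to video memory, launch, copy back) but is not the template's exact code: addOnGpu is a hypothetical wrapper, and it assumes size does not exceed the per-block thread limit, since only one block is launched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;  // one thread per array element
    c[i] = a[i] + b[i];
}

// Hypothetical wrapper: computes c = a + b on the GPU for `size` elements.
// Assumes `size` fits into a single thread block.
cudaError_t addOnGpu(int *c, const int *a, const int *b, unsigned int size)
{
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    size_t bytes = size * sizeof(int);

    // Allocate video memory for the two inputs and the output.
    cudaError_t err = cudaMalloc((void **)&dev_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_c, bytes);

    // Copy the inputs from host memory to video memory.
    if (err == cudaSuccess) err = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // 1 thread block with `size` threads.
        addKernel<<<1, size>>>(dev_c, dev_a, dev_b);
        err = cudaGetLastError();  // reports launch failures
    }

    // Wait for the kernel to finish, then copy the result back to the host.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return err;
}

int main()
{
    const unsigned int size = 5;
    const int a[size] = {1, 2, 3, 4, 5};
    const int b[size] = {10, 20, 30, 40, 50};
    int c[size] = {0};

    cudaError_t err = addOnGpu(c, a, b, size);
    if (err != cudaSuccess) {
        fprintf(stderr, "addOnGpu failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (unsigned int i = 0; i < size; ++i) {
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }
    return 0;
}
```

Because threadIdx.x only counts threads inside one block, this kernel is limited to arrays that fit in a single block. Larger arrays would need several blocks and an index that also takes blockIdx.x and blockDim.x into account.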

Origin blog.csdn.net/m0_37816922/article/details/130364769