"The Definitive Guide to CUDA C Programming" 03 - A Summary of Small CUDA Utility Functions

1. Timing

1.1 Linux

#include <sys/time.h>

double cpuSecond() {
	struct timeval tp;
	gettimeofday(&tp, NULL);
	return ((double)tp.tv_sec + (double)tp.tv_usec*1e-6);
}

// Usage
double start = cpuSecond();
kernel_name<<<grid, block>>>(argument list);
cudaDeviceSynchronize();  // kernel launches are asynchronous: explicitly wait for the kernel to finish
double cost = cpuSecond() - start;
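gettimeofday() reads the wall clock, which can jump if the system time is adjusted while the program runs. A minimal sketch of an alternative, assuming a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC; the helper name cpuSecondMonotonic is hypothetical:

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() in strict C modes */
#include <time.h>

/* Hypothetical helper: current time in seconds from a monotonic clock.
   CLOCK_MONOTONIC is not affected by changes to the system time, so the
   difference between two readings is safe to use as an elapsed time. */
double cpuSecondMonotonic(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

It is a drop-in replacement for cpuSecond() in the usage snippet; the cudaDeviceSynchronize() call before the second reading is still required.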

1.2 Windows

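#include <stdio.h>
#include <time.h>

// Usage
time_t begin, end;
time(&begin);
kernel_name<<<grid, block>>>(argument list);
cudaDeviceSynchronize();  // kernel launches are asynchronous: wait for completion before reading the clock
time(&end);
double elapsed = difftime(end, begin);  // time() only resolves whole seconds
printf("Time measured: %.0f seconds.\n", elapsed);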
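time() only has one-second resolution, which is too coarse for most kernels. A minimal sketch of a finer-grained, portable timer, assuming a C runtime that implements the C11 function timespec_get() (recent MSVC Universal CRT and glibc both do); the helper name wallSecond is hypothetical:

```c
#include <time.h>

/* Hypothetical helper: wall-clock time in seconds with sub-second
   precision, using C11 timespec_get(). Returns -1.0 if the call fails. */
double wallSecond(void) {
    struct timespec ts;
    /* timespec_get() returns its base argument on success, 0 on failure. */
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) {
        return -1.0;
    }
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Use it the same way as cpuSecond() in the Linux snippet: take one reading before the launch and one after cudaDeviceSynchronize(), then subtract.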

2. nvprof utility

nvprof is a feature-rich command-line profiler. It collects timeline information about an application's CPU and GPU activity, including kernel execution, memory transfers, and CUDA API calls. Run the following command to list its options:

nvprof --help

(1) If this command fails with an error saying that execution cannot continue because cupti64_2022.2.1.dll was not found...

Cause: nvprof depends on CUPTI (the CUDA Profiling Tools Interface), whose DLL is located in:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\extras\CUPTI\lib64

This directory is not on the system PATH, so Windows cannot find the DLL.

Solution: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin is already on the PATH, so copy cupti64_2022.2.1.dll into that bin directory. Alternatively, add the CUPTI lib64 directory above to the PATH.

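(2) If compiling fails with the error "Cannot find compiler 'cl.exe' in PATH", for example when running

nvcc kernel.cu -o kernel

then nvcc cannot find the Microsoft Visual C++ host compiler. Add the directory containing cl.exe, for example C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin, to the system PATH (adjust the path to the Visual Studio version you have installed).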

Example of usage:

nvprof ./kernel

(1) The output lists each operation (kernels, memory transfers, CUDA API calls) along with how many times it was called, its average, minimum, and maximum time, and its share of the total time;

(2) In this run, cudaMalloc takes the most time: it was called 3 times, with a minimum of 2 us, a maximum of 259 ms, and an average of 86 ms. Since 3 × 86 ms ≈ 259 ms, nearly all of that time comes from a single call.
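To make the summary columns concrete, here is a minimal sketch that derives call count, total, average, minimum, maximum, and percentage share from a list of per-call durations. The CallStats struct and the summarize/percentOf helpers are hypothetical illustrations, not part of nvprof:

```c
#include <stddef.h>

/* Hypothetical record mirroring the per-operation columns nvprof prints. */
typedef struct {
    double total;   /* sum of all call durations */
    size_t calls;   /* number of calls */
    double avg;     /* total / calls */
    double min;     /* shortest call */
    double max;     /* longest call */
} CallStats;

/* Aggregate per-call durations (any consistent unit, e.g. milliseconds). */
CallStats summarize(const double *durations, size_t n) {
    CallStats s = {0.0, n, 0.0, 0.0, 0.0};
    if (n == 0) {
        return s;
    }
    s.min = durations[0];
    s.max = durations[0];
    for (size_t i = 0; i < n; ++i) {
        s.total += durations[i];
        if (durations[i] < s.min) s.min = durations[i];
        if (durations[i] > s.max) s.max = durations[i];
    }
    s.avg = s.total / (double)n;
    return s;
}

/* One operation's share of the overall time, in percent. */
double percentOf(double part, double whole) {
    return whole > 0.0 ? 100.0 * part / whole : 0.0;
}
```

When the average is close to max / calls, as in the cudaMalloc row, one call accounts for almost the whole total.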

To be continued...


Origin blog.csdn.net/jizhidexiaoming/article/details/132027586