Getting to Know CUDA
1. Heterogeneous Computing
1. Host: the CPU and its memory (host memory)
2. Device: the GPU and its memory (video/device memory)
2. Viewing CUDA Device Information
On ordinary graphics cards and servers, use nvidia-smi to view device parameters
On Jetson devices, use jtop to view device parameters
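As a quick sketch, nvidia-smi can also print selected fields in machine-readable form via its query mode (the exact fields chosen here are just examples):

```shell
# Full status table: driver version, GPU utilization, memory usage, running processes
nvidia-smi

# Query only specific fields as CSV (field names come from `nvidia-smi --help-query-gpu`)
nvidia-smi --query-gpu=name,driver_version,memory.total,memory.used --format=csv
```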
3. Programming
1. Prepare the data on the CPU and copy it from host memory to GPU memory
2. The GPU caches data on-chip, loads the GPU program (kernel), and executes it
3. Copy the computed results from GPU memory back to CPU memory
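The three steps above can be sketched as a minimal CUDA program (the kernel name, array size, and launch configuration are illustrative, not from the notes):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: adds 1.0f to each element. Executes on the device.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    // Step 1: allocate device memory and copy host data to the GPU.
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Step 2: launch the kernel on the device.
    addOne<<<(n + 127) / 128, 128>>>(dev, n);

    // Step 3: copy the results back from GPU memory to CPU memory.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);
    return 0;
}
```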
Keywords:
__global__
Declares a function as a kernel: it executes on the device and is called from the host (launched with the <<<...>>> syntax)
__device__
Execution space specifier: declares a function that executes on the device and can be called only from the device
__host__
Declares a function that executes on the host and is called from the host (this is the default when no specifier is given)
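A minimal sketch of how the three specifiers fit together (function names are hypothetical):

```cuda
#include <cstdio>

// __device__: runs on the GPU, callable only from GPU code.
__device__ float square(float x) { return x * x; }

// __global__: a kernel; runs on the GPU, launched from the host with <<<...>>>.
__global__ void squareKernel(float *out, const float *in) {
    int i = threadIdx.x;
    out[i] = square(in[i]); // device code may call __device__ functions
}

// __host__: runs on the CPU; equivalent to writing no specifier at all.
__host__ void launchExample(float *d_out, const float *d_in, int n) {
    squareKernel<<<1, n>>>(d_out, d_in);
}
```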
Writing CUDA code:
int main() executes on the host
__global__ functions execute on the device
Compilation of CUDA programs
CUDA code is compiled with nvcc:
first .cu files are compiled to .o object files, then the .o files are linked into an executable
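The two-stage build described above might look like this (file names are illustrative; nvcc drives both compilation and linking):

```shell
# Compile the CUDA source file into an object file
nvcc -c vector_add.cu -o vector_add.o

# Link the object file into an executable
nvcc vector_add.o -o vector_add

# Run: host code executes on the CPU, kernels execute on the GPU
./vector_add
```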
NVPROF
Profiling tool (on recent GPU architectures it has been superseded by Nsight Systems and Nsight Compute)
Profiling command:
nvprof -o out.nvvp a.exe (writes a profile file that can be opened in the NVIDIA Visual Profiler)
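Beyond exporting a profile file, nvprof can also print a summary directly to the terminal (a.exe stands in for any CUDA executable, as in the command above):

```shell
# Summary mode: on exit, prints time spent per kernel and per CUDA API call
nvprof a.exe

# Export a timeline profile for the NVIDIA Visual Profiler (nvvp)
nvprof -o out.nvvp a.exe
```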