GPU High-Performance Programming with CUDA: Introduction to CUDA C


Writing a first hello world program in CUDA C is no different from writing one in standard C.

We refer to the CPU and the system's memory as the host, and to the GPU and its memory as the device. The hello world sample program does not involve any computing device other than the host.

Kernel function call

__global__ void kernel( void ) {
}

int main( void ) {
    kernel<<<1,1>>>();
    return 0;
}

The kernel() function carries the qualifier __global__, which tells the compiler that the function should be compiled to run on the device instead of the host. The main() function is handed over to the host compiler, just as in ordinary C.

Why does the call to kernel() need angle brackets and numeric values?

CUDA C needs some syntax to mark a function as "device code", so that the build tools can send host code to one compiler and device code to another. The CUDA compiler and runtime then take care of invoking the device code from the host code.

So this call actually launches device code. The angle brackets pass parameters to the runtime system; these are not arguments to the device code, but tell the runtime how to launch the device code. Arguments to the device code itself are enclosed in the parentheses, as usual.

Aside from the angle-bracket syntax, passing arguments to a kernel looks exactly like any function call in standard C. At runtime, the system takes care of everything needed to pass the arguments from the host to the device.
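As a sketch, the launch configuration in angle brackets and the kernel argument in parentheses can be seen side by side below (the kernel name and the value 42 are illustrative, not from the original sample):

```cuda
#include <stdio.h>

// The argument in parentheses is an ordinary parameter of the kernel.
__global__ void greet( int value ) {
    printf( "value on device: %d\n", value );
}

int main( void ) {
    // <<<1,1>>> tells the runtime to launch one block of one thread;
    // ( 42 ) is the argument passed to the device code itself.
    greet<<<1,1>>>( 42 );
    cudaDeviceSynchronize();    // wait for the kernel so its printf is flushed
    return 0;
}
```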

For the device to do any useful work, such as returning a computed value to the host, memory must be allocated on the device. This is done with cudaMalloc(), which tells the CUDA runtime to allocate memory on the device. The first parameter is the address of a pointer that will hold the address of the newly allocated memory; the second parameter is the size of the allocation in bytes. cudaMalloc() behaves like malloc() except that the allocated pointer is not the function's return value: the device address is written through the first (void**) parameter, and the return value is a status code instead.


dev_c is a pointer stored on the host, that is, the pointer variable itself lives in host memory, but it holds the address of a buffer allocated on the device. The statement *c = a + b adds the values of the parameters a and b and stores the result in the device memory that c points to.
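A minimal sketch of the add example this passage describes, following the structure of the classic CUDA by Example program (error checking omitted for brevity):

```cuda
#include <stdio.h>

__global__ void add( int a, int b, int *c ) {
    *c = a + b;                  // runs on the device; c points to device memory
}

int main( void ) {
    int c;
    int *dev_c;

    cudaMalloc( (void**)&dev_c, sizeof(int) );   // allocate memory on the device
    add<<<1,1>>>( 2, 7, dev_c );                 // launch the kernel on the device
    cudaMemcpy( &c, dev_c, sizeof(int),
                cudaMemcpyDeviceToHost );        // copy the result back to the host
    printf( "2 + 7 = %d\n", c );
    cudaFree( dev_c );                           // free the device memory
    return 0;
}
```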


You cannot dereference the pointer returned by cudaMalloc() in host code. You can pass this pointer around as a parameter and perform arithmetic on it, but you cannot use it to read or write memory from the host.

cudaMemcpy() copies memory, much like memcpy() in C, except that it takes one extra parameter specifying the direction of the copy, i.e. whether the device pointer is the source or the destination. With cudaMemcpyDeviceToHost, the source pointer is a device pointer and the destination pointer is a host pointer. The first parameter is the address of a host variable (e.g. &c) rather than a device pointer because the destination of this copy lives in host memory.

The rules for using device pointers are as follows:

You can pass pointers allocated with cudaMalloc() to functions that execute on the device.

You can use pointers allocated with cudaMalloc() to read or write memory in device code.

You can pass pointers allocated with cudaMalloc() to functions that execute on the host.

You cannot use pointers allocated with cudaMalloc() to read or write memory in host code.

Use cudaFree() to free memory allocated by cudaMalloc()

Device query

int count;
cudaGetDeviceCount( &count );

Returns the number of CUDA-capable devices.

cudaDeviceProp prop;
cudaGetDeviceProperties( &prop, 0 );

Returns the properties of device 0.
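Putting the two calls together, a sketch of a full device query looks like this (the printed fields name, major, minor, and totalGlobalMem are members of the cudaDeviceProp structure):

```cuda
#include <stdio.h>

int main( void ) {
    int count;
    cudaGetDeviceCount( &count );        // how many CUDA devices are present

    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties( &prop, i );   // fill prop for device i
        printf( "Device %d: %s\n", i, prop.name );
        printf( "  Compute capability: %d.%d\n", prop.major, prop.minor );
        printf( "  Total global memory: %zu bytes\n", prop.totalGlobalMem );
    }
    return 0;
}
```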







 
 
