VS2017 CUDA Programming Notes 1: Adding Two Variables with CUDA


Preface

Today I started learning CUDA programming. I'm keeping my notes here and sharing them with everyone.


1. CUDA Programming Basics

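(1) A CUDA program copies data from CPU memory into GPU device memory, performs the computation efficiently on the GPU, and finally copies the result back to the CPU, where it is printed.
(2) CUDA separates GPU functions (device code) from CPU functions (host code) with function qualifiers. The "__global__" qualifier (two underscores on each side) marks a kernel: a function that runs on the GPU and is launched from host code. "__device__" marks a GPU function that can only be called from GPU code, and a function with no qualifier is an ordinary host function.
(3) The CUDA APIs cudaMalloc() and cudaFree() allocate and free memory in GPU device memory, much like malloc() and free() on the CPU; cudaMemcpy() copies data between the CPU (host) and the GPU (device), much like memcpy() on the CPU.
(4) When calling a kernel you must specify the number of blocks and the number of threads per block. This is called the kernel's execution configuration. For example, gpuAdd<<<1,1>>>(…) launches 1 block with 1 thread per block.
(5) Kernel launch: host code (running on the CPU) calls device code (running on the GPU).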
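To make points (2), (4) and (5) concrete, here is a minimal sketch, assuming a CUDA toolkit and a single GPU. It shows the three kinds of functions (a "__device__" helper, a "__global__" kernel, and a plain host function) and an execution configuration with more than one block. The names globalIndex, writeGlobalIndex and runWriteGlobalIndex are hypothetical, chosen only for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// __device__: runs on the GPU and can only be called from GPU code.
// (hypothetical helper for illustration)
__device__ int globalIndex()
{
    // blockIdx, blockDim and threadIdx are built-in variables filled in by the launch.
    return blockIdx.x * blockDim.x + threadIdx.x;
}

// __global__: a kernel. Runs on the GPU and is launched from host code with <<<...>>>.
__global__ void writeGlobalIndex(int *d_out)
{
    int i = globalIndex();
    d_out[i] = i;   // each thread writes its own global index
}

// No qualifier (equivalent to __host__): runs on the CPU.
// Launches blocks * threadsPerBlock threads and copies the result into h_out.
cudaError_t runWriteGlobalIndex(int blocks, int threadsPerBlock, int *h_out)
{
    int n = blocks * threadsPerBlock;
    int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(int));
    if (err != cudaSuccess) return err;

    writeGlobalIndex<<<blocks, threadsPerBlock>>>(d_out);
    err = cudaGetLastError();   // reports errors from the launch itself
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    return err;
}

int main()
{
    int h_out[8];
    // 2 blocks x 4 threads per block = 8 threads in total
    cudaError_t err = runWriteGlobalIndex(2, 4, h_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 8; ++i)
        printf("%d ", h_out[i]);   // expected output: 0 1 2 3 4 5 6 7
    printf("\n");
    return 0;
}
```

Each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x, so <<<2, 4>>> produces indices 0 to 7. With the <<<1, 1>>> configuration used in this post there is exactly one thread, whose index is 0.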

2. Adding Two Variables with CUDA

#include <stdio.h>
#include <stdlib.h>          // for system()
#include <cuda_runtime.h>    // CUDA runtime API: cudaMalloc, cudaMemcpy, cudaFree

// Kernel (device code) that adds two integers.
// d_a and d_b are passed by value; the sum is written through the device pointer d_c.
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
	*d_c = d_a + d_b;
}

int main()
{
	// host (CPU) variable that receives the answer
	int h_c;

	// device (GPU) pointer that will hold the answer
	int *d_c;
	// allocate room for one int in device memory
	cudaMalloc((void**)&d_c, sizeof(int));

	// launch the kernel with 1 and 4 as inputs; the answer is stored in d_c
	// <<<1, 1>>> means 1 block is executed with 1 thread per block
	gpuAdd<<<1, 1>>>(1, 4, d_c);

	// copy the result from device memory to host memory
	// (cudaMemcpy waits for the kernel to finish before copying)
	cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

	// print the result
	printf("1+4=%d\n", h_c);

	// free device memory
	cudaFree(d_c);

	// keep the console window open in Visual Studio (Windows-only command)
	system("pause");
	return 0;
}
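The program above ignores the return values of the CUDA API calls, so a failure (for example, no CUDA-capable GPU) would silently print a garbage value. A common pattern, sketched here under the assumption that exiting on the first error is acceptable, is to wrap each runtime call in a checking macro and to call cudaGetLastError() right after the launch, because a kernel launch itself returns nothing. The macro CUDA_CHECK and the helper gpuAddHost are hypothetical names, not part of the CUDA API.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical convenience macro: print the error and its location, then exit.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Same kernel as in the post: one thread writes d_a + d_b into *d_c.
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
    *d_c = d_a + d_b;
}

// Hypothetical host helper: runs gpuAdd on the GPU and returns the sum to the CPU.
int gpuAddHost(int a, int b)
{
    int h_c = 0;
    int *d_c = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_c, sizeof(int)));
    gpuAdd<<<1, 1>>>(a, b, d_c);
    CUDA_CHECK(cudaGetLastError());   // errors from the launch configuration
    // cudaMemcpy synchronizes, so it also reports errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_c));
    return h_c;
}

int main()
{
    printf("1+4=%d\n", gpuAddHost(1, 4));   // prints 1+4=5
    return 0;
}
```

Exiting inside the macro keeps a learning example short; a larger program would more likely return the cudaError_t to its caller instead.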

Summary

Today I took my first step into the world of CUDA programming. It went pretty well, and I'll keep at it!

References

《基于GPU加速的计算机视觉编程》 (Chinese edition of Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA, Bhaumik Vaidya)


Origin blog.csdn.net/DU_YULIN/article/details/120641897