CUDA (Part 2): A Simple Understanding of Blocks and Threads

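Because of the novel coronavirus outbreak, my company has extended the holiday and told us to work from home for a week. That hasn't started yet, but I'm already staying indoors all the time. Life goes on, after all. The outbreak has also got me thinking. My thoughts are shallow, but I hope I can act on them in the future.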

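1. Always plan for the worst and have a Plan B. We should fully trust the medical staff on the front line to bring the outbreak under control, but those of us behind the lines should still stock up on food. Why? So that we make fewer trips to crowded places such as supermarkets. From now on, make it a household habit to keep at least two months of food. Even if you can't go out, you can keep yourself and others safe.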

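2. Keep common medical supplies at home, such as masks, disinfectant, adhesive bandages and fever reducers. Why? First, if an outbreak hits, you already have a supply and won't panic. Second, factories can then send supplies to the hardest-hit areas first, which eases the pressure on masks and other materials. Let's just make sure the front line is taken care of.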

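3. Please stop eating snakes, insects, rodents, rare birds and wild beasts. I wasn't aware of this before, but this event has taught me that everything exists for a reason. Just look at that bat. How could anyone bring themselves to eat it? Doesn't the sight of it give you the creeps? This event has had a huge impact on the country and deserves some reflection. Sometimes it's better not to show off. When it comes to eating wild animals, it pays to be a bit timid.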

Once bitten, twice shy.

Back to the topic. I've been cooped up for so long that I have to get this off my chest /(ㄒoㄒ)/~~

What is a block? What is a thread?
Let's start with some code:

#include <cuda_runtime.h>
#include <cstdio>

// Kernel: every thread prints its own index and the index of its block
__global__ void helloCUDA(void){
	// threadIdx.x: index of this thread within its block
	// blockIdx.x: index of the block this thread belongs to
	printf("Hello!!! I am thread %d in block: %d\n", threadIdx.x, blockIdx.x);
}

int main()
{
	helloCUDA<<<16, 8>>>(); // <<<number of blocks, threads per block>>>
	// Wait for the kernel to finish; device-side printf output is flushed here
	cudaDeviceSynchronize();
	printf("All threads are finished!\n");

	return 0;
}
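The launch `helloCUDA<<<16, 8>>>()` creates 16 × 8 = 128 threads. Each thread can combine its block index, the block size (`blockDim.x`, 8 here) and its `threadIdx.x` into one unique global index. Below is a host-side C++ sketch of that arithmetic. It compiles on its own or inside a .cu file. The helper names `globalThreadIndex` and `allGlobalIndices` are hypothetical and chosen only for illustration. Inside a real kernel you would simply write `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the 1D global index that the expression
// blockIdx.x * blockDim.x + threadIdx.x produces inside a kernel.
unsigned globalThreadIndex(unsigned block, unsigned threadsPerBlock, unsigned thread) {
    assert(thread < threadsPerBlock);  // threadIdx.x always lies in [0, blockDim.x)
    return block * threadsPerBlock + thread;
}

// Hypothetical helper: walk every (block, thread) pair of a 1D launch
// <<<numBlocks, threadsPerBlock>>> and collect each thread's global index.
// The loop visits threads in order; the GPU does NOT run or print them in order.
std::vector<unsigned> allGlobalIndices(unsigned numBlocks, unsigned threadsPerBlock) {
    std::vector<unsigned> out;
    out.reserve(static_cast<std::size_t>(numBlocks) * threadsPerBlock);
    for (unsigned b = 0; b < numBlocks; ++b)
        for (unsigned t = 0; t < threadsPerBlock; ++t)
            out.push_back(globalThreadIndex(b, threadsPerBlock, t));
    return out;
}
```

For `<<<16, 8>>>`, `allGlobalIndices(16, 8)` yields 128 indices covering 0 to 127 exactly once. The line "thread 3 in block: 2" therefore belongs to global thread 2 * 8 + 3 = 19.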

What about a single block with many threads?

	helloCUDA<<<1, 128>>>();

Output:

Hello!!! I am thread 64 in block: 0
Hello!!! I am thread 65 in block: 0
Hello!!! I am thread 66 in block: 0
Hello!!! I am thread 67 in block: 0
Hello!!! I am thread 68 in block: 0
Hello!!! I am thread 69 in block: 0
Hello!!! I am thread 70 in block: 0
Hello!!! I am thread 71 in block: 0
Hello!!! I am thread 72 in block: 0
Hello!!! I am thread 73 in block: 0
Hello!!! I am thread 74 in block: 0
Hello!!! I am thread 75 in block: 0
Hello!!! I am thread 76 in block: 0
Hello!!! I am thread 77 in block: 0
Hello!!! I am thread 78 in block: 0
Hello!!! I am thread 79 in block: 0
Hello!!! I am thread 80 in block: 0
Hello!!! I am thread 81 in block: 0
Hello!!! I am thread 82 in block: 0
Hello!!! I am thread 83 in block: 0
Hello!!! I am thread 84 in block: 0
Hello!!! I am thread 85 in block: 0
Hello!!! I am thread 86 in block: 0
Hello!!! I am thread 87 in block: 0
Hello!!! I am thread 88 in block: 0
Hello!!! I am thread 89 in block: 0
Hello!!! I am thread 90 in block: 0
Hello!!! I am thread 91 in block: 0
Hello!!! I am thread 92 in block: 0
Hello!!! I am thread 93 in block: 0
Hello!!! I am thread 94 in block: 0
Hello!!! I am thread 95 in block: 0
Hello!!! I am thread 0 in block: 0
Hello!!! I am thread 1 in block: 0
Hello!!! I am thread 2 in block: 0
Hello!!! I am thread 3 in block: 0
Hello!!! I am thread 4 in block: 0
Hello!!! I am thread 5 in block: 0
Hello!!! I am thread 6 in block: 0
Hello!!! I am thread 7 in block: 0
Hello!!! I am thread 8 in block: 0
Hello!!! I am thread 9 in block: 0
Hello!!! I am thread 10 in block: 0
Hello!!! I am thread 11 in block: 0
Hello!!! I am thread 12 in block: 0
Hello!!! I am thread 13 in block: 0
Hello!!! I am thread 14 in block: 0
Hello!!! I am thread 15 in block: 0
Hello!!! I am thread 16 in block: 0
Hello!!! I am thread 17 in block: 0
Hello!!! I am thread 18 in block: 0
Hello!!! I am thread 19 in block: 0
Hello!!! I am thread 20 in block: 0
Hello!!! I am thread 21 in block: 0
Hello!!! I am thread 22 in block: 0
Hello!!! I am thread 23 in block: 0
Hello!!! I am thread 24 in block: 0
Hello!!! I am thread 25 in block: 0
Hello!!! I am thread 26 in block: 0
Hello!!! I am thread 27 in block: 0
Hello!!! I am thread 28 in block: 0
Hello!!! I am thread 29 in block: 0
Hello!!! I am thread 30 in block: 0
Hello!!! I am thread 31 in block: 0
...

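What does this tell us? threadIdx.x is the index of a thread within its block. The output arrives in contiguous runs of 32, first 64-95 and then 0-31. The 32 threads of each run are printed together because they execute together. Why is that?

A kernel launch creates a grid of thread blocks, and each block is scheduled onto the GPU as a unit. Each block contains many threads. The configuration above has one block of 128 threads. Those 128 threads are further divided into warps of 32 consecutive threads each. The 32 threads of a warp execute together, so the warp is the actual unit of execution. The order in which different warps run (and print) is not guaranteed, which is why the run 64-95 can appear before 0-31.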
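To see why the output arrives in runs of 32, split `threadIdx.x` into a warp number and a lane, which is the thread's position inside its warp. The host-side C++ sketch below assumes a warp size of 32. That is the value of the built-in `warpSize` on current NVIDIA GPUs, but device code should read `warpSize` rather than hard-code it. The function names are hypothetical.

```cpp
#include <cassert>

// Assumption: 32 threads per warp, matching the built-in warpSize on current NVIDIA GPUs.
constexpr unsigned kWarpSize = 32;

// Which warp of its block a thread belongs to; in a 1D block, warps are
// formed from consecutive threadIdx.x values, starting at thread 0.
unsigned warpOf(unsigned thread) { return thread / kWarpSize; }

// Position of the thread inside its warp (0..31).
unsigned laneOf(unsigned thread) { return thread % kWarpSize; }

// Number of warps a block of the given size occupies; a partially filled
// last warp still counts as a whole warp.
unsigned warpsPerBlock(unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}
```

For `<<<1, 128>>>`, `warpsPerBlock(128)` is 4. Threads 64-95 form warp 2 (lanes 0 to 31) and threads 0-31 form warp 0, which matches the two runs in the output. Conversely, `warpsPerBlock(1)` is 1. With `<<<128, 1>>>` every block still occupies a whole warp, so 31 of the 32 lanes in each of those 128 warps sit idle.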

But with this configuration:

	helloCUDA<<<128, 1>>>();

The result is:

Hello!!! I am thread 0 in block: 91
Hello!!! I am thread 0 in block: 107
Hello!!! I am thread 0 in block: 109
Hello!!! I am thread 0 in block: 68
Hello!!! I am thread 0 in block: 64
Hello!!! I am thread 0 in block: 23
Hello!!! I am thread 0 in block: 35
Hello!!! I am thread 0 in block: 71
Hello!!! I am thread 0 in block: 56
Hello!!! I am thread 0 in block: 72
Hello!!! I am thread 0 in block: 37
Hello!!! I am thread 0 in block: 79
Hello!!! I am thread 0 in block: 28
Hello!!! I am thread 0 in block: 34
Hello!!! I am thread 0 in block: 33
Hello!!! I am thread 0 in block: 55
Hello!!! I am thread 0 in block: 21
Hello!!! I am thread 0 in block: 86
Hello!!! I am thread 0 in block: 52
Hello!!! I am thread 0 in block: 57
Hello!!! I am thread 0 in block: 49
Hello!!! I am thread 0 in block: 78
Hello!!! I am thread 0 in block: 31
Hello!!! I am thread 0 in block: 25
Hello!!! I am thread 0 in block: 113
Hello!!! I am thread 0 in block: 111
Hello!!! I am thread 0 in block: 8
Hello!!! I am thread 0 in block: 66
...

Now the block IDs come out in no particular order: CUDA makes no guarantee about the order in which blocks execute.
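Because block order is unspecified, a kernel should produce the same result whatever order its blocks run in. One way to achieve this is for each block to write only to its own output slot. The host-side C++ sketch below imitates this. `std::shuffle` stands in for the hardware scheduler, and the hypothetical `runBlocks` runs a per-block function once per block in that shuffled order. It is only a model for illustration, not how the GPU actually schedules blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical model of a launch with numBlocks blocks: every block runs
// exactly once, but in an arbitrary (shuffled) order, as on the GPU.
void runBlocks(unsigned numBlocks, unsigned seed,
               const std::function<void(unsigned)>& blockBody) {
    std::vector<unsigned> order(numBlocks);
    std::iota(order.begin(), order.end(), 0u);      // 0, 1, ..., numBlocks - 1
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);  // stand-in for the scheduler
    for (unsigned b : order) blockBody(b);
}

// Records the order in which blocks ran, like the printf lines above.
std::vector<unsigned> executionOrder(unsigned numBlocks, unsigned seed) {
    std::vector<unsigned> log;
    runBlocks(numBlocks, seed, [&log](unsigned b) { log.push_back(b); });
    return log;
}

// Each block writes only out[block], so the result does not depend on the order.
std::vector<unsigned> squaresByBlock(unsigned numBlocks, unsigned seed) {
    std::vector<unsigned> out(numBlocks, 0);
    runBlocks(numBlocks, seed, [&out](unsigned b) {
        assert(b < out.size());
        out[b] = b * b;
    });
    return out;
}
```

`executionOrder` changes with the seed, just as the printed block IDs change from run to run. `squaresByBlock` returns the same vector for every seed.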

What about this one?

	helloCUDA<<<4, 128>>>();

An excerpt from the output:

Hello!!! I am thread 32 in block: 0
Hello!!! I am thread 33 in block: 0
Hello!!! I am thread 34 in block: 0
Hello!!! I am thread 35 in block: 0
Hello!!! I am thread 36 in block: 0
Hello!!! I am thread 37 in block: 0
Hello!!! I am thread 38 in block: 0
Hello!!! I am thread 39 in block: 0
Hello!!! I am thread 40 in block: 0
Hello!!! I am thread 41 in block: 0
Hello!!! I am thread 42 in block: 0
Hello!!! I am thread 43 in block: 0
Hello!!! I am thread 44 in block: 0
Hello!!! I am thread 45 in block: 0
Hello!!! I am thread 46 in block: 0
Hello!!! I am thread 47 in block: 0
Hello!!! I am thread 48 in block: 0
Hello!!! I am thread 49 in block: 0
Hello!!! I am thread 50 in block: 0
Hello!!! I am thread 51 in block: 0
Hello!!! I am thread 52 in block: 0
Hello!!! I am thread 53 in block: 0
Hello!!! I am thread 54 in block: 0
Hello!!! I am thread 55 in block: 0
Hello!!! I am thread 56 in block: 0
Hello!!! I am thread 57 in block: 0
Hello!!! I am thread 58 in block: 0
Hello!!! I am thread 59 in block: 0
Hello!!! I am thread 60 in block: 0
Hello!!! I am thread 61 in block: 0
Hello!!! I am thread 62 in block: 0
Hello!!! I am thread 63 in block: 0
Hello!!! I am thread 32 in block: 3
Hello!!! I am thread 33 in block: 3
Hello!!! I am thread 34 in block: 3
Hello!!! I am thread 35 in block: 3
Hello!!! I am thread 36 in block: 3
Hello!!! I am thread 37 in block: 3
Hello!!! I am thread 38 in block: 3
Hello!!! I am thread 39 in block: 3
Hello!!! I am thread 40 in block: 3
Hello!!! I am thread 41 in block: 3
Hello!!! I am thread 42 in block: 3
Hello!!! I am thread 43 in block: 3
Hello!!! I am thread 44 in block: 3
Hello!!! I am thread 45 in block: 3
Hello!!! I am thread 46 in block: 3
Hello!!! I am thread 47 in block: 3
Hello!!! I am thread 48 in block: 3
Hello!!! I am thread 49 in block: 3
Hello!!! I am thread 50 in block: 3
Hello!!! I am thread 51 in block: 3
Hello!!! I am thread 52 in block: 3
Hello!!! I am thread 53 in block: 3
Hello!!! I am thread 54 in block: 3
Hello!!! I am thread 55 in block: 3
Hello!!! I am thread 56 in block: 3
Hello!!! I am thread 57 in block: 3
Hello!!! I am thread 58 in block: 3
Hello!!! I am thread 59 in block: 3
Hello!!! I am thread 60 in block: 3
Hello!!! I am thread 61 in block: 3
Hello!!! I am thread 62 in block: 3
Hello!!! I am thread 63 in block: 3

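For image processing, applying the same operation to every pixel is probably a good fit for the GPU. When only a tiny amount of data is processed, however, the time spent on transfers may make the GPU slower than simply doing the work on the CPU. Also, some operations in OpenCV's GPU module take their arguments from CPU memory. A result that my inference has already produced on the GPU therefore has to be copied back to the CPU first. Does anyone have a good way around this?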
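For the per-pixel case, a common layout is a 2D grid of 2D blocks with one thread per pixel. The host-side C++ sketch below computes such a grid and the pixel each thread would handle. The 16 × 16 block shape, the struct `Dim2` (a simplified stand-in for CUDA's `dim3`) and the helper names are assumptions for illustration. The grid is rounded up, so the last row or column of blocks can overhang the image, and the kernel needs a bounds check for those threads.

```cpp
#include <cassert>

// Simplified stand-in for CUDA's dim3 (x and y only).
struct Dim2 { unsigned x, y; };

// Ceiling division: how many blocks of size `block` are needed to cover `n`.
unsigned ceilDiv(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Grid needed to cover a width x height image with blocks of shape `block`.
Dim2 gridForImage(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ceilDiv(width, block.x), ceilDiv(height, block.y)};
}

// Pixel handled by thread `thread` of block `blk`; mirrors
// x = blockIdx.x * blockDim.x + threadIdx.x (and likewise for y) in a kernel.
Dim2 pixelFor(Dim2 blk, Dim2 blockShape, Dim2 thread) {
    return Dim2{blk.x * blockShape.x + thread.x, blk.y * blockShape.y + thread.y};
}

// The bounds check a per-pixel kernel needs, since the rounded-up grid can overhang the image.
bool insideImage(Dim2 p, unsigned width, unsigned height) {
    return p.x < width && p.y < height;
}
```

For a 1920 × 1080 image with 16 × 16 blocks, the grid is 120 × 68 blocks because 1080 / 16 = 67.5 rounds up. The bottom row of blocks therefore reaches y = 1087, and threads past y = 1079 must skip their work. In a real launch the same numbers would go into `dim3 block(16, 16); dim3 grid((width + 15) / 16, (height + 15) / 16);`.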

One last word: Stay strong, Wuhan! Stay strong, Hubei! Stay strong, China!

Reposted from blog.csdn.net/Felaim/article/details/104100147