Efficiency of CUDA vector types (float2, float3, float4)

https://stackoverflow.com/questions/26676806/efficiency-of-cuda-vector-types-float2-float3-float4


I’m expanding njuffa’s comment into a worked example. The example simply adds two arrays element by element in three different ways: loading the data as float, float2, or float4.
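The original code is not reproduced here, so the following is only a minimal sketch of what the three variants might look like. The kernel names, the array size, the block size and the `verify` helper are all assumptions made for illustration. The sketch assumes the element count is a multiple of 4 and relies on `cudaMalloc` returning pointers that are sufficiently aligned for `float4` accesses.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Scalar version: each thread loads one float from each input.
__global__ void add_float(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// float2 version: each thread loads two floats at once; n2 counts float2 elements.
__global__ void add_float2(const float2* a, const float2* b, float2* c, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        float2 x = a[i], y = b[i];
        c[i] = make_float2(x.x + y.x, x.y + y.y);
    }
}

// float4 version: each thread loads four floats at once; n4 counts float4 elements.
__global__ void add_float4(const float4* a, const float4* b, float4* c, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 x = a[i], y = b[i];
        c[i] = make_float4(x.x + y.x, x.y + y.y, x.z + y.z, x.w + y.w);
    }
}

// Hypothetical helper: copies the result back and compares it with the host sum.
static bool verify(const float* d_c, const std::vector<float>& a,
                   const std::vector<float>& b) {
    std::vector<float> c(a.size());
    cudaMemcpy(c.data(), d_c, c.size() * sizeof(float), cudaMemcpyDeviceToHost);
    for (size_t i = 0; i < c.size(); ++i)
        if (c[i] != a[i] + b[i]) return false;
    return true;
}

int main() {
    const int n = 1 << 20;    // assumed size; must be a multiple of 4 in this sketch
    const int threads = 256;  // assumed block size
    const size_t bytes = n * sizeof(float);

    // Integer-valued inputs keep the float sums exact, so == is a valid check.
    std::vector<float> h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) { h_a[i] = float(i); h_b[i] = 2.0f * float(i); }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    cudaMemset(d_c, 0, bytes);  // clear the output so each kernel is checked on its own
    add_float<<<(n + threads - 1) / threads, threads>>>(d_a, d_b, d_c, n);
    printf("float : %s\n", verify(d_c, h_a, h_b) ? "ok" : "MISMATCH");

    const int n2 = n / 2;
    cudaMemset(d_c, 0, bytes);
    add_float2<<<(n2 + threads - 1) / threads, threads>>>(
        reinterpret_cast<const float2*>(d_a), reinterpret_cast<const float2*>(d_b),
        reinterpret_cast<float2*>(d_c), n2);
    printf("float2: %s\n", verify(d_c, h_a, h_b) ? "ok" : "MISMATCH");

    const int n4 = n / 4;
    cudaMemset(d_c, 0, bytes);
    add_float4<<<(n4 + threads - 1) / threads, threads>>>(
        reinterpret_cast<const float4*>(d_a), reinterpret_cast<const float4*>(d_b),
        reinterpret_cast<float4*>(d_c), n4);
    printf("float4: %s\n", verify(d_c, h_a, h_b) ? "ok" : "MISMATCH");

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

All three kernels move the same number of bytes. The difference is how many elements each thread handles per load and store, and therefore how many threads and memory instructions are needed in total.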

These are the timings on a GT540M and on a Kepler K20c card:

GT540M
float  - Elapsed time:  74.1 ms
float2 - Elapsed time:  61.0 ms
float4 - Elapsed time:  56.1 ms

Kepler K20c
float  - Elapsed time:  4.4 ms 
float2 - Elapsed time:  3.3 ms 
float4 - Elapsed time:  3.2 ms

As can be seen, loading the data as float4 is the fastest approach on both cards, although on the K20c the gain over float2 is marginal (3.2 ms versus 3.3 ms).
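The answer does not say how the elapsed times were measured. One common way to time a kernel is with CUDA events, sketched below for the float4 case only. The kernel name, the problem size, the block size and the untimed warm-up launch are assumptions for illustration, so this should not be expected to reproduce the figures above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same idea as the float4 variant: four floats per thread per load.
__global__ void add_float4(const float4* a, const float4* b, float4* c, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 x = a[i], y = b[i];
        c[i] = make_float4(x.x + y.x, x.y + y.y, x.z + y.z, x.w + y.w);
    }
}

int main() {
    const int n4 = 1 << 22;   // assumed: 4M float4 elements (16M floats)
    const int threads = 256;  // assumed block size
    const int blocks = (n4 + threads - 1) / threads;
    const size_t bytes = n4 * sizeof(float4);

    float4 *a, *b, *c;
    cudaMalloc(&a, bytes); cudaMalloc(&b, bytes); cudaMalloc(&c, bytes);
    cudaMemset(a, 0, bytes);  // contents do not matter for timing
    cudaMemset(b, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch, not timed, so one-time startup costs are excluded.
    add_float4<<<blocks, threads>>>(a, b, c, n4);

    cudaEventRecord(start);
    add_float4<<<blocks, threads>>>(a, b, c, n4);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the timed kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("float4 - Elapsed time: %.3f ms\n", ms);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Timing the float and float2 variants the same way, with the same total number of floats, would give a like-for-like comparison on a given card.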

Reposted from blog.csdn.net/fb_help/article/details/81219944