Alignment in GPU-driven rendering

While implementing GPU-driven rendering recently, I stepped into several pitfalls around alignment and padding.
To start with the conclusion: if you want to hit fewer of these pitfalls, design your data structures so that their size is guaranteed to be a multiple of 16 bytes (float4). That makes most of the padding problems go away.

Here is a list of a few points. There is no clear documentation for them (there might be, but I have not seen it in the usual documents).

  1. The GPU is forced to align addresses when it reads a resource, so data whose size is not a multiple of 16 bytes gets read incorrectly:
struct InstData
{
	vec4 pos;
	vec3 scale;
};
// With a const buffer of InstData, the address is aligned (to 16 bytes) each time an instance is read, so the data is read incorrectly.
// This is better:
struct InstData
{
	vec4 pos;
	vec3 scale;
	float padding;
};
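
One way to catch this early is a compile-time size check on the CPU-side mirror of the struct. This is only a minimal sketch: `Vec4`, `Vec3`, `InstDataUnpadded` and `IsFloat4Multiple` are hypothetical names made up for the illustration, and it assumes a 4-byte `float`.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirrors of the shader vector types (assumption:
// float is 4 bytes, so these are exactly 16 and 12 bytes).
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Unpadded layout from the first snippet: 16 + 12 = 28 bytes.
struct InstDataUnpadded
{
	Vec4 pos;
	Vec3 scale;
};

// Padded layout from the second snippet: 16 + 12 + 4 = 32 bytes.
struct InstData
{
	Vec4 pos;
	Vec3 scale;
	float padding;
};

// Hypothetical helper: true when sizeof(T) is a multiple of 16 bytes (float4).
template <typename T>
constexpr bool IsFloat4Multiple()
{
	return sizeof(T) % 16 == 0;
}

// Fails the build if someone adds a member and forgets the padding.
static_assert(IsFloat4Multiple<InstData>(), "InstData must be a multiple of 16 bytes");
```

Putting the `static_assert` next to every struct that is uploaded to the GPU means a forgotten padding member shows up as a build error rather than as garbage instance data.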

  2. Note that padding rules differ between C++ and shader code: if the InstData below is written on the CPU and read on the GPU, the data ends up shifted because of padding:
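//cpp: typically 16 bytes (8 + 4, plus 4 bytes of tail padding added by the compiler)
struct InstData
{
	u64 a;
	u32 b;
};

//gpu: 12 bytes (three 32-bit values, no tail padding)
struct InstData
{
	UINT3 a;
};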
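
The mismatch can be made visible on the CPU side alone. The sketch below is an illustration under assumptions: `InstDataCpu`, `InstDataGpu`, `CpuOffset` and `GpuOffset` are hypothetical names, the GPU struct is modelled as three packed 32-bit values, and the 16-byte size assumes a typical 64-bit ABI where `std::uint64_t` is 8-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

// CPU-side struct from the text. The 64-bit member raises the struct
// alignment to 8 (assumption: typical 64-bit ABI), so the compiler
// appends 4 bytes of tail padding.
struct InstDataCpu
{
	std::uint64_t a;
	std::uint32_t b;
};

// Model of the shader-side UINT3: three tightly packed 32-bit values.
struct InstDataGpu
{
	std::uint32_t a[3];
};

constexpr std::size_t kCpuStride = sizeof(InstDataCpu);
constexpr std::size_t kGpuStride = sizeof(InstDataGpu);

// Byte offset of element i as each side computes it.
constexpr std::size_t CpuOffset(std::size_t i) { return i * kCpuStride; }
constexpr std::size_t GpuOffset(std::size_t i) { return i * kGpuStride; }
```

Element 0 lines up on both sides, but from element 1 onward the two offsets disagree, so the GPU reads the CPU's tail padding as if it were data. Adding an explicit `u32` padding member on the CPU side and using a four-component type on the GPU side keeps both strides at 16 bytes.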

  3. Structured buffer elements that are not 16-byte aligned can straddle cache lines, which reduces performance:
https://developer.nvidia.com/content/understanding-structured-buffer-performance
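
To get a feel for how often elements straddle, here is a small counting sketch. `CountStraddling` is a hypothetical helper, and the 128-byte line size used in the note after it is only an assumption for illustration; the real cache line size depends on the hardware. It needs C++14 or later because of the loop inside a `constexpr` function.

```cpp
#include <cstddef>

// Hypothetical helper: counts how many of `count` tightly packed elements of
// `stride` bytes cross a `lineSize`-byte boundary. Assumes the buffer starts
// on a boundary and that stride is greater than zero.
constexpr std::size_t CountStraddling(std::size_t stride, std::size_t count, std::size_t lineSize)
{
	std::size_t straddling = 0;
	for (std::size_t i = 0; i < count; ++i)
	{
		// Index of the line holding the first and the last byte of element i.
		const std::size_t firstLine = (i * stride) / lineSize;
		const std::size_t lastLine = (i * stride + stride - 1) / lineSize;
		if (firstLine != lastLine)
		{
			++straddling;
		}
	}
	return straddling;
}
```

With an assumed 128-byte line, `CountStraddling(28, 32, 128)` gives 6 for the unpadded 28-byte InstData, while `CountStraddling(32, 32, 128)` gives 0 for the padded 32-byte version.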

Origin blog.csdn.net/ccanan/article/details/104288333