[Memory Alignment] Understanding memory alignment in one article (detailed introduction + code samples)

Table of contents

Why do you need memory alignment

performance

range

atomicity

conclusion

data model

C++ memory alignment 

Named requirements

Trivial class

Standard Layout Class

Summary of trivial and standard layout classes

Memory alignment for standard layout classes

Ordinary standard layout class

Standard layout class with bitfields

A standard layout class that manually specifies the alignment size

Memory alignment for non-standard layout classes

GLSLang's memory alignment

Layout decoration of buffer

location

Passing data with GLM and GLSLang


A running program occupies memory. When writing code, we tend to assume that stack space is contiguous and that the variables we define are laid out on the stack one after another, in declaration order.

In practice the variables do sit together on the stack, but the compiler is free to rearrange them according to their types and alignment requirements to get a better layout.

#include <stdio.h>

#define print_position(type, n)                 \
    type n;                                     \
    printf(#n ": %p\n", (void *)&n);

int main(void) {
  print_position(int, a);    // a: 0x7ffe84765408
  print_position(double, b); // b: 0x7ffe84765410
  print_position(char, c);   // c: 0x7ffe84765407
  print_position(float, d);  // d: 0x7ffe8476540c
}

This article mainly focuses on the alignment of structures.

Why do you need memory alignment

performance

Modern processors have multiple levels of cache that data must pass through. Supporting single-byte reads would tie memory-subsystem throughput tightly to execution-unit throughput (making the system CPU-bound). This is much the same reason PIO was superseded by DMA in hard drives.

The CPU always reads in units of its word size (4 bytes on a 32-bit processor). When it accesses an unaligned address, and the processor supports that at all, it has to read every word the requested address straddles, so the number of memory transactions can be up to twice what the requested data needs. As a result, reading 2 bytes can easily be slower than reading 4 bytes.

If a two-byte value lies entirely within one word, the processor only needs to read once and perform an offset calculation, which usually takes only one cycle.

In addition, with aligned data it is easier to tell whether two values share a cache line, and some applications tune their data around cache lines to achieve better performance.
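The points above can be sketched in a few helper functions. This is only an illustration: the helper names are made up for this example, and the 64-byte cache line is an assumption that is common but not universal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when `p` is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void *p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Reads a 32-bit value from any address. memcpy lets the compiler choose an
// instruction sequence that is valid even when the source is unaligned.
inline std::uint32_t load_u32(const void *p) {
  std::uint32_t value;
  std::memcpy(&value, p, sizeof value);
  return value;
}

// True when the byte range [p, p + size) fits inside a single cache line.
// The default line size of 64 bytes is an assumption.
inline bool within_one_cache_line(const void *p, std::size_t size,
                                  std::size_t line = 64) {
  assert(size != 0 && line != 0);
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  return addr / line == (addr + size - 1) / line;
}
```

A value that starts 2 bytes before the end of a line and is 4 bytes long straddles two lines, which is exactly the case alignment is meant to avoid.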

range

Given an arbitrary address space, if the architecture can assume that the 2 least significant bits (LSBs) of an address are always 0 (as on a 32-bit machine), then it can either address four times as much memory (2 bits represent 4 different states) or address the same amount of memory and reuse the two bits as flags. Dropping the 2 LSBs means 4-byte alignment: incrementing an address effectively increments bit 2, and the lowest 2 bits are always 00.

This can even affect the physical design of the system: the address bus needs two fewer bits, so the CPU can have two fewer pins and the circuit board two fewer traces.
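The "two spare flag bits" idea can be tried in software as well. The following is a minimal sketch, assuming the pointee is 4-byte aligned and that converting a pointer to uintptr_t preserves its low bits (true on mainstream platforms, but implementation-defined). TaggedIntPtr is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstdint>

// A pointer to a 4-byte-aligned int with a 2-bit tag packed into its low bits.
class TaggedIntPtr {
 public:
  TaggedIntPtr(std::int32_t *p, unsigned tag) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0);  // pointer must be 4-byte aligned
    assert(tag <= kTagMask);         // tag must fit in two bits
    bits_ = bits | tag;
  }

  // Clears the tag bits to recover the original pointer.
  std::int32_t *pointer() const {
    return reinterpret_cast<std::int32_t *>(bits_ & ~kTagMask);
  }

  unsigned tag() const { return static_cast<unsigned>(bits_ & kTagMask); }

 private:
  static constexpr std::uintptr_t kTagMask = 0x3;  // the two lowest bits
  std::uintptr_t bits_;
};
```

Because the two low bits of an aligned address are always 00, storing a tag there loses no information about the address.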

atomicity

The CPU can operate on an aligned word of memory atomically, meaning that no other instruction can interrupt the operation. This is critical to the correct operation of many lock-free data structures and other concurrency paradigms.

conclusion

A processor's memory system is much more complex than what is described here. A discussion of how x86 processors actually address memory is a helpful next step (many processors work similarly).

Sticking to memory alignment has many other benefits beyond those covered here. A computer's primary job is to move and transform data, and modern memory architectures and technologies have been optimized over decades to get more data in, out, and between more and faster execution units in a highly reliable way.

data model

In C and the languages derived from it, the size of many types depends on the platform, so data models are used to define the type sizes on different platforms.
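As a rough sketch, the data model of the current platform can be named from the widths of int, long and pointers. The function names below are made up for this example, and only the four common models are recognised.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Maps the bit widths of int, long and pointers to a data model name.
// Anything other than the four common models yields "unknown".
inline std::string data_model_name(int int_bits, int long_bits, int ptr_bits) {
  assert(int_bits > 0 && long_bits >= int_bits && ptr_bits > 0);
  if (int_bits == 16 && long_bits == 32 && ptr_bits == 32) return "LP32";
  if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) return "ILP32";
  if (int_bits == 32 && long_bits == 32 && ptr_bits == 64) return "LLP64";
  if (int_bits == 32 && long_bits == 64 && ptr_bits == 64) return "LP64";
  return "unknown";
}

// The data model of the platform this code is compiled for.
inline std::string host_data_model() {
  return data_model_name(static_cast<int>(sizeof(int) * CHAR_BIT),
                         static_cast<int>(sizeof(long) * CHAR_BIT),
                         static_cast<int>(sizeof(void *) * CHAR_BIT));
}
```

On 64-bit Linux or macOS host_data_model() is expected to report LP64, and on 64-bit Windows LLP64.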

 

Although each data model is clearly defined, handling type sizes in cross-platform code is still a headache.

Fortunately, stdint.h (<cstdint> in C++) provides fixed-width integer types, mainly 8, 16, 32 and 64 bits wide, along with least and fast variants for different requirements.

  • Fixed-width integers, e.g. uint8_t, int16_t. These types are optional, so a given type may not exist on every platform. The width is exact: it can be neither more nor less than the number of bits specified.
  • Smallest integers of at least a given width, e.g. int_least16_t, uint_least16_t. The type may be wider than the specified width but never narrower. For example, on a platform with no 8-bit or 16-bit integer type but with uint32_t, uint_least8_t is the same type as uint32_t.
  • Fastest integers of at least a given width, e.g. int_fast32_t, uint_fast32_t. The type may be wider but never narrower than the specified width, and among the candidates the fastest one is chosen. For example, if the platform supports both uint16_t and uint32_t and uint32_t is the faster of the two, uint_fast8_t is uint32_t.

Finally, because pointer sizes differ between platforms, use the optional standard-library types intptr_t and uintptr_t when converting pointers to integers, so that the code stays portable.
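The guarantees described above can be written down as compile-time checks. This is a small sketch that assumes the optional exact-width types and uintptr_t exist on the target platform; round_trip is a hypothetical helper.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Exact-width types have exactly the requested number of bits.
static_assert(sizeof(std::uint8_t) * CHAR_BIT == 8, "uint8_t is 8 bits");
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");

// least/fast types may be wider than requested, but never narrower.
static_assert(sizeof(std::int_least16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(std::uint_fast8_t) * CHAR_BIT >= 8, "at least 8 bits");

// Converts a pointer to an integer and back again through uintptr_t.
inline int *round_trip(int *p) {
  const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
  int *back = reinterpret_cast<int *>(bits);
  assert(back == p);  // the round trip must give back the same pointer
  return back;
}
```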

C++ memory alignment 

This chapter assumes the LP64 data model.

Named requirements

Trivial class

First, a trivially copyable type meets all of the following conditions:

  • Has at least one non-deleted copy constructor, move constructor, copy assignment operator, or move assignment operator
  • Every copy constructor is trivial or deleted
  • Every move constructor is trivial or deleted
  • Every copy assignment operator is trivial or deleted
  • Every move assignment operator is trivial or deleted
  • Has a non-deleted trivial destructor

A trivial class meets all of the following conditions:

  • is a trivially copyable type
  • has one or more default constructors, all of which are trivial or deleted, and at least one of them is not deleted

 

struct A {};  // is trivial
struct B { B(B const&) = delete; };  // is non-trivial (trivially copyable, but has no default constructor)
struct C { C() {} }; // is non-trivial
struct D { ~D() {} }; // is non-trivial
struct E { ~E() = delete; }; // is non-trivial
struct F { private: ~F() = default; }; // is non-trivial
struct G { virtual ~G() = default; }; // is non-trivial
struct H {
  H() = default;
  H(const H &) = delete;
  H(H &&) noexcept = delete;
  H &operator=(H const &) = delete;
  H &operator=(H &&) noexcept = delete;
  ~H() = default;
}; // is non-trivial
struct I { I() = default; I(int) {} }; // is trivial
struct J {
  J() = default;
  J(const J &) {}
}; // is non-trivial
struct K { int x; }; // is trivial
struct L { int x{0}; }; // is non-trivial

If you compile with gcc or clang, you will find that the compiler reports E, F and H as trivial classes. According to the standard they should not be; see the corresponding gcc and clang reports in their bug trackers.
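The uncontroversial cases from the list above can be checked at compile time with ::std::is_trivial. The sketch below deliberately leaves out B, E, F and H, since what compilers report for them may differ from the comments above.

```cpp
#include <type_traits>

struct A {};                                  // nothing user-declared
struct C { C() {} };                          // user-provided default constructor
struct D { ~D() {} };                         // user-provided destructor
struct G { virtual ~G() = default; };         // virtual destructor
struct I { I() = default; I(int) {} };        // defaulted default constructor
struct J { J() = default; J(const J &) {} };  // user-provided copy constructor
struct K { int x; };                          // plain data member
struct L { int x{0}; };                       // default member initializer

static_assert(std::is_trivial<A>::value, "A is trivial");
static_assert(!std::is_trivial<C>::value, "C is non-trivial");
static_assert(!std::is_trivial<D>::value, "D is non-trivial");
static_assert(!std::is_trivial<G>::value, "G is non-trivial");
static_assert(std::is_trivial<I>::value, "I is trivial");
static_assert(!std::is_trivial<J>::value, "J is non-trivial");
static_assert(std::is_trivial<K>::value, "K is trivial");
static_assert(!std::is_trivial<L>::value, "L is non-trivial");
```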

In addition, objects of a trivially copyable class can be copied with ::memcpy or ::memmove, as long as neither object is a potentially-overlapping subobject.

struct A { int x; };
A a = { .x = 10 }; // C++20
A b = { .x = 20 };
::memcpy(&b, &a, sizeof(A)); // b.x = 10

A trivial class can be regarded as holding no resources, so its objects can be overwritten or discarded directly without leaking anything.

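// #include <cstddef>
// #include <type_traits>

// Trivial element type: there is nothing to destroy.
template <typename T, size_t N>
typename ::std::enable_if<::std::is_trivial<T>::value>::type
destroy_array_element(T (&/* arr */)[N]) {}

// Non-trivial element type: run the destructor of every element explicitly.
template <typename T, size_t N>
typename ::std::enable_if<!::std::is_trivial<T>::value>::type
destroy_array_element(T (&arr)[N]) {
  for (size_t i = 0; i < N; ++i) {
    arr[i].~T();
  }
}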

Standard Layout Class

A standard layout class meets all of the following conditions:

  • All non-static data members are of standard-layout class types or references to them
  • Has no virtual functions and no virtual base classes
  • All non-static data members have the same accessibility
  • Has no non-standard-layout base classes
  • All non-static data members and bit-fields of the class and its base classes are first declared in the same class
  • Given the class as S, none of its base classes has a type that is an element of the set M(S), where M(X) for a type X is defined as follows:
    • If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
    • If X is a non-union class type whose first non-static data member (possibly an anonymous union) has type X0, the set M(X) consists of X0 and the elements of M(X0).
    • If X is a union type, the set M(X) is the union of all M($U_i$) and the set containing all $U_i$, where each $U_i$ is the type of the i-th non-static data member of X.
    • If X is an array type with element type $X_e$, the set M(X) consists of $X_e$ and the elements of M($X_e$).
    • If X is not a class type or an array type, the set M(X) is empty.

struct A { int a; }; // is standard layout
struct B : public A { double b; }; // isn't standard layout
struct C { A a; double b; }; // is standard layout
struct D {
    int a;
    double b;
}; // is standard layout
struct E {
    public: int a;
    private: double b;
}; // isn't standard layout
struct F {
    public: int fun() { return 0; }
    private: double a;
}; // is standard layout
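The comments in the listing above can be confirmed with ::std::is_standard_layout. The sketch repeats a few of the same structs so that it compiles on its own.

```cpp
#include <cstddef>
#include <type_traits>

struct A { int a; };
struct B : public A { double b; };  // data members in both base and derived
struct C { A a; double b; };
struct E {
 public:  int a;
 private: double b;                 // mixed accessibility
};
struct F {
 public:  int fun() { return 0; }   // member functions do not affect layout
 private: double a;
};

static_assert(std::is_standard_layout<A>::value, "A is standard layout");
static_assert(!std::is_standard_layout<B>::value, "B is not standard layout");
static_assert(std::is_standard_layout<C>::value, "C is standard layout");
static_assert(!std::is_standard_layout<E>::value, "E is not standard layout");
static_assert(std::is_standard_layout<F>::value, "F is standard layout");

// offsetof is only guaranteed to work for standard layout classes;
// the first member of such a class always sits at offset 0.
static_assert(offsetof(C, a) == 0, "first member is at offset 0");
```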

 

Summary of trivial and standard layout classes

Every type in C is obviously standard layout. C++ introduced the concept of POD (plain old data) to describe these C-compatible types (C++20 deprecated the concept). A POD type meets all of the following conditions:

  • Is a trivial class
  • Is a standard layout class
  • All non-static data members are of POD types

Put another way, being trivial says that a type manages no resources, that is, it has only the most basic construction and destruction; being standard layout says how a type lays out its fields. Any standard layout class can interoperate painlessly with C programs even if it is not trivial, which is why POD was split into these two concepts.

The easiest example to understand is ::std::vector. It manages its own resources through RAII and has non-trivial constructors and a non-trivial destructor, so it is not a trivial class. In common implementations it is nevertheless a standard layout class, so it follows the memory alignment rules exactly, and memcpy can be used to copy out its internal values.

// #include <alloca.h>
// #include <stdint.h>
// #include <stdlib.h>
// #include <string.h>
// #include <iostream>
// #include <vector>
::std::vector<char> v{'a', 'b', 'c'};
uintptr_t *copy = reinterpret_cast<uintptr_t *>(::alloca(sizeof v));
::memcpy(copy, &v, sizeof v);
for (size_t i = 0, e = sizeof(v) / sizeof(uintptr_t); i < e; ++i) {
    ::std::cout << copy[i] << ::std::endl;
}
// possible output:
// 94066226852544
// 94066226852547
// 94066226852547

 

Memory alignment for standard layout classes

Memory alignment follows a few rules:

  1. The starting address of an object is divisible by its alignment
  2. The offset of each member from the starting address is divisible by that member's alignment; if it would not be, padding bytes are inserted after the previous member
  3. The size of the class is divisible by its alignment; if it would not be, padding bytes are added at the end
  4. An object of an empty class must occupy one byte according to the standard (unless the empty base class optimization applies), whereas in C (as a GNU extension) an empty struct has a size of 0 bytes
  5. By default, the alignment of a type equals the largest alignment among its fields
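To make the rules concrete, here is a minimal sketch of a layout calculator that applies rules 2 to 5 to a list of members. Member, Layout, align_up and layout_of are names invented for this example; a real compiler also has to handle bit-fields, base classes and packing, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Member { std::size_t size; std::size_t align; };

struct Layout {
  std::vector<std::size_t> offsets;  // one offset per member
  std::size_t size;                  // sizeof of the whole class
  std::size_t align;                 // alignof of the whole class
};

// Rounds `n` up to the next multiple of `align` (a power of two).
inline std::size_t align_up(std::size_t n, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

inline Layout layout_of(const std::vector<Member> &members) {
  Layout out{{}, 0, 1};
  std::size_t cursor = 0;
  for (const Member &m : members) {
    cursor = align_up(cursor, m.align);            // rule 2: pad before member
    out.offsets.push_back(cursor);
    cursor += m.size;
    if (m.align > out.align) out.align = m.align;  // rule 5: largest alignment
  }
  out.size = align_up(cursor, out.align);          // rule 3: pad at the end
  if (out.size == 0) out.size = 1;                 // rule 4: empty class
  return out;
}
```

For example, layout_of({{4, 4}, {1, 1}, {1, 1}}) describes a struct with an int followed by two chars and yields the offsets 0, 4, 5 with size 8 and alignment 4.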

 

Ordinary standard layout class

For any standard layout class, these rules are enough to work out the size of the type:

struct S {}; // sizeof = 1, alignof = 1
struct T : public S { char x; }; // sizeof = 1, alignof = 1
struct U {
  int x;  // offsetof = 0
  char y; // offsetof = 4
  char z; // offsetof = 5
}; // sizeof = 8, alignof = 4
struct V {
  int a;    // offsetof = 0
  T b;      // offsetof = 4
  U c;      // offsetof = 8
  double d; // offsetof = 16
}; // sizeof = 24, alignof = 8
struct W {
  int val;  // offset = 0
  W *left;  // offset = 8
  W *right; // offset = 16
}; // sizeof = 24, alignof = 8

Finally, arrays. An array member of length N behaves as if N variables of the element type had been declared at that position.

struct S { int x[4]; }; // sizeof = 16, alignof = 4
struct T {
  int a;      // offsetof = 0
  char b[9];  // offsetof = 4
  short c[2]; // offsetof = 14
  double *d;  // offsetof = 24
}; // sizeof = 32, alignof = 8
struct U {
  char x;    // offsetof = 0
  char y[1]; // offsetof = 1
  short z;   // offsetof = 2
}; // sizeof = 4, alignof = 2

Is that all? Of course not. C has a very interesting feature: the flexible array member introduced in C99. The last field is declared as an array with no length (or, as a compiler extension, a length of 0). The element type of the array affects the alignment of the enclosing type but does not contribute to its size. The C++ standard does not support this at all, so it is available only as a compiler extension.

struct S {
  int i;      // offset = 0
  double d[]; // offset = 8
}; // sizeof = 8, alignof = 8
struct T {
  int i;     // offset = 0
  char c[0]; // offset = 4
}; // sizeof = 4, alignof = 4

Classes with a flexible array member must be allocated dynamically, because the flexible array member cannot be initialized. The compiler cannot know the length of the array, so the programmer is responsible for the correctness of every access, even when the extra space is too small to hold a single element; accessing anything beyond the allocated space is UB.

S s1; // sizeof(s1) = 8, length(d) = 0, accessing d is UB
// S s2 = {1, {3.14}}; // error: initialization of flexible array member is not allowed
S* s3 = reinterpret_cast<S*>(alloca(sizeof(S))); // equivalent to s1
// s4: sizeof(*s4) = 8, length(d) = 6
S *s4 = reinterpret_cast<S *>(alloca(sizeof(S) + 6 * sizeof(S::d[0])));
// s5: sizeof(*s5) = 8, length(d) = 1, accessing d[1] is UB
S *s5 = reinterpret_cast<S *>(alloca(sizeof(S) + 10));
*s4 = *s5; // copies only sizeof(S) bytes

Standard layout class with bitfields

Standard layout classes with bit-fields are just as simple. A bit-field is never split across two units of its underlying type: when the remaining bits are not enough, the next bit-field is placed in the next unit. An unnamed bit-field acts as a placeholder. Declaring a bit-field effectively adds one unit of the underlying type to the class, so the size and alignment of the class are affected by that underlying type.

struct S {
  // offsetof = 0
  unsigned char b1 : 3, : 2;
  // offsetof = 1
  unsigned char b2 : 6, b3 : 2;
}; // sizeof = 2, alignof = 1

The width of an unnamed bit-field can be 0, which means the next bit-field starts in the next unit of the underlying type. A zero-width bit-field does not itself add a unit to the class.

struct S { int : 0; }; // sizeof = 1, alignof = 1
struct T {
  uint64_t : 0;
  uint32_t x; // offsetof = 0
}; // sizeof = 4, alignof = 4
struct U {
  // offsetof = 0
  unsigned char b1 : 3, : 0;
  // offsetof = 1
  unsigned char b2 : 2;
}; // sizeof = 2, alignof = 1
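Bit-fields can also be exercised at run time. The sketch below stores values into a 3-bit field to show that only the low 3 bits survive. Flags and store_3bit are hypothetical names, and the exact size of a bit-field struct is implementation-defined; the 2 bytes mentioned afterwards are what gcc and clang are expected to produce on x86-64.

```cpp
#include <cassert>

struct Flags {
  unsigned char b1 : 3, : 2;     // 3 value bits, then 2 unnamed padding bits
  unsigned char b2 : 6, b3 : 2;  // does not fit in the first byte, so starts a new one
};

// Stores `v` into the 3-bit field b1 and returns the value read back.
inline unsigned store_3bit(unsigned v) {
  Flags f{};
  f.b1 = v;          // keeps only the low 3 bits, i.e. v modulo 8
  assert(f.b1 < 8);  // a 3-bit unsigned field can never hold 8 or more
  return f.b1;
}
```

With this definition store_3bit(5) gives back 5, while store_3bit(9) gives back 1 because 9 does not fit in 3 bits; sizeof(Flags) is expected to be 2 under the assumptions above.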

A standard layout class that manually specifies the alignment size

The five rules from the beginning of this chapter still apply when the alignment is specified manually.

With #pragma pack(1) or [[gnu::packed]], fields are laid out in packed form: they are placed one after another with no padding holes in between, which avoids wasting memory. More generally, #pragma pack(N) caps the alignment of each member at N bytes.

struct [[gnu::packed]] S {
  uint8_t x;  // offsetof = 0
  uint16_t y; // offsetof = 1
}; // sizeof = 3, alignof = 1
struct [[gnu::packed]] T {
  uint16_t x : 4;
  uint8_t y; // offsetof = 1
}; // sizeof = 2, alignof = 1
struct [[gnu::packed]] alignas(4) U {
  uint8_t x;  // offsetof = 0
  uint16_t y; // offsetof = 1
}; // sizeof = 4, alignof = 4
struct [[gnu::packed]] alignas(4) V {
  uint16_t x : 4;
  uint8_t y; // offsetof = 1
}; // sizeof = 4, alignof = 4

 

The focus here, however, is the alignas specifier introduced in C++11. It can specify the alignment not only of a struct but also of an individual object. The specified alignment must be a power of 2. If it is weaker than the default alignment, the compiler may ignore it or report an error.

The simplest place to start is the declaration of a struct.

struct alignas(4) S {}; // sizeof = 4, alignof = 4
struct SS {
  S s;  // offsetof = 0
  S *t; // offsetof = 8
}; // sizeof = 16, alignof = 8
struct alignas(SS) T {
  S s;     // offsetof = 0
  char t;  // offsetof = 4
  short u; // offsetof = 6
  short v; // offsetof = 8
}; // sizeof = 16, alignof = 8
struct alignas(1) U : public S {}; // error or ignore
// struct alignas(5) V : public S {}; // error
struct alignas(4) W : public S {};

alignas is mainly used to get better performance or to satisfy the alignment that SIMD instructions require.
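As a sketch of the SIMD use case, the struct below is aligned to 32 bytes, which is the alignment that aligned 256-bit vector loads typically expect. Float8 and misalignment are names made up for this example, and no actual SIMD intrinsics are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Eight floats aligned to 32 bytes.
struct alignas(32) Float8 {
  float lane[8];
};

static_assert(alignof(Float8) == 32, "alignas raises the alignment to 32");
static_assert(sizeof(Float8) % 32 == 0, "size is a multiple of the alignment");

// Number of bytes by which `p` misses the requested alignment (0 if aligned).
inline std::size_t misalignment(const void *p, std::size_t alignment) {
  assert(alignment != 0);
  return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) %
                                  alignment);
}
```

alignas works on individual objects too: a local array declared as alignas(16) char raw[16] starts at an address for which misalignment(raw, 16) is 0.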

 

Memory alignment for non-standard layout classes

For classes that are non-standard layout because of access control, we cannot assume they are laid out like standard layout classes; their layout depends on the compiler. The C++11 standard only guarantees that members with the same access are laid out in declaration order; the relative order of members with different access is not guaranteed.

struct S {
 public:  int s;
          int t;
 private: int u;
 public:  int v;
};

In the example above, the only guarantee is &S::s < &S::t < &S::v; &S::s < &S::u is not guaranteed. In memory the members may appear in the order s, t, u, v, but they may equally appear in the order u, s, t, v.

Accessibility is not the only source of ordering problems: fields declared in different classes cause them too. We cannot assume that variables declared in a base class are located before variables declared in the derived class.

struct S { int s; };
struct T { int t; };
struct U : public S, T { int u; };

In the example above, &U::s < &U::u is not guaranteed. What the standard does guarantee is that converting a derived class pointer to a base class pointer automatically applies the offset of the base class subobject. There is no guarantee, however, that the address of a U object is the address of its S subobject.

U *up = reinterpret_cast<U *>(alloca(sizeof(U)));
S *ssp = static_cast<S *>(up); // offset adjustment
T *stp = static_cast<T *>(up); // offset adjustment
S *rsp = reinterpret_cast<S *>(up); // no offset adjustment
T *rtp = reinterpret_cast<T *>(up); // no offset adjustment
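The adjustment made by static_cast can be observed by measuring how far the T subobject is from the start of the whole object. offset_of_T_in_U is a hypothetical helper; the actual distance is up to the compiler, so the sketch only relies on it lying inside the object.

```cpp
#include <cassert>
#include <cstddef>

struct S { int s; };
struct T { int t; };
struct U : public S, T { int u; };

// Byte distance from the start of a U object to its T base subobject.
// static_cast applies the adjustment; the char pointers only measure it.
inline std::ptrdiff_t offset_of_T_in_U(U &obj) {
  const char *whole = reinterpret_cast<const char *>(&obj);
  const char *base = reinterpret_cast<const char *>(static_cast<T *>(&obj));
  assert(base >= whole);  // a base subobject lies inside the complete object
  return base - whole;
}
```

On common ABIs the result is 4, because S is placed first and T follows it. A reinterpret_cast to T * would skip this adjustment and point at the S subobject instead.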

Finally, the memory alignment of classes with virtual functions, which is a very interesting question. The standard does not specify how virtual functions are implemented, but most compilers use virtual tables, that is, they insert a pointer to a virtual function table into the object. Note that, with a single polymorphic base as in the example below, the derived object and that base subobject share one virtual table pointer instead of each having its own.

struct S {
  bool s; // offsetof = 0
}; // sizeof = 1, alignof = 1
struct T {
  virtual ~T() = default;
  int t;
};
struct U : public S, T {
  virtual ~U() = default;
  int u;
};

A compiler is likely to place polymorphic base classes (those with virtual functions) first and non-polymorphic base classes after them, so the size and layout of such a class cannot be assumed; they depend on the arrangement the compiler chooses.

 

GLSLang's memory alignment

GLSL 4.60, Vulkan binding

In GLSLang, a word is 4 bytes long. Alignment in GLSLang is very similar to alignment in C/C++, so the rules described under memory alignment for standard layout classes largely apply here as well. In addition, the size of every basic type in GLSLang is a multiple of the word length, so sizeof and offsetof values below are given in words unless noted otherwise.

 

Layout decoration of buffer

buffer is a readable and writable global object; its layout is implementation-defined unless specified manually. uniform is a special global buffer that is read-only; it uses the std140 layout by default, and the layout cannot be changed. push_constant is a special uniform that is stored in registers and is about 16 words in size. An implementation may back it with a uniform instead, and when the size is exceeded the excess is also stored in a uniform buffer. Its default layout is std430, and the layout can be changed.

In a buffer, matrices are column-major (column_major) by default, which can be changed in the layout qualifier.

 

layout(binding = 0, column_major) buffer CMTest {
  // matrix stride = 16 bytes
  mat2x3 cm; // is equivalent to a 2-element array of vec3
};
layout(binding = 1, row_major) buffer RMTest {
  // matrix stride = 8 bytes
  mat2x3 rm; // is equivalent to a 3-element array of vec2
};

 

packed matches the concept on the CPU side: fields are arranged as compactly as possible to save memory, regardless of alignment. However, SPIRV prohibits the packed and shared layouts.

In GLSLang layouts, an offset is likewise an integer multiple of the alignment. The std140 layout has the following rules:

  1. A scalar's alignment is the same as its size
  2. For a two- or four-component vector whose component type has size $N$, the size equals the alignment, which is $2N$ or $4N$ respectively. As a special case, a three-component vector has size $3N$ but alignment $4N$
  3. Each element of an array is padded to a multiple of 4 words
  4. The alignment of a structure is rounded up to a multiple of 4 words
  5. A column-major matrix with C columns and R rows is equivalent to an array of C vectors with R components each; likewise, an array of N column-major matrices is equivalent to an array of $N \times C$ vectors with R components each
  6. A row-major matrix with C columns and R rows is equivalent to an array of R vectors with C components each; likewise, an array of N row-major matrices is equivalent to an array of $N \times R$ vectors with C components each

struct S {
    vec2 v;
};
layout(binding = 0, std140) buffer BufferObject {
    mat2x3 m;  // offsetof = 0
    bool b[2]; // offsetof = 8
    vec3 v1;   // offsetof = 16
    uint u;    // offsetof = 19
    S s;       // offsetof = 20
    float f2;  // offsetof = 24
    vec2 v2;   // offsetof = 26
    dvec3 dv;  // offsetof = 32
} bo; // sizeof = 40, alignof = 8

The std430 layout drops the std140 requirement that array elements and structures be aligned and padded to 4 words, so std430 is more compact and closer to the layout we are used to on the CPU.

struct S {
    vec2 v;
};
layout(binding = 0, std430) buffer BufferObject {
    mat2x3 m;  // offsetof = 0
    bool b[2]; // offsetof = 8
    vec3 v1;   // offsetof = 12
    uint u;    // offsetof = 15
    S s;       // offsetof = 16
    float f2;  // offsetof = 18
    vec2 v2;   // offsetof = 20
    dvec3 dv;  // offsetof = 24
} bo; // sizeof = 32, alignof = 8
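The offsets in the two BufferObject listings above can be reproduced on the host with a small calculator. This is only a sketch: Field, round_up and block_offsets are invented names, and the per-field alignment and size values (in words) are filled in by hand from the std140 and std430 rules rather than derived from the GLSL types.

```cpp
#include <cassert>
#include <vector>

struct Field { unsigned align; unsigned size; };  // both in 4-byte words

inline unsigned round_up(unsigned n, unsigned multiple) {
  assert(multiple != 0);
  return (n + multiple - 1) / multiple * multiple;
}

// Returns the offset of every field, followed by the block size as the
// last element.
inline std::vector<unsigned> block_offsets(const std::vector<Field> &fields) {
  std::vector<unsigned> out;
  unsigned cursor = 0, block_align = 1;
  for (const Field &f : fields) {
    cursor = round_up(cursor, f.align);  // offset is a multiple of alignment
    out.push_back(cursor);
    cursor += f.size;
    if (f.align > block_align) block_align = f.align;
  }
  out.push_back(round_up(cursor, block_align));
  return out;
}

inline std::vector<Field> std140_fields() {
  return {{4, 8},   // mat2x3 m: two vec3 columns, each padded to 4 words
          {4, 8},   // bool b[2]: each element padded to 4 words
          {4, 3},   // vec3 v1
          {1, 1},   // uint u
          {4, 4},   // S s: struct padded to 4 words
          {1, 1},   // float f2
          {2, 2},   // vec2 v2
          {8, 6}};  // dvec3 dv
}

inline std::vector<Field> std430_fields() {
  return {{4, 8},   // mat2x3 m: vec3 columns still take 4 words each
          {1, 2},   // bool b[2]: tightly packed
          {4, 3},   // vec3 v1
          {1, 1},   // uint u
          {2, 2},   // S s: takes the alignment of its vec2 member
          {1, 1},   // float f2
          {2, 2},   // vec2 v2
          {8, 6}};  // dvec3 dv
}
```

Running block_offsets on the two tables yields the same offsets and sizes as the comments in the listings, which is a quick way to sanity-check a hand-computed layout.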

The default layout is already very good, but sometimes you may want to set the offset of a field manually, which is what offset is for. It is given in bytes. Note that the compiler does not check whether a manually set offset overlaps other fields.

layout(binding = 0, std430) buffer BufferObject {
    mat2x3 m;  // offsetof = 0
    bool b[2]; // offsetof = 8
    layout(offset = 48) uint u; // offsetof = 12
    vec2 v;    // offsetof = 14
    layout(offset = 0) int i; // offset = 0
} bo;

align is used in much the same way as alignment on the CPU described above.

layout(binding = 0, std430) buffer BufferObject {
    vec2 a;                     // offsetof = 0
    layout(align = 16) float b; // offsetof = 4
} bo; // sizeof = 8, alignof = 4

location

A location is the storage point through which data is passed between shader stages. Locations are matched by number: an out of the previous shader stage is matched with the in of the next stage that has the same location. The same location cannot be declared more than once within a shader, but in and out locations are entirely separate.

layout(location = 0) in vec2 i;
// layout(location = 0) in vec2 i2; // error

layout(location = 0) out vec2 o; // okay

A location is 4 words in size. Each declared variable occupies one location, and a variable larger than 4 words also occupies the following location.

layout(location = 0) in dvec4 dv;
// location = 1, occupied by dv
// layout(location = 1) in vec4 v; // error
layout(location = 2) in vec4 v;

Each element of an array occupies its own location, and the locations are assigned to the elements in increasing order:

layout(location = 0) in float a[2];
// location = 1, occupied by a[1]
layout(location = 2) in float f1;
layout(location = 3) in mat2 m[2]; // a CxR matrix is equivalent to a C-element array of R-component vectors
// location = 4, occupied by m[0]
// location = 5, occupied by m[1]
// location = 6, occupied by m[1]
layout(location = 7) in float f2;

Specifying locations one by one is tedious, so you can use a block to give the location of the first variable and let the locations of the remaining variables increase automatically.

layout(location = 3) in block {
  float a[2];                   // location = 3
  mat2 m;                       // location = 5
  vec2 v;                       // location = 7
  layout(location = 0) mat2 m2; // location = 0
  bool b;                       // location = 2
  // vec3 v3;                      // error
  layout(location = 8) vec3 v3; // location = 8
};

A struct can also be used to assign increasing locations; the difference is that a location cannot be specified on a member inside a struct.

layout(location = 3) in struct {
  vec3 a;                      // location = 3
  mat2 b;                      // location = 4, 5
  // layout(location = 6) vec2 c; // error
};

As mentioned above, a location is 4 words in size. Using only part of a location to store a variable is obviously inefficient, so component can be used to specify where in the location a variable starts. Note that the part of the location remaining after that component must be large enough to hold the variable.

layout(location = 0, component = 0) in float x; // l = 0, c = 0
layout(location = 0, component = 1) in float y; // l = 0, c = 1
layout(location = 0, component = 2) in float z; // l = 0, c = 2
layout(location = 1) in vec2 a;                 // l = 1, c = 0
// layout(location = 1, component = 2) in dvec3 b; // error
layout(location = 2, component = 0) in float b; // l = 2, c = 0
layout(location = 2, component = 1) in vec3 c;  // l = 2, c = 1

If a component is specified for an array, each element of the array still occupies its own location in increasing order, but within each location the element starts at the specified component.

layout(location = 0, component = 2) in float f[6]; // every element c = 2
// layout(location = 2, component = 0) in vec4 v;  // error
layout(location = 1, component = 0) in vec2 v;     // l = 1, c = 0
// f[1] at location 1, component 2

Passing data with GLM and GLSLang

This article was written because of an alignment-related bug I ran into when transferring data between the host and the device.

// host (C++), offsets in bytes
struct PCO {
    uint32_t time;    // offsetof = 0
    ::glm::vec2 extent; // offsetof = 4
}; // sizeof = 12, alignof = 4

// device (GLSL), offsets in words
layout(push_constant) uniform PCO {
    int time;    // offsetof = 0
    vec2 extent; // offsetof = 2
}; // sizeof = 4, alignof = 2

After checking repeatedly that the code itself was fine, I tried swapping the time and extent fields, and the program ran correctly. Clearly the host and the device disagreed on alignment. Since SPIRV does not allow packed to compact the layout, the two sides have to be aligned manually.

Based on what was covered above, here are a few more elegant ways to solve the problem:

  • Use a bit-field to create a hole, forcing the struct to match the layout in GLSL
  • Give the field the same alignment it has in GLSL

struct PCO {
    uint32_t time;    // offsetof = 0
    uint32_t : 1, : 0;
    ::glm::vec2 extent; // offsetof = 8
}; // sizeof = 16, alignof = 4

struct PCO {
  uint32_t time;                // offsetof = 0
  alignas(8)::glm::vec2 extent; // offsetof = 8
}; // sizeof = 16, alignof = 8
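Whichever fix is chosen, it is worth pinning the host layout down with compile-time checks so that a mismatch shows up as a build error instead of a rendering bug. In the sketch below Vec2 is a hypothetical stand-in for ::glm::vec2 (two tightly packed floats) so that the example compiles without GLM, and the checks assume a 4-byte float.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ::glm::vec2.
struct Vec2 { float x, y; };

struct PCO {
  std::uint32_t time;      // byte offset 0 (word 0 on the device)
  alignas(8) Vec2 extent;  // byte offset 8 (word 2 on the device)
};

// The device expects extent at word 2, i.e. byte 8, and a block of 4 words.
static_assert(offsetof(PCO, time) == 0, "time must start the block");
static_assert(offsetof(PCO, extent) == 8, "extent must match the vec2 offset");
static_assert(sizeof(PCO) == 16, "host size must match the device block");
static_assert(alignof(PCO) == 8, "alignas(8) raises the struct alignment");
```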

Origin blog.csdn.net/qq_62464995/article/details/128440953