WebGPU Learning (Six): Learning the "rotatingCube" Example

Hello everyone. In this article we study the rotatingCube example from Chrome -> webgpu-samplers.

Previous post:
WebGPU Learning (Five): Modern Graphics API Technical Points and a WebGPU Investigation

Learning rotatingCube.ts

We have already studied the "drawing a triangle" example. Compared with it, this example adds the following:

  • Adds a uniform buffer object (abbreviated UBO) to transmit the matrix obtained by multiplying the model, view and projection matrices (the mvp matrix), updated every frame
  • Sets vertex data
  • Enables culling
  • Enables the depth test

Below, we open the rotatingCube.ts file and look at each new part in turn:

Adding a uniform buffer object

Introduction

In WebGL 1, we pass each gameObject's uniform variables (e.g. diffuseMap, diffuse color, model matrix) to the shader through functions such as uniform1i and uniform4fv.
Many identical values do not actually need to be re-sent. For example: if gameObject1 and gameObject3 use the same shader1 and have the same diffuse color, only one diffuse color needs to be passed; but in WebGL 1 we generally pass both diffuse colors, incurring duplicated cost.

WebGPU uses uniform buffer objects to pass uniform variables. A uniform buffer is a global buffer: we only need to set its values once, and then in each render pass specify the range of data to use (via offset and size), so the same data can be reused. If a uniform value changes, only the corresponding data in the uniform buffer needs to be modified.
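As a rough sketch of this "one big buffer, many ranges" idea, the hypothetical helper below computes each object's byte offset within a shared uniform buffer. It assumes a 256-byte offset alignment, which is a common value for minUniformBufferOffsetAlignment; the real limit must be queried from the device.

```typescript
// Hypothetical helper: compute per-object byte offsets inside one shared
// uniform buffer, padding each slice up to the required alignment.
// 256 is a commonly seen minUniformBufferOffsetAlignment, assumed here
// for illustration only.
function computeUniformOffsets(
  sliceByteSize: number,
  objectCount: number,
  alignment: number = 256
): { alignedSliceSize: number; offsets: number[] } {
  // Round the slice size up to the next multiple of `alignment`.
  const alignedSliceSize = Math.ceil(sliceByteSize / alignment) * alignment;
  const offsets: number[] = [];
  for (let i = 0; i < objectCount; i++) {
    offsets.push(i * alignedSliceSize);
  }
  return { alignedSliceSize, offsets };
}
```

Each render pass can then bind the same buffer and select its own object's slice by offset, instead of creating one buffer per object.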

In WebGPU, we can put the model matrices of all gameObjects into one ubo, the camera's view and projection matrices into one ubo, the data of each material (such as phong material, pbr material, etc. — diffuse color, specular color, and so on) into one ubo, and the data of each light (such as direction light, point light, etc. — light color, light position, and so on) into one ubo. This effectively reduces the overhead of transmitting uniform variables.

In addition, we need to pay attention to the ubo memory layout:
The default layout is std140, which we can roughly understand as requiring each "column" to hold four elements (16 bytes).
Let's illustrate:
The following uniform block, corresponding to a ubo, is declared with the std140 layout:

layout (std140) uniform ExampleBlock
{
    float value;
    vec3  vector;
    mat4  matrix;
    float values[3];
    bool  boolean;
    int   integer;
};

Its actual layout in memory is:

layout (std140) uniform ExampleBlock
{
                     // base alignment  // aligned offset
    float value;     // 4               // 0 
    vec3 vector;     // 16              // 16  (must be multiple of 16 so 4->16)
    mat4 matrix;     // 16              // 32  (column 0)
                     // 16              // 48  (column 1)
                     // 16              // 64  (column 2)
                     // 16              // 80  (column 3)
    float values[3]; // 16              // 96  (values[0])
                     // 16              // 112 (values[1])
                     // 16              // 128 (values[2])
    bool boolean;    // 4               // 144
    int integer;     // 4               // 148
};

That is: the 1st float in the ubo is value, and floats 2-4 are 0 (padding for alignment);
floats 5-7 are vector's x, y and z, and the 8th float is 0;
floats 9-24 are the matrix values (column-major);
values[0], values[1] and values[2] occupy floats 25, 29 and 33 respectively, each followed by three padding floats;
the 37th float is boolean converted to a 4-byte value, the 38th is integer, and the block is padded out to a multiple of 16 bytes.
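The hand-computed table above can be reproduced programmatically. The sketch below implements just the std140 rules used in this example (scalars align to 4 bytes, vec3 and mat4 to 16, and every scalar array element is padded to 16 bytes); it is an illustration, not a complete std140 implementation.

```typescript
// Sketch of the std140 offset rules used in the table above.
// Only the member kinds appearing in ExampleBlock are handled.
type Std140Member =
  | { kind: "scalar" }                 // float / int / bool: 4-byte alignment
  | { kind: "vec3" }                   // 12 bytes of data, 16-byte alignment
  | { kind: "mat4" }                   // 4 columns, each a 16-byte vec4
  | { kind: "array"; length: number }; // scalar array: 16 bytes per element

function std140Offsets(members: Std140Member[]): number[] {
  const align = (offset: number, alignment: number) =>
    Math.ceil(offset / alignment) * alignment;

  const offsets: number[] = [];
  let cursor = 0;
  for (const m of members) {
    switch (m.kind) {
      case "scalar":
        cursor = align(cursor, 4);
        offsets.push(cursor);
        cursor += 4;
        break;
      case "vec3":
        cursor = align(cursor, 16);
        offsets.push(cursor);
        cursor += 12;
        break;
      case "mat4":
        cursor = align(cursor, 16);
        offsets.push(cursor);
        cursor += 64; // 4 columns * 16 bytes
        break;
      case "array":
        cursor = align(cursor, 16);
        offsets.push(cursor);
        cursor += m.length * 16; // each element padded to 16 bytes
        break;
    }
  }
  return offsets;
}
```

Running it on ExampleBlock's members (value, vector, matrix, values[3], boolean, integer) reproduces the aligned offsets 0, 16, 32, 96, 144 and 148 from the table.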

Analysis of the corresponding sample code

  • Define the uniform block in the vertex shader

The code is as follows:

  const vertexShaderGLSL = `#version 450
  layout(set = 0, binding = 0) uniform Uniforms {
    mat4 modelViewProjectionMatrix;
  } uniforms;
  ...
  void main() {
    gl_Position = uniforms.modelViewProjectionMatrix * position;
    fragColor = color;
  }
  `;

The default layout is std140; the set and binding are specified, and the block contains the mvp matrix

  • Creating uniformsBindGroupLayout

The code is as follows:

  const uniformsBindGroupLayout = device.createBindGroupLayout({
    bindings: [{
      binding: 0,
      visibility: 1,
      type: "uniform-buffer"
    }]
  });

visibility is GPUShaderStage.VERTEX (equal to 1), and type is specified as "uniform-buffer"

  • Create a uniform buffer

The code is as follows:

  const uniformBufferSize = 4 * 16; // BYTES_PER_ELEMENT(4) * matrix length(4 * 4 = 16)

  const uniformBuffer = device.createBuffer({
    size: uniformBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });
  • Creating the uniform bind group

The code is as follows:

  const uniformBindGroup = device.createBindGroup({
    layout: uniformsBindGroupLayout,
    bindings: [{
      binding: 0,
      resource: {
        buffer: uniformBuffer,
      },
    }],
  });
  • Update the mvp matrix data in the uniform buffer every frame

The code is as follows:

  // The camera is fixed, so the projection matrix only needs to be computed once
  const aspect = Math.abs(canvas.width / canvas.height);
  let projectionMatrix = mat4.create();
  mat4.perspective(projectionMatrix, (2 * Math.PI) / 5, aspect, 1, 100.0);
  
  ...
 
  
  // Compute the mvp matrix
  function getTransformationMatrix() {
    let viewMatrix = mat4.create();
    mat4.translate(viewMatrix, viewMatrix, vec3.fromValues(0, 0, -5));
    let now = Date.now() / 1000;
    mat4.rotate(viewMatrix, viewMatrix, 1, vec3.fromValues(Math.sin(now), Math.cos(now), 0));

    let modelViewProjectionMatrix = mat4.create();
    mat4.multiply(modelViewProjectionMatrix, projectionMatrix, viewMatrix);

    return modelViewProjectionMatrix;
  }
  
  ...
  return function frame() {
    uniformBuffer.setSubData(0, getTransformationMatrix());
    ...
  }
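To make the matrix math in getTransformationMatrix concrete without pulling in gl-matrix, here is a minimal column-major sketch of how the projection and view matrices combine into the mvp matrix (hypothetical helpers, not the sample's actual API):

```typescript
// Minimal column-major 4x4 helpers, sketched to mirror what gl-matrix
// does in getTransformationMatrix. Hypothetical code for illustration.
type Mat4 = Float32Array; // 16 floats, column-major like gl-matrix

function identity(): Mat4 {
  return new Float32Array([1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]);
}

// WebGL-style perspective projection (maps z into [-1, 1]).
function perspective(fovy: number, aspect: number, near: number, far: number): Mat4 {
  const f = 1 / Math.tan(fovy / 2);
  const nf = 1 / (near - far);
  return new Float32Array([
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (far + near) * nf, -1,
    0, 0, 2 * far * near * nf, 0,
  ]);
}

// out = a * b in column-major storage.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out = new Float32Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// A view matrix that only moves the camera back along -z (the rotation
// applied by mat4.rotate in the sample is omitted here for brevity).
function translationView(z: number): Mat4 {
  const m = identity();
  m[14] = z; // z translation lives in column 3 of a column-major matrix
  return m;
}

// mvp = projection * view (the cube sample has no separate model matrix).
const projection = perspective((2 * Math.PI) / 5, 1, 1, 100);
const mvp = multiply(projection, translationView(-5));
```

The resulting 16 floats (64 bytes) are exactly what the sample writes into uniformBuffer each frame.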
  • Set the bind group in the render pass

The code is as follows:

    ...
    passEncoder.setBindGroup(0, uniformBindGroup);

Detailed analysis of updating the uniform buffer

This example uses setSubData to update the uniform buffer:

  return function frame() {
    uniformBuffer.setSubData(0, getTransformationMatrix());
    ...
  }

In WebGPU Learning (Five): Modern Graphics API Technical Points and a WebGPU Investigation -> Approaching zero driver overhead -> persistent map buffer, we mentioned that WebGPU has two ways to transfer data from the CPU to the GPU, i.e. to update a GPUBuffer's values:
1. call the GPUBuffer setSubData method
2. use the persistent map buffer technique

Let's look at how the second method could be used in this example:

function setBufferDataByPersistentMapBuffer(device, commandEncoder, uniformBufferSize, uniformBuffer, mvpMatricesData) {
    const [srcBuffer, arrayBuffer] = device.createBufferMapped({
        size: uniformBufferSize,
        usage: GPUBufferUsage.COPY_SRC
    });

    new Float32Array(arrayBuffer).set(mvpMatricesData);
    srcBuffer.unmap();

    commandEncoder.copyBufferToBuffer(srcBuffer, 0, uniformBuffer, 0, uniformBufferSize);
    const commandBuffer = commandEncoder.finish();

    const queue = device.defaultQueue;
    queue.submit([commandBuffer]);

    srcBuffer.destroy();
}

return function frame() {
    //uniformBuffer.setSubData(0, getTransformationMatrix());
     ...

    const commandEncoder = device.createCommandEncoder({});

    setBufferDataByPersistentMapBuffer(device, commandEncoder, uniformBufferSize, uniformBuffer, getTransformationMatrix());
     ...
}

To verify the performance, I ran a benchmark: create a ubo containing 160,000 mat4s, and take a js profile:

Using setSubData (via the setBufferDataBySetSubData function):

[Screenshot: js profile of the setSubData path, 2019-12-22]

setSubData occupies 91.54% of the time

Using the persistent map buffer (via the setBufferDataByPersistentMapBuffer function):

[Screenshot: js profile of the persistent map buffer path, 2019-12-22]

createBufferMapped and setBufferDataByPersistentMapBuffer together occupy 72.72% + 18.06% = 90.78% of the time

We can see that the two perform almost the same. However, considering that the persistent map buffer should in principle be faster (the cpu and gpu share a buffer, so no copy is needed), that method should be preferred.

In addition, the WebGPU community is still discussing how to optimize updating buffer data (for example, a proposal to add a GPUUploadBuffer pass), so we need to keep following progress in that area.

Reference material

Advanced-GLSL->Uniform buffer objects

Setting vertex data

  • Pass the vertex position and color data to the vertex shader's attributes (in variables)

The code is as follows:

  const vertexShaderGLSL = `#version 450
  ...
  layout(location = 0) in vec4 position;
  layout(location = 1) in vec4 color;
  layout(location = 0) out vec4 fragColor;
  void main() {
    gl_Position = uniforms.modelViewProjectionMatrix * position;
    fragColor = color;
  }
  `;

  const fragmentShaderGLSL = `#version 450
  layout(location = 0) in vec4 fragColor;
  layout(location = 0) out vec4 outColor;
  void main() {
    outColor = fragColor;
  }
  `;

Here color is assigned to fragColor (an out variable, corresponding to a varying in WebGL 1); the fragment shader then receives fragColor and assigns it to outColor, so each fragment's color is interpolated from the vertex colors

  • Create the vertices buffer and set the cube's vertex data

The code is as follows:

cube.ts:

// Each vertex contains position, color and uv data
export const cubeVertexArray = new Float32Array([
    // float4 position, float4 color, float2 uv,
    1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
    -1, -1, 1, 1,  0, 0, 1, 1,  0, 1,
    -1, -1, -1, 1, 0, 0, 0, 1,  0, 0,
    1, -1, -1, 1,  1, 0, 0, 1,  1, 0,
    1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
    -1, -1, -1, 1, 0, 0, 0, 1,  0, 0,

    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,
    1, -1, 1, 1,   1, 0, 1, 1,  0, 1,
    1, -1, -1, 1,  1, 0, 0, 1,  0, 0,
    1, 1, -1, 1,   1, 1, 0, 1,  1, 0,
    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,
    1, -1, -1, 1,  1, 0, 0, 1,  0, 0,

    -1, 1, 1, 1,   0, 1, 1, 1,  1, 1,
    1, 1, 1, 1,    1, 1, 1, 1,  0, 1,
    1, 1, -1, 1,   1, 1, 0, 1,  0, 0,
    -1, 1, -1, 1,  0, 1, 0, 1,  1, 0,
    -1, 1, 1, 1,   0, 1, 1, 1,  1, 1,
    1, 1, -1, 1,   1, 1, 0, 1,  0, 0,

    -1, -1, 1, 1,  0, 0, 1, 1,  1, 1,
    -1, 1, 1, 1,   0, 1, 1, 1,  0, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
    -1, -1, -1, 1, 0, 0, 0, 1,  1, 0,
    -1, -1, 1, 1,  0, 0, 1, 1,  1, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,

    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,
    -1, 1, 1, 1,   0, 1, 1, 1,  0, 1,
    -1, -1, 1, 1,  0, 0, 1, 1,  0, 0,
    -1, -1, 1, 1,  0, 0, 1, 1,  0, 0,
    1, -1, 1, 1,   1, 0, 1, 1,  1, 0,
    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,

    1, -1, -1, 1,  1, 0, 0, 1,  1, 1,
    -1, -1, -1, 1, 0, 0, 0, 1,  0, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
    1, 1, -1, 1,   1, 1, 0, 1,  1, 0,
    1, -1, -1, 1,  1, 0, 0, 1,  1, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
]);
rotatingCube.ts:

  const verticesBuffer = device.createBuffer({
    size: cubeVertexArray.byteLength,
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST
  });
  verticesBuffer.setSubData(0, cubeVertexArray);

Because the vertex data is only set once, using setSubData here has little impact on performance

  • When creating the render pipeline, specify the vertex shader's attributes

The code is as follows:

cube.ts:

export const cubeVertexSize = 4 * 10; // Byte size of one cube vertex.
export const cubePositionOffset = 0;
export const cubeColorOffset = 4 * 4; // Byte offset of cube vertex color attribute.
rotatingCube.ts:

  const pipeline = device.createRenderPipeline({
    ...
    vertexState: {
      vertexBuffers: [{
        arrayStride: cubeVertexSize,
        attributes: [{
          // position
          shaderLocation: 0,
          offset: cubePositionOffset,
          format: "float4"
        }, {
          // color
          shaderLocation: 1,
          offset: cubeColorOffset,
          format: "float4"
        }]
      }],
    },
    ...
  });
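To see how arrayStride and the per-attribute offsets address into the interleaved buffer, the hypothetical helper below reads one vertex's attributes back out the same way the GPU walks the buffer (vertex i starts at i * arrayStride, and each attribute adds its own byte offset):

```typescript
// Hypothetical helper mirroring how the GPU addresses an interleaved
// vertex buffer, using the same constants as cube.ts.
const cubeVertexSize = 4 * 10;  // 40-byte stride: 10 floats per vertex
const cubePositionOffset = 0;   // position occupies floats 0..3
const cubeColorOffset = 4 * 4;  // color occupies floats 4..7

function readAttribute(
  vertexData: Float32Array,
  vertexIndex: number,
  byteOffset: number,
  componentCount: number
): number[] {
  const floatsPerVertex = cubeVertexSize / 4;
  const start = vertexIndex * floatsPerVertex + byteOffset / 4;
  return Array.from(vertexData.subarray(start, start + componentCount));
}

// One vertex of the cube: position (1, -1, 1, 1), color (1, 0, 1, 1), uv (1, 1).
const demoData = new Float32Array([1, -1, 1, 1,  1, 0, 1, 1,  1, 1]);
```

readAttribute(demoData, 0, cubePositionOffset, 4) recovers the position and readAttribute(demoData, 0, cubeColorOffset, 4) the color, matching the shaderLocation 0 and 1 attributes declared in the pipeline.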
  • Specify 36 vertices in render pass -> draw

The code is as follows:

  return function frame() {
    ...
    const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
    ...
    passEncoder.draw(36, 1, 0, 0);
    passEncoder.endPass();
    ...
  }

Enabling culling

The relevant code is:

  const pipeline = device.createRenderPipeline({
    ...
    rasterizationState: {
      cullMode: 'back',
    },
    ...
  });

The related definitions are:

enum GPUFrontFace {
    "ccw",
    "cw"
};
enum GPUCullMode {
    "none",
    "front",
    "back"
};
...

dictionary GPURasterizationStateDescriptor {
    GPUFrontFace frontFace = "ccw";
    GPUCullMode cullMode = "none";
    ...
};

Here ccw means counter-clockwise and cw means clockwise.

Since this example sets cullMode to "back" and does not set frontFace (which defaults to "ccw"), WebGPU treats counter-clockwise triangles as front faces and culls all back-facing triangles (those whose vertices wind clockwise).
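Winding-based culling can be sketched in 2D screen space: the sign of a triangle's signed area tells us its winding, and "back" culling drops clockwise triangles. This is a simplified software model (the GPU works on post-projection coordinates), shown here just to make the ccw/cw rule concrete:

```typescript
// Simplified sketch of winding-based culling in 2D screen space.
// Positive signed area => counter-clockwise => front face under
// the default frontFace: "ccw".
function signedArea(a: number[], b: number[], c: number[]): number {
  // Cross product of the edge vectors (b - a) x (c - a).
  return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
}

function isCulled(
  a: number[],
  b: number[],
  c: number[],
  cullMode: "none" | "front" | "back"
): boolean {
  const ccw = signedArea(a, b, c) > 0;
  if (cullMode === "back") return !ccw;  // back faces wind clockwise
  if (cullMode === "front") return ccw;
  return false;
}
```

With cullMode "back", a counter-clockwise triangle such as (0,0), (1,0), (0,1) is kept, while the same triangle with its vertex order reversed is culled.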

Reference material

[Getting Started with WebGL] (6): polygon vertices
Investigation: Rasterization State

Enabling the depth test

Now we analyze the relevant code, ignoring the code related to the stencil test:

  • When creating the render pipeline, set depthStencilState

The code is as follows:

  const pipeline = device.createRenderPipeline({
    ...
    depthStencilState: {
      // Enable depth writing
      depthWriteEnabled: true,
      // Set the comparison function to less; more on this below
      depthCompare: "less",
      // Use a 24-bit depth format
      format: "depth24plus-stencil8",
    },
    ...
  });
  • Create a depth texture (note that its size -> depth is 1 and its format is also the 24-bit depth format), and set its view as render pass -> depthStencilAttachment -> attachment

The code is as follows:

  const depthTexture = device.createTexture({
    size: {
      width: canvas.width,
      height: canvas.height,
      depth: 1
    },
    format: "depth24plus-stencil8",
    usage: GPUTextureUsage.OUTPUT_ATTACHMENT
  });

  const renderPassDescriptor: GPURenderPassDescriptor = {
    ...
    depthStencilAttachment: {
      attachment: depthTexture.createView(),

      depthLoadValue: 1.0,
      depthStoreOp: "store",
      ...
    }
  };

Here, depthStencilAttachment is defined as:

dictionary GPURenderPassDepthStencilAttachmentDescriptor {
    required GPUTextureView attachment;

    required (GPULoadOp or float) depthLoadValue;
    required GPUStoreOp depthStoreOp;
    ...
};

depthLoadValue and depthStoreOp are similar to the loadOp and storeOp of render pass -> colorAttachment analyzed in WebGPU Learning (II): Learning the "drawing a triangle" example. Let's look directly at the related code in this example:


  const pipeline = device.createRenderPipeline({
    ...
    depthStencilState: {
      ...
      depthCompare: "less",
      ...
    },
    ...
  });
  
  ...

  const renderPassDescriptor: GPURenderPassDescriptor = {
    ...
    depthStencilAttachment: {
      ...
      depthLoadValue: 1.0,
      depthStoreOp: "store",
      ...
    }
  };

During the depth test, the gpu compares each fragment's z value (in the range [0.0, 1.0]) with the value stored in the depth buffer, which is initialized to depthLoadValue (here 1.0). The comparison uses the function specified by depthCompare (here less, meaning all fragments with z values greater than or equal to the stored depth, initially 1.0, are rejected)
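The behaviour described above can be sketched per pixel: the buffer starts at depthLoadValue (1.0), and a fragment survives only if depthCompare — "less" here — holds against the stored value, after which depthWriteEnabled: true lets it overwrite that value. This is a simplified software model of what the GPU does:

```typescript
// Simplified software model of the depth test configured in this example,
// for a single pixel receiving several fragments in draw order.
function depthTestLess(fragmentDepths: number[], loadValue = 1.0): number {
  let stored = loadValue; // depth buffer cleared to depthLoadValue
  let visible = NaN;      // depth of the fragment currently visible
  for (const z of fragmentDepths) {
    if (z < stored) {     // depthCompare: "less"
      stored = z;         // depthWriteEnabled: true
      visible = z;
    }
  }
  return visible;
}
```

For fragments arriving at depths 0.8, 0.5 and 0.9, only the one at 0.5 ends up visible, regardless of draw order relative to the 0.9 fragment; a fragment at exactly 1.0 is rejected because 1.0 is not less than the cleared value.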

Reference material

Depth testing

The final rendering result

[Screenshot: the rendered rotating cube, 2019-12-22]

Reference material

WebGPU specification
webgpu-samplers Github Repo
WebGPU-5


Origin: www.cnblogs.com/chaogex/p/12079739.html