OpenGL Compute Shader (parallel acceleration for general-purpose computing)

 

 

  The shaders we usually work with are the vertex shader, geometry shader, and fragment shader, all of which serve rasterized graphics rendering. OpenGL 4.3 added the Compute Shader, which is used for parallel acceleration of general-purpose computing. This post introduces it.

  

  Before introducing the Compute Shader, we first need to introduce ImageTexture:

    An ordinary GLSL texture is read-only in the shader (data is fetched by sampling through a sampler). To write to a texture, it has to be attached to the bound framebuffer, and the fragment shader can then only write the current pixel. The write location cannot be chosen arbitrarily, and the same texture cannot be read and written at the same time (I tried and it did not work, and other blogs also say it cannot be done, so it is best avoided).

  1. Generate the texture

void WKS::ImageTexture::setupTexture() {
    glGenTextures(1, &this->textureID);
    glBindTexture(GL_TEXTURE_2D, this->textureID);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32F, width, height);
    // turn off filtering and wrap modes
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
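    // GL_CLAMP is not available in the core profile; GL_CLAMP_TO_EDGE works in both profiles
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);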
    glBindTexture(GL_TEXTURE_2D, 0);
}

  Note: the texture should be allocated with a fixed size using glTexStorage2D(), not glTexImage2D().

  2. Bind it as an ImageTexture

glBindImageTexture(0, this->inputTexture, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA32F);

  inputTexture is the ID of the texture generated in step 1. The first parameter is the ImageTexture binding point (image unit); image binding points are a separate set from the texture units used by samplers, so the two do not collide.

  3. Declare it in GLSL

layout (rgba32f, binding = 0) uniform image2D input_image;

  Supplement: underneath, an ImageTexture is an ordinary texture, so it can be accessed from the host.

    a. Initialization: upload the data

void WKS::ImageTexture::Transfer2Texture(float* data) {
    glBindTexture(GL_TEXTURE_2D, this->textureID);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGBA, GL_FLOAT, data);
}
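
  With GL_RGBA and GL_FLOAT, the data pointer passed to glTexSubImage2D is read as tightly packed texels, four floats each, row by row. The following CPU-only sketch shows that layout, assuming the default pixel-store settings. texelOffset and makeRgbaBuffer are hypothetical helper names and are not part of the WKS classes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of channel c (0=R, 1=G, 2=B, 3=A) of texel (x, y) in a tightly
// packed RGBA float buffer of the given width. Hypothetical helper.
std::size_t texelOffset(std::size_t width, std::size_t x, std::size_t y, std::size_t c) {
    assert(x < width && c < 4);
    return (y * width + x) * 4 + c;
}

// Builds a width * height RGBA buffer whose floats count up 0, 1, 2, ...
std::vector<float> makeRgbaBuffer(std::size_t width, std::size_t height) {
    std::vector<float> data(width * height * 4);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<float>(i);
    return data;
}
```

  The pointer from such a buffer (buf.data()) could then be passed wherever Transfer2Texture expects a float*.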

    b. Read the data back

float* WKS::Texture::GetTextureData(GLuint width, GLuint height, GLuint channels, GLuint texID) {
    float* data = new float[width * height * channels];
    glBindTexture(GL_TEXTURE_2D, texID);
    // pick the read-back format that matches the requested channel count
    if (channels == 1) glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_FLOAT, data);
    if (channels == 3) glGetTexImage(GL_TEXTURE_2D, 0, GL_RGB, GL_FLOAT, data);
    if (channels == 4) glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, data);
    glBindTexture(GL_TEXTURE_2D, 0);
    return data;
}
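
  GetTextureData returns memory allocated with new[], so the caller owns it and has to release it with delete[]. One way to make that automatic is to hand the pointer to std::unique_ptr<float[]>, sketched here without any OpenGL calls. adoptTextureData and fakeTextureData are hypothetical names, and fakeTextureData only stands in for the real read-back:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for GetTextureData: returns a new[]-allocated buffer.
// Hypothetical; the real function fills the buffer from the GPU.
float* fakeTextureData(std::size_t count) {
    float* data = new float[count];
    for (std::size_t i = 0; i < count; ++i) data[i] = static_cast<float>(i);
    return data;
}

// Takes ownership of a new[]-allocated buffer; delete[] runs automatically
// when the returned unique_ptr goes out of scope.
std::unique_ptr<float[]> adoptTextureData(float* raw) {
    assert(raw != nullptr);
    return std::unique_ptr<float[]>(raw);
}
```

  The returned object can be indexed like the raw pointer (data[i]), so the printing code would not need to change.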

 

  Now for the Compute Shader itself:

#version 430 core
layout (local_size_x=16, local_size_y=16) in;

uniform float v[4];

layout (rgba32f, binding = 0) uniform image2D input_image;
layout (rgba32f, binding = 1) uniform image2D output_image;

shared vec4 mat_shared[16][16];

void main(void)
{
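    ivec2 pos=ivec2(gl_GlobalInvocationID.xy);
    // shared memory belongs to one work group, so index it with the local ID
    ivec2 lid=ivec2(gl_LocalInvocationID.xy);
    mat_shared[lid.x][lid.y]=imageLoad(input_image,pos);
    barrier();
    vec4 data=mat_shared[lid.x][lid.y];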
    data.r=v[0]+data.r;
    data.g=v[1]+data.g;
    data.b=v[2]+data.b;
    data.a=v[3]+data.a;
    imageStore(output_image,pos.xy,data);
}
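
  The shader uses gl_GlobalInvocationID as the pixel coordinate. OpenGL defines that built-in, per component, as gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID. The following CPU-side sketch shows that relation in two dimensions. The IVec2 struct and the globalInvocationID function are hypothetical names used only for illustration:

```cpp
#include <cassert>

struct IVec2 { int x; int y; };  // hypothetical stand-in for GLSL ivec2

// Mirrors gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize
//                                 + gl_LocalInvocationID (x and y only).
IVec2 globalInvocationID(IVec2 workGroupID, IVec2 workGroupSize, IVec2 localID) {
    assert(localID.x >= 0 && localID.x < workGroupSize.x);
    assert(localID.y >= 0 && localID.y < workGroupSize.y);
    return IVec2{workGroupID.x * workGroupSize.x + localID.x,
                 workGroupID.y * workGroupSize.y + localID.y};
}
```

  With a single work group the global and local IDs coincide. With more groups they differ, which matters for anything that exists once per group, such as a shared array.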

  

  Each invocation of the shader is one computation unit. layout (local_size_x = 16, local_size_y = 16) in; declares a local work group made up of 16 * 16 such invocations, and the invocations within a local work group can share variables declared as shared.
  Multiple local work groups in turn make up the global work group:
glDispatchCompute(1, 1, 1);

  glDispatchCompute starts the computation. Its parameters give the dimensions of the global work group, in units of local work groups; (1, 1, 1) means there is only one local work group.
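
  For images larger than one local work group, the group counts passed to glDispatchCompute are commonly obtained by dividing the image size by the local size and rounding up. A minimal sketch, assuming the 16 * 16 local size declared in the shader; groupCount is a hypothetical helper:

```cpp
#include <cassert>

// Number of local work groups needed to cover `size` items when each
// group handles `localSize` items (integer division rounded up).
unsigned int groupCount(unsigned int size, unsigned int localSize) {
    assert(localSize > 0);
    return (size + localSize - 1) / localSize;
}
```

  For a 1920 * 1080 image this would give glDispatchCompute(120, 68, 1). Invocations that land past the image edge are usually guarded with a bounds check in the shader.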

  Note: a Compute Shader program has only one stage (ordinary rendering usually has two, vertex + fragment). When compiling, select GL_COMPUTE_SHADER as the shader type.

Shader(const char* computePath) :programId(0)
    {
        std::vector<ShaderFile> fileVec;
        fileVec.push_back(ShaderFile(GL_COMPUTE_SHADER, computePath));
        loadFromFile(fileVec);
    }

  

  Example:

  Add vec4(0, 0.1, 0.2, 0.3) to every vec4 element of a 4 * 4 matrix.

  Initialization:

void SceneRendering::setupAddData() {
    int num = 4 * 4 * 4;
    this->inputData = new float[num];
    for (int i = 0; i < num; i++) inputData[i] = i;
    for (int i = 0; i < 4; i++) v[i] = i*0.1f;
    shader_add = new Shader("./Shader/add.comp");
    WKS::ImageTexture* texturePtr = new WKS::ImageTexture(4, 4);
    this->inputTexture = texturePtr->GetTextureID();
    this->outputTexture = (new WKS::ImageTexture(4, 4))->GetTextureID();
    texturePtr->Transfer2Texture(inputData);
}

  Invoke the Compute Shader:

void SceneRendering::performCompute() {
    this->shader_add->use();
    this->shader_add->setVecN("v", 4, v);
    glBindImageTexture(0, this->inputTexture, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA32F);
    glBindImageTexture(1, this->outputTexture, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);
    glDispatchCompute(1, 1, 1);
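    // make the shader's image writes visible to the later glGetTexImage read-back
    glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_TEXTURE_UPDATE_BARRIER_BIT);
    glFinish();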
}

  Call it from the main function and print the result:

    glClearColor(0.5f, 0.5f, 0.5f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // we are not using the stencil buffer for now
    // Compute Shader
    this->performCompute();
    float* data = WKS::Texture::GetTextureData(4, 4, 4, this->outputTexture);
    int index = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            std::cout << "(" <<data[index]<<","<<data[index+1]<<","<<data[index+2]<<","<<data[index+3]<< ")" << " ";
            index += 4;
        }
        std::cout << std::endl;
    }
    std::cout<< std::endl;
    delete[] data; // the buffer was allocated with new[], so free() would be wrong here
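
  To check the printed values, the same computation can be done on the CPU. The sketch below assumes the input and offsets from setupAddData (floats 0 to 63, v = 0, 0.1, 0.2, 0.3). addOffsetCpu is a hypothetical helper and is not part of the original program:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds v[c] to channel c of every RGBA texel, like the compute shader does.
std::vector<float> addOffsetCpu(const std::vector<float>& input, const float v[4]) {
    assert(input.size() % 4 == 0);
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = input[i] + v[i % 4];
    }
    return output;
}
```

  Under those assumptions the first texel should come out as approximately (0, 1.1, 2.2, 3.3) and the last as approximately (60, 61.1, 62.2, 63.3).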

  Result (output image):

Origin www.cnblogs.com/chen9510/p/12000320.html