OpenGL.Shader: Zhige teaches you to write a live filter client (11) — Visual filter: Gaussian filter optimization via separable convolution (dimensionality reduction)


1. The need for optimization

Why revisit Gaussian filtering? Because it is widely used in practice, often with fairly large convolution kernels (for example 5x5, 7x7, or 9x9 — note they are always odd-sized). With the naive Gaussian filter implementation introduced earlier, a large kernel drives up GPU memory traffic and texture sampling, and performance suffers. That is the background of this article: how do we optimize it? Exploiting the separability of the convolution kernel is one answer, and combined with OpenGL.Shader's ability to chain multiple shader programs, it can be implemented effectively.

Next, a brief introduction to what kernel separability means. Assume A is a column vector and B is a row vector; their outer product A·B yields a 2D matrix, and because convolution is associative and commutative, convolving with that matrix is equivalent to convolving with A and B in sequence, in either order. The original post illustrated the n = 3 case with a figure (a 3x3 kernel factoring into a 3x1 column times a 1x3 row), omitted here.

Following this reasoning, the Gaussian Kernel2D mentioned earlier can be factored as KernelX·KernelY, so the convolution can be written: Dst = Src*Kernel2D = (Src*KernelX)*KernelY = (Src*KernelY)*KernelX. In general, any convolution whose kernel can be decomposed into the outer product of a column vector and a row vector is separable. The process can be thought of as reducing a two-dimensional convolution kernel to two one-dimensional passes.

 

2. Using FBOs to implement the dimensionality reduction of the convolution kernel

The theory is easy to follow, but how do we implement the optimization? We can use FBO off-screen rendering: first run the Src*KernelX part of the logic and save the result into FBO1, then use FBO1's texture as input and convolve it with KernelY to produce the final output. GPUImage implements other complex convolution optimizations and filter-merging operations with the same idea. The core class here, modeled on GPUImage, is GpuBaseFilterGroup.hpp:

#ifndef GPU_FILTER_GROUP_HPP
#define GPU_FILTER_GROUP_HPP

#include <list>
#include <vector>
#include "GpuBaseFilter.hpp"

class GpuBaseFilterGroup : public GpuBaseFilter {
    // GpuBaseFilter virtual method
public:
    GpuBaseFilterGroup()
    {
        mFBO_IDs = NULL;
        mFBO_TextureIDs = NULL;
    }

    virtual void onOutputSizeChanged(int width, int height) {
        if (mFilterList.empty()) return;

        destroyFrameBufferObjs();

        // Use a reference so onOutputSizeChanged acts on the stored filter,
        // not on a temporary copy.
        std::vector<GpuBaseFilter>::iterator itr;
        for(itr=mFilterList.begin(); itr!=mFilterList.end(); itr++)
        {
            GpuBaseFilter& filter = *itr;
            filter.onOutputSizeChanged(width, height);
        }

        createFrameBufferObjs(width, height);
    }

    virtual void destroy() {
        destroyFrameBufferObjs();

        std::vector<GpuBaseFilter>::iterator itr;
        for(itr=mFilterList.begin(); itr!=mFilterList.end(); itr++)
        {
            GpuBaseFilter& filter = *itr;   // reference, not a copy
            filter.destroy();
        }
        mFilterList.clear();

        GpuBaseFilter::destroy();
    }


private:
    void createFrameBufferObjs(int _width, int _height ) {
        const int num = mFilterList.size() - 1;
        // the last draw renders to the screen, so it needs no FBO
        mFBO_IDs = new GLuint[num];
        mFBO_TextureIDs = new GLuint[num];
        glGenFramebuffers(num, mFBO_IDs);
        glGenTextures(num, mFBO_TextureIDs);

        for (int i = 0; i < num; i++) {
            glBindTexture(GL_TEXTURE_2D, mFBO_TextureIDs[i]);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE); // GL_REPEAT
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE); // GL_REPEAT
            // NOTE: GL_RGBA16F with GL_FLOAT requires an ES3 (or float-texture
            // capable) context; on plain ES2, use GL_RGBA / GL_UNSIGNED_BYTE.
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, _width, _height, 0, GL_RGBA, GL_FLOAT, 0);

            glBindFramebuffer(GL_FRAMEBUFFER, mFBO_IDs[i]);
            glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, mFBO_TextureIDs[i], 0);

            glBindTexture(GL_TEXTURE_2D, 0);
            glBindFramebuffer(GL_FRAMEBUFFER, 0);
        }
    }

    void destroyFrameBufferObjs() {
        // Note: a length(GLuint arr[]) helper based on sizeof cannot work here,
        // because an array parameter decays to a pointer and sizeof measures
        // only the pointer. The FBO count always matches what
        // createFrameBufferObjs allocated: mFilterList.size() - 1.
        const int num = (int)mFilterList.size() - 1;
        if (mFBO_TextureIDs != NULL) {
            glDeleteTextures(num, mFBO_TextureIDs);
            delete[] mFBO_TextureIDs;
            mFBO_TextureIDs = NULL;
        }
        if (mFBO_IDs != NULL) {
            glDeleteFramebuffers(num, mFBO_IDs);
            delete[] mFBO_IDs;
            mFBO_IDs = NULL;
        }
    }


public:
    std::vector<GpuBaseFilter> mFilterList;
    void addFilter(GpuBaseFilter filter) {
        mFilterList.push_back(filter);
    }

    GLuint* mFBO_IDs;
    GLuint* mFBO_TextureIDs;
};
#endif // GPU_FILTER_GROUP_HPP

The code is clear and easy to follow. Although GpuBaseFilterGroup inherits from GpuBaseFilter, that is mainly to stay compatible with existing code that holds a parent-class reference; it then overrides the base methods. The key method is onOutputSizeChanged, so let's look at it on its own:

virtual void onOutputSizeChanged(int width, int height) {
    if (mFilterList.empty()) return;      // nothing to do if the filter list is empty
    destroyFrameBufferObjs();             // destroy any previous FBOs and their bound texture caches
    std::vector<GpuBaseFilter>::iterator itr;
    for (itr = mFilterList.begin(); itr != mFilterList.end(); itr++)
    {
        GpuBaseFilter& filter = *itr;
        filter.onOutputSizeChanged(width, height);
    }
    // propagate the size change to every filter in the list
    createFrameBufferObjs(width, height); // then create the matching FBOs and their bound textures
}

Next comes the private method createFrameBufferObjs; refer to the code above for the details. Note that the number of FBOs created is mFilterList.size() - 1, because the final pass renders its output straight to the screen.

 

3. GpuGaussianBlurFilter2

The last and most involved step is reworking our Gaussian filter implementation. Look at the vertex shader code first:

attribute vec4 position;
attribute vec4 inputTextureCoordinate;
const int GAUSSIAN_SAMPLES = 9;
uniform float widthFactor;
uniform float heightFactor;
varying vec2 blurCoordinates[GAUSSIAN_SAMPLES];
void main()
{
    gl_Position = position;
    vec2 singleStepOffset = vec2(widthFactor, heightFactor);
    int multiplier = 0;
    vec2 blurStep;
    for (int i = 0; i < GAUSSIAN_SAMPLES; i++)
    {
        multiplier = (i - ((GAUSSIAN_SAMPLES - 1) / 2));
        //-4,-3,-2,-1,0,1,2,3,4
        blurStep = float(multiplier) * singleStepOffset;
        blurCoordinates[i] = inputTextureCoordinate.xy + blurStep;
    }
}

Taking the current texture coordinate as the center, the shader outputs the positions of 9 sampling points. The first thing that may raise questions is singleStepOffset: previously we passed in both the width factor and the height factor at the same time, which makes the 9 sampling points line up along a 45° diagonal. I won't explain that here; it is covered in detail below.

uniform sampler2D SamplerY;
uniform sampler2D SamplerU;
uniform sampler2D SamplerV;
uniform sampler2D SamplerRGB;
mat3 colorConversionMatrix = mat3(
                   1.0, 1.0, 1.0,
                   0.0, -0.39465, 2.03211,
                   1.13983, -0.58060, 0.0);
vec3 yuv2rgb(vec2 pos)
{
   vec3 yuv;
   yuv.x = texture2D(SamplerY, pos).r;
   yuv.y = texture2D(SamplerU, pos).r - 0.5;
   yuv.z = texture2D(SamplerV, pos).r - 0.5;
   return colorConversionMatrix * yuv;
}
uniform int drawMode; // 0 = YUV, 1 = RGB
const int GAUSSIAN_SAMPLES = 9;
varying vec2 blurCoordinates[GAUSSIAN_SAMPLES];
void main()
{
    vec3 fragmentColor = vec3(0.0); 
    if (drawMode==0) 
    {
        fragmentColor += (yuv2rgb(blurCoordinates[0]) *0.05); 
        fragmentColor += (yuv2rgb(blurCoordinates[1]) *0.09); 
        fragmentColor += (yuv2rgb(blurCoordinates[2]) *0.12); 
        fragmentColor += (yuv2rgb(blurCoordinates[3]) *0.15); 
        fragmentColor += (yuv2rgb(blurCoordinates[4]) *0.18); 
        fragmentColor += (yuv2rgb(blurCoordinates[5]) *0.15); 
        fragmentColor += (yuv2rgb(blurCoordinates[6]) *0.12); 
        fragmentColor += (yuv2rgb(blurCoordinates[7]) *0.09); 
        fragmentColor += (yuv2rgb(blurCoordinates[8]) *0.05); 
        gl_FragColor = vec4(fragmentColor, 1.0);
    }
    else 
    { 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[0]).rgb *0.05); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[1]).rgb *0.09); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[2]).rgb *0.12); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[3]).rgb *0.15); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[4]).rgb *0.18); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[5]).rgb *0.15); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[6]).rgb *0.12); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[7]).rgb *0.09); 
        fragmentColor += (texture2D(SamplerRGB, blurCoordinates[8]).rgb *0.05); 
        gl_FragColor = vec4(fragmentColor, 1.0);
    } 
}

It looks complicated with both YUV and RGB branches, but that is just GpuBaseFilter's original design supporting two input modes (I was lazy and did not write it all out). The actual logic here is very simple: sample the texture color at the 9 coordinates, then perform the convolution. The Gaussian kernel has been reduced to 9 weight coefficients. It is worth noting that these 9 coefficients are not arbitrary: they are generated from the standard Gaussian formula and have been normalized — the 9 weights sum exactly to one!

 

So why can't I be lazy here and get away with a single drawMode? And how should we understand the 45° diagonal sampling caused by the vertex shader's singleStepOffset? Read on.

class GpuGaussianBlurFilter2 : public GpuBaseFilterGroup {

    GpuGaussianBlurFilter2()
    {
        GAUSSIAN_BLUR_VERTEX_SHADER = "...";   // the vertex shader shown above
        GAUSSIAN_BLUR_FRAGMENT_SHADER = "..."; // the fragment shader shown above
    }
    
    ~GpuGaussianBlurFilter2()
    {
        if(!GAUSSIAN_BLUR_VERTEX_SHADER.empty()) GAUSSIAN_BLUR_VERTEX_SHADER.clear();
        if(!GAUSSIAN_BLUR_FRAGMENT_SHADER.empty()) GAUSSIAN_BLUR_FRAGMENT_SHADER.clear();
    }

    void init() {
        GpuBaseFilter filter1;
        filter1.init(GAUSSIAN_BLUR_VERTEX_SHADER.c_str(), GAUSSIAN_BLUR_FRAGMENT_SHADER.c_str());
        mWidthFactorLocation1  = glGetUniformLocation(filter1.getProgram(), "widthFactor");
        mHeightFactorLocation1 = glGetUniformLocation(filter1.getProgram(), "heightFactor");
        mDrawModeLocation1     = glGetUniformLocation(filter1.getProgram(), "drawMode");
        addFilter(filter1);

        GpuBaseFilter filter2;
        filter2.init(GAUSSIAN_BLUR_VERTEX_SHADER.c_str(), GAUSSIAN_BLUR_FRAGMENT_SHADER.c_str());
        mWidthFactorLocation2  = glGetUniformLocation(filter2.getProgram(), "widthFactor");
        mHeightFactorLocation2 = glGetUniformLocation(filter2.getProgram(), "heightFactor");
        mDrawModeLocation2     = glGetUniformLocation(filter2.getProgram(), "drawMode");
        addFilter(filter2);
    }
    ... ...
}

Look at the parameterless init() method, which overrides the grandparent class GpuBaseFilter (via the parent GpuBaseFilterGroup) so all filters can be managed and referenced uniformly. The content is not difficult: create two shader-program objects from the same pair of shader sources, but keep separate uniform locations for each program.

class GpuGaussianBlurFilter2 : public GpuBaseFilterGroup {
    ... ... // continued from above
public:
    void onOutputSizeChanged(int width, int height) {
        GpuBaseFilterGroup::onOutputSizeChanged(width, height);
    }
    void setAdjustEffect(float percent) {
        mSampleOffset = range(percent * 100.0f, 0.0f, 2.0f);
    }
}

Then onOutputSizeChanged overrides the same-named method of the grandparent class GpuBaseFilter; no special handling is needed — it simply delegates to the parent GpuBaseFilterGroup logic shown in section 2 above.

class GpuGaussianBlurFilter2 : public GpuBaseFilterGroup {
    ... ... // continued from above
public:
    void onDraw(GLuint SamplerY_texId, GLuint SamplerU_texId, GLuint SamplerV_texId,
                void* positionCords, void* textureCords)
    {
        if (mFilterList.size()==0) return;
        GLuint previousTexture = 0;
        int size = mFilterList.size();
        for (int i = 0; i < size; i++) {
            GpuBaseFilter& filter = mFilterList[i];  // reference, not a copy
            bool isNotLast = i < size - 1;
            if (isNotLast) {
                glBindFramebuffer(GL_FRAMEBUFFER, mFBO_IDs[i]);
            }
            glClearColor(0, 0, 0, 0);
            if (i == 0) {
                drawFilter1YUV(filter, SamplerY_texId, SamplerU_texId, SamplerV_texId, positionCords, textureCords);
            }
            if (i == 1) { // isNotLast == false: no FBO is bound, so this pass draws to the screen.
                drawFilter2RGB(filter, previousTexture, positionCords, mNormalTextureCords);
            }
            if (isNotLast) {
                glBindFramebuffer(GL_FRAMEBUFFER, 0);
                previousTexture = mFBO_TextureIDs[i];
            }
        }
    }
}

Now the key rendering method onDraw, which overrides the grandparent class GpuBaseFilter — the common entry point of every filter. The loop logic is simplified from GPUImage. There is not much general to say here, because this onDraw is a specific implementation for GpuGaussianBlurFilter2 rather than a universal one: it simply follows the optimized two-pass Gaussian logic described in section 1.

When i == 0, the Src*KernelX pass is rendered off-screen first. Step into drawFilter1YUV to see what it does.

class GpuGaussianBlurFilter2 : public GpuBaseFilterGroup {
    ... ... // continued from above
private:
    void drawFilter1YUV(GpuBaseFilter filter,
                 GLuint SamplerY_texId, GLuint SamplerU_texId, GLuint SamplerV_texId,
                 void* positionCords, void* textureCords)
    {
        if (!filter.isInitialized())
            return;
        glUseProgram(filter.getProgram());
        glUniform1i(mDrawModeLocation1, 0);
        //glUniform1f(mSampleOffsetLocation1, mSampleOffset);
        glUniform1f(mWidthFactorLocation1, mSampleOffset / filter.mOutputWidth);
        glUniform1f(mHeightFactorLocation1, 0);

        glVertexAttribPointer(filter.mGLAttribPosition, 2, GL_FLOAT, GL_FALSE, 0, positionCords);
        glEnableVertexAttribArray(filter.mGLAttribPosition);
        glVertexAttribPointer(filter.mGLAttribTextureCoordinate, 2, GL_FLOAT, GL_FALSE, 0, textureCords);
        glEnableVertexAttribArray(filter.mGLAttribTextureCoordinate);

        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, SamplerY_texId);
        glUniform1i(filter.mGLUniformSampleY, 0);
        glActiveTexture(GL_TEXTURE1);
        glBindTexture(GL_TEXTURE_2D, SamplerU_texId);
        glUniform1i(filter.mGLUniformSampleU, 1);
        glActiveTexture(GL_TEXTURE2);
        glBindTexture(GL_TEXTURE_2D, SamplerV_texId);
        glUniform1i(filter.mGLUniformSampleV, 2);

        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
        glDisableVertexAttribArray(filter.mGLAttribPosition);
        glDisableVertexAttribArray(filter.mGLAttribTextureCoordinate);
        glBindTexture(GL_TEXTURE_2D, 0);
    }  
}

Pay attention: all of the generic shader attribute and uniform locations come from the filter object passed in; only the three specially handled uniform locations (drawMode and the two factors) belong to this class. Because the first pass (i == 0) samples the original video frame, which is YUV data, drawMode = 0 (YUV mode) is used. widthFactor receives mSampleOffset / output width as the vertex shader's horizontal step. But!!! heightFactor receives 0! With no vertical offset, the vertex shader cannot produce the 45° diagonal sampling pattern, so this off-screen pass renders Src*KernelX as a purely horizontal blur.

 

Striking while the iron is hot: when i == 1, we are on the last iteration of the loop, so no off-screen rendering is needed and the output goes directly to the screen. Note that previousTexture holds the texture id bound to the i == 0 FBO, i.e. it carries the i == 0 rendering result, Src*KernelX. We take it as input and call drawFilter2RGB.

class GpuGaussianBlurFilter2 : public GpuBaseFilterGroup {
    ... ... // continued from above
private:
    void drawFilter2RGB(GpuBaseFilter filter, GLuint _texId, void* positionCords, void* textureCords)
    {
        if (!filter.isInitialized())
            return;
        glUseProgram(filter.getProgram());
        glUniform1i(mDrawModeLocation2, 1);
        //glUniform1f(mSampleOffsetLocation2, mSampleOffset);
        glUniform1f(mWidthFactorLocation2, 0);
        glUniform1f(mHeightFactorLocation2, mSampleOffset / filter.mOutputHeight);

        glVertexAttribPointer(filter.mGLAttribPosition, 2, GL_FLOAT, GL_FALSE, 0, positionCords);
        glEnableVertexAttribArray(filter.mGLAttribPosition);

        glVertexAttribPointer(filter.mGLAttribTextureCoordinate, 2, GL_FLOAT, GL_FALSE, 0, textureCords);
        glEnableVertexAttribArray(filter.mGLAttribTextureCoordinate);

        glActiveTexture(GL_TEXTURE3);
        glBindTexture(GL_TEXTURE_2D, _texId);
        glUniform1i(filter.mGLUniformSampleRGB, 3);

        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
        glDisableVertexAttribArray(filter.mGLAttribPosition);
        glDisableVertexAttribArray(filter.mGLAttribTextureCoordinate);
        glBindTexture(GL_TEXTURE_2D, 0);
    }
}

Again, only the three specially handled uniform locations differ. This time the input is the RGB texture from the first pass, so drawMode uses RGB mode 1; widthFactor is 0 and heightFactor receives mSampleOffset / output height, completing the final step (Src*KernelX)*KernelY as a purely vertical blur.

 

4. Summary

For a final test, in GpuFilterRender::checkFilterChange, replace the reference to GpuGaussianBlurFilter with GpuGaussianBlurFilter2. Comparing the two, the blur from version 2 is visibly stronger — GpuGaussianBlurFilter used only a simple 3x3 Gaussian kernel, while GpuGaussianBlurFilter2 delivers the effect of a 9x9 kernel. Even though the larger kernel looks like more computation, GPU memory usage drops by almost half and performance improves noticeably.

This article covers not only the dimensionality-reduction optimization of the convolution kernel but also the multi-pass rendering technique of chaining shaders. Could the same mechanism be used to combine the effects of multiple filters? Code is synced to: https://github.com/MrZhaozhirong/NativeCppApp /src/main/cpp/gpufilter/filter/GpuGaussianBlurFilter2.hpp

Origin blog.csdn.net/a360940265a/article/details/107861956