Compute Shader Basics

ComputeShader:
    GPGPU: General-Purpose GPU programming, that is, general computation that exploits the parallel nature of the GPU. Workloads that are highly parallel and contain little branching logic are well suited to GPGPU. Platforms and interfaces include DirectCompute, OpenCL, CUDA and so on.
    Definition: a GPGPU program that runs on the GPU outside the conventional rendering pipeline and can write its output to textures or data buffers.
    Characteristics: mathematical computation, parallelized, does not affect the rendering result.
    Uses: compute shaders suit massively parallel mathematical operations with few branches. The drawback is that data transfer between the CPU and the GPU is very slow.

 

  A compute shader has to be invoked from a script:

    /*
     * test.compute
     */
    // Kernel entry point. A compute shader file can contain several kernel
    // functions; the script specifies which one to call.
    #pragma kernel CSMain

    // Declare a read-write texture
    RWTexture2D<float4> Result;

    // numthreads: sets the thread group size, i.e. how many threads one group
    // contains. Here each thread group contains 8 * 8 * 1 = 64 threads.
    // id: the index of this thread within the whole dispatch
    [numthreads(8,8,1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        Result[id.xy] = float4(id.x & id.y, (id.x & 15) / 15.0, (id.y & 15) / 15.0, 0.0);
    }

    /*
     * C# script that calls test.compute
     */
    public ComputeShader shader;

    void RunShader()
    {
        int kernelHandle = shader.FindKernel("CSMain");

        RenderTexture tex = new RenderTexture(256, 256, 24);
        // Must be explicitly marked for random write access
        tex.enableRandomWrite = true;
        tex.Create();

        // Pass data from the CPU to the GPU (moving data between different memory
        // spaces incurs latency, which should be considered when optimizing)
        shader.SetTexture(kernelHandle, "Result", tex);
        // Specify how many thread groups to run along each axis
        shader.Dispatch(kernelHandle, 256 / 8, 256 / 8, 1);
    }
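
  The arguments to Dispatch are thread group counts, not thread counts: 256 / 8 = 32 groups along x and along y, each group holding 8 * 8 * 1 = 64 threads. If the texture size is not a multiple of the group size, plain integer division leaves the last rows or columns uncovered, so the count is usually rounded up. A minimal sketch in plain C# (no Unity types); DispatchMath and GroupCount are hypothetical names used for illustration and are not part of the Unity API:

```csharp
using System;

// Hypothetical helper for illustration; not part of the Unity API.
public static class DispatchMath
{
    // Number of thread groups needed to cover `size` items along one axis
    // when each group handles `groupSize` items (ceiling division).
    public static int GroupCount(int size, int groupSize)
    {
        if (size < 0 || groupSize <= 0)
            throw new ArgumentOutOfRangeException();
        return (size + groupSize - 1) / groupSize;
    }

    public static void Main()
    {
        // 256-pixel axis with [numthreads(8,8,1)]
        Console.WriteLine(GroupCount(256, 8)); // 32
        // A 250-pixel axis still needs 32 groups to be fully covered
        Console.WriteLine(GroupCount(250, 8)); // 32
    }
}
```

  When the count is rounded up, the last group contains threads that fall outside the texture, so the kernel would normally compare id against the texture size before writing.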

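  Structured buffers: an array of a single data type, which can be floats, integers or structs:

    // HLSL declarations
    StructuredBuffer<float> floatBuffer;
    RWStructuredBuffer<int> readWriteIntBuffer;
    // C#-side struct; its memory layout must match the HLSL struct used by the buffer
    struct VecMatPair
    {
        public Vector3 point;
        public Matrix4x4 matrix;
    }
    // HLSL declaration of a buffer of structs
    RWStructuredBuffer<VecMatPair> dataBuffer;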

    /*
     * test.compute
     */
    #pragma kernel Multiply
    struct VecMatPair
    {
        float3 pos;
        float4x4 mat;
    };

    RWStructuredBuffer<VecMatPair> dataBuffer;

    [numthreads(16,1,1)]
    void Multiply (uint3 id : SV_DispatchThreadID)
    {
        // mul returns a float4; keep only xyz to match the float3 field
        dataBuffer[id.x].pos = mul(dataBuffer[id.x].mat,
                        float4(dataBuffer[id.x].pos, 1.0)).xyz;
    }
    
    /*
     * C# script that calls test.compute
     */
    public ComputeShader shader;

    void RunShader()
    {
        VecMatPair[] data = new VecMatPair[5];
        VecMatPair[] output = new VecMatPair[5];

        // INITIALIZE DATA HERE

        // 76 = (3 floats + 4 * 4 floats) * 4 bytes; the buffer stride must be
        // specified manually, in bytes
        ComputeBuffer buffer = new ComputeBuffer(data.Length, 76);
        buffer.SetData(data);

        int kernel = shader.FindKernel("Multiply");
        // Bind the buffer to the kernel
        shader.SetBuffer(kernel, "dataBuffer", buffer);
        // Launches data.Length groups of 16 threads each, which is more
        // threads than there are elements
        shader.Dispatch(kernel, data.Length, 1, 1);

        // Unlike a texture, a structured buffer has to be copied explicitly from
        // GPU memory back to the CPU. This is very expensive and is generally
        // only done when the script needs to read results back from the shader.
        buffer.GetData(output);

        // Release the GPU memory once the buffer is no longer needed
        buffer.Release();
    }
  As can be seen, the texture result does not have to be copied back to the CPU manually, so the texture path is faster than the compute buffer path.
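
  The stride of 76 passed to ComputeBuffer in the script is the size in bytes of one VecMatPair element: 3 floats for the vector plus 16 floats for the matrix, at 4 bytes each. Instead of hard-coding the number, it can be derived from the struct itself. A minimal sketch in plain C#, assuming the stand-in structs Float3 and Float4x4 (hypothetical names, used in place of Unity's Vector3 and Matrix4x4 so that the example runs without Unity):

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for Unity's Vector3: three 4-byte floats.
[StructLayout(LayoutKind.Sequential)]
public struct Float3 { public float x, y, z; }

// Hypothetical stand-in for Unity's Matrix4x4: sixteen 4-byte floats.
[StructLayout(LayoutKind.Sequential)]
public struct Float4x4
{
    public float m00, m01, m02, m03;
    public float m10, m11, m12, m13;
    public float m20, m21, m22, m23;
    public float m30, m31, m32, m33;
}

// Same field order as the VecMatPair struct in the article.
[StructLayout(LayoutKind.Sequential)]
public struct VecMatPair
{
    public Float3 point;
    public Float4x4 matrix;
}

public static class StrideCheck
{
    // Size in bytes of one element of type T, as laid out for interop.
    public static int Stride<T>() where T : struct
    {
        return Marshal.SizeOf<T>();
    }

    public static void Main()
    {
        Console.WriteLine(Stride<Float3>());     // 12
        Console.WriteLine(Stride<Float4x4>());   // 64
        Console.WriteLine(Stride<VecMatPair>()); // 76
    }
}
```

  In a Unity script the same idea would presumably be applied to the real struct, for example by passing Marshal.SizeOf(typeof(VecMatPair)) as the stride, on the assumption that the struct contains only plain value fields such as Vector3 and Matrix4x4.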
 
Important points:
    (1) OpenGL ES 3.1 only guarantees support for four compute buffers at a time.
    (2) The compute shader version (cs_4_x, cs_5_0, etc.) can be checked with "Show compiled code".
 

Origin www.cnblogs.com/sifenkesi/p/11374615.html