(Android-RTC-7) Analyzing AndroidVideoDecoder: how WebRTC uses shaders to convert a texture to YUV420

I have been posting less lately, partly because work has been heavy and partly because of life, but that is no reason to stop studying. Without further ado, this article continues from the previous one and analyzes AndroidVideoDecoder (HardwareVideoEncoder will follow).

1. Recap of the previous article

Let's briefly review the previous article: DefaultVideoDecoderFactory.createDecoder asks both HardwareVideoDecoderFactory and SoftwareVideoDecoderFactory to create decoders, and the resulting decoder is handed back to PeerConnectionClient.

@Override
public @Nullable
VideoDecoder createDecoder(VideoCodecInfo codecType) {
    VideoDecoder softwareDecoder = softwareVideoDecoderFactory.createDecoder(codecType);
    final VideoDecoder hardwareDecoder = hardwareVideoDecoderFactory.createDecoder(codecType);
    if (softwareDecoder == null && platformSoftwareVideoDecoderFactory != null) {
        softwareDecoder = platformSoftwareVideoDecoderFactory.createDecoder(codecType);
    }
    if (hardwareDecoder != null && softwareDecoder != null) {
        // Both hardware and software supported, wrap it in a software fallback
        return new VideoDecoderFallback(
                /* fallback= */ softwareDecoder, /* primary= */ hardwareDecoder);
    }
    return hardwareDecoder != null ? hardwareDecoder : softwareDecoder;
}

HardwareVideoDecoderFactory.createDecoder creates a hardware decoder as follows:

@Nullable
@Override
public VideoDecoder createDecoder(VideoCodecInfo codecType) {
    VideoCodecMimeType type = VideoCodecMimeType.valueOf(codecType.getName());
    MediaCodecInfo info = findCodecForType(type);
    if (info == null) {
        return null;
    }
    CodecCapabilities capabilities = info.getCapabilitiesForType(type.mimeType());
    return new AndroidVideoDecoder(new MediaCodecWrapperFactoryImpl(), info.getName(), type,
            MediaCodecUtils.selectColorFormat(MediaCodecUtils.DECODER_COLOR_FORMATS, capabilities),
            sharedContext);
}

So today we dig into AndroidVideoDecoder and see what is worth exploring and learning in it.

2. AndroidVideoDecoder

When reading Android open source projects, make good use of the Structure tab in the lower-left corner of the Android Studio window. It lists all member variables and methods of the current class and gives you an overall picture before diving in.

Let's get straight to the key point: initDecode -> SurfaceTextureHelper.

@Override
public VideoCodecStatus initDecode(Settings settings, Callback callback) {
    this.decoderThreadChecker = new ThreadChecker();
    this.callback = callback;
    if (sharedContext != null) {
        surfaceTextureHelper = createSurfaceTextureHelper();
        surface = new Surface(surfaceTextureHelper.getSurfaceTexture());
        surfaceTextureHelper.startListening(this);
    }
    return initDecodeInternal(settings.width, settings.height);
}
private SurfaceTextureHelper(Context sharedContext, Handler handler, boolean alignTimestamps,
                             YuvConverter yuvConverter, FrameRefMonitor frameRefMonitor) {
    if (handler.getLooper().getThread() != Thread.currentThread()) {
        throw new IllegalStateException("SurfaceTextureHelper must be created on the handler thread");
    }
    this.handler = handler;
    this.timestampAligner = alignTimestamps ? new TimestampAligner() : null;
    this.yuvConverter = yuvConverter;
    this.frameRefMonitor = frameRefMonitor;
    // 1
    eglBase = EglBase.create(sharedContext, EglBase.CONFIG_PIXEL_BUFFER);
    try {
        // Both these statements have been observed to fail on rare occasions, see BUG=webrtc:5682.
        eglBase.createDummyPbufferSurface();
        eglBase.makeCurrent();
    } catch (RuntimeException e) {
        // Clean up before rethrowing the exception.
        eglBase.release();
        handler.getLooper().quit();
        throw e;
    }
    // 2
    oesTextureId = GlUtil.generateTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES);
    surfaceTexture = new SurfaceTexture(oesTextureId);
    setOnFrameAvailableListener(surfaceTexture, (SurfaceTexture st) -> {
        if (hasPendingTexture) {
            Logging.d(TAG, "A frame is already pending, dropping frame.");
        }
        hasPendingTexture = true;
        tryDeliverTextureFrame();
    }, handler);
}

1. EglBase.createDummyPbufferSurface creates a pbuffer. For background on pbuffers, see my earlier articles and make sure you understand the difference between a pbuffer and an FBO; plenty of bloggers have explained this clearly. Interviewers also like to ask: can an FBO replace a pbuffer, and in which situations should a pbuffer be used? Here is the answer I gave readers a few years ago:

A pbuffer suits large-scale, multi-GPU compute workloads; on mobile, an FBO can indeed replace a pbuffer in most usage scenarios.
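To make that concrete, here is a minimal sketch (my own illustration, not webrtc's code) of the FBO-plus-texture-attachment setup that serves as an offscreen render target on mobile; webrtc's GlTextureFrameBuffer is built around the same GLES20 calls:

import android.opengl.GLES20;

final class OffscreenTarget {
    final int frameBufferId;
    final int textureId;

    OffscreenTarget(int width, int height) {
        int[] ids = new int[1];
        // Color attachment texture that will receive the rendered pixels.
        GLES20.glGenTextures(1, ids, 0);
        textureId = ids[0];
        GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureId);
        GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
        GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
        GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA, width, height, 0,
                GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, null);
        // Framebuffer object with the texture attached as color attachment 0.
        GLES20.glGenFramebuffers(1, ids, 0);
        frameBufferId = ids[0];
        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, frameBufferId);
        GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER, GLES20.GL_COLOR_ATTACHMENT0,
                GLES20.GL_TEXTURE_2D, textureId, 0);
        if (GLES20.glCheckFramebufferStatus(GLES20.GL_FRAMEBUFFER)
                != GLES20.GL_FRAMEBUFFER_COMPLETE) {
            throw new RuntimeException("Framebuffer not complete");
        }
        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
    }
}

Bind frameBufferId, render, then read the result back with glReadPixels or sample textureId in a later pass; no pbuffer surface is needed for that.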

2. SurfaceTexture.OnFrameAvailableListener. Most readers already know how SurfaceTexture works, so I won't go over it again. What I do want to answer is a question I'm often asked: how can frame data be pulled out of a SurfaceTexture efficiently?

@Override
public VideoCodecStatus initDecode(Settings settings, Callback callback) {
    ... ...
    return initDecodeInternal(settings.width, settings.height);
}
private VideoCodecStatus initDecodeInternal(int width, int height) {
    codec = mediaCodecWrapperFactory.createByCodecName(codecName);
    codec.start();
    ... ...
    outputThread = createOutputThread();
    outputThread.start();
    ... ...
    return VideoCodecStatus.OK;
}
private Thread createOutputThread() {
    return new Thread("AndroidVideoDecoder.outputThread") {
        @Override
        public void run() {
            outputThreadChecker = new ThreadChecker();
            while (running) {
                deliverDecodedFrame();
            }
            releaseCodecOnOutputThread();
        }
    };
}

The pace picks up a little here, so hang on. Go back to AndroidVideoDecoder: initDecode -> initDecodeInternal -> createOutputThread. The intent is obvious: a dedicated thread keeps pulling decoded output and delivering it. Follow deliverDecodedFrame():

protected void deliverDecodedFrame() {
    ... ...
    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    int result = codec.dequeueOutputBuffer(info, DEQUEUE_OUTPUT_BUFFER_TIMEOUT_US);
    ... ...
    if (surfaceTextureHelper != null) {
        deliverTextureFrame(result, info, rotation, decodeTimeMs);
    } else {
        deliverByteFrame(result, info, rotation, decodeTimeMs);
    }
}

private void deliverTextureFrame(final int index, final MediaCodec.BufferInfo info,
                                 final int rotation, final Integer decodeTimeMs) {
    ... ...
    synchronized (renderedTextureMetadataLock) {
        if (renderedTextureMetadata != null) {
            codec.releaseOutputBuffer(index, false);
            return; // still waiting for previous frame, drop this one.
        }
        surfaceTextureHelper.setTextureSize(width, height);
        surfaceTextureHelper.setFrameRotation(rotation);
        renderedTextureMetadata = new DecodedTextureMetadata(info.presentationTimeUs, decodeTimeMs);
        codec.releaseOutputBuffer(index, /* render= */ true);
    }
}

@Override
public void onFrame(VideoFrame frame) {
    final VideoFrame newFrame;
    final Integer decodeTimeMs;
    final long timestampNs;
    synchronized (renderedTextureMetadataLock) {
        if (renderedTextureMetadata == null) {
            throw new IllegalStateException(
                    "Rendered texture metadata was null in onTextureFrameAvailable.");
        }
        timestampNs = renderedTextureMetadata.presentationTimestampUs * 1000;
        decodeTimeMs = renderedTextureMetadata.decodeTimeMs;
        renderedTextureMetadata = null;
    }
    // Change timestamp of frame.
    final VideoFrame frameWithModifiedTimeStamp =
            new VideoFrame(frame.getBuffer(), frame.getRotation(), timestampNs);
    callback.onDecodedFrame(frameWithModifiedTimeStamp, decodeTimeMs, null /* qp */);
}

deliverTextureFrame does not actually deliver a texture; the key is renderedTextureMetadata. When onFrame fires, the presentation timestamp saved in renderedTextureMetadata is applied to the VideoFrame that comes back, and the result is passed on via callback.onDecodedFrame. So who calls onFrame? It is fired by the SurfaceTexture.OnFrameAvailableListener inside SurfaceTextureHelper.

private void tryDeliverTextureFrame() {
    updateTexImage();

    final float[] transformMatrix = new float[16];
    surfaceTexture.getTransformMatrix(transformMatrix);
    long timestampNs = surfaceTexture.getTimestamp();

    final TextureBuffer buffer =
            new TextureBufferImpl(textureWidth, textureHeight, TextureBuffer.Type.OES, oesTextureId,
                    RendererCommon.convertMatrixToAndroidGraphicsMatrix(transformMatrix), handler,
                    yuvConverter, textureRefCountMonitor);
    
    final VideoFrame frame = new VideoFrame(buffer, frameRotation, timestampNs);
    listener.onFrame(frame);
    frame.release();
}

In tryDeliverTextureFrame you can see that oesTextureId is wrapped into a TextureBuffer, which in turn is wrapped into a VideoFrame and called back to AndroidVideoDecoder. Note the key class YuvConverter, which is obviously the helper that converts the texture into YUV data.

Next, no more preamble; let's just read the code and dig out the valuable parts.

// TextureBufferImpl 
@Override
public VideoFrame.I420Buffer toI420() {
    return ThreadUtils.invokeAtFrontUninterruptibly(
            toI420Handler, () -> yuvConverter.convert(this));
}
// ThreadUtils
public static <V> V invokeAtFrontUninterruptibly(
        final Handler handler, final Callable<V> callable) {
    if (handler.getLooper().getThread() == Thread.currentThread()) {
        try {
            return callable.call();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    final Result result = new Result();
    final CaughtException caughtException = new CaughtException();
    final CountDownLatch barrier = new CountDownLatch(1);
    handler.post(new Runnable() {
        @Override
        public void run() {
            try {
                result.value = callable.call();
            } catch (Exception e) {
                caughtException.e = e;
            }
            barrier.countDown();
        }
    });
    awaitUninterruptibly(barrier);

    if (caughtException.e != null) {
        final RuntimeException runtimeException = new RuntimeException(caughtException.e);
        runtimeException.setStackTrace(
                concatStackTraces(caughtException.e.getStackTrace(), runtimeException.getStackTrace()));
        throw runtimeException;
    }
    return result.value;
}

2.1. Anyone who has studied Kotlin has at least brushed against the suspend modifier and coroutines. Roughly speaking, they let business code run sequentially without blocking the logic: piece A runs on thread A, piece B runs on thread B, and the overall flow is preserved. Is there an equally comfortable way to do this in plain Java? There is, and the answer is the code above: it uses a CountDownLatch for synchronization. awaitUninterruptibly(CountDownLatch) blocks the calling thread until the posted task finishes and then returns its result. The pattern is worth studying carefully and applying in real projects.
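If you want to lift the pattern out of webrtc, the core is just Handler.post plus a CountDownLatch. Here is a standalone sketch (my own simplification; the helper name and the usage line are hypothetical):

import android.os.Handler;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class SyncCall {
    // Runs `callable` on the handler's thread and blocks the calling thread until it finishes.
    static <V> V runOn(Handler handler, Callable<V> callable) {
        final AtomicReference<V> result = new AtomicReference<>();
        final AtomicReference<Exception> error = new AtomicReference<>();
        final CountDownLatch done = new CountDownLatch(1);
        handler.post(() -> {
            try {
                result.set(callable.call());
            } catch (Exception e) {
                error.set(e);
            } finally {
                done.countDown();
            }
        });
        try {
            // webrtc's awaitUninterruptibly additionally retries on InterruptedException.
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        if (error.get() != null) {
            throw new RuntimeException(error.get());
        }
        return result.get();
    }
}

// Hypothetical usage: run the conversion on the GL thread and wait for the result.
// VideoFrame.I420Buffer i420 = SyncCall.runOn(glHandler, () -> yuvConverter.convert(buffer));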

2.2. Open YuvConverter and jump straight to the convert method. Its comment already says it all: /* Converts the texture buffer to I420. */. Let's analyze how the conversion is done; the code splits into two parts: ① draw the texture into an FBO laid out in YUV format; ② read the FBO contents back out.

3. Texture Converter YUV420

public I420Buffer convert(TextureBuffer inputTextureBuffer) {
    TextureBuffer preparedBuffer = (TextureBuffer) videoFrameDrawer.prepareBufferForViewportSize(
            inputTextureBuffer, inputTextureBuffer.getWidth(), inputTextureBuffer.getHeight());
    // We draw into a buffer laid out like
    //    +---------+
    //    |         |
    //    |  Y      |
    //    |         |
    //    |         |
    //    +----+----+
    //    | U  | V  |
    //    |    |    |
    //    +----+----+
    // In memory, we use the same stride for all of Y, U and V. The
    // U data starts at offset |height| * |stride| from the Y data,
    // and the V data starts at at offset |stride/2| from the U
    // data, with rows of U and V data alternating.
    //
    // Now, it would have made sense to allocate a pixel buffer with
    // a single byte per pixel (EGL10.EGL_COLOR_BUFFER_TYPE,
    // EGL10.EGL_LUMINANCE_BUFFER,), but that seems to be
    // unsupported by devices. So do the following hack: Allocate an
    // RGBA buffer, of width |stride|/4. To render each of these
    // large pixels, sample the texture at 4 different x coordinates
    // and store the results in the four components.
    //
    // Since the V data needs to start on a boundary of such a
    // larger pixel, it is not sufficient that |stride| is even, it
    // has to be a multiple of 8 pixels.
	// Note1
    final int frameWidth = preparedBuffer.getWidth();
    final int frameHeight = preparedBuffer.getHeight();
    final int stride = ((frameWidth + 7) / 8) * 8;
    final int uvHeight = (frameHeight + 1) / 2;
    // Total height of the combined memory layout.
    final int totalHeight = frameHeight + uvHeight;
    // Viewport width is divided by four since we are squeezing in four color bytes in each RGBA pixel.
    final int viewportWidth = stride / 4;
    // Note2
    // Produce a frame buffer starting at top-left corner, not bottom-left.
    final Matrix renderMatrix = new Matrix();
    renderMatrix.preTranslate(0.5f, 0.5f);
    renderMatrix.preScale(1f, -1f);
    renderMatrix.preTranslate(-0.5f, -0.5f);
    // Note3
    i420TextureFrameBuffer.setSize(viewportWidth, totalHeight);
    // Bind our framebuffer.
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, i420TextureFrameBuffer.getFrameBufferId());
    GlUtil.checkNoGLES2Error("glBindFramebuffer");
    // Note4
    // Draw Y.
    shaderCallbacks.setPlaneY();
    VideoFrameDrawer.drawTexture(drawer, preparedBuffer, renderMatrix, frameWidth, frameHeight,
            /* viewportX= */ 0, /* viewportY= */ 0, viewportWidth,
            /* viewportHeight= */ frameHeight);
    // Draw U.
    shaderCallbacks.setPlaneU();
    VideoFrameDrawer.drawTexture(drawer, preparedBuffer, renderMatrix, frameWidth, frameHeight,
            /* viewportX= */ 0, /* viewportY= */ frameHeight, viewportWidth / 2,
            /* viewportHeight= */ uvHeight);
    // Draw V.
    shaderCallbacks.setPlaneV();
    VideoFrameDrawer.drawTexture(drawer, preparedBuffer, renderMatrix, frameWidth, frameHeight,
            /* viewportX= */ viewportWidth / 2, /* viewportY= */ frameHeight, viewportWidth / 2,
            /* viewportHeight= */ uvHeight);

    GLES20.glReadPixels(0, 0, i420TextureFrameBuffer.getWidth(), i420TextureFrameBuffer.getHeight(),
            GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, i420ByteBuffer);

    GlUtil.checkNoGLES2Error("YuvConverter.convert");
    // Restore normal framebuffer.
    GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
	
	... ...
}

Note1: This corresponds to the first part. The long comment mainly explains the stride alignment to a multiple of 8 and the trick of storing the YUV420 data inside an RGBA pixel buffer.

Note2: Next, a renderMatrix is prepared with three pre-concatenated operations, so renderMatrix = I · T(0.5, 0.5) · S(1, -1) · T(-0.5, -0.5). For pre- versus post-multiplication, see my earlier article on matrix operations. This is another standard trick, much like the stride alignment above; it becomes clear as you read on.

Note3: Despite its name, i420TextureFrameBuffer is actually a GLES20.GL_RGBA framebuffer. Its width is stride / 4 (the aligned frame width packed four bytes per RGBA pixel) and its height is frameHeight + uvHeight, which in bytes is the classic YUV420 size of width * height * 3 / 2.

private final GlTextureFrameBuffer i420TextureFrameBuffer =
            new GlTextureFrameBuffer(GLES20.GL_RGBA);
final int frameWidth = preparedBuffer.getWidth();
final int frameHeight = preparedBuffer.getHeight();
final int stride = ((frameWidth + 7) / 8) * 8;
final int uvHeight = (frameHeight + 1) / 2;
final int viewportWidth = stride / 4;
final int totalHeight = frameHeight + uvHeight;
i420TextureFrameBuffer.setSize(viewportWidth, totalHeight);
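To make the sizes concrete, here is a quick worked example with illustrative numbers (my own, not from the article):

// Worked example: frameWidth = 642, frameHeight = 480.
final int frameWidth = 642, frameHeight = 480;
final int stride = ((frameWidth + 7) / 8) * 8;   // 648: rounded up to a multiple of 8
final int uvHeight = (frameHeight + 1) / 2;      // 240
final int totalHeight = frameHeight + uvHeight;  // 720
final int viewportWidth = stride / 4;            // 162 RGBA pixels per row
// Bytes read back by glReadPixels: 162 * 720 * 4 = 466,560 = stride * totalHeight,
// i.e. the classic YUV420 size (aligned width) * height * 3 / 2.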

Note4: To understand the clever Draw Y, Draw U and Draw V operations below, you first need to know how ShaderCallbacks, GlGenericDrawer and VideoFrameDrawer work together.

private final ShaderCallbacks shaderCallbacks = new ShaderCallbacks();
private final GlGenericDrawer drawer = new GlGenericDrawer(FRAGMENT_SHADER, shaderCallbacks);

public GlGenericDrawer(String genericFragmentSource, ShaderCallbacks shaderCallbacks) {
        this(DEFAULT_VERTEX_SHADER_STRING, genericFragmentSource, shaderCallbacks);
    }

GlGenericDrawer wraps the shader program. Frankly, the code here is a bit convoluted and it is not obvious why it is written this way, so let's take it apart step by step.

Note4.1: Start with the DEFAULT_VERTEX_SHADER_STRING and FRAGMENT_SHADER passed into the GlGenericDrawer constructor. Note that this is not the full shader source; follow the flow VideoFrameDrawer.drawTexture -> GlGenericDrawer.drawOes -> prepareShader -> createShader.

/**
 * Draw an OES texture frame with specified texture transformation matrix. Required resources are
 * allocated at the first call to this function.
 */
@Override
public void drawOes(int oesTextureId, float[] texMatrix, int frameWidth, int frameHeight,
                    int viewportX, int viewportY, int viewportWidth, int viewportHeight) {
    prepareShader(
            ShaderType.OES, texMatrix, frameWidth, frameHeight, viewportWidth, viewportHeight);
    // Bind the texture.
    GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, oesTextureId);
    // Draw the texture.
    GLES20.glViewport(viewportX, viewportY, viewportWidth, viewportHeight);
    GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);
    // Unbind the texture as a precaution.
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, 0);
}

// Visible for testing.
GlShader createShader(ShaderType shaderType) {
    return new GlShader(
            vertexShader, createFragmentShaderString(genericFragmentSource, shaderType));
}
static String createFragmentShaderString(String genericFragmentSource, ShaderType shaderType) {
    final StringBuilder stringBuilder = new StringBuilder();
    if (shaderType == ShaderType.OES) {
        stringBuilder.append("#extension GL_OES_EGL_image_external : require\n");
    }
    stringBuilder.append("precision mediump float;\n");
    stringBuilder.append("varying vec2 tc;\n");

    if (shaderType == ShaderType.YUV) {
        stringBuilder.append("uniform sampler2D y_tex;\n");
        stringBuilder.append("uniform sampler2D u_tex;\n");
        stringBuilder.append("uniform sampler2D v_tex;\n");

        // Add separate function for sampling texture.
        // yuv_to_rgb_mat is inverse of the matrix defined in YuvConverter.
        stringBuilder.append("vec4 sample(vec2 p) {\n");
        stringBuilder.append("  float y = texture2D(y_tex, p).r * 1.16438;\n");
        stringBuilder.append("  float u = texture2D(u_tex, p).r;\n");
        stringBuilder.append("  float v = texture2D(v_tex, p).r;\n");
        stringBuilder.append("  return vec4(y + 1.59603 * v - 0.874202,\n");
        stringBuilder.append("    y - 0.391762 * u - 0.812968 * v + 0.531668,\n");
        stringBuilder.append("    y + 2.01723 * u - 1.08563, 1);\n");
        stringBuilder.append("}\n");
        stringBuilder.append(genericFragmentSource);
    } else {
        final String samplerName = shaderType == ShaderType.OES ? "samplerExternalOES" : "sampler2D";
        stringBuilder.append("uniform ").append(samplerName).append(" tex;\n");

        // Update the sampling function in-place.
        stringBuilder.append(genericFragmentSource.replace("sample(", "texture2D(tex, "));
    }

    return stringBuilder.toString();
}

createFragmentShaderString prepends the extension/precision declarations and the sampler uniform, and rewrites sample( to texture2D(tex,  for the OES case, so the complete shaders end up as follows:

/* DEFAULT_VERTEX_SHADER_STRING */
varying vec2 tc;
attribute vec4 in_pos;
attribute vec4 in_tc;
uniform mat4 tex_mat;
void main() {
  gl_Position = in_pos;
  tc = (tex_mat * in_tc).xy;
}


/* FRAGMENT_SHADER */

#extension GL_OES_EGL_image_external : require
precision mediump float;
varying vec2 tc;
uniform samplerExternalOES tex;
// Difference in texture coordinate corresponding to one
// sub-pixel in the x direction.
uniform vec2 xUnit;
// Color conversion coefficients, including constant term
uniform vec4 coeffs;

void main() {
  gl_FragColor.r = coeffs.a + dot(coeffs.rgb,
      texture2D(tex, tc - 1.5 * xUnit).rgb);
  gl_FragColor.g = coeffs.a + dot(coeffs.rgb,
      texture2D(tex, tc - 0.5 * xUnit).rgb);
  gl_FragColor.b = coeffs.a + dot(coeffs.rgb,
      texture2D(tex, tc + 0.5 * xUnit).rgb);
  gl_FragColor.a = coeffs.a + dot(coeffs.rgb,
      texture2D(tex, tc + 1.5 * xUnit).rgb);
}

Note4.2: With the shaders ready, look at the drawing process, i.e. Draw Y, Draw U and Draw V. Start with the ShaderCallbacks code and continue the analysis.

private static class ShaderCallbacks implements GlGenericDrawer.ShaderCallbacks {
    // Y'UV444 to RGB888, see https://en.wikipedia.org/wiki/YUV#Y%E2%80%B2UV444_to_RGB888_conversion
    // We use the ITU-R BT.601 coefficients for Y, U and V.
    // The values in Wikipedia are inaccurate, the accurate values derived from the spec are:
    // Y = 0.299 * R + 0.587 * G + 0.114 * B
    // U = -0.168736 * R - 0.331264 * G + 0.5 * B + 0.5
    // V = 0.5 * R - 0.418688 * G - 0.0813124 * B + 0.5
    // To map the Y-values to range [16-235] and U- and V-values to range [16-240], the matrix has
    // been multiplied with matrix:
    // {{219 / 255, 0, 0, 16 / 255},
    //  {0, 224 / 255, 0, 16 / 255},
    //  {0, 0, 224 / 255, 16 / 255},
    //  {0, 0, 0, 1}}
    private static final float[] yCoeffs =
            new float[]{0.256788f, 0.504129f, 0.0979059f, 0.0627451f};
    private static final float[] uCoeffs =
            new float[]{-0.148223f, -0.290993f, 0.439216f, 0.501961f};
    private static final float[] vCoeffs =
            new float[]{0.439216f, -0.367788f, -0.0714274f, 0.501961f};

    public void setPlaneY() {
        coeffs = yCoeffs;
        stepSize = 1.0f;
    }

    public void setPlaneU() {
        coeffs = uCoeffs;
        stepSize = 2.0f;
    }

    public void setPlaneV() {
        coeffs = vCoeffs;
        stepSize = 2.0f;
    }

    @Override
    public void onNewShader(GlShader shader) {
        xUnitLoc = shader.getUniformLocation("xUnit");
        coeffsLoc = shader.getUniformLocation("coeffs");
    }
    @Override
    public void onPrepareShader(GlShader shader, float[] texMatrix, int frameWidth, int frameHeight,
                                int viewportWidth, int viewportHeight) {
        GLES20.glUniform4fv(coeffsLoc, /* count= */ 1, coeffs, /* offset= */ 0);
        // Matrix * (1;0;0;0) / (width / stepSize). Note that OpenGL uses column major order.
        GLES20.glUniform2f(
                xUnitLoc, stepSize * texMatrix[0] / frameWidth, stepSize * texMatrix[1] / frameWidth);
    }
}

You can get the general idea from the comments: the values on the Wikipedia page are inaccurate; the accurate RGB-to-YUV formulas derived from the spec are:

Y = 0.299 * R + 0.587 * G + 0.114 * B
U = -0.168736 * R - 0.331264 * G + 0.5 * B + 0.5
V = 0.5 * R - 0.418688 * G - 0.0813124 * B + 0.5

To map the Y values into the range [16, 235] and the U and V values into the range [16, 240], that matrix has been multiplied by:

{{219 / 255, 0, 0, 16 / 255},
 {0, 224 / 255, 0, 16 / 255},
 {0, 0, 224 / 255, 16 / 255},
 {0, 0, 0, 1}}

This matrix expresses the conversion of YUV from full range to TV range. What are full range and TV range? Here is a brief introduction; see my earlier article for the details (this topic also leads into the differences between BT.601, BT.709 and BT.2020).

YUV has many representations

  Besides the color space, you also need to pay attention to the different forms YUV takes, for example:

  YUV: the analog model, with Y ∈ [0, 1] and U, V ∈ [-0.5, 0.5].

  YCbCr: also called YCC or Y'CbCr. YCbCr is the digital form and comes in two flavours, TV range and full range. TV range is the standard mainly used by broadcast television, while full range is mainly used by PCs, so full range is sometimes called PC range.

TV range component ranges: Y ∈ [16, 235], Cb ∈ [16, 240], Cr ∈ [16, 240].

Full range component ranges: 0-255.

  Most of the data we normally handle is YCbCr (TV range), and so is most of what ffmpeg decodes. Although ffmpeg labels the format YUV420P, it is really YCbCr420p in TV range.

  Full-to-TV range transfer: Y' = 219.0 * Y + 16; Cb = U * 224.0 + 128; Cr = V * 224.0 + 128.

So the final yCoeffs, uCoeffs and vCoeffs are simply the full-range rows scaled by this range-mapping matrix (a non-homogeneous matrix operation): yCoeffs = {0.299, 0.587, 0.114} * (219 / 255) with the constant term 16 / 255 appended, i.e. {0.256788f, 0.504129f, 0.0979059f, 0.0627451f}; uCoeffs and vCoeffs are obtained the same way with the 224 / 255 scale.
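If you want to double-check those constants, here is a tiny sketch (my own verification, not webrtc code) that reproduces them from the BT.601 rows and the range-mapping matrix:

// A quick numeric check of the constants in ShaderCallbacks.
final double yScale = 219.0 / 255.0;   // Y is squeezed into [16, 235]
final double uvScale = 224.0 / 255.0;  // U/V are squeezed into [16, 240]
final double offset = 16.0 / 255.0;

final double[] yCoeffs = {0.299 * yScale, 0.587 * yScale, 0.114 * yScale, offset};
// -> {0.256788, 0.504129, 0.097906, 0.062745}, matching yCoeffs above.

final double[] uCoeffs = {-0.168736 * uvScale, -0.331264 * uvScale, 0.5 * uvScale,
        0.5 * uvScale + offset};
// -> {-0.148223, -0.290993, 0.439216, 0.501961}, matching uCoeffs; vCoeffs works the same way.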

 

Note4.3: Having dealt with coeffs, let's talk about xUnit (and stepSize). It is hard to grasp what xUnit does from the shader alone; it has to be understood together with the renderMatrix introduced earlier:

// Produce a frame buffer starting at top-left corner, not bottom-left.
        final Matrix renderMatrix = new Matrix();
        renderMatrix.preTranslate(0.5f, 0.5f);
        renderMatrix.preScale(1f, -1f);
        renderMatrix.preTranslate(-0.5f, -0.5f);

shaderCallbacks.setPlaneY();
VideoFrameDrawer.drawTexture(drawer, preparedBuffer, renderMatrix, frameWidth, frameHeight,
                /* viewportX= */ 0, /* viewportY= */ 0, viewportWidth,
                /* viewportHeight= */ frameHeight);

public static void drawTexture(RendererCommon.GlDrawer drawer, VideoFrame.TextureBuffer buffer,
                               Matrix renderMatrix, int frameWidth, int frameHeight, int viewportX, int viewportY,
                               int viewportWidth, int viewportHeight) {
    Matrix finalMatrix = new Matrix(buffer.getTransformMatrix());
    finalMatrix.preConcat(renderMatrix);
    float[] finalGlMatrix = RendererCommon.convertMatrixFromAndroidGraphicsMatrix(finalMatrix);
    switch (buffer.getType()) {
        case OES:
            drawer.drawOes(buffer.getTextureId(), finalGlMatrix, frameWidth, frameHeight, viewportX,
                    viewportY, viewportWidth, viewportHeight);
            break;
        case RGB:
            drawer.drawRgb(buffer.getTextureId(), finalGlMatrix, frameWidth, frameHeight, viewportX,
                    viewportY, viewportWidth, viewportHeight);
            break;
        default:
            throw new RuntimeException("Unknown texture type.");
    }
}

// GlGenericDrawer.drawOes -> prepareShader triggers ShaderCallbacks.onPrepareShader
@Override
public void onPrepareShader(GlShader shader, float[] texMatrix, int frameWidth, int frameHeight,
                            int viewportWidth, int viewportHeight) {
    GLES20.glUniform4fv(coeffsLoc, /* count= */ 1, coeffs, /* offset= */ 0);
    // Matrix * (1;0;0;0) / (width / stepSize). Note that OpenGL uses column major order.
    GLES20.glUniform2f(
            xUnitLoc, stepSize * texMatrix[0] / frameWidth, stepSize * texMatrix[1] / frameWidth);
}

The code flow above makes it clear that stepSize is tightly coupled with the renderMatrix. The renderMatrix carries the comment // Produce a frame buffer starting at top-left corner, not bottom-left., i.e. the frame buffer should start at the top-left corner rather than the bottom-left. Combine that with the original texture coordinates FULL_RECTANGLE_TEXTURE_BUFFER passed in by GlGenericDrawer:

// Texture coordinates - (0, 0) is bottom-left and (1, 1) is top-right.
private static final FloatBuffer FULL_RECTANGLE_TEXTURE_BUFFER =
        GlUtil.createFloatBuffer(new float[]{
                0.0f, 0.0f, // Bottom left.
                1.0f, 0.0f, // Bottom right.
                0.0f, 1.0f, // Top left.
                1.0f, 1.0f, // Top right.
        });

Due to space constraints (see my earlier article on matrix multiplication for the underlying principles), the derivation and result are given directly here:

// The concrete translate matrix applied by preTranslate, from the AOSP Matrix source (androidxref.com)
static float[] setTranslate(float[] dest, float dx, float dy) {
    dest[0] = 1;
    dest[1] = 0;
    dest[2] = dx;
    dest[3] = 0;
    dest[4] = 1;
    dest[5] = dy;
    dest[6] = 0;
    dest[7] = 0;
    dest[8] = 1;
    return dest;
}
// The concrete scale matrix applied by preScale, from the AOSP Matrix source (androidxref.com)
static float[] getScale(float sx, float sy) {
    return new float[] { sx, 0, 0, 0, sy, 0, 0, 0, 1 };
}

Accumulating the three preConcat steps (each new operation is multiplied onto the right of the current matrix):

1, 0, 0                                 1, 0, 0.5
0, 1, 0    preTranslate(0.5f, 0.5f)  =  0, 1, 0.5
0, 0, 1                                 0, 0, 1

1, 0, 0.5                               1,  0, 0.5
0, 1, 0.5  preScale(1f, -1f)         =  0, -1, 0.5
0, 0, 1                                 0,  0, 1

1,  0, 0.5                                 1,  0, 0
0, -1, 0.5 preTranslate(-0.5f, -0.5f)  =   0, -1, 1
0,  0, 1                                   0,  0, 1

So renderMatrix = {{1, 0, 0}, {0, -1, 1}, {0, 0, 1}}, which maps (x, y) to (x, 1 - y); the four texture corners (0,0), (1,0), (0,1), (1,1) become (0,1), (1,1), (0,0), (1,0).

After applying renderMatrix the texture coordinates are indeed flipped vertically, so the frame buffer is produced starting from the top-left corner.
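If you would rather let the framework do the arithmetic, a tiny sketch (my own check, using android.graphics.Matrix directly) maps the four texture corners through the same renderMatrix:

import android.graphics.Matrix;

final Matrix renderMatrix = new Matrix();
renderMatrix.preTranslate(0.5f, 0.5f);
renderMatrix.preScale(1f, -1f);
renderMatrix.preTranslate(-0.5f, -0.5f);

// The four texture corners: bottom-left, bottom-right, top-left, top-right.
final float[] corners = {0f, 0f, 1f, 0f, 0f, 1f, 1f, 1f};
renderMatrix.mapPoints(corners);
// corners is now {0,1, 1,1, 0,0, 1,0}: every point maps (x, y) -> (x, 1 - y),
// i.e. the buffer is flipped vertically so it starts at the top-left corner.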

Then, for Draw Y stepSize = 1, while for Draw U and Draw V stepSize = 2, and in every pass the step is divided by frameWidth (xUnit = stepSize * texMatrix[0] / frameWidth). Combined with the YUV 4:2:0 sampling pattern (in the usual diagram the Y samples are drawn as crosses and the U/V samples as circles), xUnit scaled by stepSize is simply the horizontal sampling step.

One more point worth noting: why does the shader sample the texture at (tc - 1.5 * xUnit), (tc - 0.5 * xUnit), (tc + 0.5 * xUnit) and (tc + 1.5 * xUnit) rather than at (tc - 2 * xUnit), (tc - 1 * xUnit), (tc + 1 * xUnit) and (tc + 2 * xUnit)? Because the half-texel offsets land exactly on the centers of the four source texels packed into each RGBA output pixel, so every sample reads one texel's own color value; see the worked example below.
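A quick worked example for the Y pass, assuming an identity texMatrix and an already-aligned frameWidth (the numbers are illustrative, not from the article):

final int frameWidth = 640;
final int i = 10;                                  // index of one output RGBA pixel in the row
final float xUnit = 1.0f / frameWidth;             // stepSize = 1 for the Y plane: one source texel
// Output pixel i packs source texels 4i .. 4i+3 and the fragment is shaded at that pixel's center:
final float tc = (4 * i + 2.0f) / frameWidth;
final float s0 = tc - 1.5f * xUnit;                // (4i + 0.5) / frameWidth -> center of texel 4i
final float s1 = tc - 0.5f * xUnit;                // (4i + 1.5) / frameWidth -> center of texel 4i+1
final float s2 = tc + 0.5f * xUnit;                // (4i + 2.5) / frameWidth -> center of texel 4i+2
final float s3 = tc + 1.5f * xUnit;                // (4i + 3.5) / frameWidth -> center of texel 4i+3
// Offsets of +/-1 or +/-2 * xUnit would land on texel boundaries instead, where linear
// filtering blends two neighbouring texels and the packed values would be smeared.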

Okay, the key points have been covered. The remaining details worth a look are: how the viewport position and size are chosen when drawing U and V; how JavaI420Buffer wraps the read-back data; and the fact that glReadPixels here is not combined with a double-PBO readback (if that term is unfamiliar, there are plenty of tutorials about it; a rough sketch follows below). Cleverly drawing an RGBA buffer that is really YUV420 is a classic GPGPU idea and is well worth digesting in detail.
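For reference, a double-PBO readback on GLES 3.0 looks roughly like the sketch below. This is only the general technique the article alludes to, not something webrtc does here; the class and method names are mine, and the offset variant of glReadPixels requires API 24+.

import android.opengl.GLES30;
import java.nio.ByteBuffer;

final class PboReader {
    private final int[] pbos = new int[2];
    private final int byteSize;
    private int frameIndex;

    PboReader(int width, int height) {
        byteSize = width * height * 4; // RGBA
        GLES30.glGenBuffers(2, pbos, 0);
        for (int pbo : pbos) {
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo);
            GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, byteSize, null, GLES30.GL_STREAM_READ);
        }
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
    }

    // Call once per frame while the source FBO is bound: kicks off an asynchronous read into
    // one PBO and copies out the PBO filled on the previous frame, hiding most of the stall.
    void readInto(ByteBuffer dest, int width, int height) {
        final int writePbo = pbos[frameIndex % 2];
        final int readPbo = pbos[(frameIndex + 1) % 2];
        frameIndex++;

        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, writePbo);
        // With a PBO bound, the last argument is an offset into the PBO, not a client buffer.
        GLES30.glReadPixels(0, 0, width, height, GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, 0);

        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, readPbo);
        ByteBuffer pixels = (ByteBuffer) GLES30.glMapBufferRange(
                GLES30.GL_PIXEL_PACK_BUFFER, 0, byteSize, GLES30.GL_MAP_READ_BIT);
        if (pixels != null) { // On the very first frame the mapped data is still all zeros.
            dest.clear();
            dest.put(pixels); // dest must be at least byteSize bytes.
            dest.flip();
        }
        GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
    }
}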

The next chapter explores the useful parts of HardwareVideoEncoder, and then dives into PeerConnectionFactory.createPeerConnection.


Origin blog.csdn.net/a360940265a/article/details/121532966