From I-frames to B-frames: how H.264 encoding balances picture quality and compression

H.264 frame encoding

H.264, also known as MPEG-4 AVC (Advanced Video Coding), is an efficient video coding standard for compressing and storing video. H.264 uses techniques such as predictive coding and transform coding, and its encoding pipeline mainly consists of frame type determination, motion estimation, transform coding, quantization, and entropy coding.

H.264 frame encoding process:

Frame type determination

First, the H.264 encoder determines which type the current frame is: an I-frame, a P-frame, or a B-frame. I-frames are key frames, while P-frames and B-frames are predicted frames.

Motion estimation

For P- and B-frames, the H.264 encoder performs motion estimation to find the motion between the current frame and its reference frame(s). During motion estimation, each block of the frame to be encoded is compared against candidate blocks within a search window of the reference frame to find the best-matching position.
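The search described above can be sketched as a full search that minimizes the sum of absolute differences (SAD). This is a simplified illustration, not the optimized search a real encoder uses (real encoders use fast search patterns and sub-pixel refinement); the function names are made up for this example.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between an 8x8 block in the current
 * frame and a candidate block in the reference frame. */
static int sad_8x8(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += abs(cur[y * stride + x] - ref[y * stride + x]);
    return sum;
}

/* Full search over a +/- range window: returns the motion vector
 * (dx, dy) of the candidate block with the smallest SAD. */
static void full_search(const uint8_t *cur, const uint8_t *ref, int stride,
                        int range, int *best_dx, int *best_dy)
{
    int best = INT_MAX;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int cost = sad_8x8(cur, ref + dy * stride + dx, stride);
            if (cost < best) {
                best = cost;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
}
```

If the current block is an exact shifted copy of a reference region, the search recovers that shift as the motion vector with a SAD of zero.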

Transform coding

The H.264 encoder feeds the motion-compensated difference block (the residual block) through a discrete cosine transform (DCT), which converts spatial information into frequency-domain information. (Strictly speaking, H.264 uses an integer approximation of the DCT.) The transform block size can be 4x4 or 8x8 to match the compression needs of different video content. Transform coding effectively removes local redundancy within video frames.
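H.264's 4x4 core transform is an integer approximation of the DCT, computed as Y = C·X·Cᵀ with a small integer matrix C. The sketch below implements that matrix product directly; a real encoder would use the butterfly decomposition instead of full matrix multiplies, and would fold the residual scaling factors into quantization.

```c
/* H.264's 4x4 forward core transform (integer DCT approximation):
 * Y = C * X * C^T with
 * C = [ 1  1  1  1; 2  1 -1 -2; 1 -1 -1  1; 1 -2  2 -1 ]. */
static void forward_transform_4x4(int in[4][4], int out[4][4])
{
    static const int C[4][4] = {
        { 1,  1,  1,  1 },
        { 2,  1, -1, -2 },
        { 1, -1, -1,  1 },
        { 1, -2,  2, -1 },
    };
    int tmp[4][4];
    /* tmp = C * in: transform the columns */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            tmp[i][j] = 0;
            for (int k = 0; k < 4; k++)
                tmp[i][j] += C[i][k] * in[k][j];
        }
    /* out = tmp * C^T: transform the rows */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            out[i][j] = 0;
            for (int k = 0; k < 4; k++)
                out[i][j] += tmp[i][k] * C[j][k];
        }
}
```

For a flat (constant-valued) block, all the energy ends up in the DC coefficient out[0][0] and every other coefficient is zero, which is exactly the redundancy removal the text describes.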

Quantization

After the DCT, the H.264 encoder quantizes the coefficients of the residual block. Quantization reduces the amount of data by dividing each coefficient by a predefined step size and rounding. Different quantization parameters can be used for different video frames to trade off video quality against compression ratio.
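The divide-and-round idea can be shown in a few lines. This is a simplified illustration, not H.264's exact quantization formula (the standard derives the step size from the quantization parameter QP via multiplier/shift tables, with the step roughly doubling every 6 QP steps):

```c
/* Simplified quantization: divide a coefficient by a step size with
 * rounding to nearest (C integer division truncates toward zero, so we
 * add half a step in the direction of the sign first). */
static int quantize(int coeff, int step)
{
    return (coeff >= 0 ? coeff + step / 2 : coeff - step / 2) / step;
}

/* Dequantization: the decoder can only multiply the level back by the
 * step, so anything smaller than the step size is lost for good. */
static int dequantize(int level, int step)
{
    return level * step;
}
```

Round-tripping a coefficient through quantize/dequantize generally does not return the original value; this loss is where most of H.264's compression (and all of its quality degradation) comes from.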

Entropy coding

Finally, the H.264 encoder passes the quantized data to an entropy coder to further reduce the amount of data. Entropy coding maps data sequences to short codes, assigning shorter codes to frequently occurring values. H.264 uses CABAC (Context-based Adaptive Binary Arithmetic Coding) or CAVLC (Context-based Adaptive Variable Length Coding) as its entropy coding scheme.
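CAVLC and CABAC are too involved to show here, but H.264 also uses a simpler variable-length code, unsigned Exp-Golomb ue(v), for many header syntax elements, and it illustrates the "short codes for frequent values" idea well. The sketch below writes the code as a string of '0'/'1' characters for readability; a real encoder packs bits.

```c
#include <string.h>

/* Unsigned Exp-Golomb code ue(v), as used by H.264 for header syntax
 * elements: write (leadingZeroBits) zeros, then the binary form of
 * v + 1. Small (frequent) values get the shortest codes:
 * 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ... */
static void exp_golomb_ue(unsigned v, char *out)
{
    unsigned code = v + 1;
    int bits = 0;
    for (unsigned t = code; t; t >>= 1)
        bits++;                       /* bit length of v + 1 */
    int n = 0;
    for (int i = 0; i < bits - 1; i++)
        out[n++] = '0';               /* bits - 1 leading zeros */
    for (int i = bits - 1; i >= 0; i--)
        out[n++] = ((code >> i) & 1) ? '1' : '0';
    out[n] = '\0';
}
```

Because quantization produces many small values and zeros, codes like this shrink the bitstream substantially even before CAVLC/CABAC's context adaptation is applied.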

I-frame, P-frame, and B-frame encoding processes

I-frames, P-frames, and B-frames are three frame types commonly used in video coding to compress and store video. Their encoding process is as follows:

I frame encoding process

An I (Intra-picture) frame is a key frame: the first frame of a video sequence, or a refresh point within it. Each I-frame is coded independently, contains data for all of its pixels, and does not depend on any other frame. As a result, I-frames achieve the lowest compression ratio of the three frame types.

The encoding process of I frame is as follows:

  1. Image segmentation: the original image is first divided into several small blocks.
  2. DCT transform: each block undergoes a discrete cosine transform (DCT), converting it into a frequency-domain signal.
  3. Quantization: the frequency-domain coefficients of each block are quantized. Quantization is one of the key steps in compressing the image.
  4. Entropy coding: the quantized data is entropy coded to further reduce the amount of data. The encoded result is the I-frame output.

P frame encoding process

A P (Predicted picture) frame is a predicted frame, encoded with reference to pixels of a previous frame. A P-frame encodes only its pixel differences from that reference frame, reducing the temporal redundancy of the video data.

The encoding process of P frame is as follows:

  1. Motion estimation: a motion estimation algorithm compares the previous frame with the current frame to obtain the motion information between them.
  2. Inter-frame prediction: the motion information is used to fetch the pixel values at the corresponding positions in the previous frame, producing a prediction of the current frame.
  3. Residual coding: the predicted image is compared with the actual image to obtain their difference. This difference is encoded and output as the P-frame.
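Steps 2 and 3 above can be sketched for a single 4x4 block: the motion vector selects the prediction block in the reference frame, and the residual is the per-pixel difference. This is an illustration with made-up function names, not FFmpeg or H.264 reference code.

```c
#include <stdint.h>

/* P-frame residual for one 4x4 block: subtract the motion-compensated
 * prediction (the reference-frame block at the position given by the
 * motion vector) from the current block. A decoder reverses this by
 * adding the decoded residual back onto the same prediction. */
static void compute_residual_4x4(const uint8_t *cur, const uint8_t *ref,
                                 int stride, int mv_x, int mv_y,
                                 int residual[4][4])
{
    const uint8_t *pred = ref + mv_y * stride + mv_x;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            residual[y][x] = cur[y * stride + x] - pred[y * stride + x];
}
```

When motion compensation predicts well, the residual values are small and cluster around zero, which is exactly what makes the subsequent transform, quantization, and entropy coding effective.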

B frame encoding process

A B (Bidirectionally predictive picture) frame is a bidirectionally predicted frame, estimated and encoded from the pixel values of reference frames both before and after it. Like a P-frame, a B-frame encodes only pixel differences, further reducing the amount of video data.

The encoding process of B frame is as follows:

  1. Motion estimation: a B-frame usually requires motion estimation against both the previous and the following reference frames.
  2. Inter-frame prediction: the current frame is predicted from the two reference frames before and after it.
  3. Residual coding: as with a P-frame, the predicted image is compared with the actual image to obtain their difference. This difference is encoded and output as the B-frame.
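The bidirectional prediction in step 2 is, in its simplest form, a rounded average of one block from the past reference and one from the future reference. H.264 also supports weighted prediction with unequal weights; the sketch below shows only the equal-weight case.

```c
#include <stdint.h>

/* Bidirectional prediction for one 4x4 B-frame block: the prediction
 * is the rounded average of a forward (past) reference block and a
 * backward (future) reference block. The residual against this
 * prediction is then coded exactly as for a P-frame. */
static void bipredict_4x4(const uint8_t *fwd, const uint8_t *bwd,
                          int stride, uint8_t pred[4][4])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            pred[y][x] = (uint8_t)((fwd[y * stride + x] +
                                    bwd[y * stride + x] + 1) >> 1);
}
```

Averaging two references tends to cancel noise and handle occlusions better than either reference alone, which is why B-frames usually compress best of the three frame types.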

Code walkthrough of the I-frame, P-frame, and B-frame encoding process

I-frames, P-frames, and B-frames are core concepts of video frame coding and the heart of H.264 video coding. They play different roles in video encoding and affect video quality and compression ratio differently. The following sample uses FFmpeg to encode video and walks through the code.

#include <stdlib.h>
#include <stdio.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
#include <libavutil/imgutils.h>
// Frame rate
#define FRAME_RATE 25
// Video dimensions
#define VIDEO_WIDTH 640
#define VIDEO_HEIGHT 480
#define VIDEO_PIX_FMT AV_PIX_FMT_YUV420P
int main(int argc, char **argv)
{
    // Register muxers/encoders (needed only for FFmpeg < 4.0; deprecated since)
    av_register_all();
    // Allocate the output AVFormatContext for an MP4 file
    AVFormatContext *pFormatCtx;
    avformat_alloc_output_context2(&pFormatCtx, NULL, NULL, "output.mp4");
    // Find the H.264 video encoder
    AVCodec *pCodec;
    pCodec = avcodec_find_encoder(AV_CODEC_ID_H264);
    if (!pCodec)
    {
        fprintf(stderr, "Codec not found\n");
        exit(1);
    }
    // Allocate the output AVStream
    AVStream *pStream;
    pStream = avformat_new_stream(pFormatCtx, NULL);
    if (!pStream)
    {
        fprintf(stderr, "Could not allocate stream\n");
        exit(1);
    }
    // Allocate the codec context and set encoder parameters
    AVCodecContext *pCodecCtx;
    pCodecCtx = avcodec_alloc_context3(pCodec);
    if (!pCodecCtx)
    {
        fprintf(stderr, "Could not allocate codec context\n");
        exit(1);
    }
    pCodecCtx->bit_rate = 400000;
    pCodecCtx->width = VIDEO_WIDTH;
    pCodecCtx->height = VIDEO_HEIGHT;
    pCodecCtx->time_base = (AVRational){1, FRAME_RATE};
    pCodecCtx->framerate = (AVRational){FRAME_RATE, 1};
    pCodecCtx->pix_fmt = VIDEO_PIX_FMT;
    pCodecCtx->gop_size = 10;
    pCodecCtx->max_b_frames = 1;
    if (pFormatCtx->oformat->flags & AVFMT_GLOBALHEADER)
    {
        pCodecCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    }
    // Open the encoder and write the container header
    if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0)
    {
        fprintf(stderr, "Could not open codec\n");
        exit(1);
    }
    if (avcodec_parameters_from_context(pStream->codecpar, pCodecCtx) < 0)
    {
        fprintf(stderr, "Could not copy the codec parameters to the output stream\n");
        exit(1);
    }
    if (avformat_write_header(pFormatCtx, NULL) < 0)
    {
        fprintf(stderr, "Error occurred when opening output file\n");
        exit(1);
    }
    // Prepare the input frame
    AVFrame *src_frame;
    uint8_t *src_data[4];
    int src_linesize[4];
    int ret;
    src_frame = av_frame_alloc();
    if (!src_frame)
    {
        fprintf(stderr, "Could not allocate source video frame\n");
        exit(1);
    }
    src_frame->format = VIDEO_PIX_FMT;
    src_frame->width = VIDEO_WIDTH;
    src_frame->height = VIDEO_HEIGHT;
    ret = av_image_alloc(src_frame->data, src_frame->linesize, VIDEO_WIDTH, VIDEO_HEIGHT, VIDEO_PIX_FMT, 16);
    if (ret < 0)
    {
        fprintf(stderr, "Could not allocate source image\n");
        exit(1);
    }
    src_data[0] = src_frame->data[0];
    src_data[1] = src_frame->data[1];
    src_data[2] = src_frame->data[2];
    src_linesize[0] = src_frame->linesize[0];
    src_linesize[1] = src_frame->linesize[1];
    src_linesize[2] = src_frame->linesize[2];
    // Encode video frames
    int64_t pts = 0;
    int i;
    for (i = 0; i < 100; i++)
    {
        // Fill one frame of synthetic YUV420P data
        int j, k;
        for (j = 0; j < VIDEO_HEIGHT; j++)
        {
            for (k = 0; k < VIDEO_WIDTH; k++)
            {
                src_data[0][j * src_linesize[0] + k] = (uint8_t)(j + i * 3);
            }
        }
        for (j = 0; j < VIDEO_HEIGHT / 2; j++)
        {
            for (k = 0; k < VIDEO_WIDTH / 2; k++)
            {
                src_data[1][j * src_linesize[1] + k] = (uint8_t)(j + i * 2);
                src_data[2][j * src_linesize[2] + k] = (uint8_t)(j + i * 5);
            }
        }
        // Encode one frame
        AVPacket pkt;
        av_init_packet(&pkt);
        pkt.data = NULL;
        pkt.size = 0;
        src_frame->pts = pts++;
        ret = avcodec_send_frame(pCodecCtx, src_frame);
        if (ret < 0)
        {
            fprintf(stderr, "Error sending a frame for encoding\n");
            exit(1);
        }
        while (ret >= 0)
        {
            ret = avcodec_receive_packet(pCodecCtx, &pkt);
            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            {
                break;
            }
            if (ret < 0)
            {
                fprintf(stderr, "Error during encoding\n");
                exit(1);
            }
            // Write the encoded packet
            pkt.stream_index = pStream->index;
            av_packet_rescale_ts(&pkt, pCodecCtx->time_base, pStream->time_base);
            ret = av_interleaved_write_frame(pFormatCtx, &pkt);
            if (ret < 0)
            {
                fprintf(stderr, "Error while writing video frame\n");
                exit(1);
            }
            av_packet_unref(&pkt);
        }
    }
    // All frames sent: flush the encoder so delayed B-frames are emitted
    AVPacket flush_pkt;
    av_init_packet(&flush_pkt);
    flush_pkt.data = NULL;
    flush_pkt.size = 0;
    avcodec_send_frame(pCodecCtx, NULL);
    while (avcodec_receive_packet(pCodecCtx, &flush_pkt) >= 0)
    {
        flush_pkt.stream_index = pStream->index;
        av_packet_rescale_ts(&flush_pkt, pCodecCtx->time_base, pStream->time_base);
        av_interleaved_write_frame(pFormatCtx, &flush_pkt);
        av_packet_unref(&flush_pkt);
    }
    // Write the container trailer
    av_write_trailer(pFormatCtx);
    // Cleanup (avcodec_free_context also closes the codec)
    avcodec_free_context(&pCodecCtx);
    av_freep(&src_frame->data[0]);
    av_frame_free(&src_frame);
    avio_closep(&pFormatCtx->pb);
    avformat_free_context(pFormatCtx);
    return 0;
}

This sample code demonstrates how to use FFmpeg to encode I-frames, P-frames, and B-frames. It mainly involves the following steps:

  1. Find the encoder (H.264)
  2. Allocate the AVStream and AVCodecContext and set the encoder parameters
  3. Open the encoder and write the container header
  4. Prepare the input data
  5. Encode the video frames
  6. Write the encoded packets
  7. After all frames are encoded, write the trailer

In this sample code, the encoding and output of I-, P-, and B-frames is driven by the avcodec_send_frame() and avcodec_receive_packet() functions. Whether an output packet carries a key frame (I-frame) can be checked via pkt.flags (the AV_PKT_FLAG_KEY bit). The encoder chooses the frame type itself: an I-frame is a key frame and depends on no other frame; a P-frame is a forward-predicted frame and depends on the preceding I- or P-frame; a B-frame is a bidirectionally predicted frame and depends on both the preceding and following I- or P-frames. The gop_size and max_b_frames settings above control how often I-frames occur and how many consecutive B-frames are allowed.
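The key-frame check mentioned above can be sketched as follows. To keep the fragment compilable without the FFmpeg headers, a plain integer stands in for AVPacket.flags; the constant below matches the value of FFmpeg's AV_PKT_FLAG_KEY, and the helper name is made up for this example.

```c
#include <string.h>

#define PKT_FLAG_KEY 0x0001  /* same value as FFmpeg's AV_PKT_FLAG_KEY */

/* Classify an encoded packet from its flags. A set key-frame bit means
 * the packet carries an I-frame; the packet flags alone do not
 * distinguish P from B, so non-key packets are reported as "P/B". */
static const char *frame_kind(int pkt_flags)
{
    return (pkt_flags & PKT_FLAG_KEY) ? "I (key frame)"
                                      : "P/B (predicted)";
}
```

In the encoding loop of the sample, this check would be applied to each packet returned by avcodec_receive_packet(), e.g. to log the GOP structure or to start a new segment on every key frame.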

Summary

This article covered the H.264 frame encoding process and the roles of I-frames, P-frames, and B-frames.

The H.264 frame coding process includes frame type determination, motion estimation, transform coding, quantization, and entropy coding. Through the coding of I-frames, P-frames, and B-frames, H.264 can accurately predict and describe the differences between video frames, achieving efficient video compression and storage. Optimizing each step of the encoder yields better video quality and higher compression ratios.


The techniques used by an H.264 encoder are complex and every step has many details; the encoder can be tuned and optimized for the target application scenario.


Origin blog.csdn.net/m0_71524094/article/details/130548672