Basic knowledge of H.264 video coding

1. The development history of video coding technology
    Video coding technology has developed mainly through two series of international standards: MPEG-x, formulated by ISO/IEC, and H.26x, formulated by ITU-T. From the H.261 recommendation through H.262/H.263 and MPEG-1/2/4, these standards have pursued a common goal: to obtain the best possible image quality at the lowest possible bit rate (or storage capacity). Moreover, as market demand for image transmission has grown, the problem of adapting to the transmission characteristics of different channels has become increasingly apparent. The two major international standardization organizations, ISO/IEC and ITU-T, therefore jointly formulated the new video standard H.264 to address these problems.
    H.261 is the earliest video coding recommendation; it aims to standardize video coding for conference TV and videophone applications on ISDN networks. Its algorithm is a hybrid coding method that reduces temporal redundancy with inter-frame prediction and spatial redundancy with the DCT transform. To match ISDN channels, its output bit rate is p×64 kbit/s. When p is small, only low-definition images can be transmitted, which is suitable for face-to-face videophone calls; when p is large (e.g. p>6), conference TV images of better definition can be transmitted. H.263 is a low-bit-rate image compression recommendation, technically an improvement and extension of H.261, supporting applications below 64 kbit/s. In practice, however, H.263 and the later H.263+ and H.263++ have developed to support the full range of bit rates, as can be seen from the many image formats they support, such as Sub-QCIF, QCIF, CIF, 4CIF and even 16CIF.
    The MPEG-1 standard targets a bit rate of about 1.2 Mbit/s and can provide CIF (352×288) quality images at 30 frame/s; it was designed for video storage and playback on CD-ROM. The basic algorithm of the MPEG-1 video coding part is similar to H.261/H.263, likewise adopting motion-compensated inter-frame prediction, the two-dimensional DCT, and VLC run-length coding. In addition, it introduces the concepts of intra frames (I), predictive frames (P), bidirectionally predictive frames (B) and DC frames (D) to further improve coding efficiency. Building on MPEG-1, the MPEG-2 standard improved image resolution and compatibility with digital TV: its motion vectors have half-pixel accuracy; coding operations (such as motion estimation and the DCT) distinguish between "frame" and "field"; and it introduced coding scalability technologies such as spatial scalability, temporal scalability, and signal-to-noise-ratio scalability. The more recent MPEG-4 standard introduced coding based on audio-visual objects (AVO: Audio-Visual Object), which greatly improved the interactive capability and coding efficiency of video communications. MPEG-4 also adopted new techniques such as shape coding, adaptive DCT, and arbitrary-shape video object coding. The basic video encoder of MPEG-4, however, still belongs to the family of hybrid encoders similar to H.263.
    In short, the H.261 recommendation is the classic of video coding, and H.263 is its development, gradually replacing it in practice, mainly in communications; but the numerous options of H.263 often leave users at a loss. The MPEG series of standards has evolved from storage-media applications toward transmission-media applications, and the basic framework of its core video coding is consistent with H.261. The eye-catching "object-based coding" of MPEG-4 is difficult to apply universally, partly because of technical obstacles. The new video coding recommendation H.264, developed on this basis, overcomes the weaknesses of both: it introduces new coding methods within the hybrid coding framework, improves coding efficiency, and is oriented toward practical applications. Since it was jointly formulated by the two major international standardization organizations, its application prospects are self-evident.

2. Introduction to H.264
  H.264 is a new digital video coding standard developed by the Joint Video Team (JVT) of ITU-T's VCEG (Video Coding Experts Group) and ISO/IEC's MPEG (Moving Picture Experts Group). It is both ITU-T Recommendation H.264 and Part 10 of ISO/IEC MPEG-4. The solicitation of drafts started in January 1998; the first draft was completed in September 1999; the test model TML-8 was produced in May 2001; the FCD (Final Committee Draft) of H.264 was passed at the 5th meeting of the JVT in June 2002; and the standard was officially released in March 2003.
    Like the previous standards, H.264 uses a hybrid coding mode of DPCM plus transform coding. However, it adopts a simple "back to basics" design without many options, and obtains much better compression performance than H.263++; it strengthens adaptability to various channels, adopting a "network-friendly" structure and syntax that facilitate the handling of bit errors and packet loss; its application targets are broad, meeting the needs of different rates, different resolutions, and different transmission (or storage) occasions; and its basic system is open and can be used without paying royalties.
    Technically, the H.264 standard has many highlights, such as unified VLC symbol coding, high-precision multi-mode motion estimation, integer transforms based on 4×4 blocks, and a layered coding syntax. These measures give the H.264 algorithm very high coding efficiency: at the same reconstructed image quality, it can save about 50% of the bit rate compared with H.263. The H.264 bitstream structure also has strong network adaptability and improved error-resilience capability, so it adapts well to IP and wireless network applications.

3. Technical highlights of H.264
1. Layered design
  The H.264 algorithm is conceptually divided into two layers: the video coding layer (VCL: Video Coding Layer), responsible for efficiently representing the video content, and the network abstraction layer (NAL: Network Abstraction Layer), responsible for packaging and transporting the data in the manner the network requires. A packet-based interface is defined between the VCL and the NAL; packetization and the corresponding signaling are part of the NAL. In this way, the tasks of high coding efficiency and network friendliness are handled by the VCL and the NAL respectively.
    The VCL layer includes block-based motion-compensated hybrid coding and some new features. As with previous video coding standards, H.264 does not include pre-processing and post-processing functions in the draft, which increases the flexibility of the standard.
    The NAL is responsible for encapsulating the data using the segmentation format of the underlying network, covering framing, signaling of logical channels, utilization of timing information, end-of-sequence signals, and so on. For example, the NAL supports the transmission format for video over circuit-switched channels, and the format for transmitting video over the Internet using RTP/UDP/IP. A NAL unit includes its own header information, segment structure information, and the actual payload, i.e. the upper-layer VCL data. (If data partitioning is used, the data may consist of several parts.)
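As an illustration of the NAL header information mentioned above, the sketch below parses the one-byte header that begins every NAL unit. The field layout (forbidden_zero_bit, nal_ref_idc, nal_unit_type) and the example type values come from the published H.264 specification rather than from this article:

```python
# Minimal sketch: split the one-byte H.264 NAL unit header into its
# three fields, per the published specification:
#   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)

NAL_TYPES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
}

def parse_nal_header(first_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x3,         # 0 = not used as reference
        "nal_unit_type": first_byte & 0x1F,
    }

hdr = parse_nal_header(0x67)  # 0x67 = 0b0_11_00111: ref_idc 3, type 7 (SPS)
print(hdr["nal_unit_type"], NAL_TYPES[hdr["nal_unit_type"]])
```

Because the VCL payload follows this self-describing header, a network element can prioritize or drop NAL units without parsing the video data itself.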


2. High-precision, multi-mode motion estimation
   H.264 supports motion vectors with 1/4- or 1/8-pixel precision. At 1/4-pixel accuracy, a 6-tap filter can be used to reduce high-frequency noise; for motion vectors with 1/8-pixel accuracy, a more complex 8-tap filter can be used. During motion estimation, the encoder can also select an "enhanced" interpolation filter to improve the prediction.
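For concreteness, the 6-tap half-sample luma filter in the published standard has the weights (1, -5, 20, 20, -5, 1)/32. Below is a minimal sketch of applying it at one half-pel position, assuming 8-bit samples; the full interpolation process (and the averaging step that produces quarter-pel samples) is omitted:

```python
def half_pel(samples):
    """Interpolate one half-sample position from six integer-position
    luma samples using the H.264 6-tap filter (1,-5,20,20,-5,1)/32,
    with rounding and clipping to the 8-bit range."""
    E, F, G, H, I, J = samples
    acc = E - 5 * F + 20 * G + 20 * H - 5 * I + J
    return min(max((acc + 16) >> 5, 0), 255)

print(half_pel([10, 10, 10, 10, 10, 10]))  # flat area stays at 10
```

In a flat region the filter reproduces the input exactly, while near edges its negative outer taps sharpen the interpolated sample, which is what makes sub-pel prediction effective.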
    In H.264 motion prediction, a macroblock (MB) can be divided into sub-blocks as shown in Figure 2, giving block sizes in 7 different modes. This flexible, fine-grained multi-mode partitioning better fits the shapes of the actual moving objects in the image and greatly improves the accuracy of motion estimation. In this way, each macroblock can contain 1, 2, 4, 8, or 16 motion vectors.
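The seven partition sizes and the motion-vector counts they imply can be tabulated directly (Figure 2 is not reproduced here; the mode labels below are simply the block dimensions):

```python
# The 7 block-size modes of H.264 motion prediction and the number of
# motion vectors each implies for one 16x16 (256-sample) macroblock.
MB_MODES = {
    "16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4,
    "8x4": 8, "4x8": 8, "4x4": 16,
}

# Sanity check: each mode tiles the macroblock completely, so the
# motion-vector count equals 256 / (partition area).
for mode, n_mv in MB_MODES.items():
    w, h = (int(s) for s in mode.split("x"))
    assert 256 // (w * h) == n_mv

print(sorted(set(MB_MODES.values())))  # [1, 2, 4, 8, 16]
```

(In the standard the three smallest sizes are sub-partitions chosen per 8×8 block, but the resulting vector counts per macroblock are as tabulated.)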
    H.264 also allows the encoder to use more than one previously coded frame for motion estimation, the so-called multiple-reference-frame technique. For example, with 2 or 3 just-coded frames available as references, the encoder selects the frame that gives the better prediction for each target macroblock and indicates, for each macroblock, which frame is used for prediction.


3. Integer transformation of 4×4 blocks
    Like previous standards, H.264 applies block-based transform coding to the residual, but the transform uses integer arithmetic instead of real-number arithmetic, and its process is basically similar to the DCT. The advantage of this method is that transform and inverse transform of identical precision can be used in the encoder and the decoder, and simple fixed-point arithmetic suffices; in other words, there is no "inverse transform mismatch". The transform unit is the 4×4 block instead of the 8×8 block commonly used before. With the smaller transform block size, moving objects are partitioned more accurately: not only is the transform computation smaller, but the artifacts at the edges of moving objects are also greatly reduced. To prevent the small block size from producing grayscale differences between blocks in larger smooth areas of the image, a second 4×4 transform is applied to the 16 DC coefficients of the 4×4 luma blocks of an intra macroblock (one coefficient per small block, 16 in total), and a 2×2 transform is applied to the 4 DC coefficients of the 4×4 chroma blocks (one per small block, 4 in total).
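The 4×4 forward transform can be written as Y = C·X·Cᵀ with a small all-integer matrix. The matrix below is the core transform of the published standard, shown in plain Python as a sketch; the scaling factors, which the standard folds into quantization, are omitted:

```python
# Core 4x4 integer transform of H.264: Y = C . X . C^T (scaling omitted).
C = [[1,  1,  1,  1],
     [2,  1, -1, -2],
     [1, -1, -1,  1],
     [1, -2,  2, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_4x4(X):
    Ct = [list(row) for row in zip(*C)]  # transpose of C
    return matmul(matmul(C, X), Ct)

X = [[5, 5, 5, 5] for _ in range(4)]  # flat 4x4 residual block
Y = forward_4x4(X)
print(Y[0][0])  # flat input: all energy lands in the DC coefficient (80)
```

Because every entry of C is a small integer, encoder and decoder compute bit-identical results, which is exactly the "no inverse transform mismatch" property described above.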
    To improve the rate-control capability of H.264, the quantization step size changes by about 12.5% per step, rather than by a constant increment. Normalization of the transform coefficient amplitudes is handled in the inverse quantization process to reduce computational complexity. To emphasize color fidelity, a smaller quantization step size is used for the chroma coefficients.
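In the final standard this works out as follows: the quantization step size Qstep doubles for every 6 increments of the quantization parameter QP, so each single QP step multiplies Qstep by 2^(1/6) ≈ 1.122, roughly the 12.5% compound increase described above. A quick check (the QP-0 step size of 0.625 is the standard's value):

```python
QSTEP_0 = 0.625  # Qstep at QP = 0 in H.264
qstep = [QSTEP_0 * 2 ** (qp / 6) for qp in range(52)]  # QP range 0..51

print(f"per-step increase: {qstep[1] / qstep[0] - 1:.1%}")  # ~12.2%
print(f"ratio over 6 steps: {qstep[6] / qstep[0]:.1f}")     # 2.0
```

The exponential progression gives the rate controller fine-grained adjustment at low QP and coarse, wide-range adjustment at high QP with a single parameter.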


4. Unified VLC
    H.264 entropy coding has two methods: one applies a unified VLC (UVLC: Universal VLC) to all symbols to be coded; the other is context-adaptive binary arithmetic coding (CABAC: Context-Adaptive Binary Arithmetic Coding). CABAC is optional; its coding performance is slightly better than UVLC, but its computational complexity is also higher. UVLC uses an unlimited-length codeword set with a very regular structure, so different objects can be coded with the same code table. This method generates codewords easily, the decoder can easily recognize a codeword's prefix, and UVLC can quickly regain resynchronization when a bit error occurs.
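The regular structure described above is that of Exp-Golomb codes, which is what H.264's UVLC (the ue(v) descriptor of the final standard) uses: every codeword is M zeros, a 1, then M information bits, so codewords of any length follow from one rule. A minimal encoder/decoder sketch:

```python
def ue_encode(v: int) -> str:
    """Exp-Golomb encode an unsigned value: [M zeros][binary of v+1],
    where M is one less than the bit length of v+1."""
    code = bin(v + 1)[2:]                 # binary of v+1, no '0b' prefix
    return "0" * (len(code) - 1) + code

def ue_decode(bits: str) -> int:
    """Decode a single Exp-Golomb codeword back to its value."""
    m = len(bits) - len(bits.lstrip("0"))  # count leading zeros
    return int(bits[m:2 * m + 1], 2) - 1

for v in range(50):                        # round-trip check
    assert ue_decode(ue_encode(v)) == v

print(ue_encode(0), ue_encode(1), ue_encode(4))  # 1 010 00101
```

The leading-zero count alone tells the decoder each codeword's length, which is why the prefix is easy to recognize and why resynchronization after a bit error is fast.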


5. Intra-frame prediction
    In the previous H.26x series and MPEG-x series standards, prediction was inter-frame. In H.264, intra-frame prediction is available when coding intra images. For each 4×4 block (edge blocks receive special treatment), each pixel can be predicted by a weighted sum of the 17 nearest previously coded pixels (some weights may be 0), i.e. the 17 pixels above and to the left of the block. Obviously, this intra prediction operates not in time but in the spatial domain; as a spatial predictive coding algorithm it removes the redundancy between adjacent blocks and achieves more effective compression.
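As a concrete illustration, one of the 4×4 intra prediction modes in the final standard is DC mode, in which all 16 pixels are predicted from the mean of the neighbours above and to the left. This sketch shows that single mode only, not the full weighted-sum machinery described above:

```python
def intra_4x4_dc(above, left):
    """DC mode of H.264 4x4 intra prediction: every pixel of the block
    is predicted as the rounded mean of the 4 reconstructed neighbours
    above and the 4 to the left."""
    dc = (sum(above) + sum(left) + 4) >> 3  # +4 for rounding, /8
    return [[dc] * 4 for _ in range(4)]

pred = intra_4x4_dc(above=[100, 102, 98, 100], left=[101, 99, 100, 100])
print(pred[0][0])  # 100
```

The encoder subtracts this prediction from the actual block and transforms only the residual, so smooth areas cost almost no bits.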
    
6. For IP and wireless environments
    The H.264 draft includes tools for error resilience, facilitating the transmission of compressed video in environments with frequent bit errors and packet loss, such as mobile channels or IP channels, where robust transmission is required.
    To resist transmission errors, time synchronization in the H.264 video stream can be accomplished by intra-frame image refresh, and spatial synchronization is supported by slice-structured coding. In addition, to facilitate resynchronization after a bit error, certain resynchronization points are provided within the video data of an image. Intra-macroblock refresh and multiple reference macroblocks also allow the encoder to consider not only coding efficiency but also the characteristics of the transmission channel when choosing macroblock modes.
    Besides varying the quantization step size to adapt to the channel rate, H.264 often uses data partitioning to cope with changes in the channel bit rate. In general, data partitioning means generating video data with different priorities in the encoder to support quality of service (QoS) in the network. For example, a syntax-based data partitioning method divides the data of each frame into several parts according to importance, allowing less important information to be discarded when the buffer overflows. A similar temporal data partitioning method can also be used, accomplished by using multiple reference frames in P and B frames.
    In wireless applications, the large bit-rate variations of the wireless channel can be supported by changing the quantization precision or the spatial/temporal resolution of each frame. In the multicast case, however, the encoder cannot be required to respond to varying bit rates. Therefore, unlike the FGS (Fine Granularity Scalability) method used in MPEG-4, which is less efficient, H.264 uses stream-switching SP frames instead of hierarchical coding.

4. H.264 performance comparison
    TML-8 is the test model of H.264, and it was used to compare and test the video coding efficiency of H.264. The PSNR provided by the test results clearly shows that, compared with the performance of MPEG-4 (ASP: Advanced Simple Profile) and H.263++ (HLP: High Latency Profile), the results of H.264 have obvious advantages.
    The PSNR of H.264 is clearly better than that of MPEG-4 (ASP) and H.263++ (HLP): in comparison tests at 6 bit rates, the PSNR of H.264 is on average 2 dB higher than MPEG-4 (ASP) and on average 3 dB higher than H.263++ (HLP). The 6 test rates and their conditions were: 32 kbit/s at 10 frame/s in QCIF; 64 kbit/s at 15 frame/s in QCIF; 128 kbit/s at 15 frame/s in CIF; 256 kbit/s at 15 frame/s in QCIF; 512 kbit/s at 30 frame/s in CIF; and 1024 kbit/s at 30 frame/s in CIF.
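PSNR, the quality measure behind these comparisons, is defined for 8-bit video as 10·log10(255²/MSE). A small helper makes the "2-3 dB higher" figures concrete; the sample values below are made up for illustration:

```python
import math

def psnr(orig, recon):
    """Peak signal-to-noise ratio in dB for 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return 10 * math.log10(255 ** 2 / mse)

# Halving the MSE raises PSNR by 10*log10(2) ~= 3 dB, which is why a
# 2-3 dB gain at the same bit rate is a substantial quality difference.
print(round(psnr([100, 120, 140], [102, 118, 142]), 2))
```

Equivalently, the same gain can be read as a bit-rate saving at equal quality, which is how the roughly 50% saving over H.263 cited earlier is usually stated.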


Origin blog.csdn.net/ccsss22/article/details/108740767