Audio and video codec summary (internship rotation)

  The essence of a codec is compression and decompression; everything revolves around storing and transmitting media with minimal resources. Throughout this process, a balance must be struck between lossy compression and preserving quality.



  Video is composed of a continuous sequence of images, and both RGB and YUV are used to represent those images. In RGB, any visible color is formed by combining the three primaries at different intensities. YUV exploits the fact that the human eye is less sensitive to chrominance than to luminance: one channel (Y) stores luminance information, and two channels (U and V) store chrominance information. The chrominance data can therefore be compressed along the way; this is lossy, but to a certain extent it does not affect the eye's perception of image quality. Concretely, it is YUV's chroma subsampling mechanism that lets multiple luma pixels share one chroma sample, compressing the chroma data; when restoring (rendering), the chroma resolution is scaled back up to match the luma. (Under the same conditions, YUV with subsampling saves space compared with RGB, but it is still far from meeting storage and transmission requirements on its own.)
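As a rough illustration of why subsampling helps, here is a sketch (assumed frame sizes, not taken from any spec) comparing per-frame storage for 24-bit RGB and YUV 4:2:0, where each 2×2 block of luma pixels shares one U and one V sample:

```python
# RGB24 stores 3 bytes per pixel; YUV 4:2:0 stores a full-resolution luma
# plane plus two quarter-resolution chroma planes: 1 + 1/4 + 1/4 = 1.5 B/px.

def rgb24_bytes(width: int, height: int) -> int:
    return width * height * 3

def yuv420_bytes(width: int, height: int) -> int:
    y = width * height                # full-resolution luma plane
    u = (width // 2) * (height // 2)  # quarter-resolution chroma plane
    v = (width // 2) * (height // 2)
    return y + u + v

# A single 1920x1080 frame:
print(rgb24_bytes(1920, 1080))   # 6220800 bytes (~5.9 MiB)
print(yuv420_bytes(1920, 1080))  # 3110400 bytes -- half the size, still large
```

Even at half the size, a raw 1080p stream at 25 fps is still tens of megabytes per second, which is why the encoding stages below are needed.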

  RGB persists because almost all current input and output devices natively support only RGB; data in YUV format must be converted internally before it can be captured or displayed. In addition, there are conversion formulas between RGB and YUV, which can be implemented in a program as a matrix multiplication.
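A minimal sketch of that matrix multiplication, using the BT.601 full-range coefficients (real pipelines choose coefficients per standard, e.g. BT.601 vs. BT.709, and may add offsets for limited-range video):

```python
# RGB -> YUV as a 3x3 matrix multiply (BT.601 full-range coefficients).
BT601 = [
    [ 0.299,  0.587,  0.114],   # Y: luminance
    [-0.147, -0.289,  0.436],   # U: blue-difference chroma
    [ 0.615, -0.515, -0.100],   # V: red-difference chroma
]

def rgb_to_yuv(r: float, g: float, b: float) -> tuple:
    rgb = (r, g, b)
    return tuple(sum(row[i] * rgb[i] for i in range(3)) for row in BT601)

# Pure white (255, 255, 255): all the signal lands in luma, chroma is ~0.
y, u, v = rgb_to_yuv(255, 255, 255)
print(round(y), round(u), round(v))  # 255 0 0
```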


Further compressing the YUV data (encoding)


  Intra-frame prediction compresses spatially redundant data: many adjacent pixels within one image are often similar. In H.264, for example, a frame is divided into macroblocks, by default 16×16 for luma and 8×8 for chroma, and luma and chroma are predicted independently. If the data within a macroblock varies too much, it is flexibly divided further into smaller sub-blocks. Neighboring macroblocks are then used for prediction (already-coded parts predict the parts not yet coded), taking the adjacent column to the left and the adjacent row above the current macroblock as references and selecting an appropriate prediction mode. In this way only the difference (the residual) between the actual value and the predicted value needs to be encoded, and the residual is generally small.
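A toy sketch of the idea (hypothetical 4×4 block and a "horizontal"-style mode only, not real H.264 code): predict each row from the reconstructed pixel column to its left, then keep only the residual:

```python
# Already-reconstructed pixels in the column left of the current block.
left_column = [100, 101, 99, 100]

# The block actually being coded (hypothetical values).
actual = [
    [101, 102, 100, 101],
    [100, 101, 101, 100],
    [ 98,  99,  99,  98],
    [100, 100, 101, 101],
]

# Horizontal prediction: each row is filled with its left neighbor.
predicted = [[left_column[r]] * 4 for r in range(4)]

# Only the residual (actual - predicted) needs to be encoded.
residual = [[actual[r][c] - predicted[r][c] for c in range(4)]
            for r in range(4)]
print(residual)  # small values near zero -> cheap to encode
```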

  Inter-frame prediction compresses temporally redundant data: two adjacent frames rarely differ much, and most changes are just small displacements of image content (unless there is a scene cut). Concretely, the encoder finds a macroblock in a previous reference frame that closely matches the current one and encodes a motion vector pointing to it, rather than encoding all of the current macroblock's information.
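A toy exhaustive block-matching sketch (assumed 6×6 reference frame and 2×2 block; real encoders use fast search patterns restricted to a window): the position minimizing the sum of absolute differences (SAD) gives the motion vector, leaving almost no residual to encode:

```python
# 6x6 reference frame; the current 2x2 block appears at position (1, 2).
ref = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0, 0],
    [0, 0, 7, 6, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
block = [[9, 8], [7, 6]]   # macroblock from the current frame

def sad(y: int, x: int) -> int:
    """Sum of absolute differences between block and ref at (y, x)."""
    return sum(abs(ref[y + r][x + c] - block[r][c])
               for r in range(2) for c in range(2))

# Exhaustive search over every candidate position.
best = min((sad(y, x), (y, x)) for y in range(5) for x in range(5))
print(best)  # (0, (1, 2)): a perfect match -> encode only the vector
```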

  Frames are divided into I, P, and B types. An I-frame is a key frame: it can be decoded and rendered without any other frame, uses only intra-frame compression, and has the lowest compression ratio. A P-frame must reference a preceding I- or P-frame to be reconstructed. A B-frame must reference frames in both directions (before and after) to be reconstructed and has the highest compression ratio; like P-frames, it can use both intra-frame and inter-frame prediction techniques.

  GOP: a group of frames from one scene, the first of which is an I-frame. Within a scene the changes between adjacent frames are not particularly large, so they are grouped together. If a scene lasts too long, it should be split into multiple GOPs to limit accumulated error.

  Buffer: inter-frame prediction must reference already-coded frames, and one reference frame may be referenced by several other frames, so a reference cannot be discarded after a single use. The encoder therefore caches encoded-and-reconstructed frames to serve as references for subsequently coded frames. The original frames are not used as references because the macroblock partitioning of an original frame and of its encoded-and-reconstructed version are not exactly consistent, and during decoding only reconstructed frames exist, never the originals. To keep the encoder aligned with the decoding process, the original frame cannot be used as a reference.

  Timestamps DTS and PTS: the former is the decoding time, the latter the display time. Without B-frames, both I- and P-frames are displayed right after decoding. With B-frames, an I-frame is still displayed after decoding, but a P-frame may have to be decoded early (because the B-frames before it in display order depend on it) and then wait for those B-frames to be decoded and displayed first; in that case DTS and PTS differ.
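A toy illustration with assumed frame numbers: the B-frames reference P3, so P3 must be decoded before them even though it is displayed last, which is exactly where DTS and PTS diverge:

```python
# Display order of a tiny GOP vs. the order the encoder emits the frames.
display_order = ["I0", "B1", "B2", "P3"]
decode_order  = ["I0", "P3", "B1", "B2"]  # P3 moves ahead of its B-frames

pts = {name: t for t, name in enumerate(display_order)}  # presentation time
dts = {name: t for t, name in enumerate(decode_order)}   # decode time

for name in decode_order:
    print(name, "DTS =", dts[name], "PTS =", pts[name])
# P3 is decoded at time 1 but not displayed until time 3.
```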

  Transform (DCT): regions of an image where the gray value changes slowly are called low frequency, such as smooth contour areas; the opposite is called high frequency, such as edges and noise. Because the visual sensitivity of the human eye is limited, appropriately discarding some high-frequency information has little visual effect. After a macroblock is transformed, the high-frequency information ends up in the lower-right corner and the low-frequency information in the upper-left corner.
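A minimal 1-D DCT-II sketch (the 2-D transform used on macroblocks applies it along rows and then columns; this is the textbook formula, not any codec's optimized integer transform). A perfectly smooth row concentrates all of its energy in the first, lowest-frequency coefficient:

```python
import math

def dct(xs):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(xs)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(xs))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

coeffs = dct([100] * 8)                  # a perfectly smooth block row
print(round(coeffs[0], 2))               # all the energy: 100 * sqrt(8)
print(max(abs(c) for c in coeffs[1:]))   # ~0 for every higher frequency
```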

  Quantization: essentially a division. The larger the quantization parameter QP, the coarser the quantization granularity, the higher the compression ratio, the lower the bit rate, and the lower the video quality: more blocking artifacts and a less detailed, blurrier picture. Conversely, a small QP means a lower compression ratio, a higher bit rate, higher quality, and a finer picture with richer detail. After quantization, the lower-right (high-frequency) corner of a macroblock is mostly zeros.
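A toy uniform quantizer (hypothetical coefficient row and step sizes; H.264's real quantizer derives the step from QP nonlinearly): a coarser step zeroes out more of the high-frequency tail:

```python
# Hypothetical DCT coefficient row, ordered low -> high frequency.
coeffs = [282, -31, 12, -6, 3, -2, 1, 0]

def quantize(cs, step):
    """Divide each coefficient by the step size and round."""
    return [round(c / step) for c in cs]

print(quantize(coeffs, 4))   # fine step: small coefficients survive
print(quantize(coeffs, 16))  # coarse step: the tail collapses to zeros
```

The decoder multiplies back by the same step, but the rounded-away detail is gone for good, which is where the loss in "lossy" comes from.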

  Entropy coding: removes information-entropy redundancy. Because the lower-right corner of the quantized data is mostly zeros, a reordering mechanism (the zigzag scan) is adopted so that the zeros in the final stream form long consecutive runs that are easy to compress. A coding algorithm is then applied, either fixed-length or variable-length, and the final output is a bitstream of 0s and 1s.
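A toy zigzag scan plus run-length pass on a hypothetical quantized 4×4 block: the scan visits coefficients from the low-frequency top-left toward the high-frequency bottom-right, so the trailing zeros collapse into a single run:

```python
# Hypothetical quantized block: nonzero values cluster in the top-left.
block = [
    [18, -2, 1, 0],
    [ 3,  1, 0, 0],
    [ 1,  0, 0, 0],
    [ 0,  0, 0, 0],
]

# Zigzag order: walk the anti-diagonals (r + c constant), alternating
# direction on each diagonal, as in JPEG/H.264-style reordering.
n = 4
order = sorted(((r, c) for r in range(n) for c in range(n)),
               key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
scanned = [block[r][c] for r, c in order]

# Run-length encode as [value, count] pairs: the zero tail becomes one pair.
rle = []
for v in scanned:
    if rle and rle[-1][0] == v:
        rle[-1][1] += 1
    else:
        rle.append([v, 1])

print(scanned)
print(rle)
```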


  Different packaging (container) formats use different containers to store the audio and video streams (frames) in different arrangements, and they must be decapsulated (demuxed) before use.
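Conceptually, a container muxer interleaves independently timestamped audio and video frames into one stream so a demuxer can later feed each decoder its packets in order. A toy sketch with assumed timestamps (tuples standing in for packets; not a real format such as MP4 or MKV):

```python
import heapq

# Assumed packet streams: (kind, pts) tuples with millisecond timestamps.
video = [("video", pts) for pts in (0, 40, 80, 120)]     # 25 fps frames
audio = [("audio", pts) for pts in (0, 21, 42, 64, 85)]  # short audio frames

# Merge the two sorted streams by timestamp, as a muxer does conceptually.
muxed = list(heapq.merge(video, audio, key=lambda pkt: pkt[1]))
print([f"{kind}@{pts}" for kind, pts in muxed])
```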

  

notes:

How to install FFmpeg under Ubuntu - Lu Zelin's Blog - CSDN Blog

Basic knowledge of audio and video entry - Lu Zelin's Blog - CSDN Blog

H264 Basic Concepts (Getting Started)_Lu Zelin's Blog-CSDN Blog


Origin blog.csdn.net/qq_40342400/article/details/129730043