Chapter 1: Audio and Video Fundamentals
Recommended reading before this chapter: FFmpeg+SDL Syllabus
Table of Contents
• Preface
• How a video player works: the processing pipeline, stage by stage
• Package (container) formats (MP4, RMVB, TS, FLV, AVI)
• Video coding data (H.264, MPEG2, VC-1)
• Audio coding data (AAC, MP3, AC-3)
• Video pixel data (YUV420P, RGB): the data sent to the graphics card for display
• Audio sampling data (PCM)
• Practice
How a video player plays a video file:
Package (container) format: packs the video and audio streams together into one file for transmission. Decapsulation (demuxing) separates them again.
Decoding: converts the compressed streams into raw data that the display (and, for audio, the sound card) can use, that is, video pixel data and audio sampling data.
Visualization tools commonly used during development and learning:
- Common players
  - Cross-platform (not based on DirectShow): VLC, MPlayer, ffplay...
  - Windows (based on DirectShow): Perfect Decoding, Ultimate Decoding, Baofeng Yingyin...
- Information viewers
  - Comprehensive information: MediaInfo
  - Binary view: UltraEdit
- Detailed analysis of each kind of data
  - Package format: Elecard Format Analyzer
  - Video encoding data: Elecard Stream Eye
  - Video pixel data: YUV Player
  - Audio sample data: Adobe Audition
MediaInfo
As shown in the figure below, opening an MKV file in MediaInfo displays comprehensive information about it: duration, audio and video encoding formats, resolution, frame rate, sampling rate, and so on.
Package format
1. Function of the package format: stores the video bitstream and the audio bitstream together in one file according to a defined structure.
2. Package format analysis tool: Elecard Format Analyzer
Introduction to MPEG2-TS format
MPEG2-TS has no file header. The data consists of TS packets of fixed size (188 bytes), sent one after another. This is how it is carried over cable TV networks. Because there is no file header, each packet can be handled on its own. Even if data at the beginning or end of the stream is corrupted, the rest of the video can still be played normally.
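The code below shows why TS needs no file header: every packet describes itself. It is a minimal sketch that assumes the input is already aligned to 188-byte packet boundaries. It reads only each packet's 4-byte header:

* the sync byte 0x47;
* the payload_unit_start flag;
* the 13-bit PID that identifies which stream the packet belongs to;
* the continuity counter.

A packet whose sync byte is wrong is skipped, and the packets after it remain usable. The function name `parse_ts_packets` is hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47  # every TS packet starts with this byte


def parse_ts_packets(data: bytes):
    """Yield (pid, payload_unit_start, continuity_counter) for each TS packet."""
    if len(data) % TS_PACKET_SIZE:
        raise ValueError("data length is not a multiple of 188 bytes")
    for offset in range(0, len(data), TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            # Damaged packet: skip it and continue with the next one.
            # (Real demuxers also rescan for 0x47 to regain alignment.)
            continue
        payload_unit_start = bool(packet[1] & 0x40)
        pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit stream identifier
        continuity_counter = packet[3] & 0x0F
        yield pid, payload_unit_start, continuity_counter
```

Because the only state is the current packet, a player can start decoding at any packet boundary. This is what makes TS suitable for broadcast.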
Introduction to FLV format
FLV has a file header. The data consists of tags of variable size, so a player depends on the header to interpret the file. If the file header is damaged, the file cannot be played.
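By contrast with TS, an FLV reader has to start from the header. The header gives the offset of the first tag, and from then on each tag's size is what locates the next one. The sketch below follows the FLV version 1 layout:

* a 9-byte header ("FLV", version, audio/video flags, header size), then a 4-byte PreviousTagSize0;
* tags, each made of an 11-byte tag header, the payload, and a 4-byte PreviousTagSize.

The function name is hypothetical, and the sketch does not decode tag payloads.

```python
import struct


def parse_flv(data: bytes):
    """Return (has_audio, has_video, tags); each tag is (tag_type, data_size, timestamp_ms)."""
    # Without a valid header, nothing else in the file can be located.
    if len(data) < 9 or data[:3] != b"FLV":
        raise ValueError("not an FLV file: bad or missing header")
    flags = data[4]
    has_audio = bool(flags & 0x04)
    has_video = bool(flags & 0x01)
    (header_size,) = struct.unpack(">I", data[5:9])  # 9 for FLV version 1
    pos = header_size + 4  # skip PreviousTagSize0, which is always 0
    tags = []
    while pos + 11 <= len(data):
        tag_type = data[pos]  # 8 = audio, 9 = video, 18 = script data
        data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
        # 24-bit timestamp plus an extension byte holding the upper 8 bits
        timestamp = int.from_bytes(data[pos + 4:pos + 7], "big") | (data[pos + 7] << 24)
        tags.append((tag_type, data_size, timestamp))
        # 11-byte tag header + payload + 4-byte PreviousTagSize field
        pos += 11 + data_size + 4
    return has_audio, has_video, tags
```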
Video encoding data
1. The role of video encoding: Compress the video pixel data (RGB, YUV, etc.) into a video stream, thereby reducing the amount of video data.
2. Video coding analysis tool: Elecard Stream Eye
The figure shows the analyzer's interface, with the data of each frame in the upper part. Each picture is divided into a grid of blocks (macroblocks), which are the basic unit of coding. A block can be split further into smaller sub-blocks: the more complex the picture content in a region, the more finely it is divided and coded.
Red: I frame (compressed on its own, independent of other pictures). Blue: P frame (predicted from earlier frames). Green: B frame (predicted from both earlier and later frames). The lines drawn on the picture represent motion vectors.
3. Video encoding formats: common ones include H.264, MPEG2, and VC-1.
4. Introduction to H.264 format
- The data consists of NALUs (Network Abstraction Layer Units) of variable size.
- In the most common case, one NALU stores the compressed data of one picture (frame).
5. H.264 compression method
- Quite complicated: it involves intra-frame prediction, inter-frame prediction, entropy coding, loop filtering, and other stages. This course does not go into the algorithms in detail.
- Image data can be compressed by a factor of more than 100.
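Item 4 says H.264 data is a sequence of variable-size NALUs. The code below is a minimal sketch of splitting such a stream, assuming the Annex B byte-stream form used in raw .h264 files and in TS. In that form each NALU is preceded by a start code 00 00 01 or 00 00 00 01. MP4 files instead prefix each NALU with its length, which this sketch does not handle.

The type of each unit is the low 5 bits of its one-byte header: 5 is an IDR slice (an I frame), 7 is SPS, 8 is PPS. The helper names are hypothetical, and emulation-prevention bytes are not removed.

```python
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def split_annexb(stream: bytes) -> list:
    """Split an H.264 Annex B byte stream into NAL units, start codes removed."""
    marks = []  # (where the start code begins, where the NALU begins)
    i = 0
    while i + 3 <= len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            # 00 00 00 01 is a 3-byte start code preceded by one extra zero byte
            begin = i - 1 if i > 0 and stream[i - 1] == 0 else i
            marks.append((begin, i + 3))
            i += 3
        else:
            i += 1
    nalus = []
    for k, (_, start) in enumerate(marks):
        end = marks[k + 1][0] if k + 1 < len(marks) else len(stream)
        nalus.append(stream[start:end])
    return nalus


def nal_unit_type(nalu: bytes) -> int:
    """nal_unit_type is the low 5 bits of the one-byte NALU header."""
    return nalu[0] & 0x1F


stream = (b"\x00\x00\x00\x01\x67\xAA"   # SPS
          b"\x00\x00\x00\x01\x68\xBB"   # PPS
          b"\x00\x00\x01\x65\xCC\xDD")  # IDR slice
for nalu in split_annexb(stream):
    t = nal_unit_type(nalu)
    print(t, NAL_TYPE_NAMES.get(t, "other"), len(nalu))
```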
Audio coded data
1. Function of audio coding: compresses audio sample data (PCM, etc.) into an audio bitstream, reducing the amount of audio data. (Audio coding is less critical than video coding, because raw audio data is much smaller than raw video data.)
2. Audio coding analysis tool: not covered yet.
3. Introduction to AAC format: the data consists of ADTS (Audio Data Transport Stream) frames of variable size.
4. AAC compression method
- Quite complicated. This course does not go into the algorithms in detail.
- Audio data can be compressed by a factor of more than 10.
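Item 3 says AAC data is a sequence of variable-size ADTS frames. The code below is a minimal sketch of reading them. Each frame begins with a 7-byte header (9 bytes when a CRC is present) that starts with the 12-bit syncword 0xFFF. The header's 13-bit frame_length field gives the size of the whole frame, so it also locates the next frame.

Only a few header fields are decoded here. The function names are hypothetical.

```python
# Sampling-frequency table indexed by the 4-bit sampling_frequency_index field
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(h: bytes) -> dict:
    """Decode the most commonly needed fields of an ADTS header."""
    if len(h) < 7 or h[0] != 0xFF or (h[1] & 0xF0) != 0xF0:
        raise ValueError("missing ADTS syncword 0xFFF")
    sf_index = (h[2] >> 2) & 0x0F
    if sf_index >= len(SAMPLE_RATES):
        raise ValueError("reserved sampling_frequency_index")
    return {
        "protection_absent": h[1] & 0x01,  # 1 = no CRC, header is 7 bytes
        "profile": h[2] >> 6,              # audio object type minus 1 (1 = AAC LC)
        "sample_rate": SAMPLE_RATES[sf_index],
        "channels": ((h[2] & 0x01) << 2) | (h[3] >> 6),
        # 13-bit length of the whole frame, header included
        "frame_length": ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5),
    }


def split_adts(stream: bytes) -> list:
    """Cut an ADTS stream into frames using each header's frame_length."""
    frames = []
    pos = 0
    while pos + 7 <= len(stream):
        length = parse_adts_header(stream[pos:pos + 7])["frame_length"]
        if length < 7:
            raise ValueError("corrupt frame_length")
        frames.append(stream[pos:pos + length])
        pos += length
    return frames


header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x01, 0x5F, 0xFC])
print(parse_adts_header(header))  # AAC LC, 44100 Hz, 2 channels, 10-byte frame
```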
Video pixel data
For fuller background on this topic, see the related articles on vector graphics, bitmaps, dot matrices, RGB, and YUV.
1. Function of video pixel data: stores the value of every pixel on the screen.
2. Formats: common pixel data formats are RGB24, RGB32, YUV420P, YUV422P, YUV444P, etc. Compression coding generally uses YUV pixel data, most commonly YUV420P.
3. Characteristics: video pixel data is very large. For example, one hour of 1920x1080 high-definition video in RGB24 format takes:
3600 * 25 * 1920 * 1080 * 3 = 559.9 GByte
PS: This assumes a frame rate of 25 Hz and 8-bit sampling precision per component (3 bytes per pixel).
4. YUV format pixel data viewing tool: YUV Player
5. Introduction to RGB format:
- Red, green, and blue light can be mixed to produce a wide range of colors.
- Each pixel of a color image consists of three components: R, G, and B.
- Taking RGB24 as an example, the pixel data is stored packed: each pixel takes 3 bytes (R, G, B), and pixels follow one another from left to right, row by row.
PS: BMP files store pixel data in RGB format.
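The code below is a minimal sketch of the packed RGB24 layout described in item 5. It computes the byte offset of a pixel and repeats the one-hour size estimate from item 3. It assumes 8 bits per component, rows stored top to bottom, and no row padding. The helper names are hypothetical.

```python
def rgb24_frame_size(width: int, height: int) -> int:
    """Bytes in one packed RGB24 frame: 3 bytes (R, G, B) per pixel, no row padding."""
    return width * height * 3


def get_pixel(buf: bytes, width: int, x: int, y: int) -> tuple:
    """Read pixel (x, y) from a packed RGB24 buffer stored row by row, top to bottom."""
    offset = (y * width + x) * 3
    return buf[offset], buf[offset + 1], buf[offset + 2]


# The estimate from item 3: one hour of 1920x1080 video at 25 frames/s
frame_bytes = rgb24_frame_size(1920, 1080)
hour_bytes = 3600 * 25 * frame_bytes
print(frame_bytes, hour_bytes / 1e9)  # 6220800 bytes per frame, about 559.9 GB per hour
```

A 24-bit BMP file differs in a few details. It writes each pixel's bytes in B, G, R order, normally stores rows bottom-up, and pads each row to a multiple of 4 bytes, so this offset formula does not apply to it unchanged.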
Introduction to YUV format
Experiments show that the human eye is sensitive to brightness but much less sensitive to color (chrominance). Luminance (Y) and chrominance (U, V) can therefore be separated, and chrominance can be compressed more aggressively, which improves compression efficiency. YUV420P, for example, keeps only one U and one V sample for each 2x2 block of pixels.
YUV data viewing tool: YUVPlayer
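The code below is a minimal sketch of what the chroma reduction means in bytes for 8-bit YUV420P. The format is planar: the full-resolution Y plane is stored first, followed by a U plane and a V plane, each subsampled by 2 both horizontally and vertically. A frame therefore needs 1.5 bytes per pixel instead of RGB24's 3.

The RGB-to-YUV conversion uses the full-range BT.601 coefficients. That is an assumed choice: BT.709 and limited-range variants also exist. All function names are hypothetical.

```python
def yuv420p_plane_sizes(width: int, height: int) -> tuple:
    """Byte sizes of the Y, U and V planes of one 8-bit YUV420P frame."""
    chroma_w = (width + 1) // 2   # chroma is halved horizontally (rounded up)
    chroma_h = (height + 1) // 2  # ...and vertically
    return width * height, chroma_w * chroma_h, chroma_w * chroma_h


def split_yuv420p(frame: bytes, width: int, height: int) -> tuple:
    """YUV420P is planar: all Y bytes, then all U bytes, then all V bytes."""
    y_size, u_size, v_size = yuv420p_plane_sizes(width, height)
    if len(frame) != y_size + u_size + v_size:
        raise ValueError("frame length does not match the given dimensions")
    return frame[:y_size], frame[y_size:y_size + u_size], frame[y_size + u_size:]


def rgb_to_yuv(r: int, g: int, b: int) -> tuple:
    """Full-range BT.601 RGB -> YUV (an assumed choice of matrix and range)."""
    def clamp(v: float) -> int:
        return min(255, max(0, round(v)))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return clamp(y), clamp(u), clamp(v)


# A 1920x1080 YUV420P frame is half the size of the same frame in RGB24
print(sum(yuv420p_plane_sizes(1920, 1080)), 1920 * 1080 * 3)  # 3110400 6220800
```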
Audio sample data
1. Function of audio sampling data: stores the value of each sampling point of the audio. Sound is a continuous analog waveform, which a computer cannot store directly, so it must be sampled at discrete points in time.
2. Characteristics: audio sampling data is also large. For example, a 4-minute song in PCM format takes:
4 * 60 * 44100 * 2 * 2 = 42.3 MByte
PS: This assumes a sampling rate of 44100 Hz, two channels (stereo), and 16-bit (2-byte) sampling precision. A 44100 Hz sampling rate can represent frequencies up to half its value, 22050 Hz, which covers the roughly 20 kHz upper limit of human hearing.
3. Audio sampling data viewing tool: Adobe Audition
4. Introduction to PCM format
▫ Mono: the value of each sampling point is stored in order. (A waveform looks like a continuous curve, but zoomed in it is actually a series of discrete sampling points.)
▫ Stereo: for each sampling point, the values of the two channels are stored alternately: left, right, left, right, ...
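The code below is a minimal sketch of the stereo storage order and the size estimate in item 2. It assumes signed 16-bit little-endian samples (what FFmpeg calls s16le) at 44100 Hz. The constants and function names are illustrative, not taken from any library.

```python
import struct

SAMPLE_RATE = 44100   # samples per second, per channel
CHANNELS = 2          # stereo
BYTES_PER_SAMPLE = 2  # 16-bit samples


def pcm_size_bytes(seconds: int) -> int:
    """Size of uncompressed PCM: duration x rate x channels x bytes per sample."""
    return seconds * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE


def interleave_s16le(left: list, right: list) -> bytes:
    """Pack two channels as L, R, L, R, ... signed 16-bit little-endian samples."""
    samples = [s for pair in zip(left, right) for s in pair]
    return struct.pack("<%dh" % len(samples), *samples)


def deinterleave_s16le(data: bytes) -> tuple:
    """Split interleaved stereo s16le PCM back into (left, right) sample lists."""
    count = len(data) // 2
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    return list(samples[0::2]), list(samples[1::2])


print(pcm_size_bytes(4 * 60))                         # 42336000 bytes, about 42.3 MB
print(SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * 8)  # 1411200 bit/s
```

A mono stream would simply be the samples packed one after another with no interleaving.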