[FFmpeg Video Player Development] The demuxing and decoding process, and an introduction to common APIs and structures (1)

1. Introduction

Before formally writing the FFmpeg player, we need a brief understanding of the FFmpeg libraries, the playback and decoding process, and the functions and structures that will be used.

2. Introduction to FFmpeg library

  • avcodec: audio and video codec core library

  • avformat: muxing and demuxing (encapsulation and parsing) of audio/video container formats

  • avutil: core helper and utility library

  • swscale: image scaling and pixel format conversion module

  • swresample: audio resampling

  • avfilter: audio and video filter library, e.g. video watermarking and audio voice changing

  • avdevice: input and output device library, providing capture and output of device data

FFmpeg relies on the above libraries for its powerful audio/video encoding, decoding, editing, conversion, and capture capabilities. The video player implemented here uses all of them except the avfilter library.

3. FFmpeg playback process

Video files such as MP4, MKV, and FLV are encapsulation (container) formats: the compressed and encoded video data and audio data are packed together according to a certain specification. When we play a media file, we usually go through the following steps:

Implementing this video player therefore involves the following:

  • Demuxing: separates the input data in an encapsulation format into compressed audio stream data and compressed video stream data. For example, after demuxing data in FLV format, an H.264-encoded video stream and an AAC-encoded audio stream are output.

  • Software and hardware decoding (Decode): decodes the compressed video/audio data into uncompressed raw video/audio data. Through decoding, compressed video data such as H.264 or MPEG-2 is decoded into uncompressed color data such as YUV, and compressed audio data such as AAC or MP3 is decoded into uncompressed audio samples such as PCM. Decoding can be done in hardware ("hard decoding") or in software ("soft decoding").

  • Pixel format conversion: converts YUV data to RGB data.

  • Resample: Resample the audio.

  • DTS/PTS: DTS is the decoding timestamp and PTS is the presentation timestamp. PTS is used to obtain the current playback progress; moving the progress bar (seeking) uses the av_seek_frame() function, as sketched after this list.

  • Audio and video synchronization: synchronizes the decoded audio and video data according to the parameter information obtained by the demuxing module, and sends the audio and video data to the system's graphics card and sound card for playback (Render).
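Since seeking came up in the DTS/PTS item above, here is a minimal sketch of how av_seek_frame() is typically driven; the helper name seek_to_seconds and its variables are illustrative, not from the original code:

#include <libavformat/avformat.h>

// Sketch: convert a target time in seconds into the stream's time base
// and seek; AVSEEK_FLAG_BACKWARD lands on the preceding key frame.
static int seek_to_seconds(AVFormatContext *fmt_ctx, int stream_index, double target_seconds)
{
    AVStream *st = fmt_ctx->streams[stream_index];
    int64_t ts = (int64_t)(target_seconds / av_q2d(st->time_base));
    return av_seek_frame(fmt_ctx, stream_index, ts, AVSEEK_FLAG_BACKWARD);
}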

Among them, decoding is the most important step. The following sections describe the decoding process and the APIs and structures it uses.


4. FFmpeg decoding process

The overall decoding flow, which the APIs below follow, is roughly: avformat_open_input() → avformat_find_stream_info() → avcodec_find_decoder() → avcodec_open2() → av_read_frame() → avcodec_decode_video2() → avformat_close_input().

5. Description of the FFmpeg API used

5.1 av_register_all()

  • Registers all components of FFmpeg.

  • It has been deprecated since version 4.0, so audio and video can be encoded and decoded normally without calling it.
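For code that must build against both old and new FFmpeg versions, a common guard looks like the sketch below; the version threshold is the commonly cited one at which av_register_all() was marked deprecated:

#include <libavformat/avformat.h>

static void init_ffmpeg(void)
{
    // av_register_all() is only needed before FFmpeg 4.0 / libavformat 58.9.100
#if LIBAVFORMAT_VERSION_INT < AV_VERSION_INT(58, 9, 100)
    av_register_all();
#endif
}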

5.2 avformat_alloc_context()

Used to allocate and initialize an AVFormatContext object. Its prototype is as follows:

AVFormatContext *avformat_alloc_context(void)
  • The AVFormatContext passed to avformat_open_input() must either be NULL or have been allocated with avformat_alloc_context().

5.3 avformat_open_input()

Opens a media file and obtains the demuxing context. Its prototype is as follows:

int avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options)
  • ps: a double pointer to AVFormatContext; after a successful call, the demuxing context is assigned through ps.

  • url: can be an rtsp or http network stream address, or a local video file path.

  • fmt: specifies the encapsulation format of the input audio and video. It can generally be set to NULL, in which case the AVInputFormat is probed automatically.

  • options: additional options; generally NULL, though some settings are needed when playing rtsp streams.
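Putting 5.2 and 5.3 together, a minimal open sketch might look as follows; the helper name open_input is illustrative:

#include <stdio.h>
#include <libavformat/avformat.h>

// Sketch: allocate a demuxing context and open a URL or local file.
// On failure, avformat_open_input() frees a user-supplied context itself.
static AVFormatContext *open_input(const char *url)
{
    AVFormatContext *fmt_ctx = avformat_alloc_context(); // NULL would also work here
    if (avformat_open_input(&fmt_ctx, url, NULL, NULL) != 0) {
        fprintf(stderr, "could not open %s\n", url);
        return NULL;
    }
    return fmt_ctx;
}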

5.4 avformat_find_stream_info()

Probes the stream information. Its prototype is as follows:

int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options)
  • Some formats, such as FLV and raw H.264 streams, have no header information, so after avformat_open_input() opens the file, the stream parameters cannot be obtained from it.

  • This function can be called in that case: it probes the file by reading and parsing some data. For formats without header information it can obtain information such as the codec, width and height, but it may still be unable to obtain the total duration.

  • If the total duration cannot be obtained, the whole file has to be read and the total frame count used to calculate it.
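As a sketch of the duration point above, assuming fmt_ctx came from avformat_open_input():

#include <stdio.h>
#include <libavformat/avformat.h>

// Sketch: probe stream info, then print the total duration in seconds.
static void print_duration(AVFormatContext *fmt_ctx)
{
    if (avformat_find_stream_info(fmt_ctx, NULL) < 0)
        return;
    if (fmt_ctx->duration != AV_NOPTS_VALUE) // duration is in AV_TIME_BASE (microsecond) units
        printf("duration: %.2f s\n", fmt_ctx->duration / (double)AV_TIME_BASE);
}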

5.5 avcodec_find_decoder()

Finds a decoder. The parameter is the ID of the decoder to use; on success the matching decoder is returned (NULL if none is found). Its prototype is as follows:

AVCodec *avcodec_find_decoder(enum AVCodecID id);
  • id: the ID of the decoder to find (e.g. AV_CODEC_ID_H264).

5.6 avcodec_open2()

Used to initialize the AVCodecContext of an audio/video codec; its definition is located in libavcodec/utils.c. Its prototype is as follows:

int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options)
  • avctx: the AVCodecContext to be initialized.

  • codec: the input AVCodec.

  • options: some options; for example, when encoding with libx264, "preset", "tune" and so on can be set through this parameter.
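Combining 5.5 and 5.6, a decoder-setup sketch along the post-4.0 codecpar path might look like this (helper name and abbreviated error handling are illustrative):

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

// Sketch: find the first video stream, look up its decoder, and open a codec context.
static AVCodecContext *open_video_decoder(AVFormatContext *fmt_ctx, int *video_index)
{
    for (unsigned i = 0; i < fmt_ctx->nb_streams; i++) {
        AVCodecParameters *par = fmt_ctx->streams[i]->codecpar;
        if (par->codec_type != AVMEDIA_TYPE_VIDEO)
            continue;
        const AVCodec *dec = avcodec_find_decoder(par->codec_id);
        if (!dec)
            return NULL;
        AVCodecContext *ctx = avcodec_alloc_context3(dec);
        avcodec_parameters_to_context(ctx, par); // copy width/height/extradata, etc.
        if (avcodec_open2(ctx, dec, NULL) < 0) {
            avcodec_free_context(&ctx);
            return NULL;
        }
        *video_index = (int)i;
        return ctx;
    }
    return NULL;
}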

5.7 av_read_frame()

Reads several frames of audio or one frame of video from the stream. For example, when decoding video, av_read_frame() must be called before each frame is decoded to obtain that frame's compressed data, which can then be decoded. Its prototype is as follows:

int av_read_frame(AVFormatContext *s, AVPacket *pkt)
  • s: the demuxing context.

  • pkt: Stores the compressed data of a frame of video.
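A sketch of the typical read loop; decode_packet is a hypothetical callback standing in for the decode step of 5.8:

#include <libavformat/avformat.h>

// Sketch: pull compressed packets from the demuxer and route video packets.
static void read_loop(AVFormatContext *fmt_ctx, int video_index)
{
    AVPacket pkt;
    while (av_read_frame(fmt_ctx, &pkt) >= 0) {
        if (pkt.stream_index == video_index) {
            /* decode_packet(&pkt); */
        }
        av_packet_unref(&pkt); // drop the reference av_read_frame() gave us
    }
}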

5.8 avcodec_decode_video2()

Decodes one frame of video data: takes a compressed AVPacket structure as input and outputs a decoded AVFrame structure. Its prototype is as follows:

int avcodec_decode_video2(AVCodecContext *avctx, AVFrame *picture,
                         int *got_picture_ptr,
                         const AVPacket *avpkt);
  • avctx: the codec context.

  • picture: output parameter; the AVFrame that receives the decoded frame.

  • got_picture_ptr: output parameter; set to non-zero when a complete frame has been decoded, zero otherwise.

  • avpkt: the input AVPacket holding the compressed data.

Note that avcodec_decode_video2() has been deprecated in newer FFmpeg versions in favor of the avcodec_send_packet()/avcodec_receive_frame() pair.
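A sketch of how this legacy call is typically wrapped; codec_ctx, pkt, and frame are assumed to exist as in the earlier sketches:

#include <libavcodec/avcodec.h>

// Sketch: decode one packet into a frame with the legacy API.
static int decode_one(AVCodecContext *codec_ctx, AVPacket *pkt, AVFrame *frame)
{
    int got_picture = 0;
    int ret = avcodec_decode_video2(codec_ctx, frame, &got_picture, pkt);
    if (ret < 0)
        return ret;      // decoding error
    return got_picture;  // non-zero: frame->data/linesize now hold a picture
}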

5.9 avformat_close_input()

Closes the input, frees the demuxing context, and resets the pointer to NULL. Its prototype is as follows:

void avformat_close_input(AVFormatContext **s)
  • s: the demuxing context.
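A teardown sketch that mirrors the allocation order used above:

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

// Sketch: close the decoder, then the demuxer; both pointers are NULLed.
static void close_all(AVFormatContext **fmt_ctx, AVCodecContext **codec_ctx)
{
    avcodec_close(*codec_ctx);
    avcodec_free_context(codec_ctx);
    avformat_close_input(fmt_ctx);
}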

6. Description of the FFmpeg structures used

6.1 AVFormatContext

The demuxing context, a structure that stores the information contained in the audio/video container format.

char filename[1024] // the opened file name, often used for rtsp/rtmp disconnection and reconnection 
unsigned int nb_streams // number of audio/video streams 
AVStream **streams // stores the video stream, audio stream, and subtitle stream information 
int64_t duration // total duration of the media file, in AV_TIME_BASE units (1 second divided into AV_TIME_BASE = 1,000,000 parts, i.e. microseconds); note that not every video exposes a duration 
int64_t bit_rate // bit rate (in bps; divide by 1000 for kbps)
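For a quick look at these fields, FFmpeg's own av_dump_format() prints the streams, duration, and bit rate in readable form; fmt_ctx and the file name are assumed from the earlier sketches:

av_dump_format(fmt_ctx, 0, "input.mp4", 0); // last argument 0 = dump an input context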

6.2 AVStream

AVStream is a structure that stores information about each audio/video stream. Its important variables are as follows:

int index // identifies the video/audio stream 
AVCodecContext *codec // codec context; deprecated since version 4.0, replaced by codecpar 
AVRational time_base // time base; through this value, PTS and DTS can be converted into actual time (in seconds) 
int64_t duration // duration of the video/audio stream, in time_base units 
AVRational avg_frame_rate // frame rate (note: very important for video) 
AVPacket attached_pic // attached picture, e.g. the album cover embedded in some MP3/AAC files 
AVCodecParameters *codecpar // audio/video stream parameters, newly added to replace AVCodecContext *codec
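A small sketch of the time_base and avg_frame_rate fields in use, with fmt_ctx and pkt assumed from the demuxing sketches above:

AVStream *st = fmt_ctx->streams[pkt.stream_index];
double pts_seconds = pkt.pts * av_q2d(st->time_base); // PTS -> seconds
double fps = av_q2d(st->avg_frame_rate);              // frame rate as a double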

6.3 AVCodecContext

AVCodecContext is a structure describing the codec context; it contains the parameter information required by many codecs. Let's look at some key variables (only decoding is considered here):

enum AVMediaType codec_type // type of codec (video, audio...) 
struct AVCodec *codec // the decoder in use (H.264, MPEG-2...) 
enum AVCodecID codec_id // identifies a specific codec (H.264, MPEG-2...) 
int format // video pixel format / audio sample format 
int width, height // width and height of the video 
int bit_rate // average bit rate 
int channels // number of audio channels 
uint64_t channel_layout // channel layout 
int sample_rate // audio sampling rate 
AVRational time_base // time base; through this value, PTS and DTS can be converted into actual time (in seconds) 
uint8_t *extradata; int extradata_size; // extra data for particular codecs (e.g. SPS and PPS for an H.264 decoder)
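A sketch that inspects a few of these fields once avcodec_open2() has succeeded (the helper name is illustrative):

#include <stdio.h>
#include <inttypes.h>
#include <libavcodec/avcodec.h>

// Sketch: print basic decoder parameters from the codec context.
static void dump_codec_ctx(const AVCodecContext *ctx)
{
    printf("codec=%s %dx%d format=%d bit_rate=%" PRId64 "\n",
           ctx->codec ? ctx->codec->name : "?",
           ctx->width, ctx->height, ctx->format, (int64_t)ctx->bit_rate);
}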

6.4 AVCodec

AVCodec is a structure that stores codec information. Its important variables are as follows:

const char *name; // short name of the codec 
const char *long_name; // full name of the codec 
enum AVMediaType type; // type: video, audio, or subtitle 
enum AVCodecID id; // unique ID 
const AVRational *supported_framerates; // supported frame rates (video only) 
const enum AVPixelFormat *pix_fmts; // supported pixel formats (video only), e.g. RGB24, YUV420P 
const int *supported_samplerates; // supported sample rates (audio only) 
const enum AVSampleFormat *sample_fmts; // supported sample formats (audio only) 
const uint64_t *channel_layouts; // supported channel layouts (audio only) 
int priv_data_size; // size of the private data

6.5 AVCodecParameters

Newly added to replace the AVCodecContext *codec member of AVStream. Because the AVCodecContext structure contains too many parameters, AVCodecParameters extracts the codec parameters out of AVCodecContext. Some important fields of the AVCodecParameters structure are as follows:

enum AVMediaType codec_type // type of codec (video, audio...) 
enum AVCodecID codec_id // identifies a specific codec (H.264, MPEG-2...) 
int format // video pixel format / audio sample format 
int width, height // width and height of the video 
int bit_rate // average bit rate 
int channels // number of audio channels 
uint64_t channel_layout // channel layout 
int sample_rate // audio sampling rate 
uint8_t *extradata; int extradata_size; // extra data for particular codecs (e.g. SPS and PPS for an H.264 decoder)

It can be seen that the members of the two are basically the same.

A few related functions that will appear alongside these structures:

avcodec_decode_video2(): decode one frame of video data 
sws_scale(): convert the video pixel format (and scale) 
av_frame_free(): free an AVFrame allocated with av_frame_alloc() 
avcodec_close(): close the decoder

6.6 AVPacket

AVPacket is a structure that stores information about compressed encoded data. Its important variables are as follows:

uint8_t *data; // compressed encoded data 
/* For H.264, the data of one AVPacket usually corresponds to one NAL unit. 
Note: this is only a rough correspondence, not an exact identity; there is a slight difference between the two. Thanks to this correspondence, when FFmpeg is used to separate the H.264 stream out of a media file, the data of each AVPacket can often be written directly to a file to obtain the raw bitstream. */ 
int size; // size of data 
int64_t pts; // presentation timestamp 
int64_t dts; // decoding timestamp 
int stream_index; // identifies the video/audio stream this AVPacket belongs to
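A sketch of the "write data directly to a file" idea from the comment above. One caveat: for MP4/FLV input the H.264 stream is usually length-prefixed, so a bitstream filter such as h264_mp4toannexb may be needed first; for TS-style Annex B streams this often works as-is:

#include <stdio.h>
#include <libavformat/avformat.h>

// Sketch: append the compressed video packets to a raw bitstream file.
static void dump_packet(FILE *out, const AVPacket *pkt, int video_index)
{
    if (pkt->stream_index == video_index)
        fwrite(pkt->data, 1, pkt->size, out);
}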

6.7 AVFrame

The AVFrame structure is generally used to store raw data (that is, uncompressed data such as YUV or RGB for video and PCM for audio), and it also contains some related information. For example, data such as the macroblock type table, QP table, and motion vector table are stored during decoding, and related data is also stored during encoding. AVFrame is therefore a very important structure when using FFmpeg for bitstream analysis.

Let's look at the role of several main variables (consider the case of decoding here):

uint8_t *data[AV_NUM_DATA_POINTERS]; // decoded raw data (YUV, RGB for video; PCM for audio) 
int linesize[AV_NUM_DATA_POINTERS]; // size in bytes of one "line" of data; note it is not necessarily equal to the image width and is generally larger 
int width, height; // width and height of the video frame (1920x1080, 1280x720...) 
int nb_samples; // number of audio samples (per channel) contained in this audio frame 
int format; // format of the raw data after decoding (YUV420P, YUV422P, RGB24...) 
int key_frame; // whether this is a key frame 
enum AVPictureType pict_type; // frame type (I, B, P...) 
AVRational sample_aspect_ratio; // sample aspect ratio (16:9, 4:3...) 
int64_t pts; // presentation timestamp 
int coded_picture_number; // coded frame number 
int display_picture_number; // display frame number
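A sketch of how data and linesize relate, walking the luma plane of a decoded YUV420P frame:

#include <stdint.h>
#include <libavutil/frame.h>

// Sketch: visit each row of the Y plane; linesize[0] may exceed width
// because of alignment padding, hence stepping by the line size.
static void walk_luma(const AVFrame *frame)
{
    for (int y = 0; y < frame->height; y++) {
        const uint8_t *row = frame->data[0] + y * frame->linesize[0];
        (void)row; // row holds frame->width luma bytes
    }
}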
