Getting Started with FFmpeg - Video Playback

The best place to start learning audio and video development is playback: the results are immediately visible and closest to what the user experiences.

Audio and video codec basics

We can play videos over HTTP, RTMP, or from local files. The "video" here actually refers to container formats such as MP4 and AVI, which hold both audio and video.

Such a container file may hold multiple tracks: video tracks, audio tracks, subtitle tracks, and so on. Some formats are restrictive; AVI, for example, allows only one video track and one audio track. Others are more flexible; OGG can carry multiple video and audio tracks.

For tracks that carry a large amount of data, such as audio and video, the data is stored compressed. A video track may contain compressed image data such as H.264 or H.265, which decoding restores to image data in YUV, RGB, or another format. An audio track may contain compressed audio data such as MP3 or AAC, which decoding restores to a PCM audio stream.

[Figure: tracks inside a container file and the decoded data they produce]
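If you want to see which tracks a particular file contains, the ffprobe tool that ships with FFmpeg can list them (input.mp4 below is just a placeholder path):

ffprobe -show_streams input.mp4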

In fact, playing a video with ffmpeg means restoring the image data step by step according to the file format and handing it to the display device, while restoring the audio data and handing it to the audio device for playback:

[Figure: video playback flow chart]


Simple introduction to ffmpeg

Now that we understand the playback process, let's build a simple player and get hands-on with ffmpeg. Since this is an introductory tutorial, the player is deliberately simplified:

  1. Use ffmpeg 4.4.2 - the 4.x series is widely used, and there is less material available on the newer 5.x series

  2. Decode and play only one video track - so audio/video synchronization does not need to be handled

  3. Use SDL2 and decode on the main thread - so multi-thread synchronization does not need to be handled

  4. Build from source with a Makefile - verified on macOS and Ubuntu; Windows users will need to set up a Visual Studio project themselves

Decoding with ffmpeg roughly involves the following steps and key functions, which you can match against the flow chart above:

Parse the file stream (protocol handling and demuxing)

  1. avformat_open_input: opens a data stream from a file, RTMP, or another protocol, and reads the file header to parse out video information such as the tracks and duration

  2. avformat_find_stream_info: for formats without a file header, such as MPEG or raw H264 streams, parses the first few frames to obtain the video information

Create a decoder for each track

  1. avcodec_find_decoder: Find the corresponding decoder

  2. avcodec_alloc_context3: Create a decoder context

  3. avcodec_parameters_to_context: Set the parameters required for decoding

  4. avcodec_open2: Open the decoder

Decode each track using the corresponding decoder (decoding)

  1. av_read_frame: Read video data packets from the video stream

  2. avcodec_send_packet: Send video data packets to the decoder for decoding

  3. avcodec_receive_frame: Read the decoded frame data from the decoder

To keep the focus on the audio/video part, I split the code into a VideoDecoder class for decoding and an SdlWindow class for displaying the picture. You can concentrate on the VideoDecoder part.
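To show how the pieces fit together, here is a minimal sketch of a main loop driving the two classes. The SdlWindow constructor and Draw() used here are hypothetical placeholders; the demo's actual SdlWindow interface may differ.

// A minimal sketch of a main loop driving the two classes.
// SdlWindow() and Draw() are hypothetical placeholders here.
#include "VideoDecoder.h"
#include "SdlWindow.h"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        return -1;                        // expect the video path as an argument
    }

    VideoDecoder decoder;
    if (!decoder.Load(argv[1])) {         // open the stream and prepare the decoder
        return -1;
    }

    SdlWindow window;                     // hypothetical: create the SDL2 window
    while (AVFrame* frame = decoder.NextFrame()) {
        window.Draw(frame);               // hypothetical: render the decoded frame
    }

    decoder.Release();                    // free decoder resources
    return 0;
}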

Parsing the video stream

The code that parses the file stream and creates the decoder before actual decoding is fairly boilerplate, so I'll post it directly; follow the comments to see what each step does:

bool VideoDecoder::Load(const string& url) {
    mUrl = url;

    // Open the file stream and read the file header to parse out video information
    // such as the tracks and duration.
    // mFormatContext is initialized to NULL; if the open succeeds it is set to a non-NULL value.
    // This call can open data from many kinds of sources: the url can be a local path, an rtmp address, etc.
    // Close the file stream with avformat_close_input when it is no longer needed.
    if(avformat_open_input(&mFormatContext, url.c_str(), NULL, NULL) < 0) {
        cout << "open " << url << " failed" << endl;
        return false;
    }

    // For formats without a file header, such as MPEG or raw H264 streams,
    // this function parses the first few frames to obtain the video information
    if(avformat_find_stream_info(mFormatContext, NULL) < 0) {
        cout << "can't find stream info in " << url << endl;
        return false;
    }

    // Find the video track. We could also do this ourselves by iterating over the
    // streams of the AVFormatContext, e.g.:
    // for(int i = 0 ; i < mFormatContext->nb_streams ; i++) {
    //     if(mFormatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
    //         mVideoStreamIndex = i;
    //         break;
    //     }
    // }
    mVideoStreamIndex = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if(mVideoStreamIndex < 0) {
        cout << "can't find video stream in " << url << endl;
        return false;
    }

    // Get the codec parameters of the video track
    AVCodecParameters* codecParam = mFormatContext->streams[mVideoStreamIndex]->codecpar;
    cout << "codec id = " << codecParam->codec_id << endl;

    // Look up the corresponding decoder by codec_id.
    // codec_id is of type enum AVCodecID and tells us the format of the video stream,
    // e.g. AV_CODEC_ID_H264 (0x1B) or AV_CODEC_ID_H265 (0xAD).
    // For an audio track it might be AV_CODEC_ID_MP3 (0x15001), AV_CODEC_ID_AAC (0x15002), etc.
    AVCodec* codec = avcodec_find_decoder(codecParam->codec_id);
    if(codec == NULL) {
        cout << "can't find codec" << endl;
        return false;
    }

    // Create the decoder context; the decoder's state lives here.
    // Free it with avcodec_free_context when it is no longer needed.
    mCodecContext = avcodec_alloc_context3(codec);
    if (mCodecContext == NULL) {
        cout << "can't alloc codec context" << endl;
        return false;
    }

    // Copy the parameters required for decoding into the decoder context
    if(avcodec_parameters_to_context(mCodecContext, codecParam) < 0) {
        cout << "can't set codec params" << endl;
        return false;
    }

    // Open the decoder. Looking at the source, avcodec_free_context closes the decoder
    // when it frees the context, so we don't need to call avcodec_close ourselves.
    if(avcodec_open2(mCodecContext, codec, NULL) < 0) {
        cout << "can't open codec" << endl;
        return false;
    }

    // Allocate the AVPacket that will receive data packets.
    // Both compressed audio streams and compressed video streams consist of packets;
    // decoding is simply reading packets from the file stream and handing them to the decoder.
    // For video, a packet usually contains one compressed frame;
    // for audio, it may be a chunk of compressed audio containing several compressed frames.
    // Free it with av_packet_free when it is no longer needed.
    mPacket = av_packet_alloc();
    if(NULL == mPacket) {
        cout << "can't alloc packet" << endl;
        return false;
    }

    // Allocate the AVFrame that will receive the raw data produced by the decoder
    // (picture frames for video, or raw PCM for audio).
    // Free it with av_frame_free when it is no longer needed.
    mFrame = av_frame_alloc();
    if(NULL == mFrame) {
        cout << "can't alloc frame" << endl;
        return false;
    }

    // The video dimensions can be read from the decoder context.
    // They were copied there from AVCodecParameters, so codecParam->width and
    // codecParam->height would work just as well.
    mVideoWidth = mCodecContext->width;
    mVideoHegiht =  mCodecContext->height;

    // The pixel format can also be read from the decoder context.
    // It too was copied from AVCodecParameters, so codecParam->format would work as well.
    mPixelFormat = mCodecContext->pix_fmt;

    return true;
}

VideoDecoder::Load opens the video stream and prepares the decoder. Decoding comes next, and once it is finished, VideoDecoder::Release frees the resources:

void VideoDecoder::Release() {
    mUrl = "";
    mVideoStreamIndex = -1;
    mVideoWidth = -1;
    mVideoHegiht = -1;
    mDecodecStart = -1;
    mLastDecodecTime = -1;
    mPixelFormat = AV_PIX_FMT_NONE;

    if(NULL != mFormatContext) {
        avformat_close_input(&mFormatContext);
    }

    if (NULL != mCodecContext) {
        avcodec_free_context(&mCodecContext);
    }

    if(NULL != mPacket) {
        av_packet_free(&mPacket);
    }

    if(NULL != mFrame) {
        av_frame_free(&mFrame);
    }
}

Video decoding

After the decoder is created, you can start decoding:

AVFrame* VideoDecoder::NextFrame() {
    if(av_read_frame(mFormatContext, mPacket) < 0) {
        return NULL;
    }

    AVFrame* frame = NULL;
    if(mPacket->stream_index == mVideoStreamIndex
        && avcodec_send_packet(mCodecContext, mPacket) == 0
        && avcodec_receive_frame(mCodecContext, mFrame) == 0) {
        frame = mFrame;

        ... // 1. decoding speed
    }

    av_packet_unref(mPacket); // 2. memory leak

    if(frame == NULL) {
        return NextFrame(); // 3. AVPacket frame type
    }

    return frame;
}

Its core logic is actually the following three steps:

  1. Use av_read_frame to read video packets from a video stream

  2. Use avcodec_send_packet to send video packets to the decoder for decoding

  3. Use avcodec_receive_frame to read decoded frame data from the decoder

In addition to these three key steps, there are a few details to watch out for:

1. Decoding speed problem

Decoding is usually much faster than playback, so we can wait until the next frame actually needs to be displayed before decoding it. This lowers CPU usage and also avoids the high memory usage that would result from the drawing thread piling up a queue of decoded pictures.

Since this demo has no separate decoding thread and decodes on the rendering thread, and SDL rendering itself takes time, the picture appears to play at roughly normal speed even without any delay. If you comment out the drawing code and add a print statement in this method, you will find that the entire video is decoded almost instantly.

2. Memory leak problem

Once a packet has been decoded, its compressed data is no longer needed, so av_packet_unref must be called to release the AVPacket's buffer.

Strictly speaking, AVFrame also needs av_frame_unref to release its pixel data after use, but avcodec_receive_frame calls av_frame_unref internally to clear the previous frame, and the last frame's data is freed by av_frame_free in Release. So we don't need to call av_frame_unref ourselves.

3. AVPacket frame type problem

Compressed video contains different frame types such as I-frames, P-frames, and B-frames, so not every packet immediately yields a decoded picture. A B-frame is a bidirectionally predicted frame: it records the difference between the current frame and both the preceding and following frames, so it cannot be decoded until the following frame has arrived.

If the current AVPacket does not produce a decoded frame, NextFrame calls itself recursively to process the next packet, until a picture is actually decoded.
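In the send/receive API this situation shows up as avcodec_receive_frame returning AVERROR(EAGAIN), meaning the decoder needs more input before it can emit a picture. Below is a sketch of the same logic with the return value checked explicitly; it is a variation on the demo's NextFrame body, not its actual code.

// Variation on the body of NextFrame that distinguishes "decoder needs more
// input" from a real error instead of lumping everything into frame == NULL.
if (mPacket->stream_index == mVideoStreamIndex
    && avcodec_send_packet(mCodecContext, mPacket) == 0) {
    int ret = avcodec_receive_frame(mCodecContext, mFrame);
    if (ret == 0) {
        frame = mFrame;                   // got a decoded picture
    } else if (ret == AVERROR(EAGAIN)) {
        // the decoder is buffering (e.g. it still needs the frames a B-frame
        // references), so feed it the next packet and try again
    } else {
        // AVERROR_EOF or a genuine decoding error
    }
}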

PTS synchronization

AVFrame has a pts member that indicates when the picture should be displayed. Decoding is usually very fast; a one-minute video might decode in a second. So we need to work out when each frame should be shown and add a delay if that time has not yet arrived.

Some video streams carry no pts data; in that case we fall back to a fixed interval of 32 ms per frame (roughly 30 fps):

if(AV_NOPTS_VALUE == mFrame->pts) {
    int64_t sleep = 32000 - (av_gettime() - mLastDecodecTime);
    if(mLastDecodecTime != -1 && sleep > 0) {
        av_usleep(sleep);
    }
    mLastDecodecTime = av_gettime();
} else {
    ...
}

If the stream does have pts data, we need to convert the pts into microseconds from the start of the video.

The unit of pts is the stream's time_base, which we can obtain by looking up the corresponding AVStream in the AVFormatContext:

AVRational timebase = mFormatContext->streams[mPacket->stream_index]->time_base;

AVRational is a fraction (numerator over denominator) representing a number of seconds:

/**
 * Rational number (pair of numerator and denominator).
 */
typedef struct AVRational{
    int num; ///< Numerator
    int den; ///< Denominator
} AVRational;

We compute the value of this fraction with timebase.num * 1.0f / timebase.den, multiply by 1000 to get milliseconds, and multiply by 1000 again to get microseconds. The second half of this calculation could be computed once in VideoDecoder::Load and stored in a member variable, but it is kept here for clarity:

int64_t pts = mFrame->pts * 1000 * 1000 * timebase.num * 1.0f / timebase.den;
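Incidentally, libavutil also provides av_rescale_q for exactly this kind of time-base conversion; assuming the same timebase variable as above, the microsecond value could equivalently be computed as below (AV_TIME_BASE_Q is FFmpeg's 1/1,000,000-second time base):

// Same conversion using FFmpeg's rescale helper: stream time_base units -> microseconds.
int64_t pts = av_rescale_q(mFrame->pts, timebase, AV_TIME_BASE_Q);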

This pts is measured from the start of the video, so we first record the wall-clock time at the first frame and then work out how many microseconds of the video have been played so far. The complete code is as follows:

if(AV_NOPTS_VALUE == mFrame->pts) {
    ...
} else {
    AVRational timebase = mFormatContext->streams[mPacket->stream_index]->time_base;
    int64_t pts = mFrame->pts * 1000 * 1000 * timebase.num * 1.0f / timebase.den;

    // record the start time at the first frame
    if(mFrame->pts == 0) {
        mDecodecStart = av_gettime() - pts;
    }

    // current time minus the start time gives how many microseconds of the video have played
    int64_t now = av_gettime() - mDecodecStart;

    // if this frame's display time hasn't arrived yet, wait until it has before returning
    if(pts > now) {
        av_usleep(pts - now);
    }
}

Other

The complete demo is on GitHub. The image rendering is in the SdlWindow class, which uses SDL2 for UI drawing; since it has nothing to do with audio/video codecs, I won't go into it here. The video decoding is in the VideoDecoder class.

Before compiling, adjust the ffmpeg and SDL2 paths in the Makefile; after make finishes, play a video with the following command:

demo -p <video path>    # play the video
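For reference, a minimal Makefile along these lines might look like the sketch below; the install prefixes and source file names are assumptions, and the demo's actual Makefile may differ:

# Minimal build sketch; adjust FFMPEG_DIR/SDL2_DIR to where ffmpeg 4.4 and SDL2
# are installed on your machine. Source file names are assumptions.
FFMPEG_DIR = /usr/local/ffmpeg
SDL2_DIR   = /usr/local

CXXFLAGS += -std=c++11 -I$(FFMPEG_DIR)/include -I$(SDL2_DIR)/include
LDFLAGS  += -L$(FFMPEG_DIR)/lib -L$(SDL2_DIR)/lib \
            -lavformat -lavcodec -lavutil -lswscale -lSDL2

demo: main.cpp VideoDecoder.cpp SdlWindow.cpp
	$(CXX) $(CXXFLAGS) -o $@ $^ $(LDFLAGS)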

PS:

Some function names carry a numeric suffix, such as avcodec_alloc_context3 and avcodec_open2. The suffix simply indicates which revision of the function it is, as doc/APIchanges in the source tree shows:

2011-07-10 - 3602ad7 / 0b950fe - lavc 53.8.0
  Add avcodec_open2(), deprecate avcodec_open().
  NOTE: this was backported to 0.7
​
  Add avcodec_alloc_context3. Deprecate avcodec_alloc_context() and
  avcodec_alloc_context2().

 

 

Origin blog.csdn.net/yinshipin007/article/details/128007975