Audio and Video Synchronization with Qt and FFmpeg

1. Introduction

In my opinion, audio and video synchronization is the most difficult of FFmpeg's basic processing tasks. Countless people get stuck here without finding a way through. I have tried various demos from the Internet, and they are basically all poor: either they only support a small subset of video files (for example, assuming packets always arrive as one video frame followed by one audio frame), or they cannot synchronize at all, or they crash outright when the playback position is changed. The most complete reference implementation of audio/video synchronization is actually ffplay. I have personally tested dozens of local audio/video files of all kinds and dozens of streaming sources against this implementation, and all of them play perfectly. Of course, that is just my own experience.

If you are only playing video streams (without an audio stream), synchronization may not be required, so when I was only playing rtsp video streams at the beginning I did not consider synchronization at all: the problem never came up, and it was not needed. Once video stream types such as rtmp, http, and m3u8 were added, the problem became serious. An hls-format stream, for example, arrives as a batch of small video files delivered all at once. Without synchronization, a pile of frames suddenly appears, and nothing more arrives until the next batch, so the frames must be timed and synchronized by the player itself: the packets received in each batch are put into a queue, and each one is displayed only when its time is due.

Common audio and video synchronization methods:

  • Pacing by fps. The fps indicates how many frames are played per second, e.g. 25 frames, so one frame occupies 1000/25 = 40 ms, and playback is paced by delaying that long per frame. This is actually the worst approach. A variant records the decode start time startTime, computes each packet's pts time with av_rescale_q, and sleeps for the difference with av_usleep; only some files play normally this way, and many do not.
  • Synchronize audio to video, with the video clock as the master clock. I have not tried it; many people online say this method works poorly.
  • Synchronize video to audio, with the audio clock as the master clock. I have not tried it either; it is said to be what most people use.
  • Synchronize both audio and video to an external clock, with the external clock as the master. This is the method finally adopted here: it is easy to understand, audio and video do not interfere with each other, and each synchronizes itself against the external clock.

ffplay itself has all three synchronization strategies built in, selectable by parameter; the default is to synchronize video to audio.


2. Screenshots

 

3. Related code

#include "ffmpegsync.h"
#include "ffmpeghelper.h"
#include "ffmpegthread.h"

FFmpegSync::FFmpegSync(quint8 type, QObject *parent) : QThread(parent)
{
    this->stopped = false;
    this->type = type;
    this->thread = (FFmpegThread *)parent;
}

FFmpegSync::~FFmpegSync()
{

}

void FFmpegSync::run()
{
    if (!thread) {
        return;
    }

    this->reset();
    while (!stopped) {
        //Do nothing while paused, while seeking, or when the packet queue is empty
        if (!thread->isPause && !thread->changePosition && packets.size() > 0) {
            mutex.lock();
            AVPacket *packet = packets.first();
            mutex.unlock();

            //Raw h264 elementary streams cannot be synchronized because no pts/dts is available (worked around with a crude fixed delay for now)
            if (thread->formatName == "h264") {
                int sleepTime = (1000 / thread->frameRate) - 5;
                msleep(sleepTime);
            }

            //Compute the display time of the current frame (external clock synchronization)
            ptsTime = FFmpegHelper::getPtsTime(thread->formatCtx, packet);
            if (!this->checkPtsTime()) {
                msleep(1);
                continue;
            }

            //Report the current playback position
            this->checkShowTime();

            //No need to process if the decode thread has stopped
            if (!thread->stopped) {
                //0 = audio, 1 = video
                if (type == 0) {
                    thread->decodeAudio1(packet);
                } else if (type == 1) {
                    thread->decodeVideo1(packet);
                }
            }

            //Free the packet and remove it from the queue
            mutex.lock();
            FFmpegHelper::freePacket(packet);
            packets.removeFirst();
            mutex.unlock();
        }

        msleep(1);
    }

    this->reset();
    this->clear();
    stopped = false;
}

void FFmpegSync::stop()
{
    if (this->isRunning()) {
        stopped = true;
        this->wait();
    }
}

void FFmpegSync::clear()
{
    mutex.lock();
    //Free the remaining packets that have not been processed yet
    foreach (AVPacket *packet, packets) {
        FFmpegHelper::freePacket(packet);
    }
    packets.clear();
    mutex.unlock();
}

void FFmpegSync::reset()
{
    //Reset the external clock
    showTime = 0;
    bufferTime = 0;
    offsetTime = -1;
    startTime = av_gettime();
}

void FFmpegSync::append(AVPacket *packet)
{
    mutex.lock();
    packets << packet;
    mutex.unlock();
}

int FFmpegSync::getPacketCount()
{
    return this->packets.size();
}

bool FFmpegSync::checkPtsTime()
{
    //The time thresholds below are borrowed from other implementations
    bool ok = false;
    if (ptsTime > 0) {
        if (ptsTime > offsetTime + 100000) {
            bufferTime = ptsTime - offsetTime + 100000;
        }

        int offset = (type == 0 ? 1000 : 5000);
        //Remarkably, variable-speed playback only needs this one scaling factor
        offsetTime = (av_gettime() - startTime) * thread->speed + bufferTime;
        if ((offsetTime <= ptsTime && ptsTime - offsetTime <= offset) || (offsetTime > ptsTime)) {
            ok = true;
        }
    } else {
        ok = true;
    }

    return ok;
}

void FFmpegSync::checkShowTime()
{
    //A playback position only exists for files (local or network)
    if (!thread->getIsFile()) {
        return;
    }

    //Avoid emitting duplicate playback positions
    bool showPosition = false;
    bool existVideo = (thread->videoIndex >= 0);
    if (type == 0) {
        //The audio sync thread reports the position only when no video stream exists
        if (!existVideo) {
            showPosition = true;
        }
    } else if (type == 1) {
        //The video sync thread reports the position only when a video stream exists
        if (existVideo) {
            showPosition = true;
        }
    }

    //Emit the position only when needed (differences under 200 ms are skipped; there is no need to send it frequently)
    if (showPosition && (offsetTime - showTime > 200000)) {
        showTime = offsetTime;
        thread->position = showTime / 1000;
        emit receivePosition(thread->position);
    }
}

4. Features

4.1 Basic functions

  • Supports various audio and video file formats, such as mp3, wav, mp4, asf, rm, rmvb, mkv, etc.
  • Supports local camera devices; the resolution and frame rate can be specified.
  • Supports various video stream formats, such as rtp, rtsp, rtmp, http, etc.
  • For both local and network audio/video files, automatically identifies the duration, playback progress, volume, mute state, etc.
  • Files support seeking to a playback position, adjusting the volume, setting the mute state, etc.
  • Supports variable-speed playback of files, with selectable rates such as 0.5x, 1.0x, 2.5x, and 5.0x, equivalent to slow-motion and fast-forward.
  • Supports starting, stopping, pausing, and resuming playback.
  • Supports snapshot capture; the file path can be specified, and a preview can optionally be shown automatically after capture.
  • Supports video recording with manual start and stop; some kernels support pausing and resuming a recording, skipping the parts that should not be recorded.
  • Supports mechanisms such as seamless loop playback and automatic reconnection.
  • Provides signals for playback started, playback finished, decoded image received, snapshot image received, video size changed, and recording state changed.
  • Multi-threaded processing with a dedicated decoding thread, so the main UI never freezes.

4.2 Features

  • Supports multiple decoding kernels at the same time: the qmedia kernel (Qt4/Qt5/Qt6), ffmpeg kernel (ffmpeg2/ffmpeg3/ffmpeg4/ffmpeg5), vlc kernel (vlc2/vlc3), mpv kernel (mpv1/mp2), Hikvision SDK, easyplayer kernel, etc.
  • A very complete set of base classes; adding a new decoding kernel requires only a very small amount of code for the whole mechanism to apply.
  • Supports multiple screen display strategies at the same time: auto adjust (display at the original resolution when it is smaller than the display control, otherwise scale proportionally), proportional scaling (always scale proportionally), and stretch fill (always stretch to fill). All three strategies are supported in all kernels and all video display modes.
  • Supports multiple video display modes at the same time: handle mode (pass the control handle to the decoder, which controls the drawing), draw mode (receive the data in a callback, convert it to QImage, and draw with QPainter), and GPU mode (receive the data in a callback, convert it to yuv, and draw with QOpenGLWidget).
  • Supports multiple hardware acceleration types: ffmpeg can use dxva2, d3d11va, etc.; mpv can use auto, dxva2, d3d11va; vlc can use any, dxva2, d3d11va. The available options differ per system, e.g. vaapi and vdpau on linux, videotoolbox on macos.
  • The decoding thread is separated from the display window; any decoding kernel can be attached to any display window and switched dynamically.
  • Supports shared decoding threads, enabled by default and handled automatically. When the same video address is detected, one decoding thread is shared, which greatly reduces network traffic and the push load on the source device in networked video environments. Top domestic video vendors all adopt this strategy: pulling a single video stream lets it be shared across dozens or hundreds of display channels.
  • Automatically identifies the video rotation angle and draws accordingly; for example, video shot on a mobile phone is usually rotated 90 degrees and must be rotated automatically during playback, otherwise it is displayed incorrectly.
  • Automatically recognizes resolution changes in the video stream and adjusts the size of the video control accordingly; for example, a camera whose resolution is reconfigured during use is reflected synchronously in the corresponding video control.
  • Audio and video files switch and loop seamlessly, with no visible traces such as black frames during the switch.
  • The video control likewise supports any decoding kernel, any screen display strategy, and any video display mode.
  • The video control's floating bar supports all three modes (handle, draw, GPU) and can be moved around when not using absolute coordinates.
  • Local camera devices can be played by specifying the device name, resolution, and frame rate.
  • Recording also works for opened video files, local cameras, network video streams, etc.
  • Opens and closes instantly: whether opening a nonexistent video or network stream, probing whether a device exists, or waiting on a read timeout, the previous operation is interrupted immediately when a close command is received.
  • Supports opening various image files, and supports drag-and-drop playback of local audio/video files.
  • The video control's floating bar has built-in functions for start/stop recording, mute toggling, snapshot capture, and closing the video.
  • The audio component supports waveform value analysis; waveform curves and bar-style level meters can be drawn from the values, and an amplitude signal is provided by default.
  • Every component prints extremely detailed information, especially error messages, in a unified format. This is extremely useful when testing complex on-site device environments, allowing you to pinpoint exactly which channel and which step went wrong.
  • The code framework and structure are optimized for performance and are continuously iterated and upgraded.
  • The source code supports Qt4, Qt5, and Qt6, compatible with all versions.

4.3 Video Controls

  • Any number of OSD labels can be added dynamically. Label properties include name, visibility, font size, text, text color, image, coordinates, format (text, date, time, date-time, image), and position (top-left, bottom-left, top-right, bottom-right, centered, custom coordinates).
  • Any number of graphics items can be added dynamically, which is very useful; for example, the region information produced by an AI algorithm can be sent directly to the video control. Graphics items support arbitrary shapes, drawn on the original image using absolute coordinates.
  • Graphics item properties include name, border size, border color, background color, rectangle area, path collection, point collection, etc.
  • Each graphics item can specify one or more of the three area types, and all specified areas are drawn.
  • Built-in floating bar control; the floating bar position can be top, bottom, left, or right.
  • Floating bar parameters include margin, spacing, background transparency, background color, text color, pressed color, position, button icon codes, button name identifiers, and button tooltips.
  • The row of tool buttons in the floating bar is customizable via a struct parameter; icons can be glyph-font characters or custom images.
  • The floating bar buttons internally implement functions such as video switching, snapshot capture, mute toggling, and closing the video, and you can add your own corresponding functions in the source code.
  • Buttons with implemented functions switch icons accordingly: after the record button is pressed it changes to a recording icon, after the sound button is toggled it becomes a mute icon, and toggling again restores it.
  • When a floating bar button is clicked, a signal carrying its unique name identifier is emitted, which you can connect to your own handling.
  • The blank area of the floating bar can display status text; by default it shows the current video resolution, and information such as frame rate and bitrate can be added.
  • Video control parameters include border size, border color, focus color, background color (transparent by default), text color (global text color by default), fill color (the blank space outside the video is filled with black), background text, background image (the image takes priority if set), whether to copy the image, scaling mode (auto adjust, proportional scaling, stretch fill), video display mode (handle, draw, GPU), floating bar enabled, floating bar size (height when horizontal, width when vertical), and floating bar position (top, bottom, left, right).

4.4 The ffmpeg kernel

  • Supports various audio/video files, local camera devices, and all kinds of network video streams.
  • Supports starting, pausing, resuming, and stopping playback, seeking, and variable-speed playback.
  • Volume, mute toggling, snapshot capture, and video recording can all be set.
  • Automatically extracts album information such as title, artist, album, and cover art, and displays the cover automatically.
  • Full support for audio/video synchronization and variable-speed playback.
  • Decoding strategies include speed priority, quality priority, balanced, and fastest.
  • Supports rotated display of mobile phone video; for example, video shot on a phone is usually rotated 90 degrees and must be rotated another 90 degrees when decoding and displaying to appear correct.
  • Automatically converts to yuv420; for example, a local camera delivering yuyv422, or video files in other formats, are uniformly converted to yuv420 and then processed.
  • Supports hardware decoding with dxva2, d3d11va, etc., with extremely high performance, especially for large resolutions such as 4K video.
  • Video latency is extremely low, around 0.2 s, and the specially optimized fast-response mode opens a video stream in about 0.5 s.
  • Combining hardware decoding with GPU drawing gives extremely low CPU usage, better than clients such as Hikvision's and Dahua's.
  • Supports various audio formats in video streams, including AAC, PCM, G.726, G.711A, G.711Mu, G.711ulaw, G.711alaw, MP2L2, etc.; AAC is recommended for compatibility and cross-platform use.
  • Video recording supports the yuv, h264, and mp4 formats; audio recording supports pcm, wav, and aac. The defaults are mp4 for video and aac for audio.
  • Audio and video can be stored as separate files or merged into one mp4 file. The default strategy is that, whatever format is stored, it is finally converted into mp4 and aac and then merged into a single mp4 file containing both audio and video.
  • Supports real-time display of a local camera with audio input and output, with the audio and video recorded and merged into one mp4 file.
  • Supports H264/H265 encoding when generating video files (more and more surveillance cameras now use H265 streams), with the encoding format identified and switched automatically.
  • Automatically recognizes dynamic resolution changes in the video stream and reopens it.
  • Supports playback of video stream URLs whose credentials contain special characters (e.g. +#@), with built-in parsing and escaping.
  • Pure Qt + FFmpeg decoding, with no third-party rendering or playback dependencies such as SDL; GPU rendering uses QOpenGLWidget, audio playback uses QAudioOutput.
  • Supports ffmpeg2, ffmpeg3, ffmpeg4, and ffmpeg5 at the same time, all compatible; if XP support is needed, choose ffmpeg3 or below.


Origin blog.csdn.net/m0_73443478/article/details/130065519