Qt Audio and Video Development 07: Merging Audio and Video Files

1. Introduction

Previously, audio and video were stored in separate files. The demand for separate storage is small: a few users do need audio and video stored separately, but most want them combined into a single MP4 file. Since the data has already been saved as two files, the easiest way to get one file is to call the ffmpeg command line to merge them. These conversion operations consume a lot of CPU. If the video file is long and large, the CPU is close to fully loaded for the duration of the conversion.

For files merged by calling the ffmpeg command line, the result is normal and the audio and video are synchronized when the source is a local video file or a network video file. Testing has also turned up cases where the merged audio and video are not synchronized, so the approach of saving audio and video together will be revisited later.
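
As a rough illustration of the merge command described above, the sketch below builds the argument list for `ffmpeg -i <audio> -i <video> -y <output>` in plain C++ (no Qt, so it compiles on its own). The function name `buildMergeArgs` and the `streamCopy` flag are assumptions made for this example and are not part of the original class. With `streamCopy` enabled it adds `-c:v copy -c:a copy`, which asks ffmpeg to copy the streams instead of re-encoding them; that usually reduces CPU load a great deal, but it only works when both streams already use codecs the MP4 container accepts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the ffmpeg arguments that merge one audio file and one video file.
// Hypothetical helper for illustration; not the author's implementation.
std::vector<std::string> buildMergeArgs(const std::string &audioFile,
                                        const std::string &videoFile,
                                        const std::string &outputFile,
                                        bool streamCopy)
{
    assert(!audioFile.empty() && !videoFile.empty() && !outputFile.empty());

    std::vector<std::string> args;
    // Both inputs come first; options placed after them apply to the output.
    args.push_back("-i");
    args.push_back(audioFile);
    args.push_back("-i");
    args.push_back(videoFile);

    if (streamCopy) {
        // Copy the existing streams instead of re-encoding them.
        args.push_back("-c:v");
        args.push_back("copy");
        args.push_back("-c:a");
        args.push_back("copy");
    }

    // -y overwrites the output file without asking.
    args.push_back("-y");
    args.push_back(outputFile);
    return args;
}
```

Calling `buildMergeArgs("d:/1.aac", "d:/1.mp4", "d:/out.mp4", false)` yields the same six arguments as the plain merge command; passing `true` inserts the four stream-copy arguments before `-y`.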

2. Rendering

[Image: rendering]

3. Experience address

Domestic site: https://gitee.com/feiyangqingyun
International site: https://github.com/feiyangqingyun
Personal works: https://blog.csdn.net/feiyangqingyun/article/details/97565652
Experience address: https://pan.baidu.com/s/1d7TH_GEYl5nOecuNlWJJ7g Extraction code: 01jf File name: bin_video_demo/bin_linux_video.

4. Related code

#include "ffmpegrun.h"
#include "ffmpegrunthread.h"
#include "qfileinfo.h"

bool FFmpegRun::run(const QStringList &args, bool exec)
{
    if (exec) {
        return FFmpegRunThread::execute(args);
    } else {
        //Run each command in its own thread (the thread is released as soon as it finishes)
        FFmpegRunThread *runThread = new FFmpegRunThread;
        runThread->startExecute(args);
        return true;
    }
}

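QString FFmpegRun::replaceSuffix(const QString &fileSrc, const QString &suffix)
{
    //Only swap the trailing extension; replace() would also change the
    //same text elsewhere in the path, for example d:/mp4/1.mp4
    QString fileDst = fileSrc;
    QString oldSuffix = QFileInfo(fileSrc).suffix();
    if (oldSuffix.isEmpty()) {
        if (!fileDst.endsWith(".")) {
            fileDst += ".";
        }
        return fileDst + suffix;
    }

    fileDst.chop(oldSuffix.length());
    return fileDst + suffix;
}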

bool FFmpegRun::yuv420pToMp4(const QString &fileSrc)
{
    //Extract width, height and frame rate from the file name
    //Expected name formats: 640x408x30.xxx 640_480_30.xxx 1920x1080_aaaa.xxx
    int frameRate = 25;
    int width = 640;
    int height = 480;

    //Treat both x and _ as separators so that mixed names such as
    //1920x1080_aaaa are split into 1920, 1080 and aaaa
    QString name = QFileInfo(fileSrc).baseName();
    QStringList list = name.replace("x", "_").split("_");

    //Keep the defaults when a field is missing or is not a positive number
    if (list.size() >= 2) {
        int w = list.at(0).toInt();
        int h = list.at(1).toInt();
        if (w > 0 && h > 0) {
            width = w;
            height = h;
        }
    }
    if (list.size() >= 3) {
        int rate = list.at(2).toInt();
        if (rate > 0) {
            frameRate = rate;
        }
    }

    return yuv420pToMp4(fileSrc, frameRate, width, height);
}

bool FFmpegRun::yuv420pToMp4(const QString &fileSrc, int frameRate, int width, int height)
{
    QString fileDst = replaceSuffix(fileSrc, "mp4");
    return yuv420pToMp4(fileSrc, fileDst, frameRate, width, height);
}

//Play audio: ffplay -ar 48000 -ac 2 -f s16le -i d:/out.pcm
//Double-speed playback: ffplay f:/mp4/1.mp4 -af atempo=2 -vf setpts=PTS/2

//-c:v libx264 -preset fast -crf 18  framerate=r  video_size=s  vcodec=c:v  acodec=c:a
//ffmpeg.exe -threads auto -f rawvideo -framerate 30 -video_size 640x480 -pix_fmt yuv420p -i e:/1.yuv e:/1.mp4
//ffmpeg -threads auto -f rawvideo -r 30 -s 640x480 -pix_fmt yuv420p -i /home/liu/1.yuv /home/liu/1.mp4
bool FFmpegRun::yuv420pToMp4(const QString &fileSrc, const QString &fileDst, int frameRate, int width, int height)
{
    QStringList args;
    args << "-threads" << "auto";
    args << "-f" << "rawvideo";
#if 0
    args << "-framerate" << QString::number(frameRate);
    args << "-video_size" << QString("%1x%2").arg(width).arg(height);
#else
    args << "-r" << QString::number(frameRate);
    args << "-s" << QString("%1x%2").arg(width).arg(height);
#endif
    args << "-pix_fmt" << "yuv420p";
    args << "-i" << fileSrc;
    //The following parameters are optional (not supported on linux)
#if 0
    args << "-vcodec" << "libx264";
    args << "-preset" << "fast";
    args << "-crf" << QString::number(18);
#endif
    args << fileDst;
    return run(args);
}

bool FFmpegRun::mp4ToYuv420p(const QString &fileSrc)
{
    QString fileDst = replaceSuffix(fileSrc, "yuv");
    return mp4ToYuv420p(fileSrc, fileDst, 0, 0);
}

//ffmpeg.exe -i e:/1.mp4 -s 640x480 -pix_fmt yuv420p e:/1.yuv
bool FFmpegRun::mp4ToYuv420p(const QString &fileSrc, const QString &fileDst, int width, int height)
{
    //The input goes first so that -s and -pix_fmt apply to the output,
    //matching the command line above
    QStringList args;
    args << "-i" << fileSrc;

    //When no size is given the default size is used automatically
    if (width > 0 && height > 0) {
        args << "-s" << QString("%1x%2").arg(width).arg(height);
    }

    args << "-pix_fmt" << "yuv420p";
    args << "-y" << fileDst;
    return run(args);
}

bool FFmpegRun::wavToAac(const QString &fileSrc)
{
    QString fileDst = replaceSuffix(fileSrc, "aac");
    return wavToAac(fileSrc, fileDst);
}

bool FFmpegRun::wavToAac(const QString &fileSrc, const QString &fileDst)
{
    QStringList args;
    args << "-i" << fileSrc;
    args << fileDst;
    return run(args, true);
}

bool FFmpegRun::aacAndH264ToMp4(const QString &fileSrc)
{
    QString aacFile = replaceSuffix(fileSrc, "aac");
    QString fileDst = replaceSuffix(fileSrc, "mp4");
    return mergeToMp4(aacFile, fileSrc, fileDst);
}

bool FFmpegRun::aacAndMp4ToMp4(const QString &fileSrc)
{
    QString aacFile = replaceSuffix(fileSrc, "aac");
    QString mp4File = replaceSuffix(fileSrc, "mp4");
    QString baseName = fileSrc;
    baseName.replace(".mp4", "");
    //Write to a temporary name and rename it once the conversion is done
    QString fileDst = baseName + "-tmp.mp4";
    return mergeToMp4(aacFile, mp4File, fileDst);
}

bool FFmpegRun::mergeToMp4(const QString &fileSrc1, const QString &fileSrc2, const QString &fileDst)
{
    //Do not continue if either file is missing or empty
    if (QFile(fileSrc1).size() == 0 || QFile(fileSrc2).size() == 0) {
        return false;
    }

    //ffmpeg -i d:/1.aac -i d:/1.mp4 -y d:/out.mp4
    QStringList args;
    //QString arg = "-vcodec copy -acodec copy";
    //QString arg = "-c:v copy -c:a aac -strict experimental";
    //args << arg.split(" ");

    args << "-i" << fileSrc1;
    args << "-i" << fileSrc2;
    args << "-y" << fileDst;
    return run(args);
}

bool FFmpegRun::convertMp4(const QString &fileSrc)
{
    QString baseName = fileSrc;
    baseName.replace(".mp4", "");
    //Write to a temporary name and rename it once the conversion is done
    QString fileDst = baseName + "-tmp.mp4";
    return convertMp4(fileSrc, fileDst);
}

bool FFmpegRun::convertMp4(const QString &fileSrc, const QString &fileDst)
{
    //Do not continue if the file is missing or empty
    if (QFile(fileSrc).size() == 0) {
        return false;
    }

    //ffmpeg -i d:/1.mp4 -y d:/out.mp4
    QStringList args;
    args << "-i" << fileSrc;
    args << "-y" << fileDst;
    return run(args);
}
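
The file-name convention used by yuv420pToMp4 (640x408x30, 640_480_30, 1920x1080_aaaa) can be illustrated in isolation. A name such as 1920x1080_aaaa mixes both separators, so splitting on only one of them leaves 1920x1080 as a single field. The sketch below is plain C++ without Qt and treats both characters as separators; `RawVideoInfo`, `toPositiveInt` and `parseRawVideoName` are hypothetical names chosen for this example, and the fallback values 640, 480 and 25 are the defaults from the listing above.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct RawVideoInfo {
    int width;
    int height;
    int frameRate;
};

// Returns the value of a purely numeric token, or 0 when the token is
// empty, too long, or contains anything other than digits.
static int toPositiveInt(const std::string &token)
{
    if (token.empty() || token.size() > 6) {
        return 0;
    }
    int value = 0;
    for (char ch : token) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) {
            return 0;
        }
        value = value * 10 + (ch - '0');
    }
    return value;
}

// Split the base name on 'x' and '_' and read width, height, frame rate.
// Fields that are missing or not numeric keep their default values.
RawVideoInfo parseRawVideoName(const std::string &baseName)
{
    RawVideoInfo info = {640, 480, 25};

    std::vector<std::string> tokens;
    std::string current;
    for (char ch : baseName) {
        if (ch == 'x' || ch == '_') {
            tokens.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    tokens.push_back(current);

    if (tokens.size() >= 2) {
        int w = toPositiveInt(tokens[0]);
        int h = toPositiveInt(tokens[1]);
        if (w > 0 && h > 0) {
            info.width = w;
            info.height = h;
        }
    }
    if (tokens.size() >= 3) {
        int rate = toPositiveInt(tokens[2]);
        if (rate > 0) {
            info.frameRate = rate;
        }
    }

    assert(info.width > 0 && info.height > 0 && info.frameRate > 0);
    return info;
}
```

Keeping the defaults for unparsable fields means a descriptive tail such as aaaa never turns into a frame rate of 0 on the ffmpeg command line.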

5. Features

5.1 Basic functions

Support for many audio and video file formats, such as mp3, wav, mp4, asf, rm, rmvb and mkv.
Support for local camera devices, with configurable resolution and frame rate.
Support for many video streaming formats, such as rtp, rtsp, rtmp and http.
For local and network audio and video files, the file length, playback progress, volume, mute status, etc. are identified automatically.
For files, you can set the playback position, adjust the volume, set the mute status, etc.
Variable-speed playback of files at 0.5x, 1.0x, 2.5x, 5.0x and other speeds, covering both slow and fast playback.
Start, stop, pause and resume playback.
Snapshot capture with a configurable file path and an optional automatic preview after the capture completes.
Video recording with manual start and stop; some kernels can pause and resume a recording to skip the parts that do not need to be recorded.
Mechanisms such as seamless loop playback and automatic reconnection.
Signals for playback success, playback completion, decoded picture received, snapshot received, video size change, and recording status change.
Multithreaded processing with one thread per decoder, so the main interface does not freeze.

5.2 Special features

Multiple decoding kernels are supported at the same time, including the qmedia kernel (Qt4/Qt5/Qt6), ffmpeg kernel (ffmpeg2/ffmpeg3/ffmpeg4/ffmpeg5), vlc kernel (vlc2/vlc3), mpv kernel (mpv1/mpv2), Hikvision SDK, easyplayer kernel, etc.
A thorough layered base-class design: adding a new decoding kernel takes very little code, and the whole set of mechanisms applies to it.
Multiple picture display strategies are supported: automatic adjustment (if the original resolution is smaller than the display control, the picture is shown at its original resolution, otherwise it is scaled proportionally), proportional scaling (always scaled proportionally), and stretch to fill (always stretched to fill the control). All three strategies are supported in every kernel and every video display mode.
Multiple video display modes are supported: handle mode (the control handle is passed to the kernel, which draws on it), draw mode (decoded data is obtained through a callback, converted to QImage and drawn with QPainter), and GPU mode (decoded data is obtained through a callback, converted to yuv and drawn with QOpenGLWidget).
Multiple hardware acceleration types are supported: ffmpeg can choose dxva2, d3d11va, etc., mpv can choose auto, dxva2, d3d11va, and vlc can choose any, dxva2, d3d11va. The available options depend on the system, such as vaapi and vdpau on linux and videotoolbox on macos.
The decoding thread is separated from the display window, so any decoding kernel can be attached to any display window and switched dynamically.
Shared decoding threads are supported, enabled by default and handled automatically. When the same video address is detected, one decoding thread is shared, which greatly reduces network traffic and the streaming load on the remote device in network video environments. Top domestic video manufacturers all adopt this strategy. As long as one video stream is pulled, it can be shared with dozens or hundreds of channels for display.
The video rotation angle is identified automatically and applied when drawing. For example, video shot on a mobile phone is generally rotated by 90 degrees and must be rotated automatically during playback, otherwise it is displayed in the wrong orientation.
Resolution changes during video streaming are detected automatically and the video control adjusts its size to match. For example, a camera's resolution can be reconfigured while in use, and the corresponding video control responds as soon as the resolution changes.
Audio and video files loop seamlessly, with no visible traces such as a black screen when playback restarts.
The video control supports every decoding kernel, every picture display strategy and every video display mode at the same time.
The video control's floating bar supports all three modes (handle, draw, GPU) and is not positioned by moving absolute coordinates around.
Local camera devices can be opened by specifying the device name, resolution and frame rate.
Recording is also supported for opened video files, local cameras, network video streams, etc.
Opening and closing respond instantly: whether the player is opening a video or network stream that does not exist, probing for a device, or waiting on a read timeout, a close command interrupts the operation in progress immediately.
Various picture files can be opened, and local audio and video files can be played by drag and drop.
The video control's floating bar comes with functions such as starting and stopping recording, toggling mute, capturing snapshots and closing the video.
The audio component supports analysis of sound waveform data, can draw waveform curves and bar-style level meters from the values, and provides sound amplitude signals by default.
Every component prints very detailed messages, especially error messages, in a unified print format. This is extremely useful when testing in complex on-site device environments, since it pinpoints which channel and which step went wrong.
The code framework and structure are heavily optimized, performance is strong, and the code is continuously updated.
The source code supports Qt4, Qt5 and Qt6 and is compatible with all versions.
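
The three picture display strategies described in this section come down to a small piece of geometry. The sketch below, in plain C++ without Qt, computes the rectangle in which a frame would be drawn inside a display control for each strategy. `ScaleMode`, `Rect` and `targetRect` are hypothetical names for this illustration, and centering the picture inside the control is an assumption of the sketch rather than something the text states.

```cpp
#include <cassert>

enum class ScaleMode { Auto, KeepAspect, Fill };

struct Rect {
    int x;
    int y;
    int w;
    int h;
};

// Compute where a srcW x srcH frame is drawn inside a dstW x dstH control.
Rect targetRect(int srcW, int srcH, int dstW, int dstH, ScaleMode mode)
{
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);

    // Stretch to fill: always cover the whole control.
    if (mode == ScaleMode::Fill) {
        return Rect{0, 0, dstW, dstH};
    }

    int w = 0;
    int h = 0;
    if (mode == ScaleMode::Auto && srcW <= dstW && srcH <= dstH) {
        // Automatic adjustment: a frame that already fits keeps its size.
        w = srcW;
        h = srcH;
    } else if (static_cast<long long>(srcW) * dstH >=
               static_cast<long long>(srcH) * dstW) {
        // Frame is relatively wider than the control: width is the limit.
        w = dstW;
        h = static_cast<int>(static_cast<long long>(srcH) * dstW / srcW);
    } else {
        // Frame is relatively taller than the control: height is the limit.
        h = dstH;
        w = static_cast<int>(static_cast<long long>(srcW) * dstH / srcH);
    }

    // Center the picture; the remaining area would be the fill color.
    return Rect{(dstW - w) / 2, (dstH - h) / 2, w, h};
}
```

For a 1920x1080 frame in an 800x600 control, proportional scaling gives an 800x450 picture with a 75-pixel band above and below it.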

5.3 Video Controls

Any number of osd labels can be added dynamically. A label has a name, visibility, font size, text content, text color, label picture, label coordinates, label format (text, date, time, date and time, picture) and label position (upper left, lower left, upper right, lower right, centered, custom coordinates).
Any number of graphics can be added dynamically, which is very useful: for example, the regions produced by an artificial intelligence algorithm can be sent directly to the video control. Graphics can be any shape and are drawn directly on the original picture with absolute coordinates.
A graphic has a name, border size, border color, background color, rectangular area, path collection, point coordinate collection, etc.
Each graphic can specify one or more of the three area types (rectangular area, path collection, point coordinate collection), and every specified area is drawn.
Built-in floating bar control, which can be placed at the top, bottom, left or right.
The parameters of the floating bar control include margin, spacing, background transparency, background color, text color, pressed color, position, button icon code collection, button name identification collection, and button prompt information collection.
The row of tool buttons in the floating bar control can be customized through a parameter structure, and each icon can be a graphic font or a custom picture.
The floating bar buttons implement functions such as toggling recording, capturing snapshots, toggling mute and closing the video, and you can add your own functions in the source code.
Buttons with a built-in function switch their icon to match the state. For example, after the recording button is pressed it switches to the recording-in-progress icon, and the sound button becomes a mute icon when toggled and is restored when toggled again.
When a floating bar button is clicked, a signal carrying the button's unique name is emitted, so you can connect your own handling to it.
Prompt information can be displayed in the blank area of the floating bar. By default the current video resolution is shown, and information such as frame rate and bit rate can be added.
Video control parameters include border size, border color, focus color, background color (transparent by default), text color (the global text color by default), fill color (the blank area outside the video is filled with black), background text, background image (takes priority when set), whether to copy pictures, scaling display mode (automatic adjustment, proportional scaling, stretch to fill), video display mode (handle, draw, GPU), whether the floating bar is enabled, floating bar size (the height when horizontal, the width when vertical), and floating bar position (top, bottom, left, right).
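
To make the osd label positions listed in this section concrete, the sketch below computes the top-left drawing coordinate of a label for each position. It is plain C++ without Qt; `OsdPosition`, `Point` and `osdTopLeft` are hypothetical names for this illustration, and the `margin` parameter is an assumption of the sketch, not a documented parameter of the control.

```cpp
#include <cassert>

enum class OsdPosition { TopLeft, BottomLeft, TopRight, BottomRight, Center, Custom };

struct Point {
    int x;
    int y;
};

// Return the top-left corner at which a labelW x labelH label is drawn
// on a frameW x frameH picture. For Custom the given point is used as-is.
Point osdTopLeft(OsdPosition position, int frameW, int frameH,
                 int labelW, int labelH, int margin, Point custom)
{
    assert(frameW > 0 && frameH > 0 && labelW >= 0 && labelH >= 0);

    Point p = custom;
    switch (position) {
    case OsdPosition::TopLeft:
        p.x = margin;
        p.y = margin;
        break;
    case OsdPosition::BottomLeft:
        p.x = margin;
        p.y = frameH - labelH - margin;
        break;
    case OsdPosition::TopRight:
        p.x = frameW - labelW - margin;
        p.y = margin;
        break;
    case OsdPosition::BottomRight:
        p.x = frameW - labelW - margin;
        p.y = frameH - labelH - margin;
        break;
    case OsdPosition::Center:
        p.x = (frameW - labelW) / 2;
        p.y = (frameH - labelH) / 2;
        break;
    case OsdPosition::Custom:
        break;
    }
    return p;
}
```

Because the result is expressed in picture coordinates, the same calculation applies whichever display mode ends up drawing the label.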

5.4 ffmpeg kernel

Support for many audio and video files, local camera devices and network video streams.
Start, pause, resume and stop playback, set the playback progress, and play at variable speed.
Volume setting, mute toggle, snapshot capture and video recording.
Album information such as title, artist, album and album cover is extracted automatically, and the album cover is displayed automatically.
Full support for audio and video synchronization and variable-speed playback.
Decoding strategies: speed first, quality first, balanced, and fastest.
Support for displaying mobile phone video at the correct rotation angle. For example, video shot on a typical mobile phone is rotated 90 degrees and must be rotated by 90 degrees again when decoded and displayed.
Automatic conversion to yuv420 format. For example, the local camera delivers yuyv422 and some video files are in xx format; every non-yuv420 format is converted first and then processed.
Hardware decoding with dxva2, d3d11va, etc., with very high performance, especially for large resolutions such as 4K video.
Very low video latency of about 0.2s, and a video stream opens in about 0.5s, thanks to specific optimization.
Hardware decoding combined with GPU drawing gives very low CPU usage, better than clients such as Hikvision and Dahua.
Support for the audio formats found in video streams, including AAC, PCM, G.726, G.711A, G.711Mu, G.711ulaw, G.711alaw, MP2L2, etc. AAC is recommended for compatibility and cross-platform use.
Video storage supports yuv, h264 and mp4 formats, and audio storage supports pcm, wav and aac formats. The defaults are mp4 for video and aac for audio.
Audio and video can be stored as separate files or merged into one mp4 file. By default, whatever formats are stored, they are eventually converted to mp4 and aac and then merged into a single mp4 file containing both audio and video.
Real-time display of a local camera with audio input and output, with audio and video recorded and merged into one mp4 file.
H264/H265 encoding for generating video files (more and more surveillance cameras now use the H265 stream format), with the encoding format identified and switched automatically.
Dynamic resolution changes in the video stream are detected automatically and the stream is reopened.
Video streams whose user information contains special characters (for example +#@) can be played, with built-in parsing and escaping.
Pure qt+ffmpeg decoding with no dependency on sdl or other third-party rendering and playback libraries; gpu rendering uses QOpenGLWidget and audio playback uses QAudioOutput.
The ffmpeg2, ffmpeg3, ffmpeg4 and ffmpeg5 versions are all supported and compatible. If you need to support Windows XP, choose ffmpeg3 or below.
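
The escaping of special characters in the user information can be sketched with standard percent-encoding. How the component does this internally is not shown in the article, so the code below is only an assumption about one reasonable approach: every byte outside the URI unreserved set (letters, digits, `-`, `.`, `_`, `~`) is replaced by `%` and two hexadecimal digits before the user name and password are placed in the address. `percentEncode` and `buildStreamUrl` are hypothetical names, and the address in the usage note is made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Percent-encode everything except the URI unreserved characters.
std::string percentEncode(const std::string &text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char ch : text) {
        if (std::isalnum(ch) || ch == '-' || ch == '.' || ch == '_' || ch == '~') {
            out += static_cast<char>(ch);
        } else {
            out += '%';
            out += hex[ch >> 4];
            out += hex[ch & 0x0F];
        }
    }
    return out;
}

// Assemble scheme://user:password@hostAndPath with escaped credentials.
std::string buildStreamUrl(const std::string &scheme,
                           const std::string &user,
                           const std::string &password,
                           const std::string &hostAndPath)
{
    assert(!scheme.empty() && !hostAndPath.empty());
    return scheme + "://" + percentEncode(user) + ":" +
           percentEncode(password) + "@" + hostAndPath;
}
```

With a password such as `a@b#1`, the unescaped address would contain a second `@` and a `#`, which a URL parser reads as the end of the user information and the start of a fragment. After escaping, `buildStreamUrl("rtsp", "admin", "a@b#1", "192.168.1.64:554/stream")` produces `rtsp://admin:a%40b%231@192.168.1.64:554/stream`.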