[FFmpeg audio-video synchronization] Solving data synchronization between multiple threads in FFmpeg audio and video playback


1. Introduction

Audio-video synchronization (AV sync) is a key issue in audio and video processing. In embedded and real-time systems in particular, it is an important factor in ensuring a good user experience. In practice we often need to process audio and video streams from different sources, which may have different time bases and latencies. To make sure audio and video play back together, these streams must be synchronized precisely.

In this blog post, we discuss in depth how to use C++ multithreading to solve audio-video synchronization problems. We first introduce the key concepts of audio-video synchronization, such as the presentation time stamp (PTS) and the time base. We then show how to use these concepts to calculate the time difference between audio and video, and how to achieve synchronization by delaying the playback of video frames. Finally, we show how to implement this synchronization strategy with C++ multithreading, and discuss how to avoid data races and stale time differences.

In this article we pay special attention to multithreaded programming techniques in C++, including how to use mutexes (std::mutex) to protect shared data, how to use std::this_thread::sleep_for to implement delays, and how to optimize the performance of multithreaded programs. We explain these techniques through concrete code examples and detailed comments, in the hope of helping you better understand and apply them.

2. Key concepts of audio and video synchronization

Before getting into the programming practice of audio-video synchronization, we need to understand some basic concepts and principles: the presentation time stamp (PTS), the time base, and the choice of data type for timestamps.

Presentation Time Stamp (PTS)

In audio and video processing, every audio or video frame carries an associated time stamp, called the presentation time stamp (PTS). This timestamp indicates when the frame should be played. For example, the timestamp of the first frame of a video might be 0 and the timestamp of the second frame 0.033, which means the second frame should be displayed 0.033 seconds after playback starts.

In FFmpeg-based player code, the timestamp is usually converted into a variable of type double, expressed in seconds. One of the main reasons for this design is that a double can store a wide range of values with high precision. This matters when dealing with video timestamps, which often need to be accurate down to the microsecond level.

Time base for audio and video

When dealing with audio and video data, the audio stream and the video stream usually each have their own time base. The time base is the length, in seconds, of one timestamp unit (one "tick"). For example, for a 30 frames-per-second video whose pts increases by 1 per frame, the time base is 1/30 ≈ 0.0333 seconds, which is also the duration of each frame.
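To make the relationship between pts and the time base concrete, here is a minimal, self-contained sketch of the conversion used throughout this article. The rational TimeBase struct and the pts_to_seconds helper are illustrative assumptions, not part of any real API; in actual FFmpeg code, AVStream::time_base plays this role and av_q2d() performs the same rational-to-double conversion.

#include <cstdint>
#include <iostream>

// Illustrative rational time base: one tick lasts num/den seconds
struct TimeBase {
    int num;
    int den;
};

// Convert an integer pts (counted in time-base ticks) into seconds
double pts_to_seconds(int64_t pts, TimeBase tb) {
    return pts * static_cast<double>(tb.num) / tb.den;
}

int main() {
    TimeBase video_tb{1, 30};  // 30 fps video whose pts advances by 1 per frame
    int64_t pts = 2;           // the third frame
    double seconds = pts_to_seconds(pts, video_tb);
    std::cout << "frame shown at " << seconds << " s ("
              << seconds * 1000 << " ms)\n";  // about 0.0667 s
    return 0;
}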

When synchronizing audio and video, we usually take the audio as the reference, because the human ear is more sensitive to delays in sound. Based on the timestamps of the audio and the video, we then calculate the time difference between them and use it to control the playback progress of the video.

Timestamp data type selection

In programming, we can choose different data types to store and process timestamps. In FFmpeg-based player code, the presentation timestamp (pts) used for synchronization is typically held as a double rather than a uint64_t (64-bit unsigned integer), mainly for the following reasons:

  1. Precision and range: a double can store a wide range of values with high precision. This is important for video timestamps, which often need to be accurate down to the microsecond level.
  2. Time unit: here pts is expressed in seconds, so it must be able to represent fractional parts. A uint64_t can only represent integers, not fractions.
  3. Ease of arithmetic: addition, subtraction, multiplication and division, especially when mixed with other floating-point values, are more convenient with a double than with a uint64_t.
  4. Compatibility: some codec libraries or hardware devices may expect timestamps of type double; using double makes it easier to interoperate with them.

These are the key concepts behind audio-video synchronization. With them in place, we can start implementing a synchronization strategy. The next section presents the basic strategy, and the section after it shows how to implement it with C++ multithreading.

3. Basic strategies for audio and video synchronization

When processing audio and video streams, Audio-Video Synchronization (AV Sync for short) is a core issue. Audio and video data are usually encoded and decoded separately, so we need to reasonably control their playback speed to ensure that audio and video can be synchronized. There are multiple steps involved in this process, which we describe in detail in this chapter.

3.1 Playback based on the audio timestamp

In a multimedia system we usually drive playback from the audio timestamps. The human ear is more sensitive to delays in sound, so any hiccup in the audio is noticed immediately. We therefore use the audio as the playback baseline and adjust the playback speed of the video to align with it.

Here's a simple example showing how to base playback on audio timestamps:

// Note: audio_buffer, m_audio_time_base and play_audio_frame() are this
// application's own helpers; m_audio_time_base is the time base in seconds.
double last_audio_pts = 0.0;
while (true) {
    // Take the next audio frame from the front of the buffer
    AVFrame &audio_frame = audio_buffer->front();
    // Convert the frame's timestamp to milliseconds
    double audio_pts = audio_frame.pts * m_audio_time_base * 1000;
    // Play the audio frame
    play_audio_frame(audio_frame);
    // Sleep for the gap between this frame's timestamp and the previous one,
    // so frames come out at the speed dictated by their timestamps
    double delay = audio_pts - last_audio_pts;
    last_audio_pts = audio_pts;
    if (delay > 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(static_cast<int>(delay)));
    // Remove the frame that has just been played
    audio_buffer->pop();
}

In this example, we take the next audio frame from the buffer, convert its timestamp to milliseconds and play it. We then sleep for the gap between this frame's timestamp and the previous one, so that audio frames are played at the speed dictated by their timestamps.

3.2 Calculate the time difference between audio and video

While playing based on the audio timestamps, we also need to calculate the time difference between the audio and the video. The time difference is the difference between the timestamp of the current video frame and the timestamp of the current audio frame. We can use this difference to adjust video playback so that it stays synchronized with the audio.

Here is a simple example showing how to calculate the time difference between audio and video:

// Get the current video and audio frames
AVFrame &video_frame = video_buffer->front();
AVFrame &audio_frame = audio_buffer->front();
// Convert their timestamps to milliseconds
double video_pts = video_frame.pts * m_video_time_base * 1000;
double audio_pts = audio_frame.pts * m_audio_time_base * 1000;
// Time difference between video and audio (positive means the video is ahead)
double diff = video_pts - audio_pts;

In this example, we take the current video and audio frames from their buffers, convert their timestamps to milliseconds, and compute the difference between them. This difference is the amount by which we need to adjust video playback.

3.3 Synchronization by delaying the playback of video frames

Once we have the time difference between audio and video, we can synchronize them by delaying the playback of video frames. If the video frame's timestamp is ahead of the audio frame's timestamp, we delay the video frame; if the video frame's timestamp is behind the audio frame's, we play the video frame immediately.

Here's a simple example showing how to synchronize audio and video by delaying the playback of video frames:

if (diff > 0) {
    // The video frame is ahead of the audio frame, so delay its playback
    std::this_thread::sleep_for(std::chrono::milliseconds(static_cast<int>(diff)));
}
// Play the video frame
play_video_frame(video_frame);

In this example, we first check the time difference between the video and the audio. If it is greater than 0, the video frame is ahead and its playback must be delayed; we use std::this_thread::sleep_for for the delay, and the delay length is exactly that time difference. Then we play the video frame. In this way the playback of video frames stays synchronized with the playback of audio frames.

4. Using C++ multithreading to implement audio-video synchronization

In audio and video processing, audio and video are usually processed and played separately, so we need a mechanism that keeps their playback synchronized. C++ multithreading gives us a way to implement such a mechanism. In this chapter we describe in detail how to use C++ multithreading to achieve audio-video synchronization.

Creating separate audio and video playback threads

In multithreaded programming, a thread is the smallest unit of execution that the operating system can schedule. A thread lives inside a process and is the actual unit of execution within it. Threads within the same process run independently of one another, but they share the process's memory space.

We can create two threads, one for playing audio and one for playing video. The two threads run in parallel, allowing audio and video to be played back at the same time.

The following example code creates the audio and video playback threads:

// Create the audio playback thread
std::thread audioThread(&PlayMangent::playAudio, this);

// Create the video playback thread
std::thread videoThread(&PlayMangent::playVideo, this);

// Wait for the audio playback thread to finish
audioThread.join();

// Wait for the video playback thread to finish
videoThread.join();

In this code we use the std::thread constructor to create the threads. This constructor takes a pointer to a member function and a pointer to a class object, then creates a new thread and calls the specified member function on that thread. The playAudio and playVideo member functions should contain the audio and video playback code respectively; a rough sketch of what such a class might look like follows below.
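As an illustration of how the pieces fit together, here is a minimal, self-contained sketch of a class in the spirit of the PlayMangent used above. Real frame decoding and rendering are replaced by placeholder timestamps and console output; only the thread creation, the shared audio clock and the mutex that protects it follow the structure discussed in this article. In the article's code the shared value is the precomputed milliseconds_diff updated by the audio thread; in this sketch the audio thread publishes its own clock and the video thread computes the difference itself, which expresses the same synchronization idea.

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

class PlayMangent {  // class name kept from the snippet above
public:
    void playAudio() {
        for (int frame = 0; frame < 5; ++frame) {
            double audio_pts = frame * 23.0;  // placeholder audio timestamp (ms)
            {
                std::lock_guard<std::mutex> lock(m_sync_mutex);
                m_audio_clock = audio_pts;    // publish the audio clock
            }
            std::cout << "audio frame at " << audio_pts << " ms\n";
            std::this_thread::sleep_for(std::chrono::milliseconds(23));
        }
    }

    void playVideo() {
        for (int frame = 0; frame < 4; ++frame) {
            double video_pts = frame * 33.0;  // placeholder video timestamp (ms)
            double diff;
            {
                std::lock_guard<std::mutex> lock(m_sync_mutex);
                diff = video_pts - m_audio_clock;  // positive: video is ahead
            }
            if (diff > 0)
                std::this_thread::sleep_for(std::chrono::milliseconds(static_cast<int>(diff)));
            std::cout << "video frame at " << video_pts << " ms\n";
        }
    }

    void run() {
        std::thread audioThread(&PlayMangent::playAudio, this);
        std::thread videoThread(&PlayMangent::playVideo, this);
        audioThread.join();
        videoThread.join();
    }

private:
    std::mutex m_sync_mutex;
    double m_audio_clock = 0.0;  // last played audio timestamp (ms), shared data
};

int main() {
    PlayMangent player;
    player.run();
    return 0;
}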

Protecting shared data with a mutex

In a multithreaded environment, data races are a common problem. A data race occurs when several threads access the same memory location at the same time, at least one of them writes to it, and the threads perform no synchronization.

To avoid data races, we need some synchronization mechanism to protect shared data. In C++ the mutex is a commonly used mechanism: it guarantees that at any moment at most one thread can access the protected data.

In our example, the time difference between audio and video playback (milliseconds_diff) is shared by the two threads, so we need a mutex to protect it.

The following example code protects milliseconds_diff with a mutex:

std::mutex mtx;  // mutex protecting milliseconds_diff

// In the audio playback thread: update milliseconds_diff
{
    std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
    milliseconds_diff = calculateDiff();    // compute the audio/video time difference
}  // the mutex is unlocked automatically when the lock_guard is destroyed

// In the video playback thread: read milliseconds_diff
int diff;
{
    std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
    diff = milliseconds_diff;               // read the audio/video time difference
}  // the mutex is unlocked automatically when the lock_guard is destroyed

In this code we use a std::lock_guard object to manage locking and unlocking of the mutex. The mutex is locked when the std::lock_guard is constructed and automatically unlocked when the std::lock_guard goes out of scope. This guarantees that the mutex is released correctly even if an exception is thrown.

Using std::this_thread::sleep_for for delays

To synchronize audio and video we need to control the speed of video playback. A simple approach is to let the thread sleep for a while after playing each video frame. In C++ we can use std::this_thread::sleep_for for this.

std::this_thread::sleep_for blocks the current thread for a period of time. It takes an argument describing the length of that period and blocks the current thread until the period has elapsed. We can use it to delay video playback.

The following example uses std::this_thread::sleep_for to implement the delay:

int diff;
{
    std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
    diff = milliseconds_diff;               // read the audio/video time difference
}  // the mutex is unlocked automatically when the lock_guard is destroyed

std::this_thread::sleep_for(std::chrono::milliseconds(diff));  // sleep for diff milliseconds

In this code we first obtain the audio/video time difference (diff), then use std::this_thread::sleep_for to put the thread to sleep for diff milliseconds. This delays video playback and thereby keeps it synchronized with the audio.

5. Avoiding data races and stale time differences

In a multithreaded environment we need to pay particular attention to two problems: data races and stale time differences. Below we discuss both problems and how to address them.

5.1 Data races

A data race occurs when several threads access the same memory location at the same time, at least one of them writes to it, and the threads perform no synchronization. Data races lead to undefined results and can make a program's behavior unpredictable.

In our audio-video synchronization program, both the audio thread and the video thread access the shared variable milliseconds_diff. Without any synchronization, a data race can occur. To solve this problem, we protect milliseconds_diff with a mutex.

A mutex is a synchronization primitive used to protect shared data and avoid data races. While one thread holds the mutex, no other thread can lock it; the other threads must wait until the mutex is unlocked before they can continue. This guarantees that at any moment only one thread can access the data protected by the mutex.

In C++ we create a mutex with std::mutex and manage its lifetime with std::lock_guard. std::lock_guard is an RAII-style class: it locks the mutex in its constructor and unlocks it in its destructor, which guarantees that the mutex is released correctly even if an exception is thrown.

Here is an example:

std::mutex mtx;        // mutex protecting the shared data
int shared_value = 0;  // the shared data

void update_data() {
    std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
    ++shared_value;                         // update the shared data
}

void read_data() {
    std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
    int local_copy = shared_value;          // read the shared data
    // ... use local_copy ...
}

In this example, std::lock_guard guarantees that the mutex is locked and unlocked exactly when needed, which prevents data races.

5.2 Stale time differences

A stale time difference occurs when, after we have read the value of milliseconds_diff but before we use it, another thread updates milliseconds_diff. We may then perform the delay with an out-of-date value.

To deal with this, we read and update milliseconds_diff only under the protection of the mutex. This ensures that the value we read is the most recent one at the moment we read it.

Here is an example:

std::mutex mtx;         // mutex protecting milliseconds_diff
int milliseconds_diff;  // shared audio/video time difference (milliseconds)

void update_diff() {
    std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
    milliseconds_diff = calculateDiff();    // recompute and store the time difference
}

void use_diff() {
    int diff;
    {
        std::lock_guard<std::mutex> lock(mtx);  // lock the mutex
        diff = milliseconds_diff;               // copy the shared time difference
    }
    // perform the delay with the local copy, without holding the lock
    if (diff > 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(diff));
}

In this example we read milliseconds_diff under the mutex, copy it into the local variable diff, and then perform the delay with diff without holding the lock. This way we always read the latest value available at that moment, and the lock is held only briefly, which improves the program's performance.

Overall, data races and stale time differences are both issues to watch out for in a multithreaded environment. By using a mutex we can address both and thereby achieve audio-video synchronization.

6. Optimizing the performance of the multithreaded program

For audio-video synchronization we usually create separate audio and video playback threads and use thread-synchronization techniques to protect shared data. Using multithreading correctly and efficiently is therefore very important. In this chapter we discuss how to optimize the performance of the multithreaded program.

6.1 Narrowing the scope of the mutex

In multithreaded programming, the mutex (std::mutex) is a common synchronization tool: it prevents shared data from being accessed by several threads at once and so avoids data races. Using a mutex has a cost, however. While one thread holds it, other threads that need the protected data are blocked until the lock is released. We should therefore keep the region protected by the mutex as small as possible, to reduce blocking time and increase the program's parallelism.

Consider the following example:

std::chrono::milliseconds duration;
{
    std::lock_guard<std::mutex> lock(m_sync_mutex);           // lock only while reading
    duration = std::chrono::milliseconds(milliseconds_diff);  // copy the shared value
}
std::this_thread::sleep_for(duration);                        // sleep without holding the lock

In this code we hold the mutex only while reading milliseconds_diff and copying it into duration. The lock is then released immediately, so other threads can access milliseconds_diff again, and we sleep without holding the lock. This ensures that the sleep does not prevent other threads from accessing milliseconds_diff.

This technique is often called "minimizing lock duration" and is a widely accepted best practice in multithreaded programming; Bjarne Stroustrup also emphasizes it in The C++ Programming Language.

6.2 Passing std::chrono::milliseconds directly to std::this_thread::sleep_for

In C++, std::this_thread::sleep_for blocks the current thread for a period of time. It takes a std::chrono::duration argument describing the length of that period.

In our synchronization code we need to delay video frames according to the audio/video time difference. This difference is a length of time in milliseconds; in our code it is stored in the integer variable milliseconds_diff, since std::chrono::milliseconds expects an integer tick count. To turn it into a std::chrono::duration value we use std::chrono::milliseconds. Consider the following example:

std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds_diff));

In this code we construct the std::chrono::milliseconds value directly inside the call to std::this_thread::sleep_for, converting milliseconds_diff into a std::chrono::duration on the spot. This saves a separate assignment and keeps the code concise.

This style relies on C++'s strong type system and the <chrono> library. Scott Meyers also recommends this kind of approach in Effective Modern C++ to keep code simple.

The table below summarizes the two optimization techniques introduced in this chapter:

| Technique | Advantage | Disadvantage |
| --- | --- | --- |
| Narrowing the scope of the mutex | Reduces blocking time and increases parallelism | Requires more careful design and coding |
| Passing std::chrono::milliseconds directly to std::this_thread::sleep_for | Simpler, more concise code | May slightly reduce readability |

In practice we should choose the optimization that best fits the specific situation and requirements; the sketch below shows the two techniques applied together.
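The following is a hedged sketch of what the video playback loop might look like with both techniques applied. It reuses names from the earlier snippets (m_sync_mutex, milliseconds_diff, video_buffer, play_video_frame), which are this article's illustrative helpers rather than FFmpeg API, so treat it as one possible arrangement rather than a definitive implementation.

// Sketch of a video playback loop combining both optimizations
void playVideo() {
    while (!video_buffer->empty()) {
        AVFrame &video_frame = video_buffer->front();

        // 1. Hold the lock only long enough to copy the shared time difference
        int diff;
        {
            std::lock_guard<std::mutex> lock(m_sync_mutex);
            diff = milliseconds_diff;
        }

        // 2. Construct the duration directly in the sleep_for call,
        //    and sleep without holding the lock
        if (diff > 0)
            std::this_thread::sleep_for(std::chrono::milliseconds(diff));

        play_video_frame(video_frame);
        video_buffer->pop();  // remove the frame that has just been played
    }
}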

Conclusion

In our journey of learning to program, understanding is an important step toward the next level. Mastering new skills and new ideas, however, always takes time and persistence. From a psychological point of view, learning is a process of constant trial and error and adjustment, as if our brain were gradually optimizing its problem-solving "algorithm".

That is why, when we run into mistakes, we should treat them as opportunities to learn and improve rather than mere annoyances. By understanding and solving these problems, we not only fix the current code, but also improve our programming skills and avoid making the same mistakes in future projects.

I encourage everyone to get involved and keep improving their programming skills. Whether you are a beginner or an experienced developer, I hope my blog can help you on your learning path. If you found this article useful, consider bookmarking it or leaving a comment to share your insights and experience; suggestions and questions about the blog are also welcome. Every like, comment, share and follow is the greatest support for me and motivates me to keep sharing and creating.


Visit my CSDN homepage for more content: 泡沫的CSDN主页
