Timestamp synchronization when re-encoding audio and video

When audio and video are re-encoded and must stay synchronized, a few basic principles have to be followed, otherwise playback stutters. The examples below assume AAC audio at a 44.1 kHz sample rate and H.264 video at 25 frames per second:

1. Make sure audio and video frames arrive at the input at approximately the correct intervals.

Each AAC frame lasts about 23.2 ms (1000 * 1024 / 44100), and each video frame lasts 40 ms (1000 / 25). The raw audio samples fed to the encoder (or read from the buffer) should therefore arrive at 441 samples per channel every 10 ms, which is 882 bytes per channel every 10 ms at 16 bits per sample. If the original sample rate is not 44.1 kHz, the audio must be resampled before encoding. Raw video frames should arrive (or be read from the buffer) at one frame every 40 ms; the video frame rate may also need to be resampled. If the output audio and video streams are not smooth, check the input intervals of both streams first.
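
The arithmetic above can be checked with a short Python sketch. The function names are illustrative and not taken from the project; 1024 samples per frame is the AAC frame size already used in the formula above.

```python
from fractions import Fraction

AAC_SAMPLES_PER_FRAME = 1024  # samples per channel in one AAC frame


def aac_frame_ms(sample_rate_hz):
    """Exact duration of one AAC frame in milliseconds."""
    return Fraction(1000 * AAC_SAMPLES_PER_FRAME, sample_rate_hz)


def video_frame_ms(fps):
    """Exact duration of one video frame in milliseconds."""
    return Fraction(1000, fps)


def pcm_bytes_per_channel(sample_rate_hz, window_ms, bits_per_sample=16):
    """Raw PCM bytes one channel should deliver in window_ms milliseconds."""
    samples = sample_rate_hz * window_ms // 1000
    return samples * (bits_per_sample // 8)


if __name__ == "__main__":
    print(float(aac_frame_ms(44100)))        # about 23.22 ms
    print(float(video_frame_ms(25)))         # 40.0 ms
    print(pcm_bytes_per_channel(44100, 10))  # 882 bytes
```

Comparing these expected values with the intervals actually measured at the input is a quick way to tell whether the input side is the source of the stutter.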

2. Make sure the audio and video timestamps at the output use the same time reference.

For example, both streams can use the current system time as their common time reference, even though they may start at different points on that clock: the audio stream might start first at 1492764087000 ms and the video stream 700 ms later at 1492764087700 ms. When only a 32-bit timestamp is available, as in RTMP, the streams have to use relative timestamps. Continuing the example, the video stream begins when the audio timestamp has reached 700 ms, so the start of the video lines up with the 700 ms point of the audio stream and the two stay synchronized. Another scheme is to increase the timestamp by a fixed interval per frame: both timestamps start from 0, each AAC frame adds 23.2 ms, and each video frame adds 40 ms. Normally both streams start sending from frame 0 at the same time, each at its own interval, but sometimes one stream starts later than the other. In that case the timestamps must be synchronized: the start time of the later stream should be set from the current time of the earlier stream.
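
One way to derive relative timestamps from a shared system clock is sketched below in Python. `RelativeClock` is a hypothetical helper, not code from the project, and it assumes that both streams share one base (the first timestamp seen) and that values are truncated to 32 bits.

```python
class RelativeClock:
    """Maps absolute millisecond timestamps onto one shared relative timeline."""

    def __init__(self):
        self.base_ms = None  # absolute time of the first frame of either stream

    def relative(self, absolute_ms):
        # The first frame seen (audio or video) defines time zero for both streams.
        if self.base_ms is None:
            self.base_ms = absolute_ms
        # Assumption: keep the result within 32 bits, wrapping on overflow.
        return (absolute_ms - self.base_ms) & 0xFFFFFFFF


if __name__ == "__main__":
    clock = RelativeClock()
    print(clock.relative(1492764087000))  # first audio frame: 0
    print(clock.relative(1492764087700))  # first video frame: 700
```

Because both streams subtract the same base, the stream that starts later automatically picks up the current position of the earlier one.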

3. Make sure audio and video frames are interleaved on output at approximately the correct intervals.

The output stage here behaves exactly like a hardware encoder: playback is smooth only if the interleaved audio and video frames leave at stable intervals. With RTMP, for example, each AAC audio frame should be sent about every 23 ms and each video frame about every 40 ms, with the two kinds of frames interleaved (two separate threads can be used to send the audio and video streams). If the first two problems have been ruled out and playback is still not smooth, check whether this stage is working correctly.
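
The interleaved send order can be illustrated with a small Python sketch that merges the two frame schedules by time. This is a single-threaded illustration under the 44.1 kHz / 25 fps assumptions, not the project's sender; `frame_times` and `interleave` are hypothetical names.

```python
import heapq
from fractions import Fraction

AUDIO_MS = Fraction(1000 * 1024, 44100)  # about 23.2 ms per AAC frame
VIDEO_MS = Fraction(1000, 25)            # 40 ms per video frame


def frame_times(kind, interval_ms, count):
    """Yield (send_time_ms, kind) for `count` frames at a fixed interval."""
    for i in range(count):
        yield (i * interval_ms, kind)


def interleave(audio_count, video_count):
    """Merge both schedules into one list ordered by send time."""
    return list(heapq.merge(
        frame_times("audio", AUDIO_MS, audio_count),
        frame_times("video", VIDEO_MS, video_count),
    ))


if __name__ == "__main__":
    for when, kind in interleave(4, 3):
        print(f"{float(when):7.2f} ms  {kind}")
```

A real sender would sleep until each send time (or run one thread per stream); the point here is only the order and spacing in which frames should leave.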

In short, this re-encoding and synchronization stage has to be driven by measured, calculated timing.

The following is an adjusted implementation, based on the rules above, from an actual project.

Where the real environment did not meet the assumptions above:

1. The input frame rate at the front end varies, and x264 cannot sustain a steady 24/25 frames per second.

Timestamp adjustment scheme: instead of accumulating the standard frame intervals, accumulate the actual time elapsed between frames. Audio timestamps no longer advance by a fixed 23 ms per frame, because with that step the audio timestamp advances by less than 1 s for every 1 s of system time. Video timestamps no longer advance by a fixed 40 ms per frame either; they advance by the actual frame interval, which changes after each encode and may correspond to a frame rate below 24/25. This keeps the audio and video timestamps synchronized: timestamps that both advance by absolute system-time intervals are in sync by construction. With any other artificial scheme (such as the fixed-interval increments above), a separate synchronization module is needed to correct the growing offset between the audio and video timestamps, which is troublesome and hard to control precisely. This approach is really the second principle above: audio and video use the same time reference.
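
A minimal Python sketch of this scheme follows. `WallClockStamper` is a hypothetical name, and the use of `time.monotonic()` as the clock source is an assumption; the article only says "system time". The clock is injectable so the behavior can be checked with fixed values.

```python
import time


class WallClockStamper:
    """Stamps frames with the time elapsed since the first stamped frame."""

    def __init__(self, now_ms=None):
        # Assumption: a monotonic clock stands in for "system time" here.
        self._now_ms = now_ms or (lambda: int(time.monotonic() * 1000))
        self._start_ms = None

    def stamp(self):
        """Return the timestamp (ms) for a frame that is ready right now."""
        now = self._now_ms()
        if self._start_ms is None:
            self._start_ms = now
        return now - self._start_ms


if __name__ == "__main__":
    # One stamper shared by the audio and video paths gives one time reference.
    ticks = iter([1000, 1023, 1040, 1075])
    stamper = WallClockStamper(now_ms=lambda: next(ticks))
    print([stamper.stamp() for _ in range(4)])  # [0, 23, 40, 75]
```

Because every frame is stamped from the same clock, an irregular encoder output rate changes the spacing of the timestamps but not the alignment between audio and video.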

Several approaches that were tried in the project and did not work:

  1. Computing timestamps by accumulating the standard audio and video frame intervals described above. With this approach the offset between the audio and video timestamps keeps growing. Either stream may end up ahead of the other, and the gap only gets larger over time.

  2. Using the original frame timestamps as the reference for the video stream. This does not work either, because the original timestamps reflect the frame rate of the original video stream. After re-encoding with x264 the frame rate is different, so the original frame intervals no longer mean anything.
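
The drift in the first failed approach can be estimated with a few lines of Python. This sketch covers only one source of drift, rounding the 23.2 ms audio frame down to a 23 ms step; a varying video frame rate adds a further offset on top. `audio_drift_ms` is an illustrative name.

```python
AAC_FRAME_MS_EXACT = 1000 * 1024 / 44100  # about 23.22 ms of real audio per frame


def audio_drift_ms(frames, step_ms=23):
    """How far a fixed-step timestamp lags behind the audio actually played."""
    real_ms = frames * AAC_FRAME_MS_EXACT
    stamped_ms = frames * step_ms
    return real_ms - stamped_ms


if __name__ == "__main__":
    print(audio_drift_ms(43))     # lag after roughly 1 s of audio
    print(audio_drift_ms(25840))  # lag after roughly 10 min of audio
```

With a 23 ms step the audio timestamp falls behind by a little under 10 ms for every second of audio, which adds up to several seconds over ten minutes.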

Origin blog.csdn.net/m0_60259116/article/details/126390859