Explanation of the FFmpeg time concept

1. What this chapter covers:

This chapter introduces the time concepts of FFmpeg, including the time base, timestamps, timestamp conversion, and timestamp comparison. These concepts are essential for understanding streaming, because muxing audio and video is essentially a series of time conversions.

2. FFmpeg time bases and timestamps:

2.1. Time base (time_base): The time base defines how many seconds one tick represents. For example, if the video frame rate is 30 FPS, the time base is {1, 30}: one second is divided into 30 equal ticks, i.e. one frame of video is displayed every 1/30 second. As shown in the figure below:

image.png

In FFmpeg, the time base is represented by the AVRational structure:

image.png

num: short for numerator, the numerator of the fraction
den: short for denominator, the denominator of the fraction

The video time base is derived from the frame rate. For example, at 50 FPS, FFmpeg represents it as AVRational video_timebase = {1, 50}.

The audio time base is derived from the sampling rate. For example, if the audio sampling rate is 48000 Hz, FFmpeg represents it as AVRational audio_timebase = {1, 48000}.

For container formats: the time_base of the FLV container is {1, 1000}, and the time_base of the TS container is {1, 90000}.

image.png

The ffplay output in the figure above shows several time-base related fields:

tbr: the frame rate FFmpeg guesses for the stream; generally tbr and fps agree.
tbn: the time base of the stream in the container; for example, 90000 for the TS format and 1000 for the FLV format.
tbc: the time base of the video codec, generally twice the frame rate; for example, at a frame rate of 30 fps, tbc is 60.

2.2. Timestamp (PTS, DTS):

image.png

First of all, a timestamp expresses how many ticks a moment occupies on the time axis. Its unit is not seconds but time-base ticks; only when the time base and the timestamp are combined can an actual point in time be expressed.

For example: take a ruler with pts = 30 divisions and time_base = {1, 30}, where each division is 1/30 cm. The length of this ruler = pts * time_base = 30 * 1/30 = 1 cm.

PTS: short for Presentation Time Stamp. Its main function is to determine when a decoded video frame is displayed.

Video PTS calculation: n is the nth video frame, timebase is {1, framerate}, fps is the frame rate:
pts = n * ((1 / timebase) / fps) = n, i.e. the PTS increases by 1 per frame.
Example: n = 1, pts = 1; n = 2, pts = 2; n = 3, pts = 3

Audio PTS calculation: n is the nth audio frame, nb_samples is the number of samples per frame (1024 by default for AAC), timebase is {1, samplerate}, samplerate is the sampling rate:

num_pkt = samplerate / nb_samples
pts = n * ((1 / timebase) / num_pkt) = n * nb_samples, i.e. the PTS increases by 1024 per frame.
Example: n = 1, pts = 1024; n = 2, pts = 2048; n = 3, pts = 3072

2.3. DTS: the Decoding Time Stamp, which indicates when a compressed frame is decoded. In the absence of B frames, PTS is equal to DTS. Once the encoder introduces B frames, the extra delay of the B frames must also be taken into account.

No B frames: dts = pts
B frames present: dts = pts + b_time


3. The principle of time base conversion:

In FFmpeg, different containers use different time bases. For example, the TS container uses time_base = {1, 90000}. Suppose a video stream has time_base = {1, 30}; to mux it into an MPEG-TS file, the ticks counted in time_base = {1, 30} must be converted into ticks counted in time_base = {1, 90000}.

image.png

FFmpeg provides the following API for time base conversion:

image.png

void av_packet_rescale_ts(AVPacket *pkt, AVRational tb_src, AVRational tb_dst);

The first parameter: a pointer to the AVPacket structure
The second parameter: the source time base
The third parameter: the destination time base

This API converts the timestamps of the AVPacket from the time base tb_src to the time base tb_dst. Below we use the conversion of the H264 and AAC time bases to the TS time base to illustrate its usage:

Convert the video H264 time base to the MPEG-TS time base with av_packet_rescale_ts:
DST_VIDEO_PTS = VIDEO_PTS * VIDEO_TIME_BASE / DST_TIME_BASE
Example: src_pts = 1 gives dst_pts = 1 * (1/30) / (1/90000) = 3000

H264 {1, 30}    MPEGTS {1, 90000}
pts = 1         pts = 3000
pts = 2         pts = 6000
pts = 3         pts = 9000
pts = 4         pts = 12000

Convert the audio AAC time base to the MPEG-TS time base with av_packet_rescale_ts:
DST_AUDIO_PTS = AUDIO_PTS * AUDIO_TIME_BASE / DST_TIME_BASE
Example: src_pts = 1024 gives dst_pts = 1024 * (1/48000) / (1/90000) = 1920

AAC {1, 48000}  MPEGTS {1, 90000}
pts = 1024      pts = 1920
pts = 2048      pts = 3840
pts = 3072      pts = 5760
pts = 4096      pts = 7680

From the derivation above, converting the video time base with av_packet_rescale_ts effectively computes the video timestamp of the push stream as DST_VIDEO_PTS = VIDEO_PTS * VIDEO_TIME_BASE / DST_TIME_BASE.

Likewise, converting the audio time base with av_packet_rescale_ts effectively computes the audio timestamp of the push stream as DST_AUDIO_PTS = AUDIO_PTS * AUDIO_TIME_BASE / DST_TIME_BASE.

4. Comparison of FFmpeg timestamps:

image.png

int av_compare_ts(int64_t ts_a, AVRational tb_a, int64_t ts_b, AVRational tb_b)

The first parameter: ts_a, a timestamp expressed in the time base tb_a
The second parameter: tb_a, the time base of ts_a
The third parameter: ts_b, a timestamp expressed in the time base tb_b
The fourth parameter: tb_b, the time base of ts_b
Return value: ret == -1 means ts_a is earlier than ts_b; ret == 1 means ts_a is later than ts_b; ret == 0 means the two timestamps are equal.

The main purpose of av_compare_ts is to compare timestamps that live in different time bases, so the muxer always knows in real time which stream's next packet is due. This prevents timestamp confusion, i.e. treating a video timestamp as an audio timestamp, or an audio timestamp as a video timestamp.

The following figure shows the process of encoding video and audio, comparing timestamps, and muxing the composite stream:

image.png

The video timestamp is compared against the audio timestamp (here we take tb_a as the video time base and tb_b as the audio time base). When the comparison result is ret <= 0, a video packet is taken out; otherwise an audio packet is taken out. After a video packet is taken out, av_packet_rescale_ts converts its timestamps; the same conversion is applied after an audio packet is taken out. Once the audio and video data has been converted, av_interleaved_write_frame writes it into the composite stream.

Original link: Ten: Explanation of the FFmpeg time concept - Jianshu


Origin blog.csdn.net/irainsa/article/details/130589514