Analysis of the ffplay player (6) ---- Audio and video synchronization

1. Basics of audio and video synchronization

Video and audio are decoded and rendered on different threads, so an audio frame and a video frame with the same pts will generally not be output at the same moment; audio and video synchronization is therefore required.

1.1 Audio and video synchronization strategy

  1. Based on audio
    • If the video falls behind, video frames are dropped (perceived as frame skipping)
    • If the video runs ahead, the previous frame keeps being rendered
  2. Based on video
    • If the audio falls behind, playback is sped up (or audio frames are dropped; if frames are dropped the sound is interrupted, and the experience is particularly poor)
    • If the audio runs ahead, playback is slowed down (or the previous frame is repeated)
    • Changing the audio speed involves resampling
  3. Based on an external clock
    • A combination of 1 and 2: the playback speed of both streams is adjusted to follow the external clock
  4. No synchronization at all: audio and video are output independently (generally not done)

Since people are more sensitive to changes in sound than to changes in picture, audio is usually chosen as the reference when picking a synchronization strategy!
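In ffplay these strategies correspond to the synchronization types defined in ffplay.c; the active one is queried through get_master_sync_type():

enum {
    AV_SYNC_AUDIO_MASTER, /* default choice */
    AV_SYNC_VIDEO_MASTER,
    AV_SYNC_EXTERNAL_CLOCK, /* synchronize to an external clock */
};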

1.2 Audio and video synchronization concept

  1. DTS (Decoding Time Stamp): decoding timestamp, which tells the player when to decode this frame of data.
  2. PTS (Presentation Time Stamp): presentation timestamp, which tells the player when to display this frame of data.
  3. time_base: the time base, i.e. the unit in which FFmpeg timestamps are expressed.

If there are no B frames in the video, the order of DTS and PTS will be the same.
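For example, a sequence whose display order is I B B P must be decoded in the order I P B B, because both B frames reference the later P frame; so once B frames are present, decoding order (DTS) and display order (PTS) diverge.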

time_base is an AVRational structure:

typedef struct AVRational {
    int num; ///< numerator
    int den; ///< denominator
} AVRational;

Calculate timestamp:

timestamp (seconds) = pts * av_q2d(st->time_base)

Calculate frame duration:

time (seconds) = duration * av_q2d(st->time_base)

Conversion between different time bases:

int64_t av_rescale_q(int64_t a, AVRational bq, AVRational cq)
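As a minimal illustration of these conversions (the time base and pts values below are made up):

#include <libavutil/avutil.h>

AVRational tb  = {1, 90000};   // a 90 kHz time base, typical for TS streams
int64_t    pts = 180000;       // raw pts in stream time_base units

double seconds = pts * av_q2d(tb);                   // 180000 * (1/90000) = 2.0 s
int64_t us = av_rescale_q(pts, tb, AV_TIME_BASE_Q);  // same instant in microseconds: 2000000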

Synchronizing audio and video requires the concept of a "clock". Audio, video, and the external clock each maintain their own Clock: each one sets its own clock, and the question is which one everybody reads as the reference.

typedef struct Clock {
    double	pts;            // clock base: pts of the current (to-be-displayed) frame;
                            // after display, the current frame becomes the previous frame
    double	pts_drift;      // clock base minus time at which we updated the clock;
                            // audio and video each keep their own value
    double	last_updated;   // system time of the last update (can be read as the current clock time)
    double	speed;          // clock speed control, used to control playback speed
    int	serial;             // clock is based on a packet with this serial; a play sequence is a
                            // contiguous run of playback, and a seek starts a new sequence
    int	paused;             // = 1 means paused
    int *queue_serial;      // pointer to the current packet queue's serial, used to detect obsolete clocks
} Clock;

How the clock works:

  1. set_clock_at must be called continually to "correct the time"; it needs pts, serial, and time (the system time).

  2. The time read back is also an estimate, extrapolated through the pts_drift recorded during the last "time correction".

Think of a timeline that advances with the system time. When set_clock is called to adjust the clock, suppose pts lags behind time; then pts_drift = pts - time records the difference between the two.

Some time later, get_clock can recover pts from the drift recorded earlier: pts = time + pts_drift, where time comes from FFmpeg's av_gettime_relative function. Sketches of both functions follow below.
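For reference, this is essentially what the two functions look like in ffplay.c (set_clock is a thin wrapper that passes the current system time to set_clock_at):

static void set_clock_at(Clock *c, double pts, int serial, double time)
{
    c->pts = pts;
    c->last_updated = time;
    c->pts_drift = c->pts - time;
    c->serial = serial;
}

static void set_clock(Clock *c, double pts, int serial)
{
    double time = av_gettime_relative() / 1000000.0;
    set_clock_at(c, pts, serial, time);
}

static double get_clock(Clock *c)
{
    if (*c->queue_serial != c->serial)
        return NAN; // stale serial: the clock belongs to an old play sequence
    if (c->paused) {
        return c->pts;
    } else {
        double time = av_gettime_relative() / 1000000.0;
        // pts_drift + time, additionally corrected by the speed control
        return c->pts_drift + time - (time - c->last_updated) * (1.0 - c->speed);
    }
}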

1.3 Time units in FFmpeg

AV_TIME_BASE

  • #define AV_TIME_BASE 1000000
  • FFmpeg's internal timing unit (microseconds)

AV_TIME_BASE_Q

  • #define AV_TIME_BASE_Q (AVRational){1, AV_TIME_BASE}
  • The fractional (AVRational) representation of FFmpeg's internal time base, i.e. the reciprocal of AV_TIME_BASE

Time base conversion formula

  • timestamp (FFmpeg internal timestamp) = AV_TIME_BASE * time (seconds)
  • time (seconds) = timestamp * av_q2d(AV_TIME_BASE_Q)
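For example, AVFormatContext->duration (see the next subsection) is expressed in these internal units, so converting it to seconds is a single division (fmt_ctx here is assumed to be an already-opened format context):

// fmt_ctx: from avformat_open_input() + avformat_find_stream_info()
double duration_sec = fmt_ctx->duration / (double)AV_TIME_BASE; // e.g. 120000000 -> 120.0 s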

1.4 Time_base/duration analysis of different structures

FFmpeg has many time bases; each structure carries its own, and their values differ.

  • AVFormatContext

    • duration: the length of the whole stream, in AV_TIME_BASE units; divide by AV_TIME_BASE to get seconds.
  • AVStream

    • time_base: the duration of one tick, in seconds; for example, AAC audio may use {1, 44100}, a TS stream {1, 90000} (90 kHz)
    • duration: the duration of the data stream, in AVStream->time_base units
  • The time_base of an AVStream is set in the demuxer or muxer

    TS

    • avpriv_set_pts_info(st,33,1,90000)

    FLV

    • avpriv_set_pts_info(st,32,1,1000)

    MP4

    • avpriv_set_pts_info(st,64,1,sc->time_scale)
    • avpriv_set_pts_info(st,64,1,track->timescale)

1.5 Analysis of pts/dts of different structures

The timestamps of both AVPacket and AVFrame come from AVStream->time_base.

The main point here is that when encoding, each stream's time_base is set inside avformat_write_header(). We won't analyze that source code for now; just don't get stuck during encoding wondering where time_base comes from!

1.6 Analysis of Frame structure in ffplay

typedef struct Frame {
    AVFrame		*frame;         // points to the data frame
    AVSubtitle	sub;            // used for subtitles
    int		serial;             // frame sequence; serial changes on a seek
    double		pts;            // presentation timestamp, in seconds
    double		duration;       // duration of this frame, in seconds
    int64_t		pos;            // byte position of this frame in the input file
    int		width;              // image width
    int		height;             // image height
    int		format;             // enum AVPixelFormat for video,
    // enum AVSampleFormat for audio
    AVRational	sar;            // sample aspect ratio; 0/1 if unknown or unspecified
    int		uploaded;           // records whether this frame has already been displayed
    int		flip_v;             // = 1 flip vertically, = 0 play normally
} Frame;

Why are pts and duration in the Frame structure of type double?

Because they are converted when the frame is written into the queue (implemented in the queue_picture function for video), so by that point pts and duration are in seconds!
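For video, the conversion sits at the queue_picture call site in video_thread(); it is roughly the following (tb is is->video_st->time_base and frame_rate is the guessed stream frame rate):

duration = (frame_rate.num && frame_rate.den ?
            av_q2d((AVRational){frame_rate.den, frame_rate.num}) : 0); // frame duration in seconds
pts = (frame->pts == AV_NOPTS_VALUE) ? NAN : frame->pts * av_q2d(tb);  // pts in seconds
ret = queue_picture(is, frame, pts, duration, frame->pkt_pos, is->viddec.pkt_serial);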

1.7 Video Frame PTS acquisition and correction

pts correction

frame->pts=frame->best_effort_timestamp;

/**
 * Frame timestamp estimated using various heuristics, in stream time base.
 * - encoding: unused
 * - decoding: set by libavcodec, read by user.
 */
int64_t best_effort_timestamp;

This corrects the frame's pts. The estimation heuristics are implemented by libavcodec; best_effort_timestamp usually equals pts, but if the current pts has an unreasonable value, a series of corrections is attempted to arrive at a more reasonable one.

1.8 Obtaining Audio Frame PTS

ffplay performs three pts conversions for audio:

  1. Convert from AVStream->time_base to {1, sample_rate}

    I think this is because resampling can change the sample rate; if AVStream->time_base were still used, the timestamps would be wrong!

    frame->pts = av_rescale_q(frame->pts, d->avctx->pkt_timebase, tb);
    
  2. Convert from the {1, sample_rate} base into seconds, which is what is needed when the frame is pushed into the frame queue (the audio counterpart of queue_picture).

    af->pts = (frame->pts == AV_NOPTS_VALUE) ? NAN : frame->pts * av_q2d(tb);
    
  3. Adjust the estimate according to the amount of data already copied to SDL:

    set_clock_at(&is->audclk, is->audio_clock -
    			(double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size)
    			/ is->audio_tgt.bytes_per_sec,
    			is->audio_clock_serial,
    			audio_callback_time / 1000000.0);
    

2. Based on audio

ffplay synchronizes to audio by default; in audio-master mode the audio clock is set in sdl_audio_callback:

static void sdl_audio_callback(void *opaque, Uint8 *stream, int len)
{
    audio_callback_time = av_gettime_relative(); // the while loop below may introduce latency

       .............................................

    is->audio_write_buf_size = is->audio_buf_size - is->audio_buf_index;
    /* Let's assume the audio driver that is used by SDL has two periods. */
    if (!isnan(is->audio_clock)) {
        set_clock_at(&is->audclk, is->audio_clock -
                                      (double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size)
                                          / is->audio_tgt.bytes_per_sec,
                     is->audio_clock_serial,
                     audio_callback_time / 1000000.0);
        sync_clock_to_slave(&is->extclk, &is->audclk);
    }
}

This was covered in the earlier analysis of the video playback thread; you can refer back to it.

The flow below is the main flow of video playback, which the earlier explanation also covers.

The key point is computing the display duration of the previous frame. Look at the code:

static void video_refresh(void *opaque, double *remaining_time)
{
    VideoState *is = opaque;
    double time;

    Frame *sp, *sp2;

    if (!is->paused && get_master_sync_type(is) == AV_SYNC_EXTERNAL_CLOCK && is->realtime)
        check_external_clock_speed(is);

    if (!display_disable && is->show_mode != SHOW_MODE_VIDEO && is->audio_st) {
        time = av_gettime_relative() / 1000000.0;
        if (is->force_refresh || is->last_vis_time + rdftspeed < time) {
            video_display(is);
            is->last_vis_time = time;
        }
        *remaining_time = FFMIN(*remaining_time, is->last_vis_time + rdftspeed - time);
    }

    if (is->video_st) {
    retry:
        if (frame_queue_nb_remaining(&is->pictq) == 0) {
            // nothing to do, no picture to display in the queue
        } else {
            // the key part: audio/video synchronization
            double last_duration, duration, delay;
            Frame *vp, *lastvp;

            /* dequeue the picture */
            lastvp = frame_queue_peek_last(&is->pictq); // previous frame (the one being displayed)
            vp = frame_queue_peek(&is->pictq);          // frame waiting to be displayed

            if (vp->serial != is->videoq.serial) {
                // not the latest play sequence: dequeue it so frames of the
                // newest sequence are reached as soon as possible
                frame_queue_next(&is->pictq);
                goto retry;
            }

            if (lastvp->serial != vp->serial) {
                // a new play sequence resets the current time
                is->frame_timer = av_gettime_relative() / 1000000.0;
            }

            if (is->paused)
            {
                goto display;
                printf("video paused: is->paused");
            }

            /* compute nominal last_duration */
            // lastvp: previous frame, vp: current frame, nextvp: next frame
            // last_duration: how long the previous frame should be displayed
            last_duration = vp_duration(is, lastvp, vp);

            // compute_target_delay works out how long the pending frame vp still has to wait.
            // When synchronizing to video, delay is simply last_duration.
            // When synchronizing to audio or the external clock, the wait for vp is
            // adjusted by comparing against the master clock.
            delay = compute_target_delay(last_duration, is); // time the previous frame must still be held
            time= av_gettime_relative()/1000000.0;
            // is->frame_timer is effectively the time at which the previous frame lastvp started playing;
            // is->frame_timer + delay is the time at which the pending frame vp should play
            if (time < is->frame_timer + delay) {
                // decide whether to keep displaying the previous frame:
                // the end time of the previous frame has not been reached yet,
                // so keep displaying it and compute the minimal wait
                *remaining_time = FFMIN(is->frame_timer + delay - time, *remaining_time);
                goto display;
            }

            // Reaching this point means the display time has arrived (or passed):
            // the pending frame vp becomes the frame to display now.

            is->frame_timer += delay;   // update the play time of the current frame
            if (delay > 0 && time - is->frame_timer > AV_SYNC_THRESHOLD_MAX) {
                is->frame_timer = time; // too far from the system time: snap to the system time
            }
            SDL_LockMutex(is->pictq.mutex);
            if (!isnan(vp->pts))
                update_video_pts(is, vp->pts, vp->pos, vp->serial); // update the video clock
            SDL_UnlockMutex(is->pictq.mutex);

            // frame-dropping logic
            if (frame_queue_nb_remaining(&is->pictq) > 1) {
                // only check for dropping when a nextvp exists
                Frame *nextvp = frame_queue_peek_next(&is->pictq);
                duration = vp_duration(is, vp, nextvp);
                if(!is->step        // only check when not in step mode (is->step == 1 means frame-by-frame playback)
                    && (framedrop>0 ||      // CPU decodes too slowly
                        (framedrop && get_master_sync_type(is) != AV_SYNC_VIDEO_MASTER)) // not video-master sync
                    && time > is->frame_timer + duration // genuinely a frame behind
                    ) {
                    printf("%s(%d) dif:%lfs, drop frame\n", __FUNCTION__, __LINE__,
                           (is->frame_timer + duration) - time);
                    is->frame_drops_late++;             // count late-dropped frames
                    frame_queue_next(&is->pictq);       // this is where the frame is actually dropped
                    // (don't drop frames in a while loop here: the audio clock may have been
                    // re-synchronized in the meantime, so delay would need to be recomputed)
                    goto retry; // go back to the start and retry
                }
            }
........
}

Focus on delay = compute_target_delay(last_duration, is);

static double compute_target_delay(double delay, VideoState *is)
{
    double sync_threshold, diff = 0;

    /* update delay to follow master synchronisation source */
    /* if the current master clock source is not video, compute the difference
       between the video clock and the master clock */
    if (get_master_sync_type(is) != AV_SYNC_VIDEO_MASTER) {
        /* if video is slave, we try to correct big delays by
           duplicating or deleting a frame */
        diff = get_clock(&is->vidclk) - get_master_clock(is);

        /* skip or repeat frame. We take into account the
           delay to compute the threshold. I still don't know
           if it is the best guess */
        sync_threshold = FFMAX(AV_SYNC_THRESHOLD_MIN,
                               FFMIN(AV_SYNC_THRESHOLD_MAX, delay));
        if (!isnan(diff) && fabs(diff) < is->max_frame_duration) {
            // diff is within the maximum frame duration
            if (diff <= -sync_threshold) {
                // video is already behind
                delay = FFMAX(0, delay + diff); // shrink the previous frame's display time
            }
            else if (diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD) {
                //  e.g. delay = 0.2 s, diff = 1 s  ->  delay = 0.2 + 1 = 1.2 s
                // video is ahead.
                // AV_SYNC_FRAMEDUP_THRESHOLD is 0.1; when delay > 0.1,
                // doubling delay would already take too long, so add diff instead
                delay = delay + diff; // extend the previous frame's display time
                av_log(NULL, AV_LOG_INFO, "video: delay=%0.3f A-V=%f\n",
                       delay, -diff);
            }
            else if (diff >= sync_threshold) {
                // extend the previous frame's display time,
                // e.g. delay = 0.2 * 2 = 0.4
                delay = 2 * delay; // stays within 2 * AV_SYNC_FRAMEDUP_THRESHOLD, i.e. 2 * 0.1 = 0.2 s
                //                delay = delay + diff;
            } else {
                // A/V sync accuracy is within -sync_threshold ~ +sync_threshold:
                // keep the original delay and rely on comparing frame_timer + duration
                // against the current time
            }
        }
    } else {
        // when synchronizing to video, simply return last_duration
    }

    av_log(NULL, AV_LOG_TRACE, "video: delay=%0.3f A-V=%f\n",
           delay, -diff);

    return delay;
}

This function determines the final value of delay by comparing the computed difference between the video clock and the master (audio) clock against delay.

step:

  1. Compute diff as video clock − master (audio) clock.

  2. sync_threshold clamps delay into the range [AV_SYNC_THRESHOLD_MIN, AV_SYNC_THRESHOLD_MAX]:

     sync_threshold = FFMAX(AV_SYNC_THRESHOLD_MIN,
                                   FFMIN(AV_SYNC_THRESHOLD_MAX, delay));
    
  3. If fabs(diff) is larger than max_frame_duration, delay is returned directly without any adjustment; the caller then decides whether to drop the frame or keep showing the previous one.

  4. If fabs(diff) is smaller than max_frame_duration, delay needs to be adjusted: the display time cannot always be fixed at the nominal duration, but must follow the actual error. For example, if the audio is half a frame ahead of the video, the video's display time must be shortened; if it is half a frame behind, the video's display time must be extended.

  5. if (diff <= -sync_threshold) {
                    // video is already behind
                    delay = FFMAX(0, delay + diff); // shrink the previous frame's display time
                }
                else if (diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD) {
                    //  e.g. delay = 0.2 s, diff = 1 s  ->  delay = 0.2 + 1 = 1.2 s
                    // video is ahead.
                    // AV_SYNC_FRAMEDUP_THRESHOLD is 0.1; when delay > 0.1,
                    // doubling delay would already take too long, so add diff instead
                    delay = delay + diff; // extend the previous frame's display time
                    av_log(NULL, AV_LOG_INFO, "video: delay=%0.3f A-V=%f\n",
                           delay, -diff);
                }
                else if (diff >= sync_threshold) {
                    // extend the previous frame's display time,
                    // e.g. delay = 0.2 * 2 = 0.4
                    delay = 2 * delay; // stays within 2 * AV_SYNC_FRAMEDUP_THRESHOLD, i.e. 2 * 0.1 = 0.2 s
                    //                delay = delay + diff;
                } else {
                    // A/V sync accuracy is within -sync_threshold ~ +sync_threshold:
                    // keep the original delay and rely on comparing frame_timer + duration
                    // against the current time
                }
    
    • The first branch, if (diff <= -sync_threshold)

      The video is lagging behind. diff is negative here, so delay should shrink; the method used is delay = FFMAX(0, delay + diff), which also keeps it from dropping below 0.

    • The second branch, if (diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD)

      The video is ahead of the audio and the display time exceeds AV_SYNC_FRAMEDUP_THRESHOLD; the measure taken is to extend it: delay = delay + diff.

    • The third branch, if (diff >= sync_threshold): the video is ahead of the audio and the display time is at most AV_SYNC_FRAMEDUP_THRESHOLD; the measure taken is delay = 2 * delay.

3. Based on video

When the media stream contains only a video component, video is used as the reference!

With audio as the reference, the strategy is to drop or repeat video frames; with video as the reference, you cannot simply drop audio frames, because people are very sensitive to sound and immediately notice when it is interrupted!

3.1 Audio main process


The audio synchronization strategy is to resample: changing the number of samples in a frame produces a speed-change effect. If the audio is behind, the sample count is reduced; if the audio is ahead, the sample count is increased.

See audio_decode_frame for specific implementation:

static int audio_decode_frame(VideoState *is)
{
    int data_size, resampled_data_size;
    int64_t dec_channel_layout;
    av_unused double audio_clock0;
    int wanted_nb_samples;
    Frame *af;

    if (is->paused)
        return -1;

    do {
        // if the head of the queue is readable, af points to the readable frame
        if (!(af = frame_queue_peek_readable(&is->sampq)))
            return -1;
        frame_queue_next(&is->sampq);
    } while (af->serial != is->audioq.serial);

    // buffer size for the audio parameters given in the frame:
    // af->frame->channels * af->frame->nb_samples * 2 (for 16-bit samples)
    data_size = av_samples_get_buffer_size(NULL,
                                           af->frame->channels,
                                           af->frame->nb_samples,
                                           af->frame->format, 1);
    // get the channel layout
    dec_channel_layout =
        (af->frame->channel_layout &&
         af->frame->channels == av_get_channel_layout_nb_channels(af->frame->channel_layout)) ?
            af->frame->channel_layout : av_get_default_channel_layout(af->frame->channels);
    // get the corrected sample count: if audio is the master clock, the sample count
    // is left unchanged; otherwise it is adjusted as synchronization requires
    wanted_nb_samples = synchronize_audio(is, af->frame->nb_samples);
    // is->audio_tgt holds the audio parameters accepted by SDL, obtained in audio_open();
    // audio_open() also does "is->audio_src = is->audio_tgt".
    // So: if the frame's audio parameters == is->audio_src == is->audio_tgt,
    // resampling can be skipped (is->swr_ctx is NULL in that case).
    // Otherwise is->swr_ctx is set up from the frame (source) and is->audio_tgt (target)
    // parameters, and is->audio_src is updated with the frame's parameters.
    if (af->frame->format           != is->audio_src.fmt            || // sample format
        dec_channel_layout      != is->audio_src.channel_layout || // channel layout
        af->frame->sample_rate  != is->audio_src.freq           || // sample rate
        // 4th condition: the sample count needs to change, which also requires resampling
        (wanted_nb_samples      != af->frame->nb_samples && !is->swr_ctx) // sample counts differ and swr_ctx is not initialized
        ) {
        swr_free(&is->swr_ctx);
        is->swr_ctx = swr_alloc_set_opts(NULL,
                                         is->audio_tgt.channel_layout,  // target output
                                         is->audio_tgt.fmt,
                                         is->audio_tgt.freq,
                                         dec_channel_layout,            // data source
                                         af->frame->format,
                                         af->frame->sample_rate,
                                         0, NULL);
        if (!is->swr_ctx || swr_init(is->swr_ctx) < 0) {
            av_log(NULL, AV_LOG_ERROR,
                   "Cannot create sample rate converter for conversion of %d Hz %s %d channels to %d Hz %s %d channels!\n",
                   af->frame->sample_rate, av_get_sample_fmt_name(af->frame->format), af->frame->channels,
                   is->audio_tgt.freq, av_get_sample_fmt_name(is->audio_tgt.fmt), is->audio_tgt.channels);
            swr_free(&is->swr_ctx);
            return -1;
        }
        is->audio_src.channel_layout = dec_channel_layout;
        is->audio_src.channels       = af->frame->channels;
        is->audio_src.freq = af->frame->sample_rate;
        is->audio_src.fmt = af->frame->format;
    }

    if (is->swr_ctx) {
        // resampling input 1: the input sample count is af->frame->nb_samples
        // resampling input 2: the input audio buffer
        const uint8_t **in = (const uint8_t **)af->frame->extended_data; // data[0] data[1]

        // resampling output 1: the output audio buffer
        uint8_t **out = &is->audio_buf1; // audio_buf1 holds the real allocation; audio_buf is the pointer actually used
        // resampling output 2: the output buffer size, in samples
        int out_count = (int64_t)wanted_nb_samples * is->audio_tgt.freq / af->frame->sample_rate
                        + 256;

        int out_size  = av_samples_get_buffer_size(NULL, is->audio_tgt.channels,
                                                  out_count, is->audio_tgt.fmt, 0);
        int len2;
        if (out_size < 0) {
            av_log(NULL, AV_LOG_ERROR, "av_samples_get_buffer_size() failed\n");
            return -1;
        }
        // true if the frame's sample count has been corrected
        if (wanted_nb_samples != af->frame->nb_samples) {
            int sample_delta = (wanted_nb_samples - af->frame->nb_samples) * is->audio_tgt.freq
                               / af->frame->sample_rate;
            int compensation_distance = wanted_nb_samples * is->audio_tgt.freq / af->frame->sample_rate;
            // swr_set_compensation
            if (swr_set_compensation(is->swr_ctx,
                                     sample_delta,
                                     compensation_distance) < 0) {
                av_log(NULL, AV_LOG_ERROR, "swr_set_compensation() failed\n");
                return -1;
            }
        }
        av_fast_malloc(&is->audio_buf1, &is->audio_buf1_size, out_size);
        if (!is->audio_buf1)
            return AVERROR(ENOMEM);
        // resample; the return value is the per-channel sample count of the resampled data
        len2 = swr_convert(is->swr_ctx, out, out_count, in, af->frame->nb_samples);
        if (len2 < 0) {
            av_log(NULL, AV_LOG_ERROR, "swr_convert() failed\n");
            return -1;
        }
        if (len2 == out_count) {
            av_log(NULL, AV_LOG_WARNING, "audio buffer is probably too small\n");
            if (swr_init(is->swr_ctx) < 0)
                swr_free(&is->swr_ctx);
        }
        // size, in bytes, of the resampled audio frame
        is->audio_buf = is->audio_buf1;
        resampled_data_size = len2 * is->audio_tgt.channels * av_get_bytes_per_sample(is->audio_tgt.fmt);
    } else {
        // no resampling: point directly at the audio data inside the frame
        is->audio_buf = af->frame->data[0]; // s16 interleaved: data[0]; fltp: data[0] data[1]
        resampled_data_size = data_size;
    }

    audio_clock0 = is->audio_clock;
    /* update the audio clock with the pts */
    if (!isnan(af->pts))
        is->audio_clock = af->pts + (double) af->frame->nb_samples / af->frame->sample_rate;
    else
        is->audio_clock = NAN;
    is->audio_clock_serial = af->serial;
    return resampled_data_size;
}

Here we focus on how the value of wanted_nb_samples is computed; let's analyze the synchronize_audio function:

static int synchronize_audio(VideoState *is, int nb_samples)
{
    int wanted_nb_samples = nb_samples;

    /* if not master, then we try to remove or add samples to correct the clock */
    if (get_master_sync_type(is) != AV_SYNC_AUDIO_MASTER) {
        // audio is not the master clock
        double diff, avg_diff;
        int min_nb_samples, max_nb_samples;

        diff = get_clock(&is->audclk) - get_master_clock(is); // difference between the audio clock and the master clock

        if (!isnan(diff) && fabs(diff) < AV_NOSYNC_THRESHOLD) {
            // if the difference exceeds AV_NOSYNC_THRESHOLD, play normally without
            // any correction; that situation usually indicates an error
            is->audio_diff_cum = diff + is->audio_diff_avg_coef * is->audio_diff_cum;
            if (is->audio_diff_avg_count < AUDIO_DIFF_AVG_NB) {
                /* not enough measures to have a correct estimate */
                is->audio_diff_avg_count++; // only correct after 20 consecutive out-of-sync measurements
            } else {
                /* estimate the A-V difference */
                avg_diff = is->audio_diff_cum * (1.0 - is->audio_diff_avg_coef);
                //                avg_diff = diff;
                if (fabs(avg_diff) >= is->audio_diff_threshold) {
                    wanted_nb_samples = nb_samples + (int)(diff * is->audio_src.freq);
                    min_nb_samples = ((nb_samples * (100 - SAMPLE_CORRECTION_PERCENT_MAX) / 100));
                    max_nb_samples = ((nb_samples * (100 + SAMPLE_CORRECTION_PERCENT_MAX) / 100));
                    // av_clip limits wanted_nb_samples to min_nb_samples~max_nb_samples,
                    // i.e. nb_samples * (90%~110%)
                    wanted_nb_samples = av_clip(wanted_nb_samples, min_nb_samples, max_nb_samples);
                }
                av_log(NULL, AV_LOG_INFO, "diff=%f adiff=%f sample_diff=%d apts=%0.3f %f\n",
                       diff, avg_diff, wanted_nb_samples - nb_samples,
                       is->audio_clock, is->audio_diff_threshold);
            }
        } else {
            // beyond the AV_NOSYNC_THRESHOLD threshold: just carry on as normal
            /* too big difference : may be initial PTS errors, so
               reset A-V filter */
            is->audio_diff_avg_count = 0;
            is->audio_diff_cum       = 0;   // reset to 0 once back to normal
        }
    }

    return wanted_nb_samples;
}

audio_diff_cum in the code accumulates a weighted sum.

First, what is audio_diff_avg_coef?

is->audio_diff_avg_coef  = exp(log(0.01) / AUDIO_DIFF_AVG_NB) // exp/log of the natural constant e, i.e. 0.01^(1/AUDIO_DIFF_AVG_NB)

This value is fixed; it is the constant ratio used when computing the weighted sum.

Look at it carefully: is->audio_diff_cum = diff + is->audio_diff_avg_coef * is->audio_diff_cum;

Iterating this at least AUDIO_DIFF_AVG_NB times accumulates the weighted sum.

And avg_diff = is->audio_diff_cum * (1.0 - is->audio_diff_avg_coef);

This divides the weighted sum by the (approximate) sum of the weights: the weights form a geometric series summing to roughly 1 / (1 − coef), so multiplying by (1 − coef) yields the weighted average avg_diff, which is then used as the difference for comparison.
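A small self-contained sketch of this exponentially weighted average (measure_av_diff is a hypothetical stand-in for the per-iteration clock difference):

#include <math.h>

#define AUDIO_DIFF_AVG_NB 20

extern double measure_av_diff(void); // hypothetical: audio clock minus master clock

double example_avg_diff(void)
{
    // coef = 0.01^(1/20) ≈ 0.794, so the 20th-oldest sample is weighted by 0.01
    double coef = exp(log(0.01) / AUDIO_DIFF_AVG_NB);
    double cum = 0.0;
    for (int i = 0; i < AUDIO_DIFF_AVG_NB; i++)
        cum = measure_av_diff() + coef * cum; // same recurrence as is->audio_diff_cum
    // the weights sum to roughly 1/(1 - coef), so this normalizes to an average
    return cum * (1.0 - coef);
}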

is->audio_diff_threshold = (double)(is->audio_hw_buf_size) / is->audio_tgt.bytes_per_sec;

This value is the threshold time for the difference: if avg_diff exceeds it, correction must be performed.

The synchronization algorithm itself is simple.

Convert the time difference into a number of samples via diff * sample rate:

wanted_nb_samples = nb_samples + (int)(diff * is->audio_src.freq); // desired sample count
min_nb_samples = ((nb_samples * (100 - SAMPLE_CORRECTION_PERCENT_MAX) / 100)); // sample count reduced by 10%
max_nb_samples = ((nb_samples * (100 + SAMPLE_CORRECTION_PERCENT_MAX) / 100)); // sample count increased by 10%
// av_clip limits wanted_nb_samples to min_nb_samples~max_nb_samples,
// i.e. nb_samples * (90%~110%)
wanted_nb_samples = av_clip(wanted_nb_samples, min_nb_samples, max_nb_samples); // av_clip clamps: above max takes max, below min takes min, keeping wanted_nb_samples within nb_samples * (90%~110%)

Then, back in audio_decode_frame, resampling is performed; it calls swr_set_compensation directly, which is the sample-compensation function!

4. Based on external clock

The external clock follows the other two clocks.

It is set via the sync_clock_to_slave function:

static void sync_clock_to_slave(Clock *c, Clock *slave)
{
    double clock = get_clock(c);
    double slave_clock = get_clock(slave);
    if (!isnan(slave_clock) && (isnan(clock) || fabs(clock - slave_clock) > AV_NOSYNC_THRESHOLD))
        set_clock(c, slave_clock, slave->serial);
}

Here c is the external clock and slave is another clock (audio or video). The external clock is reset only when the slave clock is valid and either the external clock has not been set yet, or both are set but their difference exceeds AV_NOSYNC_THRESHOLD.

So in practice the external clock is essentially seeded only once (and re-seeded only on large drift); which clock seeds it depends on whether the first frame is audio or video, and that clock's value is used.

Summary

Which clock serves as the reference depends on the parameters you set; the master clock is then read through get_master_clock (sketched below). Either way, all three clocks are kept set!
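For reference, get_master_clock in ffplay.c simply dispatches on the configured master type:

static double get_master_clock(VideoState *is)
{
    double val;

    switch (get_master_sync_type(is)) {
        case AV_SYNC_VIDEO_MASTER:
            val = get_clock(&is->vidclk);
            break;
        case AV_SYNC_AUDIO_MASTER:
            val = get_clock(&is->audclk);
            break;
        default:
            val = get_clock(&is->extclk);
            break;
    }
    return val;
}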

Each clock setting position:

The audio clock is set in sdl_audio_callback:

 set_clock_at(&is->audclk, is->audio_clock -
                                      (double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size)
                                          / is->audio_tgt.bytes_per_sec,
                     is->audio_clock_serial,
                     audio_callback_time / 1000000.0);

The video clock is set in video_refresh:

 if (!isnan(vp->pts))
                update_video_pts(is, vp->pts, vp->pos, vp->serial);

The external clock is set in two places. First, right after the audio clock update in sdl_audio_callback:

set_clock_at(&is->audclk, is->audio_clock -
                                      (double)(2 * is->audio_hw_buf_size + is->audio_write_buf_size)
                                          / is->audio_tgt.bytes_per_sec,
                     is->audio_clock_serial,
                     audio_callback_time / 1000000.0);
        sync_clock_to_slave(&is->extclk, &is->audclk);

And second, in update_video_pts, which also sets the video clock:

static void update_video_pts(VideoState *is, double pts, int64_t pos, int serial) {
    /* update current video pts */
    set_clock(&is->vidclk, pts, serial);
    sync_clock_to_slave(&is->extclk, &is->vidclk);
}
