x264 code analysis: reference frame management

x264 is an open-source H.264 encoder. Compared with JM, its encoding performance is greatly improved. It supports most H.264 coding tools, including: CABAC and CAVLC entropy coding, multiple reference frame prediction, all intra-predicted macroblock types (16x16 and 4x4), all forward inter-predicted (P-frame) macroblock partition types (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4), the most commonly used bi-directional inter-predicted (B-frame) macroblock types (16x16, 16x8, 8x16, and 8x8), quarter-pixel motion estimation, rate-distortion optimization, adaptive B-frame decision, and the use of B-frames as reference frames.

Building on the reference frame management of JM15.1 studied earlier, the basic flow of reference frame management in JM is as follows:
1. Initialization of the reference picture list;
2. Reordering of the reference picture list. The main purpose of reordering is to reduce the bits needed to code the reference frame index. Whether reordering is performed depends on the values of sh->b_ref_pic_list_reordering_l0 and sh->b_ref_pic_list_reordering_l1;
3. Frame encoding;
4. Marking of the reference pictures.
The rest of this article follows the x264 encoding process and tracks how reference frames are handled along the way:

The overall x264 encoding process has four levels (labeled A to D below):

A. The main() function calls the top-level encoding function Encode(), in which a for loop bounded by i_frame_total enters the next level for each frame.

B. encode_frame() is the second level of encoding. It calls x264_encoder_encode() to encode at the frame level; this function mainly handles VCL encoding and part of the NAL (network abstraction layer). Because reference frames are handled here, the flow of this function is tracked carefully below, with the steps labeled NO.1 to NO.4:

NO.1  Store the frame to be encoded in fenc and put the frames to be encoded in coding order. fenc and fdec are the frame buffers for the frame being encoded and for the reconstructed (decoded) frame, respectively. Before the current picture is encoded, the reconstructed frame held in fdec is checked and, if it is kept as a reference, added to the list of reference frames.

In the x264_encoder_encode() function, it is implemented by calling the reference_update() function:

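    // ok to call this before encoding any frames, since the initial values of fdec have b_kept_as_ref=0
    // Update the reference frame queue frames.reference[]; it is not updated for (non-reference) B-frames
    // Move the reconstructed frame fdec into the reference frame list and obtain a new fdec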
    if( reference_update( h ) )
        return -1;

reference_update() function implementation:

static inline int reference_update( x264_t *h )
{
    // if the frame held in fdec is not kept as a reference
    if( !h->fdec->b_kept_as_ref )
    {
        if( h->i_thread_frames > 1 )
        {
            x264_frame_push_unused( h, h->fdec );
            h->fdec = x264_frame_pop_unused( h, 1 );
            if( !h->fdec )
                return -1;
        }
        return 0;
    }

    /* apply mmco from previous frame. */
    for( int i = 0; i < h->sh.i_mmco_command_count; i++ )
        for( int j = 0; h->frames.reference[j]; j++ )
            if( h->frames.reference[j]->i_poc == h->sh.mmco[i].i_poc )
                x264_frame_push_unused( h, x264_frame_shift( &h->frames.reference[j] ) );

    /* move frame in the buffer */
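    // Add the reconstructed frame to the reference frame list; fdec should hold the frame that was just reconstructed
    x264_frame_push( h->frames.reference, h->fdec );
    // If the list is full, one frame has to be removed
    if( h->frames.reference[h->sps->i_num_ref_frames] )
        x264_frame_push_unused( h, x264_frame_shift( h->frames.reference ) );
    // Take a free buffer from the unused queue to hold the next reconstructed frame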
    h->fdec = x264_frame_pop_unused( h, 1 );
    if( !h->fdec )
        return -1;
    return 0;
}
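
The sliding-window behaviour of reference_update() can be sketched in isolation. The following is a minimal model, not x264's code: frame_t, list_push, list_shift, update_refs, and MAX_REFS are hypothetical names, and the list is assumed to be a NULL-terminated pointer array, as frames.reference appears to be in the listing above.

```c
#include <stddef.h>

#define MAX_REFS 16  /* hypothetical capacity for this sketch */

typedef struct { int poc; } frame_t;  /* hypothetical stand-in for x264_frame_t */

/* Append a frame to the end of a NULL-terminated list. */
static void list_push(frame_t **list, frame_t *f)
{
    int i = 0;
    while (list[i])
        i++;
    list[i] = f;
    list[i + 1] = NULL;
}

/* Remove and return the first (oldest) frame, shifting the rest down. */
static frame_t *list_shift(frame_t **list)
{
    frame_t *f = list[0];
    for (int i = 0; list[i]; i++)
        list[i] = list[i + 1];
    return f;
}

/* Add a reconstructed frame; if more than num_ref_frames are now held,
 * evict the oldest one and return it (NULL if nothing was evicted).
 * refs must have room for num_ref_frames + 2 pointers. */
static frame_t *update_refs(frame_t **refs, frame_t *recon, int num_ref_frames)
{
    list_push(refs, recon);
    if (refs[num_ref_frames])
        return list_shift(refs);
    return NULL;
}
```

With num_ref_frames = 2, pushing three frames in a row evicts the first one on the third call and leaves the two most recent frames in the list. The sketch leaves out the MMCO pass and the pool of unused frames.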

NO.2 Set up the reference frame list for the frame to be encoded; this corresponds to the initialization of the reference picture list in JM:

First, determine the type of the frame to be encoded (h->fenc->i_type) and reset the reference frame list accordingly:

  • If the frame to be encoded is an IDR frame, no reference frame is needed: reference_reset( h ) is called to clear all reference frames from the reference frame list, and the reference priority of the current frame is set to the highest level, NAL_PRIORITY_HIGHEST
  • If the frame to be encoded is I/P/BREF, reference_hierarchy_reset( h ) is called to reset the reference frames

Purpose: reset the reference frame management state

    if( h->fenc->i_type == X264_TYPE_IDR )
    {
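        // Difference between I and IDR:
        // IDR clears the reference frame list, while I does not.
        // Pictures after an I picture may still reference pictures before it for motion prediction
        /* reset ref pictures */
        i_nal_type    = NAL_SLICE_IDR;
        i_nal_ref_idc = NAL_PRIORITY_HIGHEST; // reference priority
        h->sh.i_type = SLICE_TYPE_I;
        // For an IDR frame, clear all reference frames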
        reference_reset( h );
        h->frames.i_poc_last_open_gop = -1;
    }
    else if( h->fenc->i_type == X264_TYPE_I )
    {
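        // Difference between I and IDR:
        // IDR clears the reference frame list, while I does not.
        // Pictures after an I picture may still reference pictures before it for motion prediction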
        i_nal_type    = NAL_SLICE;
        i_nal_ref_idc = NAL_PRIORITY_HIGH; /* Not completely true but for now it is (as all I/P are kept as ref)*/
        h->sh.i_type = SLICE_TYPE_I;
        reference_hierarchy_reset( h );
        if( h->param.b_open_gop )
            h->frames.i_poc_last_open_gop = h->fenc->b_keyframe ? h->fenc->i_poc : -1;
    }
    else if( h->fenc->i_type == X264_TYPE_P )
    {
        i_nal_type    = NAL_SLICE;
        i_nal_ref_idc = NAL_PRIORITY_HIGH; /* Not completely true but for now it is (as all I/P are kept as ref)*/
        h->sh.i_type = SLICE_TYPE_P;
        reference_hierarchy_reset( h );
        h->frames.i_poc_last_open_gop = -1;
    }
    else if( h->fenc->i_type == X264_TYPE_BREF )
    {
        // A B-frame that can be used as a reference frame; this is a distinctive feature
        i_nal_type    = NAL_SLICE;
        i_nal_ref_idc = h->param.i_bframe_pyramid == X264_B_PYRAMID_STRICT ? NAL_PRIORITY_LOW : NAL_PRIORITY_HIGH;
        h->sh.i_type = SLICE_TYPE_B;
        reference_hierarchy_reset( h );
    }
    else    /* B frame */
    {
        // Ordinary B-frame, the most common case
        i_nal_type    = NAL_SLICE;
        i_nal_ref_idc = NAL_PRIORITY_DISPOSABLE;
        h->sh.i_type = SLICE_TYPE_B;
    }
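
The frame-type branch above boils down to a small mapping from frame type to nal_ref_idc. The sketch below restates it as a standalone function. The enum names and numeric values are local to this example and are assumptions, as are ref_idc_for and is_kept_as_ref; the one fact relied on from H.264 is that a nal_ref_idc of 0 marks a picture that is not used for reference.

```c
/* Hypothetical local stand-ins for the constants used in the excerpt above. */
enum frame_type { TYPE_IDR, TYPE_I, TYPE_P, TYPE_BREF, TYPE_B };
enum nal_priority { PRIO_DISPOSABLE = 0, PRIO_LOW = 1, PRIO_HIGH = 2, PRIO_HIGHEST = 3 };

/* Return the nal_ref_idc chosen for a frame type.
 * strict_pyramid mirrors i_bframe_pyramid == X264_B_PYRAMID_STRICT. */
static int ref_idc_for(enum frame_type t, int strict_pyramid)
{
    switch (t) {
    case TYPE_IDR:  return PRIO_HIGHEST;
    case TYPE_I:
    case TYPE_P:    return PRIO_HIGH;
    case TYPE_BREF: return strict_pyramid ? PRIO_LOW : PRIO_HIGH;
    default:        return PRIO_DISPOSABLE;  /* plain B-frame */
    }
}

/* In this simplified model a frame is a reference exactly when
 * its nal_ref_idc is non-zero. */
static int is_kept_as_ref(enum frame_type t, int strict_pyramid)
{
    return ref_idc_for(t, strict_pyramid) != PRIO_DISPOSABLE;
}
```

Only the mapping shown in the excerpt is modeled here; the encoder may apply further conditions before it actually keeps a frame as a reference.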

 reference_reset function:

static inline void reference_reset( x264_t *h )
{
    while( h->frames.reference[0] )
        x264_frame_push_unused( h, x264_frame_pop( h->frames.reference ) ); // pop each reference frame and push it onto the unused queue
    h->fdec->i_poc =
    h->fenc->i_poc = 0;
}

 reference_hierarchy_reset function: (didn't quite understand)

static inline void reference_hierarchy_reset( x264_t *h )
{
    int ref;
    int b_hasdelayframe = 0;

    // Non-reference B-frames are generally IS_DISPOSABLE
    /* look for delay frames -- chain must only contain frames that are disposable */
    for( int i = 0; h->frames.current[i] && IS_DISPOSABLE( h->frames.current[i]->i_type ); i++ )
        b_hasdelayframe |= h->frames.current[i]->i_coded
                        != h->frames.current[i]->i_frame + h->sps->vui.i_num_reorder_frames;

    /* This function must handle b-pyramid and clear frames for open-gop */
    if( h->param.i_bframe_pyramid != X264_B_PYRAMID_STRICT && !b_hasdelayframe && h->frames.i_poc_last_open_gop == -1 )
        return;

    /* Remove last BREF. There will never be old BREFs in the
     * dpb during a BREF decode when pyramid == STRICT */
    // Mark the reference frames to be removed: BREF frames, or
    // reference frames with
    // h->frames.reference[ref]->i_poc < h->frames.i_poc_last_open_gop,
    // and set the list0 reorder flag
    for( ref = 0; h->frames.reference[ref]; ref++ )
    {
        if( ( h->param.i_bframe_pyramid == X264_B_PYRAMID_STRICT
            && h->frames.reference[ref]->i_type == X264_TYPE_BREF )
            || ( h->frames.reference[ref]->i_poc < h->frames.i_poc_last_open_gop
            && h->sh.i_type != SLICE_TYPE_B ) )
        {
            int diff = h->i_frame_num - h->frames.reference[ref]->i_frame_num;
            h->sh.mmco[h->sh.i_mmco_command_count].i_difference_of_pic_nums = diff;
            h->sh.mmco[h->sh.i_mmco_command_count++].i_poc = h->frames.reference[ref]->i_poc;
            x264_frame_push_unused( h, x264_frame_shift( &h->frames.reference[ref] ) );
            h->b_ref_reorder[0] = 1;
            ref--;
        }
    }

    /* Prepare room in the dpb for the delayed display time of the later b-frame's */
    // Compute how many reference frames to remove from list0 (starting with the most distant)
    if( h->param.i_bframe_pyramid )
        h->sh.i_mmco_remove_from_end = X264_MAX( ref + 2 - h->frames.i_max_dpb, 0 );
}
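
Two mechanics of reference_hierarchy_reset() are easier to follow when isolated: removing entries from the list while iterating over it (the ref-- idiom), and the i_mmco_remove_from_end formula. The sketch below is a simplified model under stated assumptions: frame_t, drop_brefs, and remove_from_end are hypothetical names, only the BREF condition is modeled, and the MMCO bookkeeping is omitted.

```c
#include <stddef.h>

typedef struct { int poc; int is_bref; } frame_t;  /* hypothetical */

/* Drop every BREF from a NULL-terminated list in place and
 * return how many frames remain. */
static int drop_brefs(frame_t **refs)
{
    int ref;
    for (ref = 0; refs[ref]; ref++) {
        if (refs[ref]->is_bref) {
            /* shift the tail down over the removed entry */
            for (int i = ref; refs[i]; i++)
                refs[i] = refs[i + 1];
            ref--;  /* re-examine the entry that moved into this slot */
        }
    }
    return ref;
}

/* Formula from the listing: max(remaining + 2 - max_dpb, 0). */
static int remove_from_end(int remaining, int max_dpb)
{
    int n = remaining + 2 - max_dpb;
    return n > 0 ? n : 0;
}
```

After the loop, ref equals the number of frames still in the list, which is why the listing can reuse it directly in the X264_MAX expression. For example, with 3 frames remaining and a DPB size of 4, one frame is scheduled for removal from the far end.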

NO.3  After i_nal_type, i_nal_ref_idc, and h->sh.i_type have been set for the frame type, reference_build_list() is called to build the reference frame lists and sort them according to the POC value.

static inline void reference_build_list( x264_t *h, int i_poc )
{
    int b_ok;

    /* build ref list 0/1 */
    h->mb.pic.i_fref[0] = h->i_ref[0] = 0;
    h->mb.pic.i_fref[1] = h->i_ref[1] = 0;
    if( h->sh.i_type == SLICE_TYPE_I )
        return;

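    // (the check above returns if the current slice type is SLICE_TYPE_I)
    // Initialize fref0 and fref1 and count the frames in each; tracing the program shows that the size of fref0 changes with the parameter settings
    for( int i = 0; h->frames.reference[i]; i++ )
    {
        // skip the frame if it is corrupt
        if( h->frames.reference[i]->b_corrupt )
            continue;
        // Frames in the reference queue whose i_poc is less than the current i_poc
        // go into reference list ref[0];
        // frames with a larger i_poc go into ref[1]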
        if( h->frames.reference[i]->i_poc < i_poc )
            h->fref[0][h->i_ref[0]++] = h->frames.reference[i];
        else if( h->frames.reference[i]->i_poc > i_poc )
            h->fref[1][h->i_ref[1]++] = h->frames.reference[i];
    }

    // h->sh.i_mmco_remove_from_end is set in x264_reference_hierarchy_reset
    // (h->sh.i_mmco_remove_from_end = X264_MAX( ref + 2 - h->frames.i_max_dpb, 0 );)
    // Mark frames in reference list 0 for removal, starting with the most distant ones;
    // h->sh.i_mmco_remove_from_end reference frames are removed
    if( h->sh.i_mmco_remove_from_end )
    {
        /* Order ref0 for MMCO remove */
        do
        {
            b_ok = 1;
            for( int i = 0; i < h->i_ref[0] - 1; i++ )
            {
                if( h->fref[0][i]->i_frame < h->fref[0][i+1]->i_frame )
                {
                    XCHG( x264_frame_t*, h->fref[0][i], h->fref[0][i+1] );
                    b_ok = 0;
                    break;
                }
            }
        } while( !b_ok );

        for( int i = h->i_ref[0]-1; i >= h->i_ref[0] - h->sh.i_mmco_remove_from_end; i-- )
        {
            int diff = h->i_frame_num - h->fref[0][i]->i_frame_num;
            h->sh.mmco[h->sh.i_mmco_command_count].i_poc = h->fref[0][i]->i_poc;
            h->sh.mmco[h->sh.i_mmco_command_count++].i_difference_of_pic_nums = diff;
        }
    }


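    // Sort the reference frames by their POC distance from the current frame
    // and record the nearest reference frame of each list.
    // The lists are first sorted by POC distance here;
    // then, in x264_reference_check_reorder,
    // list0 uses i_frame_num to decide whether reordering is needed,
    // while list1 still uses POC.
    // x264_reference_check_reorder sets the reorder flags,
    // and x264_slice_header_init uses these two flags
    // to set the reordering flags
    // in the slice header, which x264_slice_write
    // then writes into the bitstream
    /* Order reference lists by distance from the current frame. */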
    for( int list = 0; list < 2; list++ )
    {
        h->fref_nearest[list] = h->fref[list][0];
        do
        {
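            // Equivalent to list0 in JM: sort the reference frames in ref0 by POC
            b_ok = 1;
            for( int i = 0; i < h->i_ref[list] - 1; i++ )
            {
                // Reference list 0 holds forward reference frames and reference list 1 holds backward reference frames
                // The forward list is sorted in descending order and the backward list in ascending order
                if( list ? h->fref[list][i+1]->i_poc < h->fref_nearest[list]->i_poc
                         : h->fref[list][i+1]->i_poc > h->fref_nearest[list]->i_poc )
                    h->fref_nearest[list] = h->fref[list][i+1];

                if( reference_distance( h, h->fref[list][i] ) > reference_distance( h, h->fref[list][i+1] ) )
                {
                    // If reference frame i is farther from the frame being encoded than reference frame i+1,
                    // swap the two reference frames in the list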
                    XCHG( x264_frame_t*, h->fref[list][i], h->fref[list][i+1] );
                    b_ok = 0;
                    break;
                }
            }
        } while( !b_ok );
    }

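    // Check whether the reference frame lists need reordering
    reference_check_reorder( h );

    // h->i_ref0 and h->i_ref1 obtained above are clamped below.
    // h->frames.i_max_ref1=1 in the initial parameter settings, so h->i_ref1 varies between 0 and 1,
    // while h->frames.i_max_ref0=2 and changes with
    // the command-line parameter h->param.i_frame_reference; each count is clamped to the smallest of these limits.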
    h->i_ref[1] = X264_MIN( h->i_ref[1], h->frames.i_max_ref1 );
    h->i_ref[0] = X264_MIN( h->i_ref[0], h->frames.i_max_ref0 );
    h->i_ref[0] = X264_MIN( h->i_ref[0], h->param.i_frame_reference ); // if reconfig() has lowered the limit

    /* For Blu-ray compliance, don't reference frames outside of the minigop. */
    if( IS_X264_TYPE_B( h->fenc->i_type ) && h->param.b_bluray_compat )
        h->i_ref[0] = X264_MIN( h->i_ref[0], IS_X264_TYPE_B( h->fref[0][0]->i_type ) + 1 );

    /* add duplicates */
    if( h->fenc->i_type == X264_TYPE_P )
    {
        int idx = -1;
        if( h->param.analyse.i_weighted_pred >= X264_WEIGHTP_SIMPLE )
        {
            x264_weight_t w[3];
            w[1].weightfn = w[2].weightfn = NULL;
            if( h->param.rc.b_stat_read )
                x264_ratecontrol_set_weights( h, h->fenc );

            if( !h->fenc->weight[0][0].weightfn )
            {
                h->fenc->weight[0][0].i_denom = 0;
                SET_WEIGHT( w[0], 1, 1, 0, -1 );
                idx = weighted_reference_duplicate( h, 0, w );
            }
            else
            {
                if( h->fenc->weight[0][0].i_scale == 1<<h->fenc->weight[0][0].i_denom )
                {
                    SET_WEIGHT( h->fenc->weight[0][0], 1, 1, 0, h->fenc->weight[0][0].i_offset );
                }
                weighted_reference_duplicate( h, 0, x264_weight_none );
                if( h->fenc->weight[0][0].i_offset > -128 )
                {
                    w[0] = h->fenc->weight[0][0];
                    w[0].i_offset--;
                    h->mc.weight_cache( h, &w[0] );
                    idx = weighted_reference_duplicate( h, 0, w );
                }
            }
        }
        h->mb.ref_blind_dupe = idx;
    }

    assert( h->i_ref[0] + h->i_ref[1] <= X264_REF_MAX );
    // h->mb.pic.i_fref[0] and h->mb.pic.i_fref[1] are important: they are used later when macroblocks are encoded against the reference frames
    h->mb.pic.i_fref[0] = h->i_ref[0];
    h->mb.pic.i_fref[1] = h->i_ref[1];
}
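
The core of reference_build_list(), splitting the reference frames around the current POC and ordering each list nearest-first, can be sketched as follows. This is an illustration under assumptions, not x264's implementation: frame_t, by_distance, and build_lists are hypothetical names, distance is measured in POC for simplicity (reference_distance() itself is not shown in the excerpt), and qsort replaces the swap loop. Weighted duplicates and the list-length limits are left out.

```c
#include <stdlib.h>

typedef struct { int poc; } frame_t;  /* hypothetical stand-in for x264_frame_t */

static int cur_poc;  /* POC of the frame being encoded, read by the comparator */

/* Order two frame pointers by absolute POC distance from cur_poc. */
static int by_distance(const void *a, const void *b)
{
    const frame_t *fa = *(const frame_t *const *)a;
    const frame_t *fb = *(const frame_t *const *)b;
    return abs(fa->poc - cur_poc) - abs(fb->poc - cur_poc);
}

/* Split refs (n entries) into list0 (POC < poc) and list1 (POC > poc),
 * each ordered nearest-first. The counts are returned through n0 and n1. */
static void build_lists(frame_t **refs, int n, int poc,
                        frame_t **list0, int *n0, frame_t **list1, int *n1)
{
    *n0 = *n1 = 0;
    for (int i = 0; i < n; i++) {
        if (refs[i]->poc < poc)
            list0[(*n0)++] = refs[i];
        else if (refs[i]->poc > poc)
            list1[(*n1)++] = refs[i];
    }
    cur_poc = poc;
    qsort(list0, (size_t)*n0, sizeof *list0, by_distance);
    qsort(list1, (size_t)*n1, sizeof *list1, by_distance);
}
```

For reference POCs 0, 2, 4, 8, 12 and a current POC of 6, list0 becomes 4, 2, 0 (descending POC) and list1 becomes 8, 12 (ascending POC), which matches the ordering described in the comments of the listing.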

NO.4  The following is the NAL encoding part. For i_nal_type == NAL_SLICE_IDR, the SPS, PPS, and SEI are written to the bitstream as well. So when the first IDR of a GOP is encoded, i_nal = 4, because four NAL units are written: SPS, PPS, SEI, and the IDR slice. For each non-IDR frame, i_nal = 1, since only the slice itself needs to be written.

So far, the reference frames have been initialized and sorted. The next step is reference picture reordering: the following analyzes how x264 writes the reordering information for the reference lists into the bitstream, and how ref0 is reordered.

h->b_ref_reorder[0] and h->b_ref_reorder[1] then determine whether h->fref0 and h->fref1 need to be reordered:

    if( h->param.rc.b_stat_read && h->sh.i_type != SLICE_TYPE_I )
    {
        x264_reference_build_list_optimal( h );
        reference_check_reorder( h );
    }

reference_check_reorder() function implementation:

/* Check to see whether we have chosen a reference list ordering different
 * from the standard's default. */
static inline void reference_check_reorder( x264_t *h )
{
    /* The reorder check doesn't check for missing frames, so just
     * force a reorder if one of the reference list is corrupt. */
    // If any reference frame is corrupt, force a reorder
    for( int i = 0; h->frames.reference[i]; i++ )
        if( h->frames.reference[i]->b_corrupt )
        {
            h->b_ref_reorder[0] = 1;
            return;
        }
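    // For P slices, list0 is checked by i_frame_num:
    // the default order is descending i_frame_num,
    // so framenum_diff > 0 means list0 must be reordered.
    // For B slices, both lists are checked by POC: list0 defaults to descending POC (reorder if poc_diff > 0),
    // list1 defaults to ascending POC (reorder if poc_diff < 0)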
    for( int list = 0; list <= (h->sh.i_type == SLICE_TYPE_B); list++ )
        for( int i = 0; i < h->i_ref[list] - 1; i++ )
        {
            int framenum_diff = h->fref[list][i+1]->i_frame_num - h->fref[list][i]->i_frame_num;
            int poc_diff = h->fref[list][i+1]->i_poc - h->fref[list][i]->i_poc;
            /* P and B-frames use different default orders. */
            if( h->sh.i_type == SLICE_TYPE_P ? framenum_diff > 0 : list == 1 ? poc_diff < 0 : poc_diff > 0 )
            {
                h->b_ref_reorder[list] = 1;
                return;
            }
        }
}
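
The ordering test at the heart of reference_check_reorder() can be restated as a pure function over one list. The sketch below is a simplified model: frame_t and needs_reorder are hypothetical names, and the corrupt-frame shortcut is omitted.

```c
typedef struct { int frame_num; int poc; } frame_t;  /* hypothetical */

/* Return 1 if the list deviates from the default order and therefore
 * needs explicit reordering commands, 0 otherwise.
 * P slices are checked by frame_num (default: descending).
 * B slices are checked by POC: descending for list 0, ascending for list 1. */
static int needs_reorder(frame_t *const *list, int n, int is_p_slice, int list_idx)
{
    for (int i = 0; i < n - 1; i++) {
        int framenum_diff = list[i + 1]->frame_num - list[i]->frame_num;
        int poc_diff      = list[i + 1]->poc - list[i]->poc;
        int out_of_order  = is_p_slice ? framenum_diff > 0
                          : list_idx == 1 ? poc_diff < 0
                          : poc_diff > 0;
        if (out_of_order)
            return 1;
    }
    return 0;
}
```

A list whose frame_num values run 3, 2, 1 is in default order for a P slice, while 2, 3, 1 is not and would set the reorder flag.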

C.  The next step is to act on the two flags set above, h->b_ref_reorder[0] and h->b_ref_reorder[1]. This is the third level of the overall process: the slice layer. The goal is to derive the reordering syntax elements and write them into the slice header.
Enter the slice_header_init function, called from slice_init( x264_t *h, int i_nal_type, int i_global_qp ):

Compute idc and arg, which encode the offset of each reference frame:

    sh->ref_pic_list_order[list][i].idc = ( diff > 0 );
    sh->ref_pic_list_order[list][i].arg = (abs(diff) - 1) & ((1 << sps->i_log2_max_frame_num) - 1);

These two lines come from the following part of slice_header_init:

    sh->b_num_ref_idx_override = 0;
    sh->i_num_ref_idx_l0_active = 1;
    sh->i_num_ref_idx_l1_active = 1;
    // use h->b_ref_reorder[0] and h->b_ref_reorder[1] set earlier
    sh->b_ref_pic_list_reordering[0] = h->b_ref_reorder[0];
    sh->b_ref_pic_list_reordering[1] = h->b_ref_reorder[1];

    /* If the ref list isn't in the default order, construct reordering header */
    for( int list = 0; list < 2; list++ )
    {
        if( sh->b_ref_pic_list_reordering[list] )
        {
            int pred_frame_num = i_frame;
            for( int i = 0; i < h->i_ref[list]; i++ )
            {
                int diff = h->fref[list][i]->i_frame_num - pred_frame_num;
                // compute sh->ref_pic_list_order[0][i].idc
                // and sh->ref_pic_list_order[0][i].arg here,
                // which serve as offsets in reference frame reordering
                sh->ref_pic_list_order[list][i].idc = ( diff > 0 );
                sh->ref_pic_list_order[list][i].arg = (abs(diff) - 1) & ((1 << sps->i_log2_max_frame_num) - 1);
                pred_frame_num = h->fref[list][i]->i_frame_num;
            }
        }
    }
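
A worked example makes the idc/arg computation concrete. The sketch below applies the same two formulas to a plain array of frame numbers. reorder_cmd_t and build_reorder_cmds are hypothetical names; in H.264 terms, idc 0 means the difference is subtracted from the predicted picture number, idc 1 means it is added, and arg carries abs_diff_pic_num_minus1.

```c
#include <stdlib.h>

typedef struct { int idc; int arg; } reorder_cmd_t;  /* hypothetical */

/* Fill cmds[] for a list of reference frame numbers.
 * cur_frame_num is the starting prediction; each command is coded
 * relative to the frame number selected by the previous command. */
static void build_reorder_cmds(const int *ref_frame_nums, int n, int cur_frame_num,
                               int log2_max_frame_num, reorder_cmd_t *cmds)
{
    int pred = cur_frame_num;
    int mask = (1 << log2_max_frame_num) - 1;
    for (int i = 0; i < n; i++) {
        int diff = ref_frame_nums[i] - pred;
        cmds[i].idc = diff > 0;                /* 0: subtract, 1: add */
        cmds[i].arg = (abs(diff) - 1) & mask;  /* abs_diff_pic_num_minus1 */
        pred = ref_frame_nums[i];
    }
}
```

With a current frame number of 5 and a desired list order of frame numbers 3, 4, 1, the differences are -2, +1, -3, so the commands are (idc 0, arg 1), (idc 1, arg 0), (idc 0, arg 2).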

After slice_init has run, slices_write( x264_t *h ) is entered. This is the key function of the third level of encoding.

The reference frame reordering syntax is written from within slice_write. The following part of the slice header code uses the flags derived from h->b_ref_reorder[0] and h->b_ref_reorder[1]:

    /* ref pic list reordering */
    if( sh->i_type != SLICE_TYPE_I )
    {
        bs_write1( s, sh->b_ref_pic_list_reordering[0] );
        if( sh->b_ref_pic_list_reordering[0] )
        {
            for( int i = 0; i < sh->i_num_ref_idx_l0_active; i++ )
            {
                bs_write_ue( s, sh->ref_pic_list_order[0][i].idc );
                bs_write_ue( s, sh->ref_pic_list_order[0][i].arg );
            }
            bs_write_ue( s, 3 );
        }
    }
    if( sh->i_type == SLICE_TYPE_B )
    {
        bs_write1( s, sh->b_ref_pic_list_reordering[1] );
        if( sh->b_ref_pic_list_reordering[1] )
        {
            for( int i = 0; i < sh->i_num_ref_idx_l1_active; i++ )
            {
                bs_write_ue( s, sh->ref_pic_list_order[1][i].idc );
                bs_write_ue( s, sh->ref_pic_list_order[1][i].arg );
            }
            bs_write_ue( s, 3 );
        }
    }
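
The idc and arg values above are written with bs_write_ue, that is, as unsigned Exp-Golomb codes, ue(v). To show what ends up in the bitstream, the sketch below produces the ue(v) code of a value as a string of '0' and '1' characters. ue_bits is a hypothetical helper for illustration and does not reproduce x264's bitstream writer.

```c
#include <stdint.h>

/* Write the ue(v) Exp-Golomb code of val as '0'/'1' characters into out
 * (NUL-terminated) and return the code length in bits.
 * out must hold at least 66 bytes. */
static int ue_bits(uint32_t val, char *out)
{
    uint64_t code = (uint64_t)val + 1;  /* value to emit after the zero prefix */
    int len = 0;
    for (uint64_t t = code; t > 1; t >>= 1)
        len++;                          /* len = floor(log2(code)) */
    int n = 0;
    for (int i = 0; i < len; i++)
        out[n++] = '0';                 /* len leading zeros */
    for (int i = len; i >= 0; i--)
        out[n++] = ((code >> i) & 1) ? '1' : '0';  /* code in len + 1 bits */
    out[n] = '\0';
    return n;
}
```

The terminating bs_write_ue( s, 3 ) in the listing, which ends the reordering loop, therefore costs 5 bits (00100), while an idc of 0 costs a single bit (1).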

After the reference frame ordering above is complete, return to the overall process for a brief look at the third level, the encoding of the slice layer.
The most important function is slice_write() (Encoder.c L1601).
The main part of this function is the loop

while( (mb_xy = i_mb_x + i_mb_y * h->sps->i_mb_width) <= h->sh.i_last_mb )

Inside the loop, each macroblock goes through intra and inter prediction, motion estimation, motion compensation, 4×4 DCT, quantization, zigzag scanning, P_SKIP and B_SKIP mode decision, entropy coding, and so on. This is the core of the encoder.
x264_macroblock_analyse( h ) performs intra prediction analysis and inter motion estimation, and saves the motion vectors;
x264_macroblock_encode( h ) applies the 4×4 DCT to the residual, quantizes it, performs the zigzag scan, and reconstructs a reference frame that stays in sync with the decoder.
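
The macroblock loop can be reduced to its raster-scan skeleton. The sketch below is a simplified model that assumes plain progressive coding; walk_macroblocks and the visit callback are hypothetical names, and the callback stands in for the analyse and encode steps.

```c
/* Visit macroblocks first_mb..last_mb in raster order on a grid that is
 * mb_width macroblocks wide, and return how many were visited.
 * visit may be NULL; otherwise it is called with (mb_x, mb_y, mb_xy). */
static int walk_macroblocks(int mb_width, int first_mb, int last_mb,
                            void (*visit)(int mb_x, int mb_y, int mb_xy))
{
    int mb_x = first_mb % mb_width;
    int mb_y = first_mb / mb_width;
    int mb_xy, count = 0;
    while ((mb_xy = mb_x + mb_y * mb_width) <= last_mb) {
        if (visit)
            visit(mb_x, mb_y, mb_xy);  /* analyse + encode would happen here */
        count++;
        if (++mb_x == mb_width) {      /* end of row: wrap to the next one */
            mb_x = 0;
            mb_y++;
        }
    }
    return count;
}
```

For a picture 4 macroblocks wide, a slice covering macroblocks 0 to 11 visits 12 macroblocks across three rows.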

D.   Finally, the fourth level of the overall process, the encoding of the macroblock layer.
Functions to be analyzed: x264_macroblock_analyse( h ) and x264_macroblock_encode( h ).
x264_macroblock_analyse( x264_t *h ) (Analyse.c L2337) analyzes the coding cost of the various possible intra and inter prediction modes to find the most suitable prediction mode.
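
Choosing "the most suitable prediction mode" by cost can be illustrated with a generic rate-distortion style selection. This is a sketch of the general idea, not of x264_macroblock_analyse itself: candidate_t, best_mode, the cost formula distortion + lambda * bits, and the numbers in the usage note are all assumptions made for the example.

```c
#include <limits.h>

typedef struct {
    const char *name;  /* e.g. "I16x16", "P16x16" */
    int distortion;    /* e.g. SAD or SATD of the prediction residual */
    int bits;          /* estimated bits needed to signal the mode */
} candidate_t;         /* hypothetical */

/* Return the index of the candidate minimizing distortion + lambda * bits,
 * or -1 if n == 0. Ties keep the earlier candidate. */
static int best_mode(const candidate_t *c, int n, int lambda)
{
    int best = -1, best_cost = INT_MAX;
    for (int i = 0; i < n; i++) {
        int cost = c[i].distortion + lambda * c[i].bits;
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

With candidates (900, 10), (400, 30), and (350, 60) as (distortion, bits) pairs, lambda = 4 gives costs 940, 520, 590 and selects the second mode, while lambda = 1 gives 910, 430, 410 and selects the third. A larger lambda penalizes modes that are expensive to signal.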

Origin blog.csdn.net/BigDream123/article/details/125451673