x265 intra-frame block division RDO process

1. Intra-frame CU, PU, and TU

CU: Coding Unit. For intra coding in H.265, the maximum CU size is 64x64 and the minimum is 8x8, and a CU is always a square block

PU: Prediction Unit. For intra coding, H.265 supports two PU partition types:

  • SIZE_2Nx2N: the CU is not subdivided for prediction; the PU size equals the CU size
  • SIZE_NxN: available only for 8x8 CUs; the CU is quadtree-split once into four 4x4 PUs for prediction

TU: Transform Unit. Supported sizes are 32x32, 16x16, 8x8, and 4x4

The relationship between PU and TU: both are partitioned directly from the CU, so there is no fixed relationship between their sizes, except that neither may be larger than the CU. In general, a PU can contain multiple TUs and a TU can span multiple PUs. For intra coding, however, adjacent PUs are dependent: the current PU must reference neighboring, already-coded PUs when making its prediction. Therefore, in intra coding a PU can contain multiple TUs, but a TU can correspond to at most one PU.

2. Intra-frame block partitioning process in x265

In x265, the RDO process for intra block partitioning is carried out in the compressIntraCU function. Partitioning proceeds recursively, and H.265 supports only quadtree splits.

Taking a 64x64 CU as an example, the partitioning process is as follows:

  1. Start from the root 64x64 CU; a quadtree split yields the first 32x32 CU
  2. For the first 32x32 CU, first call the checkIntra function to run RDO over the intra prediction modes and compute its RD cost; then quadtree-split the 32x32 CU into four 16x16 CUs
  3. For the first 16x16 CU, first call the checkIntra function to run RDO over the intra prediction modes and compute its RD cost; then quadtree-split the 16x16 CU into four 8x8 CUs
  4. For each of the four 8x8 CUs, call the checkIntra function to compute its RD cost
  5. Return to the 16x16 CU of step 3 and compare its unsplit RD cost with the sum of the four 8x8 RD costs from step 4; this comparison decides whether the 16x16 CU is split into four 8x8 CUs
  6. Process the second, third, and fourth 16x16 CUs in the same way, and accumulate the optimal RD costs of all four 16x16 CUs
  7. Return to the 32x32 CU of step 2 and compare its unsplit RD cost with the accumulated sum of the four 16x16 RD costs from step 6, thereby deciding whether to quadtree-split the 32x32 CU
  8. In the same way, compute the optimal RD costs of the second, third, and fourth 32x32 CUs and decide whether each of them is split

(Note: in theory the largest intra CU here should be 64x64, but as I read the x265 code there is no 64x64 intra CU; corrections are welcome.)

For the RDO of intra prediction modes, see: https://blog.csdn.net/BigDream123/article/details/112383895

The annotated code is as follows:

uint64_t Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
{
    uint32_t depth = cuGeom.depth;
    ModeDepth& md = m_modeDepth[depth];
    md.bestMode = NULL;

    bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
    bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);

    bool bAlreadyDecided = m_param->intraRefine != 4 && parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] != (uint8_t)ALL_IDX && !(m_param->bAnalysisType == HEVC_INFO);
    bool bDecidedDepth = m_param->intraRefine != 4 && parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
    int split = 0;
    if (m_param->intraRefine && m_param->intraRefine != 4)
    {
        split = m_param->scaleFactor && bDecidedDepth && (!mightNotSplit || 
            ((cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1))));
        if (cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize]) && !bDecidedDepth)
            bAlreadyDecided = false;
    }

    if (bAlreadyDecided)
    {
        if (bDecidedDepth && mightNotSplit)
        {
            Mode& mode = md.pred[0];
            md.bestMode = &mode;
            mode.cu.initSubCU(parentCTU, cuGeom, qp);
            bool reuseModes = !((m_param->intraRefine == 3) ||
                                (m_param->intraRefine == 2 && parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] > DC_IDX));
            if (reuseModes)
            {
                memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
                memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
            }
            checkIntra(mode, cuGeom, (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx]);

            if (m_bTryLossless)
                tryLossless(cuGeom);

            if (mightSplit)
                addSplitFlagCost(*md.bestMode, cuGeom.depth);
        }
    }
    else if (cuGeom.log2CUSize != MAX_LOG2_CU_SIZE && mightNotSplit)
    {   // If the current size is not the maximum CU size (64x64) and the CU might not be split further, start selecting a prediction mode
        md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);
        checkIntra(md.pred[PRED_INTRA], cuGeom, SIZE_2Nx2N);
        checkBestMode(md.pred[PRED_INTRA], depth);
		
        if (cuGeom.log2CUSize == 3 && m_slice->m_sps->quadtreeTULog2MinSize < 3)
        {   // If the current CU is 8x8, also compute the RD cost of predicting it as four 4x4 PUs (SIZE_NxN)
            md.pred[PRED_INTRA_NxN].cu.initSubCU(parentCTU, cuGeom, qp);
            checkIntra(md.pred[PRED_INTRA_NxN], cuGeom, SIZE_NxN);
            checkBestMode(md.pred[PRED_INTRA_NxN], depth);
        }

        if (m_bTryLossless)
            tryLossless(cuGeom);

        if (mightSplit)
            addSplitFlagCost(*md.bestMode, cuGeom.depth);
    }

    // stop recursion if we reach the depth of previous analysis decision
    mightSplit &= !(bAlreadyDecided && bDecidedDepth) || split;

    if (mightSplit)
    {   // If the CU may be split further, recurse into the four sub-CUs
        Mode* splitPred = &md.pred[PRED_SPLIT];
        splitPred->initCosts();
        CUData* splitCU = &splitPred->cu;
        splitCU->initSubCU(parentCTU, cuGeom, qp);

        uint32_t nextDepth = depth + 1;
        ModeDepth& nd = m_modeDepth[nextDepth];
        invalidateContexts(nextDepth);
        Entropy* nextContext = &m_rqt[depth].cur;
        int32_t nextQP = qp;
        uint64_t curCost = 0;
        int skipSplitCheck = 0;

        for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
        {
            const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);
            if (childGeom.flags & CUGeom::PRESENT)
            {
                m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);
                m_rqt[nextDepth].cur.load(*nextContext);

                if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
                    nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));

                if (m_param->bEnableSplitRdSkip)
                {
                    curCost += compressIntraCU(parentCTU, childGeom, nextQP);
                    // If the accumulated split cost already exceeds the best mode's RD cost, stop splitting early
                    if (m_modeDepth[depth].bestMode && curCost > m_modeDepth[depth].bestMode->rdCost)
                    {
                        skipSplitCheck = 1;
                        break;
                    }
                }
                else
                    compressIntraCU(parentCTU, childGeom, nextQP);

                // Save best CU and pred data for this sub CU
                splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
                splitPred->addSubCosts(*nd.bestMode);
                nd.bestMode->reconYuv.copyToPartYuv(splitPred->reconYuv, childGeom.numPartitions * subPartIdx);
                nextContext = &nd.bestMode->contexts;
            }
            else
            {
                /* record the depth of this non-present sub-CU */
                splitCU->setEmptyPart(childGeom, subPartIdx);

                /* Set depth of non-present CU to 0 to ensure that correct CU is fetched as reference to code deltaQP */
                if (bAlreadyDecided)
                    memset(parentCTU.m_cuDepth + childGeom.absPartIdx, 0, childGeom.numPartitions);
            }
        }
        if (!skipSplitCheck)
        {
            nextContext->store(splitPred->contexts);
            if (mightNotSplit)
                addSplitFlagCost(*splitPred, cuGeom.depth);
            else
                updateModeCost(*splitPred);

            checkDQPForSplitPred(*splitPred, cuGeom);
            checkBestMode(*splitPred, depth);
        }
    }

    if (m_param->bEnableRdRefine && depth <= m_slice->m_pps->maxCuDQPDepth)
    {
        int cuIdx = (cuGeom.childOffset - 1) / 3;
        cacheCost[cuIdx] = md.bestMode->rdCost;
    }

    if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
    {
        CUData* ctu = md.bestMode->cu.m_encData->getPicCTU(parentCTU.m_cuAddr);
        int8_t maxTUDepth = -1;
        for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
            maxTUDepth = X265_MAX(maxTUDepth, md.bestMode->cu.m_tuDepth[i]);
        ctu->m_refTuDepth[cuGeom.geomRecurId] = maxTUDepth;
    }

    /* Copy best data to encData CTU and recon */
    md.bestMode->cu.copyToPic(depth);
    if (md.bestMode != &md.pred[PRED_SPLIT])
        md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);

    return md.bestMode->rdCost;
}

Origin: blog.csdn.net/BigDream123/article/details/112384849