H.266/VVC code learning: Dependent scalar quantization

The quantization module of VVC introduces a new tool: dependent scalar quantization (DQ). In dependent scalar quantization, the set of admissible reconstruction values for a transform coefficient depends on the values of the transform coefficient levels that precede the current level in reconstruction order. Compared with the conventional independent scalar quantization used in HEVC, the main effect is that the admissible reconstruction vectors are packed more densely in the N-dimensional vector space (N is the number of transform coefficients in the transform block). In other words, for a given average number of admissible reconstruction vectors per N-dimensional unit volume, the average distortion between an input vector and the nearest reconstruction vector is reduced.

Dependent scalar quantization is implemented by:

  • Defining two scalar quantizers with different reconstruction levels;
  • Defining a procedure for switching between the two scalar quantizers.

 The two scalar quantizers are denoted Q0 and Q1, as shown in the figure above. The positions of the available reconstruction levels are uniquely specified by the quantization step size Δ (delta): the reconstruction levels of Q0 are the even integer multiples of Δ, and those of Q1 are the odd integer multiples of Δ plus zero. The scalar quantizer in use (Q0 or Q1) is not signaled explicitly in the bitstream. Instead, the quantizer used for the current transform coefficient is determined by the parities of the transform coefficient levels that precede the current transform coefficient in coding/reconstruction order.
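
The relationship between a level k and its reconstructed value in the two quantizers can be sketched in a few lines. This is only an illustration of the rule described above, not code from VTM; the function names reconMultiple and reconstruct are made up for this example.

```cpp
#include <cassert>

// Reconstruction value for level k, expressed as an integer multiple of the
// step size delta.
//   useQ1 == false -> Q0: even multiples of delta (..., -4, -2, 0, 2, 4, ...)
//   useQ1 == true  -> Q1: odd multiples of delta plus zero (..., -3, -1, 0, 1, 3, ...)
// Hypothetical helper, named for illustration only.
inline int reconMultiple(int k, bool useQ1)
{
  const int sgn = (k > 0) - (k < 0);   // sign of k: -1, 0 or +1
  return useQ1 ? 2 * k - sgn : 2 * k;
}

// Reconstructed coefficient value for a given step size delta.
inline double reconstruct(int k, bool useQ1, double delta)
{
  return reconMultiple(k, useQ1) * delta;
}
```

Level 2 therefore reconstructs to 4·delta in Q0 and to 3·delta in Q1, while level 0 reconstructs to zero in both.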

 As shown in the figure above, switching between the two scalar quantizers (Q0 and Q1) is driven by a state machine with 4 states. The state takes one of four values: 0, 1, 2, 3, and is uniquely determined by the parities of the transform coefficient levels that precede the current transform coefficient in coding/reconstruction order. States 0 and 1 select Q0; states 2 and 3 select Q1. At the start of the inverse quantization of a transform block, the state is set to 0. The transform coefficients are reconstructed in scan order (the same order in which they are entropy decoded). After the current transform coefficient has been reconstructed, the state is updated as shown in the figure, where k denotes the value of the transform coefficient level.
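
The decoder side of this state machine can be sketched as follows. It is a simplified illustration, not VTM code: the names kStateTrans and dequantMultiples are made up, the levels are assumed to be given in coding/reconstruction order, and the output is expressed as integer multiples of the step size instead of scaled coefficient values.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// State transition table: nextState = kStateTrans[state][parity of k].
// States 0 and 1 use Q0, states 2 and 3 use Q1.
static const int kStateTrans[4][2] = { { 0, 2 }, { 2, 0 }, { 1, 3 }, { 3, 1 } };

// Dequantize a sequence of levels (in coding/reconstruction order) into
// integer multiples of the step size delta. Hypothetical helper.
inline std::vector<int> dequantMultiples( const std::vector<int>& levels )
{
  std::vector<int> out;
  out.reserve( levels.size() );
  int state = 0;                                   // the state starts at 0 for every block
  for( int k : levels )
  {
    const int  sgn   = ( k > 0 ) - ( k < 0 );
    const bool useQ1 = ( state >> 1 ) != 0;        // states 2 and 3 select Q1
    out.push_back( useQ1 ? 2 * k - sgn : 2 * k );  // Q1: odd multiples, Q0: even multiples
    state = kStateTrans[state][std::abs( k ) & 1]; // update with the parity of k
  }
  return out;
}
```

For the levels {1, 1, 0, -2, 3} this yields the multiples {2, 1, 0, -3, 5}: the first odd level moves the state from 0 to 2, so all later coefficients in this example are reconstructed with Q1.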

In the VTM code, the entry function of DQ is the DepQuant::quant function. The main steps are as follows:

(1) Pre-initialize quantization related parameters

(2) Find the first test position of the TU, i.e. the first coefficient in reverse scan order whose magnitude exceeds the zero threshold

(3) Initialize all state parameters

(4) Traverse all coefficients from back to front and, for each coefficient, find the candidate quantization state chains, the quantized coefficient, and the RD cost associated with each quantization state (implemented in the xDecideAndUpdate function).

(5) Determine the quantization state chain with the minimum RD cost.

(6) Following the optimal quantization state chain, scan all coefficients in the forward direction and store the corresponding quantized transform coefficient levels.
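
Steps (4) to (6) amount to a Viterbi search over the four quantizer states. The following is a minimal, self-contained sketch of that idea and not the VTM implementation: the names trellisQuant, TrellisNode and rateProxy are made up for illustration, the rate model is a crude placeholder rather than the CABAC-based estimate used in VTM, costs are measured in units of the squared step size, and the input is assumed to be in coding order already.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

static const int kTrellisNext[4][2] = { { 0, 2 }, { 2, 0 }, { 1, 3 }, { 3, 1 } };

struct TrellisNode { double cost; int prevState; int absLevel; };

// Reconstruction multiple of absolute level a in quantizer q (0 = Q0, 1 = Q1).
inline int reconMult( int a, int q ) { return a == 0 ? 0 : 2 * a - q; }

// Placeholder rate estimate in bits (an assumption, NOT the VTM rate model).
inline double rateProxy( int a ) { return a == 0 ? 1.0 : 2.0 + 2.0 * std::log2( 1.0 + a ); }

inline std::vector<int> trellisQuant( const std::vector<double>& coeff, double delta, double lambda )
{
  const double      inf = std::numeric_limits<double>::infinity();
  const std::size_t n   = coeff.size();
  std::vector<std::vector<TrellisNode>> trellis( n, std::vector<TrellisNode>( 4, TrellisNode{ inf, -1, 0 } ) );
  double prevCost[4] = { 0.0, inf, inf, inf };           // every block starts in state 0

  // populate the trellis: best way to arrive in each state at each position
  for( std::size_t i = 0; i < n; i++ )
  {
    const double x = std::fabs( coeff[i] ) / delta;      // magnitude in units of delta
    for( int s = 0; s < 4; s++ )
    {
      if( prevCost[s] == inf ) continue;                 // state not reachable yet
      const int q       = s >> 1;                        // quantizer selected by state s
      const int a0      = int( ( x + q ) / 2.0 );        // largest level not above x
      const int cand[3] = { 0, a0, a0 + 1 };
      for( int a : cand )
      {
        const double err  = x - reconMult( a, q );
        const double cost = prevCost[s] + err * err + lambda * rateProxy( a );
        TrellisNode& node = trellis[i][kTrellisNext[s][a & 1]];
        if( cost < node.cost ) node = TrellisNode{ cost, s, a };
      }
    }
    for( int s = 0; s < 4; s++ ) prevCost[s] = trellis[i][s].cost;
  }

  // pick the cheapest final state, then follow the back-pointers
  int best = 0;
  for( int s = 1; s < 4; s++ ) if( prevCost[s] < prevCost[best] ) best = s;
  std::vector<int> levels( n, 0 );
  for( std::size_t i = n; i-- > 0; )
  {
    const TrellisNode& node = trellis[i][best];
    levels[i] = coeff[i] < 0 ? -node.absLevel : node.absLevel;   // restore the sign
    best      = node.prevState;
  }
  return levels;
}
```

With lambda = 0, delta = 1 and the input {2.0, -3.0}, the sketch returns the levels {1, -2}: the first coefficient lies on the Q0 grid (2 = 2·1), its odd level moves the state from 0 to 2, and the second coefficient is then matched exactly by Q1 (3 = 2·2 - 1).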

The VTM code is as follows:

  void DepQuant::quant( TransformUnit& tu, const CCoeffBuf& srcCoeff, const ComponentID compID, const QpParam& cQP, const double lambda, const Ctx& ctx, TCoeff& absSum, bool enableScalingLists, int* quantCoeff )
  {
    CHECKD( tu.cs->sps->getSpsRangeExtension().getExtendedPrecisionProcessingFlag(), "ext precision is not supported" );

    //===== reset / pre-init =====
    const TUParameters& tuPars  = *g_Rom.getTUPars( tu.blocks[compID], compID );
    m_quant.initQuantBlock    ( tu, compID, cQP, lambda ); // initialize the quantization parameters
    TCoeff*       qCoeff      = tu.getCoeffs( compID ).buf;
    const TCoeff* tCoeff      = srcCoeff.buf;
    const int     numCoeff    = tu.blocks[compID].area();
    ::memset( tu.getCoeffs( compID ).buf, 0x00, numCoeff*sizeof(TCoeff) );
    absSum          = 0;

    const CompArea& area     = tu.blocks[ compID ];
    const uint32_t  width    = area.width;
    const uint32_t  height   = area.height;
    const uint32_t  lfnstIdx = tu.cu->lfnstIdx;
    //===== scaling matrix ====
    //const int         qpDQ = cQP.Qp + 1;
    //const int         qpPer = qpDQ / 6;
    //const int         qpRem = qpDQ - 6 * qpPer;

    //TCoeff thresTmp = thres;
    bool zeroOut = false;
    bool zeroOutforThres = false;
    int effWidth = tuPars.m_width, effHeight = tuPars.m_height;
    // MTS or SBT mode: decide whether the high-frequency coefficients are zeroed out
    if( ( tu.mtsIdx[compID] > MTS_SKIP || (tu.cs->sps->getUseMTS() && tu.cu->sbtInfo != 0 && tuPars.m_height <= 32 && tuPars.m_width <= 32)) && compID == COMPONENT_Y)
    {
      effHeight = (tuPars.m_height == 32) ? 16 : tuPars.m_height;
      effWidth = (tuPars.m_width == 32) ? 16 : tuPars.m_width;
      zeroOut = (effHeight < tuPars.m_height || effWidth < tuPars.m_width); // true if high-frequency zero-out applies
    }
    zeroOutforThres = zeroOut || (32 < tuPars.m_height || 32 < tuPars.m_width);
    //===== find first test position =====
    int firstTestPos = numCoeff - 1;
    if (lfnstIdx > 0 && tu.mtsIdx[compID] != MTS_SKIP && width >= 4 && height >= 4)
    {
      firstTestPos = ( ( width == 4 && height == 4 ) || ( width == 8 && height == 8 ) )  ? 7 : 15 ;
    }
    const TCoeff defaultQuantisationCoefficient = (TCoeff)m_quant.getQScale();
    const TCoeff thres = m_quant.getLastThreshold();
    for( ; firstTestPos >= 0; firstTestPos-- ) // scan backward to find the first coefficient above the threshold
    {
      if (zeroOutforThres && (tuPars.m_scanId2BlkPos[firstTestPos].x >= ((tuPars.m_width == 32 && zeroOut) ? 16 : 32)
                           || tuPars.m_scanId2BlkPos[firstTestPos].y >= ((tuPars.m_height == 32 && zeroOut) ? 16 : 32)))
        continue;
      TCoeff thresTmp = (enableScalingLists) ? TCoeff(thres / (4 * quantCoeff[tuPars.m_scanId2BlkPos[firstTestPos].idx]))
                                             : TCoeff(thres / (4 * defaultQuantisationCoefficient));

      if (abs(tCoeff[tuPars.m_scanId2BlkPos[firstTestPos].idx]) > thresTmp)
      {
        break;
      }
    }
    if( firstTestPos < 0 )
    {
      return;
    }

    //===== real init: initialize all states =====
    RateEstimator::initCtx( tuPars, tu, compID, ctx.getFracBitsAcess() );
    m_commonCtx.reset( tuPars, *this );
    for( int k = 0; k < 12; k++ )
    {
      m_allStates[k].init();
    }
    m_startState.init();

    // bounds of the region that can still hold coefficients after high-frequency zero-out
    int effectWidth = std::min(32, effWidth);
    int effectHeight = std::min(32, effHeight);
    for (int k = 0; k < 12; k++)
    {
      m_allStates[k].effWidth = effectWidth;
      m_allStates[k].effHeight = effectHeight;
    }
    m_startState.effWidth = effectWidth;
    m_startState.effHeight = effectHeight;

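    //===== populate trellis: try the different states =====
    // traverse all coefficients from back to front and find the best quantization state chain for each coefficient
    // the scan order used here is the reverse of the 4x4 sub-block scan order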
    for( int scanIdx = firstTestPos; scanIdx >= 0; scanIdx-- )
    {
      const ScanInfo& scanInfo = tuPars.m_scanInfo[ scanIdx ];
      if (enableScalingLists)
      {
        m_quant.initQuantBlock(tu, compID, cQP, lambda, quantCoeff[scanInfo.rasterPos]);
        xDecideAndUpdate( abs( tCoeff[scanInfo.rasterPos]), scanInfo, (zeroOut && (scanInfo.posX >= effWidth || scanInfo.posY >= effHeight)), quantCoeff[scanInfo.rasterPos] );
      }
      else
        xDecideAndUpdate( abs( tCoeff[scanInfo.rasterPos]), scanInfo, (zeroOut && (scanInfo.posX >= effWidth || scanInfo.posY >= effHeight)), defaultQuantisationCoefficient );
    }

    //===== find best path: the quantization state chain with the minimum RD cost =====
    Decision  decision    = { std::numeric_limits<int64_t>::max(), -1, -2 };
    int64_t   minPathCost =  0;
    for( int8_t stateId = 0; stateId < 4; stateId++ )
    {
      int64_t pathCost = m_trellis[0][stateId].rdCost;
      if( pathCost < minPathCost )
      {
        decision.prevId = stateId;
        minPathCost     = pathCost;
      }
    }

    //===== backward scanning =====
    // follow the best state chain found above forward through all coefficients
    int scanIdx = 0;
    for( ; decision.prevId >= 0; scanIdx++ )
    {
      decision          = m_trellis[ scanIdx ][ decision.prevId ];
      int32_t blkpos    = tuPars.m_scanId2BlkPos[scanIdx].idx;
      qCoeff[ blkpos ]  = ( tCoeff[ blkpos ] < 0 ? -decision.absLevel : decision.absLevel ); // quantized coefficient with the sign restored
      absSum           += decision.absLevel;
    }
  }

}; // namespace DQIntern

The xDecideAndUpdate function calls xDecide to determine, for each transform coefficient, the candidate quantization states, quantized levels, and RD costs, and then updates the RD cost of each quantization state chain.

  void DepQuant::xDecideAndUpdate( const TCoeff absCoeff, const ScanInfo& scanInfo, bool zeroOut, int quantCoeff )
  {
    Decision* decisions = m_trellis[ scanInfo.scanIdx ];

    std::swap( m_prevStates, m_currStates );

    xDecide( scanInfo.spt, absCoeff, lastOffset(scanInfo.scanIdx), decisions, zeroOut, quantCoeff );

    if( scanInfo.scanIdx )
    {
      if( scanInfo.eosbb )
      {
        m_commonCtx.swap();
        m_currStates[0].updateStateEOS( scanInfo, m_prevStates, m_skipStates, decisions[0] );
        m_currStates[1].updateStateEOS( scanInfo, m_prevStates, m_skipStates, decisions[1] );
        m_currStates[2].updateStateEOS( scanInfo, m_prevStates, m_skipStates, decisions[2] );
        m_currStates[3].updateStateEOS( scanInfo, m_prevStates, m_skipStates, decisions[3] );
        ::memcpy( decisions+4, decisions, 4*sizeof(Decision) );
      }
      else if( !zeroOut )
      {
        switch( scanInfo.nextNbInfoSbb.num )
        {
        case 0:
          // set the RD cost of current state 0 to decisions[0].rdCost
          m_currStates[0].updateState<0>( scanInfo, m_prevStates, decisions[0] );
          // set the RD cost of current state 1 to decisions[1].rdCost
          m_currStates[1].updateState<0>( scanInfo, m_prevStates, decisions[1] );
          // set the RD cost of current state 2 to decisions[2].rdCost
          m_currStates[2].updateState<0>( scanInfo, m_prevStates, decisions[2] );
          // set the RD cost of current state 3 to decisions[3].rdCost
          m_currStates[3].updateState<0>( scanInfo, m_prevStates, decisions[3] );
          break;
        case 1:
          m_currStates[0].updateState<1>( scanInfo, m_prevStates, decisions[0] );
          m_currStates[1].updateState<1>( scanInfo, m_prevStates, decisions[1] );
          m_currStates[2].updateState<1>( scanInfo, m_prevStates, decisions[2] );
          m_currStates[3].updateState<1>( scanInfo, m_prevStates, decisions[3] );
          break;
        case 2:
          m_currStates[0].updateState<2>( scanInfo, m_prevStates, decisions[0] );
          m_currStates[1].updateState<2>( scanInfo, m_prevStates, decisions[1] );
          m_currStates[2].updateState<2>( scanInfo, m_prevStates, decisions[2] );
          m_currStates[3].updateState<2>( scanInfo, m_prevStates, decisions[3] );
          break;
        case 3:
          m_currStates[0].updateState<3>( scanInfo, m_prevStates, decisions[0] );
          m_currStates[1].updateState<3>( scanInfo, m_prevStates, decisions[1] );
          m_currStates[2].updateState<3>( scanInfo, m_prevStates, decisions[2] );
          m_currStates[3].updateState<3>( scanInfo, m_prevStates, decisions[3] );
          break;
        case 4:
          m_currStates[0].updateState<4>( scanInfo, m_prevStates, decisions[0] );
          m_currStates[1].updateState<4>( scanInfo, m_prevStates, decisions[1] );
          m_currStates[2].updateState<4>( scanInfo, m_prevStates, decisions[2] );
          m_currStates[3].updateState<4>( scanInfo, m_prevStates, decisions[3] );
          break;
        default:
          m_currStates[0].updateState<5>( scanInfo, m_prevStates, decisions[0] );
          m_currStates[1].updateState<5>( scanInfo, m_prevStates, decisions[1] );
          m_currStates[2].updateState<5>( scanInfo, m_prevStates, decisions[2] );
          m_currStates[3].updateState<5>( scanInfo, m_prevStates, decisions[3] );
        }
      }

      if( scanInfo.spt == SCAN_SOCSBB )
      {
        std::swap( m_prevStates, m_skipStates );
      }
    }
  }

  void DepQuant::xDecide( const ScanPosType spt, const TCoeff absCoeff, const int lastOffset, Decision* decisions, bool zeroOut, int quanCoeff)
  {
    ::memcpy( decisions, startDec, 8*sizeof(Decision) );

    if( zeroOut )
    {
      if( spt==SCAN_EOCSBB )
      {
        m_skipStates[0].checkRdCostSkipSbbZeroOut( decisions[0] );
        m_skipStates[1].checkRdCostSkipSbbZeroOut( decisions[1] );
        m_skipStates[2].checkRdCostSkipSbbZeroOut( decisions[2] );
        m_skipStates[3].checkRdCostSkipSbbZeroOut( decisions[3] );
      }
      return;
    }
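    // storage for the four pre-quantization candidates
    PQData  pqData[4];
    // pre-quantize absCoeff into four candidates, each holding a transform coefficient level and its associated cost;
    // candidates 0 and 3 have even levels, candidates 1 and 2 have odd levels
    m_quant.preQuantCoeff( absCoeff, pqData, quanCoeff );
    // previous state 0: the current state can be 0 or 2; update decisions[0] / decisions[2] according to the RD cost
    m_prevStates[0].checkRdCosts( spt, pqData[0], pqData[2], decisions[0], decisions[2]);
    // previous state 1: the current state can be 2 or 0; update decisions[2] / decisions[0] according to the RD cost
    m_prevStates[1].checkRdCosts( spt, pqData[0], pqData[2], decisions[2], decisions[0]);
    // previous state 2: the current state can be 1 or 3; update decisions[1] / decisions[3] according to the RD cost
    m_prevStates[2].checkRdCosts( spt, pqData[3], pqData[1], decisions[1], decisions[3]);
    // previous state 3: the current state can be 3 or 1; update decisions[3] / decisions[1] according to the RD cost
    m_prevStates[3].checkRdCosts( spt, pqData[3], pqData[1], decisions[3], decisions[1]);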
    if( spt==SCAN_EOCSBB )
    {
        m_skipStates[0].checkRdCostSkipSbb( decisions[0] );
        m_skipStates[1].checkRdCostSkipSbb( decisions[1] );
        m_skipStates[2].checkRdCostSkipSbb( decisions[2] );
        m_skipStates[3].checkRdCostSkipSbb( decisions[3] );
    }
    // initialize the RD cost of states 0 and 2 for a path that starts at this coefficient
    m_startState.checkRdCostStart( lastOffset, pqData[0], decisions[0] );
    m_startState.checkRdCostStart( lastOffset, pqData[2], decisions[2] );
  }
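
The pairing of pqData[0]/pqData[2] with previous states 0 and 1, and of pqData[1]/pqData[3] with previous states 2 and 3, is easier to see in a simplified model of the pre-quantization step. The sketch below is an assumption-laden floating-point illustration, not the VTM preQuantCoeff code: the names PQCand and preQuantSketch are made up, the rounding offset of 1.5 is assumed, and the fixed-point scaling of VTM is left out. Each candidate is stored at index qIdx & 3, where qIdx is the reconstruction value in units of the step size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One pre-quantization candidate (hypothetical struct for this sketch).
struct PQCand
{
  int    qIdx;      // reconstruction value as a multiple of the step size
  int    absLevel;  // absolute transform coefficient level that is coded
  double deltaDist; // distortion change relative to quantizing to zero
};

// x is |coefficient| / delta. Fills pq[0..3] with four consecutive candidates.
inline void preQuantSketch( double x, PQCand pq[4] )
{
  int qIdx = std::max( 1, int( std::floor( x - 1.5 ) ) );   // assumed rounding offset
  for( int i = 0; i < 4; i++, qIdx++ )
  {
    PQCand& c   = pq[qIdx & 3];                              // slot chosen by qIdx modulo 4
    c.qIdx      = qIdx;
    c.absLevel  = ( qIdx + 1 ) >> 1;                         // even qIdx: qIdx/2, odd qIdx: (qIdx+1)/2
    c.deltaDist = ( x - qIdx ) * ( x - qIdx ) - x * x;
  }
}
```

Slots 0 and 2 always hold an even qIdx, i.e. a Q0 reconstruction point, which is why they are offered to previous states 0 and 1; slots 1 and 3 hold an odd qIdx, a Q1 point, and are offered to previous states 2 and 3. Within each pair the level parity differs (slot 0 even, slot 2 odd; slot 3 even, slot 1 odd), and that parity selects the next state.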

There are still many details that I have not understood yet. I will add them later when I have time.

Origin blog.csdn.net/BigDream123/article/details/106411490