H.266/VVC technology learning: Motion Compensated Temporal Filter (MCTF)

Motion compensated temporal filter (MCTF)

VTM supports a motion compensated temporal filter (MCTF). MCTF is a pre-encoding processing tool: before a video frame is encoded, the frame is temporally filtered. The tool is controlled by the TemporalFilter option in the CTC configuration files.
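For orientation, the relevant options in a random access CTC configuration might look as follows (a sketch assembled only from the option names discussed in this article; the exact values shipped in the VTM CTC files may differ):

TemporalFilter                : 1
TemporalFilterFutureReference : 1
TemporalFilterStrengthFrame8  : 0.95
TemporalFilterStrengthFrame16 : 1.5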

Filtering process

Step 1:

The strength of temporal filtering can be adjusted for different pictures in each GOP by using one or more TemporalFilterStrengthFrame# configuration options, where "#" is an integer. The specified strength applies to all pictures whose POC numbers are divisible by "#"; if more than one option matches a picture, the option with the largest "#" value is used. For example, with both of the options below set, a picture with POC 32 is divisible by both 8 and 16, so the Frame16 strength applies.

MCTF is designed to be applied only to pictures at low temporal layers of the coding hierarchy.

For the random access configuration, the following two settings are typically used:

TemporalFilterStrengthFrame8  : 0.95

TemporalFilterStrengthFrame16 : 1.5

This applies the filter to all pictures with (POC % 8) == 0; the filter strength per picture is:

$$s_o(n) = \begin{cases} 1.5, & n \bmod 16 = 0 \\ 0.95, & n \bmod 8 = 0 \ \text{and} \ n \bmod 16 \neq 0 \end{cases}$$

where n is the POC value of the picture.
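In code, this per-picture strength selection reduces to a small loop over the configured options. The sketch below is a hypothetical standalone helper, not VTM code; it mirrors the loop inside EncTemporalFilter::filter shown at the end of this article (std::map iterates in ascending key order, so the largest matching "#" wins):

#include <map>

// Return the overall MCTF strength s_o(n) for a picture with the given POC,
// or a negative value if the picture is not filtered at all.
double overallStrength(const std::map<int, double> &strengths, int poc)
{
  double s = -1.0;
  for (const auto &entry : strengths) // ascending key order
  {
    if (poc % entry.first == 0)
    {
      s = entry.second; // a later (larger) key overwrites an earlier one
    }
  }
  return s;
}

// Example (random access): with {{8, 0.95}, {16, 1.5}},
// POC 8 yields 0.95 and POC 16 yields 1.5.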

For the low-delay configuration, the following setting is typically used:

TemporalFilterStrengthFrame4 : 0.4

This applies a filter of the following strength to all pictures with (POC % 4) == 0:

$$s_o(n) = 0.4, \qquad n \bmod 4 = 0$$

It is not recommended to apply this filter in the All Intra configuration.

Step 2:

When applying MCTF, the four pictures preceding and the four pictures following the current picture (in display order) are read directly from the source video file; the pictures used may therefore lie outside the current GOP.

Pictures following the current picture in display order are used only when the TemporalFilterFutureReference configuration option is set to 1. In low-delay configurations, this option is usually set to 0.

Step 3:

For each 8×8 block of the current picture, motion is estimated relative to each available temporally neighboring picture.

A hierarchical motion estimation scheme with three layers is employed: L0 is the original resolution, L1 is a downsampled version of L0, and L2 is a downsampled version of L1. The width and height of L1 are half those of L0, and the width and height of L2 are half those of L1. Each downsampled value is obtained by averaging the four corresponding sample values of the layer above, as in the sketch below.
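A minimal illustration of this 2:1 averaging downsampler (not the actual VTM subsampleLuma implementation; border padding and chroma handling are omitted):

#include <vector>

// Downsample a plane by 2 in each dimension: each output sample is the
// rounded average of the corresponding 2x2 block of input samples.
std::vector<int> downsample2x(const std::vector<int> &src, int width, int height)
{
  const int dstW = width / 2;
  const int dstH = height / 2;
  std::vector<int> dst(dstW * dstH);
  for (int y = 0; y < dstH; y++)
  {
    for (int x = 0; x < dstW; x++)
    {
      const int sum = src[(2 * y)     * width + 2 * x] + src[(2 * y)     * width + 2 * x + 1]
                    + src[(2 * y + 1) * width + 2 * x] + src[(2 * y + 1) * width + 2 * x + 1];
      dst[y * dstW + x] = (sum + 2) >> 2; // average of four samples, with rounding
    }
  }
  return dst;
}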

Before motion estimation, the current picture and the reference picture are each downsampled twice, as described above. Motion estimation then proceeds as follows:

  1. Motion estimation is performed for each 16x16 block in L2: the sum of squared differences (SSD) is computed for each candidate motion vector, and the motion vector with the smallest SSD is selected.
  2. When estimating motion in L1, the motion vector obtained in step 1 is used as the initial value, and the same motion estimation is performed for each 16x16 block in L1.
  3. When estimating motion in L0, the motion vector obtained in step 2 is used as the initial value, and the same motion estimation is performed for each 16x16 block in L0.
  4. Using the motion vector obtained in step 3 as the initial value, the fractional-precision motion of each 8x8 block is estimated with the following 8-tap interpolation filter (see the sketch after the coefficient table).
// 8-tap interpolation filter coefficients for the 16 fractional positions (1/16 pel)
const int EncTemporalFilter::m_interpolationFilter[16][8] =
{
  {   0,   0,   0,  64,   0,   0,   0,   0 },   // 0
  {   0,   1,  -3,  64,   4,  -2,   0,   0 },   // 1
  {   0,   1,  -6,  62,   9,  -3,   1,   0 },   // 2
  {   0,   2,  -8,  60,  14,  -5,   1,   0 },   // 3
  {   0,   2,  -9,  57,  19,  -7,   2,   0 },   // 4
  {   0,   3, -10,  53,  24,  -8,   2,   0 },   // 5
  {   0,   3, -11,  50,  29,  -9,   2,   0 },   // 6
  {   0,   3, -11,  44,  35, -10,   3,   0 },   // 7
  {   0,   1,  -7,  38,  38,  -7,   1,   0 },   // 8
  {   0,   3, -10,  35,  44, -11,   3,   0 },   // 9
  {   0,   2,  -9,  29,  50, -11,   3,   0 },   // 10
  {   0,   2,  -8,  24,  53, -10,   3,   0 },   // 11
  {   0,   2,  -7,  19,  57,  -9,   2,   0 },   // 12
  {   0,   1,  -5,  14,  60,  -8,   2,   0 },   // 13
  {   0,   1,  -3,   9,  62,  -6,   1,   0 },   // 14
  {   0,   0,  -2,   4,  64,  -3,   1,   0 }    // 15
};
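Conceptually, a fractional-position sample used during this refinement is produced by separable 8-tap filtering with one row of the coefficients above. A minimal sketch of the horizontal pass for a single sample (a hypothetical helper; border handling and clipping are omitted):

// Horizontally interpolate the sample at integer position x with the
// fractional phase given by coeff (one row of m_interpolationFilter).
// The taps sum to 64, hence the rounding offset of 32 and the shift by 6.
int interpolateHor(const int *row, int x, const int coeff[8])
{
  int sum = 0;
  for (int k = 0; k < 8; k++)
  {
    sum += coeff[k] * row[x + k - 3]; // tap 3 is aligned with position x
  }
  return (sum + 32) >> 6;
}

// For a motion vector whose horizontal fractional part is dx (in 1/16 pel),
// coeff would be m_interpolationFilter[dx]; the vertical pass is analogous.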

Step 4:

The pictures before and after the current picture are motion compensated according to the best motion vector of each 8×8 block to align the sample coordinates of each block in the current picture with the best matching coordinates in the reference picture.

Step 5:

The luma and chroma components of the current picture are filtered separately. The filtering process is as follows:

The new sample value In of the current picture is calculated using the following formula:

$$I_n = \frac{I_o + \sum_{i=1}^{a} w_r(i,a)\, I_r(i)}{1 + \sum_{i=1}^{a} w_r(i,a)}$$

where Io is the value of the original sample, Ir(i) is the value of the corresponding sample in motion-compensated picture i, and wr(i,a) is the weight of motion-compensated picture i when the number of available motion-compensated pictures equals a. When all source pictures are available, a equals 4 if TemporalFilterFutureReference is 0, and 8 if TemporalFilterFutureReference is 1.
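In code form, this is a per-sample weighted average over the motion-compensated reference pictures. A simplified sketch of the core accumulation (the weight computation is factored out into a hypothetical computeWeight; VTM implements this inside bilateralFilter):

#include <vector>

// Filter one sample: weighted average of the original value and the
// co-located values from the motion-compensated reference pictures.
double filterSample(int orgVal, const std::vector<int> &refVals,
                    double (*computeWeight)(int i, int a, int orgVal, int refVal))
{
  const int a = (int) refVals.size();
  double weightSum = 1.0;           // the original sample contributes weight 1
  double newVal = (double) orgVal;
  for (int i = 0; i < a; i++)
  {
    const double w = computeWeight(i, a, orgVal, refVals[i]); // w_r(i, a)
    newVal += w * refVals[i];
    weightSum += w;
  }
  return newVal / weightSum;        // I_n = (I_o + sum w*I_r) / (1 + sum w)
}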

For luma samples, the weights wr(i,a) are calculated as follows:

$$w_r(i,a) = s_l \, s_o(n) \, s_r(i,a) \, w_a \, e^{-\frac{\Delta I(i)^2}{2\,\sigma_w\,\sigma_l(QP)^2}}$$

where

$$s_l = 0.4, \qquad \Delta I(i) = I_r(i) - I_o, \qquad \sigma_l(QP) = 3\,(QP - 10)$$

and sr(i,a) depends on the temporal distance d between reference picture i and the current picture:

$$s_r(i,8) = \{0.85,\ 0.57,\ 0.41,\ 0.33\}_{d=1,\dots,4}, \qquad s_r(i,4) = \{1.13,\ 0.97,\ 0.81,\ 0.57\}_{d=1,\dots,4}$$

For i and a in the remaining cases:

$$s_r(i,a) = 0.3$$

The adjustment coefficients wa and σw used to calculate wr(i,a) are computed as follows:

$$w_a = \begin{cases}1.0, & \textit{noise} < 25\\0.6, & \text{otherwise}\end{cases} \times \begin{cases}1.2, & \textit{error} < 50\\1.0, & 50 \le \textit{error} \le 100\\0.6, & \textit{error} > 100\end{cases} \times \frac{\textit{minError} + 1}{\textit{error} + 1}$$

$$\sigma_w = \begin{cases}1.0, & \textit{noise} < 25\\0.8, & \text{otherwise}\end{cases} \times \begin{cases}1.0, & \textit{error} < 50\\0.8, & \text{otherwise}\end{cases}$$

where noise and error are calculated with a block granularity of 8×8 for luma and 4×4 for chroma, and minError is the smallest error over all available motion-compensated pictures.
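A compact sketch of this adjustment logic (variable names follow the description above; this is a simplification of the corresponding per-reference computation in VTM's bilateralFilter, not the verbatim source):

// Per-reference adjustment factors derived from the block's noise and error
// estimates (8x8 blocks for luma, 4x4 for chroma).
void computeAdjustments(double noise, double error, double minError,
                        double &wa, double &sigmaW)
{
  wa = 1.0;
  sigmaW = 1.0;
  wa     *= (noise < 25) ? 1.0 : 0.6;
  sigmaW *= (noise < 25) ? 1.0 : 0.8;
  wa     *= (error < 50) ? 1.2 : ((error > 100) ? 0.6 : 1.0);
  sigmaW *= (error < 50) ? 1.0 : 0.8;
  wa     *= (minError + 1.0) / (error + 1.0); // favor the best-matching reference
}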

For chroma samples, the weights wr(i,a) are calculated as follows:

$$w_r(i,a) = s_c \, s_o(n) \, s_r(i,a) \, w_a \, e^{-\frac{\Delta I(i)^2}{2\,\sigma_w\,\sigma_c^2}}$$

where:

$$s_c = 0.55, \qquad \sigma_c = 30$$

Step 6:

Encode the filtered image.

In VTM, the entry point of MCTF is the EncTemporalFilter::filter function:

bool EncTemporalFilter::filter(PelStorage *orgPic, int receivedPoc)
{
  bool isFilterThisFrame = false;
  if (m_QP >= 17)  // the filter is disabled for QP < 17
  {
    for (map<int, double>::iterator it = m_temporalFilterStrengths.begin(); it != m_temporalFilterStrengths.end(); ++it)
    {
      int filteredFrame = it->first;
      if (receivedPoc % filteredFrame == 0)
      {
        isFilterThisFrame = true; // filter only pictures whose POC is a multiple of a configured interval
        break;
      }
    }
  }

  if (isFilterThisFrame) // filter the current picture
  {
    int offset = m_FrameSkip;
    VideoIOYuv yuvFrames;
    yuvFrames.open(m_inputFileName, false, m_inputBitDepth, m_MSBExtendedBitDepth, m_internalBitDepth);
    // skip the leading frames, up to current POC - 4
    yuvFrames.skipFrames(std::max(offset + receivedPoc - m_range, 0), m_sourceWidth - m_pad[0], m_sourceHeight - m_pad[1], m_chromaFormatIDC);

    std::deque<TemporalFilterSourcePicInfo> srcFrameInfo;

    int firstFrame = receivedPoc + offset - m_range; // first frame: current POC - 4
    int lastFrame  = receivedPoc + offset + m_range; // last frame: current POC + 4
    if (!m_gopBasedTemporalFilterFutureReference)
    {
      lastFrame = receivedPoc + offset - 1; // do not reference future frames
    }
    int origOffset = -m_range;

    // subsample original picture so it only needs to be done once
    PelStorage origPadded;

    origPadded.create(m_chromaFormatIDC, m_area, 0, m_padding);
    origPadded.copyFrom(*orgPic);
    origPadded.extendBorderPel(m_padding, m_padding);

    PelStorage origSubsampled2; // downsampled once, corresponds to layer L1
    PelStorage origSubsampled4; // downsampled twice, corresponds to layer L2
    // downsampling
    subsampleLuma(origPadded, origSubsampled2);
    subsampleLuma(origSubsampled2, origSubsampled4);

    // determine motion vectors: loop over the neighboring frames
    for (int poc = firstFrame; poc <= lastFrame; poc++)
    {
      if (poc < 0)
      {
        origOffset++;
        continue; // frame not available (POC < 0, before the start of the video), skip
      }
      else if (poc == offset + receivedPoc)
      { // hop over the frame that will be filtered
        yuvFrames.skipFrames(1, m_sourceWidth - m_pad[0], m_sourceHeight - m_pad[1], m_chromaFormatIDC);
        origOffset++;
        continue;
      }

      srcFrameInfo.push_back(TemporalFilterSourcePicInfo());
      TemporalFilterSourcePicInfo &srcPic = srcFrameInfo.back();

      PelStorage dummyPicBufferTO; // Only used temporary in yuvFrames.read
      srcPic.picBuffer.create(m_chromaFormatIDC, m_area, 0, m_padding);
      dummyPicBufferTO.create(m_chromaFormatIDC, m_area, 0, m_padding);
      if (!yuvFrames.read(srcPic.picBuffer, dummyPicBufferTO, m_inputColourSpaceConvert, m_pad, m_chromaFormatIDC, m_clipInputVideoToRec709Range))
      {
        return false; // eof or read failure
      }
      srcPic.picBuffer.extendBorderPel(m_padding, m_padding);
      srcPic.mvs.allocate(m_sourceWidth / 4, m_sourceHeight / 4);
      // perform motion estimation
      motionEstimation(srcPic.mvs, origPadded, srcPic.picBuffer, origSubsampled2, origSubsampled4);
      srcPic.origOffset = origOffset;
      origOffset++;
    }

    // filter
    PelStorage newOrgPic; // stores the filtered picture
    newOrgPic.create(m_chromaFormatIDC, m_area, 0, m_padding);
    double overallStrength = -1.0;
    for (map<int, double>::iterator it = m_temporalFilterStrengths.begin(); it != m_temporalFilterStrengths.end(); ++it)
    {
      int frame = it->first;
      double strength = it->second;
      if (receivedPoc % frame == 0) // select the filter strength according to the POC
      {
        overallStrength = strength;
      }
    }
    // apply the temporal filter
    bilateralFilter(origPadded, srcFrameInfo, newOrgPic, overallStrength);

    // move filtered picture to orgPic
    orgPic->copyFrom(newOrgPic);

    yuvFrames.close();
    return true;
  }
  return false;
}

Origin: blog.csdn.net/BigDream123/article/details/123590364