[Data Association] Patch-based feature correspondence, associating the current frame -> reference frame for inter-frame tracking


1. WarpPixelWise (warp the reference patch to the predicted feature position in the current frame)

1.1 Functions

This function warps the appearance of a feature from the reference frame into the current frame, pixel by pixel: given the camera pose of the current frame, every pixel of the patch around the feature's predicted position in the current frame is traced back to the reference image, and the interpolated reference intensities are stored in a given array.

1.2 Function input and output

Input: the current frame, the reference frame, the feature in the reference frame, the pyramid levels of the reference frame and the current frame, and the half patch size.
Output: a pointer to the array that receives the warped pixel values.

1.3 Algorithm steps

  1. Compute the distance from the feature's 3D landmark to the reference frame camera and to the current frame camera;
  2. Back-project the feature pixel in the reference frame to a ray and scale it by the distance to the reference camera to obtain its three-dimensional coordinates;
  3. Transform the three-dimensional coordinates into the current frame camera coordinate system and project them onto the current frame image to obtain pixel coordinates;
  4. Scale the pixel coordinates to the search level (see the level-convention sketch after this list) and, for each pixel within the half patch size:
    • Scale the pixel back to level 0 and back-project it to three-dimensional coordinates in the current frame camera coordinate system, scaled by the distance to the current camera;
    • Transform the three-dimensional coordinates into the reference frame camera coordinate system and project them onto the reference frame image;
    • Scale the resulting pixel coordinates to the reference frame pyramid level;
    • Compute the intensity at this subpixel location on the reference image by bilinear interpolation and store it in the output array.
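
The pyramid-level convention used throughout is that a pixel coordinate at level L equals the level-0 coordinate divided by 2^L. A minimal sketch of the two conversions (the helper names are illustrative, not from the original code):

#include <Eigen/Core>

// level 0 (full resolution) -> pyramid level L
inline Eigen::Vector2d PxToLevel(const Eigen::Vector2d& px_level0, int level) {
  return px_level0 / (1 << level);
}

// pyramid level L -> level 0 (full resolution)
inline Eigen::Vector2d PxToLevel0(const Eigen::Vector2d& px_level, int level) {
  return px_level * (1 << level);
}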

The mathematical formula is described as follows:

$$\mathbf{p}_{ref} = \frac{d_{ref}}{\|\mathbf{p}_{ref}^{cam}\|}\,\mathbf{p}_{ref}^{cam}$$

$$\mathbf{p}_{cur} = \mathbf{T}_{cur}^{world}\left(\mathbf{T}_{ref}^{world}\right)^{-1}\mathbf{p}_{ref}$$

$$\mathbf{p}_{cur}^{search} = \frac{\mathbf{p}_{cur}}{2^{level_{cur}}}$$

$$\mathbf{p}_{ele}^{search} = \mathbf{p}_{ele}^{patch} + \mathbf{p}_{cur}^{search}$$

$$\mathbf{p}_{ele}^{cam} = 2^{level_{cur}}\,\mathbf{p}_{ele}^{search}$$

(the patch pixel is scaled back to level 0, back-projected to a ray, and scaled by $d_{cur}$, exactly as the feature pixel was in the reference frame)

$$\mathbf{p}_{ele}^{ref} = \mathbf{T}_{ref}^{world}\left(\mathbf{T}_{cur}^{world}\right)^{-1}\mathbf{p}_{ele}^{cam}$$

$$\mathbf{p}_{ele}^{px} = \frac{1}{2^{level_{ref}}}\,\mathbf{K}_{ref}\,\mathbf{p}_{ele}^{ref}$$

$$I(\mathbf{p}_{ele}^{px}) = w_{00}\,I(\lfloor p_x\rfloor,\lfloor p_y\rfloor) + w_{01}\,I(\lfloor p_x\rfloor,\lfloor p_y\rfloor+1) + w_{10}\,I(\lfloor p_x\rfloor+1,\lfloor p_y\rfloor) + w_{11}\,I(\lfloor p_x\rfloor+1,\lfloor p_y\rfloor+1)$$

where $\mathbf{T}$ denotes a camera pose, $\mathbf{p}$ a point (3D or pixel coordinates, depending on the superscript), $d$ a distance, $\|\cdot\|$ the vector norm, $\mathbf{K}$ the camera intrinsic matrix, $I(\cdot)$ the intensity of a pixel in the image, and $w_{00}, w_{01}, w_{10}, w_{11}$ the bilinear interpolation weights.
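
As a minimal, self-contained sketch of the bilinear interpolation step above (the helper name and signature are illustrative, not part of the original code):

#include <cmath>
#include <cstdint>

// Sample a row-major 8-bit image at subpixel (x, y); the caller must
// guarantee that (floor(x)+1, floor(y)+1) lies inside the image.
inline float BilinearSample(const uint8_t* img, int stride, float x, float y) {
  const int xi = static_cast<int>(std::floor(x));
  const int yi = static_cast<int>(std::floor(y));
  const float sx = x - xi, sy = y - yi;
  const float w00 = (1.0f - sx) * (1.0f - sy);
  const float w01 = (1.0f - sx) * sy;
  const float w10 = sx * (1.0f - sy);
  const float w11 = sx * sy;
  const uint8_t* p = img + yi * stride + xi;
  return w00 * p[0] + w01 * p[stride] + w10 * p[1] + w11 * p[stride + 1];
}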

bool WarpPixelWise(const Frame& cur_frame, const Frame& ref_frame, const FeatureWrapper& ref_ftr,
    const int level_ref, const int level_cur, const int half_patch_size, uint8_t* patch) {
  // distances from the 3D landmark to the reference and current camera centers
  double depth_ref = (ref_frame.pos() - ref_ftr.landmark->pos()).norm();
  double depth_cur = (cur_frame.pos() - ref_ftr.landmark->pos()).norm();

  // back project to 3D points in reference frame
  Eigen::Vector3d xyz_ref;
  ref_frame.cam()->backProject3(ref_ftr.px, &xyz_ref);
  xyz_ref = xyz_ref.normalized() * depth_ref;

  // project to current frame and convert to search level
  Eigen::Vector3d xyz_cur = cur_frame.T_cam_world() * (ref_frame.T_cam_world().inverse()) * xyz_ref;
  Eigen::Vector2d px_cur;
  cur_frame.cam()->project3(xyz_cur, &px_cur);
  Eigen::Vector2d px_cur_search = px_cur / (1 << level_cur);

  // for each pixel in the patch(on search level):
  // - convert to image level
  // - back project to 3D points
  // - project to ref frame and find pixel value in ref level
  uint8_t* patch_ptr = patch;
  const cv::Mat& img_ref = ref_frame.img_pyr_[level_ref];
  const int stride = img_ref.step.p[0];

  for (int y = -half_patch_size; y < half_patch_size; ++y) {
    for (int x = -half_patch_size; x < half_patch_size; ++x, ++patch_ptr) {
      // patch pixel position on the search level of the current frame
      const Eigen::Vector2d ele_patch(x, y);
      Eigen::Vector2d ele_search = ele_patch + px_cur_search;
      Eigen::Vector3d ele_xyz_cur;
      cur_frame.cam()->backProject3(ele_search * (1 << level_cur), &ele_xyz_cur);
      ele_xyz_cur = ele_xyz_cur.normalized() * depth_cur;
      Eigen::Vector3d ele_xyz_ref =
          ref_frame.T_cam_world() * (cur_frame.T_cam_world().inverse()) * ele_xyz_cur;
      Eigen::Vector2d ele_ref;
      ref_frame.cam()->project3(ele_xyz_ref, &ele_ref);
      ele_ref = ele_ref / (1 << level_ref);

      const int xi = std::floor(ele_ref[0]);
      const int yi = std::floor(ele_ref[1]);
      if (xi < 0 || yi < 0 || xi + 1 >= img_ref.cols || yi + 1 >= img_ref.rows) {
        // patch pixel falls outside the reference image: the warp fails
        VLOG(200) << "ref image: col-" << img_ref.cols << ", row-" << img_ref.rows;
        VLOG(200) << "xi: " << xi << ", "
                  << "yi: " << yi;
        return false;
      } else {
        // bilinear interpolation weights from the subpixel residuals
        const float subpix_x = ele_ref[0] - xi;
        const float subpix_y = ele_ref[1] - yi;
        const float w00 = (1.0f - subpix_x) * (1.0f - subpix_y);
        const float w01 = (1.0f - subpix_x) * subpix_y;
        const float w10 = subpix_x * (1.0f - subpix_y);
        const float w11 = 1.0f - w00 - w01 - w10;
        const uint8_t* const ptr = img_ref.data + yi * stride + xi;
        *patch_ptr = static_cast<uint8_t>(
            w00 * ptr[0] + w01 * ptr[stride] + w10 * ptr[1] + w11 * ptr[stride + 1]);
      }
    }
  }

  return true;
}
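
A hypothetical call site (a sketch only: cur_frame, ref_frame, ref_ftr and the levels are assumed to come from the surrounding SVO-style pipeline; the buffer holds (2*half_patch_size)^2 bytes, row by row):

constexpr int kHalfPatchSize = 4;
uint8_t patch[(2 * kHalfPatchSize) * (2 * kHalfPatchSize)];
if (WarpPixelWise(cur_frame, ref_frame, ref_ftr, level_ref, level_cur,
                  kHalfPatchSize, patch)) {
  // patch now holds the reference appearance warped to the predicted
  // feature position in the current frame
}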

2. GetWarpMatrixAffine (calculate the reference frame -> current frame affine warp matrix)

2.1 Functions

The role of this function is to compute the affine warp matrix from the reference frame to the current frame: a first-order approximation of how the patch around a feature in the reference frame deforms when projected into the current frame, so that the warped reference patch coincides with the corresponding patch in the current frame.

2.2 Function input and output

Input: the camera models of the reference frame and the current frame, the feature pixel in the reference frame, the bearing (direction) vector of the feature, the distance from the feature to the reference camera, the pose transformation from the reference frame to the current frame, and the pyramid level of the reference frame.
Output: a pointer to the output affine warp matrix.

2.3 Algorithm steps

The implementation steps of the function are as follows:

  1. Compute the three-dimensional coordinates of the feature from its bearing vector and distance;
  2. Shift the feature pixel by half the patch size along the horizontal and the vertical direction (the shift is taken at the reference pyramid level, i.e., kHalfPatchSize * 2^level_ref pixels at full resolution), and back-project the shifted pixels into three-dimensional space to obtain points in two directions;
  3. Transform the three three-dimensional points into the current frame camera coordinate system and project them onto the current frame to obtain three pixel coordinates;
  4. Compute the affine warp matrix from the differences of the three pixel coordinates.

The mathematical formula is described as follows:

$$
\begin{aligned}
\mathbf{x}_{ref} &= \mathbf{f}_{ref}\, d_{ref} \\
\mathbf{x}_{du,ref} &= \mathbf{J}_{du,ref}\,\mathbf{x}_{ref}, \qquad \mathbf{x}_{dv,ref} = \mathbf{J}_{dv,ref}\,\mathbf{x}_{ref} \\
\mathbf{p}_{cur} &= \pi\left(T_{cur\_ref}\,\mathbf{x}_{ref}\right), \quad
\mathbf{p}_{du,cur} = \pi\left(T_{cur\_ref}\,\mathbf{x}_{du,ref}\right), \quad
\mathbf{p}_{dv,cur} = \pi\left(T_{cur\_ref}\,\mathbf{x}_{dv,ref}\right) \\
\mathbf{A}_{cur\_ref} &= \begin{bmatrix}\dfrac{\mathbf{p}_{du,cur}-\mathbf{p}_{cur}}{kHalfPatchSize} & \dfrac{\mathbf{p}_{dv,cur}-\mathbf{p}_{cur}}{kHalfPatchSize}\end{bmatrix}
\end{aligned}
$$

where $\mathbf{f}_{ref}$ is the bearing (direction) vector of the feature, $d_{ref}$ the distance from the feature to the reference camera, $\mathbf{x}_{ref}$ the three-dimensional coordinates of the feature in the reference camera frame, $\mathbf{x}_{du,ref}$ and $\mathbf{x}_{dv,ref}$ the three-dimensional points obtained by shifting the feature pixel by half the patch size horizontally and vertically and back-projecting (with $\mathbf{J}_{du,ref}$ and $\mathbf{J}_{dv,ref}$ denoting these back-projection mappings), $T_{cur\_ref}$ the pose transformation from the reference frame to the current frame, $\pi(\cdot)$ the camera projection onto the image, and $\mathbf{A}_{cur\_ref}$ the resulting affine warp matrix from the reference frame to the current frame.

void GetWarpMatrixAffine(const CameraPtr& cam_ref, const CameraPtr& cam_cur,
    const Eigen::Ref<Keypoint>& px_ref, const Eigen::Ref<BearingVector>& f_ref,
    const double depth_ref, const Transformation& T_cur_ref, const int level_ref,
    AffineTransform* A_cur_ref) {
  CHECK_NOTNULL(A_cur_ref);

  // Compute affine warp matrix A_cur_ref
  const int kHalfPatchSize = 5;
  const Position xyz_ref = f_ref * depth_ref;
  Position xyz_du_ref, xyz_dv_ref;
  // NOTE: project3 has no guarantee that the returned vector is unit length
  // - for pinhole: z component is 1 (unit plane)
  // - for omnicam: norm is 1 (unit sphere)
  cam_ref->backProject3(
      px_ref + Eigen::Vector2d(kHalfPatchSize, 0) * (1 << level_ref), &xyz_du_ref);
  cam_ref->backProject3(
      px_ref + Eigen::Vector2d(0, kHalfPatchSize) * (1 << level_ref), &xyz_dv_ref);
  if (cam_ref->getType() == Camera::Type::kPinhole) {
    // pinhole: back-projected vectors lie on the unit plane (z = 1),
    // so scaling by the reference z puts them at the same depth plane
    xyz_du_ref *= xyz_ref[2];
    xyz_dv_ref *= xyz_ref[2];
  } else {
    // omnidirectional: normalize to the unit sphere, then scale by distance
    xyz_du_ref.normalize();
    xyz_dv_ref.normalize();
    xyz_du_ref *= depth_ref;
    xyz_dv_ref *= depth_ref;
  }

  Keypoint px_cur, px_du_cur, px_dv_cur;
  cam_cur->project3(T_cur_ref * xyz_ref, &px_cur);
  cam_cur->project3(T_cur_ref * xyz_du_ref, &px_du_cur);
  cam_cur->project3(T_cur_ref * xyz_dv_ref, &px_dv_cur);
  A_cur_ref->col(0) = (px_du_cur - px_cur) / kHalfPatchSize;
  A_cur_ref->col(1) = (px_dv_cur - px_cur) / kHalfPatchSize;
}
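
To apply the result, a pixel offset inside the reference patch (expressed at level_ref) is mapped to a full-resolution pixel offset in the current frame by a 2x2 matrix multiplication (a minimal sketch, assuming AffineTransform is an Eigen 2x2 matrix as in SVO-style code):

// offset of a patch pixel relative to the feature, at the reference level
Eigen::Vector2d offset_ref(2.0, -1.0);
// corresponding (level-0) pixel offset in the current frame
Eigen::Vector2d offset_cur = A_cur_ref * offset_ref;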



3. GetBestSearchLevel (calculate the best pyramid level to search on)

3.1 Functions

This function determines on which pyramid level the matching point should be searched to obtain the best performance.

3.2 Function input and output

Input: an affine warp matrix and a maximum pyramid level.
Output: the pyramid level on which to search.

3.3 Algorithm steps

  1. Compute the determinant D of the affine warp matrix;
  2. While D is greater than 3.0 and the search level is below the maximum pyramid level, increase the search level by 1 and multiply D by 0.25 (each coarser level halves both image dimensions, so the area scaling drops by a factor of 4); stop when D is less than or equal to 3.0 or the maximum level is reached.

The mathematical formula is described as follows:

$$search\_level = \min\left(max\_level,\ \min\{\, n \ge 0 \mid D(\mathbf{A}_{cur\_ref}) \cdot 0.25^{\,n} \le 3.0 \,\}\right)$$

where $D(\mathbf{A}_{cur\_ref})$ denotes the determinant of the affine warp matrix $\mathbf{A}_{cur\_ref}$.

The rationale is that the determinant of the affine warp measures how much the area of a reference patch is magnified when mapped into the current frame. When the magnification is large (a strong scale change between the two views), matching at level 0 would compare patches of very different physical extent, so the search moves to a higher (coarser) pyramid level; when the scale change is small, a lower level suffices and is cheaper to compute. For example, D = 40 gives search_level = 2 (40 → 10 → 2.5 ≤ 3.0).


int GetBestSearchLevel(const AffineTransform& A_cur_ref, const int max_level) {
  // Compute patch level in other image
  int search_level = 0;
  double D = A_cur_ref.determinant();
  while (D > 3.0 && search_level < max_level) {
    search_level += 1;
    D *= 0.25;
  }
  return search_level;
}
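
In the alignment pipeline the two functions are typically used together: compute the warp, then pick the level (a hypothetical glue snippet; the variable names are illustrative and the types follow the SVO-style code above):

AffineTransform A_cur_ref;
GetWarpMatrixAffine(cam_ref, cam_cur, px_ref, f_ref, depth_ref,
                    T_cur_ref, level_ref, &A_cur_ref);
// search at the level where the warped patch is closest to its nominal size
const int search_level = GetBestSearchLevel(A_cur_ref, max_pyramid_level - 1);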


Origin: blog.csdn.net/Darlingqiang/article/details/131332430