Optical flow estimation (1): Introduction and basic operations of optical flow

Today is the 29th day of the last lunar month, and tomorrow we will put up the Spring Festival couplets! This counts as squeezing out one article before the New Year (I haven't posted anything for far too long o.o), and it also marks the start of my own study of the deep-learning side of optical flow estimation. Graduation, internship, work and other things are piled up like a mountain in front of me. I hope everything goes well next year!

1. The basic concept of optical flow

1. Optical flow and optical flow field

(1) Optical flow

Optical flow is the instantaneous velocity, on the observation/imaging plane, of the pixels corresponding to a moving object in space. Some people also define optical flow as the instantaneous rate of change of grayscale/brightness at a specific pixel in a two-dimensional image (optical flow can also be defined as the distribution of apparent velocities of movement of brightness patterns in an image). When the time interval is very small (for example, between two consecutive frames of a video), it is also equivalent to the displacement of the projection of a spatial point on the imaging plane. Generally speaking, optical flow arises from the motion of the foreground object itself, the motion of the camera, or the joint motion of both, which produces relative motion in the scene.

Plainly speaking, when the human eye observes a moving object, the object forms a series of continuously changing images on the retina. This continuously changing information keeps "flowing" through the retina (that is, the image plane), forming a sequence of moving brightness/grayscale patterns, like a "flow" of light, which is why it is called optical flow. It can also be understood as the flow of pixel intensities in an image. Optical flow expresses the change of the image, and because it contains information about the target's motion, it can be used by the observer to determine the motion of the target.

Image X (frame t-1) and image Y (frame t) are generally two adjacent frames in a video stream. Suppose a pixel A is at position $(x_1, y_1)$ in image X and, after motion, at position $(x_2, y_2)$ in image Y. Then the optical flow of pixel A can be computed as $(u_x, u_y) = (x_2, y_2) - (x_1, y_1)$. The vector $(u_x, u_y)$ is the optical flow generated by this pixel; it contains the motion in both the x and y directions, and its values are sub-pixel floating-point numbers. As shown in Figure 1 below, point A in the left image moves to the position shown in the right image, and the arrow indicates the optical flow vector of point A between the two adjacent frames.
[Figure 1: point A moving between two adjacent frames; the arrow is its optical flow vector]

(2) Optical flow field

The optical flow field is a two-dimensional (2D) instantaneous velocity field composed of the optical flow of a series of pixels in the image. Each two-dimensional velocity vector is the projection, onto the imaging plane, of the three-dimensional velocity vector of a visible point on the target object. Generally speaking, the three-dimensional motion field corresponds, through projection, to an optical flow field on the two-dimensional image.

2. Optical flow method and motion field

(1) Motion field

The motion field describes the actual movement of objects in the three-dimensional real world. A collection of such motion vectors constitutes the motion field, which describes the true motion state of objects. When the motion field in space is projected onto a two-dimensional image plane (the human eye or a camera), it is represented as an optical flow field. The relationship between the motion field and the optical flow field is shown in the figure below.
[Figure: relationship between the 3D motion field and the 2D optical flow field obtained by projection]

From the above, it can be seen that the optical flow vector and the motion vector are closely related, but the two are not completely consistent; that is, the motion field and its optical flow field do not necessarily correspond exactly. This is because optical flow arises from the relative motion between the object and the camera, and this relative motion is not completely equal to the true motion state. There is a common example in daily life: barbershops often use a rotating pole sign to attract customers (as shown in the figure below). From the perspective of the optical flow field, the stripes on the sign appear to move upward; but from the perspective of the motion field, the sign actually rotates horizontally. Therefore, the optical flow field is not strictly equal to the motion field. However, this situation is uncommon, and in most cases the two behave consistently. We will therefore assume that the optical flow field corresponds to the motion field, which makes it possible to apply optical flow to problems involving motion vectors.
[Figure: rotating barbershop pole; the motion field is a horizontal rotation, but the optical flow field points upward]

(2) Optical flow method

Directly computing and analyzing the motion field in three-dimensional space is very difficult, but the motion field corresponds, through projection, to the optical flow field on the two-dimensional image. Optical flow is essentially the brightness change of pixels produced by projecting the motion of a three-dimensional scene onto the two-dimensional image plane, so it contains not only the motion information of the observed object but also rich information about the three-dimensional structure of the scene. Therefore, moving objects can be analyzed through optical flow, and the optical flow method has emerged as an important tool for analyzing moving objects in computer vision.

The optical flow method uses the temporal changes of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby compute the motion information of objects between adjacent frames. In other words, it is a method of estimating the true motion state of objects from two-dimensional image data (optical flow field => motion field). The purpose of studying the optical flow field is to approximate the motion field, which cannot be obtained directly from the image sequence, and to estimate the speed and direction of object motion from the intensity changes of pixel gray values in the image. In the ideal case, the optical flow field corresponds to the motion field.

3. Sparse optical flow and dense optical flow

The ideal output of an optical flow computation is an estimate of the velocity of every pixel across the two images, or equivalently, a displacement vector for each pixel in one image indicating that pixel's relative position in the other image. If this is done for every pixel in the image, the result is usually called "dense optical flow"; if only a subset of selected points in the image is tracked, it is called "sparse optical flow".

(1) Dense optical flow

Dense optical flow is a computation method that performs point-by-point matching over the whole image or a specified region. It computes the offset of every point in the image, forming a dense optical flow field. With this dense flow field, pixel-level image registration can be performed, and because the optical flow vectors are dense, the registration quality is clearly better than that of sparse optical flow. The drawback is equally obvious: since the offset of every pixel must be computed, the computational cost is large and real-time performance is poor.

(2) Sparse optical flow

Sparse optical flow does not compute flow for every pixel of the image. Instead, it requires specifying a set of points to track. These points should preferably have distinctive features, such as Harris corners, so that the tracking is relatively stable and reliable. Sparse tracking is much less computationally expensive than dense tracking.
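
As a concrete reference, OpenCV exposes both flavors. The following is a minimal sketch (not from the original article) assuming two already-loaded grayscale frames prev_gray and next_gray: Farneback produces a dense [H, W, 2] flow array, while pyramidal Lucas-Kanade tracks only a set of detected corners.

import cv2

# Dense flow: Farneback computes an offset for every pixel and returns an [H, W, 2] array
# (positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags)
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Sparse flow: detect distinctive corners first, then track only those points (pyramidal Lucas-Kanade)
# (positional args: maxCorners, qualityLevel, minDistance)
p0 = cv2.goodFeaturesToTrack(prev_gray, 200, 0.01, 7)
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
sparse_flow = (p1 - p0)[status.flatten() == 1]  # one displacement vector per successfully tracked corner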

4. Optical flow estimation

Optical flow estimation is the process of establishing/computing the optical flow field between two image frames. Common optical flow estimation methods include gradient-based methods, matching-based methods, deep-learning-based methods, energy-based methods, and so on. In the following studies, we will focus on deep-learning-based methods. From the introduction above, it can be seen that the core problem of optical flow estimation is how to match the corresponding pixels of two images and thereby compute the corresponding optical flow vectors.

2. Representation and visualization of optical flow

1. Optical flow representation method

The result of optical flow estimation is a two-channel matrix of the same size as the original image, generally represented as a three-dimensional floating-point array of shape [height, width, 2]. The first channel (height, width, 0) stores the x-direction offset (horizontal, i.e. along an image row) of the pixel at (height, width); the second channel (height, width, 1) stores the y-direction offset (vertical, i.e. along an image column) of the pixel at (height, width). Note that:

  • The values in the optical flow array are offsets; their signs indicate the direction of the offset
  • The optical flow array is applied to the pixel coordinates (x, y) of the two-dimensional image frame, not to the pixel value stored at (x, y)
  • The values in the optical flow array are floating-point numbers, not integers. This means that after a warp operation, the pixel coordinates of the frame t-1 image generally do not fall exactly on integer coordinates of the frame t image, so additional processing (e.g. interpolation) may be required
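
To make the three points above concrete, here is a tiny sketch (assuming flow is an [H, W, 2] float array from any estimator) of how the offsets are read and applied to coordinates rather than to intensity values:

import numpy as np

# flow is assumed to be an [H, W, 2] float32 array: channel 0 = x offset, channel 1 = y offset
flow = np.random.randn(480, 640, 2).astype(np.float32)  # stand-in for a real flow field

y, x = 120, 80                         # a pixel coordinate (row y, column x) in frame t-1
dx, dy = flow[y, x, 0], flow[y, x, 1]  # signed, sub-pixel offsets
# The offsets apply to the coordinate, not to the intensity stored there:
# the pixel at (x, y) in frame t-1 is expected to appear near (x + dx, y + dy) in frame t.
x_new, y_new = x + dx, y + dy          # generally non-integer, so rounding or interpolation is needed later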

2. Optical flow visualization

The result of optical flow computation is a two-channel, three-dimensional floating-point array. In many cases we need to visualize this result to intuitively show the motion state of objects. As a vector field, the optical flow field must display both the magnitude and the direction of each vector when visualized. There are therefore two common visualization methods:

(1) Optical flow arrow diagram

The simplest visualization method is to draw the optical flow as arrows, where the direction and length of each arrow represent the direction and magnitude of the corresponding optical flow vector, as shown in the figure below. The advantage of arrows is that they are simple and intuitive, but there are also shortcomings. First, if the optical flow of every pixel is drawn as its own arrow, then at high resolutions, where flow pixels are very dense, the full flow field cannot be drawn within a limited area; second, as the arrow density increases, the image becomes very cluttered. Therefore, the optical flow arrow diagram is generally suitable for visualizing sparse optical flow.
[Figure: optical flow arrow diagram]
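
A minimal matplotlib sketch (an assumption-level illustration, given a dense [H, W, 2] flow array) that subsamples the field before drawing, so the arrows stay readable, might look like this:

import numpy as np
import matplotlib.pyplot as plt

def plot_flow_arrows(flow, step=16):
    # sample the flow on a coarse grid (every `step` pixels) so the arrows do not clutter the plot
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    u = flow[ys, xs, 0]
    v = flow[ys, xs, 1]
    plt.quiver(xs, ys, u, v, color='r', angles='xy', scale_units='xy', scale=1)
    plt.gca().invert_yaxis()  # image coordinates: y grows downward
    plt.show()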

(2) Optical flow chromaticity diagram

For dense optical flow, we can color the computed flow through a color model and visualize it as a pseudo-color (RGB) map. In such a map, the hue (i.e. the kind of color) indicates the direction of motion, the saturation (i.e. the depth of the color) indicates the speed of motion or the size of the offset, and the very center, which represents no motion or offset, is white. There are two main ways of converting optical flow results into such a chromaticity map, introduced below. The color wheel used as the reference for the whole chromaticity map is shown below:
[Figure: optical flow color wheel]

(1) HSV color model conversion

HSV is a representation of the RGB color space as a cone, where H stands for hue (the kind of color), S stands for saturation (the depth of the color), and V stands for value (brightness). The three components of the HSV model are as follows:

  • H (hue): expressed as an angle, with values in the range 0°~360°. Counting counterclockwise from red, red is 0°, green is 120°, and blue is 240°; their complementary colors are yellow at 60°, cyan at 180°, and magenta at 300°;
  • S (saturation): the radial scale, with values in the range 0.0~1.0. Saturation indicates how close the color is to a pure spectral color; any color can be regarded as the result of mixing some spectral color with white. The larger the proportion of the spectral color, the closer the color is to that spectral color, the higher its saturation, and the deeper the color.
  • V (value, brightness): the vertical direction, with values from 0.0 (black, bottom) to 1.0 (white, top), indicating the brightness of the color.

When using the HSV model to visualize optical flow, the specific steps are:

  • First, normalize the magnitude of the optical flow and map the direction of the flow to the hue component H. Because the optical flow field is a two-channel three-dimensional array, the vector at each position can be expressed as Cartesian coordinates (x, y). We can then convert the Cartesian coordinates to polar coordinates, where the polar angle arctan2(y, x) represents the direction (mapped to the hue of the HSV model)
  • Next, map the magnitude of the optical flow to the saturation component S: the larger the flow, the higher the saturation, and positions with zero flow are shown in white. In the polar coordinate system, the polar radius sqrt(x² + y²) represents the size of the offset
  • Finally, the value component V can be fixed to a single value; for example, to make the image easier to observe, it can be fixed to the maximum brightness 255. The HSV representation is then converted into a pseudo-color RGB image for display.

Sample code is as follows:


import cv2
import numpy as np

def viz_flow(flow):
    # Hue H: an angle in 0°~360°, counterclockwise from red (red 0°, green 120°, blue 240°)
    # Saturation S: range 0.0~1.0
    # Value V: range 0.0 (black) ~ 1.0 (white)
    h, w = flow.shape[:2]
    hsv = np.zeros((h, w, 3), np.uint8)
    # cv2.cartToPolar(x, y[, magnitude[, angle[, angleInDegrees]]]) -> magnitude, angle
    #   - params x, y: Cartesian coordinates as floating-point ndarrays; angles are in radians by default
    #   - returns magnitude, angle: polar radius and polar angle, same size and type as x, y
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang * 180 / np.pi / 2  # radians -> degrees, halved to fit OpenCV's 8-bit hue range 0..180
    hsv[..., 1] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # normalize magnitude to 0..255
    # FlowNet sets V to 255; this function follows FlowNet: saturation encodes the displacement size
    # and brightness is kept at maximum for easier viewing.
    # Some visualizations instead fix S to 255 and let brightness encode the displacement, but the
    # image then becomes very dark, so this is rarely used.
    hsv[..., 2] = 255
    # cv2.cvtColor(frame, code) converts the color space; OpenCV's default channel order is BGR
    # backward conversion HSV to RGB/BGR uses H range 0..180 for 8-bit images
    bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return bgr
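
A possible usage sketch (the frame file names are placeholders): compute dense flow with Farneback and render it with viz_flow above.

prev_gray = cv2.imread('frame_t-1.png', cv2.IMREAD_GRAYSCALE)  # placeholder file names
next_gray = cv2.imread('frame_t.png', cv2.IMREAD_GRAYSCALE)
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
cv2.imwrite('flow_hsv.png', viz_flow(flow))  # viz_flow returns BGR, which is what imwrite expects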

(2) Munsell color system conversion

This method uses the Munsell Color System (Wiki) to convert the optical flow results. The Munsell Color System is a color description system created in 1898 by the American artist Albert H. Munsell (1858-1918). It divides the color space roughly using a cylinder; you can look up the details yourself. This method has been used in many works (FlowNet, PWC-Net, etc.) to visualize optical flow results and has been widely validated.

Sample code (following the color coding of Baker et al. and the implementations of Daniel Scharstein and Deqing Sun) is as follows:

import numpy as np

def make_colorwheel():
    """
    Generates a color wheel for optical flow visualization as presented in:
        Baker et al. "A Database and Evaluation Methodology for Optical Flow" (ICCV, 2007)
        URL: http://vision.middlebury.edu/flow/flowEval-iccv07.pdf
    Code follows the original C++ source code of Daniel Scharstein.
    Code follows the Matlab source code of Deqing Sun.
    Returns:
        np.ndarray: Color wheel
    """

    RY = 15
    YG = 6
    GC = 4
    CB = 11
    BM = 13
    MR = 6

    ncols = RY + YG + GC + CB + BM + MR
    colorwheel = np.zeros((ncols, 3))
    col = 0

    # RY
    colorwheel[0:RY, 0] = 255
    colorwheel[0:RY, 1] = np.floor(255*np.arange(0,RY)/RY)
    col = col+RY
    # YG
    colorwheel[col:col+YG, 0] = 255 - np.floor(255*np.arange(0,YG)/YG)
    colorwheel[col:col+YG, 1] = 255
    col = col+YG
    # GC
    colorwheel[col:col+GC, 1] = 255
    colorwheel[col:col+GC, 2] = np.floor(255*np.arange(0,GC)/GC)
    col = col+GC
    # CB
    colorwheel[col:col+CB, 1] = 255 - np.floor(255*np.arange(CB)/CB)
    colorwheel[col:col+CB, 2] = 255
    col = col+CB
    # BM
    colorwheel[col:col+BM, 2] = 255
    colorwheel[col:col+BM, 0] = np.floor(255*np.arange(0,BM)/BM)
    col = col+BM
    # MR
    colorwheel[col:col+MR, 2] = 255 - np.floor(255*np.arange(MR)/MR)
    colorwheel[col:col+MR, 0] = 255
    return colorwheel


def flow_uv_to_colors(u, v, convert_to_bgr=False):
    """
    Applies the flow color wheel to (possibly clipped) flow components u and v.
    According to the C++ source code of Daniel Scharstein
    According to the Matlab source code of Deqing Sun
    Args:
        u (np.ndarray): Input horizontal flow of shape [H,W]
        v (np.ndarray): Input vertical flow of shape [H,W]
        convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.
    Returns:
        np.ndarray: Flow visualization image of shape [H,W,3]
    """
    flow_image = np.zeros((u.shape[0], u.shape[1], 3), np.uint8)
    colorwheel = make_colorwheel()  # shape [55x3]
    ncols = colorwheel.shape[0]
    rad = np.sqrt(np.square(u) + np.square(v))
    a = np.arctan2(-v, -u)/np.pi
    fk = (a+1) / 2*(ncols-1)
    k0 = np.floor(fk).astype(np.int32)
    k1 = k0 + 1
    k1[k1 == ncols] = 0
    f = fk - k0
    for i in range(colorwheel.shape[1]):
        tmp = colorwheel[:,i]
        col0 = tmp[k0] / 255.0
        col1 = tmp[k1] / 255.0
        col = (1-f)*col0 + f*col1
        idx = (rad <= 1)
        col[idx]  = 1 - rad[idx] * (1-col[idx])
        col[~idx] = col[~idx] * 0.75   # out of range
        # Note the 2-i => BGR instead of RGB
        ch_idx = 2-i if convert_to_bgr else i
        flow_image[:,:,ch_idx] = np.floor(255 * col)
    return flow_image


def flow_to_color(flow_uv, clip_flow=None, convert_to_bgr=False):
    """
    Expects a two-channel flow image of shape [H,W,2].
    Args:
        flow_uv (np.ndarray): Flow UV image of shape [H,W,2]
        clip_flow (float, optional): Clip maximum of flow values. Defaults to None.
        convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.
    Returns:
        np.ndarray: Flow visualization image of shape [H,W,3]
    """
    assert flow_uv.ndim == 3, 'input flow must have three dimensions'
    assert flow_uv.shape[2] == 2, 'input flow must have shape [H,W,2]'
    if clip_flow is not None:
        flow_uv = np.clip(flow_uv, 0, clip_flow)
    u = flow_uv[:,:,0]
    v = flow_uv[:,:,1]
    rad = np.sqrt(np.square(u) + np.square(v))
    rad_max = np.max(rad)
    epsilon = 1e-5
    u = u / (rad_max + epsilon)
    v = v / (rad_max + epsilon)
    return flow_uv_to_colors(u, v, convert_to_bgr)
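
A possible usage sketch (assuming flow is an [H, W, 2] float array from any dense estimator or a network such as FlowNet/PWC-Net) for the code above:

import cv2  # used here only to save the image; flow_to_color itself needs only numpy

# flow: [H, W, 2], e.g. cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
flow_rgb = flow_to_color(flow)                                            # [H, W, 3] uint8, RGB order
cv2.imwrite('flow_color.png', flow_to_color(flow, convert_to_bgr=True))   # BGR order for OpenCV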

3. Optical flow warp

Warp can be translated as distortion, deformation, or mapping. The optical flow warp operation applies a computed optical flow to an image frame, producing the frame that results after the offsets or motion described by the flow have been applied. For example, if we have the optical flow from frame t to frame t+1 and apply it to frame t, the ideal result is frame t+1; this whole transformation is an optical flow warp. Optical flow warp mainly includes forward warp and backward warp, introduced next.

1. Forward warp

Suppose there are two adjacent (or left/right) image frames $I_1, I_2$, and the optical flow from $I_1$ to $I_2$ is $F_{1\rightarrow2}$ (optical flow being the offset relationship between corresponding points of the two images). Under the forward warp operation, the pixel value $I_1(x, y)$ of the first frame appears at the coordinate $(x, y) + F_{1\rightarrow2}(x, y)$ of the second frame, i.e. $I_1(x, y) = I_2(x + \Delta x, y + \Delta y)$, where $F_{1\rightarrow2}(x, y) = (\Delta x, \Delta y)$.

Forward warp can be simply understood as "forward deformation/distortion": the direction of the image transformation is the same as the direction of the optical flow, which matches our intuition. For example, using the optical flow from $I_1$ to $I_2$ to transform $I_1$ into the view of $I_2$ is a forward warp.

The idea of forward warp is to traverse every point p_source in the source image, use the optical flow $F_{Source\rightarrow Destination}$ at that point to obtain its coordinate offset, and project its pixel value to the corresponding position p_destination in the destination image. If the coordinates of p_destination are not integers, they are generally rounded or the nearest neighbor is chosen. A simple implementation is as follows:

import torch

def forward_warp(im0, flow):
    # im0: [B, C, H, W] source image; flow: [B, H, W, 2] optical flow from im0 to im1
    im1 = torch.zeros_like(im0)
    B, _, H, W = im0.shape
    round_flow = torch.round(flow)
    for b in range(B):
        # traverse every point p_source in the source image
        for h in range(H):
            for w in range(W):
                # target coordinate p_destination in the destination image, rounded to the nearest integer
                x = w + int(round_flow[b, h, w, 0])
                y = h + int(round_flow[b, h, w, 1])
                # if the mapped position is inside the image, copy the pixel value; otherwise keep zero
                if 0 <= x < W and 0 <= y < H:
                    im1[b, :, y, x] = im0[b, :, h, w]
    return im1

The implementation and idea of forward warp are very simple, but they also bring several problems:

  • Hole problem: the mapping defined by forward warp is neither injective nor surjective; it is essentially a discretized, scattered mapping. As a result, some positions in the destination image receive no projected point from the source image (holes), while other positions receive multiple projected points (collisions). Possible ways to handle this are as follows:

    • Multi-point mapping: when multiple source points map to the same destination point, we can keep the point with the largest motion magnitude, because larger motion usually corresponds to the foreground, and the foreground should be preserved preferentially (a small sketch of this rule follows after this list).
    • No-point mapping (holes): this case is generally handled by interpolation. However, the source points projected into the destination image are not uniformly distributed, so bilinear interpolation, which requires samples at fixed grid positions, is difficult to apply directly. The simplest approach is to fill with the nearest neighbor or by rounding.

    Of course, some works focus specifically on the hole problem of forward warp and have proposed new ideas and methods, for example the softmax-splatting paper.

  • Technical limitations: because forward warp usually selects target coordinates by nearest neighbor or rounding, the operation is not directly differentiable and cannot be trained with backpropagation as-is, so its use in deep learning is limited.

  • Thread safety: in multi-threaded or parallel (e.g. CUDA) implementations, several source pixels may write to the same destination pixel, which raises thread-safety issues and affects speed.
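
As a sketch of the "largest motion wins" rule mentioned for multi-point mapping above (an assumption-level illustration, not the article's reference implementation), the naive forward warp can keep a buffer of flow magnitudes and only overwrite a destination pixel when a source pixel with larger motion lands on it:

import torch

def forward_warp_zbuffer(im0, flow):
    # im0: [B, C, H, W]; flow: [B, H, W, 2]; when several source pixels hit the same target pixel,
    # keep the one with the largest flow magnitude (treated as foreground)
    B, C, H, W = im0.shape
    im1 = torch.zeros_like(im0)
    best_mag = torch.full((B, H, W), -1.0, device=flow.device)  # largest flow magnitude seen per target pixel
    mag = torch.linalg.norm(flow, dim=-1)                       # [B, H, W] flow magnitudes
    round_flow = torch.round(flow)
    for b in range(B):
        for h in range(H):
            for w in range(W):
                x = w + int(round_flow[b, h, w, 0])
                y = h + int(round_flow[b, h, w, 1])
                if 0 <= x < W and 0 <= y < H and mag[b, h, w] > best_mag[b, y, x]:
                    best_mag[b, y, x] = mag[b, h, w]
                    im1[b, :, y, x] = im0[b, :, h, w]
    return im1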

2. Backward warp

Suppose there are two adjacent (or left/right) image frames $I_1, I_2$, and the optical flow from $I_2$ to $I_1$ is $F_{2\rightarrow1}$. Under the backward warp operation, the pixel $I_2(x, y)$ of the second frame takes its value from the coordinate $(x, y) + F_{2\rightarrow1}(x, y)$ of the first frame (here $I_1$ is known and $I_2$ is the image being constructed), i.e. $I_2(x, y) = I_1(x + \Delta x, y + \Delta y)$, where $F_{2\rightarrow1}(x, y) = (\Delta x, \Delta y)$. Note that the mapped coordinates are usually not integers when looking up values; but because $I_1$ is known and we fill in every coordinate $(x, y)$ of $I_2$ by reading $I_1(x + \Delta x, y + \Delta y)$, non-integer coordinates can be approximated by bilinear interpolation within $I_1$, so the hole problem of forward warp does not occur.

Backward warp can be understood as "backward deformation/distortion": the direction of the image transformation is opposite to the direction of the optical flow, which runs against our intuition. For example, using the optical flow from $I_2$ to $I_1$ to transform $I_1$ into the view of $I_2$, or using the optical flow from $I_1$ to $I_2$ to transform $I_2$ into the view of $I_1$, is a backward warp.

The idea of backward warp is to traverse every point p_destination in the destination image, use the optical flow $F_{Destination\rightarrow Source}$ to compute the corresponding point p_source in the source image, and set the pixel value of p_destination to the value at p_source. If the coordinates of p_source are not integers, bilinear interpolation is generally used for the approximation (the source image is known and its pixels lie on a regular grid), so the hole problem of forward warp does not occur. A simple implementation is as follows:

import torch
import torch.nn as nn

def backward_warp(x, flo):
    """
    Warp an image/tensor (im2) back to im1, according to the optical flow (im1 -> im2).
    x:   [B, C, H, W] (im2)
    flo: [B, 2, H, W] flow
    """
    B, C, H, W = x.size()
    # build a mesh grid of pixel coordinates
    xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
    yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
    xx = xx.view(1, 1, H, W).repeat(B, 1, 1, 1)
    yy = yy.view(1, 1, H, W).repeat(B, 1, 1, 1)
    grid = torch.cat((xx, yy), 1).float()

    if x.is_cuda:
        grid = grid.cuda()
    # sampling positions: every target pixel looks up (its own coordinate + flow)
    vgrid = grid + flo

    # scale grid coordinates to [-1, 1] as required by grid_sample
    vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
    vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0

    vgrid = vgrid.permute(0, 2, 3, 1)
    # bilinear interpolation via functional.grid_sample
    output = nn.functional.grid_sample(x, vgrid, align_corners=True)
    # mask out positions that sample from outside the source image
    mask = torch.ones(x.size(), device=x.device)
    mask = nn.functional.grid_sample(mask, vgrid, align_corners=True)

    mask[mask < 0.9999] = 0
    mask[mask > 0] = 1

    return output * mask
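
A possible usage sketch (shapes and tensors are illustrative): warp frame t+1 back to the view of frame t using the flow t -> t+1, which is the typical arrangement for photometric losses or frame reconstruction.

im2 = torch.randn(1, 3, 256, 256)              # frame t+1, [B, C, H, W]
flow_t_to_t1 = torch.randn(1, 2, 256, 256)     # flow t -> t+1, [B, 2, H, W]
warped_to_t = backward_warp(im2, flow_t_to_t1)  # approximates frame t where no occlusion occurs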

The implementation and idea of backward warp are also very clear; the key point is how to use interpolation reasonably when sampling the data. Backward warp has the following advantages and disadvantages:

  • Algorithm advantages: backward warp uses interpolation to evaluate values at non-integer coordinates, so it does not tear the image or create holes, and it supports backpropagation, which makes its application, especially in deep learning, much broader and simpler.
  • Ghosting problem: backward warp can cause ghosting in the image. Because the transformation does not preserve ordering, the positional relationship between pixels can change before and after the mapping; when the foreground and background move relative to each other, ghosting and invalid information appear. The essential cause is the occlusion created by this relative motion. A common solution to the ghosting problem is to introduce an occlusion mask to refine the image.


4. Application of optical flow

Optical flow is widely used in scenarios such as video frame interpolation, occlusion detection, motion estimation, and the computation of certain indicators (for example, the analysis of cardiac strain in medicine), and it has achieved good results in these applications. With continued development, in addition to traditional optical flow techniques, deep-learning-based optical flow methods are also maturing, bringing more and more convenience to people's lives.


Origin blog.csdn.net/qq_40772692/article/details/128743758