A brief introduction to Optical Flow Guided Feature (OFF)

The code for this paper was originally released only in a Caffe version; a PyTorch version is linked here.

JoeHEZHAO/Optical-Flow-Guided-Feature-Pytorch: Optical Flow Guided Feature for Action Recognition-Pytorch (github.com)
https://github.com/JoeHEZHAO/Optical-Flow-Guided-Feature-Pytorch

Because I need to extract the OFF part of this code for my own purposes, I will first cover the relevant parts of the paper and then walk through the code. My understanding is limited and some wording may be imprecise; please bear with me.

Paper portal:

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition (thecvf.com)
https://openaccess.thecvf.com/content_cvpr_2018/papers/Sun_Optical_Flow_Guided_CVPR_2018_paper.pdf

Extracting features from videos is harder for a traditional CNN than extracting features from images, because a video is better represented by spatial features plus temporal features. This paper builds on the network structure of TSN; on top of it, the authors design an OFF unit to extract features along the time dimension. If you are interested in the overall network structure, see the paper. Here I will only introduce the structure and code of the OFF unit.

First, let’s take a look at the position of OFF Unit in the overall network structure:

Figure 1 Paper network structure

In the figure above there are two feature-extraction subnetworks, which extract features at different time steps. An OFF subnetwork composed of OFF units extracts temporal information from these two subnetworks, and classification is finally performed by fusing the class scores of all the subnetworks. Next, take a closer look at the structure diagram of the OFF unit:
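The paper does not spell out the fusion weights in the text quoted here, but the late-fusion step can be sketched as averaging the per-subnetwork class scores (tensor shapes and the number of classes below are illustrative assumptions, not values from the paper):

```python
import torch

# Hypothetical per-subnetwork class scores (logits) for 10 action classes,
# batch of 4 clips. In the real network these come from the feature-generation
# subnetwork and the OFF subnetwork respectively.
rgb_scores = torch.randn(4, 10)
off_scores = torch.randn(4, 10)

# Simple late fusion: average the softmax scores, then take the argmax.
fused = (rgb_scores.softmax(dim=1) + off_scores.softmax(dim=1)) / 2
pred = fused.argmax(dim=1)  # predicted action class per clip
```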

Figure 2 OFF Unit structure 

In the paper, the combination of the Sobel operator and element-wise subtraction is called OFF, and together with the preceding 1×1 convolutions it forms the OFF layer. Inside the OFF unit the features pass through two branches: one applies the Sobel operator to extract spatial gradients, and the other uses element-wise subtraction (Subtract) to extract temporal information. Combined with Figure 1, the output of the OFF unit enters the next module through a ResNet.
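The two branches can be illustrated with plain tensor operations: a fixed-kernel Sobel convolution for the spatial gradients and a subtraction between features of adjacent frames for the temporal term. This is a minimal single-channel sketch of the idea, not the network's actual layers (in the code below, the learned grouped 3×3 convolution plays the role of the Sobel operator):

```python
import torch
import torch.nn.functional as F

# Fixed Sobel kernels for the spatial-gradient branch.
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
sobel_y = sobel_x.transpose(2, 3)

feat_t  = torch.randn(1, 1, 8, 8)   # feature map at time t
feat_t1 = torch.randn(1, 1, 8, 8)   # feature map at time t+1

grad_x = F.conv2d(feat_t, sobel_x, padding=1)   # spatial gradient along x
grad_y = F.conv2d(feat_t, sobel_y, padding=1)   # spatial gradient along y
temporal = feat_t1 - feat_t                      # element-wise Subtract branch

# OFF: concatenation of the spatial and temporal gradient terms.
off = torch.cat((grad_x, grad_y, temporal), dim=1)
```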

In the repository, the OFF module is implemented inline rather than as a separate class. The paper contains multiple OFF layers, and some of them see different input channel counts and feature-map sizes at their convolutions; here I take motion_3a as an example. To attach it to other network structures, the internal shapes need to be calculated and adjusted, otherwise problems such as dimension mismatches will occur. Below, OFF is implemented as a class. The relevant code is as follows; please correct me if there are any errors.

from __future__ import print_function, division, absolute_import
import torch
import torch.nn as nn


class OFFNet(nn.Module):
    """OFF unit for one stage (e.g. motion_3a): a 1x1 conv feeding a
    temporal-subtraction branch, plus a spatial-gradient branch."""

    def __init__(self, batch, length, in_channels, h, w):
        super(OFFNet, self).__init__()
        self.batch = batch
        self.length = length
        self.h = h
        self.w = w

        # 1x1 conv generating the features used by the subtraction branch
        self.motion_conv_gen = nn.Conv2d(in_channels[0], 128, kernel_size=(1, 1), stride=(1, 1))
        # 1x1 conv reducing the spatial branch to 32 channels
        self.motion_spatial_down = nn.Conv2d(in_channels[1], 32, kernel_size=(1, 1), stride=(1, 1))
        # grouped 3x3 conv playing the role of the Sobel gradient operator
        self.motion_spatial_grad = nn.Conv2d(in_channels[2], 32, kernel_size=(3, 3), stride=(1, 1),
                                             padding=(1, 1), groups=32, bias=True)
        self.motion_relu_gen = nn.ReLU()
        self.dropout = nn.Dropout(p=0.8)

    def forward(self, x):
        # x: [batch * length, c, h, w]
        motion_conv_gen = self.motion_conv_gen(x)
        motion_relu_gen = self.motion_relu_gen(motion_conv_gen)
        channel_size = motion_relu_gen.shape[1]
        # stack each clip's frames along the channel axis
        reshape_rgb_frames = motion_relu_gen.view(self.batch, -1, self.h, self.w)
        # frames 1..length-1 minus frames 0..length-2: the temporal gradient
        last_frames = reshape_rgb_frames[:, channel_size:, :, :]
        first_frames = reshape_rgb_frames[:, :-channel_size, :, :]
        eltwise_motion = last_frames - first_frames
        # convert back to [batch * (length - 1), c, h, w]
        temporal_grad = eltwise_motion.view(-1, channel_size, self.h, self.w)

        # spatial branch: drop the last frame so both branches align in time
        spatial_frames = x[:self.batch * (self.length - 1), :, :, :]
        # downgrade dimension to 32
        spatial_down = self.motion_spatial_down(spatial_frames)
        spatial_grad = self.motion_spatial_grad(spatial_down)
        spatial_grad = self.dropout(spatial_grad)

        # concatenate the spatial and temporal gradients along channels
        motion = torch.cat((spatial_grad, temporal_grad), dim=1)
        return motion



# in_channels per OFF stage:
#   motion_3a: [256, 256, 32]
#   motion_3b: [320, 320, 32]
#   motion_3c: [576, 576, 32]
#   motion_4a: [576, 576, 32]
#   motion_4b: [576, 576, 32]
#   motion_4c: [608, 608, 32]
#   motion_4d: [608, 608, 32]
#   motion_5a: [1024, 1024, 32]
#   motion_5b: [1024, 1024, 32]

The comments in the code can be read against the structure of the OFF layer. Since the input channel counts differ across the OFF layers in the paper, the input channels are passed in as a parameter here, which makes it easy to attach the module to other networks. The output channel counts are the same throughout the paper and are left hardcoded, though they could also be made a parameter. The list at the bottom gives the input channel counts for every OFF stage in the paper. For more on the Sobel operator and related topics, consult other references.
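The reshaping in the temporal branch can be sanity-checked in isolation with plain tensor ops. The sizes below (batch=2, length=5, 128 channels after the 1×1 conv, 28×28 maps) are assumed for illustration:

```python
import torch

batch, length, c, h, w = 2, 5, 128, 28, 28  # assumed sizes after the 1x1 conv

x = torch.randn(batch * length, c, h, w)     # [batch*length, c, h, w]
frames = x.view(batch, length * c, h, w)      # stack frames along channels
last  = frames[:, c:, :, :]                   # frames 1..length-1
first = frames[:, :-c, :, :]                  # frames 0..length-2
diff  = (last - first).view(-1, c, h, w)      # back to frame-major layout

print(diff.shape)  # torch.Size([8, 128, 28, 28]) == [batch*(length-1), c, h, w]
```

Note that the first slice of `diff` is exactly the difference between frames 1 and 0 of the first clip, which is what the Subtract branch is meant to compute.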


Origin blog.csdn.net/Mr___WQ/article/details/127106835