Study notes: video object segmentation based on a dual-pyramid network

Tags: machine learning

Previously proposed algorithms:

1. A spatio-temporal bilateral filtering network, which is expensive in both time and memory.
2. A deep network trained for still-image segmentation that is fine-tuned at test time on the first frame of the test video so that it memorizes the target's appearance (i.e., online fine-tuning), which improves performance.
3. Segmentation strategies enhanced with large amounts of data to achieve higher accuracy.
 
The greatest advantage of this method is that it does not require fine-tuning the convolutional network. The basic idea is outlined below.
  
 
 
The segmentation network is a fully convolutional network based on VGG16 [16], with a modulation operation added before every convolutional layer of VGG16 except the first. The modulation parameters are described below.

The visual modulator adapts the segmentation network to the appearance of a given target. It extracts semantic information such as category, color, shape, and texture from the annotated first frame, and generates scale parameters for the corresponding channels to re-weight the different channels of the feature maps, so that the segmentation network focuses on the given target [14]. Here the visual modulator is a VGG16 model that takes the first-frame image, cropped around the target and resized to 224 × 224 pixels, as input; its final classification layer is modified so that the number of outputs matches the number of modulation parameters in the segmentation network. The visual modulation multiplies each channel of the feature map by its scale parameter, as expressed in Equation (1).
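The channel-wise scaling of Equation (1) can be sketched as follows; this is a minimal illustration with assumed shapes, not the paper's actual implementation:

```python
import numpy as np

# Minimal sketch of the visual modulator's channel-wise scaling (Equation (1)).
# Shapes and names are illustrative assumptions.

def visual_modulate(features, scales):
    """Multiply each channel of a feature map by its scale parameter.

    features: (C, H, W) feature map from one segmentation-network layer
    scales:   (C,) channel-wise scale parameters from the visual modulator
    """
    # Broadcast the per-channel scales over the spatial dimensions.
    return features * scales[:, None, None]

feat = np.ones((64, 56, 56))   # toy feature map
s = np.full(64, 0.5)           # toy scale parameters
out = visual_modulate(feat, s)
print(out.shape)      # (64, 56, 56)
print(out[0, 0, 0])   # 0.5
```

Because the scales are per-channel, the modulator only needs to output one number per feature channel, which is what the modified classification layer of the modulator network provides.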
The spatial modulator generates pixel-level offset (bias) parameters to provide rough prior information about the target object's location. Here, a two-dimensional heatmap is generated from the mask predicted on the previous frame, giving a rough estimate of the target's position, and this heatmap is fed to the spatial modulator. To match the different feature-map resolutions in the segmentation network, the spatial modulator downsamples the 2D heatmap to several scales, obtaining a spatial offset parameter map for each corresponding convolutional layer; each spatial modulation map is added to the feature maps of its layer.
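The spatial modulator's downsample-and-add step can be sketched like this; block-average downsampling and the specific shapes are assumptions for illustration only:

```python
import numpy as np

# Sketch of the spatial modulator: the previous-frame mask becomes a coarse
# 2D heatmap, which is downsampled to each layer's resolution and added to
# that layer's feature maps as a per-pixel bias. Block-average pooling is an
# illustrative assumption, not necessarily the paper's exact operation.

def downsample(heatmap, factor):
    """Average-pool a 2D heatmap by an integer factor."""
    h, w = heatmap.shape
    return heatmap.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def spatial_modulate(features, bias_map):
    """Add a heatmap-derived bias to every channel of a feature map.

    features: (C, H, W) layer feature map
    bias_map: (H, W) location prior at the same spatial resolution
    """
    return features + bias_map[None, :, :]

full = np.zeros((224, 224))
full[80:160, 80:160] = 1.0       # rough target location from the previous mask
feat = np.zeros((64, 56, 56))    # toy feature map at 1/4 resolution
bias = downsample(full, 4)       # match this layer's 56x56 resolution
out = spatial_modulate(feat, bias)
print(out.shape)  # (64, 56, 56)
```

The same heatmap is reused at every modulated layer, downsampled by that layer's stride, so the location prior stays consistent across scales.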
In summary, the method is built on VGG16, with two sets of parameters per modulated layer: the visual (scale) parameters and the spatial offset (bias) parameters. The features are modulated according to the formula, and the resulting fully convolutional network (FCN) then produces the segmentation.
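Combining both modulators, each modulated layer effectively computes a per-channel scale followed by a spatial bias, i.e., roughly y_c = s_c · x_c + b for each channel c. A minimal sketch under that reading (shapes and names are illustrative assumptions):

```python
import numpy as np

# Putting both modulators together: channel-wise visual scale, then the
# spatial bias map shared across channels. An illustrative sketch, not the
# paper's exact implementation.

def modulated_features(x, scales, bias_map):
    """Apply channel-wise scaling, then add the spatial bias map.

    x:        (C, H, W) pre-modulation feature map
    scales:   (C,) visual modulation parameters
    bias_map: (H, W) spatial modulation parameters for this layer
    """
    return x * scales[:, None, None] + bias_map[None, :, :]

x = np.ones((8, 4, 4))
s = np.full(8, 2.0)
b = np.full((4, 4), 0.25)
y = modulated_features(x, s, b)
print(y[0, 0, 0])  # 2.25
```

Because both modulator outputs are cheap element-wise operations, adapting the network to a new target only requires one forward pass of the modulators, which is why no per-video fine-tuning is needed.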
2.5 Qualitative results from the paper
Figure 3 shows test results of the proposed method under partial occlusion (Fig. 3(a)), cluttered background (Fig. 3(b)), motion blur (Fig. 3(c)), and on the kite-surf sequence (Fig. 3(d)). Under partial occlusion, only the unoccluded part of the target should be segmented; with a cluttered background, the target must be separated from a background that looks similar to it; motion blur requires finer segmentation of the blurred parts of the target. In all of these cases the algorithm segments the given target accurately, and on the kite-surf sequence in particular it segments the small target more accurately. Still, Fig. 3(d) shows that for smaller targets there remain gaps between the segmentation results and the ground-truth labels; how to make fuller use of local information (such as feature points within the region of interest) and global information (such as the semantic category, color, and texture of the region of interest) will be one of the future research directions.

Source: www.cnblogs.com/coolwx/p/11517306.html