[Video super resolution] ESPCN paper notes

ESPCN for video (often referred to as VESPCN in later work) is a real-time video super-resolution method proposed by Twitter in 2017. Below are some notes on my understanding of the paper.

 

The figure above shows the structure of the entire network. The input is frame t together with its adjacent frames, t-1 and t+1. The network can take 3, 5, 7 or 9 consecutive frames as input, and the paper analyzes the effect of this choice; three frames are used as the example here. A motion estimation network computes the per-pixel displacement between the LR frames t-1 and t, and this displacement is then applied to LR frame t-1 to obtain a warped version of frame t-1. The same is done for frame t+1. The warped frames t-1 and t+1 are fed into the spatio-temporal network together with frame t, which produces the final SR image for the single frame t.
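
To make the warping step concrete, here is a minimal sketch in plain Python. It assumes the displacement field stores, for each output pixel, an offset (dx, dy) pointing to where that pixel should be sampled in frame t-1, and it assumes bilinear interpolation with border clamping. The function name `warp` and the toy data are made up for illustration; in the actual method the displacement comes from the motion estimation network.

```python
import math

def warp(frame, flow):
    """Warp `frame` (H x W nested list of floats) with a per-pixel displacement.

    flow[y][x] = (dx, dy): output pixel (x, y) is sampled from `frame`
    at (x + dx, y + dy) with bilinear interpolation.
    Sampling positions outside the frame are clamped to the border (assumption).
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out

# Toy frame t-1: a bright vertical line in column 1.
prev_frame = [[0.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]]
# Suppose the line sits in column 2 in frame t. Each output pixel then has to
# sample one pixel to its left in frame t-1, so dx = -1 everywhere.
flow = [[(-1.0, 0.0)] * 4 for _ in range(2)]
warped_prev = warp(prev_frame, flow)
# warped_prev == [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```

After warping, the line in frame t-1 is aligned with its position in frame t, which is what lets the spatio-temporal network combine the frames more easily.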

So there are two main networks: a motion estimation network and a spatio-temporal network.

First, the motion estimation network.

 

Motion estimation is done in two steps: a coarse flow is predicted first, and then a fine flow refines it. This way, larger displacements can be estimated with a small amount of computation. This coarse-to-fine approach is also common practice in video SR.
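
The coarse-to-fine idea can be illustrated with a deliberately simplified sketch. Note that this is not the paper's method: instead of learned networks predicting a dense flow, it uses a brute-force search for a single global shift on a 1-D signal. All function names and parameters (`factor`, `coarse_range`, `fine_range`) are made up for this example. The point is only the structure: estimate a large motion cheaply at low resolution, compensate with it, then estimate the small residual at full resolution and add the two.

```python
def shift(signal, d):
    """Sample `signal` at index i + d for every i (border clamped)."""
    n = len(signal)
    return [signal[min(max(i + d, 0), n - 1)] for i in range(n)]

def sad(a, b):
    """Sum of absolute differences between two equally long signals."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_shift(ref, other, candidates):
    """Candidate shift d for which shift(other, d) matches ref best."""
    return min(candidates, key=lambda d: sad(ref, shift(other, d)))

def downsample(signal, factor):
    """Average non-overlapping blocks of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

def coarse_to_fine_shift(ref, other, factor=4, coarse_range=4, fine_range=2):
    # Coarse step: search on signals that are `factor` times shorter.
    # One low-resolution step corresponds to `factor` full-resolution pixels.
    coarse_lr = best_shift(downsample(ref, factor), downsample(other, factor),
                           range(-coarse_range, coarse_range + 1))
    coarse = coarse_lr * factor
    # Compensate with the coarse estimate, then search only the small residual.
    compensated = shift(other, coarse)
    fine = best_shift(ref, compensated, range(-fine_range, fine_range + 1))
    # The total motion is the sum of the coarse and the fine estimate.
    return coarse + fine

# Toy data: `other` is `ref` displaced by 9 samples.
ref = [0.0] * 64
ref[20:28] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
other = shift(ref, -9)
estimated = coarse_to_fine_shift(ref, other)  # coarse 8 + fine 1 = 9
```

With these settings the two stages test 9 + 5 = 14 candidates and cover shifts of up to 18 samples, whereas a single full-resolution search over the same range would test 37 candidates on the long signals.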

Next comes the other important part, the spatio-temporal network.

The job of the spatio-temporal network is to synthesize one SR frame from several LR frames, so it both fuses information across time and interpolates in space.

The figure shows several common ways of fusing frames over time:

a) early fusion 

The N input frames are concatenated and then convolved with filters that span all N frames as channels, so all frames are fused in the first layer.

b) slow fusion

Not all frames are fused in the first layer. As shown in the figure, each layer fuses only pairs of adjacent frames, so the temporal information is merged gradually. This is slow fusion.

c) 3D convolution

If we zoom in on the feature maps output by the first convolution layer, we find that each of them is computed from two adjacent frames.

 

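This is equivalent to an ordinary 3D convolution.

Here, 3D convolution can be seen as a weight-shared version of slow fusion, with the temporal and spatial dimensions simply arranged differently. A brief explanation follows.

The first feature map of the first 3D convolution layer can be seen as channel 1 of all the first-layer feature maps in slow fusion, concatenated together. The difference is that the weights within the same layer of slow fusion must be identical (that is, the weights are shared). A major benefit of this weight-shared slow fusion is that it saves computation, because earlier results can be reused. For example, the inputs (t+2, t+1, t, t-1, t-2) and (t+1, t, t-1, t-2, t-3) share some intermediate results that can be reused.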
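
The reuse argument can be sketched as follows. This is a toy model under stated assumptions, not the paper's implementation: each frame is a flat list of pixels, and the first layer fuses two adjacent frames with a per-pixel weighted sum (think of a temporal kernel of size 2 with no spatial extent). The class `SharedSlowFusion` and its cache are hypothetical names for illustration. Because the same weights are used for every pair, the result for a given pair of frames does not depend on where that pair sits inside the input window, so it can be cached.

```python
def fuse_pair(a, b, weights):
    """First-layer fusion of two adjacent frames: per-pixel weighted sum."""
    w0, w1 = weights
    return [w0 * x + w1 * y for x, y in zip(a, b)]

class SharedSlowFusion:
    """First layer of slow fusion with one set of weights for all pairs."""

    def __init__(self, weights):
        self.weights = weights
        self.cache = {}    # index of the first frame of a pair -> fused features
        self.computed = 0  # number of pair fusions actually computed

    def first_layer(self, frames, start, length):
        """Fuse every adjacent pair in frames[start : start + length]."""
        outputs = []
        for i in range(start, start + length - 1):
            if i not in self.cache:
                self.cache[i] = fuse_pair(frames[i], frames[i + 1], self.weights)
                self.computed += 1
            outputs.append(self.cache[i])
        return outputs

# Six toy frames with two pixels each, standing for t-3 ... t+2.
frames = [[float(i), float(2 * i)] for i in range(6)]
model = SharedSlowFusion(weights=(0.5, 0.5))

out_a = model.first_layer(frames, start=0, length=5)  # window t-3 ... t+1
out_b = model.first_layer(frames, start=1, length=5)  # window t-2 ... t+2
# The two windows share three of their four adjacent pairs, so only
# 4 + 1 = 5 fusions are computed instead of 4 + 4 = 8.
```

Without weight sharing, the pair (t-1, t) would be processed by different filters depending on its position in the window, and the cached result could not be reused.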

See the paper for detailed results and analysis.

Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation (CVPR 2017)
See https://arxiv.org/abs/1611.05250


Origin www.cnblogs.com/sunny-li/p/11033140.html