Paper Study: "Good View Hunting: Learning Photo Composition from Dense View Pairs"

Contributions of the paper

1. A large dataset is created: the Comparative Photo Composition (CPC) dataset.

2. A novel knowledge transfer framework is proposed for training a real-time, anchor-box-based view proposal model (View Proposal Net, VPN).

We first train a view evaluation model on view pairs using a Siamese architecture. We then use that model as a teacher to score candidate anchor boxes on a wide variety of images, and these scores are used to train the student VPN model to output the same anchor box score ranking. To train the student, we propose the mean pairwise squared error (MPSE) loss.

VPN model: takes an image as input and outputs a list of scores, one for each predefined anchor box.
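To make the VPN output format concrete, here is a minimal sketch of how a list of per-anchor scores could be turned into a ranked list of candidate views. The function name `top_views`, the box tuples, and the score values are all made up for illustration; the post only says that the model outputs one score per predefined anchor box.

```python
def top_views(scores, anchor_boxes, k):
    """Return the k highest-scoring (anchor_box, score) pairs, best first.

    scores and anchor_boxes are parallel lists: scores[i] is the score
    assigned to anchor_boxes[i]. Both are illustrative stand-ins for the
    VPN output and the predefined anchor box set.
    """
    if len(scores) != len(anchor_boxes):
        raise ValueError("need exactly one score per anchor box")
    ranked = sorted(zip(anchor_boxes, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical example: three anchor boxes as (x1, y1, x2, y2) in normalized coordinates.
boxes = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
scores = [0.1, 0.9, 0.5]
best = top_views(scores, boxes, k=2)
```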

Training View Proposal Networks

This paper presents a knowledge transfer framework in which the View Proposal Net (VPN), the student model, is trained under the supervision of a teacher model, the View Evaluation Net (VEN). VEN takes a view as input and predicts its composition score, so it can be trained directly on our CPC dataset. To transfer knowledge, we run VEN on the anchor boxes of a given image, then use the predicted scores to train VPN with the new mean pairwise squared error (MPSE) loss.
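The post names the MPSE loss without spelling out its formula. The sketch below is written under the assumption that the loss compares score differences: for every pair of anchor boxes it takes the gap between the two teacher scores and the gap between the two student scores, squares the difference of those gaps, and averages over all n(n-1)/2 pairs. The function name `mpse_loss` is hypothetical, and plain Python lists stand in for the tensors a real training loop would use.

```python
from itertools import combinations


def mpse_loss(student_scores, teacher_scores):
    """Mean pairwise squared error between two equal-length score lists.

    Assumed definition: average over all pairs (i, j), i < j, of
    ((teacher[i] - teacher[j]) - (student[i] - student[j])) ** 2.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must have the same length")
    n = len(student_scores)
    if n < 2:
        raise ValueError("need at least two anchor boxes to form a pair")

    total = 0.0
    for i, j in combinations(range(n), 2):
        teacher_gap = teacher_scores[i] - teacher_scores[j]
        student_gap = student_scores[i] - student_scores[j]
        total += (teacher_gap - student_gap) ** 2
    return total / (n * (n - 1) / 2)
```

Under this definition the loss depends only on score differences, so a student whose scores equal the teacher's plus a constant offset still gets zero loss. That fits the stated goal of matching the teacher's ranking rather than its absolute values.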

 

VPN: SSD + MultiBox

The backbone network is based on SSD (truncated after Conv9). On top of the backbone we add a convolutional layer, an average pooling layer, and a fully connected layer that outputs N scores, one for each of the N predefined anchor boxes. We define the anchor box set in advance by densely sliding windows of different scales and aspect ratios over the normalized image, which yields a set of N = 895 predefined anchor boxes.
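The dense sliding procedure can be sketched as follows. The post does not give the scales, aspect ratios, or stride, so the values used here are placeholders, and this sketch is not expected to reproduce the 895 boxes. The function name `generate_anchor_boxes` is hypothetical, and boxes are expressed in a unit-square (normalized) image.

```python
def generate_anchor_boxes(scales, aspect_ratios, stride):
    """Slide windows over a normalized (unit-square) image.

    scales:        fractions of the image area each window covers (assumed parameter)
    aspect_ratios: width / height of each window in normalized coordinates
    stride:        sliding step, as a fraction of the image side
    Returns a list of (x1, y1, x2, y2) boxes that lie inside the image.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")

    boxes = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = (scale * ratio) ** 0.5
            height = (scale / ratio) ** 0.5
            if width > 1.0 or height > 1.0:
                continue  # window does not fit inside the image
            # Number of positions along each axis; the epsilon guards against
            # floating-point round-off when the division is exact.
            steps_x = int((1.0 - width) / stride + 1e-9) + 1
            steps_y = int((1.0 - height) / stride + 1e-9) + 1
            for iy in range(steps_y):
                for ix in range(steps_x):
                    x1, y1 = ix * stride, iy * stride
                    boxes.append((x1, y1, x1 + width, y1 + height))
    return boxes


# Placeholder settings, chosen only to keep the example small.
anchors = generate_anchor_boxes(scales=[0.25], aspect_ratios=[1.0], stride=0.25)
```

Because the anchor set is fixed ahead of time, the network only has to output one score per box, with no box regression step.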

VEN: Siamese structure

We train VEN with a Siamese architecture, which consists of two weight-sharing copies of VEN, each outputting a score for its input image. Here VEN is based on VGG16 (truncated after the last max pooling layer) with two new fully connected (FC) layers and a new output layer. Because our model outputs only a single ranking score rather than a probability distribution over 1000 classes, we reduce the channels of the FC layers to 512 and 1024.
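Weight sharing in a Siamese setup can be illustrated with a toy example: a single scorer object is applied to both views, so there is only one set of parameters. The `LinearScorer` class below is a hypothetical stand-in for the VGG16-based VEN, and the feature vectors are invented. The loss used to train VEN on view pairs is not described in this post and is left out.

```python
class LinearScorer:
    """Toy stand-in for VEN: maps a feature vector to one scalar score."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def score(self, features):
        if len(features) != len(self.weights):
            raise ValueError("feature length must match weight length")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def siamese_forward(scorer, view_a, view_b):
    """Score both views with the same scorer, so the two branches share weights."""
    return scorer.score(view_a), scorer.score(view_b)


# Hypothetical features for two views cropped from the same image.
scorer = LinearScorer([0.5, -1.0])
score_a, score_b = siamese_forward(scorer, [2.0, 0.0], [0.0, 1.0])
```

Since both branches call the same object, any update to its weights changes both branches at once, and the trained scorer can later be used on its own to rate a single view.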

Origin www.cnblogs.com/yuehouse/p/11791327.html