1. Use case
Large-scale visual place recognition: given a query image of a landmark, accurately identify the location where the image was taken.
2. Contributions
The paper proposes a convolutional network that can be trained end to end for the place recognition task, built around a VLAD layer for image retrieval;
introduces a weakly supervised ranking loss;
and reports very good results.
3. Algorithm principle
3.1 Overall network structure
3.2 NetVLAD
Vector of Locally Aggregated Descriptors (VLAD) is a descriptor pooling method that captures information about the statistics of the local descriptors aggregated over an image. Whereas bag-of-visual-words aggregation keeps only the count of descriptors assigned to each visual word, VLAD stores the sum of residuals (the difference vector between each descriptor and its corresponding cluster centre) for each visual word.
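To make the difference between the two pooling schemes concrete, here is a minimal sketch in plain Python. It assumes hard assignment (each descriptor belongs only to its nearest cluster centre) and uses made-up toy data; the function names `nearest_centre`, `bow_histogram` and `vlad` are illustrative and do not come from the paper or any library.

```python
# Minimal sketch: bag-of-visual-words vs. VLAD with hard assignment.
# All names and the toy data below are illustrative assumptions.

def nearest_centre(x, centres):
    """Return the index of the cluster centre closest to descriptor x."""
    def sq_dist(c):
        return sum((xj - cj) ** 2 for xj, cj in zip(x, c))
    return min(range(len(centres)), key=lambda k: sq_dist(centres[k]))

def bow_histogram(descriptors, centres):
    """Bag-of-visual-words: count the descriptors assigned to each visual word."""
    counts = [0] * len(centres)
    for x in descriptors:
        counts[nearest_centre(x, centres)] += 1
    return counts

def vlad(descriptors, centres):
    """VLAD: for each visual word, sum the residuals (x - centre) of its descriptors."""
    dim = len(centres[0])
    v = [[0.0] * dim for _ in centres]
    for x in descriptors:
        k = nearest_centre(x, centres)
        for j in range(dim):
            v[k][j] += x[j] - centres[k][j]
    return v  # one row of D values per cluster centre

centres = [[0.0, 0.0], [10.0, 10.0]]                  # 2 cluster centres, 2-D descriptors
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]   # 3 local descriptors

print(bow_histogram(descriptors, centres))  # [2, 1]
print(vlad(descriptors, centres))           # [[1.0, 2.0], [-1.0, 0.0]]
```

The histogram only records that two descriptors fell near the first centre, while the VLAD rows also record in which direction, and how far, the descriptors lie from each centre. In practice the rows are usually flattened into one vector and normalised before being used for retrieval.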
Given N D-dimensional local image descriptors as input and K cluster centres as the VLAD parameters, the output V of the VLAD layer is a K×D-dimensional image representation. Each element of V is calculated as:
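The equation itself is not present in these notes. In the standard VLAD formulation it is usually written as below; the symbols x_i (the i-th local descriptor), c_k (the k-th cluster centre) and a_k (the assignment indicator) are introduced here for illustration and should be read as assumptions about the intended notation.

```latex
V(j, k) = \sum_{i=1}^{N} a_k(x_i) \, \bigl( x_i(j) - c_k(j) \bigr),
\qquad
a_k(x_i) =
\begin{cases}
1 & \text{if } c_k \text{ is the cluster centre closest to } x_i, \\
0 & \text{otherwise,}
\end{cases}
```

where x_i(j) and c_k(j) denote the j-th dimension of the i-th descriptor and of the k-th cluster centre, respectively.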