Multi-view Deep Network for Cross-view Classification

CVPR 2016 (IEEE Conference on Computer Vision and Pattern Recognition)
Meina Kan (1,2), Shiguang Shan (1,2), Xilin Chen (1)
(1) Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
(2) CAS Center for Excellence in Brain Science and Intelligence Technology
{kanmeina, sgshan, xlchen}@ict.ac.cn

Summary

Cross-view recognition, i.e., classifying samples across different views, is an important problem in computer vision. The large discrepancy between views makes the problem quite challenging. To eliminate the complex (even highly non-linear) view discrepancy and achieve favorable cross-view recognition, we propose a multi-view deep network (MvDN), which seeks a non-linear discriminant and view-invariant representation shared between multiple views. Specifically, the proposed MvDN consists of two sub-networks: view-specific sub-networks that attempt to remove view-specific variations, followed by a common sub-network that attempts to obtain a representation shared by all views. As the objective of MvDN, the Fisher loss, i.e., a Rayleigh quotient objective, is calculated from the samples of all views to guide the learning of the whole network. As a result, the representation from the top layer of MvDN is robust to view discrepancy as well as discriminative. Experiments on cross-pose and cross-feature-type face recognition, on three datasets with 13 and 2 views respectively, demonstrate the superiority of the method, especially compared with typical linear methods.
[Figure: the overall MvDN architecture, with view-specific sub-networks f_i followed by a common sub-network g_c shared by all views]
The figure above shows the paper's main idea, which involves two kinds of sub-networks. The first is the view-specific sub-network f_i, unique to the i-th view, which is responsible for removing the information specific to that view; the second is the common sub-network g_c, shared by all views, which further extracts the discriminative representation common to all of them.
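In symbols (my own compact notation, not copied verbatim from the paper), a sample x from the i-th view is mapped to its final representation by composing the two sub-networks:

$$ y = g_c\big(f_i(x)\big), \qquad i = 1, \dots, v $$

so all v views share one g_c, but each view has its own f_i.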

Introduction (the customary round of dissing prior methods)

CCA: only applicable to two-view scenarios.
MCCA: by maximizing the total correlation between every pair of views, one transformation per view (v in total) can be obtained. Although the view discrepancy can be reduced this way, discriminant information such as class labels is not explicitly considered, which is unfavorable for recognition and classification.
MvDA (Multi-view Discriminant Analysis): such discriminant methods benefit from supervised information and usually outperform unsupervised ones, but most of them are linear and may not be sufficient for challenging scenarios.
KCCA: a suitable kernel function is hard to choose.
DCCA: better than KCCA, but still unsupervised.
In general, many works have addressed how to handle view discrepancy for cross-view recognition or classification. However, the linear ones cannot handle challenging non-linear scenarios well, the unsupervised deep methods lack discriminability, and the kernelized supervised methods suffer from the out-of-sample problem. To address these issues, the paper proposes an explicitly non-linear, supervised method: the multi-view deep network.
The proposed multi-view deep network (MvDN) uses a deep architecture to handle view discrepancy and discriminability at the same time, so as to achieve a discriminant, view-invariant representation across multiple views. Specifically, MvDN consists of two kinds of sub-networks: view-specific sub-networks and a common sub-network shared by all views. A Rayleigh quotient objective computed over the samples of all views guarantees the discriminability of the whole network. As a result, the features from the top layer of MvDN are robust to view changes as well as discriminative.
The network's optimization objective is in fact the optimization goal of many papers: S_B^y denotes the between-class scatter and S_W^y the within-class scatter. Naturally, good classification wants the within-class scatter to be as small as possible and the between-class scatter as large as possible.
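Since the equation image did not survive, here is a reconstruction of the objective in the spirit of the paper (whether the paper uses exactly this scalar ratio or an equivalent trace-ratio form is my assumption):

$$ \min_{f_1, \dots, f_v,\; g_c} \; J = \frac{S_W^y}{S_B^y} $$

where both scatters are computed from the top-layer outputs y of all views.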
The within-class scatter works as follows: for each class, take the difference between every sample of that class (pooling the samples of all views together, i.e., ignoring which view they come from) and the mean of all samples of that class, then sum over all classes to get the total within-class scatter.
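In symbols (my reconstruction), with $\mathcal{Y}_k$ denoting the top-layer representations of all samples of class k from all views and $\mu_k$ their mean:

$$ S_W^y = \sum_{k=1}^{c} \sum_{y \in \mathcal{Y}_k} \lVert y - \mu_k \rVert^2 $$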
This is the between-class scatter. Here n_k is the number of samples in class k: multiply n_k by the squared distance between the mean of that class and the mean of all samples (that is, all samples of all classes from all views), then sum over the classes.
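Again in my notation, with $\mu$ the mean of all samples from all classes and views:

$$ S_B^y = \sum_{k=1}^{c} n_k \, \lVert \mu_k - \mu \rVert^2 $$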
The next step is optimization.

Calculation

First, randomly initialize the network weights, then forward-propagate to compute the loss.
Then compute the derivative of the objective J with respect to the top-layer output Y.
A mathematical transformation is applied here; it's fine if you don't follow every step, as the full derivation is rather tedious.
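For the ratio objective reconstructed above, the quotient rule at least gives the general shape of this derivative (my sketch, not the paper's exact formula):

$$ \frac{\partial J}{\partial Y} = \frac{1}{S_B^y} \frac{\partial S_W^y}{\partial Y} \;-\; \frac{S_W^y}{\left(S_B^y\right)^2} \frac{\partial S_B^y}{\partial Y} $$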
The next step is to back-propagate this gradient through the common sub-network g_c.
After g_c, the gradient is naturally propagated further back into each view-specific sub-network f_i.
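Both steps are ordinary applications of the chain rule; in my notation, with θ denoting the parameters of each sub-network and Y_i = g_c(f_i(X_i)) the top-layer output of view i:

$$ \frac{\partial J}{\partial \theta_{g_c}} = \sum_{i=1}^{v} \frac{\partial J}{\partial Y_i} \frac{\partial Y_i}{\partial \theta_{g_c}}, \qquad \frac{\partial J}{\partial \theta_{f_i}} = \frac{\partial J}{\partial Y_i} \frac{\partial Y_i}{\partial \theta_{f_i}} $$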
Finally, BFGS is used to optimize the entire network.
In essence, this is just the standard back-propagation (BP) procedure; once it converges, the whole network is trained.
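To make the whole procedure concrete, below is a minimal, hypothetical PyTorch sketch of such a training loop. Everything in it is an illustrative assumption rather than the paper's actual code: the layer sizes, the two toy views, the exact scalar form of the Fisher loss, and the use of torch.optim.LBFGS in place of the BFGS mentioned above.

```python
# Minimal MvDN-style training sketch (hypothetical; see the note above).
import torch
import torch.nn as nn

n_views, dim_in, dim_hid, dim_out, n_classes = 2, 100, 64, 32, 10

# View-specific sub-networks f_i: one small MLP per view.
f = nn.ModuleList(
    [nn.Sequential(nn.Linear(dim_in, dim_hid), nn.Tanh()) for _ in range(n_views)]
)
# Common sub-network g_c, shared by all views.
g_c = nn.Sequential(nn.Linear(dim_hid, dim_out), nn.Tanh())

def fisher_loss(Y, labels, eps=1e-6):
    """Scalar Fisher loss: within-class scatter over between-class scatter."""
    mu = Y.mean(dim=0)                      # mean over all samples of all views
    s_w, s_b = Y.new_zeros(()), Y.new_zeros(())
    for k in labels.unique():
        Yk = Y[labels == k]                 # samples of class k, views pooled
        mu_k = Yk.mean(dim=0)
        s_w = s_w + ((Yk - mu_k) ** 2).sum()                 # within-class
        s_b = s_b + Yk.shape[0] * ((mu_k - mu) ** 2).sum()   # between-class
    return s_w / (s_b + eps)

# Toy data: x[i] holds the samples of the i-th view; labels are shared.
x = [torch.randn(50, dim_in) for _ in range(n_views)]
labels = torch.randint(0, n_classes, (50,))

params = list(f.parameters()) + list(g_c.parameters())
opt = torch.optim.LBFGS(params, max_iter=20)

def closure():
    opt.zero_grad()
    # Forward: each view through its own f_i, then through the shared g_c.
    Y = torch.cat([g_c(f[i](x[i])) for i in range(n_views)], dim=0)
    loss = fisher_loss(Y, labels.repeat(n_views))
    loss.backward()
    return loss

opt.step(closure)  # one L-BFGS pass; repeat for more epochs if desired
```

At test time, consistent with the cross-view setting described above, each sample would be fed through its own f_i and the shared g_c, and samples would then be matched by distance in the learned common space.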
The trained network can then be used to classify the test set.

Origin: blog.csdn.net/Asure_AI/article/details/102826691