Interpreting a CVPR 2019 paper | A label-efficient semi-supervised learning method based on graph filtering


13894005-0a22b9835bcbebbe.jpg

This article interprets a semi-supervised learning paper accepted at CVPR 2019; click here to jump to the original on arXiv. At an earlier AI Yanxishe live CVPR paper-sharing session, co-author Prof. Xiao-Ming Wu gave an on-site presentation of the work; you can first jump to 1:59:23 in the video replay to hear the explanation. Based on the original paper and the professor's slides, this article re-organizes and interprets the content at a moderate level of detail, in the hope of being helpful.

"Label-Efficient Semi-Supervised via Graph Learning Filtering" article studied the semi-supervised learning problem, the authors propose a semi-supervised learning framework based on map filtering and using the framework of two typical view of semi-supervised classification LP GCN improved and, at the same time not only a new method of using the connection information and the feature information of the node of FIG., but also improve the efficiency of the label, the proposed method is improved in all the experimental data sets have achieved the best results . Author unified framework with a view of the filter looks completely different LP and GCN, its "low pass filter" point of view succinctly explained the reason these two methods work in practice, increasing the researchers recognized for such methods knowledge level. In the following directories are arranged below, you can click on the link to jump quickly to some of their concerns:

1. Preliminaries

1.1 The target problem

1.2 Graph spectral analysis

1.3 Graph convolution and filtering

1.4 Graph filtering and semi-supervised classification

2. Motivation

3. Methods

3.1 LP and graph filtering

1) The original LP algorithm

2) LP from the graph-filtering perspective

3) GLP: generalized label propagation

3.2 GCN and graph filtering

1) The original GCN

2) GCN from the graph-filtering perspective

3) IGCN: GCN with stronger graph filters

4. Implementation and verification

4.1 Smoothing strength and computational performance

4.2 Experimental comparison on semi-supervised classification

4.3 Experimental comparison on Zero-Shot semi-supervised regression

5. Summary

1. Preliminaries

1.1 The target problem

Graph semi-supervised classification: given a graph G, a signal (feature) matrix X, and a node label matrix Y (a sparse matrix in which only some nodes are labeled), output a complete label matrix Y' that assigns every node a category. An example graph is shown below:

13894005-110b97c99f96f64d.jpg

Nodes 1, 3, 4 in the graph are labeled nodes, and node 2 is an unlabeled node. Each node i corresponds to a feature vector x_i in X. Graph semi-supervised classification uses the features of all nodes together with the labels of the labeled nodes to predict the categories of the unlabeled nodes (node 2).

Graph semi-supervised classification assigns labels based on the similarity between nodes: the higher the similarity between two nodes, the higher the probability they belong to the same class, so the category of one node may be given to the other. Node similarity is not only implicitly reflected by the connectivity structure of the graph, but can also be expressed by node features; only by using both the connectivity information and the feature information can we obtain a better characterization of node similarity, thereby improving the accuracy of label assignment.

1.2 Graph spectral analysis

An undirected graph can be represented by its Laplacian matrix L. An eigendecomposition of L yields a set of orthonormal eigenvectors (which can be assembled into an orthogonal matrix), and each eigenvector \phi_i has a corresponding eigenvalue \lambda_i. In the field of graph signal processing, this orthonormal basis is called the graph Fourier basis, and the eigenvalue corresponding to each basis signal represents its frequency. The related definitions are as follows:

L = D - W = \Phi \Lambda \Phi^\top

where W is the adjacency matrix, D is the degree matrix, \Phi is the orthogonal matrix whose columns are the eigenvectors of L, and \Lambda is the diagonal matrix of the eigenvalues of L. It is worth noting that the Laplacian matrix has several definitions; the formula above is only the most primitive one. The random-walk normalized Laplacian and the symmetric normalized Laplacian are defined as follows:

L_{rw} = D^{-1} L = I - D^{-1} W, \qquad L_s = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}

They differ mainly in "symmetry" and "normalization"; the differences are not large, and they are listed together here first to avoid confusing the reader. The frequency of a graph signal reflects its degree of variation: a low-frequency signal is smoother, with highly similar signal values at neighboring nodes, while a high-frequency signal varies more sharply, and the signal values of neighboring nodes may differ greatly. The purpose of spectral analysis is to decompose a complex signal into a linear combination of relatively simple basis signals; by studying the properties of these simple signals and their proportions in the original signal, we can infer the properties of the original signal. This is a deconstruction philosophy of reducing the complex to the simple. For example, the one-dimensional Fourier transform decomposes a one-dimensional signal into a linear combination (for discrete signals) or an integral (for continuous signals) of complex exponential functions, and the common two-dimensional DCT transform decomposes an image into a linear combination of cosine basis signals.

Similarly, carrying this idea over to graph signal processing, the basis signals become the eigenvectors \phi_i of the Laplacian matrix, and any graph signal can be expressed as a linear combination of the \phi_i (linear combinations of a set of orthonormal basis vectors can represent every vector in n-dimensional space):

f = \Phi c = \sum_{i=1}^{n} c_i \phi_i

A graph signal (graph signal) is a signal defined on the nodes of a graph, with one real value per node; a vector f \in R^{n \times 1} can represent a graph signal, where n is the number of nodes. A multivariate or multi-dimensional graph signal can be represented by a matrix X \in R^{n \times m}; in this case each node corresponds to an m-dimensional vector, the feature vector of the node.
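To make these definitions concrete, here is a minimal NumPy sketch (the 4-node example graph is my own illustration, not taken from the paper) that builds the Laplacian of a small graph, eigendecomposes it into the graph Fourier basis, and expresses a graph signal in that basis:

```python
import numpy as np

# Adjacency matrix of a small undirected 4-node example graph (hypothetical).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))        # degree matrix
L = D - W                         # unnormalized Laplacian L = D - W

# Eigendecomposition: columns of Phi form an orthonormal graph Fourier basis,
# and the eigenvalues lam are the corresponding frequencies (ascending order).
lam, Phi = np.linalg.eigh(L)

# Any graph signal f (one real value per node) is a linear combination of the
# basis vectors; c holds its graph Fourier coefficients.
f = np.array([1.0, 2.0, 3.0, 4.0])
c = Phi.T @ f                     # analysis: project onto the basis
f_rec = Phi @ c                   # synthesis: reconstruct the signal

print(np.allclose(f, f_rec))      # the basis represents any signal exactly
```

The smallest frequency of a connected graph is always 0, and its basis vector is constant over the nodes, which is why low frequencies correspond to smooth signals.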

1.3 Graph convolution and filtering

So-called "filtering" means selectively passing particular spectral components of a signal; for example, a common low-pass filter preserves the low-frequency components of a signal and filters out the high-frequency components. A filter can be seen as a mapping function on signals that receives the original signal and outputs the filtered signal. Filtering is usually implemented by multiplying the original signal by a filter matrix; the paper defines the following graph convolution filter:

G = p(L) = \Phi\, p(\Lambda)\, \Phi^{-1}

where the function p(\lambda_i) is called the frequency response function. Multiplying a graph signal f by this convolution filter gives the filtered signal:

\bar{f} = G f = \Phi\, p(\Lambda)\, \Phi^{-1} f = \sum_{i=1}^{n} p(\lambda_i)\, c_i\, \phi_i

It can be seen that in the filtered signal \bar{f}, the proportion c_i of each basis signal \phi_i has been scaled by the function p(\lambda_i). The frequency response function controls the filter's response to basis signals of different frequencies, so we can design frequency response functions with different filtering properties to meet the needs of different scenarios.
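The scaling of each Fourier coefficient by p(λ_i) can be sketched directly (NumPy; the 5-node path graph and the particular low-pass response p(λ) = 1/(1+λ) are illustrative assumptions of mine, not the paper's choices):

```python
import numpy as np

# Path graph on 5 nodes (hypothetical example).
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(1)) - W
lam, Phi = np.linalg.eigh(L)        # graph Fourier basis and frequencies

def graph_filter(p, f):
    """Apply the convolution filter G = Phi p(Lambda) Phi^{-1} to signal f."""
    return Phi @ np.diag(p(lam)) @ Phi.T @ f

# A low-pass response: close to 1 for small lambda, small for large lambda.
lowpass = lambda x: 1.0 / (1.0 + x)

f = np.random.default_rng(0).normal(size=5)   # a rough (high-frequency) signal
f_bar = graph_filter(lowpass, f)

# Each Fourier coefficient c_i is scaled by exactly p(lambda_i):
c, c_bar = Phi.T @ f, Phi.T @ f_bar
print(np.allclose(c_bar, lowpass(lam) * c))
```

Because p(λ) ≤ 1 and high frequencies are attenuated, the filtered signal is never rougher than the original as measured by the Laplacian quadratic form f^T L f.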

1.4 Graph filtering and semi-supervised classification

The basic assumption of graph semi-supervised classification is that "adjacent nodes tend to have the same label". This means we want the label signal on the graph to be smooth, i.e. low-frequency; since we expect to learn a low-frequency signal representation, we should use a frequency response function with low-pass properties. It is precisely with this "graph low-pass filtering" perspective that the authors give a unified interpretation of LP and GCN and, based on it, make effective improvements.

2. Motivation

The classic Label Propagation algorithm can only use the structure of the graph and cannot use node features. The newer Graph Convolutional Network needs many labels for training and validation, but in semi-supervised learning tasks the number of labels is small, so GCN may struggle when labeled data is scarce; in other words, its label efficiency is low. To solve the problem that LP cannot use node features, the authors improve it and propose GLP. To address GCN's high model complexity and low label efficiency, the authors improve it and propose IGCN. Both methods are proposed within the graph-filtering framework, can be explained by a unified theory, and also work well in practice.

3. Methods

3.1 LP and graph filtering

1) The original LP algorithm

The label propagation algorithm (Label Propagation), abbreviated LP, is a simple and effective graph semi-supervised learning method that is widely used in both research and industry. The goal of LP is to obtain a prediction/embedding matrix Z that both fits the true label matrix Y and is sufficiently smooth on the graph. The optimization objective of the LP model is as follows:

\min_{Z} \; \|Z - Y\|_F^2 + \alpha\, \mathrm{Tr}(Z^\top L Z)

where the matrix Z \in R^{n \times l} and l is the number of classes; each row of the matrix represents the multi-class prediction probabilities of one node (it can also be viewed as the node's embedding vector). The first term of the objective constrains the prediction matrix to be as close as possible to the true label matrix, and the second term constrains Z to be as smooth as possible on the graph; it is called the Laplacian regularization term, and its strength is controlled by the parameter \alpha. This is an unconstrained quadratic optimization problem with a closed-form solution:

Z = (I + \alpha L)^{-1} Y

After obtaining Z, the category of each node can be taken directly as the column where its row attains the maximum, or a column-normalization step can be performed first.
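The whole LP procedure fits in a few lines; a minimal NumPy sketch (the six-node graph, the label placement, and α = 1 are hypothetical choices of mine):

```python
import numpy as np

# Two triangles joined by a bridge edge (hypothetical graph);
# nodes 0-2 form one cluster, nodes 3-5 the other.
W = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(1)) - W

# Sparse label matrix Y: only node 0 (class 0) and node 5 (class 1) are labeled.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

alpha = 1.0                                    # Laplacian regularization strength
Z = np.linalg.solve(np.eye(6) + alpha * L, Y)  # closed form (I + alpha L)^{-1} Y
pred = Z.argmax(axis=1)                        # row-wise argmax gives the class
print(pred)                                    # → [0 0 0 1 1 1]
```

The two labels propagate through the edges, and each cluster inherits the label of its single labeled node.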

2) LP from the graph-filtering perspective

Viewed from the graph-filtering perspective, the LP algorithm can be decomposed into three parts:

13894005-b8add58273902110.jpg

The filter meets the above definition of a graph convolution filter and is called the auto-regressive (Auto-Regressive, AR) filter; its frequency response function is:

p(\lambda) = \frac{1}{1 + \alpha \lambda}

As can be seen from the plot, the auto-regressive filter is clearly low-pass: it suppresses high frequencies, and its smoothing strength increases as \alpha increases.

13894005-07a5cba3dcc5499b.jpg

3) GLP: generalized label propagation

Generalizing the three components of LP seen from the filtering perspective yields the proposed generalized label propagation algorithm (Generalized Label Propagation, GLP). The specific generalizations are as follows:

1. Graph signal: use the feature matrix X composed of the feature vectors of all nodes as the input signal

2. Filter: any low-pass filter that meets the definition of graph convolution

3. Classifier: any classifier that can be trained to fit the filtered representations of the labeled nodes

This generalization is very natural and easy to understand; the key step is low-pass filtering the node feature matrix on the graph. This step can be seen as a feature-extraction process: the low-pass filter extracts a smoother representation of the data that takes into account both the connectivity information of the graph and the feature information of the nodes, making up for the original LP's inability to exploit node features. The generalization of the filter and the classifier greatly increases the flexibility of the algorithm, so GLP can easily be adapted to different problem domains.
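The three generalized components can be sketched as follows (NumPy; I let the AR filter stand in for "any low-pass filter" and a trivial nearest-centroid rule for "any classifier"; the graph, features, and labels are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical graph: two 5-node cliques joined by one bridge edge (4, 5).
n = 10
W = np.zeros((n, n))
for a, b in ([(i, j) for i in range(5) for j in range(i)] +
             [(5 + i, 5 + j) for i in range(5) for j in range(i)] + [(4, 5)]):
    W[a, b] = W[b, a] = 1.0
L = np.diag(W.sum(1)) - W

# Node features: noisy and barely separable on their own (hypothetical).
X = rng.normal(size=(n, 2))
X[5:] += 0.5

# Step 1 (graph signal + filter): low-pass the feature matrix with the AR
# filter (I + alpha L)^{-1} X; alpha controls the smoothing strength.
alpha = 5.0
X_bar = np.linalg.solve(np.eye(n) + alpha * L, X)

# Step 2 (classifier): fit any classifier on the filtered features of the
# labeled nodes; here, a nearest-centroid rule with one label per cluster.
labeled = {0: 0, 9: 1}                       # node -> class
cents = np.stack([X_bar[[i for i, c in labeled.items() if c == k]].mean(0)
                  for k in (0, 1)])
pred = np.linalg.norm(X_bar[:, None] - cents[None], axis=2).argmin(1)
print(pred.shape)
```

Filtering pulls each clique's features toward a common value, so even one label per cluster can separate the data.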

3.2 GCN and graph filtering

1) The original GCN

GCN is a graph convolutional network defined in the spectral domain; it is simple and effective and performs excellently on semi-supervised classification. First, GCN adds self-connections to the adjacency matrix and symmetrically normalizes it to obtain the renormalized matrix \tilde{W}_s:

\tilde{W} = W + I, \qquad \tilde{W}_s = \tilde{D}^{-1/2} \tilde{W} \tilde{D}^{-1/2}

The first equation is equivalent to connecting each node to itself, and in the second equation \tilde{W}_s is the symmetrically normalized version of the matrix \tilde{W}. Based on \tilde{W}_s, the original authors define the layer-wise propagation rule of GCN as follows:

H^{(t+1)} = \sigma\big(\tilde{W}_s H^{(t)} \Theta^{(t)}\big)

where H^{(t)} denotes the input to the t-th layer, \Theta^{(t)} is the parameter to be learned in the t-th layer, and \sigma can be any of the various activation functions used in neural networks.

A graph convolution layer first left-multiplies the input signal by the renormalized matrix \tilde{W}_s, then applies a projection transform with the parameter matrix \Theta, and finally applies a nonlinear activation function. Stacking multiple graph convolution layers and using the softmax activation at the end gives a classifier; for example, a two-layer GCN:

Z = \mathrm{softmax}\big(\tilde{W}_s\, \mathrm{ReLU}\big(\tilde{W}_s X \Theta^{(0)}\big)\, \Theta^{(1)}\big)

This GCN can be trained with the back-propagation algorithm to obtain the final model.
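The renormalization step and the two-layer forward pass can be sketched as follows (NumPy; the graph is hypothetical and random weights stand in for trained parameters, so this only illustrates the computation, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical 4-node graph and 3-dimensional node features.
W = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
X = rng.normal(size=(4, 3))

# Renormalization trick: add self-connections, then symmetric normalization.
W_tilde = W + np.eye(4)
d = W_tilde.sum(axis=1)
W_s = W_tilde / np.sqrt(np.outer(d, d))   # D~^{-1/2} (W + I) D~^{-1/2}

# Two-layer GCN forward pass: Z = softmax(W_s ReLU(W_s X Th0) Th1).
Th0 = rng.normal(size=(3, 8))             # untrained parameters (illustrative)
Th1 = rng.normal(size=(8, 2))
Z = softmax(W_s @ relu(W_s @ X @ Th0) @ Th1)
print(Z.sum(axis=1))                      # each row is a class distribution
```

In practice Θ^{(0)} and Θ^{(1)} are learned by back-propagation on the cross-entropy loss over the labeled nodes.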

2) GCN from the graph-filtering perspective

From the formula of the two-layer GCN it can be seen that GCN in fact keeps repeating three steps: 1) left-multiplication, 2) projection transform, 3) activation transform. The projection and activation transforms are common operations in neural networks, while the "left-multiplication" by the renormalized matrix can be seen as filtering the input graph signal, since the renormalized matrix satisfies:

\tilde{W}_s = I - \tilde{L}_s

where \tilde{L}_s is the symmetrically normalized Laplacian of the matrix \tilde{W}. The frequency response function corresponding to the filter \tilde{W}_s is:

p(\tilde{\lambda}) = 1 - \tilde{\lambda}

If we slightly reorder the three operations in the two-layer GCN and move the graph filtering steps of both layers to the front, we get:

Z = \mathrm{softmax}\big(\mathrm{ReLU}\big(\tilde{W}_s^{\,2} X \Theta^{(0)}\big)\, \Theta^{(1)}\big), \qquad p(\tilde{\lambda}) = (1 - \tilde{\lambda})^2

As can be seen from the figure, this filter is a low-pass filter on the interval [0, 1], and its low-pass strength increases as the order k increases. Therefore a multi-layer GCN applies more low-pass filters, and its smoothing strength is larger than that of a single-layer GCN.

13894005-639a34793182c227.jpg

From the graph-filtering perspective, we can explain the essence of 1) GCN's use of the symmetrically normalized Laplacian in its definition and 2) the renormalization trick (adding self-connections), through the frequency response function. Symmetric normalization limits the range of the Laplacian matrix's eigenvalues to [0, 2], and the renormalization trick further shrinks the eigenvalue range, which makes the filter closer to a perfect low-pass filter. The visualization on the Cora dataset below shows the frequency response functions of GCN before and after renormalization:

13894005-843907b1597ac2ba.jpg

It can be seen that in both the single-layer and two-layer cases, the eigenvalues of the renormalized Laplacian matrix are limited to below 1.5, verifying the authors' explanation of why the renormalization trick works in applications. In fact, it can be proven that the renormalization trick compresses the eigenvalue range of the Laplacian matrix to the interval:

13894005-2fc54a85d6905ef2.png

where d_m denotes the maximum node degree (Degree) in the graph, and \lambda_m is the largest eigenvalue of the symmetrically normalized Laplacian matrix.

3) IGCN: GCN with stronger graph filters

Although stacking multiple GCN layers can increase the smoothing strength of the filter, it also introduces more parameters to be learned, which makes the network need more data to train without overfitting. To solve this problem, the authors propose IGCN (Improved Graph Convolutional Network). IGCN replaces the renormalized matrix in GCN with the k-th order renormalization filter, namely:

13894005-eb001f549e8c5e02.png

The authors call the filter

\tilde{W}_s^{\,k} = (I - \tilde{L}_s)^k

the renormalization (Renormalization, RNM) filter; its frequency response function is:

p(\tilde{\lambda}) = (1 - \tilde{\lambda})^k

Obviously, IGCN can directly control the smoothing strength by adjusting the filter order k without introducing additional parameters to be learned, so the model can be kept shallow and achieve the desired effect without much training data, improving the model's label efficiency.
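The RNM filter and its frequency response can be checked in a few lines (NumPy; the 4-node graph and k = 4 are hypothetical choices of mine): raising the renormalized matrix to the k-th power raises its response to the k-th power, with no new learnable parameters.

```python
import numpy as np

# Hypothetical graph; build the renormalized matrix W_s = D~^{-1/2}(W+I)D~^{-1/2}.
W = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
W_tilde = W + np.eye(4)
d = W_tilde.sum(1)
W_s = W_tilde / np.sqrt(np.outer(d, d))

# The k-th order RNM filter is simply W_s^k = (I - L~_s)^k;
# larger k means stronger smoothing, still with zero extra parameters.
k = 4
rnm = np.linalg.matrix_power(W_s, k)

# Frequency response check: the eigenvalues of W_s^k are (1 - lambda~_i)^k,
# where lambda~_i are the eigenvalues of the renormalized Laplacian L~_s.
L_s = np.eye(4) - W_s
lam = np.linalg.eigvalsh(L_s)
print(np.allclose(np.sort(np.linalg.eigvalsh(rnm)), np.sort((1 - lam) ** k)))
```

The eigenvalues of the renormalized Laplacian stay inside [0, 2), consistent with the shrinkage discussed in 3.2.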

4. Implementation and verification

4.1 Smoothing strength and computational performance

For both GLP and IGCN, the smoothing strength can be controlled through the parameter \alpha or k. How should the smoothing strength be adjusted for a given application scenario? The authors point out that the label rate of the dataset is the key reference for tuning the smoothing strength: when the label rate is small, the smoothing strength should be larger, so that nodes far from any labeled node can still obtain feature representations similar to those of labeled nodes; when the label rate is large, the smoothing strength should be smaller, so that labels do not propagate too far and the diversity of features is maintained. Using RNM filters of different smoothing strengths on the Cora dataset, the authors ran a t-SNE visualization experiment with the following results:

13894005-740feb6f35caa7b0.jpg

It can be seen that as the smoothing strength increases, the class clusters in the filtering results become more and more compact, the spacing between clusters becomes larger and larger, and the classification boundaries become clearer; at this point only a small number of labels is needed for classification. This visually explains the relationship between the smoothing strength and the label rate of the dataset.

Besides the smoothing strength, the computational performance of the two filters is also discussed. Since the AR filter requires a matrix inversion, whose cost is high, a k-th order polynomial expansion can be used to approximate it and reduce the computational complexity. For the k-th order RNM filter, the sparsity of the Laplacian matrix, common in practical applications, can be exploited to accelerate the computation. The authors analyze the computational complexity of the two filters in the paper and theoretically illustrate their practicality.
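One simple way to avoid the explicit inverse of the AR filter is a truncated series expansion; the sketch below (NumPy) uses a Neumann series, which is my own illustration of the idea and not necessarily the exact expansion used in the paper, and it assumes the spectral radius of αL is below 1 (guaranteed here by using the normalized Laplacian and α < 0.5):

```python
import numpy as np

# Hypothetical graph; use the symmetrically normalized Laplacian so that the
# eigenvalues lie in [0, 2] and the series below converges for alpha < 0.5.
W = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
d = W.sum(1)
L_s = np.eye(4) - W / np.sqrt(np.outer(d, d))

alpha = 0.3
exact = np.linalg.inv(np.eye(4) + alpha * L_s)   # AR filter via O(n^3) inverse

# Truncated Neumann series: (I + aL)^{-1} = sum_i (-aL)^i.  Each extra term
# needs only one matrix product (sparse in practice), so cost is linear in k.
def ar_approx(k):
    term, acc = np.eye(4), np.eye(4)
    for _ in range(k):
        term = term @ (-alpha * L_s)
        acc = acc + term
    return acc

err = np.linalg.norm(ar_approx(20) - exact)
print(err < 1e-4)
```

With sparse matrices, each term is a sparse product, which is the source of the speedup over a dense inverse.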

4.2 semi-supervised classification task experimental comparison

The authors ran semi-supervised classification experiments on four citation-network datasets and one knowledge-graph dataset, comparing against many strong models, including GAT from Yoshua Bengio's team. The datasets differ in number of nodes, number of classes, and number of features, so the experimental setup can measure model performance under various scenarios and is fairly comprehensive. The statistics of each dataset are in the following table:

13894005-0afb8d3c30941c7c.jpg

To make the magnitudes of the numbers easier to grasp, I use k (thousand) and w (ten thousand) as simplified representations. In addition to choosing datasets of different sizes, different label rates are also set for the datasets in order to observe the relationship between smoothing strength and label rate; the settings are as follows:

13894005-7e1f245d7897f706.jpg

The classification accuracy of each model on each dataset is as follows:

13894005-5467f8f29719c85a.jpg

The experimental results show that the methods improved via graph filtering outperform all the compared models in all scenarios. In terms of running time, however, GLP and IGCN are obviously more time-consuming, most noticeably on the larger NELL dataset, though generally still within an acceptable range.

4.3 Experimental comparison on Zero-Shot semi-supervised regression

Besides classification, GLP and IGCN can also be used for semi-supervised regression. A CVPR 2018 paper used GCN for the zero-shot image recognition task, that is, learning visual classifiers for categories with no training samples (unseen classes) using only textual descriptions of the categories and the relationships between categories. That paper, given the classifiers of the known classes, uses the text-description embedding of each category and the category relationships as input, and the last-layer weights of the known classifiers as regression targets, to train a 6-layer GCN; the GCN's output is then used as the last-layer classifier weights of the unseen classes, which together with the known-class weights form a new classifier that predicts over all categories. That paper cleverly transforms zero-shot learning into a graph semi-supervised regression problem, which allows many semi-supervised regression methods to be applied to zero-shot learning tasks. The overall pipeline of its method (GCNZ) is shown below:

13894005-3a6b137f6035911b.jpg

The article replaces GCNZ's GCN with GLP and IGCN and, using a pre-trained ResNet-50 network, runs zero-shot learning experiments on the AWA2 dataset; the performance comparison results of the models are as follows:

13894005-a68c5628b76c9cae.png

It can be seen that IGCN performs best. Since GCNZ was already at the state-of-the-art level and its core component is GCN, the good performance of GLP and IGCN relative to GCN means their adoption in zero-shot image recognition tasks is promising. This experiment illustrates the broad applicability of the graph-filtering methods. The authors have released the source code to their GitHub repository.

5. Summary

Overall, the idea of graph filtering is concise and practical; although it looks simple, the understanding behind it is profound, and it has strong theoretical guidance and practical significance. After closing the paper, one need only remember the essence of the phrase "low-pass filtering". Since the semi-supervised classification problem pursues smoothness, i.e. low frequency, the article uses the low-pass filter as its example to show us the conciseness and power of graph filtering; but the graph-filtering framework and its ideas apply equally to "non-low-pass" problems; for example, we could design band-pass or high-pass filters to solve related problems. I think an important contribution of this paper is that it opens a door for graph semi-supervised learning and offers a broad avenue; I believe many more papers can be written based on graph filtering. Once again, this article made me feel the importance of thinking about and exploring principles: all improvements and upgrades are inseparable from a deeper understanding of principles and operating mechanisms. I hope to see more insightful, strongly explanatory articles at future CVPRs.

The interpretation above is my (@May's) sincere original work, first published with the AI Yanxishe CVPR team. I have tried to ensure that the viewpoints of the interpretation are correct and precise, but my scholarship is shallow after all; if there are deficiencies in the text, criticism is welcome. All rights to the interpreted methods belong to the original authors.

Meanwhile, the CVPR 2019 Oral selected papers summary — the CV papers worth reading are all here, waiting for you (updating). Want to see related papers and materials? Click the link below to quickly visit the CVPR team ~

https://ai.yanxishe.com/page/meeting/44

Reproduced from: https://www.jianshu.com/p/dba417e506b0


Origin blog.csdn.net/weixin_34402090/article/details/91170265