Graph Convolutional Network (GCN)

Table of contents

1. Introduction

2. GCN principle

3. GCN for node classification

4. Summary


1. Introduction

Before the emergence of graph neural networks, conventional neural networks could only process regular Euclidean data, characterized by a fixed arrangement and ordering of elements, such as 2-dimensional grids (images) and 1-dimensional sequences (text).

In recent years, applying deep learning to tasks involving graph-structured data has attracted increasing attention. Graph neural networks have achieved major breakthroughs on such tasks and are now widely used in fields such as social networks, natural language processing, computer vision, and even the life sciences. A graph neural network treats a real-world problem as one of connections and message propagation between nodes in a graph, modeling the dependencies between nodes and thereby handling graph-structured data well.

In recent years, CNNs have been widely used in fields such as image recognition and natural language processing. However, a CNN can only efficiently process regular Euclidean data such as grids and sequences; it cannot effectively handle non-Euclidean, graph-structured data such as social and multimedia network data, chemical structure data, biological protein data, and knowledge-graph data. Because graph structure is generally highly irregular, the neighborhood of each node can be unique, with different neighbor nodes and connecting edges, so traditional CNNs struggle with this type of data. The Graph Convolutional Network (GCN) applies the idea of convolution to graph-structured non-Euclidean data. GCN is an important branch of graph neural networks, and most existing graph algorithm models are essentially derived from it.

2. GCN principle

For a graph-structured dataset G with N nodes, each node has its own features. Stacking these features gives a matrix X of size N×D, where D is the hidden-state (feature) dimension of each node. In addition, the relationships between nodes can be extracted into an N×N relationship matrix A, also called the adjacency matrix. X and A are the input features of the GCN model.

X represents the node features, with each node having its own feature vector; A represents the graph structure, i.e., the information about the edges between nodes.
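As a minimal NumPy sketch, the two inputs can be set up for a small toy graph (the graph, edges, and feature values below are illustrative placeholders, not from the paper):

```python
import numpy as np

# Toy undirected graph with N = 4 nodes and D = 2 features per node.
# Edges: 0-1, 0-2, 1-2, 2-3; no self-loops, so the diagonal of A is all zeros.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Feature matrix X of size N x D (values are arbitrary placeholders).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])

print(A.shape)  # (4, 4)
print(X.shape)  # (4, 2)
```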

For an L-layer GCN built on a layer-wise propagation rule, the propagation between layers is:

H^{(l+1)}=\sigma (\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)})

where:

\tilde{A}=A+I_{N}, where I_{N} is the N-dimensional identity matrix. The identity matrix is added because the diagonal of the adjacency matrix A is all zeros (nodes have no self-loop edges), so multiplying A with the feature matrix H would discard each node's own features. Adding I_{N} to A makes the diagonal elements 1, preserving each node's self-features.

\tilde{D} is the degree matrix of \tilde{A}, defined by \tilde{D}_{ii}={\sum}_{j}\tilde{A}_{ij}. \tilde{A} itself is unnormalized; multiplying it directly with H would change the original scale of the features, because different nodes have different numbers of edges and edge weights, so the aggregated feature values of high-degree nodes would be much larger than those of low-degree nodes. Therefore \tilde{A} must be normalized: the row normalization \tilde{D}^{-1}\tilde{A} makes each row sum to 1, while the variant used here, \tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}, is the symmetrically normalized adjacency matrix.
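The construction of the symmetrically normalized adjacency can be sketched in a few lines of NumPy (the example graph is a made-up placeholder):

```python
import numpy as np

def normalized_adjacency(A):
    """Compute A_hat = D~^{-1/2} (A + I_N) D~^{-1/2} from the GCN propagation rule."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                 # add self-loops: A~ = A + I_N
    d = A_tilde.sum(axis=1)                 # degrees: D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D~^{-1/2}
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = normalized_adjacency(A)

print(np.allclose(A_hat, A_hat.T))  # True: the normalization is symmetric
print(A_hat[0, 0])                  # node 0 has degree 3 in A~, so entry is 1/3
```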

W^{(l)} is the trainable weight matrix of layer l.

\sigma (\cdot ) denotes a nonlinear activation function, such as ReLU.

H^{(l)}\in \mathbb{R}^{N\times D} is the activation (hidden feature) matrix of layer l, with H^{(0)}=X and H^{(L)}=Z.
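Putting the pieces together, a single GCN layer is one matrix product chain followed by the nonlinearity. This is a minimal sketch; the graph, feature values, and weight shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

X = rng.normal(size=(4, 2))   # H^{(0)} = X, shape N x D
W0 = rng.normal(size=(2, 3))  # layer-0 weights: D input features -> 3 hidden units

def gcn_layer(A_hat, H, W):
    """One layer: H^{(l+1)} = sigma(A_hat @ H^{(l)} @ W^{(l)}), with sigma = ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

H1 = gcn_layer(A_hat, X, W0)
print(H1.shape)  # (4, 3): each node now carries a 3-dimensional hidden state
```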

Number of GCN layers:

The number of GCN layers L determines the furthest distance that node features can propagate. For example, in a 1-layer GCN, each node obtains information only from its immediate neighbors. The aggregation step is performed independently for each node, but simultaneously for all nodes.

When a second layer is stacked on the first, the aggregation is repeated, but this time the neighbor nodes already carry information from their own neighbors (gathered in the previous step); in general, L layers let each node reach nodes up to L hops away. However, the GCN authors showed experimentally that the number of layers should not be too large: 2-3 layers are enough, and more layers are counterproductive.
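The growing receptive field can be checked numerically: the nonzero pattern of \tilde{A}^{L} marks node pairs within L hops of each other. A small sketch on the same made-up 4-node graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)  # self-loops keep each node's own information

# Nonzero entries of A_tilde^L mark node pairs reachable within L hops.
reach_1 = (A_tilde > 0)
reach_2 = (np.linalg.matrix_power(A_tilde, 2) > 0)

print(reach_1[0, 3])  # False: node 3 is 2 hops from node 0, invisible to a 1-layer GCN
print(reach_2[0, 3])  # True: a 2-layer GCN lets node 0 see node 3
```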

3. GCN for node classification

A GCN takes a graph as input; after passing through L GCN layers, the features of each node are transformed from X into Z, where Z represents each node's predicted label.

Suppose we construct a two-layer GCN whose activation functions are ReLU and softmax, respectively. The overall forward propagation is:

Z=f(X,A)=softmax(\hat{A}ReLU(\hat{A}XW^{(0)})W^{(1)})

where \hat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}.
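The two-layer forward pass can be written directly from this formula. A minimal NumPy sketch with made-up sizes (4 nodes, 2 input features, 8 hidden units, 3 classes) and random weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))  # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

X = rng.normal(size=(4, 2))   # node features
W0 = rng.normal(size=(2, 8))  # layer-0 weights: input dim -> hidden dim
W1 = rng.normal(size=(8, 3))  # layer-1 weights: hidden dim -> number of classes

# Z = softmax(A_hat ReLU(A_hat X W0) W1)
Z = softmax(A_hat @ np.maximum(A_hat @ X @ W0, 0.0) @ W1)
print(Z.shape)        # (4, 3): one class distribution per node
print(Z.sum(axis=1))  # each row sums to 1
```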

Then, the cross-entropy loss is computed over all labeled nodes:

L=-{\sum}_{l\in \mathcal{Y}_{L}}{\sum}_{f=1}^{F}Y_{lf}\ln Z_{lf}

where \mathcal{Y}_{L} is the set of labeled node indices and F is the number of output classes.

In this way, a GCN model for node classification can be trained even when only a few nodes carry labels. The authors call their method semi-supervised node classification. In addition, by changing the loss function, GCN can also be used for tasks such as link prediction.
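The semi-supervised aspect shows up in the loss: only labeled nodes contribute. A sketch with hypothetical predictions for 4 nodes over 3 classes, where only nodes 0 and 2 are labeled:

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled_idx):
    """Cross-entropy over labeled nodes only: -sum over l in Y_L of sum_f Y_lf ln Z_lf."""
    Zl = np.clip(Z[labeled_idx], 1e-12, 1.0)  # clip to avoid log(0)
    return -(Y[labeled_idx] * np.log(Zl)).sum()

# Hypothetical softmax outputs Z for 4 nodes, 3 classes.
Z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6],
              [0.3, 0.4, 0.3]])
Y = np.zeros((4, 3))
Y[0, 0] = 1.0  # node 0 belongs to class 0
Y[2, 2] = 1.0  # node 2 belongs to class 2

loss = masked_cross_entropy(Z, Y, labeled_idx=[0, 2])
print(round(loss, 4))  # -(ln 0.7 + ln 0.6) ≈ 0.8675; nodes 1 and 3 are ignored
```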

4. Summary

The proposal of GCN is a milestone in the field of graph-based task processing. The original authors' experiments show that even with randomly initialized parameters W, the node features extracted by GCN produce clustering results comparable to the node features learned by the DeepWalk and node2vec algorithms after costly training.

GCN is in fact a special form of Laplacian smoothing. Its main idea is to take a weighted average of the features of each node's neighbors (including the node itself), where low-degree neighbors receive larger weights, and then feed the resulting feature vectors through the neural network. In GCN, node features are updated by repeatedly aggregating neighbor features, which increases the similarity of adjacent nodes and thus greatly enhances classification ability. However, stacking many GCN layers can over-smooth the output features, so that nodes from different clusters become indistinguishable and classification performance drops.

In addition, GCN cannot effectively handle more complex graphs, such as heterogeneous graphs, dynamic graphs, and weighted graphs, and graphs with very many nodes and edges pose computational challenges for it. Since GCN was proposed, much work has sought to remedy these shortcomings, leading to models such as R-GCN and GraphSAGE.

References:

1. Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.

2. "When can I understand your heart" - Graph Convolutional Neural Network (GCN) - Zhihu

3. Detailed introduction to the GCN graph convolutional network


Source: blog.csdn.net/weixin_44458771/article/details/129040246