GNN/GCN
Distill: https://distill.pub/2021/gnn-intro/
Three major task types:
- Graph-level tasks (classify the whole graph)
- Node-level tasks (predict an attribute of each vertex)
- Edge-level tasks (predict an attribute of each edge)
Information storage (efficient, and invariant to vertex ordering):
- Nodes: scalar/vector
- Edges: scalar/vector
- Adjacency list: its length equals the number of edges; the i-th entry records which two vertices the i-th edge connects
- Global: scalar/vector
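This storage layout can be sketched in NumPy (the names and the 8-dimensional feature size are illustrative choices, not fixed by the text):

```python
import numpy as np

# A toy graph stored in the node / edge / adjacency-list / global layout.
num_vertices, num_edges, d = 4, 3, 8
nodes = np.zeros((num_vertices, d))        # one feature vector per vertex
edges = np.zeros((num_edges, d))           # one feature vector per edge
adjacency_list = [(0, 1), (1, 2), (2, 3)]  # i-th entry: endpoints of edge i
global_vec = np.zeros(d)                   # one vector for the whole graph

# The storage is order-independent: renumbering the vertices only relabels
# the entries of the adjacency list; the graph itself is unchanged.
```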
The “message passing neural network” framework is graph-in, graph-out, and does not change the connectivity of the graph.
The simplest GNN
Each layer applies a separate MLP to the node vectors, the edge vectors, and the global vector. No connectivity information is used.
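A minimal sketch of such a layer (the two-layer MLP and the dimension `d` are illustrative assumptions): each of the three tensors passes through its own MLP, and the adjacency list is never consulted.

```python
import numpy as np

def mlp(x, w1, w2):
    """Tiny two-layer MLP with ReLU, applied row-wise."""
    return np.maximum(x @ w1, 0) @ w2

rng = np.random.default_rng(0)
d = 8
# Three independent parameter sets: one MLP each for nodes, edges, global.
weights = {k: (rng.normal(size=(d, d)), rng.normal(size=(d, d)))
           for k in ("node", "edge", "global")}

def simplest_gnn_layer(V, E, U):
    # Connectivity is ignored: each tensor is transformed on its own.
    return (mlp(V, *weights["node"]),
            mlp(E, *weights["edge"]),
            mlp(U[None, :], *weights["global"])[0])

V, E, U = rng.normal(size=(4, d)), rng.normal(size=(3, d)), rng.normal(size=d)
V2, E2, U2 = simplest_gnn_layer(V, E, U)
```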
Pooling operation
How do we get predictions from the last layer's output?
For vertex prediction (binary classification): feed each vertex vector into an MLP with output dimension 2, then apply softmax. Note that there is only one MLP, shared by all vertices.
What if a vertex has no vector of its own?
Sum the vectors of the edges connected to that vertex together with the global vector (assuming the dimensions match), feed the result into the MLP, and read off the output.
What if there are only vertex vectors and no edge vectors?
Aggregate (sum) the vectors of the vertices an edge connects, optionally add the global vector U, feed the result into the edge MLP, and read off the output.
What if there is no global vector, only vertex vectors?
Sum all vertex vectors, feed the result into the MLP for U, and read off the output.
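These cases are the same pooling idea applied in different directions. A sketch of the first one, edge-to-vertex pooling (the numbers are made up for illustration):

```python
import numpy as np

# Edge vectors, adjacency list, and global vector of a 3-vertex toy graph.
edges = np.array([[1., 0.], [0., 2.], [3., 1.]])
adjacency_list = [(0, 1), (1, 2), (0, 2)]
global_vec = np.array([0.5, 0.5])

def pool_edges_to_vertex(v):
    """Stand-in vertex feature: sum the vectors of edges incident to v, then
    add the global vector (assumes all dimensions match, as in the text)."""
    incident = [e for e, (a, b) in zip(edges, adjacency_list) if v in (a, b)]
    return np.sum(incident, axis=0) + global_vec
```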
The structure diagram of the simplest GNN is as follows:
Limitation:
The transformation never uses the structural information of the graph, so connectivity is never folded back into the representations.
Improvement: message passing
The simplest form of message passing: to update a vertex, sum its vector with those of its neighboring vertices, then feed the result into an MLP.
This is similar to a CNN: each unit is connected to its adjacent elements, the convolution-kernel weights are shared, and the MLP plays the role of the channels.
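One such step can be sketched as follows (a single shared weight matrix stands in for the MLP; this is an illustration, not the article's code):

```python
import numpy as np

def message_passing_step(V, adjacency_list, W):
    """Update every vertex by summing it with its neighbors, then applying a
    shared transformation (one ReLU layer here as a stand-in for an MLP)."""
    agg = V.copy()                # each vertex starts with its own vector
    for a, b in adjacency_list:   # undirected: add the message both ways
        agg[a] += V[b]
        agg[b] += V[a]
    return np.maximum(agg @ W, 0)

V = np.array([[1., 0.], [0., 1.], [1., 1.]])
out = message_passing_step(V, [(0, 1), (1, 2)], np.eye(2))
```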
Edge and vertex information can also be aggregated earlier:
- Pass vertex information to the edges and update the edges, then aggregate the updated edge information back to the vertices and update them (the dimensions may differ)
- Doing it in the opposite order gives different results
- The two directions can also be alternated
What about the global information U?
When the graph is large, messages must travel a long way. Add a master node (context vector) connected to all vertices and all edges: that is U.
Since U is connected to everything in E and V, it is included whenever edges or vertices aggregate, and updating U pools in all edges and vertices before entering the MLP.
This is similar to attention: the aggregation retrieves information much like a query q.
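The U update can be sketched like this (mean pooling and a single linear map are illustrative choices, not the article's exact recipe):

```python
import numpy as np

def update_global(V, E, U, W):
    """Update the master node: pool over all vertices and all edges, add the
    current U, then transform. Information crosses the graph in one step."""
    pooled = V.mean(axis=0) + E.mean(axis=0) + U
    return pooled @ W

V = np.array([[1., 1.], [3., 3.]])
E = np.array([[2., 0.]])
U = np.array([0., 1.])
new_U = update_global(V, E, U, np.eye(2))
```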
Aggregation: mean, max, sum
Other kinds of graphs
- Different kinds of edges (directed and undirected)
- Subgraphs
- etc.
Graph sampling and batching
1. Randomly sample vertices, then take their neighbors to build subgraphs and reduce storage.
2. Randomly sample a vertex and do a random walk with a fixed number of steps to obtain the subgraph.
3. Walk a few random steps, then also take the neighbors of the visited vertices.
4. Diffusion sampling: take a vertex and expand outward through its N nearest neighbors for k steps to obtain a subgraph.
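Strategy 2 above can be sketched in plain Python (adjacency given as a dict; the seed and graph are arbitrary illustrations):

```python
import random

def random_walk_subgraph(adj, num_steps, seed=0):
    """Sample a subgraph: pick a random start vertex, take a fixed number of
    random steps, and keep every vertex visited along the way."""
    rng = random.Random(seed)
    v = rng.choice(sorted(adj))
    visited = {v}
    for _ in range(num_steps):
        if not adj[v]:          # dead end: stop early
            break
        v = rng.choice(adj[v])
        visited.add(v)
    return visited

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
sub = random_walk_subgraph(adj, num_steps=3)
```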
Inductive biases
Every machine learning method makes assumptions.
CNN: translation invariance
RNN: temporal continuity
GNN: preserving the symmetry of the graph (however the vertices are permuted, the GNN's output is unchanged)
In the aggregation operation, max, mean, and sum perform about the same.
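This symmetry is easy to check directly: the standard aggregations are permutation-invariant, so reordering the vertices changes nothing.

```python
import numpy as np

V = np.array([[1., 2.], [3., 0.], [0., 5.]])
perm = [2, 0, 1]  # an arbitrary reordering of the three vertices

# sum, mean, and max all ignore vertex order -- the symmetry GNNs preserve.
for agg in (np.sum, np.mean, np.max):
    assert np.allclose(agg(V, axis=0), agg(V[perm], axis=0))
```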
GCN as subgraph function approximators
GCN (the variant with aggregation): with k layers, each looking only at immediate neighbors, every vertex effectively sees the subgraph of vertices up to k steps away.
Vertex and edge duality
Graph attention networks
Convolution weights depend on position, whereas GNN weights must be insensitive to position.
Instead, the weights can depend on the relationship between vertex vectors: take dot products, normalize with softmax, and each vertex obtains weights over its neighbors.
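These dot-product attention weights can be sketched as follows (single head, no learned projections; a simplification of what GAT actually does):

```python
import numpy as np

def attention_aggregate(v, neighbors):
    """Weight each neighbor by softmax(dot product with v), then take the
    weighted sum. The weights depend only on features, not on position."""
    scores = neighbors @ v
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ neighbors, weights

v = np.array([1., 0.])
neighbors = np.array([[1., 0.], [0., 1.]])
out, w = attention_aggregate(v, neighbors)
```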
Interpretability of graphs
generative modelling