Embedding Entire Graphs
1. Basic concepts of graph embedding vectors
Unlike node embeddings, which represent individual nodes, graph embeddings encode an entire graph or subgraph as a single vector. Application scenarios include anomaly detection and molecular toxicity prediction.
2. Sum or average the node embeddings
As shown in the figure, first compute a node embedding vector $\mathbb{z}_u$ for each node in the graph/subgraph, then sum these vectors (or average the sum) to obtain the embedding vector of the graph/subgraph.
NOTE: This method is simple but works very well.
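The summation/averaging scheme above can be sketched as follows (the node embedding values are made-up placeholders for illustration):

```python
import numpy as np

# Hypothetical node embeddings for a 4-node graph, dimension 3.
node_embeddings = np.array([
    [0.2, 0.5, 0.1],
    [0.4, 0.1, 0.3],
    [0.6, 0.2, 0.2],
    [0.0, 0.8, 0.4],
])

# Graph embedding via summation of node embeddings.
z_graph_sum = node_embeddings.sum(axis=0)

# Graph embedding via averaging (the sum divided by the node count).
z_graph_mean = node_embeddings.mean(axis=0)
```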
3. Create a super node
As shown in the figure, a super node is added to the original graph and connected to every node of the graph/subgraph. A node-embedding method is then applied, and the resulting embedding of the super node serves as the embedding vector of the graph/subgraph.
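A minimal sketch of the super-node construction, using a plain adjacency-dict graph (the node names and the `add_super_node` helper are illustrative assumptions):

```python
# Adjacency-list representation of a small subgraph (illustrative).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def add_super_node(graph, super_id="super"):
    """Return a copy of the graph with a super node linked to every node."""
    g = {u: set(nbrs) for u, nbrs in graph.items()}
    g[super_id] = set(graph)   # super node connects to all original nodes
    for u in graph:
        g[u].add(super_id)     # and every node connects back to it
    return g

g = add_super_node(graph)
# A node-embedding method (DeepWalk, node2vec, ...) run on g would then
# produce an embedding for "super", used as the graph/subgraph embedding.
```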
4. Anonymous random walk
1. Anonymous random walk method:
An anonymous random walk records only the position at which each node first appears in the walk, not the identity of the node itself.
As shown in the figure, in the subgraph composed of nodes A, B, C, D, E and F:
The first walk visits the nodes A, B, C, B, C. A is the first node visited and is labeled 1, B is the second and is labeled 2, C is the third and is labeled 3; no other nodes appear. The anonymous walk is therefore: 1, 2, 3, 2, 3.
The second walk visits the nodes C, D, B, D, B. C is the first node visited and is labeled 1, D is the second and is labeled 2, B is the third and is labeled 3; no other nodes appear. The anonymous walk is therefore also: 1, 2, 3, 2, 3.
Although the two walks traverse different nodes, their anonymous walk sequences are identical: only the order of first appearance matters, not the specific nodes.
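The anonymization step described above can be written as a small helper (the function name is an assumption for illustration):

```python
def anonymize(walk):
    """Map each node in a walk to the index of its first appearance."""
    first_seen = {}
    anon = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        anon.append(first_seen[node])
    return anon

# The two walks from the example map to the same anonymous sequence.
print(anonymize(["A", "B", "C", "B", "C"]))  # [1, 2, 3, 2, 3]
print(anonymize(["C", "D", "B", "D", "B"]))  # [1, 2, 3, 2, 3]
```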
It can be seen from the figure that the number of distinct anonymous walk sequences grows exponentially with the walk length.
For example, as the figure shows, there are 5 distinct anonymous walks of length 3: 111, 112, 121, 122, and 123.
2. Graph embedding via anonymous random walks
For a graph/subgraph, perform many anonymous random walks of a fixed length, and record the number of occurrences (or the probability) of each distinct walk sequence as a vector; this vector is the embedding of the graph. As shown in the figure:
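A minimal sketch of this counting scheme, assuming an adjacency-dict graph and uniform random walks (the helper names are illustrative):

```python
import random
from collections import Counter

def random_walk(adj, start, length):
    """An unbiased random walk of `length` nodes, starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(sorted(adj[walk[-1]])))
    return walk

def anonymize(walk):
    """Replace each node by the index of its first appearance."""
    first_seen = {}
    return [first_seen.setdefault(node, len(first_seen) + 1) for node in walk]

def graph_embedding(adj, walk_length, num_samples):
    """Empirical distribution over anonymous walk sequences."""
    counts = Counter()
    nodes = sorted(adj)
    for _ in range(num_samples):
        walk = random_walk(adj, random.choice(nodes), walk_length)
        counts[tuple(anonymize(walk))] += 1
    return {seq: c / num_samples for seq, c in counts.items()}

# Toy triangle graph; the embedding is a dict {anonymous walk: probability}.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
embedding = graph_embedding(adj, walk_length=4, num_samples=1000)
```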
The walk length $l$ is a hyperparameter. For a chosen $l$, how many walks should be sampled from the graph/subgraph is a question worth considering, as shown in the figure:
To keep the sampling error of the walk distribution within $\varepsilon$ with probability at least $1-\delta$, let $\eta$ be the number of distinct anonymous walk sequences of length $l$ (looked up from a table or computed directly). The required number of samples $m$ is:
$$m=\left\lceil \frac{2}{\varepsilon^2}\left(\log\left(2^{\eta}-2\right)-\log(\delta)\right)\right\rceil$$
For example: for a walk length $l$ with $\eta=877$ distinct anonymous walk sequences, and error bounds $[\varepsilon=0.1,\ \delta=0.01]$, the computed $m$ is 122,500.
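As a check on this formula, a small helper can evaluate $m$ directly (Python's arbitrary-precision integers handle $2^{877}$ exactly; the function name is an assumption):

```python
import math

def num_samples(eta, eps, delta):
    """m = ceil((2 / eps**2) * (log(2**eta - 2) - log(delta)))."""
    return math.ceil((2 / eps ** 2) * (math.log(2 ** eta - 2) - math.log(delta)))

# eta = 877 walk sequences, eps = 0.1, delta = 0.01, as in the example above.
print(num_samples(877, 0.1, 0.01))  # 122500
```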
5. Learn Walk Embeddings
The difference from the counting approach above is that the counting approach produces only a single vector of walk-sequence counts (or probabilities), whereas this method learns a separate embedding for each anonymous walk sequence, together with an embedding vector for the graph/subgraph.
Here the embedding vector of the graph/subgraph is $\mathbb{z}_G$, and the anonymous random walk sequence embeddings are $Z=\{z_i : i=1\dots\eta\}$, where $\eta$ is the total number of distinct walk sequences for the fixed walk length.
The specific steps are:
- Sample random walks: $N_R(u)$ records the walks starting from node $u$, of which there are $T$ in total.
- Contextual self-supervision: given the previous $\Delta$ walk sequences, predict the $(\Delta+1)$-th walk sequence (similar to a decoder in NLP), maximizing $\frac{1}{T}\sum_{t=\Delta+1}^{T}\log P(w_t \mid w_{t-\Delta},\dots,w_{t-1})$. The optimization objective is shown in the figure above.
The specific steps of contextual self-supervision are:
First, sum and average the embedding vectors of the previous $\Delta$ walk sequences $\{\mathbb{z}_{t-\Delta},\dots,\mathbb{z}_{t-1}\}$, and concatenate the result with the whole-graph embedding vector $\mathbb{z}_G$. Then predict the $(\Delta+1)$-th walk sequence through a linear transformation and a softmax. Repeat this process to iteratively update all the $\mathbb{z}$ vectors. Note that negative sampling is used to keep the computation tractable.
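The prediction step can be sketched in NumPy. The dimensions, context size Δ, and randomly initialized parameters below are illustrative assumptions; a real implementation would also train these parameters with negative sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

eta, dim, delta_ctx = 5, 8, 3        # walk vocab size, embed dim, context Δ
Z = rng.normal(size=(eta, dim))      # one embedding per anonymous walk type
z_G = rng.normal(size=dim)           # the learnable graph embedding
W = rng.normal(size=(eta, 2 * dim))  # linear layer: concat -> logits
b = np.zeros(eta)

def predict_next(context_ids):
    """P(next walk | previous Δ walks): average, concat with z_G, softmax."""
    ctx = Z[context_ids].mean(axis=0)     # average the Δ context embeddings
    h = np.concatenate([ctx, z_G])        # stack with the graph vector
    logits = W @ h + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

probs = predict_next([0, 2, 4])  # IDs of the previous Δ = 3 walk sequences
```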
6. Summary
Figures excerpted from Stanford CS224W: Machine Learning with Graphs.