Graph Embedding Study Notes

1 What is graph embedding?

Graph embedding maps the nodes or edges of a graph into a low-dimensional vector space: massive, high-dimensional, heterogeneous, complex and dynamic graph data is represented as unified, low-dimensional, dense vectors that preserve the structure and properties of the graph. It supports tasks such as node classification and clustering, link prediction, graph reconstruction and visualization, while keeping computational complexity low.
Depending on what is embedded, methods cover nodes, edges, subgraphs and whole graphs, and they are built on manually constructed features, matrix factorization, random walks, or graph neural networks.

2 Graph embedding method

2.0 Method basis - Word2vec method and Skip-Gram model

The Word2vec method and the Skip-Gram model are the basis of graph embedding methods.

The idea of word2vec can be summed up in one sentence: given massive text sequences, predict the probability that a target word co-occurs with its context words, train a network to maximize that probability, and the resulting parameter matrix gives the word vectors.

Word2vec is an embedding method that converts words into vectors such that similar words get similar embeddings. The Skip-Gram model is a shallow network with one hidden layer that takes the center word as input and predicts the surrounding words (the CBOW model does the opposite, using the surrounding words to predict the center word). Skip-Gram is trained to predict adjacent words in sentences; this task is called a pseudo-task because it is only used during training. The network takes words as input and is optimized so that it predicts a word's neighbors in a sentence with high probability, and the learned hidden-layer weights are the word embeddings.
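For a concrete feel, here is a minimal Skip-Gram sketch using the gensim library (an illustration added to these notes, not code from the original post; the toy corpus and hyperparameters are assumptions):

```python
# Minimal Skip-Gram example with gensim: train word vectors on a toy corpus
# and inspect the learned embeddings. Corpus and hyperparameters are illustrative.
from gensim.models import Word2Vec

sentences = [
    ["graph", "embedding", "maps", "nodes", "to", "vectors"],
    ["similar", "nodes", "should", "have", "similar", "vectors"],
]

model = Word2Vec(
    sentences,
    vector_size=64,   # embedding dimension
    window=2,         # context window size
    sg=1,             # 1 = Skip-Gram, 0 = CBOW
    min_count=1,
    epochs=50,
)

print(model.wv["nodes"][:5])            # first 5 dimensions of one word vector
print(model.wv.most_similar("nodes"))   # nearest words in embedding space
```

The learned `model.wv` matrix is exactly the "parameter matrix" mentioned above: one row per word, used as that word's vector.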

2.1 DeepWalk

Word2vec assumes that adjacent words are similar; a random walk on a graph assumes that adjacent nodes are similar, so the same machinery can be applied.

DeepWalk is a graph embedding method based on Word2vec; it extends the language model, as unsupervised learning, from word sequences to graphs. First, fixed-length random walk sequences are generated by repeatedly stepping to a random neighbor of the current node; then the Skip-Gram model maps the generated node sequences into low-dimensional embedding vectors. The method learns relationship information between node pairs, hierarchical softmax keeps the per-prediction cost at O(log |V|), and it supports incremental learning on dynamic graphs.

Process:
① Input a graph
② Sample random walk sequences
③ Train word2vec on the random walk sequences
④ To handle the large number of output classes, use hierarchical softmax (a Huffman tree over the nodes)
⑤ Obtain the embedding vector of each node
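A minimal DeepWalk-style sketch of this process is shown below (helper code written for these notes, not the reference implementation; it assumes networkx and gensim and uses a toy graph):

```python
# DeepWalk sketch: sample uniform random walks over a graph, then train
# Skip-Gram (with hierarchical softmax) on the walks as if they were sentences.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, walk_length):
    """Return one uniform random walk of `walk_length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]  # gensim expects string tokens

G = nx.karate_club_graph()
walks = [random_walk(G, node, walk_length=20)
         for _ in range(10)          # 10 walks per node
         for node in G.nodes()]

model = Word2Vec(walks, vector_size=64, window=5, sg=1, hs=1, min_count=1, epochs=5)
embeddings = {node: model.wv[str(node)] for node in G.nodes()}  # node -> vector
```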

Note:
DeepWalk compresses and maps each node into a d-dimensional vector; nodes that are close in the original graph remain close after embedding.
The embedding contains node connectivity and community information, so it can be used for downstream tasks such as node classification.
Word2vec assumes that adjacent words are similar; random walk assumes that adjacent nodes are similar, so the same idea carries over.
Adjacent nodes are still similar after encoding.
Training is incremental: when new data is added there is no need to retrain from scratch.

The pseudocode is given as a figure in the original post (not reproduced here).
DeepWalk has two sets of trainable weights:
① the D-dimensional embeddings of the N nodes; ② the N-1 logistic regressions of the hierarchical-softmax tree, each with D weights.
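For intuition about where the N-1 logistic regressions come from: hierarchical softmax replaces the flat softmax over all N nodes by a sequence of binary decisions along a path in a binary tree whose leaves are the nodes. One common way to write it (notation chosen here for illustration, not taken from the original post):

```latex
% Probability of predicting context node u from the current node v, as a product of
% about log2(N) logistic-regression decisions along the tree path from root to u.
P(u \mid \Phi(v)) \;=\; \prod_{\ell=1}^{\lceil \log_2 N \rceil}
    \sigma\!\left( \pm\, \theta_{\ell}^{\top} \Phi(v) \right)
```

Here Φ(v) is the D-dimensional embedding of v, θ_ℓ is the weight vector of the logistic regression at the ℓ-th internal node on the path, and the sign is chosen by whether the path branches left or right at that node.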

Advantages and disadvantages

[Advantages]
① It was the first to apply ideas from deep learning and natural language processing to graph machine learning.
② In sparsely labeled node classification scenarios, the embedding performance is excellent.
③ Sampling is linear and parallelizable, and training can be done online and incrementally.
[Disadvantages]
① The random walk is uniform; there is no way to bias the walk direction (addressed by Node2Vec).
② Training requires a large number of random walk sequences.
③ A random walk only sees a small local window of the graph ("seeing a leopard through a bamboo tube"): two nodes that are far apart cannot influence each other, and the full-graph information is not visible (graph neural networks can see it); in other words, the limited walk length limits the completeness of the context information.
④ It is unsupervised: it only encodes the connectivity of the graph and does not use the attribute features of the nodes.
⑤ Neural networks and deep learning are not really used; the model is shallow.
⑥ It is not suitable for weighted graphs, and it can only preserve the second-order similarity of the graph.
⑦ On large-scale graphs, tuning the hyperparameters is complicated, and the embedding quality no longer improves noticeably once the walk length exceeds 2^5.

Because DeepWalk's walk is purely random, the embedding may not preserve a node's local neighborhood well. The node2vec approach addresses this problem.

2.2 node2vec

Node2vec is an improvement of DeepWalk. By tuning parameters that trade off Breadth-First Search (BFS) and Depth-First Search (DFS) walk strategies, it can capture both the local and the global structure of the graph.
The main steps are as follows:
① Compute transition probabilities that combine BFS and DFS tendencies, and generate biased random walk sequences
② Use the Skip-Gram model to embed the generated walk sequences

Advantages and disadvantages

[Advantages]
Each step can be processed in parallel, and both semantic (homophily) and structural information can be preserved.
[Disadvantages]
The embedding effect of nodes with specific attributes still needs to be improved.

Node2vec vs DeepWalk

Node2vec differs from DeepWalk mainly in the transition probability between nodes. The transition probability involves three nodes, the previous node, the current node and the candidate next node (a second-order random walk), and the walk direction is controlled by the return parameter p and the in-out parameter q (return to the previous node, stay close, or move further away).
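A minimal sketch of the biased walk step is shown below (simplified for these notes: unweighted graph and no alias-table precomputation, unlike the reference implementation):

```python
# node2vec-style biased walk on an unweighted graph. The next node is chosen with
# weight 1/p for returning to the previous node, 1 for nodes adjacent to it, and
# 1/q for nodes further away, which interpolates between BFS- and DFS-like walks.
import random
import networkx as nx

def biased_step(G, prev, cur, p, q):
    neighbors = list(G.neighbors(cur))
    weights = []
    for nxt in neighbors:
        if nxt == prev:                 # go back to the previous node
            weights.append(1.0 / p)
        elif G.has_edge(nxt, prev):     # stay close to the previous node (BFS-like)
            weights.append(1.0)
        else:                           # move further away (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(G, start, walk_length, p, q):
    walk = [start, random.choice(list(G.neighbors(start)))]
    while len(walk) < walk_length:
        walk.append(biased_step(G, walk[-2], walk[-1], p, q))
    return walk

G = nx.karate_club_graph()
print(node2vec_walk(G, start=0, walk_length=10, p=0.25, q=4))
```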
When p and q are both 1, node2vec degenerates into DeepWalk. In other words, DeepWalk is a purely uniform random walk, while node2vec is a more flexible framework built on top of it: by choosing the hyperparameters we can change what the embedding emphasizes.
DeepWalk can therefore only capture co-occurrence between nodes, which may reflect either homophily or structure, whereas node2vec lets the user choose whether to capture more structural equivalence or more homophily. If we need structural similarity across the whole graph, for example embedding nodes that play the same or similar roles in distant local communities, we have to consider other embedding algorithms, such as struc2vec, whose computational complexity is however so high that it is essentially impossible to run on large-scale graphs.

2.3 LINE

LINE can be applied to large-scale graphs and represents the structural information between nodes.

LINE adopts a breadth-first-search strategy to generate context nodes: only nodes that are at most two hops away from a given node are considered its neighbors. In addition, compared with the hierarchical softmax used in DeepWalk, it uses negative sampling to optimize the Skip-Gram-style objective.

Features:
It considers first-order similarity and second-order similarity [first-order: local structural information between nodes connected by an edge; second-order: a node's neighborhood, since nodes that share many neighbors are likely to be similar], and concatenates the first-order and second-order embeddings.
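For intuition, the two proximities are fitted with objectives along the following lines (notation reproduced from memory of the LINE paper, so treat it as a sketch):

```latex
% First-order proximity: joint probability of an edge (v_i, v_j), node vectors u_i.
p_1(v_i, v_j) = \frac{1}{1 + \exp(-\,\vec{u}_i^{\top}\vec{u}_j)}, \qquad
O_1 = -\sum_{(i,j)\in E} w_{ij}\,\log p_1(v_i, v_j)

% Second-order proximity: probability of "context" v_j given v_i, context vectors u'_j.
p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\top}\vec{u}_i)}{\sum_{k=1}^{|V|}\exp(\vec{u}_k'^{\top}\vec{u}_i)}, \qquad
O_2 = -\sum_{(i,j)\in E} w_{ij}\,\log p_2(v_j \mid v_i)
```

The two objectives are trained separately (the denominator of p_2 is approximated with negative sampling), and the resulting vectors are concatenated.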

Advantages and disadvantages

[Advantages]
DeepWalk can only use undirected graphs, while LINE handles both directed and undirected graphs.
[Disadvantages]
The embedding of low-degree nodes (nodes with few neighbors) is not learned well.

2.4 struc2vec

A disadvantage of DeepWalk and node2vec is that, because the sampled walks have limited length, they cannot effectively model structural similarity between nodes that are far apart. The earlier algorithms still perform well because most datasets emphasize homophily, i.e., nodes that are close in the graph are also close in feature space, which is enough to cover most datasets. When constructing its graph, struc2vec needs neither node position nor label information; it relies only on the degrees of the nodes to build a multi-layer graph.
The intuition is: if two nodes have the same degree, they are structurally similar; furthermore, if all the neighbors of the two nodes also have matching degrees, the two nodes should be even more similar structurally. Based on this intuition, struc2vec compares the ordered degree sequences of increasingly large neighborhoods (with dynamic time warping) to build the multi-layer graph and its edges.
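A toy sketch of this degree-based notion of structural similarity is shown below (a simplification written for these notes: the paper compares ordered degree sequences with dynamic time warping, replaced here by padding and an elementwise distance just to show the idea):

```python
# Simplified struc2vec-style structural distance: compare the sorted degree
# sequences of the k-hop "rings" around two nodes, layer by layer.
import networkx as nx

def ring_degree_sequence(G, node, k):
    """Sorted degrees of the nodes exactly k hops away from `node`."""
    lengths = nx.single_source_shortest_path_length(G, node, cutoff=k)
    ring = [n for n, d in lengths.items() if d == k]
    return sorted(G.degree(n) for n in ring)

def structural_distance(G, u, v, max_k=2):
    """Accumulate per-layer differences between the two degree sequences."""
    total = 0.0
    for k in range(max_k + 1):
        su, sv = ring_degree_sequence(G, u, k), ring_degree_sequence(G, v, k)
        size = max(len(su), len(sv), 1)
        su += [0] * (size - len(su))   # pad the shorter sequence
        sv += [0] * (size - len(sv))
        total += sum(abs(a - b) for a, b in zip(su, sv)) / size
    return total

G = nx.karate_club_graph()
print(structural_distance(G, 0, 33))   # two hubs: small structural distance
print(structural_distance(G, 0, 11))   # hub vs. low-degree node: larger distance
```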

2.5 SDNE

DeepWalk, LINE, node2vec and struc2vec above all use shallow models, and shallow models often cannot capture a highly non-linear network structure. SDNE was proposed to address this: it uses multiple non-linear layers to capture node embeddings, and it does not perform random walks. SDNE's performance is very stable across different tasks.

Its structure is similar to an autoencoder:
self -> vector -> neighbor

Its design keeps the embeddings close in both first-order and second-order proximity.
First-order proximity is the local pairwise similarity between nodes connected by an edge; it describes the local network structure. Two nodes in a network are similar if they are connected by an edge; for example, when a paper cites another paper, they deal with similar topics.
Second-order proximity indicates the similarity of the nodes' neighborhood structures; it captures the global network structure. Two nodes tend to be similar if they share many neighbors.

The autoencoder network proposed by SDNE has two parts, shown in the figure below. The two autoencoders (the left and right networks) receive node adjacency vectors and are trained to reconstruct them; this reconstruction learns the second-order proximity. The adjacency vector (a row of the adjacency matrix) has positive values at the positions of the nodes connected to the selected node.
There is also a supervised part of the network, the link between the left and the right halves: it computes the distance between the left and right embeddings and includes it in the network's joint loss. The network is trained so that the left and right autoencoders receive the two endpoints of each edge as input; this distance term in the loss helps preserve the first-order proximity.
[Figure: SDNE architecture, two autoencoders linked by an embedding-distance term]
The total loss of the network is calculated as the sum of the loss of the left and right autoencoders and the loss of the middle part.
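In notation close to the SDNE paper (reproduced from memory, so treat it as a sketch), the three terms are:

```latex
% Second-order term: reconstruct the adjacency row x_i; the weight vector b_i
% penalises errors on non-zero entries more heavily (b_{ij} = 1 if s_{ij} = 0, else beta > 1).
L_{2nd} = \sum_{i=1}^{n} \left\| (\hat{x}_i - x_i) \odot b_i \right\|_2^2

% First-order term: embeddings y_i, y_j of nodes connected by an edge should be close.
L_{1st} = \sum_{i,j=1}^{n} s_{ij} \left\| y_i - y_j \right\|_2^2

% Joint objective, with an L2 regulariser on the network weights.
L = L_{2nd} + \alpha\, L_{1st} + \nu\, L_{reg}
```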

Summary and code implementation of the above five methods

Summary

1. DeepWalk
uses random walks to form sequences, and uses the skip-gram method to generate node embeddings.
2. node2vec
uses biased random walk strategies (controlled by p and q) to form sequences, then skip-gram to generate node embeddings.
3. LINE
captures the first-order and second-order similarities of nodes, solves them separately, and then stitches the first-order and second-order together as the embedding of the node.
4. struc2vec
captures the structural information of the graph; it works better when a node's structural role matters more than its neighborhood.
5. SDNE
uses multiple nonlinear layers to capture the first-order and second-order similarities.

Code

https://github.com/shenweichen/GraphEmbedding
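The repository provides ready-made implementations of the five methods above. Usage is roughly as follows (quoted from memory of the repo's README, so the `ge` package API and the example file path should be verified against the repository itself):

```python
# Rough usage of shenweichen/GraphEmbedding (API and dataset path are from memory
# of the README and may differ; verify against the repository).
import networkx as nx
from ge import DeepWalk   # the repo's own package, not a standard library

# Wiki_edgelist.txt is an example dataset shipped with the repo (path illustrative).
G = nx.read_edgelist('data/wiki/Wiki_edgelist.txt',
                     create_using=nx.DiGraph(), nodetype=None, data=[('weight', int)])

model = DeepWalk(G, walk_length=10, num_walks=80, workers=1)
model.train(window_size=5, iter=3)
embeddings = model.get_embeddings()   # dict: node id -> embedding vector
```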

2.6 graph2vec

This approach embeds the entire graph: it computes one vector that describes the whole graph. graph2vec is chosen as the typical example here because it is a strong method for whole-graph embedding.
Graph2vec is based on the idea of the doc2vec method, which uses the Skip-Gram network: doc2vec takes the ID of an input document and is trained to maximize the probability of predicting random words from that document.
The Graph2vec method consists of three steps:
① Sample and relabel all rooted subgraphs in the graph. A rooted subgraph is the set of nodes that appear around a selected node, within a chosen number of edges (hops) from it.
② Train the skip-gram-style model. A graph is treated like a document: just as a document is a set of words, a graph is a set of subgraphs. The model is trained to maximize the probability of predicting the subgraphs that are present in the input graph, which is provided as a one-hot vector.
③ Compute the embedding by providing the graph ID as a one-hot vector at the input. The embedding is the output of the hidden layer. Since the task is to predict subgraphs, graphs with similar subgraphs and similar structures obtain similar embeddings.
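A rough graph2vec-style sketch is shown below (a simplification written for these notes: each graph is treated as a "document" whose "words" are Weisfeiler-Lehman subtree labels of its nodes, and gensim's Doc2Vec in PV-DBOW mode plays the role of the skip-gram model):

```python
# graph2vec-style sketch: WL subtree labels as tokens per graph, Doc2Vec (PV-DBOW)
# to learn one embedding vector per whole graph.
import networkx as nx
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def wl_tokens(G, iterations=2):
    """WL subtree labels of all nodes, collected over a few relabeling rounds."""
    labels = {n: str(G.degree(n)) for n in G.nodes()}   # initial label = degree
    tokens = list(labels.values())
    for _ in range(iterations):
        labels = {n: labels[n] + "|" + ".".join(sorted(labels[m] for m in G.neighbors(n)))
                  for n in G.nodes()}                    # relabel from neighbor labels
        tokens.extend(labels.values())
    return tokens

graphs = [nx.cycle_graph(6), nx.path_graph(6), nx.star_graph(5), nx.cycle_graph(7)]
corpus = [TaggedDocument(words=wl_tokens(g), tags=[i]) for i, g in enumerate(graphs)]

model = Doc2Vec(corpus, vector_size=32, dm=0, min_count=1, epochs=100)  # dm=0 -> PV-DBOW
print(model.dv[0])                 # embedding of the first graph
print(model.dv.most_similar(0))    # graphs ranked by embedding similarity
```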

2.7 Other embedding methods

The methods analyzed above are the commonly used ones. Since this topic is very popular at the moment, several other methods are also available:
• Vertex embedding methods: LLE, Laplacian Eigenmaps, Graph Factorization, GraRep, HOPE, DNGR, GCN, LINE
• Graph embedding methods: Patchy-san , sub2vec (embed subgraphs), WL kernel and deep WL kernels

3 Challenges of Graph Embedding

As mentioned earlier, the goal of graph embedding is to find low-dimensional vector representations of high-dimensional graphs. Obtaining a vector representation for every node in a graph is difficult, and several challenges keep driving research in this field:
① Attribute selection: A "good" vector representation of nodes should preserve the structure of the graph and the connections between individual nodes. The first challenge is to choose which graph properties the embedding should preserve. Given the plethora of distance metrics and attributes defined in the graph, this choice can be difficult, and performance may depend on the actual application scenario. They need to represent graph topology, node connections, and node neighbors. The performance of predictions or visualizations depends on the quality of the embeddings.
② Scalability: Most real networks are large, containing a large number of nodes and edges. Embedding methods should be scalable and able to handle large graphs. Defining a scalable model is challenging, especially when the model aims to preserve the global properties of the network. Good embedding methods need to be efficient on large graphs.
③ Dimensionality of embedding: It is difficult to find the best dimensionality of representation during actual embedding. For example, higher dimensionality may improve reconstruction accuracy, but with higher time and space complexity. Although the lower dimension has low time and space complexity, it will undoubtedly lose a lot of original information in the graph. Users need to make trade-offs based on their needs. In some articles, they generally report that an embedding size between 128 and 256 is sufficient for most tasks. In the Word2vec method, they chose an embedding length of 300.

4 Graph embedding interview questions

The relationship between graph embedding and graph network
Reference: word2vec and fully connected neural network

What information of the graph is used and what information is discarded by graph embedding
Reference: Topological information is used, internal attributes of nodes are discarded

How do graph embedding algorithms deal with data sparsity?
Reference: Use algorithms such as LINE and SDNE that can exploit both first-order and second-order neighbors

SDNE model principle, loss function and optimization method
Reference: A deep (autoencoder-based) model; the loss combines adjacency reconstruction (second order) and neighbor-embedding similarity (first order), capturing global and local information; optimized with SGD.

What are the methods of graph embedding
Reference: factorization, random walk, deep learning

How to compute the embedding of a newly added node
Reference: For a transductive model, the new data has to be sampled to fine-tune the trained model, or the model has to be retrained. An inductive model can handle dynamic graphs directly, e.g., GraphSAGE.

SDNE learning paradigm
Reference: The paper treats the first-order similarity term as supervised and the second-order term as unsupervised, so the model as a whole is semi-supervised. In fact, the first-order term only uses the topology of the graph and no node labels, so it can also be regarded as self-supervised.

Source: blog.csdn.net/weixin_45928096/article/details/125600107