There are four main ways to map nodes into d-dimensional vectors:
① Hand-crafted feature engineering: node importance, clustering coefficient, graphlets, etc.
② Graph representation learning: self-supervised learning built on random walks, e.g. DeepWalk and Node2Vec
③ Matrix factorization
④ Deep learning: graph neural networks
1. Graph embedding
Traditional graph machine learning relies on hand-crafted feature engineering to convert a graph into d-dimensional vectors. Graph representation learning, by contrast, learns the features automatically: it maps each input into a vector without human intervention.
A d-dimensional vector has the following characteristics:
Low-dimensional: the dimension of the vector is much smaller than the number of nodes
Continuous: every element is a real number
Dense: elements are generally nonzero (unlike one-hot encoding)
Embedding vectors encode the connectivity of the network and can be used for downstream tasks.
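To make the contrast concrete, here is a minimal sketch (the node index, sizes, and random initialization are illustrative, not from the original) comparing a one-hot vector with a dense low-dimensional embedding:

```python
import numpy as np

num_nodes, d = 1000, 16  # d is much smaller than the number of nodes

# One-hot encoding: dimension equals the number of nodes; all but one entry is 0.
one_hot = np.zeros(num_nodes)
one_hot[42] = 1.0

# Embedding vector: low-dimensional, continuous, dense (real-valued entries).
rng = np.random.default_rng(0)
embedding = rng.normal(size=d)

print(one_hot.shape, embedding.shape)  # (1000,) (16,)
```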
2. Basic framework
1. Encoder
Input: a node; output: its corresponding d-dimensional embedding vector.
2. Decoder
Takes the dot product of two encoder outputs to obtain a scalar similarity score (for unit-length vectors this equals the cosine similarity), which reflects how similar the two nodes are.
Note that the encoder and decoder are replaceable components: the decoder need not be a dot product, and the similarity measure need not be cosine similarity.
The framework is optimized iteratively so that the dot product of the embeddings of similar nodes in the graph is large, and that of dissimilar nodes is small.
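The encoder-decoder pair above can be sketched in a few lines (a toy example with a randomly initialized embedding matrix; the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 3))  # toy embedding table: 5 nodes, d = 3

def encode(u):
    """Encoder: node index -> its d-dimensional embedding vector."""
    return Z[u]

def decode(z_u, z_v):
    """Decoder: dot product of two embeddings -> scalar similarity score."""
    return float(z_u @ z_v)

# Training would adjust Z so that similar nodes get a large dot product
# and dissimilar nodes a small one.
sim = decode(encode(0), encode(1))
```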
3. Common encoders
1. Shallow Encoder
In essence, the d-dimensional vectors of all nodes are stored as the columns of a matrix $Z \in \mathbb{R}^{d \times |V|}$, and a node's embedding is obtained by multiplying this matrix with the node's one-hot indicator vector $v$, denoted: $\mathrm{ENC}(v) = z_v = Z \cdot v$; the entries of the matrix $Z$ are learnable parameters.
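The matrix-times-one-hot product is just a column lookup, which a short sketch makes clear (sizes and the node index are illustrative):

```python
import numpy as np

num_nodes, d = 4, 2
rng = np.random.default_rng(1)
Z = rng.normal(size=(d, num_nodes))  # learnable embedding matrix, one column per node

v = np.zeros(num_nodes)
v[2] = 1.0  # one-hot indicator vector for node 2

# Multiplying Z by the one-hot vector simply selects node 2's column.
z_v = Z @ v
assert np.allclose(z_v, Z[:, 2])
```

In practice frameworks implement this as a direct lookup (an "embedding layer") rather than an actual matrix multiplication.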
2. Random walk
Starting from node $u$, generate random walk sequences and estimate the probability of visiting node $v$, denoted $P(v \mid z_u)$.
This probability can be computed with the softmax function, $\sigma(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$, or the sigmoid function, $S(x) = \frac{1}{1 + e^{-x}}$.
① DeepWalk
1. Sample a number of random walk sequences and compute the conditional probabilities $P(v \mid z_u)$.
2. Iteratively optimize each node's d-dimensional vector so that the dot product of the vectors of nodes that co-occur in a sequence is large, and that of nodes that do not co-occur is small.
Likelihood objective function: $\max_f \sum_{u \in V} \log P\big(N_R(u) \mid z_u\big)$, where $N_R(u)$ is the set of nodes visited on random walks starting from node $u$ under walk strategy $R$.
Loss function: $\mathcal{L} = \sum_{u \in V} \sum_{v \in N_R(u)} -\log P(v \mid z_u)$, where the probability is computed by softmax: $P(v \mid z_u) = \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)}$.
Negative sampling: the softmax term can be approximated as $\log \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)} \approx \log \sigma(z_u^\top z_v) - \sum_{i=1}^{k} \log \sigma(z_u^\top z_{n_i}),\ n_i \sim P_V$, where $k$ is the number of negative samples. $k$ is typically 5-20, and nodes appearing in the same walk sequence should not be sampled as negative samples.
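The walk-sampling step of DeepWalk can be sketched as follows (a minimal version assuming the graph is given as an adjacency-list dict; the function name and toy graph are illustrative):

```python
import random

def random_walk(adj, start, length, seed=None):
    """Sample one unbiased random walk of `length` nodes starting at `start`.

    adj: dict mapping each node to a list of its neighbors.
    """
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph; in DeepWalk many such walks are fed as "sentences" to a
# skip-gram model (e.g. gensim's Word2Vec) to learn the node embeddings.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, length=5, seed=42)
```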
② Node2Vec
The remaining steps are the same as DeepWalk, but the bias of the random walk can be controlled with hyperparameters: a BFS-style bias (breadth-first) yields local exploration, while a DFS-style bias (depth-first) yields global exploration.
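The biased walk can be sketched with the standard return parameter $p$ and in-out parameter $q$ (a minimal second-order walk, ignoring edge weights and the alias-sampling optimization used in the real implementation; the function names and toy graph are illustrative):

```python
import random

def biased_step(adj, prev, curr, p, q, rng):
    """One node2vec step: weight each neighbor of `curr` by 1/p (go back to
    `prev`), 1 (stay at distance 1 from `prev`), or 1/q (move further away)."""
    neighbors = adj[curr]
    weights = []
    for nxt in neighbors:
        if nxt == prev:
            w = 1.0 / p      # return parameter: revisit the previous node
        elif nxt in adj[prev]:
            w = 1.0          # stays at distance 1 from prev (BFS-like)
        else:
            w = 1.0 / q      # in-out parameter: explore outward (DFS-like)
        weights.append(w)
    return rng.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(adj, start, length, p=1.0, q=1.0, seed=None):
    rng = random.Random(seed)
    walk = [start]
    if adj[start]:
        walk.append(rng.choice(adj[start]))  # first step is unbiased
    while len(walk) < length and adj[walk[-1]]:
        walk.append(biased_step(adj, walk[-2], walk[-1], p, q, rng))
    return walk

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = node2vec_walk(adj, start=0, length=6, p=0.5, q=2.0, seed=7)
```

Small $q$ pushes the walk outward (DFS-like, global exploration), while large $q$ keeps it near the starting neighborhood (BFS-like, local exploration).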