[Graph Neural Network] Graph Embedding

There are four main methods for mapping nodes into d-dimensional vectors:

        ① Manual feature engineering: node importance, clustering coefficient, graphlets, etc.

        ② Graph representation learning: self-supervised learning built on random walks, e.g. DeepWalk, Node2Vec.

        ③ Matrix factorization

        ④ Deep learning: graph neural networks

1. Graph embedding

        Traditional graph machine learning relies on manual feature engineering to convert a graph into d-dimensional vectors. Graph representation learning, in contrast, learns the features automatically, mapping each input to a vector without human intervention.

         The learned d-dimensional vectors have the following characteristics:

                Low-dimensional: the dimension of the vector is much smaller than the number of nodes

                Continuous: every element is a real number

                Dense: elements are generally nonzero (unlike one-hot encoding)

        Embedding vectors contain connectivity information of the network and can be used for downstream tasks.

2. Basic framework

        1. Encoder

                Takes a node as input and, after processing, outputs its corresponding d-dimensional vector.

        2. Decoder

                Takes the dot product of the vectors produced by the encoder to obtain the cosine similarity (a scalar that reflects how similar the two nodes are).

        !!! Note that both the encoder and the decoder can be swapped for other structures: the decoder need not be a dot product, and the similarity need not be cosine similarity.

        The framework is optimized iteratively so that the dot product between the vectors of similar nodes in the graph is large, and the dot product between the vectors of dissimilar nodes is small, as sketched below.
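        A minimal sketch of this encoder-decoder setup (plain NumPy; the matrix Z, the function names, and the example sizes are illustrative assumptions, not from the original post):

```python
import numpy as np

num_nodes, d = 5, 8
Z = np.random.randn(d, num_nodes)          # one d-dimensional column per node

def encode(v: int) -> np.ndarray:
    """Encoder: map a node index to its d-dimensional embedding."""
    return Z[:, v]

def decode(z_u: np.ndarray, z_v: np.ndarray) -> float:
    """Decoder: dot product of two embeddings -> scalar similarity."""
    return float(z_u @ z_v)

# Training would adjust Z so that decode(encode(u), encode(v)) is
# large for similar node pairs (u, v) and small for dissimilar ones.
print(decode(encode(0), encode(1)))
```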

3. Common encoders

        1. Shallow Encoder

                In essence, the d-dimensional vectors of all nodes are stored as the columns of a matrix Z, and the embedding of a node is obtained by multiplying this matrix with the node's one-hot vector v, denoted: Enc(v)=z_v=Z \cdot v, where the entries of the matrix Z are learnable parameters.
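        A quick check of this equivalence (NumPy sketch; Z is an illustrative random matrix): multiplying Z by the one-hot vector of node v simply selects the v-th column.

```python
import numpy as np

num_nodes, d = 5, 8
Z = np.random.randn(d, num_nodes)   # learnable in practice; random here for illustration

v = 3
one_hot = np.zeros(num_nodes)
one_hot[v] = 1.0

z_v = Z @ one_hot                   # Enc(v) = Z . v
assert np.allclose(z_v, Z[:, v])    # identical to a plain column lookup
```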

        2. Random walk

                Starting from node u, generate a random walk sequence and estimate the probability of visiting node v, denoted P(v|z_u).

                This probability can be computed with the softmax function: \sigma(z)[i]=\frac{e^{z[i]}}{\sum_j e^{z[j]}}, or with the sigmoid function: S(x)=\frac{1}{1+e^{-x}}.
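        A sketch of generating an unbiased random walk and turning embedding dot products into probabilities with softmax (NumPy; the adjacency list, helper names, and sizes are illustrative assumptions):

```python
import numpy as np

def random_walk(adj: dict, start: int, length: int, rng=np.random) -> list:
    """Unbiased random walk: at each step, move to a uniformly chosen neighbor."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(int(rng.choice(neighbors)))
    return walk

def softmax(z: np.ndarray) -> np.ndarray:
    """sigma(z)[i] = exp(z[i]) / sum_j exp(z[j])"""
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
Z = np.random.randn(4, 8)            # one 8-dim embedding per node (illustrative)
walk = random_walk(adj, start=0, length=5)
p_given_u = softmax(Z @ Z[0])        # P(v | z_u) for every node v, given node u = 0
```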

                ① DeepWalk

                        1. Sample a number of random walk sequences and compute the conditional probabilities.

                        2. Iteratively optimize the d-dimensional vector of each node, so that the dot product between vectors of nodes that co-occur in a sequence is large, and the dot product between vectors of nodes that do not co-occur is small.

        Likelihood objective function: \underset{f}{\max}\sum_{u \in V}\log P(N_R(u)|z_u), where N_R(u) is the set of nodes visited by random walks starting from node u, and R is the walk strategy.

        Loss function: \mathcal{L}=\sum_{u \in V}\sum_{v \in N_R(u)} -\log P(v|z_u), where the probability is computed with softmax: P(v|z_u)=\frac{\exp(z_u^T z_v)}{\sum_{n \in V}\exp(z_u^T z_n)}

         Negative sampling: the term \log P(v|z_u)=\log\frac{\exp(z_u^T z_v)}{\sum_{n \in V}\exp(z_u^T z_n)} can be approximated as \log(\sigma(z_u^T z_v))-\sum_{i=1}^{k}\log(\sigma(z_u^T z_{n_i})), where k is the number of negative samples. k is typically 5-20, and nodes that appear in the same walk sequence should not be sampled as negative samples.
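         A sketch of evaluating this negative-sampling term for one co-occurring pair (u, v) (NumPy; the way negatives are drawn here is simplified and illustrative, not the post's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_score(Z, u, v, negatives):
    """log sigma(z_u^T z_v) - sum_i log sigma(z_u^T z_{n_i}),
    following the approximation above; training pushes this score up."""
    pos = np.log(sigmoid(Z[u] @ Z[v]))
    neg = sum(np.log(sigmoid(Z[u] @ Z[n])) for n in negatives)
    return pos - neg

num_nodes, d, k = 100, 16, 10          # k negatives, typically 5-20
Z = np.random.randn(num_nodes, d)
u, v = 0, 1                            # a pair that co-occurs in some walk
# Simplified: draw negatives uniformly, excluding u and v themselves.
# In practice, nodes from the same walk sequence should also be excluded.
negatives = [n for n in np.random.choice(num_nodes, 3 * k) if n not in (u, v)][:k]
score = neg_sampling_score(Z, u, v, negatives)
```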

                ② Node2Vec

                        The remaining steps are the same as DeepWalk, but the bias of the random walk can be controlled through hyperparameters: BFS (breadth-first) favors local exploration, while DFS (depth-first) favors global exploration (see the sketch below).
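        A sketch of one biased step of such a walk. The return parameter p and the in-out parameter q named here come from the Node2Vec paper, not from the original post; the graph and function are illustrative. Roughly, a small p encourages stepping back (local, BFS-like behaviour) and a small q encourages moving further away (global, DFS-like behaviour).

```python
import numpy as np

def node2vec_step(adj: dict, prev: int, cur: int, p: float, q: float,
                  rng=np.random) -> int:
    """Choose the next node of a biased walk from `cur`, having come from `prev`.
    Unnormalized weights:
      1/p  for returning to prev,
      1    for neighbors of cur that are also neighbors of prev (stay local),
      1/q  for neighbors farther from prev (move outward)."""
    neighbors = adj[cur]
    weights = []
    for nxt in neighbors:
        if nxt == prev:
            weights.append(1.0 / p)
        elif nxt in adj[prev]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    probs = np.array(weights) / sum(weights)
    return int(rng.choice(neighbors, p=probs))

# Illustrative graph; q < 1 biases the walk outward (DFS-like exploration).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
nxt = node2vec_step(adj, prev=0, cur=1, p=1.0, q=0.5)
```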
