Graph Neural Network (GNN) Study Notes: Graph Embedding


Graphs, i.e. collections of nodes and edges, appear widely in real-world scenarios: connections between people in social networks, protein interactions in biology, communication between IP addresses in networks, and so on. Even everyday images and sentences can be abstractly regarded as graph structures; graphs are effectively ubiquitous.

Research on graphs can solve problems such as: predicting new relationships in social networks (for example, the "people you may know" recommendations on QQ); predicting protein functions and interactions in biomolecules; and, in communication networks, predicting and monitoring abnormal events and forecasting network traffic. To solve these problems, the first thing we need is a way to represent the graph, and graph embedding is a very effective technique for this.

Why do we need graph embedding?

Graph Embedding plays an important role in graph data analysis and machine learning. Here are some of the main reasons for graph embedding:

  1. Dimensionality reduction and visualization: Graph data is usually high-dimensional, containing a large number of nodes and edges. By embedding graphs into low-dimensional spaces, complex graph structures can be transformed into easy-to-understand and visualized forms, helping us to intuitively observe and analyze the features and patterns of graphs.

  2. Feature Representation: Graph embedding can transform nodes and edges into continuous vector representations, thus transforming discrete information in graphs into computable and learnable feature representations. This allows us to apply traditional machine learning and deep learning methods to graph data and utilize these representations for tasks such as node classification, link prediction, graph clustering, etc.

  3. Similarity and Relationship Modeling: Graph embeddings can capture the similarity and relationship between nodes and edges in the embedding space. Similar nodes will be close to each other in the embedding space, and edges with similar relations will also exhibit similar patterns in the embedding space. This provides a powerful tool for similarity-based recommender systems, social network analysis, link prediction, and other tasks.

  4. Information Transfer and Representation Learning: Graph embedding can learn richer node representations by considering the proximity relations between nodes and edges and by transferring information. By taking the topology of the graph into account during the embedding process, the embedding model can capture a node's position in the graph and its surrounding context, thus providing richer and more accurate node representations.

All in all, graph embedding provides a way to transform graph data into computable and learnable representations, enabling us to analyze, mine and predict graphs using machine learning and deep learning techniques. It has wide applications in many fields, such as social network analysis, bioinformatics, recommender systems, network security, etc.

Graph Embedding

Graph embedding is the process of mapping graph data (usually a high-dimensional, dense matrix) into low-dimensional dense vectors, which solves the problem that graph data is difficult to feed efficiently into machine learning algorithms. The embedding needs to capture the topology of the graph, the relationships between vertices, and other information such as subgraphs and edges; the more information is represented, the better downstream tasks will perform. There is one consensus in the embedding process: nodes that are connected in the graph should stay close to each other in the vector space. Based on this, researchers proposed Laplacian Eigenmaps and Locally Linear Embedding (LLE).

[Figure: graph embedding]
In general, the embedding on the graph can be roughly divided into two types: node embedding and graph embedding . When it is necessary to classify nodes, predict node similarity, and visualize node distribution, node embedding is generally used; when it is necessary to predict at the graph level or predict the entire graph structure, we need to represent the entire graph as a vector.

Classic methods such as DeepWalk, node2vec, SDNE and graph2vec will be introduced later.

What are the advantages of using graph embeddings?

Graphs are meaningful and understandable representations of data, but embedded representations of graphs are needed for the following reasons.

  1. Machine learning directly on graphs has certain limitations. A graph consists of nodes and edges, and these relationships can only be expressed through mathematics, statistics, or specific subsets, whereas the vector space obtained after embedding supports more flexible and richer computation;
  2. Graph embedding can compress data. We generally use an adjacency matrix to describe the connections between nodes in a graph. The adjacency matrix has dimensions |V| x |V|, where |V| is the number of nodes in the graph; each row and column represents a node, and a non-zero entry indicates that two nodes are connected. Using the adjacency matrix as the feature space of a large graph is almost impossible: how would we compute with and store the 1M x 1M adjacency matrix of a graph with 1M nodes? Embedding can be regarded as a compression technique that performs dimensionality reduction (see the worked example after this list);
  3. Vector calculation is simpler and faster than directly operating on the graph.
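As a rough worked example of the compression argument in point 2 (the 1M-node figure comes from the text above; the 128-dimensional embedding size and float32 storage are illustrative assumptions, not values from the article):

# Back-of-the-envelope storage comparison (illustrative numbers only).
num_nodes = 1_000_000
embedding_dim = 128        # assumed embedding size
bytes_per_value = 4        # float32

adjacency_bytes = num_nodes * num_nodes * bytes_per_value      # dense |V| x |V| matrix
embedding_bytes = num_nodes * embedding_dim * bytes_per_value  # |V| x d embedding table

print(f"dense adjacency matrix: {adjacency_bytes / 1e12:.1f} TB")  # ~4.0 TB
print(f"node embeddings:        {embedding_bytes / 1e9:.2f} GB")   # ~0.51 GB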

However, graph embedding also needs to meet certain requirements.

  • Attribute selection : Make sure that the embeddings describe the attributes of the graph well. They need to represent graph topology, node connections and node neighborhoods. The performance of predictions or visualizations depends on the quality of the embeddings;
  • Scalability: Most real networks are large, containing huge numbers of nodes and edges, so embedding methods should be scalable and able to handle large graphs. Defining a scalable model is challenging, especially when the model aims to preserve global properties of the network. The size of the network should not slow down the embedding process: a good embedding method must embed not only small graphs but also large graphs efficiently;
  • Dimensions of embedding : It is difficult to find the optimal dimension of representation during actual embedding. The larger the dimension, the more information can be retained, but it usually has higher time and space complexity. Although the lower dimension has low time and space complexity, it will undoubtedly lose a lot of original information in the graph.

What are the methods of graph embedding?

There are many methods of Graph Embedding. The following are some common graph embedding methods:

  1. Node Embedding: node embedding borrows from the word2vec approach. This works because the distribution of nodes in a graph and the distribution of words in a corpus both follow a power law.
    [Figure 2]

    • DeepWalk: Use a random walk to sample the node sequence in the graph, and then use the Word2Vec model to embed the node sequence into a low-dimensional vector space.
    • node2vec: A node sampling strategy based on random walks, combining depth-first search and breadth-first search to generate node sequences, and then using Word2Vec for embedding learning.
    • LINE: Learn node embeddings by maximizing the similarity of neighbor nodes and minimizing the similarity of non-neighbor nodes.
  2. Graph Embedding :

    • GraphSAGE: Use the sampled neighbor node features to learn the embedding representation of the node, and perform graph embedding learning by aggregating the features of the neighbor nodes.
    • Graph Convolutional Network(GCN): Feature propagation is performed on the graph through convolution operations, and the node's embedding representation is learned by using the features of the node and its neighbor nodes.
    • Graph Attention Network(GAT): Use the self-attention mechanism to learn the relationship between nodes, and perform embedding learning according to the node's neighbor node features.
  3. Subgraph Embedding :

    • GraphSAGE: Learning embedding representations of subgraphs by sampling and aggregating their node and edge features.
    • Graph Isomorphism Networks(GIN): Learning subgraph embeddings via a multi-layer perceptron using local structural information of subgraphs.
  4. Graph Attention Network:

    • Graph Attention Network(GAT): By using the self-attention mechanism to learn the relationship between nodes, and then learn the embedding representation of nodes.
    • Graph Attention Graph Neural Network(GAGNN): Extends GAT to learn embeddings by learning attention weights for nodes and subgraphs.
  5. Random Walk Embedding:

    • DeepWalk: The node sequence in the graph is sampled by a random walk, and the node sequence is embedded into a low-dimensional vector space using the Word2Vec model.
    • Node2Vec: Node sampling strategy based on random walk, combined with depth-first search and breadth-first search to generate node sequences, and then use Word2Vec for embedding learning.

This is only a small sample of graph embedding methods; there are many others, such as Graph Autoencoders, GraphGAN, GraphSNE, etc. Each method has its own characteristics and applicable scenarios, and choosing a graph embedding method suitable for a specific task and dataset requires evaluation on a case-by-case basis.


A note on the Word2Vec and skip-gram models from the NLP field:

  • Word2vec is an embedding method that converts words into embedding vectors, such that similar words have similar embeddings. Word2vec uses the skip-gram network, a neural network with a single hidden layer (three layers in total). Given a word, the skip-gram model predicts the words adjacent to it (its context). The figure below shows an example of an input word (marked in green) and the predicted context words. Through this task, two words with similar meanings end up with similar embeddings, since they tend to have similar neighboring words. A minimal runnable example follows the figure.
    [Figure 3: the skip-gram model. The input of the network is a one-hot vector whose length equals the size of the word dictionary, with a single position set to 1; the output is the embedded representation of the word.]
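To make the skip-gram idea concrete, here is a minimal runnable sketch using gensim's Word2Vec (the toy corpus and hyperparameters are arbitrary illustrations, not part of the original article):

from gensim.models import Word2Vec

# Toy corpus: each inner list is one tokenized "sentence".
sentences = [
    ["graph", "embedding", "maps", "nodes", "to", "vectors"],
    ["similar", "nodes", "get", "similar", "vectors"],
    ["skip", "gram", "predicts", "context", "words"],
]

# sg=1 selects the skip-gram architecture; window is the context size.
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["nodes"])                       # embedding vector of the word "nodes"
print(model.wv.most_similar("nodes", topn=3))  # nearest words in the embedding space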

The following introduces several node embedding methods and one whole-graph embedding method; their embedding principles are all similar to that of Word2vec.

Node Embedding Method (Node Embeddings)

Several classic node embedding methods are introduced below: DeepWalk, LINE, SDNE, Node2vec, and Struc2vec.

1. DeepWalk

[Figure: DeepWalk]
DeepWalk learns network representations through truncated random walks and can achieve good results even when few vertices in the network are labeled. A random walk starts from a selected node and repeatedly moves from the current node to a random neighbor for a certain number of steps. The method can be roughly divided into three steps:

  • Sampling: sample the nodes of the graph via random walks, obtaining node sequences of a given length. The paper shows that performing 32 to 64 walks from each node is enough to capture the structural relationships of the nodes;
  • Training skip-gram: the random walks play the role of sentences in the word2vec approach. The skip-gram input is a sequence obtained from random-walk sampling, and training maximizes the probability of predicting the adjacent nodes. Usually about 20 neighbor nodes are predicted: 10 on the left and 10 on the right;
  • Computing embeddings:
    [Figure: computing embeddings]

Through random walks, DeepWalk obtains local context information about each vertex, so the learned representation vector reflects the vertex's local structure in the graph: the more first-order (or higher-order) neighbors two vertices share, the shorter the distance between their corresponding vectors. However, DeepWalk performs its walks completely at random, which means the embedding cannot preserve the local relationships of nodes well; the Node2vec method addresses this problem.

The DeepWalk core code
The DeepWalk algorithm consists of two main steps: first, sample node sequences with random walks; second, learn the representation vectors with the skip-gram model (word2vec).
[Figure: DeepWalk algorithm]
① Construct a homogeneous network and run Random Walk sampling from every node in the network to obtain locally correlated training data; ② run SkipGram training on the sampled data, representing the discrete network nodes as vectors so as to maximize the co-occurrence probability of nodes, and use Hierarchical Softmax as the classifier for this very large-scale classification problem.
Random Walk
The core code of the random walk is as follows:

import random

def randomwalk(G, gamma, t):

    '''

    :param G: graph structure created with networkx
    :param gamma: number of random-walk rounds (each round starts one walk from every node)
    :param t: walk length (number of steps)
    :return: the list of walk sequences
    '''
    W = []  # collects the random-walk sequences
    for i in range(gamma):
        node_list = list(G.nodes())
        random.shuffle(node_list)  # shuffle the order of the node set
        for j in node_list:
            w = [j]
            v = j
            while len(w) < t:
                v_neig = list(G.neighbors(v))  # all neighbor nodes of v
                if not v_neig:                 # stop early if v has no neighbors
                    break
                w.append(random.choice(v_neig))
                v = w[-1]
            W.append(w)
    return W

RandomWalk can be viewed as a depth-first traversal that is allowed to revisit already-visited nodes: starting from the current node, a neighbor is sampled uniformly at random as the next node to visit, and this process repeats until the visit sequence reaches the preset length.
[Figure: DeepWalk algorithm pseudocode]
The Skip-Gram procedure invoked in step 7 of the algorithm is:
[Figure: Skip-Gram pseudocode]
From the algorithm flow, already-visited nodes can be visited again, so this is a depth-first traversal that allows repeated visits. The HS strategy refers to Hierarchical Softmax.

Sample code:

import random
from gensim.models import Word2Vec
import networkx as nx

# Build the graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (4, 9)])

# Generate random-walk sequences
def _simulate_walks(nodes, num_walks, walk_length):
  walks = []
  for _ in range(num_walks):
    random.shuffle(nodes)
    for v in nodes:
      walks.append(deepwalk_walk(walk_length=walk_length, start_node=v))
  return walks

# Perform one random walk starting from start_node
def deepwalk_walk(walk_length, start_node):
  walk = [start_node]
  while len(walk) < walk_length:
    cur = walk[-1]
    cur_nbrs = list(G.neighbors(cur))
    if len(cur_nbrs) > 0:
      walk.append(random.choice(cur_nbrs))
    else:
      break
  return walk

# Number of random walks started from each node
num_walks = 10

# Length of each random walk
walk_length = 5

# Get all nodes
nodes = list(G.nodes())

# Generate node sequences via random walks
walks = _simulate_walks(nodes, num_walks=num_walks, walk_length=walk_length)

# Word2Vec expects string tokens, so convert node ids to strings
walks = [[str(n) for n in walk] for walk in walks]

# Learn node embeddings with the Word2Vec skip-gram model (sg=1)
model = Word2Vec(walks, vector_size=32, window=5, min_count=0, sg=1, workers=4, epochs=5)

# Get the learned node embeddings
node_embeddings = model.wv

# Print each node's embedding vector
for node in G.nodes():
  print(f"Node {node}: {node_embeddings[str(node)]}")

[Figure: output of the DeepWalk example]

Using random walks has two advantages :

  1. Parallelism: random walks are local, so for a large network, walks of a given length can be started from different vertices at the same time; running multiple random walks in parallel reduces sampling time.
  2. Adaptability: random walks adapt to local changes in the network. Network evolution usually changes only local vertices and edges, and such changes affect only part of the random-walk paths, so the walks for the entire network do not need to be recomputed every time the network evolves.

2. LINE

DeepWalk uses DFS-style random walks to sample nodes in the graph and uses word2vec to learn the vector representations of the nodes from the sampled sequences. LINE is also based on the neighborhood-similarity assumption, but unlike DeepWalk, which uses DFS to construct neighborhoods, LINE can be regarded as an algorithm that uses BFS to construct neighborhoods. In addition, LINE can be applied to weighted graphs, whereas DeepWalk can only be used on unweighted graphs.
[Figure: LINE]
Paper address: https://arxiv.org/abs/1503.03578

LINE (Large-scale Information Network Embedding) is an algorithm for generating graph embeddings, which aims to map nodes in a graph to a low-dimensional vector space. It learns node embeddings by maximizing the probability of neighbor relations between nodes to preserve the similarity of nodes in a low-dimensional space.

The following figure shows a simple graph. The edges in the graph can be directed or undirected, and the thickness of the edges also represents the size of the weight:
[Figure: example graph]

First-order similarity

First-order similarity describes the local similarity between pairs of vertices in the graph. Formally, if there is a direct edge between $u$ and $v$, then the edge weight $w_{uv}$ is the similarity between the two vertices; if there is no direct edge, the first-order similarity is 0. In the figure above, vertices 6 and 7 are connected by an edge with a large weight, so they are considered similar and their first-order similarity is high; vertices 5 and 6 have no direct edge, so their first-order similarity is 0.

Second-order similarity

Is first-order similarity alone enough? Obviously not. As shown in the figure above, although there is no direct edge between 5 and 6, they share many neighbor vertices (1, 2, 3, 4), which suggests that 5 and 6 are in fact similar; second-order similarity describes this relationship. Formally, let $p_u = (w_{u,1}, \dots, w_{u,|V|})$ denote the first-order similarities between vertex $u$ and all other vertices; then the second-order similarity between $u$ and $v$ is given by the similarity between $p_u$ and $p_v$. If $u$ and $v$ share no neighbor vertices, their second-order similarity is 0.

Optimization objectives

(1) First-order similarity
For each undirected edge $(i,j)$, define the joint probability between vertices $v_i$ and $v_j$ as
$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i \cdot \vec{u}_j)}$$
where $\vec{u}_i$ is the low-dimensional vector representation of vertex $v_i$. This formula can be seen as an inner-product model that measures how well two vertices match.
The empirical distribution is defined as $\hat{p}_1(i,j) = \frac{w_{ij}}{W}$ with $W = \sum_{(i,j) \in E} w_{ij}$.
The optimization objective is to minimize $O_1 = d(\hat{p}_1(\cdot,\cdot), p_1(\cdot,\cdot))$, where $d(\cdot,\cdot)$ is a distance between two distributions. Using KL divergence, the common measure of the difference between two probability distributions, and dropping constants gives
$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$$
First-order similarity can only be used on undirected graphs.
(2) Second-order similarity
Each vertex maintains two embedding vectors: one is the representation of the vertex itself, and the other is its representation when it serves as a context vertex for other vertices.

For a directed edge $(i,j)$, define the probability of generating the context (neighbor) vertex $v_j$ given vertex $v_i$ as
$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j^T \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k^T \cdot \vec{u}_i)}$$
where $|V|$ is the number of context vertices.

The optimization objective is $O_2 = \sum_{i \in V} \lambda_i \, d(\hat{p}_2(\cdot \mid v_i), p_2(\cdot \mid v_i))$, where $\lambda_i$ is a factor controlling the importance of each vertex, which can be estimated by the vertex degree, PageRank, or similar methods.

The empirical distribution is defined as $\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}$, where $w_{ij}$ is the weight of edge $(i,j)$ and $d_i$ is the out-degree of vertex $v_i$; for a weighted graph, $d_i = \sum_{k \in N(i)} w_{ik}$.

Using KL divergence, setting $\lambda_i = d_i$ and ignoring constant terms gives $O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)$.

Optimization Tips

(1) Negative sampling
When computing the second-order similarity, evaluating the denominator of the softmax requires traversing all vertices, which is very inefficient. The paper therefore uses negative sampling, and the objective for each edge $(i,j)$ becomes
$$\log \sigma(\vec{u}_j^T \cdot \vec{u}_i) + \sum_{n=1}^{K} E_{v_n \sim P_n(v)}\left[\log \sigma(-\vec{u}_n^T \cdot \vec{u}_i)\right]$$
where $K$ is the number of negative edges. The paper sets $P_n(v) \propto d_v^{3/4}$, where $d_v$ is the out-degree of vertex $v$.

(2) Edge sampling
Notice that the objective function has a weight coefficient $w_{ij}$ in front of the log term, so when optimizing with gradient descent, $w_{ij}$ is multiplied directly onto the gradient. If the variance of the edge weights in the graph is large, it becomes difficult to choose an appropriate learning rate: a larger learning rate may cause gradient explosion on edges with large weights, while a smaller one makes the gradients on edges with small weights too small.

If all edge weights were the same, choosing an appropriate learning rate would be easy. One approach is to split each weighted edge into edges of equal weight: an edge with weight $w$ is split into $w$ edges of weight 1. This solves the learning-rate problem, but the increased number of edges also increases the storage requirement.

Another method is to sample from the original weighted edges, with each edge sampled with probability proportional to its weight; this solves the learning-rate problem without adding much storage overhead.

The sampling here uses the alias method, a discrete sampling algorithm with $O(1)$ time complexity per sample; for details see "Alias Method: Discrete Sampling Method with Time Complexity O(1)" in the references.

Sample code

The model takes the indices of two vertices as input, looks up the corresponding embedding vectors, and outputs their inner product. The true label is defined as 1 (a positive edge) or -1 (a negative sample), and the loss is computed from the inner product so that it can be optimized directly.
[Figure: LINE model]
In this implementation the first-order and second-order variants are integrated, and the hyperparameter order controls whether they are optimized separately or jointly; the paper recommends optimizing them separately.

import math
import random

import numpy as np
import tensorflow as tf
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.layers import Embedding, Input, Lambda
from tensorflow.python.keras.models import Model


def line_loss(y_true, y_pred):
    return -K.mean(K.log(K.sigmoid(y_true * y_pred)))


def create_model(numNodes, embedding_size, order='second'):
    v_i = Input(shape=(1,))
    v_j = Input(shape=(1,))

    first_emb = Embedding(numNodes, embedding_size, name='first_emb')
    second_emb = Embedding(numNodes, embedding_size, name='second_emb')
    context_emb = Embedding(numNodes, embedding_size, name='context_emb')

    v_i_emb = first_emb(v_i)
    v_j_emb = first_emb(v_j)

    v_i_emb_second = second_emb(v_i)
    v_j_context_emb = context_emb(v_j)

    first = Lambda(lambda x: tf.reduce_sum(
        x[0] * x[1], axis=-1, keepdims=False), name='first_order')([v_i_emb, v_j_emb])
    second = Lambda(lambda x: tf.reduce_sum(
        x[0] * x[1], axis=-1, keepdims=False), name='second_order')([v_i_emb_second, v_j_context_emb])

    if order == 'first':
        output_list = [first]
    elif order == 'second':
        output_list = [second]
    else:
        output_list = [first, second]

    model = Model(inputs=[v_i, v_j], outputs=output_list)
    return model, {'first': first_emb, 'second': second_emb}


def preprocess_nxgraph(graph):
    node2idx = {}
    idx2node = []
    node_size = 0
    for node in graph.nodes():
        node2idx[node] = node_size
        idx2node.append(node)
        node_size += 1
    return idx2node, node2idx


def create_alias_table(area_ratio):
    """
    :param area_ratio: sum(area_ratio)=1
    :return: accept,alias
    """
    l = len(area_ratio)
    accept, alias = [0] * l, [0] * l
    small, large = [], []
    area_ratio_ = np.array(area_ratio) * l
    for i, prob in enumerate(area_ratio_):
        if prob < 1.0:
            small.append(i)
        else:
            large.append(i)

    while small and large:
        small_idx, large_idx = small.pop(), large.pop()
        accept[small_idx] = area_ratio_[small_idx]
        alias[small_idx] = large_idx
        area_ratio_[large_idx] = area_ratio_[large_idx] - (1 - area_ratio_[small_idx])
        if area_ratio_[large_idx] < 1.0:
            small.append(large_idx)
        else:
            large.append(large_idx)

    while large:
        large_idx = large.pop()
        accept[large_idx] = 1
    while small:
        small_idx = small.pop()
        accept[small_idx] = 1

    return accept, alias


def alias_sample(accept, alias):
    """
    :param accept:
    :param alias:
    :return: sample index
    """
    N = len(accept)
    i = int(np.random.random() * N)
    r = np.random.random()
    if r < accept[i]:
        return i
    else:
        return alias[i]


class LINE:
    def __init__(self, graph, embedding_size=8, negative_ratio=5, order='second'):
        """
        :param graph:
        :param embedding_size:
        :param negative_ratio:
        :param order: 'first','second','all'
        """
        if order not in ['first', 'second', 'all']:
            raise ValueError("order must be 'first', 'second', or 'all'")

        self.graph = graph
        self.idx2node, self.node2idx = preprocess_nxgraph(graph)
        self.use_alias = True

        self.rep_size = embedding_size
        self.order = order

        self._embeddings = {}
        self.negative_ratio = negative_ratio
        self.order = order

        self.node_size = graph.number_of_nodes()
        self.edge_size = graph.number_of_edges()
        self.samples_per_epoch = self.edge_size * (1 + negative_ratio)

        self._gen_sampling_table()
        self.reset_model()

    def reset_training_config(self, batch_size, times):
        self.batch_size = batch_size
        self.steps_per_epoch = ((self.samples_per_epoch - 1) // self.batch_size + 1) * times

    def reset_model(self, opt='adam'):
        self.model, self.embedding_dict = create_model(self.node_size, self.rep_size, self.order)
        self.model.compile(opt, line_loss)
        self.batch_it = self.batch_iter(self.node2idx)

    def _gen_sampling_table(self):
        # create sampling table for vertex
        power = 0.75
        numNodes = self.node_size
        node_degree = np.zeros(numNodes)  # out degree
        node2idx = self.node2idx

        for edge in self.graph.edges():
            node_degree[node2idx[edge[0]]] += self.graph[edge[0]][edge[1]].get('weight', 1.0)

        total_sum = sum([math.pow(node_degree[i], power) for i in range(numNodes)])
        norm_prob = [float(math.pow(node_degree[j], power))/total_sum for j in range(numNodes)]

        self.node_accept, self.node_alias = create_alias_table(norm_prob)

        # create sampling table for edge
        numEdges = self.graph.number_of_edges()
        total_sum = sum([self.graph[edge[0]][edge[1]].get('weight', 1.0) for edge in self.graph.edges()])
        norm_prob = [self.graph[edge[0]][edge[1]].get('weight', 1.0) * numEdges / total_sum for edge in self.graph.edges()]

        self.edge_accept, self.edge_alias = create_alias_table(norm_prob)

    def batch_iter(self, node2idx):
        edges = [(node2idx[x[0]], node2idx[x[1]]) for x in self.graph.edges()]
        data_size = self.graph.number_of_edges()
        shuffle_indices = np.random.permutation(np.arange(data_size))
        # positive or negative mod
        mod = 0
        mod_size = 1 + self.negative_ratio
        h = []
        t = []
        sign = 0
        count = 0
        start_index = 0
        end_index = min(start_index + self.batch_size, data_size)
        while True:
            if mod == 0:
                h = []
                t = []
                for i in range(start_index, end_index):
                    if random.random() >= self.edge_accept[shuffle_indices[i]]:
                        shuffle_indices[i] = self.edge_alias[shuffle_indices[i]]
                    cur_h = edges[shuffle_indices[i]][0]
                    cur_t = edges[shuffle_indices[i]][1]
                    h.append(cur_h)
                    t.append(cur_t)
                sign = np.ones(len(h))
            else:
                sign = np.ones(len(h)) * -1
                t = []
                for i in range(len(h)):
                    t.append(alias_sample(self.node_accept, self.node_alias))

            if self.order == 'all':
                yield ([np.array(h), np.array(t)], [sign, sign])
            else:
                yield ([np.array(h), np.array(t)], [sign])
            mod += 1
            mod %= mod_size
            if mod == 0:
                start_index = end_index
                end_index = min(start_index + self.batch_size, data_size)

            if start_index >= data_size:
                count += 1
                mod = 0
                h = []
                shuffle_indices = np.random.permutation(np.arange(data_size))
                start_index = 0
                end_index = min(start_index + self.batch_size, data_size)

    def get_embeddings(self, ):
        self._embeddings = {}
        if self.order == 'first':
            embeddings = self.embedding_dict['first'].get_weights()[0]
        elif self.order == 'second':
            embeddings = self.embedding_dict['second'].get_weights()[0]
        else:
            embeddings = np.hstack((self.embedding_dict['first'].get_weights()[0], self.embedding_dict['second'].get_weights()[0]))
        idx2node = self.idx2node
        for i, embedding in enumerate(embeddings):
            self._embeddings[idx2node[i]] = embedding
        return self._embeddings

    def train(self, batch_size=1024, epochs=1, initial_epoch=0, verbose=1, times=1):
        self.reset_training_config(batch_size, times)
        hist = self.model.fit_generator(self.batch_it, epochs=epochs, initial_epoch=initial_epoch, steps_per_epoch=self.steps_per_epoch, verbose=verbose)
        return hist

Execute training and output node embedding:

import networkx as nx

# Build the graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (4, 9)])

model = LINE(G, embedding_size=16, order='second')
model.train(batch_size=1, epochs=10, verbose=2)
embeddings = model.get_embeddings()

# Print each node's embedding vector
for node in G.nodes():
  print(f"Node {node}: {embeddings[node]}")

[Figure: LINE output]

3. SDNE

SDNE does not use a random-walk method; instead it uses an autoencoder to optimize the first-order and second-order similarity simultaneously. The learned vector representations preserve both local and global structure and are robust to sparse networks.

As mentioned in the previous LINE algorithm, the first-order similarity characterizes the local similarity between paired nodes connected by edges. Two nodes in the network are considered similar if they are connected. The second-order similarity represents the similarity of the node neighborhood structure, which can represent the global network structure. Two nodes tend to be similar if they share many neighbors.
[Figure 2: the semi-supervised deep model SDNE framework]
The framework consists of two parts, an encoder and a decoder: the encoder is the red circles in the figure, the decoder is the blue circles, and the green circles are the node's representation vector. The model input $x_i$ is the $i$-th row of the graph's adjacency matrix, representing the connections between node $i$ and all other nodes in the network. The encoder maps this input to the low-dimensional representation $y_i^{(K)}$, and the decoder then reconstructs the input from it. As a result, nodes that share many neighbors obtain similar embeddings through the autoencoder, which preserves the second-order similarity of the graph. In addition, for node pairs connected by a direct edge, SDNE constrains the two nodes' embeddings to be close, which preserves the first-order similarity.

Concretely, SDNE uses an autoencoder to preserve first-order and second-order network proximity, optimizing the two jointly. The method uses highly non-linear functions to obtain the embeddings. The model consists of an unsupervised part and a supervised part: the former is an autoencoder that aims to find an embedding of each node from which its neighborhood can be reconstructed; the latter is based on a Laplacian eigenmaps term that penalizes similar vertices being mapped far apart in the embedding space.

The loss function of SDNE is shown below and consists of three parts: the second-order similarity loss, the first-order similarity loss, and an L2 regularization term:
$$\mathcal{L}_{mix} = \mathcal{L}_{2nd} + \alpha \mathcal{L}_{1st} + \nu \mathcal{L}_{reg}$$
$$\mathcal{L}_{reg} = \frac{1}{2} \sum_{k=1}^{K} \left( \|W^{(k)}\|_F^2 + \|\hat{W}^{(k)}\|_F^2 \right)$$
The first part is the first-order similarity loss between neighboring nodes:
$$\mathcal{L}_{1st} = \sum_{i,j=1}^{n} s_{i,j} \|y_i^{(K)} - y_j^{(K)}\|_2^2 = \sum_{i,j=1}^{n} s_{i,j} \|y_i - y_j\|_2^2$$

Here $s_{i,j}$ represents the connection between two nodes (in an unweighted network, 1 means connected and 0 means not connected), and $y_i$ is the embedding vector of node $i$, i.e. the green nodes in the model above. The purpose of the first-order similarity is simply to make the representation vectors of neighboring nodes closer, so the loss encourages their difference to be small.

The second-order similarity, intuitively, asks that nodes with more similar neighborhoods have more similar representation vectors; its loss is the reconstruction loss of the autoencoder:
$$\mathcal{L}_{2nd} = \sum_{i=1}^{n} \|(\hat{x}_i - x_i) \odot b_i\|_2^2 = \|(\hat{X} - X) \odot B\|_F^2$$

Here $x_i$ is the adjacency vector of node $i$, i.e. the $i$-th row of the adjacency matrix. The purpose of multiplying by $B$ is to treat the 0 and 1 entries differently: a node's neighbor vector may look like 0000100000000010001, where 0s dominate and 1s are rare, so to strengthen the role of the 1s they are multiplied by a value $\beta > 1$ (the paper uses 5). Whether it is DeepWalk, LINE, or Node2vec, the first-order and second-order similarity principles are similar: first-order similarity asks that connected nodes be more similar, and second-order similarity asks that nodes sharing more neighbors have somewhat similar representation vectors.
[Figure: SDNE]
Sample code

import torch
import numpy as np
import networkx as nx
import scipy.sparse as sparse


class SDNEModel(torch.nn.Module):
  def __init__(self, input_dim, hidden_layers, alpha, beta, device='cpu'):
    '''Structural Deep Network Embedding (SDNE)
    :param input_dim: number of nodes (node_size)
    :param hidden_layers: sizes of the AutoEncoder hidden layers
    :param alpha: coefficient of the first-order (1st) loss
    :param beta: penalty on non-zero entries in the second-order (2nd) loss
    :param device:
    '''
    super(SDNEModel, self).__init__()
    self.alpha = alpha
    self.beta = beta
    self.device = device
    input_dim_copy = input_dim
    layers = []
    for layer_dim in hidden_layers:
      layers.append(torch.nn.Linear(input_dim, layer_dim))
      layers.append(torch.nn.ReLU())
      input_dim = layer_dim
    self.encoder = torch.nn.Sequential(*layers)

    layers = []
    for layer_dim in reversed(hidden_layers[:-1]):
      layers.append(torch.nn.Linear(input_dim, layer_dim))
      layers.append(torch.nn.ReLU())
      input_dim = layer_dim
    # add a final layer that maps back to the input dimension
    layers.append(torch.nn.Linear(input_dim, input_dim_copy))
    layers.append(torch.nn.ReLU())
    self.decoder = torch.nn.Sequential(*layers)
  
  def forward(self, A, L):
    '''Takes the node adjacency matrix and the Laplacian matrix as input; the computation follows the paper.
    :param A: adjacency_matrix, dim=(m, n)
    :param L: laplace_matrix, dim=(m, m)
    :return:
    '''
    Y = self.encoder(A)
    A_hat = self.decoder(Y)
    # loss_2nd: second-order similarity loss
    beta_matrix = torch.ones_like(A)
    mask = A != 0
    beta_matrix[mask] = self.beta
    loss_2nd = torch.mean(torch.sum(torch.pow((A - A_hat) * beta_matrix, 2), dim=1))
    # loss_1st: first-order similarity loss, Eq. (9) in the paper: alpha * 2 * tr(Y^T L Y)
    loss_1st =  self.alpha * 2 * torch.trace(torch.matmul(torch.matmul(Y.transpose(0,1), L), Y))
    return loss_2nd + loss_1st


def process_nxgraph(graph):
  node2idx = {}
  idx2node = []
  node_size = 0
  for node in graph.nodes():
    node2idx[node] = node_size
    idx2node.append(node)
    node_size += 1
  return idx2node, node2idx


class Regularization(torch.nn.Module):
  def __init__(self, model, gamma=0.01, p=2, device="cpu"):
    '''
    :param model: the model to be regularized
    :param gamma: regularization coefficient
    :param p: p=2 for L2 regularization, p=1 for L1 regularization
    '''
    super().__init__()
    if gamma <= 0:
      print("param weight_decay can not be <= 0")
      exit(0)
    self.model = model
    self.gamma = gamma
    self.p = p
    self.device = device
    self.weight_list = self.get_weight_list(model)  # list of parameters to regularize
    self.weight_info = self.get_weight_info(self.weight_list)  # print information about these parameters

  def to(self, device):
    super().to(device)
    self.device = device
    return self

  def forward(self, model):
    self.weight_list = self.get_weight_list(model)
    reg_loss = self.regulation_loss(self.weight_list, self.gamma, self.p)
    return reg_loss

  def regulation_loss(self, weight_list, gamma, p=2):
    reg_loss = 0
    for name, w in weight_list:
      l2_reg = torch.norm(w, p=p)
      reg_loss += l2_reg
    reg_loss = reg_loss * gamma
    return reg_loss

  def get_weight_list(self, model):
    weight_list = []
    # iterate over the model parameters together with their names
    for name, param in model.named_parameters():
      # only take weights here, not biases
      if 'weight' in name:
        weight = (name, param)
        weight_list.append(weight)
    return weight_list

  def get_weight_info(self, weight_list):
    # print the names of the regularized parameters
    print("#"*10, "regulations weight", "#"*10)
    for name, param in weight_list:
      print(name)
    print("#"*25)


class SDNE(torch.nn.Module):
  def __init__(self, graph, hidden_layers=None, alpha=1e-5, beta=5, gamma=1e-5, device="cpu"):
    super().__init__()
    self.graph = graph
    self.idx2node, self.node2idx = process_nxgraph(graph)
    self.node_size = graph.number_of_nodes()
    self.edge_size = graph.number_of_edges()
    self.sdne = SDNEModel(self.node_size, hidden_layers, alpha, beta)
    self.device = device
    self.embeddings = {}
    self.gamma = gamma

    adjacency_matrix, laplace_matrix = self.__create_adjacency_laplace_matrix()
    self.adjacency_matrix = torch.from_numpy(adjacency_matrix.toarray()).float().to(self.device)
    self.laplace_matrix = torch.from_numpy(laplace_matrix.toarray()).float().to(self.device)
  
  def fit(self, batch_size=512, epochs=1, initial_epoch=0, verbose=1):
    num_samples = self.node_size
    self.sdne.to(self.device)
    optimizer = torch.optim.Adam(self.sdne.parameters())
    if self.gamma:
      regularization = Regularization(self.sdne, gamma=self.gamma)
    if batch_size >= self.node_size:
      batch_size = self.node_size
      print('batch_size({0}) > node_size({1}),set batch_size = {1}'.format(batch_size, self.node_size))
      for epoch in range(initial_epoch, epochs):
        loss_epoch = 0
        optimizer.zero_grad()
        loss = self.sdne(self.adjacency_matrix, self.laplace_matrix)
        if self.gamma:
          reg_loss = regularization(self.sdne)
          # print("reg_loss:", reg_loss.item(), reg_loss.requires_grad)
          loss = loss + reg_loss
        loss_epoch += loss.item()
        loss.backward()
        optimizer.step()
        if verbose > 0:
          print('Epoch {0}, loss {1} . >>> Epoch {2}/{3}'.format(epoch + 1, round(loss_epoch / num_samples, 4), epoch+1, epochs))
    else:
      steps_per_epoch = (self.node_size - 1) // batch_size + 1
      for epoch in range(initial_epoch, epochs):
        loss_epoch = 0
        for i in range(steps_per_epoch):
          idx = np.arange(i * batch_size, min((i+1) * batch_size, self.node_size))
          A_train = self.adjacency_matrix[idx, :]
          L_train = self.laplace_matrix[idx][:,idx]
          # print(A_train.shape, L_train.shape)
          optimizer.zero_grad()
          loss = self.sdne(A_train, L_train)
          loss_epoch += loss.item()
          loss.backward()
          optimizer.step()

        if verbose > 0:
          print('Epoch {0}, loss {1} . >>> Epoch {2}/{3}'.format(epoch + 1, round(loss_epoch / num_samples, 4), epoch + 1, epochs))

  def get_embeddings(self):
    if not self.embeddings:
      self.__get_embeddings()
    embeddings = self.embeddings
    return embeddings

  def __get_embeddings(self):
    embeddings = {}
    with torch.no_grad():
      self.sdne.eval()
      embed = self.sdne.encoder(self.adjacency_matrix)
      for i, embedding in enumerate(embed.numpy()):
        embeddings[self.idx2node[i]] = embedding
    self.embeddings = embeddings


  def __create_adjacency_laplace_matrix(self):
    node_size = self.node_size
    node2idx = self.node2idx
    adjacency_matrix_data = []
    adjacency_matrix_row_index = []
    adjacency_matrix_col_index = []
    for edge in self.graph.edges():
      v1, v2 = edge
      edge_weight = self.graph[v1][v2].get("weight", 1.0)
      adjacency_matrix_data.append(edge_weight)
      adjacency_matrix_row_index.append(node2idx[v1])
      adjacency_matrix_col_index.append(node2idx[v2])
    adjacency_matrix = sparse.csr_matrix((adjacency_matrix_data,(adjacency_matrix_row_index, adjacency_matrix_col_index)),shape=(node_size, node_size))
    # L = D - A. For a directed graph the degree is the sum of in-degree and out-degree;
    # for an undirected graph the adjacency matrix is symmetric, so the degree is just the row sum.
    # compute the degrees
    adjacency_matrix_ = sparse.csr_matrix((adjacency_matrix_data+adjacency_matrix_data,(adjacency_matrix_row_index+adjacency_matrix_col_index,adjacency_matrix_col_index+adjacency_matrix_row_index)), shape=(node_size, node_size))
    degree_matrix = sparse.diags(adjacency_matrix_.sum(axis=1).flatten().tolist()[0])
    laplace_matrix = degree_matrix - adjacency_matrix_
    return adjacency_matrix, laplace_matrix



# Build the graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (4, 9)])

model = SDNE(G, hidden_layers=[64, 16])
model.fit(batch_size=2, epochs=10)
embeddings = model.get_embeddings()

# Print each node's embedding vector
for node in G.nodes():
  print(f"Node {node}: {embeddings[node]}")

4. Node2vec

The Node2Vec algorithm makes a small improvement over DeepWalk: it changes how the random-walk sequences are generated so that both the homophily and the structural equivalence of graph nodes are taken into account when learning the embedding. After the sequences are obtained, the skip-gram algorithm is again used to learn the embedding representation of each node.

The meaning of homophily and structural equivalence can be explained with the figure below. Homophily means that two connected nodes should have similar embedding representations: node $u$ and node $s_1$ in the figure are directly connected, so their embeddings should be close. Structural equivalence means that two structurally similar nodes should have similar embedding representations: node $u$ and node $s_6$ in the figure are both located at the center of their respective clusters, so their embeddings should be similar.
[Figure: BFS vs DFS]
Usually DFS is used to capture the homophily of the network, and BFS is used to capture its structural equivalence.
[Figure: DFS vs BFS neighborhoods]
The probability of jumping from node $v$ to the next node $x$ is
$$\pi_{vx} = \alpha_{pq}(t, x) \cdot w_{vx}$$
where $w_{vx}$ is the weight of edge $(v, x)$, and $\alpha_{pq}(t, x)$ is defined as
$$\alpha_{pq}(t,x) = \begin{cases} \frac{1}{p} & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ \frac{1}{q} & \text{if } d_{tx} = 2 \end{cases}$$
where $d_{tx}$ is the distance from node $t$ (the previously visited node) to node $x$:

  • if the two nodes are the same node, $d_{tx} = 0$;
  • if the two nodes are directly connected, $d_{tx} = 1$;
  • if the two nodes are not connected, $d_{tx} = 2$.

The parameters p and q together control the tendency of the random walk:

  • The parameter p is called the return parameter. The smaller p is, the higher the probability of returning to node t, the more the walk resembles BFS, and the more it tends to represent the structure of the graph.
  • The parameter q is called the in-out parameter. The smaller q is, the higher the probability of walking to more distant nodes, the more the walk resembles DFS, and the more it tends to represent the homophily of the graph.
  • The trade-off expressed by the resulting embedding can be tuned by setting the values of p and q. When p = 1 and q = 1, the walk is equivalent to the random walk in DeepWalk. A minimal sketch of one biased step is given right after this list.
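A minimal sketch of one biased walk step on an unweighted graph (the function name and the karate-club example are illustrative; this is not the node2vec library's implementation):

import random
import networkx as nx

def node2vec_step(G, prev, cur, p, q):
    """Choose the next node after the step prev -> cur using the alpha_{pq} bias.
    Unnormalized weights (unweighted graph): 1/p to return to prev (d_tx = 0),
    1 for common neighbors of prev and cur (d_tx = 1), 1/q otherwise (d_tx = 2)."""
    neighbors = list(G.neighbors(cur))
    weights = []
    for x in neighbors:
        if x == prev:                # d_tx = 0: step back to the previous node
            weights.append(1.0 / p)
        elif G.has_edge(x, prev):    # d_tx = 1: x is also a neighbor of prev
            weights.append(1.0)
        else:                        # d_tx = 2: move outward (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

# One biased walk of length 5 on a toy graph (q < 1 favors outward, DFS-like moves).
G = nx.karate_club_graph()
walk = [0, random.choice(list(G.neighbors(0)))]
while len(walk) < 5:
    walk.append(node2vec_step(G, prev=walk[-2], cur=walk[-1], p=1.0, q=0.5))
print(walk)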

Using the rules above, sequences are generated and fed into the model to compute the embedding of each node. The homophily and structural equivalence captured by Node2Vec also have an intuitive interpretation in recommender systems: items related by homophily are likely to be items of the same category or attribute, or items that are often purchased together, while structurally equivalent items are, for example, the most popular or best-selling item of each category, i.e. items with similar trends or structural roles.
[Figure: homophily and structural equivalence]
Both are very important feature expressions in recommender systems. Thanks to Node2Vec's flexibility in exploring different characteristics, embeddings with different tendencies can be concatenated to obtain richer embedding features, or embeddings produced by differently configured Node2Vec runs can be fed into a downstream deep learning network to preserve different kinds of item information.

In short, node2vec is a graph embedding method that comprehensively considers DFS neighborhood and BFS neighborhood. In simple terms, it can be seen as an extension of deepwalk, which is a deepwalk that combines DFS and BFS random walks.

Node2vec core code:
[Figure: node2vec]
node2vec has a third-party library that can be installed via pip:

pip install node2vec

Simple usage:

import networkx as nx
from node2vec import Node2Vec

# Build the graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (4, 9)])

# Precompute probabilities and generate walks - **ON WINDOWS ONLY WORKS WITH workers=1**
node2vec = Node2Vec(G, dimensions=16, walk_length=10, num_walks=100, workers=4) # Use temp_folder for big graphs

# Embed nodes
model = node2vec.fit(window=10, min_count=0, batch_words=4)
from node2vec.edges import AverageEmbedder
from node2vec.edges import HadamardEmbedder
from node2vec.edges import WeightedL1Embedder
from node2vec.edges import WeightedL2Embedder

edges_embs = HadamardEmbedder(keyed_vectors=model.wv)
# Look for embeddings on the fly - here we pass normal tuples
print(edges_embs[('1', '2')])

Get the embedding between node 1 and node 2:
[Figure: edge embedding output]
Get the embedding of each node:

# The node2vec library stores node ids as strings, so look up vectors by str(node)
for node in G.nodes():
  print(f"Node {node}: {model.wv[str(node)]}")

[Figure: node embedding output]

5. Struc2vec

Struc2Vec, released in 2017, can be seen as extracting features of the graph from yet another angle. DeepWalk learns neighbor similarity, LINE learns first-order and second-order neighbor similarity, and Node2Vec learns neighbor similarity together with some structural similarity. In practice, however, Node2Vec cannot learn sufficient structural similarity, because the number of random-walk steps is limited: if two nodes are far apart, it is hard to capture their so-called structural similarity. Struc2vec is aimed directly at learning structural similarity.
[Figure: structurally similar nodes]
Its core idea is:

  1. The properties of nodes and edges and their positions in the network are ignored when evaluating structural similarity; only the local structure of nodes is considered. The intuitive criterion is: if two nodes have the same degree, they are structurally similar; if the degrees of their neighbors are also the same, their structural similarity is even higher.
    [Figure: node degrees]
    Define $R_k(u)$ as the set of vertices exactly $k$ hops away from $u$, $s(S)$ as the ordered degree sequence of a vertex set $S$, $g(D_1, D_2)$ as the distance between two degree sequences $D_1$ and $D_2$, and $f_k(u, v)$ as the structural distance between $u$ and $v$ when their neighborhoods up to $k$ hops are considered.

  2. A hierarchical structure is built to measure the structural similarity of nodes : at the bottom layer, the structural similarity depends only on the node degree; at the top layer, the structural similarity depends on the information of the entire network.

The specific process is as follows:
First, construct a weighted multi-layer graph: for a graph $G = (V, E)$, build a weighted multi-layer graph $M$ as described below.
The $k$-th layer of $M$ is a weighted undirected complete graph on the node set $V$, where the edge weight between each pair of nodes $u$ and $v$ is defined as $w_k(u, v) = e^{-f_k(u, v)}$ (inversely related to their structural distance).
The weights between adjacent layers are related as follows:
[Figure: inter-layer edge weights]
In the next sampling step the walk stays within the current layer with probability $p$ and switches to an adjacent layer (one layer up or down) with probability $1-p$. The transition probabilities therefore split into two cases:
(1) walking within the current layer:
[Figure: within-layer transition probability]
(2) switching to an adjacent layer:
[Figure: cross-layer transition probability]
In general:
[Figure: overall transition probabilities]

Random walks are then performed on $M$ according to the rules above to obtain walk sequences, and Skip-gram is applied to these sequences to generate the embeddings.

In short, Struc2Vec consists of three steps: 1. compute the pairwise structural distances for each layer; 2. build a weighted hierarchical graph from these distances; 3. run random walks on the weighted hierarchical graph to sample vertex sequences.
Struc2vec captures the structural information of the graph and works better when structural roles matter more than neighborhood proximity.
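To make the degree-sequence idea concrete, here is a rough sketch that compares the sorted $k$-hop degree sequences of two vertices. It pads the shorter sequence with zeros and uses a plain elementwise difference rather than the DTW-based distance used in the paper, so it is only a crude stand-in for $f_k(u, v)$ (all names are illustrative):

import networkx as nx

def khop_degree_sequence(G, u, k):
    """Sorted degree sequence of the nodes exactly k hops away from u, i.e. s(R_k(u))."""
    lengths = nx.single_source_shortest_path_length(G, u, cutoff=k)
    ring = [n for n, d in lengths.items() if d == k]
    return sorted(G.degree(n) for n in ring)

def structural_distance(G, u, v, k):
    """Crude stand-in for f_k(u, v): elementwise difference of the two degree
    sequences, padding the shorter one with zeros (the paper uses DTW instead)."""
    su, sv = khop_degree_sequence(G, u, k), khop_degree_sequence(G, v, k)
    length = max(len(su), len(sv), 1)
    su += [0] * (length - len(su))
    sv += [0] * (length - len(sv))
    return sum(abs(a - b) for a, b in zip(su, sv)) / length

G = nx.barbell_graph(5, 2)                 # two cliques joined by a short path
print(structural_distance(G, 0, 9, k=1))   # symmetric positions -> small distance
print(structural_distance(G, 0, 5, k=1))   # clique node vs. path node -> larger distance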

Graph Embedding Methods

For embedding a whole graph, a representative method, graph2vec, is introduced here.

1. Graph2vec

Whole-graph embedding represents an entire graph with a single vector. Graph2vec is also based on the skip-gram idea and encodes entire graphs into a vector space. It is analogous to the document-embedding method doc2vec, which takes a document ID as input and is trained by maximizing the likelihood of randomly sampled words from that document.
[Figure: Graph2Vec]
Graph2vec also consists of three steps (a usage sketch follows the figure below):

  1. Sample and relabel all subgraphs in the graph. A subgraph is the group of nodes that appears around a selected node.
  2. Train the skip-gram model. Analogous to doc2vec, where a document is a set of words, a graph is treated as a set of subgraphs; the skip-gram model is trained to maximize the probability of predicting the subgraphs present in the graph from the graph ID at the input.
  3. Compute embeddings. An embedding is computed by providing the ID index of a graph (its subgraph set) at the input.

[Figure: Word2Vec vs Graph2Vec]
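A minimal usage sketch, assuming the third-party karateclub library's Graph2Vec implementation (the toy graphs and hyperparameters are arbitrary; karateclub expects each graph's nodes to be labeled 0..n-1):

import networkx as nx
from karateclub import Graph2Vec

# A small collection of toy graphs with consecutive integer node labels.
graphs = [nx.cycle_graph(6), nx.path_graph(6), nx.complete_graph(6), nx.star_graph(5)]

# Graph2Vec extracts WL subtree "subgraphs" and trains a doc2vec-style model over them.
model = Graph2Vec(dimensions=16, wl_iterations=2, epochs=50, min_count=1)
model.fit(graphs)

embeddings = model.get_embedding()   # one vector per input graph
print(embeddings.shape)              # expected: (4, 16)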

other methods

The methods above are a few commonly used ones, but there are many other methods and models worth studying:

  • Node embedding: LLE, Laplacian Eigenmaps, Graph Factorization, GraRep, HOPE, DNGR, GCN, LINE
  • Graph embedding approaches: Patchy-san, sub2vec (embed subgraphs), WL kernel and Deep WL kernels

Summary

This article, written with the help of ChatGPT and the related code, introduced what graph embedding is, why it is used, and the properties an embedding needs to satisfy. It then briefly introduced several classic node-level methods (DeepWalk, LINE, SDNE, etc.) and the graph-level method graph2vec.

Note: all the embedding methods introduced here are based on homogeneous graphs!

Relevant information

  1. Graph Embeddings — The Summary
  2. DeepWalk: algorithm principle, implementation and application
  3. DeepWalk principle and code practice
  4. LINE: algorithm principle, implementation and application
  5. Interpretation of LINE Principle of Graph Embedding Method
  6. Alias Method: Discrete sampling method with time complexity O(1)
  7. SDNE (Structural Deep Network Embedding) theory and pytorch implementation
  8. Graph Embedding (2)
  9. Graph Embedding (3)
  10. Graph Embedding Principles and Applications of Graph Representation Learning
  11. Node2Vec showcase
  12. Node classification with weighted Node2Vec
  13. Node2Vec
  14. Struc2Vec: algorithm principle, implementation and application
  15. Struc2vec: learning node representations from structural identity

Origin blog.csdn.net/ARPOSPF/article/details/122556237