Generate Graph Embedding from Graph Structure Data

We know that as long as an item can be represented by sequence data, its Embedding can be trained with the Item2vec method. However, Internet data is not only sequence data: more and more of it comes in the form of graphs, and for such data the sequence-based Embedding methods fall short. Simply discarding graph-structured data in recommendation systems would be a waste, because graphs contain a great deal of valuable structural information. The following therefore focuses on Embedding methods built on graph structures, which are collectively called Graph Embedding.

Graph Structured Data in the Internet

[Figure: (a) a social network, (b) a knowledge graph, (c) a user-item bipartite graph]
In fact, graph-structured data is almost ubiquitous on the Internet, and the most typical example is the social network we use every day (Figure a). From a social network we can discover opinion leaders and communities, and then make social recommendations based on these "social" characteristics. If we can encode the nodes of the social network as Embedding vectors, the social recommendation process becomes much more convenient.
The knowledge graph is also a very active research and application direction. As shown in Figure b, a knowledge graph contains knowledge subjects of different types (such as people and places), attributes attached to those subjects (such as character descriptions and item characteristics), and the relationships between subjects, and between subjects and attributes. If we can embed the subjects in the knowledge graph, we can discover potential relationships between them, which is very helpful for content-based and knowledge-based recommendation systems.

Another very important type of graph data is the behavioral relationship graph. This kind of data exists in almost every Internet application; it is essentially a "bipartite graph" composed of users and items (Figure c), generated by the interactions between them. With such a relationship graph we can naturally use Embedding technology to uncover the relationships between items, between users, and between users and items, and then apply them to further stages of the recommendation system.

A bipartite graph is a special model in graph theory. Let G = (V, E) be an undirected graph. If the vertex set V can be divided into two mutually disjoint subsets (A, B), and for every edge (i, j) in the graph the two endpoints i and j belong to the two different subsets (i in A, j in B), then G is called a bipartite graph. In simple terms, a graph is bipartite if its vertices can be divided into two disjoint groups such that every edge connects a vertex in one group to a vertex in the other.
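To make the definition concrete, here is a minimal sketch (with a toy, made-up user-item graph) of the BFS 2-coloring test it implies: a graph is bipartite exactly when such a 2-coloring exists.

```python
from collections import deque

def is_bipartite(adj):
    """Check bipartiteness by 2-coloring each connected component with BFS.

    adj: dict mapping each vertex to an iterable of its neighbours
    (an undirected graph, so edges appear in both directions).
    """
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # put the neighbour in the other set
                    queue.append(v)
                elif color[v] == color[u]:    # an edge inside one set -> not bipartite
                    return False
    return True

# A tiny user-item interaction graph: users only connect to items, so it is bipartite.
graph = {
    "user_1": ["item_A", "item_B"],
    "user_2": ["item_B", "item_C"],
    "item_A": ["user_1"],
    "item_B": ["user_1", "user_2"],
    "item_C": ["user_2"],
}
print(is_bipartite(graph))  # True
```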

There is no doubt that graph data is of great value: if the nodes in a graph can be embedded, the resulting vectors are very valuable features for the recommendation system. With that motivation, let's get to the point and learn the Graph Embedding methods that work directly on graph data.

Graph Embedding based on random walks: DeepWalk

Let's first look at a Graph Embedding method that has been very influential and widely used in industry: DeepWalk, proposed by researchers at Stony Brook University in 2014. Its main idea is to perform random walks on a graph built from items to generate a large number of item sequences, feed these sequences into Word2vec as training samples, and finally obtain the item Embeddings. DeepWalk can therefore be regarded as a transitional method connecting sequence Embedding and Graph Embedding. The figure below shows the execution process of the DeepWalk method.
[Figure: the DeepWalk pipeline, (a) user behavior sequences, (b) item relationship graph, (c) random-walk item sequences, (d) Word2vec training]
Next, let's explain the DeepWalk algorithm flow in detail.
First, we build an item relationship graph (Figure b) from the original user behavior sequences (Figure a), such as a user's sequence of purchased items or watched videos. For example, because user U1 purchased item A and then item B, a directed edge from A to B is created; if the same directed edge is generated again by other sequences, its weight is strengthened. After all user behavior sequences have been converted into edges, a global item relationship graph is established.
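As an illustration of this first step, here is a minimal sketch (with made-up sequence data and function names) of converting user behavior sequences into a weighted, directed item graph:

```python
from collections import defaultdict

def build_item_graph(user_sequences):
    """Convert user behavior sequences into a weighted, directed item graph.

    Each consecutive pair (a, b) in a sequence adds (or reinforces) the edge a -> b.
    Returns a dict of dicts: graph[a][b] = edge weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for seq in user_sequences:
        for a, b in zip(seq, seq[1:]):
            graph[a][b] += 1.0   # repeated co-occurrence strengthens the edge weight
    return graph

# Toy behavior sequences (e.g. the purchase or viewing order of items per user).
sequences = [
    ["A", "B", "E"],
    ["D", "E", "C"],
    ["A", "B", "C"],
]
item_graph = build_item_graph(sequences)
print(dict(item_graph["A"]))  # {'B': 2.0} -- two users went from A to B
```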
Then, we perform random walks starting from randomly chosen nodes to regenerate item sequences (Figure c). The number and length of the random walks are hyperparameters that need to be tuned for the specific application.
Finally, we input the item sequences generated by these random walks into the Word2vec model in Figure d to generate the final item Embedding vectors.
In the DeepWalk algorithm flow above, the only thing that needs a formal definition is the jump probability of the random walk, i.e., the probability of moving from node $v_i$ to one of its neighboring nodes $v_j$. If the item relationship graph is directed and weighted, the probability of jumping from node $v_i$ to node $v_j$ is defined as follows:
$$
P(v_j \mid v_i) =
\begin{cases}
\dfrac{M_{ij}}{\sum_{v_k \in N_+(v_i)} M_{ik}}, & v_j \in N_+(v_i) \\
0, & \text{otherwise}
\end{cases}
$$
where $N_+(v_i)$ is the set of all outgoing edges of node $v_i$, and $M_{ij}$ is the weight of the edge from node $v_i$ to node $v_j$. In other words, DeepWalk's jump probability is the ratio of the weight of the chosen edge to the sum of the weights of all outgoing edges. If the item relationship graph is undirected and unweighted, the jump probability becomes a special case of the formula above: the weight $M_{ij}$ is a constant 1, and $N_+(v_i)$ is the set of all edges incident to $v_i$, not only its outgoing edges.
After obtaining new item sequences through random walks, we can generate the item Embeddings in the classic Word2vec way.
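Putting these steps together, the following sketch performs weighted random walks on a toy item graph and trains Word2vec on the resulting sequences. It assumes the gensim library is available, and all hyperparameters (walk length, number of walks, vector size) are purely illustrative:

```python
import random
from gensim.models import Word2Vec  # assumes gensim is installed

# Toy weighted, directed item graph; in practice this is the graph built from
# user behavior sequences in the previous step.
item_graph = {
    "A": {"B": 2.0},
    "B": {"C": 1.0, "E": 1.0},
    "D": {"E": 1.0},
    "E": {"C": 1.0},
}

def random_walk(graph, start, walk_length):
    """One weighted random walk: the next node is chosen with probability
    proportional to the outgoing edge weight, i.e. M_ij / sum_k M_ik."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbours = graph.get(walk[-1])
        if not neighbours:                 # dead end: no outgoing edges
            break
        nodes, weights = zip(*neighbours.items())
        walk.append(random.choices(nodes, weights=weights, k=1)[0])
    return walk

# Generate many walks and train skip-gram Word2vec on the resulting "sentences".
walks = [random_walk(item_graph, node, walk_length=8)
         for _ in range(10) for node in item_graph]
model = Word2Vec(sentences=walks, vector_size=16, window=3,
                 min_count=1, sg=1, epochs=10, seed=42)
print(model.wv["A"].shape)  # (16,)
```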

A way to balance homogeneity and structure: Node2vec

In 2016, researchers at Stanford University went a step further on the basis of DeepWalk and proposed the Node2vec model. By adjusting the jump probabilities of the random walk, Node2vec lets the Graph Embedding result trade off between the network's homogeneity (Homophily) and structural equivalence (Structural Equivalence). The different Embeddings can then be fed into the recommendation model, allowing the recommendation system to learn different characteristics of the network structure.
The "homogeneity" of the network mentioned here means that the embedding of nodes with similar distances should be as close as possible. As shown in the figure below, the embedding expressions of node u and its connected nodes s1, s2, s3, and s4 should be close , which is the embodiment of the "homogeneity" of the network. On e-commerce websites, homogeneous items are likely to be items of the same category, same attribute, or items that are often purchased together.
And "structural" means that the Embedding of structurally similar nodes should be as close as possible. For example, node u and node s6 in the figure below are the central nodes of their respective local area networks. They are similar in structure, so their Embedding expression should also be Approximately, this is the embodiment of "structural". On e-commerce websites, structurally similar items are generally items with similar trends or structural attributes, such as hot items in various categories, best-ordered items, etc.
[Figure: node u with its neighbors s1-s4 (homogeneity) and the structurally similar but distant node s6 (structure)]
So the question is, how does the result of Graph Embedding express structure and homogeneity?
First, in order for the Graph Embedding result to express the "structure" of the network, the random walk needs to be biased towards BFS (Breadth-First Search). BFS keeps the walk moving within the neighborhood of the current node, which amounts to a "microscopic scan" of the local network structure around it. Whether the current node is a "local central node", an "edge node", or a "connecting node" changes the number and order of nodes that appear in the generated sequences, so the final Embedding can capture more structural information.
In order to express "homogeneity", the random walk should instead be biased towards DFS (Depth-First Search), because DFS is more likely to reach distant nodes through multiple jumps. Even so, a DFS walk tends to stay within one large cluster, which makes the Embeddings of nodes inside the same cluster or community more similar, thereby expressing more of the network's "homogeneity". Note that the Embeddings of the DFS paths starting from two central nodes become more similar; what is compared is not the nodes themselves but the Embedding distance of the paths they generate via DFS.

So, in the Node2vec algorithm, how do we control the tendency towards BFS or DFS?

In fact, Node2vec controls this tendency through the jump probabilities between nodes. The figure below shows the situation where the walk has just jumped from node t to node v and is about to jump from node v to one of its neighbors. Note the characteristics of these nodes: node t is the node visited in the previous step, node v is the node currently being visited, and nodes x1, x2, and x3 are neighbors of v other than t, with x1 also connected to t. These different characteristics determine the probability of each possible next jump.
[Figure: a Node2vec walk that has moved from node t to node v, with candidate next nodes x1 (also connected to t), x2, and x3]
We can express these probabilities with a formula. The unnormalized probability of jumping from the current node v to the next node x is $\pi_{vx} = \alpha_{pq}(t, x) \cdot \omega_{vx}$, where $\omega_{vx}$ is the original weight of edge (v, x), and $\alpha_{pq}(t, x)$ is a jump weight defined by Node2vec. Whether the walk prefers DFS or BFS depends mainly on how this jump weight is defined:
$$
\alpha_{pq}(t, x) =
\begin{cases}
\dfrac{1}{p}, & d_{tx} = 0 \\
1, & d_{tx} = 1 \\
\dfrac{1}{q}, & d_{tx} = 2
\end{cases}
$$
Here $d_{tx}$ in $\alpha_{pq}(t, x)$ is the distance from node t to node x. For example, node x1 is directly connected to node t, so its distance $d_{tx}$ is 1; the distance from t back to t itself is 0; and x2 and x3 are not connected to t, so their distance $d_{tx}$ is 2.

Furthermore, the parameters p and q in $\alpha_{pq}(t, x)$ jointly control the tendency of the random walk. The parameter p is called the return parameter (Return Parameter): the smaller p is, the more likely the walk is to return to node t, and the more Node2vec emphasizes the structure of the network. The parameter q is called the in-out parameter (In-out Parameter): the smaller q is, the more likely the walk is to move to distant nodes, and the more Node2vec emphasizes the homogeneity of the network; conversely, a larger q keeps the walk around nearby nodes. This is how the tendency of the random walk is controlled.
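The following sketch shows how a single Node2vec walk step could be sampled from the formula above. The graph and the values of p and q are toy examples, and the node names mirror the figure (t, v, x1, x2, x3):

```python
import random

def node2vec_step(graph, prev, cur, p, q):
    """Sample the next node of a Node2vec walk.

    graph: dict mapping node -> dict of {neighbour: edge weight} (undirected view)
    prev, cur: the previous node t and the current node v of the walk
    alpha is 1/p when x == t (d_tx = 0), 1 when x is also a neighbour of t
    (d_tx = 1), and 1/q otherwise (d_tx = 2); the unnormalized probability
    of each candidate x is alpha * edge weight.
    """
    candidates, weights = [], []
    for x, w in graph[cur].items():
        if x == prev:                  # d_tx = 0: returning to t
            alpha = 1.0 / p
        elif x in graph[prev]:         # d_tx = 1: x is connected to both t and v
            alpha = 1.0
        else:                          # d_tx = 2: moving away from t
            alpha = 1.0 / q
        candidates.append(x)
        weights.append(alpha * w)
    return random.choices(candidates, weights=weights, k=1)[0]

# Toy undirected graph with unit weights; p < 1 favours BFS-like (structural) walks,
# q < 1 favours DFS-like (homogeneity) walks.
g = {
    "t": {"v": 1.0, "x1": 1.0},
    "v": {"t": 1.0, "x1": 1.0, "x2": 1.0, "x3": 1.0},
    "x1": {"t": 1.0, "v": 1.0},
    "x2": {"v": 1.0},
    "x3": {"v": 1.0},
}
print(node2vec_step(g, prev="t", cur="v", p=0.5, q=2.0))
```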

Node2vec's ability to flexibly express homogeneity and structure has also been confirmed experimentally: by adjusting the parameters p and q we can make it produce different Embedding results. The left part of the figure below shows Node2vec emphasizing homogeneity, where nodes that are close to each other in the graph have similar colors; the right part emphasizes structure, where nodes with similar structural characteristics have similar colors.
[Figure: Node2vec Embedding visualizations, left emphasizing homogeneity, right emphasizing structure]
The homogeneity and structure captured by Node2vec are both very important feature expressions in recommendation systems. Because Node2vec is flexible enough to explore different graph characteristics, we can feed both the "structural" Embeddings and the "homogeneous" Embeddings produced by differently tuned Node2vec models into the deep learning network, preserving different graph feature information of the items.

How is Embedding applied in the feature engineering of the recommendation system?

We have now learned several mainstream Embedding methods: Word2vec and Item2vec for sequence data, and DeepWalk and Node2vec for graph data.
Since the output of Embedding is a numerical feature vector, Embedding technology itself can be regarded as a feature processing method. Unlike simple One-hot encoding, however, Embedding is a higher-order feature processing method: it can condense sequence structure, network structure, and even other features into a single feature vector.
There are roughly three ways to apply Embedding in a recommendation system: "direct application", "pre-training application", and "End2End application".
"Direct application" is the simplest: after obtaining the Embedding vectors, we directly use their similarities to implement parts of the recommendation system. Typical uses include using the similarity between item Embeddings for similar-item recommendation, using the similarity between item and user Embeddings for classic functions such as "guess you like", and using item Embeddings to implement the recall layer of the recommendation system.
" Pre-training application " means that after we have pre-trained the embedding of items and users, we do not apply them directly, but use these embedding vectors as part of the feature vectors, splicing them with the rest of the feature vectors, and participating in the input of the recommendation model train. Doing so can better introduce other features, allowing the recommendation model to make more comprehensive and accurate predictions.
The third approach is the "End2End application", short for "End-to-End Training". Here we do not pre-train the Embedding at all; instead, the Embedding is trained jointly with the deep learning recommendation model in a unified, end-to-end fashion, so that the trained model directly contains the Embedding layer. This approach is very popular; for example, the figure below shows three classic models that contain an Embedding layer: Microsoft's Deep Crossing, UCL's FNN, and Google's Wide&Deep.
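The sketch below illustrates the End2End idea with a tiny PyTorch model (not any of the three architectures in the figure): the Embedding table is a trainable layer inside the model, so it is learned jointly with the rest of the network.

```python
import torch
import torch.nn as nn

class TinyRecModel(nn.Module):
    """A minimal end-to-end model: the Embedding layer is trained jointly
    with the rest of the network rather than pre-trained separately."""
    def __init__(self, num_items, embedding_dim=16):
        super().__init__()
        self.item_embedding = nn.Embedding(num_items, embedding_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, history_item, candidate_item):
        # Both item ids are looked up in the same Embedding table, then concatenated.
        h = self.item_embedding(history_item)
        c = self.item_embedding(candidate_item)
        return torch.sigmoid(self.mlp(torch.cat([h, c], dim=-1)))

model = TinyRecModel(num_items=1000)
score = model(torch.tensor([3]), torch.tensor([42]))   # predicted interaction probability
print(score.shape)  # torch.Size([1, 1])
```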
[Figure: three classic models containing an Embedding layer: Deep Crossing, FNN, and Wide&Deep]

Summary

[Figure: summary of the Embedding methods covered in this article]

Source: blog.csdn.net/Edward_Legend/article/details/121471952