Paper reading (12) WWW2015 LINE: Large-scale Information Network Embedding

This article is a plain-language translation and interpretation of "LINE: Large-scale Information Network Embedding". The original paper has been uploaded as a personal resource; it will be deleted immediately if it infringes any rights.


Title

《LINE: Large-scale Information Network Embedding》
——WWW2015
Author: Jian Tang

Summary

This paper proposes the LINE algorithm, which treats the direct connection between two nodes as their first-order proximity and builds a neighbor sequence for each node (the weights from that node to each of its neighbors); the similarity between the neighbor sequences of two nodes is their second-order proximity.

By introducing second-order proximity, the algorithm ensures that nodes which share similar neighborhoods but have no direct connection are still embedded close to each other.

1 First-order and second-order proximity

First-order Proximity: for a pair of nodes u and v joined by an edge, the corresponding edge weight $w_{u,v}$ is their first-order proximity.
Second-order Proximity: let $p_u = (w_{u,1}, \ldots, w_{u,|V|})$, i.e., the sequence of weights from u to every other node. The similarity between the two sequences $p_u$ and $p_v$ is the second-order proximity of u and v.
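
To make the two definitions concrete, here is a tiny numpy sketch on a toy graph. The graph itself and the use of cosine similarity to compare the neighbor-weight sequences are illustrative choices, not prescribed by the paper:

```python
import numpy as np

# Toy weighted adjacency matrix for 4 nodes (illustrative only).
# W[u, v] is the weight of edge (u, v); 0 means no edge.
W = np.array([
    [0., 3., 1., 0.],
    [3., 0., 1., 0.],
    [1., 1., 0., 2.],
    [0., 0., 2., 0.],
])

u, v = 0, 1

# First-order proximity: the weight of the edge between u and v.
first_order = W[u, v]

# Second-order proximity: similarity of the neighbor-weight sequences
# p_u = (w_{u,1}, ..., w_{u,|V|}) and p_v; cosine similarity is one
# possible choice of similarity measure.
p_u, p_v = W[u], W[v]
second_order = p_u @ p_v / (np.linalg.norm(p_u) * np.linalg.norm(p_v))

print(first_order)             # 3.0
print(round(second_order, 3))  # 0.1
```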

2 LINE

First-order Proximity (applicable only to undirected graphs):

For an undirected edge (i, j), define a joint probability between the two nodes:
$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,T} \cdot \vec{u}_j)} \tag{1}$$
where in formula (1), $\vec{u}_i$ is the low-dimensional vector representation of node $v_i$. The corresponding empirical probability is:
$$\hat{p}_1(i, j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j) \in E} w_{ij}$$
Minimizing the distance between the two distributions gives the objective function:
$$O_1 = d\big(\hat{p}_1(\cdot,\cdot),\ p_1(\cdot,\cdot)\big) \tag{2}$$
where d(·,·) is a distance between two distributions; the paper measures it with the KL divergence. Substituting the KL divergence and omitting some constants, we get:
$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j) \tag{3}$$
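
As a quick illustration, a minimal numpy sketch of the objective in formula (3); the toy edge list, the embedding dimension, and the function name are assumptions for the example:

```python
import numpy as np

def first_order_loss(U, edges):
    """O_1 = -sum_{(i,j) in E} w_ij * log p_1(v_i, v_j), with
    p_1(v_i, v_j) = sigmoid(u_i . u_j), as in formula (3).

    U:     (|V|, d) array of node embeddings u_i.
    edges: list of (i, j, w_ij) tuples for the undirected edges.
    """
    loss = 0.0
    for i, j, w in edges:
        score = U[i] @ U[j]
        p1 = 1.0 / (1.0 + np.exp(-score))   # formula (1)
        loss -= w * np.log(p1)
    return loss

# Usage on the toy graph above (embedding dimension 2 is arbitrary):
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 2))
edges = [(0, 1, 3.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 2.0)]
print(first_order_loss(U, edges))
```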

Second-order Proximity (applicable to both directed and undirected graphs):

Each node plays two roles: the node itself and a context for other nodes. When node $v_i$ acts as the context of other nodes, its representation is the separate vector $\vec{u}_i'$. For each directed edge (i, j), define the conditional probability:
$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\,T} \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\,T} \cdot \vec{u}_i)} \tag{4}$$
where |V| is the number of context nodes, i.e., the whole node set of the network, over which formula (4) defines the conditional distribution $p_2(\cdot \mid v_i)$. Minimizing the distance between this distribution and its empirical counterpart gives the objective:
$$O_2 = \sum_{i \in V} \lambda_i \, d\big(\hat{p}_2(\cdot \mid v_i),\ p_2(\cdot \mid v_i)\big) \tag{5}$$
Because different nodes in a network may differ in importance, $\lambda_i$ is introduced to represent the influence of node $v_i$; it can be measured by the degree or estimated by algorithms such as PageRank. The empirical distribution $\hat{p}_2$ is:
$$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}, \qquad d_i = \sum_{k \in N(i)} w_{ik}$$
where N(i) is the set of out-neighbors of $v_i$ and $d_i$ is its out-degree. The paper sets $\lambda_i = d_i$; replacing d(·,·) in formula (5) with the KL divergence and omitting constants gives:
$$O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i) \tag{6}$$
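
A minimal sketch of formulas (4) and (6) with the full softmax over all nodes; the function name and dense-matrix representation are illustrative. This exact computation is what the negative sampling in the next section avoids:

```python
import numpy as np

def second_order_loss(U, Uc, edges):
    """O_2 = -sum_{(i,j) in E} w_ij * log p_2(v_j | v_i), formula (6),
    where p_2 is the softmax over the whole node set from formula (4).

    U:     (|V|, d) vertex vectors u_i.
    Uc:    (|V|, d) context vectors u'_i.
    edges: list of (i, j, w_ij) directed edges.
    """
    loss = 0.0
    for i, j, w in edges:
        scores = Uc @ U[i]                          # u'_k . u_i for every k
        log_p2 = scores[j] - np.log(np.exp(scores).sum())
        loss -= w * log_p2
    return loss
```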

3 Model optimization

Optimizing formula (6) requires a sum over the entire node set for each conditional probability, which is computationally expensive, so negative sampling is introduced. For each edge (i, j), the objective becomes:
$$\log \sigma(\vec{u}_j'^{\,T} \cdot \vec{u}_i) + \sum_{n=1}^{K} E_{v_n \sim P_n(v)}\big[\log \sigma(-\vec{u}_n'^{\,T} \cdot \vec{u}_i)\big] \tag{7}$$
where $\sigma(x) = 1/(1 + \exp(-x))$ is the sigmoid function, $K$ is the number of negative edges, and the noise distribution is $P_n(v) \propto d_v^{3/4}$ with $d_v$ the out-degree of node $v$.
Formula (3) can also be optimized in the negative-sampling form of formula (7), simply replacing $\vec{u}_j'^{\,T}$ with $\vec{u}_j^{\,T}$. Formula (7) itself is optimized with asynchronous stochastic gradient descent (ASGD); the gradient with respect to $\vec{u}_i$ is:
$$\frac{\partial O_2}{\partial \vec{u}_i} = w_{ij} \cdot \frac{\partial \log p_2(v_j \mid v_i)}{\partial \vec{u}_i}$$
The edge weight multiplies the gradient, which makes the choice of learning rate difficult when weights vary widely: a learning rate chosen for the small-weight edges makes the gradients on large-weight edges explode, while one chosen for the large-weight edges makes the gradients on small-weight edges too small.

To solve this problem, the paper first considers unfolding each weighted edge into multiple binary edges (an edge of weight w becomes w binary edges). However, this leads to excessive memory requirements, especially when the edge weights are large. Instead, the paper samples edges from the original graph with probability proportional to their weights and treats each sampled edge as a binary edge; this edge sampling keeps the gradients well behaved, and with the alias method each sample is drawn in O(1) time.
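
Putting the pieces together, here is a sketch of an edge-sampling training loop for the second-order model under formula (7). The hyperparameters and the use of np.random.choice are illustrative assumptions; the paper draws each sample in O(1) with the alias method. Note that every sampled edge is treated as binary, so the weight never multiplies the gradient:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_second_order(W, dim=2, K=5, lr=0.025, n_samples=10000, seed=0):
    """Edge-sampling training loop for the negative-sampled objective (7).

    W: (|V|, |V|) weighted adjacency matrix; W[i, j] = w_ij.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    U = rng.normal(scale=0.1, size=(n, dim))   # vertex vectors u_i
    Uc = np.zeros((n, dim))                    # context vectors u'_i

    src, dst = np.nonzero(W)
    weights = W[src, dst]
    edge_prob = weights / weights.sum()        # sample edges by weight
    noise = W.sum(axis=1) ** 0.75              # P_n(v) ~ d_v^(3/4)
    noise /= noise.sum()

    for _ in range(n_samples):
        e = rng.choice(len(src), p=edge_prob)  # weighted edge sampling
        i, j = src[e], dst[e]
        negs = rng.choice(n, size=K, p=noise)  # K negative samples
        grad_i = np.zeros(dim)
        # label 1 for the observed context j, 0 for the negatives
        for node, label in [(j, 1.0)] + [(v, 0.0) for v in negs]:
            g = label - sigmoid(Uc[node] @ U[i])
            grad_i += g * Uc[node]
            Uc[node] += lr * g * U[i]          # update context vector
        U[i] += lr * grad_i                    # update vertex vector
    return U, Uc
```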

4 Discussion

Low-degree vertices:
Nodes with low degree are embedded with low accuracy because they have few neighbors. The paper therefore expands each such node's neighborhood with its second-order neighbors, i.e., the neighbors of its neighbors. The weight between node $v_i$ and a second-order neighbor $v_j$ is:
$$w_{ij} = \sum_{k \in N(i)} w_{ik} \, \frac{w_{kj}}{d_k}$$
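
A small numpy sketch of this neighborhood expansion; the function name and the adjacency-matrix representation are assumptions for the example:

```python
import numpy as np

def second_order_neighbor_weights(W, i):
    """Weights between node i and its second-order neighbors
    (neighbors of neighbors), per the formula above:
        w_ij = sum_{k in N(i)} w_ik * (w_kj / d_k)

    W: (|V|, |V|) weighted adjacency matrix; W[i, k] = w_ik.
    Returns a length-|V| vector of expanded weights w_ij.
    """
    d = W.sum(axis=1)                  # out-degree d_k of every node
    w_i = np.zeros(W.shape[0])
    for k in np.nonzero(W[i])[0]:      # k in N(i)
        w_i += W[i, k] * W[k] / d[k]
    return w_i
```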
New vertices:
To represent a newly arrived node whose connections to existing nodes are known, the empirical distributions can be obtained directly. The embedding of the new node then follows from formula (3) or (6): keep the existing embeddings fixed and minimize either of the objective functions:
$$-\sum_{j \in N(i)} w_{ji} \log p_1(v_j, v_i) \qquad \text{or} \qquad -\sum_{j \in N(i)} w_{ji} \log p_2(v_j \mid v_i)$$
where N(i) is the set of existing neighbors of the new node $v_i$.
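
A sketch of embedding a new vertex this way with the first-order objective, updating only the new vertex's vector and keeping all existing embeddings fixed; the function name, learning rate, and step count are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def embed_new_vertex(U, nbrs, weights, dim, lr=0.025, steps=200, seed=0):
    """Minimize -sum_j w_ji * log p_1(v_j, v_i) over the new node's
    known neighbors by gradient descent, with existing embeddings U frozen.

    nbrs:    indices of the existing neighbors of the new node.
    weights: the corresponding edge weights w_ji.
    """
    rng = np.random.default_rng(seed)
    u_new = rng.normal(scale=0.1, size=dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for j, w in zip(nbrs, weights):
            # d/du log sigmoid(u_j . u) = (1 - sigmoid(u_j . u)) * u_j
            grad += w * (1.0 - sigmoid(U[j] @ u_new)) * U[j]
        u_new += lr * grad            # gradient ascent on log-likelihood
    return u_new
```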
If the new node is not connected to any known node, we have to resort to additional information, such as the node's textual features; the paper leaves this as future work.

