A Beginner's Guide to Graph Models: Basic Concepts (1)

1. Basic concepts of graphs

  • Node

    Nodes fall into two categories: hidden nodes and observation nodes; edges can be directed or undirected. From the point of view of probability theory, a probabilistic graphical model is a probability distribution: the nodes in the graph correspond to random variables, and the edges correspond to dependencies between those variables. Given a practical problem, we usually observe some data and hope to dig out the knowledge hidden in it. How can probabilistic graphical models mine this hidden knowledge? Normally we construct a graph: observation nodes represent the observed data, hidden nodes represent the latent knowledge, and edges describe the relationship between knowledge and data, which together define a probability distribution. Once the distribution is given, knowledge is acquired through two tasks: inference (given the observation nodes, infer the posterior distribution of the hidden nodes) and learning (estimate the parameters of the probability distribution). A minimal inference example is sketched after this list.

    To learn probabilistic graphical models, first understand the basic definitions and forms of graph theory: https://zhuanlan.zhihu.com/p/26133450

  1. Edge

    • Directed
      • Direction
        • Out-degree: the number of edges pointing from the node to other nodes
        • In-degree: the number of edges pointing from other nodes to the node
      • Connectivity
        • Strongly connected: A can reach B and B can reach A (for example, A points to B, B points to C, and C points to A, so A and B are strongly connected); a set of nodes in which every pair is mutually reachable forms a "strongly connected component" (strongly connected components)
        • Weakly connected: connected once edge directions are ignored, but not strongly connected
    • Undirected
  2. A directed graph (directed and acyclic) is also called a Bayesian network or a belief network; an undirected graph is also called a Markov network.

    Probabilistic graphical model system: HMM, MEMM, CRF: https://zhuanlan.zhihu.com/p/33397147

  3. Bipartite graph: the nodes form two groups, with edges only between the groups and none within a group
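
To make the inference task described under "Node" concrete, here is a minimal hand-rolled sketch in plain Python (no PGM library): one binary hidden node H with an edge to one observed node X, where the posterior P(H | X) is computed by Bayes' rule. All probability values are made up for illustration.

```python
# Minimal inference sketch: one hidden node H, one observation node X.
# The edge H -> X carries the conditional distribution P(X | H).
# All probability values below are illustrative, not from real data.

p_h = {0: 0.7, 1: 0.3}                    # prior P(H)
p_x_given_h = {0: {0: 0.9, 1: 0.1},       # P(X | H=0)
               1: {0: 0.2, 1: 0.8}}       # P(X | H=1)

def posterior(x_obs):
    """Infer P(H | X = x_obs) by Bayes' rule."""
    joint = {h: p_h[h] * p_x_given_h[h][x_obs] for h in p_h}
    z = sum(joint.values())               # evidence P(X = x_obs)
    return {h: joint[h] / z for h in joint}

print(posterior(1))  # posterior over the hidden node given X = 1
```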

2. Networks with special structure

Random graph: a graph generated by a random process; a sampling sketch appears below. https://zh.wikipedia.org/wiki/%E9%9A%8F%E6%9C%BA%E5%9B%BE

Scale-free network: https://zh.wikipedia.org/wiki/%E6%97%A0%E5%B0%BA%E5%BA%A6%E7%BD%91%E7%BB%9C
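
As a quick illustration of random graphs (and of the degree-distribution point in the next section), this sketch samples an Erdős–Rényi G(n, p) graph with networkx and compares its empirical degree distribution to the Poisson approximation; the n and p values are arbitrary illustrative choices.

```python
import networkx as nx
from collections import Counter
from math import exp, factorial

n, p = 2000, 0.002                      # illustrative size and edge probability
G = nx.gnp_random_graph(n, p, seed=42)  # Erdos-Renyi G(n, p) random graph

counts = Counter(d for _, d in G.degree())
lam = p * (n - 1)                       # expected degree, about 4 here
for k in range(9):
    empirical = counts.get(k, 0) / n
    poisson = exp(-lam) * lam ** k / factorial(k)
    print(f"k={k}: empirical {empirical:.3f}  Poisson {poisson:.3f}")
```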

3. Metrics for measuring a network

  1. Degree distribution

    The basic concept of degree: the degree of a node is the number of edges connected to it.

    Degree distribution: between any two nodes there is either a link (1) or no link (0), so a node can link to at most the other n−1 nodes, and in a random graph its degree follows a binomial distribution. As n goes to infinity this binomial converges to a Poisson distribution; scale-free networks, by contrast, have degree distributions that follow a power law. See https://zh.wikipedia.org/wiki/%E5%BA%A6%E5%88%86%E5%B8%83 and the sketch after this list.

  2. Path length and graph diameter

    Concept: the path length d(u,v) is the minimum distance between two nodes u and v in the graph; in a directed graph, d(u,v) is not necessarily equal to d(v,u). The diameter is the maximum of d(u,v) over all pairs of nodes.

  3. Clustering coefficient

    The calculation formula is: (the number of edges actually present among a node's N neighbor nodes) / (the maximum number of edges those N nodes could have between them), where the maximum number of edges among N nodes is N(N−1)/2.

  4. Connected components: maximal sets of nodes that are mutually reachable (a sketch computing all four metrics follows this list)
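
A short sketch computing these metrics with the networkx library on toy graphs; the edge lists are made up purely for illustration.

```python
import networkx as nx

# Toy undirected graph; edges chosen only for illustration.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (5, 6)])

# Degree and degree distribution
degrees = dict(G.degree())                       # {node: degree}
print("degrees:", degrees)

# Shortest path length and diameter (on the largest connected component)
comp = G.subgraph(max(nx.connected_components(G), key=len))
print("d(1,4):", nx.shortest_path_length(comp, 1, 4))
print("diameter:", nx.diameter(comp))

# Local clustering coefficient of each node
print("clustering:", nx.clustering(G))

# Connected components (undirected) and strongly connected components (directed)
print("components:", list(nx.connected_components(G)))
D = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])  # cycle 1->2->3->1 plus 3->4
print("SCCs:", list(nx.strongly_connected_components(D)))
```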

4. Graph representation learning based on graph structure

The core idea is to use graph theory, data mining, and related methods so that the learned vectors retain as much of the graph's topological information as possible. Earlier approaches used one-hot or n-hot encodings, but their complexity is high and they blow up combinatorially (dimension explosion), so current methods mainly use random-walk sampling to obtain large numbers of node sequences, which are then embedded as vectors, much as word2vec embeds sentences.

DeepWalk

Uses uniform random walks (each step jumps to a neighbor with equal probability) to generate node sequences, which are then fed to skip-gram (word2vec) to learn embeddings; a sketch follows.
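
A minimal DeepWalk-style sketch, assuming networkx for the graph and gensim's Word2Vec for the skip-gram step; the walk length, walk count, and embedding hyperparameters are illustrative, not tuned.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, length):
    """Uniform random walk: each step picks a neighbor with equal probability."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]  # Word2Vec expects string tokens

G = nx.karate_club_graph()  # small built-in example graph
walks = [random_walk(G, node, length=10)
         for _ in range(20) for node in G.nodes()]

# Skip-gram over the walks, treating each walk as a "sentence"
model = Word2Vec(walks, vector_size=64, window=5, sg=1, min_count=0)
print(model.wv["0"][:5])  # embedding of node 0 (first 5 dimensions)
```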

node2vec

Replaces the equal-probability jumps with biased (weighted) jumps, controlled by a return parameter p and an in-out parameter q that trade off BFS-like and DFS-like exploration; a sketch of the bias follows.
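
A sketch of just the node2vec transition bias on an unweighted networkx graph; p and q are the return and in-out parameters, and the weights follow the scheme described above.

```python
import random
import networkx as nx

def biased_step(G, prev, cur, p=1.0, q=2.0):
    """node2vec-style step: weight 1/p to return to prev, 1 to a common
    neighbor of prev and cur, and 1/q to move further away."""
    neighbors = list(G.neighbors(cur))
    weights = []
    for nxt in neighbors:
        if nxt == prev:
            weights.append(1.0 / p)   # return to the previous node
        elif G.has_edge(nxt, prev):
            weights.append(1.0)       # stays at distance 1 from prev
        else:
            weights.append(1.0 / q)   # explores outward (distance 2 from prev)
    return random.choices(neighbors, weights=weights)[0]

G = nx.karate_club_graph()
print(biased_step(G, prev=0, cur=1))  # one biased step of a walk 0 -> 1 -> ?
```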

struc2vec

Constructs a new graph from the existing one in a way that preserves local structural features, then runs random walks on that new graph

metapath2vec

Learns node features in heterogeneous graphs (graphs with multiple node types) by constraining the random walk to follow a predefined meta-path; a sketch follows.
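
A sketch of a meta-path-guided walk on a heterogeneous graph; the node-type dictionary and the author-paper meta-path are made-up examples, not from the original paper's data.

```python
import random

# Hypothetical heterogeneous graph: node -> neighbors, node -> type
neighbors = {"a1": ["p1", "p2"], "a2": ["p1"], "a3": ["p2"],
             "p1": ["a1", "a2"], "p2": ["a1", "a3"]}
node_type = {"a1": "author", "a2": "author", "a3": "author",
             "p1": "paper", "p2": "paper"}

def metapath_walk(start, metapath, length):
    """Walk that only steps to neighbors whose type matches the meta-path."""
    walk, i = [start], 0
    while len(walk) < length:
        i = (i + 1) % len(metapath)        # next required node type
        options = [n for n in neighbors[walk[-1]] if node_type[n] == metapath[i]]
        if not options:
            break
        walk.append(random.choice(options))
    return walk

print(metapath_walk("a1", ["author", "paper"], length=7))  # a-p-a-p-... walk
```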

5. Graph representation learning based on graph features

GCN

GNN

  1. Basic GNN
  2. GNN with self-loops (sketched below)
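
Since this section only names the models, here is a minimal numpy sketch of one GCN layer, using the standard propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the added identity matrix is exactly the "self-loops" idea in item 2. The adjacency, features, and weights are random stand-ins for illustration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees of A_hat
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0)      # ReLU activation

# Toy example: 4 nodes, 3 input features, 2 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))                   # node feature matrix
W = rng.normal(size=(3, 2))                   # layer weights
print(gcn_layer(A, H, W))
```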

References

Random graph:

  1. http://www.qzu5.com/r.htm
  2. https://zh.wikipedia.org/wiki/%E9%9A%8F%E6%9C%BA%E5%9B%BE
  3. https://blog.csdn.net/qq_34213260/article/details/107472115
