Network Motif Literature Survey

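Overview: Network motifs are subgraph patterns that recur frequently in a network and are regarded as the "building blocks" of complex networks. Two papers published in Science give widely cited characterizations: (1) Milo et al. describe motifs as "recurring, significant patterns of inter-connections"; (2) Benson et al. write: "The most common higher-order structures are small network subgraphs, which we refer to as network motifs. Network motifs are considered building blocks for complex networks." Motifs have many applications on networks and graphs, such as role discovery, link prediction, and protein identification. This survey reviews the related literature to give an overview of research progress on motifs.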

The survey covers the following references:

[1] MILO, Ron, et al. Network motifs: simple building blocks of complex networks[J]. Science, 2002, 298.5594: 824-827.

[2] BENSON, Austin R.; GLEICH, David F.; LESKOVEC, Jure. Higher-order organization of complex networks[J]. Science, 2016, 353.6295: 163-166.

[3] Kashtan N, Itzkovitz S, Milo R, et al. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs[J]. Bioinformatics, 2004, 20(11): 1746-1758.

[4] Schreiber F, Schwöbbermeyer H. MAVisto: a tool for the exploration of network motifs[J]. Bioinformatics, 2005, 21(17): 3572-3574.

[5] Wernicke S, Rasche F: FANMOD: a tool for fast network motif detection[J]. Bioinformatics 2006, 22:1152-1153.

[6] Kashani Z R M, Ahrabian H, Elahi E, et al. Kavosh: a new algorithm for finding network motifs[J]. BMC bioinformatics, 2009, 10(1): 318.

[7] Zhang Y, Parthasarathy S. Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks[C]. International Conference on Data Engineering (ICDE), 2012: 1049-1060.

[8] Lin W, Xiao X, Xie X, et al. Network motif discovery: A GPU approach[C]. International Conference on Data Engineering (ICDE), 2015: 831-842.

[9] Wang T, Peng J, Peng Q, et al. FSM: Fast and scalable network motif discovery for exploring higher-order network organizations[J]. Methods, 2020: 83-93.

[10] Wang L, Ren J, Xu B, et al. MODEL: Motif-Based Deep Feature Learning for Link Prediction[J]. IEEE Transactions on Computational Social Systems, 2020: 1-14.

[11] Yu Y, Lu Z, Liu J, et al. RUM: Network Representation Learning Using Motifs[C]. International Conference on Data Engineering (ICDE), 2019: 1382-1393.

[12] Monti F, Otness K, Bronstein M M, et al. MotifNet: a motif-based Graph Convolutional Network for directed graphs[J]. arXiv: Learning, 2018.

[13] Rossi R A, Ahmed N K, Koh E, et al. A structural graph representation learning framework[C]. International Conference on Web Search and Data Mining (WSDM), 2020: 483-491.

[14] Xia F, Wei H, Yu S, et al. A Survey of Measures for Network Motifs[J]. IEEE Access, 2019: 106576-106587.

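References [1] and [2] are the authoritative papers on motifs; [3]-[6] address how to find motifs; [7]-[9] are optimizations and extensions of motif discovery; [10]-[13] use motifs for network representation learning; [14] surveys motif-related measures. Brief summaries of some of these papers follow.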

Reference [1]: MILO, Ron, et al. Network motifs: simple building blocks of complex networks[J]. Science, 2002, 298.5594: 824-827.

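Overview: Published in Science, this authoritative paper is one of the earliest to introduce the concept of motifs. It describes motifs as "recurring, significant patterns of inter-connections". The paper does not give a formal definition. It takes a statistical view instead: network motifs are subgraphs that occur frequently, with a higher occurrence count than in randomized networks. The randomized networks serve as a baseline for the real network: they have the same number of nodes, and each node keeps the same number of incoming and outgoing edges as in the real network. (The network motifs are those patterns for which the probability P of appearing in a randomized network an equal or greater number of times than in the real network is lower than a cutoff value.) The paper then uses a Z score to measure how over-represented a subgraph is; the larger the Z score, the more likely the subgraph is a motif. The formula is $Z = (N_{\mathrm{real}} - \langle N_{\mathrm{rand}} \rangle) / \sigma(N_{\mathrm{rand}})$, where $N_{\mathrm{real}}$ is the number of occurrences in the real network and $\langle N_{\mathrm{rand}} \rangle$ and $\sigma(N_{\mathrm{rand}})$ are the mean and standard deviation of the counts in the randomized networks. Examples from several real networks are given in the paper's figures. The paper also reports a P value, for which smaller is better. See the paper for further details.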
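As a small illustration of the statistics described above, the following sketch computes a Z score and an empirical P value for one subgraph type. The function name `motif_significance` and the sample counts are made up for this example; the P value is estimated simply as the fraction of randomized networks whose count reaches the real count.

```python
import statistics


def motif_significance(n_real, n_rand):
    """Return (z_score, empirical_p_value) for one subgraph type.

    n_real: occurrence count in the real network.
    n_rand: list of occurrence counts, one per randomized network.
    """
    mean_rand = statistics.mean(n_rand)
    std_rand = statistics.pstdev(n_rand)  # population standard deviation
    if std_rand > 0:
        z = (n_real - mean_rand) / std_rand
    elif n_real == mean_rand:
        z = 0.0
    else:
        # All randomized counts are identical: Z is unbounded.
        z = float("inf") if n_real > mean_rand else float("-inf")
    # Fraction of randomized networks with at least as many occurrences.
    p = sum(1 for c in n_rand if c >= n_real) / len(n_rand)
    return z, p


# Hypothetical counts: 40 occurrences in the real network,
# around 10 in each of five randomized networks.
z_score, p_value = motif_significance(40, [10, 12, 8, 11, 9])
```

With only a handful of randomized networks the empirical P value is coarse; in practice many more randomized networks would be needed for a meaningful cutoff.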

Reference [2]: BENSON, Austin R.; GLEICH, David F.; LESKOVEC, Jure. Higher-order organization of complex networks[J]. Science, 2016, 353.6295: 163-166.

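Overview: This paper was also published in Science. It describes network motifs as the most common higher-order structures, the building blocks of complex networks. (The most common higher-order structures are small network subgraphs, which we refer to as network motifs. Network motifs are considered building blocks for complex networks.) Like [1], it gives no formal definition and characterizes motifs through description and illustrations; see the figures in the paper.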

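Framework proposed in the paper and how it uses motifs: the paper does not describe how to generate motifs. It takes a given motif M and builds a weight matrix whose entry (i, j) is the number of instances of M in which nodes i and j both appear. "Given motif" here means that a motif type is fixed in advance, and all subgraphs of the original network that match this type are then located. This involves subgraph isomorphism checking, which the paper does not discuss in detail. Once the weight matrix for M is built, a dedicated objective function is used to partition the graph and produce the clustering. The objective is the motif conductance $\phi_M(S) = \mathrm{cut}_M(S, \bar{S}) / \min(\mathrm{vol}_M(S), \mathrm{vol}_M(\bar{S}))$, where $\mathrm{cut}_M(S, \bar{S})$ counts the instances of M with nodes on both sides of the partition and $\mathrm{vol}_M(S)$ counts the motif-instance end points that lie in S.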
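To make the weight matrix concrete, here is a minimal sketch for one specific motif, the undirected triangle. For that case the co-occurrence count of nodes i and j is the number of their common neighbours, provided i and j are themselves linked. The function names and the toy graph are assumptions for illustration. The paper's motif conductance is defined over motif instances; `conductance` below computes the ordinary weighted conductance on W as a simple stand-in.

```python
import numpy as np


def triangle_motif_adjacency(A):
    """W[i, j] = number of triangles that contain both i and j.

    A is a symmetric 0/1 adjacency matrix with a zero diagonal.
    """
    A = np.asarray(A)
    # (A @ A)[i, j] counts common neighbours of i and j; multiplying by A
    # keeps only pairs that are directly linked, i.e. closed triangles.
    return (A @ A) * A


def conductance(W, S):
    """Weighted conductance of node set S on the weighted graph W.

    Assumes both sides of the partition have non-zero volume.
    """
    W = np.asarray(W, dtype=float)
    in_S = np.zeros(len(W), dtype=bool)
    in_S[list(S)] = True
    cut = W[in_S][:, ~in_S].sum()      # weight crossing the partition
    vol_S = W[in_S].sum()              # weighted degree sum inside S
    vol_rest = W[~in_S].sum()          # weighted degree sum outside S
    return cut / min(vol_S, vol_rest)


# Toy graph: two triangles {0, 1, 2} and {1, 2, 3} sharing the edge 1-2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
])
W = triangle_motif_adjacency(A)
phi = conductance(W, [0, 1])
```

In this toy graph the shared edge 1-2 receives weight 2 because it belongs to both triangles, so a partition that cuts it is penalized more than one that cuts an edge belonging to a single triangle.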

See the paper for further details.

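Summary: if M denotes the set of motif types and Mi denotes one particular type, then on a given graph a single motif type Mi corresponds to many subgraphs. All subgraphs matching Mi can be located in the graph, and each of them is a subset of the original network.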

References [3]-[6]: how to find network motifs, including enumeration, identification, and classification.

[3] Kashtan N, Itzkovitz S, Milo R, et al. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs[J]. Bioinformatics, 2004, 20(11): 1746-1758.

[4] Schreiber F, Schwöbbermeyer H. MAVisto: a tool for the exploration of network motifs[J]. Bioinformatics, 2005, 21(17): 3572-3574.

[5] Wernicke S, Rasche F: FANMOD: a tool for fast network motif detection[J]. Bioinformatics 2006, 22:1152-1153.

[6] Kashani Z R M, Ahrabian H, Elahi E, et al. Kavosh: a new algorithm for finding network motifs[J]. BMC bioinformatics, 2009, 10(1): 318.

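Reference [3] proposes the MFinder algorithm and accompanying software; [4] presents the MAVisto tool; [5] presents the FANMOD tool; [6] proposes the Kavosh algorithm. All of these algorithms and tools are for motif discovery. The software or code is open source; see the links below: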

- http://www.weizmann.ac.il/mcb/UriAlon/download/network-motif-software

- https://github.com/shmohammadi86/Kavosh

Taking reference [6] as an example, the motif discovery workflow is as follows.

Reference [6] proposes the Kavosh algorithm for finding k-size network motifs with lower memory and time cost than other algorithms (We present a new algorithm (Kavosh), for finding k-size network motifs with less memory and CPU time in comparison to other existing algorithms). The algorithm has four subtasks: subgraph enumeration, subgraph classification, random graph generation, and motif identification. (consists of four subtasks: Enumeration: finding all subgraphs of a given size that occur in the input graph; Classification: classifying each found sub-graph into isomorphic groups; Random graph generation: generating random graphs with respect to the input network (enumeration and classification are also performed on random graphs) and Motif identification: distinguishing motifs among all found sub-graphs on basis of statistical parameters.) The four steps are:

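1) Subgraph enumeration: use a tree structure to find all k-size subgraphs.

2) Subgraph classification: label each k-size subgraph so that different structures can be told apart. Concretely, an adjacency matrix is built for each subgraph and passed to NAUTY, which produces a canonical labeling that identifies the subgraph's isomorphism class. NAUTY is a well-known tool for graph isomorphism testing and canonical labeling.

3) Random graph generation: the random graphs have the same number of nodes and the same degree distribution (in-degree and out-degree) as the input graph. In practice several random graphs are generated, and enumeration and classification are run on each of them as well.

4) Motif identification: compare the subgraph statistics of the real network with the averaged subgraph statistics of the random networks, compute measures such as the Z score and P value, and report the subgraphs that are significantly over-represented as motifs.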
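The four steps above can be sketched end to end on a small undirected graph. This is not the Kavosh implementation: enumeration is brute force over node combinations instead of Kavosh's tree-based search, the canonical label is the smallest flattened adjacency matrix over all node orderings (a stand-in for NAUTY that is only practical for very small k), and random graphs come from degree-preserving double-edge swaps. All function names are made up for this sketch, and nodes are assumed to be integers.

```python
import itertools
import random
import statistics
from collections import Counter


def is_connected(adj, nodes):
    """Check whether the subgraph induced by `nodes` is connected."""
    node_set, seen, stack = set(nodes), {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u] & node_set:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def canonical_label(adj, nodes):
    """Step 2: smallest flattened adjacency matrix over all orderings."""
    return min(
        tuple(int(v in adj[u]) for u in perm for v in perm)
        for perm in itertools.permutations(nodes)
    )


def count_classes(adj, k):
    """Steps 1 and 2: enumerate connected k-node subgraphs and classify."""
    return Counter(
        canonical_label(adj, nodes)
        for nodes in itertools.combinations(sorted(adj), k)
        if is_connected(adj, nodes)
    )


def randomize(adj, n_swaps, rng):
    """Step 3: degree-preserving rewiring by double-edge swaps."""
    new = {u: set(vs) for u, vs in adj.items()}
    edges = [(u, v) for u in new for v in new[u] if u < v]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or d in new[a] or b in new[c]:
            continue
        new[a].remove(b); new[b].remove(a)
        new[c].remove(d); new[d].remove(c)
        new[a].add(d); new[d].add(a)
        new[c].add(b); new[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
    return new


def find_motifs(adj, k=3, n_random=20, seed=0):
    """Step 4: Z score of each subgraph class against random graphs."""
    rng = random.Random(seed)
    real = count_classes(adj, k)
    n_edges = sum(len(vs) for vs in adj.values()) // 2
    rand = [count_classes(randomize(adj, 10 * n_edges, rng), k)
            for _ in range(n_random)]
    scores = {}
    for label, n_real in real.items():
        samples = [counts[label] for counts in rand]
        mean, std = statistics.mean(samples), statistics.pstdev(samples)
        if std > 0:
            scores[label] = (n_real - mean) / std
        else:
            scores[label] = float("inf") if n_real > mean else 0.0
    return real, scores


# Toy input: a triangle 0-1-2 with a tail edge 2-3.
example_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
real_counts, z_scores = find_motifs(example_adj, k=3)
```

The brute-force choices keep the sketch short but scale poorly; the point of algorithms such as Kavosh is precisely to make the enumeration and classification steps efficient.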

References [7]-[9]: optimizations and extensions of motif discovery.

[7] Zhang Y, Parthasarathy S. Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks[C]. International Conference on Data Engineering (ICDE), 2012: 1049-1060.

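Reference [7] was published at ICDE, a top conference in data engineering. The paper discusses two motifs, the K-Core and the Triangle K-Core. A K-Core is a subgraph in which every node has degree at least K. A Triangle K-Core is a subgraph in which every edge is contained in at least K triangles of the subgraph, as illustrated in the paper's figure. The paper shows that a Triangle K-Core is closer to a clique than a K-Core is, meaning that the nodes inside a Triangle K-Core are more tightly connected. Example: an edge participating in 4 triangles implies a subgraph of 6 nodes and 9 edges (in the worst case); and it is closer to a 6-node clique (density: 9/15=60%, in the worst case). The Triangle K-Core therefore serves as a good higher-order local structure without the extremely strict connectivity requirement of a clique.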
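The Triangle K-Core definition can be illustrated with a naive peeling procedure: repeatedly delete any edge that lies in fewer than K triangles until no such edge remains. This is a minimal sketch under that definition, not the extraction algorithm from the paper, and the function name `triangle_k_core` is chosen for this example.

```python
def triangle_k_core(edges, k):
    """Return the edges of the maximal subgraph in which every edge
    lies in at least k triangles (naive iterative peeling)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # The number of triangles through edge (u, v) equals the
                # number of common neighbours of u and v.
                if len(adj[u] & adj[v]) < k:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True  # removals can lower other supports

    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 4-clique plus one pendant edge: every clique edge lies in 2 triangles,
# while the pendant edge (3, 4) lies in none.
clique_edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
core_edges = triangle_k_core(clique_edges + [(3, 4)], k=2)
```

Removing one edge can reduce the triangle count of its neighbours, which is why the loop repeats until a full pass makes no change.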

[8] Lin W, Xiao X, Xie X, et al. Network motif discovery: A GPU approach[C]. International Conference on Data Engineering (ICDE), 2015: 831-842.

[9] Wang T, Peng J, Peng Q, et al. FSM: Fast and scalable network motif discovery for exploring higher-order network organizations[J]. Methods, 2020: 83-93.

References [10]-[13]: papers that use motifs for network embedding.

[10] Wang L, Ren J, Xu B, et al. MODEL: Motif-Based Deep Feature Learning for Link Prediction[J]. IEEE Transactions on Computational Social Systems, 2020: 1-14.

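Overview: Subgraphs are extracted from G according to a motif, yielding one motif instance (several nodes) and one negative-sample node. Each node is then fed into an autoencoder (a stack of layers with nonlinear activation functions) for learning. A dedicated objective is designed so that the vectors of nodes within a motif move closer together while the motif's nodes move farther away from the negative-sample node. The resulting node embeddings are used for link prediction.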

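The model diagram and the loss function are given in the paper. In short, for a given motif type, a motif instance (several nodes) and a negative-sample node are drawn from G, each node passes through an autoencoder, and the loss rewards small distances between nodes of the same motif and large distances between motif nodes and the negative-sample node.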
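The exact loss is defined in the paper and is not reproduced here. Purely to illustrate the stated goal (motif nodes close together, negative sample far away), the sketch below uses a generic hinge-style margin loss on squared Euclidean distances. The function name, the margin parameter, and the choice of loss are all assumptions and should not be read as the MODEL objective.

```python
import numpy as np


def motif_margin_loss(motif_emb, neg_emb, margin=1.0):
    """Illustrative margin loss for one motif instance (assumed form).

    motif_emb: array of shape (m, d), embeddings of the m motif nodes.
    neg_emb:   array of shape (d,), embedding of the negative-sample node.
    """
    motif_emb = np.asarray(motif_emb, dtype=float)
    neg_emb = np.asarray(neg_emb, dtype=float)
    m = len(motif_emb)
    # Squared distances between every ordered pair of motif nodes.
    diffs = motif_emb[:, None, :] - motif_emb[None, :, :]
    pair_dist = (diffs ** 2).sum(axis=-1)
    pos = pair_dist.sum() / (m * (m - 1))  # mean over pairs with i != j
    # Mean squared distance from the motif nodes to the negative sample.
    neg = ((motif_emb - neg_emb) ** 2).sum(axis=-1).mean()
    # Zero once the negative is at least `margin` farther than the
    # motif nodes are from each other.
    return float(max(0.0, margin + pos - neg))


triangle_emb = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
loss_far = motif_margin_loss(triangle_emb, [3.0, 3.0])
loss_near = motif_margin_loss(triangle_emb, [0.0, 0.0])
```

A negative sample that sits far from the motif contributes no loss, while one that sits among the motif nodes is penalized, which matches the direction of the objective described above.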

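Assessment: motif discovery in this paper relies on prior work, as does the construction of the autoencoder. The novelty lies in combining the two and in using multiple encoders. The task addressed is link prediction. (This method seamlessly incorporates motifs and deep learning model into link prediction.)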

[11] Yu Y, Lu Z, Liu J, et al. RUM: Network Representation Learning Using Motifs[C]. International Conference on Data Engineering (ICDE), 2019: 1382-1393.

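Overview: Published at ICDE 2019, this paper proposes RUM, a model that uses motifs for network representation learning. It introduces two strategies, MotifWalk and MotifRe-weighting, to produce motif-aware network embeddings. As the schematic in the paper shows, the approach is a form of network coarsening and is closely related to random walks.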

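The paper gives no formal definition of motifs and relies on illustrations instead. It introduces motifs as follows: A network motif is a small sub-network that represents an elemental and recurring pattern in a network. [20] has given a list for the shapes of motifs. (Here [20] is a citation within that paper.) The paper mainly works with triangle structures, which keeps the time cost low.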

[12] Monti F, Otness K, Bronstein M M, et al. MotifNet: a motif-based Graph Convolutional Network for directed graphs[J]. arXiv: Learning, 2018.

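Overview: The paper combines motifs with GCN to propose the MotifNet model. It gives only a rough definition of motifs, accompanied by an illustration, and builds the motif-based weight matrices directly following Benson et al. These matrices are then combined with GCN for network representation learning.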

[13] Rossi R A, Ahmed N K, Koh E, et al. A structural graph representation learning framework[C]. International Conference on Web Search and Data Mining (WSDM), 2020: 483-491.

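Overview: The paper uses motifs to build K-order weight matrices and then learns graph representations using a principle similar to matrix factorization.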
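One way to read "K-order weight matrices plus matrix factorization" is sketched below: start from a motif weight matrix W, row-normalize it into a transition matrix, take its powers up to K, factorize each power with a truncated SVD, and concatenate the factors. This is an illustrative interpretation under those assumptions, not the framework from the paper; the function name and the toy matrix are made up for the example.

```python
import numpy as np


def k_step_motif_embedding(W, K=2, dim=2):
    """Embed nodes from the first K powers of a motif transition matrix.

    W:   (n, n) non-negative motif weight matrix.
    K:   highest order (number of matrix powers) to use.
    dim: number of singular vectors kept per order (dim <= n).
    Returns an array of shape (n, K * dim).
    """
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1, keepdims=True)
    # Row-normalize; rows of isolated nodes stay all zero.
    P = np.divide(W, deg, out=np.zeros_like(W), where=deg > 0)

    blocks = []
    Pk = np.eye(len(W))
    for _ in range(K):
        Pk = Pk @ P                      # k-step transition matrix
        U, s, _ = np.linalg.svd(Pk)      # singular values in descending order
        blocks.append(U[:, :dim] * np.sqrt(s[:dim]))
    return np.hstack(blocks)


# Toy triangle-motif weight matrix for a 4-node graph.
W_example = np.array([
    [0, 1, 1, 0],
    [1, 0, 2, 1],
    [1, 2, 0, 1],
    [0, 1, 1, 0],
])
embedding = k_step_motif_embedding(W_example, K=2, dim=2)
```

Each block of columns captures structure at one order, so nodes that share motif-based neighbourhoods at several scales end up with similar rows.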
