Higher-order organization of complex networks: a summary

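In a network, higher-order connectivity patterns are the basic structures that control and regulate a complex system. Most higher-order structures are small subgraphs (network motifs), which serve as the building blocks of complex systems. For example, feedforward loops are key elements of regulatory networks, triangles are key to social networks, open bidirectional wedges are key to hub nodes in the brain, and open triangles (two-hop paths) are the key pattern in air traffic networks. This note introduces these higher-order structures and a clustering framework built on them.
Given a network motif $M$, the framework looks for a cluster, that is, a set of nodes $S$, that meets two goals. First, the nodes in $S$ should take part in as many instances of $M$ as possible. Second, $S$ should avoid cutting instances of $M$, meaning that an instance should not have only part of its nodes in $S$. In other words, given $M$, the higher-order clustering method seeks to minimize the motif conductance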
$\phi_M(S)=cut_M(S,\overline{S})/min[vol_M(S),vol_M(\overline{S})]\tag{1}$
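Here $cut_M(S,\overline{S})$ is the number of instances of $M$ that are cut, that is, with at least one node in $S$ and at least one in $\overline{S}$, and $vol_M(S)$ is the number of nodes of instances of $M$ that lie in $S$. The algorithm borrows from spectral clustering and proceeds as follows:
1. Given a network and a motif $M$, compute the motif adjacency matrix $W_M$, whose entry $(i, j)$ is the number of instances of $M$ in which nodes $i$ and $j$ occur together.
2. Compute the spectral ordering $\sigma$ of the nodes from the normalized Laplacian of $W_M$.
3. Among the prefix sets $S_r=\{\sigma_1,\dots,\sigma_r\}$ of the ordering $\sigma$, return the one with the smallest motif conductance: $S:=argmin_r\phi_M(S_r)$.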
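As a small worked instance of (1), assume a hypothetical undirected graph on six nodes with edges 1-2, 1-3, 2-3, 2-4, 3-4, 4-5, 4-6 and 5-6, and take the triangle as the motif $M$. This graph is an illustration only and does not come from the paper. Its triangle instances are $\{1,2,3\}$, $\{2,3,4\}$ and $\{4,5,6\}$. For $S=\{1,2,3\}$ only $\{2,3,4\}$ is cut. The first two instances place $3+2$ nodes in $S$, and the last two place $1+3$ nodes in $\overline{S}$, so

$cut_M(S,\overline{S})=1,\quad vol_M(S)=5,\quad vol_M(\overline{S})=4,\quad \phi_M(S)=\frac{1}{\min(5,4)}=\frac{1}{4}$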
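Our method unifies motif analysis and network clustering, two basic directions in network science, and reveals organizational structure and modules. Earlier analyses gave no worst-case guarantee on the quality of the clustering and did not deal with the cost incurred as the network grows. The theoretical results in the supplementary material also relate the framework to hypergraph clustering, which it extends to directed graphs.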
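Consider an undirected graph $G = (V, E)$ with $|V| = n$, and assume further that it has no isolated nodes. Let $W$ be the weighted adjacency matrix of the graph, $D$ the diagonal degree matrix with $D_{ii}=\sum_{j=1}^nW_{ij}$, and $L = D - W$ the Laplacian. For a set $S$, the conductance $\phi^{(G)}(S)$ is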
$\phi^{(G)}(S)=cut^{(G)}(S,\overline{S})/min(vol^{(G)}(S),vol^{(G)}(\overline{S}))\tag{3}$
$cut^{(G)}(S,\overline{S})=\sum_{i\in S,j\in\overline{S}}W_{ij}\tag{4}$
$vol^{(G)}(S)=\sum_{i\in S}D_{ii}\tag{5}$
$cut^{(G)}(S,\overline{S})=\text{weighted sum of the edges that are cut}\tag{6}$
$vol^{(G)}(S)=\text{weighted number of edge end points in }S\tag{7}$
Let $x$ be the indicator vector of the set $S$: $x_i=1$ if node $i$ is in $S$ and $x_i=0$ if node $i$ is in $\overline{S}$. If the edge $(i, j)$ is cut, then $(x_i-x_j)^2=1$; otherwise $(x_i-x_j)^2=0$. Hence
$x^TLx=\sum_{(i,j)\in E}W_{ij}(x_i-x_j)^2=cut^{(G)}(S,\overline{S})\tag{9}$
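The identity (9) is easy to check numerically. The sketch below uses a made-up weighted graph on four nodes (the weights are arbitrary and chosen only for illustration) and compares the quadratic form with the cut computed directly from definition (4).

```matlab
% Hypothetical weighted undirected graph on 4 nodes (illustration only)
W = [0 2 1 0;
     2 0 0 3;
     1 0 0 1;
     0 3 1 0];
D = diag(sum(W, 2));                    % degree matrix, D_ii = sum_j W_ij
L = D - W;                              % graph Laplacian
x = [1; 1; 0; 0];                       % indicator vector of S = {1, 2}
quad_form = x' * L * x;                 % x^T L x from (9)
in_S = logical(x);
cut_value = sum(sum(W(in_S, ~in_S)));   % definition (4): weights of edges leaving S
fprintf('x^T L x = %g, cut = %g\n', quad_form, cut_value);
```

Here the cut edges are 1-3 (weight 1) and 2-4 (weight 3), so both quantities equal 4.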
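A motif is defined here as a pair $(B,\mathcal{A})$, where $B$ is a $k\times k$ binary matrix and $\mathcal{A}\subseteq\{1,\dots,k\}$ is a set of anchor nodes. $B$ encodes the edge pattern among the $k$ nodes, and $\mathcal{A}$ marks the subset of the motif's nodes that counts for the clustering; in many cases $\mathcal{A}$ is the entire node set of the motif. The selection function $\chi_\mathcal{A}$ takes the entries of a $k$-tuple indexed by $\mathcal{A}$, and $set(\cdot)$ turns an ordered tuple into an unordered set, $set((v_1,v_2,\dots,v_k))=\{v_1,v_2,\dots,v_k\}$. Writing $A_v$ for the $k\times k$ adjacency matrix of the subgraph induced by the nodes of the ordered tuple $v$, the set of motif instances in $G$ is
$M(B,\mathcal{A})=\{(set(v),set(\chi_\mathcal{A}(v)))\mid v\in V^k,\ v_1,\dots,v_k\text{ distinct},\ A_v=B\}\tag{10}$
If $\chi_\mathcal{A}(v)=v$, the motif is called a simple motif; otherwise it is an anchored motif.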
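Using the instance set $M$ from (10), the motif cut, the motif volume and the motif adjacency matrix are
$cut_M^{(G)}(S,\overline{S})=\sum_{(v,\chi_\mathcal{A}(v))\in M}1(\exists\, i,j\in\chi_\mathcal{A}(v)\mid i \in S,j\in\overline{S})\tag{17}$
$vol_M^{(G)}(S)=\sum_{(v,\chi_\mathcal{A}(v))\in M}\sum_{i\in\chi_\mathcal{A}(v)}1(i\in S)\tag{18}$
$(W_M)_{ij}=\sum_{(v,\chi_\mathcal{A}(v))\in M}1(\{i,j\}\subset\chi_\mathcal{A}(v)),\quad i\neq j\tag{20}$
The degree matrix is $(D_M)_{ii}=\sum_{j=1}^n(W_M)_{ij}$ and the motif Laplacian is $L_M=D_M-W_M$. Finally, the normalized motif Laplacian is $\mathcal{L}_M=D_M^{-1/2}L_MD_M^{-1/2}=I-D_M^{-1/2}W_MD_M^{-1/2}$, and the clustering uses its eigenvector for the second smallest eigenvalue.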
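To make definition (20) concrete, the following sketch builds $W_M$ for the undirected triangle motif in two ways: by enumerating instances directly, and by the matrix formulation `(A * A) .* A`. It then forms the normalized motif Laplacian. The six-node graph and all variable names are assumptions made for illustration, and brute-force enumeration is only practical for tiny graphs.

```matlab
% Hypothetical 6-node undirected graph (illustration only)
edges = [1 2; 1 3; 2 3; 2 4; 3 4; 4 5; 4 6; 5 6];
n = 6;
A = zeros(n);
for e = 1:size(edges, 1)
    A(edges(e, 1), edges(e, 2)) = 1;
    A(edges(e, 2), edges(e, 1)) = 1;
end

% Definition (20): for each pair {i, j}, count the triangles containing both
W_def = zeros(n);
triples = nchoosek(1:n, 3);              % all candidate node triples
for t = 1:size(triples, 1)
    v = triples(t, :);
    if A(v(1), v(2)) && A(v(1), v(3)) && A(v(2), v(3))   % v induces a triangle
        pairs = nchoosek(v, 2);
        for p = 1:size(pairs, 1)
            i = pairs(p, 1);
            j = pairs(p, 2);
            W_def(i, j) = W_def(i, j) + 1;
            W_def(j, i) = W_def(j, i) + 1;
        end
    end
end

% Matrix formulation for the undirected triangle motif
W_M = (A * A) .* A;

% Normalized motif Laplacian; every node lies in a triangle here, so d > 0
d = sum(W_M, 2);
D_inv_sqrt = diag(1 ./ sqrt(d));
L_norm = eye(n) - D_inv_sqrt * W_M * D_inv_sqrt;
disp(isequal(W_def, W_M));
```

The edge 2-3 lies in two triangles, so its entry in $W_M$ is 2, while every other edge has weight 1.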

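Lemma 1. Let $G = (V, E)$ be an unweighted directed graph and let $G_M$ be the weighted graph built from the motif, with adjacency matrix $W_M$. If $|\mathcal{A}|\geq 2$, then for any $S\subset V$,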
$vol_M^{(G)}(S)=\frac{1}{|\mathcal{A}|-1}vol^{(G_M)}(S)$
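A short derivation sketch, using only definitions (18) and (20): an instance whose anchor set contains node $i$ adds 1 to $(W_M)_{ij}$ for each of the other $|\mathcal{A}|-1$ anchor nodes $j$, so it adds $|\mathcal{A}|-1$ to $(D_M)_{ii}$. Summing over $i\in S$ gives

$vol^{(G_M)}(S)=\sum_{i\in S}(D_M)_{ii}=\sum_{(v,\chi_\mathcal{A}(v))\in M}\sum_{i\in\chi_\mathcal{A}(v)}1(i\in S)\,(|\mathcal{A}|-1)=(|\mathcal{A}|-1)\,vol_M^{(G)}(S)$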
Lemma 2. Let $x_i,x_j,x_k\in\{-1,1\}$. Then
$4\cdot1(x_i,x_j,x_k\text{ not all the same})=x_i^2+x_j^2+x_k^2-x_ix_j-x_jx_k-x_kx_i$
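The identity can be verified by cases, since $x^2=1$ for each variable. By symmetry it is enough to look at the case where all three values agree and the case where exactly one of them, say $x_i$, differs from the other two:

$x_i=x_j=x_k:\quad 3-(1+1+1)=0$

$x_i=-x_j=-x_k:\quad 3-(-1+1-1)=4$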
Lemma 3. Let $z\in\{0,1\}^n$ and define $x$ by $x_i=1$ if $z_i=1$ and $x_i=-1$ if $z_i=0$. Then for $L = D - W$, $4z^TLz=x^TLx$.
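A derivation sketch, assuming $W$ is symmetric and writing $e$ for the all-ones vector (a symbol introduced here only for illustration): by construction $x=2z-e$, and $Le=(D-W)e=0$ because each row of $W$ sums to the matching diagonal entry of $D$. Expanding the quadratic form then gives

$x^TLx=(2z-e)^TL(2z-e)=4z^TLz-2z^TLe-2e^TLz+e^TLe=4z^TLz$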
Lemma 4. Let $G = (V, E)$ be an unweighted directed graph and let $G_M$ be the weighted graph built from a motif with $|\mathcal{A}|=3$. Then for any $S\subset V$,
$cut^{(G)}_M(S,\overline{S})=\frac{1}{2}cut^{(G_M)}(S,\overline{S})$
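Theorem 5. Let $G = (V, E)$ be an unweighted directed graph and let $G_M$ be the weighted graph with adjacency matrix $W_M$ for a motif with $|\mathcal{A}|=3$. Then for any $S\subset V$,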
$\phi^{(G)}_M(S)=\phi^{(G_M)}(S)$
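Theorem 5 follows from Lemma 1 and Lemma 4, since the factor $\frac{1}{2}$ appears in both the cut and the volume and cancels. The sketch below checks this numerically on a made-up six-node undirected graph with the triangle as motif: the motif side is computed by enumerating instances, the graph side from $W_M$. The graph, the choice $S=\{1,2,3\}$ and the variable names are assumptions for illustration.

```matlab
% Hypothetical 6-node undirected graph with the triangle as motif (illustration only)
edges = [1 2; 1 3; 2 3; 2 4; 3 4; 4 5; 4 6; 5 6];
n = 6;
A = zeros(n);
for e = 1:size(edges, 1)
    A(edges(e, 1), edges(e, 2)) = 1;
    A(edges(e, 2), edges(e, 1)) = 1;
end
in_S = false(n, 1);
in_S([1 2 3]) = true;                    % candidate cluster S = {1, 2, 3}

% Motif side: enumerate triangle instances directly
cut_M = 0;
vol_M_S = 0;
vol_M_Sbar = 0;
triples = nchoosek(1:n, 3);
for t = 1:size(triples, 1)
    v = triples(t, :);
    if A(v(1), v(2)) && A(v(1), v(3)) && A(v(2), v(3))
        k = sum(in_S(v));                % nodes of this instance inside S
        cut_M = cut_M + (k > 0 && k < 3);
        vol_M_S = vol_M_S + k;
        vol_M_Sbar = vol_M_Sbar + (3 - k);
    end
end
phi_motif = cut_M / min(vol_M_S, vol_M_Sbar);

% Graph side: ordinary conductance on the weighted graph G_M
W_M = (A * A) .* A;
deg = sum(W_M, 2);
cut_G = sum(sum(W_M(in_S, ~in_S)));
phi_graph = cut_G / min(sum(deg(in_S)), sum(deg(~in_S)));
fprintf('phi_M = %g, phi on G_M = %g\n', phi_motif, phi_graph);
```

On this graph the motif side gives $1/\min(5,4)$ and the graph side gives $2/\min(10,8)$, both equal to $1/4$.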
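Theorem 6. Let $S$ be the set returned by Algorithm 1 (the three-step procedure above), and let $\phi^*=min_{S'}\phi^{(G)}_M(S')$ be the optimal motif conductance. Then
1. $\phi^{(G)}_M(S)\leq 4\sqrt{\phi^*}$, and
2. $\phi^*\geq\lambda_2/2$, where $\lambda_2$ is the second smallest eigenvalue of $\mathcal{L}_M$.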
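Both bounds can be observed on a toy case. The sketch below assumes the triangle motif on a made-up six-node undirected graph, so by Theorem 5 the motif conductance equals the conductance on $G_M$. It computes $\lambda_2$ with a dense `eig`, runs a plain loop version of the sweep, and finds $\phi^*$ by brute force over all subsets, which is feasible only for very small $n$. It is an illustration, not the paper's implementation.

```matlab
% Hypothetical 6-node undirected graph, triangle motif (illustration only)
edges = [1 2; 1 3; 2 3; 2 4; 3 4; 4 5; 4 6; 5 6];
n = 6;
A = zeros(n);
for e = 1:size(edges, 1)
    A(edges(e, 1), edges(e, 2)) = 1;
    A(edges(e, 2), edges(e, 1)) = 1;
end
W_M = (A * A) .* A;                      % motif adjacency matrix
d = sum(W_M, 2);                         % all positive for this graph
L_norm = eye(n) - diag(1 ./ sqrt(d)) * W_M * diag(1 ./ sqrt(d));
L_norm = (L_norm + L_norm') / 2;         % remove rounding asymmetry before eig
[Z, Lambda] = eig(L_norm);
[lambdas, idx] = sort(diag(Lambda));
lambda2 = lambdas(2);                    % second smallest eigenvalue
y = Z(:, idx(2)) ./ sqrt(d);             % D^{-1/2} z, used for the ordering

% Conductance on G_M of a logical indicator vector s
phi = @(s) sum(sum(W_M(s, ~s))) / min(sum(d(s)), sum(d(~s)));

% Sweep over the prefixes of the spectral ordering
[~, order] = sort(y);
phi_sweep = inf;
for r = 1:n-1
    s = false(n, 1);
    s(order(1:r)) = true;
    phi_sweep = min(phi_sweep, phi(s));
end

% Brute-force optimum over all nonempty proper subsets
phi_star = inf;
for mask = 1:(2^n - 2)
    s = logical(bitget(mask, 1:n))';
    phi_star = min(phi_star, phi(s));
end
fprintf('lambda_2/2 = %.4f, phi* = %.4f, sweep = %.4f\n', lambda2 / 2, phi_star, phi_sweep);
```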
Lemma 7. Let $x_i,x_j,x_k,x_l\in\{-1,1\}$. Then
$8\cdot1(x_i,x_j,x_k,x_l\text{ not all the same})=7-x_ix_j-x_ix_k-x_ix_l-x_jx_k-x_jx_l-x_kx_l-x_ix_jx_kx_l\tag{21}$
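Since there are only 16 sign patterns, (21) can be checked exhaustively. The sketch below does so, computing the sum of the six pairwise products from the identity $(\sum x)^2=\sum x^2+2\sum_{a<b}x_ax_b$. The variable names are chosen here for illustration.

```matlab
% Enumerate all 16 sign patterns (x_i, x_j, x_k, x_l) in {-1, 1}^4
max_gap = 0;
for mask = 0:15
    x = 2 * bitget(mask, 1:4) - 1;             % bits 0/1 mapped to -1/+1
    not_all_same = double(any(x ~= x(1)));     % indicator on the left-hand side
    pair_sum = (sum(x)^2 - sum(x.^2)) / 2;     % sum of the six pairwise products
    rhs = 7 - pair_sum - prod(x);              % right-hand side of (21)
    max_gap = max(max_gap, abs(8 * not_all_same - rhs));
end
disp(max_gap);
```

A value of 0 for `max_gap` means both sides agree on every pattern.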
Lemma 8. Let $G = (V, E)$ be an unweighted directed graph and let $G_M$ be the weighted graph built from a motif with $|\mathcal{A}|=4$. Then for any $S\subset V$,
$cut^{(G)}_M(S,\overline{S})=\frac{1}{3}cut^{(G_M)}(S,\overline{S})-\sum_{(v,\{i,j,k,l\})\in M}\frac{1}{3}\cdot1(\text{exactly two of }i,j,k,l\text{ in }S)$
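As a check on the smallest possible case, consider a single motif instance with anchor set $\{i,j,k,l\}$ (a hypothetical one-instance example). By (20) each of the six pairs has weight 1 in $G_M$. With one node in $S$ (or, symmetrically, three), three weighted edges are cut; with two nodes in $S$, four are cut and the correction term removes the excess. In both cases the instance is counted as cut exactly once:

$\text{one node in }S:\quad \frac{1}{3}\cdot 3-\frac{1}{3}\cdot 0=1$

$\text{two nodes in }S:\quad \frac{1}{3}\cdot 4-\frac{1}{3}\cdot 1=1$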
Theorem 9. Let $G = (V, E)$ be an unweighted directed graph and let $W_M$ be the motif adjacency matrix for a motif with $|\mathcal{A}|=4$. Then for any $S\subset V$,
$\phi^{(G)}_M(S)=\phi^{(G_M)}(S)-\frac{\sum_{(v,\{i,j,k,l\})\in M}1(\text{exactly two of }i,j,k,l\text{ in }S)}{vol^{(G_M)}(S)}$
Matlab code

% Body of: function [S, Sbar, conductances] = MotifSpectralPartitionM6(A)
% Spectral partitioning for motif M_6.
% Input: A, sparse 0/1 adjacency matrix of a directed graph without self-loops.
B = spones(A & A');   % bidirectional links
U = A - B;            % unidirectional links
% Form the motif adjacency matrix for motif M_6.
% For a different motif, replace this line with another matrix formulation.
W = (B * U') .* U' + (U * B) .* U + (U' * U) .* B;
% Compute the eigenvector of the normalized motif Laplacian.
Dsqrt = full(sum(W, 2));
Dsqrt(Dsqrt ~= 0) = 1 ./ sqrt(Dsqrt(Dsqrt ~= 0));   % D^{-1/2}; zero-degree nodes stay 0
[I, J, V] = find(W);
% Ln = -D^{-1/2} W D^{-1/2}, the normalized Laplacian minus the identity.
% The shift changes the eigenvalues by 1 but leaves the eigenvectors unchanged.
Ln = sparse(I, J, -V .* (Dsqrt(I) .* Dsqrt(J)), size(A, 1), size(A, 2));
[Z, lambdas] = eigs(Ln, 2, 'sa');
% Matlab's eigs is sometimes out of order
[~, eig_order] = sort(diag(lambdas));
% Eigenvector of the second smallest eigenvalue, scaled by D^{-1/2} as in the M_4 listing
y = Dsqrt .* Z(:, eig_order(end));
% Linear-time sweep procedure
[~, order] = sort(y);
C = W(order, order);
C_sums = full(sum(C, 2));
volumes = cumsum(C_sums);
volumes_other = full(sum(sum(W))) * ones(length(order), 1) - volumes;
% Cut of each prefix set divided by the smaller of the two volumes
conductances = cumsum(C_sums - 2 * sum(tril(C), 2)) ./ min(volumes, volumes_other);
[~, split] = min(conductances);
S = order(1:split);
Sbar = order((split + 1):end);

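% Body of: function [S, Sbar, conductances] = MotifSpectralPartitionM4(A)
% Spectral partitioning for motif M_4 (triangle whose three edges are all bidirectional).
% Input: A, sparse 0/1 adjacency matrix without self-loops.
B = spones(A & A');   % bidirectional links
% Motif adjacency matrix for M_4; equals (A * A) .* A when A is symmetric.
W = (B * B) .* B;
% Compute the eigenvector of the normalized motif Laplacian.
Dsqrt = full(sum(W, 2));
Dsqrt(Dsqrt ~= 0) = 1 ./ sqrt(Dsqrt(Dsqrt ~= 0));   % D^{-1/2}; zero-degree nodes stay 0
[I, J, V] = find(W);
% Ln = -D^{-1/2} W D^{-1/2}, the normalized Laplacian minus the identity.
% The shift changes the eigenvalues by 1 but leaves the eigenvectors unchanged.
Ln = sparse(I, J, -V .* (Dsqrt(I) .* Dsqrt(J)), size(A, 1), size(A, 2));
[Z, lambdas] = eigs(Ln, 2, 'sa');
% Matlab's eigs is sometimes out of order
[~, eig_order] = sort(diag(lambdas));
% Eigenvector of the second smallest eigenvalue, scaled by D^{-1/2}
y = Dsqrt .* Z(:, eig_order(end));
% Linear-time sweep procedure
[~, order] = sort(y);
C = W(order, order);
C_sums = full(sum(C, 2));
volumes = cumsum(C_sums);
volumes_other = full(sum(sum(W))) * ones(length(order), 1) - volumes;
% Cut of each prefix set divided by the smaller of the two volumes
conductances = cumsum(C_sums - 2 * sum(tril(C), 2)) ./ min(volumes, volumes_other);
[~, split] = min(conductances);
S = order(1:split);
Sbar = order((split + 1):end);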
