Paper Reading - Sum-Product Networks: A New Deep Architecture

Sum-Product Networks: A New Deep Architecture


H. Poon, P. Domingos, Sum-Product Networks: A New Deep Architecture, UAI (2011), Best Paper (a version also appeared in the ICCV 2011 Workshops)


Abstract

The key limiting factor in graphical model inference and learning is the complexity of the partition function.

The paper proposes sum-product networks (SPNs): directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges.

If an SPN is complete and consistent, it represents the partition function and all marginals of some graphical model, and its nodes acquire a semantics.

The paper proposes learning algorithms for SPNs based on backpropagation and EM.

In the paper's experiments, inference and learning with SPNs are both faster and more accurate than with standard deep networks.

1 Introduction

Graphical models represent distributions compactly as normalized products of factors: $P(X = x) = \frac{1}{Z} \prod_{k} \phi_{k}(x_{\{k\}})$, where

  • $x \in \mathcal{X}$ is a $d$-dimensional vector;

  • each potential $\phi_{k}$ is a function of a subset $x_{\{k\}}$ of the variables (its scope);

  • $Z = \sum_{x \in \mathcal{X}} \prod_{k} \phi_{k}(x_{\{k\}})$ is the partition function.
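
As a concrete illustration of the formula above, here is a minimal sketch that computes $Z$ and $P(X = x)$ by brute force. The chain structure and the two potentials `phi_12` and `phi_23` are hypothetical and chosen only for this example; they do not come from the paper.

```python
from itertools import product

# Hypothetical pairwise potentials on a chain X1 - X2 - X3 of binary variables.
def phi_12(x1, x2):
    return 2.0 if x1 == x2 else 1.0

def phi_23(x2, x3):
    return 3.0 if x2 == x3 else 1.0

def unnormalized(x):
    """Product of all factors, each applied to its own scope."""
    x1, x2, x3 = x
    return phi_12(x1, x2) * phi_23(x2, x3)

states = list(product([0, 1], repeat=3))   # all 2^d joint states
Z = sum(unnormalized(x) for x in states)   # partition function

def prob(x):
    """Normalized probability P(X = x)."""
    return unnormalized(x) / Z

print(Z)
print(sum(prob(x) for x in states))
```

The sum defining `Z` ranges over all $2^d$ states, which is the cost that the rest of the paper tries to avoid.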

Limitations of graphical models:

  1. many distributions that admit a compact representation do not have one in the form above;

  2. inference is still exponential in the worst case;

  3. the sample size required for accurate learning is exponential in scope size in the worst case;

  4. because learning requires inference as a subroutine, it can take exponential time even with fixed scopes.

Postulating hidden variables $y$ can greatly increase the compactness of graphical models: $P(X = x) = \frac{1}{Z} \sum_{y} \prod_{k} \phi_{k}((x, y)_{\{k\}})$.

Models with multiple layers of hidden variables allow efficient inference in a much larger class of distributions.

If $\sum_{x \in \mathcal{X}} \prod_{k} \phi_{k}(x_{\{k\}})$ can be reorganized using the distributive law into a computation involving only a polynomial number of sums and products, the partition function $Z$ can be computed efficiently.
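
The effect of the distributive law can be sketched on a chain-structured model. The shared pairwise potential `phi` and both function names are assumptions made for this illustration only. The second function pushes each sum inward, so it needs a number of operations linear in $d$ rather than $2^d$.

```python
from itertools import product

def phi(a, b):
    # Hypothetical pairwise potential shared by every edge of a chain.
    return 2.0 if a == b else 1.0

def z_brute_force(d):
    """Sum over all 2^d states of the product of the d - 1 chain factors."""
    total = 0.0
    for x in product([0, 1], repeat=d):
        term = 1.0
        for i in range(d - 1):
            term *= phi(x[i], x[i + 1])
        total += term
    return total

def z_distributive(d):
    """Push each sum inward: O(d) sums and products instead of O(2^d)."""
    # message[b] holds the sum over all variables eliminated so far,
    # given that the current variable takes value b.
    message = [1.0, 1.0]
    for _ in range(d - 1):
        message = [sum(message[a] * phi(a, b) for a in (0, 1)) for b in (0, 1)]
    return sum(message)

print(z_brute_force(10), z_distributive(10))
```

Both functions return the same value; only the order in which sums and products are applied differs.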

The paper introduces sum-product networks (SPNs). SPNs can be viewed as generalized directed acyclic graphs of mixture models, with sum nodes corresponding to mixtures over subsets of variables and product nodes corresponding to features or mixture components. SPNs lend themselves to efficient learning by backpropagation or EM.

2 Sum-Product Networks

Consider Boolean variables $X_{i}$; the negation of $X_{i}$ is written $\bar{X}_{i}$.

The indicator function $[\cdot]$ has value 1 when its argument is true and 0 otherwise. In the paper, the variable indicators $[X_{i}]$ and $[\bar{X}_{i}]$ are abbreviated to $x_{i}$ and $\bar{x}_{i}$.

Network polynomial: let $\Phi(x) \geq 0$ be an unnormalized probability distribution. The network polynomial of $\Phi(x)$ is $\sum_{x} \Phi(x) \Pi(x)$, where $\Pi(x)$ is the product of the indicators that have value 1 in state $x$.

The network polynomial is a multilinear function of the indicator variables.

Evidence $e$ is a partial instantiation of $X$. The unnormalized probability of evidence $e$ is the value of the network polynomial when all indicators compatible with $e$ are set to 1 and the remainder are set to 0.
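
The following sketch evaluates a network polynomial for given evidence. The table `PHI` is a hypothetical unnormalized distribution over two Boolean variables, and the dictionary encoding of the indicators is an assumption of this example, not notation from the paper.

```python
# Hypothetical unnormalized distribution Phi over two Boolean variables (X1, X2).
PHI = {(1, 1): 0.4, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.3}

def network_polynomial(ind):
    """Evaluate sum_x Phi(x) * Pi(x).

    ind maps (variable index, value) to an indicator value: ind[(i, 1)] plays
    the role of x_i and ind[(i, 0)] the role of the negated indicator.
    """
    total = 0.0
    for state, phi in PHI.items():
        pi = 1.0
        for i, value in enumerate(state):
            pi *= ind[(i, value)]  # indicators that have value 1 in this state
        total += phi * pi
    return total

def indicators_for(evidence, d=2):
    """Set indicators compatible with the evidence to 1 and the rest to 0."""
    ind = {}
    for i in range(d):
        for value in (0, 1):
            compatible = i not in evidence or evidence[i] == value
            ind[(i, value)] = 1.0 if compatible else 0.0
    return ind

p_x1 = network_polynomial(indicators_for({0: 1}))  # evidence: first variable is 1
z = network_polynomial(indicators_for({}))         # no evidence: partition function
print(p_x1, z)
```

With empty evidence every indicator is 1, so the polynomial reduces to the sum of all entries of `PHI`, i.e. the partition function.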

Definition 1: a sum-product network (SPN) over variables $x_{1}, \dots, x_{d}$ is a rooted directed acyclic graph whose leaves are the indicators $x_{1}, \dots, x_{d}$ and $\bar{x}_{1}, \dots, \bar{x}_{d}$ and whose internal nodes are sums and products:

  • each edge $(i, j)$ emanating from a sum node $i$ has a non-negative weight $w_{ij}$;

  • the value of a product node is the product of the values of its children;

  • the value of a sum node is $\sum_{j \in \text{Ch}(i)} w_{ij} v_{j}$, where $\text{Ch}(i)$ are the children of $i$ and $v_{j}$ is the value of node $j$;

  • the value of an SPN is the value of its root.

[Figure 1: example SPN (image not preserved)]

Assumption (made without loss of generality): sums and products are arranged in alternating layers, i.e., all children of a sum are products or leaves, and vice versa.
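
Definition 1 can be sketched as a small data structure with a bottom-up evaluation. The class names `Leaf`, `Product`, `Sum` and the function `value` are hypothetical names introduced for this illustration; the paper does not prescribe an implementation.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple, Union

@dataclass
class Leaf:
    var: int        # index i of the variable X_i
    negated: bool   # True for the indicator of the negation

@dataclass
class Product:
    children: List["Node"]

@dataclass
class Sum:
    children: List[Tuple[float, "Node"]]  # (non-negative weight w_ij, child j)

Node = Union[Leaf, Product, Sum]

def value(node, x, x_bar):
    """Value of the sub-SPN rooted at node for the given indicator values."""
    if isinstance(node, Leaf):
        return x_bar[node.var] if node.negated else x[node.var]
    if isinstance(node, Product):
        return prod(value(c, x, x_bar) for c in node.children)
    return sum(w * value(c, x, x_bar) for w, c in node.children)

# A tiny SPN over one variable: 0.7 * x_0 + 0.3 * (negated x_0).
root = Sum([(0.7, Leaf(0, False)), (0.3, Leaf(0, True))])
print(value(root, x=[1], x_bar=[0]))   # complete state X_0 = 1
print(value(root, x=[1], x_bar=[1]))   # all indicators set to 1
```

This recursive version re-evaluates shared subnetworks; on a DAG with heavy sharing, caching node values would be needed to keep the cost linear in the number of edges.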

An SPN $S$ is written as a function of the indicator variables $x_{1}, \dots, x_{d}$ and $\bar{x}_{1}, \dots, \bar{x}_{d}$: $S(x_{1}, \dots, x_{d}, \bar{x}_{1}, \dots, \bar{x}_{d})$.

  • When the indicators specify a complete state $x$, i.e., for each variable $X_{i}$ either $x_{i} = 1$ and $\bar{x}_{i} = 0$, or $x_{i} = 0$ and $\bar{x}_{i} = 1$, the output is abbreviated as $S(x)$.

  • When the indicators specify evidence $e$, the output is abbreviated as $S(e)$.

  • When all indicators are set to 1, the output is abbreviated as $S(\ast)$.

  • The subnetwork rooted at an arbitrary node $n$ is itself an SPN, denoted $S_{n}(\cdot)$.

  • The values of $S(x)$ for all $x \in \mathcal{X}$ define an unnormalized probability distribution over $\mathcal{X}$.

  • Under this distribution, the unnormalized probability of evidence $e$ is $\Phi_{S}(e) = \sum_{x \in e} S(x)$, where the sum is over the states consistent with $e$.

  • The partition function of this distribution is $Z_{S} = \sum_{x \in \mathcal{X}} S(x)$.

  • The scope of an SPN $S$ is the set of variables that appear in $S$.

  • A variable $X_{i}$ appears negated in $S$ if $\bar{x}_{i}$ is a leaf of $S$, and non-negated if $x_{i}$ is a leaf of $S$.

Example: for the SPN in Figure 1, $S(x_{1}, x_{2}, \bar{x}_{1}, \bar{x}_{2}) = 0.5 (0.6 x_{1} + 0.4 \bar{x}_{1}) (0.3 x_{2} + 0.7 \bar{x}_{2}) + 0.2 (0.6 x_{1} + 0.4 \bar{x}_{1}) (0.2 x_{2} + 0.8 \bar{x}_{2}) + 0.3 (0.9 x_{1} + 0.1 \bar{x}_{1}) (0.2 x_{2} + 0.8 \bar{x}_{2})$. Its network polynomial is $(0.5 \times 0.6 \times 0.3 + 0.2 \times 0.6 \times 0.2 + 0.3 \times 0.9 \times 0.2) x_{1} x_{2} + \dots$, where the remaining terms are the monomials in $x_{1} \bar{x}_{2}$, $\bar{x}_{1} x_{2}$ and $\bar{x}_{1} \bar{x}_{2}$. For the complete state $x$ with $X_{1} = 1$, $X_{2} = 0$, $S(x) = S(1, 0, 0, 1)$; for the evidence $e$ with $X_{1} = 1$, $S(e) = S(1, 1, 0, 1)$; and $S(\ast) = S(1, 1, 1, 1)$.
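
The three evaluations in this example can be checked numerically. The sketch below writes the SPN as a plain function of its four indicators, using the root weights 0.5, 0.2 and 0.3 that appear in the network polynomial above; the helper `complete` is a hypothetical name added for the illustration.

```python
from itertools import product

def S(x1, x2, nx1, nx2):
    """The example SPN as a function of its four indicators (nx = negated)."""
    return (0.5 * (0.6 * x1 + 0.4 * nx1) * (0.3 * x2 + 0.7 * nx2)
            + 0.2 * (0.6 * x1 + 0.4 * nx1) * (0.2 * x2 + 0.8 * nx2)
            + 0.3 * (0.9 * x1 + 0.1 * nx1) * (0.2 * x2 + 0.8 * nx2))

def complete(X1, X2):
    """S(x) for a complete state: exactly one indicator per variable is 1."""
    return S(X1, X2, 1 - X1, 1 - X2)

s_state = S(1, 0, 0, 1)      # complete state X1 = 1, X2 = 0
s_evidence = S(1, 1, 0, 1)   # evidence X1 = 1, X2 unobserved
s_all = S(1, 1, 1, 1)        # S(*): every indicator set to 1

# Unnormalized probability of the evidence by explicit summation over X2.
phi_evidence = sum(complete(1, X2) for X2 in (0, 1))
# Partition function by explicit summation over all complete states.
z = sum(complete(X1, X2) for X1, X2 in product((0, 1), repeat=2))

print(s_state, s_evidence, s_all, phi_evidence, z)
```

Up to floating-point rounding, `s_evidence` agrees with `phi_evidence` and `s_all` agrees with `z`, which is what validity (Definition 2) requires of this network for these two pieces of evidence.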

Definition 2: a sum-product network $S$ is valid iff $S(e) = \Phi_{S}(e)$ for all evidence $e$.

Definition 3: a sum-product network is complete iff all children of the same sum node have the same scope.

Definition 4: a sum-product network is consistent iff no variable appears negated in one child of a product node and non-negated in another. In other words, no product node has $x_{i}$ under one child and $\bar{x}_{i}$ under another.

Theorem 1: a sum-product network is valid if it is complete and consistent.
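
Definitions 3 and 4 translate directly into structural checks. The sketch below assumes a hypothetical tuple encoding of nodes, `("leaf", var, negated)`, `("prod", [children])` and `("sum", [(weight, child), ...])`, chosen only to keep the example short.

```python
def children_of(node):
    """Child nodes of an internal node (weights on sum edges are dropped)."""
    return node[1] if node[0] == "prod" else [c for _, c in node[1]]

def literals(node):
    """Set of (variable, negated) leaves that appear under node."""
    if node[0] == "leaf":
        return {(node[1], node[2])}
    return set().union(*(literals(c) for c in children_of(node)))

def scope(node):
    return {var for var, _ in literals(node)}

def internal_nodes(node):
    """Yield every sum and product node reachable from node."""
    if node[0] != "leaf":
        yield node
        for c in children_of(node):
            yield from internal_nodes(c)

def is_complete(root):
    """All children of the same sum node have the same scope."""
    return all(
        len({frozenset(scope(c)) for c in children_of(n)}) <= 1
        for n in internal_nodes(root) if n[0] == "sum"
    )

def is_consistent(root):
    """No variable is negated in one child of a product and non-negated in another."""
    for n in internal_nodes(root):
        if n[0] != "prod":
            continue
        lits = [literals(c) for c in children_of(n)]
        for a, la in enumerate(lits):
            for b, lb in enumerate(lits):
                if a != b and any((v, not neg) in lb for v, neg in la):
                    return False
    return True

x1, nx1 = ("leaf", 1, False), ("leaf", 1, True)
x2, nx2 = ("leaf", 2, False), ("leaf", 2, True)
good = ("prod", [("sum", [(0.6, x1), (0.4, nx1)]), ("sum", [(0.3, x2), (0.7, nx2)])])
print(is_complete(good), is_consistent(good))
```

By Theorem 1, a network that passes both checks is valid; failing a check does not by itself prove that the network is invalid.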

Completeness and consistency are sufficient but not necessary for validity.

If an SPN $S$ is complete but inconsistent, its expansion includes monomials that are not present in its network polynomial, so $S(e) \geq \Phi_{S}(e)$. If $S$ is consistent but incomplete, some monomials of its expansion are missing indicators relative to the monomials of its network polynomial, so $S(e) \leq \Phi_{S}(e)$. Invalid SPNs can therefore still be useful for approximate inference.
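
The two inequalities can be observed on deliberately tiny networks. Both networks below, and the helper names `phi` and `s_of_evidence`, are hypothetical constructions for this illustration rather than examples taken from the paper.

```python
from itertools import product

def phi(S, evidence, d):
    """Phi_S(e): sum of S over all complete states consistent with the evidence."""
    total = 0.0
    for state in product((0, 1), repeat=d):
        if all(state[i] == v for i, v in evidence.items()):
            total += S(list(state), [1 - v for v in state])
    return total

def s_of_evidence(S, evidence, d):
    """S(e): evaluate S with indicators compatible with the evidence set to 1."""
    x = [1 if evidence.get(i, 1) == 1 else 0 for i in range(d)]
    nx = [1 if evidence.get(i, 0) == 0 else 0 for i in range(d)]
    return S(x, nx)

# Complete but inconsistent: a product node whose children are x_0 and its negation.
def inconsistent(x, nx):
    return x[0] * nx[0]

# Consistent but incomplete: a sum node whose children have scopes {X_0} and {X_1}.
def incomplete(x, nx):
    return 0.5 * x[0] + 0.5 * x[1]

print(s_of_evidence(inconsistent, {}, 1), phi(inconsistent, {}, 1))
print(s_of_evidence(incomplete, {}, 2), phi(incomplete, {}, 2))
```

For the inconsistent network the single evaluation overestimates the true sum, and for the incomplete network it underestimates it, matching the direction of the two bounds.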

Definition 5: an unnormalized probability distribution $\Phi(x)$ is representable by a sum-product network $S$ iff $\Phi(x) = S(x)$ for all states $x$ and $S$ is valid.

With such an $S$, all marginals of $\Phi(x)$, including its partition function, can be computed efficiently, each by a single evaluation of $S$.

Theorem 2: the partition function of a Markov network $\Phi(x)$, where $x$ is a $d$-dimensional vector, can be computed in time polynomial in $d$ if $\Phi(x)$ is representable by a sum-product network with a number of edges polynomial in $d$.
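
A minimal sketch of the idea behind Theorem 2, assuming a hypothetical unnormalized SPN that is a single product over $d$ univariate sum nodes (so it has $3d$ edges). The weight lists `A` and `B` are made up for the example. One pass with all indicators set to 1 gives the partition function, while the reference computation enumerates all $2^d$ states.

```python
from itertools import product
from math import prod

# Hypothetical unnormalized SPN: S = prod_i (a_i * x_i + b_i * nx_i).
A = [1.0, 2.0, 0.5, 3.0]   # weights on the x_i leaves
B = [1.0, 1.0, 1.5, 1.0]   # weights on the negated leaves

def S(x, nx):
    return prod(a * xi + b * nxi for a, b, xi, nxi in zip(A, B, x, nx))

d = len(A)
ones = [1] * d

# One bottom-up pass with every indicator set to 1: cost linear in the edges.
z_fast = S(ones, ones)

# Reference: explicit sum over all 2^d complete states.
z_slow = sum(S(list(s), [1 - v for v in s]) for s in product((0, 1), repeat=d))

# Marginal of the evidence "first variable is 1": one more pass, then normalize.
p_first = S(ones, [0] + [1] * (d - 1)) / z_fast

def probability(state):
    """Normalized probability of a complete state, S(x) / Z."""
    return S(list(state), [1 - v for v in state]) / z_fast

print(z_fast, z_slow, p_first)
```

This network is complete and consistent, so it is valid by Theorem 1 and the single pass is exact rather than approximate.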

Definition 6: a sum-product network is decomposable iff no variable appears in more than one child of a product node.

Decomposability is more restrictive than consistency: for example, a product node with children $x_{1}$ and $x_{1}$ is consistent but not decomposable.
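
Decomposability can be checked by comparing the scopes of the children of each product node. The sketch assumes a hypothetical tuple encoding of nodes, `("leaf", var, negated)`, `("prod", [children])` and `("sum", [(weight, child), ...])`, used only for this illustration.

```python
def children_of(node):
    """Child nodes of an internal node (weights on sum edges are dropped)."""
    return node[1] if node[0] == "prod" else [c for _, c in node[1]]

def scope(node):
    """Variables that appear under node."""
    if node[0] == "leaf":
        return {node[1]}
    return set().union(*(scope(c) for c in children_of(node)))

def is_decomposable(node):
    """No variable appears in more than one child of any product node."""
    if node[0] == "leaf":
        return True
    if node[0] == "prod":
        seen = set()
        for c in children_of(node):
            s = scope(c)
            if seen & s:        # a variable already seen under another child
                return False
            seen |= s
    return all(is_decomposable(c) for c in children_of(node))

x1 = ("leaf", 1, False)
x2 = ("leaf", 2, False)
print(is_decomposable(("prod", [x1, x2])))   # disjoint scopes
print(is_decomposable(("prod", [x1, x1])))   # consistent, but X1 appears twice
```

The second network contains no negated leaf, so it cannot violate consistency, yet it fails the decomposability check.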

3 Sum-Product Networks and Other Models

4 Learning Sum-Product Networks

5 Experiments

Task: image completion (completing images).

6 Sum-Product Networks and the Cortex

7 Conclusion

Acknowledgments

Reposted from blog.csdn.net/zhaoyin214/article/details/103659128