E2CNN: General E(2)-Equivariant Steerable CNNs — Paper Walkthrough

Paper link
Code link

Introduction


  • The equivariance of neural networks under symmetry group actions has in recent years proven to be a fruitful prior in network design.

  • By guaranteeing a desired transformation behavior of convolutional features under transformations of the network input, equivariant networks achieve improved generalization capabilities and sample complexities compared to their non-equivariant counterparts.

  • Due to their great practical relevance, a big pool of rotation- and reflection- equivariant models for planar images has been proposed by now.



  • An important step in this direction is given by the theory of Steerable CNNs which defines a very general notion of equivariant convolutions on homogeneous spaces.

  • The feature spaces of steerable CNNs are thereby defined as spaces of feature fields, characterized by a group representation which determines their transformation behavior under transformations of the input.

  • In order to preserve the specified transformation law of feature spaces, the convolutional kernels are subject to a linear constraint, depending on the corresponding group representations. While this constraint has been solved for specific groups and representations, no general solution strategy has been proposed so far.


What did the authors do?

  • The authors further propose a group restriction operation, allowing for network architectures which are decreasingly equivariant with depth.

  • This is useful e.g. for natural images which show low level features like edges in arbitrary orientations but carry a sense of preferred orientation globally.

  • An adaptive level of equivariance accounts for the resulting loss of symmetry in the hierarchy of features.

  • Since the theory of steerable CNNs does not give a preference for any choice of group representation or equivariant nonlinearity, we run an extensive benchmark study, comparing different equivariance groups, representations and nonlinearities.

  • We do so on MNIST 12k, rotated MNIST SO(2) and reflected and rotated MNIST O(2) to investigate the influence of the presence or absence of certain symmetries in the dataset.

  • A drop-in replacement of the equivariant convolutional layers is shown to yield significant gains over non-equivariant baselines on CIFAR10, CIFAR100 and STL-10.

  • The authors' contributions are of relevance for general steerable CNNs on homogeneous spaces and gauge equivariant CNNs on manifolds, since these models obey the same kind of kernel constraints. More specifically, 2-dimensional manifolds, endowed with an orthogonal structure group O(2) (or subgroups thereof), necessitate exactly the kernel constraints solved in this paper. The results can therefore readily be transferred to e.g. spherical CNNs or more general models of geometric deep learning.


| group | order $\vert G \vert$ | $G \leqslant O(2)$ | $(\mathbb{R}^2, +) \rtimes G$ |
|---|---|---|---|
| orthogonal | – | $O(2)$ | $E(2) \cong (\mathbb{R}^2, +) \rtimes O(2)$ |
| special orthogonal | – | $SO(2)$ | $SE(2) \cong (\mathbb{R}^2, +) \rtimes SO(2)$ |
| cyclic | $N$ | $C_N$ | $(\mathbb{R}^2, +) \rtimes C_N$ |
| reflection | $2$ | $(\{\pm 1\}, *) \cong D_1$ | $(\mathbb{R}^2, +) \rtimes (\{\pm 1\}, *)$ |
| dihedral | $2N$ | $D_N \cong C_N \rtimes (\{\pm 1\}, *)$ | $(\mathbb{R}^2, +) \rtimes D_N$ |


The following content requires a bit of background knowledge.

Due to the length limit of this blog post, the supplementary background material is collected in a separate "background notes" post; readers who already have the background can skip ahead.

General E(2)-Equivariant Steerable CNNs

  • The convolutional weight sharing ensures that inference is translation-equivariant, which means that a translated input signal results in a corresponding translation of the feature maps.

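To make this concrete, here is a minimal numpy/scipy sketch (illustrative, not code from the paper) checking that a circular convolution commutes with translations of the input:

import numpy as np
from scipy.ndimage import convolve

img = np.random.rand(8, 8)
kernel = np.random.rand(3, 3)

# translate the input (circularly, to avoid boundary effects)
shifted = np.roll(img, shift=(1, 2), axis=(0, 1))

# convolving the translated input gives the translated output:
# conv(shift(img)) == shift(conv(img))
out1 = convolve(shifted, kernel, mode='wrap')
out2 = np.roll(convolve(img, kernel, mode='wrap'), shift=(1, 2), axis=(0, 1))
assert np.allclose(out1, out2)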


Isometries of the Euclidean plane $\mathbb{R}^2$

  • The Euclidean group E(2) is the group of isometries of the plane $\mathbb{R}^2$, consisting of translations, rotations and reflections.

  • Characteristic patterns in images often occur at arbitrary positions and in arbitrary orientations.

  • The Euclidean group therefore models an important factor of variation of image features.

  • The Euclidean group can be constructed from the translation group $(\mathbb{R}^2, +)$ and the orthogonal group $O(2) = \{O \in \mathbb{R}^{2 \times 2} \mid O^T O = \mathrm{id}_{2 \times 2}\}$ via the semidirect product operation as

$$E(2) \cong (\mathbb{R}^2, +) \rtimes O(2),$$

where $(\mathbb{R}^2, +)$ is the translation group and $O(2)$ is the orthogonal group.

  • The orthogonal group contains all operations leaving the origin invariant, i.e. continuous rotations and reflections.

  • $\rtimes$ denotes the semidirect product, which combines the two groups into a single group.


  • In order to allow for different levels of equivariance and to cover a wide spectrum of related work, we consider subgroups of the Euclidean group of the form $(\mathbb{R}^2, +) \rtimes G$, defined by subgroups $G \leqslant O(2)$.

  • $G$ could be either the special orthogonal group $SO(2)$, the group $(\{\pm 1\}, *)$ of reflections along a given axis, the cyclic groups $C_N$, the dihedral groups $D_N$, or the orthogonal group $O(2)$ itself.

  • $SO(2)$ describes the continuous rotations without reflections.
  • $C_N$ and $D_N$ contain $N$ discrete rotations by multiples of $\frac{2\pi}{N}$ and, in the case of $D_N$, additionally reflections; $C_N$ and $D_N$ are therefore discrete subgroups of order $N$ and $2N$ respectively (the order doubles because of the reflections).


E(2)-steerable feature fields

  • Steerable CNNs define feature spaces as spaces of steerable feature fields $f : \mathbb{R}^2 \rightarrow \mathbb{R}^c$, which associate a $c$-dimensional feature vector $f(x) \in \mathbb{R}^c$ to each point $x$ of a base space, in our case the plane $\mathbb{R}^2$.

  • In contrast to vanilla CNNs, the feature fields of steerable CNNs are associated with a transformation law which specifies their transformation under actions of E(2) (or subgroups) and therefore endows features with a notion of orientation.

  • Left of the figure: the transformation of a scalar feature field. The Euclidean group acts on a scalar field simply by moving each pixel from its original position to a new one, i.e. $s(x) \mapsto s((tg)^{-1}x) = s(g^{-1}(x-t))$, where $tg \in (\mathbb{R}^2,+) \rtimes G$.
  • Right of the figure: the transformation of a vector field, $v(x) \mapsto g \cdot v(g^{-1}(x-t))$. In contrast to the scalar field, a vector field does not only move the pixels to their new positions but also changes the orientation of the vectors according to $g \in G$; examples are optical flow or gradient images.
(Figure: vector field transformation)

The transformation law in the figure is specified by a group representation $\rho : G \rightarrow GL(\mathbb{R}^c)$, which describes how the $c$ channels of a feature vector $f(x)$ are mixed under the various transformations. A representation is a group homomorphism, i.e. it satisfies $\rho(g\tilde{g}) = \rho(g)\rho(\tilde{g})$, so the group product $g\tilde{g}$ is represented by the product of the $c \times c$ matrices $\rho(g)$ and $\rho(\tilde{g})$.
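A two-line numpy check of this homomorphism property for the familiar 2×2 rotation representation (an illustrative sketch, not code from the paper):

import numpy as np

def rho(theta):
    # the standard (frequency-1) real representation of a rotation by theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

g, g_tilde = 0.7, 1.9  # two rotation angles; composing rotations adds angles
assert np.allclose(rho(g + g_tilde), rho(g) @ rho(g_tilde))  # rho(g g~) = rho(g) rho(g~)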


  • A $\rho$-field transforms under the induced representation $\mathrm{Ind}_G^{(\mathbb{R}^2, +) \rtimes G}\, \rho$ of $(\mathbb{R}^2, +) \rtimes G$:

$$f(x) \;\mapsto\; \left(\left[\mathrm{Ind}_G^{(\mathbb{R}^2, +) \rtimes G}\, \rho\right](tg) \cdot f\right)(x) \;:=\; \rho(g) \cdot f(g^{-1}(x-t)) \tag{1}$$

E(2)-steerable convolutions


  • In order to preserve the transformation law of steerable feature spaces, each network layer is required to be equivariant under the group actions. The most general equivariant linear map between steerable feature spaces, transforming under $\rho_{in}$ and $\rho_{out}$, is given by convolutions with $G$-steerable kernels $k : \mathbb{R}^2 \rightarrow \mathbb{R}^{c_{out} \times c_{in}}$, satisfying the kernel constraint:

$$k(gx) \;=\; \rho_{out}(g)\, k(x)\, \rho_{in}(g^{-1}) \qquad \forall g \in G,\; x \in \mathbb{R}^2 \tag{2}$$

Intuitively, this constraint determines the form of the kernel at transformed coordinates $gx$ in terms of the kernel at untransformed coordinates $x$ and the kernel's response to transformed input fields. It ensures that when the input field transforms under $\mathrm{Ind}\,\rho_{in}$, the output feature field transforms under $\mathrm{Ind}\,\rho_{out}$.
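For the simplest case of trivial input and output representations ($\rho_{in} = \rho_{out} = [1]$, i.e. scalar fields), constraint (2) reduces to $k(gx) = k(x)$, so the kernel must be rotation-invariant (isotropic). A small numpy sketch (illustrative, not from the paper) verifying this for an isotropic Gaussian kernel:

import numpy as np

def k(x):
    # scalar kernel depending only on |x|, hence k(g x) = k(x) for all g in O(2)
    return np.exp(-x @ x / 2.0)

def rot(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

x = np.array([0.3, -1.2])
for theta in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    assert np.isclose(k(rot(theta) @ x), k(x))  # constraint (2) for trivial irreps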


Since the kernel constraint is linear, its solutions form a linear subspace of the vector space of unconstrained kernels of conventional CNNs.


Irrep (irreducible representation; induced representations are in general reducible) decomposition of the kernel constraint


If all matrices of a representation of dimension greater than one can be brought into one and the same block-diagonal structure by a single common similarity transformation, the representation is called reducible; otherwise it is called irreducible.


  • The kernel constraint (2) in principle needs to be solved individually for each pair of input and output types $\rho_{in}$ and $\rho_{out}$ to be used in the network.

  • A numerical solution technique based on a Clebsch-Gordan decomposition of tensor products of irreps (also known as the vector coupling scheme) has been proposed in prior work.


Recall the kernel constraint (2):

$$k(gx) \;=\; \rho_{out}(g)\, k(x)\, \rho_{in}(g^{-1}) \qquad \forall g \in G,\; x \in \mathbb{R}^2 \tag{2}$$

  • Any representation of a finite or compact group decomposes under a change of basis into a direct sum of irreps, each corresponding to an invariant subspace of the representation space $\mathbb{R}^c$ on which $\rho$ acts. Denoting the change of basis by $Q$, this means that one can always write $\rho = Q^{-1}\left[\bigoplus_{i \in I}\psi_i\right]Q$, where the $\psi_i$ are the irreducible representations of $G$ and the index set $I$ encodes the types and multiplicities of the irreps present in $\rho$.

The decomposition of $\rho_{in}$ and $\rho_{out}$ in the kernel constraint leads to:

$$\begin{aligned} k(gx) \;&=\; Q_{out}^{-1}\left[\bigoplus_{i \in I_{out}} \psi_i(g)\right] Q_{out}\, k(x)\, Q_{in}^{-1}\left[\bigoplus_{j \in I_{in}} \psi_j^{-1}(g)\right] Q_{in} \qquad &&\forall g \in G,\; x \in \mathbb{R}^2 \\ \kappa(gx) \;&=\; \left[\bigoplus_{i \in I_{out}} \psi_i(g)\right] \kappa(x) \left[\bigoplus_{j \in I_{in}} \psi_j^{-1}(g)\right] \qquad &&\forall g \in G,\; x \in \mathbb{R}^2 \end{aligned}$$
where:

  • the kernel expressed relative to the irrep bases is defined as $\kappa := Q_{out}\, k\, Q_{in}^{-1}$;

  • $\bigoplus$ denotes the direct sum.

Direct sum: let $V_1$ and $V_2$ be two subspaces of a vector space $V$ over a field $\mathbb{F}$ satisfying

$$V_1 \cap V_2 = \{0\},$$

such that every $v \in V$ can be written as a linear combination of vectors from $V_1$ and $V_2$, i.e.

$$v = c_1 v_1 + c_2 v_2 \qquad (v_1 \in V_1,\ v_2 \in V_2,\ c_1, c_2 \in \mathbb{F}).$$

Then $V$ is called the direct sum of $V_1$ and $V_2$, written

$$V = V_1 \bigoplus V_2,$$

i.e. every $v \in V$ can be expressed as a sum of vectors from $V_1$ and $V_2$.
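As a quick illustration (hypothetical example, not from the paper), the direct sum of two representation matrices is simply their block-diagonal stacking, e.g. with scipy:

import numpy as np
from scipy.linalg import block_diag

theta = np.pi / 3
psi_1 = np.array([[1.0]])  # trivial irrep, acting on V1 = R
psi_2 = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])  # frequency-1 irrep, acting on V2 = R^2

# (psi_1 ⊕ psi_2)(g): block-diagonal action on V1 ⊕ V2 = R^3
rho = block_diag(psi_1, psi_2)
print(rho.shape)  # (3, 3)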

  • Blockwise, the products with the left and right direct sums of irreps reduce the constraint to:

$$\kappa^{ij}(gx) \;=\; \psi_i(g)\,\kappa^{ij}(x)\,\psi_j^{-1}(g) \qquad \forall g \in G,\; x \in \mathbb{R}^2, \quad \text{where } i \in I_{out},\ j \in I_{in} \tag{3}$$

$$\{k_1, \cdots, k_d\} \;:=\; \bigcup_{i \in I_{out}} \bigcup_{j \in I_{in}} \left\{ Q_{out}^{-1}\, \bar{\kappa}_l^{ij}\, Q_{in} \right\} \tag{4}$$
where

  • $\{\kappa_1^{ij}, \cdots, \kappa_{d_{ij}}^{ij}\}$ is a $d_{ij}$-dimensional basis of solutions of (3) for the block $(i, j)$;
  • $d = \sum_{ij} d_{ij}$, i.e. the dimensions of all $i \times j$ blocks are summed up;
  • $\bar{\kappa}_l^{ij}$ denotes a kernel-shaped matrix in which the basis element $\kappa_l^{ij}$ is filled in at the position of the block $(i, j)$ while all other entries are zero.
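For instance, for a frequency-$m$ output irrep $\psi_m$ and a trivial input irrep, $\kappa(\phi) = (\sin(m\phi),\ -\cos(m\phi))^T$ solves (3) under rotations (see the angular-basis table in the next subsection); a minimal numpy check (illustrative sketch, rotations only):

import numpy as np

def psi(m, theta):
    # real 2x2 irrep of SO(2) with rotational frequency m
    return np.array([[np.cos(m * theta), -np.sin(m * theta)],
                     [np.sin(m * theta),  np.cos(m * theta)]])

def kappa(m, phi):
    # angular basis solution for output irrep psi_m and trivial input irrep
    return np.array([[np.sin(m * phi)], [-np.cos(m * phi)]])

m, phi, theta = 2, 0.4, 1.1
# a rotation g by theta shifts the angular coordinate: kappa(g x) has angle phi + theta;
# constraint (3) with trivial psi_j reads kappa(g x) = psi_m(g) kappa(x)
assert np.allclose(kappa(m, phi + theta), psi(m, theta) @ kappa(m, phi))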


General solution of the kernel constraint for O(2) and subgroups


  • In order to build isometry-equivariant CNNs on $\mathbb{R}^2$, we need to solve the irrep constraints (3) for the specific case of $G$ being $O(2)$ or one of its subgroups. For this purpose, note that the action of $G$ on $\mathbb{R}^2$ is norm-preserving, that is, $|g \cdot x| = |x| \;\; \forall g \in G,\ x \in \mathbb{R}^2$. The constraints (2) and (3) therefore only restrict the angular parts of the kernels but leave their radial parts free. It is convenient to expand the kernel w.l.o.g. in terms of an (angular) Fourier series:

$$\kappa_{\alpha\beta}^{ij}(x(r,\phi)) \;=\; A_{\alpha\beta,0}(r) + \sum_{\mu=1}^{\infty}\left[A_{\alpha\beta,\mu}(r)\cos(\mu\phi) + B_{\alpha\beta,\mu}(r)\sin(\mu\phi)\right] \tag{5}$$

For each entry $\kappa_{\alpha\beta}^{ij}$ of a block $\kappa^{ij}$, the real-valued, radially dependent coefficients of the cosine and sine components are

$$A_{\alpha\beta,\mu} : \mathbb{R}^+ \rightarrow \mathbb{R}, \qquad B_{\alpha\beta,\mu} : \mathbb{R}^+ \rightarrow \mathbb{R}.$$

Inserting $\kappa_{\alpha\beta}^{ij}$ into the irrep constraint (3), $\kappa^{ij}(gx) = \psi_i(g)\,\kappa^{ij}(x)\,\psi_j^{-1}(g)$, and projecting onto individual harmonics (harmonic functions of a single frequency) yields constraints on the Fourier coefficients.
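A small numpy sketch of this projection (illustrative, discrete sampling): sampling a toy angular profile and reading off its Fourier coefficients $A_\mu$, $B_\mu$ from Eq. (5):

import numpy as np

phi = np.linspace(0, 2 * np.pi, 64, endpoint=False)
profile = 0.3 + 0.8 * np.cos(2 * phi) - 0.5 * np.sin(3 * phi)  # toy angular part

# projecting onto single harmonics recovers the coefficients of Eq. (5)
A = lambda mu: (profile * np.cos(mu * phi)).mean() * (1 if mu == 0 else 2)
B = lambda mu: (profile * np.sin(mu * phi)).mean() * 2
print(A(0), A(2), B(3))  # ≈ 0.3, 0.8, -0.5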



| $\psi_i \backslash \psi_j$ | trivial | sign-flip | frequency $n \in \mathbb{N}^+$ |
|---|---|---|---|
| trivial | $[1]$ | $\varnothing$ | $[\sin(n\phi),\ -\cos(n\phi)]$ |
| sign-flip | $\varnothing$ | $[1]$ | $[\cos(n\phi),\ \sin(n\phi)]$ |
| frequency $m \in \mathbb{N}^+$ | $\begin{bmatrix}\sin(m\phi) \\ -\cos(m\phi)\end{bmatrix}$ | $\begin{bmatrix}\cos(m\phi) \\ \sin(m\phi)\end{bmatrix}$ | $\begin{bmatrix}\cos((m-n)\phi) & -\sin((m-n)\phi) \\ \sin((m-n)\phi) & \cos((m-n)\phi)\end{bmatrix}$, $\begin{bmatrix}\cos((m+n)\phi) & \sin((m+n)\phi) \\ \sin((m+n)\phi) & -\cos((m+n)\phi)\end{bmatrix}$ |

Bases for the angular parts of $O(2)$-steerable kernels, satisfying the irrep constraint (3) for each pair of input field irrep $\psi_j$ and output field irrep $\psi_i$.


Group representations and nonlinearities


  • Considering only the convolution operation with $G$-steerable kernels for the moment, it turns out that any change of basis $P$ to an equivalent representation $\tilde{\rho} := P^{-1}\rho P$ is irrelevant.


  • Consider the irrep decomposition $\rho = Q^{-1}\left[\bigoplus_{i \in I}\psi_i\right]Q$ used in the solution of the kernel constraint to obtain a basis $\{k_i\}_{i=1}^{d}$ of $G$-steerable kernels as defined by Eq. (4). Any equivalent representation will decompose into $\tilde{\rho} = \tilde{Q}^{-1}\left[\bigoplus_{i \in I}\psi_i\right]\tilde{Q}$ with $\tilde{Q} = QP$ for some $P$ and therefore results in a kernel basis $\{P_{out}^{-1} k_i P_{in}\}_{i=1}^{d}$ which entirely negates the change of basis between equivalent representations. It would therefore w.l.o.g. suffice to consider only direct sums of irreps $\rho = \bigoplus_{i \in I}\psi_i$ as representations.


  • In practice, however, convolution layers are interleaved with other operations which are sensitive to specific choices of representations. In particular, nonlinearity layers are required to be equivariant under the action of specific representations. The choice of group representations in steerable CNNs therefore restricts the range of admissible nonlinearities; conversely, a choice of nonlinearity allows only for certain representations.


  • All equivariant nonlinearities considered here act spatially localized, that is, on each feature vector $f(x) \in \mathbb{R}^{c_{in}}$ for each $x \in \mathbb{R}^2$ individually. They might produce a different type of output field $\rho_{out} : G \rightarrow GL(\mathbb{R}^{c_{out}})$, that is, $\sigma : \mathbb{R}^{c_{in}} \rightarrow \mathbb{R}^{c_{out}},\ f(x) \mapsto \sigma(f(x))$. It is sufficient to require the equivariance of $\sigma$ under the actions of $\rho_{in}$ and $\rho_{out}$, i.e. $\sigma \circ \rho_{in}(g) = \rho_{out}(g) \circ \sigma \;\; \forall g \in G$, for the nonlinearities to be equivariant under the action of the induced representations when being applied to a whole feature field as $\sigma(f)(x) := \sigma(f(x))$.


  • A general class of representations are unitary representations, which preserve the norm of their representation space, that is, they satisfy $|\rho_{unitary}(g)f(x)| = |f(x)| \;\; \forall g \in G$.

What is a unitary representation?

A unitary representation is a homomorphism from a group to the unitary group; it is a classical construction in representation theory. One of the main goals of unitary representation theory is to describe the "unitary dual", i.e. the space of all irreducible unitary representations of $G$.

Representation theory "represents" the elements of an abstract algebraic structure as linear transformations of a vector space and studies modules over such structures, in order to investigate the structure's properties.

In short, representation theory expresses an algebraic object through concrete matrices, such that the algebraic operations of the original structure correspond to matrix addition and matrix multiplication.

Let $G$ be a group. A representation of $G$ over a field $F$ (commonly the complex numbers, $F = \mathbb{C}$) is an $F$-vector space $V$ together with a group homomorphism into its general linear group,

$$\rho : G \rightarrow GL(V).$$

If $V$ is finite-dimensional, this homomorphism maps the elements of $G$ to invertible matrices such that the group operation corresponds to matrix multiplication.

The appeal of representation theory is that it converts abstract algebra problems into more tractable linear algebra problems. Moreover, groups can also be represented on infinite-dimensional spaces.

A notable feature of representation theory is its ubiquity across mathematics: beyond its impact within algebra, representation theory

  • clarifies and generalizes Fourier analysis via harmonic analysis;
  • connects to geometry via invariant theory and the Erlangen program;
  • influences number theory via automorphic forms and the Langlands program.


  • Nonlinearities which solely act on the norm of feature vectors but preserve their orientation are equivariant w.r.t. unitary representations. They can in general be decomposed as $\sigma_{norm} : \mathbb{R}^c \rightarrow \mathbb{R}^c,\ f(x) \mapsto \eta(|f(x)|)\,\frac{f(x)}{|f(x)|}$ for some nonlinear function $\eta : \mathbb{R}_{\geqslant 0} \rightarrow \mathbb{R}_{\geqslant 0}$ acting on the norm of feature vectors.
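A minimal numpy sketch of such a norm nonlinearity (the bias value 0.5 and the function names are illustrative, not from the paper), including a check of its equivariance under a rotation:

import numpy as np

def sigma_norm(f, eta=lambda n: np.maximum(n - 0.5, 0.0)):
    # rescale the feature vector by eta(|f|)/|f|, preserving its orientation
    norm = np.linalg.norm(f)
    return eta(norm) * f / norm if norm > 0 else f

f = np.array([3.0, 4.0])                 # feature vector with |f| = 5
R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90° rotation, a unitary representation
assert np.allclose(sigma_norm(R @ f), R @ sigma_norm(f))  # equivariance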


  • A common choice of representations of finite groups like $C_N$ and $D_N$ are regular representations. Their representation space $\mathbb{R}^{|G|}$ has dimensionality equal to the order of the group, e.g. $\mathbb{R}^N$ for $C_N$ and $\mathbb{R}^{2N}$ for $D_N$. The action of the regular representation is defined by assigning each axis $e_g$ of $\mathbb{R}^{|G|}$ to a group element $g \in G$ and permuting the axes according to $\rho_{reg}^G(\tilde{g})\, e_g := e_{\tilde{g}g}$. Since this action just permutes the channels of $\rho_{reg}^G$-fields, it commutes with pointwise nonlinearities like ReLU.
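A quick numpy illustration (hypothetical, for $C_4$): the regular representation permutes the channels cyclically, and such permutations commute with pointwise ReLU:

import numpy as np

N = 4  # cyclic group C_4

def rho_reg(r):
    # regular representation of C_N: cyclic permutation of the N channels by r steps
    return np.roll(np.eye(N), r, axis=0)

f = np.array([-1.0, 2.0, -3.0, 4.0])  # one regular feature vector (N channels)
relu = lambda v: np.maximum(v, 0.0)
for r in range(N):
    assert np.allclose(relu(rho_reg(r) @ f), rho_reg(r) @ relu(f))  # sigma ∘ rho = rho ∘ sigma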


  • While regular steerable CNNs were empirically found to perform very well, they lead to high-dimensional feature spaces, with each individual field consuming $|G|$ channels. The translation of feature maps in conventional CNNs can be viewed as the action of the regular representation of the translation group.


  • Closely related to regular representations are quotient representations. Instead of permuting $|G|$ channels indexed by $G$, they permute $|G|/|H|$ channels indexed by the cosets $gH$ in the quotient space $G/H$ of a subgroup $H \leqslant G$. Specifically, they act on the axes $e_{gH}$ of $\mathbb{R}^{|G|/|H|}$ as defined by $\rho_{quot}^{G/H}(\tilde{g})\, e_{gH} := e_{\tilde{g}gH}$. This definition covers regular representations as the special case of the trivial subgroup $H = \{e\}$.

What is a trivial subgroup?

Every group $G$ contains $\{e\}$ (where $e$ is the identity element) and $G$ itself as subgroups; these are called the trivial subgroups.

What is a coset (and a coset group)?

A coset is defined as follows: let $H$ be a subgroup of $G$. Pick an element $g$ of $G$ that is not in $H$ and multiply it with $H$ to obtain $gH$; note that if the group is non-abelian, $gH$ and $Hg$ differ in general. $gH$ is called a left coset, and one can show that $H$ and $gH$ are disjoint. Similarly, picking an element $g'$ of $G$ lying in neither $H$ nor $gH$ and multiplying it with $H$ gives $g'H$, which overlaps with neither of the former two, and so on. Continuing like this, one partitions $G$ into equally sized, non-overlapping cosets (for finite $G$).

(Figure: partition of $G$ into cosets)

A coset is not a subgroup, since it does not contain the identity element $e$, while $H$ does.

Cosets do not always form a group. The cosets of $N$ form a group if and only if $y^{-1}Ny = N$ for every $y \in G$; such an $N$ is called a normal subgroup, written $N \trianglelefteq G$.

The group of cosets is called the factor group (quotient group), denoted $G/N$; its identity element is $N$.


  • Both regular and quotient representations can be viewed as being induced from the trivial representation of a subgroup $H \leqslant G$, specifically $\rho_{reg}^G = \mathrm{Ind}_{\{e\}}^G\, 1$ and $\rho_{quot}^{G/H} = \mathrm{Ind}_H^G\, 1$. More generally, any representation $\tilde{\rho} : H \rightarrow GL(\mathbb{R}^c)$ can be used to define an induced representation $\rho_{ind} = \mathrm{Ind}_H^G\, \tilde{\rho} : G \rightarrow GL(\mathbb{R}^{c \cdot |G:H|})$. Here $|G:H|$ denotes the index of $H$ in $G$, which corresponds to $|G|/|H|$ if $G$ and $H$ are both finite.


  • Regular and quotient fields can furthermore be acted on by nonlinear pooling operators. Via a group pooling or projection operation $\max : \mathbb{R}^c \rightarrow \mathbb{R},\ f(x) \mapsto \max(f(x))$, these works extract the maximum value of a regular or quotient field. The invariance of the maximum operation implies that the resulting features form scalar fields. Since group pooling operations discard information on feature orientations entirely, vector field nonlinearities $\sigma_{vect} : \mathbb{R}^N \rightarrow \mathbb{R}^2$ for regular representations of $C_N$ were proposed in prior work. Vector field nonlinearities keep not only the maximum response $\max(f(x))$ but also its index $\arg\max(f(x))$. This index corresponds to a rotation angle $\theta = \frac{2\pi}{N}\arg\max(f(x))$, which is used to define a vector field with elements $v(x) = \max(f(x))\,(\cos(\theta), \sin(\theta))^T$.
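A minimal numpy sketch of the vector field nonlinearity described above (an illustrative implementation, not the authors' code):

import numpy as np

def vector_field_nonlinearity(f, N):
    # keep the maximum response and turn its index into an orientation
    theta = 2 * np.pi / N * np.argmax(f)
    return np.max(f) * np.array([np.cos(theta), np.sin(theta)])

f = np.array([0.2, 1.5, 0.3, 0.1])        # responses of N=4 rotated filter copies
print(vector_field_nonlinearity(f, N=4))  # vector of length 1.5 pointing at 90°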


  • In general, any pair of feature fields $f_1 : \mathbb{R}^2 \rightarrow \mathbb{R}^{c_1}$ and $f_2 : \mathbb{R}^2 \rightarrow \mathbb{R}^{c_2}$ can be combined via the tensor product operation $f_1 \otimes f_2$. Given that the individual fields transform under arbitrary representations $\rho_1$ and $\rho_2$, their product transforms under the tensor product representation $\rho_1 \otimes \rho_2$. Any pair $f_1$ and $f_2$ of feature fields can furthermore be concatenated by taking their direct sum $f_1 \oplus f_2 : \mathbb{R}^2 \rightarrow \mathbb{R}^{c_1+c_2}$, which we used in equation (1) to define feature spaces comprising multiple feature fields. The concatenated field transforms according to the direct sum representation as $(\rho_1 \oplus \rho_2)(g)(f_1 \oplus f_2) := \rho_1(g)f_1 \oplus \rho_2(g)f_2$.


Group restrictions and inductions


  • The key idea of equivariant networks is to exploit symmetries in the distribution of characteristic patterns in signals. The level of symmetry present in the data might thereby vary over different length scales. For instance, natural images typically show small features like edges or intensity gradients in arbitrary orientations and reflections. On a larger length scale, however, the rotational symmetry is broken, as manifested in visual patterns exclusively appearing upright but still in different reflections. Each individual layer of a convolutional network should therefore be adapted to the symmetries present at the length scale of its field of view.


  • A loss of symmetry can be implemented by restricting the equivariance constraints at a certain depth to a subgroup $(\mathbb{R}^2, +) \rtimes H \leqslant (\mathbb{R}^2, +) \rtimes G$, where $H \leqslant G$; e.g. from rotations and reflections $G = O(2)$ to mere reflections $H = (\{\pm 1\}, *)$ in the example above. This requires the feature fields produced by a layer with a higher level of equivariance to be reinterpreted in the following layer as fields transforming under the subgroup. Specifically, a $\rho$-field, transforming under a representation $\rho : G \rightarrow GL(\mathbb{R}^c)$, needs to be reinterpreted as a $\tilde{\rho}$-field, where $\tilde{\rho} : H \rightarrow GL(\mathbb{R}^c)$ is a representation of the subgroup $H \leqslant G$. This is naturally achieved by defining $\tilde{\rho}$ to be the restricted representation:

$$\tilde{\rho} \;:=\; \mathrm{Res}_H^G(\rho) \;:\; H \rightarrow GL(\mathbb{R}^c), \quad h \mapsto \rho(h)$$
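To make the restriction concrete, a small numpy sketch (a hypothetical example, not the authors' code): restricting the regular representation of $C_4$ to the subgroup $C_2$ simply evaluates $\rho$ on the subgroup elements, i.e. the rotations by $0$ and $\pi$:

import numpy as np

def rho_reg_c4(r):
    # regular representation of C_4: cyclic permutation of 4 channels
    return np.roll(np.eye(4), r, axis=0)

# Res^{C_4}_{C_2}(rho): evaluate rho only on the elements of C_2 ≤ C_4,
# i.e. the rotations by 0 and pi (indices r = 0 and r = 2)
rho_restricted = {h: rho_reg_c4(r) for h, r in [(0, 0), (1, 2)]}
print(rho_restricted[1])  # the same 4x4 matrix, now read as a C_2 representation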

Implementation details

The authors' implementation involves:

  • computing a basis of $G$-steerable kernels;
  • expanding the steerable kernels in this basis with learnable expansion coefficients;
  • performing the actual convolution routine.

Equation (1):

$$f(x) \;\mapsto\; \left(\left[\mathrm{Ind}_G^{(\mathbb{R}^2, +) \rtimes G}\, \rho\right](tg) \cdot f\right)(x) \;:=\; \rho(g) \cdot f(g^{-1}(x-t)) \tag{1}$$

Equation (2):

$$k(gx) \;=\; \rho_{out}(g)\, k(x)\, \rho_{in}(g^{-1}) \qquad \forall g \in G,\; x \in \mathbb{R}^2 \tag{2}$$

Equation (3):

$$\kappa^{ij}(gx) \;=\; \psi_i(g)\,\kappa^{ij}(x)\,\psi_j^{-1}(g) \qquad \forall g \in G,\; x \in \mathbb{R}^2, \quad \text{where } i \in I_{out},\ j \in I_{in} \tag{3}$$

Equation (4):

$$\{k_1, \cdots, k_d\} \;:=\; \bigcup_{i \in I_{out}} \bigcup_{j \in I_{in}} \left\{ Q_{out}^{-1}\, \bar{\kappa}_l^{ij}\, Q_{in} \right\} \tag{4}$$

Equation (5):

$$\kappa_{\alpha\beta}^{ij}(x(r,\phi)) \;=\; A_{\alpha\beta,0}(r) + \sum_{\mu=1}^{\infty}\left[A_{\alpha\beta,\mu}(r)\cos(\mu\phi) + B_{\alpha\beta,\mu}(r)\sin(\mu\phi)\right] \tag{5}$$

Given input and output types $\rho_{in}$ and $\rho_{out}$ under $G \leqslant O(2)$, the implementation first precomputes a basis $\{k_1, \cdots, k_d\}$ of $G$-steerable kernels satisfying Eq. (2).

To solve the kernel constraint, the authors compute the types and multiplicities of the irreps in the input and output representations.

The change of basis is obtained by solving the linear system $\rho(g) = Q^{-1}\left[\bigoplus_{i \in I}\psi_i(g)\right]Q \;\; \forall g \in G$.

The direct sum is computed in e2cnn/group/representation.py:

cob = np.zeros((size, size))
cob_inv = np.zeros((size, size))
p = 0
for r in reprs:
	# place each representation's change-of-basis block on the diagonal
	cob[p:p + r.size, p:p + r.size] = r.change_of_basis
	cob_inv[p:p + r.size, p:p + r.size] = r.change_of_basis_inv
	p += r.size

For each pair of irreps $\psi_i$ and $\psi_j$ of the input $\rho_{in}$ and output $\rho_{out}$, the authors obtain analytical solutions $\{\kappa_1^{ij}, \dots, \kappa_{d_{ij}}^{ij}\}$.

Together with the change-of-basis matrices $Q_{in}$ and $Q_{out}$, these determine the angular parts of the basis $\{k_1, \dots, k_d\}$ of $G$-steerable kernels.

Since the kernel-space constraint only affects the angular parts of the kernels, the radial profiles can be chosen freely.

The authors choose Gaussian radial profiles $\exp\!\left(-\frac{(r-R)^2}{2\sigma^2}\right)$ of width $\sigma$, centered at radii $R = 0, \dots, \lfloor s/2 \rfloor$ (matching the rings generated in the code below).

Generation of $r$, $R$ and $\sigma$: first, in e2cnn/nn/modules/r2_conv/r2convolution.py:

def compute_basis_params(kernel_size: int,
						 frequencies_cutoff: Union[float, Callable[[float], float]] = None,
						 rings: List[float] = None,
						 sigma: List[float] = None,
						 dilation: int = 1,
						 custom_basis_filter: Callable[[dict], bool] = None):
	if rings is None:
		n_rings = math.ceil(kernel_size / 2)
		rings = torch.linspace(0, (kernel_size - 1) // 2, n_rings) * dilation
		rings = rings.tolist()
	if sigma is None:
		# becomes e.g. [0.6, 0.6, 0.6, 0.4]
		sigma = [0.6] * (len(rings) - 1) + [0.4]
		for i, r in enumerate(rings):
			if r == 0.:
				# avoid sigma = 0, since sigma appears in a denominator
				sigma[i] = 0.005

Then, in e2cnn/kernels/basis.py, the rings (e.g. rings = [0, 1, 2, 3]) are reshaped into $r$ and $R$:

self.radii = np.array(rings).reshape(1, 1, -1, 1)  # this is r
radii = radii.reshape(1, 1, 1, -1)  # this is R
"""squared-distance matrix, with zeros on the diagonal:
[[0,  1,  4,  9],
 [1,  0,  1,  4],
 [4,  1,  0,  1],
 [9,  4,  1,  0]]
"""
d = (self.radii - radii) ** 2
out = np.exp(-0.5 * d / sigma ** 2)

How to use the E2CNN code?

  1. First, fix the input channels, output channels and the rotation setup for the input feature map or image:

# 8 rotations, i.e. in steps of pi/4 = 45°
orientation = 8
# self.gspace holds all the rotation information (dict-like, accessed via attributes)
self.gspace = e2cnn.gspaces.Rot2dOnR2(orientation)
self.in_type = e2cnn.nn.FieldType(self.gspace, [self.gspace.trivial_repr] * input_channels)
self.out_type = e2cnn.nn.FieldType(self.gspace, [self.gspace.trivial_repr] * output_channels)
  2. Then wrap the input feature map or image into a GeometricTensor. This is just a wrapper class; its attributes are accessed via .attribute:

# e2cnn's GeometricTensor; to recover an ordinary tensor (e.g. for a plain convolution), use x.tensor
x = img  # or x = feature_map
x = e2cnn.nn.GeometricTensor(x, self.in_type)
  3. Initialize the e2cnn convolution, ReLU, normalization and pooling modules:

in_type = self.in_type
out_type = self.out_type
stride = 1
padding = 0
dilation = 1
bias = False
self.conv_nxn = e2cnn.nn.R2Conv(in_type,
								out_type,
								n,  # n is the kernel_size
								stride=stride,
								padding=padding,
								dilation=dilation,
								bias=bias,
								sigma=None,
								# frequencies_cutoff may be a float or a callable mapping a radius r to a maximum frequency
								frequencies_cutoff=lambda r: 3 * r)
self.relu = e2cnn.nn.ReLU(self.conv_nxn.out_type, inplace=True)
self.norm = e2cnn.nn.InnerBatchNorm(self.conv_nxn.out_type)
self.pool = e2cnn.nn.PointwiseMaxPool(self.conv_nxn.out_type,
									  kernel_size=n,
									  stride=stride,
									  padding=padding)
self.interp = e2cnn.nn.R2Upsampling(self.pool.out_type,
									scale_factor,
									mode='nearest')
# or, alternatively, use these (pick one of the two variants)
self.relu = e2cnn.nn.ReLU(self.out_type, inplace=True)
self.norm = e2cnn.nn.InnerBatchNorm(self.out_type)
self.pool = e2cnn.nn.PointwiseMaxPool(self.out_type,
									  kernel_size=n,
									  stride=stride,
									  padding=padding)
self.interp = e2cnn.nn.R2Upsampling(self.in_type,
									scale_factor,
									mode='nearest')
								   
  4. Run one convolutional unit:

x = feature_map
x = e2cnn.nn.GeometricTensor(x, self.in_type)
conv_out = self.conv_nxn(x)
relu_out = self.relu(conv_out)
norm_out = self.norm(relu_out)
pool_out = self.pool(norm_out)
interp_out = self.interp(pool_out)
# for ordinary operations (+, -, *, / or plain convolutions), access the .tensor attribute
out = interp_out.tensor
final_out = out * out
  5. The .tensor access is mandatory before any non-e2cnn computation; otherwise an error is raised, since non-e2cnn operations only work on plain tensors or numpy arrays:

# without the .tensor access in the previous step, the operations above would fail
print(type(interp_out))  # GeometricTensor
print(type(out))         # Tensor


Source: blog.csdn.net/Soonki/article/details/131215127