Tensor decomposition and applications - study notes [03]

4. Compression and Tucker Decomposition

4.0 Definition of the Tucker Decomposition

  • The Tucker decomposition can be regarded as a higher-order PCA: a tensor is decomposed into a core tensor multiplied by a matrix along each mode. Thus, for a three-way tensor \(\mathcal{X} \in \mathbb{R}^{I \times J \times K}\), the decomposition is
    \[ (4.1)\quad \mathcal{X} \approx \mathcal{G} \times_1 \mathrm{A} \times_2 \mathrm{B} \times_3 \mathrm{C} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, \mathrm{a}_p \circ \mathrm{b}_q \circ \mathrm{c}_r = [\![\mathcal{G}\,;\mathrm{A},\mathrm{B},\mathrm{C}]\!]. \]
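
As a concrete illustration of (4.1), here is a minimal NumPy sketch (the sizes and variable names are made up for the example, not taken from any paper) that builds a tensor from a core \(\mathcal{G}\) and factor matrices \(\mathrm{A}, \mathrm{B}, \mathrm{C}\):

```python
import numpy as np

# Hypothetical sizes for illustration: X is I x J x K, the core G is P x Q x R.
I, J, K = 6, 5, 4
P, Q, R = 3, 2, 2

rng = np.random.default_rng(0)
G = rng.standard_normal((P, Q, R))   # core tensor
A = rng.standard_normal((I, P))      # mode-1 factor matrix
B = rng.standard_normal((J, Q))      # mode-2 factor matrix
C = rng.standard_normal((K, R))      # mode-3 factor matrix

# X = G x_1 A x_2 B x_3 C, i.e. x_ijk = sum_{pqr} g_pqr a_ip b_jq c_kr
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(X.shape)   # (6, 5, 4)
```

The `einsum` expression is a direct transcription of the elementwise formula given a couple of bullets further down.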

  • Here \(\mathrm{A}\in\mathbb{R}^{I\times P}\), \(\mathrm{B}\in\mathbb{R}^{J\times Q}\), and \(\mathrm{C}\in\mathbb{R}^{K\times R}\) are called the factor matrices; they are usually taken to be orthogonal and can be thought of as the principal components in each mode. The tensor \(\mathcal{G}\in\mathbb{R}^{P\times Q\times R}\) is called the core tensor; each of its entries indicates the level of interaction between the corresponding components.

  • Elementwise, the Tucker decomposition can be written as:
    \[ x_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, a_{ip} b_{jq} c_{kr} \quad \text{for} \quad i = 1, \dots, I,\; j = 1, \dots, J,\; k = 1, \dots, K. \]

  • \(P\), \(Q\), and \(R\) are the numbers of components (i.e., columns) of the corresponding factor matrices \(A\), \(B\), and \(C\). If \(P, Q, R\) are smaller than \(I, J, K\), the core tensor \(\mathcal{G}\) can be viewed as a compressed version of \(\mathcal{X}\). In some cases the compressed representation needs far less storage than the original tensor (a worked size example follows).
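
    For a sense of scale, take the hypothetical sizes \(I=J=K=100\) and \(P=Q=R=10\): the full tensor stores \(100^3 = 1{,}000{,}000\) entries, whereas the Tucker form stores \(10^3 + 3\cdot 100\cdot 10 = 4{,}000\) entries (core plus three factor matrices), a compression factor of 250.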

  • Most fitting algorithms assume that the columns of the factor matrices are orthonormal, but in fact this is not required. Indeed, the CP decomposition can be regarded as a special case of Tucker: the one in which the core tensor is superdiagonal and \(P = Q = R\).

  • The matricized forms are:
    \[ \mathrm{X}_{(1)} \approx \mathrm{A}\mathrm{G}_{(1)}(\mathrm{C}\otimes \mathrm{B})^\mathsf{T}, \]
    \[ \mathrm{X}_{(2)} \approx \mathrm{B}\mathrm{G}_{(2)}(\mathrm{C}\otimes \mathrm{A})^\mathsf{T}, \]
    \[ \mathrm{X}_{(3)} \approx \mathrm{C}\mathrm{G}_{(3)}(\mathrm{B}\otimes \mathrm{A})^\mathsf{T}. \]
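
As a quick numerical check of the three matricized identities above, the following sketch (assuming a mode-n unfolding in which earlier modes vary fastest along the columns, which is what makes the Kronecker ordering below work out; the helper name `unfold` is mine) builds a small Tucker tensor and verifies each identity:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move axis `mode` to the front, then flatten the remaining
    # axes in column-major order (earlier modes vary fastest).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 2, 2))
A = rng.standard_normal((6, 3))
B = rng.standard_normal((5, 2))
C = rng.standard_normal((4, 2))
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)   # X = G x_1 A x_2 B x_3 C (exact)

print(np.allclose(unfold(X, 0), A @ unfold(G, 0) @ np.kron(C, B).T))   # True
print(np.allclose(unfold(X, 1), B @ unfold(G, 1) @ np.kron(C, A).T))   # True
print(np.allclose(unfold(X, 2), C @ unfold(G, 2) @ np.kron(B, A).T))   # True
```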

  • The equations above use a three-way tensor as the example, but nothing prevents the concept from extending to N-way tensors:

\[ \mathcal{X} = \mathcal{G} \times_1 \mathrm{A}^{(1)} \times_2 \mathrm{A}^{(2)}\dots \times_N \mathrm{A}^{(N)} = [\![\mathcal{G}; \mathrm{A}^{(1)}, \mathrm{A}^{(2)}, \dots , \mathrm{A}^{(N)}]\!] \]

  • Each element is

\[ x_{i_1i_2\dots i_N}=\sum_{r_1=1}^{R_1} \sum_{r_2 = 1}^{R_2} \dots \sum_{r_N=1}^{R_N}g_{r_1r_2\dots r_N}a^{(1)}_{i_1r_1}a^{(2)}_{i_2r_2}\dots a_{i_N r_N}^{(N)} \\ \quad \text{for} \quad i_n =1,\dots,I_n, n=1,\dots,N. \]

  • The matricized version can be written as
    \[ \mathrm{X}_{(n)} = \mathrm{A}^{(n)} \mathrm{G}_{(n)} (\mathrm{A}^{(N)} \otimes \dots \otimes \mathrm{A}^{(n+1)} \otimes \mathrm{A}^{(n-1)} \otimes \dots \otimes \mathrm{A}^{(1)})^\mathsf{T}. \]

  • Two other Tucker variants deserve special attention. The first is the Tucker2 decomposition. As the name suggests, it decomposes along only two modes; the third factor matrix is the identity, so the Tucker2 decomposition of a third-order tensor can be written in the following form:
    \[ \mathcal{X} = \mathcal{G} \times_1 \mathrm{A} \times_2 \mathrm{B} = [\![\mathcal{G}\,;\mathrm{A},\mathrm{B},\mathrm{I}]\!]. \]
  • This is essentially the same as the original Tucker decomposition, except that \(\mathcal{G}\in\mathbb{R}^{P\times Q\times R}\) with \(R = K\) and \(\mathrm{C} = \mathrm{I}\), the \(K\times K\) identity matrix. Similarly, the Tucker1 decomposition uses only one factor matrix, with the remaining matrices set to the identity. For example, taking the second and third factor matrices to be the identity, we get:

\[ \mathcal{X} = \mathcal{G} \times_1 \mathrm{A} = [\![\mathcal{G}\,;\mathrm{A,I,I}]\!]. \]

  • This is equivalent to standard two-dimensional PCA (principal component analysis), since
    \[ \mathrm{X}_{(1)} = \mathrm{A}\,\mathrm{G}_{(1)}. \]
  • These concepts extend easily to N-way tensors: any subset of the factor matrices can be set to the identity.

  • Clearly, tensor decomposition offers many modeling choices, which can make it difficult to select a model for a specific task. For guidance on choosing among three-way models, refer to this paper.

4.1. The n-Rank

  • Let \(\mathcal{X}\) be an N-way tensor of size \(I_1 \times I_2 \times \dots \times I_N\). Its n-rank, written \(\text{rank}_n(\mathcal{X})\), is the column rank of \(\mathrm{X}_{(n)}\). In other words, the n-rank is the dimension of the vector space spanned by the mode-n fibers. If we let \(R_n = \text{rank}_n(\mathcal{X})\,\text{for}\, n = 1, \dots, N\), we can say that \(\mathcal{X}\) is a \(\text{rank-}(R_1, R_2, \dots, R_N)\) tensor.

  • Be careful not to confuse the n-rank with the rank, i.e., the minimum number of rank-one components.

  • Clearly, \(R_n \leq I_n \text{ for all }n=1,\dots,N.\)
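
A small sketch of how the n-ranks can be computed in practice (a hypothetical example; the `unfold` helper follows the same convention as before): the n-rank is simply the matrix rank of each unfolding.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding (earlier modes vary fastest along the columns).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

rng = np.random.default_rng(0)
# Build a 6 x 5 x 4 tensor whose n-ranks are (3, 2, 2) by construction.
G = rng.standard_normal((3, 2, 2))
A = rng.standard_normal((6, 3))
B = rng.standard_normal((5, 2))
C = rng.standard_normal((4, 2))
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

n_ranks = [np.linalg.matrix_rank(unfold(X, n)) for n in range(X.ndim)]
print(n_ranks)   # [3, 2, 2] for generic random factors
```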

  • For a given tensor \(\mathcal{X}\), we can easily find an exact Tucker decomposition of rank \(\big(R_1, R_2, \dots, R_N\big)\) where \(R_n = \text{rank}_n(\mathcal{X})\). If, however, we compute a Tucker decomposition with ranks below these values, i.e., \(R_n < \text{rank}_n(\mathcal{X})\) for some n, the result will not necessarily be exact and the computation becomes harder. The figure below shows a truncated Tucker decomposition (which is not necessarily obtained by truncating an exact Tucker decomposition); such a decomposition will not reproduce \(\mathcal{X}\) exactly.

4.2 Computing the Tucker Decomposition

  • One feasible way to compute the Tucker decomposition follows from the Tucker1 algorithm described earlier: for each mode, find the matrix that best captures the variation in the mode-n fibers. After later research and analysis, this method became known as the higher-order SVD (HOSVD). Researchers noted that the HOSVD is a convincing generalization of the matrix SVD and discussed more computationally efficient ways of obtaining the leading singular vectors of \(\mathrm{X}_{(n)}\). When \(R_n < \text{rank}_n(\mathcal{X})\) for some n, the result is called the truncated HOSVD. In fact, the core tensor of the HOSVD is all-orthogonal, which is relevant when truncating the decomposition. For details, see this paper. A small sketch follows below.
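
The following is a minimal sketch of the truncated HOSVD as described above, under the same unfolding convention as the earlier sketches; the function and variable names are my own, not from the paper:

```python
import numpy as np

def unfold(T, mode):
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

def mode_dot(T, M, mode):
    # Mode-n product T x_n M, where M has shape (J, I_n).
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(X, ranks):
    # Leading R_n left singular vectors of each unfolding ...
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    # ... then the core is X projected onto them: G = X x_1 A1^T ... x_N AN^T.
    G = X
    for n, U in enumerate(factors):
        G = mode_dot(G, U.T, n)
    return G, factors

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
G, (A, B, C) = truncated_hosvd(X, (3, 3, 3))
X_hat = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # relative truncation error
```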

  • The truncated HOSVD is not optimal in terms of minimizing the norm of the approximation error, but it gives a good starting point for an iterative ALS algorithm. In 1980, TUCKALS3, an ALS algorithm for computing the Tucker decomposition of three-way tensors, was introduced; the method was later extended to N-way tensors. At the same time, a more efficient way of computing the factor matrices came into use: in short, compute only the dominant singular vectors of \(\mathrm{X}_{(n)}\), using an SVD in place of an eigenvalue decomposition, or compute only an orthonormal basis of the dominant subspace. This more efficient algorithm is known as the higher-order orthogonal iteration (HOOI); see Figure 4.4 below. A minimal code sketch also follows.
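
Below is a minimal sketch of HOOI as summarized above (HOSVD initialization, then alternating updates by dominant singular vectors). It uses a fixed iteration count rather than a proper convergence test, and the helper names are assumptions of this sketch:

```python
import numpy as np

def unfold(T, mode):
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

def mode_dot(T, M, mode):
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hooi(X, ranks, n_iter=50):
    N = X.ndim
    # Initialize the factor matrices with the truncated HOSVD.
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(N):
            # Project X onto every factor except the n-th ...
            Y = X
            for m in range(N):
                if m != n:
                    Y = mode_dot(Y, factors[m].T, m)
            # ... and keep the leading R_n left singular vectors of the unfolding.
            factors[n] = np.linalg.svd(unfold(Y, n), full_matrices=False)[0][:, :ranks[n]]
    # Core tensor: G = X x_1 A1^T x_2 ... x_N AN^T.
    G = X
    for n in range(N):
        G = mode_dot(G, factors[n].T, n)
    return G, factors

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
G, factors = hooi(X, (3, 3, 3))
X_hat = np.einsum('pqr,ip,jq,kr->ijk', G, *factors)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```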

  • Let \(\mathcal{X}\) be a tensor of size \(I_1 \times I_2 \times \dots \times I_N\). The optimization problem we wish to solve can be written as:
    \[ \begin{aligned} (4.3) \quad \min_{\mathcal{G}, \mathrm{A}^{(1)},\dots,\mathrm{A}^{(N)}} &\Big|\Big| \mathcal{X} - [\![\mathcal{G}\,;\mathrm{A}^{(1)},\mathrm{A}^{(2)},\dots,\mathrm{A}^{(N)}]\!]\Big|\Big| \\ &\mathcal{G}\in\mathbb{R}^{R_1\times R_2 \times \dots \times R_N},\,\mathrm{A}^{(n)}\in \mathbb{R}^{I_n \times R_n}, \text{columnwise orthogonal for all } n \end{aligned} \]

  • Rewritten in matrix (vectorized) form, the objective above becomes:
    \[ \big|\big| \text{vec}(\mathcal{X}) - (\mathrm{A}^{(N)}\otimes\mathrm{A}^{(N-1)}\otimes \dots \otimes \mathrm{A}^{(1)})\text{vec}(\mathcal{G}) \big|\big| \]

  • Clearly, the core tensor \(\mathcal{G}\) must satisfy
    \[ \mathcal{G} = \mathcal{X} \times_1 \mathrm{A}^{(1)\mathsf{T}} \times_2 \mathrm{A}^{(2)\mathsf{T}} \dots \times_N \mathrm{A}^{(N)\mathsf{T}}. \]

  • We can then rewrite (the square of) the objective above as:
    \[ \begin{aligned} \Big|\Big|\mathcal{X} - &[\![\mathcal{G}\,; \mathrm{A}^{(1)}, \mathrm{A}^{(2)},\dots,\mathrm{A}^{(N)}] \!]\Big|\Big|^2 \\&= ||\mathcal{X}{||}^2 - 2 \langle \mathcal{X}, [\![\mathcal{G}\,;\mathrm{A}^{(1)},\mathrm{A}^{(2)},\,\dots,\mathrm{A}^{(N)}]\!]\rangle + ||[\![\mathcal{G}\,;\mathrm{A}^{(1)},\mathrm{A}^{(2)},\dots,\mathrm{A}^{(N)}]\!]{||}^2\\ &= ||\mathcal{X}{||}^2 - 2\langle \mathcal{X} \times_1 \mathrm{A}^{(1)\mathsf{T}}\dots \times_N \mathrm{A}^{(N)\mathsf{T}},\mathcal{G}\rangle + ||\mathcal{G}{||}^2\\ &= ||\mathcal{X}{||}^2 - 2\langle \mathcal{G},\,\mathcal{G}\rangle + ||\mathcal{G}{||}^2\\ &= ||\mathcal{X}{||}^2 - ||\mathcal{G}{||}^2\\ &= ||\mathcal{X}{||}^2 - ||\mathcal{X}\times_1 \mathrm{A}^{(1)\mathsf{T}} \times_2 \dots \times_N \mathrm{A}^{(N)\mathsf{T}}{||}^2. \end{aligned} \]
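
The final identity is easy to verify numerically. Here is a small sketch (random data, hypothetical sizes) that checks \(||\mathcal{X} - [\![\mathcal{G}\,;\mathrm{A},\mathrm{B},\mathrm{C}]\!]||^2 = ||\mathcal{X}||^2 - ||\mathcal{G}||^2\) when the factor matrices have orthonormal columns and \(\mathcal{G}\) is chosen as above:

```python
import numpy as np

def mode_dot(T, M, mode):
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))

# Column-orthonormal factor matrices (QR of random matrices).
A = np.linalg.qr(rng.standard_normal((6, 3)))[0]
B = np.linalg.qr(rng.standard_normal((5, 3)))[0]
C = np.linalg.qr(rng.standard_normal((4, 3)))[0]

# G = X x_1 A^T x_2 B^T x_3 C^T
G = mode_dot(mode_dot(mode_dot(X, A.T, 0), B.T, 1), C.T, 2)
X_hat = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

lhs = np.linalg.norm(X - X_hat) ** 2
rhs = np.linalg.norm(X) ** 2 - np.linalg.norm(G) ** 2
print(np.isclose(lhs, rhs))   # True
```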

  • The details of this derivation are given in several papers, such as this paper and this paper.

  • We can still use ALS to solve objective (4.3). Since \(||\mathcal{X}||\) is a constant, (4.3) can be recast as a series of subproblems, one maximization problem for each n-th component matrix:
    \[ (4.4)\quad\quad\max_{\mathrm{A}^{(n)}}\Big|\Big|\mathcal{X}\times_1 \mathrm{A}^{(1)\mathsf{T}} \times_2 \mathrm{A}^{(2)\mathsf{T}}\dots \times_N\mathrm{A}^{(N)\mathsf{T}}\Big|\Big|\\ \text{subject to $\mathrm{A}^{(n)}\in \mathbb{R}^{I_n\times R_n}$ and columnwise orthogonal.} \]
  • Objective (4.4) can also be written in matrix form:
    \[ \Big|\Big|\mathrm{A}^{(n)\mathsf{T}}\mathrm{W}\Big|\Big| \text{ with } \mathrm{W} = \mathrm{X}_{(n)}(\mathrm{A}^{(N)}\otimes\dots\otimes\mathrm{A}^{(n+1)}\otimes\mathrm{A}^{(n-1)}\otimes\dots\otimes\mathrm{A}^{(1)}). \]
  • The solution is determined by an SVD: simply take the leading \(R_n\) singular vectors of \(\mathrm{W}\) as \(\mathrm{A}^{(n)}\). This method converges to a solution at which the objective function no longer decreases, but it is not guaranteed to converge to a global optimum, or even to a stationary point.

  • More recently, Newton-Grassmann optimization has also been considered for computing the Tucker decomposition. It guarantees convergence to a stationary point and does so faster (even though each iteration becomes more expensive). For details, refer to this paper.

  • This paper focuses on how to choose the ranks for the Tucker decomposition; similar to the discussion for the CP decomposition, the choice is made by computing a HOSVD.

4.3. Lack of Uniqueness and How to Overcome It

  • The result of a Tucker decomposition is not unique. Consider the general three-way decomposition (4.1, see the beginning of this article), and let \(\mathrm{U}\in\mathbb{R}^{P\times P}\), \(\mathrm{V}\in \mathbb{R}^{Q\times Q}\), and \(\mathrm{W}\in\mathbb{R}^{R\times R}\) be nonsingular matrices. Then:
    \[ [\![\mathcal{G}\,;\mathrm{A},\mathrm{B},\mathrm{C}]\!] = [\![\mathcal{G} \times_1 \mathrm{U} \times_2 \mathrm{V} \times_3 \mathrm{W}\,;\mathrm{AU}^{-1}, \mathrm{BV}^{-1}, \mathrm{CW}^{-1}]\!]. \]

  • In other words, we can cancel any modification of the core tensor \(\mathcal{G}\) by applying the inverse modification to the factor matrices, so the core tensor can be changed at will without affecting the fit.
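
A quick numerical illustration of this non-uniqueness (random data, hypothetical sizes): transform the core by \(\mathrm{U}, \mathrm{V}, \mathrm{W}\) and the factors by the corresponding inverses, and the reconstructed tensor does not change.

```python
import numpy as np

def mode_dot(T, M, mode):
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 2, 2))
A = rng.standard_normal((6, 3))
B = rng.standard_normal((5, 2))
C = rng.standard_normal((4, 2))

# Arbitrary nonsingular transformations of the three core modes.
U = rng.standard_normal((3, 3))
V = rng.standard_normal((2, 2))
W = rng.standard_normal((2, 2))

X1 = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
G2 = mode_dot(mode_dot(mode_dot(G, U, 0), V, 1), W, 2)   # G x_1 U x_2 V x_3 W
X2 = np.einsum('pqr,ip,jq,kr->ijk', G2,
               A @ np.linalg.inv(U), B @ np.linalg.inv(V), C @ np.linalg.inv(W))
print(np.allclose(X1, X2))   # True: the fit is unchanged
```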

  • This freedom has led to a line of work: choose a transformation that makes most of the elements of the core tensor zero, improving uniqueness as far as possible by reducing the interaction between components. Making the core superdiagonal has been shown to be impossible in general, but making as many elements as possible zero or close to zero is achievable. One approach minimizes an objective measuring the simplicity of the core; another uses Jacobi-type algorithms to maximize the diagonal entries; finally, as mentioned, the HOSVD yields an all-orthogonal core, and this special structure may also prove useful.

This concludes the introduction to the Tucker decomposition. The article still lacks some examples and explanations of the principles in the cited papers; these will be added later.
The next chapter will be the last: an introduction to several other decompositions. Stay tuned!
