7.2 The Pseudoinverse and the Linear System $A\mathbf{x}=\mathbf{b}$

Orthonormal bases for the four subspaces of a matrix

1. From $A\mathbf{v}_i = \sigma_i\mathbf{u}_i$, the left singular vectors $\mathbf{u}_1,\cdots,\mathbf{u}_r$ belonging to the positive singular values $\sigma_i > 0$ form an orthonormal basis for the column space of $A$.

2. From $A^T\mathbf{u}_i = \sigma_i\mathbf{v}_i$, the left singular vectors $\mathbf{u}_{r+1},\cdots,\mathbf{u}_m$ belonging to the zero singular values $\sigma_i = 0$ form an orthonormal basis for the left null space of $A$.

3. From $A^T\mathbf{u}_i = \sigma_i\mathbf{v}_i$, the right singular vectors $\mathbf{v}_1,\cdots,\mathbf{v}_r$ belonging to the positive singular values $\sigma_i > 0$ form an orthonormal basis for the row space of $A$.

4. From $A\mathbf{v}_i = \sigma_i\mathbf{u}_i$, the right singular vectors $\mathbf{v}_{r+1},\cdots,\mathbf{v}_n$ belonging to the zero singular values $\sigma_i = 0$ form an orthonormal basis for the null space of $A$.
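The four bases can be read directly off a full SVD. The following is a minimal NumPy sketch, not a definitive implementation: the function name `four_subspaces`, the cutoff `tol`, and the example matrix are assumptions made for illustration.

```python
import numpy as np

def four_subspaces(A, tol=1e-10):
    """Split the full SVD of A into orthonormal bases of its four subspaces.

    tol is an assumed cutoff below which a singular value counts as zero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol))           # numerical rank
    return {
        "rank": r,
        "column_space": U[:, :r],      # u_1, ..., u_r
        "left_null_space": U[:, r:],   # u_{r+1}, ..., u_m
        "row_space": Vt[:r, :].T,      # v_1, ..., v_r
        "null_space": Vt[r:, :].T,     # v_{r+1}, ..., v_n
    }

# Example 4x3 matrix of rank 2 (row 2 = 2 * row 1, column 3 = column 1 + column 2).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
S = four_subspaces(A)
print(S["rank"], S["null_space"].ravel())
```

Each basis is stored column-wise, so `A @ S["null_space"]` and `A.T @ S["left_null_space"]` should both vanish up to rounding.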

The pseudoinverse

From $A\mathbf{v}_i = \sigma_i\mathbf{u}_i$, the matrix $A$ maps the orthonormal basis vectors $\mathbf{v}_i$ of $R^n$ to the vectors $\sigma_i\mathbf{u}_i$ in $R^m$, which for $i \le r$ lie in the column space. If a matrix $B$ maps each $\sigma_i\mathbf{u}_i$ back to the basis vector $\mathbf{v}_i$ of $R^n$, then $B$ undoes the action of $A$. That is, $B\sigma_i\mathbf{u}_i = \mathbf{v}_i$, or $B\mathbf{u}_i = (1/\sigma_i) \mathbf{v}_i = \sigma'_i \mathbf{v}_i$ for $i = 1,\cdots,r$ (and $B\mathbf{u}_i = \mathbf{0}$ for $i > r$), so

$$
B = \sigma'_1\mathbf{v}_1\mathbf{u}^T_1+\cdots+\sigma'_r\mathbf{v}_r\mathbf{u}^T_r = V\Sigma^{+} U^T
$$

Note that $\Sigma^{+}$ has size $(n,m)$, so in general it is not a square diagonal matrix. Its leading $(r,r)$ block $\Sigma^{+}_r$ is diagonal with entries $1/\sigma_i > 0$, and every other entry is $0$. The matrix $B=V\Sigma^{+} U^T$ is called the pseudoinverse of $A=U\Sigma V^T$, written $A^{+}$; it is also known as the plus inverse or the Moore-Penrose inverse.
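The construction $A^{+} = V\Sigma^{+}U^T$ can be sketched in a few lines of NumPy and compared against the library routine `np.linalg.pinv`. The helper names `pinv_via_svd` and `pinv_via_rank_one_sum`, the cutoff `tol`, and the test matrix are assumptions for illustration only.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-10):
    """Build A^+ = V Sigma^+ U^T; tol is an assumed zero threshold."""
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol))
    Sigma_plus = np.zeros((n, m))                # note the (n, m) shape
    Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])    # leading r x r diagonal block
    return Vt.T @ Sigma_plus @ U.T

def pinv_via_rank_one_sum(A, tol=1e-10):
    """Same matrix written as the sum of (1/sigma_i) v_i u_i^T over i <= r."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol))
    return sum(np.outer(Vt[i], U[:, i]) / s[i] for i in range(r))

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # rank 2
A_plus = pinv_via_svd(A)
print(A_plus.shape)
```

The explicit `rcond`-style cutoff matters for rank-deficient input: a singular value that is zero in exact arithmetic usually comes out as a tiny nonzero number, and inverting it would ruin the result.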

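Structure of the general solution

A particular solution of $A\mathbf{x}=\mathbf{b}$ is $\mathbf{x}_p=A^{+}\mathbf{b}$, and the null-space solutions are spanned by $\mathbf{v}_{r+1},\cdots,\mathbf{v}_n$, so the general solution is

$$
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_z = A^{+}\mathbf{b} + (k_1\mathbf{v}_{r+1} + \cdots + k_{n-r}\mathbf{v}_n), \quad k_i \text{ arbitrary real numbers}
$$

(When $\mathbf{b}$ does not lie in the column space, the same expression describes all least-squares solutions.)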

Decomposing $\mathbf{b}$ in the orthonormal basis $\mathbf{u}_1,\cdots,\mathbf{u}_m$ gives

$$
\mathbf{b} = \mathbf{u}_1\mathbf{u}^T_1\mathbf{b} + \cdots + \mathbf{u}_m\mathbf{u}^T_m\mathbf{b} = b^U_1\mathbf{u}_1 + \cdots + b^U_m\mathbf{u}_m = U\mathbf{b}^U, \quad \mathbf{b}^U = (b^U_1,\cdots,b^U_m)^T = U^T\mathbf{b}
$$

where $b^U_i = \mathbf{u}^T_i\mathbf{b}$ is the $i$-th coordinate of $\mathbf{b}$ in the basis $U$. Substituting into the formula for the particular solution gives

$$
\begin{aligned}
\mathbf{x}_p &= A^{+}\mathbf{b} \\
&= V\Sigma^{+} U^TU \mathbf{b}^U \\
&= V\Sigma^{+} \mathbf{b}^U \\
&= \frac{b^U_1}{\sigma_1}\mathbf{v}_1 + \cdots + \frac{b^U_r}{\sigma_r}\mathbf{v}_r
\end{aligned}
$$

Hence the general solution is

$$
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_z = \frac{b^U_1}{\sigma_1}\mathbf{v}_1 + \cdots + \frac{b^U_r}{\sigma_r}\mathbf{v}_r + (k_1\mathbf{v}_{r+1} + \cdots + k_{n-r}\mathbf{v}_n)
$$

where $b^U_i= \mathbf{u}^T_i\mathbf{b}$ are the coordinates of $\mathbf{b}$ and the $k_i$ are arbitrary real numbers.
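The coordinate form of the general solution can be checked numerically. This is a small sketch under stated assumptions: the example matrix, the right-hand side (chosen to lie in the column space so that the system is consistent), and the coefficient vector `k` are all made up for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])          # rank 2, n - r = 1
b = A @ np.array([1.0, -1.0, 2.0])       # consistent right-hand side

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10))               # assumed zero threshold

bU = U.T @ b                             # coordinates b_i^U = u_i^T b
x_p = Vt[:r].T @ (bU[:r] / s[:r])        # sum of (b_i^U / sigma_i) v_i, i <= r

k = np.array([0.7])                      # arbitrary null-space coefficients
x = x_p + Vt[r:].T @ k                   # general solution x_p + x_z

print(np.allclose(A @ x, b))
```

Changing `k` moves `x` along the null space without changing `A @ x`.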

This general-solution structure agrees with the solution structure for a full-rank matrix $A$, as the following cases show.

1. When $A$ is square and invertible, i.e. $r = \operatorname{rank}A = m = n$, the factorization $A=U\Sigma V^T$ shows that $\Sigma$ is invertible, so all of its diagonal entries are positive. Then $A^{-1} = (U\Sigma V^T)^{-1} = V \Sigma^{-1} U^T$, and $\Sigma^{-1}$ has diagonal entries $1/\sigma_i$, which is exactly $\Sigma^{+}$. Hence $A^{-1} = A^{+}$.

2. When $A$ has full column rank, i.e. $r = \operatorname{rank}A = n < m$, the left inverse is

$$
\begin{aligned}
A^{-1}_L &= (A^TA)^{-1}A^T = [(U\Sigma V^T)^TU\Sigma V^T]^{-1}(U\Sigma V^T)^T \\
&= (V\Sigma^T U^TU\Sigma V^T)^{-1}(V\Sigma^T U^T) = (V\Sigma^T\Sigma V^T)^{-1}(V\Sigma^T U^T) \\
&= V(\Sigma^T\Sigma)^{-1} V^TV\Sigma^T U^T = V(\Sigma^T\Sigma)^{-1}\Sigma^T U^T = V\Sigma^{+} U^T
\end{aligned}
$$

Here $\Sigma^T\Sigma = \operatorname{diag}(\sigma_1^2,\cdots,\sigma_n^2)$ is invertible and $(\Sigma^T\Sigma)^{-1}\Sigma^T$ is the $(n,m)$ matrix with diagonal entries $1/\sigma_i$, i.e. $\Sigma^{+}$. Hence $A^{-1}_L = A^{+}$. Since $r = n$, the null space contains only the zero vector, so there are no nonzero null-space solutions.

3. When $A$ has full row rank, i.e. $r = \operatorname{rank}A = m < n$, the right inverse is

$$
\begin{aligned}
A^{-1}_R &= A^T(AA^T)^{-1} = (U\Sigma V^T)^T[U\Sigma V^T(U\Sigma V^T)^T]^{-1} \\
&= (V\Sigma^T U^T)(U\Sigma V^TV\Sigma^T U^T)^{-1} = (V\Sigma^T U^T)(U\Sigma\Sigma^T U^T)^{-1} \\
&= V\Sigma^T U^TU(\Sigma\Sigma^T)^{-1} U^T = V\Sigma^T(\Sigma\Sigma^T)^{-1} U^T = V\Sigma^{+}U^T
\end{aligned}
$$

Here $\Sigma\Sigma^T = \operatorname{diag}(\sigma_1^2,\cdots,\sigma_m^2)$ is invertible and $\Sigma^T(\Sigma\Sigma^T)^{-1} = \Sigma^{+}$. Hence $A^{-1}_R = A^{+}$. Since $r = m < n$, nonzero null-space solutions exist.
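The three full-rank cases can be checked numerically. The sketch below uses small hand-picked matrices (assumptions chosen only so that each case is well conditioned) and compares `np.linalg.pinv` with the inverse, the left inverse, and the right inverse.

```python
import numpy as np

# Case 1: square and invertible (det = 18).
A_sq = np.array([[2.0, 1.0, 0.0],
                 [1.0, 3.0, 1.0],
                 [0.0, 1.0, 4.0]])
# Case 2: full column rank, r = n = 2 < m = 3.
A_tall = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
# Case 3: full row rank, r = m = 2 < n = 3.
A_wide = A_tall.T

inverse = np.linalg.inv(A_sq)
left_inverse = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T
right_inverse = A_wide.T @ np.linalg.inv(A_wide @ A_wide.T)

print(np.allclose(np.linalg.pinv(A_sq), inverse))
print(np.allclose(np.linalg.pinv(A_tall), left_inverse))
print(np.allclose(np.linalg.pinv(A_wide), right_inverse))
```

Forming $(A^TA)^{-1}$ explicitly is fine for a demonstration, but it squares the condition number, which is one reason library routines work from the SVD instead.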

When $A$ is rank deficient, i.e. $r = \operatorname{rank}A < \min(m,n)$, what properties does the particular solution $\mathbf{x}_p=A^{+}\mathbf{b}$ have?

$$
\begin{aligned}
AA^{+} &= (U\Sigma V^T)(V\Sigma^{+} U^T)=U\Sigma \Sigma^{+} U^T = U_rU_r^T = \mathbf{u}_1\mathbf{u}^T_1 + \cdots + \mathbf{u}_r\mathbf{u}^T_r \\
A\mathbf{x}_p &= A(A^{+}\mathbf{b}) = (\mathbf{u}_1\mathbf{u}^T_1 + \cdots + \mathbf{u}_r\mathbf{u}^T_r)\mathbf{b} = \mathbf{u}_1\mathbf{u}^T_1\mathbf{b} + \cdots + \mathbf{u}_r\mathbf{u}^T_r\mathbf{b}
\end{aligned}
$$

$A\mathbf{x}_p$ is the projection of $\mathbf{b}$ onto the span of $\mathbf{u}_1,\cdots,\mathbf{u}_r$, and that span is exactly the column space of $A$. So $A\mathbf{x}_p$ is the projection of $\mathbf{b}$ onto the column space of $A$, i.e. $A\mathbf{x}_p = \mathbf{b}_p$, and $P=AA^{+}$ is a projection matrix. Because the residual $\mathbf{b} - A\mathbf{x}_p$ is then orthogonal to the column space, $A^{+}\mathbf{b}$ is a least-squares solution.

Moreover, decomposing $\mathbf{b}$ in the orthonormal basis $\mathbf{u}_1,\cdots,\mathbf{u}_m$ of $R^m$ gives

$$
\mathbf{b} = \mathbf{u}_1\mathbf{u}^T_1\mathbf{b} + \cdots + \mathbf{u}_m\mathbf{u}^T_m\mathbf{b}
$$

so the residual vector is $\mathbf{b} - \mathbf{b}_p = \mathbf{u}_{r+1}\mathbf{u}^T_{r+1}\mathbf{b} + \cdots + \mathbf{u}_m\mathbf{u}^T_m\mathbf{b}$, with norm $\| \mathbf{b} - \mathbf{b}_p \| = \sqrt{(\mathbf{u}^T_{r+1}\mathbf{b})^2 + \cdots + (\mathbf{u}^T_m\mathbf{b})^2}$.

Furthermore,

$$
\begin{aligned}
\mathbf{x}_p &= A^{+}\mathbf{b} = (\sigma'_1\mathbf{v}_1\mathbf{u}^T_1+\cdots+\sigma'_r\mathbf{v}_r\mathbf{u}^T_r)\mathbf{b} = \sigma'_1\mathbf{v}_1\mathbf{u}^T_1\mathbf{b}+\cdots+\sigma'_r\mathbf{v}_r\mathbf{u}^T_r\mathbf{b} = k_1\mathbf{v}_1+\cdots+k_r\mathbf{v}_r \\
k_i &= \sigma'_i\mathbf{u}^T_i\mathbf{b} = \mathbf{u}^T_i\mathbf{b}/\sigma_i
\end{aligned}
$$

That is, the particular solution $\mathbf{x}_p$ lies in the span of $\mathbf{v}_1,\cdots,\mathbf{v}_r$ (the row space) and is orthogonal to the null space of $A$ spanned by $\mathbf{v}_{r+1},\cdots,\mathbf{v}_n$. Any other least-squares solution has the form $\mathbf{x}_p + \mathbf{x}_z$ with $\mathbf{x}_z$ in the null space, and $\|\mathbf{x}_p + \mathbf{x}_z\|^2 = \|\mathbf{x}_p\|^2 + \|\mathbf{x}_z\|^2 \ge \|\mathbf{x}_p\|^2$, so $\mathbf{x}_p$ is the minimum-norm solution.

Therefore the particular solution $A^{+}\mathbf{b}$ is the minimum-norm least-squares solution, which is a very desirable property.
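The minimum-norm least-squares property can be observed numerically. The following sketch assumes a rank-2 example matrix and a right-hand side that is deliberately outside the column space; it compares $A^{+}\mathbf{b}$ with another least-squares solution and with NumPy's `np.linalg.lstsq`, which returns the minimum-norm solution for rank-deficient input.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])          # rank 2
b = np.array([1.0, 0.0, 0.0, 0.0])       # not in the column space of A

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-10))               # assumed zero threshold

x_p = np.linalg.pinv(A, rcond=1e-10) @ b
x_other = x_p + 0.5 * Vt[r]              # add a null-space component

residual = b - A @ x_p
# Residual norm predicted from the coordinates u_{r+1}^T b, ..., u_m^T b.
predicted_norm = np.sqrt(np.sum((U[:, r:].T @ b) ** 2))

x_ls = np.linalg.lstsq(A, b, rcond=1e-10)[0]

print(np.linalg.norm(residual), predicted_norm)
print(np.linalg.norm(x_p), np.linalg.norm(x_other))
```

Both `x_p` and `x_other` give the same residual, but `x_p` has the smaller norm because it carries no null-space component.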

Note that

$$
\begin{aligned}
A^{+}A &= (V\Sigma^{+} U^T)(U\Sigma V^T)=V\Sigma^{+}\Sigma V^T = V_rV_r^T = \mathbf{v}_1\mathbf{v}^T_1 + \cdots + \mathbf{v}_r\mathbf{v}^T_r \\
E-A^{+}A &= V_nV_n^T - V_rV_r^T = \mathbf{v}_{r+1}\mathbf{v}^T_{r+1} + \cdots + \mathbf{v}_n\mathbf{v}^T_n \\
AA^{+} &= U_rU_r^T = \mathbf{u}_1\mathbf{u}^T_1 + \cdots + \mathbf{u}_r\mathbf{u}^T_r \\
E-AA^{+} &= U_mU_m^T - U_rU_r^T = \mathbf{u}_{r+1}\mathbf{u}^T_{r+1} + \cdots + \mathbf{u}_m\mathbf{u}^T_m
\end{aligned}
$$

where $E$ is the identity matrix, $V_n = V$ and $U_m = U$ contain all singular vectors, and $V_r$, $U_r$ contain only the first $r$.

The matrix $P=A^{+}A$ projects onto the row space of $A$, and the matrix $P=AA^{+}$ projects onto the column space of $A$. $E-A^{+}A$ projects onto the null space, and $E-AA^{+}$ projects onto the left null space, i.e. it maps $\mathbf{b}$ to its residual with respect to the column space.
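A projection matrix onto a subspace is symmetric and idempotent, and its trace equals the dimension of that subspace. The sketch below checks these facts for $A^{+}A$ and $AA^{+}$ on an assumed rank-2 example matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])              # rank 2
A_plus = np.linalg.pinv(A, rcond=1e-10)      # assumed cutoff for zero singular values

P_row = A_plus @ A                           # projector onto the row space, 3 x 3
P_col = A @ A_plus                           # projector onto the column space, 4 x 4
N_null = np.eye(3) - P_row                   # projector onto the null space
N_left = np.eye(4) - P_col                   # projector onto the left null space

for name, P in [("P_row", P_row), ("P_col", P_col)]:
    print(name,
          "symmetric:", np.allclose(P, P.T),
          "idempotent:", np.allclose(P @ P, P),
          "trace:", round(float(np.trace(P)), 6))
```

The complementary projectors annihilate $A$ from the matching side: `A @ N_null` and `N_left @ A` are both zero up to rounding.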

Hence the null-space solutions can also be written as

$$
\mathbf{x}_z = (E-A^{+}A)\mathbf{a}, \quad \mathbf{a} \text{ an arbitrary vector}
$$

and the general solution of $A\mathbf{x}=\mathbf{b}$ can also be written as

$$
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_z = A^{+}\mathbf{b} + (E-A^{+}A)\mathbf{a}, \quad \mathbf{a} \text{ an arbitrary vector}
$$

This is equivalent to the basis form

$$
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_z = A^{+}\mathbf{b} + (k_1\mathbf{v}_{r+1} + \cdots + k_{n-r}\mathbf{v}_n), \quad k_i \text{ arbitrary real numbers}
$$
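The projector form of the general solution needs no explicit null-space basis. Here is a hedged sketch: the helper `general_solution`, the example matrix, the consistent right-hand side, and the vector `a` are all assumptions for illustration.

```python
import numpy as np

def general_solution(A, b, a, rcond=1e-10):
    """Return (x_p, x_z) with x_p = A^+ b and x_z = (E - A^+ A) a."""
    A_plus = np.linalg.pinv(A, rcond=rcond)
    n = A.shape[1]
    x_p = A_plus @ b
    x_z = (np.eye(n) - A_plus @ A) @ a
    return x_p, x_z

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])          # rank 2
b = A @ np.array([1.0, -1.0, 2.0])       # consistent right-hand side
a = np.array([3.0, -2.0, 5.0])           # arbitrary vector

x_p, x_z = general_solution(A, b, a)
x = x_p + x_z

# Compare with the basis form: project a onto the null-space basis v_{r+1..n}.
_, s, Vt = np.linalg.svd(A, full_matrices=True)
V_null = Vt[2:].T                        # the rank of this example is 2
print(np.allclose(x_z, V_null @ (V_null.T @ a)))
```

In the basis form the coefficients are $k_i = \mathbf{v}^T_{r+i}\mathbf{a}$, which is how the two descriptions line up.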

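Numerical stability of the particular solution and regularization

$$
\mathbf{x}_p = A^{+}\mathbf{b} = V\Sigma^{+} U^T \mathbf{b}= \left(\frac{1}{\sigma_1}\mathbf{v}_1\mathbf{u}^T_1+\cdots+\frac{1}{\sigma_r}\mathbf{v}_r\mathbf{u}^T_r\right) \mathbf{b}
$$

As a singular value $\sigma_i$ approaches $0$, $1/\sigma_i$ grows without bound, so the particular solution can blow up and becomes numerically unstable. The component $\sigma_i\mathbf{u}_i\mathbf{v}^T_i$ belonging to such a $\sigma_i$ contributes very little to $A$; it is mostly caused by errors and in theory should be $0$. We would therefore like $1/\sigma_i$ to tend to $0$ instead, which can be achieved with the damped reciprocal method introduced in Chapter 5. In that case $\mathbf{v}_i$ should be treated as a null-space solution.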

Chapter 5 showed that the regularized solution is $\mathbf{\hat{x}_\lambda}=(A^TA+\lambda E)^{-1}A^T\mathbf{b}$. Substituting $A=U\Sigma V^T$ reveals what it really does. Note that $A$ is assumed to have full column rank here, and $\Sigma^T\Sigma = \operatorname{diag}(\sigma_1^2,\cdots,\sigma_n^2)$.

$$
\begin{aligned}
\mathbf{\hat{x}_\lambda} &= (V\Sigma^T\Sigma V^T+\lambda VEV^T)^{-1}V\Sigma^T U^T \mathbf{b} \\
&= (V(\Sigma^T\Sigma+\lambda E)V^T)^{-1}V\Sigma^T U^T \mathbf{b} \\
&= V(\Sigma^T\Sigma+\lambda E)^{-1}V^TV\Sigma^T U^T \mathbf{b} \\
&= V(\Sigma^T\Sigma+\lambda E)^{-1}\Sigma^T U^T \mathbf{b} \\
&= \left(\frac{\sigma_1}{\sigma^2_1+\lambda}\mathbf{v}_1\mathbf{u}^T_1+\cdots+\frac{\sigma_n}{\sigma^2_n+\lambda}\mathbf{v}_n\mathbf{u}^T_n\right)\mathbf{b}
\end{aligned}
$$

Comparing with the unregularized solution $\mathbf{\hat{x}_0} = \left(\frac{1}{\sigma_1}\mathbf{v}_1\mathbf{u}^T_1+\cdots+\frac{1}{\sigma_n}\mathbf{v}_n\mathbf{u}^T_n\right)\mathbf{b}$, we see that regularization simply replaces $\frac{1}{\sigma_i}$ with $\frac{\sigma_i}{\sigma^2_i+\lambda}$. This is exactly the damped reciprocal method, with the same parameter $\lambda$ applied to every singular value.
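The equivalence between the normal-equation form and the damped reciprocals $\sigma_i/(\sigma_i^2+\lambda)$ can be verified directly. The sketch below is illustrative only: the function names, the full-column-rank example matrix, and the value of `lam` are assumptions.

```python
import numpy as np

def ridge_via_normal_equations(A, b, lam):
    """Solve (A^T A + lam * E) x = A^T b directly."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def ridge_via_svd(A, b, lam):
    """Same solution using damped reciprocals sigma / (sigma^2 + lam)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    damped = s / (s ** 2 + lam)          # replaces 1 / sigma
    return Vt.T @ (damped * (U.T @ b))

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])               # full column rank
b = np.array([1.0, 0.0, 1.0])
lam = 0.1

x_normal = ridge_via_normal_equations(A, b, lam)
x_svd = ridge_via_svd(A, b, lam)
print(np.allclose(x_normal, x_svd))
```

With `lam = 0` the damped reciprocal reduces to $1/\sigma_i$, so `ridge_via_svd(A, b, 0.0)` agrees with `np.linalg.pinv(A) @ b` for this full-column-rank example.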

It must be pointed out, however, that a singular value $\sigma_r$ close to $0$ does not necessarily make the solution numerically unstable. Recall the general solution

$$
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_z = \frac{b^U_1}{\sigma_1}\mathbf{v}_1 + \cdots + \frac{b^U_r}{\sigma_r}\mathbf{v}_r + (k_1\mathbf{v}_{r+1} + \cdots + k_{n-r}\mathbf{v}_n)
$$

where $b^U_i = \mathbf{u}^T_i\mathbf{b}$ are the coordinates of $\mathbf{b}$ in the basis $U$ and the $k_i$ are arbitrary real numbers. If $b^U_r=0$, the solution is still stable. More generally, when the coordinate of $\mathbf{b}$ along the singular vector $\mathbf{u}_i$ of a very small singular value $\sigma_i$ is itself very small, the ratio $b^U_i/\sigma_i$ stays moderate and the solution remains stable.
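A tiny example makes the role of the coordinate $b^U_i$ visible. The sketch below assumes a diagonal matrix with one singular value of $10^{-8}$, two made-up right-hand sides, and a damping parameter of $10^{-6}$; the helper name `filtered_solve` is hypothetical.

```python
import numpy as np

def filtered_solve(A, b, lam=0.0):
    """x = sum of sigma_i / (sigma_i^2 + lam) * (u_i^T b) * v_i.

    With lam = 0 this is the plain reciprocal 1 / sigma_i, so it assumes
    every singular value is nonzero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    factors = s / (s ** 2 + lam)
    return Vt.T @ (factors * (U.T @ b))

A = np.diag([1.0, 1e-8])                 # one very small singular value

b_clean = np.array([1.0, 0.0])           # no component along the small direction
b_noisy = np.array([1.0, 1e-3])          # small component along it

x_clean = filtered_solve(A, b_clean)             # stays moderate
x_noisy = filtered_solve(A, b_noisy)             # amplified by 1 / 1e-8
x_damped = filtered_solve(A, b_noisy, lam=1e-6)  # damped reciprocal

for name, x in [("clean", x_clean), ("noisy", x_noisy), ("damped", x_damped)]:
    print(name, np.linalg.norm(x))
```

The same small singular value is harmless for `b_clean` and harmful for `b_noisy`, and damping suppresses the amplified component.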

Properties of the pseudoinverse

First, the important properties obtained so far are collected here. The singular value decomposition is

$$
A = U\Sigma V^T=\sigma_1\mathbf{u}_1\mathbf{v}^T_1+\cdots+\sigma_r\mathbf{u}_r\mathbf{v}^T_r, \quad r=\operatorname{rank}A, \quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0
$$

where $U,V$ are orthogonal matrices and $\Sigma$ is a rectangular diagonal matrix. Then

$$
\begin{aligned}
A^{+} &= V\Sigma^{+} U^T=\frac{1}{\sigma_1}\mathbf{v}_1\mathbf{u}^T_1+\cdots+\frac{1}{\sigma_r}\mathbf{v}_r\mathbf{u}^T_r \\
A^T &= V\Sigma^T U^T = \sigma_1\mathbf{v}_1\mathbf{u}^T_1+\cdots+\sigma_r\mathbf{v}_r\mathbf{u}^T_r \\
A^{+}A &= V_rV_r^T = \mathbf{v}_1\mathbf{v}^T_1 + \cdots + \mathbf{v}_r\mathbf{v}^T_r \\
AA^{+} &= U_rU_r^T = \mathbf{u}_1\mathbf{u}^T_1 + \cdots + \mathbf{u}_r\mathbf{u}^T_r \\
A^TA &= V\Sigma^T\Sigma V^T = \sigma^2_1\mathbf{v}_1\mathbf{v}^T_1 + \cdots + \sigma^2_r\mathbf{v}_r\mathbf{v}^T_r \\
AA^T &= U\Sigma\Sigma^T U^T = \sigma^2_1\mathbf{u}_1\mathbf{u}^T_1 + \cdots + \sigma^2_r\mathbf{u}_r\mathbf{u}^T_r
\end{aligned}
$$

When $r = m = n$, i.e. $A$ is invertible, $A^{+} = A^{-1}$.
When $r = n < m$, i.e. $A$ has full column rank, $A^{+} = (A^TA)^{-1}A^T$.
When $r = m < n$, i.e. $A$ has full row rank, $A^{+} = A^T(AA^T)^{-1}$.
When $D=\operatorname{diag}(d_1,\cdots,d_n)$ is a diagonal matrix, $D^{+} = \operatorname{diag}(d^{+}_1,\cdots,d^{+}_n)$, where $d^{+}_i = 1/d_i$ if $d_i \ne 0$ and $d^{+}_i = 0$ otherwise.

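The pseudoinverse also has the following properties:

$$
\begin{aligned}
AA^{+}A &= A \\
A^{+}AA^{+} &= A^{+} \\
(A^{+}A)^T &= A^{+}A \\
(AA^{+})^T &= AA^{+}
\end{aligned}
$$

These four properties can be used to define the pseudoinverse; in other words, $A^{+}$ is uniquely determined by them. They are called the Moore-Penrose equations.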
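The four Moore-Penrose equations are easy to test numerically. The checker below is a minimal sketch: the function name `satisfies_moore_penrose`, the tolerance, and the example matrix are assumptions, and the transpose is included only as a candidate that is expected to fail.

```python
import numpy as np

def satisfies_moore_penrose(A, B, atol=1e-10):
    """Return True if B satisfies all four Moore-Penrose equations for A."""
    return bool(
        np.allclose(A @ B @ A, A, atol=atol)
        and np.allclose(B @ A @ B, B, atol=atol)
        and np.allclose((B @ A).T, B @ A, atol=atol)
        and np.allclose((A @ B).T, A @ B, atol=atol)
    )

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])              # rank 2

A_plus = np.linalg.pinv(A, rcond=1e-10)

print(satisfies_moore_penrose(A, A_plus))    # the pseudoinverse passes
print(satisfies_moore_penrose(A, A.T))       # the transpose does not
```

The transpose has the right shape and makes both products symmetric, but it fails $AA^{+}A = A$ because the singular values of this example are not all equal to $1$.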

$$
\begin{aligned}
(A^{+})^{+} &= A \\
(A^T)^{+} &= (A^{+})^T=A^{+T} \\
(A^TA)^{+} &= A^{+}(A^T)^{+} \\
\operatorname{rank}A &= \operatorname{rank}A^{+} = \operatorname{rank}A^{+}A = \operatorname{rank}AA^{+} \\
A^{+} &= (A^TA)^{+}A^T = A^T(AA^T)^{+}
\end{aligned}
$$

These properties can be verified by straightforward calculation. The pseudoinverse $A^{+}$ therefore behaves much like the inverse $A^{-1}$, with one important exception: $(AB)^{+} = B^{+}A^{+}$ does not hold in general.
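The identities above, and the failure of the product rule, can be checked on small examples. In this sketch the wrapper `pinv`, its cutoff, and all matrices are assumptions chosen for illustration; the counterexample uses a $1\times 2$ and a $2\times 1$ matrix.

```python
import numpy as np

def pinv(M):
    """np.linalg.pinv with an explicit (assumed) cutoff for zero singular values."""
    return np.linalg.pinv(M, rcond=1e-10)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])              # rank 2

identities = {
    "(A^+)^+ = A": np.allclose(pinv(pinv(A)), A),
    "(A^T)^+ = (A^+)^T": np.allclose(pinv(A.T), pinv(A).T),
    "(A^T A)^+ = A^+ (A^T)^+": np.allclose(pinv(A.T @ A), pinv(A) @ pinv(A.T)),
    "A^+ = (A^T A)^+ A^T": np.allclose(pinv(A), pinv(A.T @ A) @ A.T),
    "A^+ = A^T (A A^T)^+": np.allclose(pinv(A), A.T @ pinv(A @ A.T)),
}
for name, ok in identities.items():
    print(name, ok)

# Counterexample for the product rule.
A1 = np.array([[1.0, 1.0]])                  # 1 x 2
B1 = np.array([[1.0], [0.0]])                # 2 x 1
lhs = pinv(A1 @ B1)                          # (AB)^+
rhs = pinv(B1) @ pinv(A1)                    # B^+ A^+
print(lhs, rhs)
```

Here $AB = 1$, so $(AB)^{+} = 1$, while $B^{+}A^{+} = (1, 0)\,(1/2, 1/2)^T = 1/2$.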

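**For every matrix, the pseudoinverse exists and is unique.** This is a very useful property, because it guarantees that every linear system has a minimum-norm least-squares solution. By contrast, an inverse exists (and is then unique) only when the columns of the matrix form a basis; a left inverse exists only when the columns are linearly independent, and is in general not unique; a right inverse exists only when the rows are linearly independent, and is in general not unique. Existence of the pseudoinverse is guaranteed by the spectral decomposition theorem for symmetric matrices, and uniqueness by the Moore-Penrose equations; the detailed proofs are omitted.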

From $AA^{+}A = A$ we get $AA^{+}\mathbf{a}_i = \mathbf{a}_i$ for every column $\mathbf{a}_i$ of $A$. Since $AA^{+} = U_rU^T_r$, this gives $U_rU^T_r\mathbf{a}_i = \mathbf{a}_i$, i.e. every column of $A$ lies in the subspace spanned by the columns of $U_r$.

Reposted from blog.csdn.net/jhshanvip/article/details/106043323