解对称正定矩阵线性方程组的平方根方法

0. 引言

在求解线性方程组中,有一类比较特殊的系数矩阵即对称正定矩阵的方程求解。这一类矩阵方程使用普通线性方程求解的方法比较繁琐。平方根方法是一种对称正定矩阵的一种三角分解方法,下面介绍这一中方法。

1. 对称正定矩阵线性方程组

1.1 对称正定矩阵及其三角分解法

什么是对称形式的正定矩阵?
定义 A ∈ R n × n A\in{\mathbb{R}^{n\times{n}}} ARn×n,如果矩阵 A A A满足:
(1) A T = A A^{T}=A AT=A
(2) ∀ x ⃗ ∈ R n \forall \vec{x}\in{\mathbb{R}^{n}} x Rn并且 x ⃗ ≠ 0 , x ⃗ A x ⃗ > 0 \vec{x}\neq{0},\vec{x}A\vec{x}>0 x =0,x Ax >0
则称 A A A为对称正定矩阵。
性质 设矩阵 A A A是对称正定阵,那么它必有以下的性质:
(1) A A A是非奇异矩阵,并且 A − 1 A^{-1} A1也是对称正定阵;
(2) A A A的顺序主子阵 A k , ( k = 1 , 2 , . . . , n ) A_{k},(k=1,2,...,n) Ak,(k=1,2,...,n)是对称正定阵;
(3) A A A的顺序主子式都大于零,即 det ⁡ ( A k ) > 0 , ( k = 1 , 2 , . . . , n ) \det(A_{k})>0,(k=1,2,...,n) det(Ak)>0,(k=1,2,...,n)
(4) A A A的特征值 λ i ( A ) > 0 , ( i = 1 , 2 , . . . , n ) \lambda_{i}(A)>0,(i=1,2,...,n) λi(A)>0,(i=1,2,...,n)

对于性质(1),由于 A T = A A^{T}=A AT=A,所以 A − 1 = ( A T ) − 1 = ( A − 1 ) T A^{-1}=(A^{T})^{-1}=(A^{-1})^{T} A1=(AT)1=(A1)T,证毕。
对于性质(3),设 λ \lambda λ A ∈ R n × n A\in{\mathbb{R}^{n\times{n}}} ARn×n的一个实特征值, x ∈ R n x\in{R^{n}} xRn λ \lambda λ对应的特征向量,那么 A x = λ x Ax=\lambda x Ax=λx,故而 λ x T x = x T A x > 0 \lambda x^{T}x=x^{T}Ax>0 λxTx=xTAx>0,从而有 λ > 0 \lambda>0 λ>0

对称正定矩阵的判别方法(充分条件)
(1) 如果 A ∈ R n × n A\in{\mathbb{R}^{n\times{n}}} ARn×n是对称矩阵,并且 det ⁡ A k > 0 , ( k = 1 , 2 , . . . , n ) \det{A_{k}}>0,(k=1,2,...,n) detAk>0,(k=1,2,...,n),那么 A A A是正定阵。
(2) 如果 A ∈ R n × n A\in{\mathbb{R}^{n\times{n}}} ARn×n是对称矩阵, A A A的特征值 λ i > 0 , ( i = 1 , 2 , . . . , n ) \lambda_{i}>0,(i=1,2,...,n) λi>0,(i=1,2,...,n),则 A A A是对称正定阵。
对称矩阵的三角分解方法
首先我们引入下面的定理:
定理一:对于任意矩阵 X ∈ R n × n X\in{\mathbb{R}^{n\times{n}}} XRn×n,若 X X X的各阶顺序主子式均不为零,那么 X X X具有唯一的Doolittle分解: X = L ~ U ~ X=\tilde{L}\tilde{U} X=L~U~,其中 L ~ , U ~ \tilde{L},\tilde{U} L~,U~分别为下三角矩阵和上三角矩阵。
定理二:若矩阵 X ∈ R n × n X\in{\mathbb{R}^{n\times{n}}} XRn×n是非奇异矩阵,并且Doolittle分解存在,则Doolittle分解是唯一的。
特别地,对于三角矩阵 L ~ , U ~ \tilde{L},\tilde{U} L~,U~我们可以分解为 L ~ = L D 1 , U ~ = D 2 U \tilde{L}=LD_{1},\tilde{U}=D_{2}U L~=LD1,U~=D2U,其中 L , U L,U L,U是单位三角矩阵, D 1 , D 2 D_{1},D_{2} D1,D2是对角矩阵。所以对于矩阵 A A A,则有
X = L ~ U ~ = L D 1 D 2 U X=\tilde{L}\tilde{U}=LD_{1}D_{2}U X=L~U~=LD1D2U

D = D 1 D 2 D=D_{1}D_{2} D=D1D2,所以对于一般矩阵来说,可以分解为一个下三角矩阵、对角矩阵和一个上三角矩阵的乘积,并且这样的分解唯一:
X = L D U X=LDU X=LDU

现在,我们设 A A A是实对称正定矩阵,根据定理一以及对称正定矩阵的性质我们可以知道,存在Doolittle分解,所以必存在唯一的上三角矩阵 L L L,对角矩阵 D D D以及下三角矩阵 U U U以及对角矩阵 D D D,使得
A = L D U A=LDU A=LDU

根据对称矩阵的性质可知 A = A T A=A^{T} A=AT,所以得到
L D U = U T D T L T = U T D L T LDU=U^{T}D^{T}L^{T}=U^{T}DL^{T} LDU=UTDTLT=UTDLT
由于唯一性可以得到 L = U T , U = L T L=U^{T},U=L^{T} L=UT,U=LT,所以有
A = L T D L A=L^{T}DL A=LTDL
其中,对角矩阵 D = diag ( d 1 , d 2 , . . . , d n ) D=\text{diag}(d_{1},d_{2},...,d_{n}) D=diag(d1,d2,...,dn)的元素 d k d_{k} dk由下面的表达式给出
{ d 1 = D 1 > 0 d k = D k D k − 1 > 0 , k = 2 , 3 , . . , n \begin{cases} d_{1}=D_{1}>0\\ d_{k}=\frac{D_{k}}{D_{k-1}}>0&,k=2,3,..,n \end{cases} { d1=D1>0dk=Dk1Dk>0,k=2,3,..,n
其中 D k = det ⁡ ( A k ) D_{k}=\det(A_{k}) Dk=det(Ak) A A A的顺序主子式。从而得到
D = diag ( d 1 , d 2 , . . . , d n ) ⋅ diag ( d 1 , d 2 , . . . , d n ) D=\text{diag}(\sqrt{d_{1}},\sqrt{d_{2}},...,\sqrt{d_{n}})\cdot{\text{diag}(\sqrt{d_{1}},\sqrt{d_{2}},...,\sqrt{d_{n}})} D=diag(d1 ,d2 ,...,dn )diag(d1 ,d2 ,...,dn )

D 1 / 2 = diag ( d 1 , d 2 , . . . , d n ) D^{1/2}=\text{diag}(\sqrt{d_{1}},\sqrt{d_{2}},...,\sqrt{d_{n}}) D1/2=diag(d1 ,d2 ,...,dn )那么上式就可以写为 D = D 1 / 2 D 1 / 2 D=D^{1/2}D^{1/2} D=D1/2D1/2(唯一)。从而得到
A = L T D 1 / 2 D 1 / 2 L = M T M A=L^{T}D^{1/2}D^{1/2}L=M^{T}M A=LTD1/2D1/2L=MTM

其中 M M M为下三角矩阵并且对角线元素大于 0 0 0。所以我们有以下的分解定理:
定理 对称正定矩阵的Cholesdy分解:若 A ∈ R n × n A\in{\mathbb{R}^{n\times{n}}} ARn×n为正定阵,则存在唯一的具有正对角元的下三角阵 M M M,使得 A = M T M A=M^{T}M A=MTM
通过上述的定理推论,我们来求解对称正定矩阵中的线性方程组求法。

1.2 平方根法

基本求解思路:
(1) 分解对称正定矩阵 A = M T M A=M^{T}M A=MTM
(2) 求解 A x = b Ax=b Ax=b即求解 M T M x = b M^{T}Mx=b MTMx=b
所以最终求解方法即为:
{ M T y = b ,  solve  y M x = y ,  solve  x \begin{cases} M^{T}y=b&,\text{ solve } y\\ Mx=y&,\text{ solve } x \end{cases} { MTy=bMx=y, solve y, solve x
n n n阶对称正定矩阵 A A A有分解 A = L L T A=LL^{T} A=LLT,令
A = [ a 11 a 12 . . . a 1 n a 21 a 22 . . . a 2 n : : : : a n 1 a n 2 . . . a n n ] = [ l 11 l 21 l 22 : : l n 1 l n 2 . . . l n n ] [ l 11 l 12 . . . l 1 n l 22 . . . l 2 n : : l n n ] A=\left [\begin{array}{cccc} a_{11}& a_{12}&...&a_{1n}\\ a_{21}&a_{22}&...&a_{2n}\\ :&:&:&:\\ a_{n1}&a_{n2}&...&a_{nn}\\ \end{array}\right] =\left [\begin{array}{cccc} l_{11}\\ l_{21}&l_{22}\\ :&:\\ l_{n1}&l_{n2}&...&l_{nn} \end{array}\right] \left [\begin{array}{cccc} l_{11}&l_{12}&...&l_{1n}\\ &l_{22}&...&l_{2n}\\ &&:&:\\ &&&l_{nn}\\ \end{array}\right] A=a11a21:an1a12a22:an2......:...a1na2n:ann=l11l21:ln1l22:ln2...lnnl11l12l22......:l1nl2n:lnn

可以看出,必有以下的等式关系:
a i j = ∑ k = 1 n l i k l j k = ∑ j − 1 k = 1 l i k l j k + l i j l j j a_{ij}=\sum_{k=1}^{n}l_{ik}l_{jk}=\sum_{j-1}^{k=1}l_{ik}l_{jk}+l_{ij}l_{jj} aij=k=1nlikljk=j1k=1likljk+lijljj

其中,当 k > j k>j k>j时候, l j k = 0 l_{jk}=0 ljk=0,所以我们求解对称正定方程组 A x = b Ax=b Ax=b时候,计算过程如下所示:
输入 对称矩阵 A A A以及 b b b

  1. 分解计算 A = L L T A=LL^{T} A=LLT
     repeat  i = 1  to  n \text{ repeat }i=1 \text{ to } n  repeat i=1 to n
    l i j = ( a i j − ∑ k = 1 j − 1 l i k l j k ) / l j j , ( j = 1 , 2 , . . . , i − 1 ) ( j < i ) l i i = ( a i i − ∑ k = 1 i − 1 l i k 2 ) 1 / 2 ( i = 1 , 2 , . . . , n ) l_{ij}=(a_{ij}-\sum_{k=1}^{j-1}l_{ik}l_{jk})/l_{jj},(j=1,2,...,i-1)(j<i)\\ l_{ii}=(a_{ii}-\sum_{k=1}^{i-1}l_{ik}^{2})^{1/2}(i=1,2,...,n) lij=(aijk=1j1likljk)/ljj,(j=1,2,...,i1)(j<i)lii=(aiik=1i1lik2)1/2(i=1,2,...,n)
     end  \text{ end }  end 
  2. 求解计算
    { L y = b ,  solve  y L T x = y ,  solve  x \begin{cases} Ly=b&,\text{ solve } y\\ L^{T}x=y&,\text{ solve } x \end{cases} { Ly=bLTx=y, solve y, solve x
    repeat i = 1 to n \text{repeat i = 1 to n} repeat i = 1 to n
    y i = ( b i − ∑ k = 1 i − 1 l i k y k ) / l i i y_{i}=(b_{i}-\sum_{k=1}^{i-1}l_{ik}y_{k})/l_{ii} yi=(bik=1i1likyk)/lii
    end \text{end} end
     repeat  i = n  to  1 \text{ repeat }i=n \text{ to } 1  repeat i=n to 1
    x i = ( y i − ∑ k = i + 1 n l k i x k ) / l i i x_{i}=(y_{i}-\sum_{k=i+1}^{n}l_{ki}x_{k})/l_{ii} xi=(yik=i+1nlkixk)/lii
    end \text{end} end

输出 未知向量 x x x的解

平方根方法有一些优点,
(1) 计算量比较小,计算复杂度为 O ( n 3 ) O(n^{3}) O(n3)(实际上大约是 n 3 / 6 n^{3}/6 n3/6次乘除法),是一般矩阵 A A A的Doolittle分解计算量的一半。
(2) 计算数值稳定。
缺点:计算 l i i l_{ii} lii的时候需要开方,增加了计算量。

1.3 改进平方根法

设对称正定矩阵 A A A有分解 A = L D L T A=LDL^{T} A=LDLT,其中 L L L为单位下三角阵, D D D为对角阵,即有
A = [ a 11 a 12 . . . a 1 n a 21 a 22 . . . a 2 n : : : : a n 1 a n 2 . . . a n n ] = [ l 11 l 21 l 22 : : l n 1 l n 2 . . . l n n ] [ d 1 d 2 . . . d n ] [ l 11 l 12 . . . l 1 n l 22 . . . l 2 n : : l n n ] A= \left [\begin{array}{cccc} a_{11}&a_{12}&...&a_{1n}\\ a_{21}&a_{22}&...&a_{2n}\\ :&:&:&:\\ a_{n1}&a_{n2}&...&a_{nn}\\ \end{array}\right] =\left [\begin{array}{cccc} l_{11}\\ l_{21}&l_{22}\\ :&:\\ l_{n1}&l_{n2}&...&l_{nn} \end{array}\right] \left [\begin{array}{cccc} d_{1}\\ &d_{2}\\ &&...\\ &&&d_{n}\\ \end{array}\right] \left [\begin{array}{cccc} l_{11}&l_{12}&...&l_{1n}\\ &l_{22}&...&l_{2n}\\ &&:&:\\ &&&l_{nn}\\ \end{array}\right] A=a11a21:an1a12a22:an2......:...a1na2n:ann=l11l21:ln1l22:ln2...lnnd1d2...dnl11l12l22......:l1nl2n:lnn
d 1 = a 11 d_{1}=a_{11} d1=a11,则
a i j = ∑ k = 1 n ( L D ) i k ( L T ) k j = ∑ k = 1 j − 1 ( l i k d k ) l j k + l i j d j a_{ij}=\sum_{k=1}^{n}(LD)_{ik}(L^{T})_{kj}=\sum_{k=1}^{j-1}(l_{ik}d_{k})l_{jk}+l_{ij}d_{j} aij=k=1n(LD)ik(LT)kj=k=1j1(likdk)ljk+lijdj

特别地,和平方根方法类似,当 k > j k>j k>j时, l j k = 0 l_{jk}=0 ljk=0。故而有以下的 n n n阶对称正定矩阵 A A A L D L T LDL^{T} LDLT分解公式:
(1) d 1 = a 11 d_{1}=a_{11} d1=a11
(2) 当 i = 2 , 3 , . . . , n i=2,3,...,n i=2,3,...,n时,
l i j = ( a i j − ∑ k = 1 j − 1 l i k d k l j k ) / d j , ( j = 1 , 2 , . . , i − 1 ) d i = a i i − ∑ k = 1 i − 1 l i k 2 d k l_{ij}=(a_{ij}-\sum_{k=1}^{j-1}l_{ik}d_{k}l_{jk})/d_{j},(j=1,2,..,i-1)\\ d_{i}=a_{ii}-\sum_{k=1}^{i-1}l_{ik}^{2}d_{k} lij=(aijk=1j1likdkljk)/dj,(j=1,2,..,i1)di=aiik=1i1lik2dk

综上所述,改进平方根方法的计算过程如下所示:
输入 矩阵系数 A A A以及 b b b

  1. 分解计算 A = L D L T A=LDL^{T} A=LDLT
    d 1 = a 11 d_{1}=a_{11} d1=a11
    repeat i = 2  to  n \text{repeat} i=2 \text{ to } n repeati=2 to n
    t i j = l i j d j t_{ij}=l_{ij}d_{j} tij=lijdj
    t i j = a i j − ∑ k = 1 j − 1 t i k l j k , ( j = 1 , 2 , . . . , i − 1 ) t_{ij}=a_{ij}-\sum\limits_{k=1}^{j-1}t_{ik}l_{jk},(j=1,2,...,i-1) tij=aijk=1j1tikljk,(j=1,2,...,i1)
    l i j = t i j / d j , ( j = 1 , 2 , . . . , i − 1 ) l_{ij}=t_{ij}/d_{j},(j=1,2,...,i-1) lij=tij/dj,(j=1,2,...,i1)
    d i = a i i − ∑ k = 1 i − 1 t i k l i k d_{i}=a_{ii}-\sum\limits_{k=1}^{i-1}t_{ik}l_{ik} di=aiik=1i1tiklik
    end \text{end} end
  2. 求解计算
    { L y = b ,  solve  y L T x = D − 1 y ,  solve  x \begin{cases} Ly=b&,\text{ solve }y\\ L^{T}x=D^{-1}y&,\text{ solve }x\\ \end{cases} { Ly=bLTx=D1y, solve y, solve x
    y 1 = b 1 y_{1}=b_{1} y1=b1
    y i = b i − ∑ k = 1 i − 1 l i k y k , ( i = 2 , . . . , n ) y_{i}=b_{i}-\sum\limits_{k=1}^{i-1}l_{ik}y_{k},(i=2,...,n) yi=bik=1i1likyk,(i=2,...,n)
    x n = y n / d n x_{n}=y_{n}/d_{n} xn=yn/dn
    x i = y i / d i − ∑ k = i + 1 n l k i x k , ( i = n − 1 , . . , 2 , 1 ) x_{i}=y_{i}/d_{i}-\sum\limits_{k=i+1}^{n}l_{ki}x_{k},(i=n-1,..,2,1) xi=yi/dik=i+1nlkixk,(i=n1,..,2,1)

输出 未知向量 x x x
改进平方根算法中具有以下的优点:
(1) 没有开方运算;
(2) 计算量比较小,大约为 n 3 / 6 n^{3}/6 n3/6次乘除法运算,同平方根方法类似,是Doolittle分解方法中计算量的一半,是一种有效的求解对称正定矩阵方程组的方法。
(3) 计算精度较高,不需要选定主元。
缺点:元素的存储量比较大。

1.4 代码实现

平方根方法和改进版平方根方法的核心代码大同小异。平方根方法的核心代码如下所示:

def cholesky_LLT(A):
    if A.ndim!=2 :
        raise ValueError("size of matrix doesn't match!")
    m,n = A.shape
    if m!=n:
        raise ValueError("size of matrix doesn't match!")
    # calculate the matrix L
    L = np.zeros(A.shape,dtype = np.float)
    n = A.shape[0]
    for i in range(0,n):
        for j in range(0,n):
            if i < j+1:
                if j>0:
                    L[j,j] = np.sqrt(A[j,j]-np.sum(L[j,0:j]*L[j,0:j]))
                else:
                    L[j,j] = np.sqrt(A[j,j])
            else:
                if j>0:
                    L[i,j] = (A[i,j] - np.sum(L[i,0:j]*L[j,0:j]))/L[j,j]
                else:
                    L[i,j] = A[i,j]/L[j,j]
    return L
def cholesky_LLT_solve(A,b):
    n,_ = A.shape
    L = cholesky_LLT(A)
    y = np.zeros(n)
    x = y.copy()
    for i in range(0,n):
        if i == 0:
            y[i] = b[i]/L[i,i]
        else:
            y[i] = (b[i] - np.sum(L[i,0:i]*y[0:i]))/L[i,i]
    for i in range(n-1,-1,-1):
        if i == n-1:
            x[i] = y[i]/ L[i,i]
        else:
            x[i] = (y[i] - np.sum(L[i+1:n,i]*x[i+1:n]))/L[i,i]
    return x

改进版平方根方法的核心代码如下所示:

def cholesky_LDLT(A):
    if A.ndim != 2:
        raise ValueError("size of matrix doesn't match!")
    m, n = A.shape
    if m != n:
        raise ValueError("size of matrix doesn't match!")
    # calculate the matrix L
    L = np.zeros(A.shape, dtype=np.float)
    T = np.zeros(A.shape,dtype=np.float)
    n = A.shape[0]
    D = np.zeros(n,dtype=np.float)
    for i in range(0,n):
        if i >0:
            for j in range(0,n):
                if i<j+1:
                    pass
                else:
                    if j > 0:
                        T[i, j] = A[i, j] - np.sum(T[i, 0:j] * L[j, 0:j])
                        L[i,j] = T[i,j]/D[j]
                    else:
                        T[i, j] = A[i, j]
                        L[i,j] = T[i,j]/D[j]
            D[i] = A[i,i] - np.sum(T[i,0:i]*L[i,0:i])
        else:
            D[i] = A[i,i]
    return L,D
def cholesky_LDLT_solve(A,b):
    n, _ = A.shape
    L,D = cholesky_LDLT(A)
    y = np.zeros(n)
    x = y.copy()
    for i in range(0,n):
        if i>0:
            y[i] = b[i] - np.sum(L[i,0:i]*y[0:i])
        else:
            y[i] = b[i]
    for i in range(n-1,-1,-1):
        if i == n-1:
            x[i] = y[i]/D[i]
        else:
            x[i] = y[i]/D[i]-np.sum(L[i+1:n,i]*x[i+1:n])
    return x

完整的代码版本参见笔者github

2. 应用举例:最小二乘法多项式拟合函数

2.1 最小二乘法拟合多项式

假设我们现在有一个数据集:
D = { x k , y k } k = 1 N D=\{x_{k},y_{k}\}_{k=1}^{N} D={ xk,yk}k=1N

对于给定的数据集,我们使用 M M M阶多项式进行曲线拟合,即
f ( x ) = ∑ j = 0 M w j x j f(x)=\sum_{j=0}^{M}w_{j}x^{j} f(x)=j=0Mwjxj

为了使得拟合出的近似曲线能够尽量反映出所给数据的变化趋势,所以我们要求在所有数据点上的残差都比较小:
δ k = ∣ f ( x k ) − y k ∣ \delta_{k}=|f(x_{k})-y_{k}| δk=f(xk)yk

为了达到上述目标值,我们使用平方项之和使得上述求解的值最小:
Q = ∑ k = 1 N δ k 2 = ∑ k = 1 N ( y k − f ( x k ) ) 2 Q=\sum_{k=1}^{N}\delta_{k}^{2}=\sum_{k=1}^{N}(y_{k}-f(x_{k}))^{2} Q=k=1Nδk2=k=1N(ykf(xk))2

称这种方法为最小二乘法原则。
为了求解出使得总残差项 Q Q Q最小,我们对每一个参数 w k w_{k} wk进行求导,并且使其导数值为0:
∂ Q ∂ w j = ∑ k = 1 N ∂ f ∂ w j ⋅ 2 ( y k − f ( x k ) ) = 2 ∑ k = 1 N x k j ( y k − f ( x k ) ) = 0 \frac{\partial Q}{\partial w_{j}}=\sum_{k=1}^{N}\frac{\partial f}{\partial w_{j}}\cdot{2(y_{k}-f(x_{k}))}=2\sum_{k=1}^{N}x_{k}^{j}(y_{k}-f(x_{k}))=0 wjQ=k=1Nwjf2(ykf(xk))=2k=1Nxkj(ykf(xk))=0

对于每一个求导值,会有以下的一个结果:
∂ Q ∂ w j = ∑ k = 1 N ∂ f ∂ w j ⋅ 2 ( y k − f ( x k ) ) = 2 ∑ k = 1 N x k j ( y k − f ( x k ) ) = 0 \frac{\partial Q}{\partial w_{j}}=\sum_{k=1}^{N}\frac{\partial f}{\partial w_{j}}\cdot{2(y_{k}-f(x_{k}))}=2\sum_{k=1}^{N}x_{k}^{j}(y_{k}-f(x_{k}))=0 wjQ=k=1Nwjf2(ykf(xk))=2k=1Nxkj(ykf(xk))=0

所以得到

∑ k = 1 N x k j f ( x k ) = ∑ k = 1 N x k j y k \sum_{k=1}^{N}x_{k}^{j}f(x_{k})=\sum_{k=1}^{N}x_{k}^{j}y_{k} k=1Nxkjf(xk)=k=1Nxkjyk

从而得到一个线性方程组:
[ N ∑ k = 1 N x k . . . ∑ k = 1 N x k j . . . ∑ k = 1 N x k M ∑ k = 1 N x k ∑ k = 1 N x k 2 . . . ∑ k = 1 N x k j + 1 . . . ∑ k = 1 N x k M + 1 : : : : : : ∑ k = 1 N x k M ∑ k = 1 N x k M + 2 . . . ∑ k = 1 N x k j + M . . . ∑ k = 1 N x k 2 M ] [ w 0 w 1 : w j : w M ] = [ ∑ k = 1 N y k ∑ k = 1 N x k y k : ∑ k = 1 N x k j y k : ∑ k = 1 N x k M y k ] \left [\begin{array}{cccc} N &\sum\limits_{k=1}^{N}x_{k} & ... &\sum\limits_{k=1}^{N}x_{k}^{j} & ... & \sum\limits_{k=1}^{N}x_{k}^{M} \\ \sum\limits_{k=1}^{N}x_{k} &\sum\limits_{k=1}^{N}x_{k}^{2} & ... &\sum\limits_{k=1}^{N}x_{k}^{j+1} & ...& \sum\limits_{k=1}^{N}x_{k}^{M+1} \\ : & : &: &:&: &:\\ \sum\limits_{k=1}^{N}x_{k}^{M} &\sum\limits_{k=1}^{N}x_{k}^{M+2} & ... &\sum\limits_{k=1}^{N}x_{k}^{j+M} & ...& \sum\limits_{k=1}^{N}x_{k}^{2M} \\ \end{array}\right] \left [\begin{array}{cccc} w_{0}\\ w_{1}\\ :\\ w_{j}\\ :\\ w_{M} \end{array}\right]= \left [\begin{array}{cccc} \sum\limits_{k=1}^{N}y_{k}\\ \sum\limits_{k=1}^{N}x_{k}y_{k}\\ :\\ \sum\limits_{k=1}^{N}x_{k}^{j}y_{k}\\ :\\ \sum\limits_{k=1}^{N}x_{k}^{M}y_{k}\\ \end{array}\right] Nk=1Nxk:k=1NxkMk=1Nxkk=1Nxk2:k=1NxkM+2......:...k=1Nxkjk=1Nxkj+1:k=1Nxkj+M......:...k=1NxkMk=1NxkM+1:k=1Nxk2Mw0w1:wj:wM=k=1Nykk=1Nxkyk:k=1Nxkjyk:k=1NxkMyk

显然,求解参数的值化为求解线性方程组值的问题。系数矩阵是一个实对称矩阵,那么求解这个问题,首先计算系数矩阵,然后通过Cholesdy分解的办法实现系数的求解过程。
为防止数据的过拟合现象,一般情况下添加有系数的正则化项来减缓数据的过拟合因素:
Q = ∑ k = 1 N δ k 2 + λ 2 ∣ ∣ w ∣ ∣ 2 = ∑ k = 1 N ( y k − f ( x k ) ) 2 + λ 2 ∑ k = 1 M w k 2 Q=\sum_{k=1}^{N}\delta_{k}^{2}+\frac{\lambda}{2}||w||^{2}=\sum_{k=1}^{N}(y_{k}-f(x_{k}))^{2}+\frac{\lambda}{2}\sum_{k=1}^{M}w_{k}^{2} Q=k=1Nδk2+2λw2=k=1N(ykf(xk))2+2λk=1Mwk2

所以求导之后变为
∂ Q ∂ w j = 2 ∑ k = 1 N x k j ( y k − f ( x k ) ) + λ w j \frac{\partial Q}{\partial w_{j}}=2\sum_{k=1}^{N}x_{k}^{j}(y_{k}-f(x_{k}))+\lambda w_{j} wjQ=2k=1Nxkj(ykf(xk))+λwj


∑ k = 1 N x k j y k = ∑ k = 1 N x k j f ( x k ) − λ 2 w j \sum_{k=1}^{N}x_{k}^{j}y_{k}=\sum_{k=1}^{N}x_{k}^{j}f(x_{k})-\frac{\lambda}{2} w_{j} k=1Nxkjyk=k=1Nxkjf(xk)2λwj

用归纳法可以证明,当 λ = 0 \lambda=0 λ=0时候,系数矩阵 A A A是实对称正定矩阵。

2.2 代码实现

我们这里使用到了numpy库来实现上述的对称矩阵线性方程组的算法。首先我们随机的生成一些数据。
首先,我们生成一些多项式拟合的数据,数据在图中如下所示:
随机生成数据表示图
算法的核心代码如下所示

import numpy as np
from model import cholesky_LDLT_solve
class Polynomial:
    def __init__(self,degree,lambd=0):
        self.degree = degree
        self.lambd = lambd
        self.Mat = None
        self.Bais = None
        self.weight = None
    def fit(self,input,target):
        if input.shape != target.shape:
            raise ValueError("size of matrix doesn't match!")
        degree = self.degree
        self.Mat = np.zeros((degree,degree))
        self.Bais = np.zeros(degree)
        for k in range(degree):
            for j in range(degree):
                self.Mat[k,j] = np.sum(np.power(input,k+j))
        for j in range(degree):
            self.Mat[k,j] -= self.lambd/2
            self.Bais[j] = np.sum(np.power(input,j)*target)
        self.weight = cholesky_LDLT_solve(self.Mat,self.Bais)
    def regress(self,input):
        if type(input)==np.ndarray:
            if input.ndim !=1:
                raise ValueError("size of matrix doesn't match!")
            output = np.zeros(input.shape)
            for i in range(len(input)):
                output[i] = np.sum(np.power(input[i],np.arange(self.degree))*self.weight)
            return output
        elif type(input) == float or type(input) == int:
            output = np.sum(np.power(input,np.arange(self.degree))*self.weight)
            return output
        else:
            raise ValueError("Unknown type:%s"%type(input))

数据模拟的效果
数据的拟合结果
完整的代码参见笔者的github

参考文献

[1] https://wenku.baidu.com/view/1c454f27af45b307e8719736.html
[2] https://blog.csdn.net/orangefly0214/article/details/80383456
[3] Pattern recognition and machine learning,BiShop.

猜你喜欢

转载自blog.csdn.net/Zhang_Pro/article/details/105521417