Problem Set 1

1. Conditions for the Normal Equation

Prove the following theorem: the matrix $A^TA$ is invertible if and only if the columns of $A$ are linearly independent.

1. First, suppose the columns of $A$ are linearly independent; we show that $A^TA$ is invertible.

A square matrix is invertible if and only if its null space is trivial, so $A^TA$ is invertible if and only if $A^TAX=0$ has the unique solution $X=0$. It therefore suffices to show that, when the columns of $A$ are linearly independent, $A^TAX=0$ implies $X=0$.

Multiplying $A^TAX=0$ on the left by $X^T$ gives $X^TA^TAX=0$, i.e. $(AX)^TAX=\|AX\|^2=0$, hence $AX=0$.

Write $A=[a_1,a_2,\dots,a_n]$ and $X=[x_1,x_2,\dots,x_n]^T$, so that

$$a_1x_1+a_2x_2+\dots+a_nx_n=0 \tag{1}$$

Since the columns of $A$ are linearly independent, equation (1) forces $[x_1,x_2,\dots,x_n]=[0,0,\dots,0]$, i.e. $X=0$, which proves this direction.

Conversely, if the columns of $A$ are linearly dependent, equation (1) has a nonzero solution $X\neq 0$; for that $X$ we have $A^TAX=A^T(AX)=A^T\cdot 0=0$, so $A^TA$ has a nontrivial null space and is singular. Therefore $A^TA$ is invertible if and only if the columns of $A$ are linearly independent.
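As a quick numerical sketch of the theorem (my own check, not part of the original problem; the matrices below are arbitrary examples), we can compare $A^TA$ for a matrix with independent columns and one with dependent columns:

```python
import numpy as np

def gram_is_invertible(A, tol=1e-10):
    """Return True if A^T A is invertible (determinant nonzero up to tol)."""
    G = A.T @ A
    return abs(np.linalg.det(G)) > tol

# Independent columns (rank 2)
A_indep = np.array([[1.0, 0.0],
                    [1.0, 1.0],
                    [1.0, 2.0]])

# Dependent columns: second column = 2 * first (rank 1)
A_dep = np.array([[1.0, 2.0],
                  [1.0, 2.0],
                  [1.0, 2.0]])

print(gram_is_invertible(A_indep))  # True  (columns independent)
print(gram_is_invertible(A_dep))    # False (columns dependent)
```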

2. Newton's Method for Computing Least Squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to the optimal parameter $\theta^*$.

(a) Find the Hessian of the cost function $J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(\theta^Tx^{(i)}-y^{(i)})^2$.

(b) Show that the first iteration of Newton's method gives us $\theta^*=(X^TX)^{-1}X^T\vec{y}$, the solution to our least squares problem. ($\vec{y}$ denotes the vector of the target values.)

2. (a) By the definition of the Hessian matrix:

$$H_{i,j}=\frac{\partial^2 J(\theta)}{\partial\theta_i\,\partial\theta_j}=\frac{\partial}{\partial\theta_j}\left(\frac{\partial J(\theta)}{\partial\theta_i}\right)=\frac{\partial}{\partial\theta_j}\left(\sum_{k=1}^{m}(\theta^Tx^{(k)}-y^{(k)})x^{(k)}_i\right)=\sum_{k=1}^{m}x^{(k)}_ix^{(k)}_j$$

(b) Let $X=[x^{(1)}\ x^{(2)}\ \dots\ x^{(m)}]^T$ and $\vec{y}=[y^{(1)}\ y^{(2)}\ \dots\ y^{(m)}]^T$. From (a) we get $H=X^TX$.

For the gradient $\nabla_\theta J(\theta)$, by definition

$$\nabla_\theta J(\theta)=\left[\frac{\partial J(\theta)}{\partial\theta_1}\ \ \frac{\partial J(\theta)}{\partial\theta_2}\ \dots\ \frac{\partial J(\theta)}{\partial\theta_n}\right]^T$$

whose $i$-th component is

$$\left(\nabla_\theta J(\theta)\right)_i=\sum_{k=1}^{m}(\theta^Tx^{(k)}-y^{(k)})x^{(k)}_i=\sum_{k=1}^{m}\theta^Tx^{(k)}x^{(k)}_i-\sum_{k=1}^{m}y^{(k)}x^{(k)}_i=\left(X^TX\theta-X^T\vec{y}\right)_i$$

Therefore $\nabla_\theta J(\theta)=X^TX\theta-X^T\vec{y}$.

Newton's method iterates $\theta:=\theta-H^{-1}\nabla_\theta J(\theta)$, so after the first iteration

$$\theta^*=\theta-(X^TX)^{-1}(X^TX\theta-X^T\vec{y})=(X^TX)^{-1}X^T\vec{y}.$$
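The one-step convergence can be checked numerically (a sketch of my own, on arbitrary random data, not part of the original solution): because $J$ is exactly quadratic, a single Newton step from any starting point lands on the normal-equation solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

theta = rng.normal(size=3)            # arbitrary starting point

grad = X.T @ X @ theta - X.T @ y      # gradient of J at theta
H = X.T @ X                           # Hessian (constant in theta)
theta_newton = theta - np.linalg.solve(H, grad)   # one Newton step

theta_closed = np.linalg.solve(X.T @ X, X.T @ y)  # normal-equation solution

print(np.allclose(theta_newton, theta_closed))    # True
```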

3. Prediction using Linear Regression

The sales of a company (in million dollars) for each year are shown in the table below.

| x (year)  | 2005 | 2006 | 2007 | 2008 | 2009 |
| --------- | ---- | ---- | ---- | ---- | ---- |
| y (sales) | 12   | 19   | 29   | 37   | 45   |

(a) Find the least squares regression line $y=ax+b$.
(b) Use the least squares regression line as a model to estimate the sales of the company in 2012.

3. (a) By the theorem above, the unique least squares solution is $\theta=(X^TX)^{-1}X^T\vec{y}$.

From the problem, set

$$X=\begin{bmatrix} 1 & 2005 \\ 1 & 2006 \\ 1 & 2007 \\ 1 & 2008 \\ 1 & 2009 \end{bmatrix},\qquad \vec{y}=\begin{bmatrix} 12 \\ 19 \\ 29 \\ 37 \\ 45 \end{bmatrix}$$

Substituting gives $\theta=[b;\ a]=[-16830.4;\ 8.4]$, so the least squares regression line is

$$y=8.4x-16830.4.$$

(b) The model estimates the company's sales in 2012 as $8.4\times 2012-16830.4=70.4$ million dollars.
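The computation in (a) and (b) can be reproduced by solving the normal equations directly (a quick check of my own, not part of the original solution):

```python
import numpy as np

years = np.array([2005, 2006, 2007, 2008, 2009], dtype=float)
sales = np.array([12, 19, 29, 37, 45], dtype=float)

X = np.column_stack([np.ones_like(years), years])  # design matrix [1, x]
theta = np.linalg.solve(X.T @ X, X.T @ sales)      # normal equations, theta = [b, a]
b, a = theta

pred_2012 = a * 2012 + b
print(round(a, 2), round(b, 1))   # 8.4 -16830.4
print(round(pred_2012, 1))        # 70.4
```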

4. Logistic Regression

Consider the average empirical loss for logistic regression:

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\log\left(1+e^{-y^{(i)}\theta^Tx^{(i)}}\right)=-\frac{1}{m}\sum_{i=1}^{m}\log\left(h_\theta(y^{(i)}x^{(i)})\right)$$

where $y^{(i)}\in\{-1,1\}$, $h_\theta(x)=g(\theta^Tx)$, and $g(z)=1/(1+e^{-z})$. Find the Hessian $H$ of this function, and show that for any vector $z$, it holds true that $z^THz\geq 0$.

4. By the definition of the Hessian matrix:

$$\frac{\partial J(\theta)}{\partial\theta_i}=-\frac{1}{m}\sum_{k=1}^{m}\left(1-g(\theta^Ty^{(k)}x^{(k)})\right)y^{(k)}x^{(k)}_i$$

$$H_{i,j}=\frac{\partial^2 J(\theta)}{\partial\theta_i\,\partial\theta_j}=\frac{1}{m}\sum_{k=1}^{m}g(\theta^Ty^{(k)}x^{(k)})\left(1-g(\theta^Ty^{(k)}x^{(k)})\right)(y^{(k)})^2x^{(k)}_ix^{(k)}_j$$

Note that $y^{(k)}\in\{-1,1\}$, so $(y^{(k)})^2=1$. Moreover, since $g(-z)=1-g(z)$, we have $g(\theta^Ty^{(k)}x^{(k)})\left(1-g(\theta^Ty^{(k)}x^{(k)})\right)=h_\theta(x^{(k)})\left(1-h_\theta(x^{(k)})\right)$ whether $y^{(k)}$ equals $1$ or $-1$. Hence

$$H_{i,j}=\frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1-h_\theta(x^{(k)})\right)x^{(k)}_ix^{(k)}_j$$

In matrix form,

$$H=\frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1-h_\theta(x^{(k)})\right)x^{(k)}x^{(k)T}$$

For any vector $z$,

$$z^THz=\frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1-h_\theta(x^{(k)})\right)z^Tx^{(k)}x^{(k)T}z=\frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1-h_\theta(x^{(k)})\right)\left(z^Tx^{(k)}\right)^2\geq 0,$$

since $0<h_\theta(x^{(k)})<1$. This proves the claim.
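The positive-semidefiniteness of $H$ can also be verified numerically (a sanity check of my own on arbitrary random data, not from the original post): build the weighted Gram matrix above and evaluate the quadratic form for many random vectors $z$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
m, n = 50, 4
X = rng.normal(size=(m, n))           # rows are the examples x^(k)
theta = rng.normal(size=n)

h = sigmoid(X @ theta)                # h_theta(x^(k)) for each example
# H = (1/m) * sum_k h_k (1 - h_k) x^(k) x^(k)^T, as a weighted Gram matrix
H = (X * (h * (1 - h))[:, None]).T @ X / m

quad_forms = [z @ H @ z for z in rng.normal(size=(100, n))]
print(min(quad_forms) >= 0)           # True: H is positive semidefinite
```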


Reposted from blog.csdn.net/qq_23096319/article/details/128949390