6.6 Applications to Linear Models

This post is a set of reading notes for "Linear algebra and its applications".

For easy application of the discussion to real problems that you may encounter later in your career, we choose notation that is commonly used in the statistical analysis of scientific and engineering data. Instead of $A\boldsymbol x = \boldsymbol b$, we write $X\boldsymbol\beta = \boldsymbol y$ and refer to $X$ as the design matrix, $\boldsymbol\beta$ as the parameter vector, and $\boldsymbol y$ as the observation vector.

Least-Squares Lines

The simplest relation between two variables $x$ and $y$ is the linear equation $y = \beta_0 + \beta_1 x$. Experimental data often produce points $(x_1, y_1), \dots, (x_n, y_n)$ that, when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$ and $\beta_1$ that make the line as "close" to the points as possible.

Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Figure 1. Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line with the same $x$-coordinate. We call $y_j$ the *observed* value of $y$ and $\beta_0 + \beta_1 x_j$ the *predicted* $y$-value. The difference between an observed $y$-value and a predicted $y$-value is called a *residual*.

[Figure 1: the line $y = \beta_0 + \beta_1 x$ and the data points; each residual is the vertical distance between a data point and the line.]

There are several ways to measure how "close" the line is to the data. The usual choice (primarily because the mathematical calculations are simple) is to add the squares of the residuals. The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the sum of the squares of the residuals. This line is also called a line of regression of $y$ on $x$, because any errors in the data are assumed to be only in the $y$-coordinates. The coefficients $\beta_0, \beta_1$ of the line are called (linear) regression coefficients.

If the measurement errors are in $x$ instead of $y$, simply interchange the coordinates of the data before plotting the points and computing the regression line. If both coordinates are subject to possible error, then you might choose the line that minimizes the sum of the squares of the orthogonal (perpendicular) distances from the points to the line.

If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the equations

$$\begin{aligned}\beta_0 + \beta_1 x_1 &= y_1\\ \beta_0 + \beta_1 x_2 &= y_2\\ &\ \,\vdots\\ \beta_0 + \beta_1 x_n &= y_n\end{aligned}$$
We can write this system as

$$X\boldsymbol\beta = \boldsymbol y,\qquad \text{where } X = \begin{bmatrix}1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n\end{bmatrix},\quad \boldsymbol\beta = \begin{bmatrix}\beta_0\\ \beta_1\end{bmatrix},\quad \boldsymbol y = \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}$$
This is a least-squares problem. The square of the distance between the vectors $X\boldsymbol\beta$ and $\boldsymbol y$ is precisely the sum of the squares of the residuals. Computing the least-squares solution of $X\boldsymbol\beta = \boldsymbol y$ is equivalent to finding the $\boldsymbol\beta$ that determines the least-squares line in Figure 1.

A common practice before computing a least-squares line is to compute the average $\overline x$ of the original $x$-values and form a new variable $x^* = x - \overline x$. The new $x$-data are said to be in mean-deviation form. In this case, the two columns of the design matrix will be orthogonal, which simplifies the solution of the normal equations.
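As a quick numerical sketch of these ideas (not from the textbook; the data values are invented for illustration), the following NumPy snippet fits a least-squares line and shows that switching to mean-deviation form makes the two columns of the design matrix orthogonal:

```python
import numpy as np

# Hypothetical data points (x_j, y_j); any small data set will do.
x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])

# Design matrix X = [1  x] and the least-squares solution of X*beta = y.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1 =", beta_hat)

# Mean-deviation form: x* = x - x_bar makes the two columns orthogonal,
# so X*^T X* is diagonal and the normal equations decouple.
x_star = x - x.mean()
X_star = np.column_stack([np.ones_like(x_star), x_star])
print(X_star.T @ X_star)   # off-diagonal entries are (numerically) zero
```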

EXERCISE 14
Show that the least-squares line for the data $(x_1, y_1), \dots, (x_n, y_n)$ must pass through $(\overline x, \overline y)$. That is, show that $\overline x$ and $\overline y$ satisfy the linear equation $\overline y = \hat\beta_0 + \hat\beta_1\overline x$.
SOLUTION
Derive this equation from the vector equation $\boldsymbol y = X\hat{\boldsymbol\beta} + \boldsymbol\epsilon$. Denote the first column of $X$ by $\boldsymbol 1$. Use the fact that the residual vector $\boldsymbol\epsilon$ is orthogonal to the column space of $X$ and hence is orthogonal to $\boldsymbol 1$. Thus $\sum_{i=1}^{n}\epsilon_i = 0$.
$$\begin{aligned}\because\ y_i &= \hat\beta_0 + x_i\hat\beta_1 + \epsilon_i\\ \therefore\ \sum_{i=1}^{n} y_i &= n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i\\ \therefore\ \overline y &= \hat\beta_0 + \hat\beta_1\,\overline x\end{aligned}$$
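A small numerical check of this fact (a sketch only; the data below are invented):

```python
import numpy as np

# Invented data; the property holds for any data set.
x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 3.0, 6.0])

X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The fitted line evaluated at x_bar reproduces y_bar.
print(b0 + b1 * x.mean(), y.mean())   # the two values agree
```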


Given data for a least-squares problem, $(x_1, y_1), \dots, (x_n, y_n)$, the following abbreviations are helpful:
$$\sum x = \sum_{i=1}^{n} x_i,\qquad \sum x^2 = \sum_{i=1}^{n} x_i^2,\qquad \sum y = \sum_{i=1}^{n} y_i,\qquad \sum xy = \sum_{i=1}^{n} x_i y_i$$

The normal equations for a least-squares line $y = \hat\beta_0 + \hat\beta_1 x$ are $X^T X\boldsymbol\beta = X^T\boldsymbol y$. Since
$$X^T X = \begin{bmatrix}\boldsymbol 1^T\\ \boldsymbol x^T\end{bmatrix}\begin{bmatrix}\boldsymbol 1 & \boldsymbol x\end{bmatrix} = \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix},$$
the normal equations may be written in the form
$$\begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}\hat{\boldsymbol\beta} = \begin{bmatrix}\boldsymbol 1^T\\ \boldsymbol x^T\end{bmatrix}\boldsymbol y = \begin{bmatrix}\sum y\\ \sum xy\end{bmatrix},$$
that is,
$$n\hat\beta_0 + \hat\beta_1\sum x = \sum y,\qquad \hat\beta_0\sum x + \hat\beta_1\sum x^2 = \sum xy$$

If $X$ has two linearly independent columns, then
$$\begin{aligned}\hat{\boldsymbol\beta} &= \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}^{-1}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\\ &= \frac{1}{n\sum x^2 - \left(\sum x\right)^2}\begin{bmatrix}\sum x^2 & -\sum x\\ -\sum x & n\end{bmatrix}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\end{aligned}$$
so that
$$\hat\beta_0 = \frac{\sum x^2\sum y - \sum x\sum xy}{n\sum x^2 - \left(\sum x\right)^2},\qquad \hat\beta_1 = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
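These closed-form formulas can be evaluated directly from the four sums; here is a minimal sketch (with sample data invented for the example) that also compares the result against numpy.polyfit:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])
n = len(x)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()
d = n * Sxx - Sx**2                      # n*sum(x^2) - (sum x)^2

beta0 = (Sxx * Sy - Sx * Sxy) / d
beta1 = (n * Sxy - Sx * Sy) / d

# np.polyfit returns coefficients from the highest degree down: [beta1, beta0].
print(beta0, beta1, np.polyfit(x, y, 1))
```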


Consider the following numbers.

(i) $\left\|X\hat{\boldsymbol\beta}\right\|^2$, the sum of the squares of the "regression term." Denote this number by $SS(R)$.
(ii) $\left\|\boldsymbol y - X\hat{\boldsymbol\beta}\right\|^2$, the sum of the squares of the error term. Denote this number by $SS(E)$.
(iii) $\left\|\boldsymbol y\right\|^2$, the "total" sum of the squares of the $y$-values. Denote this number by $SS(T)$.

Every statistics text that discusses regression and the linear model $\boldsymbol y = X\boldsymbol\beta + \boldsymbol\epsilon$ introduces these numbers.

EXERCISE 19
Justify the equation $SS(T) = SS(R) + SS(E)$. This equation is extremely important in statistics, both in regression theory and in the analysis of variance.
SOLUTION
This follows from the Pythagorean Theorem (in Section 6.1): $X\hat{\boldsymbol\beta}$ lies in $\mathrm{Col}\,X$ and $\boldsymbol y - X\hat{\boldsymbol\beta}$ is orthogonal to $\mathrm{Col}\,X$, so $\|\boldsymbol y\|^2 = \|X\hat{\boldsymbol\beta}\|^2 + \|\boldsymbol y - X\hat{\boldsymbol\beta}\|^2$.

Then, using $X^T\boldsymbol\epsilon = \boldsymbol 0$ (so that $\hat{\boldsymbol\beta}^T X^T\boldsymbol\epsilon = 0$),
$$\begin{aligned}SS(E) &= SS(T) - SS(R)\\ &= \left\|\boldsymbol y\right\|^2 - \left\|X\hat{\boldsymbol\beta}\right\|^2\\ &= \boldsymbol y^T\boldsymbol y - \hat{\boldsymbol\beta}^T X^T X\hat{\boldsymbol\beta}\\ &= \boldsymbol y^T\boldsymbol y - \left(\hat{\boldsymbol\beta}^T X^T X\hat{\boldsymbol\beta} + \hat{\boldsymbol\beta}^T X^T\boldsymbol\epsilon\right)\\ &= \boldsymbol y^T\boldsymbol y - \hat{\boldsymbol\beta}^T X^T\left(X\hat{\boldsymbol\beta} + \boldsymbol\epsilon\right)\\ &= \boldsymbol y^T\boldsymbol y - \hat{\boldsymbol\beta}^T X^T\boldsymbol y\end{aligned}$$

This is the standard formula for $SS(E)$.
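A numerical spot check of $SS(T) = SS(R) + SS(E)$ (a sketch with invented data; note that $SS(T)$ here means $\|\boldsymbol y\|^2$, not the mean-centered total sum of squares used in many statistics texts):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0])
y = np.array([1.0, 2.0, 2.0, 4.0])

X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

y_hat = X @ beta_hat                 # regression term X * beta_hat
eps = y - y_hat                      # residual vector

SS_R = y_hat @ y_hat                 # ||X beta_hat||^2
SS_E = eps @ eps                     # ||y - X beta_hat||^2
SS_T = y @ y                         # ||y||^2
print(SS_T, SS_R + SS_E)             # equal up to rounding error
```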


The General Linear Model

In some applications, it is necessary to fit data points with something other than a straight line. In the examples that follow, the matrix equation is still $X\boldsymbol\beta = \boldsymbol y$, but the specific form of $X$ changes from one problem to the next. Statisticians usually introduce a residual vector $\boldsymbol\epsilon$, defined by $\boldsymbol\epsilon = \boldsymbol y - X\boldsymbol\beta$, and write
$$\boldsymbol y = X\boldsymbol\beta + \boldsymbol\epsilon$$

Any equation of this form is referred to as a linear model. Once $X$ and $\boldsymbol y$ are determined, the goal is to minimize the length of $\boldsymbol\epsilon$, which amounts to finding a least-squares solution of $X\boldsymbol\beta = \boldsymbol y$. In each case, the least-squares solution $\hat{\boldsymbol\beta}$ is a solution of the normal equations
$$X^T X\boldsymbol\beta = X^T\boldsymbol y$$
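For a generic design matrix, the normal equations can be solved directly when $X^T X$ is invertible; the sketch below (the design matrix, data, and helper name are invented for illustration) does exactly that and checks that the residual is orthogonal to the columns of $X$. In practice, library routines such as numpy.linalg.lstsq are usually preferred for numerical reasons.

```python
import numpy as np

def normal_equations_solve(X, y):
    """Solve X^T X beta = X^T y; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example: any design matrix with linearly independent columns.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 4.0]])
y = np.array([1.0, 1.0, 2.0, 3.0])

beta_hat = normal_equations_solve(X, y)
eps = y - X @ beta_hat               # residual vector, orthogonal to Col X
print(beta_hat, X.T @ eps)           # X^T eps is (numerically) the zero vector
```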

Least-Squares Fitting of Other Curves

The next example shows how to fit data by curves that have the general form
$$y = \beta_0 f_0(x) + \beta_1 f_1(x) + \dots + \beta_k f_k(x)\qquad (2)$$

where $f_0, \dots, f_k$ are known functions and $\beta_0, \dots, \beta_k$ are parameters that must be determined.

As we will see, equation (2) describes a linear model because it is linear in the unknown parameters.

EXAMPLE 2
Suppose we wish to approximate the data by an equation of the form
$$y = \beta_0 + \beta_1 x + \beta_2 x^2\qquad (3)$$
Describe the linear model that produces a "least-squares fit" of the data by equation (3).
SOLUTION
For each data point $(x_j, y_j)$, equation (3) gives $y_j = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \epsilon_j$, where $\epsilon_j$ is the residual. So the linear model is $\boldsymbol y = X\boldsymbol\beta + \boldsymbol\epsilon$ with
$$X = \begin{bmatrix}1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ \vdots & \vdots & \vdots\\ 1 & x_n & x_n^2\end{bmatrix},\qquad \boldsymbol\beta = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix},\qquad \boldsymbol\epsilon = \begin{bmatrix}\epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\end{bmatrix}$$

The design matrix above is a Vandermonde matrix.

Example 5 in Section 2.1 and Theorem 14 in Section 6.5 show that if at least 3 of the values $x_1, \dots, x_n$ are distinct, then the columns of $X$ are linearly independent and the least-squares solution $\hat{\boldsymbol\beta}$ is unique.
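A sketch of the quadratic fit in equation (3), building the Vandermonde-type design matrix column by column (the data points are invented for illustration):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.2, 8.9, 16.3])

# Columns f0(x) = 1, f1(x) = x, f2(x) = x^2, as in equation (3).
# np.vander(x, 3, increasing=True) would build the same matrix.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("beta0, beta1, beta2 =", beta_hat)
```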

Multiple Regression

Suppose an experiment involves two independent variables, say $u$ and $v$, and one dependent variable, $y$. A simple equation for predicting $y$ from $u$ and $v$ has the form
$$y = \beta_0 + \beta_1 u + \beta_2 v\qquad (4)$$

A more general prediction equation might have the form
$$y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2\qquad (5)$$

Equations (4) and (5) both lead to a linear model because they are linear in the unknown parameters (even though $u$ and $v$ are multiplied). In general, a linear model will arise whenever $y$ is to be predicted by an equation of the form
$$y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \dots + \beta_k f_k(u, v)$$
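As in the single-variable case, the design matrix simply gets one column per function $f_k(u, v)$. Here is a sketch of a least-squares fit for equation (5), with made-up observations:

```python
import numpy as np

# Hypothetical observations (u_i, v_i, y_i).
u = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0])
v = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 2.0, 1.0])
y = np.array([2.0, 1.5, 5.1, 4.0, 9.8, 3.2, 8.1, 5.0])

# Design matrix for equation (5): columns 1, u, v, u^2, u*v, v^2.
X = np.column_stack([np.ones_like(u), u, v, u**2, u * v, v**2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)
```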

