6.6 Applications to Linear Models

This post is a set of reading notes for "Linear algebra and its applications".

For easy application of the discussion to real problems that you may encounter later in your career, we choose notation that is commonly used in the statistical analysis of scientific and engineering data. Instead of $A\boldsymbol x = \boldsymbol b$, we write $X\boldsymbol\beta = \boldsymbol y$ and refer to $X$ as the design matrix, $\boldsymbol\beta$ as the parameter vector, and $\boldsymbol y$ as the observation vector.

Least-Squares Lines

The simplest relation between two variables $x$ and $y$ is the linear equation $y = \beta_0 + \beta_1 x$. Experimental data often produce points $(x_1, y_1), \dots, (x_n, y_n)$ that, when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$ and $\beta_1$ that make the line as "close" to the points as possible.

Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Figure 1. Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line with the same $x$-coordinate. We call $y_j$ the *observed* value of $y$ and $\beta_0 + \beta_1 x_j$ the *predicted* $y$-value. The difference between an observed $y$-value and a predicted $y$-value is called a *residual*.

[Figure 1: the line $y = \beta_0 + \beta_1 x$ and the data points; each residual is the vertical distance between a data point and the line.]

There are several ways to measure how "close" the line is to the data. The usual choice (primarily because the mathematical calculations are simple) is to add the squares of the residuals. The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the sum of the squares of the residuals. This line is also called a line of regression of $y$ on $x$, because any errors in the data are assumed to be only in the $y$-coordinates. The coefficients $\beta_0, \beta_1$ of the line are called (linear) regression coefficients.

If the measurement errors are in $x$ instead of $y$, simply interchange the coordinates of the data before plotting the points and computing the regression line. If both coordinates are subject to possible error, then you might choose the line that minimizes the sum of the squares of the orthogonal (perpendicular) distances from the points to the line.

If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the equations

$$\begin{aligned}\beta_0 + \beta_1 x_1 &= y_1\\ \beta_0 + \beta_1 x_2 &= y_2\\ &\ \,\vdots\\ \beta_0 + \beta_1 x_n &= y_n\end{aligned}$$
We can write this system as

$$X\boldsymbol\beta = \boldsymbol y,\qquad \text{where } X = \begin{bmatrix}1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n\end{bmatrix},\quad \boldsymbol\beta = \begin{bmatrix}\beta_0\\ \beta_1\end{bmatrix},\quad \boldsymbol y = \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}$$
This is a least-squares problem. The square of the distance between the vectors $X\boldsymbol\beta$ and $\boldsymbol y$ is precisely the sum of the squares of the residuals. Computing the least-squares solution of $X\boldsymbol\beta = \boldsymbol y$ is equivalent to finding the $\boldsymbol\beta$ that determines the least-squares line in Figure 1.

A common practice before computing a least-squares line is to compute the average $\overline x$ of the original $x$-values and form a new variable $x^* = x - \overline x$. The new $x$-data are said to be in mean-deviation form. In this case, the two columns of the design matrix will be orthogonal, which simplifies the solution of the normal equations.
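As a quick numerical sketch of these ideas (not from the textbook; the data values are invented for illustration), the following NumPy snippet fits a least-squares line and shows that switching to mean-deviation form makes the two columns of the design matrix orthogonal:

```python
import numpy as np

# Hypothetical data points (x_j, y_j); any small data set will do.
x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])

# Design matrix X = [1  x] and the least-squares solution of X*beta = y.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1 =", beta_hat)

# Mean-deviation form: x* = x - x_bar makes the two columns orthogonal,
# so X*^T X* is diagonal and the normal equations decouple.
x_star = x - x.mean()
X_star = np.column_stack([np.ones_like(x_star), x_star])
print(X_star.T @ X_star)   # off-diagonal entries are (numerically) zero
```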

EXERCISE 14
Show that the least-squares line for the data $(x_1, y_1), \dots, (x_n, y_n)$ must pass through $(\overline x, \overline y)$. That is, show that $\overline x$ and $\overline y$ satisfy the linear equation $\overline y = \hat\beta_0 + \hat\beta_1\overline x$.
SOLUTION
Derive this equation from the vector equation $\boldsymbol y = X\hat{\boldsymbol\beta} + \boldsymbol\epsilon$. Denote the first column of $X$ by $\boldsymbol 1$. Use the fact that the residual vector $\boldsymbol\epsilon$ is orthogonal to the column space of $X$ and hence is orthogonal to $\boldsymbol 1$. Thus $\sum_{i=1}^{n}\epsilon_i = 0$.
$$\begin{aligned}\because\ y_i &= \hat\beta_0 + x_i\hat\beta_1 + \epsilon_i\\ \therefore\ \sum_{i=1}^{n} y_i &= n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i\\ \therefore\ \overline y &= \hat\beta_0 + \hat\beta_1\,\overline x\end{aligned}$$
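A small numerical check of this fact (a sketch only; the data below are invented):

```python
import numpy as np

# Invented data; the property holds for any data set.
x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 3.0, 6.0])

X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The fitted line evaluated at x_bar reproduces y_bar.
print(b0 + b1 * x.mean(), y.mean())   # the two values agree
```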


Given data for a least-squares problem, $(x_1, y_1), \dots, (x_n, y_n)$, the following abbreviations are helpful:
$$\sum x = \sum_{i=1}^{n} x_i,\qquad \sum x^2 = \sum_{i=1}^{n} x_i^2,\qquad \sum y = \sum_{i=1}^{n} y_i,\qquad \sum xy = \sum_{i=1}^{n} x_i y_i$$

The normal equations for a least-squares line $y = \hat\beta_0 + \hat\beta_1 x$ are $X^T X\boldsymbol\beta = X^T\boldsymbol y$. Since
$$X^T X = \begin{bmatrix}\boldsymbol 1^T\\ \boldsymbol x^T\end{bmatrix}\begin{bmatrix}\boldsymbol 1 & \boldsymbol x\end{bmatrix} = \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix},$$
the normal equations may be written in the form
$$\begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}\hat{\boldsymbol\beta} = \begin{bmatrix}\boldsymbol 1^T\\ \boldsymbol x^T\end{bmatrix}\boldsymbol y = \begin{bmatrix}\sum y\\ \sum xy\end{bmatrix},$$
that is,
$$n\hat\beta_0 + \hat\beta_1\sum x = \sum y,\qquad \hat\beta_0\sum x + \hat\beta_1\sum x^2 = \sum xy$$

If $X$ has two linearly independent columns, then
$$\begin{aligned}\hat{\boldsymbol\beta} &= \begin{bmatrix}n & \sum x\\ \sum x & \sum x^2\end{bmatrix}^{-1}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\\ &= \frac{1}{n\sum x^2 - \left(\sum x\right)^2}\begin{bmatrix}\sum x^2 & -\sum x\\ -\sum x & n\end{bmatrix}\begin{bmatrix}\sum y\\ \sum xy\end{bmatrix}\end{aligned}$$
so that
$$\hat\beta_0 = \frac{\sum x^2\sum y - \sum x\sum xy}{n\sum x^2 - \left(\sum x\right)^2},\qquad \hat\beta_1 = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
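These closed-form formulas can be evaluated directly from the four sums; here is a minimal sketch (with sample data invented for the example) that also compares the result against numpy.polyfit:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 3.0, 3.0])
n = len(x)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()
d = n * Sxx - Sx**2                      # n*sum(x^2) - (sum x)^2

beta0 = (Sxx * Sy - Sx * Sxy) / d
beta1 = (n * Sxy - Sx * Sy) / d

# np.polyfit returns coefficients from the highest degree down: [beta1, beta0].
print(beta0, beta1, np.polyfit(x, y, 1))
```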


Consider the following numbers.

(i) $\left\|X\hat{\boldsymbol\beta}\right\|^2$, the sum of the squares of the "regression term." Denote this number by $SS(R)$.
(ii) $\left\|\boldsymbol y - X\hat{\boldsymbol\beta}\right\|^2$, the sum of the squares of the error term. Denote this number by $SS(E)$.
(iii) $\left\|\boldsymbol y\right\|^2$, the "total" sum of the squares of the $y$-values. Denote this number by $SS(T)$.

Every statistics text that discusses regression and the linear model $\boldsymbol y = X\boldsymbol\beta + \boldsymbol\epsilon$ introduces these numbers.

EXERCISE 19
Justify the equation $SS(T) = SS(R) + SS(E)$. This equation is extremely important in statistics, both in regression theory and in the analysis of variance.
SOLUTION
This follows from the Pythagorean Theorem (in Section 6.1): $X\hat{\boldsymbol\beta}$ lies in $\mathrm{Col}\,X$ and $\boldsymbol y - X\hat{\boldsymbol\beta}$ is orthogonal to $\mathrm{Col}\,X$, so $\|\boldsymbol y\|^2 = \|X\hat{\boldsymbol\beta}\|^2 + \|\boldsymbol y - X\hat{\boldsymbol\beta}\|^2$.

Then, using $X^T\boldsymbol\epsilon = \boldsymbol 0$ (so that $\hat{\boldsymbol\beta}^T X^T\boldsymbol\epsilon = 0$),
$$\begin{aligned}SS(E) &= SS(T) - SS(R)\\ &= \left\|\boldsymbol y\right\|^2 - \left\|X\hat{\boldsymbol\beta}\right\|^2\\ &= \boldsymbol y^T\boldsymbol y - \hat{\boldsymbol\beta}^T X^T X\hat{\boldsymbol\beta}\\ &= \boldsymbol y^T\boldsymbol y - \left(\hat{\boldsymbol\beta}^T X^T X\hat{\boldsymbol\beta} + \hat{\boldsymbol\beta}^T X^T\boldsymbol\epsilon\right)\\ &= \boldsymbol y^T\boldsymbol y - \hat{\boldsymbol\beta}^T X^T\left(X\hat{\boldsymbol\beta} + \boldsymbol\epsilon\right)\\ &= \boldsymbol y^T\boldsymbol y - \hat{\boldsymbol\beta}^T X^T\boldsymbol y\end{aligned}$$

This is the standard formula for $SS(E)$.
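A numerical spot check of $SS(T) = SS(R) + SS(E)$ (a sketch with invented data; note that $SS(T)$ here means $\|\boldsymbol y\|^2$, not the mean-centered total sum of squares used in many statistics texts):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0])
y = np.array([1.0, 2.0, 2.0, 4.0])

X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

y_hat = X @ beta_hat                 # regression term X * beta_hat
eps = y - y_hat                      # residual vector

SS_R = y_hat @ y_hat                 # ||X beta_hat||^2
SS_E = eps @ eps                     # ||y - X beta_hat||^2
SS_T = y @ y                         # ||y||^2
print(SS_T, SS_R + SS_E)             # equal up to rounding error
```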


The General Linear Model

In some applications, it is necessary to fit data points with something other than a straight line. In the examples that follow, the matrix equation is still $X\boldsymbol\beta = \boldsymbol y$, but the specific form of $X$ changes from one problem to the next. Statisticians usually introduce a residual vector $\boldsymbol\epsilon$, defined by $\boldsymbol\epsilon = \boldsymbol y - X\boldsymbol\beta$, and write
$$\boldsymbol y = X\boldsymbol\beta + \boldsymbol\epsilon$$

Any equation of this form is referred to as a linear model. Once $X$ and $\boldsymbol y$ are determined, the goal is to minimize the length of $\boldsymbol\epsilon$, which amounts to finding a least-squares solution of $X\boldsymbol\beta = \boldsymbol y$. In each case, the least-squares solution $\hat{\boldsymbol\beta}$ is a solution of the normal equations
$$X^T X\boldsymbol\beta = X^T\boldsymbol y$$
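For a generic design matrix, the normal equations can be solved directly when $X^T X$ is invertible; the sketch below (the design matrix, data, and helper name are invented for illustration) does exactly that and checks that the residual is orthogonal to the columns of $X$. In practice, library routines such as numpy.linalg.lstsq are usually preferred for numerical reasons.

```python
import numpy as np

def normal_equations_solve(X, y):
    """Solve X^T X beta = X^T y; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example: any design matrix with linearly independent columns.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 4.0]])
y = np.array([1.0, 1.0, 2.0, 3.0])

beta_hat = normal_equations_solve(X, y)
eps = y - X @ beta_hat               # residual vector, orthogonal to Col X
print(beta_hat, X.T @ eps)           # X^T eps is (numerically) the zero vector
```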

Least-Squares Fitting of Other Curves

The next example shows how to fit data by curves that have the general form
$$y = \beta_0 f_0(x) + \beta_1 f_1(x) + \dots + \beta_k f_k(x)\qquad (2)$$

where $f_0, \dots, f_k$ are known functions and $\beta_0, \dots, \beta_k$ are parameters that must be determined.

As we will see, equation (2) describes a linear model because it is linear in the unknown parameters.

EXAMPLE 2
Suppose we wish to approximate the data by an equation of the form
$$y = \beta_0 + \beta_1 x + \beta_2 x^2\qquad (3)$$
Describe the linear model that produces a "least-squares fit" of the data by equation (3).
SOLUTION
For each data point $(x_j, y_j)$, equation (3) gives $y_j = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \epsilon_j$, where $\epsilon_j$ is the residual. So the linear model is $\boldsymbol y = X\boldsymbol\beta + \boldsymbol\epsilon$ with
$$X = \begin{bmatrix}1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ \vdots & \vdots & \vdots\\ 1 & x_n & x_n^2\end{bmatrix},\qquad \boldsymbol\beta = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix},\qquad \boldsymbol\epsilon = \begin{bmatrix}\epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\end{bmatrix}$$

The design matrix above is a Vandermonde matrix.

Example 5 in Section 2.1 and Theorem 14 in Section 6.5 show that if at least 3 of the values $x_1, \dots, x_n$ are distinct, then the columns of $X$ are linearly independent and the least-squares solution $\hat{\boldsymbol\beta}$ is unique.
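A sketch of the quadratic fit in equation (3), building the Vandermonde-type design matrix column by column (the data points are invented for illustration):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.2, 8.9, 16.3])

# Columns f0(x) = 1, f1(x) = x, f2(x) = x^2, as in equation (3).
# np.vander(x, 3, increasing=True) would build the same matrix.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("beta0, beta1, beta2 =", beta_hat)
```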

Multiple Regression

Suppose an experiment involves two independent variables, say $u$ and $v$, and one dependent variable, $y$. A simple equation for predicting $y$ from $u$ and $v$ has the form
$$y = \beta_0 + \beta_1 u + \beta_2 v\qquad (4)$$

A more general prediction equation might have the form
$$y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2\qquad (5)$$

Equations (4) and (5) both lead to a linear model because they are linear in the unknown parameters (even though $u$ and $v$ are multiplied). In general, a linear model will arise whenever $y$ is to be predicted by an equation of the form
$$y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \dots + \beta_k f_k(u, v)$$
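As in the single-variable case, the design matrix simply gets one column per function $f_k(u, v)$. Here is a sketch of a least-squares fit for equation (5), with made-up observations:

```python
import numpy as np

# Hypothetical observations (u_i, v_i, y_i).
u = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0])
v = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 2.0, 1.0])
y = np.array([2.0, 1.5, 5.1, 4.0, 9.8, 3.2, 8.1, 5.0])

# Design matrix for equation (5): columns 1, u, v, u^2, u*v, v^2.
X = np.column_stack([np.ones_like(u), u, v, u**2, u * v, v**2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)
```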

