Python Algorithms: Linear Regression and the Derivation of the Related Formulas

0 Preface

Starting from a small example, this article walks through the steps of linear regression, two common ways of finding the optimal solution (the least squares method, and sklearn's regression algorithm together with the principle behind it), and the derivation of the related functions and formulas.

Environment:
Windows 64位
Python3.9
scikit-learn==1.0.2
pandas==1.4.2
numpy==1.21.5
matplotlib==3.5.1

1 Case Data and the Relationship in the Data

Suppose we have the following set of data. Is the point (10, 27) reasonable?

x y
1 5
2 6
3 9
4 11
5 13

To answer this question, we can proceed in three steps:
1. Determine the relationship between x and y;
2. Fit a model and make a prediction with it;
3. Judge whether (10, 27) is reasonable.

To determine the relationship, we can draw the data as a scatter plot:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': list(range(1, 6)), 'y': [5, 6, 9, 11, 13]})
plt.scatter(df.x, df.y)
plt.show()

[Figure: scatter plot of the sample data]
The scatter plot shows a linear relationship, so let the linear model be $f\left( x_i \right)=ax_i+b$, such that $f\left( x_i \right) \approx y_i$.
The original question has now turned into finding the coefficient a and the intercept b.

2 Fitting the Model and Finding the Optimal Solution

When do a and b reach the optimal solution, that is, the best fit?
Going by past mathematical experience, we would draw a straight line that passes through, or comes close to, as many of the scattered points as possible; but we cannot tell whether that line is optimal without a way to judge the result. A common criterion is that the value given by the least squares method (LS) should be as small as possible.
In plain terms, the least squares method takes the actual value $y_i$ at each point, subtracts the predicted value on the line $f\left( x_i \right)$, squares the difference, and sums the squares.
In the special case where the result is 0, every actual point coincides exactly with a point on the line; the data are effectively points taken from the line itself.
The least squares formula is as follows:

$$LS=\sum_{i=1}^m \left( f\left( x_i \right) - y_i \right)^2, \quad \text{where } f\left( x_i \right)=ax_i+b \tag{2.1}$$


The formula for the mean squared error (MSE, also called squared loss) is as follows:

$$MSE=\dfrac{1}{m}\sum_{i=1}^m \left( f\left( x_i \right) - y_i \right)^2, \quad \text{where } f\left( x_i \right)=ax_i+b \tag{2.2}$$
Note: m is the total number of points and i indexes each point.

The difference between the two is that the least squares value is not divided by m, while the mean squared error is. Some scenarios use variants such as weighted least squares and weighted mean squared error, where the weights are the probabilities of the corresponding values.
A point made in Zhou Zhihua's book Machine Learning is that "the least squares method solves the model by minimizing the mean squared error", i.e.:
$$\boxed{ \text{minimize } MSE \xRightarrow{\text{model}} LS }$$
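To make the two quantities concrete, here is a minimal sketch (an illustration, not part of the original text) that evaluates LS and MSE for two candidate lines on the sample data; the second candidate is the fitted line found below.

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 6, 9, 11, 13], dtype=float)

def ls_and_mse(a, b):
    # residuals between the line f(x_i) = a*x_i + b and the actual y_i
    residuals = (a * x + b) - y
    ls = (residuals ** 2).sum()       # formula (2.1)
    mse = ls / len(x)                 # formula (2.2)
    return ls, mse

print(ls_and_mse(2, 3))               # candidate line: LS = 1.0, MSE = 0.2
print(ls_and_mse(2.1, 2.5))           # fitted line below: LS ≈ 0.7, MSE ≈ 0.14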
Having introduced LS and MSE, let us return to the original question: when do a and b reach the optimal solution? One way is to solve by differentiation, i.e. differentiate the mean squared error with respect to each unknown, set the derivative $f'=0$ and solve. Another is a directed search by an algorithm: give a and b initial values and then keep updating them with gradient descent until they converge to an extremum, which gives the optimal solution.

2.1 Solving by Differentiating the Mean Squared Error

How do we differentiate the mean squared error $E\left( a,b \right)=\dfrac{1}{m}\sum_{i=1}^m \left( f\left( x_i \right) - y_i \right)^2$ with respect to a and b?
First substitute $f\left( x_i \right)=ax_i+b$ into the expression above, which gives

$$\begin{aligned} E\left( a,b \right) &= \dfrac{1}{m} \sum_{i=1}^m \left( ax_i+b - y_i \right)^2 &(2.3) \\ &= \dfrac{1}{m} \sum_{i=1}^m \left( x_i^2a^2 + 2\left( b-y_i \right)x_ia + \left( b-y_i \right)^2 \right) &(2.4) \\ &= \dfrac{1}{m} \sum_{i=1}^m \left( b^2 + 2\left( ax_i-y_i \right)b + \left( ax_i-y_i \right)^2 \right) &(2.5) \end{aligned}$$

When differentiating with respect to a, rewrite the summand in the form of (2.4); when differentiating with respect to b, use the form of (2.5). The derivatives are:

$$\begin{aligned} \dfrac{\partial}{\partial a} E\left( a,b \right) &=\dfrac{2}{m} \sum_{i=1}^m \left( ax_i+b-y_i \right)x_i &(2.6) \\ \dfrac{\partial}{\partial b} E\left( a,b \right) &=\dfrac{2}{m} \sum_{i=1}^m \left( ax_i+b-y_i \right) &(2.7) \end{aligned}$$


Set both (2.6) and (2.7) to 0 and solve the resulting pair of equations, which gives:

$$a=\dfrac{ \sum_{i=1}^m \left( y_i - \overline{y} \right) x_i}{\sum_{i=1}^m \left( x_i - \overline{x} \right) x_i} = \dfrac{ \sum_{i=1}^m \left( x_i-\overline{x} \right)y_i }{\sum_{i=1}^m \left( x_i - \overline{x} \right) x_i} = \dfrac{ \sum_{i=1}^m \left( y_i - \overline{y} \right) \left( x_i -\overline{x} \right)}{\sum_{i=1}^m \left( x_i - \overline{x} \right)^2} = \dfrac{ \sum_{i=1}^m x_iy_i -m\overline{x}\,\overline{y} }{ \sum_{i=1}^m x_i^2 -m\overline{x}^2 } \tag{2.9}$$


$$b=\dfrac{1}{m} \sum_{i=1}^m \left( y_i-ax_i \right)=\overline{y}-a\overline{x} \tag{2.10}$$

Note: the four numerator forms of a are all equal in value, as are the three denominator forms, so they can be combined in many ways. The detailed derivation of the formula above can be found in the formula derivation section.

For this data set, the solution is a = 2.1 and b = 2.5.
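As a cross-check, the closed-form formulas (2.9) and (2.10) can be evaluated directly on the data; the following is a minimal sketch using numpy (an illustration, not from the original text):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 6, 9, 11, 13], dtype=float)
m = len(x)

# formula (2.9): a = (sum x_i*y_i - m*x_bar*y_bar) / (sum x_i^2 - m*x_bar^2)
a = ((x * y).sum() - m * x.mean() * y.mean()) / ((x ** 2).sum() - m * x.mean() ** 2)
# formula (2.10): b = y_bar - a*x_bar
b = y.mean() - a * x.mean()

print(a, b)    # approximately 2.1 2.5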

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': list(range(1, 6)), 'y': [5, 6, 9, 11, 13]})
plt.figure(figsize=(8, 6))
plt.scatter(df.x, df.y)
plt.plot(df.x, df.x * 2.1 + 2.5)
plt.show()

print(2.1 * 10 + 2.5)       # result: 23.5

The fitting result is as follows:
[Figure: scatter plot of the data with the fitted line y = 2.1x + 2.5]

Interlude: The Role of Derivatives

The derivative describes the rate of change of a function at a point; how large that rate of change is can be read off from the value of the derivative.
For example, for a straight-line function such as $y=x$ (whose derivative function is $y'=1$), the derivative is a constant, so the derivative at every point of the line is the same value and the rate of change never changes.
For a curve such as $y=x^2$ (whose derivative function is $y'=2x$), the derivative gives a rate of change $y'$, which is also the slope of the tangent line at that point. The smaller the tangent slope, the closer the point is to a global or local extremum (maximum or minimum); when the tangent slope is 0 an extremum is reached (it may be global or local and must be judged from the function itself; if the highest power is 2 there is only one extremum), as shown in Figure 1 below.
The role of the derivative, then, is to help us find the extremum where the tangent slope equals 0; this value is generally the optimal solution of the original function. Of course, if a function has several points with tangent slope 0, we may obtain a local optimum rather than the global one (as shown in Figure 2 below).
[Figure 1: animation of a tangent line moving along a curve; the slope approaches 0 at the extremum]
[Figure 2: a function with several points of zero tangent slope, giving local optima]
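A tiny numeric illustration of this idea (a sketch, not from the original text): approximating the slope of $y=x^2$ with a finite difference shows it shrinking to 0 at the minimum.

def f(x):
    return x ** 2

def slope(x, h=1e-6):
    # central finite-difference approximation of the derivative y' = 2x
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [2.0, 1.0, 0.5, 0.0]:
    print(x, round(slope(x), 6))    # slopes: 4.0, 2.0, 1.0, 0.0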

2.2 Solving by Algorithmic Fitting

Next, let us look at how an algorithm solves the problem. An algorithm works through a directed search.
First look at the figure below, which also shows a search process: candidate lines are tried directly against the original scattered points until one fits well. The trouble with this approach is that it is hard to tell when the optimal solution has been reached.
[Figure: several candidate lines drawn through the scattered points]
A more systematic method is needed, and many algorithms have been devised for searching for the optimal solution. A classic one is gradient descent. It has something in common with the derivative approach above: while searching for the extremum, it keeps iterating along the gradient of the function and finally converges at the extremum point, as shown below:
[Figure: animation of gradient descent converging to the minimum of a curve]
This is a two-dimensional illustration. For a three-dimensional one, see a classic picture from Andrew Ng's machine learning course:
Note: $J\left( \theta_0,\theta_1 \right)$ is the cost function, the same as $J\left( a,b \right)$ in this article; $\theta_0$ and $\theta_1$ correspond to a and b, and the aim is the optimal solution of the cost function (discussed below).
[Figure: 3-D surface of the cost function $J\left( \theta_0,\theta_1 \right)$ from Andrew Ng's course]
The logic of gradient descent is very simple, although expressing and solving it with formulas is a little more involved.

Before describing the algorithm, one more concept: the loss function (also called the cost function). As algorithms have evolved, so have loss functions. This article uses a fairly classic one built on the mean squared error: the MSE divided by 2. From the material I have read, the extra 2 mainly cancels the coefficient 2 that appears when the MSE is differentiated, which simplifies the calculation; that is a good way to understand it.
The loss function used here is as follows:

$$J\left( a,b \right)=\dfrac{1}{2m}\sum_{i=1}^m \left( f\left( x_i \right) - y_i \right)^2 \tag{2.11}$$

Note: strictly speaking, a loss function is defined on a single sample (the error of one sample), while a cost function is defined on the whole training set (the average error over all samples).

Next, write down the iterative update equations:

$$\begin{aligned} a_{i+1} &= a_i - \alpha \dfrac{\partial}{\partial a} J\left( a,b \right) &(2.12) \\ b_{i+1} &= b_i - \alpha \dfrac{\partial}{\partial b} J\left( a,b \right) &(2.13) \end{aligned}$$

These two formulas are the iteration rules: the next values of a and b are obtained by subtracting the long expression on the right from the current values of a and b.
$\alpha$ is the learning rate, or step size: a value you choose for the algorithm that controls how far each update moves. $\dfrac{\partial}{\partial a} J\left( a,b \right)$ and $\dfrac{\partial}{\partial b} J\left( a,b \right)$ are the derivatives, i.e. the loss function differentiated with respect to a and b respectively.
A further note: differentiating with respect to a and b separately may feel confusing at first, especially when several parameters appear together. However complex the expression, remember one rule: the variable you differentiate with respect to is the independent variable, everything else is treated as a constant, and then the usual differentiation rules apply. For example, for $f\left( x \right)=ax^2+bx+c$, differentiating with respect to a gives $f'\left( a \right)=x^2$, with respect to b gives $f'\left( b \right)=x$, and with respect to c gives $f'\left( c \right)=1$.
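A quick check of this rule (a sketch assuming sympy is available; sympy is not in the environment list above):

import sympy as sp

a, b, c, x = sp.symbols('a b c x')
f = a * x**2 + b * x + c

# Differentiate with respect to each parameter; the others are treated as constants.
print(sp.diff(f, a))   # x**2
print(sp.diff(f, b))   # x
print(sp.diff(f, c))   # 1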

How do we differentiate the cost function $J\left( a,b \right)=\dfrac{1}{2m}\sum_{i=1}^m \left( f\left( x_i \right) - y_i \right)^2$ with respect to a and b? This is very similar to differentiating the mean squared error above; the only difference is the extra division by 2.
First substitute $f\left( x_i \right)=ax_i+b$ into the expression, which gives

$$\begin{aligned} J\left( a,b \right) &= \dfrac{1}{2m} \sum_{i=1}^m \left( ax_i+b - y_i \right)^2 &(2.14) \\ &= \dfrac{1}{2m} \sum_{i=1}^m \left( x_i^2a^2 + 2\left( b-y_i \right)x_ia + \left( b-y_i \right)^2 \right) &(2.15) \\ &= \dfrac{1}{2m} \sum_{i=1}^m \left( b^2 + 2\left( ax_i-y_i \right)b + \left( ax_i-y_i \right)^2 \right) &(2.16) \end{aligned}$$

When differentiating with respect to a, use the form of (2.15); when differentiating with respect to b, use the form of (2.16). The derivatives are:

$$\begin{aligned} \dfrac{\partial}{\partial a} J\left( a,b \right) &=\dfrac{1}{m} \sum_{i=1}^m \left( ax_i+b-y_i \right)x_i &(2.17) \\ \dfrac{\partial}{\partial b} J\left( a,b \right) &=\dfrac{1}{m} \sum_{i=1}^m \left( ax_i+b-y_i \right) &(2.18) \end{aligned}$$


Note: For the specific derivation process, please refer to the formula derivation section.

Substituting (2.17) into (2.12) and (2.18) into (2.13) gives:

$$\begin{aligned} a_{i+1} &= a_i - \alpha \dfrac{1}{m} \sum_{i=1}^m \left( a_ix_i+b_i-y_i \right)x_i &(2.19) \\ b_{i+1} &= b_i - \alpha \dfrac{1}{m} \sum_{i=1}^m \left( a_ix_i+b_i-y_i \right) &(2.20) \end{aligned}$$

Note: the a and b on the right-hand side are the current values and those on the left-hand side are the next values; the two are updated simultaneously. In a Python implementation an intermediate variable may be needed, otherwise after evaluating (2.19) the variable a already holds the next value, and (2.20) would then update b using the next value of a instead of the current one.
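Below is a minimal sketch of this update loop on the sample data (the initial values, learning rate, and iteration count are choices made for this illustration, not taken from the text):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 6, 9, 11, 13], dtype=float)
m = len(x)

a, b = 0.0, 0.0      # initial values
alpha = 0.05         # learning rate (step size)

for _ in range(20000):
    error = a * x + b - y                 # f(x_i) - y_i
    grad_a = (error * x).sum() / m        # formula (2.17)
    grad_b = error.sum() / m              # formula (2.18)
    # simultaneous update: both gradients are computed before a and b change
    a, b = a - alpha * grad_a, b - alpha * grad_b

print(round(a, 4), round(b, 4))           # approximately 2.1 2.5

Tuple assignment already gives the simultaneous update mentioned in the note; in a language without it, an explicit temporary variable would be needed.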

To implement this in Python, we can simply call scikit-learn's (sklearn's) linear regression model. The concrete steps are as follows:

# Read the data
import pandas as pd
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [5, 6, 9, 11, 13]})
# data.shape, type(data)
x = data.loc[:, 'x']
y = data.loc[:, 'y']

# Plot the data to inspect its distribution
from matplotlib import pyplot as plt
plt.figure(figsize=(8, 6))
plt.scatter(x, y)
plt.show()

# Call sklearn's linear model
from sklearn.linear_model import LinearRegression
lr_model = LinearRegression()

# Reshape from 1-D to 2-D
X = x.values.reshape(-1, 1)

# Train the model
lr_model.fit(X, y)
# Predict on the training x
y_predict = lr_model.predict(X)

# Plot y and y_predict
plt.figure(figsize=(8, 6))
plt.scatter(x, y)
plt.plot(x, y_predict)
plt.show()

# Print a and b
a = lr_model.coef_
b = lr_model.intercept_
print(a, b)                 # result: [2.1] 2.499999999999999

# Check the mean squared error and R^2
from sklearn.metrics import mean_squared_error, r2_score
MSE = mean_squared_error(y, y_predict)
R2 = r2_score(y, y_predict)
print(MSE, R2)              # result: 0.13999999999999996 0.984375

# Predict the result for a specific value
y_p = lr_model.predict([[10]])
print(y_p)                  # result: [23.5]

The fitted line is $f\left( x_i \right)=2.1x_i+2.5$.
The fitted result is visualized as follows:
[Figure: scatter plot of the data with the fitted line from sklearn]

3 Judging the Result

Both solution methods give 23.5. The point (10, 27) lies above the fitted line, and 27 is 3.5 larger than 23.5.
Is that reasonable? We can measure it with $R^2$, which the code above computed as 0.984375. In other words, given how well this model fits, we have about 98.4% confidence that x = 10 gives 23.5, so 27 is not reasonable. Note, however, that this confidence comes from training on only five known points; with so little data it may not hold for values beyond x > 5, although in the same scenario it may still have some applicability. Treat it as a teaching example.
Even though 27 is unreasonable, a real decision also has to weigh the actual situation. For example, if the local housing price is predicted at 235,000 per square meter and the actual offer is 270,000 per square meter, that favors the seller and not the buyer: generally acceptable from the seller's point of view, generally unacceptable from the buyer's.

4 Formula Derivation

4.1 How to prove that $\sum_{i=1}^m \left( x_i-\overline{x} \right)x_i = \sum_{i=1}^m \left( x_i-\overline{x} \right)^2 = \sum_{i=1}^m x_i^2 - m\overline{x}^2$ holds

$$\sum_{i=1}^m x_i = x_1+x_2+x_3+\cdots+x_m = m\overline{x} = \sum_{i=1}^m \overline{x} \tag{4.1}$$

In formula (4.1), the sum of the m sample values of x equals m times their mean, which can also be written as $\sum_{i=1}^m \overline{x}$, i.e. $\overline{x}$ added up m times.

$$\sum_{i=1}^m \left( \overline{x} x_i \right)=\overline{x}\sum_{i=1}^m x_i \tag{4.2}$$

In formula (4.2), $\overline{x}$ is a constant, so (4.2) is simply $\overline{x}x_1+\overline{x}x_2+\cdots+\overline{x}x_m=\overline{x}\left( x_1+x_2+\cdots+x_m \right)$.


Proof that $\sum_{i=1}^m \left( x_i-\overline{x} \right)x_i = \sum_{i=1}^m \left( x_i-\overline{x} \right)^2$:

$$\begin{aligned} \sum_{i=1}^m \left( x_i-\overline{x} \right)x_i &=\sum_{i=1}^m \left( x_i^2-\overline{x} x_i \right) &(4.3) \\ &=\sum_{i=1}^m \left( x_i^2-\overline{x} x_i + \overline{x} x_i - \overline{x} x_i \right) &(4.4) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i + \overline{x} x_i \right) &(4.5) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i \right) + \sum_{i=1}^m \left( \overline{x} x_i \right) &(4.6) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i \right) + \overline{x}\sum_{i=1}^m x_i &(4.7) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i \right) + \overline{x}\cdot m\overline{x} &(4.8) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i \right) + m\overline{x}^2 &(4.9) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i \right) + \sum_{i=1}^m \overline{x}^2 &(4.10) \\ &=\sum_{i=1}^m \left( x_i^2-2\overline{x} x_i + \overline{x}^2 \right) &(4.11) \\ &=\sum_{i=1}^m \left( x_i-\overline{x} \right)^2 &(4.12) \end{aligned}$$


Proof that $\sum_{i=1}^m \left( x_i-\overline{x} \right)x_i = \sum_{i=1}^m x_i^2 - m\overline{x}^2$:

$$\begin{aligned} \sum_{i=1}^m \left( x_i-\overline{x} \right)x_i &=\sum_{i=1}^m x_i^2-\sum_{i=1}^m \overline{x} x_i &(4.13) \\ &=\sum_{i=1}^m x_i^2-\overline{x}\sum_{i=1}^m x_i &(4.14) \\ &=\sum_{i=1}^m x_i^2-\overline{x}\cdot m\overline{x} &(4.15) \\ &=\sum_{i=1}^m x_i^2-m\overline{x}^2 &(4.16) \end{aligned}$$
Substituting (4.1) into (4.14) gives (4.15).
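A quick numeric check of this identity on the sample x values (a sketch, not from the original text):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
m, x_bar = len(x), x.mean()

lhs = ((x - x_bar) * x).sum()        # sum (x_i - x_bar) * x_i
mid = ((x - x_bar) ** 2).sum()       # sum (x_i - x_bar)^2
rhs = (x ** 2).sum() - m * x_bar ** 2
print(lhs, mid, rhs)                 # all three print 10.0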



4.2 How to prove that $\sum_{i=1}^m \left( x_i-\overline{x} \right)y_i = \sum_{i=1}^m \left( y_i-\overline{y} \right)x_i = \sum_{i=1}^m \left( y_i - \overline{y} \right)\left( x_i -\overline{x} \right) = \sum_{i=1}^m x_iy_i - m\overline{x}\,\overline{y}$ holds


$$\begin{aligned} \overline{x} &= \dfrac{1}{m}\sum_{i=1}^m x_i &(4.17) \\ \overline{y} &= \dfrac{1}{m}\sum_{i=1}^m y_i &(4.18) \end{aligned}$$

Proof that $\sum_{i=1}^m \left( x_i-\overline{x} \right)y_i = \sum_{i=1}^m \left( y_i-\overline{y} \right)x_i$:

$$\begin{aligned} \sum_{i=1}^m \left( x_i-\overline{x} \right)y_i &=\sum_{i=1}^m \left( x_iy_i-\overline{x}y_i \right) &(4.19) \\ &=\sum_{i=1}^m x_iy_i-\sum_{i=1}^m \overline{x}y_i &(4.20) \\ &=\sum_{i=1}^m x_iy_i-\overline{x}\cdot\sum_{i=1}^m y_i &(4.21) \\ &=\sum_{i=1}^m x_iy_i-\left( \dfrac{1}{m}\sum_{i=1}^m x_i \right)\cdot\left( m\overline{y} \right) &(4.22) \\ &=\sum_{i=1}^m x_iy_i-\overline{y}\sum_{i=1}^m x_i &(4.23) \\ &=\sum_{i=1}^m x_iy_i-\sum_{i=1}^m \overline{y}x_i &(4.24) \\ &=\sum_{i=1}^m \left( x_iy_i-\overline{y}x_i \right) &(4.25) \\ &=\sum_{i=1}^m \left( y_i-\overline{y} \right)x_i &(4.26) \end{aligned}$$

Substituting (4.17) and (4.18) into (4.21) yields (4.22).
Note: $\sum_{i=1}^m x_iy_i \ne x_i\sum_{i=1}^m y_i$, because $x_i$ is not a constant. $\sum_{i=1}^m x_iy_i$ is the sum of the products of the two variables $x_i$ and $y_i$ (multiply first, then add), whereas $x_i\sum_{i=1}^m y_i$ sums the variable $y_i$ first and then multiplies by $x_i$. The two would be equal only if $x_i$ were a constant.


Proof that $\sum_{i=1}^m \left( x_i-\overline{x} \right)y_i = \sum_{i=1}^m \left( y_i - \overline{y} \right)\left( x_i -\overline{x} \right)$:

$$\begin{aligned} \sum_{i=1}^m \left( y_i - \overline{y} \right)\left( x_i -\overline{x} \right) &=\sum_{i=1}^m \left( x_iy_i -\overline{x}y_i - x_i\overline{y} +\overline{x}\,\overline{y} \right) &(4.27) \\ &=\sum_{i=1}^m \left( x_i -\overline{x} \right)y_i -\sum_{i=1}^m x_i\overline{y} +\sum_{i=1}^m \overline{x}\,\overline{y} &(4.28) \\ &=\sum_{i=1}^m \left( x_i -\overline{x} \right)y_i -\overline{y}\sum_{i=1}^m x_i +\overline{y}\sum_{i=1}^m \overline{x} &(4.29) \\ &=\sum_{i=1}^m \left( x_i -\overline{x} \right)y_i -m\overline{x}\,\overline{y} +m\overline{x}\,\overline{y} &(4.30) \\ &=\sum_{i=1}^m \left( x_i -\overline{x} \right)y_i &(4.31) \end{aligned}$$
Substitute (4.1) and (4.17) into (4.29) to get (4.30).


Proof that $\sum_{i=1}^m \left( x_i-\overline{x} \right)y_i = \sum_{i=1}^m x_iy_i - m\overline{x}\,\overline{y}$: combine (4.18) with (4.21) and the result follows.



4.3 Why is the derivative of $J\left( a,b \right)=\dfrac{1}{2m}\sum_{i=1}^m \left( ax_i+b-y_i \right)^2$ with respect to a equal to $\dfrac{1}{m}\sum_{i=1}^m \left( ax_i+b-y_i \right)x_i$, and the derivative with respect to b equal to $\dfrac{1}{m}\sum_{i=1}^m \left( ax_i+b-y_i \right)$?

Differentiating $E\left( a,b \right)$ with respect to a and b works in the same way, so that case is not repeated here.
$$\begin{aligned} J\left( a,b \right) &= \dfrac{1}{2m} \sum_{i=1}^m \left( ax_i+b - y_i \right)^2 &(4.32) \\ &= \dfrac{1}{2m} \sum_{i=1}^m \left( x_i^2a^2 + 2\left( b-y_i \right)x_ia + \left( b-y_i \right)^2 \right) &(4.33) \\ &= \dfrac{1}{2m} \sum_{i=1}^m \left( b^2 + 2\left( ax_i-y_i \right)b + \left( ax_i-y_i \right)^2 \right) &(4.34) \end{aligned}$$

Use formula (4.33) to differentiate with respect to a, and formula (4.34) to differentiate with respect to b.

$$\begin{aligned} \dfrac{\partial J\left( a,b \right)}{\partial a} &=\dfrac{1}{2m} \sum_{i=1}^m \left( 2x_i^2a + 2\left( b-y_i \right)x_i + 0 \right) &(4.35) \\ &=\dfrac{1}{2m} \sum_{i=1}^m 2x_i \left( x_ia + \left( b-y_i \right) \right) &(4.36) \\ &=\dfrac{1}{m} \sum_{i=1}^m x_i \left( x_ia + b-y_i \right) &(4.37) \\ &=\dfrac{1}{m} \sum_{i=1}^m \left( ax_i+b-y_i \right) x_i &(4.38) \end{aligned}$$

$$\begin{aligned} \dfrac{\partial J\left( a,b \right)}{\partial b} &= \dfrac{1}{2m} \sum_{i=1}^m \left( 2b + 2\left( ax_i-y_i \right) + 0 \right) &(4.39) \\ &=\dfrac{1}{2m} \sum_{i=1}^m 2\left( b + ax_i-y_i \right) &(4.40) \\ &=\dfrac{1}{m} \sum_{i=1}^m \left( b + ax_i-y_i \right) &(4.41) \\ &=\dfrac{1}{m} \sum_{i=1}^m \left( ax_i+b-y_i \right) &(4.42) \end{aligned}$$



4.4 The Formula Behind sklearn.metrics.r2_score and Its Transformations

Basics:
Total sum of squares: $TSS=\sum_{i=1}^m \left( y_i-\overline{y} \right)^2$ (the sum of squared differences between the actual values and the mean)
Explained (regression) sum of squares: $ESS=\sum_{i=1}^m \left( y'-\overline{y} \right)^2$ (the sum of squared differences between the predicted values and the mean)
Residual sum of squares: $RSS=\sum_{i=1}^m \left( y_i-y' \right)^2$ (the sum of squared differences between the actual and predicted values)
$TSS=ESS+RSS$
Mean squared error: $MSE=\dfrac{1}{m}\sum_{i=1}^m \left( y_i-y' \right)^2=\dfrac{1}{m}RSS$
Variance of y: $var_y=\dfrac{1}{m}\sum_{i=1}^m \left( y_i-\overline{y} \right)^2=\dfrac{1}{m}TSS$
Variance of x: $var_x=\dfrac{1}{m}\sum_{i=1}^m \left( x_i-\overline{x} \right)^2$
Standard deviation: $std=\sqrt{var}$

$R^2$ can be written in three equivalent ways:
$R^2 = 1-\dfrac{RSS}{TSS}$ (1 minus the ratio of the residual sum of squares to the total sum of squares)
$R^2 = 1-\dfrac{MSE}{var_y}$ (1 minus the ratio of the mean squared error to the variance of $y_i$)
$R^2 = \left( a\,\dfrac{std(x_i)}{std(y_i)} \right)^2$ (the square of the coefficient a times the ratio of the standard deviations of $x_i$ and $y_i$)

Derivation of $R^2$ from top to bottom:
(4.43)–(4.45) prove $R^2=1-\dfrac{RSS}{TSS}=1-\dfrac{MSE}{var_y}$.
(4.46)–(4.54) prove $R^2=1-\dfrac{RSS}{TSS}=a^2\dfrac{var(x_i)}{var(y_i)}=\left( a\,\dfrac{std(x_i)}{std(y_i)} \right)^2$.
Note: substitute $y'=ax_i+b$ into (4.48) to get (4.49), and $b=\overline{y}-a\overline{x}$ into (4.49) to get (4.50).

$$\begin{aligned} R^2 &=1-\dfrac{RSS}{TSS} &(4.43) \\ &=1-\dfrac{\dfrac{1}{m}RSS}{\dfrac{1}{m}TSS} &(4.44) \\ &=1-\dfrac{MSE}{var_y} &(4.45) \\ &=\dfrac{TSS-RSS}{TSS} &(4.46) \\ &=\dfrac{ESS}{TSS} &(4.47) \\ &=\dfrac{\sum_{i=1}^m \left( y'-\overline{y} \right)^2}{\sum_{i=1}^m \left( y_i-\overline{y} \right)^2} &(4.48) \\ &=\dfrac{\sum_{i=1}^m \left( ax_i+b-\overline{y} \right)^2}{\sum_{i=1}^m \left( y_i-\overline{y} \right)^2} &(4.49) \\ &=\dfrac{\sum_{i=1}^m \left( ax_i+\left( \overline{y}-a\overline{x} \right)-\overline{y} \right)^2}{\sum_{i=1}^m \left( y_i-\overline{y} \right)^2} &(4.50) \\ &=\dfrac{\sum_{i=1}^m \left( ax_i-a\overline{x} \right)^2}{\sum_{i=1}^m \left( y_i-\overline{y} \right)^2} &(4.51) \\ &=\dfrac{a^2\sum_{i=1}^m \left( x_i-\overline{x} \right)^2}{\sum_{i=1}^m \left( y_i-\overline{y} \right)^2} &(4.52) \\ &=a^2\dfrac{\dfrac{1}{m}\sum_{i=1}^m \left( x_i-\overline{x} \right)^2}{\dfrac{1}{m}\sum_{i=1}^m \left( y_i-\overline{y} \right)^2} &(4.53) \\ &=a^2\dfrac{var(x_i)}{var(y_i)} &(4.54) \end{aligned}$$
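A small numeric check of these identities on the sample data (a sketch, not from the original text; it reuses the fitted values a = 2.1, b = 2.5 from above):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 6, 9, 11, 13], dtype=float)
a, b = 2.1, 2.5
y_pred = a * x + b

rss = ((y - y_pred) ** 2).sum()
tss = ((y - y.mean()) ** 2).sum()
mse = rss / len(y)
var_y = tss / len(y)

print(1 - rss / tss)                  # ≈ 0.984375, matches r2_score above
print(1 - mse / var_y)                # ≈ 0.984375
print(a ** 2 * x.var() / y.var())     # ≈ 0.984375 (numpy's var() divides by m)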

5 References

Zhou Zhihua, Machine Learning, p. 54
Yuque LaTeX guide: https://www.yuque.com/yuque/gpvawt/brzicb
KaTeX supported symbols: https://katex.org/docs/supported.html


Small bonus: For the derivation of the formula in part 4, you can click this link to download the relevant Markdown document.




2022-12-17: added a problem-transformation chain to make the overall solution process easier to follow.
Problem-transformation chain:
Is (10, 27) reasonable
-> find the coefficient a and intercept b of $f\left( x_i \right)=ax_i+b$ (the a and b that reach the optimal solution);
-> find the a and b that minimize the mean squared error $MSE=\dfrac{1}{m}\sum_{i=1}^m \left( f\left( x_i \right) - y_i \right)^2$;
-> differentiate with respect to a and b, set $\dfrac{\partial}{\partial a} E\left( a,b \right)=0$ and $\dfrac{\partial}{\partial b} E\left( a,b \right)=0$, and solve for a and b;
-> substitute into $f\left( x_i \right)=ax_i+b$, evaluate it at $x_i=10$, and compare the result with 27.
