Unary linear regression
- 1. Regression analysis
- 2. Unary linear regression model
- 3. Estimation of $a$, $b$, and $\sigma^2$
- 4. Models transformable into linear regression
  - 1. $Y=\alpha e^{\beta x}\cdot\varepsilon$, $\ln\varepsilon\sim N(0,\sigma^2)$
  - 2. $Y=\alpha+\beta h(x)+\varepsilon$, $\varepsilon\sim N(0,\sigma^2)$
1. Regression analysis
Relationships between variables may be deterministic (functional relationships) or statistical (correlations). In a correlation relationship the dependent variable is a random variable, so its value is uncertain; such a relationship cannot be studied as a functional relationship and must instead be analyzed by statistical methods. There are two ways to study a correlation relationship: when the independent variable is a non-random variable that can be measured and controlled, regression analysis is used; when the independent variable is itself a random or uncontrollable variable, correlation analysis is used.
2. Unary linear regression model
Regression function: let $x$ be a controllable variable and $Y$ a random variable associated with it. When $x$ takes a certain value, $Y$ has a definite (conditional) distribution corresponding to it. If the mathematical expectation of $Y$ exists, its value varies with the value of $x$, so it is a function of $x$, denoted $\mu(x)$, i.e. $\mu(x)=E(Y\mid x)$; $\mu(x)$ is called the regression function of $Y$ on $x$.
$$\textcolor{orange}{\text{Functional relationship: }x\text{ determined}\longrightarrow\text{the \underline{value} of }Y\text{ is uniquely determined}}\\
\textcolor{green}{\text{Regression analysis: }x\text{ determined}\longrightarrow\text{the \underline{distribution} of }Y\text{ is uniquely determined}}$$
The basic task of regression analysis is to use experimental data to estimate the regression function $\mu(x)$ of $Y$ on $x$.
Unary linear regression problem: let the regression function of $Y$ on $x$ be $\mu(x)$. If $\mu(x)$ is a linear function $\mu(x)=a+bx$, the problem of estimating $\mu(x)$ is called a unary linear regression problem.
Unary linear regression model: let $x$ be a controllable variable and $Y$ a random variable depending on $x$, with
$$\begin{cases} Y=a+bx+\varepsilon\\ \varepsilon\sim N(0,\sigma^2) \end{cases}$$
where the unknown parameters $a,b,\sigma^2$ do not depend on $x$. This model is called a unary linear regression model.
Sample: $(x_1,Y_1),(x_2,Y_2),\cdots,(x_n,Y_n)$ ($Y_1,Y_2,\cdots,Y_n$ are independent random variables)
Sample values: $(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)$
Sample form of the linear regression model:
$$\begin{cases} Y_i=a+bx_i+\varepsilon_i\\ \varepsilon_i\sim N(0,\sigma^2) \end{cases}\quad(i=1,2,\cdots,n),\ \text{where }\varepsilon_1,\varepsilon_2,\cdots,\varepsilon_n\text{ are mutually independent.}$$
Empirical regression equation: if estimates $\hat{a},\hat{b}$ of the unknown parameters $a,b$ are obtained from $(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)$, then for a given $x$ we can take $\hat{y}=\hat{a}+\hat{b}x$ as an estimate of $\mu(x)=a+bx$, and the equation $\hat{y}=\hat{a}+\hat{b}x$ is called the empirical regression equation of $Y$ on $x$.
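The sample form above can be sketched with a small simulation: for fixed design points $x_i$, each $Y_i=a+bx_i+\varepsilon_i$ with independent $\varepsilon_i\sim N(0,\sigma^2)$. The parameter values below (`a=2.0`, `b=0.5`, `sigma=0.3`) are illustrative, not from the text.

```python
import random

def simulate_sample(xs, a, b, sigma, seed=0):
    """Draw one Y_i = a + b*x_i + eps_i per design point, eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_sample(xs, a=2.0, b=0.5, sigma=0.3)
print(len(ys))  # → 5, one observation per design point
```

Setting `sigma=0.0` recovers the purely functional relationship $y=a+bx$, which matches the distinction between functional and correlation relationships drawn above.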
3. Estimation of $a$, $b$, and $\sigma^2$
Least squares method: given $n$ pairs of test data $(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)$ for the variables $x$ and $Y$, where the $x_i$ are not all equal, form the sum of squared deviations
$$Q(a,b)=\sum\limits_{i=1}^n{(y_i-a-bx_i)}^2$$
and choose the parameters $a,b$ that minimize $Q(a,b)$. This method is called the least squares method.
To find the minimum of $Q(a,b)$, set $\cfrac{\partial Q}{\partial a}$ and $\cfrac{\partial Q}{\partial b}$ both equal to $0$, i.e.
$$\begin{cases} \cfrac{\partial Q}{\partial a}=-2\sum\limits_{i=1}^n{(y_i-a-bx_i)}=0\\ \cfrac{\partial Q}{\partial b}=-2\sum\limits_{i=1}^n{(y_i-a-bx_i)}x_i=0 \end{cases}$$
which gives the system
$$\begin{cases} na+b\sum\limits_{i=1}^n x_i=\sum\limits_{i=1}^n y_i\\ a\sum\limits_{i=1}^n x_i+b\sum\limits_{i=1}^n x_i^2=\sum\limits_{i=1}^n x_iy_i \end{cases}$$
called the normal equations. Its coefficient determinant is
$$\begin{vmatrix} n&\sum\limits_{i=1}^n x_i\\ \sum\limits_{i=1}^n x_i&\sum\limits_{i=1}^n x_i^2 \end{vmatrix}=n\sum\limits_{i=1}^n x_i^2-{\left(\sum\limits_{i=1}^n x_i\right)}^2=n\sum\limits_{i=1}^n{\left(x_i-\overline{x}\right)}^2.$$
Since the $x_i$ are not all equal, the coefficient determinant is nonzero, so the normal equations have a unique solution, giving the estimates of $a,b$:
$$\begin{cases} \hat{b}=\cfrac{S_{xy}}{S_{xx}}=\cfrac{\overline{xy}-\bar{x}\bar{y}}{\overline{x^2}-\overline{x}^2}\\ \hat{a}=\overline{y}-\hat{b}\overline{x} \end{cases}$$
where $S_{xy}=\sum\limits_{i=1}^n \left(x_i-\overline{x}\right)\left(y_i-\overline{y}\right)$ and $S_{xx}=\sum\limits_{i=1}^n{\left(x_i-\overline{x}\right)}^2$.
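The closed-form estimates $\hat{b}=S_{xy}/S_{xx}$ and $\hat{a}=\overline{y}-\hat{b}\overline{x}$ can be computed directly; the following is a minimal sketch (the data values are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares estimates: b_hat = S_xy / S_xx, a_hat = y_bar - b_hat * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx  # requires the x_i not all equal, so that S_xx > 0
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Noiseless data on the line y = 1 + 2x is recovered exactly:
a_hat, b_hat = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a_hat, b_hat)  # → 1.0 2.0
```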
Therefore the empirical regression equation sought is $\hat{y}=\hat{a}+\hat{b}x$. Substituting $\hat{a}=\overline{y}-\hat{b}\overline{x}$ into it gives
$$\hat{y}=\overline{y}-\hat{b}\overline{x}+\hat{b}x=\overline{y}+\hat{b}\left(x-\overline{x}\right),$$
which shows that the empirical regression line passes through the geometric center $\left(\overline{x},\overline{y}\right)$ of the scatter plot.
Next we estimate $\sigma^2$. Let $\hat{y}_i=\hat{a}+\hat{b}x_i\,(i=1,2,\cdots,n)$. The quantity $y_i-\hat{y}_i$ is called the residual at $x_i$, and
$$Q_e=\sum\limits_{i=1}^n{\left(y_i-\hat{y}_i\right)}^2=\sum\limits_{i=1}^n{\left(y_i-\hat{a}-\hat{b}x_i\right)}^2$$
is called the residual sum of squares. In fact $Q_e=Q\!\left(\hat{a},\hat{b}\right)$ is the minimum value of $Q(a,b)$. It can be proved that an unbiased estimator of $\sigma^2$ is $\hat{\sigma^2}=\dfrac{Q_e}{n-2}$.
$Q_e$ can also be calculated another way: $Q_e=S_{yy}-{\left(\hat{b}\right)}^2S_{xx}$, where $S_{yy}=\sum\limits_{i=1}^n{\left(y_i-\overline{y}\right)}^2$.
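A quick numerical check of the identity $Q_e=S_{yy}-\hat{b}^2 S_{xx}$, computing the residual sum of squares both ways on illustrative data:

```python
def residual_ss(xs, ys):
    """Return Q_e computed directly and via the shortcut S_yy - b_hat^2 * S_xx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    direct = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    shortcut = s_yy - b_hat ** 2 * s_xx
    return direct, shortcut

direct, shortcut = residual_ss([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(abs(direct - shortcut) < 1e-9)  # → True
```

The shortcut avoids recomputing the fitted values, since $S_{yy}$ and $S_{xx}$ are already available from the estimation step.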
Summary of formulas:
$$\boxed{\begin{aligned} \hat{y}&=\hat{a}+\hat{b}x\\ \hat{b}&=\cfrac{S_{xy}}{S_{xx}}\\ \hat{a}&=\overline{y}-\hat{b}\overline{x}\\ \hat{\sigma^2}&=\cfrac{Q_e}{n-2}=\cfrac{1}{n-2}\left[S_{yy}-{\left(\hat{b}\right)}^2S_{xx}\right] \end{aligned}}$$
4. Models transformable into linear regression
1. $Y=\alpha e^{\beta x}\cdot\varepsilon$, $\ln\varepsilon\sim N(0,\sigma^2)$
Here $\alpha,\beta,\sigma^2$ are unknown parameters that do not depend on $x$.
Taking logarithms on both sides gives
$$\begin{aligned} \ln Y&=\ln\alpha+\beta x+\ln\varepsilon\\ Y'&=a+bx+\varepsilon' \end{aligned}$$
where $Y'=\ln Y$, $a=\ln\alpha$, $b=\beta$, $\varepsilon'=\ln\varepsilon$.
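A sketch of fitting model 1 via this log transform: generate $Y=\alpha e^{\beta x}\cdot\varepsilon$ with $\ln\varepsilon\sim N(0,\sigma^2)$, fit a straight line to $(x,\ln Y)$ by least squares, and recover $\alpha=e^{a}$. The parameter values and the small helper are illustrative.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line fit: returns (a_hat, b_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    return y_bar - b_hat * x_bar, b_hat

rng = random.Random(1)
alpha, beta, sigma = 2.0, 0.5, 0.05  # illustrative true parameters
xs = [0.5 * i for i in range(10)]
ys = [alpha * math.exp(beta * x) * math.exp(rng.gauss(0.0, sigma)) for x in xs]

# Transform Y' = ln Y, so a = ln(alpha) and b = beta in the linear model.
a_hat, b_hat = fit_line(xs, [math.log(y) for y in ys])
alpha_hat = math.exp(a_hat)
print(alpha_hat, b_hat)  # close to alpha = 2.0 and beta = 0.5
```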
2. $Y=\alpha+\beta h(x)+\varepsilon$, $\varepsilon\sim N(0,\sigma^2)$
Here $\alpha,\beta,\sigma^2$ are unknown parameters that do not depend on $x$, and $h(x)$ is a known function of $x$.
Let $a=\alpha$, $b=\beta$, $x'=h(x)$; the model then becomes the linear regression model $Y=a+bx'+\varepsilon$.
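A sketch of model 2 with an illustrative choice $h(x)=\ln x$ (any known function works): substitute $x'=h(x)$ and fit the straight line by least squares. The helper and data are illustrative; noiseless data makes the recovery exact.

```python
import math

def fit_line(xs, ys):
    """Least-squares line fit: returns (a_hat, b_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    return y_bar - b_hat * x_bar, b_hat

xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 + 2.0 * math.log(x) for x in xs]  # noiseless Y = 3 + 2*ln(x)

# Substitute x' = h(x) = ln(x) and fit Y = a + b*x'.
a_hat, b_hat = fit_line([math.log(x) for x in xs], ys)
print(a_hat, b_hat)  # → recovers a = 3.0, b = 2.0
```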