Chapter 3 of the Watermelon Book: The Linear Regression Model

This article is a summary of my notes on Zhou Zhihua's Watermelon Book and the accompanying Pumpkin Book, organizing the main line of ideas.

Linear model: a linear combination of the attributes
\[ f(\boldsymbol{x})=w_{1} x_{1}+w_{2} x_{2}+\ldots+w_{d} x_{d}+b=\omega^{T} \boldsymbol{x}+b \]
The basic ideas behind it:

  • Many powerful nonlinear models are built on top of linear models by introducing hierarchical structures or high-dimensional mappings

  • The weights \(\omega\) intuitively express the importance of each attribute for prediction

Data set notation:

  • Data set \(D=\left\{\left(x^{(1)}, y^{(1)}\right),\left(x^{(2)}, y^{(2)}\right), \dots,\left(x^{(m)}, y^{(m)}\right)\right\}\); the superscript in the top-right corner indexes the sample
  • Handling of discrete attributes
    • Ordered attributes: discretize by interpolation (e.g. high, medium, low can be converted to {1.0, 0.5, 0.0})
    • Unordered attributes: one-hot encoding (e.g. watermelon, pumpkin, melon can be converted to (0,0,1), (0,1,0), (1,0,0)); a small Python sketch follows this list
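A minimal Python sketch of these two conversions (the mappings and category names below are just the examples from the list, not anything from the book's code):

```python
# Ordered attribute: map ordered levels to evenly spaced values in [0, 1].
ordinal_map = {"high": 1.0, "medium": 0.5, "low": 0.0}

# Unordered attribute: one-hot encoding, one binary component per category.
categories = ["watermelon", "pumpkin", "melon"]

def one_hot(value, categories):
    """Return a one-hot vector marking the position of `value` in `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]

print(ordinal_map["medium"])           # 0.5
print(one_hot("pumpkin", categories))  # [0.0, 1.0, 0.0]
```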

Univariate linear regression:

For regression problems, the goal is to minimize the loss function.

Loss function: the least-squares error
\[ \begin{aligned}\left(w^{*}, b^{*}\right) &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \\ &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2} \end{aligned} \]
First take the partial derivative with respect to \(b\):
\[ 2\sum_{i=1}^m(y_i-\omega x_i-b)(-1)=0\\ \Rightarrow b={} \frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \]
Then take the partial derivative with respect to \(\omega\):
\[ \begin{aligned} 0 &=w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i} \\ \Rightarrow & w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m} b x_{i} \end{aligned} \]
Substitute \(b\) into the expression above:
\[ \Rightarrow w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m}(\bar{y}-w \bar{x}) x_{i}\\ \Rightarrow w\left(\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}\right)=\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i} \\ \Rightarrow w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \]

Using the following two identities, this can be converted into the formula in the Watermelon Book: [trick]
\[ \begin{aligned} \bar{y} \sum_{i=1}^{m} x_{i}={} & \frac{1}{m} \sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i} \\ \bar{x} \sum_{i=1}^{m} x_{i}={} & \frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2} \end{aligned} \]

Finally we obtain:
\[ \Rightarrow w=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}} \]
Solving gives the closed-form expressions for the optimal \(\omega\) and \(b\):
\[ \begin{aligned} w={} &\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}\\ b={} &\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \end{aligned} \]
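A minimal NumPy sketch of this closed-form solution (the function and variable names here are my own, not from the book):

```python
import numpy as np

def fit_univariate(x, y):
    """Closed-form least squares for the univariate model y ~ w*x + b."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(x)
    x_bar = x.mean()
    # w = sum_i y_i (x_i - x_bar) / (sum_i x_i^2 - (sum_i x_i)^2 / m)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    # b = (1/m) sum_i (y_i - w x_i)
    b = np.mean(y - w * x)
    return w, b

# Sanity check on points generated from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
print(fit_univariate(x, 2 * x + 1))  # -> roughly (2.0, 1.0)
```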

Further, \(\omega\) can be vectorized, which makes it easier to program: [trick]

Substituting \(\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}=\bar{x} \sum_{i=1}^{m} x_{i}\) into the denominator gives:
\[ \begin{aligned} w &=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\bar{x} \sum_{i=1}^{m} x_{i}} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}\right)} \end{aligned} \]
Using the following two identities: [trick]
\[ \bar{y} \sum_{i=1}^{m} x_{i}=\bar{x} \sum_{i=1}^{m} y_{i}=\sum_{i=1}^{m} \bar{y} x_{i}=\sum_{i=1}^{m} \bar{x} y_{i}=m \bar{x} \bar{y}=\sum_{i=1}^{m} \bar{x} \bar{y} \\ \sum_{i=1}^m x_i\bar{x}=\bar{x} \sum_{i=1}^{m} x_{i}=\bar{x} \cdot m \cdot \frac{1}{m} \sum_{i=1}^{m} x_{i}=m \bar{x}^{2}=\sum_{i=1}^{m} \bar{x}^{2} \]

The expression for \(\omega\) can then be rewritten as:

\[ \begin{aligned} w &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \bar{x}-x_{i} \bar{y}+\bar{x} \bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \bar{x}-x_{i} \bar{x}+\bar{x}^{2}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2}} \end{aligned} \]
Let \(\boldsymbol{x}_{d}=\left(x_{1}-\bar{x}, x_{2}-\bar{x}, \ldots, x_{m}-\bar{x}\right)^{T}\) and \(\boldsymbol{y}_{d}=\left(y_{1}-\bar{y}, y_{2}-\bar{y}, \dots, y_{m}-\bar{y}\right)^{T}\).

The vectorized result is:
\[ w=\frac{\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}}{\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}} \]
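A small sketch of the vectorized computation (assuming `x` and `y` are NumPy arrays of shape `(m,)`); it mirrors the formula above and avoids an explicit Python loop:

```python
import numpy as np

def fit_univariate_vectorized(x, y):
    """Compute w as x_d^T y_d / (x_d^T x_d), then recover b."""
    x_d = x - x.mean()               # (x_1 - x_bar, ..., x_m - x_bar)
    y_d = y - y.mean()               # (y_1 - y_bar, ..., y_m - y_bar)
    w = (x_d @ y_d) / (x_d @ x_d)
    b = np.mean(y - w * x)
    return w, b
```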

Multiple linear regression

Derivation idea: for univariate linear regression, we first took the partial derivative with respect to \(b\), and then substituted \(b\) into the partial derivative with respect to \(\omega\).

In the multivariate case, the extremum of the loss function cannot be found conveniently by taking the partial derivatives one by one, so we rewrite the expression:
\[ f(x_{i})=\omega^{T} x_{i}+b=\beta^{T} X \]

where
\[ \beta=(\omega, b) \\ X=(x_{i}, 1) \]

The detailed derivation is as follows (handwritten notes; too lazy to type them out):

Taking the derivative of the loss function:
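The handwritten notes are not reproduced here; for reference, the standard derivation (using the \(\beta\) and design-matrix notation above, with each \((x_i, 1)\) stacked as a row of \(X\) and the labels stacked into \(\boldsymbol{y}\)) goes as follows:

\[ \begin{aligned} E_{\beta} &=(\boldsymbol{y}-X \beta)^{T}(\boldsymbol{y}-X \beta) \\ \frac{\partial E_{\beta}}{\partial \beta} &=2 X^{T}(X \beta-\boldsymbol{y})=0 \\ \Rightarrow \beta^{*} &=\left(X^{T} X\right)^{-1} X^{T} \boldsymbol{y} \end{aligned} \]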

For the formula above, if \((X^{T}X)\) is a full-rank matrix or a positive definite matrix, its inverse exists.

However, in real tasks \((X^{T}X)\) is often not full rank; for example, microarray data in bioinformatics often has thousands of attributes but only tens or hundreds of samples. In that case there are multiple solutions \(\hat{\omega}\) that all minimize the mean squared error. The common practice is to introduce a regularization term to resolve this problem; a sketch of that idea follows.
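As an illustration (this is ridge regression, one common choice of regularizer; it is my own sketch rather than code from the book), adding \(\lambda I\) makes \(X^{T} X+\lambda I\) invertible even when \(X^{T} X\) is singular:

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """L2-regularized least squares: beta = (X^T X + lam * I)^{-1} X^T y.

    X is the m-by-(d+1) design matrix whose last column is all ones,
    so beta contains (w_1, ..., w_d, b).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```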

Advanced perspective (derived from the whiteboard-derivation series)

Geometric interpretation [LSE]
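Briefly, in my own words (the whiteboard notes themselves are not reproduced here): the least-squares fit \(X \beta^{*}\) is the orthogonal projection of \(\boldsymbol{y}\) onto the column space of \(X\), so the residual is orthogonal to every column of \(X\):

\[ X \beta^{*}=X\left(X^{T} X\right)^{-1} X^{T} \boldsymbol{y}, \qquad X^{T}\left(\boldsymbol{y}-X \beta^{*}\right)=0 \]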

Probabilistic perspective (considering the problem from the viewpoint of how the data is generated):
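In short (again my own summary of that viewpoint): if the data are assumed to be generated as \(y=\omega^{T} x+\epsilon\) with Gaussian noise \(\epsilon \sim \mathcal{N}(0, \sigma^{2})\), then maximum likelihood estimation of \(\omega\) is equivalent to least squares:

\[ \hat{\omega}_{\mathrm{MLE}}=\underset{\omega}{\arg \max } \sum_{i=1}^{m} \log \mathcal{N}\left(y_{i} \mid \omega^{T} x_{i}, \sigma^{2}\right)=\underset{\omega}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-\omega^{T} x_{i}\right)^{2} \]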


Origin: www.cnblogs.com/wangjs-jacky/p/11790058.html