[Practical Machine Learning] 3.3 Linear Model

Linear regression

Example of Linear Regression - House Price Prediction

  • Suppose there are three features: $x_1$ = #beds, $x_2$ = #baths, $x_3$ = #living sqft.
  • Suppose the predicted value is the weighted sum of all input features: $y = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$.
  • The weights $w_1, w_2, w_3$ and the offset $b$ will be learned from the training data.
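
For a concrete feel, here is a tiny numeric sketch of that weighted sum; the weights, offset, and house below are made-up illustrative values, not parameters learned from data.

# Hypothetical parameters: price per bed, per bath, per living sqft, plus a base offset
w1, w2, w3, b = 100_000, 50_000, 300, 50_000

# A hypothetical house: 3 beds, 2 baths, 1500 living sqft
x1, x2, x3 = 3, 2, 1500

y = w1 * x1 + w2 * x2 + w3 * x3 + b
print(y)  # 900000: the predicted price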

General form of linear regression

In general, given data $\mathbf{x}=\left[x_1, x_2, \ldots, x_p\right]$, that is, each sample is expressed as $p$-dimensional features, the linear model predicts

$$\hat{y}=w_1 x_1+w_2 x_2+\ldots+w_p x_p+b=\langle\mathbf{w}, \mathbf{x}\rangle+b$$

Here $\mathbf{w}$ and $\mathbf{x}$ are both vectors of length $p$; $\mathbf{w}$ and $b$ are parameters that can be learned.

If the above formula is implemented in code, it can be written as follows:

# weight w has shape (p, 1)
# bias b is a scalar
# data x has shape (p, 1)
y_hat = (x * w).sum() + b
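
A runnable version of the same computation, sketched here with PyTorch; the random tensors below are placeholders just to make the shapes concrete:

import torch

p = 3
w = torch.randn(p, 1)        # weight vector, shape (p, 1)
b = torch.randn(1)           # scalar bias
x = torch.randn(p, 1)        # one sample, shape (p, 1)

y_hat = (x * w).sum() + b    # elementwise product then sum gives <w, x>, plus the bias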

Objective function

Suppose we have a collection of $n$ samples $\mathbf{X}=\left[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\right]^T \in \mathbb{R}^{n\times p}$, i.e., $\mathbf{X}$ is a matrix with $n$ rows and $p$ columns, and the labels corresponding to $\mathbf{X}$ are $\mathbf{y}=\left[y_1, \ldots, y_n\right]^T \in \mathbb{R}^n$.

Goal: Minimize the mean square error (MSE)

$$\mathbf{w}^*, b^* = \underset{\mathbf{w}, b}{\operatorname{argmin}}\ \ell(\mathbf{X}, \mathbf{y}, \mathbf{w}, b) = \underset{\mathbf{w}, b}{\operatorname{argmin}}\ \frac{1}{n}\sum_{i=1}^n\left(y_i-\left\langle\mathbf{x}_i, \mathbf{w}\right\rangle-b\right)^2$$
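
As a minimal sketch of evaluating this objective for some candidate $\mathbf{w}, b$ (the data below are random placeholders; how the minimization itself is carried out is not covered in this section):

import torch

n, p = 100, 3
X = torch.randn(n, p)        # n samples with p features each
y = torch.randn(n)           # labels
w = torch.randn(p)           # candidate weights
b = torch.randn(1)           # candidate bias

loss = ((y - (X @ w + b)) ** 2).mean()   # l(X, y, w, b), the MSE from the formula above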

Linear classification

The output of regression is a continuous real number, but for classification the output is a prediction of the class.

Multi-class classification:

  • If we want to distinguish between multiple classes, the model can output a vector. Specifically, the output is a vector of length $m$, whose $i$-th element reflects the confidence (probability) of belonging to class $i$: the higher the value, the more likely the sample belongs to that class, and the lower, the less likely.
  • We can use a linear model $o_i=\left\langle\mathbf{x}, \mathbf{w}_i\right\rangle+b_i$, where $\mathbf{x}$ is the feature vector of the sample, $\mathbf{w}_i$ is a length-$p$ vector of learnable parameters for class $i$, and $b_i$ is the offset of that class. The confidence for class $i$ is then $o_i$. Since there are $m$ classes, there are $m$ values $o_i$ in total.
  • The label is $\mathbf{y}=\left[y_1, y_2, \ldots, y_m\right]$, where exactly one $y_i=1$ and all the others are $0$; this is one-hot encoding, indicating that the sample belongs to class $i$.
  • We want to minimize the mean squared error (MSE) $\frac{1}{m}\|\mathbf{o}-\mathbf{y}\|_2^2$.
  • The predicted class is $\operatorname{argmax}_i\left\{o_i\right\}_{i=1}^m$, i.e., the class $i$ whose confidence $o_i$ is the largest (see the sketch after this list).
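
A minimal sketch of this linear classifier; the tensors are random placeholders, and W simply stacks the $m$ per-class weight vectors $\mathbf{w}_i$ as columns:

import torch

p, m = 10, 4                 # feature dimension and number of classes
x = torch.randn(p)           # one sample
W = torch.randn(p, m)        # column i holds w_i, the parameters of class i
b = torch.randn(m)           # per-class offsets b_i

o = x @ W + b                # confidences o_1, ..., o_m
pred = o.argmax()            # predicted class: the i with the largest o_i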

Softmax regression

With MSE, the goal is to make every $o_i$ match the corresponding $y_i$. But when actually doing classification we do not care about the exact values of $\mathbf{o}$; we only want the confidence of the true class to be sufficiently large relative to the others. So, to make the model focus on the correct class rather than on all the other classes, we use softmax.

  • First, the predicted scores need to be converted into probabilities. Each $o_i$ is a real number anywhere between negative and positive infinity; to turn the scores into probabilities, they must all be made non-negative and must sum to 1.

$$\hat{\mathbf{y}}=\operatorname{softmax}(\mathbf{o}) \quad \text{where} \quad \hat{y}_i=\frac{\exp\left(o_i\right)}{\sum_{k=1}^m \exp\left(o_k\right)}$$

import torch

O_exp = torch.exp(O)                    # O holds the raw scores, shape (n, m)
partition = O_exp.sum(1, keepdim=True)  # row-wise normalizer, shape (n, 1)
Y = O_exp / partition                   # each row of Y now sums to 1 (broadcasting)

Although a nonlinear transformation is applied here, this is still a linear model: when making a decision, the model picks the largest $\hat{y}_i$, which is equivalent to picking the largest $o_i$, i.e., $\operatorname{argmax}_i \hat{y}_i=\operatorname{argmax}_i o_i$.

  • If we want to measure the difference between the probability vectors $\hat{\mathbf{y}}$ and $\mathbf{y}$, we can use cross-entropy:

$$H(\mathbf{y}, \hat{\mathbf{y}})=\sum_i -y_i \log\left(\hat{y}_i\right)=-\log \hat{y}_y$$

Because only one $y_i$ is 1 and all the rest are 0, the cross-entropy simplifies to $-\log \hat{y}_y$, where $y$ denotes the index of the true class. Since $-\log$ is a decreasing function, minimizing the cross-entropy means maximizing $\hat{y}_y$. The model therefore ultimately only cares about the predicted probability on the correct class, and pays little attention to the other values.
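
A minimal sketch of this simplified cross-entropy, assuming `y_hat` holds softmax probabilities row by row and `y` holds the integer index of each sample's true class (a common convention, not something fixed by the notes above):

import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])          # predicted probabilities, shape (n, m)
y = torch.tensor([2, 0])                         # true class index of each sample

loss = -torch.log(y_hat[range(len(y_hat)), y])   # -log of the probability on the true class
print(loss)                                      # small when the true class gets high probability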

References

3.3 The simplest and most commonly used linear model [Stanford 21 Fall: Practical Machine Learning Chinese Edition] (Bilibili)

3.1. Linear Regression — Dive into Deep Learning 1.0.0-beta0 documentation

https://c.d2l.ai/stanford-cs329p/_static/pdfs/cs329p_slides_4_3.pdf

[Machine Learning] Re-understanding Linear Regression - 1 - Maximum Likelihood Estimation (Bilibili)
