Chapter 1, Part 1: Supervised Learning (Regression)

Linear regression (the linear regression model)

Regression: a model that predicts a number as its output.

Linear regression is a particular kind of supervised learning model.

Example: by fitting a line to known house prices, the price of a house of a given size (in square feet) can be predicted.

image-20221130201900860

Distinguishing regression from classification: the output of classification is discrete, with only a small, finite number of possible categories, while regression can output any number from an infinite range.

  • Terminology

Training set:

image-20221130202324962

Input variable (feature): x (e.g. x = 2014)

Output / target variable: y (e.g. y = 232)

Total number of training examples: m

A single training example: (x, y)

$(x^{(i)}, y^{(i)})$: the i-th training example, i.e. the i-th row of the training set
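As a concrete sketch (the numbers below are made up, not taken from the figure), a training set can be stored as two NumPy arrays, and index i then picks out the i-th example:

```python
import numpy as np

# Hypothetical training set: house sizes (x) and prices (y)
x_train = np.array([1.0, 1.5, 2.0, 2.5])          # inputs x^(i)
y_train = np.array([300.0, 370.0, 460.0, 520.0])  # targets y^(i)

m = x_train.shape[0]   # m: total number of training examples
i = 1                  # pick an example (0-based indexing in code)
print(f"m = {m}, (x^({i}), y^({i})) = ({x_train[i]}, {y_train[i]})")
```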

  • How the model (function) is produced

The learning algorithm uses the training set to produce a function f; feeding an input x into f yields a prediction for y.

ŷ (y-hat): denotes the predicted value of y

image-20221130204448643

  • Representation of f

If f is assumed to be a straight line:

$f_{w,b}(x) = wx + b$

Plotting this function gives a straight line; this is linear regression with a single input variable (univariate linear regression).

image-20221130204752166
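As a minimal sketch, the straight-line model can be written as a small Python function; the values of w and b below are made up purely for illustration:

```python
def f_wb(x, w, b):
    """Linear model f_{w,b}(x) = w*x + b for a single input feature x."""
    return w * x + b

# Hypothetical parameter values
w, b = 200.0, 100.0
print(f_wb(1.2, w, b))   # prediction y-hat for input x = 1.2
```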

Cost function formula

To implement linear regression, the first step is to construct a cost function.

w and b are the parameters of the model: w is the weight (slope) applied to the input x, and b is the bias (intercept). Different choices of w and b give different lines:

image-20221130213530462

Different values of w and b define different functions, and we want the function that fits the training data as well as possible.

Cost function: Measures how well a line fits the training data

m: the number of samples in the training set

The factor 1/(2m): dividing by m averages the squared errors so the cost does not grow with the size of the training set, and the extra 1/2 cancels the 2 produced when differentiating the square, which makes the later calculus tidier.

$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$: the most commonly used cost function (the squared error cost)

That is, it can be written as:

$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$
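A minimal sketch of this squared error cost in code, directly following the formula above (the training data are made up for illustration):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """J(w,b) = (1/2m) * sum over i of (f_{w,b}(x^(i)) - y^(i))^2."""
    m = x.shape[0]
    errors = (w * x + b) - y          # f_{w,b}(x^(i)) - y^(i) for every example
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical data
x_train = np.array([1.0, 1.5, 2.0, 2.5])
y_train = np.array([300.0, 370.0, 460.0, 520.0])
print(compute_cost(x_train, y_train, w=200.0, b=100.0))
```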

Understand the cost function

First, set b = 0 and observe how w affects f(x) and J(w)

f(x): with w fixed, the variable is x

The image when w=1 is as follows:

image-20221130220019926

The image when w=0.5 is as follows:

image-20221130220519270

J(w): the variable is w

Compute J(w) for different values of w using the figures above:

image-20221130220202261

image-20221130220604411

Then plot J(w) against these values:

image-20221130220708509

From the plot, the smaller J(w) is, the better the line fits the data.

So the goal of linear regression is to find the parameters w and b that make J(w,b) as small as possible.
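The w-only experiment above can be reproduced with a short sweep: fix b = 0, evaluate the cost for a range of w values, and take the smallest. A minimal sketch with made-up data (using the compute_cost helper from the earlier block):

```python
import numpy as np

def compute_cost(x, y, w, b):
    m = x.shape[0]
    return np.sum(((w * x + b) - y) ** 2) / (2 * m)

# Made-up data lying exactly on y = x, so the best w (with b = 0) should be 1
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])

w_values = np.linspace(0.0, 2.0, 21)                        # candidate w values
costs = [compute_cost(x_train, y_train, w, b=0.0) for w in w_values]

best_w = w_values[int(np.argmin(costs))]
print(f"w with the smallest J(w): {best_w}")                 # expected: 1.0
```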

Visualize the cost function

Observe how w and b affect $f_{w,b}(x)$ and J(w,b)

f(x): with w and b fixed, the variable is x

The image when w=0.06, b=50 is as follows:

image-20221202100534714

Then plot J(w,b) as a surface over (w, b):

x-axis (b), y-axis (w), z-axis (J)

image-20221202100752655

The above image can be converted into a contour map (two-dimensional), with the center point being the minimum value:

For example, taking b = 800 and w = -0.15, the cost is large:

image-20221202101649756

The line defined by these values does not fit the data well:

image-20221202101815906
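A sketch of how such a surface/contour picture can be produced: evaluate J(w,b) over a grid of (w, b) pairs and hand the grid to matplotlib. The data and grid ranges below are made up:

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(x, y, w, b):
    m = x.shape[0]
    return np.sum(((w * x + b) - y) ** 2) / (2 * m)

# Hypothetical data
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([250.0, 480.0, 730.0])

# Evaluate J on a grid of (w, b) values
w_grid = np.linspace(0.0, 400.0, 100)
b_grid = np.linspace(-200.0, 200.0, 100)
J = np.array([[compute_cost(x_train, y_train, w, b) for w in w_grid] for b in b_grid])
W, B = np.meshgrid(w_grid, b_grid)

plt.contour(B, W, J, levels=30)   # x-axis: b, y-axis: w, as in the figure
plt.xlabel("b")
plt.ylabel("w")
plt.title("Contours of J(w, b)")
plt.show()
```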

From the above, we see that the cost depends on w and b. The next question is how to find the w and b that minimize the cost function. The method is the gradient descent algorithm.


Gradient descent

Gradient descent is an algorithm that can minimize an arbitrary function: at each step it moves the parameters in the direction in which the function decreases fastest (the direction of steepest descent).

For example, in the figure below, imagine a small figure standing on the surface of J (the cost function) and walking downhill step by step from a high starting point toward the lowest point.

image-20221202204155957

Choosing different starting points may lead to different local minima.

Implementation of gradient descent method

  • Formula (same applies to b)

α: the learning rate, a small positive number (typically between 0 and 1) that controls the size of each step

$w = w - \alpha \frac{\partial J(w,b)}{\partial w}$

Each update makes a small adjustment to the current w.

$\frac{\partial J(w,b)}{\partial w}$: determines in which direction to descend

Note: the updates of w and b must be performed simultaneously (compute both new values before assigning either).

image-20221203131810544
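A sketch of one simultaneous update step: both gradients are computed from the current (w, b) and stored in temporaries before either parameter is overwritten. Here grad_w and grad_b stand in for functions computing ∂J/∂w and ∂J/∂b; how they are computed for linear regression is shown later.

```python
def gradient_step(w, b, alpha, grad_w, grad_b):
    """One gradient descent step with a simultaneous update of w and b."""
    tmp_w = w - alpha * grad_w(w, b)   # both gradients use the OLD w and b
    tmp_b = b - alpha * grad_b(w, b)
    return tmp_w, tmp_b                # assign only after both are computed
```

(An incorrect, non-simultaneous version would overwrite w first and then use the new w when computing the gradient for b.)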

Understanding the Gradient Descent Algorithm

Set b to 0; the cost function then becomes a curve in w:

image-20221203132532029

$\frac{\partial J(w,b)}{\partial w}$: the slope of the tangent line

If the slope is positive, the update decreases w, so w moves to the left and gradually approaches the minimum in the middle; if the slope is negative, w increases and moves to the right toward the same minimum.

image-20221203133018874

Understanding Learning Rate α

  • If the learning rate is too small, it takes many steps to reach the minimum.
image-20221203133838943
  • If the learning rate is too large, the update may overshoot and miss the minimum, and the algorithm may fail to converge or even diverge.
image-20221203134040752

Also, once gradient descent has reached a local minimum it stays there, because the slope at that point is already 0 and the update term no longer changes the parameters.

image-20221203134447835
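This is also why a common stopping rule is to quit once the slope is essentially zero: the update term then no longer moves the parameter. A minimal one-parameter sketch, with a made-up function to minimize:

```python
def run_gradient_descent(w, alpha, grad, tol=1e-8, max_iters=10_000):
    """Repeat w := w - alpha * grad(w) until the slope is (almost) zero."""
    for _ in range(max_iters):
        slope = grad(w)
        if abs(slope) < tol:       # at a (local) minimum the slope is 0,
            break                  # so further updates would not move w
        w = w - alpha * slope
    return w

# Example: minimize J(w) = (w - 3)^2, whose derivative is 2*(w - 3)
print(run_gradient_descent(w=0.0, alpha=0.1, grad=lambda w: 2 * (w - 3)))  # ~3.0
```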

Gradient Descent for Linear Regression

The expressions used for linear regression (the model and its cost function):

image-20221203195305820

Substituting into the gradient descent formula, we can get:

The following formulas are obtained by taking the partial derivatives of the J(w,b) above:

image-20221203195524930

image-20221203195634975
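Written out, the partial derivatives referred to here are the standard ones for the squared error cost:

$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$

$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$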

These derivatives can then be substituted into the gradient descent update rule, and the updates are repeated until convergence.

A simulated run is shown in the figure:

It can be seen that, starting from any initial w and b, this method converges to a **minimum of the cost function**; for linear regression with the squared error cost the surface is bowl shaped (convex), so this minimum is also the global minimum and yields the best-fitting line.

image-20221203200428551
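Putting the pieces together, a minimal sketch of batch gradient descent for univariate linear regression (the data, learning rate, and iteration count are made-up illustration values):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, num_iters=5000):
    """Fit f_{w,b}(x) = w*x + b by batch gradient descent on the squared error cost."""
    m = x.shape[0]
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        err = (w * x + b) - y                            # f_{w,b}(x^(i)) - y^(i) for all i
        grad_w = np.dot(err, x) / m                      # dJ/dw
        grad_b = np.sum(err) / m                         # dJ/db
        w, b = w - alpha * grad_w, b - alpha * grad_b    # simultaneous update
    return w, b

# Made-up data roughly following y = 2x + 1
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.1, 4.9, 7.2, 8.8])
w, b = gradient_descent(x_train, y_train)
print(f"w = {w:.2f}, b = {b:.2f}")   # expected to be close to w = 2, b = 1
```

Every step here is "batch" in the sense of the glossary entry below: the error vector uses all of the training examples on every update.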

  • Glossary
    • Batch gradient descent: every step of gradient descent uses all of the training examples (rather than a subset).


Origin blog.csdn.net/weixin_66261421/article/details/130031925