Introduction to SGD

SGD (Stochastic Gradient Descent) is an optimization method commonly used in deep learning.

1. An introductory example

Before introducing SGD, let us start with an example. Three brothers are standing on top of a mountain, thinking about how to get down as quickly as possible. The eldest brother, the second brother and the third brother each propose a different plan.

  • The eldest brother said: starting from the top of the mountain, every time you have walked a certain distance, examine all the nearby paths and take the steepest one to continue forward. In other words, the eldest brother always walks the steepest path he can find.

  • The second brother said: starting from the top of the mountain, every time you have walked a certain distance, randomly look at only a few nearby paths and take the steepest of those to continue forward. In other words, the second brother samples a few paths at random and then takes the steepest one among them.

  • The third brother said: starting from the top of the mountain, just pick a path at random each time and keep walking until you reach the bottom.

Although the eldest brother's approach chooses the best direction at every step, it wastes a lot of time searching every nearby path for the steepest one.

The second brother's approach does not guarantee the best direction at every step, but each step is still a good one, and it spends no time examining every possible path.

The third brother's approach is entirely random; the route he takes may turn out to be the best or the worst.

So who do you think reaches the bottom of the mountain first? After studying SGD, you will have your answer.

2. Introduction to SGD

2.1 Problem setup

Suppose you are given an $xy$ coordinate system with some points on it, together with a straight line through the origin, $y = wx$. How do you fit the line to these points as quickly as possible?

To solve this problem we first need to define an objective, namely minimizing the deviation of all the points from the line. The error function commonly used for this is the mean squared error. For a single point $p_1$, its squared error with respect to the line is defined as $e_1$:

$e_1 = (y_1 - w x_1)^2 = (w x_1 - y_1)^2$
Expanding the square:
$e_1 = w^2 x_1^2 - 2(w x_1 y_1) + y_1^2 = x_1^2 w^2 - 2(x_1 y_1) w + y_1^2$

Similarly, for points $p_2, p_3, \ldots, p_n$:

$e_2 = x_2^2 w^2 - 2(x_2 y_2) w + y_2^2$
$e_3 = x_3^2 w^2 - 2(x_3 y_3) w + y_3^2$
$\ldots$
$e_n = x_n^2 w^2 - 2(x_n y_n) w + y_n^2$

Our final error is the average over all points:

$e = (e_1 + e_2 + \ldots + e_n)/n$

Collecting like terms in $w$:
$e = \frac{1}{n}\left[(x_1^2 + \ldots + x_n^2)\, w^2 - 2(x_1 y_1 + \ldots + x_n y_n)\, w + (y_1^2 + \ldots + y_n^2)\right]$

so we finally get a quadratic in $w$:

$e = \frac{1}{n}(a w^2 + b w + c)$, where $a = x_1^2 + \ldots + x_n^2$, $b = -2(x_1 y_1 + \ldots + x_n y_n)$, $c = y_1^2 + \ldots + y_n^2$.

Because $a = x_1^2 + \ldots + x_n^2$, we have $a > 0$, so $e$ is an upward-opening parabola in $w$.
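To make this concrete, here is a minimal Python sketch (using a small made-up dataset that lies roughly on $y = 2x$; the data and variable names are only for illustration) that computes the coefficients $a$, $b$, $c$ above and evaluates $e(w)$. The parabola's minimum sits at $w = -b/(2a)$.

import numpy as np

# small made-up dataset, roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
n = len(x)

# coefficients of the quadratic: e(w) = (a*w^2 + b*w + c) / n
a = np.sum(x ** 2)
b = -2.0 * np.sum(x * y)
c = np.sum(y ** 2)

def e(w):
    # mean squared error of the line y = w*x on this dataset
    return (a * w ** 2 + b * w + c) / n

w_best = -b / (2 * a)          # vertex of the parabola
print("optimal w:", w_best)    # close to 2 for this data
print("e at the optimum:", e(w_best))
print("e at w=0:", e(0.0), " e at w=4:", e(4.0))  # larger on both sides of the vertex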
After defining the error function, we can compute its gradient with respect to $w$:

$\frac{de}{dw} = \frac{1}{n}(2 a w + b) = \frac{2}{n}\sum_{i=1}^{n} x_i (w x_i - y_i)$

Clearly, when $w$ sits at the lowest point of the parabola, $e$ is smallest, and that $w$ is the optimal one.
How do we move quickly from a point on the right side of the curve down to the lowest point? This is what stochastic gradient descent does: starting from the current position, it repeatedly moves a small distance in the downhill direction of the gradient, where at each step the gradient is estimated from a randomly chosen sample rather than from all of the data, and it keeps stepping until it reaches the lowest point.
How large should each step be? That is controlled by the learning rate, for example learning rate = 0.1 or learning rate = 0.2. A well-chosen learning rate drives the error to its minimum quickly.
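To get a feel for the effect of the learning rate, here is a minimal sketch of plain (full-batch) gradient descent on the quadratic error above, again with a small made-up dataset; the learning rates 0.01 and 0.1 are just example values chosen for this toy data. It applies the update $w \leftarrow w - lr \cdot \frac{de}{dw}$ a fixed number of times and compares the result with the closed-form optimum.

import numpy as np

# small made-up dataset, roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
n = len(x)

def gradient(w):
    # de/dw = (2/n) * sum_i x_i * (w*x_i - y_i)
    return 2.0 / n * np.sum(x * (w * x - y))

w_opt = np.sum(x * y) / np.sum(x ** 2)   # closed-form minimizer of e(w)

for lr in (0.01, 0.1):
    w = 0.0                              # start far from the optimum
    for step in range(20):
        w -= lr * gradient(w)            # gradient-descent update
    print(f"lr={lr}: w after 20 steps = {w:.4f} (optimum = {w_opt:.4f})")

With the larger learning rate, $w$ reaches the optimum in far fewer steps; if the learning rate were made too large, however, the updates would overshoot the minimum and diverge.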

2.2 Calculation steps of SGD

In outline, the procedure is:

1. Initialize the weight $w$ (for example, randomly).
2. Randomly pick one sample (or a small batch) from the data.
3. Compute the gradient of that sample's error with respect to $w$.
4. Update $w \leftarrow w - \text{learning rate} \times \text{gradient}$.
5. Repeat steps 2 to 4 until the error stops decreasing or a fixed number of iterations is reached.
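The sketch below follows these steps directly for the line-fitting problem from section 2.1 (again with a small made-up dataset and example values for the random seed and learning rate): at every iteration it picks one random point, computes the gradient of that single point's error $(w x_i - y_i)^2$, and updates $w$.

import numpy as np

rng = np.random.default_rng(0)

# small made-up dataset, roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = rng.normal()        # step 1: initialize w randomly
lr = 0.05               # learning rate (example value)

for it in range(200):
    i = rng.integers(len(x))              # step 2: pick one sample at random
    grad = 2 * x[i] * (w * x[i] - y[i])   # step 3: gradient of (w*x_i - y_i)^2
    w -= lr * grad                        # step 4: update w, then repeat (step 5)

print("fitted w:", w)   # should land near the least-squares optimum (about 2)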

Going back to the mountain-descent problem: a large number of experiments show that the second brother's approach, which is exactly SGD, reaches the bottom of the mountain the fastest.

3. Code implementation of SGD

from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
X_scaler = StandardScaler()
y_scaler = StandardScaler()
X = [[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300],[50],[100],[150],[200],[250],[300]]
y = [[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330],[150],[200],[250],[280],[310],[330]]
X = X_scaler.fit_transform(X)   # standardize the inputs (zero mean, unit variance)
y = y_scaler.fit_transform(y)   # standardize the targets the same way
X_test = [[40],[400]]           # two extra points used to draw and check the fitted line
X_test = X_scaler.transform(X_test)
model = SGDRegressor()           # linear regression trained with stochastic gradient descent
model.fit(X, y.ravel())          # fit on the standardized data
y_result = model.predict(X_test)
plt.title('single variable')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.plot(X, y, 'k.')
plt.plot(X_test, y_result, 'g-')
plt.show()
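Note that both X and y were standardized, so the plot is in standardized units. If you want the predictions in the original units, you can invert the target scaler, for example:

y_result_orig = y_scaler.inverse_transform(y_result.reshape(-1, 1))
print(y_result_orig.ravel())   # predicted y for x = 40 and x = 400, in the original units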

Result: a plot of the standardized data points (black dots) with the fitted line (green) drawn through them.


Source: blog.csdn.net/qq_55126913/article/details/128986125