For m samples, use the mean squared error (MSE) as the loss function. If each sample has only one attribute $x_i$, the loss function is:

$$L = \frac{1}{m}\sum_i (w x_i - y_i)^2$$
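As a quick sanity check of the formula, the loss can be evaluated numerically on the same toy data used in the plots below; `mse_loss` is a hypothetical helper name, not from the original article:

```python
import numpy as np

# Same toy data as the plots below
x_train = np.array([0, 1, 2, 3, 4, 5])
y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])

def mse_loss(w, x, y):
    # L = (1/m) * sum_i (w * x_i - y_i)^2
    return np.square(w * x - y).sum() / len(x)

print(mse_loss(1.0, x_train, y_train))  # roughly 1.325 for w = 1
```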
Let's draw the image of the loss function:
import numpy as np
import matplotlib.pyplot as plt

def image(b=0):
    dense = 400
    w = np.linspace(-2, 4, dense)
    x_train = np.array([0, 1, 2, 3, 4, 5])
    y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])
    for i in range(dense):
        loss = np.square(w[i] * x_train + b - y_train).sum() / len(x_train)
        plt.scatter(w[i], loss, s=1)
    plt.title('Loss Func Line')
    plt.xlabel('w')
    plt.ylabel('loss')
    plt.axis([0, 3, 0, 10])
    plt.axhline(y=1, ls=":", c="black")
    plt.axhline(y=2, ls=":", c="black")
    plt.axhline(y=4, ls=":", c="black")
    plt.axhline(y=7, ls=":", c="black")
    plt.show()

image()
image(0.5)
image(1)
image(1.7)
image(2.5)
You can see that changing the bias b directly affects the minimum value of the loss function and the corresponding w.
Above we fixed b and plotted how the loss varies with w. Now let's draw, in 3D coordinates, the loss corresponding to the weight w and the bias b:
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([0, 1, 2, 3, 4, 5])
y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])

dense = 400
w, b = np.meshgrid(np.linspace(-2, 4, dense), np.linspace(-7, 10, dense))

# y = wx + b
def get_loss_value(w, b):
    return np.square(w * x_train + b - y_train).sum() / len(x_train)

def loss_point():
    loss_list = []
    for i in range(dense):
        loss_list2 = []
        for j in range(dense):
            loss = get_loss_value(w[i][j], b[i][j])
            loss_list2.append(loss)
        loss_list.append(loss_list2)
    return loss_list

fig = plt.figure()
# Axes3D(fig) no longer attaches the axes in recent Matplotlib versions
ax = fig.add_subplot(projection='3d')
loss = np.array(loss_point())
# Label the axes
ax.set_xlabel('w')
ax.set_ylabel('b')
ax.set_zlabel('L')
ax.plot_surface(w, b, loss, rstride=30, cstride=30, cmap='jet')
plt.show()
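The nested Python loops above can also be replaced with NumPy broadcasting; a minimal sketch (same data and grid as above) that computes the same 400×400 loss grid in one vectorized expression:

```python
import numpy as np

# Same toy data and parameter grid as above
x_train = np.array([0, 1, 2, 3, 4, 5])
y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])

dense = 400
w, b = np.meshgrid(np.linspace(-2, 4, dense), np.linspace(-7, 10, dense))

# Broadcast (dense, dense, 1) against (m,) to get per-sample residuals of
# shape (dense, dense, m), then average the squared residuals over samples
loss = np.square(w[..., None] * x_train + b[..., None] - y_train).mean(axis=-1)
print(loss.shape)  # (400, 400)
```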
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([0, 1, 2, 3, 4, 5])
y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])

dense = 400

# y = wx + b
def get_loss_value(w, b):
    return np.square(w * x_train + b - y_train).sum() / len(x_train)

w = np.linspace(-2, 4, dense)
b = np.linspace(-7, 10, dense)

def draw_contour_line(dense, isoheight):  # dense: sampling density; isoheight: the contour's loss value
    list_w = []
    list_b = []
    for i in range(dense):
        for j in range(dense):
            loss = get_loss_value(w[i], b[j])
            # keep grid points whose loss falls within 5% of the contour value
            if 1.05 * isoheight > loss > 0.95 * isoheight:
                list_w.append(w[i])
                list_b.append(b[j])
    plt.scatter(list_w, list_b, s=1)  # s=0.25 also looks good

for isoheight in [1, 4, 7, 10, 20, 30, 50, 100, 200]:
    draw_contour_line(dense, isoheight)

plt.title('Loss Func Contour Line')
plt.xlabel('w')
plt.ylabel('b')
plt.axis([-2, 4, -7, 10])
plt.show()
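As an aside, Matplotlib's built-in `plt.contour` can draw the same contour lines directly from a loss grid, without the brute-force threshold search; a sketch using the same data, grid, and contour levels as above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same toy data, parameter grid, and contour levels as above
x_train = np.array([0, 1, 2, 3, 4, 5])
y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])

dense = 400
w, b = np.meshgrid(np.linspace(-2, 4, dense), np.linspace(-7, 10, dense))
loss = np.square(w[..., None] * x_train + b[..., None] - y_train).mean(axis=-1)

cs = plt.contour(w, b, loss, levels=[1, 4, 7, 10, 20, 30, 50, 100, 200])
plt.clabel(cs, inline=True, fontsize=8)  # annotate each contour with its loss value
plt.title('Loss Func Contour Line')
plt.xlabel('w')
plt.ylabel('b')
plt.show()
```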
Figure 3 matches our expectation: it is what you get by cutting Figure 2 with planes parallel to the w-b plane at different heights along the L axis, and all points touched by one cutting plane share the same loss. Figure 2 also shows that the contours are elliptical when the loss is below about 20, and lose that elliptical shape for larger loss values.
This is why we so often see this kind of picture when learning the gradient descent algorithm. It is usually accompanied by the statement that the direction of gradient descent is perpendicular to the tangent direction of the contour line. As the point moves through parameter space, the loss function correspondingly goes "uphill" or "downhill".
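To tie the picture back to the algorithm, here is a minimal gradient descent sketch on this same loss; the starting point, learning rate, and iteration count are arbitrary choices of mine, not from the original article:

```python
import numpy as np

x_train = np.array([0, 1, 2, 3, 4, 5])
y_train = np.array([1.1, 2.2, 3.8, 4.1, 4.9, 5.2])

def loss(w, b):
    return np.square(w * x_train + b - y_train).mean()

# Gradient of the MSE loss with respect to w and b
def grad(w, b):
    err = w * x_train + b - y_train
    return 2 * (err * x_train).mean(), 2 * err.mean()

w, b = 0.0, 0.0   # starting point in parameter space (arbitrary)
lr = 0.02         # learning rate (assumed; small enough to converge here)
for _ in range(5000):
    gw, gb = grad(w, b)
    w -= lr * gw   # step "downhill", against the gradient
    b -= lr * gb

# converges to roughly w ≈ 0.826, b ≈ 1.486, the bottom of the bowl in Figure 2
print(round(w, 3), round(b, 3), round(loss(w, b), 3))
```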
Reference article:
https://blog.csdn.net/keeppractice/article/details/105620966 — Understanding that the gradient direction is perpendicular to the contour direction