[Ch03-01] Mean Square Error Loss Function

This series of blog posts is maintained by the original author on GitHub: https://aka.ms/beginnerAI .
Please don't be stingy with the star button; the more stars, the harder the author works.

3.1 Mean Square Error Function

MSE - Mean Square Error.

This is the most intuitive loss function: it measures the squared (Euclidean) distance between the predicted value and the true value. The closer the predicted value is to the true value, the smaller the mean square error between the two.

The mean square error function is commonly used in linear regression, i.e., function fitting. The formulas are as follows:

\[Loss = {1 \over 2}(z-y)^2 \tag{single sample}\]

\[J = \frac{1}{2m} \sum_{i=1}^m (z_i-y_i)^2 \tag{multiple samples}\]
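As a minimal sketch, the multi-sample formula can be written in NumPy as follows (the function name mse_loss is only for illustration):

    import numpy as np

    def mse_loss(z, y):
        # J = 1/(2m) * sum_i (z_i - y_i)^2
        m = y.shape[0]
        return ((z - y) ** 2).sum() / (2 * m)

    z = np.array([1.0, 2.0, 3.0])   # predicted values
    y = np.array([1.0, 1.0, 1.0])   # true values
    print(mse_loss(z, y))           # (0 + 1 + 4) / 6 = 0.8333...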

3.1.1 How It Works

To measure the gap between the predicted value and the actual value \(y_i\), the simplest idea is to use \(Error = a_i-y_i\).

For a single sample this works fine, but when accumulating over multiple samples, \(a_i-y_i\) may be positive or negative, so the errors cancel each other out in the sum and the measure loses its meaning. A natural fix is to use the absolute value of the difference, \(Error = |a_i-y_i|\). That looks simple and works well, so why introduce the mean square error loss function at all? Table 3-1 compares the two loss functions.

Table 3-1 Comparison of the absolute-value loss function and the mean square error loss function

| Sample label values | Sample predicted values | Absolute-value loss | Mean square error loss |
|---|---|---|---|
| \([1,1,1]\) | \([1,2,3]\) | \((1-1)+(2-1)+(3-1)=3\) | \((1-1)^2+(2-1)^2+(3-1)^2=5\) |
| \([1,1,1]\) | \([1,3,3]\) | \((1-1)+(3-1)+(3-1)=4\) | \((1-1)^2+(3-1)^2+(3-1)^2=8\) |
| | Ratio of row 2 to row 1 | \(4/3=1.33\) | \(8/5=1.6\) |

You can see that 5 is much larger than 3, and 8 is twice as large as 4; moreover, the ratio 1.6 versus 1.33 shows that the squared error amplifies the effect that one sample's local loss has on the global loss. In other words, the mean square error is "more sensitive to samples with large deviations", so those samples receive enough attention during supervised training when the error is propagated back.
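The numbers in Table 3-1 can be checked with a few lines of NumPy; here the columns are plain sums, without the \(1/(2m)\) factor from the loss formula:

    import numpy as np

    y = np.array([1, 1, 1])
    for z in (np.array([1, 2, 3]), np.array([1, 3, 3])):
        abs_loss = np.abs(z - y).sum()     # absolute-value loss
        mse_loss = ((z - y) ** 2).sum()    # squared-error loss
        print(abs_loss, mse_loss)          # prints "3 5" then "4 8"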

3.1.2 A Practical Example

Given the set of sample data shown in Figure 3-3, we want to find a straight line that fits it.

Figure 3-3 Sample data on the plane

The first three subplots in Figure 3-4 show the gradual process of finding the best-fit line.

  • First subplot: the mean square error gives Loss = 0.53;
  • Second subplot: the line is translated upward a bit, and the loss drops to Loss = 0.16, much smaller than in the first subplot;
  • Third subplot: the line is translated upward a bit more, and the loss drops to Loss = 0.048; after that we could keep translating (changing the value of b) or rotating (changing the value of w) to obtain an even smaller loss value;
  • Fourth subplot: the line moves away from the optimal position and the loss rises to Loss = 0.18, at which point the algorithm will try to reverse direction and move downward.

Figure 3-4 The relationship between the position of the line and the loss function value

The third subplot has the smallest loss function value. Comparing the second and fourth subplots: since the squared-error loss is positive in both cases, how do we decide whether to move the line up or down?

In the actual training process we do not even need to compute the loss value itself, because the effect of the loss function is reflected in back-propagation. Let us look at the derivative of the mean square error function:

\[ \frac{\partial{J}}{\partial{a_i}} = a_i-y_i \]

Although \((a_i-y_i)^2\) is always positive, \(a_i-y_i\) itself can be positive (the line is below the point) or negative (the line is above the point). This sign is passed back through the preceding computations during back-propagation, so the training process is guided in the correct direction.
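As a small illustration of how this sign behaves (the numbers are made up):

    import numpy as np

    y = np.array([3.0, 3.0])     # true values
    a = np.array([2.5, 3.4])     # predictions: one below, one above the target
    grad = a - y                 # dJ/da_i = a_i - y_i for each sample
    print(grad)                  # [-0.5  0.4]: the sign tells which way to correct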

In the example above there are two variables, w and b; changing either of them affects the final value of the loss function.

Assume the equation of the fitted line is y = 2x + 3. If we fix w = 2 and vary b from 2 to 4, the resulting change in the loss value is shown in Figure 3-5.

Figure 3-5 Loss values caused by varying b with w fixed

Again assume the equation of the fitted line is y = 2x + 3. If we fix b = 3 and vary w from 1 to 3, the resulting change in the loss value is shown in Figure 3-6.

Figure 3-6 Loss values caused by varying w with b fixed
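A minimal sketch of how curves like Figures 3-5 and 3-6 can be produced, assuming the data are generated from the line y = 2x + 3 (the data-generation step here is hypothetical):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 1, 50)
    y = 2 * x + 3                          # hypothetical data lying on y = 2x + 3

    def loss(w, b):
        # mean square error between the line w*x + b and the data
        z = w * x + b
        return ((z - y) ** 2).sum() / (2 * len(x))

    B = np.linspace(2, 4, 100)             # Figure 3-5: fix w = 2, vary b
    plt.plot(B, [loss(2, b) for b in B])
    plt.xlabel("b"); plt.ylabel("loss"); plt.show()

    W = np.linspace(1, 3, 100)             # Figure 3-6: fix b = 3, vary w
    plt.plot(W, [loss(w, 3) for w in W])
    plt.xlabel("w"); plt.ylabel("loss"); plt.show()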

3.1.3 Visualizing the Loss Function

3D view of the loss function values

The horizontal axis is w and the vertical axis is b. For every combination of w and b we compute a loss value and represent it by the height in a 3D plot. The bottom of the figure below is not a flat plane but a slightly concave surface, only with small curvature, as shown in Figure 3-7.

Figure 3-7 The surface formed by the loss values when w and b vary simultaneously
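One possible way to draw such a surface with Matplotlib, again assuming the hypothetical data on y = 2x + 3 used above:

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D   # registers the 3d projection on older Matplotlib

    x = np.linspace(0, 1, 50)
    y = 2 * x + 3                             # hypothetical data lying on y = 2x + 3
    W, B = np.meshgrid(np.linspace(0, 4, 100), np.linspace(1, 5, 100))
    # mean square error for every (w, b) combination on the grid
    LOSS = ((W[..., None] * x + B[..., None] - y) ** 2).sum(axis=-1) / (2 * len(x))

    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(W, B, LOSS, cmap="rainbow")
    ax.set_xlabel("w"); ax.set_ylabel("b"); ax.set_zlabel("loss")
    plt.show()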

2D view of the loss function values

On flat maps we often see contour lines used to indicate elevation. The figure below is the projection of the surface above onto the plane, i.e., a contour plot of the loss function values, as shown in Figure 3-8.

Figure 3-8 Contour plot of the loss function

If this is still hard to grasp, let us draw the picture in the most brute-force way. The code is as follows:

    import numpy as np

    # w, b: the current weight and bias; x, y: the training data; m: the sample count;
    # CostFunction computes the mean square error for the predictions z
    s = 200
    W = np.linspace(w-2, w+2, s)
    B = np.linspace(b-2, b+2, s)
    LOSS = np.zeros((s,s))
    for i in range(len(W)):
        for j in range(len(B)):
            z = W[i] * x + B[j]                # predictions for this (w, b) pair
            loss = CostFunction(x, y, z, m)
            LOSS[i,j] = round(loss, 2)         # keep two decimal places

The code above computes a loss value for every combination of w and b, rounds it to two decimal places, and stores it in the LOSS matrix, as shown below:

[[4.69 4.63 4.57 ... 0.72 0.74 0.76]
 [4.66 4.6  4.54 ... 0.73 0.75 0.77]
 [4.62 4.56 4.5  ... 0.73 0.75 0.77]
 ...
 [0.7  0.68 0.66 ... 4.57 4.63 4.69]
 [0.69 0.67 0.65 ... 4.6  4.66 4.72]
 [0.68 0.66 0.64 ... 4.63 4.69 4.75]]

Then we traverse the loss values in the matrix and draw points of the same color at positions with the same value; for example, all points with value 0.72 are drawn in red, all points with value 0.75 in blue, and so on. This yields Figure 3-9.

Figure 3-9 Contour plot drawn with the brute-force method

This figure conveys the same information as the contour plot, but since the contour plot is more concise and clear, we will use contour plots from now on.
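For comparison, the same LOSS matrix can be passed to Matplotlib's contour function directly; this is a minimal sketch that reuses the W, B, and LOSS arrays computed by the brute-force code above:

    import matplotlib.pyplot as plt

    # LOSS[i, j] holds the loss at (W[i], B[j]); contour() expects rows indexed by
    # the y-axis (b) and columns by the x-axis (w), hence the transpose
    plt.contour(W, B, LOSS.T, levels=20)
    plt.xlabel("w")
    plt.ylabel("b")
    plt.show()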

Code location

ch03, Level1
