Machine Learning: Regression

This article is only my own understanding; please correct me if there are any mistakes or inaccuracies.

Regression: in regression we first propose a model, but the model's key parameters are unknown; we then use a large amount of data to find the parameter values that make the model best match the data, that is, to find the model's key parameters. Below, an example from Li Hongyi's lecture shows concretely what regression is.

This is a very interesting example: predicting a Pokemon's combat power (CP) after evolution.

Each Pokemon also has other attributes: Xcp, Xs, Xhp, Xw, Xh, and so on.

The first step is to determine a Model. Here a linear model is chosen: y = b + Σ_i w_i·X_i, where b and the w_i are parameters that can take any value, the X_i are the Pokemon's various attributes, the w_i can be understood as weights, and b as the bias.
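As a minimal sketch of such a model (the function name and the example parameter values below are my own, and only the single attribute Xcp is used):

# Linear model y = b + w * x_cp: b and w are the unknown parameters,
# x_cp is the Pokemon's CP before evolution.
def predict(x_cp, w, b):
    return b + w * x_cp

# With the (arbitrary) choice w = 2.0, b = 10.0, a Pokemon with CP 100
# before evolution would be predicted to have CP 210 after evolution.
print(predict(100.0, w=2.0, b=10.0))  # 210.0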

The second step is to determine how good each of these candidate functions is, which in practice means determining b and w. First, collect data on Pokemon evolutions.

Below are the 10 data points that were collected (it feels like very few), plotted in two dimensions: CP before evolution against CP after evolution. Doesn't it look like something from a probability theory course? The task is to find a straight line that passes as close to these points as possible, and the least-squares method is used for that.

To solve for b and w with least squares, define a loss function L, which is essentially the squared error: L(w, b) = Σ_n (ŷ^n - (b + w·x_cp^n))^2; the smaller it is, the better. The process of finding the b and w that minimize L is called least-squares parameter estimation for the linear regression model.
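As a sketch of this loss function in code (the variable names are mine; the training pairs are plain Python lists):

# Squared-error loss L(w, b) = sum_n (y^n - (b + w * x^n))^2, averaged over
# the N training pairs; the smaller the value, the better the function.
def loss(w, b, x_data, y_data):
    total = 0.0
    for x, y in zip(x_data, y_data):
        total += (y - (b + w * x)) ** 2
    return total / len(x_data)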

In the figure, each point in the (b, w) plane represents one candidate function; the color encodes its error, with blue meaning good (small error) and red meaning poor (large error).

The third step is to select the best function, i.e. the one that minimizes L. This requires differentiation, and the method used here is gradient descent.

Gradient descent: start with the one-dimensional case, where there is only a single unknown w.

1. Randomly select an initial point w^0.

2. Compute the derivative dL/dw at w^0. If it is negative, increase w; if it is positive, decrease w. This determines the direction of descent. How much w increases or decreases depends on both the derivative and the learning rate η: w^1 = w^0 - η·(dL/dw at w^0). This gives the second point w^1.

3. Repeat the second step until a local optimum is reached (a one-parameter sketch follows this list).
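Here is a minimal one-parameter sketch of this procedure, reusing the x_data and y_data lists from the demo at the end of the article; the bias is held at 0 so that w is the only unknown, and the learning rate eta and the iteration count are arbitrary choices of mine:

# Gradient descent on L(w) = sum_n (y^n - w * x^n)^2 with a single parameter w.
def gradient_descent_1d(x_data, y_data, w0=0.0, eta=5e-7, iterations=10000):
    w = w0                                   # 1. pick an initial point
    for _ in range(iterations):
        # 2. derivative dL/dw = sum_n -2 * x^n * (y^n - w * x^n)
        grad = sum(-2.0 * x * (y - w * x) for x, y in zip(x_data, y_data))
        w = w - eta * grad                   # move opposite to the derivative
    return w                                 # 3. repeat until (near) the minimum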

Many people are puzzled that this procedure can only find a local optimum. In fact, for this linear model the loss surface is convex, so there is no troublesome local optimum.

With two parameters, partial derivatives are used, and everything else is basically the same as in the one-parameter case (work through it yourself, haha). Likewise, with two parameters there is still no local-optimum problem for this model; the gradient computation is sketched below.
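For reference, a small sketch of the two-parameter gradient computation (it mirrors the inner loop of the full demo further down; the function name is mine):

# Partial derivatives of L(w, b) = sum_n (y^n - (b + w * x^n))^2
def gradients(w, b, x_data, y_data):
    w_grad = sum(-2.0 * x * (y - b - w * x) for x, y in zip(x_data, y_data))
    b_grad = sum(-2.0 * (y - b - w * x) for x, y in zip(x_data, y_data))
    return w_grad, b_grad

Each gradient descent step then updates w to w - η·(∂L/∂w) and b to b - η·(∂L/∂b).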

Here is an illustration of the steps for solving the problem.

 

So we have found this model. Now calculate the average error: the sum of the vertical distances between the samples and the fitted line, divided by the number of samples. What we care about more, though, is the average error on test data, and the test error turns out to be slightly larger than the training error (a sketch of the average-error computation follows). How can we do better?
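For concreteness, here is a sketch of how such an average error could be computed; averaging the absolute vertical distances is my reading of the text, and the same function would be applied to the training data and to the test data:

# Average vertical distance between the samples and the line y = b + w * x
def average_error(w, b, x_data, y_data):
    return sum(abs(y - (b + w * x)) for x, y in zip(x_data, y_data)) / len(x_data)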

First attempt: use a quadratic model. The result is clearly better than the linear one. To do even better still, we try the third, fourth, and fifth powers:

It is not hard to see that the training results keep getting better, but the test results for the fourth- and fifth-power models become worse, and the fifth-power model even produces unreasonable predictions (negative numbers appear). This is overfitting.
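Here is a rough sketch of this experiment, reusing the ten (x_data, y_data) pairs from the demo at the end and fitting polynomials of increasing degree with numpy.polyfit; this is my own illustration of the effect, not the exact setup from the lecture:

import numpy as np

x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

for degree in range(1, 6):
    coeffs = np.polyfit(x_data, y_data, degree)   # least-squares fit of a degree-d polynomial
    pred = np.polyval(coeffs, x_data)             # predictions on the training points
    train_err = np.mean(np.abs(pred - y_data))    # average training error
    print(f"degree {degree}: average training error = {train_err:.2f}")

The training error keeps shrinking as the degree rises, while the error on held-out test Pokemon gets worse; that gap is overfitting. (numpy may warn about poor conditioning at the higher degrees.)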

 

Earlier we mentioned that too little data was collected. After collecting much more data, it is not hard to see that the work above was somewhat in vain: the relationship is not a simple linear one. We can also see that the distribution depends on the Pokemon's species, which means that looking at only one attribute (CP) is not enough:

There is another way to handle this, using a linear structure in which each species gets its own b and w, selected by 0/1 indicator functions: y = Σ_s δ(Xs = s)·(b_s + w_s·Xcp).

Writing out what happens when each δ is 0 or 1 helps with understanding: for a given Pokemon exactly one indicator equals 1, so only that species' branch b_s + w_s·Xcp contributes, and the model is still linear in its parameters.

Fitting this model, it is not hard to see that the result is better than before, but there are still points above and below the lines, so other attributes have to be considered as well:
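A sketch of this idea in Python (the species names and all parameter values below are made up purely to illustrate the 0/1 indicator structure):

# Per-species linear model: y = sum_s delta(Xs == s) * (b_s + w_s * Xcp),
# where delta(...) is 1 for the Pokemon's own species and 0 otherwise,
# so exactly one (b_s, w_s) branch is active for each Pokemon.
params = {
    'Pidgey': (5.0, 2.0),    # (b_s, w_s), arbitrary illustrative values
    'Weedle': (10.0, 1.5),
    'Eevee':  (90.0, 1.0),
}

def predict_cp(x_cp, species):
    y = 0.0
    for s, (b_s, w_s) in params.items():
        delta = 1.0 if species == s else 0.0   # the 0/1 indicator
        y = y + delta * (b_s + w_s * x_cp)
    return y

print(predict_cp(100.0, 'Pidgey'))  # only the Pidgey branch fires: 5 + 2*100 = 205.0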

We can throw in every attribute we can think of, but then the training results are good while the test results are very poor. So we adjust with another method, regularization, adding a penalty on the weights to the loss function:

The new loss is L = Σ_n (ŷ^n - (b + Σ_i w_i·x_i^n))^2 + λ·Σ_i w_i^2. The smaller the w_i are, the smoother the resulting function is (less sensitive to changes in the input), and λ is just a constant number that we choose.
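A minimal sketch of this regularized loss (the argument names and the use of numpy arrays are my own; by convention the bias b is not penalized):

import numpy as np

# L = sum_n (y^n - (b + sum_i w_i * x_i^n))^2 + lambda_ * sum_i w_i^2
def regularized_loss(w, b, X, y, lambda_):
    # X: (N, D) attribute matrix, y: (N,) targets, w: (D,) weights, b: bias
    squared_error = np.sum((y - (X @ w + b)) ** 2)
    penalty = lambda_ * np.sum(w ** 2)   # pushes the w_i toward small values
    return squared_error + penalty

A larger lambda_ forces the w_i to be smaller and the learned function to be smoother; too large a value underfits.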

These are the experimental results for different settings: see whether you can make sense of them (I am too lazy to write them all out).

############## Let me show you the code##########################

The code below was added later.

## Li Hongyi's linear regression demo
# y_data = b + w * x_data

import matplotlib.pyplot as plt
import numpy as np

x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]

b = -120            # initial bias
w = -4              # initial weight
lr = 1              # learning rate; can be set freely once lr_b and lr_w are added

iteration = 100000  # number of iterations

# store b and w during training, for plotting later
b_history = [b]
w_history = [w]

# adding lr_b and lr_w (per-parameter learning rates) works better;
# the method behind this will be written up later
lr_b = 0.0
lr_w = 0.0

for i in range(iteration):
    # partial derivatives of the loss with respect to b and w
    b_grad = 0.0
    w_grad = 0.0

    for n in range(len(x_data)):
        b_grad = b_grad - 2.0 * (y_data[n] - b - w * x_data[n]) * 1.0
        w_grad = w_grad - 2.0 * (y_data[n] - b - w * x_data[n]) * x_data[n]

    # accumulate squared gradients for the per-parameter learning rates
    lr_b = lr_b + b_grad ** 2
    lr_w = lr_w + w_grad ** 2

    # update b and w:  b0 -> b1, w0 -> w1
    b = b - lr / np.sqrt(lr_b) * b_grad
    w = w - lr / np.sqrt(lr_w) * w_grad

    # save the trajectory for plotting
    b_history.append(b)
    w_history.append(w)

# ---------------- prepare the loss surface for plotting ----------------
x = np.arange(-200, -100, 1)    # bias axis
y = np.arange(-5, 5, 0.1)       # weight axis
Z = np.zeros((len(y), len(x)))  # mean squared error at each (b, w)
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        Z[j][i] = 0
        for n in range(len(x_data)):
            Z[j][i] = Z[j][i] + (y_data[n] - b - w * x_data[n]) ** 2
        Z[j][i] = Z[j][i] / len(x_data)

# ---------------- plot ----------------
plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')  # the optimal (b, w)
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')          # gradient descent path
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()

The resulting plot:


Original post: blog.csdn.net/DALAOS/article/details/86511620