Machine Learning: How to Use Least Squares in Python

We say "use" rather than "implement" because Python's libraries have already implemented these algorithms for us; we only need to learn how to call them. As our skills and experience accumulate, if the library algorithms no longer meet our needs, we can also try implementing the algorithms ourselves.

      Getting back to the topic: what is the "least squares method"?

      Definition: The least squares method (also known as the method of least squares) is a mathematical optimization technique that finds the best functional match for the data by minimizing the sum of squared errors.

      Function: With the least squares method, the unknown parameters can be obtained easily, such that the sum of squared errors between the fitted values and the actual data is minimized.

      Principle: Determine the position of the straight line by "minimizing the residual sum of squares" (in mathematical statistics, a residual is the difference between an actual observed value and an estimated value).
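For a line y = kx + b, minimizing the residual sum of squares has a well-known closed-form solution. A minimal sketch of it, using made-up points (not the article's sample data):

```python
import numpy as np

# Hypothetical points lying near y = 3x + 2 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.1, 7.9, 11.0])

# Closed-form least squares estimates:
# k = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2),  b = y_mean - k * x_mean
xm, ym = x.mean(), y.mean()
k = np.sum((x - xm) * (y - ym)) / np.sum((x - xm) ** 2)
b = ym - k * xm
print(round(k, 2), round(b, 2))  # close to the true slope 3 and intercept 2
```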

      Mathematical formula: for a fitted line y = kx + b, the least squares method chooses k and b to minimize

      Q = Σ (Yi − (k·Xi + b))²,  summed over i = 1, …, n

      Basic idea: For a univariate linear regression model, suppose n observations (X1, Y1), (X2, Y2), ..., (Xn, Yn) are drawn from the population. Countless curves could be fitted through these n points in the plane. Linear regression requires the sample regression function to fit this set of values as well as possible, i.e., the line should pass through the center of the sample data as much as possible. Therefore, the criterion for selecting the best-fitting line can be stated as: minimize the total fitting error (i.e., the total residual).
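This criterion can be checked numerically: a line that sits closer to the points has a smaller residual sum of squares. A tiny illustration with made-up values (not the article's sample data):

```python
import numpy as np

# Hypothetical sample points near y = 2x (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def rss(k, b):
    """Residual sum of squares for the candidate line y = k*x + b."""
    residuals = y - (k * x + b)
    return np.sum(residuals ** 2)

print(rss(2.0, 0.0))  # line close to the data -> small total residual
print(rss(1.0, 1.0))  # line far from the data -> large total residual
```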

      The implementation code is as follows, with detailed comments:

## least squares
import numpy as np ##Scientific computing library
import scipy as sp ##Algorithm library built on top of numpy (imported here but not used below)
import matplotlib.pyplot as plt ##plotting library
from scipy.optimize import leastsq ##Introduce the least squares algorithm

'''
     Set the sample data; replace with real, preprocessed data in practice
'''
##Sample data (Xi, Yi), need to be converted into array (list) form
Xi=np.array([6.19,2.51,7.29,7.01,5.7,2.66,3.98,2.5,9.1,4.2])
Yi=np.array([5.25,2.83,6.41,6.71,5.1,4.23,5.05,1.98,10.5,6.3])

'''
    Set Fit and Bias Functions
    The shape determination process of the function:
    1. Draw the sample image first
    2. Determine the function form (straight line, parabola, sine cosine, etc.) according to the approximate shape of the sample image
'''

##The function that needs to be fitted func: specify the shape of the function
def func(p,x):
    k,b=p
    return k*x+b

##Error (deviation) function: returns the residuals; x and y are arrays that correspond one-to-one with Xi and Yi above
def error(p,x,y):
    return func(p,x)-y

'''
    Main part, with notes:
    1. leastsq returns a tuple: the first element is the array of solved parameters, the second is an integer status flag
    2. Per the official docs, a flag value of 1, 2, 3 or 4 means a solution was found
    3. Example: Para => (array([ 0.61349535, 1.79409255]), 3)
    4. The length of the first element equals the number of parameters to be solved
'''

#The initial values of k and b can be set arbitrarily; a few experiments show that the choice of p0 can affect the returned value Para[1]
p0 = [1, 20]

#Pack parameters other than p0 in the error function into args (use requirements)
Para=leastsq(error,p0,args=(Xi,Yi))

#read result
k,b=Para[0]
print("k=",k,"b=",b)
print("cost:"+str(Para[1]))
print("The fitted straight line solved is: ")
print("y="+str(round(k,2))+"x+"+str(round(b,2)))

'''
   Plot to see the fitting effect.
   matplotlib does not support Chinese labels by default; setting labels in Chinese requires extra font configuration
   If an error is reported, switch the labels to English
'''

# draw sample points
plt.figure(figsize=(8,6)) ##Specify the image ratio: 8:6
plt.scatter(Xi,Yi,color="green",label="sample data",linewidth=2)

# draw the fitted line
x=np.linspace(0,12,100) ##100 evenly spaced points on the interval [0, 12]
y=k*x+b ##Functional formula
plt.plot(x,y,color="red",label="fitted line",linewidth=2)
plt.legend(loc='lower right') #Draw legend
plt.show()
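As a sanity check on the result (not part of the original post), NumPy's polyfit solves the same linear least squares problem in closed form and should reproduce essentially the same k and b for the sample data:

```python
import numpy as np

Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])

# polyfit with degree 1 minimizes the same sum of squared residuals
k, b = np.polyfit(Xi, Yi, 1)
print("k=", round(k, 2), "b=", round(b, 2))  # matches the leastsq result above
```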


  Output result:

      k= 0.900458420439 b= 0.831055638877
      cost: 1
      The fitted straight line solved is:
      y=0.9x+0.83

  Plot result:

      (figure: green scatter of the sample points with the red fitted line through them)

    Supplementary note: only the straight-line case is listed here. Solving for a curve is similar (a parabola is shown as an example in another blog post), but curve fitting can overfit, which will be covered in a future post.
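The same leastsq pattern extends to a curve by changing only the fitting function. A minimal sketch for a parabola, using made-up data generated near y = 2x² − 3x + 1 (the data and initial guess are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import leastsq

# Hypothetical noisy samples of y = 2x^2 - 3x + 1 (illustrative only)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([15.1, 5.9, 1.1, -0.1, 2.9, 10.1])

def func(p, x):
    # Quadratic instead of linear: three parameters now
    a, b, c = p
    return a * x**2 + b * x + c

def error(p, x, y):
    return func(p, x) - y

p0 = [1, 1, 1]  # arbitrary initial guess for a, b, c
para = leastsq(error, p0, args=(x, y))
a, b, c = para[0]
print("y = {:.2f}x^2 + {:+.2f}x {:+.2f}".format(a, b, c))
```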
