Least Squares Method for Machine Learning

Basic concepts of the least squares method

The least squares method is a mathematical optimization technique that finds the best-fitting function for a set of data by minimizing the sum of squared errors.

Suppose there are n data points:

(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)

We want to find a linear function

y=f(x)=ax+b

that minimizes the sum of squared errors:

\varepsilon =(f(x_1)-y_1)^{2}+(f(x_2)-y_2)^{2}+\cdots+(f(x_n)-y_n)^{2}
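
To make this error concrete, here is a minimal sketch that evaluates the sum of squared errors for two candidate lines. The function name sum_squared_error and the sample points are illustrative assumptions, not part of any library.

```python
def sum_squared_error(a, b, xs, ys):
    """Return the sum of (f(x_i) - y_i)^2 for the line f(x) = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Illustrative sample data (an assumption for this sketch)
xs = [1, 2, 3]
ys = [5, 10, 20]

error_flat = sum_squared_error(5.0, 0.0, xs, ys)        # candidate line y = 5x
error_steep = sum_squared_error(7.5, -10 / 3, xs, ys)   # candidate line y = 7.5x - 10/3

print(error_flat, error_steep)
```

The second candidate has the smaller error, so by the least squares criterion it is the better fit of the two.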

A simple business example

(Omitted.)

Building linear equations with an error term

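A linear equation in one variable:

y=ax+b

In general the line does not pass exactly through every data point, so for each data point an error term \varepsilon is added:

y=ax+b+\varepsilon

The squared error \varepsilon^{2}=(y-ax-b)^{2} can be expanded with the formula for the square of a three-term sum: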

(a+b+c)^{2}=a^{2}+b^{2}+c^{2}+2ab+2bc+2ac
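
As a worked sketch of where this expansion leads, the derivation below applies it to the error of each data point and then minimizes the total. The subscripted \varepsilon_i and the symbol S are notation introduced here for illustration; they do not appear in the original text.

```latex
\begin{aligned}
\varepsilon_i &= y_i - a x_i - b \\
\varepsilon_i^{2} &= y_i^{2} + a^{2}x_i^{2} + b^{2} - 2a x_i y_i - 2b y_i + 2ab x_i \\
S(a,b) &= \sum_{i=1}^{n} \varepsilon_i^{2} \\
\frac{\partial S}{\partial a} &= -2\sum_{i=1}^{n} x_i\,(y_i - a x_i - b) = 0 \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{n} (y_i - a x_i - b) = 0 \\
a &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^{2} - \left(\sum x_i\right)^{2}},
\qquad
b = \frac{\sum y_i - a\sum x_i}{n}
\end{aligned}
```

Setting both partial derivatives to zero gives two linear equations in a and b, and solving them yields the closed-form slope and intercept in the last line.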

The code is as follows:

import matplotlib.pyplot as plt

x = list(range(0, 11))                  # x values 0..10
y = [7.5*xi - 3.33 for xi in x]         # points on the line y = 7.5x - 3.33
plt.axis([0, 4, 0, 25])                 # x range 0..4, y range 0..25
plt.plot(x, y)                          # draw the line
plt.plot(1, 5, '-o')                    # the three data points
plt.plot(2, 10, '-o')
plt.plot(3, 20, '-o')
plt.xlabel('Times:unit=100')
plt.ylabel('Voucher:unit=100')
plt.grid()                              # add grid lines
plt.show()

The result is a plot of the line together with the three data points (figure not shown).

Least squares with NumPy

NumPy provides the polyfit() function, which can be used to compute a regression line. It is called as follows:

polyfit(x, y, deg)

Here deg is the highest power of the fitted polynomial. For a straight line (a first-degree polynomial), deg is 1.
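
The following minimal sketch illustrates the deg parameter with a value other than 1. The data is generated from y = 2x^2 + 3x + 1 purely for illustration (an assumption, not data from this article). np.polyfit returns the coefficients ordered from the highest power down to the constant term.

```python
import numpy as np

# Illustrative data generated from y = 2x^2 + 3x + 1 (an assumption for this sketch)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 + 3 * x + 1

# deg=2 fits a second-degree polynomial and returns three coefficients,
# ordered from the highest power to the constant term
coeffs = np.polyfit(x, y, 2)
print(coeffs)
```

Because the data lies exactly on a quadratic, the fitted coefficients should match 2, 3 and 1 up to floating-point rounding.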

Using the data from the previous section and NumPy's polyfit() function, compute the coefficients a and b of the regression line y=ax+b.

The code is as follows:

import numpy as np

x = np.array([1, 2, 3])                 # number of visits, in units of 100
y = np.array([5, 10, 20])               # number of test papers sold, in units of 100

a, b = np.polyfit(x, y, 1)              # fit a first-degree polynomial
print('slope a = {0:5.2f}'.format(a))
print('intercept b = {0:5.2f}'.format(b))

The running results are as follows:

slope a =  7.50
intercept b = -3.33
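
As a cross-check, the sketch below computes the slope and intercept with the standard closed-form formulas for simple linear regression and compares them with the result of np.polyfit. The names a_manual and b_manual are introduced here for illustration only.

```python
import numpy as np

x = np.array([1, 2, 3])      # number of visits, in units of 100
y = np.array([5, 10, 20])    # number of test papers sold, in units of 100
n = len(x)

# Closed-form least squares solution for the line y = a*x + b
a_manual = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b_manual = (np.sum(y) - a_manual * np.sum(x)) / n

# Same fit via NumPy
a_fit, b_fit = np.polyfit(x, y, 1)

print(a_manual, b_manual)
print(np.isclose(a_manual, a_fit), np.isclose(b_manual, b_fit))
```

Both routes should agree on a slope of 7.5 and an intercept of about -3.33.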

Plot the regression line together with all the data points.

The code is as follows:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])                 # number of visits, in units of 100
y = np.array([5, 10, 20])               # number of test papers sold, in units of 100

a, b = np.polyfit(x, y, 1)              # regression line
print('slope a = {0:5.2f}'.format(a))
print('intercept b = {0:5.2f}'.format(b))

y2 = a*x + b                            # fitted values on the regression line
plt.scatter(x, y)                       # scatter plot of the data
plt.plot(x, y2)                         # draw the regression line
plt.show()

The running results are as follows:

slope a =  7.50
intercept b = -3.33

Linear regression

The fit above gives the regression line:

y=7.5x-3.33

Here x is called the independent variable. Because y changes with x, y is called the dependent variable. A relationship of this kind is called a linear regression model.

Suppose the goal is to sell 2,500 test papers. Calculate how many customer visits are needed and show the result on a chart.

import matplotlib.pyplot as plt

x = list(range(0, 11))
y = [7.5*xi - 3.33 for xi in x]         # regression line y = 7.5x - 3.33
voucher = 25                            # sales target, unit = 100
ans_x = (voucher + 3.33) / 7.5          # solve voucher = 7.5x - 3.33 for x
print('number of visits = {}'.format(int(ans_x*100)))
plt.axis([0, 4, 0, 30])
plt.plot(x, y)
plt.plot(1, 5, '-x')                    # the three data points
plt.plot(2, 10, '-x')
plt.plot(3, 20, '-x')
plt.plot(ans_x, voucher, '-o')          # mark the predicted point
plt.text(ans_x-0.6, voucher+0.2, '('+str(int(ans_x*100))+','+str(voucher*100)+')')
plt.xlabel('Times:unit=100')
plt.ylabel('Voucher:unit=100')
plt.grid()                              # add grid lines
plt.show()

The running results are as follows:

number of visits = 377
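
The calculation above hard-codes the rounded coefficients 7.5 and -3.33. A minimal alternative sketch takes them directly from np.polyfit and inverts the line; the helper visits_needed is a hypothetical name introduced here, not a library function.

```python
import numpy as np

x = np.array([1, 2, 3])      # number of visits, in units of 100
y = np.array([5, 10, 20])    # number of test papers sold, in units of 100
a, b = np.polyfit(x, y, 1)


def visits_needed(target, a, b):
    """Solve target = a*x + b for x (hypothetical helper for this sketch)."""
    return (target - b) / a


ans_x = visits_needed(25, a, b)   # target of 2,500 papers, expressed in units of 100
print(int(ans_x * 100))
```

Using the unrounded coefficients avoids carrying a rounding error into the answer, although for this data the truncated result is the same.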

Example application

A convenience store records the air temperature and its beverage sales, as follows:

Temperature (unit: ℃)       22   26   23   28   27   32    30
Sales volume (unit: cups)   15   35   21   62   48   101   86

Use the data above to estimate the beverage sales when the temperature is 31°C, and mark that point on the chart.

import matplotlib.pyplot as plt
import numpy as np

x = np.array([22, 26, 23, 28, 27, 32, 30])      # temperature
y = np.array([15, 35, 21, 62, 48, 101, 86])     # beverage sales

a, b = np.polyfit(x, y, 1)                      # regression line
print('slope a = {0:5.2f}'.format(a))
print('intercept b = {0:5.2f}'.format(b))

y2 = a*x + b                                    # fitted values on the regression line
plt.scatter(x, y)                               # scatter plot of the data
plt.plot(x, y2)                                 # draw the regression line

sold = a*31 + b                                 # predicted sales at 31 degrees
print('sales at 31 degrees = {}'.format(int(sold)))
plt.plot(31, int(sold), '-o')                   # mark the predicted point
plt.show()

The running results are as follows:

slope a =  8.89
intercept b = -186.30
sales at 31 degrees = 89
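
The prediction can also be made without writing out a*31 + b by hand. The sketch below passes the coefficients returned by np.polyfit to np.polyval, which evaluates the fitted polynomial at one or more points. The extra temperatures 25 and 29 are illustrative values, not part of the original exercise.

```python
import numpy as np

x = np.array([22, 26, 23, 28, 27, 32, 30])      # temperature
y = np.array([15, 35, 21, 62, 48, 101, 86])     # beverage sales

coeffs = np.polyfit(x, y, 1)                    # [slope, intercept]

# Evaluate the fitted line at several temperatures at once
temps = np.array([25, 29, 31])                  # illustrative query points
predicted = np.polyval(coeffs, temps)
print(predicted)
```

The last entry of predicted corresponds to 31°C and truncates to the same 89 cups as the calculation above.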

 

Origin blog.csdn.net/DXB2021/article/details/127155569