Basic concepts of the least squares method
The least squares method is a mathematical optimization technique that finds the best-fitting function by minimizing the sum of the squared errors between the observed data and the function's predictions.
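Suppose there are n data points: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
We want to find the linear function
$y = ax + b$
that minimizes the sum of squared errors:
$E(a, b) = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$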
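For a straight line, setting the partial derivatives of E with respect to a and b to zero gives a well-known closed-form solution: a = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and b = ȳ - a·x̄, where x̄ and ȳ are the means of the x and y values. The Python sketch below implements that formula. The function name `fit_line` is a hypothetical helper, and the sample data are the three points used later in this article.

```python
def fit_line(xs, ys):
    """Least squares fit of y = a*x + b via the closed-form formula (hypothetical helper)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and squared sums (the 1/n factors cancel in the ratio)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


a, b = fit_line([1, 2, 3], [5, 10, 20])
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 7.50, b = -3.33
```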
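Simple business example
(Details omitted. The data used below are customer visits x = 1, 2, 3 and test papers sold y = 5, 10, 20, both in units of 100.)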
Machine learning: building linear equations with error terms
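A linear equation in one variable:
$y = ax + b$
For each data point, the following equation with an error term $\varepsilon_i$ can be written:
$5 = a \cdot 1 + b + \varepsilon_1$
$10 = a \cdot 2 + b + \varepsilon_2$
$20 = a \cdot 3 + b + \varepsilon_3$
The sum of the squares of the three error terms is:
$\varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2 = (5 - a - b)^2 + (10 - 2a - b)^2 + (20 - 3a - b)^2$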
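This sum can be minimized by hand. Assume the three data points (1, 5), (2, 10) and (3, 20) from this example. Set the partial derivatives of E(a, b) with respect to a and b to zero, then solve the resulting pair of linear equations (the normal equations):

```latex
\begin{aligned}
E(a,b) &= (5 - a - b)^2 + (10 - 2a - b)^2 + (20 - 3a - b)^2 \\
\frac{\partial E}{\partial a} &= -2\left[(5 - a - b) + 2(10 - 2a - b) + 3(20 - 3a - b)\right] = 0 \;\Rightarrow\; 14a + 6b = 85 \\
\frac{\partial E}{\partial b} &= -2\left[(5 - a - b) + (10 - 2a - b) + (20 - 3a - b)\right] = 0 \;\Rightarrow\; 6a + 3b = 35 \\
(14a + 6b) - 2(6a + 3b) &= 85 - 70 \;\Rightarrow\; 2a = 15 \;\Rightarrow\; a = 7.5 \\
b &= \frac{35 - 6 \cdot 7.5}{3} = -\frac{10}{3} \approx -3.33
\end{aligned}
```

The result is the line y = 7.5x - 3.33, which the following code plots.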
The code below plots the least squares line y = 7.5x - 3.33 together with the three data points:
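import matplotlib.pyplot as plt
x = list(range(0, 11))               # visits, unit = 100
y = [7.5 * xi - 3.33 for xi in x]    # points on the line y = 7.5x - 3.33
plt.axis([0, 4, 0, 25])              # x range 0-4, y range 0-25
plt.plot(x, y)                       # draw the regression line
plt.plot(1, 5, 'o')                  # mark the three data points
plt.plot(2, 10, 'o')
plt.plot(3, 20, 'o')
plt.xlabel('Times:unit=100')
plt.ylabel('Voucher:unit=100')
plt.grid()                           # add grid lines
plt.show()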
Running the code displays the line y = 7.5x - 3.33 with the three data points marked.
Implementing the least squares method with NumPy
NumPy provides a polyfit() function. It fits a polynomial to data by least squares, so it can be used to compute a regression line. Its usage is as follows:
np.polyfit(x, y, deg)
Here deg is the degree (highest power) of the polynomial; for a straight line, deg is 1. The coefficients are returned from the highest power down, so with deg = 1 the result is the pair [a, b].
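The sketch below illustrates the deg parameter and the coefficient order. It fits points that lie exactly on the hypothetical quadratic y = 2x² - 3x + 1 using deg=2. np.polyfit returns the coefficients highest power first, and np.polyval evaluates a polynomial given in that same order. The sample data are made up for this illustration.

```python
import numpy as np

# Hypothetical sample data lying exactly on y = 2x^2 - 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 - 3 * x + 1

coeffs = np.polyfit(x, y, 2)      # deg=2: returns [c2, c1, c0], highest power first
print(coeffs)                     # close to [2, -3, 1]

# np.polyval uses the same highest-power-first coefficient order
print(np.polyval(coeffs, 5.0))    # close to 36 (= 2*25 - 3*5 + 1)
```

With deg=1 the same call returns just [a, b], which is why the two values can be unpacked directly.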
Using the previous data, call NumPy's polyfit() function to calculate the coefficients a and b of the regression line y = ax + b.
The code is as follows:
import numpy as np
x = np.array([1, 2, 3])     # number of customer visits, unit = 100
y = np.array([5, 10, 20])   # number of test papers sold, unit = 100
a, b = np.polyfit(x, y, 1)  # deg=1: returns [slope, intercept]
print(f'Slope a = {a:.2f}')
print(f'Intercept b = {b:.2f}')
The output is as follows:
Slope a = 7.50
Intercept b = -3.33
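The same coefficients can be cross-checked by writing the problem in matrix form. The design matrix has x as one column and a column of ones as the other, and np.linalg.lstsq solves the system. This is a separate NumPy routine from polyfit; the sketch only shows that both solve the same least squares problem.

```python
import numpy as np

x = np.array([1, 2, 3])    # customer visits, unit = 100
y = np.array([5, 10, 20])  # test papers sold, unit = 100

# Design matrix: each row is [x_i, 1], so A @ [a, b] approximates y
A = np.column_stack([x, np.ones(len(x))])
(a, b), residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 7.50, b = -3.33
```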
Next, plot the regression line together with all the data points.
The code is as follows:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3])     # number of customer visits, unit = 100
y = np.array([5, 10, 20])   # number of test papers sold, unit = 100
a, b = np.polyfit(x, y, 1)  # regression line
print(f'Slope a = {a:.2f}')
print(f'Intercept b = {b:.2f}')
y2 = a*x + b                # fitted values on the regression line
plt.scatter(x, y)           # draw the scatter plot
plt.plot(x, y2)             # draw the regression line
plt.show()
The output is as follows:
Slope a = 7.50
Intercept b = -3.33
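The sketch below shows what the fitted line actually minimizes. It computes each point's residual (observed minus fitted value) and their sum of squares. For a least squares line with an intercept, the residuals sum to zero up to rounding. Moving the coefficients away from the fitted values can only increase the sum of squares.

```python
import numpy as np

x = np.array([1, 2, 3])    # customer visits, unit = 100
y = np.array([5, 10, 20])  # test papers sold, unit = 100
a, b = np.polyfit(x, y, 1)

residuals = y - (a * x + b)        # observed minus fitted values
sse = np.sum(residuals ** 2)       # the quantity least squares minimizes
print(residuals)                   # roughly 0.8333, -1.6667, 0.8333
print(sse)                         # roughly 4.1667 (= 25/6)

# Nudging the slope away from the fitted value increases the error
sse_nudged = np.sum((y - ((a + 0.1) * x + b)) ** 2)
print(sse_nudged > sse)            # True
```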
Linear regression
The regression line obtained above is:
y = 7.5x - 3.33
Here x is called the independent variable. Because y changes with x, y is called the dependent variable. A model that describes this kind of relationship with a straight line is called a linear regression model.
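Suppose the goal is to sell 2,500 test papers. Calculate how many customer visits are needed and mark the result on a chart. Both quantities are in units of 100, so this means solving 25 = 7.5x - 3.33 for x. That gives x = (25 + 3.33) / 7.5 ≈ 3.777, or about 377.7 visits.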
import matplotlib.pyplot as plt
x = list(range(0, 11))
y = [7.5 * xi - 3.33 for xi in x]   # regression line y = 7.5x - 3.33
voucher = 25                        # target sales, unit = 100
ans_x = (voucher + 3.33) / 7.5      # solve voucher = 7.5x - 3.33 for x
print('Visits = {}'.format(int(ans_x * 100)))
plt.axis([0, 4, 0, 30])
plt.plot(x, y)
plt.plot(1, 5, 'x')                 # original data points
plt.plot(2, 10, 'x')
plt.plot(3, 20, 'x')
plt.plot(ans_x, voucher, 'o')       # point where the target is reached
plt.text(ans_x - 0.6, voucher + 0.2, f'({int(ans_x * 100)},{voucher * 100})')
plt.xlabel('Times:unit=100')
plt.ylabel('Voucher:unit=100')
plt.grid()                          # add grid lines
plt.show()
The output is as follows:
Visits = 377
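The calculation above uses the rounded coefficients 7.5 and -3.33 and truncates the result with int(). The sketch below instead inverts the fitted line with the unrounded polyfit coefficients, using x = (y - b) / a. That gives about 377.8 visits. Under this model, 377 visits falls slightly short of 2,500 papers, so rounding up with math.ceil gives the smallest whole number of visits that reaches the target.

```python
import math
import numpy as np

x = np.array([1, 2, 3])    # customer visits, unit = 100
y = np.array([5, 10, 20])  # test papers sold, unit = 100
a, b = np.polyfit(x, y, 1)

target = 25                            # 2,500 test papers, unit = 100
visits = (target - b) / a * 100        # invert y = a*x + b, convert to single visits
print(f"exact: {visits:.1f}")          # exact: 377.8
print(f"needed: {math.ceil(visits)}")  # needed: 378

# Check: 377 visits predict slightly fewer than 2,500 papers, 378 reach it
print((a * 3.77 + b) * 100 < 2500, (a * 3.78 + b) * 100 >= 2500)  # True True
```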
Example application
A convenience store recorded the temperature and its beverage sales as follows:
Temperature (unit: ℃) | 22 | 26 | 23 | 28 | 27 | 32 | 30 |
Sales volume (unit: cups) | 15 | 35 | 21 | 62 | 48 | 101 | 86 |
Use the data above to estimate beverage sales at a temperature of 31°C, and mark the result on the chart.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([22, 26, 23, 28, 27, 32, 30])    # temperature (°C)
y = np.array([15, 35, 21, 62, 48, 101, 86])   # beverage sales (cups)
a, b = np.polyfit(x, y, 1)                    # regression line
print(f'Slope a = {a:.2f}')
print(f'Intercept b = {b:.2f}')
y2 = a*x + b
plt.scatter(x, y)                             # draw the scatter plot
plt.plot(x, y2)                               # draw the regression line
sold = a*31 + b                               # predicted sales at 31 °C
print('Sales at 31 °C = {}'.format(int(sold)))
plt.plot(31, int(sold), 'o')                  # mark the prediction
plt.show()
The output is as follows:
Slope a = 8.89
Intercept b = -186.30
Sales at 31 °C = 89
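np.poly1d can wrap the coefficients returned by polyfit into a callable polynomial, which makes predictions for several temperatures easy. In the sketch below, the extra temperatures 25 and 29 are arbitrary illustration values. Predictions like these are only reasonable near the observed range of 22 to 32 °C.

```python
import numpy as np

x = np.array([22, 26, 23, 28, 27, 32, 30])    # temperature (°C)
y = np.array([15, 35, 21, 62, 48, 101, 86])   # beverage sales (cups)

model = np.poly1d(np.polyfit(x, y, 1))        # callable: model(t) = a*t + b
print(model)                                  # prints the fitted line

print(f"{model(31):.2f}")                     # 89.42
for t in (25, 29, 31):                        # arbitrary example temperatures
    print(t, round(float(model(t))))          # nearest whole number of cups
```

Note that int() in the original code truncates, while round() gives the nearest whole number. At 31 °C both give 89.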