Machine learning materials on the Internet are often hard on beginners. They dump a pile of formulas and equations on you like alien text, ruthlessly extinguish your enthusiasm, and leave you wondering whether you should go back and study mathematics properly first. The author has been beaten up by them many times too, and has gained a little understanding. Looking back, getting started with machine learning could have been far less painful. The Python Machine Learning Quick Start series of articles hopes to bring everyone into the world of machine learning in an easy-to-understand, accessible, and fun way.
1. Guessing game
x = 1, 3, 8, 9, 15, 16
y = 2, 6.3, 15, 21, ?, ?
Judging from the relationship between x and y, we guess that for x = 15, y is probably around 30, and for x = 16, around 32.
We would never guess 100 or some other much larger number, because to the naked eye y is roughly 2 times x; there is a little error, but the overall direction should be right.
y=2*x
This is simple linear regression: you found a coefficient of 2, and for any target value you want to predict, you multiply the input value by the coefficient 2 to get the predicted result.
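Instead of eyeballing the coefficient, we can let numpy compute the best one. A quick sketch (the variable names are my own, and the closed-form `sum(x*y)/sum(x*x)` is the standard least-squares solution for a single coefficient):

```python
import numpy as np

# the known pairs from the guessing game
x = np.array([1, 3, 8, 9])
y = np.array([2, 6.3, 15, 21])

# the single coefficient w minimizing sum((y - w*x)^2) is sum(x*y) / sum(x*x)
w = (x * y).sum() / (x * x).sum()
print(round(w, 3))             # about 2.13, close to the eyeballed 2
print(w * np.array([15, 16]))  # predictions for x = 15 and x = 16
```

The computed coefficient is about 2.13 rather than exactly 2, which is why the "about 30 and 32" guesses are close but not exact.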
2. Upgrade the guessing game
Now we increase the difficulty and replace the x data with a two-dimensional array, so that x = [0.5, 1] corresponds to y = 2, and so on. Find the y values corresponding to x = [6, 15] and x = [6.1, 16]:
x = [0.5, 1], [1.4, 3], [3.7, 8], [4, 9], [6, 15], [6.1, 16]
y = 2, 6.3, 15, 21, ?, ?
This is a typical linear regression problem. We won't answer it just yet; the answer will come shortly.
3. Linear Regression Mathematical Definition
Find a straight line through the scatter of points such that the total (squared) distance from all points to the line is smallest; that line is the desired result. For a y value that needs to be predicted, read off the y on the line corresponding to x: y = x * w is the prediction (w defines the line).
Therefore, if you need to solve the guessing game, you must first find the value of w.
In this scenario, to calculate the value of w, the predecessors have deduced the formula:
Assuming that x and y are both matrices, then this coefficient w is called the regression coefficient, and its value is:
w = (x^T * x)^-1 * x^T * y
As for why the formula is written this way, we don't need to worry about it, just as we don't re-derive the formula for the area of a circle (π * r^2) every time we use it.
Translated into plain steps (if the steps below make no sense, quickly review Part 2 of this series):
- multiply the transpose of the x matrix by the x matrix;
- take the inverse of the resulting matrix;
- multiply that inverse by the transpose of the x matrix;
- finally, multiply by the y matrix.
4. Python implementation of regression coefficients
This pile of mathematical concepts looks super complicated, but implementing it in Python is easy!
import numpy as np
#xMatrix
xMatrix = np.mat([
    [0.5, 1],
    [1.4, 3],
    [3.7, 8],
    [4, 9]
])
#yMatrix
yMatrix = np.mat([
    [2],
    [6.3],
    [15],
    [21]
])
#implement the formula
w=(xMatrix.T * xMatrix).I * (xMatrix.T * yMatrix)
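It's worth printing what w actually comes out to on this sample data. A quick self-contained check (rounded values in the comment are mine):

```python
import numpy as np

# same sample data as above
xMatrix = np.mat([
    [0.5, 1],
    [1.4, 3],
    [3.7, 8],
    [4, 9]
])
yMatrix = np.mat([[2], [6.3], [15], [21]])

# w = (x^T * x)^-1 * x^T * y
w = (xMatrix.T * xMatrix).I * (xMatrix.T * yMatrix)
print(w)  # roughly [[-19.66], [11.04]]
```

A negative first coefficient may look odd, but plugging in [6, 15] gives 6 * (-19.66) + 15 * 11.04 ≈ 47.7, which matches the prediction in the next section.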
5. Prediction using the regression coefficients
Arrange the values to be predicted into a matrix
newxMatrix = np.mat([
    [6, 15],
    [6.1, 16]
])
#Calculate with regression coefficients
predictYMatrix=newxMatrix * w
print(predictYMatrix)
The result: 47.71369295 and 56.79128631 are the two predicted y values.
6. Plot to view data distribution and forecast results
import matplotlib.pyplot as plt
newxMatrix = np.mat([
    [0.5, 1],
    [1.4, 3],
    [3.7, 8],
    [4, 9],
    [6, 15],
    [6.1, 16]
])
#The last two rows are the predicted values
newyMatrix = np.mat([
    [2],
    [6.3],
    [15],
    [21],
    [47.7],
    [56.7]
])
#Collapse the two columns of newxMatrix into a single column: divide each column by its mean, then add them (only to make the chart readable, no other meaning)
xMeanMatrix= (newxMatrix[:,0]/newxMatrix[:,0].mean() + newxMatrix[:,1]/newxMatrix[:,1].mean())
plt.figure() #Create a chart
x=xMeanMatrix[:,0].flatten().A[0]
y=newyMatrix[:,0].flatten().A[0]
plt.scatter(x,y) #Draw a point
plt.plot(x,y) #Draw a line
plt.show() #Display the chart
(The figure is not a straight line because the two-dimensional x was collapsed into one dimension for plotting.)
7. No-brainer linear regression with sklearn
sklearn is an excellent machine learning package, so good that you don't need to know any algorithms, their properties, or formulas. Get started and predict directly:
step:
- Input x, y data to train
- Input newX, output predicted newY value
import numpy as np
from sklearn.linear_model import LinearRegression
#xMatrix (plain arrays here, since recent sklearn versions do not accept np.matrix)
xMatrix = np.array([
    [0.5, 1],
    [1.4, 3],
    [3.7, 8],
    [4, 9]
])
#yMatrix
yMatrix = np.array([
    [2],
    [6.3],
    [15],
    [21]
])
#xp matrix of inputs to predict
xpMatrix = np.array([
    [6, 15],
    [6.1, 16]
])
classifier = LinearRegression()
classifier.fit(xMatrix, yMatrix)
yPredict = classifier.predict(xpMatrix)
print(yPredict)
The computed values are 51.66894737 and 62.92736842. They differ somewhat from our hand-computed 47.71 and 56.79 because sklearn's LinearRegression fits an intercept term by default, while our formula does not; still, the general direction is the same, so the two approaches roughly verify each other.
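To confirm where the difference comes from, a quick sketch: turning off sklearn's intercept with `fit_intercept=False` makes it solve exactly the same least-squares problem as our formula w = (x^T * x)^-1 * x^T * y, so the predictions should match the hand-computed ones:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.5, 1], [1.4, 3], [3.7, 8], [4, 9]])
y = np.array([2, 6.3, 15, 21])
Xp = np.array([[6, 15], [6.1, 16]])

# with no intercept, this solves exactly w = (X^T * X)^-1 * X^T * y
model = LinearRegression(fit_intercept=False)
model.fit(X, y)
print(model.predict(Xp))  # roughly [47.71, 56.79]
```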