Python Machine Learning Quick Start Series 4: Linear Regression

Machine learning materials found on the Internet are often hard on novices. They open by dumping a pile of formulas and equations that read like alien script, promptly extinguish your enthusiasm for learning, and leave you wondering whether you should go back and study mathematics first. The author has been through that grinder many times and, looking back with a little more understanding, found that an introduction to machine learning could be far less painful. The Python Machine Learning Quick Start series hopes to bring everyone into the world of machine learning in an easy-to-understand, accessible, and fun way.
 
1. Guessing game
 
Before officially entering linear regression, let's play a guessing game:
 
x =  1     3     8     9    15    16
y =  2    6.3   15    21     ?     ?
 
From the relationship between x and y, we guess that x = 15 corresponds to y ≈ 30 and x = 16 to y ≈ 32.
We would not guess 100 or some other much larger number, because the naked eye can see that y is roughly 2 times x. There is a little error, but the overall direction should be correct.
 
y=2*x
 
This is a simple linear regression: you find a coefficient of 2, and for any target you want to predict, you multiply the input value by that coefficient to get the predicted result.
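The eyeballed coefficient can also be computed rather than guessed. A minimal sketch (using numpy, which the later sections rely on anyway): the least-squares estimate of a single coefficient w in y ≈ w * x is (x·y)/(x·x).

```python
import numpy as np

# Known pairs from the guessing game
x = np.array([1, 3, 8, 9], dtype=float)
y = np.array([2, 6.3, 15, 21], dtype=float)

# Least-squares estimate of the single coefficient w in y ≈ w * x
w = (x @ y) / (x @ x)
print(w)  # about 2.13, close to the eyeballed 2

# Predict the two missing values
print(w * 15, w * 16)
```

The predictions land near 32 and 34, in the same ballpark as the naked-eye guess.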
 
 
2. Upgrade the guessing game
 
Now we increase the difficulty of the guessing game: each x becomes a two-dimensional array. That is, x = [0.5, 1] corresponds to y = 2; find the y values corresponding to x = [6, 15] and x = [6.1, 16].
 
x = [0.5, 1]   [1.4, 3]   [3.7, 8]   [4, 9]   [6, 15]   [6.1, 16]
y =    2          6.3         15        21        ?          ?
 
This is a typical linear regression problem. We will not answer it just yet; the answer will come shortly.
 
 
3. Linear Regression Mathematical Definition
 
Find a straight line through the scatter of points such that the total distance from all the points to the line (measured as the sum of squared vertical gaps) is smallest; that line is the desired result. For a y value that needs to be predicted, read off the y corresponding to x on the line: y = x * w is the predicted result (w is the coefficient that defines the line).
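To make the "smallest total distance" criterion concrete, here is a small sketch (assuming, as is standard for least squares, that "distance" means the squared vertical gap between each point and the line). It compares two candidate coefficients on the first guessing game's data:

```python
import numpy as np

x = np.array([1, 3, 8, 9], dtype=float)
y = np.array([2, 6.3, 15, 21], dtype=float)

def squared_error(w):
    """Sum of squared vertical distances from the points to the line y = w*x."""
    return float(np.sum((y - w * x) ** 2))

# The candidate with the smaller total squared error is the better line
print(squared_error(2.0))   # the eyeballed line
print(squared_error(2.13))  # a slightly steeper line, with a smaller error
```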


 
 
Therefore, if you need to solve the guessing game, you must first find the value of w.
 
In this scenario, to calculate the value of w, our predecessors have already derived the formula.
 
Assuming that x and y are both matrices, the coefficient w is called the regression coefficient, and its value is:
 
w = (xᵀ * x)⁻¹ * xᵀ * y
 
As for why the formula is written like this, we don't need to worry about it, just as we compute the area of a circle with πr² without deriving it again.
 
Translated into plain steps (if the following reads like gibberish, quickly review Series 2):
  • multiply the transpose of the x matrix by the x matrix;
  • take the inverse of the resulting matrix;
  • multiply that inverse by the transpose of the x matrix;
  • finally, multiply by the y matrix.
 
 
4. Python implementation of regression coefficients
This pile of mathematical concepts looks super complicated, but implementing it in Python is easy!
 
import numpy as np
 
# x matrix
xMatrix = np.mat([
   [0.5, 1],
   [1.4, 3],
   [3.7, 8],
   [4, 9]
])

# y matrix
yMatrix = np.mat([
   [2],
   [6.3],
   [15],
   [21]
])

# implement the formula w = (xᵀ * x)⁻¹ * xᵀ * y
w = (xMatrix.T * xMatrix).I * (xMatrix.T * yMatrix)
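As a sanity check on that one-liner, numpy's own least-squares solver should produce the same coefficients. A small self-contained sketch (the data is repeated so it runs on its own):

```python
import numpy as np

xMatrix = np.mat([[0.5, 1], [1.4, 3], [3.7, 8], [4, 9]])
yMatrix = np.mat([[2], [6.3], [15], [21]])

# Normal-equation solution, as in the listing above
w = (xMatrix.T * xMatrix).I * (xMatrix.T * yMatrix)

# np.linalg.lstsq solves the same least-squares problem directly
w_check, *_ = np.linalg.lstsq(np.asarray(xMatrix), np.asarray(yMatrix), rcond=None)

print(np.asarray(w).ravel())  # roughly [-19.66, 11.04]
print(w_check.ravel())        # should agree
```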
 
 
5. Prediction using the regression coefficients
 
Arrange the values to be predicted into a matrix:

newxMatrix = np.mat([
   [6, 15],
   [6.1, 16]
])
 
# calculate with the regression coefficients
predictYMatrix = newxMatrix * w
 
print(predictYMatrix)
 
The results, 47.71369295 and 56.79128631, are the two guessed y values.
 
 
6. Plot to view data distribution and forecast results
 
import numpy as np
import matplotlib.pyplot as plt
 
newxMatrix = np.mat([
   [0.5, 1],
   [1.4, 3],
   [3.7, 8],
   [4, 9],
   [6, 15],
   [6.1, 16]
])
 
# the last two entries are the predicted values
newyMatrix = np.mat([
   [2],
   [6.3],
   [15],
   [21],
   [47.7],
   [56.7]
])
 
# Collapse newxMatrix into one dimension: divide each column by its mean,
# then add the two normalized columns (purely for drawing, no other meaning)
xMeanMatrix = (newxMatrix[:, 0] / newxMatrix[:, 0].mean()
               + newxMatrix[:, 1] / newxMatrix[:, 1].mean())
 
plt.figure()                           # create a figure
x = xMeanMatrix[:, 0].flatten().A[0]
y = newyMatrix[:, 0].flatten().A[0]
plt.scatter(x, y)                      # draw the points
plt.plot(x, y)                         # draw the line
plt.show()
 


 
(The figure is not a straight line, because the two-dimensional x matrix was merged into one dimension for drawing.)
 
 
7. No-brainer linear regression with sklearn
 
sklearn is an excellent package for machine learning, so good that users do not need to know any algorithms, their characteristics, or their formulas. Get started and predict directly:
 
step:
  • Input x, y data to train
  • Input newX, output predicted newY value
 
import numpy as np
from sklearn.linear_model import LinearRegression
 
# x matrix
xMatrix = np.mat([
   [0.5, 1],
   [1.4, 3],
   [3.7, 8],
   [4, 9]
])

# y matrix
yMatrix = np.mat([
   [2],
   [6.3],
   [15],
   [21]
])
 
# the input xp matrix to predict
xpMatrix = np.mat([
   [6, 15],
   [6.1, 16]
])

classifier = LinearRegression()
classifier.fit(xMatrix, yMatrix)

yPredict = classifier.predict(xpMatrix)
print(yPredict)
 
The calculated values are 51.66894737 and 62.92736842. Although they differ somewhat from our own predictions of 47.71 and 56.79, the general direction is basically the same, which can be regarded as mutual verification.
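The gap between the two sets of predictions has a simple explanation: sklearn's LinearRegression fits an intercept b by default (y = x·w + b), while our normal-equation formula forces the line through the origin. Passing fit_intercept=False should reproduce the hand-rolled numbers. A small sketch, with the data repeated so it runs on its own:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.5, 1], [1.4, 3], [3.7, 8], [4, 9]])
y = np.array([2, 6.3, 15, 21])
Xp = np.array([[6, 15], [6.1, 16]])

# Default: an intercept is fitted, giving the 51.67 / 62.93 figures above
with_b = LinearRegression().fit(X, y)

# No intercept: the model is forced through the origin,
# matching the normal equation w = (xT * x)^-1 * xT * y
without_b = LinearRegression(fit_intercept=False).fit(X, y)

print(with_b.predict(Xp))     # roughly [51.67, 62.93]
print(without_b.predict(Xp))  # roughly [47.71, 56.79]
```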
 
 
 
 
 
 
 
 
 
 
 
