Here we first look at the following formula:
I believe this formula is no stranger to anyone: we came across it in high school statistics, where we often used it to calculate a predicted value. This method is called the least squares method for simple linear regression.
To follow along, it helps to go back and review that part of high school math. The least squares method in fact finds a straight line such that the sum of the squared vertical distances from our training data to the line is as small as possible. For the specific derivation of the formula, see: Least Squares
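As a compact reminder of what the least squares method minimizes, the objective and its closed-form solution for a line y = ax + b can be sketched as follows. The notation is chosen here to match the a_ and b_ attributes in the code (a is the slope, b is the intercept, bars denote sample means, and m is the number of training samples); it is a summary, not a full derivation.

```latex
\begin{aligned}
J(a, b) &= \sum_{i=1}^{m} \bigl(y^{(i)} - a\,x^{(i)} - b\bigr)^2 \\
a &= \frac{\sum_{i=1}^{m} \bigl(x^{(i)} - \bar{x}\bigr)\bigl(y^{(i)} - \bar{y}\bigr)}
          {\sum_{i=1}^{m} \bigl(x^{(i)} - \bar{x}\bigr)^2},
\qquad
b = \bar{y} - a\,\bar{x}
\end{aligned}
```

Setting the partial derivatives of J with respect to a and b to zero gives exactly these two expressions.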
With the above formula, we can encapsulate it behind an API to form a simple linear regression algorithm (one feature, one target):
import numpy as np


class SimpleLinearRegression1:
    """Simple (single-feature) linear regression fitted by least squares."""

    def __init__(self):
        self.a_ = None  # slope
        self.b_ = None  # intercept

    def fit(self, x_train, y_train):
        assert x_train.ndim == 1, \
            "Simple LinearRegressor can only solve single feature training data"
        assert len(x_train) == len(y_train), \
            "the size of x_train must be equal to the size of y_train"

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)

        # a = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
        num = (x_train - x_mean).dot(y_train - y_mean)
        d = (x_train - x_mean).dot(x_train - x_mean)

        self.a_ = num / d
        self.b_ = y_mean - self.a_ * x_mean
        return self

    def predict(self, x_predict):
        assert x_predict.ndim == 1, \
            "Simple Linear regressor can only solve single feature training data"
        assert self.a_ is not None and self.b_ is not None, \
            "must fit before predict!"
        return np.array([self._predict(x) for x in x_predict])

    def _predict(self, x_single):
        # Prediction for a single sample: y = a * x + b
        return self.a_ * x_single + self.b_

    def __repr__(self):
        return "SimpleLinearRegression1()"
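One way to gain some confidence in the closed-form slope and intercept is to compare them with NumPy's own least squares fit. The sketch below assumes synthetic data (the true slope 2.0, the intercept 1.0, the noise level, and the seed are made up for illustration) and uses a hypothetical helper, fit_line, that repeats the computation done in fit:

```python
import numpy as np


def fit_line(x, y):
    """Closed-form least squares for y = a * x + b (hypothetical helper)."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
    b = y_mean - a * x_mean
    return a, b


# Synthetic data: illustrative values, not taken from the article
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

a, b = fit_line(x, y)

# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept]
a_ref, b_ref = np.polyfit(x, y, deg=1)

print("closed form:", a, b)
print("np.polyfit: ", a_ref, b_ref)
print("agree:", np.allclose([a, b], [a_ref, b_ref]))
```

If the two results agree up to floating-point error, the formula has most likely been implemented correctly.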
Having packaged our own linear regression algorithm, we can use it to make predictions on data where a single feature has a linear relationship with the target. On the test data, we usually use several metrics to evaluate how good our predictions are:
MAE (mean absolute error)
As is easy to see, MAE is the mean of the absolute differences between the predicted values and the actual values of the test data. It tells us how large the deviation between the predicted results and the actual results is.
MSE (mean squared error)
The mean squared error is the mean of the squared differences between the predicted values and the actual values of the test data.
R^2 (coefficient of determination) (works best)
R^2 can be expressed as 1 - MSE / Var, where Var is the variance of the actual values of the test data.
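The three metrics above can be written in a few lines of NumPy. This is a minimal sketch with hand-written helpers, not the scikit-learn implementations; the sample arrays y_true and y_pred are made-up values for illustration only.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Mean of |actual - predicted|."""
    return np.mean(np.abs(y_true - y_pred))


def mean_squared_error(y_true, y_pred):
    """Mean of (actual - predicted) ** 2."""
    return np.mean((y_true - y_pred) ** 2)


def r2_score(y_true, y_pred):
    """1 - MSE / Var(y_true), using the population variance."""
    return 1.0 - mean_squared_error(y_true, y_pred) / np.var(y_true)


# Made-up example values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

mae = mean_absolute_error(y_true, y_pred)  # errors 0.5, 0, 0.5, 1 -> mean 0.5
mse = mean_squared_error(y_true, y_pred)   # squares 0.25, 0, 0.25, 1 -> mean 0.375
r2 = r2_score(y_true, y_pred)              # about 1 - 0.375 / 5 = 0.925

print(mae, mse, r2)
```

Unlike MAE and MSE, R^2 does not depend on the unit of the target: a value close to 1 means the model explains most of the variance, and a value of 0 means it does no better than always predicting the mean.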
Here, following a teaching video I watched, the Boston housing price dataset is used as the prediction data (you can also visit the following website; I thought the instructional videos there were well explained):
Machine Learning Python application
Code:
import matplotlib.pyplot as plt
from sklearn import datasets, metrics, model_selection

from Simple_LR_class import SimpleLinearRegression1

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
boston = datasets.load_boston()
x = boston.data[:, 5]  # use only the RM feature (average number of rooms)
y = boston.target

# Drop the samples whose price sits at the upper cap of 50.0
x = x[y < 50.0]
y = y[y < 50.0]

x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y)

lr_clf = SimpleLinearRegression1()
lr_clf.fit(x_train, y_train)
y_predict = lr_clf.predict(x_test)

# Training points, fitted line on the training data (red) and on the test data (green)
plt.scatter(x_train, y_train)
plt.plot(x_train, lr_clf.predict(x_train), color='r')
plt.plot(x_test, lr_clf.predict(x_test), color='g')

MAE = metrics.mean_absolute_error(y_test, y_predict)
MSE = metrics.mean_squared_error(y_test, y_predict)
R_square = metrics.r2_score(y_test, y_predict)
print(MSE)
print(MAE)
print(R_square)

plt.show()
We can see the fitted straight line and the predicted results:
I hope this is helpful to readers. If you like it, you can follow my public account, where I will post my study notes so that we can learn together!