Simple Linear Regression for Machine Learning

1. Import the standard libraries

In [36]:
# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Enable interactive (adjustable) figures in the notebook
%matplotlib notebook
# Use the SimHei font so that Chinese characters display correctly
plt.rc('font', family='SimHei', size=8)

2. Import data

In [3]:
dataset = pd.read_csv('Salary_Data.csv')
dataset
Out[3]:
  YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0
In [5]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
X
Out[5]:
array([[  1.1],
       [  1.3],
       [  1.5],
       [  2. ],
       [  2.2],
       [  2.9],
       [  3. ],
       [  3.2],
       [  3.2],
       [  3.7],
       [  3.9],
       [  4. ],
       [  4. ],
       [  4.1],
       [  4.5],
       [  4.9],
       [  5.1],
       [  5.3],
       [  5.9],
       [  6. ],
       [  6.8],
       [  7.1],
       [  7.9],
       [  8.2],
       [  8.7],
       [  9. ],
       [  9.5],
       [  9.6],
       [ 10.3],
       [ 10.5]])
In [6]:
y 
Out[6]:
array([  39343.,   46205.,   37731.,   43525.,   39891.,   56642.,
         60150.,   54445.,   64445.,   57189.,   63218.,   55794.,
         56957.,   57081.,   61111.,   67938.,   66029.,   83088.,
         81363.,   93940.,   91738.,   98273.,  101302.,  113812.,
        109431.,  105582.,  116969.,  112635.,  122391.,  121872.])

3. Split into training and test sets

In [28]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
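
To make the 70/30 split concrete, here is a minimal pure-Python sketch of a shuffled hold-out split. It is an illustration only, not scikit-learn's implementation: the helper name `holdout_split` is hypothetical, and its seeded shuffle will not select the same rows as `train_test_split(..., random_state = 0)`.

```python
import math
import random

def holdout_split(rows, test_size=0.3, seed=0):
    """Shuffle row indices with a seeded RNG and hold out a fraction for testing.

    Hypothetical helper for illustration; not scikit-learn's algorithm.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    n_test = math.ceil(test_size * len(rows))   # number of held-out rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test

# First ten (YearsExperience, Salary) rows from the table above
rows = [
    (1.1, 39343.0), (1.3, 46205.0), (1.5, 37731.0), (2.0, 43525.0),
    (2.2, 39891.0), (2.9, 56642.0), (3.0, 60150.0), (3.2, 54445.0),
    (3.2, 64445.0), (3.7, 57189.0),
]

train, test = holdout_split(rows, test_size=0.3, seed=0)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed plays the same role as `random_state`: the same rows land in each set on every run, so results are reproducible.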

4. Train with simple linear regression

In [29]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
Out[29]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [30]:
y_pred = regressor.predict(X_test)
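
For a single feature, an ordinary least squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the mean point. The sketch below works this out by hand in plain Python. The function name `fit_line` is made up for illustration, and it fits all 30 rows from the table rather than the training split, so its numbers will differ somewhat from `regressor.coef_` and `regressor.intercept_`.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean   # line passes through (x_mean, y_mean)
    return slope, intercept

# YearsExperience and Salary columns, copied from the table above
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343.0, 46205.0, 37731.0, 43525.0, 39891.0, 56642.0,
          60150.0, 54445.0, 64445.0, 57189.0, 63218.0, 55794.0,
          56957.0, 57081.0, 61111.0, 67938.0, 66029.0, 83088.0,
          81363.0, 93940.0, 91738.0, 98273.0, 101302.0, 113812.0,
          109431.0, 105582.0, 116969.0, 112635.0, 122391.0, 121872.0]

slope, intercept = fit_line(years, salary)
print(f"salary ~ {intercept:.0f} + {slope:.0f} * years")
```

The slope reads as the estimated salary increase per additional year of experience, and the intercept as the estimated salary at zero years.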

5. Plot and compare

In [31]:
# Visualising the training set results
plt.scatter(X_train, y_train, color = 'red') # training points
plt.plot(X_train, regressor.predict(X_train), color = 'blue') # regression line fitted on the training set
plt.title(u'Salary vs. Experience (training set)')
plt.xlabel(u'Years of experience')
plt.ylabel(u'Salary')
plt.show()
In [32]:
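# Visualising the test set results
plt.scatter(X_test, y_test, color = 'red') # test points
plt.plot(X_train, regressor.predict(X_train), color = 'blue') # same regression line, fitted on the training set
plt.title(u'Salary vs. Experience (test set)')
plt.xlabel(u'Years of experience')
plt.ylabel(u'Salary')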
plt.show()
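
The plots compare predictions and test points visually; a numeric check can complement them. Below is a minimal sketch of two common metrics, mean absolute error and R², written in plain Python. The helper names are hypothetical, and the example predictions come from an assumed round-number line (salary = 26000 + 9400 * years), not from the fitted `regressor`. In the notebook you would pass `y_test` and `y_pred` instead.

```python
def mean_abs_error(y_true, y_hat):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_hat)) / len(y_true)

def r_squared(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Three rows taken from the table above
years_sample = [1.1, 5.1, 10.3]
salary_sample = [39343.0, 66029.0, 122391.0]

# Assumed illustrative line, NOT the coefficients learned in the notebook
predicted = [26000 + 9400 * x for x in years_sample]

mae = mean_abs_error(salary_sample, predicted)
r2 = r_squared(salary_sample, predicted)
print(f"MAE: {mae:.0f}, R^2: {r2:.3f}")
```

An R² close to 1 means the line explains most of the variation in salary, while the MAE is expressed in the same units as the salary itself. scikit-learn provides `mean_absolute_error` and `r2_score` in `sklearn.metrics` for the same purpose.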

6. Project address
