Programming Assignment 1.1 - sklearn Machine Learning Algorithm Series: LinearRegression Linear Regression

Key Points

  • scikit-learn provides a fairly large number of classes for linear regression analysis.
  • Rather than implementing these algorithms from scratch, we can use scikit-learn's linear regression classes. Here we apply scikit-learn's linear regression to the data from programming assignment 1.1 and look at how it performs.
  • In general, as long as the data show a linear relationship, the LinearRegression class is the first choice. Only if the fit or the predictions turn out poor should you consider the other linear regression classes. If you are learning linear regression, this class is the recommended place to start.
  • LinearRegression is very simple to use and takes two steps:
    1. Call fit(x_train, y_train) to train the model on the training set x_train, y_train.
    2. Call predict(x_test) on the fitted estimator to predict outputs for the inputs x_test. (x_test can be a test set or any other data you need predictions for.)
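The two steps can be sketched on toy data. This is a minimal illustration, assuming a made-up data set where y = 3 + 2x exactly (not the assignment data); the names x_train, y_train, x_test and y_pred are only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data following y = 3 + 2x exactly (not ex1data1.txt)
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (m, n): one feature per column
y_train = 3 + 2 * x_train.ravel()                  # shape (m,)

# Step 1: fit the estimator on the training set
model = LinearRegression()
model.fit(x_train, y_train)

# Step 2: predict outputs for new inputs (a test set or any other data)
x_test = np.array([[4.0], [5.0]])
y_pred = model.predict(x_test)
print(y_pred)  # approximately 11 and 13
```

Note that fit() expects a 2-D feature array even when there is only one feature, which is why x_train has shape (m, 1).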

Procedure

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



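# Load the data
# On Windows, use a raw string (r'...') so the backslashes are not read as escape sequences
path = r'D:\BaiduNetdiskDownload\data_sets\ex1data1.txt'

# pd.read_csv reads the TXT file into a DataFrame
# names: the column names to assign
# header: the row to use as column names; set it to None when the file has no header row
# head() returns the first rows of the DataFrame (five by default)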
data = pd.read_csv(path,header=None,names=['Population','Profit'])
data.head()

    
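# Insert a column of ones (x0 = 1) into the training set so that the cost and gradient can be computed with a vectorized solution.
# Note: LinearRegression fits its own intercept by default (fit_intercept=True), so for sklearn this column is redundant; its learned coefficient comes out at about 0.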
data.insert(0, 'Ones', 1)


# Set X (training set) and y (target variable)
cols = data.shape[1] # number of columns
X = data.iloc[:,0:cols-1] # input X: the first cols-1 columns
y = data.iloc[:,cols-1:cols] # target y: the last column (the slice keeps it 2-D)



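# The cost function works on numpy arrays, so convert X and y before using them. We also initialize theta
# (theta is only used by the from-scratch gradient-descent version; sklearn does not need it).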
X = np.array(X.values)
y = np.array(y.values)
theta = np.array([0,0])
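The comments above mention computing the cost with a vectorized solution. That is only needed for the from-scratch version of the assignment, since sklearn never uses theta. A minimal sketch of the squared-error cost J(θ) = 1/(2m) Σ (Xθ - y)², assuming X already has the ones column and y has shape (m, 1) as produced above; compute_cost is a hypothetical helper name, not a scikit-learn function, and the data are made up.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Vectorized squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)**2).

    X: (m, n) array whose first column is the ones column (x0 = 1)
    y: (m, 1) array of targets
    theta: (n,) array of parameters
    """
    m = X.shape[0]
    residuals = X @ theta.reshape(-1, 1) - y  # (m, 1): one error per example
    return float((residuals ** 2).sum() / (2 * m))

# Made-up data with y = 1 + 2x exactly
X = np.array([[1, 0.0], [1, 1.0], [1, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])

print(compute_cost(X, y, np.array([0, 0])))  # (1 + 9 + 25) / 6, about 5.8333
print(compute_cost(X, y, np.array([1, 2])))  # 0.0 for the exact parameters
```

Reshaping theta to a column matters: with a 1-D theta, X @ theta has shape (m,), and subtracting an (m, 1) y would broadcast to an (m, m) matrix and silently give a wrong cost.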

Core code:

from sklearn import linear_model

# Import the LinearRegression class, instantiate it, and train it on the training data with fit()
model = linear_model.LinearRegression()
model.fit(X, y) # fit(X, y) trains the model on the training set X, y
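To see what fit() learned, the fitted estimator exposes intercept_ and coef_. A small sketch on made-up data (Profit = -4 + 1.5 * Population exactly, not ex1data1.txt), built the same way as above with a ones column and a 2-D y. Because LinearRegression fits its own intercept (fit_intercept=True by default), the intercept lands in intercept_ and the coefficient for the Ones column comes out at about 0.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Made-up data with Profit = -4 + 1.5 * Population exactly
data = pd.DataFrame({'Population': [5.0, 6.0, 8.0, 10.0]})
data['Profit'] = -4 + 1.5 * data['Population']
data.insert(0, 'Ones', 1)  # columns: Ones, Population, Profit

X = data.iloc[:, 0:2].values  # Ones and Population -> shape (4, 2)
y = data.iloc[:, 2:3].values  # Profit, kept 2-D -> shape (4, 1)

model = linear_model.LinearRegression()  # fit_intercept=True by default
model.fit(X, y)

print(model.intercept_)  # shape (1,) because y is 2-D; approximately [-4.]
print(model.coef_)       # shape (1, 2); approximately [[0.  1.5]]
```

Either drop the ones column or keep it and pass fit_intercept=False; both give the same fitted line. The 2-D y is also why coef_ has shape (1, 2) and why predict() returns an (m, 1) array that has to be flattened before plotting.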

Prediction performance of the scikit-learn model:

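x = np.array(X[:, 1]) # Population column (column 0 is the ones column)
f = model.predict(X).flatten() # .flatten() turns the (m, 1) prediction array into a 1-D array (row-major order by default)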

fig, ax = plt.subplots(figsize=(8,5))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()
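The plot shows the fit visually. To put a number on the performance, the fitted estimator's score() method returns the coefficient of determination R², and sklearn.metrics.mean_squared_error gives the mean squared error. A sketch on made-up data (not ex1data1.txt); in the assignment itself you would pass X, y and the predictions f from above instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Made-up data scattered around y = 1 + 2x
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

model = LinearRegression().fit(x, y)
f = model.predict(x)

r2 = model.score(x, y)          # coefficient of determination R^2 on (x, y)
mse = mean_squared_error(y, f)  # mean of the squared residuals
print(round(r2, 4), round(mse, 4))
```

R² close to 1 means the line explains most of the variance in the targets; evaluating on the training data, as here and in the plot, says nothing about performance on unseen data.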



Source: www.cnblogs.com/yangdd/p/12305875.html