Andrew Ng Machine Learning 2022 - Jupyter - Linear Regression with scikit-learn

1 Optional Experiment: Linear Regression Using Scikit-Learn

There is an open-source, commercially usable machine learning toolkit called scikit-learn. This toolkit contains implementations of many of the algorithms that you will use in this course.

1.1 Tools

You'll take advantage of scikit-learn as well as functions from matplotlib and NumPy.

2 Linear regression closed-form solution

Scikit-learn's LinearRegression model computes a closed-form solution to the linear regression problem.
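Concretely, for a design matrix $X$ (with a column of ones folded in for the intercept $b$) and targets $\mathbf{y}$, the closed-form least-squares solution is the normal equation:

$$\boldsymbol{\theta} = (X^T X)^{-1} X^T \mathbf{y}$$

The parameters are computed directly rather than iteratively, so no learning rate or iteration count is needed.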

Let's use the data from an earlier lab: a 1,000 sq. ft. house sells for $300,000 and a 2,000 sq. ft. house sells for $500,000.

2.1 Load the dataset

import numpy as np
np.set_printoptions(precision=2)  # print arrays to 2 decimal places
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler
from lab_utils_multi import load_house_data
import matplotlib.pyplot as plt
dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred = '#C00000'; dlmagenta = '#FF40FF'; dlpurple = '#7030A0'
plt.style.use('./deeplearning.mplstyle')

X_train = np.array([1.0, 2.0])   # features (size in 1000s of sqft)
y_train = np.array([300, 500])   # target values (price in 1000s of dollars)

2.2 Create and fit the model

The code below performs regression using scikit-learn.

The first step creates a regression object. The second step uses fit, one of the methods associated with the object. This performs the regression, fitting the parameters to the input data. Note that the toolkit expects a 2-D X matrix.

linear_model = LinearRegression()
#X must be a 2-D Matrix
linear_model.fit(X_train.reshape(-1, 1), y_train) 

output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

2.3 Parameter view

The w and b parameters are referred to as the "coefficients" and the "intercept" in scikit-learn.

b = linear_model.intercept_
w = linear_model.coef_
print(f"w = {w:}, b = {b:0.2f}")
print(f"'manual' prediction: f_wb = wx+b : {1200*w + b}")

output:

w = [200.], b = 100.00
'manual' prediction: f_wb = wx+b : [240100.]
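
As a quick sanity check (not part of the original lab), the same parameters can be recovered from the normal equation above using only NumPy:

# Solve the normal equation (X^T X) theta = X^T y directly.
# Xb prepends a column of ones, so theta[0] is the intercept b and theta[1] is w.
Xb = np.c_[np.ones_like(X_train), X_train]
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y_train)
print(f"normal equation: b = {theta[0]:0.2f}, w = {theta[1]:0.2f}")

output:

normal equation: b = 100.00, w = 200.00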

2.4 Prediction

Call the predict function to generate predictions.

y_pred = linear_model.predict(X_train.reshape(-1, 1))

print("Prediction on training set:", y_pred)

X_test = np.array([[1200]])
print(f"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}")

output:

Prediction on training set: [300. 500.]
Prediction for 1200 sqft house: $240100.00

3 Second Example

The second example uses data from an earlier lab with multiple features. The final parameter values and predictions are very close to the results of that lab's unnormalized "long run". That unnormalized run took hours to produce results, while this one is nearly instantaneous. The closed-form solution works well on small datasets such as this one, but can be computationally demanding on large datasets.

The closed-form solution does not require normalization; for contrast, a sketch using the scaled, iterative SGDRegressor appears after the output below.

# load the dataset
X_train, y_train = load_house_data()
X_features = ['size(sqft)','bedrooms','floors','age']

linear_model = LinearRegression()
linear_model.fit(X_train, y_train) 

b = linear_model.intercept_
w = linear_model.coef_
print(f"w = {w:}, b = {b:0.2f}")

print(f"Prediction on training set:\n {linear_model.predict(X_train)[:4]}" )
print(f"prediction using w,b:\n {(X_train @ w + b)[:4]}")
print(f"Target values \n {y_train[:4]}")

x_house = np.array([1200, 3, 1, 40]).reshape(-1, 4)
x_house_predict = linear_model.predict(x_house)[0]
print(f" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}")

output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
w = [  0.27 -32.62 -67.25  -1.47], b = 220.42
Prediction on training set:
 [295.18 485.98 389.52 492.15]
prediction using w,b:
 [295.18 485.98 389.52 492.15]
Target values 
 [300.  509.8 394.  540. ]
 predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = $318709.09
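
For contrast, the gradient-descent-based SGDRegressor imported at the top is iterative and generally does benefit from feature scaling. A minimal sketch (not part of the original lab) using StandardScaler on the same data:

# Iterative alternative: z-score normalize the features, then fit with
# stochastic gradient descent. The resulting parameters are in the normalized
# feature space, so they are not directly comparable to the closed-form w and b above.
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)
sgd_model = SGDRegressor(max_iter=1000)
sgd_model.fit(X_norm, y_train)
print(f"SGD: w = {sgd_model.coef_}, b = {sgd_model.intercept_}")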
