1 Optional Experiment: Linear Regression Using Scikit-Learn

Scikit-learn is an open-source machine learning toolkit that can also be used commercially. It contains implementations of many of the algorithms you will work with in this course.

1.1 Tools

You will use functions from scikit-learn as well as matplotlib and NumPy.

2 Linear regression closed-form solution

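Scikit-learn's linear regression model, LinearRegression, implements a closed-form solution: the parameters are computed directly by least squares rather than by iterative gradient descent.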

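Let's use the data from the early labs: a 1,000 sq. ft. house sells for $300,000 and a 2,000 sq. ft. house sells for $500,000. In the code, sizes are expressed in thousands of square feet and prices in thousands of dollars.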
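To make "closed-form" concrete, here is a minimal sketch that solves the same two-point problem directly with NumPy's least-squares routine. It only illustrates the idea and is not necessarily the exact routine scikit-learn runs internally; the names A and params are chosen just for this example.

```python
import numpy as np

# Two training examples: size in 1000 sqft, price in 1000s of dollars
X = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])

# Append a column of ones so the intercept b is solved together with w
A = np.column_stack([X, np.ones_like(X)])

# Solve min ||A @ params - y||^2 in a single step, with no iterations
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params
print(f"w = {w:0.2f}, b = {b:0.2f}")
```

With two points and two unknowns the fit is exact, so the line passes through both training examples.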

2.1 Load the dataset

import numpy as np
np.set_printoptions(precision=2)
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler
from lab_utils_multi import  load_house_data
import matplotlib.pyplot as plt
dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0'; 
plt.style.use('./deeplearning.mplstyle')

X_train = np.array([1.0, 2.0])   #features (size in 1000 sqft)
y_train = np.array([300, 500])   #target value (price in 1000s of dollars)

2.2 Create and fit the model

The code below performs regression using scikit-learn.

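The first step creates a regression object. The second step calls fit, one of the methods associated with that object. This performs the regression, fitting the parameters to the input data. The toolkit expects X to be a two-dimensional matrix of shape (number of examples, number of features).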

linear_model = LinearRegression()
#X must be a 2-D Matrix
linear_model.fit(X_train.reshape(-1, 1), y_train)

output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
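The 2-D requirement is easy to overlook, so here is a small sketch of what reshape(-1, 1) does to the one-feature training array. The name X_2d is introduced only for this illustration.

```python
import numpy as np

X_train = np.array([1.0, 2.0])

# reshape(-1, 1): one column, with the number of rows inferred from the data
X_2d = X_train.reshape(-1, 1)

print(X_train.shape)  # (2,)   1-D: two values, no feature axis
print(X_2d.shape)     # (2, 1) 2-D: two examples, one feature each
```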

2.3 Parameter view

In scikit-learn, the w and b parameters are called the "coefficients" and the "intercept". After fitting, they are available as the attributes coef_ and intercept_.

b = linear_model.intercept_
w = linear_model.coef_
print(f"w = {w:}, b = {b:0.2f}")
print(f"'manual' prediction: f_wb = wx+b : {1200*w + b}")

output:

w = [200.], b = 100.00
'manual' prediction: f_wb = wx+b : [240100.]
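The brackets in the output above appear because coef_ is an array with one entry per feature, while intercept_ is a scalar when the target is one-dimensional. The sketch below refits the same two-point model (under the illustrative name model) and checks that computing wx + b by hand agrees with predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0]])      # size in 1000 sqft, already 2-D
y = np.array([300.0, 500.0])      # price in 1000s of dollars

model = LinearRegression().fit(X, y)

w = model.coef_        # array of shape (n_features,), here (1,)
b = model.intercept_   # scalar for a 1-D target

# The model's prediction is the dot product of each row with w, plus b
manual = X @ w + b
print(w.shape, manual)
```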

2.4 Prediction

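Calling the predict function generates predictions. Note that the model was trained with size in units of 1,000 sq. ft., so the input 1200 below is interpreted as 1,200 thousand sq. ft. That is why the result is 240100 rather than 340 (in thousands of dollars), the value obtained for an input of 1.2.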

y_pred = linear_model.predict(X_train.reshape(-1, 1))

print("Prediction on training set:", y_pred)

X_test = np.array([[1200]])
print(f"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}")

output:

Prediction on training set: [300. 500.]
Prediction for 1200 sqft house: $240100.00
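For comparison, here is a sketch of the same prediction with the input expressed in the units used for training (thousands of square feet). The variable names x_new and price_k are assumptions made for this example; the model and data are the two-point example from this section.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data: size in 1000 sqft, price in 1000s of dollars
X_train = np.array([1.0, 2.0]).reshape(-1, 1)
y_train = np.array([300.0, 500.0])
linear_model = LinearRegression().fit(X_train, y_train)

# A 1,200 sqft house is 1.2 in the training units
x_new = np.array([[1.2]])
price_k = linear_model.predict(x_new)[0]   # price in 1000s of dollars

print(f"Prediction for 1200 sqft house: ${price_k * 1000:,.0f}")
```

Since w = 200 and b = 100, the prediction is 200 * 1.2 + 100 = 340, or $340,000.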

3 Another example

The second example comes from an earlier lab with multiple features. The final parameter values and predictions are very close to the results of that lab's un-normalized "long run". That un-normalized run took hours to produce results, whereas this one is nearly instantaneous. The closed-form solution works well on small datasets such as this one, but can be computationally demanding on larger ones.

Closed-form solutions do not require normalization.

# load the dataset
X_train, y_train = load_house_data()
X_features = ['size(sqft)','bedrooms','floors','age']

linear_model = LinearRegression()
linear_model.fit(X_train, y_train) 

b = linear_model.intercept_
w = linear_model.coef_
print(f"w = {w:}, b = {b:0.2f}")

print(f"Prediction on training set:\n {linear_model.predict(X_train)[:4]}" )
print(f"prediction using w,b:\n {(X_train @ w + b)[:4]}")
print(f"Target values \n {y_train[:4]}")

x_house = np.array([1200, 3,1, 40]).reshape(-1,4)
x_house_predict = linear_model.predict(x_house)[0]
print(f" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}")

output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

w = [  0.27 -32.62 -67.25  -1.47], b = 220.42

Prediction on training set:
 [295.18 485.98 389.52 492.15]
prediction using w,b:
 [295.18 485.98 389.52 492.15]
Target values 
 [300.  509.8 394.  540. ]
 predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = $318709.09
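The statement that closed-form solutions do not require normalization can be checked with a small experiment. The sketch below uses synthetic data, since load_house_data belongs to the course's lab utilities; the feature ranges, true_w, and the noise level are made-up values chosen only to resemble housing features on very different scales.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features on very different scales: size, bedrooms, floors, age
X = rng.uniform([500, 1, 1, 1], [3000, 5, 3, 100], size=(50, 4))
true_w = np.array([0.27, -32.0, -67.0, -1.5])   # hypothetical weights
y = X @ true_w + 220.0 + rng.normal(0, 5, size=50)

# Fit once on the raw features and once on z-score normalized features
model_raw = LinearRegression().fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
model_scaled = LinearRegression().fit(X_scaled, y)

pred_raw = model_raw.predict(X)
pred_scaled = model_scaled.predict(X_scaled)

# The coefficients differ because the feature units differ,
# but both models describe the same fitted function
print(np.allclose(pred_raw, pred_scaled))
```

The learned coefficients are expressed in different units in the two fits, so they cannot be compared directly; the predictions are what should agree.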

Wu Enda Machine Learning 2022-Jupyter-Linear Regression with scikitlearn
