Machine Learning - Linear Regression with One Variable

1 Model Representation

1. Housing price prediction training set

| Size in feet² **(x)** | Price ($) in 1000's **(y)** |
| --- | --- |
| 2104 | 460 |
| 1416 | 232 |
| 1534 | 315 |
| 852 | 178 |

In this training set, each input (the house size) comes paired with its output (the price): the "correct answers" labeled by humans are given. Because the quantity to predict is continuous, this is a regression problem in supervised learning.

2. Problem Solving Model

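In this model, the training set is fed to a learning algorithm, which outputs a hypothesis h that maps an input x (the size of a house) to a predicted output y (its price). For univariate linear regression the hypothesis is a straight line, the same form the prediction code in Section 8 evaluates:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Here θ₀ and θ₁ are the parameters to be learned from the training set.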

2 Cost Function

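To measure how well a given choice of θ₀ and θ₁ fits the m training examples, we use the squared-error cost function — exactly the quantity that computeCost calculates in Section 8:

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

The goal is to find the θ₀ and θ₁ that minimize J(θ₀, θ₁).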

3 Cost Function - Intuition I

To build intuition, fix θ₀ = 0 so the hypothesis simplifies to hθ(x) = θ₁x, a line through the origin. J then becomes a function of the single parameter θ₁, and plotting J(θ₁) gives a bowl-shaped curve whose minimum corresponds to the slope that best fits the data.
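As a quick illustration (a minimal sketch of my own, not from the original post), the bowl can be traced with the four housing examples from the table in Section 1:

```python
import numpy as np
import matplotlib.pyplot as plt

# The four training examples from the table in Section 1
x = np.array([2104, 1416, 1534, 852])  # size in feet^2
y = np.array([460, 232, 315, 178])     # price in $1000's

# With theta0 fixed at 0, J depends on theta1 alone
theta1 = np.linspace(0, 0.4, 200)
J = [np.sum((t * x - y) ** 2) / (2 * len(x)) for t in theta1]

plt.plot(theta1, J)
plt.xlabel('theta1')
plt.ylabel('J(theta1)')
plt.title('Bowl-shaped cost with theta0 fixed at 0')
plt.show()
```

The minimum of this curve sits at the slope that best fits the four points.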

4 Cost Function - Intuition II

With both θ₀ and θ₁ free, J(θ₀, θ₁) becomes a bowl-shaped surface over the (θ₀, θ₁) plane. A contour plot of that surface shows nested ellipses: every point on one ellipse has the same cost, and the center of the innermost ellipse is the (θ₀, θ₁) pair of the best-fitting line. (A Python sketch of this contour view appears in Section 8.3.) Reading the minimum off a plot by hand does not scale, which motivates an algorithm that finds it automatically: gradient descent.

5 Gradient Descent

The idea of gradient descent: start from an initial guess for (θ₀, θ₁) (commonly both zero), repeatedly take a small step in the direction that decreases J(θ₀, θ₁) fastest, and stop when the updates converge. In general, different starting points can lead to different local minima; for the linear-regression cost function this is not a concern, since J is convex and has a single global minimum.
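Formally, the algorithm repeats the following update until convergence, where α is the learning rate; both parameters must be updated simultaneously, i.e. both right-hand sides are computed before either θ is overwritten. This is the rule that gradientDescent implements in Section 8.2.4:

$$\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1)$$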

6 Gradient Descent Intuition

The derivative term determines the direction of each step: where the slope of J is positive the update decreases θ, and where it is negative the update increases θ, so the step always moves downhill. The learning rate α controls the step size. If α is too small, gradient descent takes many tiny steps and converges slowly; if α is too large, it can overshoot the minimum, fail to converge, or even diverge. There is no need to shrink α over time: as θ approaches a minimum, the derivative itself becomes smaller, so the steps automatically shorten, and at a local minimum the derivative is zero and θ stops changing.

Finally, gradient descent can be used not only for the cost function in linear regression, but also to minimize many other cost functions.
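A tiny numeric sketch (my own illustration, not from the original post) makes the effect of α concrete, minimizing f(θ) = θ², whose derivative is 2θ:

```python
# Gradient descent on f(theta) = theta^2, with derivative f'(theta) = 2*theta
def descend(alpha, theta=1.0, steps=5):
    path = [theta]
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # theta := theta - alpha * f'(theta)
        path.append(theta)
    return path

print(descend(alpha=0.1))  # theta shrinks toward 0: converging
print(descend(alpha=1.1))  # |theta| grows every step: overshooting and diverging
```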

7 Gradient Descent For Linear Regression

Applying gradient descent to linear regression means substituting the hypothesis hθ(x) = θ₀ + θ₁x into J(θ₀, θ₁) and working out the partial derivative for each parameter, as shown below. Because each step sums over all m training examples, this variant is called batch gradient descent.
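Working out those derivatives gives the batch update rules, repeated until convergence with both parameters updated simultaneously; they match what the inner loop of gradientDescent computes in Section 8.2.4, where the added column of ones plays the role of x₀⁽ⁱ⁾ = 1:

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$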

In addition, a loop-based implementation is verbose; we will later see how **Vectorization** simplifies the code and optimizes the computation, so that gradient descent runs faster and better.
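As a preview, here is a minimal vectorized sketch of my own (not the post's code, which uses np.matrix and explicit loops): with X an m × (n+1) array whose first column is all ones, the entire update collapses into one matrix expression:

```python
import numpy as np

def gradient_descent_vectorized(X, y, theta, alpha, iters):
    # X: m x (n+1) array with a leading column of ones
    # y: m x 1 array, theta: (n+1) x 1 array
    m = len(X)
    for _ in range(iters):
        # Update all parameters at once: theta := theta - (alpha/m) * X^T (X theta - y)
        theta = theta - (alpha / m) * X.T @ (X @ theta - y)
    return theta
```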

8 Code Implementation

This whole part of the exercise predicts the profit of opening a snack stand from a city's population. The data are in ex1data1.txt: the first column is the population of a city, and the second column is the profit of a snack stand in that city.

8.1 Plotting the Data

Read in the data, then display it.

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:

path =  '../ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
data.head()

Out [2]:

*(output: the first five rows of the DataFrame — the Population and Profit columns)*

In [3]:

data.plot(kind='scatter', x='Population', y='Profit', figsize=(12,8))
plt.show()

*(figure: scatter plot of Profit against Population)*

8.2 Gradient descent

In this part, you train the parameters θ of linear regression on the existing dataset using gradient descent.

8.2.1 Formula

The quantity to minimize is the squared-error cost from Section 2:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

# This part computes J(θ); X is a matrix
def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
# Call it
computeCost(X, y, theta)

8.2.2 Implementation

First, insert a column of ones into the data so that the intercept θ₀ is handled by the same matrix computation as θ₁ (its feature value is always 1).

In [4]:

data.insert(0, 'Ones', 1)

Now let's do some variable initialization.

In [5]:

# Initialize X and y
cols = data.shape[1]
X = data.iloc[:,:-1]  # X is every column of data except the last
y = data.iloc[:,cols-1:cols]  # y is the last column of data

Check that X (the training set) and y (the target variable) look correct.

In [6]:

X.head()  # head() shows the first 5 rows

Out [6]:

*(output: the first five rows of X — the Ones and Population columns)*

In [7]:

y.head()

Out [7]:

*(output: the first five rows of y — the Profit column)*

The cost function expects numpy matrices, so we need to convert X and y before we can use them. We also need to initialize theta.

In [8]:

X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0,0]))

In [9]:

X.shape, theta.shape, y.shape

Out [9]:

((97, 2), (1, 2), (97, 1))

8.2.3 Computing J(θ)

Compute the cost function with the initial θ (all zeros); the answer should be about 32.07.

In [10]:

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
# This part computes J(θ); X is a matrix
computeCost(X, y, theta)

Out [10]:

32.072733877455676

8.2.4 Gradient descent

Each iteration applies the batch update to every parameter simultaneously:

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

In [11]:

def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))   # buffer so both parameters update simultaneously
    parameters = int(theta.ravel().shape[1])  # number of parameters (here 2)
    cost = np.zeros(iters)                    # cost recorded after each iteration
    
    for i in range(iters):
        error = (X * theta.T) - y             # h(x) - y for every training example
        
        for j in range(parameters):
            term = np.multiply(error, X[:,j]) # (h(x) - y) * x_j
            temp[0,j] = theta[0,j] - ((alpha / len(X)) * np.sum(term))
            
        theta = temp                          # commit all updates at once
        cost[i] = computeCost(X, y, theta)
        
    return theta, cost
# This part implements the update of θ

Initialize some additional variables: the learning rate α and the number of iterations to perform.

In [12]:

alpha = 0.01
iters = 1500

Now let's run the gradient descent algorithm to fit our parameter θ to the training set.

In [13]:

g, cost = gradientDescent(X, y, theta, alpha, iters)
g

Out [13]:

matrix([[-3.63029144, 1.16636235]])

In [14]:

# Predict snack-stand profits for city populations of 35,000 and 70,000
predict1 = [1,3.5]*g.T
print("predict1:",predict1)
predict2 = [1,7]*g.T
print("predict2:",predict2)

predict1: [[0.45197679]]

predict2: [[4.53424501]]

In [15]:

x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()
# Plot the raw data together with the fitted line

*(figure: the training data with the fitted regression line)*
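Because gradientDescent also returns the cost at every iteration, a quick extra check (a sketch of my own, continuing the notebook session, not in the original post) is to plot the cost against the iteration number and confirm that it decreases:

```python
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost, 'r')  # cost array returned by gradientDescent
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Cost vs. Iteration')
plt.show()
```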

8.3 Visualizing J(θ)

*(figures: surface and contour plots of J(θ₀, θ₁))*

These plots are not reproduced in Python here; screenshots from the course are used for illustration.
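If you do want to reproduce the contour view in Python, here is a minimal sketch of my own (an assumption, not the original post's code) that continues the notebook session, reusing computeCost, X, y, and g from above:

```python
theta0_vals = np.linspace(-10, 10, 100)
theta1_vals = np.linspace(-1, 4, 100)
J_vals = np.zeros((len(theta0_vals), len(theta1_vals)))

# Evaluate the cost over a grid of (theta0, theta1) pairs
for i, t0 in enumerate(theta0_vals):
    for j, t1 in enumerate(theta1_vals):
        J_vals[i, j] = computeCost(X, y, np.matrix([t0, t1]))

# Logarithmically spaced levels make the bowl shape visible
plt.contour(theta0_vals, theta1_vals, J_vals.T, levels=np.logspace(0, 3, 20))
plt.plot(g[0, 0], g[0, 1], 'rx')  # the theta found by gradient descent
plt.xlabel('theta0')
plt.ylabel('theta1')
plt.show()
```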


Origin: blog.csdn.net/qq_41355222/article/details/123984190