Andrew Ng's Machine Learning homework -- ex1

Univariate linear regression

Question: ex1data1.txt contains two columns of data. The first column is the population of a city, and the second is the profit of a snack bar in that city. The goal is to predict a snack bar's profit from its city's population.

Formulas needed
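The original formula images did not survive extraction. For reference, the standard formulas for this exercise (hypothesis, cost function, and gradient descent update for univariate linear regression) are:

```latex
h_\theta(x) = \theta_0 + \theta_1 x

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```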

Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

numpy is used for matrix operations, pandas is used for working with tabular data, and matplotlib is used for visualization.

Read the data set

path = "D:\\ML\\ML\\Linear Regression with One Variable\\ex1data1.txt"
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])  # read the file
data.head()  # show the first rows
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12, 8))
plt.show()

Compute the loss function J(θ)

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)  # squared errors
    return np.sum(inner) / (2 * len(X))
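A quick sanity check of the cost function on a tiny hand-made matrix (hypothetical values, not the exercise data): with theta = [0, 0], every prediction is 0, so the cost is the mean of the squared targets divided by 2.

```python
import numpy as np

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)  # squared errors
    return np.sum(inner) / (2 * len(X))

# Three samples with an intercept column of ones
X = np.matrix([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.matrix([[1.0], [2.0], [3.0]])
theta = np.matrix([0.0, 0.0])

print(computeCost(X, y, theta))  # (1 + 4 + 9) / (2 * 3) = 2.333...
```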

Processing the data set

data.insert(0, 'Ones', 1)  # insert a column named 'Ones' at position 0 (the intercept term)

# initialize the variables X and y
# data.shape is (number of rows, number of columns), so data.shape[1] is the column count
cols = data.shape[1]
# iloc slicing is [start row : end row, start column : end column], end index excluded, 0-based
X = data.iloc[:, :cols-1]      # every column except the last
y = data.iloc[:, cols-1:cols]  # the last column only

X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0, 0]))  # array to matrix; note the difference between array and matrix
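After this preparation, the shapes should line up as (m, 2) for X, (m, 1) for y, and (1, 2) for theta. A self-contained check using a tiny synthetic frame standing in for ex1data1.txt (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ex1data1.txt, for illustration only
data = pd.DataFrame({'Population': [6.1, 5.5, 8.5], 'Profit': [17.6, 9.1, 13.7]})
data.insert(0, 'Ones', 1)

cols = data.shape[1]
X = np.matrix(data.iloc[:, :cols-1].values)      # all columns except the last
y = np.matrix(data.iloc[:, cols-1:cols].values)  # last column only
theta = np.matrix(np.array([0, 0]))

print(X.shape, y.shape, theta.shape)  # (3, 2) (3, 1) (1, 2)
```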

Gradient descent algorithm

def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.ravel().shape[1])  # ravel flattens to one dimension; this counts the parameters
    cost = np.zeros(iters)
    for i in range(iters):
        error = (X * theta.T) - y
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp
        cost[i] = computeCost(X, y, theta)
    return theta, cost
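To convince yourself the loop works, you can run it on synthetic data where the answer is known. This sketch (my own check, not part of the original exercise) fits y = 2x and should recover theta close to [0, 2], with the recorded cost falling toward zero:

```python
import numpy as np

def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.ravel().shape[1])
    cost = np.zeros(iters)
    for i in range(iters):
        error = (X * theta.T) - y
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp
        cost[i] = computeCost(X, y, theta)
    return theta, cost

# Synthetic data: y = 2x, with an intercept column of ones
xs = np.arange(0, 1, 0.1)
X = np.matrix(np.column_stack([np.ones_like(xs), xs]))
y = np.matrix((2 * xs).reshape(-1, 1))

g, cost = gradientDescent(X, y, np.matrix([0.0, 0.0]), 0.1, 5000)
print(g)         # close to [[0, 2]]
print(cost[-1])  # near 0, and well below cost[0]
```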

Set the learning rate and number of iterations and execute

alpha = 0.01
iters = 1500
g,cost = gradientDescent(X,y,theta,alpha,iters)

Visualization

# visualization
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)
fig, ax = plt.subplots(figsize=(12, 8))  # fitted line
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()
plt.plot(range(iters), cost)  # cost per iteration
plt.show()

pandas:

pd.read_csv(filename): read data from a CSV file

data.head(n): show the first n rows of the data (5 by default)

data.plot(kind='scatter', x='Population', y='Profit', figsize=(12, 8)): draw a scatter plot

data.shape: the number of rows and columns of the data as a tuple (an attribute, not a method)

data.shape[1]: the number of columns

data.iloc[:, :-1]: slice part of the data; the form is [start row : end row, start column : end column], with the end index excluded

data.insert(0, 'Ones', 1): insert a column named 'Ones' at position 0, filled with the value 1
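The pandas calls above in one short runnable sketch, using a tiny hypothetical frame:

```python
import pandas as pd

# Tiny frame standing in for the real dataset (hypothetical values)
data = pd.DataFrame({'Population': [6.1, 5.5, 8.5], 'Profit': [17.6, 9.1, 13.7]})

print(data.head(2))             # first 2 rows (default is 5)
print(data.shape)               # (3, 2) -- an attribute, not a method
print(data.shape[1])            # 2 columns
features = data.iloc[:, :-1]    # every column except the last
data.insert(0, 'Ones', 1)       # new first column of 1s
print(list(data.columns))       # ['Ones', 'Population', 'Profit']
```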

numpy:

np.power(data, 2): square each element of the matrix

np.sum(data): sum the elements of the matrix

np.matrix(X.values): convert an array into a matrix

data.ravel(): flatten the data to one dimension

np.multiply: element-wise multiplication; for matrix objects, * is matrix multiplication

x = np.linspace(data.Population.min(), data.Population.max(), 100): generate an evenly spaced sequence of 100 elements; the first two arguments are the start and end of the sequence
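These numpy operations, demonstrated on a small matrix; note in particular that * on np.matrix objects is matrix multiplication while np.multiply stays element-wise:

```python
import numpy as np

a = np.matrix([[1, 2], [3, 4]])
b = np.matrix([[5, 6], [7, 8]])

print(np.power(a, 2))        # element-wise square: [[1, 4], [9, 16]]
print(np.sum(a))             # 10
print(np.multiply(a, b))     # element-wise: [[5, 12], [21, 32]]
print(a * b)                 # matrix product: [[19, 22], [43, 50]]
print(a.ravel())             # flattens a matrix to shape (1, 4)
print(np.linspace(0, 1, 5))  # 5 evenly spaced values from 0 to 1
```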

Origin www.cnblogs.com/Sunqingyi/p/12712442.html