[Machine Learning] Andrew Ng Homework 3.0: Implementing the Handwritten Digit Multi-Class Classification Problem with Logistic Regression in Python

3.0 Multi-class logistic regression case: the handwritten digit classification problem

Recognize handwritten digits (0 to 9) using logistic regression and a neural network. This part implements logistic regression and applies it to one-vs-all classification.

Data: The data is stored in .mat format, MATLAB's data storage format. It saves data as matrices and is compatible with numpy arrays, making it convenient for mathematical operations, so numpy is used for the computations here.

There are 5000 training examples in ex3data1, where each training example is a 20 pixel × 20 pixel grayscale image of a digit. Each pixel is represented by a floating-point number indicating the grayscale intensity at that location. Each 20×20 grid of pixels is unrolled into a 400-dimensional vector, and each training example becomes one row of the data matrix X. This gives a 5000×400 matrix X in which every row is a training example of a handwritten digit image.

The second part of the training set is a 5000-dimensional vector y containing the labels for the training set: the digit "0" is labeled "10", while the digits "1" to "9" are labeled "1" to "9" in their natural order.

One-vs-all: train one regularized binary logistic regression classifier per class, treating class i as positive and all other classes as negative; at prediction time, pick the class whose classifier outputs the highest probability.
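In symbols (a standard formulation, consistent with the code below): for each class $i \in \{1,\dots,10\}$ we fit a parameter vector $\theta^{(i)}$ so that

$$h_\theta^{(i)}(x) = \sigma\big(\theta^{(i)\top} x\big) \approx P(y = i \mid x),$$

and the predicted class is $\hat{y} = \operatorname{argmax}_i \; h_\theta^{(i)}(x)$.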

 

Supplement

Python syntax involved

1. a = np.insert(arr, obj, values, axis)

arr is the original array (one- or multi-dimensional); obj is the index at which to insert; values is the content to insert; axis chooses whether to insert by row or by column (0: row, 1: column).

2. a.flatten() flattens a multi-dimensional array into one dimension.

3. .shape[0] is the number of rows of a matrix; .shape[1] is the number of columns.

4. for i in range():

range() is a function; for i in range() assigns each value produced by range() to i in turn.

range(start, stop[, step]) takes the start, the end (exclusive), and the step. range(3) runs from 0 up to, but not including, 3, i.e. 0, 1, 2.

5. np.argmax()

Returns the index of the maximum element in an array (indexing starts from 0).

For a two-dimensional array there are two axis directions: with axis=0 the maximum is searched down each column, and with axis=1 across each row; with no axis argument the array is flattened first.

6. np.power(x, y) computes x raised to the power y, element-wise.

7. from scipy.optimize import minimize imports SciPy's built-in optimization routine; the initial parameter vector x0 passed to it must be one-dimensional. A short demo of these operations follows below.
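A minimal sketch exercising the numpy calls above (the array a here is purely illustrative, not part of the homework data):

import numpy as np

a = np.array([[1, 2], [3, 4]])
print(np.insert(a, 0, values=9, axis=1))  # insert a column of 9s at column 0 -> [[9 1 2] [9 3 4]]
print(a.flatten())                        # [1 2 3 4]
print(a.shape[0], a.shape[1])             # rows and columns: 2 2
print(np.argmax(a, axis=1))               # index of the max in each row -> [1 1]
print(np.power(a, 2))                     # element-wise square -> [[ 1  4] [ 9 16]]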

 Code

1. Read the data: sio.loadmat returns a dict after reading the .mat file

import numpy as np
import matplotlib.pyplot as plt
import scipy.io as sio

# Read in the data
path = 'ex3data1.mat'
data = sio.loadmat(path)
print(data)
print(type(data))
print(data.keys())
raw_X = data['X']
raw_Y = data['y']
# (5000, 400)
print(raw_X.shape)
# (5000, 1)
print(raw_Y.shape)

2. Plot the digits in the data set

def plot_an_image(X):
    pick_one = np.random.randint(5000)  # pick one random example
    image = X[pick_one, :]
    fig, ax = plt.subplots(figsize=(1, 1))  # set the figure size
    ax.imshow(image.reshape(20, 20).T, cmap='gray_r')  # transpose so the digit is upright
    plt.xticks([])  # hide the axis ticks
    plt.yticks([])


plot_an_image(raw_X)
plt.show()


def plot_100_images(X):
    sample_index = np.random.choice(len(X), 100)  # randomly pick 100 examples (sampled with replacement by default)
    images = X[sample_index, :]
    print(images.shape)
    # define a 10x10 grid of subplots
    fig, ax = plt.subplots(ncols=10, nrows=10, figsize=(8, 8), sharex=True, sharey=True)
    # draw one digit in each subplot
    for r in range(10):  # rows
        for c in range(10):  # columns
            ax[r, c].imshow(images[10 * r + c].reshape(20, 20).T, cmap='gray_r')
    # remove the axis ticks
    plt.xticks([])
    plt.yticks([])
    plt.show()

plot_100_images(raw_X)

3. Cost function and gradient

# Cost function: the regularized logistic regression cost to be minimized
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def Cost_Function(theta, X, y, lamda):
    A = sigmoid(X @ theta)
    first = y * np.log(A)
    second = (1 - y) * np.log(1 - A)
    reg = np.sum(np.power(theta[1:], 2)) * (lamda / (2 * len(X)))  # regularization, skipping theta[0]
    return -np.sum(first + second) / len(X) + reg


def gradient_reg(theta, X, y, lamda):
    reg = theta[1:] * (lamda / len(X))
    reg = np.insert(reg, 0, values=0, axis=0)  # prepend a 0 so the bias term is not regularized
    first = (X.T @ (sigmoid(X @ theta) - y)) / len(X)
    return first + reg
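For reference, these two functions implement the standard regularized logistic regression cost and its gradient (the bias term $\theta_0$ is excluded from regularization, matching theta[1:] in the code):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)$$

where $h_\theta(x) = \sigma(\theta^{\top}x)$ and $m$ is the number of training examples.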

4. Data processing

X = np.insert(raw_X, 0, values=1, axis=1)  # insert a column of 1s into X (bias term)
# (5000, 401)
print(X.shape)
y = raw_Y.flatten()  # flatten y to one dimension
# (5000,)
print(y.shape)
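As a quick sanity check (an optional snippet, not part of the original assignment), you can confirm the labels follow the 1-10 convention described above:

print(np.unique(y))  # expected: [ 1  2  3  4  5  6  7  8  9 10]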

5. One-vs-all classification

Use a for loop to train a regularized logistic regression classifier for each digit class, then stack the parameters of the 10 classifiers into a parameter matrix theta_all and return it.

# Use SciPy's built-in optimizer
from scipy.optimize import minimize

# K is the number of classes
def one_vs_all(X, y, lamda, K):
    n = X.shape[1]  # number of columns of X: 401
    theta_all = np.zeros((K, n))  # (10, 401)
    # rows 0 to 9 correspond to classes 1 to 10
    for i in range(1, K + 1):  # i runs from 1 to K, matching labels 1-10
        theta_i = np.zeros(n, )  # minimize requires a one-dimensional x0: (401,)

        res = minimize(fun=Cost_Function,
                       x0=theta_i,
                       args=(X, y == i, lamda),  # y == i gives the binary labels for class i
                       method='TNC',
                       jac=gradient_reg
                       )
        theta_all[i - 1, :] = res.x  # res.x holds the optimized theta for class i
        # row index i-1 maps classes 1-10 to rows 0-9
    return theta_all


lamda = 1
K = 10
theta_final = one_vs_all(X, y, lamda, K)
print(theta_final)
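theta_final stacks one row of optimized parameters per class; a quick shape check (optional):

print(theta_final.shape)  # (10, 401): one 401-dimensional parameter vector per class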

6. Prediction

Compute a 5000×10 matrix of prediction probabilities, find the position of the highest probability in each row to get the predicted class, then compare against the true labels y to get the accuracy.

def predict(X, theta_final):
    # (5000, 401) @ (401, 10) => (5000, 10)
    h = sigmoid(X @ theta_final.T)  # hypothesis: probability of each class for each example
    h_argmax = np.argmax(h, axis=1)  # index of the most probable class in each row
    return h_argmax + 1  # indices 0-9 map to labels 1-10


y_pred = predict(X, theta_final)
acc = np.mean(y_pred == y)
# 0.9446
print(acc)
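As a further sanity check (a hypothetical snippet, not in the original post), classify a single random training example and compare it with its label:

idx = np.random.randint(5000)
print('predicted:', predict(X[idx:idx + 1], theta_final)[0], 'actual:', y[idx])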

Complete code

import numpy as np
import matplotlib.pyplot as plt
import scipy.io as sio

# Read in the data
path = 'ex3data1.mat'
data = sio.loadmat(path)
print(data)
print(type(data))
print(data.keys())
raw_X = data['X']
raw_Y = data['y']
# (5000, 400)
print(raw_X.shape)
# (5000, 1)
print(raw_Y.shape)


def plot_an_image(X):
    pick_one = np.random.randint(5000)  # pick one random example
    image = X[pick_one, :]
    fig, ax = plt.subplots(figsize=(1, 1))  # set the figure size
    ax.imshow(image.reshape(20, 20).T, cmap='gray_r')  # transpose so the digit is upright
    plt.xticks([])  # hide the axis ticks
    plt.yticks([])


plot_an_image(raw_X)
plt.show()


def plot_100_images(X):
    sample_index = np.random.choice(len(X), 100)  # randomly pick 100 examples (sampled with replacement by default)
    images = X[sample_index, :]
    print(images.shape)
    # define a 10x10 grid of subplots
    fig, ax = plt.subplots(ncols=10, nrows=10, figsize=(8, 8), sharex=True, sharey=True)
    # draw one digit in each subplot
    for r in range(10):  # rows
        for c in range(10):  # columns
            ax[r, c].imshow(images[10 * r + c].reshape(20, 20).T, cmap='gray_r')
    # remove the axis ticks
    plt.xticks([])
    plt.yticks([])
    plt.show()


plot_100_images(raw_X)


# Cost function: the regularized logistic regression cost to be minimized
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def Cost_Function(theta, X, y, lamda):
    A = sigmoid(X @ theta)
    first = y * np.log(A)
    second = (1 - y) * np.log(1 - A)
    reg = np.sum(np.power(theta[1:], 2)) * (lamda / (2 * len(X)))  # regularization, skipping theta[0]
    return -np.sum(first + second) / len(X) + reg


def gradient_reg(theta, X, y, lamda):
    reg = theta[1:] * (lamda / len(X))
    reg = np.insert(reg, 0, values=0, axis=0)  # prepend a 0 so the bias term is not regularized
    first = (X.T @ (sigmoid(X @ theta) - y)) / len(X)
    return first + reg


X = np.insert(raw_X, 0, values=1, axis=1)  # insert a column of 1s into X (bias term)
# (5000, 401)
print(X.shape)
y = raw_Y.flatten()  # flatten y to one dimension
# (5000,)
print(y.shape)

# Use SciPy's built-in optimizer
from scipy.optimize import minimize


# K is the number of classes
def one_vs_all(X, y, lamda, K):
    n = X.shape[1]  # number of columns of X: 401
    theta_all = np.zeros((K, n))  # (10, 401)
    # rows 0 to 9 correspond to classes 1 to 10
    for i in range(1, K + 1):  # i runs from 1 to K, matching labels 1-10
        theta_i = np.zeros(n, )  # minimize requires a one-dimensional x0: (401,)

        res = minimize(fun=Cost_Function,
                       x0=theta_i,
                       args=(X, y == i, lamda),  # y == i gives the binary labels for class i
                       method='TNC',
                       jac=gradient_reg
                       )
        theta_all[i - 1, :] = res.x  # res.x holds the optimized theta for class i
        # row index i-1 maps classes 1-10 to rows 0-9
    return theta_all


lamda = 1
K = 10
theta_final = one_vs_all(X, y, lamda, K)
print(theta_final)


def predict(X, theta_final):
    # (5000, 401) @ (401, 10) => (5000, 10)
    h = sigmoid(X @ theta_final.T)  # hypothesis: probability of each class for each example
    h_argmax = np.argmax(h, axis=1)  # index of the most probable class in each row
    return h_argmax + 1  # indices 0-9 map to labels 1-10


y_pred = predict(X, theta_final)
acc = np.mean(y_pred == y)
# 0.9446
print(acc)

Summary

Read the data - visualize the data set - cost function - gradient - data processing (add a bias column to X, flatten y to one dimension) - one-vs-all classifier - run the optimizer to obtain the optimal parameters - prediction.
