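In this part, the neural network uses only pre-trained parameters; the goal is to get a feel for how forward propagation in a neural network works.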

# Packages required by the code below
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio
import matplotlib
import scipy.optimize as opt
from sklearn.metrics import classification_report  # produces the evaluation report

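1 Loading the data

The data file contains the pixel data of the images together with their labels (that is, which digit each image shows).
Key facts:
1. There are 5000 labels, one per image.
2. The image data is a 5000x400 matrix (each original image is 20x20 pixels, so each image contributes 400 features). Because of how the original data is laid out, each image has to be transposed when the data is read (see the code).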
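
To see why the transpose is needed, here is a minimal sketch on a tiny 2x3 "image". It assumes, as the transpose in the loader suggests, that each row of the file stores an image flattened column by column (MATLAB order). The names `img` and `flat_matlab` are illustrative and do not come from the dataset.

```python
import numpy as np

# A tiny 2x3 "image" standing in for a 20x20 digit (illustrative values only).
img = np.array([[1, 2, 3],
                [4, 5, 6]])

# Assumption: the file stores each image flattened column by column (MATLAB order).
flat_matlab = img.flatten(order='F')  # [1, 4, 2, 5, 3, 6]

# NumPy reshapes row by row, so reshape to (cols, rows) and transpose to restore the image.
restored = flat_matlab.reshape((3, 2)).T

# Equivalent alternative: reshape directly in column-major order.
restored_f = flat_matlab.reshape((2, 3), order='F')

print(np.array_equal(restored, img))
print(np.array_equal(restored_f, img))
```

For the square 20x20 digits, `reshape((20, 20)).T` plays the same role as the `reshape((3, 2)).T` step here.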

# Define the data-loading function
def load_data(path, transpose=True):
    data = sio.loadmat(path)
    y = data.get('y')  # (5000,1)
    y = y.reshape(y.shape[0])  # flatten (5000, 1) into a 1-D array of shape (5000,)

    X = data.get('X')  # (5000,400)

    if transpose:
        # for this dataset, you need a transpose to get the orientation right
        X = np.array([im.reshape((20, 20)).T for im in X])

        # then flatten each image again to keep the vector representation
        X = np.array([im.reshape(400) for im in X])

    return X, y

# Load the data
X, y = load_data('ex3data1.mat')
print('X.shape:', X.shape)  # X.shape: (5000, 400)
print('y.shape:', y.shape)  # y.shape: (5000,), a 1-D array

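2 Plotting the digit images

Use Matplotlib's matplotlib.axes.Axes.matshow() to draw an image matrix as an image:
1. The first argument, Z, is the matrix to display.
2. The keyword argument cmap selects the colormap.
Pick one image at random and draw it, as shown in Figure 1.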

# Plotting function
def plot_an_image(image):
    #     """
    #     image : (400,)
    #     """
    fig, ax = plt.subplots(figsize=(1, 1))
    ax.matshow(image.reshape((20, 20)), cmap=matplotlib.cm.binary)
    plt.xticks(np.array([]))  # just get rid of ticks
    plt.yticks(np.array([]))
    
pick_one = np.random.randint(0, 5000)
image_to_draw = X[pick_one, :]
print("image_to_draw.shape:", image_to_draw.shape)
plot_an_image(image_to_draw)
plt.show()
print('this should be {}'.format(y[pick_one]))

[Figure 1: a randomly selected digit image]

Draw 100 digit images, as shown in Figure 2.

def plot_100_image(X):
    """ sample 100 image and show them
    assume the image is square

    X : (5000, 400)
    """
    size = int(np.sqrt(X.shape[1]))  # side length in pixels of each original image

    # sample 100 images; each one is reshaped to (size, size) when drawn
    sample_idx = np.random.choice(np.arange(X.shape[0]), 100)  # 100 random row indices (drawn with replacement)
    sample_images = X[sample_idx, :]

    fig, ax_array = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True, figsize=(8, 8))

    for r in range(10):
        for c in range(10):
            ax_array[r, c].matshow(sample_images[10 * r + c].reshape((size, size)),
                                   cmap=matplotlib.cm.binary)
            plt.xticks(np.array([]))
            plt.yticks(np.array([]))
            
plot_100_image(X)
plt.show()

[Figure 2: 100 randomly sampled digit images]

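3 Preparing the data

Preparing X: as in linear regression, a bias term $x_0=1$ has to be added to the data. It is inserted as the first column.
Preparing y: the loaded y is a 1-D array of shape (5000,), where each element is the digit the image represents (digit 0 is stored as 10). To support multi-class classification with one logistic regression classifier per digit, y is expanded into 10 rows of binary labels: an image shows digit $k$ exactly when its entry in row $k$ is 1. The conversion of the original labels is shown in Figure 3.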

[Figure 3: converting the original labels into binary label vectors]
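
The label expansion can also be written without a loop. The following is a minimal sketch on five toy labels, assuming the dataset's convention that label 10 stands for digit 0; `digits` and `y_onehot` are illustrative names.

```python
import numpy as np

# Toy labels in the dataset's convention: 1..9 are digits 1..9, 10 stands for digit 0.
raw_y = np.array([10, 1, 2, 10, 9])

# Map label 10 back to digit 0; the other labels are unchanged.
digits = raw_y % 10

# Row k is 1 exactly where the sample shows digit k; the result has shape (10, m).
y_onehot = (np.arange(10)[:, None] == digits[None, :]).astype(int)

print(y_onehot.shape)
print(y_onehot[0])
```

Each column contains exactly one 1, so every sample belongs to exactly one of the ten binary problems as a positive example.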

# Prepare X
raw_X, raw_y = load_data('ex3data1.mat')
print('raw_X.shape:', raw_X.shape)
print('raw_y.shape:', raw_y.shape)

# As in linear regression, add the bias term x_0 = 1 as an intercept column
X = np.insert(raw_X, 0, values=np.ones(raw_X.shape[0]), axis=1)  # insert a first column of all ones
print('X.shape:', X.shape)

# Prepare y
# y has 10 categories, 1..10; digit 0 is stored as category 10 because MATLAB indexing starts at 1
# below, digit 0 is moved back to index 0
y_matrix = []

for k in range(1, 11):
    y_matrix.append((raw_y == k).astype(int))  # binary labels for class k; see the figure "向量化标签.png"

# the last entry is k == 10, which is digit 0; move it to the first position
y_matrix = [y_matrix[-1]] + y_matrix[:-1]
y = np.array(y_matrix)

print('y.shape:', y.shape)  # (10,5000)
print('Expanded label set:')
print(y)
print(type(y))

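4 Building multiple logistic regression classifiers

4.1 Building the cost function and the gradient function

First build the versions without a regularization term. The gradient has the same form as in linear regression; only the hypothesis (sigmoid) and the cost (cross-entropy) differ.

# Define the cost function and its gradient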
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost(theta, X, y):
    ''' cost fn is -l(theta) for you to minimize'''
    return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))


def gradient(theta, X, y):
    '''just 1 batch gradient'''
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - y)
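
For reference, `cost` and `gradient` above correspond to the standard logistic regression cost and its gradient. The notation here is an assumption rather than something taken from the original text: $m$ is the number of samples, $\sigma$ is the sigmoid, and $h_\theta(x)=\sigma(\theta^T x)$.

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log h_\theta\big(x^{(i)}\big)
          + \big(1-y^{(i)}\big)\log\big(1-h_\theta\big(x^{(i)}\big)\big) \Big],
\qquad
\nabla_\theta J(\theta) = \frac{1}{m}\, X^{T}\big(\sigma(X\theta) - y\big)
```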

Next, build the regularized versions:

# On top of these, define the regularized cost function and gradient
def regularized_cost(theta, X, y, l=1):
    '''you don't penalize theta_0'''
    theta_j1_to_n = theta[1:]
    regularized_term = (l / (2 * len(X))) * np.power(theta_j1_to_n, 2).sum()

    return cost(theta, X, y) + regularized_term


def regularized_gradient(theta, X, y, l=1):
    '''still, leave theta_0 alone'''
    theta_j1_to_n = theta[1:]
    regularized_theta = (l / len(X)) * theta_j1_to_n

    # by doing this, no offset is on theta_0
    regularized_term = np.concatenate([np.array([0]), regularized_theta])

    return gradient(theta, X, y) + regularized_term
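
A quick way to gain confidence in an analytic gradient is to compare it with finite differences. The sketch below restates the regularized cost and gradient in compact form so that it runs on its own; the helper `numerical_gradient` and the random toy data are assumptions made for illustration, not part of the original exercise.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def regularized_cost(theta, X, y, l=1):
    h = sigmoid(X @ theta)
    data_term = np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
    return data_term + (l / (2 * len(X))) * np.sum(theta[1:] ** 2)

def regularized_gradient(theta, X, y, l=1):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(X)
    grad[1:] += (l / len(X)) * theta[1:]  # theta_0 is not penalized
    return grad

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time (hypothetical helper)."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Random toy problem: 20 samples, 3 features plus a bias column of ones.
rng = np.random.default_rng(0)
X_toy = np.insert(rng.normal(size=(20, 3)), 0, 1.0, axis=1)
y_toy = rng.integers(0, 2, size=20)
theta_toy = rng.normal(scale=0.1, size=4)

analytic = regularized_gradient(theta_toy, X_toy, y_toy)
numeric = numerical_gradient(lambda t: regularized_cost(t, X_toy, y_toy), theta_toy)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the two gradients disagree by more than a small tolerance, the usual suspects are a missing 1/m factor or a penalty applied to theta_0.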

Build the prediction function:

def predict(x, theta):
    prob = sigmoid(x @ theta)
    return (prob >= 0.5).astype(int)

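4.2 Building the main logistic regression function

This logistic regression function serves as a single binary classifier; one such classifier is trained for each of the 10 digits.

# Define the main logistic regression function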
def logistic_regression(X, y, l=1):
    """generalized logistic regression
    args:
        X: feature matrix, (m, n+1) # with intercept x0=1
        y: target vector, (m, )
        l: lambda constant for regularization

    return: trained parameters
    """
    # init theta
    theta = np.zeros(X.shape[1])

    # train it
    res = opt.minimize(fun=regularized_cost,
                       x0=theta,
                       args=(X, y, l),
                       method='TNC',
                       jac=regularized_gradient,
                       options={'disp': True})
    # get trained parameters
    final_theta = res.x

    return final_theta

Test the logistic regression:

print('y[0].shape:', y[0].shape)  # y[0].shape: (5000,)
t0 = logistic_regression(X, y[0])  # quick test: train the classifier for digit 0 on all 5000 samples
print('t0.shape:', t0.shape)  # t0.shape: (401,)
y_pred = predict(X, t0)
print('Accuracy={}'.format(np.mean(y[0] == y_pred)))  # Accuracy=0.9974

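4.3 Training

Train the k-class model directly, that is, obtain all ten parameter vectors in one go.

# Train the k-class model
k_theta = np.array([logistic_regression(X, y[k]) for k in range(10)])  # call logistic regression in a loop to train all ten classifiers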
print('k_theta.shape:', k_theta.shape)  # k_theta.shape: (10, 401)

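Prediction: because ten classifiers are evaluated at once, the final parameter matrix cannot be passed to the prediction function as is; it has to be transposed first, as shown in Eq. (1). Each entry of the resulting (5000, 10) matrix is the prediction for one sample under one set of parameters.

$$X\theta^T=(X)_{(5000,401)}(\theta^T)_{(401,10)}=(X\theta^T)_{(5000,10)} \tag{1}$$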
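
The same matrix product and the subsequent arg-max can be traced by hand on a tiny case. This is a minimal sketch with three samples and three classes; the hand-picked `theta_toy` values are assumptions chosen so that the result is easy to verify, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Three toy samples with a leading bias feature of 1 (values are illustrative).
X_toy = np.array([[1.0,  2.0,  0.0],
                  [1.0,  0.0,  2.0],
                  [1.0, -2.0, -2.0]])

# One row of parameters per class, so theta_toy has shape (3 classes, 3 features).
theta_toy = np.array([[0.0,  1.0,  0.0],    # class 0 favours a large first feature
                      [0.0,  0.0,  1.0],    # class 1 favours a large second feature
                      [0.0, -1.0, -1.0]])   # class 2 favours both features negative

prob = sigmoid(X_toy @ theta_toy.T)  # shape (3 samples, 3 classes)
pred = np.argmax(prob, axis=1)       # most confident classifier for each sample

print(prob.shape)
print(pred)
```

Row i of `prob` holds the outputs of all classifiers for sample i, which is why the arg-max is taken along axis 1.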

# Predict
prob_matrix = sigmoid(X @ k_theta.T)
np.set_printoptions(suppress=True)  # turn off scientific notation in NumPy output so the values are easier to read
print('prob_matrix:\n', prob_matrix)
y_pred = np.argmax(prob_matrix, axis=1)  # index of the maximum along each row (axis=1)
print('y_pred:', y_pred) # y_pred: [0 0 0 ... 9 9 7]
print('y_pred.shape:',y_pred.shape) # y_pred.shape: (5000,)

4.4 Evaluation

Evaluate the predictions with the classification report provided by scikit-learn; the result is shown in Figure 4.

y_answer = raw_y.copy()
y_answer[y_answer == 10] = 0  # digit 0 is stored as 10 in the original dataset, so map it back first
print(classification_report(y_answer, y_pred))

[Figure 4: classification report for the logistic regression classifiers]
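
The report lists precision and recall for every class. As a rough illustration of what those two columns mean, the sketch below computes them by hand for a toy 3-class problem; the arrays and the `report` dictionary are made up for this example.

```python
import numpy as np

# Toy ground truth and predictions for a 3-class problem (illustrative values).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 0])

report = {}
for k in range(3):
    tp = np.sum((y_hat == k) & (y_true == k))    # correctly predicted as class k
    precision = float(tp / np.sum(y_hat == k))   # share of class-k predictions that are right
    recall = float(tp / np.sum(y_true == k))     # share of true class-k samples that are found
    report[k] = (precision, recall)

accuracy = float(np.mean(y_hat == y_true))
print(report)
print(accuracy)
```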

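5 Classifying with a neural network

1. Logistic regression is a linear classifier and cannot fit more complex hypothesis functions.

2. A neural network can fit nonlinear hypothesis functions and can therefore represent more complex models.

3. The prediction below therefore uses a neural network. For now it uses **pre-trained network weights** and only runs forward propagation to make predictions.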
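
A small example of a function that a single logistic regression unit cannot represent is XNOR, because its positive and negative inputs are not linearly separable. The sketch below shows a two-layer sigmoid network with hand-picked (not trained) weights that computes it; the weight values and the names `theta1_toy` and `theta2_toy` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# All four binary inputs (x1, x2), with a leading bias column of ones.
X_toy = np.array([[1, 0, 0],
                  [1, 0, 1],
                  [1, 1, 0],
                  [1, 1, 1]], dtype=float)

# Hand-picked weights: hidden unit 1 acts like AND, hidden unit 2 like
# (NOT x1) AND (NOT x2), and the output unit like OR of the two hidden units.
theta1_toy = np.array([[-30.0,  20.0,  20.0],
                       [ 10.0, -20.0, -20.0]])
theta2_toy = np.array([[-10.0, 20.0, 20.0]])

a2 = sigmoid(X_toy @ theta1_toy.T)        # hidden activations, shape (4, 2)
a2 = np.insert(a2, 0, 1.0, axis=1)        # add the hidden bias unit
a3 = sigmoid(a2 @ theta2_toy.T).ravel()   # network output, shape (4,)
xnor = (a3 >= 0.5).astype(int)

print(xnor)
```

The output is 1 when both inputs agree and 0 otherwise, which no single linear decision boundary in the (x1, x2) plane can achieve.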

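5.1 Network structure

The network has 3 layers: an input layer (a 400-dimensional vector plus a bias unit), one hidden layer, and an output layer. The structure is shown in Figure 5.
1. The hidden layer has 25 neurons.
2. The output layer has 10 neurons, one per digit class.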

[Figure 5: structure of the neural network]

5.2 Loading the network weights

The loading code is as follows:

# Load the pre-trained neural network weights
def load_weight(path):
    data = sio.loadmat(path)
    return data['Theta1'], data['Theta2']


theta1, theta2 = load_weight('ex3weights.mat')
print('theta1.shape:', theta1.shape) # theta1.shape: (25, 401)
print('theta2.shape:', theta2.shape) # theta2.shape: (10, 26)

As Figure 5 shows, after the computation with $\theta^{(1)}$ the hidden layer gains a bias unit (one unit added to the original 25), which is why theta1.shape[0]=25 while theta2.shape[1]=26.
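
The weight shapes follow directly from the layer sizes. A short sketch of that bookkeeping, using the sizes stated above (the variable names are illustrative):

```python
# Layer sizes from the network description: 400 inputs, 25 hidden units, 10 outputs.
n_input, n_hidden, n_output = 400, 25, 10

# Each weight matrix has one row per unit in the next layer and
# one column per unit in the current layer plus one for the bias.
theta1_shape = (n_hidden, n_input + 1)   # (25, 401)
theta2_shape = (n_output, n_hidden + 1)  # (10, 26)

n_params = theta1_shape[0] * theta1_shape[1] + theta2_shape[0] * theta2_shape[1]
print(theta1_shape, theta2_shape, n_params)
```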

5.3 Loading the data

# Load the data. The network weights were trained on data that was not transposed, so no transpose is applied here
X, y = load_data('ex3data1.mat', transpose=False)

X = np.insert(X, 0, values=np.ones(X.shape[0]), axis=1)  # intercept
print('X.shape:', X.shape) # X.shape: (5000, 401)
print('y.shape:', y.shape) # y.shape: (5000,)

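5.4 Forward propagation

# Forward propagation
# 1 - output of the first (input) layer
a1 = X
# 2 - combine the first layer with the first set of parameters to get the input of the second layer
z2 = a1 @ theta1.T  # (5000, 401) @ (25, 401).T = (5000, 25)
print('z2.shape:', z2.shape)
# 3 - apply the activation function, then add the hidden layer's bias unit to get the second layer's output
#     (the bias is added after the sigmoid so that it stays exactly 1 instead of sigmoid(1))
a2 = sigmoid(z2)
a2 = np.insert(a2, 0, values=np.ones(a2.shape[0]), axis=1)
print('a2.shape:', a2.shape)  # a2.shape: (5000, 26)
# 4 - compute the input of the third layer
z3 = a2 @ theta2.T
print('z3.shape:', z3.shape)  # z3.shape: (5000, 10)
# 5 - compute the output of the third layer
a3 = sigmoid(z3)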
print('a3.shape:', a3.shape)
print(a3)
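
The steps above can be wrapped in one reusable function. This is a minimal sketch: `forward` is a hypothetical helper, and the weights are random stand-ins with the same shapes as the pre-trained ones, so the outputs are not meaningful predictions. In this sketch the hidden bias unit is appended after the activation so that it equals exactly 1.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, theta1, theta2):
    """X: (m, n) without bias; theta1: (h, n+1); theta2: (k, h+1). Returns (m, k)."""
    a1 = np.insert(X, 0, 1.0, axis=1)    # add the input bias unit
    a2 = sigmoid(a1 @ theta1.T)          # hidden activations, (m, h)
    a2 = np.insert(a2, 0, 1.0, axis=1)   # add the hidden bias unit after the activation
    return sigmoid(a2 @ theta2.T)        # output activations, (m, k)

# Random stand-ins with the shapes used in this exercise (not the real weights).
rng = np.random.default_rng(0)
X_demo = rng.random((5, 400))
theta1_demo = rng.normal(scale=0.1, size=(25, 401))
theta2_demo = rng.normal(scale=0.1, size=(10, 26))

out = forward(X_demo, theta1_demo, theta2_demo)
print(out.shape)
```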

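5.5 Comparing the predictions

# Evaluate the predictions
# The original data was saved from MATLAB with labels 1-10, so add 1 to the 0-based index of the row-wise maximum (axis=1)
y_pred = np.argmax(a3, axis=1) + 1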
print('y_pred.shape:', y_pred.shape) # y_pred.shape: (5000,)
print(classification_report(y, y_pred))
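
The index shift can be checked on a toy output matrix. The sketch assumes the label convention described earlier (labels 1..10, with 10 standing for digit 0); `a3_toy` is a made-up output in which each row has a single confident unit.

```python
import numpy as np

# Toy network outputs: rows pick output units 9, 0 and 4 (0-based).
a3_toy = np.eye(10)[[9, 0, 4]]

labels = np.argmax(a3_toy, axis=1) + 1  # MATLAB-style labels 1..10
digits = labels % 10                    # label 10 maps back to digit 0

print(labels)
print(digits)
```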
