Python BP Neural Network Implementation (Vectorized Operations, Iris Classification Test)

Introduction

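There are many kinds of artificial neural network models. Classified by the direction in which data flows through the network, they fall into feedforward networks, feedback networks, and self-organizing networks.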

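After working through Andrew Ng's deep learning course, this article summarizes the key points of its chapter on shallow neural networks. The focus is not on explaining the formulas but on a simple implementation of the algorithm; for the underlying theory, see the course itself (深度学习工程师). This article introduces the BP (backpropagation) neural network, a typical feedforward neural network, and implements it in Python.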

Structure

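A BP neural network consists of three kinds of layers: the input layer, the hidden layer, and the output layer. The number of hidden layers can be increased, and the number of neurons in each layer can be adjusted. Each neuron is connected to every neuron in the previous and next layers, but neurons within the same layer are not connected to each other. See the schematic diagram below.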

Principle

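When a BP neural network is used to classify data or make predictions, every pair of connected neurons has a weight, denoted w, and each neuron has a bias, denoted b. Each neuron also applies an activation function, denoted σ(x). Note that σ does not refer to one specific function: several functions can serve as the activation function, such as sigmoid, tanh, and ReLU.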
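For reference, the activation functions named above can be written in NumPy as shown below. The sketch also includes the derivatives of sigmoid and tanh, which backpropagation relies on. This is a minimal sketch; the names `relu`, `sigmoid_prime` and `tanh_prime` are chosen here for illustration and are not part of the article's code.

```python
import numpy as np


def sigmoid(z):
    # maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    # keeps positive values, sets negative values to 0 (element-wise)
    return np.maximum(0.0, z)


# derivatives used by backpropagation
def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_prime(z):
    # corresponds to the factor (1 - A1 ** 2) when A1 = tanh(Z1)
    return 1.0 - np.tanh(z) ** 2


z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # the middle value is exactly 0.5
print(np.tanh(z))   # tanh is built into NumPy; the middle value is 0.0
print(relu(z))      # [0. 0. 2.]
```

In the implementation below, the hidden layer uses tanh and the output layer uses sigmoid.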

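In each training iteration, the forward pass (starting from the input layer) computes the network's output and the error between that output and the target value. The backward pass (starting from the output layer) then adjusts each connection weight in the direction that reduces the error. Repeating this until the error falls below a given small threshold completes the training. The implementation below simply runs a fixed number of iterations.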
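The iterate-until-the-error-is-small loop can be illustrated with a single sigmoid neuron. This is a minimal sketch, not the article's model. The learning rate `lr`, the tolerance `tol` and the cap `max_iter` are hypothetical values chosen for illustration, and the toy task is the logical AND of two inputs.

```python
import numpy as np


def train_until_small_error(X, Y, lr=0.5, tol=0.05, max_iter=10000):
    """Gradient descent on one sigmoid neuron that stops once the cost < tol.

    X: inputs of shape (n_x, m); Y: labels of shape (1, m) with values in {0, 1}.
    lr, tol and max_iter are illustrative values, not taken from the article.
    """
    m = X.shape[1]
    w = np.zeros((1, X.shape[0]))
    b = 0.0
    for i in range(max_iter):
        # forward pass: predictions for all m examples at once
        A = 1.0 / (1.0 + np.exp(-(w @ X + b)))
        # error between predictions and targets (cross-entropy cost)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        if cost < tol:          # error small enough: stop training
            break
        # backward pass: gradients of the cost, then a step against them
        dZ = A - Y
        w -= lr * (dZ @ X.T) / m
        b -= lr * np.sum(dZ) / m
    return w, b, cost, i


X = np.array([[0., 0., 1., 1.],
              [0., 1., 0., 1.]])
Y = np.array([[0., 0., 0., 1.]])    # logical AND of the two inputs
w, b, cost, iters = train_until_small_error(X, Y)
print("stopped after", iters, "iterations with cost", cost)
```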

Python Implementation Outline

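Implementing a BP neural network in Python involves the following main steps:

Determine the network structure
Initialize the weights and biases
Forward propagation
Compute the cost function
Backward propagation
Update the weights and biases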

Determining the Network Structure

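This function gets the size of the input matrix X and of the label matrix Y. Both store one example per column, so their row counts give the sizes of the input and output layers.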

```python
import numpy as np


def layer_size(X, Y):
    """
    :param X: input dataset of shape (input size, number of examples), i.e. (number of attributes, number of samples)
    :param Y: labels of shape (output size, number of examples)
    :return:
    n_x: the size of the input layer
    n_y: the size of the output layer
    """
    n_x = X.shape[0]  # number of input features
    n_y = Y.shape[0]  # number of outputs

    return (n_x, n_y)
```

Initializing the Weights and Biases

This function initializes the connection weights w and the biases b. Make sure each parameter matrix has the correct shape.

```python
def initialize_parameters(n_x, n_h, n_y):
    """
    initialize_parameters
    :param n_x: size of the input layer
    :param n_h: size of the hidden layer
    :param n_y: size of the output layer
    :return:
    parameters: python dictionary containing
        W1: weight matrix of shape (n_h, n_x) (layer 1 weights)
        b1: bias vector of shape (n_h, 1) (layer 1 biases)
        W2: weight matrix of shape (n_y, n_h) (layer 2 weights)
        b2: bias vector of shape (n_y, 1) (layer 2 biases)
    """
    # np.random.seed(2)  # uncomment to make the random initialization reproducible

    # small random weights break the symmetry between hidden units; biases can start at 0
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    parameters = {
        'W1': W1,
        'b1': b1,
        'W2': W2,
        'b2': b2,
    }

    return parameters
```
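To check the shapes, here is a sketch for the Iris setup used later: n_x = 4 features, n_h = 10 hidden neurons, n_y = 1 output and 90 training examples. It repeats the initialization inline so it runs on its own. The seeded generator and the random stand-in data are assumptions added only for illustration. The sketch also shows that each bias column vector is broadcast across all m example columns.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator, only for reproducibility
n_x, n_h, n_y, m = 4, 10, 1, 90

# same scheme as initialize_parameters: small random weights, zero biases
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

X = rng.random((n_x, m))          # stand-in data, one column per example
Z1 = W1 @ X + b1                  # b1 of shape (10, 1) is broadcast to (10, 90)
Z2 = W2 @ np.tanh(Z1) + b2
print(Z1.shape, Z2.shape)         # (10, 90) (1, 90)
```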

Forward Propagation

This function performs forward propagation. Note that the hidden layer uses tanh as its activation function and the output layer uses sigmoid.

```python
def sigmoid(z):
    """Sigmoid activation 1 / (1 + e^(-z)), applied element-wise."""
    return 1 / (1 + np.exp(-z))


def forward_propagation(X, parameters):
    """
    forward_propagation
    :param X: input data of size (n_x, m)
    :param parameters: python dictionary containing the parameters (output of initialize_parameters)
    :return:
    A2: the sigmoid output of the second activation, of shape (1, m)
    cache: a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)            # layer 1 activation: tanh
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)            # layer 2 activation: sigmoid

    assert A2.shape == (1, X.shape[1])  # raise an error if A2 does not have shape (1, m)

    cache = {
        'Z1': Z1,
        'A1': A1,
        'Z2': Z2,
        'A2': A2,
    }

    return A2, cache
```
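In matrix form, forward_propagation computes the quantities below. Each column of X is one example, and the bias vectors are broadcast across the m columns. This gives Z1 and A1 the shape (n_h, m), and Z2 and A2 the shape (1, m).

```latex
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]}, & A^{[1]} &= \tanh\left(Z^{[1]}\right), \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]}, & A^{[2]} &= \sigma\left(Z^{[2]}\right) = \frac{1}{1 + e^{-Z^{[2]}}}.
\end{aligned}
```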

Computing the Cost Function

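This function computes the cost. The loss function measures the error for a single example. The cost function averages that loss over all m examples, and the vectorized code evaluates the cost directly. This implementation does not use the squared error between expected and actual output. It uses the cross-entropy (logistic) loss, which pairs naturally with the sigmoid output layer. For the detailed definitions, see the 深度学习工程师 course.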
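Written out, the loss for a single example and the cost that compute_cost evaluates are given below, where a^{[2](i)} is the output for example i:

```latex
\begin{aligned}
\mathcal{L}(a, y) &= -\left[\, y \log a + (1 - y) \log(1 - a) \,\right], \\
J &= \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(a^{[2](i)}, y^{(i)}\right)
   = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right].
\end{aligned}
```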

```python
def compute_cost(A2, Y, parameters):
    """
    compute cost
    :param A2: the sigmoid output of the second activation, of shape (1, number of examples)
    :param Y: "true" labels vector of shape (1, number of examples)
    :param parameters: python dictionary containing W1, b1, W2 and b2 (not used here; kept for a uniform interface)
    :return:
    cost: cross-entropy cost
    """
    m = Y.shape[1]  # number of examples

    # cross-entropy: -(1/m) * sum(Y * log(A2) + (1 - Y) * log(1 - A2))
    cost = -np.sum(np.multiply(np.log(A2), Y) + np.multiply(np.log(1. - A2), 1. - Y)) / m

    # np.squeeze removes dimensions of size 1 (e.g. shape (5, 1) becomes (5,));
    # float() then turns the result into a plain Python float
    cost = float(np.squeeze(cost))

    assert isinstance(cost, float)  # raise an error if cost is not a float

    return cost
```
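Here is a quick sanity check of the cost formula, sketched with a hypothetical batch of four labels. The weights start close to zero, so Z2 ≈ 0 and A2 ≈ sigmoid(0) = 0.5 for every example. Every term of the sum is then log 0.5, whatever the label, so the initial cost is ln 2 ≈ 0.6931. This agrees with the first line of the training log in the test results (0.693152). The helper `cross_entropy_cost` is a standalone copy of the formula used in compute_cost.

```python
import numpy as np


def cross_entropy_cost(A2, Y):
    # same formula as compute_cost: mean binary cross-entropy over the m columns
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return float(np.squeeze(cost))


Y = np.array([[0.0, 0.5, 1.0, 1.0]])   # hypothetical labels, including 0.5 as in the Iris test
A2 = np.full_like(Y, 0.5)              # the output of sigmoid(0) for every example
print(cross_entropy_cost(A2, Y))       # approximately 0.6931, i.e. ln 2
```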

Backward Propagation

This function performs backward propagation.
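The gradients it computes follow from the chain rule. With a sigmoid output and the cross-entropy cost, the output-layer term simplifies to A2 − Y. The tanh derivative 1 − tanh²(z) supplies the factor 1 − (A1)². Here ⊙ denotes element-wise multiplication, and the sums over i (column sums) correspond to `np.sum(..., axis=1, keepdims=True)`:

```latex
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y, \\
dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} \left(A^{[1]}\right)^{T}, \qquad
db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}, \\
dZ^{[1]} &= \left[\left(W^{[2]}\right)^{T} dZ^{[2]}\right] \odot \left(1 - \left(A^{[1]}\right)^{2}\right), \\
dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{T}, \qquad
db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}.
\end{aligned}
```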

```python
def backward_propagation(parameters, cache, X, Y):
    """
    backward propagation
    :param parameters: python dictionary containing our parameters
    :param cache: a dictionary containing "Z1", "A1", "Z2" and "A2"
    :param X: input data of shape (n_x, number of examples)
    :param Y: "true" labels vector of shape (1, number of examples)
    :return:
    grads: python dictionary containing the gradients with respect to each parameter
    """
    m = X.shape[1]

    W2 = parameters['W2']

    A1 = cache['A1']
    A2 = cache['A2']

    # output layer: sigmoid + cross-entropy gives dZ2 = A2 - Y
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # hidden layer: (1 - A1 ** 2) is the derivative of tanh
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    grads = {
        'dW1': dW1,
        'db1': db1,
        'dW2': dW2,
        'db2': db2,
    }

    return grads
```
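Gradient checking is a common way to verify a backward pass like this one. You nudge each parameter by ±ε, recompute the cost, and compare the centered finite difference with the analytic gradient. The sketch below is self-contained and re-implements the same two-layer network: a tanh hidden layer, a sigmoid output and the cross-entropy cost. It runs on small random data. The helper names (`forward`, `cost`, `backward`, `gradient_check`), the layer sizes, ε and the seed are all illustrative assumptions.

```python
import numpy as np


def forward(params, X):
    A1 = np.tanh(params['W1'] @ X + params['b1'])
    A2 = 1.0 / (1.0 + np.exp(-(params['W2'] @ A1 + params['b2'])))
    return A1, A2


def cost(params, X, Y):
    _, A2 = forward(params, X)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / Y.shape[1]


def backward(params, X, Y):
    # same formulas as backward_propagation
    m = X.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dZ1 = (params['W2'].T @ dZ2) * (1 - A1 ** 2)
    return {'W1': dZ1 @ X.T / m, 'b1': dZ1.sum(axis=1, keepdims=True) / m,
            'W2': dZ2 @ A1.T / m, 'b2': dZ2.sum(axis=1, keepdims=True) / m}


def gradient_check(params, X, Y, eps=1e-6):
    grads = backward(params, X, Y)
    numeric, analytic = [], []
    for name, P in params.items():
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + eps
            plus = cost(params, X, Y)
            P[idx] = old - eps
            minus = cost(params, X, Y)
            P[idx] = old                      # restore the original value
            numeric.append((plus - minus) / (2 * eps))
            analytic.append(grads[name][idx])
    numeric, analytic = np.array(numeric), np.array(analytic)
    # relative difference between the numeric and analytic gradient vectors
    return np.linalg.norm(numeric - analytic) / (np.linalg.norm(numeric) + np.linalg.norm(analytic))


rng = np.random.default_rng(1)
params = {'W1': rng.standard_normal((3, 2)), 'b1': rng.standard_normal((3, 1)),
          'W2': rng.standard_normal((1, 3)), 'b2': rng.standard_normal((1, 1))}
X = rng.standard_normal((2, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
print(gradient_check(params, X, Y))   # a very small value means the gradients agree
```

A large value, for example one produced by dropping the `(1 - A1 ** 2)` factor, points to a bug in the backward pass.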

Updating the Weights and Biases

This function updates the weights and biases.
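Each call performs one step of gradient descent for both layers. Here α is the learning_rate argument:

```latex
W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}, \qquad
b^{[l]} := b^{[l]} - \alpha \, db^{[l]}, \qquad l = 1, 2.
```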

```python
def update_parameters(parameters, grads, learning_rate):
    """
    update parameters
    :param parameters: python dictionary containing the parameters
    :param grads: python dictionary containing the gradients
    :param learning_rate: learning rate (step size of gradient descent)
    :return:
    parameters: python dictionary containing the updated parameters
    """
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    # one gradient descent step: move each parameter against its gradient
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {
        "W1": W1,
        "b1": b1,
        "W2": W2,
        "b2": b2,
    }

    return parameters
```

The BP Neural Network

Combining the functions above gives a two-layer BP neural network model.

```python
def nn_model(X, Y, n_h, num_iterations, learning_rate, print_cost=False):
    """
    Feedforward neural network model
    :param X: input dataset of shape (input size, number of examples)
    :param Y: labels of shape (output size, number of examples)
    :param n_h: size of the hidden layer (number of hidden neurons)
    :param num_iterations: number of iterations in the gradient descent loop
    :param learning_rate: learning rate
    :param print_cost: if True, print the cost every 1000 iterations
    :return:
    parameters: parameters learnt by the model; they can then be used to predict
    cost_list: the cost recorded at every iteration
    """
    # np.random.seed(4)
    n_x, n_y = layer_size(X, Y)

    parameters = initialize_parameters(n_x, n_h, n_y)

    cost_list = []
    for i in range(num_iterations):
        # forward pass -> cost -> backward pass -> gradient descent step
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y, parameters)
        cost_list.append(cost)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads, learning_rate)

        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters, cost_list
```

Iris Classification Test

With the BP neural network model complete, we can now see how well it works. Here we use Iris classification to test whether the model is usable.

Introduction

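The Iris dataset is a commonly used dataset for classification experiments, collected by Fisher (1936). Also known as the Iris flower dataset, it is a multivariate dataset. It contains 150 samples in 3 classes of 50 samples each, and each sample has 4 attributes. The four attributes (sepal length, sepal width, petal length and petal width) are used to predict which of the three species (Setosa, Versicolour, Virginica) a flower belongs to.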

Attributes:

Sepal.Length (sepal length), in cm;
Sepal.Width (sepal width), in cm;
Petal.Length (petal length), in cm;
Petal.Width (petal width), in cm;

Species:

Iris Setosa (represented by the number '0' in this example)
Iris Versicolour (represented by the number '1' in this example)
Iris Virginica (represented by the number '2' in this example)
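Instead of replacing the class names in the text file, you can map them to these numeric labels while parsing. The sketch below assumes the UCI `iris.data` line format: four comma-separated measurements followed by the class name, such as `5.1,3.5,1.4,0.2,Iris-setosa`. The inline sample lines stand in for the file, and `parse_iris` and `LABELS` are illustrative names.

```python
import numpy as np

LABELS = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}


def parse_iris(lines):
    """Return X of shape (4, m) and Y of shape (1, m), one column per flower."""
    features, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:                      # skip blank lines, e.g. at the end of the file
            continue
        *measurements, name = line.split(',')
        features.append([float(v) for v in measurements])
        labels.append(LABELS[name])
    X = np.array(features).T
    Y = np.array(labels, dtype=float).reshape(1, -1)
    return X, Y


sample = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
X, Y = parse_iris(sample.splitlines())
print(X.shape, Y)   # (4, 3) [[0. 1. 2.]]
```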

Iris dataset download

Test Program

```python
#!/usr/bin/env python
# _*_ coding:utf-8 _*_
#
# @Version : 1.0
# @Time    : 2018/6/6
# @Author  : 圈圈烃
# @File    : User_BPNN
import numpy as np
import matplotlib.pyplot as plt
from Forward_NeuralNetwork import *  # module containing the network functions above


def data_process():
    """Preprocess iris.txt: replace the class names with numeric labels."""
    with open("iris.txt", 'r') as f:
        data = f.read()
        data = data.replace('Iris-setosa', '0,')
        data = data.replace('Iris-versicolor', '1,')
        data = data.replace('Iris-virginica', '2,')
    # load_csv() below reads the processed data in CSV format from iris.csv
    with open("iris1.txt", 'w') as fw:
        fw.write(data)


def load_csv():
    """Load the preprocessed data stored in CSV format."""
    # np.str / np.float were removed in NumPy 1.24; use the built-in str / float
    tmp = np.loadtxt("iris.csv", dtype=str, delimiter=",")
    data = tmp[:, 0:4].astype(float)   # the four measurements
    label = tmp[:, 4].astype(float)    # numeric class labels 0, 1, 2
    label = label.reshape(150, 1)
    return data.T, label.T             # shapes (4, 150) and (1, 150)


def normalized(X):
    """
    Min-max normalization using the minimum and maximum of the whole array.
    :param X: data to normalize
    :return:
    XN: normalized data in [0, 1]
    """
    Xmin, Xmax = X.min(), X.max()
    XN = (X - Xmin) / (Xmax - Xmin)
    return XN


def main():

    X, Y = load_csv()
    X = normalized(X)
    Y = normalized(Y)    # labels 0, 1, 2 become 0, 0.5, 1
    # training set: the first 30 samples of each class (90 in total)
    train_x = np.hstack((X[:, 0:30], X[:, 50:80], X[:, 100:130]))
    train_y = np.hstack((Y[:, 0:30], Y[:, 50:80], Y[:, 100:130]))
    # test set: the remaining 20 samples of each class (60 in total)
    test_x = np.hstack((X[:, 30:50], X[:, 80:100], X[:, 130:150]))
    test_y = np.hstack((Y[:, 30:50], Y[:, 80:100], Y[:, 130:150]))
    # training: 10 hidden neurons, 10000 iterations, learning rate 0.25
    n_h = 10
    parameter, cost_list = nn_model(train_x, train_y, n_h, num_iterations=10000, learning_rate=0.25, print_cost=True)
    # testing: feed the test set through the trained network
    A2, cache = forward_propagation(test_x, parameters=parameter)
    TY = A2.copy()
    # bin the outputs: > 0.8 -> 1, < 0.2 -> 0, otherwise 0.5
    TY[TY > 0.8] = 1
    TY[TY < 0.2] = 0
    TY[(TY >= 0.2) & (TY <= 0.8)] = 0.5
    # print(A2, TY)
    count = 0
    for i in range(60):
        if TY[0, i] == test_y[0, i]:
            count += 1
    print("Accuracy: %f %%" % (100 * count / 60))
    # plot the gradient descent (cost) curve
    plt.plot(cost_list)
    plt.show()


if __name__ == '__main__':
    main()
```

Test Results

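In the test, the 150 samples were split into 90 training samples and 60 test samples. The hidden layer has 10 neurons, training runs for 10000 iterations, and the learning rate is 0.25. The data must be normalized for both training and testing, and this includes the label data Y. The three Iris classes were originally labeled 0, 1 and 2, and after normalization the labels become 0, 0.5 and 1. The outputs on the test set are then binned: values below 0.2 become 0, values above 0.8 become 1, and all other values become 0.5. The final classification accuracy is 98.3%.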
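The binning and the accuracy count can also be written without an explicit loop. The sketch below uses hypothetical network outputs. `np.select` picks the first matching condition, so values below 0.2 map to 0, values above 0.8 map to 1, and everything else falls through to the default of 0.5. The helper names `bin_outputs` and `accuracy` are illustrative. For reference, 59 correct predictions out of 60 test samples give 100 × 59 / 60 ≈ 98.33%, which matches the reported accuracy.

```python
import numpy as np


def bin_outputs(A2, low=0.2, high=0.8):
    # < low -> 0, > high -> 1, otherwise 0.5 (the normalized label of class 1)
    return np.select([A2 < low, A2 > high], [0.0, 1.0], default=0.5)


def accuracy(pred, labels):
    # percentage of columns where the binned prediction equals the label
    return 100.0 * np.mean(pred == labels)


A2 = np.array([[0.05, 0.45, 0.93, 0.15, 0.85]])     # hypothetical network outputs
labels = np.array([[0.0, 0.5, 1.0, 0.0, 0.5]])
pred = bin_outputs(A2)
print(pred)
print(accuracy(pred, labels))    # 80.0
```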

Cost after iteration 0: 0.693152
Cost after iteration 1000: 0.280715
Cost after iteration 2000: 0.275627
Cost after iteration 3000: 0.274676
Cost after iteration 4000: 0.274162
Cost after iteration 5000: 0.273742
Cost after iteration 6000: 0.273368
Cost after iteration 7000: 0.273018
Cost after iteration 8000: 0.272678
Cost after iteration 9000: 0.272336
Accuracy: 98.333333 %

Final Notes

The complete program is available for download.

There is still room for improvement; feedback and discussion are welcome.