Deeplearning.ai Course 1: Logistic Regression with a Neural Network Mindset (Programming Assignment)

Note: this post draws on https://blog.csdn.net/u013733326/article/details/79639509 and records my notes from working through the assignment.

Python version: 3.6.x

Goal: build a simple neural network that can recognize pictures of cats

Steps:

I. Loading and preprocessing the data

Libraries imported at the start:

import numpy as np
import matplotlib.pyplot as plt
import h5py
from lr_utils import load_dataset
  • numpy: a package for scientific computing
  • matplotlib: for plotting figures
  • h5py: a package for interacting with datasets stored in H5 files
  • lr_utils: a helper module provided with the assignment materials that loads the bundled data

The code of lr_utils.py:

import numpy as np
import h5py
    
    
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
  • train_set_x_orig: training-set image data (209 images, 64*64 pixels each)
  • train_set_y_orig: labels for the training-set images (1 = cat, 0 = non-cat)
  • test_set_x_orig: test-set image data (50 images, 64*64 pixels each)
  • test_set_y_orig: labels for the test-set images (1 = cat, 0 = non-cat)
  • classes: two strings stored as bytes: [b'non-cat' b'cat']
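
The snippets below assume the dataset has already been loaded. Note that the rest of this post drops the `_orig` suffix, so the loading line follows that convention:

train_set_x, train_set_y, test_set_x, test_set_y, classes = load_dataset()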

Display a few sample images from the training set:

num = 1
for index in range(25, 29):    # show training images 25 through 28
    plt.subplot(2, 2, num)     # arrange them in a 2x2 grid
    num = num + 1
    plt.imshow(train_set_x[index])
plt.show()

Print the label of one image:

index = 1
print("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture")

np.squeeze() removes axes of length one; only after squeezing does indexing classes return a bytes object that can be decoded, e.g.:

print(train_set_y[:,index]) #result [0]
print(np.squeeze(train_set_y[:,index])) #result 0

Parameters and dimensions of the loaded images:

  • m_train: number of images in the training set
  • m_test: number of images in the test set
  • num_pixel: width/height in pixels (all images are 64*64)
m_train = train_set_y.shape[1]
m_test = test_set_y.shape[1]
num_pixel = train_set_x.shape[1]

print("训练集的数量: m_train = " + str(m_train))
print("测试集的数量: m_test = " + str(m_test))
print("每张图片的宽度/高度: num_pixel = " + str(num_pixel))
print("每张图片的大小: (" + str(num_pixel) + ", " + str(num_pixel) + ", 3)")
print("训练集-图片的维度: " + str(train_set_x.shape))
print("训练集标签的维度: " + str(train_set_y.shape))
print("测试集-图片的维度: " + str(test_set_x.shape))
print("测试集标签的维度: " + str(test_set_y.shape))
图片维度信息
训练集的数量: m_train = 209
测试集的数量: m_test = 50
每张图片的宽度/高度: num_pixel = 64
每张图片的大小: (64, 64, 3) #每个像素点由(R,G,B)三原色组成
训练集-图片的维度: (209, 64, 64, 3)
训练集标签的维度: (1, 209)
测试集-图片的维度: (50, 64, 64, 3)
测试集标签的维度: (1, 50)

To simplify the code that follows, reshape the (209, 64, 64, 3) numpy array into a (64*64*3, 209) array in which each column represents one image.

# Flatten and transpose the training set
train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], -1).T # -1 tells numpy to infer the remaining dimension
# Flatten and transpose the test set
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T

The reshape first produces 209 rows (one per training image); passing -1 lets numpy compute the number of columns, which works out to 12288 (= 64*64*3). The trailing .T then transposes the result into 12288 rows and 209 columns. The test set is handled the same way.

print("训练集-图片的维度: " + str(train_set_x_flatten.shape))
print("训练集标签的维度: " + str(train_set_y.shape))
print("测试集-图片的维度: " + str(test_set_x_flatten.shape))
print("测试集标签的维度: " + str(test_set_y.shape))
View Code
训练集-图片的维度: (12288, 209)
训练集标签的维度: (1, 209)
测试集-图片的维度: (12288, 50)
测试集标签的维度: (1, 50)

Note: each pixel consists of red, green, and blue channel values in the range 0-255; before writing the algorithm itself, normalize the data.

# Normalize
train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255

II. Building the logistic regression model

1. The main steps for building a neural network:

  • Define the model structure (e.g., the number of input features)
  • Initialize the model parameters (W, b)
  • Loop:
    • Forward propagation (compute the current cost)
    • Backward propagation (compute the current gradients)
    • Update the parameters via gradient descent (a short code sketch follows this list):
      • $W= W - \alpha \frac{\partial J\left (W ,b  \right )}{\partial W}$
      • $b= b - \alpha \frac{\partial J\left (W ,b  \right )}{\partial b}$
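
A minimal sketch of the loop, assuming a helper propagate(w, b, X, Y) that returns the gradients and the cost (such a helper is defined concretely in the Functions section below):

# One possible shape of the training loop (a sketch; propagate() is defined later)
for i in range(num_iterations):
    grads, cost = propagate(w, b, X, Y)   # forward + backward pass
    w = w - learning_rate * grads["dw"]   # gradient-descent update for W
    b = b - learning_rate * grads["db"]   # gradient-descent update for b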

2. Flowchart of the logistic regression algorithm

3. The mathematical formulas for the network (see Andrew Ng's deep learning videos for the derivations)

Forward propagation:

① For a single sample $\left \{ x^{\left ( i \right )},y^{\left ( i \right )} \right \}$:

$$z^{\left ( i \right )} = w^{T}x^{\left ( i \right )} + b\quad\left ( 1 \right )$$

$$\hat{y}^{\left ( i \right )}=a^{\left ( i \right )}=\mathrm{sigmoid}\left ( z^{\left ( i \right )} \right )\quad\left ( 2 \right )$$

$$l\left ( a^{\left ( i \right )},y^{\left ( i \right )}\right )=-y^{\left ( i \right )}\log\left ( a^{\left ( i \right )} \right )-\left ( 1-y^{\left ( i \right )} \right )\log\left ( 1-a^{\left ( i \right )} \right )\quad\left ( 3 \right )$$

$$J\left ( w,b \right )=\frac{1}{m}\sum _{i=1}^{m}l\left ( a^{\left ( i \right )},y^{\left ( i \right )} \right )\quad\left ( 4 \right )$$

Note: $w^{T} = \left [ w_{1},w_{2},w_{3},\cdots,w_{n_{x}} \right ]$ and $x^{\left ( i \right )}$ has shape $\left ( n_{x},1 \right )$, where $n_{x}$ is the number of features of a single sample; $b$ is effectively a scalar.

② For $m$ samples, with $X = \begin{bmatrix}x^{\left ( 1 \right )} &x^{\left ( 2 \right )} & \cdots &x^{\left ( m \right )}\end{bmatrix} \in \mathbb{R}^{n_{x}\times m}$:

$$\begin{align*}
Z &= \begin{bmatrix}
z^{\left ( 1 \right )} &z^{\left ( 2 \right )} &\cdots &z^{\left ( m \right )}
\end{bmatrix}\\
&=\begin{bmatrix}
w^{T}x^{\left ( 1 \right )}+b &w^{T}x^{\left ( 2 \right )}+b &\cdots &w^{T}x^{\left ( m \right )}+b
\end{bmatrix}\\
&=w^{T}\begin{bmatrix}
x^{\left ( 1 \right )} &x^{\left ( 2 \right )} &\cdots &x^{\left ( m \right )}
\end{bmatrix} +\begin{bmatrix}
b &b &\cdots &b
\end{bmatrix}\\
&=w^{T}X + \begin{bmatrix}
b &b &\cdots &b
\end{bmatrix}\\
&=w^{T}X+\mathbf{b}
\end{align*}\\$$

$$A=\begin{bmatrix}
a^{\left ( 1 \right )} &a^{\left ( 2 \right )} &\cdots &a^{\left ( m \right )}
\end{bmatrix}=\sigma \left ( Z \right )$$
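
A minimal numerical illustration of the two vectorized formulas above (the sizes n_x = 4 and m = 3 are made up for the example; the sigmoid is written inline here and defined properly in the Functions section below):

import numpy as np

n_x, m = 4, 3                    # toy sizes: 4 features, 3 samples
w = np.zeros((n_x, 1))           # weight column vector
b = 0.0                          # scalar bias, broadcast across all columns
X = np.random.randn(n_x, m)      # each column is one sample x^(i)

Z = np.dot(w.T, X) + b           # Z = w^T X + b, shape (1, m)
A = 1 / (1 + np.exp(-Z))         # A = sigmoid(Z), applied elementwise
print(Z.shape, A.shape)          # (1, 3) (1, 3)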

Backward propagation:

① A single sample

$$da=\frac{\partial l\left ( a,y \right )}{\partial a}=-\frac{y}{a}+\frac{1-y}{ 1-a}\quad\left ( 1 \right )\\$$

$$dz=\frac{\partial l\left ( a,y \right )}{\partial z}=\left ( \frac{\partial l}{\partial a} \right )\cdot \left ( \frac{\partial a}{\partial z} \right )=\left ( -\frac{y}{a} + \frac{1-y}{1-a}\right )\cdot a\left ( 1-a \right )=a-y\quad\left ( 2 \right )\\$$
$$dw_{1}=x_{1}dz\quad\left ( 3 \right )\\$$
$$dw_{2}=x_{2}dz\quad\left ( 4 \right )\\$$
$$db=dz\quad\left ( 5 \right )$$

② $m$ samples

$$\frac{\partial }{\partial w_{1}}J\left ( w,b \right )=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial }{\partial w_{1}}l\left ( a^{\left ( i \right )} ,y^{\left ( i \right )}\right )$$

Since the overall cost is the average of the per-sample losses, its derivative with respect to $w_{1}$ is simply the average of the per-sample derivatives over $i = 1, \ldots, m$, hence:

$$dz^{\left ( i \right )}=a^{\left ( i \right )}-y^{\left ( i \right )}\quad\left ( 1 \right )\\$$
$$dw_{1}=\frac{1}{m}\sum_{i=1}^{m}x_{1}^{\left ( i \right )}\left ( a^{\left ( i \right )}-y^{\left ( i \right )} \right )\quad\left ( 2 \right )\\$$
$$dw_{2}=\frac{1}{m}\sum_{i=1}^{m}x_{2}^{\left ( i \right )}\left ( a^{\left ( i \right )}-y^{\left ( i \right )} \right )\quad\left ( 3 \right )\\$$
$$db=\frac{1}{m}\sum_{i=1}^{m}\left ( a^{\left ( i \right )}-y^{\left ( i \right )} \right )\quad\left ( 4 \right )$$

After vectorization:

$$dZ = A - Y\\$$

$$d\mathbf{w}=\begin{bmatrix}
dw_{1}\\
dw_{2}\\
\vdots \\dw_{n_{x}}
\end{bmatrix}=\frac{1}{m}\begin{bmatrix}
\sum_{i=1}^{m}x_{1}^{\left ( i \right )}dz^{\left ( i \right )}\\
\sum_{i=1}^{m}x_{2}^{\left ( i \right )}dz^{\left ( i \right )}\\
\vdots \\ \sum_{i=1}^{m}x_{n_{x}}^{\left ( i \right )}dz^{\left ( i \right )}
\end{bmatrix}=\frac{1}{m}\sum_{i=1}^{m}x^{\left ( i \right )}dz^{\left ( i \right )}=\frac{1}{m}X\,dZ^{T}=\frac{1}{m}\,\text{np.dot}\left ( X,dZ^{T} \right )\\$$
$$db = \frac{1}{m}\sum_{i=1}^{m}dz^{\left ( i \right )}=\frac{1}{m}\,\text{np.sum}\left ( dZ \right )$$
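
These vectorized gradients can be sanity-checked numerically with finite differences on toy data (a sketch only; the sizes, seed, and the small cost() helper below are made up for the illustration):

import numpy as np

def cost(w, b, X, Y):
    # Negative log-likelihood cost J(w, b) from equation (4)
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
    return (-1 / X.shape[1]) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

np.random.seed(0)
w, b = np.random.randn(3, 1), 0.5
X, Y = np.random.randn(3, 5), (np.random.rand(1, 5) > 0.5) * 1.0

# Analytic gradient from the vectorized formulas above
A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
dZ = A - Y
dw = np.dot(X, dZ.T) / X.shape[1]

# Numerical gradient of J with respect to w[0, 0]
eps = 1e-7
w_plus = w.copy(); w_plus[0, 0] += eps
w_minus = w.copy(); w_minus[0, 0] -= eps
approx = (cost(w_plus, b, X, Y) - cost(w_minus, b, X, Y)) / (2 * eps)
print(dw[0, 0], approx)  # the two values should agree to several decimal places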

4. The functions

① The activation function sigmoid()

def sigmoid(z):
    """
    :param z: a scalar or numpy array of any size
    :return: s - sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
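A quick sanity check (the inputs are arbitrary): sigmoid(0) should be 0.5, and large inputs should saturate toward 1:

print(sigmoid(0))                      # 0.5
print(sigmoid(np.array([-1, 0, 2])))   # roughly [0.269 0.5 0.881]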

② Parameter initialization: initialize_with_zeros()

def initialize_with_zeros(dim):
    """
    :param dim: the size of the w vector (the number of parameters)
    :return: w - an initialized vector of shape (dim, 1)
             b - an initialized scalar
    """
    w = np.zeros(shape = (dim, 1))
    b = 0
    # Use assertions to make sure the shapes and types are correct
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return (w, b)
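For example (dim = 2 here is arbitrary):

w, b = initialize_with_zeros(2)
print(w)  # [[0.] [0.]]
print(b)  # 0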

③ Forward and backward propagation: propagate()

def propagate(w, b, X, Y):
    """
    :param w: weights, a numpy array of shape (num_pixel * num_pixel * 3, 1)
    :param b: bias, a scalar
    :param X: data matrix of shape (num_pixel * num_pixel * 3, number of examples)
    :param Y: the true "label" vector (0 if non-cat, 1 if cat), of shape (1, number of examples)
    :return:
        grads - a dictionary holding dw (gradient of the loss with respect to w, same shape as w)
                and db (gradient of the loss with respect to b, same shape as b)
        lost  - the negative log-likelihood cost of logistic regression
    """
    m = X.shape[1]

    # Forward propagation (compute the current cost)
    A = sigmoid(np.dot(w.T, X) + b)
    lost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A)))

    # Backward propagation
    dZ = A - Y
    dw = (1 / m) * np.dot(X , dZ.T)
    db = (1 / m) * np.sum(dZ)

    # Use assertions to make sure everything is correct
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    lost = np.squeeze(lost) # squeeze() removes axes of length 1, turning the cost into a scalar
    assert(lost.shape == ())

    # Pack the gradients into a dictionary
    grads = {"dw" : dw , "db" : db}
    return (grads , lost)
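A quick check on hand-picked toy values (the numbers are arbitrary, chosen only to make the result easy to recompute by hand):

w, b = np.array([[1.], [2.]]), 2.
X = np.array([[1., 2.], [3., 4.]])
Y = np.array([[1, 0]])
grads, lost = propagate(w, b, X, Y)
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("lost = " + str(lost))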

④ Updating the parameters: optimize()

def optimize(w, b, X, Y, num_iterations, learning_rate, print_lost = False):
    """
    Optimizes w and b by running gradient descent.
    :param w: weights, of shape (num_pixel * num_pixel * 3, 1)
    :param b: bias (a scalar)
    :param X: data of shape (num_pixel * num_pixel * 3, number of training examples)
    :param Y: label vector (0 = non-cat, 1 = cat) of shape (1, number of training examples)
    :param num_iterations: number of iterations (gradient-descent steps)
    :param learning_rate: the learning rate
    :param print_lost: print the cost every 100 steps
    :return:
        params - a dictionary containing the weights w and bias b
        grads  - a dictionary containing the gradients of the weights and bias
        losts  - a list of the costs computed during optimization, used to plot the learning curve
    Hints:
        1) Compute the cost and the gradients for the current parameters with propagate()
        2) Update the parameters using gradient descent on w and b
    """
    losts = []
    for i in range(num_iterations):

        grads, lost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost
        if i % 100 == 0:
            losts.append(lost)
        # Print the cost
        if print_lost and (i % 100 == 0):
            print("Iteration: %i, cost: %f" % (i, lost))

    params = {"w": w , "b": b}
    grads = {"dw": dw, "db":db}
    return (params, grads, losts)
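Continuing the toy example from propagate() above (the hyperparameters here are arbitrary):

params, grads, losts = optimize(w, b, X, Y, num_iterations = 100, learning_rate = 0.009, print_lost = False)
print("w = " + str(params["w"]))
print("b = " + str(params["b"]))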

⑤ The prediction function predict()

def predict(w, b, X):
    """
    Predicts whether each label is 0 or 1 using the learned logistic regression parameters (w, b)
    :param w: weights, a numpy array of shape (num_pixel * num_pixel * 3, 1)
    :param b: bias, a scalar
    :param X: data of shape (num_pixel * num_pixel * 3, number of examples)
    :return: Y_prediction - a numpy array of shape (1, m) containing all predictions for the examples in X
    """
    m = X.shape[1] # number of images
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Predict
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert the probability A[0, i] into an actual prediction of 0 or 1
        Y_prediction[0,i] = 1 if A[0,i] > 0.5 else 0

    # Use an assertion
    assert(Y_prediction.shape == (1,m))
    return Y_prediction
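For instance, reusing the toy parameters returned by optimize() above:

print("predictions = " + str(predict(params["w"], params["b"], X)))  # a (1, 2) array of 0s and 1s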

⑥ Combining the functions above into a single model()

def Logistic_Regression_model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_lost = False):
    """
    Builds the logistic regression model by calling the functions defined above
    :param X_train: training set, a numpy array of shape (num_pixel * num_pixel * 3, m_train)
    :param Y_train: training labels, a numpy array (vector) of shape (1, m_train)
    :param X_test: test set, a numpy array of shape (num_px * num_px * 3, m_test)
    :param Y_test: test labels, a numpy array (vector) of shape (1, m_test)
    :param num_iterations: hyperparameter for the number of iterations used to optimize the parameters
    :param learning_rate: hyperparameter for the learning rate used in the update rule of optimize()
    :param print_lost: set to True to print the cost every 100 iterations
    :return: d - a dictionary containing information about the model
    """
    w, b = initialize_with_zeros(X_train.shape[0])
    parameters, grads, losts = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_lost)
    w, b = parameters["w"], parameters["b"]

    # Predict
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print the train/test accuracy
    print("Train accuracy:", format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100) , "%")
    print("Test accuracy:", format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100), "%")

    d = {
        "lost" : losts,
        "Y_prediction_test" : Y_prediction_test,
        "Y_prediction_train" : Y_prediction_train,
        "learning_rate" : learning_rate,
        "num_iterations" : num_iterations,
        "w": w,
        "b": b
    }
    return d
print("---------------------------------测试model------------------------------")
#这里加载的是真实的数据,请参见上面的代码部分。
d = Logistic_Regression_model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_lost = False)
训练集准确性: 99.04306220095694 %
测试集准确性: 70.0 %

⑦ Visualization

#------- Plot the cost against the number of iterations (alpha = 0.005) -------
losts = np.squeeze(d["lost"])
plt.plot(losts)
plt.ylabel("Lost")
plt.xlabel("Iterations/100 times")
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

III. The probabilistic interpretation of the logistic regression model

Assuming the training examples are statistically independent, minimizing the cost function is equivalent to maximizing the likelihood of the logistic regression model (the proof is given in Andrew Ng's deep learning videos).
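
In outline (a sketch of the argument, in the notation of the forward-propagation section): interpreting $a^{\left ( i \right )} = \hat{y}^{\left ( i \right )}$ as $P\left ( y^{\left ( i \right )}=1 \mid x^{\left ( i \right )} \right )$, a single sample's probability can be written compactly as

$$P\left ( y^{\left ( i \right )} \mid x^{\left ( i \right )} \right )=\left ( a^{\left ( i \right )} \right )^{y^{\left ( i \right )}}\left ( 1-a^{\left ( i \right )} \right )^{1-y^{\left ( i \right )}}$$

and independence turns the joint log-likelihood into a sum:

$$\log \prod_{i=1}^{m}P\left ( y^{\left ( i \right )} \mid x^{\left ( i \right )} \right )=\sum_{i=1}^{m}\left [ y^{\left ( i \right )}\log a^{\left ( i \right )}+\left ( 1-y^{\left ( i \right )} \right )\log\left ( 1-a^{\left ( i \right )} \right ) \right ]=-\sum_{i=1}^{m}l\left ( a^{\left ( i \right )},y^{\left ( i \right )} \right )$$

Maximizing the log-likelihood is therefore the same as minimizing $\sum_{i} l\left ( a^{\left ( i \right )},y^{\left ( i \right )} \right )$, and dividing by the constant $m$ gives exactly the cost $J\left ( w,b \right )$.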

Reposted from www.cnblogs.com/xiazhenbin/p/12231867.html