01. Neural Networks and Deep Learning, Week 2: Neural Network Basics (Programming Assignment)

Part 1: Python Basics with Numpy (optional)

1. Building basic functions with numpy

1.1 Sigmoid function, np.exp()

exe. Build a function that returns the sigmoid of a real number x. Use math.exp(x) for the exponential function.

import math

# x is a real number
def basic_sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x))
    s = 1 / (1 + math.exp(-x))
    return s
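A quick sanity check (a minimal sketch; the value shown is rounded):

print(basic_sigmoid(3))    # ≈ 0.9525741268
# Note: basic_sigmoid([1, 2, 3]) would raise a TypeError, since math.exp does not accept lists;
# that is why the numpy version below is preferred.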

exe. Implement the sigmoid function using numpy.

# x can be a real number, a vector, or a matrix
import numpy as np
def sigmoid(x):
    # np.exp is applied element-wise, so this works for arrays as well
    s = 1 / (1 + np.exp(-x))
    return s
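A quick check on a vector input (values rounded):

x = np.array([1, 2, 3])
print(sigmoid(x))    # ≈ [0.73105858 0.88079708 0.95257413]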

1.2 Sigmoid gradient

exe. Implement the function sigmoid_derivative() to compute the gradient of the sigmoid function with respect to its input x. The formula is sigmoid_derivative(x) = σ(x)(1 - σ(x)).

def sigmoid_derivative(x):
    # derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)
    s = sigmoid(x)
    ans = s * (1 - s)
    return ans
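A quick check (values rounded):

print(sigmoid_derivative(np.array([1, 2, 3])))    # ≈ [0.19661193 0.10499359 0.04517666]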

1.3 Reshaping arrays

exe. Implement image2vector() that takes an input of shape (length, height, 3) and returns a vector of shape (length*height*3, 1).

def image2vector(image):
    # Flatten a (length, height, 3) image into a (length*height*3, 1) column vector
    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))
    return v
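A quick check on a small, hypothetical 2 x 2 x 3 array (the contents are arbitrary; only the shape matters):

image = np.arange(12).reshape((2, 2, 3))
print(image2vector(image).shape)    # (12, 1)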

1.4 Normalizing rows

exe. Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

def normalizeRows(x):
    # L2 norm of each row; keepdims=True keeps the shape (n, 1) so broadcasting works
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    # Divide each row by its norm (broadcasting)
    x = x / x_norm
    return x
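A quick check (values rounded):

x = np.array([[0., 3., 4.],
              [1., 6., 4.]])
print(normalizeRows(x))
# ≈ [[0.         0.6        0.8       ]
#    [0.13736056 0.82416338 0.54944226]]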

1.5 Broadcasting and the softmax function

exe. Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization.

def softmax(x):
    # Apply exp element-wise
    x_exp = np.exp(x)
    # Sum over each row; keepdims=True keeps the shape (n, 1) for broadcasting
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Divide each row of x_exp by its row sum (broadcasting)
    s = x_exp / x_sum
    return s
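A quick check: every row of the output should sum to 1 (values rounded):

x = np.array([[1., 2., 3.],
              [1., 1., 1.]])
s = softmax(x)
print(s)                    # ≈ [[0.09003057 0.24472847 0.66524096]
                            #    [0.33333333 0.33333333 0.33333333]]
print(np.sum(s, axis=1))    # [1. 1.]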

reminder:

  • np.exp(x) works for any np.array x and applies the exponential function to every coordinate
  • the sigmoid function and its gradient
  • image2vector is commonly used in deep learning
  • np.reshape is widely used. In the future, you’ll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
  • numpy has efficient built-in functions
  • broadcasting is extremely useful

2. Vectorization

2.1 dot, outer, and elementwise products

In deep learning, you deal with very large datasets. Hence, a non-computationally-optimal function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product.

Loop version:

import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

# dot: for 1-D inputs this is the inner product (a single number); for 2-D inputs it follows matrix multiplication
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i]*x2[i]
toc = time.process_time()
print("dot = " + str(dot) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

# outer: inputs are flattened to 1-D; entry (i, j) of the result is x1[i] * x2[j], so x1 indexes the rows and x2 the columns
tic = time.process_time()
outer = np.zeros((len(x1), len(x2)))
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]
toc = time.process_time()
print("outer = " + str(outer) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

# elementwise: multiply corresponding elements
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print("elementwise = " + str(mul) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

# general dot: matrix-vector product of a random (3, len(x1)) matrix with x1
w = np.random.rand(3, len(x1))
tic = time.process_time()
gdot = np.zeros(w.shape[0])
for i in range(w.shape[0]):
    for j in range(len(x1)):
        gdot[i] += w[i, j] * x1[j]
toc = time.process_time()
print("gdot = " + str(gdot) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

Vectorized version:

import numpy as np
### Vectorized versions
# dot: inner product of the two vectors
tic = time.process_time()
dot = np.dot(x1, x2)
toc = time.process_time()
print("dot = " + str(dot) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

# outer: entry (i, j) of the result is x1[i] * x2[j]
tic = time.process_time()
outer = np.outer(x1, x2)
toc = time.process_time()
print("outer = " + str(outer) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

# elementwise: multiply corresponding elements
tic = time.process_time()
mul = np.multiply(x1, x2)
toc = time.process_time()
print("elementwise = " + str(mul) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

# general dot: matrix-vector product
w = np.random.rand(3, len(x1))
tic = time.process_time()
gdot = np.dot(w, x1)
toc = time.process_time()
print("gdot = " + str(gdot) + "\n----------------computation time = " +  str(1000 * (toc - tic)) + "ms")

Note: np.multiply(), np.dot(), and the asterisk operator (*) are three different kinds of multiplication in Python.
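A minimal illustration of the three (for 2-D ndarrays, * behaves like np.multiply, not like np.dot):

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])
print(np.multiply(a, b))    # element-wise: [[ 5 12] [21 32]]
print(a * b)                # also element-wise for ndarrays: [[ 5 12] [21 32]]
print(np.dot(a, b))         # matrix product: [[19 22] [43 50]]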

2.2 Implement the L1 and L2 loss functions

exe. Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful.

#L1 loss
def L1(yhat, y):
    loss = np.sum(np.abs(yhat-y))
    return loss
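A quick check (the vectors below are just example values):

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))    # ≈ 1.1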

exe. Implement the numpy vectorized version of the L2 loss. There are several ways of implementing the L2 loss, but you may find the function np.dot() useful.

#L2 loss
def L2(yhat, y):
    # equivalently, for 1-D arrays: np.dot(yhat - y, yhat - y)
    loss = np.sum(np.power((yhat - y), 2))
    return loss
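A quick check with the same example vectors:

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat, y)))    # ≈ 0.43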

reminder:

  • Vectorization is very important in deep learning. It provides computational efficiency and clarity.
  • You have reviewed the L1 and L2 loss.
  • You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc…

Part 2: Logistic Regression with a Neural Network mindset

Problem Statement: You are given a dataset (“data.h5”) containing:

  • a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
  • a test set of m_test images labeled as cat or non-cat
  • each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).

You will build a simple image-recognition algorithm that can correctly classify pictures as cat or non-cat.
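For reference, the model implemented below is plain logistic regression on the flattened pixel values; the forward pass, cost, and gradients used in the propagate() function further down are:

$$z = w^T X + b, \qquad A = \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log (1 - a^{(i)}) \right]$$

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$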

Full code:

import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import h5py
import scipy
from lr_utils import load_dataset

# Load the dataset
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Example of a picture
index = 10
#plt.imshow(train_set_x_orig[index])
#plt.show()
#print("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")
# Labels of the training set
#print("train_set_y=" + str(train_set_y))

m_train = train_set_y.shape[1]
m_test = test_set_y.shape[1]
num_px = train_set_x_orig.shape[1]

print("Number of training examples: m_train = " + str(m_train))
print("Number of test examples: m_test = " + str(m_test))
print("Height/width of each image: num_px = " + str(num_px))
print("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print("train_set_x_orig shape: " + str(train_set_x_orig.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x_orig shape: " + str(test_set_x_orig.shape))
print("test_set_y shape: " + str(test_set_y.shape))

# Flatten the training set so that each image becomes a single column, then transpose
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
# Flatten the test set in the same way
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print("test_set_y shape: " + str(test_set_y.shape))

# Standardize the data (pixel values lie in [0, 255])
train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

def initialize_with_zero(dim):
    w = np.zeros(shape=(dim, 1))
    b = 0
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return (w, b)

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradients for forward and backward propagation
    :param w: weights, of shape (num_px * num_px * 3, 1)
    :param b: bias, a scalar
    :param X: data, of shape (num_px * num_px * 3, number of examples)
    :param Y: label vector, of shape (1, number of examples)
    :return: grads (dw, db), cost
    """
    m = X.shape[1]
    # Forward propagation
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)

    # Assertions to make sure the dimensions are correct
    assert (dw.shape == w.shape)
    assert (db.dtype == float)
    cost = np.squeeze(cost)
    assert (cost.shape == ())

    grads = {
        "dw": dw,
        "db": db
    }
    return (grads, cost)

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    Optimize w and b by running gradient descent
    :param w: weights
    :param b: bias
    :param X: data
    :param Y: labels
    :param num_iterations: number of iterations of the optimization loop
    :param learning_rate: learning rate of the gradient descent update rule
    :param print_cost: print the cost every 100 iterations
    :return:
    params: dictionary containing the weights w and the bias b
    grads: dictionary containing the gradients of the weights and bias with respect to the cost function
    costs: list of all the costs computed during the optimization; used to plot the learning curve

    The loop repeats two steps:
        1) Compute the cost and the gradients for the current parameters, using propagate().
        2) Update the parameters with the gradient descent rule for w and b.
    """
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
        if (print_cost) and (i % 100 == 0):
            print("Cost after iteration %i: %f" % (i, cost))

    params = {
        "w": w,
        "b": b
    }
    return (params, grads, costs)

def predict(w, b, X):
    """
    Predict the labels using the learned parameters (w, b)
    :param w: weights
    :param b: bias
    :param X: data
    :return:
    Y_prediction: a numpy array containing the predictions (0/1) for all examples in X
    """
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute the probability that a cat is present in each picture
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    assert (Y_prediction.shape == (1, m))
    return Y_prediction

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
        Build the logistic regression model by calling the functions implemented above
        :param
        X_train  - training set, a numpy array of shape (num_px * num_px * 3, m_train)
        Y_train  - training labels, a numpy array (vector) of shape (1, m_train)
        X_test   - test set, a numpy array of shape (num_px * num_px * 3, m_test)
        Y_test   - test labels, a numpy array (vector) of shape (1, m_test)
        num_iterations  - hyperparameter: number of iterations used to optimize the parameters
        learning_rate  - hyperparameter: learning rate used in the optimize() update rule
        print_cost  - set to True to print the cost every 100 iterations
        :return
         d  - dictionary containing information about the model.
    """
    w, b = initialize_with_zero(X_train.shape[0])
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve the parameters w and b from the dictionary "parameters"
    w, b = parameters["w"], parameters["b"]

    # Predict on the test/train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {
        "costs": costs,
        "Y_prediction_test": Y_prediction_test,
        "Y_prediction_train": Y_prediction_train,
        "w": w,
        "b": b,
        "learning_rate": learning_rate,
        "num_iterations": num_iterations
    }
    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)

# Plot the learning curve (cost per hundred iterations)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

learning_rates = [0.05, 0.005, 0.0005]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
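As an optional extra (not part of the original script), a minimal sketch of checking the trained model on a single test image; the index 5 is arbitrary:

index = 5
my_pred = predict(d["w"], d["b"], test_set_x[:, index].reshape(-1, 1))
print("y = " + str(test_set_y[0, index]) + ", the model predicts a \""
      + classes[int(np.squeeze(my_pred))].decode("utf-8") + "\" picture.")
plt.imshow(test_set_x_orig[index])
plt.show()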

I finally have a fairly complete picture of how a neural network works, although I still don't fully understand every part of the model's training process.

References:

[1] Chinese version: https://blog.csdn.net/u013733326/article/details/79639509
[2] English version: https://blog.csdn.net/koala_tree/article/details/78057033

Part 1 of this post mainly follows the English-version blog [2]; Part 2 follows the Chinese-version blog [1].

Reposted from blog.csdn.net/iCode_girl/article/details/86702982