Andrew Ng's Coursera Deep Learning course2-week3: Hyperparameter Tuning, Batch Norm and Frameworks - Assignment


                                                   P0 Preface

  • Course 2: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
  • Week 3: Hyperparameter Tuning, Batch Norm and Frameworks
  • Main topics: hyperparameter tuning, Batch Normalization, Softmax, the TensorFlow framework, etc.

Video: https://mooc.study.163.com/learn/2001281003?tid=2001391036#/learn/announce

Notes: to be added later.

Dataset, source code, and a local cached copy of the assignment pages: https://pan.baidu.com/s/1Mbj9jlxlx591Rag-zLiHPQ (extraction code: q9aa)

                                                   P1 Assignment

In previous assignments we built neural networks with NumPy; starting this week we use a deep learning framework (TensorFlow) to build the network models.

1 - Exploring the TensorFlow library

First, import the required packages:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict
​
%matplotlib inline
np.random.seed(1)

Some useful helper functions (defined in tf_utils.py):

def load_dataset():
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes


def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
    """
    Creates a list of random minibatches from (X, Y)

    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    mini_batch_size - size of the mini-batches, integer
    seed -- this is only for the purpose of grading, so that your "random" minibatches are the same as ours.

    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """

    m = X.shape[1]                  # number of training examples
    mini_batches = []
    np.random.seed(seed)

    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((Y.shape[0],m))

    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m/mini_batch_size) # number of mini batches of size mini_batch_size in your partitionning
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[:, k * mini_batch_size : k * mini_batch_size + mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k * mini_batch_size : k * mini_batch_size + mini_batch_size]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size : m]
        mini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size : m]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    return mini_batches


def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y


def predict(X, parameters):

    W1 = tf.convert_to_tensor(parameters["W1"])
    b1 = tf.convert_to_tensor(parameters["b1"])
    W2 = tf.convert_to_tensor(parameters["W2"])
    b2 = tf.convert_to_tensor(parameters["b2"])
    W3 = tf.convert_to_tensor(parameters["W3"])
    b3 = tf.convert_to_tensor(parameters["b3"])

    params = {"W1": W1,
              "b1": b1,
              "W2": W2,
              "b2": b2,
              "W3": W3,
              "b3": b3}

    x = tf.placeholder("float", [12288, 1])

    z3 = forward_propagation_for_predict(x, params)
    p = tf.argmax(z3)

    sess = tf.Session()
    prediction = sess.run(p, feed_dict = {x: X})

    return prediction

As a warm-up, let's implement this function: loss = \mathcal{L}(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}

As shown below (this one is already implemented for you):

y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39

loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss

init = tf.global_variables_initializer()         # When init is run later (session.run(init)),
                                                 # the loss variable will be initialized and ready to be computed
with tf.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss

# Output:
9

From this we can see that writing a TensorFlow program takes the following steps:

  • Create tensors (variables) that are not yet executed/evaluated.
  • Define operations between those tensors.
  • Initialize your tensors.
  • Create a session.
  • Run the session. This executes the operations written above.

Notice that the first two lines only define tensors; they are not initialized until the fourth line. Likewise, the third line only defines the computation; only at the fifth line do we use a session to actually evaluate it.

Let's look at another example:

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)

# Output:
Tensor("Mul:0", shape=(), dtype=int32)

As expected, we do not see the result 20. Instead we get a Tensor with an empty shape and dtype int32. Everything we have done so far only placed these objects into a "computation graph"; we have not run that graph yet. To actually multiply the two numbers, we have to create a session and run it:
 

sess = tf.Session()
print(sess.run(c))

# Output:
20

Summary:

  • Variables you define must be initialized.
  • Operations you define must be executed in a session (see the sketch after this list).
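
To make both points concrete, here is a minimal sketch (my own illustration, not part of the assignment): running an operation that depends on an uninitialized variable fails until the initializer has been run inside a session.

import tensorflow as tf

tf.reset_default_graph()
w = tf.Variable(5, name="w")      # defined, but not yet initialized
double = tf.multiply(w, 2)        # defined, but not yet executed

with tf.Session() as sess:
    # calling sess.run(double) here would raise FailedPreconditionError
    # ("Attempting to use uninitialized value w")
    sess.run(tf.global_variables_initializer())
    print(sess.run(double))       # 10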

Next, let's look at placeholders.

A placeholder is an object whose value can only be specified later. To supply that value you pass it in via a feed dictionary (the feed_dict argument). Below, we create a placeholder for x; this will let us pass in a number later, when we run the session.

# Change the value of x in the feed_dict
​
x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

When we first define x, we do not have to give it a value. A placeholder is simply a variable to which we assign data at the moment we run the session.
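
Because a placeholder carries no fixed value, the same graph can be run repeatedly with different inputs. A small sketch (my own example, not from the notebook):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.int64, name="x")
y = 2 * x                                   # the graph is defined once

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3}))                     # 6
    print(sess.run(y, feed_dict={x: np.array([1, 2, 3])}))   # [2 4 6] -- same graph, new data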

1.1 - Linear function

Implement the TF version of Y = WX + b, where W and X are random matrices and b is a random vector; W has shape (4,3), X has shape (3,1), and b has shape (4,1).

Hint: you can define a (3,1) constant X like this:

X = tf.constant(np.random.randn(3,1), name = "X")

Functions you may need: tf.matmul(..., ...), tf.add(..., ...), and np.random.randn(...).

# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """

    np.random.seed(1)

    ### START CODE HERE ### (4 lines of code)
    X = np.random.randn(3,1)
    W = np.random.randn(4,3)
    b = np.random.randn(4,1)
    Y = tf.add(tf.matmul(W,X),b)  # at first I used np.dot() out of old habit, then tf.multiply() (element-wise), which caused a dimension error
    ### END CODE HERE ###

    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate

    ### START CODE HERE ###
    sess = tf.Session()
    result = sess.run(Y)  # at first I forgot to pass the operation to be evaluated to run()
    ### END CODE HERE ###

    # close the session
    sess.close()

    return result

Result:

print("result = " +  str(linear_function()))

# Output:
result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]

1.2 - Computing the sigmoid

TensorFlow already provides functions such as tf.sigmoid and tf.nn.softmax. In practice, however, you rarely pass literal values directly to tf.sigmoid; you usually go through a placeholder. The goal of this part is to use a placeholder together with tf.sigmoid to build a usable sigmoid API.

Functions you may need: tf.placeholder, tf.sigmoid, and sess.run with a feed_dict.

# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns:
    results -- the sigmoid of z
    """

    ### START CODE HERE ### ( approx. 4 lines of code)
    # Create a placeholder for x. Name it 'x'.
    x = tf.placeholder(tf.float32,name="x")  # at first I passed no arguments and got: TypeError: placeholder() missing 1 required positional argument: 'dtype'
    # the code also runs if you omit the second (name) argument

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)

    # Create a session, and run it. Please use the method 2 explained above.
    # You should use a feed_dict to pass z's value to x.
    sess = tf.Session()
    # Run session and call the output "result"
    result = sess.run(sigmoid,feed_dict={x:z})  # note that the argument z is not used until the sess.run stage

    ### END CODE HERE ###

    return result

Results:

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

# Output:
sigmoid(0) = 0.5
sigmoid(12) = 0.999994

1.3 - Computing the cost

Recall how we previously computed the cost with NumPy:

# GRADED FUNCTION: compute_cost
 
def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).
    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)
    Returns:
    cost -- cross-entropy cost
    """
 
    m = Y.shape[1]
 
    # Compute loss from aL and y.
    ### START CODE HERE ### (≈ 1 lines of code)
    cost = -np.sum(np.multiply(np.log(AL),Y) + np.multiply(np.log(1 - AL), 1 - Y)) / m
    ### END CODE HERE ###
 
    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())
 
    return cost

In TensorFlow, a single function, tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...), implements the computation above, as shown below:
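
For reference (this is the standard definition from the TensorFlow documentation, not text from the notebook), for each element the function computes the cross-entropy of the sigmoid of the logit, in a numerically stable form:

\mathcal{L}(z, y) = -\big[\, y \log \sigma(z) + (1 - y) \log(1 - \sigma(z)) \,\big] = \max(z, 0) - z\,y + \log\big(1 + e^{-|z|}\big)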

# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- runs the session of the cost (formula (2))
    """

    ### START CODE HERE ###

    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(dtype=tf.float32,name="z")
    y = tf.placeholder(dtype=tf.float32,name="y")

    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z,labels=y)  # logits is z, the output of the last linear unit (before the final sigmoid)

    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()

    # Run the session (approx. 1 line).
    cost = sess.run(cost,feed_dict={z:logits,y:labels})

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return cost

Result:

logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
cost = cost(logits, np.array([0, 0, 1, 1]))
print("cost = " + str(cost))

# Output:
cost = [1.0053872  1.0366409  0.4138543  0.39956614]

1.4 - Using one-hot encoding

In deep learning, labels usually need to be converted to one-hot vectors before being used as inputs, as shown below:

[Figure: converting class labels to one-hot vectors]

In TensorFlow this takes a single line of code (NumPy is a bit more involved): tf.one_hot(labels, depth, axis)

Note the meaning of axis (see the code): by default (axis=-1, the last axis) the class index runs along the columns, i.e., one row per example; axis=0 puts one row per class.
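
A quick check of the axis behavior (my own illustration; the graded function below uses axis=0):

import numpy as np
import tensorflow as tf

labels = np.array([1, 2, 3, 0, 2, 1])
with tf.Session() as sess:
    print(sess.run(tf.one_hot(labels, 4)).shape)          # (6, 4): default axis=-1, one row per example
    print(sess.run(tf.one_hot(labels, 4, axis=0)).shape)  # (4, 6): one row per class, matching our label convention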

# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
                     corresponds to the jth training example. So if example j had a label i. Then entry (i,j)
                     will be 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension

    Returns:
    one_hot -- one hot matrix
    """

    ### START CODE HERE ###

    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C,name="C")

    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels,C,axis=0)  # at first I omitted axis and got a wrong 6x4 matrix; axis=0 puts the classes on the rows, the default puts them on the columns

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)  # C is now a tf.constant, so no feed_dict is needed

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return one_hot

Result:

labels = np.array([1, 2, 3, 0, 2, 1])
one_hot = one_hot_matrix(labels, C=4)
print("one_hot = " + str(one_hot))

# Output:
one_hot = [[0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]]

1.5 - Initializing with zeros and ones

Now we will learn how to initialize a vector with zeros or ones, using tf.ones() and tf.zeros(). Given a shape, these functions return a tensor of that shape filled with ones or zeros, respectively. Let's see how:

# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns:
    ones -- array containing only ones
    """

    ### START CODE HERE ###

    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###
    return ones


# Check the result:
print("ones = " + str(ones([3])))
# Output:
ones = [1. 1. 1.]

2 - Building your first neural network in TensorFlow

We will build a neural network with TensorFlow. Remember that implementing a model takes two steps:
1. Create the computation graph
2. Run the graph

Let's go through it step by step:

2.0 - The problem statement

Build an algorithm that recognizes hand signs. Below are examples for each number, along with an explanation of how we represent the labels. These are the original pictures; the images we actually use are 64 x 64 pixels.

[Figure: examples of hand signs for digits 0 through 5 (SIGNS dataset)]

  • Training set: 1080 pictures (64 x 64 pixels) of signs representing numbers 0 to 5 (180 pictures per number).
  • Test set: 120 pictures (64 x 64 pixels) of signs representing numbers 0 to 5 (20 pictures per number).

Load the dataset:

# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
index = 11
plt.imshow(X_train_orig[index])
print("Y = " + str(np.squeeze(Y_train_orig[:,index])))

# Output:
Y = 1

[Figure: training image at index 11, labeled y = 1]

As usual, we flatten the image dataset and then normalize it by dividing by 255. On top of that, we convert each label to a one-hot vector, as in the figure above.

# Flatten the image dataset
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T  # each column is one example
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T

# Normalize the data
X_train = X_train_flatten / 255
X_test = X_test_flatten / 255

# Convert the labels to one-hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print("number of training examples = " + str(X_train.shape[1]))
print("number of test examples = " + str(X_test.shape[1]))
print("X_train.shape: " + str(X_train.shape))
print("Y_train.shape: " + str(Y_train.shape))
print("X_test.shape: " + str(X_test.shape))
print("Y_test.shape: " + str(Y_test.shape))


# Output:
number of training examples = 1080
number of test examples = 120
X_train.shape: (12288, 1080)
Y_train.shape: (6, 1080)
X_test.shape: (12288, 120)
Y_test.shape: (6, 120)

Our goal is to build an algorithm that recognizes the signs with high accuracy. To do so, you will build a TensorFlow model that is almost identical to the NumPy model you built earlier for cat recognition (but now with a softmax output). This is a great occasion to compare your NumPy implementation with the TensorFlow one.

The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX; the SIGMOID output layer has been replaced by a SOFTMAX. A SOFTMAX layer generalizes the SIGMOID to more than two classes.
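
As a reminder (the standard definition, not copied from the notebook), the softmax converts the vector of logits z into class probabilities:

\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \qquad i = 1, \dots, C

For C = 2 this reduces to a sigmoid applied to the difference of the two logits, which is why softmax is described as the generalization of sigmoid.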

2.1 - Creating placeholders

Our first task is to create placeholders for X and Y, which will let us pass in the training data later, when we run the session.

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible about the number of examples for the placeholders.
      In fact, the number of examples during test/train is different.
    """


    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(dtype=tf.float32,shape=[n_x,None],name="X")  # at first I used plain float32 and got an error; it should be tf.float32
    Y = tf.placeholder(dtype=tf.float32,shape=[n_y,None],name="Y")
    ### END CODE HERE ###

    return X, Y

# Check the result
X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))

# Output:
X = Tensor("X:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Y:0", shape=(6, ?), dtype=float32)

2.2 - Initializing the parameters

Initialize the parameters in TensorFlow: we use Xavier initialization for the weights and zero initialization for the biases, for example:

W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())

Note: tf.Variable() creates a new object every time it is called, whereas tf.get_variable() returns the existing variable object if one with that name has already been created, and creates a new one otherwise.
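
A minimal sketch to illustrate the difference (my own example, using the same TF 1.x API as the rest of this assignment):

import tensorflow as tf

tf.reset_default_graph()

# tf.Variable always creates a new object; a duplicate name is silently renamed
v1 = tf.Variable(0.0, name="v")
v2 = tf.Variable(0.0, name="v")
print(v1.name, v2.name)        # v:0 v_1:0 -> two distinct variables

# tf.get_variable returns the existing variable when reuse is enabled in its scope
with tf.variable_scope("layer1"):
    g1 = tf.get_variable("W", shape=[2, 2], initializer=tf.zeros_initializer())
with tf.variable_scope("layer1", reuse=True):
    g2 = tf.get_variable("W")
print(g1 is g2)                # True -> the same underlying variable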

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

    tf.set_random_seed(1)  # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable(name="W1",shape=[25,12288],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable(name="b1",shape=[25,1],dtype=tf.float32,initializer=tf.zeros_initializer())
    W2 = tf.get_variable(name="W2",shape=[12,25],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable(name="b2",shape=[12,1],dtype=tf.float32,initializer=tf.zeros_initializer())
    W3 = tf.get_variable(name="W3",shape=[6,12],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable(name="b3",shape=[6,1],dtype=tf.float32,initializer=tf.zeros_initializer())
    ### END CODE HERE ###


    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters

# Check the result
tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

# Output:
W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>

As expected, the parameters have only been allocated; they have not been assigned values yet, because nothing has been executed in a session.
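
If you do run the initializer inside a session, those same variables evaluate to concrete numbers. A quick sketch of this check (my own addition, not graded code):

tf.reset_default_graph()
parameters = initialize_parameters()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    W1_value, b1_value = sess.run([parameters["W1"], parameters["b1"]])
    print(W1_value.shape)    # (25, 12288) -- now a concrete numpy array
    print(b1_value[:3].T)    # [[0. 0. 0.]] -- the zero-initialized bias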

2.3 - Forward propagation

We will now implement forward propagation in TensorFlow. The function takes a dictionary of parameters and completes the forward pass, using the following:

  • tf.add(...,...) to do an addition
  • tf.matmul(...,...) to do a matrix multiplication
  • tf.nn.relu(...) to apply the ReLU activation

We implement the forward pass of the network and compare the TensorFlow code with the NumPy version. The important difference is that the TF forward pass stops at Z3: in TensorFlow the output of the last linear layer (Z3) is fed directly into the cost function, so A3 is not needed.

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters
​
    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    ### START CODE HERE ### (approx. 5 lines)              # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1,X),b1)  # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)  # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2,A1),b2)  # Z2 = np.dot(W2, a1) + b2
    A2 = tf.nn.relu(Z2)  # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3,A2),b3)  # Z3 = np.dot(W3,A2) + b3
    ### END CODE HERE ###

    return Z3

# Check the result
tf.reset_default_graph()
with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))
# Output:
Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)

You may have noticed that the forward pass does not return any cache; you will see why when we get to backpropagation.

2.4 - Computing the cost

As mentioned before, the cost is easy to compute:

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))

Since Z3 has shape [N_3, number of examples] and the labels have shape [num_classes, number of examples], both have to be transposed before computing the cost. In addition, tf.reduce_mean averages over all the examples.

# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=labels))
    ### END CODE HERE ###

    return cost

# Check the result
tf.reset_default_graph()
with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))
# Output:
cost = Tensor("Mean:0", shape=(), dtype=float32)

2.5 - Backpropagation & parameter updates

Thanks to the framework, all of backpropagation and the parameter update are handled in one line of code. After computing the cost, you create an "optimizer" object. When you run the session you must call it (via sess.run) together with the cost; when called, it optimizes the given cost with the chosen method and learning rate.

For example, for gradient descent:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To perform the optimization step, you would run:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

When coding, we often use _ as a throwaway variable for values we will not need later. Here _ holds the evaluated value of optimizer, which we do not need (and c holds the value of cost).

2.6 - Building the model

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,
          num_epochs=1500, minibatch_size=32, print_cost=True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.

    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)  # to keep consistent results
    seed = 3  # to keep consistent results
    (n_x, m) = X_train.shape  # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]  # n_y : output size
    costs = []  # To keep track of the cost

    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x=n_x,n_y=n_y)
    ### END CODE HERE ###


    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X=X,parameters=parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3=Z3,Y=Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)  # at first I forgot to call .minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.  # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                #Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feedict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _, minibatch_cost = sess.run([optimizer,cost],feed_dict={X:minibatch_X,Y:minibatch_Y})
                ### END CODE HERE ###

                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every epoch
            if print_cost == True and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # lets save the parameters in a variable
        parameters = sess.run(parameters)  # evaluates the tf.Variable tensors and returns their values as numpy arrays, so `parameters` now holds the trained weights
        print("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters

Now let's actually run the model. Note that this run takes about 5-8 minutes; if at epoch 100 your epoch_cost differs significantly from 1.01645776539, stop immediately and go back to check where things went wrong.

parameters = model(X_train, Y_train, X_test, Y_test)

# Output:
epoch = 0    epoch_cost = 1.85570189447
epoch = 100    epoch_cost = 1.01645776539
epoch = 200    epoch_cost = 0.733102379423
epoch = 300    epoch_cost = 0.572938936226
epoch = 400    epoch_cost = 0.468773578604
epoch = 500    epoch_cost = 0.3810211113
epoch = 600    epoch_cost = 0.313826778621
epoch = 700    epoch_cost = 0.254280460603
epoch = 800    epoch_cost = 0.203799342567
epoch = 900    epoch_cost = 0.166511993291
epoch = 1000    epoch_cost = 0.140936921718
epoch = 1100    epoch_cost = 0.107750129745
epoch = 1200    epoch_cost = 0.0862994250475
epoch = 1300    epoch_cost = 0.0609485416137
epoch = 1400    epoch_cost = 0.0509344103436
Parameters have been trained!
Train Accuracy: 0.999074
Test Accuracy: 0.725

[Figure: training cost vs. epochs, learning rate = 0.0001]

Thoughts:

  • Your model seems big enough to fit the training set well. However, given the gap between train and test accuracy, you could try adding L2 or dropout regularization to reduce overfitting (see the sketch after this list).
  • Think of the session as a block of code that trains the model. Each time you run the session on a minibatch, it trains the parameters. In total you run the session many times (1500 epochs) until you obtain well-trained parameters.
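
For the first point, here is a minimal sketch of adding L2 regularization to this model's cost (my own modification, not part of the assignment; lambd is a hyperparameter you would have to tune):

# inside model(), right after cost = compute_cost(Z3, Y)
lambd = 0.01                                  # assumed regularization strength
l2 = lambd * (tf.nn.l2_loss(parameters["W1"])
              + tf.nn.l2_loss(parameters["W2"])
              + tf.nn.l2_loss(parameters["W3"]))
cost = cost + l2                              # the optimizer then minimizes the regularized cost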

2.7 - Test your own image (optional)

import scipy
from PIL import Image
from scipy import ndimage

## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "thumbs_up.jpg"
## END CODE HERE ##

# We preprocess your image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)

plt.imshow(image)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))

# Output:
Your algorithm predicts: y = 3


You do indeed deserve a "thumbs up" (the sign for 1), although, as you can see, the algorithm seems to classify it incorrectly. The reason is that the training set does not contain any "thumbs up" pictures, so the model does not know how to deal with it! We call this a "mismatched data distribution"; it is one of the topics of the next course, "Structuring Machine Learning Projects".

Summary:

  • The two main object classes in TF are Tensors and Operators.
  • When you code in TensorFlow, you have to take the following steps:
      • create a graph containing Tensors (Variables, Placeholders, ...) and Operations (tf.matmul, tf.add, ...)
      • create a session
      • initialize the session
      • run the session to execute the graph
  • As you saw in model(), the same static graph can be executed multiple times.
  • Backpropagation and optimization are performed automatically when you run the session on the "optimizer" object.
