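Deep Learning Series: cs231n assignment1 two_layer_net (Part 5)

A note up front: this post completes the fourth task of assignment1, a shallow neural network. Working through it shows how a network computes scores in the forward pass and how it computes gradients in the backward pass to update its parameters.

Outline

Today's task is to build a network with two fully connected layers and a ReLU in between, then use the softmax loss to compute gradients, update the parameters, and make predictions. Compared with the previous post on softmax, the new parts are building the network and passing data through the fully connected layers; the loss is the same softmax loss. As before, formulas and explanations appear wherever they are needed.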

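Getting started

1. Building the network and loading toy data
First we load the packages we need and define the function used to measure relative error: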

# A bit of setup

import numpy as np
import matplotlib.pyplot as plt

from cs231n.classifiers.neural_net import TwoLayerNet


%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
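A quick illustration of how `rel_error` behaves, on made-up numbers. The function is repeated so the snippet runs on its own; the `1e-8` floor in the denominator is what keeps the result finite when both inputs are zero.

```python
import numpy as np

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

# Made-up arrays that differ only in the second entry
a = np.array([1.0, 2.0, 0.0])
b = np.array([1.0, 2.0 + 1e-6, 0.0])

print(rel_error(a, a))  # 0.0: identical arrays
print(rel_error(a, b))  # about 2.5e-07, driven by the second entry
```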

To check that the code runs correctly, we first test on a small toy dataset. We also fix the network's input dimension, hidden layer size, and number of output classes:

# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y

net = init_toy_model()
X, y = init_toy_data()

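2. Writing the score, loss, and gradient functions
As usual, we now open neural_net.py and fill in the tasks there. In earlier posts we wrote the score, loss, and gradient code in one go; here we go block by block, mixing explanation with code.
The shallow neural network
Let's first look at what today's two-layer network actually is. Here is a picture:
[Figure: the two-layer neural network]
This is the shallow two-layer network we study today. The first question is how data passes through it.
Q1: How does data propagate through the network?
How does the data get from the input layer to hidden layer 1? Through a fully connected layer, just as when we computed scores for the SVM and softmax classifiers: the pixels are multiplied by a weight matrix to produce the scores.
[Figure: score computation for a cat image, from the cs231n course materials]
The figure shows the prediction pipeline for a cat image from the cs231n course materials, which is the same as in the previous posts: flatten the image pixels into a vector, multiply by the weight matrix, and add the bias term to get a score for each class. With more layers, the result is multiplied by the next weight matrix, and the last layer's output gives the class scores.
Q2: What is ReLU?
The assignment asks for a ReLU between the first and second layers. So what is the ReLU function? Its graph looks like this:
[Figure: graph of the ReLU function]
The ReLU function has the form
$f(x)=\max(0,x)$
After the first layer produces its output, we apply ReLU to it so that every value below 0 becomes 0; in other words, negative values are ignored. Some readers will ask: why do we need ReLU at all? Can we skip it? Would some other function work? Some nonlinearity is required, because without one the two layers collapse into a single linear map and the network is no more expressive than the linear classifiers from the earlier posts. As for ReLU in particular, it resembles the activation behaviour of biological neurons, and it produces sparse activations and is cheap to compute. For more detail, see the blog post linked here. I will come back with an analysis and discussion after reading the papers.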
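A small sketch on made-up numbers: ReLU zeroes the negative entries, and, as a side note on why a nonlinearity matters, two stacked linear layers without one are equivalent to a single linear layer. The arrays `A`, `W_a`, and `W_b` are illustrative only and are not part of the assignment.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(np.maximum(0, x))  # [0.  0.  0.  1.5]

np.random.seed(0)
A = np.random.randn(5, 4)     # 5 samples, 4 features
W_a = np.random.randn(4, 10)  # first linear layer
W_b = np.random.randn(10, 3)  # second linear layer

# Without a nonlinearity, two layers equal one layer with weights W_a @ W_b
two_linear = (A @ W_a) @ W_b
one_linear = A @ (W_a @ W_b)
print(np.allclose(two_linear, one_linear))  # True

# With ReLU in between, the result is no longer that single linear map
with_relu = np.maximum(0, A @ W_a) @ W_b
print(np.allclose(with_relu, one_linear))
```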
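Task 1: writing the score function
Feeding the ReLU output into the next layer then gives our scores. Here is the first piece of code we need to write, which computes the score of every sample for every class. Pay attention to matching matrix dimensions: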

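"""
X: (N, D). Each X[i] is one training sample; there are N samples in total.
W1: first-layer weights; (D, H)
b1: first-layer biases; (H,)
W2: second-layer weights; (H, C)
b2: second-layer biases; (C,)
"""
layer1 = np.dot(X, W1) + b1  # output (N, H)
reluLayer = np.maximum(0, layer1)  # output (N, H)
scores = np.dot(reluLayer, W2) + b2  # output (N, C)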
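To see the shapes line up, here is the same forward pass on random arrays with N=5, D=4, H=10, C=3 (the sizes used by the toy model; the random values are placeholders). The biases broadcast across the N rows.

```python
import numpy as np

np.random.seed(0)
N, D, H, C = 5, 4, 10, 3
X = np.random.randn(N, D)
W1, b1 = 1e-1 * np.random.randn(D, H), np.zeros(H)
W2, b2 = 1e-1 * np.random.randn(H, C), np.zeros(C)

layer1 = np.dot(X, W1) + b1           # (N, D) x (D, H) + (H,) -> (N, H)
reluLayer = np.maximum(0, layer1)     # elementwise, still (N, H)
scores = np.dot(reluLayer, W2) + b2   # (N, H) x (H, C) + (C,) -> (N, C)

print(layer1.shape, reluLayer.shape, scores.shape)  # (5, 10) (5, 10) (5, 3)
```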

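That completes the first task. The assignment provides a matching check, which we run further below. With the scores done, we need to compute the loss. We again use the softmax loss from the previous post; see that post for the exact formula.
Task 2: writing the softmax loss
The loss is computed from a transformation of the scores, so the code is: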

scores = scores - np.max(scores, axis=1).reshape(-1,1)
softmaxFucntion = np.exp(scores)/np.sum(np.exp(scores), axis=1).reshape(-1,1)
loss = np.sum(-np.log(softmaxFucntion[range(N), list(y)]))
loss /= N
loss += 0.5*reg*np.sum(W1*W1) + 0.5*reg*np.sum(W2*W2)

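Two points deserve attention here. First, we subtract each row's maximum from the scores before exponentiating, which keeps the computation numerically stable without changing the softmax output. Second, the regularization term must cover both weight matrices. The factor 0.5 on each term is a convention that makes the gradient of the penalty simply reg*W. I am not yet fluent with regularization and will update this after studying it properly. The loss also has a matching test, shown later.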
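A minimal sketch of the stability point, with deliberately large made-up scores. Exponentiating them directly overflows, while subtracting the row maximum first gives valid probabilities.

```python
import numpy as np

scores = np.array([[1000.0, 1001.0, 1002.0]])

# Naive version: exp(1000) overflows to inf, so inf / inf gives nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)

# Stable version: shift so the largest score in each row is 0
shifted = scores - np.max(scores, axis=1, keepdims=True)
stable = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

print(np.isnan(naive).all())  # True
print(stable)                 # roughly [[0.090, 0.245, 0.665]]
```

The shift cancels in the ratio, which is why the probabilities are unchanged for scores where the naive version does not overflow.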
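Task 3: writing the gradient of the shallow network
With the scores and loss computed, we need gradients to update the parameters. The full derivation follows; it works backwards using the chain rule for partial derivatives.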
First, the forward computation is
$layer1 = X\times W_1+b_1$
$reluLayer = \max(0,layer1)$
$scores = layer2 = reluLayer\times W_2+b_2$
$loss=softmax(layer2)$
Then we work backwards through the gradients. Writing $P$ for the softmax probabilities and $Y$ for the one-hot labels, the softmax loss averaged over $N$ samples gives $\frac{\partial loss}{\partial scores}=\frac{1}{N}(P-Y)$, and the chain rule then gives

$\frac{\partial loss}{\partial W_2}=reluLayer^T\times\frac{\partial loss}{\partial scores}$
$\frac{\partial loss}{\partial b_2}=(1,...,1)_N\times\frac{\partial loss}{\partial scores}$
$\frac{\partial loss}{\partial reluLayer}=\frac{\partial loss}{\partial scores}\times W_2^T$
$\frac{\partial loss}{\partial layer1}=\frac{\partial loss}{\partial reluLayer}\odot 1[layer1>0]$
$\frac{\partial loss}{\partial W_1}=X^T\times\frac{\partial loss}{\partial layer1}$
$\frac{\partial loss}{\partial b_1}=(1,...,1)_N\times\frac{\partial loss}{\partial layer1}$
This gives the gradient of every parameter, so we can code the gradient computation, update the weights, and iterate. One more point: after computing the weight gradients for the current step, we add reg*W to each of them. This is not an L1 term; it is the derivative of the L2 penalty $0.5\cdot reg\cdot\sum W^2$ used in the loss, so the gradient stays consistent with the loss.

dscores = softmaxFucntion.copy()
dscores[range(N), list(y)] -= 1
dscores /= N
dW2 = np.dot(reluLayer.T, dscores)
db2 = np.sum(dscores, axis=0)

drelu = np.dot(dscores, W2.T)
drelu[reluLayer <= 0] = 0

dW1 = np.dot(X.T, drelu)
db1 = np.sum(drelu, axis=0)

dW2 += reg*W2
dW1 += reg*W1

grads['W1'] = dW1
grads['b1'] = db1
grads['W2'] = dW2
grads['b2'] = db2
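The first three lines of the backward pass rely on the gradient of the softmax loss with respect to the scores. A short derivation for a single sample follows; the symbols $s$, $p$, and $y$ are introduced here for illustration and do not appear in the assignment code.

```latex
% Softmax loss for one sample with scores s_1, ..., s_C and true class y
L = -\log p_y, \qquad p_k = \frac{e^{s_k}}{\sum_j e^{s_j}}

% Expand the logarithm
L = -s_y + \log \sum_j e^{s_j}

% Differentiate with respect to each score s_k
\frac{\partial L}{\partial s_k}
  = -1[k = y] + \frac{e^{s_k}}{\sum_j e^{s_j}}
  = p_k - 1[k = y]
```

Averaging over the N samples adds the factor 1/N, which matches `dscores[range(N), list(y)] -= 1` followed by `dscores /= N`.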

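3. Writing the training and prediction functions
With the gradients computed, we need to update the weights and train the model, so we write the training function. The assignment only asks us to fill in two pieces: sampling a minibatch for stochastic gradient descent, and the parameter update: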

idx_batch = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[idx_batch]
y_batch = y[idx_batch]

self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']
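A minimal sketch of what the sampling lines do, on made-up arrays (`X_demo` and `y_demo` are placeholders for illustration). With `replace=True` the same row may be drawn more than once in a batch; `replace=False` would avoid that but requires `batch_size <= num_train`.

```python
import numpy as np

np.random.seed(0)

# Made-up data: 6 samples with 2 features each; row i is [2i, 2i+1], label i
X_demo = np.arange(12).reshape(6, 2)
y_demo = np.arange(6)

num_train, batch_size = X_demo.shape[0], 4
idx_batch = np.random.choice(num_train, batch_size, replace=True)

# Integer-array indexing picks the same rows from data and labels
X_batch = X_demo[idx_batch]
y_batch = y_demo[idx_batch]

print(idx_batch)
print(X_batch.shape, y_batch.shape)  # (4, 2) (4,)
```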

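Finally, since the goal is to predict the class of each sample, we also write a prediction function. As before, each sample is assigned the class with the highest score: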

W1 = self.params['W1']
b1 = self.params['b1']
W2 = self.params['W2']
b2 = self.params['b2']
layer1 = np.dot(X, W1)+b1
reluLayer = np.maximum(0, layer1)
scores = np.dot(reluLayer, W2) + b2
y_pred = np.argmax(scores, axis=1)

That completes all the tasks in neural_net.py. The full code is:

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
from past.builtins import xrange

class TwoLayerNet(object):
  """
  A two-layer fully-connected neural network. The net has an input dimension of
  D, a hidden layer dimension of H, and performs classification over C classes.
  We train the network with a softmax loss function and L2 regularization on the
  weight matrices. The network uses a ReLU nonlinearity after the first fully
  connected layer.

  In other words, the network has the following architecture:

  input - fully connected layer - ReLU - fully connected layer - softmax

  The outputs of the second fully-connected layer are the scores for each class.
  """

  def __init__(self, input_size, hidden_size, output_size, std=1e-4):
    """
    Initialize the model. Weights are initialized to small random values and
    biases are initialized to zero. Weights and biases are stored in the
    variable self.params, which is a dictionary with the following keys:

    W1: First layer weights; has shape (D, H)
    b1: First layer biases; has shape (H,)
    W2: Second layer weights; has shape (H, C)
    b2: Second layer biases; has shape (C,)

    Inputs:
    - input_size: The dimension D of the input data.
    - hidden_size: The number of neurons H in the hidden layer.
    - output_size: The number of classes C.
    """
    self.params = {}
    self.params['W1'] = std * np.random.randn(input_size, hidden_size)
    self.params['b1'] = np.zeros(hidden_size)
    self.params['W2'] = std * np.random.randn(hidden_size, output_size)
    self.params['b2'] = np.zeros(output_size)

  def loss(self, X, y=None, reg=0.0):
    """
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape

    # Compute the forward pass
    #############################################################################
    # TODO: Perform the forward pass, computing the class scores for the input. #
    # Store the result in the scores variable, which should be an array of      #
    # shape (N, C).                                                             #
    #############################################################################
    layer1 = np.dot(X, W1) + b1
    reluLayer = np.maximum(0, layer1)
    scores = np.dot(reluLayer, W2) + b2
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    # If the targets are not given then jump out, we're done
    if y is None:
      return scores

    # Compute the loss
    #############################################################################
    # TODO: Finish the forward pass, and compute the loss. This should include  #
    # both the data loss and L2 regularization for W1 and W2. Store the result  #
    # in the variable loss, which should be a scalar. Use the Softmax           #
    # classifier loss.                                                          #
    #############################################################################
    scores = scores - np.max(scores, axis=1).reshape(-1,1)
    softmaxFucntion = np.exp(scores)/np.sum(np.exp(scores), axis=1).reshape(-1,1)
    loss = np.sum(-np.log(softmaxFucntion[range(N), list(y)]))
    loss /= N
    loss += 0.5*reg*np.sum(W1*W1) + 0.5*reg*np.sum(W2*W2)
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    # Backward pass: compute gradients
    grads = {}
    #############################################################################
    # TODO: Compute the backward pass, computing the derivatives of the weights #
    # and biases. Store the results in the grads dictionary. For example,       #
    # grads['W1'] should store the gradient on W1, and be a matrix of same size #
    #############################################################################
    dscores = softmaxFucntion.copy()
    dscores[range(N), list(y)] -= 1
    dscores /= N
    dW2 = np.dot(reluLayer.T, dscores)
    db2 = np.sum(dscores, axis=0)

    drelu = np.dot(dscores, W2.T)
    drelu[reluLayer <= 0] = 0

    dW1 = np.dot(X.T, drelu)
    db1 = np.sum(drelu, axis=0)

    dW2 += reg*W2
    dW1 += reg*W1

    grads['W1'] = dW1
    grads['b1'] = db1
    grads['W2'] = dW2
    grads['b2'] = db2
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    return loss, grads

  def train(self, X, y, X_val, y_val,
            learning_rate=1e-3, learning_rate_decay=0.95,
            reg=5e-6, num_iters=100,
            batch_size=200, verbose=False):
    """
    Train this neural network using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) giving training data.
    - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
      X[i] has label c, where 0 <= c < C.
    - X_val: A numpy array of shape (N_val, D) giving validation data.
    - y_val: A numpy array of shape (N_val,) giving validation labels.
    - learning_rate: Scalar giving learning rate for optimization.
    - learning_rate_decay: Scalar giving factor used to decay the learning rate
      after each epoch.
    - reg: Scalar giving regularization strength.
    - num_iters: Number of steps to take when optimizing.
    - batch_size: Number of training examples to use per step.
    - verbose: boolean; if true print progress during optimization.
    """
    num_train = X.shape[0]
    iterations_per_epoch = max(num_train / batch_size, 1)

    # Use SGD to optimize the parameters in self.model
    loss_history = []
    train_acc_history = []
    val_acc_history = []

    for it in xrange(num_iters):
      #########################################################################
      # TODO: Create a random minibatch of training data and labels, storing  #
      # them in X_batch and y_batch respectively.                             #
      #########################################################################
      idx_batch = np.random.choice(num_train, batch_size, replace=True)
      X_batch = X[idx_batch]
      y_batch = y[idx_batch]
      #########################################################################
      #                             END OF YOUR CODE                          #
      #########################################################################

      # Compute loss and gradients using the current minibatch
      loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
      loss_history.append(loss)
      #########################################################################
      # TODO: Use the gradients in the grads dictionary to update the         #
      # parameters of the network (stored in the dictionary self.params)      #
      # using stochastic gradient descent. You'll need to use the gradients   #
      # stored in the grads dictionary defined above.                         #
      #########################################################################
      self.params['W1'] -= learning_rate * grads['W1']
      self.params['b1'] -= learning_rate * grads['b1']
      self.params['W2'] -= learning_rate * grads['W2']
      self.params['b2'] -= learning_rate * grads['b2']
      #########################################################################
      #                             END OF YOUR CODE                          #
      #########################################################################

      if verbose and it % 100 == 0:
        print('iteration %d / %d: loss %f' % (it, num_iters, loss))

      # Every epoch, check train and val accuracy and decay learning rate.
      if it % iterations_per_epoch == 0:
        # Check accuracy
        train_acc = (self.predict(X_batch) == y_batch).mean()
        val_acc = (self.predict(X_val) == y_val).mean()
        train_acc_history.append(train_acc)
        val_acc_history.append(val_acc)

        # Decay learning rate
        learning_rate *= learning_rate_decay

    return {
      'loss_history': loss_history,
      'train_acc_history': train_acc_history,
      'val_acc_history': val_acc_history,
    }

  def predict(self, X):
    """
    Use the trained weights of this two-layer network to predict labels for
    data points. For each data point we predict scores for each of the C
    classes, and assign each data point to the class with the highest score.

    Inputs:
    - X: A numpy array of shape (N, D) giving N D-dimensional data points to
      classify.

    Returns:
    - y_pred: A numpy array of shape (N,) giving predicted labels for each of
      the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
      to have class c, where 0 <= c < C.
    """
    y_pred = None

    ###########################################################################
    # TODO: Implement this function; it should be VERY simple!                #
    ###########################################################################
    W1 = self.params['W1']
    b1 = self.params['b1']
    W2 = self.params['W2']
    b2 = self.params['b2']
    layer1 = np.dot(X, W1)+b1
    reluLayer = np.maximum(0, layer1)
    scores = np.dot(reluLayer, W2) + b2
    y_pred = np.argmax(scores, axis=1)
    ###########################################################################
    #                              END OF YOUR CODE                           #
    ###########################################################################

    return y_pred

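4. Running the assignment code
4.1 Checking the scores
We now continue with the code in the assignment notebook. With the score function written, we check its output against precomputed reference values. If the difference is below 1e-7, we consider the score function correct: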

scores = net.loss(X)
print('Your scores:')
print(scores)
print()
print('correct scores:')
correct_scores = np.asarray([
  [-0.81233741, -1.27654624, -0.70335995],
  [-0.17129677, -1.18803311, -0.47310444],
  [-0.51590475, -1.01354314, -0.8504215 ],
  [-0.15419291, -0.48629638, -0.52901952],
  [-0.00618733, -0.12435261, -0.15226949]])
print(correct_scores)
print()

# The difference should be very small. We get < 1e-7
print('Difference between your scores and correct scores:')
print(np.sum(np.abs(scores - correct_scores)))

Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.6802720496109664e-08

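The difference is below the 1e-7 threshold, so we consider the score function correct. Next we check the loss.
4.2 Checking the loss
Note that reg is set to 0.1 here to bring the error below 1e-12. Because our loss uses the 0.5*reg convention, reg=0.1 puts a coefficient of 0.05 on the sum of squared weights: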

loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133

# should be very small, we get < 1e-12
print('Difference between your loss and correct loss:')

print(np.sum(np.abs(loss - correct_loss)))

Difference between your loss and correct loss:
1.7985612998927536e-13

4.3 Checking the gradients

from cs231n.gradient_check import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.05)

# these should all be less than 1e-8 or so
for param_name in grads:
    f = lambda W: net.loss(X, y, reg=0.05)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
    print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))

W1 max relative error: 3.561318e-09
b1 max relative error: 1.555470e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 3.865091e-11

Likewise, with every gradient error below 1e-8, we consider the gradient code correct.
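For intuition about what a numerical gradient check does, here is a minimal centered-difference sketch. It is not the course's `eval_numerical_gradient`; the helper `numerical_gradient` and the test function below are written here for illustration only.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of a scalar function f.

    Each entry of x is perturbed in place by +h and -h, then restored.
    """
    grad = np.zeros_like(x)
    for ix in np.ndindex(*x.shape):
        old = x[ix]
        x[ix] = old + h
        f_plus = f(x)
        x[ix] = old - h
        f_minus = f(x)
        x[ix] = old  # restore the original value
        grad[ix] = (f_plus - f_minus) / (2 * h)
    return grad

# Illustrative function: f(W) = 0.5 * sum(W^2), whose exact gradient is W
np.random.seed(0)
W = np.random.randn(3, 4)
num_grad = numerical_gradient(lambda w: 0.5 * np.sum(w * w), W)
max_abs_diff = np.max(np.abs(num_grad - W))
print(max_abs_diff)  # tiny; only floating-point rounding remains
```

The same idea applies to the network: perturb one parameter entry at a time, re-evaluate the loss, and compare the estimate with the analytic gradient.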
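4.4 Visualizing the loss on the toy data
With scores, loss, and gradients all verified, we can train the model and update the parameters, and watch how the loss changes: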

net = init_toy_model()
stats = net.train(X, y, X, y,
            learning_rate=1e-1, reg=5e-6,
            num_iters=100, verbose=False)

print('Final training loss: ', stats['loss_history'][-1])

# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()

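[Figure: training loss over iterations on the toy data]
The plot shows the loss dropping quickly as the iterations proceed and levelling off just above 0 after about 15 iterations, so the training behaviour is acceptable. With the toy data done, it is time to run on the real CIFAR-10 image data.
4.5 Loading CIFAR-10 and training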

from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)
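The mean subtraction and reshape inside `get_CIFAR10_data` can be sketched on random stand-in arrays (no CIFAR-10 files needed; the array sizes and `_demo` names below are made up for illustration). Note that the mean image comes from the training split only and is reused for the validation split.

```python
import numpy as np

np.random.seed(0)
# Stand-in for image data: 8 training and 2 validation images of 32x32x3
X_train_demo = np.random.rand(8, 32, 32, 3) * 255
X_val_demo = np.random.rand(2, 32, 32, 3) * 255

# Mean image computed on the training split only
mean_image = np.mean(X_train_demo, axis=0)
X_train_demo -= mean_image
X_val_demo -= mean_image

# Flatten each image into one row of 32*32*3 = 3072 values
X_train_demo = X_train_demo.reshape(X_train_demo.shape[0], -1)
X_val_demo = X_val_demo.reshape(X_val_demo.shape[0], -1)

print(X_train_demo.shape, X_val_demo.shape)  # (8, 3072) (2, 3072)
```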

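As before, we split the data into training, validation, and test sets, and use the validation set to check the model trained on the training set: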

input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.25, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)

iteration 0 / 1000: loss 2.302777
iteration 100 / 1000: loss 2.302281
iteration 200 / 1000: loss 2.296825
iteration 300 / 1000: loss 2.256667
iteration 400 / 1000: loss 2.229428
iteration 500 / 1000: loss 2.149038
iteration 600 / 1000: loss 2.078593
iteration 700 / 1000: loss 2.052749
iteration 800 / 1000: loss 1.976308
iteration 900 / 1000: loss 2.036181
Validation accuracy:  0.286

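The overall accuracy is only 0.286, which is not great. Some readers will say this is worse than the plain softmax classifier from before. Has all this effort with a neural network really produced something worse than softmax alone? Not quite.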
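One thing the log does confirm is that the starting point is sane. With near-zero initial weights the network assigns roughly equal probability to each of the 10 classes, so the softmax loss should start close to ln(10); the regularization term is tiny at initialization. A quick check:

```python
import numpy as np

num_classes = 10
# Uniform prediction: probability 1/10 for the true class
expected_initial_loss = -np.log(1.0 / num_classes)
print(expected_initial_loss)  # about 2.302585

# First loss reported in the training log
logged_initial_loss = 2.302777
print(abs(logged_initial_loss - expected_initial_loss) < 1e-3)  # True
```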
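4.6 Debugging the training run
When results are this poor, we usually plot the loss or accuracy curves to watch the whole update process, or visualize W1 to see directly what the learned weights look like, and use that to diagnose problems with the model's hyperparameters: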

# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()  # show the 'train' / 'val' labels set above
plt.show()

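[Figure: loss history and train/validation accuracy history]
The plot shows that the loss barely changes before about iteration 200, unlike the loss curve we saw on the toy data. A likely cause is a learning rate that is too small, which makes progress too slow. The accuracy curve flattens out around 0.29; we can try increasing the number of hidden units, that is, the hidden dimension, so the model can make fuller use of the information. This is a bit like adding more filters in a CNN.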
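To make the learning-rate point concrete, here is a rough calculation of how the rate evolves under the settings used in this run (1000 iterations, batch size 200, 49000 training samples, decay 0.95). It mirrors the `it % iterations_per_epoch == 0` rule in `train`; the variable names are only for this sketch.

```python
num_train, batch_size, num_iters = 49000, 200, 1000
learning_rate, learning_rate_decay = 1e-4, 0.95

# Integer division here; train() uses / which gives 245.0 for these values
iterations_per_epoch = max(num_train // batch_size, 1)

decay_steps = []
for it in range(num_iters):
    if it % iterations_per_epoch == 0:
        learning_rate *= learning_rate_decay
        decay_steps.append(it)

print(iterations_per_epoch)  # 245
print(decay_steps)           # [0, 245, 490, 735, 980]
print(learning_rate)         # about 7.74e-05
```

So 1000 iterations cover only about four passes over the training data, and the already small rate shrinks further along the way.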
Next, let's visualize W1 to see what the trained weights look like:

from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(net)

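[Figure: visualization of the W1 weights]
The W1 templates all look blurry, and even very similar to one another, so the result is not good. With neural networks we therefore often spend a lot of time tuning: among many candidate settings we look for the one that does best on the validation set and use it for prediction. This process is called hyperparameter tuning.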
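The reshape inside `show_net_weights` can be sketched on a random stand-in for W1. The course's `visualize_grid` is not reproduced here; the min-max rescaling to 0..255 below is an assumption about the kind of normalization needed for display, written for illustration.

```python
import numpy as np

np.random.seed(0)
hidden = 50
# Random stand-in for net.params['W1'], shape (3072, hidden)
W1_demo = 1e-4 * np.random.randn(32 * 32 * 3, hidden)

# Each column of W1 is one hidden unit's template; turn it back into an image
templates = W1_demo.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
print(templates.shape)  # (50, 32, 32, 3)

# Rescale one template to the 0..255 range so it can be shown as an image
t = templates[0]
img = 255.0 * ((t - t.min()) / (t.max() - t.min()))
print(img.min(), img.max())  # 0.0 255.0
```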
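4.7 Task: tuning hyperparameters on the validation set
Now we train with different hyperparameters, validate each setting, and keep the one with the best validation accuracy. This code we have to write ourselves. We first list the candidate values, starting from the default settings, and then rebuild the network for each combination, including several choices of hidden layer size. The code is: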

learning_rate = [1e-4, 4e-4, 8e-4, 16e-4, 32e-4]
learning_rate_decay = 0.9
regList = [0.25, 0.5, 0.75, 1.0]
num_iters = 4000
batch_size = 200

input_size = 32 * 32 * 3
hidden_size = [50, 100, 150]
num_classes = 10

best_net = None
best_lr = None
best_reg = None
best_hidden_size = None
best_val = -1
results = {}  # keyed by (hidden_size, lr, reg) so runs do not overwrite each other
for hs in hidden_size:
    for lr in learning_rate:
        for reg in regList:
            net = TwoLayerNet(input_size, hs, num_classes)
            stat = net.train(X_train, y_train, X_val, y_val,
                            learning_rate=lr, learning_rate_decay=learning_rate_decay,
                            reg=reg, num_iters=num_iters,
                            batch_size=batch_size)
            # Last accuracies recorded in the training history
            train_acc = stat['train_acc_history'][-1]
            val_acc = stat['val_acc_history'][-1]
            if val_acc > best_val:
                best_net = net
                best_lr = lr
                best_reg = reg
                best_hidden_size = hs
                best_val = val_acc
            results[(hs, lr, reg)] = train_acc, val_acc
            print('hidden_size:%d, lr %e reg %e train accuracy: %f val accuracy: %f' % (
                hs, lr, reg, train_acc, val_acc))

# Report the best configuration using its own hidden size and results entry
best_train_acc, best_val_acc = results[(best_hidden_size, best_lr, best_reg)]
print('best_hidden_size:%d, best_lr %e best_reg %e train accuracy: %f val accuracy: %f' % (
    best_hidden_size, best_lr, best_reg, best_train_acc, best_val_acc))

Overall this is much like the earlier tuning code, with one extra loop over the hidden layer size. Part of the output is shown below:

hidden_size:50, lr 1.000000e-04 reg 2.500000e-01 train accuracy: 0.385000 val accuracy: 0.367000
hidden_size:50, lr 1.000000e-04 reg 5.000000e-01 train accuracy: 0.390000 val accuracy: 0.373000
hidden_size:50, lr 1.000000e-04 reg 7.500000e-01 train accuracy: 0.365000 val accuracy: 0.368000
hidden_size:100, lr 1.600000e-03 reg 5.000000e-01 train accuracy: 0.710000 val accuracy: 0.526000
hidden_size:100, lr 1.600000e-03 reg 7.500000e-01 train accuracy: 0.570000 val accuracy: 0.532000
hidden_size:100, lr 1.600000e-03 reg 1.000000e+00 train accuracy: 0.640000 val accuracy: 0.513000
hidden_size:150, lr 3.200000e-03 reg 2.500000e-01 train accuracy: 0.770000 val accuracy: 0.535000
hidden_size:150, lr 3.200000e-03 reg 5.000000e-01 train accuracy: 0.635000 val accuracy: 0.528000
hidden_size:150, lr 3.200000e-03 reg 7.500000e-01 train accuracy: 0.705000 val accuracy: 0.534000
best_hidden_size:150, best_lr 3.200000e-03 best_reg 2.500000e-01 train accuracy: 0.770000 val accuracy: 0.535000

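The best result uses 150 hidden units, a learning rate of 3.2e-3, and reg=0.25. This supports the earlier point that a larger hidden layer and a larger learning rate improve accuracy and reduce the loss faster. Both values sit at the upper edge of the ranges we searched, so even larger values may be worth trying.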
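The nested search loops can also be written as one loop over the Cartesian product of the candidate values. The sketch below only enumerates the grid and uses a placeholder scoring function, `fake_val_acc`, which is made up for illustration and trains nothing; keying the results by the full `(hidden_size, lr, reg)` tuple keeps runs from overwriting each other.

```python
import itertools

hidden_sizes = [50, 100, 150]
learning_rates = [1e-4, 4e-4, 8e-4, 16e-4, 32e-4]
regs = [0.25, 0.5, 0.75, 1.0]

def fake_val_acc(hidden, lr, reg):
    """Placeholder score for illustration, NOT a real training run."""
    return hidden * lr / (1.0 + reg)

results = {}
for hidden, lr, reg in itertools.product(hidden_sizes, learning_rates, regs):
    results[(hidden, lr, reg)] = fake_val_acc(hidden, lr, reg)

best_config = max(results, key=results.get)
print(len(results))  # 60 configurations
print(best_config)   # (150, 0.0032, 0.25) for this placeholder score
```

In a real search, `fake_val_acc` would be replaced by a training run followed by a validation accuracy measurement.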
Let's see what the weights of the best network look like:

# visualize the weights of the best network
show_net_weights(best_net)

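[Figure: visualization of the best network's W1 weights]
4.8 Running on the test set
After all this work, let's finally run on the test set. The moment of truth has arrived: the assignment asks for an accuracy above 48%, so let's run the code: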

test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)

Test accuracy:  0.546

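Very nice! We reach an accuracy of 54.6%.

Closing remarks
That wraps up our look at the shallow neural network. We now have a rough picture of how a neural network computes, and have experienced the joy of tuning hyperparameters. Some parts are still shaky for me, such as derivatives of matrices with respect to vectors, better ranges for the hyperparameters, the details of regularization, and exactly why ReLU is used. All of this needs continued study to update my understanding, and I hope to fill in the gaps soon.
Thanks for reading.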

明曦君
