[AI] Neural networks from scratch: shallow neural networks (with example code)


I. Preamble

  Two days ago we wrote the analysis of the single neuron; continuing that effort, here we work through the shallow neural network. A shallow neural network is simply a neural network with relatively few layers; even with only a few layers, its performance is considerably stronger than that of a single neuron.

  Note: This article is mainly a summary of the captainbed tutorial series; the captainbed Artificial Intelligence tutorial series (https://www.captainbed.net/) is highly recommended.

II. The composition of a shallow neural network

  Recalling the structure of the single neuron from the previous article, we know that a neuron involves four key functions (a minimal refresher sketch follows the list):

  1) Forward propagation function: computes a value from the input x, the weights w, and the bias b.

  2) Activation function: maps that value to a probability y between 0 and 1, which can be read as (yes, no).

  3) Backpropagation function: uses y and the answer labels to compute dw and db (which are used to update w and b).

  4) Loss function: computes the error between y and the labels.
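
  As a quick refresher, here is a minimal sketch of those four functions for a single neuron. This is my own sketch rather than the exact code of the earlier article, and it assumes a sigmoid activation and a cross-entropy loss, with the m training images stacked as the columns of x:

import numpy as np

def sigmoid(z):
  return 1 / (1 + np.exp(-z))

# 1) forward propagation + 2) activation: x is (784, m), w is (1, 784), b is a scalar
def neuron_forward(x, w, b):
  return sigmoid(np.dot(w, x) + b)          # y has shape (1, m)

# 3) backpropagation: compute dw and db from y and the answer labels
def neuron_backward(x, y, label):
  m = x.shape[1]
  dz = y - label
  dw = np.dot(dz, x.T) / m
  db = np.sum(dz) / m
  return dw, db

# 4) loss: the cross-entropy error between y and the labels
def neuron_cost(y, label):
  m = label.shape[1]
  return -np.sum(label * np.log(y) + (1 - label) * np.log(1 - y)) / m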

 

   Intuitively, a shallow neural network is just a number of neurons stacked together. A simple two-layer neural network has the structure shown below:

 

  It contains an input layer (i.e. X), a hidden layer, and an output layer. The input layer has no neurons of its own, so this counts as a two-layer neural network.

  In the actual implementation we do not compute each neuron's result one by one and then output the results. Instead, we compute the results of an entire layer of neurons at once, and then feed each layer's results in as the input of the next layer. Likewise, the forward propagation of the first layer is not separated from that of the second layer: the network's forward propagation function contains the forward calculations of all layers, and its backpropagation function contains the backward calculations of all layers. As a result, the code actually changes very little in going from the single neuron to the neural network.
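
  In more standard notation (this notation is my own addition, not the original article's, but it matches the code later on), if the m training images are stacked as the columns of a matrix X of shape 784 × m, the whole forward pass of this two-layer network can be written as

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \tanh\left(Z^{[1]}\right),$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma\left(Z^{[2]}\right),$$

  where Z[1] and A[1] have shape 4 × m (one row per hidden neuron) and Z[2] and A[2] have shape 1 × m. In the code below, A[1] and A[2] appear as Y1 and Y2.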

  Below we go straight to the code and walk through it. If you understood the single-neuron analysis earlier, this will be very easy to follow.

  (See the end of the article for how to get the complete code.)

III. Preparation

  1) The problem to solve

  The problem we tackled with the single neuron before, "recognize the digit 9 from a picture", was not very hard; even a single neuron reached an accuracy of 93%. So here we make the problem harder, changing it to "recognize the odd digits from a picture". This only requires a small change to the code. The original code:

  # Convert the labels: digits that are not 9 become 0, and 9 becomes 1
  train_label = np.where(train_label == 9, 1, 0)
  test_label = np.where(test_label == 9, 1, 0)

  is changed into:

  # Set the labels of odd digits (1, 3, 5, 7, 9) to 1 and of even digits (0, 2, 4, 6, 8) to 0
  train_label = np.where((train_label % 2) != 0, 1, 0)
  test_label = np.where((test_label % 2) != 0, 1, 0)

  The result of running the previous single-neuron code on this new problem is shown below; we can see that the prediction accuracy drops substantially.

  2) The network structure to use

  First we have to settle on the number of layers. Here we use a simple network, so there are only two layers. Next is the number of neurons in each layer: the first layer of our network has four neurons, and the second layer is the output layer, so it needs only one neuron. The input layer is the same as before, with 784 inputs (all the pixels of a 28x28 picture). The network structure is shown below:

IV. Random initialization of the parameters

# Initialize the parameters w and b
def initialize_parameters(input_num, hide_num, out_num):
  # input_num: number of neurons in the input layer
  # hide_num: number of neurons in the hidden layer
  # out_num: number of neurons in the output layer
  np.random.seed(2)
  # Randomly initialize the first layer's parameters W, b
  W1 = np.random.randn(hide_num, input_num) * 0.01
  b1 = np.zeros(shape=(hide_num, 1))

  # Randomly initialize the second layer's parameters W, b
  W2 = np.random.randn(out_num, hide_num) * 0.01
  b2 = np.zeros(shape=(out_num, 1))

  return W1, b1, W2, b2

   Initializing the parameters w and b in a shallow neural network differs from the single-neuron case in the following ways (a quick shape check follows the list):

  1) Each layer of neurons uses its own w and b, so there are W1, W2, b1, and b2; if the network had three layers there would also be W3 and b3.

  2) The shapes of w and b differ from layer to layer. For example, the input images have 784 pixels and the first layer has four neurons, so the shape of W1 is (4, 784) and the shape of b1 is (4, 1). The second layer has a single neuron whose only inputs are the four outputs of the first layer, so the shape of W2 is (1, 4) and the shape of b2 is (1, 1).

  3) W1 and W2 are generated randomly. In the earlier single-neuron structure the initial value of w was 0; if we used all zeros here, the final result might end up no better than a single neuron's. Multiplying by 0.01 also ensures the initial values are sufficiently small.
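
  As a quick sanity check (my own snippet, not part of the original article; it assumes numpy has been imported as np and that initialize_parameters above is defined), the shapes come out exactly as described:

W1, b1, W2, b2 = initialize_parameters(784, 4, 1)
print(W1.shape, b1.shape)   # (4, 784) (4, 1)
print(W2.shape, b2.shape)   # (1, 4) (1, 1)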

V. Forward propagation function

# Forward propagation function
def forward(img, W1, b1, W2, b2):
  # First layer
  A1 = np.dot(W1, img) + b1
  Y1 = np.tanh(A1)   # the first and second layers use different activation functions

  # Second layer
  A2 = np.dot(W2, Y1) + b2
  Y2 = sigmoid(A2)
  return Y1, Y2

  The forward propagation function has a structure similar to the single-neuron case. Note, however, that the first layer uses tanh as its activation function while the second layer still uses sigmoid. The difference between them is that sigmoid maps its output into the range 0 to 1, while tanh maps its output into -1 to 1. The specific reason for using different functions will be covered later (I have not fully understood it yet); for now it is enough to know that they differ.
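
  A tiny illustration of that range difference (my own snippet, not from the original article):

import numpy as np

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(1 / (1 + np.exp(-z)))   # sigmoid: every output lies strictly between 0 and 1
print(np.tanh(z))             # tanh: every output lies strictly between -1 and 1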

 

VI. Backpropagation function

# Backpropagation function
def backward(img, label, W1, b1, W2, b2, Y1, Y2):
  m = img.shape[1]
  # Second layer
  dZ2 = Y2 - label
  dW2 = np.dot(dZ2, Y1.T) / m
  db2 = np.sum(dZ2, axis=1, keepdims=True) / m
  # First layer
  dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(Y1, 2))
  dW1 = np.dot(dZ1, img.T) / m
  db1 = np.sum(dZ1, axis=1, keepdims=True) / m

  return dW1, db1, dW2, db2

  Backpropagation runs in the direction opposite to forward propagation: here we first compute the backpropagation of the second layer, and then compute that of the first layer. Note that because the first layer uses the tanh activation function, the backward formula for the first layer differs from that of the second layer; different activation functions have different derivatives in the backward pass.
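
  Concretely (my own restatement in standard notation, not from the original article): because the second layer combines a sigmoid output with the cross-entropy loss, its error term simplifies to dZ2 = Y2 - label. For the first layer, since the derivative of tanh is

$$\frac{d}{dz}\tanh(z) = 1 - \tanh^2(z),$$

  the chain rule gives

$$dZ_1 = \left(W_2^{\top}\, dZ_2\right) \circ \left(1 - Y_1^{2}\right),$$

  where the circle denotes element-wise multiplication; this is exactly the np.multiply(..., 1 - np.power(Y1, 2)) factor in the code above.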

  The two extra arguments to np.sum, axis=1 and keepdims=True, ensure that db1 has the shape (4, 1): axis=1 means summing along each row, and keepdims=True means the matrix shape is kept. How np.sum behaves with different arguments is illustrated below:

  np.sum(dZ1)/m                          0.0016664927232987162

  np.sum(dZ1, axis=1)/m                  [0.00026393, 0.00077922, 0.0003274, 0.00029595]

  np.sum(dZ1, axis=1, keepdims=True)/m   [[0.00026393]
                                          [0.00077922]
                                          [0.0003274 ]
                                          [0.00029595]]
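
  The same behaviour can be reproduced with a small toy matrix (my own snippet, not from the original article):

import numpy as np

z = np.arange(8).reshape(4, 2)            # a toy (4, 2) matrix standing in for dZ1
print(np.sum(z))                          # a single scalar: 28
print(np.sum(z, axis=1))                  # shape (4,): [ 1  5  9 13]
print(np.sum(z, axis=1, keepdims=True))   # shape (4, 1): a column vector, like db1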

VII. Gradient descent

# Gradient descent: update the parameters w and b
def update(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate=1.2):
  W1 = W1 - learning_rate * dW1
  b1 = b1 - learning_rate * db1
  W2 = W2 - learning_rate * dW2
  b2 = b2 - learning_rate * db2

  return W1, b1, W2, b2

  Gradient descent is much the same as in the single-neuron case.
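
  For each layer l the update is the usual gradient descent step (my own restatement, with α the learning rate):

$$W^{[l]} \leftarrow W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} \leftarrow b^{[l]} - \alpha\, db^{[l]}, \qquad l \in \{1, 2\}.$$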

VIII. Loss function

# Loss function
def costCal(Y2, label):
  m = label.shape[1]
  logprobs = np.multiply(np.log(Y2), label) + np.multiply((1 - label), np.log(1 - Y2))
  cost = -np.sum(logprobs) / m
  return cost

  The loss function is also much the same as in the single-neuron case. Note that np.multiply simply multiplies two matrices element by element.
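
  Written out in full (my own restatement; the code above computes exactly this, with label as y and Y2 as the prediction), the cost is the cross-entropy over the m training examples:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right].$$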

IX. Prediction function

# Prediction function
def predict(W1, b1, W2, b2, img):
  Y1, Y2 = forward(img, W1, b1, W2, b2)
  predictions = np.round(Y2)   # round the result to 0 or 1
  return predictions

  As with the single neuron, the prediction function is really just one pass of forward propagation.

X. Training the model and making predictions

# Train the model
def model(img, label, hide_num, num_iterations=1000, learning_rate=0.1, print_cost=False):
  np.random.seed(3)
  input_num = img.shape[0]
  out_num = label.shape[0]

  # Initialize the parameters
  W1, b1, W2, b2 = initialize_parameters(input_num, hide_num, out_num)
  # Loop a number of times to complete the training
  for i in range(0, num_iterations):
    # Forward propagation
    Y1, Y2 = forward(img, W1, b1, W2, b2)
    # Compute the cost of this iteration
    cost = costCal(Y2, label)
    # Backpropagation to obtain the gradients
    dW1, db1, dW2, db2 = backward(img, label, W1, b1, W2, b2, Y1, Y2)
    # Update the parameters
    W1, b1, W2, b2 = update(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
    # Print the cost of this training iteration
    if print_cost and i % 100 == 0:
      print("After %i training iterations, the cost is: %f" % (i, cost))

  return W1, b1, W2, b2

# Call the training function

W1, b1, W2, b2 = model(train_img, train_label, 4, num_iterations=2000, learning_rate=1, print_cost=True)

# Call the prediction function

predictions = predict(W1, b1, W2, b2, test_img)
print('Prediction accuracy: %d' % float((np.dot(test_label, predictions.T) + np.dot(1 - test_label, 1 - predictions.T)) / float(test_label.size) * 100) + '%')
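
  That accuracy expression can be read as follows (my own restatement): np.dot(test_label, predictions.T) counts the test examples where both the label and the prediction are 1 (true positives), np.dot(1 - test_label, 1 - predictions.T) counts those where both are 0 (true negatives), and dividing by the number of test examples gives

$$\text{accuracy} = \frac{TP + TN}{m} \times 100\%.$$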

  Note that here learning_rate=1, whereas for the single neuron the learning_rate was 0.005.
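
  If you are curious how sensitive training is to this choice, one quick (hypothetical, not from the original article) experiment is to re-train the same network with a few different learning rates and compare the printed cost curves:

# Compare a few learning rates on the same network
# (assumes train_img / train_label and the model() function above are available)
for lr in (0.005, 0.1, 1):
  print("learning_rate =", lr)
  model(train_img, train_label, 4, num_iterations=500, learning_rate=lr, print_cost=True)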

XI. Summary and review

  By implementing this simple two-layer neural network we find that the code does not actually change much and the overall structure stays largely the same. The main change is that the first layer's activation function becomes tanh, which in turn changes the backpropagation calculation quite a bit.

  After running it, we can see that the prediction accuracy improves considerably over the single neuron:

After 0 training iterations, the cost is: 0.693817
After 100 training iterations, the cost is: 0.251725
After 200 training iterations, the cost is: 0.176756
After 300 training iterations, the cost is: 0.110538
After 400 training iterations, the cost is: 0.372297
After 500 training iterations, the cost is: 0.128188
After 600 training iterations, the cost is: 0.091792
After 700 training iterations, the cost is: 0.075769
After 800 training iterations, the cost is: 0.064764
After 900 training iterations, the cost is: 0.055826
After 1000 training iterations, the cost is: 0.132452
After 1100 training iterations, the cost is: 0.102556
After 1200 training iterations, the cost is: 0.131425
After 1300 training iterations, the cost is: 0.086445
After 1400 training iterations, the cost is: 0.178343
After 1500 training iterations, the cost is: 0.077496
After 1600 training iterations, the cost is: 0.093846
After 1700 training iterations, the cost is: 0.071567
After 1800 training iterations, the cost is: 0.070109
After 1900 training iterations, the cost is: 0.060202
Prediction accuracy: 94%

  Follow the WeChat public account "零基础爱学习" and reply "AI5" to get the complete code. Later we will continue with "how to build a deep neural network", as well as an analysis of the questions that are still unclear.

Origin: www.cnblogs.com/cation/p/11567460.html