Neural Networks: From the Basics to Building Your Own Neural Network
Preface
Recently, I spent about a week carefully reading the book "Make Your Own Neural Network", and also browsed its Chinese edition, "Python Neural Network Programming". The book contains a few errors, which you will spot if you read carefully, but they will not prevent you from learning neural networks. The book is written in great detail and is very suitable for beginners getting started with neural networks (NN for short). I had studied neural networks in machine learning classes before, but my understanding then was far shallower than it is after reading this book, and I gained a lot from it. Here I carefully summarize the core knowledge I learned, to help everyone understand neural networks better.
Tip: The body of the article follows. Writing articles takes effort, and I hope this one helps you. Please include a link to the original when reprinting.
1. Neurons in nature
Let’s first observe the basic unit in the biological brain – the neuron, as shown in Figure 1.
Neurons transmit electrical signals along their axons from dendrites to terminals, and from one neuron to another. Here are a few things we need to know about neurons:
First, we cannot simply treat the output of a neuron as a linear function of its input;
Second, a neuron does not necessarily respond immediately to every incoming electrical signal. An output is generated only when the input exceeds a threshold and is strong enough to complete the circuit;
Third, a neuron can have multiple input signals and multiple output signals, as shown in Figure 2;
Fourth, the dendrites collect all the incoming electrical signals and combine them into a stronger electrical signal. If the combined signal is strong enough to exceed the threshold, the neuron produces an output, which travels along the axon to the terminals and is passed on to the dendrites of the next neuron.
2. Construction of artificial neural network
We can build a neural network as shown in Figure 3 by simulating biological neurons.
Layer 1 is called the input layer, layer 2 is called the hidden layer, and layer 3 is called the output layer. Each layer has 3 nodes, and each node is equivalent to a neuron. The lowercase w is the weight symbol. The weight $w_{i,j}$ belongs to the link between node i in one layer and node j in the next layer, and it reduces or amplifies the signal transmitted along that link. Here, you may be wondering: why connect each neuron to all neurons in the previous and next layers? There are two explanations:
First, full connectivity is easy to express in a program and to compute as matrix operations;
Second, if there are redundant connections, the neural network will weaken these unnecessary connections after learning, that is, the learned weights will approach 0.
Regarding the process of simulating the threshold response of biological neurons, it seems that we can simply use a step function to simulate it, as shown in Figure 4.
In fact, this function is not smooth. We can use the S-shaped function (sigmoid function) shown in Figure 5 instead. This function looks much smoother than the step function and is closer to reality. There are few cold and sharp edges in nature.
The sigmoid function, sometimes also called the logistic function, is defined as $y = \frac{1}{1 + e^{-x}}$. Looking at its curve, we can see that it maps any input in $(-\infty, +\infty)$ to an output in $(0, 1)$.
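To get a feel for the difference between the two activation curves, here is a minimal sketch that evaluates a step function and the sigmoid $y = 1/(1 + e^{-x})$ at a few points. The function names and the threshold of 0 for the step function are my own assumptions for illustration.

```python
import math

def step(x, threshold=0.0):
    """Hard threshold: the output jumps from 0 to 1 at the threshold (assumed 0 here)."""
    return 1.0 if x >= threshold else 0.0

def sigmoid(x):
    """Logistic function y = 1 / (1 + e^(-x)); smooth, with outputs strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Compare both functions on a few sample inputs
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x = {x:6.1f}   step = {step(x):.0f}   sigmoid = {sigmoid(x):.5f}")
```

The step function can only output 0 or 1, while the sigmoid changes gradually and gets close to 0 and 1 without ever reaching them.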
When a neuron has several inputs, we just add them together to get the final sum, use it as the input of the S function, and then output the result. This actually reflects the working mechanism of neurons. Figure 6 illustrates this idea of combining inputs and then applying a threshold to the final input sum.
If the combined signal is not strong enough, the effect of the S-threshold function is to suppress the output signal. If the sum x is large enough, the effect of the S-function is to fire the neuron. Interestingly, if only one of the inputs is large enough and the other inputs are small, then this is enough to fire the neuron. More importantly, if some of the inputs are individually large, but not very large, so that the combination of signals is large enough to exceed the threshold, then the neuron can also fire.
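The three situations described above can be sketched with a single artificial neuron that sums its inputs and passes the sum through the sigmoid. This is only an illustration: the function name `neuron_output` and all the input values are made up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs):
    """Sum the incoming signals, then squash the sum with the sigmoid."""
    return sigmoid(sum(inputs))

# Combined signal is weak: the output is suppressed (close to 0)
weak = neuron_output([-2.0, -1.5, -1.0])

# One input is large on its own: enough to fire the neuron
one_large = neuron_output([6.0, -0.5, -0.5])

# Several moderate inputs: together they are large enough to fire the neuron
several = neuron_output([1.5, 1.5, 1.5])

print(weak, one_large, several)
```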
3. Forward propagation signal
Figure 7 shows an example of a neural network with 3 layers and 3 nodes in each layer (some weight values are not marked).
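The input matrix $I$ is formed from the input signals (note that it is a column, not a row), that is:

$$I = \begin{pmatrix} i_1 \\ i_2 \\ i_3 \end{pmatrix}$$

The weight matrix between the input layer and the hidden layer is shown below, where $w_{i,j}$ is the weight of the link from node i to node j in the next layer. Pay attention to its dimensions: the number of rows is the number of neurons in the hidden layer, and the number of columns is the number of neurons in the input layer.

$$W_{input\_hidden} = \begin{pmatrix} w_{1,1} & w_{2,1} & w_{3,1} \\ w_{1,2} & w_{2,2} & w_{3,2} \\ w_{1,3} & w_{2,3} & w_{3,3} \end{pmatrix}$$

The weight matrix between the hidden layer and the output layer has the same layout. Pay attention to its dimensions: the number of rows is the number of neurons in the output layer, and the number of columns is the number of neurons in the hidden layer.

$$W_{hidden\_output} = \begin{pmatrix} w_{1,1} & w_{2,1} & w_{3,1} \\ w_{1,2} & w_{2,2} & w_{3,2} \\ w_{1,3} & w_{2,3} & w_{3,3} \end{pmatrix}$$

The input value of the hidden layer is calculated as:

$$X_{hidden} = W_{input\_hidden} \cdot I$$

The output of the hidden layer is calculated as:

$$O_{hidden} = \mathrm{sigmoid}(X_{hidden})$$

The input to the output layer is calculated as:

$$X_{output} = W_{hidden\_output} \cdot O_{hidden}$$

The output of the output layer is calculated as:

$$O_{output} = \mathrm{sigmoid}(X_{output})$$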
The signals in the entire calculation are shown in Figure 8.
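The forward pass can be sketched in a few lines of NumPy. The input signals and weight values below are illustrative numbers only and are not necessarily the ones shown in the figures; the variable names are my own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Input signals as a column vector (3 x 1)
I = np.array([[0.9], [0.1], [0.8]])

# Rows = hidden nodes, columns = input nodes
W_input_hidden = np.array([[0.9, 0.3, 0.4],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.5, 0.6]])

# Rows = output nodes, columns = hidden nodes
W_hidden_output = np.array([[0.3, 0.7, 0.5],
                            [0.6, 0.5, 0.2],
                            [0.8, 0.1, 0.9]])

X_hidden = W_input_hidden @ I          # combined signals into the hidden layer
O_hidden = sigmoid(X_hidden)           # signals emerging from the hidden layer
X_output = W_hidden_output @ O_hidden  # combined signals into the output layer
O_output = sigmoid(X_output)           # final output of the network

print(O_output)
```

Because the input is a column vector, each matrix product yields another column vector, so the same two steps (matrix product, then sigmoid) repeat for every layer.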
4. Back propagation error
A simple network with two input nodes and two output nodes is shown in Figure 9.
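We split the output error among the connected links in proportion to their weights, and use each share to adjust the corresponding weight. For example, in the picture above, the fraction of the output error $e_1$ used to update $w_{1,1}$ is:

$$\frac{w_{1,1}}{w_{1,1} + w_{2,1}}$$

The fraction of $e_1$ used to update $w_{2,1}$ is:

$$\frac{w_{2,1}}{w_{1,1} + w_{2,1}}$$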
A network with two input nodes and two hidden nodes and two output nodes is shown in Figure 10.
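We use the sum of the split errors on the links $w_{1,1}$ and $w_{1,2}$ to represent the error of the first node of the hidden layer. That is:

$$e_{hidden,1} = e_{output,1} \cdot \frac{w_{1,1}}{w_{1,1} + w_{2,1}} + e_{output,2} \cdot \frac{w_{1,2}}{w_{1,2} + w_{2,2}}$$

In the same way, the error of the second node in the hidden layer is:

$$e_{hidden,2} = e_{output,1} \cdot \frac{w_{2,1}}{w_{1,1} + w_{2,1}} + e_{output,2} \cdot \frac{w_{2,2}}{w_{1,2} + w_{2,2}}$$

The matrix form is as shown in the figure:

$$error_{hidden} = \begin{pmatrix} \frac{w_{1,1}}{w_{1,1} + w_{2,1}} & \frac{w_{1,2}}{w_{1,2} + w_{2,2}} \\ \frac{w_{2,1}}{w_{1,1} + w_{2,1}} & \frac{w_{2,2}}{w_{1,2} + w_{2,2}} \end{pmatrix} \cdot \begin{pmatrix} e_{output,1} \\ e_{output,2} \end{pmatrix}$$

Note that the denominator of these fractions is a normalization factor. Ignoring these factors simplifies to:

$$error_{hidden} = \begin{pmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{pmatrix} \cdot \begin{pmatrix} e_{output,1} \\ e_{output,2} \end{pmatrix}$$

Therefore, use matrix methods to propagate errors backward:

$$error_{hidden} = W_{hidden\_output}^{T} \cdot error_{output}$$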
Note that there is a transposition here. The weight matrix needs to be transposed when calculating the backpropagation error.
Note that only the hidden layer and output layer of the 3-layer neural network have errors, because the input layer only represents the input of data and has no substantial calculation.
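As a small worked example, the sketch below propagates two output errors back to two hidden nodes, once with the simplified transposed-matrix rule and once with the exact proportional split. The weights and error values are made-up numbers and the variable names are my own.

```python
import numpy as np

# Hidden-to-output weights: rows = output nodes, columns = hidden nodes,
# so row 0 holds the weights of the links arriving at output node 1
W_hidden_output = np.array([[2.0, 3.0],
                            [1.0, 4.0]])

# Errors at the two output nodes, as a column vector
error_output = np.array([[0.8], [0.5]])

# Simplified rule: multiply by the transposed weight matrix
error_hidden = W_hidden_output.T @ error_output

# Exact proportional split: divide each row by the sum of the weights
# arriving at that output node (the normalization factor), then transpose
fractions = W_hidden_output / W_hidden_output.sum(axis=1, keepdims=True)
error_hidden_split = fractions.T @ error_output

print(error_hidden.ravel())        # simplified version
print(error_hidden_split.ravel())  # normalized version
```

The two results differ in scale, but both give the larger share of the error to the hidden node with the larger weights, which is what matters for the direction of the weight update.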
5. Update weights using gradient descent method
Before derivation, let me explain that i represents the subscript of the input layer node, j represents the subscript of the hidden layer node, and k represents the subscript of the output layer node.
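First, let us expand the error function, which is the sum of the squares of the differences between the target value and the actual value, taken over all n output nodes:

$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_{n} (t_n - o_n)^2$$

The output $o_n$ of a node depends only on the links into that node, so the weight $w_{jk}$ only affects $o_k$. The above formula is simplified to:

$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} (t_k - o_k)^2$$

Using the chain rule, we get:

$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial w_{jk}} = -2 (t_k - o_k) \cdot \frac{\partial o_k}{\partial w_{jk}}$$

Expanding $o_k$ gives:

$$\frac{\partial E}{\partial w_{jk}} = -2 (t_k - o_k) \cdot \frac{\partial}{\partial w_{jk}} \mathrm{sigmoid}\Big(\sum_j w_{jk} \cdot o_j\Big)$$

We know that:

$$\frac{d}{dx} \mathrm{sigmoid}(x) = \mathrm{sigmoid}(x) \cdot (1 - \mathrm{sigmoid}(x))$$

So:

$$\frac{\partial E}{\partial w_{jk}} = -2 (t_k - o_k) \cdot \mathrm{sigmoid}\Big(\sum_j w_{jk} \cdot o_j\Big) \cdot \Big(1 - \mathrm{sigmoid}\Big(\sum_j w_{jk} \cdot o_j\Big)\Big) \cdot o_j$$

Note that the expression inside the brackets of the sigmoid also has to be differentiated, because the sigmoid of a sum is a composite function. Differentiating $\sum_j w_{jk} \cdot o_j$ with respect to $w_{jk}$ gives the final factor $o_j$. The constant 2 only scales the slope, so we can drop it.

The updated weight is obtained by adjusting the old weight in the direction opposite to the error slope just obtained. If the slope is positive, we want to decrease the weight; if the slope is negative, we want to increase the weight. So we invert the slope:

$$w_{jk}^{new} = w_{jk}^{old} - \alpha \cdot \frac{\partial E}{\partial w_{jk}}$$

The final result, in matrix form, is:

$$\Delta W_{jk} = \alpha \cdot E_k * O_k * (1 - O_k) \cdot O_j^{T}$$

Here $\alpha$ is the learning rate, $*$ is element-wise multiplication, and $\cdot$ is matrix multiplication. Note that there is also a transposition here, applied to the outputs $O_j$ of the previous layer. The slope carries a negative sign and inverting the slope adds another negative sign, so the two cancel and the update is added to the old weight.

Of course, in the same way, we can get the weight update from the input layer to the hidden layer:

$$\Delta W_{ij} = \alpha \cdot E_j * O_j * (1 - O_j) \cdot O_i^{T}$$

where $E_j$ is the back-propagated error of the hidden layer and $O_i$ is the input signal.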
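One way to gain confidence in the derived slope is to compare it with a numerical estimate. The sketch below does this for the hidden-to-output weights of a tiny layer, using the expression dE/dw_jk = -2 (t_k - o_k) o_k (1 - o_k) o_j with E the sum of squared errors, and then takes a single gradient descent step. The layer sizes, random seed, learning rate and variable names are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squared_error(W, o_hidden, target):
    """E = sum over the output nodes of (t_k - o_k)^2."""
    o_out = sigmoid(W @ o_hidden)
    return float(np.sum((target - o_out) ** 2))

rng = np.random.default_rng(0)
o_hidden = rng.uniform(0.1, 0.9, size=(3, 1))  # hidden-layer outputs o_j
target = np.array([[0.99], [0.01]])            # target values t_k
W = rng.normal(0.0, 0.5, size=(2, 3))          # rows = output nodes, columns = hidden nodes

# Analytical slope, keeping the factor 2 so it matches E exactly
o_out = sigmoid(W @ o_hidden)
analytic = -2.0 * ((target - o_out) * o_out * (1.0 - o_out)) @ o_hidden.T

# Numerical slope by central differences, one weight at a time
numeric = np.zeros_like(W)
eps = 1e-6
for idx in np.ndindex(W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    numeric[idx] = (squared_error(W_plus, o_hidden, target)
                    - squared_error(W_minus, o_hidden, target)) / (2 * eps)

# One gradient descent step: move against the slope
alpha = 0.1
W_new = W - alpha * analytic

print(np.max(np.abs(analytic - numeric)))
print(squared_error(W, o_hidden, target), squared_error(W_new, o_hidden, target))
```

If the two slopes agree and the error after the step is lower than before, the formula and the sign of the update are consistent with each other.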
6. Prepare data
1.Input
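Looking at the sigmoid curve, we can see that for large inputs the curve becomes very flat. If it is too flat, the gradient will be too small, which is not conducive to weight updates. Do not use 0 as an input either: the weight update is proportional to the incoming signal, so a zero input produces no update, and computers also lose precision when dealing with very small or very large numbers. Therefore, the range of the input signal is generally limited to [0.01, 1.00], and the input signal must be preprocessed before training the neural network.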
2.Output
Looking at the output of the sigmoid function, we find that its range is (0, 1): it can never reach 0 or 1. Therefore, before we train the neural network, we need to preprocess the label signal so that its range falls in the interval [0.01, 0.99].
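A minimal sketch of both preprocessing steps, assuming a record in the "label, pixel, pixel, ..." layout with pixel values from 0 to 255. The record below is made up and far shorter than a real image row.

```python
import numpy as np

# One made-up record: label 3 followed by four pixel values
record = "3,0,128,255,64"
all_values = record.split(',')

# Inputs: rescale pixels from 0..255 into 0.01..1.00 so that no input is 0
inputs = np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99 + 0.01

# Targets: 0.01 everywhere, except 0.99 at the index of the correct label,
# so the sigmoid is never asked to reach exactly 0 or 1
output_nodes = 10
targets = np.full(output_nodes, 0.01)
targets[int(all_values[0])] = 0.99

print(inputs)
print(targets)
```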
3. Random initial weights
We generally select the initial weights randomly from the range -1.0 to +1.0. Do not set all of the weights to the same constant, and especially not to 0: with equal weights the error is split evenly during backpropagation so every weight receives the same update, and with zero weights no signal passes forward, so the weights are never updated. A better rule is to sample the weights randomly from a range of roughly plus or minus the reciprocal of the square root of the number of incoming links to a node. For example, if each node has 3 incoming links, the initial weights should range from $-1/\sqrt{3}$ to $+1/\sqrt{3}$, which is ±0.577. In the code we initialize the weights using a normal distribution with zero mean and a standard deviation equal to the reciprocal of the square root of the number of incoming links.
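As a quick check of this rule, the following sketch draws weights for nodes with 3 incoming links from a normal distribution with zero mean and standard deviation 1/sqrt(3). The sample size and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

incoming_links = 3
std = incoming_links ** -0.5  # reciprocal square root of the number of incoming links

# Sample weights for 1000 hypothetical nodes, 3 incoming links each
weights = rng.normal(0.0, std, size=(1000, incoming_links))

print(round(std, 3))
print(weights.mean(), weights.std())
```

Most sampled weights fall within roughly ±0.577, and nodes with more incoming links get proportionally smaller initial weights, which keeps the summed signal away from the flat parts of the sigmoid.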
7. Programming Simulation
I ran the simulation on the MNIST data set, building the neural network in Python in an IPython notebook.
1.Code
# python notebook for Make Your Own Neural Network
# code for a 3-layer neural network, and code for learning the MNIST dataset
import numpy
# scipy.special for the sigmoid function expit()
import scipy.special
# library for plotting arrays
import matplotlib.pyplot
# ensure the plots are inside this notebook, not an external window
# (IPython/Jupyter magic; remove this line when running as a plain Python script)
%matplotlib inline
# neural network class definition
class neuralNetwork:

    # initialise the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
        # set number of nodes in each input, hidden, output layer
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes

        # link weight matrices, wih and who
        # weights inside the arrays are w_i_j, where link is from node i to node j in the next layer
        # standard deviation is 1/sqrt(number of incoming links to a node):
        # inodes links arrive at each hidden node, hnodes links at each output node
        self.wih = numpy.random.normal(0.0, pow(self.inodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.onodes, self.hnodes))

        # learning rate
        self.lr = learningrate

        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)

    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs and targets lists to 2d column arrays
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T

        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)

        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)

        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))

        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))

    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d column array
        inputs = numpy.array(inputs_list, ndmin=2).T

        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)

        return final_outputs

# number of input, hidden and output nodes
input_nodes = 784
hidden_nodes = 200
output_nodes = 10
# learning rate is 0.1
learning_rate = 0.1
# create instance of neural network
n = neuralNetwork(input_nodes,hidden_nodes,output_nodes,learning_rate)
# load the mnist training data CSV file into a list
training_data_file = open("mnist_train.csv", 'r')
training_data_list = training_data_file.readlines()
training_data_file.close()
# epochs is the number of times the training data set is used for training
epochs = 5

for e in range(epochs):
    # go through all records in the training data set
    for record in training_data_list:
        # split the record by the ',' commas
        all_values = record.split(',')
        # scale and shift the inputs
        # (numpy.asfarray was removed in NumPy 2.0; asarray with dtype=float is equivalent)
        inputs = (numpy.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
        # create the target output values (all 0.01, except the desired label which is 0.99)
        targets = numpy.zeros(output_nodes) + 0.01
        # all_values[0] is the target label for this record
        targets[int(all_values[0])] = 0.99
        n.train(inputs, targets)

# load the mnist test data CSV file into a list
test_data_file = open("mnist_test.csv", 'r')
test_data_list = test_data_file.readlines()
test_data_file.close()
# test the neural network

# scorecard for how well the network performs, initially empty
scorecard = []

# go through all the records in the test data set
for record in test_data_list:
    # split the record by the ',' commas
    all_values = record.split(',')
    # correct answer is first value
    correct_label = int(all_values[0])
    # scale and shift the inputs
    # (numpy.asfarray was removed in NumPy 2.0; asarray with dtype=float is equivalent)
    inputs = (numpy.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
    # query the network
    outputs = n.query(inputs)
    # the index of the highest value corresponds to the label
    label = numpy.argmax(outputs)
    # append correct or incorrect to list
    if label == correct_label:
        # network's answer matches correct answer, add 1 to scorecard
        scorecard.append(1)
    else:
        # network's answer doesn't match correct answer, add 0 to scorecard
        scorecard.append(0)

# calculate the performance score, the fraction of correct answers
scorecard_array = numpy.asarray(scorecard)
print("performance = ", scorecard_array.sum() / scorecard_array.size)
2. Simulation results
The 60,000 training samples from MNIST were used to train the neural network, and the 10,000 test samples were used to test the trained network. The recognition accuracy was 97.39%.
Summary
The above is everything I wanted to share this time. This article introduced the basic principles of neural networks and a training simulation in detail. I hope it is helpful to all of you.