Neural Networks: Learn from Shallow to Deep and Build Your Own Neural Network (Full of Useful Information)



Preface

        I recently spent about a week reading the book "Make Your Own Neural Network" carefully, and also browsed its Chinese translation, "Python Neural Network Programming". There are a few errors in the book, which you will spot if you read attentively, but they will not stop you from learning neural networks. I have to say the book really is written in detail and is very well suited for beginners getting started with neural networks (NN for short). I had studied neural networks in a machine learning class before, but my understanding was far from deep; after reading this book I gained a great deal. Here I carefully summarize the core knowledge I learned, to help everyone understand neural networks better.


Tip: The main text of the article follows. Writing it was not easy; I hope it helps you. Please include a link to the original when reposting.

1. Neurons in nature

        Let’s first observe the basic unit in the biological brain – the neuron, as shown in Figure 1.

Figure 1 Neurons in the biological brain

        Neurons transmit electrical signals along their axons from dendrites to terminals, and from one neuron to another. Here are a few things we need to know about neurons:

        First, we cannot simply assume that a neuron's output is a linear function of its input;

        Second, a neuron does not necessarily respond immediately to every incoming electrical signal. Only when the input exceeds a threshold, strong enough to trigger it, will an output be produced;

        Third, a neuron can have multiple input signals and multiple output signals, as shown in Figure 2;

        Fourth, the dendrites collect all the incoming electrical signals and combine them into a stronger signal. If the combined signal is strong enough to exceed the threshold, the neuron produces an output, which travels along the axon to the terminals and is passed on to the dendrites of the next neuron.

Figure 2 Signal transmission between multiple neurons

2. Construction of artificial neural network

        We can build a neural network as shown in Figure 3 by simulating biological neurons.

Figure 3 3-layer 3-node neural network

        Layer 1 is called the input layer, layer 2 the hidden layer, and layer 3 the output layer. Each layer has 3 nodes, and each node is equivalent to a neuron. The lowercase w denotes a weight: w_{i,j} is the weight on the link between node i in the previous layer and node j in the next layer, and it attenuates or amplifies the signal passed along that link. Here you may be wondering why every neuron is connected to all the neurons in the previous and next layers. There are two explanations:

        First, all connections facilitate computer programming and instruction calculation;

        Second, if there are redundant connections, the neural network will weaken these unnecessary connections after learning, that is, the learned weights will approach 0.

        Regarding the process of simulating the threshold response of biological neurons, it seems that we can simply use a step function to simulate it, as shown in Figure 4.

Figure 4 Step function

        The step function, however, is not smooth. We can use the S-shaped (sigmoid) function shown in Figure 5 instead. It looks much smoother than the step function and is closer to reality, since nature rarely has cold, sharp edges.

Figure 5 S function

        The sigmoid function, sometimes also called the logistic function, is given by y=\frac{1}{1+e^{-x}}. Looking at its graph, we can see that this function maps any input in (-\infty ,+\infty ) to an output in (0,1).

        We simply add the incoming signals together to get a combined sum, use that sum as the input to the S-shaped function, and then output the result. This reflects the working mechanism of a neuron. Figure 6 illustrates this idea of combining the inputs and then applying a threshold to the combined sum.

Figure 6 Combined input

        If the combined signal is not strong enough, the effect of the S-shaped threshold function is to suppress the output signal. If the sum x is large enough, the effect of the S-function is to fire the neuron. Interestingly, if only one of the inputs is large enough and the other inputs are small, that alone can be enough to fire the neuron. More importantly, even if each individual input is only moderately large, their combined signal can still exceed the threshold, and the neuron will also fire.
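        To make this concrete, here is a minimal sketch of a single artificial neuron in Python (the input values and weights are made-up illustrative numbers, not taken from the book):

import numpy

def sigmoid(x):
    # S-shaped activation: squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + numpy.exp(-x))

# three made-up input signals and the weights on their links (illustrative values only)
inputs = numpy.array([1.0, 0.5, -1.5])
weights = numpy.array([0.9, -0.3, 0.4])

combined = numpy.dot(weights, inputs)   # combine the weighted inputs into one sum
output = sigmoid(combined)              # apply the S-shaped threshold function
print(combined, output)                 # 0.15 and roughly 0.537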

3. Forward propagation signal

        Figure 7 shows an example of a neural network with 3 layers and 3 nodes in each layer (some weight values ​​are not marked).

Figure 7 Neural network example

        The input matrix I is composed of the input signals (note that it is a column vector, not a row), that is:

\textbf{I}=\begin{bmatrix} 0.9\\0.1 \\ 0.8 \end{bmatrix}

        The weight matrix between the input layer and the hidden layer is (pay attention to its dimensions: the number of rows is the number of neurons in the hidden layer, and the number of columns is the number of neurons in the input layer):

\textbf{w}_{input_{-}hidden}=\begin{bmatrix} 0.9 &0.3 &0.4 \\ 0.2 &0.8 &0.2 \\ 0.1 &0.5 & 0.6 \end{bmatrix}

        The weight matrix between the hidden layer and the output layer is (pay attention to its dimensions: the number of rows is the number of neurons in the output layer, and the number of columns is the number of neurons in the hidden layer):

\textbf{w}_{hidden_{-}output}=\begin{bmatrix} 0.3 &0.7 &0.5 \\ 0.6 &0.5 &0.2 \\ 0.8 &0.1 & 0.9 \end{bmatrix}

        The input value of the hidden layer is calculated as:

\textbf{x}_{hidden}=\textbf{w}_{input_{-}hidden}\cdot \textup{I}=\begin{bmatrix} 0.9 &0.3 &0.4 \\ 0.2 &0.8 &0.2 \\ 0.1 &0.5 & 0.6 \end{bmatrix}\cdot\begin{bmatrix} 0.9\\0.1 \\ 0.8 \end{bmatrix}=\begin{bmatrix} 1.16\\ 0.42 \\ 0.62 \end{bmatrix}

        The output of the hidden layer is calculated as:

\textbf{o}_{hidden}=sigmoid(\textbf{x}_{hidden})=sigmoid(\begin{bmatrix} 1.16\\ 0.42 \\ 0.62 \end{bmatrix})=\begin{bmatrix} 0.761 \\ 0.603 \\ 0.650 \end{bmatrix}

        The input to the output layer is calculated as:

\textbf{x}_{output}=\textbf{w}_{hidden_{-}output}\cdot \textup{O}_{hidden}=\begin{bmatrix} 0.3 &0.7 &0.5 \\ 0.6 &0.5 &0.2 \\ 0.8 &0.1 & 0.9 \end{bmatrix}\cdot\begin{bmatrix} 0.761\\0.603 \\ 0.650\end{bmatrix}

\textbf{x}_{output}=\begin{bmatrix} 0.975\\ 0.888 \\ 1.254 \end{bmatrix}

        The output of the output layer is calculated as:

\textbf{o}_{output}=sigmoid(\textbf{x}_{output})=sigmoid(\begin{bmatrix} 0.975\\ 0.888 \\ 1.254 \end{bmatrix})=\begin{bmatrix} 0.726\\ 0.708 \\ 0.778 \end{bmatrix}

        The signals in the entire calculation are shown in Figure 8.

Figure 8 Calculation results
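        The hand calculation above can be verified with a short numpy sketch using the same matrices and input vector (scipy.special.expit is the sigmoid function, as in the full program later):

import numpy
import scipy.special

I = numpy.array([[0.9], [0.1], [0.8]])               # input column vector
w_input_hidden = numpy.array([[0.9, 0.3, 0.4],
                              [0.2, 0.8, 0.2],
                              [0.1, 0.5, 0.6]])
w_hidden_output = numpy.array([[0.3, 0.7, 0.5],
                               [0.6, 0.5, 0.2],
                               [0.8, 0.1, 0.9]])

x_hidden = numpy.dot(w_input_hidden, I)              # [[1.16], [0.42], [0.62]]
o_hidden = scipy.special.expit(x_hidden)             # approximately [[0.761], [0.603], [0.650]]
x_output = numpy.dot(w_hidden_output, o_hidden)      # approximately [[0.975], [0.888], [1.254]]
o_output = scipy.special.expit(x_output)             # approximately [[0.726], [0.708], [0.778]]
print(o_output)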

4. Back propagation error

        A simple network with two input nodes and two output nodes is shown in Figure 9.

Figure 9 Simple network

We split the error among the weights in proportion to their size when deciding how much error to use for adjusting each weight. For example, in the figure above, the fraction of e_{1} used to update w_{1,1} is:

\frac{w_{1,1}}{w_{1,1}+w_{2,1}}

and the fraction of e_{1} used to update w_{2,1} is:

\frac{w_{2,1}}{w_{1,1}+w_{2,1}}
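For example (with made-up numbers), if w_{1,1}=2.0 and w_{2,1}=1.0, then w_{1,1} receives \frac{2}{3} of e_{1} and w_{2,1} receives the remaining \frac{1}{3}: the larger weight contributed more to the output, so it is assigned more of the blame.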

A network with two input nodes, two hidden nodes, and two output nodes is shown in Figure 10.

Figure 10 Network with two input nodes, two hidden nodes, and two output nodes

We take the sum of the split errors carried back along the links w_{1,1} and w_{1,2} as the error of the first node of the hidden layer. That is:

e_{hidden,1}=e_{output,1}*\frac{w_{1,1}}{w_{1,1}+w_{2,1}}+e_{output,2}*\frac{w_{1,2}}{w_{1,2}+w_{2,2}}

In the same way, the error of the second node in the hidden layer is:

e_{hidden,2}=e_{output,1}*\frac{w_{2,1}}{w_{1,1}+w_{2,1}}+e_{output,2}*\frac{w_{2,2}}{w_{1,2}+w_{2,2}}

Written in matrix form, this is:

\textup{error}_{hidden}=\begin{bmatrix} \frac{w_{1,1}}{w_{1,1}+w_{2,1}}&\frac{w_{1,2}}{w_{1,2}+w_{2,2}} \\ \frac{w_{2,1}}{w_{1,1}+w_{2,1}}&\frac{w_{2,2}}{w_{1,2}+w_{2,2}} \end{bmatrix}\cdot \begin{bmatrix} e_{output,1}\\ e_{output,2}\end{bmatrix}

Note that the denominators of these fractions are just normalization factors. Ignoring them, the expression simplifies to:

\textup{error}_{hidden}=\begin{bmatrix} w_{1,1} &w_{1,2} \\ w_{2,1} & w_{2,2} \end{bmatrix}\cdot \begin{bmatrix} e_{output,1}\\ e_{output,2}\end{bmatrix}

Therefore, in matrix form, the errors are propagated backward as:

\textup{error}_{hidden}=\textup{w}^{T}_{hidden_{-}output}\cdot \textup{error}_{output}

Note that there is a transposition here. The weight matrix needs to be transposed when calculating the backpropagation error.

Note that only the hidden layer and output layer of the 3-layer neural network have errors, because the input layer only represents the input of data and has no substantial calculation.
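As a small check, the transposed-matrix form can be sketched in numpy (the output-layer errors below are made-up numbers; the weight matrix is the hidden-to-output matrix from section 3):

import numpy

w_hidden_output = numpy.array([[0.3, 0.7, 0.5],
                               [0.6, 0.5, 0.2],
                               [0.8, 0.1, 0.9]])

# made-up output-layer errors (target minus actual), one per output node
error_output = numpy.array([[0.8], [0.5], [0.2]])

# propagate the errors back through the transposed weight matrix
error_hidden = numpy.dot(w_hidden_output.T, error_output)
print(error_hidden)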

5. Update weights using gradient descent method

        Before derivation, let me explain that i represents the subscript of the input layer node, j represents the subscript of the hidden layer node, and k represents the subscript of the output layer node.

        First, let us expand the error function, which is the sum of the squared differences between the target values and the actual output values, summed over all n output nodes:

E=\sum_{n}(t_{n}-o_{n})^{2}

        The weight w_{j,k} only influences the output o_{k} of the node it links to, so when differentiating with respect to w_{j,k} the sum reduces to a single term:

\frac{\partial E}{\partial w_{j,k}}=\frac{\partial }{\partial w_{j,k}}(t_{k}-o_{k})^{2}

        Using the chain rule, we get:

\frac{\partial E}{\partial w_{j,k}}=\frac{\partial E}{\partial o_{k}}\cdot \frac{\partial o_{k}}{\partial w_{j,k}}=-2(t_{k}-o_{k})\cdot \frac{\partial o_{k}}{\partial w_{j,k}}

        Expanding o_{k} gives:

o_{k}=sigmoid(\sum_{j}w_{j,k}\cdot o_{j})

        We know that:

\frac{\mathrm{d} }{\mathrm{d} x}sigmoid(x)=sigmoid(x)\cdot (1-sigmoid(x))

        Therefore:

\frac{\partial E}{\partial w_{j,k}}=-2(t_{k}-o_{k})\cdot sigmoid(\sum_{j}w_{j,k}\cdot o_{j})\cdot (1-sigmoid(\sum_{j}w_{j,k}\cdot o_{j}))\cdot o_{j}

        Note that the argument of the sigmoid is itself a composite function of w_{j,k}, so the chain rule also applies inside the brackets; that is where the trailing factor o_{j} comes from. The constant 2 only scales the slope and can be dropped.

        The updated weight is obtained by taking the negative of this error slope and using it to adjust the old weight: if the slope is positive we want to decrease the weight, and if the slope is negative we want to increase it, so we reverse the sign of the slope.

        The final result, written in matrix form, is:

\Delta \textbf{w}_{j,k}=\alpha *E_{k}*O_{k}*(1-O_{k})\cdot O_{j}^{T}

new\textbf{w}_{j,k}=old\textbf{w}_{j,k}+\Delta \textbf{w}_{j,k}

        Here \alpha is the learning rate. Note the transposition of O_{j}, and note that the negative sign from the derivative and the negative sign from reversing the slope cancel, which is why the update is added to the old weight.

        Of course, in the same way, we can get the weight update from the input layer to the hidden layer:

\Delta \textup{w}_{i,j}=\alpha *E_{j}*O_{j}*(1-O_{j})\cdot O_{i}^{T}

new\textbf{w}_{i,j}=old\textbf{w}_{i,j}+\Delta \textbf{w}_{i,j}
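        A hedged sketch of both update formulas in numpy (the error vector, learning rate, and layer outputs below are placeholder values for illustration; everything is a column vector, matching the shapes used in section 3):

import numpy

alpha = 0.1                                            # learning rate (illustrative value)

o_input = numpy.array([[0.9], [0.1], [0.8]])           # outputs of the input layer (the inputs themselves)
o_hidden = numpy.array([[0.761], [0.603], [0.650]])    # hidden-layer outputs from section 3
o_output = numpy.array([[0.726], [0.708], [0.778]])    # output-layer outputs from section 3
e_output = numpy.array([[0.1], [-0.2], [0.05]])        # made-up (target - actual) errors

w_input_hidden = numpy.array([[0.9, 0.3, 0.4],
                              [0.2, 0.8, 0.2],
                              [0.1, 0.5, 0.6]])
w_hidden_output = numpy.array([[0.3, 0.7, 0.5],
                               [0.6, 0.5, 0.2],
                               [0.8, 0.1, 0.9]])
e_hidden = numpy.dot(w_hidden_output.T, e_output)      # back-propagated hidden-layer errors

# delta w = alpha * E * O * (1 - O), dotted with the previous layer's outputs transposed
delta_who = alpha * numpy.dot(e_output * o_output * (1.0 - o_output), o_hidden.T)
delta_wih = alpha * numpy.dot(e_hidden * o_hidden * (1.0 - o_hidden), o_input.T)

w_hidden_output += delta_who                           # new w = old w + delta w
w_input_hidden += delta_wih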

6. Prepare data

1. Input

        Looking at the sigmoid function, if the input is too large the curve becomes very flat there, so the gradient is almost zero, which is bad for weight updates. Inputs of 0 should also be avoided, because a zero input kills the corresponding weight-update term, and computers lose precision when dealing with very small or very large numbers. Therefore the input signals are generally rescaled to the range [0.01, 1.00], so the inputs must be preprocessed before training the neural network.

2. Output

        Looking at the output of the sigmoid function, we can see that its range is (0, 1): it can never actually reach 0 or 1, so target values of 0 or 1 would drive the weights ever larger without being attainable. Therefore, before training the neural network, we preprocess the label (target) signals so that they fall in the interval [0.01, 0.99].

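        A small sketch of this preprocessing for one MNIST record (the record string below is a fake one-line example; the real CSV layout used in the program later puts the label first, followed by 784 pixel values in 0..255):

import numpy

record = "5," + ",".join(["128"] * 784)                 # fake CSV line: label 5, then grey pixels
all_values = record.split(',')

# scale the pixel values from 0..255 into the range 0.01..1.00
inputs = (numpy.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01

# target vector: 0.01 everywhere except 0.99 at the position of the correct label
targets = numpy.zeros(10) + 0.01
targets[int(all_values[0])] = 0.99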
3. Random initial weights

        We generally pick the initial weights at random from -1.0 to +1.0. Never set them all to 0: the error would then be distributed evenly during backpropagation and the weights would never differentiate as they update. A better rule of thumb is to sample within a range given by the inverse square root of the number of incoming links to a node. So if each node has 3 incoming links, the initial weights should range from -\frac{1}{\sqrt{3}} to +\frac{1}{\sqrt{3}}, i.e. about ±0.577. In the code below, the weights are initialized from a normal distribution with zero mean and a standard deviation equal to the inverse square root of the number of incoming links.
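        Both initialization schemes can be sketched as follows (a toy 3-by-3 weight matrix is assumed; the second scheme, a zero-mean normal distribution whose standard deviation is the inverse square root of the number of incoming links, is the one used in the program below):

import numpy

incoming_links = 3                                      # links arriving at each node
nodes = 3                                               # nodes in the receiving layer

# scheme 1: uniform sampling within plus/minus 1/sqrt(number of incoming links), about 0.577 here
limit = 1.0 / numpy.sqrt(incoming_links)
w_uniform = numpy.random.uniform(-limit, limit, (nodes, incoming_links))

# scheme 2: normal distribution, zero mean, standard deviation 1/sqrt(incoming links)
w_normal = numpy.random.normal(0.0, pow(incoming_links, -0.5), (nodes, incoming_links))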

7. Programming Simulation

        I ran a simulation on the MNIST data set, building the neural network in Python in an IPython (Jupyter) notebook.

1. Code

# python notebook for Make Your Own Neural Network
# code for a 3-layer neural network, and code for learning the MNIST dataset
import numpy
# scipy.special for the sigmoid function expit()
import scipy.special
# library for plotting arrays
import matplotlib.pyplot
# ensure the plots are inside this notebook, not an external window
%matplotlib inline
# neural network class definition
class neuralNetwork:
   
    # initialise the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes,learningrate):
        # set number of nodes in each input, hidden, output layer
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes
       
        # link weight matrices, wih and who
        # weights inside the arrays are w_i_j, where link is from node i to node j in the next layer
        self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
        # learning rate
        self.lr = learningrate
       
        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)
       
        pass
   
    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
       
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
       
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
       
        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)
       
        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
       
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
       
        pass
   
    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
       
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
       
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
       
        return final_outputs
# number of input, hidden and output nodes
input_nodes = 784
hidden_nodes = 200
output_nodes = 10
# learning rate is 0.1
learning_rate = 0.1
# create instance of neural network
n = neuralNetwork(input_nodes,hidden_nodes,output_nodes,learning_rate)
# load the mnist training data CSV file into a list
training_data_file = open("mnist_train.csv", 'r')
training_data_list = training_data_file.readlines()
training_data_file.close()
# epochs is the number of times the training data set is used for training
epochs = 5
for e in range(epochs):
    # go through all records in the training data set
    for record in training_data_list:
        # split the record by the ',' commas
        all_values = record.split(',')
        # scale and shift the inputs
        inputs = (numpy.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
        # create the target output values (all 0.01, except the desired label which is 0.99)
        targets = numpy.zeros(output_nodes) + 0.01
        # all_values[0] is the target label for this record
        targets[int(all_values[0])] = 0.99
        n.train(inputs, targets)
        pass
    pass

test_data_file = open("mnist_test.csv", 'r')
test_data_list = test_data_file.readlines()
test_data_file.close()

# test the neural network
# scorecard for how well the network performs, initially empty
scorecard = []
# go through all the records in the test data set
for record in test_data_list:
    # split the record by the ',' commas
    all_values = record.split(',')
    # correct answer is first value
    correct_label = int(all_values[0])
    #print(correct_label, "correct label")
    # scale and shift the inputs
    inputs = (numpy.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
    # query the network
    outputs = n.query(inputs)
    # the index of the highest value corresponds to the label
    label = numpy.argmax(outputs)
    #print(label, "network's answer")
    # append correct or incorrect to list
    if (label == correct_label):
        # network's answer matches correct answer, add 1 to scorecard
        scorecard.append(1)
    else:
        # network's answer doesn't match correct answer, add 0 to scorecard
        scorecard.append(0)
        pass
   
    pass

# calculate the performance score, the fraction of correct answers
scorecard_array = numpy.asarray(scorecard)
print ("performance = ", scorecard_array.sum() / scorecard_array.size)

2. Simulation results

        The neural network was trained on the 60,000 MNIST training samples and tested on the 10,000 test samples; the recognition accuracy was 97.39%.


Summary

        That is everything I wanted to share this time. This article has walked through the basic principles of neural networks and a training simulation in detail. I hope it is helpful to you.


Origin: blog.csdn.net/m0_66360845/article/details/133942128