Machine Learning (20): A Detailed Explanation of Neural Networks


1. Overview of Neural Networks

1.1 Neuron model

 1. The broadest definition is adopted here: a neural network is a massively parallel, interconnected network composed of simple adaptive units, whose organization can simulate the way a biological nervous system interacts with and responds to real-world objects.

 2. The "simple unit" in this definition is the most basic component of a neural network and is called the neuron model. The model still in use today is the "M-P neuron model".

[Figure: the M-P neuron model]

 3. In this model, a neuron receives input signals from n other neurons, which are transmitted through weighted connections. The total input value received by the neuron is compared against the neuron's threshold, and the result is then processed through an "activation function" to produce the neuron's output.
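
Written out, the computation just described is (with x_i the inputs from the n other neurons, w_i the connection weights, θ the neuron's threshold, and f the activation function):

$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$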

1.2 Activation function

 1. The activation function in a neural network is a nonlinear function applied to a neuron's weighted input to produce its output, introducing nonlinearity into the network. Its role is to transform the signal so that the network can fit complex data patterns.

 2. The ideal activation function is the step function, which maps input values to the output values "0" and "1": "1" corresponds to neuron excitation and "0" corresponds to neuron inhibition. However, the step function has undesirable properties: it is discontinuous and not smooth. The Sigmoid function is therefore often used as the activation function instead. It squashes input values that may vary over a wide range into the (0, 1) output range, so it is sometimes called a "squashing function".

[Figure: the ideal step activation function and the Sigmoid activation function]
 3. Do not use linear activation functions in hidden layers; the ReLU activation function is usually used there (a small code sketch of these activations follows the figure below).

[Figure: the ReLU activation function]
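
To make the functions above concrete, here is a minimal NumPy sketch of the step, Sigmoid, and ReLU activations (the function names and the sample inputs are illustrative choices of mine, not from the article):

import numpy as np

def step(x):
    # Ideal step activation: 1 (excitation) for non-negative input, 0 (inhibition) otherwise
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Squashes any real input into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Commonly used in hidden layers: max(0, x)
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(x))     # [0. 0. 1. 1. 1.]
print(sigmoid(x))  # values strictly between 0 and 1
print(relu(x))     # [0.  0.  0.  0.5 2. ]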

2. Perceptron

2.1 Overview

 1. The perceptron is composed of two layers of neurons. The input layer receives external input signals and passes them to the output layer. The output layer consists of M-P neurons, also known as "threshold logic units".

[Figure: structure of the two-layer perceptron]
 2. The perceptron learning method (handwritten notes):

[Figure: handwritten notes on the perceptron learning rule]

2.2 Implement logical operations

 1. The perceptron can perform the basic logical operations AND, OR, and NOT. Note: we need to choose the parameters correctly!
 As shown in the figure below, take AND as an example: when x1 and x2 are both 1, the output is 1; when either or both are 0, the output is 0, which realizes the AND operation. (A minimal code sketch is given at the end of this subsection.)

[Figure: perceptron parameters realizing AND, OR, and NOT]

 2. However, single-layer perceptrons cannot implement XOR operations.
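
As referenced in point 1, here is a minimal sketch of a single M-P neuron with a step activation realizing AND, OR, and NOT. The weights and thresholds are one illustrative choice of parameters, not necessarily the ones shown in the figure:

import numpy as np

def perceptron(x, w, theta):
    # M-P neuron: step activation applied to (weighted sum - threshold)
    return 1 if np.dot(w, x) - theta >= 0 else 0

# One illustrative choice of parameters
AND = lambda x1, x2: perceptron([x1, x2], [1, 1], 1.5)   # fires only when x1 = x2 = 1
OR  = lambda x1, x2: perceptron([x1, x2], [1, 1], 0.5)   # fires when either input is 1
NOT = lambda x1:     perceptron([x1],     [-1], -0.5)    # inverts its single input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))

No single choice of weights and threshold can make one such unit compute XOR, which is exactly point 2 above.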

2.3 Multi-layer perceptron

 1. A multi-layer perceptron has multiple layers of neurons. The layers of neurons between the input layer and the output layer are called hidden layers. Both the hidden-layer and the output-layer neurons are functional neurons with activation functions.

[Figure: a multi-layer perceptron with one hidden layer]

 2. Because a multi-layer perceptron has hidden layers, it can implement the XOR operation, which a single-layer perceptron cannot (see the sketch at the end of this subsection).

 3. The multi-layer perceptron has powerful representation capabilities: as long as a hidden layer contains enough neurons, it can approximate any continuous function to arbitrary accuracy. A multi-layer perceptron can have multiple hidden layers, each extracting different features; the deeper the hidden layer (the closer it is to the output layer), the higher-level the features it extracts.

[Figures: approximation ability of a multi-layer perceptron; features extracted at successive hidden layers]
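
To make point 2 concrete, here is a minimal sketch of a network with one hidden layer that computes XOR. The construction (XOR = AND(OR, NAND)) and the hand-picked weights and thresholds are illustrative choices of mine, not taken from the article's figures:

import numpy as np

def unit(x, w, theta):
    # Threshold logic unit: step activation applied to (weighted sum - threshold)
    return 1 if np.dot(w, x) - theta >= 0 else 0

def xor(x1, x2):
    # Hidden layer: h1 = OR(x1, x2), h2 = NAND(x1, x2)
    h1 = unit([x1, x2], [1, 1], 0.5)
    h2 = unit([x1, x2], [-1, -1], -1.5)
    # Output layer: AND(h1, h2)
    return unit([h1, h2], [1, 1], 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))   # prints 0 0->0, 0 1->1, 1 0->1, 1 1->0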

3. Neural Network

3.1 Working principle

 1. The principle boils down to this: each layer takes a numeric vector as input, applies a set of logistic-regression-like units to it, and computes another numeric vector (the output of one layer serves as the input of the next), passing it from layer to layer until the final output layer is reached. The final output can then be used, with or without further processing (for example thresholding), as the prediction.

 2. Analysis of the working principle of the simple neural network model:

[Figure: computations in a simple neural network, annotated with a, w, and b]

Note: for a, w, and b, the number in square brackets in the upper right corner (the superscript) denotes the layer, and the number in the lower right corner (the subscript) denotes the neuron within that layer.

 3. Going from simple to complex, the working principle of a multi-layer neural network is the same as described above. Take a four-layer network as an example:

[Figure: a four-layer neural network]

Note: generally, when counting the layers of a multi-layer neural network, the output layer and all hidden layers are counted, but the input layer is not.

3.2 Forward propagation

 Neural networks use the forward propagation algorithm (computing from left to right, from the input layer towards the output layer) during both training and prediction. A minimal sketch follows the figure below.

[Figure: forward propagation through the network]
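
A minimal NumPy sketch of forward propagation, using the notation from section 3.1 (a[l] = g(W[l] a[l-1] + b[l]), with g the activation function). The layer sizes, random weights, and sample input are arbitrary illustrative choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative architecture: 3 inputs -> 4 hidden units -> 1 output
sizes = [3, 4, 1]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

def forward(x):
    # a[0] is the input; each layer computes a[l] = g(W[l] a[l-1] + b[l])
    a = x
    for Wl, bl in zip(W, b):
        a = sigmoid(Wl @ a + bl)
    return a

x = np.array([0.5, -1.2, 3.0])
print(forward(x))   # the network's output for this input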

3.3 TensorFlow practical demonstration

3.3.1 Importing and inspecting the dataset

 The MNIST dataset consists of 60,000 training images and 10,000 test images, each with a label giving the digit shown in the image. Each image is a 28×28 grayscale array, and the dataset can be loaded directly through the tf.keras API.

import tensorflow as tf
import matplotlib.pyplot as plt

mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Inspect the overall shapes
print("train_images shape: ", train_images.shape)
print("train_labels shape: ", train_labels.shape)
print("test_images shape: ", test_images.shape)
print("test_labels shape: ", test_labels.shape)

# Display the first 9 images
fig = plt.figure(figsize=(10, 10))

nrows = 3
ncols = 3
for i in range(9):
    fig.add_subplot(nrows, ncols, i + 1)  # rows, columns, index
    plt.imshow(train_images[i])
    plt.title("Digit: {}".format(train_labels[i]))
    plt.axis(False)
plt.show()

[Figure: the first nine MNIST digits with their labels]

3.3.2 Data preprocessing

 The pixel values are first scaled from 0–255 down to the [0, 1] range. The labels are the digits 0–9, but we want the model to recognize them as categories rather than as numeric values of different magnitudes, so we convert each label into a one-hot vector.

train_images = train_images / 255
test_images = test_images / 255

print("First Label before conversion:")
print(train_labels[0]) #5

# Convert the labels to one-hot vectors
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

print("First Label after conversion:")
print(train_labels[0]) #[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

3.3.3 Building a model

 1. Flatten layer: our input image is a 2D array. The flatten layer converts the 2D array (28 × 28 pixels) into a 1D array of 28 × 28 = 784 values by unstacking it row by row; this layer only changes the shape of the data and learns no parameters/weights. Hidden layer: our only hidden layer is a dense layer of 512 fully connected nodes (neurons), each with a ReLU activation function. Output layer: the output layer of the network is a dense layer with 10 neurons, one for each digit 0–9; each neuron outputs the probability that the image shows the corresponding digit. The output layer uses a softmax activation function to convert the incoming activations into probabilities.

 2. Loss function: this tells our model how to measure the error between the actual label and the label predicted by the model, and the model is trained to minimize this value; we use categorical cross-entropy. Optimizer: this tells our model how to update its weights/parameters by looking at the data and the loss value; we use the Adam optimizer. Metrics (optional): a list of metrics for monitoring the training and testing steps; we use accuracy, the fraction of images the model classifies correctly.

### Set up the layers
model = tf.keras.Sequential([
  # Flatten layer
  tf.keras.layers.Flatten(),
  # Hidden layer
  tf.keras.layers.Dense(units=512, activation='relu'),
  # Output layer
  tf.keras.layers.Dense(units=10, activation='softmax')
])

### Compile the model
model.compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = ['accuracy']
)
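
The snippets above stop at compiling the model. Here is a minimal sketch of the training and evaluation step that the curves in the next subsection would come from; the variable names follow the earlier snippets, while the number of epochs and the validation split are arbitrary illustrative choices:

# Train the model; `history` records the loss and accuracy for each epoch
history = model.fit(
    train_images, train_labels,
    epochs=10,
    validation_split=0.1
)

# Evaluate on the held-out test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test accuracy:", test_acc)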

3.3.4 Evaluating the model

 1. Visualize the loss (a plotting sketch follows the two figures below):

[Figure: training loss curve]

 2. Visualize the accuracy:

[Figure: training accuracy curve]
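
A minimal sketch of how curves like the two figures above can be plotted, assuming the `history` object returned by `model.fit` in the training sketch of section 3.3.3 (the article itself may have produced its figures differently):

import matplotlib.pyplot as plt

# Loss per epoch
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()

# Accuracy per epoch
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()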

4. Backpropagation

 1. Here we use a multi-layer perceptron with one unit per layer to introduce backpropagation. The k in W and b in the figure below stands for the index of the layers omitted from the drawing.

[Figure: a multi-layer perceptron with one unit per layer, used to introduce backpropagation]

 2. In the backpropagation calculation, gradient descent is actually performed on every parameter; the key step is computing the partial derivative of the loss function with respect to each parameter.

[Figure: partial derivatives of the loss function with respect to each parameter]

 3. Derivation process:

[Figure: chain-rule derivation of the backpropagation formulas]
 4. Summary of backpropagation: starting from the last layer, the partial derivative of each parameter is computed, and the partial-derivative value (the error term) obtained at each layer is propagated backwards to the previous layer so that the partial derivatives of that layer's parameters can be computed in turn. (A minimal numeric sketch follows.)
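
As a numeric illustration of the summary above, here is a minimal NumPy sketch of backpropagation through a chain of single-unit layers. The sigmoid activation, the squared-error loss, the layer count, and the parameter values are all illustrative assumptions, since the article's own derivation is in the figures:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A chain of single-unit layers: a_k = sigmoid(w_k * a_{k-1} + b_k)
w = np.array([0.5, -1.2, 0.8])    # one weight per layer (illustrative values)
b = np.array([0.1,  0.0, -0.3])   # one bias per layer
x, y = 1.0, 0.0                   # a single training example (input, target)

# Forward pass: cache every activation for the backward pass
a = [x]
for wk, bk in zip(w, b):
    a.append(sigmoid(wk * a[-1] + bk))
loss = 0.5 * (a[-1] - y) ** 2

# Backward pass: start from the last layer and propagate the error backwards
delta = a[-1] - y                        # dL/da for the output layer
grads_w, grads_b = [], []
for k in reversed(range(len(w))):
    delta *= a[k + 1] * (1 - a[k + 1])   # through the sigmoid: now dL/dz_k
    grads_w.insert(0, delta * a[k])      # dL/dw_k
    grads_b.insert(0, delta)             # dL/db_k
    delta *= w[k]                        # error passed to the previous layer: dL/da_{k-1}

print("loss:", loss)
print("dL/dw:", grads_w)
print("dL/db:", grads_b)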

5. Example questions

5.1 Question 1

[Figures: example question 1 and its worked solution]

5.2 Question 2

[Figures: example question 2 and its worked solution]

5.3 Question 3

[Figures: example question 3 and its worked solution]


Origin blog.csdn.net/m0_62881487/article/details/132986753