Quick Learning in One Article - Neural Networks No Longer Mysterious: Learn the Basics of Neural Networks in One Day - The Output Layer (4)


Foreword

I deliberated for a long time over whether to publish deep learning content; after all, more than half of the machine learning material in the mathematical modeling column has yet to be updated. In the end I decided to write a series of articles on neural networks, so that when neural networks, or more advanced models built on them (such as LSTM for time series prediction), come up in mathematical modeling competitions, the principles will already have been explained. Deep learning is not easy to master: it involves a great deal of mathematical theory and formula derivation, and without hands-on practice it is hard to understand what the code we write actually represents inside a neural network framework. I will do my best to simplify the material into concepts we are already familiar with, keep the derivations smooth, and avoid excessive mathematical formulas and specialized theory, so that you can understand and implement each algorithm in a single article and master the knowledge in the most efficient way.

Although many competitions do not restrict which algorithms or frameworks may be used, more and more award-winning teams are using deep learning, and traditional machine learning algorithms are gradually falling out of favor. For example, in Problem C of the 2022 American college mathematical modeling contest, the teams that used deep learning networks won awards at a very high rate. Artificial intelligence and data mining competitions are appearing one after another, and the demand for neural network knowledge is growing with them, so it is well worth mastering the various neural network algorithms.

The blogger has focused on modeling for four years, has taken part in dozens of mathematical modeling contests large and small, and understands the principles of each model, its modeling process, and the various methods of problem analysis. The purpose of this column is to let readers with zero background quickly make use of mathematical models, machine learning, deep learning, and code: every article contains a practical project and runnable code. The blogger keeps up with all kinds of modeling competitions, and for each one will write up the latest ideas and complete code in this column. I hope readers who need it will not miss this carefully crafted column.


Output layer

When neural networks solve different types of problems, they use different activation functions to suit the needs of the task. For classification and regression problems, we choose different strategies to build the network.

First, let's consider classification. In classification problems, we face the challenge of separating data into different classes. For binary classification, this means dividing the data into two possible classes. In this case, we use the Sigmoid function as the activation function of the output layer. The Sigmoid function maps output values into the interval between 0 and 1, which allows us to interpret the output as the probability of belonging to a certain class.
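To make this concrete, here is a minimal sketch of a Sigmoid output unit for binary classification (the score of 1.2 and the 0.5 threshold are illustrative choices, not from any particular model):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # maps any real value into (0, 1)

score = 1.2                # hypothetical raw output of the last layer
prob = sigmoid(score)      # interpreted as P(class = 1)
print(prob)                # about 0.769
print(1 if prob >= 0.5 else 0)  # predicted class, thresholding at 0.5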

When dealing with multi-class problems, on the other hand, our goal is to assign the data to one of several possible classes. Here we use the Softmax function as the activation function of the output layer. The Softmax function converts the raw score of each category into a probability for that category, which lets us determine which class is most likely to be the correct one for a given input.

In regression problems, however, we are concerned with predicting continuous numerical outputs, not discrete categories. In this case we usually use no activation function at all (the identity function), because we want the network's output to be able to take any real value, matching the needs of the problem.

To sum up, the selection of activation functions for neural networks in different problem types depends on the nature and goals of the problem. By choosing an appropriate activation function, we are able to better adapt the network to different types of tasks.

1. Working process

To make this easier to understand, let us again work through a concrete case of multi-class classification using Softmax.

Consider a multi-class problem with C = 4. The final output layer of the linear classifier model contains four output values:

V=\begin{pmatrix} -3\\ 2\\ -1\\ 0 \end{pmatrix}

After Softmax processing, i.e. S_i=\frac{e^{V_i}}{\sum_{j=1}^{C}e^{V_j}}, these values are transformed into the following probabilities:

S=\begin{pmatrix} 0.0057\\ 0.8390\\ 0.0418\\ 0.1135 \end{pmatrix}
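We can verify these numbers with a few lines of NumPy (a quick check that simply applies the Softmax formula above; this naive form is safe here because the inputs are small):

import numpy as np

V = np.array([-3, 2, -1, 0])
S = np.exp(V) / np.sum(np.exp(V))
print(np.round(S, 4))  # [0.0057 0.839  0.0418 0.1135]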

Clearly, from the computed probabilities we can see that S2 = 0.8390 is the largest. Softmax converts continuous scores into relative probabilities, which makes them much easier to interpret. Be careful, though: if the scores are as large as [1000, 1001, 1002], the exponentials overflow to inf, and if they are as small as [-1000, -999, -1000], the exponentials underflow to 0; in both cases the division produces nan instead of valid probabilities.

In practical applications, V therefore needs some numerical preprocessing: subtract the maximum element of V from every element before exponentiating. This leaves the result unchanged, because the common factor cancels between numerator and denominator, while keeping every exponent at or below 0.

import numpy as np

def _softmax(x):
    c = np.max(x)                  # subtract the max for numerical stability
    exp_x = np.exp(x - c)          # every exponent is now at most e^0 = 1
    return exp_x / np.sum(exp_x)   # normalize so the outputs sum to 1

scores = np.array([123, 456, 789])  # naive exponentiation would overflow here
p = _softmax(scores)
print(p)
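With the maximum subtracted, the largest exponent is e^0 = 1, so nothing overflows; for these scores the result is approximately [0, 0, 1], because 789 dominates the other two values. Without the subtraction, np.exp(789) alone would already overflow to inf.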

Calculation in purely numerical form may feel somewhat abstract, so let us take image classification as an example.

Suppose we assign cats to category 1, dogs to category 2, and chickens to category 3; anything that belongs to none of these categories is classified as "others". In a row of pictures, if the first picture from the left is a chicken, we put it in category 3, and so on.

Suppose we input a picture of a cat, whose corresponding true label is (0, 1, 0, 0) (the category has been converted into one-hot encoding).

The true value is y=\begin{pmatrix} 0\\ 1\\ 0\\ 0 \end{pmatrix}, i.e. the component for the cat class is 1 and the rest are 0. After the Softmax calculation we obtain the predicted value y_predict; suppose it is \begin{pmatrix} 0.3\\ 0.2\\ 0.1\\ 0.4 \end{pmatrix}, a vector of probabilities summing to 1. In this case, the network's result tells us that the cat class is assigned a probability of 20%. Generally speaking, a neural network takes the class of the output neuron with the largest value as its recognition result, and applying the Softmax function only changes the magnitudes of the outputs, never which neuron is largest, because the exponential function is monotonically increasing. Since exponentiation also costs a certain amount of computation time, the Softmax function can be omitted from the output layer of a multi-class network at prediction time.
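We can confirm this last point with a minimal sketch, reusing the _softmax function defined earlier (the logits here are made-up values):

logits = np.array([0.3, 0.2, 0.1, 0.4])  # raw output-layer values
probs = _softmax(logits)

# Softmax changes the magnitudes but not the ordering,
# because the exponential function is monotonically increasing:
print(np.argmax(logits), np.argmax(probs))  # 3 3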

2. The number of neurons in the output layer

The number of neurons in the output layer should be chosen according to the problem being solved. In classification problems in particular, it must equal the number of categories. Taking MNIST (handwritten digit recognition) as an example: the task is to classify handwritten digits into the 10 categories 0 through 9, so the output layer gets 10 neurons, one per digit class.

Forward propagation based on the MNIST dataset

The MNIST (Modified National Institute of Standards and Technology) dataset is a classic dataset widely used in the field of computer vision for handwritten digit recognition tasks. This dataset was created by the National Institute of Standards and Technology (NIST) and modified for use in machine learning research.

The MNIST dataset contains a series of images of handwritten digits, covering numbers from 0 to 9. Each image has a size of 28x28 pixels and is presented as a grayscale image. The data set is divided into two parts: training set and test set.

Training set: The training set contains 60,000 images of handwritten digits, and each image has a corresponding label indicating the digit represented by the image. These images and labels are widely used to train machine learning models, especially for building handwritten digit recognition models.

Test set: The test set contains 10,000 images of handwritten digits, also with corresponding labels. These images are used to evaluate the performance of the trained model on unseen data. Through the results of the test set, you can understand the generalization ability and accuracy of the model.

You can download it directly through torchvision:

#MNIST dataset
import torchvision.datasets as dsets

train_dataset = dsets.MNIST(root = '/ml/pymnist',  # root directory for the data
                            train = True,          # select the training set
                            transform = None,      # no data preprocessing
                            download = True        # download from the internet
                           )
test_dataset = dsets.MNIST(root = '/ml/pymnist',   # root directory for the data
                           train = False,          # select the test set
                           transform = None,       # no data preprocessing
                           download = True         # download from the internet
                          )
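As a quick sanity check (this uses the older torchvision attribute names that the accuracy code below also relies on; newer versions expose train_dataset.data instead), we can inspect the two splits:

print(len(train_dataset))               # 60000
print(len(test_dataset))                # 10000
print(train_dataset.train_data.size())  # torch.Size([60000, 28, 28])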

First we initialize the network with an init_network function. A weight_scale variable controls the scale of the random initial weights, and all biases are uniformly set to 1.

def init_network():
    network = {}
    weight_scale = 1e-3  # scale of the random initial weights
    network['W1'] = np.random.randn(784, 50) * weight_scale   # input layer -> hidden layer 1
    network['b1'] = np.ones(50)
    network['W2'] = np.random.randn(50, 100) * weight_scale   # hidden layer 1 -> hidden layer 2
    network['b2'] = np.ones(100)
    network['W3'] = np.random.randn(100, 10) * weight_scale   # hidden layer 2 -> output layer
    network['b3'] = np.ones(10)
    return network
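The resulting architecture is 784 → 50 → 100 → 10: the 784 inputs match the 28x28 = 784 pixels of a flattened MNIST image, the two hidden layers have 50 and 100 neurons (an arbitrary choice for this demonstration), and the 10 outputs correspond to the digit classes 0 to 9.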

After that, the forward propagation process is performed, and the ReLU function is used here:

def _relu(x):
    return np.maximum(0, x)  # element-wise ReLU

def forward(network, x):
    w1, w2, w3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = x.dot(w1) + b1   # affine transform, layer 1
    z1 = _relu(a1)
    a2 = z1.dot(w2) + b2  # affine transform, layer 2
    z2 = _relu(a2)
    a3 = z2.dot(w3) + b3  # affine transform, output layer
    y = a3                # identity output: Softmax is omitted, the argmax is unchanged
    return y

Finally, we compute the classification accuracy on the test set:

network = init_network()
accuracy_cnt = 0
x = test_dataset.test_data.numpy().reshape(-1, 28*28)  # flatten each image to 784 values
labels = test_dataset.test_labels.numpy()  # convert tensor to numpy
for i in range(len(x)):
    y = forward(network, x[i])
    p = np.argmax(y)  # index of the element with the highest score
    if p == labels[i]:
        accuracy_cnt += 1
print("Accuracy:" + str(float(accuracy_cnt) / len(x) * 100) + '%')
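Since forward is written entirely with matrix operations, we can also push the whole test set through in a single batch instead of looping over images (a minimal sketch of the same computation):

y_all = forward(network, x)       # shape (10000, 10): one row of scores per image
preds = np.argmax(y_all, axis=1)  # predicted class for each image
print("Accuracy:" + str((preds == labels).mean() * 100) + '%')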

 

Because the network still has its random initial weights, the accuracy comes out around 10%, which is exactly chance level for 10 categories. We have only propagated forward and not backward, so none of the weights and biases have been optimized. In the next chapter we will prepare for backpropagation and describe the role of the loss function in detail. My previous article explained essentially all of the common loss functions, in great detail, and I recommend reading it if you can:

Detailed Explanation of Loss Function (Loss Function) - Python Code Implementation of Common Loss Functions for Classification Problems + Analysis of Calculation Principles

The next article will focus on the use and function of the loss function in actual cases.



Origin blog.csdn.net/master_hunter/article/details/132575138