Deep learning methods - Experiment 4: comparing softmax and sigmoid in code

Foreword:

The teacher arranged this experiment presumably to let us see concretely which "position" in a neural network the softmax and sigmoid functions suit best, that is, whether each works better as the activation function of an intermediate hidden layer or of the final output layer.

After doing the experiment, understanding the difference between the two activation functions counts as a small gain. Unfortunately, if we want to draw conclusions from it, the experiment's design is not rigorous enough, which made it uncomfortable to work through and left the experimental summary rather hasty.

So I will write a separate article about what I actually gained: hands-on learning of deep learning - how the softmax function differs from the sigmoid function. The experiment itself is the process of producing that knowledge, so instead of pasting the code as text I will show it directly as screenshots.

1. Experimental requirements

Verify and test on a computer the principles and algorithmic implementations of multi-layer neural networks built with the two activation functions, test the training effect of the multi-layer networks, and consult relevant references along the way.

2. Experimental purpose

1. Master the basic principles of Softmax;

2. Master the gradient calculation of Softmax and cross-entropy loss;

3. Master the algorithmic procedure of Softmax backpropagation.

3. Experimental content

Question one:

1. For the same multi-class classification scenario, compare SoftMax combined with the cross-entropy loss function against Sigmoid combined with the cross-entropy loss function: is there any difference in training speed and learning effect?

  • Use the sigmoid function as the output layer activation function:
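The original post shows a code screenshot here; as a stand-in, a minimal NumPy sketch of a sigmoid output activation (the function name and example data are my own assumptions, not the original code):

```python
import numpy as np

def sigmoid(x):
    # element-wise logistic function: squashes each output unit to (0, 1) independently
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical scores for 2 samples and 3 classes
scores = np.array([[ 2.0, -1.0, 0.5],
                   [-0.3,  1.2, 0.1]])
print(sigmoid(scores))  # note: the rows do NOT sum to 1
```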

  • Define the cross-entropy loss function corresponding to sigmoid:
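Again a sketch rather than the original screenshot: the cross-entropy that pairs with a sigmoid output is binary cross-entropy applied to every output unit. Here y_true is assumed to be a one-hot label matrix, and eps only guards against log(0):

```python
import numpy as np

def sigmoid_cross_entropy(y_prob, y_true, eps=1e-12):
    # binary cross-entropy on each output unit, summed over units and averaged over the batch
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_unit = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return per_unit.sum(axis=1).mean()
```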

  • Use the softmax function as the output layer activation function:
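A corresponding sketch of a softmax output (an assumed implementation, not the original code); subtracting the row-wise maximum is the usual trick to avoid overflow:

```python
import numpy as np

def softmax(x):
    # shift by the row-wise max for numerical stability, then normalize each row
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([[ 2.0, -1.0, 0.5],
                   [-0.3,  1.2, 0.1]])
print(softmax(scores))  # each row sums to 1, i.e. a proper class distribution
```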

  • Define the cross-entropy loss function corresponding to softmax:
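And a sketch of the categorical cross-entropy that pairs with softmax; with one-hot labels only the log-probability of the true class contributes to the loss:

```python
import numpy as np

def softmax_cross_entropy(y_prob, y_true, eps=1e-12):
    # -sum_k t_k * log(p_k), averaged over the batch
    return -(y_true * np.log(y_prob + eps)).sum(axis=1).mean()
```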

  • Results on the multi-class data set whose classes do not overlap (sigmoid above, softmax below):

 

  • Results on the multi-class data set whose classes overlap (sigmoid above, softmax below):

  • Conclusion:

It can be seen that, for multi-classification problems, a network with a softmax output layer trains noticeably faster than one with a sigmoid output layer, and on some harder data sets softmax also learns noticeably better than sigmoid.

Question two:

2. Read and test the multi-layer neural network code, complete the implementation of a SoftMax class used as a hidden layer, and compare its training effect against Sigmoid and ReLU.

  • The softmax function:

  • Softmax used as the hidden-layer activation:
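The softmax function itself is the same one sketched in Question one. Below is a guess at what a Softmax layer class for hidden-layer use could look like, with the forward/backward interface of the usual Sigmoid and ReLU layer classes (this is my sketch, not the course's implementation). Because softmax is not fused with the loss here, the backward pass has to go through the full softmax Jacobian:

```python
import numpy as np

class Softmax:
    """Softmax as a hidden-layer activation (sketch, not the original code)."""
    def __init__(self):
        self.out = None

    def forward(self, x):
        x = x - x.max(axis=1, keepdims=True)   # numerical stability
        e = np.exp(x)
        self.out = e / e.sum(axis=1, keepdims=True)
        return self.out

    def backward(self, dout):
        # vector-Jacobian product of softmax, row by row:
        # dx_i = y_i * (dout_i - sum_j dout_j * y_j)
        s = (dout * self.out).sum(axis=1, keepdims=True)
        return self.out * (dout - s)
```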

  • Running results: not ideal; after 5000 iterations the loss basically stabilizes at around 0.6. The classification results are as follows.

  • The sigmoid function is used in the hidden layer:
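For comparison, a sketch of a standard Sigmoid layer class; its backward pass uses the derivative y * (1 - y):

```python
import numpy as np

class Sigmoid:
    """Sigmoid as a hidden-layer activation (sketch)."""
    def __init__(self):
        self.out = None

    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, dout):
        # derivative of the logistic function: y * (1 - y)
        return dout * self.out * (1.0 - self.out)
```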

  • Running results: training speed is similar to softmax, but the result is better; the loss finally stabilizes at 0.18.

  • The ReLU function is used in the hidden layer:
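And a sketch of a standard ReLU layer class, which simply blocks the gradient wherever the input was clipped to zero:

```python
import numpy as np

class Relu:
    """ReLU as a hidden-layer activation (sketch)."""
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)   # remember which units were clipped
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dout = dout.copy()
        dout[self.mask] = 0    # no gradient flows through clipped units
        return dout
```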

  • Running results: the loss starts out large but drops quickly; the final result is slightly worse than with sigmoid, stabilizing at around 0.23.

4. Experimental summary

Through this experiment, we learned how the softmax and sigmoid functions differ when used in the hidden layer versus the output layer, and why each is paired with a different form of the cross-entropy loss.


Original post: blog.csdn.net/qq_50571974/article/details/123968689