1. Activation function
(1) Step-type activation function:
(2) Sigmoid-type activation function:
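The two activation functions above can be sketched in a few lines of NumPy (a minimal illustration, not part of the original notes; the function names are my own):

```python
import numpy as np

def step(z):
    # Step activation: hard threshold, outputs 1 where z >= 0, else 0
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Sigmoid activation: smoothly squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(step(np.array([-1.0, 0.0, 2.0])))  # [0. 1. 1.]
print(sigmoid(0.0))                      # 0.5
```

Unlike the step function, the sigmoid is differentiable everywhere, which is what makes it usable with gradient-based training.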
2. Loss function: softmax loss = ① softmax ② cross-entropy loss, used together as a combined loss function
The loss function evaluates how close the model's prediction f(x) is to the true value y: the smaller the loss, the more robust the model. The loss function guides learning: backpropagation uses it to update the model parameters. The goal of machine learning is to learn a set of parameters that brings the predicted value as close as possible to the true value.
(1)softmax
z: the vector output by a fully connected layer of the neural network; for example, in a 4-class classification problem, z is a 1×4 vector
j: subscript ranging over 0–3 (the class indices)
z_k: the kth component of the fully connected output
The components of z are unbounded; softmax, σ(z)_j = e^{z_j} / Σ_k e^{z_k}, maps each component into (0, 1) so the outputs form a probability distribution
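A minimal softmax sketch, assuming the 1×4 fully connected output described above (the max-subtraction trick is a standard numerical-stability detail, not from the notes):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exp so large z values do not overflow;
    # this does not change the result because it cancels in the ratio
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1, -1.0])  # example 1x4 fully connected output
p = softmax(z)
print(p)        # every entry now lies in (0, 1)
print(p.sum())  # entries sum to 1, i.e. a probability distribution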
(2) Cross-entropy loss function (advantage of cross-entropy as a loss function: paired with the sigmoid activation, it avoids the learning slowdown that the mean squared error loss suffers during gradient descent)
ŷ_c: the predicted probability for class c, i.e. the output value of the softmax function
y_c: the true value (label) of the sample
L = -Σ_c y_c log(ŷ_c): the closer the prediction is to the true value, the smaller the loss; the farther away, the larger the loss
Optimization process: keep increasing the predicted probability of the true class, which reduces the loss function
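The behavior described above — small loss for a prediction near the true label, large loss far from it — can be checked directly (a sketch with made-up probability vectors, assuming one-hot labels):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # L = -sum_c y_c * log(p_c); eps guards against log(0)
    return -np.sum(y_true * np.log(y_pred + eps))

y    = np.array([0.0, 1.0, 0.0, 0.0])        # one-hot true label, class 1
good = np.array([0.05, 0.90, 0.03, 0.02])    # confident, correct prediction
bad  = np.array([0.60, 0.10, 0.20, 0.10])    # mostly wrong prediction
print(cross_entropy(y, good))  # small loss
print(cross_entropy(y, bad))   # much larger loss
```

With a one-hot label only the true class's log-probability contributes, so minimizing the loss is exactly "increase the probability of the true class."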
(3) Cross entropy as a loss function
① Information content (self-information): I(x) = -log p(x); the rarer an event, the more information it carries
② Information entropy: H(p) = -Σ_x p(x) log p(x), the expected information content of the distribution
③ Relative entropy (KL divergence)
For the same random variable x with two separate probability distributions p(x) and q(x), the KL divergence D_KL(p‖q) = Σ_x p(x) log(p(x)/q(x)) can be used to measure the difference between the two distributions
④ Cross entropy: H(p, q) = -Σ_x p(x) log q(x)
To make the actual output distribution as close as possible to the expected output distribution: minimize the cross-entropy (equivalently the KL divergence, since H(p, q) = H(p) + D_KL(p‖q) and H(p) is fixed by the data)
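The relationship between entropy, KL divergence, and cross entropy can be verified numerically (a sketch with two made-up distributions; the identity H(p, q) = H(p) + D_KL(p‖q) is standard):

```python
import numpy as np

def entropy(p):
    # H(p) = -sum p * log p
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    # D_KL(p || q) = sum p * log(p / q)
    return np.sum(p * np.log(p / q))

def cross_entropy(p, q):
    # H(p, q) = -sum p * log q
    return -np.sum(p * np.log(q))

p = np.array([0.7, 0.2, 0.1])  # "true" distribution
q = np.array([0.5, 0.3, 0.2])  # model's distribution
# Identity: cross entropy = entropy + KL divergence
print(np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))  # True
```

This is why minimizing cross-entropy against a fixed true distribution is the same as minimizing the KL divergence to it.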
⑤ Mean squared error (MSE) loss function: MSE = (1/n) Σ_i (y_i - ŷ_i)²
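The learning-slowdown point made earlier (MSE + sigmoid vs. cross-entropy + sigmoid) can be illustrated by comparing the two gradients with respect to the pre-activation z; the derivative formulas below are the standard single-output derivations, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_grad(z, y):
    # d/dz of 0.5*(a - y)^2 with a = sigmoid(z): (a - y) * a * (1 - a)
    # The extra a*(1-a) factor vanishes when the neuron saturates
    a = sigmoid(z)
    return (a - y) * a * (1 - a)

def ce_grad(z, y):
    # d/dz of the cross-entropy loss with a = sigmoid(z): simply a - y
    return sigmoid(z) - y

# A saturated, badly wrong neuron: z = -5 but the true label is y = 1
print(mse_grad(-5.0, 1.0))  # tiny gradient -> learning stalls
print(ce_grad(-5.0, 1.0))   # gradient near -1 -> learning stays fast
```

The cross-entropy gradient is proportional only to the error (a - y), so the worse the prediction, the faster the update, which is exactly the advantage claimed in (2) above.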