[Learning record] activation function + loss function

1. Activation function

(1) Step-type activation function:
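The figure that originally illustrated this did not survive; for reference, the standard step activation is f(x) = 1 for x \ge 0 and f(x) = 0 for x < 0. It outputs a hard 0/1 decision and is not differentiable at x = 0, which is why smooth activations such as the sigmoid below are preferred for gradient-based training.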


(2) Sigmoid activation function:
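The figure here is also missing; the sigmoid itself is \sigma(x) = \frac{1}{1+e^{-x}}, which squashes any real input smoothly into (0, 1) and has the convenient derivative \sigma'(x) = \sigma(x)(1-\sigma(x)).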

2. Loss function: softmax loss, the combination of ① softmax and ② cross-entropy loss

The loss function evaluates how close the model prediction f(x) is to the true value y: the smaller the loss, the more robust the model. The loss function guides learning: based on it, backpropagation updates the model parameters. The purpose of machine learning is to learn a set of parameters so that the predicted values come as close as possible to the true values.
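A minimal sketch of this loop (a toy one-parameter model with squared-error loss; all names here are illustrative, not from the notes):

```python
import numpy as np

# toy data generated by y = 3x; the model is y_hat = w * x with one parameter w
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 6.0, 9.0])
w, lr = 0.0, 0.05

for step in range(100):
    y_hat = w * x                        # forward pass: prediction f(x)
    loss = np.mean((y_hat - y) ** 2)     # loss: gap between prediction and truth
    grad = np.mean(2 * (y_hat - y) * x)  # backpropagation: d(loss)/dw
    w -= lr * grad                       # update the parameter to shrink the loss

print(w)  # converges toward 3.0: predictions approach the true values
```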

(1) softmax: f(z_{k}) = \frac{e^{z_{k}}}{\sum_{j} e^{z_{j}}} (1)

z: the vector of raw outputs from a fully connected layer of the network; for example, in a 4-class classification problem, z is a 1×4 vector

j: index running over the classes (0 to 3 in this example)

z_k: the k-th output of the fully connected layer

The fully connected outputs z are unbounded; equation (1) squashes each component into (0, 1) so that the vector can be read as a probability distribution.
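A minimal NumPy sketch of equation (1) (the max-subtraction is a standard numerical-stability trick, not part of the notes):

```python
import numpy as np

def softmax(z):
    # equation (1); subtracting max(z) avoids overflow in exp
    # and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1, -1.0])  # e.g. fully connected output, 4 classes
p = softmax(z)
print(p)        # each entry now lies in (0, 1)
print(p.sum())  # 1.0: the entries form a probability distribution
```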

(2) cross-entropy loss function: l(y,z) = -\sum_{c=0}^{C-1} y_{c}\log f(z_{c}) (2), where C is the number of classes. (A benefit of using cross-entropy as the loss: with a sigmoid activation it avoids the shrinking-gradient, slow-learning problem that the mean squared error loss runs into during gradient descent.)

f(z_{c}): the output of the softmax function (1) for class c

y_{c}: the true value of the sample for class c (1 for the correct class and 0 otherwise, under one-hot encoding)

The closer the prediction is to the true value, the smaller the loss; the farther away it is, the larger the loss.

Optimization process: keep raising the predicted probability of the true class, which keeps lowering the loss.
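A sketch of equation (2) on a one-hot label, showing that a prediction close to the truth gives a small loss (the example logits are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # same softmax as in the sketch above
    return e / e.sum()

def cross_entropy(y, z):
    # equation (2): y is a one-hot true label, z the raw network output
    return -np.sum(y * np.log(softmax(z)))

y = np.array([0.0, 1.0, 0.0, 0.0])  # true class is index 1
print(cross_entropy(y, np.array([1.0, 4.0, 0.1, -1.0])))  # right and confident: ~0.07
print(cross_entropy(y, np.array([4.0, 1.0, 0.1, -1.0])))  # wrong and confident: ~3.07
```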

(3) Cross entropy as a loss function

① Amount of information
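The notes leave this item without a formula; the standard definition is that an event with probability p(x) carries information I(x) = -\log p(x): the rarer the event, the more information it carries.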

② Information entropy
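Again filling in the standard definition: information entropy is the expected amount of information over the whole distribution, H(p) = -\sum_{x} p(x)\log p(x).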

③ Relative entropy (KL divergence)

Suppose the same random variable x has two different probability distributions p(x) and q(x); the KL divergence measures the difference between the two distributions.
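For reference, the standard formula is D_{KL}(p\|q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)}; it is always \ge 0 and equals 0 exactly when p = q.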

④ Cross entropy

To judge how close the actual output distribution is to the expected output distribution, minimize the cross-entropy between them.
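For reference, H(p,q) = -\sum_{x} p(x)\log q(x) = H(p) + D_{KL}(p\|q). Since the true distribution p is fixed during training, minimizing the cross-entropy is equivalent to minimizing D_{KL}(p\|q), which is why (2) is a reasonable loss.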

⑤ Mean squared error loss function (MSE)
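For comparison, MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}. When the output goes through a sigmoid, the MSE gradient carries an extra factor \sigma'(z) that shrinks as the unit saturates; this is the slow-learning problem mentioned under (2) that cross-entropy avoids.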
