Why does ReLU, a piecewise linear function, add nonlinearity?


We know that the role of an activation function is to introduce nonlinearity into the neural network so that it can fit arbitrary functions. ReLU is a linear function on the region where its input is greater than zero, so if our output values always stay positive, how can the network approximate a nonlinear function?
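For concreteness, ReLU(x) = max(0, x). A minimal sketch, assuming NumPy (not part of the original post), showing that although ReLU is linear on each half of its domain, it is not a linear map overall, since it does not preserve addition:

```python
import numpy as np

def relu(x):
    # ReLU: identity for positive inputs, zero otherwise
    return np.maximum(0.0, x)

a = np.array([1.0, -2.0])
b = np.array([-3.0, 4.0])
# A linear map f would satisfy f(a + b) == f(a) + f(b); ReLU does not:
print(relu(a + b))        # [0. 2.]
print(relu(a) + relu(b))  # [1. 4.]
```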

ReLU as a nonlinear activation function
The main doubt is this: why can a network built from ReLU, an activation function that "looks linear" (it is piecewise linear), still increase the network's nonlinear expressive power?
1. First, what is a linear network? If the whole network can be viewed as one large linear matrix M, then input samples A and B go through the same linear transformation, MA and MB (A and B are transformed by the same matrix M).
2. For a single sample A, passing through a neural network built from ReLU activations is indeed equivalent to a single linear transformation M1. But when sample B passes through the same network, each neuron's state (outputting 0 or Wx + b) can differ from what it was for A, so the linear transformation M2 that B experiences is not equal to M1. Although a ReLU network applies a linear transformation to each individual sample, different samples experience different linear transformations M, so over the whole sample space the ReLU network actually realizes a nonlinear transformation (see the first sketch after this list).
3. Another way to put it: the same feature from different samples flows along different paths through a ReLU network (where the activation value is 0 the path is blocked; where the activation equals its input it passes through), so the final output space is a nonlinear transformation of the input space.
4. More extremely, sigmoid and tanh can themselves be approximated by piecewise linear functions (with many segments) and still retain nonlinear expressive power; ReLU has only two segments, but it is a nonlinear activation function for the same reason (see the second sketch after this list).
5. ReLU also has the advantages of simple computation and fast training.
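To illustrate points 2 and 3, here is a rough NumPy sketch (toy weights, biases omitted, shapes chosen arbitrarily): each sample selects its own activation mask, and for that fixed mask the two-layer ReLU network acts as a single linear map W2 · diag(mask) · W1, which in general differs between samples:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))  # first-layer weights (toy example)
W2 = rng.normal(size=(1, 4))  # second-layer weights

def relu(x):
    return np.maximum(0.0, x)

def effective_matrix(x):
    # Which hidden units fire for this particular input
    mask = (W1 @ x > 0).astype(float)
    # With the mask fixed, the whole network is the linear map W2 @ diag(mask) @ W1
    return W2 @ np.diag(mask) @ W1

A = np.array([1.0, 0.5])
B = np.array([-1.0, 2.0])

# The effective matrix reproduces the actual forward pass for its own sample...
assert np.allclose(effective_matrix(A) @ A, W2 @ relu(W1 @ A))
# ...but the matrices experienced by A and B are in general not the same
print(effective_matrix(A))
print(effective_matrix(B))
```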
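Point 4 can also be checked numerically: a piecewise linear interpolation of tanh with enough segments tracks it closely, so being piecewise linear does not preclude nonlinear expressive power (a sketch, with the segment count chosen arbitrarily):

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 1000)
knots = np.linspace(-4.0, 4.0, 17)        # 16 linear segments
pw = np.interp(x, knots, np.tanh(knots))  # piecewise linear approximation of tanh
print(np.max(np.abs(pw - np.tanh(x))))    # small worst-case error
```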



Source: www.cnblogs.com/lzida9223/p/10972783.html