Watermelon Book exercise solutions - Chapter 5

5.1

Note that the neural network must have a non-linear activation function in at least one of its layers, hidden or output. If $f(x) = \omega^{T}x$ is used as the activation function, then no matter how many layers the network has, it degenerates into linear regression.
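Below is a minimal numpy sketch (an illustrative example, not part of the original answer) of this collapse: with the identity "activation" $f(x)=x$, two stacked linear layers equal a single linear layer whose weight matrix is the product of the two, so depth adds nothing over linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 samples, 3 features
W1 = rng.normal(size=(3, 4))   # first-layer weights
W2 = rng.normal(size=(4, 2))   # second-layer weights

# Two stacked layers with identity activation...
h = x @ W1                     # "hidden" layer, f(x) = x
y_two_layer = h @ W2

# ...equal one linear layer with the combined weight matrix W1 @ W2.
y_one_layer = x @ (W1 @ W2)
print(np.allclose(y_two_layer, y_one_layer))   # True
```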

5.2

Both functions map values onto the interval $[0,1]$, but the step function is not smooth and is discontinuous, so the sigmoid is chosen as the mapping function instead. In fact, the activation function does not have to be the sigmoid; any differentiable non-linear function can be used.
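A small sketch (an assumed example, not from the original text) of why differentiability matters: the step function's derivative is zero almost everywhere, giving gradient descent nothing to work with, while the sigmoid is smooth with the simple derivative $\sigma(x)(1-\sigma(x))$.

```python
import numpy as np

def step(x):
    # hard threshold: 0/1 output, derivative 0 wherever it is defined
    return (x >= 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
s = sigmoid(x)
print(step(x))      # abrupt jump from 0 to 1
print(s)            # smooth values in (0, 1)
print(s * (1 - s))  # nonzero, usable gradient everywhere
```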

5.3

 

5.4

If the learning rate is too low, each gradient-descent step is very small, so the number of iterations required becomes very large. If the learning rate is too high, the later iterations oscillate, fluctuating back and forth near the minimum.
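The sketch below (an illustrative example, not from the original answer) runs gradient descent on the toy objective $f(w) = w^2$, whose gradient is $2w$ and whose minimum is at $w = 0$, with three learning rates to show slow convergence, good convergence, and oscillation.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize f(w) = w**2 from w0 using a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w**2 is 2w
    return w

print(gradient_descent(0.01))    # too low: still far from 0 after 20 steps
print(gradient_descent(0.1))     # reasonable: close to 0
print(gradient_descent(0.99))    # too high: sign flips each step, bouncing around 0
```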

5.5

5.6

5.7

5.8

5.9

5.10

 
