1. When training a convolutional neural network, making the network deeper does not always improve performance; beyond a certain depth, performance can actually degrade. Overfitting is one possible cause, and gradient vanishing is another important one: when the chain rule multiplies a long series of gradients, each smaller than 1, the product tends to 0 and the weights are no longer effectively updated (the plain network, Plain net, shown in the figure below). The residual network (Residual net) was introduced to solve this problem; a simple residual network structure is shown below:
In the above figure, the residual network (Residual net) has one more skip connection than the plain network (Plain net). Writing the residual block's output as H(x) = F(x) + x, the gradients are: Plain net: d[H(x)]/dx = d[F(x)]/dx; Residual net: d[H(x)]/dx = d[F(x)]/dx + 1. Even if d[F(x)]/dx is much smaller than 1, each factor in the residual case is larger by 1, so multiplying many such factors no longer drives the total gradient to 0. This is how the skip connection alleviates the vanishing-gradient problem.
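The effect of that extra "+1" can be seen with a quick numeric sketch. The per-layer gradient value 0.1 below is a made-up illustration, not taken from any real network:

```python
# Hypothetical illustration: product of per-layer gradients over 30 layers.
plain = 1.0
residual = 1.0
for _ in range(30):
    dF = 0.1            # assume each layer's local gradient d[F(x)]/dx = 0.1 (< 1)
    plain *= dF         # plain net: factors < 1 multiply, product collapses to ~0
    residual *= dF + 1  # residual net: each factor is dF + 1, so it stays above 1

# The plain product vanishes; the residual product stays well above 0.
print(f"plain net gradient after 30 layers:    {plain:.3e}")
print(f"residual net gradient after 30 layers: {residual:.3e}")
```

With every factor below 1 the plain product decays exponentially, while the skip connection keeps each residual factor at 1 + dF, so the product never vanishes.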
Build a simple convolutional residual neural network:
Code for a simple residual neural network:
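The original code listing is not reproduced here, so the following is a minimal NumPy sketch of a single residual block, with the convolution written out by hand; all names (`conv2d_same`, `residual_block`) are illustrative, not from the original:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 3x3 'same' convolution for a single-channel 2-D input."""
    h, wd = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    """H(x) = F(x) + x, where F(x) is two convolutions with a ReLU between."""
    f = np.maximum(conv2d_same(x, w1), 0)  # first conv + ReLU
    f = conv2d_same(f, w2)                 # second conv
    return np.maximum(f + x, 0)            # add the skip connection, then ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w1 = rng.standard_normal((3, 3)) * 0.1
w2 = rng.standard_normal((3, 3)) * 0.1
out = residual_block(x, w1, w2)
print(out.shape)  # same shape as the input, so x can be added directly
```

Note that F(x) must keep the same shape as x (here via "same" padding), otherwise the skip connection cannot be added element-wise.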
Construct a residual network:
The complete code for constructing the residual network:
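Since the original listing is not shown, here is one plausible sketch of a complete small residual network in PyTorch (an assumed framework choice); the class names, channel sizes, and the 28x28 single-channel input are all illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """A residual block: H(x) = F(x) + x, with F two 3x3 'same' convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)  # skip connection before the final activation

class ResidualNet(nn.Module):
    """A small convolutional network with two residual blocks (e.g. for 28x28 input)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5)   # 28x28 -> 24x24
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)  # 12x12 -> 8x8
        self.pool = nn.MaxPool2d(2)
        self.block1 = ResidualBlock(16)
        self.block2 = ResidualBlock(32)
        self.fc = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.block1(self.pool(F.relu(self.conv1(x))))  # -> (16, 12, 12)
        x = self.block2(self.pool(F.relu(self.conv2(x))))  # -> (32, 4, 4)
        return self.fc(x.flatten(1))

net = ResidualNet()
out = net(torch.zeros(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 10])
```

The residual blocks are placed after each pooling stage; because each block preserves its input shape, they can be inserted between ordinary layers without changing the rest of the architecture.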