CNN series: Deep Residual Network (ResNet) - solving the problem that making a network deeper increases its error

From: https://blog.csdn.net/diamonjoy_zone/article/details/70904212

Environment: Windows 8.1, TensorFlow 1.0.1

Software: Anaconda3 (bundles Python 3 and the development environment)

TensorFlow installation: pip install tensorflow (CPU version) or pip install tensorflow-gpu (GPU version)

TFLearn installation: pip install tflearn

 

Reference:

Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

 

1. Introduction


ResNet (Residual Neural Network) was proposed by a team of four Chinese researchers, including Kaiming He, formerly of Microsoft Research. Using residual blocks, they successfully trained a 152-layer deep neural network and won the ILSVRC 2015 classification competition with a top-5 error of 3.57%, while using fewer parameters than VGGNet. The residual structure both speeds up the training of ultra-deep networks and markedly improves model accuracy. The previous blog post covered Inception, and Inception V4 combines the Inception Module with ResNet; this shows how influential ResNet is as a network structure, to the point that it can be dropped directly into Inception Net.

 

At CVPR 2016, Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun won the Best Paper award, and deservedly so.

For more details on the paper, see:

 

  1. Award-winning deep residual learning: another No. 1 from a Tsinghua top student | CVPR 2016 Best Paper
  2. Understand it in seconds! This is what Kaiming He's deep residual network PPT looks like | ICML 2016 tutorial

2. The Problem

 

The first question the authors ask is: is a deeper neural network always better?

  • Ideally, as long as the network is not overfitting, deeper should be better.
  • In practice, accuracy drops as the network gets deeper; this is called degradation.

 

Schmidhuber, one of the authors of LSTM, pointed out in the Highway Network work that depth is very important to a neural network's performance, but the deeper the network, the harder it is to train. The goal of Highway Networks is precisely to make very deep networks trainable. A Highway Network effectively modifies each layer's activation: where an ordinary layer only applies a nonlinear transform to its input, a highway layer also lets a certain proportion of the original input x pass through unchanged. That share of the previous layer's information reaches the next layer directly, without matrix multiplication or a nonlinear transform, like an information highway, hence the name Highway Network.
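To make the gating concrete, here is a minimal NumPy sketch of a single highway layer in its dense form (the weight names W_h, b_h, W_t, b_t are illustrative, not taken from the paper). The transform gate T(x) decides how much of the nonlinear transform H(x) to use, and 1 - T(x) is the proportion of the raw input x that is carried through:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""
    h = np.tanh(x @ W_h + b_h)      # ordinary nonlinear transform H(x)
    t = sigmoid(x @ W_t + b_t)      # transform gate T(x), values in (0, 1)
    return h * t + x * (1.0 - t)    # the (1 - T) share of x passes through untouched

Note that the carry term requires the layer's input and output to have the same dimension, a constraint ResNet keeps for its identity shortcuts.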

 

 


The original inspiration for ResNet came from the following problem: as the depth of a neural network keeps increasing, a degradation problem appears. Accuracy first rises, then saturates, and then actually falls as depth increases further. This is not overfitting, because the error grows not only on the test set but on the training set as well.

 

Suppose a shallow network (Shallow Net) has already reached saturation accuracy. If we stack a few identity mapping layers on top of it, the error should at least not increase; in other words, a deeper network should never produce a higher training error than its shallower counterpart. This idea of using an identity mapping to pass a layer's output directly to later layers is the source of inspiration for ResNet.

 

3. Composition

 

The authors propose a deep residual learning framework to solve this degradation of performance as depth increases.

 

Suppose the input to a stack of layers is x and the desired output is H(x); that is, H(x) is the complex latent mapping we hope to learn, which is hard to fit directly. If we instead pass the input x straight to the output as an initial result via a "shortcut connection", the function the stacked layers actually need to learn becomes F(x) = H(x) - x. ResNet thus changes the learning target: instead of learning the complete output, the layers learn the difference between the optimal solution H(x) and the identity mapping, that is, the residual.

 

"Shortcut" literally means a short path; here it refers to a cross-layer connection. The Highway Network adds a direct path from x to y and uses a gate T(x, W_T) to balance the transformed and carried signals. ResNet's shortcut has no gate: x is passed through unchanged, and each module only has to learn the residual F(x), which makes the network stable and easy to train. The authors also show that performance keeps improving as depth grows. Intuitively, once the network is deep enough, optimizing the residual function F(x) = H(x) - x is easier than directly optimizing the complex nonlinear mapping H(x).
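As a minimal illustration of the residual idea (plain NumPy, with a stand-in residual_fn rather than real convolution layers): the unit outputs F(x) + x, so if the stacked layers learn F(x) = 0 the whole unit collapses to an exact identity mapping, which is why adding such units should never raise the training error:

import numpy as np

def residual_unit(x, residual_fn):
    # The layers only model the residual F(x); the identity shortcut
    # carries x forward unchanged, so the unit outputs F(x) + x.
    return residual_fn(x) + x

# With F(x) == 0 the unit is an exact identity mapping.
x = np.random.randn(4, 8)
assert np.allclose(residual_unit(x, lambda v: np.zeros_like(v)), x)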


4. Network structure


The figure below compares VGGNet-19, a 34-layer plain convolutional network, and a 34-layer ResNet. The biggest difference between an ordinary feed-forward convolutional network and ResNet is that ResNet has many bypass branches that feed an input directly into later layers, so those layers only have to learn the residual; this structure is called a shortcut connection. Traditional convolutional or fully connected layers inevitably lose or degrade some information as it passes through them. ResNet alleviates this to some extent: by passing the input directly to the output it preserves the information, and the network only has to learn the difference between input and output, which simplifies both the learning target and the learning difficulty.


 

At the same time, the 34-layer residual network drops the last few fully connected layers and goes from average pooling straight to a 1000-way softmax, which makes ResNet computationally cheaper than the 16- to 19-layer VGG.
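A sketch of that classification head with the same TFLearn calls used in the CIFAR-10 example below; the 1000 outputs and the 7x7x512 placeholder feature map are assumptions for illustration, not the paper's exact shapes:

import tflearn

# Placeholder standing in for the output of the last residual stage.
features = tflearn.input_data(shape=[None, 7, 7, 512])
# Global average pooling replaces the large fully connected layers,
# then a single 1000-way softmax produces the class scores.
net = tflearn.global_avg_pool(features)
net = tflearn.fully_connected(net, 1000, activation='softmax')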

 

In the ResNet paper, besides the two-layer residual learning unit, a three-layer unit is also proposed. The two-layer unit contains two 3x3 convolutions with the same number of output channels (since the residual equals the target output minus the input, the input and output dimensions must match). The three-layer unit borrows the 1x1 convolution from Network In Network and Inception Net, placing a 1x1 convolution before and after the middle 3x3 convolution; reducing the dimension first and then restoring it lowers the computational cost. In addition, when the input and output dimensions differ, a linear projection of x can be applied on the shortcut before connecting to the following layers.
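A hand-rolled sketch of the three-layer bottleneck unit using TFLearn's conv_2d (batch normalization is omitted for brevity; the helper name and channel arguments are illustrative rather than the paper's exact implementation). A 1x1 convolution first reduces the channels, a 3x3 convolution works in the reduced space, and a second 1x1 convolution restores them before the shortcut is added; when the shape changes, the shortcut becomes a strided 1x1 projection of x:

import tflearn

def bottleneck_unit(x, out_channels, mid_channels, downsample=False):
    strides = 2 if downsample else 1
    if downsample:
        # Linear projection shortcut to match the new dimensions.
        shortcut = tflearn.conv_2d(x, out_channels, 1, strides=strides, activation='linear')
    else:
        shortcut = x  # identity shortcut (assumes x already has out_channels channels)
    net = tflearn.conv_2d(x, mid_channels, 1, strides=strides, activation='relu')  # 1x1: reduce dimension
    net = tflearn.conv_2d(net, mid_channels, 3, activation='relu')                 # 3x3 in the smaller space
    net = tflearn.conv_2d(net, out_channels, 1, activation='linear')               # 1x1: restore dimension
    return tflearn.activation(net + shortcut, 'relu')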

 

 

5. Experiment

 

With the ResNet structure, the rise in training error caused by adding more layers disappears: the training error of a ResNet keeps decreasing as the number of layers grows, and performance on the test set improves as well. ResNet finally won the ILSVRC 2015 competition with a top-5 error rate of 3.57%.

 

 

TFLearn provides a ResNet example on CIFAR-10, residual_network_cifar10.py; with tflearn.residual_block it is easy to define residual learning units:

 

# -*- coding: utf-8 -*-

""" Deep Residual Network.

Applying a Deep Residual Network to CIFAR-10 Dataset classification task.

References:
    - K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image
      Recognition, 2015.
    - Learning Multiple Layers of Features from Tiny Images, A. Krizhevsky, 2009.

Links:
    - [Deep Residual Network](http://arxiv.org/pdf/1512.03385.pdf)
    - [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)

"""

from __future__ import division, print_function, absolute_import

import tflearn

# Residual blocks
# 32 layers: n=5, 56 layers: n=9, 110 layers: n=18
n = 5

# Data loading
from tflearn.datasets import cifar10
(X, Y), (testX, testY) = cifar10.load_data()
Y = tflearn.data_utils.to_categorical(Y, 10)
testY = tflearn.data_utils.to_categorical(testY, 10)

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Real-time data augmentation
img_aug = tflearn.ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_crop([32, 32], padding=4)

# Building Residual Network
net = tflearn.input_data(shape=[None, 32, 32, 3],
                         data_preprocessing=img_prep,
                         data_augmentation=img_aug)
net = tflearn.conv_2d(net, 16, 3, regularizer='L2', weight_decay=0.0001)
net = tflearn.residual_block(net, n, 16)
net = tflearn.residual_block(net, 1, 32, downsample=True)
net = tflearn.residual_block(net, n-1, 32)
net = tflearn.residual_block(net, 1, 64, downsample=True)
net = tflearn.residual_block(net, n-1, 64)
net = tflearn.batch_normalization(net)
net = tflearn.activation(net, 'relu')
net = tflearn.global_avg_pool(net)
# Regression
net = tflearn.fully_connected(net, 10, activation='softmax')
mom = tflearn.Momentum(0.1, lr_decay=0.1, decay_step=32000, staircase=True)
net = tflearn.regression(net, optimizer=mom,
                         loss='categorical_crossentropy')
# Training
model = tflearn.DNN(net, checkpoint_path='model_resnet_cifar10',
                    max_checkpoints=10, tensorboard_verbose=0,
                    clip_gradients=0.)

model.fit(X, Y, n_epoch=200, validation_set=(testX, testY),
          snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=128, shuffle=True,
          run_id='resnet_cifar10')

 

6. Follow-up

 

Shortly after ResNet was introduced, Google borrowed its core idea to propose Inception V4 and Inception-ResNet-V2, and by combining these two models reached a remarkable 3.08% error rate on the ILSVRC dataset. This shows how significant and how widely applicable the contribution of ResNet and its idea is to convolutional neural network research.

 

In a second paper by the ResNet authors, Identity Mappings in Deep Residual Networks, ResNet V2 is presented. The main difference between ResNet V2 and ResNet V1 is that, after studying the propagation behaviour of the residual learning unit, the authors found that the forward and backward signals can be passed through directly when the skip path is a pure identity, so the nonlinear activation (such as ReLU) on the shortcut connection is replaced by an identity mapping. ResNet V2 also applies Batch Normalization in every layer. With these changes the new residual learning unit trains more easily and generalizes better than before.
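A rough sketch of this "pre-activation" ordering in the same TFLearn style as the example above (a two-layer unit for brevity; the helper name and channel argument are illustrative, not the paper's code). Batch Normalization and ReLU move in front of each convolution, and nothing is applied after the addition, so the shortcut path stays a pure identity:

import tflearn

def preact_residual_unit(x, channels):
    shortcut = x                                  # identity path, kept completely clean
    net = tflearn.batch_normalization(x)          # BN and ReLU now come before each conv
    net = tflearn.activation(net, 'relu')
    net = tflearn.conv_2d(net, channels, 3, activation='linear')
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, 'relu')
    net = tflearn.conv_2d(net, channels, 3, activation='linear')
    return net + shortcut                         # no activation after the addition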

 
