Paper Notes: Deep Residual Learning for Image Recognition

Author: Zen and the Art of Computer Programming

1 Introduction

Convolutional neural networks (CNNs) have achieved great success in computer vision tasks such as image classification and object detection, and their depth, width, and complexity have grown dramatically. However, simply stacking more layers does not keep improving results: beyond a certain depth even the training error rises, a degradation problem that the ResNet authors show is not caused by overfitting. ResNet starts from another perspective and asks how very deep networks can be made easier to train. Instead of having each stack of layers fit a desired mapping H(x) directly, it lets the layers fit a residual F(x) = H(x) - x and adds the input back through a shortcut connection, so each block outputs F(x) + x. This residual structure allows the depth of the network to increase while avoiding the optimization difficulties (including vanishing or exploding gradients, which batch normalization also helps control) that plague plain deep networks. These notes describe the main characteristics and network structure of the residual network in detail, relate it to the VGG-style plain baseline used in the paper, compare the advantages, disadvantages, and applicable scenarios of plain CNNs and ResNet, and finally summarize several applications of residual networks in computer vision.
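The core of the method is the block-level computation y = F(x, {W_i}) + x. Below is a minimal PyTorch sketch of the basic two-layer residual block; it follows the common torchvision-style layout (conv-BN-ReLU, with the final ReLU applied after the addition), but the class name and test tensor are illustrative, not the paper's original code.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convs with an identity shortcut: y = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + x          # the shortcut: add the block's input back
        return self.relu(out)  # ReLU after the addition, as in the paper

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```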

2 Related work

Research on residual networks began with the deep residual network (ResNet) proposed by He et al. at Microsoft Research in 2015, which won the ILSVRC 2015 classification task. By introducing residual units with shortcut connections throughout the network, the depth can be increased substantially without training accuracy degrading. For the deeper variants (ResNet-50/101/152), the same paper introduces a bottleneck residual block: a 1x1 convolution first reduces the channel dimension, a 3x3 convolution operates on the reduced representation, and a second 1x1 convolution restores it, so the parameter count and computation remain manageable even at great depth. In recent years, deep residual networks have become a mainstream network structure due to their simplicity and effectiveness.
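A minimal sketch of the bottleneck block, again in torchvision-style PyTorch; the 4x expansion factor matches the paper's ResNet-50 design, while the class and variable names here are illustrative:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 restore, with a shortcut around all three."""
    expansion = 4  # output channels = mid_channels * 4, as in ResNet-50/101/152

    def __init__(self, in_channels, mid_channels):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 1, bias=False)   # reduce
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, 1, bias=False)  # restore
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the shortcut when channel counts differ
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, 1, bias=False))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))

x = torch.randn(1, 256, 28, 28)
print(Bottleneck(256, 64)(x).shape)  # torch.Size([1, 256, 28, 28])
```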

Comparing a plain CNN with a ResNet, both can perform image classification. A plain CNN is typically a stack of convolutional layers, pooling layers, and fully connected layers; as the stack gets deeper it becomes increasingly hard to optimize, and accuracy eventually degrades. A ResNet is built from the same kinds of layers, but groups them into residual blocks whose output is the element-wise sum of the block's input and the residual mapping it learns (y = F(x) + x); this shortcut gives the network an easy path to an identity mapping, which makes much deeper networks trainable. Both still require hyperparameter tuning (learning rate, weight decay, and so on), but at a given depth the residual version converges more reliably. In addition, ResNet can be scaled to over a hundred layers (e.g., ResNet-152) while still gaining accuracy, which plain networks of the same depth cannot match.
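To make the structural difference concrete, here is a deliberately simplified PyTorch sketch: the two modules contain identical layers, and the residual version differs only by adding the input back. (For brevity it applies ReLU inside F rather than after the addition, unlike the paper's block; all names here are illustrative.)

```python
import torch
import torch.nn as nn

def conv_bn_relu(c):
    return nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                         nn.BatchNorm2d(c), nn.ReLU())

class PlainPair(nn.Module):
    """Two conv layers, no shortcut."""
    def __init__(self, c):
        super().__init__()
        self.f = nn.Sequential(conv_bn_relu(c), conv_bn_relu(c))
    def forward(self, x):
        return self.f(x)

class ResidualPair(nn.Module):
    """The same two conv layers, plus the identity shortcut."""
    def __init__(self, c):
        super().__init__()
        self.f = nn.Sequential(conv_bn_relu(c), conv_bn_relu(c))
    def forward(self, x):
        return self.f(x) + x  # the only difference: add the input back

x = torch.randn(1, 64, 32, 32)
print(PlainPair(64)(x).shape, ResidualPair(64)(x).shape)
```

Both modules have the same depth and parameter count; the shortcut costs no extra parameters, which is why the paper can compare plain and residual networks of identical size.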
