CNN: An introduction to the classic CNN papers in deep learning (1950~2018), with continuously updated download links (very valuable) - Jason niu

Please do not copy and paste this post at will. Respect the hard work the blogger put into this summary, and thank you for your support!
Worth bookmarking; this post is updated continuously!

Remember: if you want to learn deep learning well, you must read the original papers!


1986《Learning representations by back-propagating errors》

http://www.iro.umontreal.ca/~pift6266/A06/refs/backprop_old.pdf

The original paper on backpropagation, published in Nature in 1986 by David Rumelhart, Geoffrey Hinton, and Ronald Williams.


1998《Gradient-Based Learning Applied to Document Recognition》

http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf


2006《Reducing the Dimensionality of Data with Neural Networks》

Hinton's Science paper appeared in 2006. Although the concept of Deep Learning was proposed that year, the academic community remained unconvinced. Rumor has it that when Hinton's students presented their papers on stage, the machine learning heavyweights in the audience were dismissive and asked: Is there a theoretical derivation behind this? Is there a mathematical foundation? Can it do what an SVM does? Looking back, even if the story is true, the heavyweights were not being unreasonable. As the saying goes, whether it is a mule or a horse, take it out for a walk and see: a concept alone proves nothing.
The time finally came in 2012. Alex Krizhevsky, a student of Hinton, built a Deep Learning model on GPUs in his dormitory and won the ILSVRC 2012 computer vision competition in one stroke. On the million-image ImageNet dataset, it far surpassed traditional methods, raising accuracy from a little over 70% to over 80%. Personally, I think the song that best suited Hinton's mood at the time was "I Haven't Been Big Brother for Many Years".


2012《ImageNet Classification with Deep Convolutional Neural Networks》

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks


Year 2013

2013.11《Visualizing and Understanding Convolutional Networks》

https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53

After AlexNet took the spotlight in 2012, a large number of CNN models appeared in 2013. The winner of that year's ILSVRC competition was ZF Net, a model designed by Matthew Zeiler and Rob Fergus of NYU, which achieved an error rate of 11.2%. The ZF Net architecture not only refines the earlier AlexNet but also introduces several key techniques for improving performance. In addition, the authors spend considerable space explaining the intuition behind convolutional networks (ConvNets) and how to properly visualize the filters and their weights.


2013.12《OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks》

https://arxiv.org/pdf/1312.6229v2.pdf

This paper combines image classification, localization, and detection. Put simply, OverFeat is a feature extraction operator, comparable to operators such as SIFT and HOG. The paper itself says, "Along with this paper, we release a feature extractor named 'OverFeat'", which is the original definition of OverFeat in the literature. The best part of the paper is that it makes full use of the convolutional network as a feature extractor: it takes the features learned for classification and reuses them for other tasks such as localization and detection. Different tasks can be handled by changing only the last few layers of the network, without training the whole network from scratch. It treats layers 1 to 5 of the network as feature extraction layers that the different tasks share. Essentially the same network architecture is used throughout (the feature extraction layers are identical, and the classification and regression layers are slightly modified and trained per task), so the basic features are shared as well.


Year 2014

2014.03.17《DeepFace: Closing the Gap to Human-Level Performance in Face Verification》

http://www.dis.uniroma1.it/~bloisi/seminars/Vision-Perception-for-HRI-2015/papers/deepface.pdf
On March 17, 2014, Facebook announced a major achievement: the DeepFace facial recognition algorithm it developed can identify human faces with 97.25% accuracy, which is almost the average human level (97.5% accuracy). DeepFace uses 120 million parameters to recognize faces, and the system was trained on nearly 4.4 million tagged faces from 4,030 Facebook users.


2014.09.12,《Very Deep Convolutional Networks for Large-Scale Image Recognition》

https://arxiv.org/pdf/1409.1556.pdf 
http://www.robots.ox.ac.uk/~vgg/research/very_deep/
http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf

VGG stands for Visual Geometry Group, a group in the Department of Engineering Science at the University of Oxford. It has released a series of convolutional network models whose names start with VGG, from VGG16 to VGG19, which can be used for face recognition, image classification, and other tasks.



2014.06《Deep Learning Face Representation from Predicting 10,000 Classes》

https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Sun_Deep_Learning_Face_2014_CVPR_paper.pdf

The DeepID series proposed by Tang Xiaoou's team at the Chinese University of Hong Kong is a very representative set of work.

DeepID1: "Deep Learning Face Representation from Predicting 10,000 Classes", CVPR 2014. It uses four convolutional layers followed by a final Softmax layer; the layer in between produces the Deep Hidden Identity Features (DeepID), which are the learned face representation. Models are trained separately on multiple patches and their outputs are combined into a high-dimensional feature, and the face verification stage uses the Joint Bayesian method. Features are learned through a multi-class face identification task (10,000 classes, about 20 instances each), and the paper points out that the more face classes are predicted during training, the stronger DeepID's generalization ability.

DeepID2: "Deep Learning Face Representation by Joint Identification-Verification", NIPS 2014 (a top conference in machine learning). Building on DeepID1, it improves the loss function by adding a Verification Loss to the original Identification Loss. Verification mainly increases compactness within a class, while Identification captures the variation between classes. By widening the inter-class gap and narrowing the intra-class gap, the trained features become better suited to tasks like face recognition. The idea traces back to the earlier LDA algorithm.

DeepID3: "DeepID3: Face Recognition with Very Deep Neural Networks", 2015 CVPR. It proposes two very deep neural network architectures for face recognition (based on VGG and GoogLeNet), but the recognition results are on par with DeepID2. Performance might improve with more training data; further research is needed.
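
To make the DeepID2 idea concrete, here is a minimal sketch of a joint identification-verification loss. It assumes a softmax cross-entropy identification term and a margin-based contrastive verification term on the distance between two feature vectors. The function names, the margin of 1.0, and the weight of 0.05 are illustrative choices, not values taken from the paper.

```python
import math

def identification_loss(logits, label):
    """Softmax cross-entropy: separates the features of different identities."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

def verification_loss(feat_a, feat_b, same_identity, margin=1.0):
    """Contrastive loss on a pair of features (margin is an assumed value)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    if same_identity:
        # Same person: pull the two features together.
        return 0.5 * dist ** 2
    # Different people: push apart until they are at least `margin` away.
    return 0.5 * max(0.0, margin - dist) ** 2

def joint_loss(logits, label, feat_a, feat_b, same_identity, weight=0.05):
    """Identification term plus a weighted verification term (weight is assumed)."""
    return (identification_loss(logits, label)
            + weight * verification_loss(feat_a, feat_b, same_identity))
```

A same-identity pair is penalized for any distance between its features, while a different-identity pair is penalized only when it sits closer than the margin, which is how the verification term tightens classes without collapsing them together.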

2014.06.10《Generative Adversarial Nets》

https://arxiv.org/pdf/1406.2661v1.pdf
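This network architecture is arguably another big step forward. Before introducing the paper, let us first talk about adversarial examples. For example, take a CNN trained on ImageNet data and add a small perturbation or slight modification (middle, right) to an image (left in the figure); feeding the result to the network makes the prediction error rise considerably. Although the image looks the same as the original, the final classification is now different. In short, adversarial examples are images that deliberately fool ConvNets and corrupt their results.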


2014.09.17《Going Deeper with Convolutions》

https://arxiv.org/pdf/1409.4842.pdf
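This is the earliest version of GoogLeNet, which appeared in 2014 in 《Going deeper with convolutions》. According to the paper, it is named "GoogLeNet" rather than "GoogleNet" as a tribute to the earlier LeNet.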


2014.10 R-CNN《Rich feature hierarchies for accurate object detection and semantic segmentation Tech report (v5)》

https://arxiv.org/pdf/1311.2524v5.pdf


Year 2015

2015《Going Deeper with Convolutions》

https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

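GoogLeNet, from 《Going Deeper with Convolutions》, is a deep convolutional neural network for image recognition that Google built for the ImageNet ILSVRC competition: a 22-layer instance of Inception. It won ILSVRC 2014 with an error rate of 6.7%. As far as I know, this was the first CNN architecture to depart from the traditional approach of simply stacking convolutional and pooling layers into a sequential structure. The authors stress that their new model also pays special attention to memory and compute usage (a point we have not mentioned before: stacking many layers and using large numbers of filters consumes a lot of computation and storage, and also raises the chance of overfitting).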


2015.04《VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION》

https://arxiv.org/pdf/1409.1556v6.pdf


2015.04《Deep Visual-Semantic Alignments for Generating Image Descriptions》

https://arxiv.org/pdf/1412.2306v2.pdf

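What do you get when you combine a CNN with an RNN (recurrent neural network)? Sorry, don't get the wrong idea: you do not get R-CNN ;-) but you do get a very good model. This paper by Andrej Karpathy (one of my personal favorite authors) and Fei-Fei Li focuses on combining CNNs with bidirectional RNNs to generate natural-language descriptions of image regions.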


2015.04《Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition》

https://arxiv.org/pdf/1406.4729.pdf

2015.06《Spatial Transformer Networks》

https://arxiv.org/pdf/1506.02025.pdf
https://arxiv.org/pdf/1506.02025v1.pdf

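In 2015.06, Google DeepMind proposed Spatial Transformer Networks, which amount to a "plug-in" installed in the middle of a conventional Convolution layer, giving conventional convolutions the ability to crop, translate, scale, and rotate. In theory, the authors hope to reduce the amount of training data a CNN needs and the amount of data augmentation, letting the CNN learn shape transformations of the data by itself. I believe this paper will inspire many new improvements, that is, more variations on the convolutional structure; it is quite creative.
It proposes a Spatial Transformer module. The module warps the input image in some way so that subsequent layers can process it with less time and effort. Rather than modifying the main structure of the CNN, the authors focus on transforming the input image. It performs two main kinds of transformation: pose normalization (whether objects in the scene are tilted or stretched) and spatial attention (how to focus on one object in a crowded image). In a traditional CNN, making the model invariant to scale and rotation requires a correspondingly large number of training samples.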
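
The warping step of such a module can be sketched in a few lines. The sketch below assumes the transformation is a 2x3 affine matrix `theta` applied to output coordinates normalized to [-1, 1], followed by bilinear sampling of a single-channel image with zeros outside the border. The function names are hypothetical, and the small network that would predict `theta` from the input is left out.

```python
import math

def affine_grid(theta, height, width):
    """For each output pixel, compute the source (x, y) in [-1, 1] coordinates.

    Assumes height > 1 and width > 1.
    """
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1)
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1)
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid

def bilinear_sample(image, grid):
    """Sample `image` at the grid positions with bilinear interpolation."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Positions outside the image contribute zero.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for grid_row in grid:
        out_row = []
        for xs, ys in grid_row:
            # Convert normalized coordinates back to pixel indices.
            col = (xs + 1.0) * (w - 1) / 2.0
            row = (ys + 1.0) * (h - 1) / 2.0
            c0, r0 = math.floor(col), math.floor(row)
            dc, dr = col - c0, row - r0
            out_row.append(pixel(r0, c0) * (1 - dr) * (1 - dc)
                           + pixel(r0, c0 + 1) * (1 - dr) * dc
                           + pixel(r0 + 1, c0) * dr * (1 - dc)
                           + pixel(r0 + 1, c0 + 1) * dr * dc)
        out.append(out_row)
    return out
```

With `theta = [[1, 0, 0], [0, 1, 0]]` the image passes through unchanged, and with `theta = [[-1, 0, 0], [0, 1, 0]]` it is mirrored left to right. Because every step is a smooth function of `theta`, the transformation parameters can be learned by gradient descent like any other weights.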




2015.09 《Fast R-CNN》

https://arxiv.org/pdf/1504.08083.pdf
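Fast R-CNN's improvements over the previous model focus on these three problems. Training in multiple stages (ConvNet, SVM, bounding-box regression) is computationally heavy and very time-consuming. Fast R-CNN solves the speed problem by streamlining the pipeline and changing the order in which the region proposals are processed: it computes the convolutional layers first and then shares the result across several functional modules. In the model, the input image first passes through a ConvNet; the features of each region of interest are taken from its final feature map (see Section 2.1 of the paper for details) and are then fed into the fully connected layers, the regression module, and the classification module. (Translator's note: this passage is a mostly literal translation, yet parts of it do not add up. Judging from the figure, the regions of interest appear to come before the ConvNet, which contradicts the text; the figure also seems to show multiple RoIs that go through the ConvNet in parallel, with the outputs fed in parallel to the FC layers, the regressor, and so on.)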
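
The step that takes per-region features from the shared feature map can be illustrated with a small RoI max pooling sketch. It assumes a single-channel feature map stored as nested lists and an RoI already expressed in feature-map cells; the function names and the bin-splitting rule (floor for the start, ceiling for the end) are illustrative assumptions rather than the paper's exact implementation.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    roi = (top, left, bottom, right) in feature-map cells, with bottom and
    right exclusive. The region must be at least one cell high and wide.
    """
    top, left, bottom, right = roi
    roi_h, roi_w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Row range covered by output bin i (never empty).
        r0 = top + (i * roi_h) // out_h
        r1 = top + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = left + (j * roi_w) // out_w
            c1 = left + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Regions of different sizes all come out as the same fixed-size grid, which is what lets every proposal reuse one convolutional pass and still feed the same fully connected layers.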


2015.12《Deep Residual Learning for Image Recognition》

https://arxiv.org/pdf/1512.03385v1.pdf

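An architecture proposed in 2015 by Microsoft Research Asia (MSRA). ResNet is a newcomer with a 152-layer network architecture that covers classification, detection, and localization in a single design. Besides breaking the record for depth, ResNet also broke the ILSVRC 2015 record with an incredible 3.6% error rate (humans typically reach only a 5~10% error rate, depending on their expertise and skill).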



Year 2016

2016.01《Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks》

https://arxiv.org/pdf/1506.01497v3.pdf

Faster R-CNN addresses the complicated training pipeline of R-CNN and Fast R-CNN. The authors insert a region proposal network (RPN) after the last convolutional layer. The RPN generates region proposals from the feature map it receives as input. After that, the pipeline is the same as in Fast R-CNN (RoI pooling, fully connected layers, classification, and regression).
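
One way to picture how an RPN covers the feature map is through the reference boxes (anchors) it scores at every position. The sketch below is only an illustration: the function name is hypothetical, and the stride of 16, the three scales, and the three aspect ratios are assumed defaults that are not stated in this post.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes in image pixels.

    Each feature-map cell gets len(scales) * len(ratios) anchors centered
    on the cell. `ratio` is height / width, and the area stays scale ** 2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

The RPN then predicts, for each anchor, an objectness score and four box offsets; the highest-scoring refined boxes become the region proposals passed on to RoI pooling.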

2016.05.09《You Only Look Once: Unified, Real-Time Object Detection》

https://arxiv.org/pdf/1506.02640v1.pdf
https://arxiv.org/abs/1506.02640v1

With the arrival of YOLO, deep learning object detection algorithms began to split into two-stage and single-stage methods. Unlike the two-stage detectors represented by the R-CNN series, YOLO drops the candidate box extraction branch (the proposal stage) and performs feature extraction, candidate box regression, and classification directly in a single unbranched convolutional network. This makes the network structure simpler and the detection speed nearly 10 times faster than Faster R-CNN, which allowed deep learning object detection to meet the needs of real-time detection tasks on the computing power of the time.
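
A small sketch can show what "regression and classification in one network" means for the output layout. It assumes a grid of 7 x 7 cells, 2 boxes per cell, and 20 classes, with each box described by 5 numbers (x, y, w, h, confidence). These values and the function names are assumptions for illustration and are not given in this post.

```python
def yolo_output_size(grid=7, boxes_per_cell=2, num_classes=20):
    """Length of the single prediction vector produced for one image."""
    # Each cell predicts `boxes_per_cell` boxes of 5 numbers plus one
    # set of class probabilities shared by the whole cell.
    return grid * grid * (boxes_per_cell * 5 + num_classes)

def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Grid cell (row, col) that contains the object center (cx, cy)."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col
```

Because every cell shares one set of class probabilities, an object is assigned to the single cell containing its center, and the whole detection result comes out of one forward pass with no separate proposal step.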

2016.05.20《R-FCN: Object Detection via Region-based Fully Convolutional Networks》

https://arxiv.org/pdf/1605.06409v1.pdf
https://arxiv.org/abs/1605.06409v1

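On 2016.05.20, Jifeng Dai, Kaiming He, and colleagues at MSRA proposed R-FCN, which uses position-sensitive score maps to resolve the tension between the translation invariance wanted for classification and the translation variance needed for detection. The position-sensitive score maps predict a class score for each part of an RoI, and these parts vote to produce the class prediction for that RoI.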
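
The voting idea can be sketched for a single class as follows. The sketch assumes the RoI is split into k x k bins, that bin (i, j) reads only from its own dedicated score map, and that both the pooling inside a bin and the final vote are plain averages; the function names and the data layout are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def position_sensitive_score(score_maps, roi, k):
    """Class score of one RoI from k * k position-sensitive score maps.

    score_maps[i * k + j] is the 2D map dedicated to bin (i, j).
    roi = (top, left, bottom, right) in map cells, bottom/right exclusive.
    """
    top, left, bottom, right = roi
    roi_h, roi_w = bottom - top, right - left
    votes = []
    for i in range(k):
        r0 = top + (i * roi_h) // k
        r1 = top + ceil_div((i + 1) * roi_h, k)
        for j in range(k):
            c0 = left + (j * roi_w) // k
            c1 = left + ceil_div((j + 1) * roi_w, k)
            bin_map = score_maps[i * k + j]  # only this bin's own map
            cells = [bin_map[r][c]
                     for r in range(r0, r1) for c in range(c0, c1)]
            votes.append(sum(cells) / len(cells))
    # Every part of the RoI casts one vote; the score is their average.
    return sum(votes) / len(votes)
```

An RoI that lines up with the object gets a high vote from every bin, while a shifted RoI reads the wrong parts of the maps and scores lower, which is how a fully convolutional network can still be sensitive to position.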



2016.12.25,《YOLO9000: Better, Faster, Stronger》
https://arxiv.org/abs/1612.08242
https://arxiv.org/pdf/1612.08242.pdf

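2016.12.29,《SSD: Single Shot MultiBox Detector》

YOLO has some shortcomings: each grid cell predicts only one object, which easily leads to missed detections, and it is fairly sensitive to object scale, so it generalizes poorly to objects whose scale varies widely. To address these shortcomings, the method proposed in this paper, SSD, improves on both fronts while balancing mAP against real-time requirements.

On 2016.12.29, SSD improved on YOLO, reaching accuracy comparable to two-stage methods while keeping a fast running speed.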


Year 2017

2017.03.20,《Mask R-CNN》

https://arxiv.org/pdf/1703.06870.pdf
https://arxiv.org/abs/1703.06870
Kaiming He et al. proposed Mask R-CNN, which won the ICCV 2017 Best Paper Award.


2017.04.09,《Feature Pyramid Networks for Object Detection》

https://arxiv.org/abs/1612.03144
https://arxiv.org/pdf/1612.03144.pdf

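On 2017.04.09, Tsung-Yi Lin, Piotr Dollar, Ross Girshick, and Kaiming He, together at Facebook, built on Faster R-CNN to propose the Feature Pyramid Networks (FPN) detection algorithm. Earlier object detection algorithms usually used only the top-level features for detection, because the top-level features of a network carry richer semantic information. The main idea of FPN is that, after the feed-forward pass, the topmost feature map is passed back down layer by layer and fused with the feature maps of the earlier layers; on that basis, detection heads are attached at several different depths of the network to detect objects of different scales. Because the network naturally forms a pyramid of feature maps during the feed-forward pass, FPN has a natural advantage in detecting small objects and objects with a wide scale distribution.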
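
The top-down fusion described above can be sketched on toy single-channel feature maps. The sketch assumes each level is exactly half the size of the one below it, uses nearest-neighbor 2x upsampling, and fuses by element-wise addition; the convolutions that a real network would apply before and after the fusion are deliberately left out, and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor upsampling: repeat every cell twice in each direction."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def top_down_merge(bottom_up):
    """Fuse feature maps from the coarsest level back down to the finest.

    bottom_up lists the feed-forward maps from fine (large) to coarse (small).
    Returns the fused maps in the same fine-to-coarse order.
    """
    merged = [bottom_up[-1]]  # the top level is used as it is
    for lateral in reversed(bottom_up[:-1]):
        upsampled = upsample2x(merged[0])
        fused = [[a + b for a, b in zip(lat_row, up_row)]
                 for lat_row, up_row in zip(lateral, upsampled)]
        merged.insert(0, fused)
    return merged
```

Each fused level keeps its own fine spatial resolution while inheriting semantic information from the levels above it, so a detection head can be attached to every level to handle a different object scale.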


2017.08.27《Densely Connected Convolutional Networks》

https://arxiv.org/pdf/1608.06993.pdf
At CVPR 2017, the paper "Densely Connected Convolutional Networks" by Cornell University postdoctoral fellow Dr. Gao Huang, Tsinghua University undergraduate Zhuang Liu, Facebook AI Research scientist Laurens van der Maaten, and Professor Kilian Q. Weinberger of the Department of Computer Science at Cornell University was selected as a best paper, sharing the award with Apple's first public paper, "Learning From Simulated and Unsupervised Images through Adversarial Training".


2017.11.07《Dynamic Routing Between Capsules》

https://arxiv.org/abs/1710.09829
https://arxiv.org/pdf/1710.09829.pdf
On 2017.11.07, Hinton argued that backpropagation and traditional neural networks are flawed and proposed the capsule network (Capsule Net). However, its results on datasets such as CIFAR are so far only mediocre, and the idea still needs further verification and development.


Year 2018

2018.04.08,《YOLOv3: An Incremental Improvement》
https://arxiv.org/abs/1804.02767
https://arxiv.org/pdf/1804.02767.pdf

