Which is the most amazing paper in the field of deep learning?

After years of development, deep learning has produced a steady stream of remarkable research papers. Delving into these striking ideas and paradigm shifts, we find work that has attracted widespread attention from researchers and left an indelible mark through disruptive concepts and far-reaching influence.

This article surveys some of the most remarkable papers in deep learning, highlighting their importance and their profound impact on the scientific community.

Here we introduce 20 papers, ordered by year of publication. Click the link under each entry to view the original paper.

1. The human hippocampus in neural networks: Memory Networks

Memory Networks were first proposed in a 2014 Facebook paper. The work introduced a readable and writable external memory module, trained jointly with an inference component, yielding a memory that the model can flexibly read from and write to.

Link: https://www.aminer.cn/pub/5550411a45ce0a409eb388b7/
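
To make the idea concrete, here is a minimal sketch of a soft-addressed memory read in PyTorch; the dot-product scoring and the sizes are illustrative assumptions, not the paper's exact design.

```python
import torch

def memory_read(query, memory):
    # Score each memory slot against the query, then return the
    # attention-weighted sum of slots (soft, differentiable addressing).
    scores = torch.softmax(memory @ query, dim=0)   # (num_slots,)
    return scores @ memory                          # (dim,)

memory = torch.randn(10, 64)   # 10 slots of readable/writable external memory
query = torch.randn(64)
print(memory_read(query, memory).shape)             # torch.Size([64])
```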

2. Caffe, one of the most popular open-source deep learning frameworks in the world

The paper "Caffe: Convolutional Architecture for Fast Feature Embedding" published by Trevor Darrell, Ross B. Girshick, Jia Yangqing and others in 2014, caffe is a convolutional neural network framework based on C++/Python, with Python and MATLAB bindings. Generic convolutional neural networks and other deep models can be efficiently trained and deployed on commercial architectures. Caffe makes it easier to experiment and switch seamlessly between different platforms, making it easier to develop and deploy from prototypes to cloud environments. Moreover, the first author of the thesis, Jia Yangqing, relied on Caffe to join Google Brain as an intern after graduation, and also participated in the development of the TensorFlow framework.

Link: https://www.aminer.cn/pub/5550415c45ce0a409eb3a9a8/
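
For flavor, a minimal pycaffe inference sketch; the file names and the "data" input blob name are placeholders, and this assumes a Caffe installation with the Python bindings.

```python
import caffe
import numpy as np

caffe.set_mode_cpu()
# "deploy.prototxt" / "model.caffemodel" are placeholder file names.
net = caffe.Net("deploy.prototxt", "model.caffemodel", caffe.TEST)
net.blobs["data"].data[...] = np.random.rand(*net.blobs["data"].data.shape)
out = net.forward()                                  # run the full network
print({name: blob.shape for name, blob in out.items()})
```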

3. The first end-to-end network for semantic segmentation

A pioneering work in image segmentation, shortlisted for the CVPR 2015 Best Paper award. The paper "Fully Convolutional Networks for Semantic Segmentation", published by Jonathan Long and colleagues in 2014, defines and describes fully convolutional networks in detail, explains their application to spatially dense prediction tasks, and draws connections to prior models. The feature extraction network fine-tunes pretrained backbones such as VGG to transfer and reuse learned features, and the paper proposes a novel way to fuse deep semantic information with shallow pixel-level detail.

Link: https://www.aminer.cn/pub/57a4e91dac44365e35c987bb/
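
A minimal sketch of the fully convolutional idea in PyTorch: replace the classifier with a 1x1 convolution and upsample the coarse scores back to input resolution. The paper learns the upsampling as deconvolution; plain bilinear interpolation is used here for brevity, the backbone choice assumes a recent torchvision, and 21 classes assume PASCAL VOC.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.vgg16(weights=None).features  # conv layers only
classifier = nn.Conv2d(512, 21, kernel_size=1)   # 1x1 conv "classifier", 21 classes

x = torch.randn(1, 3, 224, 224)
scores = classifier(backbone(x))                 # coarse per-pixel scores (1, 21, 7, 7)
dense = F.interpolate(scores, size=x.shape[-2:], mode="bilinear",
                      align_corners=False)       # back to input resolution
print(dense.shape)                               # torch.Size([1, 21, 224, 224])
```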

4. The first clear formulation of knowledge distillation for neural networks

The paper "Distilling the Knowledge in a Neural Network" published by Hinton in 2015 is the most classic and the work that clearly proposes the concept of knowledge distillation. The article shows that distillation is very effective for converting knowledge from a holistic model or a highly regularized large model to a smaller distilled model. On MNIST, distillation works well even if the migration set used to train the distilled model lacks any examples of one or more classes.

Link: https://www.aminer.cn/pub/5550417545ce0a409eb3b767
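
The core recipe fits in a few lines. This sketch follows the paper's temperature-softmax formulation, with the temperature T and mixing weight alpha as assumed hyperparameters.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft term: match the teacher's temperature-softened distribution;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)   # usual supervised term
    return alpha * soft + (1 - alpha) * hard
```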

5. Faster R-CNN and the birth of the RPN

The paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" was published in 2015 by Ren Shaoqing, He Kaiming, Ross B. Girshick, and Sun Jian, all experts in the CV field, and ranks among the classics. It arrived when generating candidate boxes still consumed most of a detector's runtime and remained an urgent unsolved problem: Faster R-CNN unifies candidate region generation, feature extraction, classification, and box refinement in a single deep network, sharing computation and running entirely on the GPU, which greatly improves speed.

Link: https://www.aminer.cn/pub/5736986b6e3b12023e730129/
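
A sketch of the RPN head: a shared 3x3 conv followed by two sibling 1x1 convs that output objectness scores and box deltas for k anchors per location. The channel counts follow the paper; the input width is an assumption.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, channels=512, k=9):         # k = anchors per location
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.cls = nn.Conv2d(channels, 2 * k, 1)   # object vs. background scores
        self.reg = nn.Conv2d(channels, 4 * k, 1)   # box refinement (dx, dy, dw, dh)

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.cls(h), self.reg(h)

cls, reg = RPNHead()(torch.randn(1, 512, 38, 50))
print(cls.shape, reg.shape)   # (1, 18, 38, 50) (1, 36, 38, 50)
```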

6. Groundbreaking work: using global average pooling (GAP) to obtain class activation maps (CAM)

A paper "Learning Deep Features for Discriminative Localization" on CVPR in 2016 has greatly inspired the research on weakly supervised learning. This paper mainly proves two conclusions through a series of experiments: 1) The features extracted by CNN contain position information, although we did not mark the position information during training; 2) These position information can be transferred to other among cognitive tasks.

Link: https://www.aminer.cn/pub/5736960e6e3b12023e520be8/
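
The CAM computation itself is nearly a one-liner: weight the last conv layer's feature maps by the classifier weights of the target class. A sketch (tensor shapes are assumptions):

```python
import torch

def class_activation_map(features, fc_weights, class_idx):
    # features: (C, H, W) activations before global average pooling;
    # fc_weights: (num_classes, C) weights of the final linear layer.
    cam = torch.einsum("c,chw->hw", fc_weights[class_idx], features)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)   # normalized localization heatmap

cam = class_activation_map(torch.randn(512, 7, 7), torch.randn(1000, 512), 281)
print(cam.shape)   # torch.Size([7, 7])
```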

7. CVPR 2016 Best Paper, cited 170,000+ times

The article "Deep residual learning for image recognition" published by He Kaiming in 2016 proposed up to 152 layers of ResNet. The model is reconstructed through residual learning and preprocessed. If the identity mapping is optimal, the solver can simply approach the identity mapping by approaching the weights of multiple nonlinear layers to zero.

Link: https://www.aminer.cn/pub/573696026e3b12023e515eec/
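
ResNet's basic block in PyTorch, a sketch of the y = F(x) + x formulation the paper introduces (stride-1, equal-channel case only):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # If identity is optimal, the body's weights can be driven toward zero.
        return self.relu(self.body(x) + x)
```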

8. Hinton’s evaluation: “You have to listen to it 10,000 times before you can really understand it.”

The article "Opening the black box of Deep Neural Networks via Information" published by Prof. Tishby in 2017 uses information bottleneck to explain deep learning. It opens the black box of deep neural networks (DNNs) through information analysis methods and proposes a method. A method of analyzing DNNs on the information plane, that is, analyzing each layer of the network through the mutual information values ​​of input and output variables.

Link: https://www.aminer.cn/pub/5c890edd4895d9cbc6ac47d1/
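
A rough numpy sketch of the binned mutual-information estimates used in this line of work; the binning scheme and the simple plug-in estimator are simplifying assumptions.

```python
import numpy as np

def entropy(rows):
    # Shannon entropy of discrete symbols (each row is one symbol).
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_plane(activations, labels, n_bins=30):
    # After binning, a layer T is a deterministic function of the input X,
    # so I(X;T) = H(T); and I(T;Y) = H(T) - H(T|Y).
    edges = np.linspace(activations.min(), activations.max(), n_bins)
    t = np.digitize(activations, edges)
    i_xt = entropy(t)
    h_t_given_y = sum((labels == y).mean() * entropy(t[labels == y])
                      for y in np.unique(labels))
    return i_xt, i_xt - h_t_given_y

acts = np.tanh(np.random.randn(1000, 8))     # stand-in layer activations
labels = np.random.randint(0, 2, 1000)
print(information_plane(acts, labels))
```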

9. “Irregular” convolutional neural network

The paper "Deformable Convolutional Networks" published in 2017 broke the standard convolution kernel and proposed a new convolution method: deformable convolution. On this basis, a new RoI pooling method was proposed: deformable The advantage of RoI pooling and DCN is that it can easily replace the ordinary convolution module in the existing CNN and does not require additional supervision, thereby achieving a more accurate recognition of irregular objects.

Link: https://www.aminer.cn/pub/599c7949601a182cd262c13a/
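
The usual wiring, sketched with torchvision's DeformConv2d: a plain conv predicts 2·k·k sampling offsets per location, initialized to zero so the block starts out as an ordinary convolution.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset.weight)   # zero offsets = regular convolution
        nn.init.zeros_(self.offset.bias)
        self.conv = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        return self.conv(x, self.offset(x))  # offsets deform the sampling grid

y = DeformBlock(16, 32)(torch.randn(1, 16, 8, 8))
print(y.shape)   # torch.Size([1, 32, 8, 8])
```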

10. State of the art on 3D point cloud benchmarks, adaptively merging multiple scales

The paper "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space" published in NIPS in 2017. This article introduces a hierarchical neural network to recycle PointNet after hive division on the input point set. Using metric space distance, this network can learn local features and larger semantic scales. Furthermore, the sampling density of point sets is different. When using a consistent density model, the performance will be reduced. A set learning layer is proposed to adaptively merge features at multiple scales.

Link: https://www.aminer.cn/pub/599c7945601a182cd2629f8d
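
The sampling stage can be illustrated with farthest point sampling, which PointNet++ uses to pick well-spread centroids for local grouping; a plain numpy sketch:

```python
import numpy as np

def farthest_point_sampling(points, k):
    # Greedily pick k points, each the farthest from everything chosen so far.
    n = points.shape[0]
    chosen = [0]                                  # arbitrary starting point
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        d = ((points - points[chosen[-1]]) ** 2).sum(axis=1)
        dist = np.minimum(dist, d)                # distance to nearest chosen point
        chosen.append(int(dist.argmax()))
    return points[chosen]

cloud = np.random.rand(1024, 3)
print(farthest_point_sampling(cloud, 32).shape)   # (32, 3)
```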

11. ResNet in the eyes of mathematicians

Academician E Weinan published "A Proposal on Machine Learning via Dynamical Systems" in Communications in Mathematics and Statistics in 2017. He discusses using continuous dynamical systems to model high-dimensional nonlinear functions, offering a new perspective that connects ordinary differential equations with deep residual networks and indicating that deep neural networks can be understood as discrete dynamical systems.

Link: https://www.aminer.cn/pub/5c3e7c43df5b8c0b3ccd0eb8/
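
The correspondence can be stated in one line: a residual block is a forward-Euler step of an ODE with step size 1, which is the discretization view this line of work builds on.

```latex
x_{l+1} = x_l + f(x_l, \theta_l)
\quad\Longleftrightarrow\quad
\frac{\mathrm{d}x}{\mathrm{d}t} = f\bigl(x(t), \theta(t)\bigr),
\qquad \Delta t = 1 .
```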

12. A classic paper from the Google Brain team

The paper "Attention is All You Need" published by the Google Brain team in 2017. This paper pioneered the transformer model structure and directly opened a new era. It not only brought rapid development in the field of NLP, but also paved the way for the recent emergence of ChatGPT. Transformer models have great potential for machine translation tasks and are cheap to train, a fraction of the cost of the best existing models.

Link: https://www.aminer.cn/pub/599c7987601a182cd2648373/
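
At the heart of the Transformer is scaled dot-product attention, which the paper defines as softmax(QK^T / sqrt(d_k))V; a direct PyTorch sketch:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 10, 64)   # (batch, heads, seq_len, d_k)
print(scaled_dot_product_attention(q, k, v).shape)   # (2, 8, 10, 64)
```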

13. Champion of ILSVRC 2017, the “World Cup” of artificial intelligence

The paper "Squeeze-and-Excitation Networks" published by the autonomous driving company Momenta in 2017. , giving different weights to each channel of the feature map. SENet is a brand-new image recognition structure that models the correlation between feature channels and strengthens important features to improve accuracy. This structure was the winner of the 2017 ILSVR competition. The top5 error rate reached 2.251%, which was 25% lower than the first place in 2016.

Link: https://www.aminer.cn/pub/5a260c8117c44a4ba8a30771/
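
The SE block itself is tiny: global average pooling squeezes each channel to one number, a two-layer bottleneck produces per-channel weights, and the input is rescaled. A sketch (the reduction ratio of 16 follows the paper):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: (b, c, 1, 1)
        self.fc = nn.Sequential(                  # excitation: per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                              # reweight each channel

print(SEBlock(64)(torch.randn(2, 64, 14, 14)).shape)   # (2, 64, 14, 14)
```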

14. Breaking traditional locality limits to model long-range dependencies

The ICML 2019 paper "Self-Attention Generative Adversarial Networks" proposes the self-attention generative adversarial network (SAGAN) for modeling long-range dependencies in image generation tasks. Self-attention brings a large performance gain: on the ImageNet dataset, the best published Inception score rises from 36.8 to 52.52, and the Fréchet Inception distance falls from 27.62 to 18.65.

Link: https://www.aminer.cn/pub/5b3d98cc17c44a510f801bd3/
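
A compact sketch of SAGAN-style self-attention over feature maps: 1x1 convs produce query/key/value maps, attention mixes all spatial positions, and a learned scalar gamma (initialized to zero) blends the result back in. Channel reductions follow the paper's c/8 choice; the rest is an illustrative simplification.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as plain pass-through

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.k(x).flatten(2)                   # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)        # every position attends to all
        v = self.v(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

print(SelfAttention2d(32)(torch.randn(1, 32, 16, 16)).shape)
```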

15. The Best Paper of ICLR 2019, which speeds up model training by 2-4x

"The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks" proposes a pruning perspective known as the lottery ticket hypothesis. Treat all the parameters of a dense network as a prize pool: hidden in the pool is a subnetwork defined by a subset of the parameters, the "winning ticket", which, trained in isolation, can reach the test accuracy of the original network. On the MNIST and CIFAR-10 datasets, the winning tickets found are 10%-20% of the size of many fully connected and convolutional feed-forward architectures, and they learn faster than the original network while reaching higher accuracy.

Link: https://www.aminer.cn/pub/5c75755bf56def97989e3bd4/
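
The pruning criterion is simple magnitude thresholding. Below is a sketch of the mask plus an outline of one round of the find-a-ticket procedure, with the training steps left as comments; the sparsity value is an assumption.

```python
import torch

def winning_ticket_mask(trained_weights, sparsity=0.8):
    # Keep the largest-magnitude (1 - sparsity) fraction of weights.
    flat = trained_weights.abs().flatten()
    k = max(1, int(flat.numel() * (1 - sparsity)))
    threshold = flat.topk(k).values.min()
    return (trained_weights.abs() >= threshold).float()

# One round of the procedure, in outline:
#   init = {n: p.detach().clone() for n, p in net.named_parameters()}  # save init
#   ... train net to convergence ...
#   mask = winning_ticket_mask(p)        # prune small-magnitude weights
#   p.data = init[n] * mask              # rewind survivors to their initial values
#   ... retrain with the mask held fixed: the "winning ticket" ...
```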

16. A must-read benchmark paper for getting started with recommender systems

The Sina Weibo machine learning team published "FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction" at RecSys 2019. The article proposes a deep-learning-based click-through-rate prediction algorithm for ad recommendation, using a Squeeze-and-Excitation network (SENET) structure to dynamically learn feature importance and a bilinear function to better model feature interactions.

Link: https://www.aminer.cn/pub/5cf48a36da56291d58299524/
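
A sketch of the bilinear interaction for pairs of field embeddings, assuming the variant with a single shared matrix W: each pair contributes (v_i W) ⊙ v_j. The sizes and initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BilinearInteraction(nn.Module):
    # Assumed "shared-W" variant: one matrix for every pair of feature fields.
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, fields):
        # fields: (batch, num_fields, dim), one embedding per feature field
        proj = fields @ self.W                      # v_i W
        n = fields.size(1)
        pairs = [proj[:, i] * fields[:, j]          # Hadamard product with v_j
                 for i in range(n) for j in range(i + 1, n)]
        return torch.stack(pairs, dim=1)            # (batch, n*(n-1)/2, dim)

print(BilinearInteraction(8)(torch.randn(4, 5, 8)).shape)   # (4, 10, 8)
```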

17. Cited nearly a thousand times, a graph neural network paper with far-reaching influence

The paper "Simplifying Graph Convolutional Networks" was published at ICML 2019. In this paper, the authors observe that GCN inherits considerable complexity from its deep learning lineage, which may be burdensome and unnecessary for less demanding tasks. The authors worked to derive a linear model that "could" have preceded GCN if the "traditional" development path was followed, called Simple Graph Convolution (SGC), which can successively remove nonlinearities and collapse the weight matrix between successive layers.

Link: https://www.aminer.cn/pub/5d9edc8347c8f76646042a37
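
SGC in full, as a numpy sketch: precompute S^K X with the normalized adjacency S, then fit any linear classifier (e.g. logistic regression) on the result.

```python
import numpy as np

def sgc_features(adj, x, k=2):
    # S = D^{-1/2} (A + I) D^{-1/2}: normalized adjacency with self-loops.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    s = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):            # k-hop feature propagation, no nonlinearities
        x = s @ x
    return x                      # feed this to a plain linear classifier

adj = (np.random.rand(5, 5) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)      # symmetrize a random toy graph
np.fill_diagonal(adj, 0.0)
print(sgc_features(adj, np.random.randn(5, 16)).shape)   # (5, 16)
```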

18. A text matching tool: Siamese networks generate high-quality sentence embeddings

The EMNLP 2019 paper "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks" introduces Sentence-BERT (SBERT), a modification of BERT that uses Siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort of finding the most similar sentence pair from 65 hours with BERT/RoBERTa to about 5 seconds with SBERT, while maintaining BERT's accuracy.

Link: https://www.aminer.cn/pub/5db9297d47c8f766461f7bb9/
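
The authors ship this as the sentence-transformers library; a minimal usage sketch, where the model name is an assumption and `util.cos_sim` assumes a recent library version:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # model name is an assumption
emb = model.encode(["A man is eating food.",
                    "Someone is having a meal.",
                    "The stock market fell today."])
print(util.cos_sim(emb[0], emb[1]))   # high similarity: paraphrases
print(util.cos_sim(emb[0], emb[2]))   # low similarity: unrelated
```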

19. GPT-3: few-shot learning with large language models

The paper "Language Models are Few-Shot Learners" published by OpenAI in 2020. This article trained the autoregressive language model GPT-3 with 17.5 billion parameters, which is 10 times that of the previous non-sparse language model, and tested its performance under a small number of samples. GPT-3 achieves excellent performance on multiple natural language processing datasets, producing news articles that are difficult for humans to distinguish whether they were written by humans or not.

Link: https://www.aminer.cn/pub/5ed0e04291e011915d9e43ee/
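
"Few-shot" here means in-context learning: the task description and a handful of demonstrations go into the prompt, and no weights are updated. The translation example below follows the prompt format illustrated in the paper.

```python
# Few-shot prompt: task description + demonstrations + one unsolved query.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
# The model simply continues the prompt; its completion ("fromage") is the answer.
```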

20. CLIP: a new visual pre-training model

An article "Learning Transferable Visual Models From Natural Language Supervision" published by OpenAI in 2021. It uses text as a supervision signal to train transferable visual models. After the training was completed, the author applied it to the zero-shot classification task. At the same time, the author also conducted a large number of experiments to prove that CLIP has good performance in terms of representation learning, robustness, and cognitive learning capabilities.

Link: https://www.aminer.cn/pub/603d8d919e795eac93d4c16f/
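
Zero-shot classification with OpenAI's released clip package looks roughly like this (the model name, image file, and class list are placeholders): class names are turned into text prompts, and the image is assigned to the most similar text embedding.

```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

model, preprocess = clip.load("ViT-B/32")
image = preprocess(Image.open("cat.png")).unsqueeze(0)      # placeholder image
text = clip.tokenize([f"a photo of a {c}" for c in ["cat", "dog", "car"]])

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)        # cosine similarity
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)
print(probs)   # zero-shot probabilities over the class prompts
```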


Content reference: https://www.zhihu.com/question/440729199

Original post: blog.csdn.net/AI_Conf/article/details/132299791