29 computer vision papers, every one of them impressive! Links included!

Author | Microsoft Research Asia

This article is reprinted with permission from Microsoft Research AI headlines (ID: MSRAsia)

1. Deep High-Resolution Representation Learning for Human Pose Estimation

Paper link: https://arxiv.org/pdf/1902.09212.pdf

This paper presents a new network, the High-Resolution Network (HRNet), which learns representations that are both spatially precise (high resolution) and semantically strong. It differs from mainstream network designs in two key points: it maintains a high-resolution representation throughout, and it runs convolutional branches of different resolutions in parallel. It achieves leading results on human keypoint detection, object detection, semantic segmentation, face detection, and other key vision problems, and has been widely accepted and used by peers. The paper was published at CVPR 2019.

Open Source Address: https://github.com/HRNet

https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
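
The two design points of HRNet described above (keeping a high-resolution branch alive and letting parallel branches of different resolutions exchange information) can be illustrated with a toy sketch. This is not the authors' implementation: HRNet uses learned convolutions for resampling, while this sketch assumes nearest-neighbour upsampling and 2x2 average pooling as stand-ins, and the function names are hypothetical.

```python
def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of a 2D list (stand-in for a learned upsampling layer)."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(feature_map):
    """2x2 average pooling of a 2D list with even height and width (stand-in for a strided convolution)."""
    return [
        [
            (feature_map[r][c] + feature_map[r][c + 1]
             + feature_map[r + 1][c] + feature_map[r + 1][c + 1]) / 4
            for c in range(0, len(feature_map[0]), 2)
        ]
        for r in range(0, len(feature_map), 2)
    ]


def exchange(high_res, low_res):
    """Fuse two parallel branches: each branch adds the resampled features of the other."""
    up = upsample2x(low_res)
    down = downsample2x(high_res)
    new_high = [[h + u for h, u in zip(h_row, u_row)] for h_row, u_row in zip(high_res, up)]
    new_low = [[l + d for l, d in zip(l_row, d_row)] for l_row, d_row in zip(low_res, down)]
    return new_high, new_low


high = [[1.0, 2.0], [3.0, 4.0]]   # 2x2 high-resolution branch
low = [[10.0]]                    # 1x1 low-resolution branch
new_high, new_low = exchange(high, low)
print(new_high, new_low)
```

Both branches keep their own resolution after the exchange, so the high-resolution representation is never discarded and later recovered, which is the point the summary makes.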

2. VL-BERT: Pre-training of Generic Visual-Linguistic Representations

Paper link: https://arxiv.org/pdf/1908.08530.pdf

Published at ICLR 2020, this is one of the first papers to propose a joint pre-training model for text and images. The researchers propose VL-BERT, a new generic multimodal pre-training model. It uses the simple yet powerful Transformer as its backbone and extends the input to contain both visual and linguistic elements, which makes it suitable for most visual-linguistic downstream tasks. To make the VL-BERT representations more generic, the researchers pre-train the model on the large-scale image caption dataset Conceptual Captions. This pre-training was shown to significantly improve results on downstream visual-linguistic tasks, including visual commonsense reasoning, visual question answering, and referring expression comprehension.

3. A Relation Network Based Approach to Curved Text Detection

Paper link: https://icdar2019.org/list-of-accepted-papers/

This paper proposes a new text detection framework based on a Relation Network, which effectively improves the accuracy of general text-line detection. The paper was published at ICDAR 2019.

4. An Anchor-free Region Proposal Network for Faster R-CNN-based Text Detection Approaches

Paper link: https://www.springerprofessional.de/en/an-anchor-free-region-proposal-network-for-faster-r-cnn-based-te/17013452

This paper presents an object detection algorithm called anchor-free RPN, which solves the classic RPN's inability to predict text boxes of arbitrary orientation. The algorithm not only achieves good results on word-level text detection; similar ideas have since become mainstream in object detection. The paper was published in the journal IJDAR.

5. Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-Block Parallel Optimization and Blockwise Model-Update Filtering

Paper link: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/0005880.pdf

This paper presents a general distributed optimization algorithm. By introducing blockwise model-update filtering (BMUF) into an incremental block training framework, it speeds up the training of deep learning models linearly while maintaining model accuracy. The paper was published at ICASSP 2016.
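
As a rough illustration of the blockwise model-update filtering idea, the sketch below shows one synchronization step as it is commonly described: the workers train in parallel on one data block, their models are averaged, and the resulting update is smoothed with a block-level momentum term before being applied to the global model. Flat Python lists stand in for model parameters, the function and parameter names are hypothetical, and only the classical block momentum variant is shown, so treat it as a sketch rather than the paper's exact procedure.

```python
def bmuf_step(global_model, worker_models, prev_delta, block_momentum, block_lr):
    """One BMUF-style synchronization step on flat parameter lists.

    global_model   : parameters broadcast to the workers before this block
    worker_models  : parameters returned by each worker after training on its data split
    prev_delta     : filtered update applied at the previous block
    block_momentum : weight given to the previous filtered update
    block_lr       : weight given to the current averaged update
    """
    num_workers = len(worker_models)
    # Average the worker models (plain model averaging).
    averaged = [
        sum(worker[i] for worker in worker_models) / num_workers
        for i in range(len(global_model))
    ]
    # Raw update proposed by this block.
    block_update = [a - g for a, g in zip(averaged, global_model)]
    # Filter the update with block-level momentum.
    delta = [
        block_momentum * d + block_lr * u
        for d, u in zip(prev_delta, block_update)
    ]
    new_global = [g + d for g, d in zip(global_model, delta)]
    return new_global, delta


workers = [[1.0, 2.0], [3.0, 4.0]]
plain_avg, _ = bmuf_step([0.0, 0.0], workers, [0.0, 0.0], 0.0, 1.0)
filtered, delta = bmuf_step([0.0, 0.0], workers, [1.0, 1.0], 0.5, 1.0)
print(plain_avg, filtered)
```

With `block_momentum` set to 0 and `block_lr` set to 1, the step reduces to plain model averaging, which makes the effect of the filter easy to compare.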

6. Compressing CNN-DBLSTM Models for OCR with Teacher-Student Learning and Tucker Decomposition

Paper link: https://www.sciencedirect.com/science/article/abs/pii/S0031320319302547

This paper proposes a compression and acceleration method for the CNN part, which accounts for most of the computational cost of a CNN-DBLSTM model. The CNN part is first distilled under the guidance of the LSTM part, and Tucker decomposition is then used to compress and accelerate the CNN further. The resulting model runs up to 14 times faster than the original, which solves the deployment problem. The paper was published in the journal Pattern Recognition.

7. An Open Vocabulary OCR System with Hybrid Word-Subword Language Models

Paper link: https://ieeexplore.ieee.org/abstract/document/8270022

This paper presents a hybrid language model that uses words and subwords as its basic units to solve the out-of-vocabulary (OOV) problem in optical character recognition (OCR). The paper was published at ICDAR 2017.

8. Relation Networks for Object Detection

Paper link: https://arxiv.org/pdf/1711.11575.pdf

Published at CVPR 2018, this paper proposes a plug-and-play object relation module and, for the first time, achieves a fully end-to-end object detector. It is one of the first applications of self-attention models in computer vision.

9. Learning Region Features for Object Detection

Paper link: https://arxiv.org/pdf/1803.07066.pdf

Published at ECCV 2018, this paper gives a general formulation of region feature extraction and proposes a fully learnable method for extracting region features.

10. Local Relation Networks for Image Recognition

Paper link: https://arxiv.org/pdf/1904.11491.pdf

Published at ICCV 2019, this paper proposes a new neural network that uses no convolution at all and surpasses convolutional neural network baselines in image classification accuracy on the ImageNet dataset.

11. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

Paper link: https://arxiv.org/pdf/1904.11492.pdf

Published at ICCVW 2019, this paper changed the prevailing academic understanding of how non-local networks work and proposed a new, efficient global relation network (GCNet).

12. An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Paper link: https://arxiv.org/pdf/1904.05873.pdf

Published at ICCV 2019, this paper proposes a generalized formulation of spatial attention mechanisms and analyzes how the different terms of this formulation perform on a variety of vision tasks, providing a reference for future applications of spatial attention.

13. Deep Metric Transfer for Label Propagation with Limited Annotated Data

Paper link: https://arxiv.org/pdf/1812.08781.pdf

This paper presents a new paradigm for semi-supervised learning, transfer learning, and few-shot learning. The core of the paradigm is to obtain the initial image features through unsupervised pre-training. The method improves accuracy on semi-supervised learning by nearly 20% (absolute). The paper was published at ICCVW 2019.

14. Deformable ConvNets v2: More Deformable, Better Results

Paper link: https://arxiv.org/pdf/1811.11168.pdf

Published at CVPR 2019, this paper proposes a convolutional network that is more deformable than standard convolution and significantly improves accuracy on a wide variety of visual perception tasks, including image classification, object detection, semantic segmentation, and object tracking. For example, on the COCO object detection benchmark it gains nearly 7 points over standard convolutional networks under the same conditions.

15. RepPoints: Point Set Representation for Object Detection

Paper link: https://arxiv.org/pdf/1904.11490.pdf

The bounding box is the standard way to represent an object in vision. Published at ICCV 2019, this paper proposes representing an object with a set of points instead of a bounding box. The new representation is more expressive and more interpretable. Based on it, the researchers obtained the best anchor-free detector at the time. The representation was recently extended to instance segmentation and human pose estimation.

16. A Twofold Siamese Network for Real-Time Object Tracking

Paper link: https://arxiv.org/abs/1802.08817

Published at CVPR 2018, this paper proposes a twofold Siamese network for visual object tracking, called SA-Siam, where S stands for the semantic branch and A stands for the appearance branch. The two branches are independent yet complementary, and together they achieve excellent tracking performance.

17. SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking

Paper link: https://arxiv.org/abs/1904.04452

Published at CVPR 2019, this paper proposes a two-stage series-parallel matching structure to achieve object tracking that is both robust and precise. In SPM-Tracker, the coarse matching stage focuses on semantic understanding and the fine matching stage focuses on appearance, and a good balance between the two is obtained by training them in different ways.

18. Unsupervised High-Resolution Depth Learning from Videos With Dual Networks

Paper link: https://arxiv.org/abs/1910.08897

Published at ICCV 2019, this paper proposes a depth learning architecture based on dual networks. A deep network extracts global features from a low-resolution input image, a shallow network extracts detailed features from the high-resolution input image, and the two are then combined to estimate high-resolution depth. Compared with conventional methods, it obtains better depth estimates at a lower computational cost, with marked improvements in regions that are sensitive to resolution, such as fine structures and distant areas of the image.

19. Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments

Paper link: https://arxiv.org/abs/1910.08898

Published at ICCV 2019, this paper proposes a more robust supervision signal, optical flow reconstruction, for unsupervised depth estimation in the more difficult setting of indoor scenes. Instead of relying on the traditional image reconstruction signal, it uses a sparse-to-dense optical flow estimation method to obtain stable flow estimates in indoor scenes that severely lack texture, and it feeds the optical flow into the camera estimation network to cope with complex camera motion. Together these yield stable depth estimation in indoor scenes with more complex and diverse layouts.

20. Cross View Fusion for 3D Human Pose Estimation

Paper link: https://arxiv.org/abs/1909.01203

Published at ICCV 2019, this paper is the first to fuse features across cameras inside the network: features from "easy" views are fused into "difficult" views, which effectively addresses occlusion. It significantly reduces 3D pose estimation error on benchmark datasets.

21. Optimizing Network Structure for 3D Human Pose Estimation

Paper link: https://www.chunyuwang.org/img/ICCV_2019_CiHai.pdf

Published at ICCV 2019, this paper presents a Locally Connected Network based on the human body model. The network has fewer parameters and effectively alleviates overfitting.

22. Online Dictionary Learning for Approximate Archetypal Analysis

Paper link: https://www.microsoft.com/en-us/research/publication/online-dictionary-learning-for-approximate-archetypal-analysis/

Published at ECCV 2018, this paper proposes a low-dimensional representation of human pose and uses projection to ensure the accuracy of pose estimation.

23. Part-Aligned Bilinear Representations for Person Re-identification

Paper link: http://arxiv.org/pdf/1804.07094.pdf

Building on the earlier weakly supervised work Deeply-Learned Part-Aligned Representations (https://arxiv.org/pdf/1707.07256.pdf), this paper introduces human pose to help align body parts and improve person re-identification performance. The paper was published at ECCV 2018.

24. Semantics-Aligned Representation Learning for Person Re-identification

Paper link: https://arxiv.org/abs/1905.13143

To be published at AAAI 2020, this paper proposes a semantics-aligned feature learning network for person re-identification. It introduces the task of reconstructing the full view of a person in an aligned human semantic space, which gives the network the ability to predict the full-view appearance of a person from a single image. This addresses the semantic misalignment between images in person re-identification.

25. Uncertainty-aware Multi-shot Knowledge Distillation for Image-based Object Re-identification

Paper link: https://www.msra.cn/wp-content/uploads/2020/01/Uncertainty-aware-Multi-shot-Knowledge-Distillation-for-Image-based-Object-Re-identification.pdf

To be published at AAAI 2020. By jointly learning from different images of the same object, the teacher network obtains a more complete feature representation of the target. A teacher-student framework then transfers this more comprehensive information to the student network, which takes a single image as input. As a result, the test phase needs only a single image as input yet extracts features that are more comprehensive and more discriminative.

26. Mask-Guided Portrait Editing with Conditional GANs

Paper link: https://arxiv.org/abs/1905.10346

Published at CVPR 2019, this model addresses three problems in face synthesis: diversity, quality, and controllability. The researchers propose a cGAN-based framework that can edit the eyes, nose, mouth, skin, and hair separately. The model has many applications, such as face editing, changing the hair, enlarging the eyes, or adding a smile. In addition, it can locally modify the appearance of an existing face.

27. Learning Pyramid Context Encoder Network for High-Quality Image Inpainting

Paper link: http://openaccess.thecvf.com/content_CVPR_2019/papers/Zeng_Learning_Pyramid-Context_Encoder_Network_for_High-Quality_Image_Inpainting_CVPR_2019_paper.pdf

Published at CVPR 2019. Based on the idea of "from deep to shallow, multiple completions", the paper proposes a pyramid-context encoder network with an attention mechanism, which generates image content with plausible semantics and rich texture detail.

28. Learning 2D Temporal Adjacent Network for Moment Localization with Natural Language

Paper link: https://arxiv.org/pdf/1912.03590.pdf

Presented at AAAI 2020, this paper proposes a new way of modeling temporal information: a two-dimensional temporal map. Its effectiveness is verified on two tasks: human action detection in video, and localizing moments in a video from a natural language description.
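
One way to picture the two-dimensional temporal map is the sketch below. It assumes the video is cut into equal-length clips and that cell (i, j) stands for the candidate moment that starts at clip i and ends at clip j, so only cells with i <= j are valid. The function names and the use of temporal IoU to pick the best cell are illustrative assumptions, not the authors' code.

```python
def build_2d_temporal_map(num_clips, clip_seconds):
    """Cell (i, j) holds the (start, end) time of the moment spanning clips i..j, or None if i > j."""
    return [
        [
            (i * clip_seconds, (j + 1) * clip_seconds) if i <= j else None
            for j in range(num_clips)
        ]
        for i in range(num_clips)
    ]


def temporal_iou(moment_a, moment_b):
    """Intersection over union of two (start, end) intervals."""
    intersection = max(0.0, min(moment_a[1], moment_b[1]) - max(moment_a[0], moment_b[0]))
    union = (moment_a[1] - moment_a[0]) + (moment_b[1] - moment_b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def best_cell(temporal_map, target):
    """Return the (i, j) index of the valid cell that overlaps the target moment most."""
    best_index, best_score = None, -1.0
    for i, row in enumerate(temporal_map):
        for j, moment in enumerate(row):
            if moment is None:
                continue
            score = temporal_iou(moment, target)
            if score > best_score:
                best_index, best_score = (i, j), score
    return best_index


temporal_map = build_2d_temporal_map(num_clips=4, clip_seconds=2.0)
print(temporal_map[1][2], best_cell(temporal_map, (2.0, 6.0)))
```

Laying the candidates out on a grid keeps neighbouring cells temporally adjacent (they share a start or an end clip), which is what lets a model relate a moment to the moments around it.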

29. Structured Knowledge Distillation for Semantic Segmentation

Paper link: https://arxiv.org/abs/1903.04197v1

Published at CVPR 2019. This paper proposes structured knowledge distillation, which distills the global structural information of images to improve the performance of lightweight segmentation networks.

Open Source Address: https://github.com/irfanICMLL/structure_knowledge_distillation
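
To make "structural information" concrete, the sketch below shows one common way to distill structure rather than per-pixel outputs: compare the pairwise similarity between spatial locations in the student's feature map with the same similarities in the teacher's. It is a minimal illustration under stated assumptions (cosine similarity, squared error, non-zero feature vectors, hypothetical function names) and should not be read as the exact loss in the paper or the linked repository.

```python
import math


def cosine_similarity_matrix(features):
    """features: one feature vector per spatial location. Returns all pairwise cosine similarities."""
    norms = [math.sqrt(sum(x * x for x in vector)) for vector in features]
    return [
        [
            sum(a * b for a, b in zip(vec_i, vec_j)) / (norm_i * norm_j)
            for vec_j, norm_j in zip(features, norms)
        ]
        for vec_i, norm_i in zip(features, norms)
    ]


def pairwise_distillation_loss(student_features, teacher_features):
    """Mean squared difference between the student and teacher similarity matrices."""
    student_sim = cosine_similarity_matrix(student_features)
    teacher_sim = cosine_similarity_matrix(teacher_features)
    n = len(student_sim)
    total = sum(
        (student_sim[i][j] - teacher_sim[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


teacher = [[1.0, 0.0], [1.0, 0.0]]   # teacher sees both locations as similar
student = [[1.0, 0.0], [0.0, 1.0]]   # student sees them as unrelated
print(pairwise_distillation_loss(student, teacher))
```

Because only the similarity matrices are compared, the student and teacher feature vectors may have different dimensions, which suits a lightweight student paired with a larger teacher.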

(* This article is reprinted by AI Technology Base Camp. To reprint it, please contact the original author.)
