Author | Microsoft Research Asia
This article is reprinted with permission from Microsoft Research AI headlines (ID: MSRAsia)
1. Deep High-Resolution Representation Learning for Human Pose Estimation
Paper link: https://arxiv.org/pdf/1902.09212.pdf
This paper presents a new network, the High-Resolution Network (HRNet), which learns representations that are both high in spatial resolution and semantically strong. It differs from mainstream network designs in two key points: it maintains a high-resolution representation throughout the network, and it runs convolutional branches of different resolutions in parallel. HRNet achieves leading results on key vision problems such as human keypoint detection, object detection, semantic segmentation, and face detection, and has been widely adopted by other researchers. The paper was published at CVPR 2019.
Open Source Address: https://github.com/HRNet
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
2. VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Paper link: https://arxiv.org/pdf/1908.08530.pdf
Published at ICLR 2020, this is one of the first papers to propose a joint pre-training model for text and images. The researchers propose a new generic multimodal pre-training model, VL-BERT, which uses the simple yet powerful Transformer as its backbone and extends the input to include both visual and linguistic elements, making it suitable for most visual-linguistic downstream tasks. To learn more generic representations, the researchers pre-train VL-BERT on the large-scale image captioning dataset Conceptual Captions. Pre-training is shown to significantly improve performance on downstream visual-linguistic tasks, including visual commonsense reasoning, visual question answering, and referring expression comprehension.
3. A Relation Network Based Approach to Curved Text Detection
Paper link: https://icdar2019.org/list-of-accepted-papers/
This paper proposes a novel text detection framework based on a Relation Network, which effectively improves the accuracy of generic text line detection. The paper was published at ICDAR 2019.
4. An Anchor-free Region Proposal Network for Faster R-CNN-based Text Detection Approaches
Paper link: https://www.springerprofessional.de/en/an-anchor-free-region-proposal-network-for-faster-r-cnn-based-te/17013452
This paper presents an object detection algorithm called anchor-free RPN, which addresses the inability of the classical RPN algorithm to predict text boxes of arbitrary orientation. The algorithm not only achieves good results on word-level text detection; similar ideas have since become mainstream in object detection. The paper was published in the journal IJDAR.
5. Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-Block Parallel Optimization and Blockwise Model-Update Filtering
Paper link: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/0005880.pdf
This paper presents a general distributed optimization algorithm that introduces blockwise model-update filtering (BMUF) into an incremental block training framework, achieving linear speedup of deep learning model training while maintaining model accuracy. The paper was published at ICASSP 2016.
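To make the blockwise update concrete, here is a minimal sketch of the filtering step on a toy one-parameter problem. It assumes the classical block momentum form of the update (average the workers' models, treat the change from the previous global model as a block-level "gradient", and smooth it with momentum). The function names, the quadratic toy loss, and all hyperparameter values are illustrative assumptions, not taken from the paper, and the workers are simulated sequentially.

```python
def local_training(w, shard, lr, steps):
    """Simulate one worker: gradient descent on 0.5 * (w - x)^2 averaged over its shard."""
    target = sum(shard) / len(shard)
    for _ in range(steps):
        w -= lr * (w - target)  # gradient of the averaged toy loss
    return w


def bmuf_train(shards, blocks=50, block_momentum=0.5, block_lr=1.0, lr=0.1, steps=20):
    """Toy blockwise model-update filtering with classical block momentum (assumed form)."""
    w_global = 0.0  # model broadcast to every worker at the start of a block
    delta = 0.0     # filtered (momentum-smoothed) model update
    for _ in range(blocks):
        # Intra-block parallel optimization: each worker trains on its own shard.
        local_models = [local_training(w_global, shard, lr, steps) for shard in shards]
        w_avg = sum(local_models) / len(local_models)
        # Block-level update, then filtering with block momentum and block learning rate.
        g = w_avg - w_global
        delta = block_momentum * delta + block_lr * g
        w_global += delta
    return w_global


if __name__ == "__main__":
    # Two hypothetical workers whose shards have means 2.0 and 6.0.
    print(bmuf_train([[1.0, 3.0], [5.0, 7.0]]))
```

On this toy problem the global model settles near the mean of all the data, which is the minimizer of the combined loss.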
6. Compressing CNN-DBLSTM Models for OCR with Teacher-Student Learning and Tucker Decomposition
Paper link: https://www.sciencedirect.com/science/article/abs/pii/S0031320319302547
This paper proposes a compression and acceleration method that targets the CNN part, which accounts for the largest share of the computation cost in a CNN-DBLSTM model. Knowledge distillation is first applied to the CNN part under the guidance of the LSTM part, and the CNN is then further compressed and accelerated with Tucker decomposition. The resulting model runs up to 14 times faster than the original, which solves the deployment problem. The paper was published in the journal Pattern Recognition.
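As a rough illustration of why Tucker decomposition shrinks a convolution layer, the sketch below counts parameters before and after a Tucker-2 style factorization, in which one k x k convolution is replaced by a 1 x 1 convolution, a smaller k x k convolution, and another 1 x 1 convolution. This factorized form is a common way to apply Tucker decomposition to convolution kernels and is assumed here; the layer sizes and ranks are hypothetical and not taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 style factorization with ranks (r_in, r_out)."""
    reduce_1x1 = c_in * r_in         # 1 x 1 conv: c_in -> r_in channels
    core_kxk = r_in * r_out * k * k  # k x k conv on the reduced channels
    expand_1x1 = r_out * c_out       # 1 x 1 conv: r_out -> c_out channels
    return reduce_1x1 + core_kxk + expand_1x1


if __name__ == "__main__":
    original = conv_params(256, 256, 3)
    compressed = tucker2_params(256, 256, 3, r_in=64, r_out=64)
    print(original, compressed, round(original / compressed, 2))
```

Smaller ranks give more compression at the cost of approximation error, which is one reason such a step can be paired with distillation to recover accuracy.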
7. An Open Vocabulary OCR System with Hybrid Word-Subword Language Models
Paper link: https://ieeexplore.ieee.org/abstract/document/8270022
This paper presents a hybrid language model that uses both words and subword units as basic language units to solve the out-of-vocabulary (OOV) problem in optical character recognition (OCR). The paper was published at ICDAR 2017.
8. Relation Networks for Object Detection
Paper link: https://arxiv.org/pdf/1711.11575.pdf
Published at CVPR 2018, this paper presents a plug-and-play object relation module and, for the first time, achieves a fully end-to-end object detector. It is one of the first applications of self-attention models in the vision field.
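The idea of relating objects through attention can be sketched as follows. This is a deliberately simplified illustration, not the paper's module: it uses plain scaled dot-product attention over per-object feature vectors and omits learned projections and any geometry-based weighting. The function names and the residual addition are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_features(feats):
    """Augment each object's feature with an attention-weighted sum over all objects."""
    dim = len(feats[0])
    out = []
    for query in feats:
        # Appearance similarity between this object and every object (itself included).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in feats]
        weights = softmax(scores)
        relation = [sum(weights[j] * feats[j][c] for j in range(len(feats)))
                    for c in range(dim)]
        # Add the relation feature to the original feature, keeping the dimension unchanged.
        out.append([a + r for a, r in zip(query, relation)])
    return out


if __name__ == "__main__":
    objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical object features
    for row in relation_features(objects):
        print(row)
```

Because the output has the same shape as the input, a module of this kind can be inserted between existing layers, which is what makes it plug-and-play.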
9. Learning Region Features for Object Detection
Paper link: https://arxiv.org/pdf/1803.07066.pdf
Published at ECCV 2018, this paper gives a general formulation of region feature extraction and proposes a fully learnable region feature extraction method.
10. Local Relation Networks for Image Recognition
Paper link: https://arxiv.org/pdf/1904.11491.pdf
Published at ICCV 2019, this paper proposes a new neural network that uses no convolution at all and surpasses convolutional neural network baselines in image classification accuracy on the ImageNet dataset.
11. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
Paper link: https://arxiv.org/pdf/1904.11492.pdf
Published at ICCVW 2019, this paper changes the academic community's prevailing understanding of how non-local networks work and proposes a new, efficient global relation network.
12. An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Paper link: https://arxiv.org/pdf/1904.05873.pdf
Published at ICCV 2019, this paper proposes a general formulation of spatial attention mechanisms and analyzes how the different terms of this formulation perform on a variety of vision tasks, providing a reference for future applications of spatial attention.
13. Deep Metric Transfer for Label Propagation with Limited Annotated Data
Paper link: https://arxiv.org/pdf/1812.08781.pdf
This paper presents a new paradigm for semi-supervised learning, transfer learning, and few-shot learning. At its core, the paradigm uses unsupervised pre-training to obtain the initial image features, which yields an accuracy improvement of nearly 20% (absolute) on semi-supervised learning. The paper was published at ICCVW 2019.
14. Deformable ConvNets v2: More Deformable, Better Results
Paper link: https://arxiv.org/pdf/1811.11168.pdf
Published at CVPR 2019, this paper proposes a convolutional network that is more deformable than standard convolution and significantly improves accuracy on a wide variety of visual perception tasks, including image classification, object detection, semantic segmentation, and object tracking. For example, on the COCO object detection benchmark it gains nearly 7 points over standard convolutional networks under the same conditions.
15. RepPoints: Point Set Representation for Object Detection
Paper link: https://arxiv.org/pdf/1904.11490.pdf
The bounding box is the standard way to represent an object in vision. Published at ICCV 2019, this paper proposes representing an object with a set of points instead of a bounding box; the new representation is more expressive and more interpretable. Based on this representation, the researchers obtained the best-performing anchor-free detector. The representation was recently extended to instance segmentation and human pose estimation.
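A point-set representation still has to be compared with box annotations during training and evaluation. The sketch below shows one simple way to do that, assuming a min-max conversion that takes the tightest axis-aligned box around the points. The function name and the sample coordinates are hypothetical, and this is only one possible conversion.

```python
def points_to_box(points):
    """Convert a set of (x, y) points to the tightest axis-aligned box (x1, y1, x2, y2)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # Hypothetical point set predicted for one object.
    point_set = [(12.0, 30.0), (18.5, 22.0), (25.0, 41.0), (15.0, 35.5)]
    print(points_to_box(point_set))
```

The points themselves carry more information than the resulting box, since they can settle on semantically meaningful parts of the object.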
16. A Twofold Siamese Network for Real-Time Object Tracking
Paper link: https://arxiv.org/abs/1802.08817
Published at CVPR 2018, this paper proposes a twofold Siamese network for visual object tracking, called SA-Siam, where S stands for the semantic branch and A for the appearance branch. The two branches are independent yet complementary, and together they achieve excellent tracking performance.
17. SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking
Paper link: https://arxiv.org/abs/1904.04452
Published at CVPR 2019, this paper proposes a novel two-stage series-parallel matching structure to achieve robust and accurate object tracking. The SPM tracker focuses on semantic understanding in the coarse matching stage and on appearance representation in the fine matching stage, and reaches a good balance between the two by training them in different ways.
18. Unsupervised High-Resolution Depth Learning from Videos With Dual Networks
Paper link: https://arxiv.org/abs/1910.08897
Published at ICCV 2019, this paper proposes a depth estimation architecture based on a dual-network structure: a deep network extracts global features from a low-resolution input image, a shallow network extracts detailed features from the high-resolution input image, and the two are combined to estimate depth at high resolution. Compared with conventional methods, it obtains better depth estimates at a lower computational cost, with particularly significant improvements in resolution-sensitive regions such as fine details and distant areas of the image.
19. Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments
Paper link: https://arxiv.org/abs/1910.08898
Published at ICCV 2019, this paper proposes a more robust supervision signal based on optical flow reconstruction to address unsupervised depth estimation in more difficult indoor scenes. In place of the traditional image reconstruction signal, which suffers from the severe lack of texture in indoor scenes, the method uses sparse-to-dense optical flow estimation to obtain stable optical flow, and feeds the optical flow into the camera pose estimation network to cope with complex camera motion. This enables stable depth estimation in indoor scenes with more complex and diverse layouts.
20. Cross View Fusion for 3D Human Pose Estimation
Paper link: https://arxiv.org/abs/1909.01203
Published at ICCV 2019, this paper is the first to propose a network that fuses features across cameras: features from "easy" views are fused into "difficult" views, which effectively solves the occlusion problem. It significantly reduces 3D pose estimation error on benchmark datasets.
21. Optimizing Network Structure for 3D Human Pose Estimation
Paper link: https://www.chunyuwang.org/img/ICCV_2019_CiHai.pdf
Published at ICCV 2019, this paper presents a Locally Connected Network based on a human body model. The network has fewer parameters and can effectively alleviate overfitting.
22. Online Dictionary Learning for Approximate Archetypal Analysis
Paper link: https://www.microsoft.com/en-us/research/publication/online-dictionary-learning-for-approximate-archetypal-analysis/
Published at ECCV 2018, this paper proposes a low-dimensional representation of human pose and ensures the accuracy of pose estimation by means of projection.
23. Part-Aligned Bilinear Representations for Person Re-identification
Paper link: http://arxiv.org/pdf/1804.07094.pdf
Building on the earlier weakly supervised work Deeply-Learned Part-Aligned Representations (https://arxiv.org/pdf/1707.07256.pdf), this paper introduces human pose to help align body parts and improve person re-identification performance. The paper was published at ECCV 2018.
24. Semantics-Aligned Representation Learning for Person Re-identification
Paper link: https://arxiv.org/abs/1905.13143
To be published at AAAI 2020, this paper proposes a semantics-aligned feature learning network for person re-identification. It introduces the task of reconstructing a full-view, semantically aligned representation of the human body, which gives the network the ability to predict a person's full-view appearance from a single image and addresses the semantic misalignment between images in person re-identification.
25. Uncertainty-aware Multi-shot Knowledge Distillation for Image-based Object Re-identification
Paper link: https://www.msra.cn/wp-content/uploads/2020/01/Uncertainty-aware-Multi-shot-Knowledge-Distillation-for-Image-based-Object-Re-identification.pdf
To be published at AAAI 2020, this paper jointly learns from multiple images of the same object to obtain a more complete feature representation of the target, and uses a teacher-student network to transfer the more comprehensive information learned by the teacher to a student network that takes a single image as input. As a result, the testing phase requires only a single image as input, yet extracts features that are more comprehensive and more discriminative.
26. Mask-Guided Portrait Editing with Conditional GANs
Paper link: https://arxiv.org/abs/1905.10346
Published at CVPR 2019, this model addresses three problems in face synthesis: diversity, quality, and controllability. The researchers propose a cGAN-based framework that can edit the eyes, nose, mouth, skin, and hair separately. The model has many applications, such as face editing, changing the hair, enlarging the eyes, or adding a smile. In addition, it can be used to locally modify the appearance of existing faces.
27. Learning Pyramid Context Encoder Network for High-Quality Image Inpainting
Paper link: http://openaccess.thecvf.com/content_CVPR_2019/papers/Zeng_Learning_Pyramid-Context_Encoder_Network_for_High-Quality_Image_Inpainting_CVPR_2019_paper.pdf
Published at CVPR 2019, this paper builds on the idea of "from deep to shallow, multiple completions" and proposes a pyramid-context encoder network with an attention mechanism, which generates image content with plausible semantics and rich texture detail.
28. Learning 2D Temporal Adjacent Network for Moment Localization with Natural Language
Paper link: https://arxiv.org/pdf/1912.03590.pdf
Published at AAAI 2020, this paper proposes a new way of modeling temporal information, the two-dimensional temporal map, and verifies its effectiveness on two tasks: human action detection in video and localizing video content described in natural language.
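A two-dimensional temporal map can be pictured as a grid in which one axis indexes the start clip and the other the end clip, so that each valid cell stands for one candidate moment. The sketch below builds such a grid and scores every candidate by its temporal IoU with a target interval. The layout (rows as start, columns as end, upper triangle valid), the function names, and the clip length are assumptions made for illustration.

```python
def temporal_iou(a, b):
    """IoU of two time intervals given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    if inter == 0.0:
        return 0.0
    # The intervals overlap here, so their union is the span that covers both.
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union


def build_2d_map(num_clips, clip_len, target):
    """Cell (i, j) with i <= j scores the moment from the start of clip i to the end of clip j."""
    iou_map = [[None] * num_clips for _ in range(num_clips)]
    for i in range(num_clips):
        for j in range(i, num_clips):  # cells with j < i are invalid and stay None
            moment = (i * clip_len, (j + 1) * clip_len)
            iou_map[i][j] = temporal_iou(moment, target)
    return iou_map


if __name__ == "__main__":
    # Hypothetical video of 4 clips, 2 seconds each, with a target moment of 2 s to 6 s.
    for row in build_2d_map(4, 2.0, (2.0, 6.0)):
        print(row)
```

Neighbouring cells correspond to moments that share a start or end clip, which is what allows adjacent candidates to be modeled together on the grid.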
29. Structured Knowledge Distillation for Semantic Segmentation
Paper link: https://arxiv.org/abs/1903.04197v1
Published at CVPR 2019, this paper presents structured knowledge distillation, which distills the global structural information of image segmentation to improve the performance of lightweight networks.
Open Source Address: https://github.com/irfanICMLL/structure_knowledge_distillation
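One way to distill structure rather than individual predictions is to make the student reproduce the teacher's pairwise similarities between feature locations. The sketch below is a minimal version of that idea, assuming cosine similarity and a mean squared difference between the two similarity matrices. It illustrates the general technique only; the function names and toy features are hypothetical, and the paper's full objective may contain other terms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity(feats):
    """Similarity between every pair of feature locations."""
    return [[cosine(u, v) for v in feats] for u in feats]


def pairwise_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher similarity matrices."""
    s = pairwise_similarity(student_feats)
    t = pairwise_similarity(teacher_feats)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    # Hypothetical features at three locations; the two networks may use different widths.
    teacher = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [1.0, 0.2, 1.9]]
    student = [[0.9, 0.1], [0.2, 1.0], [1.0, 0.3]]
    print(pairwise_distillation_loss(student, teacher))
```

Only the similarity matrices are compared, so the student and teacher do not need the same feature dimension.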
(* Reprinted by AI Technology Base Camp. To reprint this article, please contact the original author.)