Robot vision based on computer vision: realizing visual understanding and its application in robots

Author: Zen and the Art of Computer Programming

1. Introduction

Great progress has been made in collecting and processing image data, and with continuous technological iteration, machine vision systems are developing rapidly as well. Researchers in artificial intelligence are applying these technologies to industrial settings, including vision processing for robots. Because a robot is a dynamic, changing agent, its visual input is constantly updated as the environment changes. Enabling robots to accurately identify, track, and understand the information in the world around them has therefore become a very important problem.

2. Related research

Human beings perceive a large amount of visual information in daily life, such as the objects we see with our eyes, the sounds we hear, and the movements we make with our bodies. Industrial production robots, by contrast, can only perform motion control based on information obtained from sensors and cannot fully reproduce the human visual system. How to give robots human-like visual perception, and in turn the ability to make autonomous decisions and carry out tasks, therefore remains an important research topic.

At present, mainstream methodologies mainly focus on object detection, tracking, classification, and related methods based on deep learning. However, these methods still have limitations. First, they require large amounts of training data, and manual annotation is complex, time-consuming, and error-prone; second, detection is slow, especially on high-resolution images; finally, because there is no consistency between models, the gap between the results of different algorithms is too large. How to design a unified, efficient, accurate, and future-proof machine vision system is therefore a key difficulty in current research.

In recent years, scholars in the field of artificial intelligence have proposed a variety of visual methods. For example, the YOLO family of detectors (with YOLOv3 released in 2018) uses a single convolutional neural network (CNN) to detect objects, regressing each object's bounding box and class probability distribution directly from the image. The later DSOD method showed that object detectors can be trained from scratch by combining deep supervision with densely connected network structures. Mask R-CNN, published at ICCV 2017, builds on Faster R-CNN by adding a parallel branch that predicts a segmentation mask for each detected object, enabling instance segmentation alongside detection.
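
To make the detection idea above concrete, the sketch below shows how a pre-trained detector in the Mask R-CNN family can be applied to a single image to obtain bounding boxes, class labels, and confidence scores. It is a minimal illustration based on the public torchvision API, not the implementation from any of the cited papers; the input file name robot_scene.jpg and the 0.5 score threshold are placeholders, and a torchvision version that supports the `weights` argument (0.13 or newer) is assumed.

```python
# Minimal sketch (assumed setup, not code from any of the cited papers):
# run a COCO pre-trained Mask R-CNN from torchvision on one image and print
# the predicted bounding boxes, class labels, and confidence scores.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Mask R-CNN with a ResNet-50 FPN backbone; requires torchvision >= 0.13
# for the `weights` argument (older versions use `pretrained=True`).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("robot_scene.jpg").convert("RGB")  # hypothetical input file
inputs = [to_tensor(image)]                           # list of 3xHxW tensors in [0, 1]

with torch.no_grad():
    outputs = model(inputs)  # one dict per image: "boxes", "labels", "scores", "masks"

# Keep only confident detections (0.5 is an arbitrary illustrative threshold).
for box, label, score in zip(outputs[0]["boxes"],
                             outputs[0]["labels"],
                             outputs[0]["scores"]):
    if score >= 0.5:
        print(f"class {label.item():3d}  score {score:.2f}  box {box.tolist()}")
```

In a robot vision pipeline, the predicted boxes and masks would typically be passed on to a tracking or grasp-planning module rather than simply printed.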
