An introductory tutorial on computer vision based on Paddle—Lecture 2 Classification of computer vision

B station tutorial address

https://www.bilibili.com/video/BV18b4y1J7a6/

traditional computer vision methods

insert image description here

Traditional computer vision can use Python libraries such as Opencv to perform simple operations on images, such as image scaling , filtering , threshold segmentation , and so on. For a computer, a color picture is a three-channel matrix , corresponding to three colors of **Red, Green and Blue (RGB)**, and a complete color picture can be displayed by changing the value of the color ( 0-255 ) For pictures, traditional computer vision revolves around this three-dimensional matrix, such as setting a color interval, performing filtering, and so on.

This type of visual processing method has relatively weak functions and can handle some simple application scenarios, such as recognizing green objects and recognizing dynamic objects . But for the actual scene with complex background, many problems are difficult to solve .

Recommended Opencv tutorial address: https://github.com/CodecWang/opencv-python-tutorial

deep learning

There are many algorithms for image processing through artificial intelligence, the most classic of which is the convolutional neural network , which performs continuous convolution operations on the original image , fully extracts features , and finally outputs the desired result. This method has been verified by practice. It has a very good precision performance , and can run real-time effects on many current hardware .

insert image description here

Of course, more new types of visual processing algorithms have also emerged, such as the recently popular Transformer algorithm, which was originally applied to NLP ( Natural Language Processing ). Recently, researchers have discovered that it also shows very good performance in the visual field. The best accuracy has been achieved in the field, breaking through the accuracy bottleneck of the convolutional neural network . Our tutorial in this issue still revolves around the convolutional neural network, a classic algorithm, which is still worthy of in-depth study.

insert image description here

Classification of Computer Vision Tasks

Classification

The classification task is to classify the entire picture, such as the most classic cat and dog classification .

insert image description here

Cat and dog classification is to let the computer classify the pictures I specify. If this picture is a cat, after I input the picture into the model, I expect the output to be the cat category. It can be seen that the classification task is to classify the entire picture . If there are both cats and dogs in a picture, then obviously the classification cannot be completed, because the classification task does not need to locate the object . The classification task is the simplest task of computer vision, the least difficult to achieve , and of course the simplest function .

Detection

insert image description here

Compared with the classification task, the detection task needs to accurately locate the target object in the image . Generally, a rectangular frame is used to determine the target position . As shown in the picture above, in a picture, there are dogs, bicycles, and cars. For the detection task, it is necessary to accurately frame their positions and distinguish the categories . The detection task is to identify the features of the objects in the image. Compared with the classification task, it is more difficult. It is also a requirement we often have. It is necessary to accurately determine the position of the feature object in the picture, such as pedestrian detection, face detection, and so on .

Segmentation

insert image description here

The difficulty of the segmentation task increases again. The task requires not only determining the position, but also outlining the outline of the object , similar to the cutout of PS , and filtering to remove the background. For example, the industrial meter reading shown in the figure above , lane line segmentation , etc. These tasks require a greater test of models and algorithms, and have certain applications in specific occasions.

Guess you like

Origin blog.csdn.net/weixin_45747759/article/details/122539215