【Computer Vision】A thousand-word summary: understand computer vision in one article, packed with practical content and worth bookmarking



1. Introduction

Computer vision, often abbreviated as CV, is a research field that uses technology to help computers "see" and "understand" images, for example by enabling them to interpret the content of photos or videos.

This post gives an overall introduction to computer vision in six parts:

  • Why computer vision matters
  • What computer vision is
  • Basic principles of computer vision
  • Typical tasks of computer vision
  • Application scenarios of computer vision in daily life
  • Challenges faced by computer vision

2. Why is computer vision important?

Physiologically, vision begins when the sensory cells of the visual organs are stimulated, and it takes shape once the visual nervous system has processed the information they collect.

We humans intuitively grasp the shape and state of the things in front of us through vision, and most of us rely on it to cook, avoid obstacles, read street signs, watch videos, and carry out countless other tasks.

In fact, apart from special groups such as the blind, the vast majority of people acquire most of their external information through vision, with the proportion as high as about 80%. This figure is not unfounded: the well-known experimental psychologist Treicher reported, on the basis of a large number of experiments, that 83% of the information humans obtain comes from vision, 11% from hearing, and the remaining 6% from smell, touch, and taste.

Therefore, for humans, vision is undoubtedly the most important sense.

Humans are not the only "visual animals"; vision also plays a very important role for most other animals. Through vision, humans and animals perceive the size, brightness, color, and movement of external objects and obtain information that is important for survival. This information tells us what the surrounding world is like and how to interact with it.


Before the advent of computer vision, images were a black box for computers.

To a computer, an image is just a file, a string of data. The computer does not know what the picture shows; it only knows properties such as the picture's dimensions, how much space it occupies, and what format it is in.
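
As a rough illustration of how little a file reveals on its own, the sketch below builds a tiny PNG in memory and reads back only what its header states: format, dimensions, and byte size. The helper names (`make_tiny_png`, `describe_image_file`) are hypothetical and exist only for this example, and only Python's standard library is assumed.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_tiny_png(width: int, height: int) -> bytes:
    """Create an all-black 8-bit RGB PNG in memory (demo data only)."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), then three zero flags.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline is a filter byte (0) followed by width * 3 zero bytes.
    raw = (b"\x00" + b"\x00" * (width * 3)) * height
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def describe_image_file(data: bytes) -> dict:
    """Report only what the header says; nothing about what the picture shows."""
    if not data.startswith(PNG_SIGNATURE):
        return {"format": "unknown", "bytes": len(data)}
    # Width and height are the first two fields of IHDR, at byte offset 16.
    width, height = struct.unpack(">II", data[16:24])
    return {"format": "PNG", "width": width, "height": height, "bytes": len(data)}


print(describe_image_file(make_tiny_png(4, 3)))
```

Nothing in this output says whether the picture contains a cat, a dog, or a street sign, which is exactly the gap computer vision tries to close.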


If computers and artificial intelligence want to play an important role in the real world, they must understand pictures!

For this reason, computer scientists have spent half a century trying to give computers vision, and the field of "computer vision" is the result.


The rapid growth of the Internet has made computer vision even more important.

The figure below charts the amount of new data on the Internet since 2020. Gray represents structured data, and blue represents unstructured data (mostly images and videos). The volume of pictures and videos is clearly growing exponentially.

[Figure: trend chart of new Internet data, with structured data in gray and unstructured data in blue]

The internet is made up of text and images.

Searching for text is relatively straightforward, but in order to search for images, an algorithm needs to know what the image contains.

For a long time, the technology to understand the content of images and videos did not exist, and descriptions of images or videos could only be obtained through manual annotation.

Enabling computers to better understand this image information is a major challenge for today's computer technology.

To get the most out of image or video data, a computer needs to be able to "see" the image or video, and understand the content.

3. What is computer vision

Computer vision is an important branch of artificial intelligence. Simply put, the problem it sets out to solve is getting the computer to understand the content of an image or video.

For example:

  1. Is the pet in the picture a cat or a dog?
  2. Is the person in the picture Lao Zhang or Lao Wang?
  3. What are the people in the video doing?

More specifically, computer vision refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and then to process the images further so that they are better suited to human observation or to being passed to instruments for inspection.

As a scientific discipline, computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain high-level information from images or multidimensional data.

From an engineering standpoint, it seeks to use automated systems to mimic the human visual system to accomplish tasks.

The ultimate goal of computer vision is to enable computers to observe and understand the world through vision like humans, and have the ability to adapt to the environment autonomously.

Getting a computer to perceive the world through a camera is very difficult, however. Although the image captured by the camera looks the same as what we usually see, to the computer any image is just an arrangement of pixel values: a pile of rigid numbers.

How to get the computer to read meaningful visual cues from these rigid numbers is the problem that computer vision sets out to solve.

4. Basic principles of computer vision

Anyone who has used a camera or a mobile phone knows that computers are good at taking pictures with amazing fidelity and detail. To some extent, the artificial "vision" of computers is much stronger than the natural visual ability of humans. But just as we often say that "hearing" is not the same as "understanding", "seeing" is not the same as "understanding" either. Getting computers to truly "understand" images is no simple matter.

An image is a large grid of pixels, and each pixel has a color that is a combination of the three primary colors red, green, and blue. By combining different intensities of these three colors, known as RGB values, we can produce any color.
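
To make the pixel grid concrete, here is a minimal sketch that stores an image as rows of (R, G, B) tuples and mixes primary colors channel by channel. The 0 to 255 intensity range and the helper `add_colors` are assumptions made for this illustration.

```python
# Each channel intensity is an integer from 0 (off) to 255 (full).
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)


def add_colors(a, b):
    """Combine two colors channel by channel, capping each channel at 255."""
    return tuple(min(x + y, 255) for x, y in zip(a, b))


YELLOW = add_colors(RED, GREEN)   # red light plus green light gives yellow
WHITE = add_colors(YELLOW, BLUE)  # all three at full intensity give white

# A tiny image of 2 rows by 3 columns: a list of rows, each a list of pixels.
image = [
    [RED, GREEN, BLUE],
    [YELLOW, WHITE, (0, 0, 0)],
]

height, width = len(image), len(image[0])
print(f"{width}x{height} image, pixel at row 1, column 0 = {image[1][0]}")
```

A real photo has the same structure, only with millions of pixels instead of six.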

The simplest computer vision algorithm, and the best one for getting started, is tracking a colored object such as a pink ball. We first record the color of the ball by saving the RGB value of its most central pixel, then feed the image to the program and let it find the pixel whose color is closest to the saved one.

The algorithm can start at the top left corner and examine every pixel, calculating its difference from the target color. Once every pixel has been checked, the pixels closest to the target color are most likely the ones where the ball is located.
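
The scan described above can be sketched in a few lines. The text only says to "calculate the difference", so measuring it as squared Euclidean distance in RGB space is an assumption, as are the example color values and the function names.

```python
def color_distance(a, b):
    """Squared Euclidean distance between two (R, G, B) colors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_closest_pixel(image, target):
    """Scan rows top to bottom, left to right; return ((row, col), distance)."""
    best_pos, best_dist = None, None
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            d = color_distance(pixel, target)
            if best_dist is None or d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos, best_dist


PINK = (255, 105, 180)  # saved color of the ball (example value)
GRASS = (40, 160, 60)   # background color (example value)
image = [
    [GRASS, GRASS, GRASS, GRASS],
    [GRASS, GRASS, (250, 110, 175), GRASS],  # slightly shaded ball pixel
    [GRASS, GRASS, GRASS, GRASS],
]

position, distance = find_closest_pixel(image, PINK)
print(position, distance)
```

The ball pixel does not match the saved color exactly, but it is far closer to it than any grass pixel, so it is the one reported.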

This algorithm is not limited to a single picture: we can run it on every frame of a video to track the position of the ball.

Of course, light, shadow, and other factors change the color of the ball, so it will not match the saved RGB value exactly, but it will be very close. In some extreme cases, however, such as a football game at night, tracking may work very poorly, and if one team's jerseys are the same color as the ball, the algorithm is completely confused. Unless the environment can be strictly controlled, then, such color tracking algorithms are rarely put into practical use.
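
A minimal sketch of the per-frame version follows, including how it breaks down at night. The tolerance `max_distance` is a hypothetical parameter added for this example (the text does not describe one), and the frames are tiny hand-made grids rather than real video.

```python
def track_color(frames, target, max_distance):
    """Return one (row, col) per frame, or None when no pixel is close enough.

    max_distance is a hypothetical tolerance on squared RGB distance.
    """
    positions = []
    for frame in frames:
        best_pos, best_dist = None, max_distance + 1
        for r, row in enumerate(frame):
            for c, pixel in enumerate(row):
                d = sum((p - t) ** 2 for p, t in zip(pixel, target))
                if d < best_dist:
                    best_pos, best_dist = (r, c), d
        positions.append(best_pos)
    return positions


PINK, GRASS, NIGHT = (255, 105, 180), (40, 160, 60), (10, 10, 20)


def frame_with_ball(background, ball_color, col):
    """Build a frame of 1 row by 4 columns with one ball pixel at column col."""
    row = [background] * 4
    row[col] = ball_color
    return [row]


frames = [
    frame_with_ball(GRASS, (255, 105, 180), 0),  # daylight, exact color
    frame_with_ball(GRASS, (235, 95, 170), 1),   # light shadow, still close
    frame_with_ball(NIGHT, (60, 25, 45), 2),     # night: ball is far too dark
]
print(track_color(frames, PINK, max_distance=2000))
```

In the first two frames the ball is found at columns 0 and 1. In the night frame no pixel is within the tolerance, so the tracker reports nothing. Raising the tolerance would not be a real fix either, because other dark or similarly colored pixels would then start to match.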

Today, most computer vision algorithms in use involve "deep learning" methods and techniques. Among them, the convolutional neural network (CNN) is the most widely used because of its superior performance.

Because the knowledge involved in "deep learning" is so extensive, this article does not describe it in more detail.

5. Typical tasks of computer vision

5.1 Image Classification

Image classification distinguishes different categories of images according to their semantic information. It is a core problem in computer vision and the basis of higher-level visual tasks such as object detection, image segmentation, object tracking, behavior analysis, and face recognition.

For example, in the picture below, image classification lets the computer recognize the person, tree, grass, and sky in the image.

[Figure: image classification example labeling person, tree, grass, and sky]

Image classification is widely applied in many fields, such as face recognition and intelligent video analysis in security, traffic scene recognition in transportation, content-based image retrieval and automatic album classification on the Internet, and image recognition in medicine.
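
To show only the shape of the task (one image in, one label out), here is a deliberately naive sketch that assigns a label by comparing an image's average color with a prototype color per class. This is not how practical classifiers work; the prototype values and function names are invented for illustration.

```python
def mean_color(image):
    """Average (R, G, B) over all pixels of an image given as rows of tuples."""
    pixels = [p for row in image for p in row]
    return tuple(sum(p[i] for p in pixels) / len(pixels) for i in range(3))


def classify(image, prototypes):
    """Return the label whose prototype color is nearest to the image's mean color."""
    m = mean_color(image)

    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(m, prototypes[label]))

    return min(prototypes, key=dist)


# Hypothetical class prototypes; a real system would learn features from data.
PROTOTYPES = {"sky": (120, 180, 240), "grass": (60, 160, 70)}

sky_photo = [[(110, 170, 250), (130, 190, 235)], [(125, 185, 245), (115, 175, 230)]]
grass_photo = [[(50, 150, 60), (70, 170, 80)], [(65, 155, 75), (55, 165, 65)]]

print(classify(sky_photo, PROTOTYPES), classify(grass_photo, PROTOTYPES))
```

The interface is the point here: whatever model sits inside `classify`, the output is a single category for the whole image.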

5.2 Object Detection

The goal of the object detection task is, given an image or a video frame, to have the computer find the locations of all the objects in it and give the specific category of each one.

As shown in the figure below, taking the detection of people as an example, the positions of all the people in the image are marked with bounding boxes.
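
The output of detection, a category plus a location for each object, can be sketched as a small data structure. The toy detector below simply boxes all pixels of a known color and assumes at most one object per category; it stands in for a real detector only to show what the result looks like. `Detection` and `detect_by_color` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str   # category of the object
    top: int     # bounding box as inclusive row and column indices
    left: int
    bottom: int
    right: int


def detect_by_color(image, color_to_label):
    """Toy detector: one box per known color, covering all pixels of that color."""
    boxes = {}
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            label = color_to_label.get(pixel)
            if label is None:
                continue  # background or unknown color
            if label not in boxes:
                boxes[label] = Detection(label, r, c, r, c)
            else:
                d = boxes[label]
                d.top, d.left = min(d.top, r), min(d.left, c)
                d.bottom, d.right = max(d.bottom, r), max(d.right, c)
    return list(boxes.values())


BG, CAT, DOG = (0, 0, 0), (200, 120, 40), (90, 90, 90)
image = [
    [BG, CAT, CAT, BG, BG],
    [BG, CAT, CAT, BG, DOG],
    [BG, BG, BG, BG, DOG],
]
detections = detect_by_color(image, {CAT: "cat", DOG: "dog"})
for d in detections:
    print(d)
```

Drawing each box on the image, with one border color per label, gives the kind of visualization commonly used for detection results.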

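[Figure: person detection example, with each person marked by a bounding box]

In multi-category object detection, bounding boxes of different colors are generally used to mark the positions of the different detected objects, as shown in the figure below.

[Figure: multi-category detection example, with bounding boxes in different colors]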

5.3 Semantic Segmentation

Semantic segmentation is a fundamental task in computer vision, where we need to classify visual input into different semantically interpretable categories.

It divides the entire image into groups of pixels, which are then labeled and classified.

For example, we might need to distinguish all pixels in an image that belong to cars, and color those pixels blue.

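As shown in the figure below, the image is divided into regions labeled people (red), trees (dark green), grass (light green), and sky (blue).

[Figure: semantic segmentation example with people in red, trees in dark green, grass in light green, and sky in blue]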
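
The result of semantic segmentation is one class label per pixel, which is then usually painted with one color per class. The sketch below assumes the label map has already been produced by some model (the values here are made up) and only shows the coloring step, using the palette from the example above. The exact RGB values chosen for each color name are assumptions.

```python
# Display palette following the example in the text (RGB values are assumed).
PALETTE = {
    "person": (255, 0, 0),     # red
    "tree": (0, 100, 0),       # dark green
    "grass": (144, 238, 144),  # light green
    "sky": (0, 0, 255),        # blue
}


def colorize(label_map):
    """Turn a per-pixel label map into an RGB image using PALETTE."""
    return [[PALETTE[label] for label in row] for row in label_map]


def pixel_counts(label_map):
    """Count how many pixels were assigned to each class."""
    counts = {}
    for row in label_map:
        for label in row:
            counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical output of a segmentation model for an image of 3 rows by 4 columns.
label_map = [
    ["sky", "sky", "sky", "tree"],
    ["sky", "person", "person", "tree"],
    ["grass", "person", "person", "grass"],
]
segmented = colorize(label_map)
print(pixel_counts(label_map))
```

Every pixel receives exactly one label, so the per-class counts always add up to the number of pixels in the image.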

5.4 Instance Segmentation

Instance segmentation combines object detection and semantic segmentation: objects are first detected in the image (object detection), and then each pixel is labeled (semantic segmentation).

Comparing the figure above with the one below, and taking people as the target, semantic segmentation does not distinguish between different instances of the same category (all people are marked in red), whereas instance segmentation does (different people are shown in different colors).

[Figure: instance segmentation example, with each person shown in a different color]
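
The difference in output can be illustrated with a toy: starting from a semantic label map, give every connected region of "person" pixels its own instance number. This connected-region trick is only a stand-in chosen for the example; it cannot separate people who touch or overlap, which real instance segmentation methods must handle. The function name is hypothetical.

```python
def label_instances(label_map, target):
    """Give each 4-connected region of target pixels its own id (1, 2, ...).

    Pixels of other classes get 0. Returns (instance_map, instance_count).
    """
    rows, cols = len(label_map), len(label_map[0])
    instance_map = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if label_map[r][c] != target or instance_map[r][c]:
                continue
            count += 1
            instance_map[r][c] = count
            stack = [(r, c)]
            while stack:  # flood fill outward from the seed pixel
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and label_map[ny][nx] == target
                            and not instance_map[ny][nx]):
                        instance_map[ny][nx] = count
                        stack.append((ny, nx))
    return instance_map, count


# Semantic labels: both people share the single class "person".
label_map = [
    ["person", "sky", "sky", "person"],
    ["person", "grass", "grass", "person"],
]
instance_map, count = label_instances(label_map, "person")
print(count, instance_map)
```

Where the semantic map has one undifferentiated "person" class, the instance map tells person 1 apart from person 2, so each can be drawn in its own color.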

5.5 Target Tracking

Target tracking refers to detecting, extracting, identifying, and tracking moving targets in an image sequence in order to obtain their motion parameters. These parameters are then processed and analyzed to understand the behavior of the moving targets and so complete higher-level detection tasks.
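
One very simple way to picture this is to follow a target by always choosing the detection nearest to its last known position, then deriving a motion parameter (per-frame displacement) from the resulting path. Nearest-neighbor association is an assumption made for this sketch, not a method named in the text. The detection coordinates are invented, and each frame is assumed to contain at least one detection.

```python
import math


def track(frames_detections, start):
    """Follow one target: in each frame pick the detection nearest its last position.

    frames_detections: list of frames, each a non-empty list of (x, y) centers.
    Returns the list of positions, beginning with start.
    """
    path = [start]
    for detections in frames_detections:
        last = path[-1]
        nearest = min(detections, key=lambda p: math.dist(p, last))
        path.append(nearest)
    return path


def velocities(path):
    """Per-frame displacement (dx, dy) between consecutive positions."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(path, path[1:])]


# Hypothetical detection centers for three frames; two objects are visible.
frames_detections = [
    [(12, 10), (50, 40)],
    [(49, 41), (14, 10)],
    [(16, 11), (48, 43)],
]
path = track(frames_detections, start=(10, 10))
print(path)
print(velocities(path))
```

The order of detections within a frame does not matter, because the association depends only on distance to the previous position. When two targets cross paths, this simple rule can swap them, which is one reason practical trackers use richer cues.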


6. Application scenarios of computer vision in daily life

Computer vision has a very wide range of application scenarios. Here are a few that are common in daily life.

6.1 Face recognition for access control and Alipay


6.2 License plate recognition in parking lots and toll booths


6.3 Risk identification when uploading videos to websites or apps


6.4 Various selfie props on Douyin and other apps (the position of the face must be identified first)


7. Challenges faced by computer vision

Computer vision technology is currently developing rapidly and has reached an initial industrial scale. Its future development mainly faces the following challenges:

The first is how to better combine computer vision with other technologies in different application fields. Computer vision can make extensive use of big data when solving certain problems [...] precision;

The second is how to reduce the development time and labor cost of computer vision algorithms. At present, these algorithms need large amounts of data and manual annotation, and a long R&D cycle, to reach the accuracy and speed that an application field requires;

The third is how to speed up the design and development of new algorithms. As new imaging hardware and artificial intelligence chips emerge, designing and developing computer vision algorithms for different chips and data acquisition devices is also a challenge.

8. Conclusion

As one of the fastest growing and most widely used technologies among the subfields of artificial intelligence, computer vision is like the "eyes" of artificial intelligence, capturing and analyzing more information for all walks of life. With advances in algorithms, upgrades in hardware computing power, the explosion of data, and the high-speed networks that 5G will bring, computer vision will have even broader room to grow in its applications. Let us wait and see!


Origin blog.csdn.net/wzk4869/article/details/131239612