The difference between classification, object detection, semantic segmentation, and instance segmentation

There are many tasks in computer vision, including image classification, object detection, semantic segmentation, instance segmentation, and panoptic segmentation. What is the difference between them?

1. Image classification

Image classification (left in the figure below) determines which categories appear in an image. For example, suppose the dataset contains four categories: person, sheep, dog, and cat. Given an image, the classifier outputs the categories it contains; the example in the figure below contains three of them: person, sheep, and dog.

[Figure: image classification (left) and object detection (right)]
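As a minimal sketch of image classification, assuming the torchvision model zoo is available (the model choice, weights argument, and image file name below are illustrative, not from the original post): the network takes one image and outputs a score per class, and the highest-scoring class is the prediction.

```python
# Minimal image-classification sketch (assumed recent torchvision API; file name is hypothetical).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("sheep.jpg")           # hypothetical input image
x = preprocess(img).unsqueeze(0)        # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(x)                   # shape: (1, 1000), one score per ImageNet class

pred = logits.argmax(dim=1).item()      # index of the most likely class
print("predicted class index:", pred)
```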

2. Object detection

Object detection (right in the figure above) answers, simply put: what objects are in the picture, and where are they? Each detected object is localized with a rectangular bounding box.

Commonly used object detection algorithms include Faster R-CNN and the YOLO family of detectors.
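A minimal sketch of detection with a pretrained Faster R-CNN, again assuming the torchvision model zoo (the file name and score threshold are illustrative): the model returns, per image, the boxes, class labels, and confidence scores that answer "what is in the picture and where".

```python
# Minimal object-detection sketch (assumed recent torchvision API; file name is hypothetical).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = Image.open("street.jpg")                 # hypothetical input image
x = transforms.ToTensor()(img)                 # detection models take a list of CHW tensors

with torch.no_grad():
    out = model([x])[0]                        # one dict per input image

for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
    if score > 0.5:                            # keep confident detections only
        # class id, confidence, and rectangle corners [x1, y1, x2, y2]
        print(label.item(), round(score.item(), 2), box.tolist())
```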

3. Semantic segmentation

Object segmentation in the usual sense refers to semantic segmentation.

Semantic segmentation (left in the figure below) classifies every pixel in the image, rather than just drawing rectangular boxes. However, different instances of the same class do not need to be separated: on the left of the figure below, pixels are labeled as person, sheep, dog, and grass, not as sheep 1, sheep 2, sheep 3, sheep 4, sheep 5, and so on.
[Figure: semantic segmentation (left) and instance segmentation (right)]
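A minimal sketch of semantic segmentation with a pretrained DeepLabV3 model, assuming the torchvision model zoo (the model choice and file name are illustrative): the output assigns one class id to every pixel, with no separation between individual sheep.

```python
# Minimal semantic-segmentation sketch (assumed recent torchvision API; file name is hypothetical).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("pasture.jpg")              # hypothetical input image
x = preprocess(img).unsqueeze(0)

with torch.no_grad():
    out = model(x)["out"]                    # shape: (1, num_classes, H, W)

label_map = out.argmax(dim=1)[0]             # shape: (H, W); every pixel gets a class id
print(label_map.unique())                    # class ids present in the image (e.g. person, sheep, dog)
```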

4. Instance segmentation

Instance segmentation (right in the figure above) is essentially a combination of **object detection and semantic segmentation**. Compared with the bounding boxes of object detection, instance segmentation is accurate to the edges of each object; compared with semantic segmentation, instance segmentation must distinguish different individuals of the same class (sheep 1, sheep 2, sheep 3, ...).

A commonly used instance segmentation algorithm is Mask R-CNN.

Mask R-CNN performs pixel-level segmentation by adding a branch to Faster R-CNN that outputs a binary mask indicating whether a given pixel is part of an object. This branch is a fully convolutional network applied to a CNN feature map: it takes the feature map as input and outputs a matrix in which positions belonging to the object are 1 and all other positions are 0; this matrix is the binary mask.

Once these masks are generated, Mask R-CNN combines them with the classifications and bounding boxes from Faster R-CNN to produce precise, instance-level segmentations.
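A minimal sketch of instance segmentation with a pretrained Mask R-CNN, assuming the torchvision model zoo (the file name and thresholds are illustrative): each detected instance comes with a box, a label, a score, and a per-pixel mask that can be thresholded into the binary mask described above.

```python
# Minimal Mask R-CNN sketch (assumed recent torchvision API; file name is hypothetical).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = Image.open("flock.jpg")                  # hypothetical input image
x = transforms.ToTensor()(img)

with torch.no_grad():
    out = model([x])[0]                        # dict with "boxes", "labels", "scores", "masks"

for i, score in enumerate(out["scores"]):
    if score > 0.5:
        # masks have shape (N, 1, H, W) with values in [0, 1]; threshold to get the binary mask
        binary_mask = out["masks"][i, 0] > 0.5
        print(f"instance {i}: label {out['labels'][i].item()}, "
              f"{int(binary_mask.sum())} pixels")
```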

5. Panoptic segmentation

Panoptic segmentation is a combination of semantic segmentation and instance segmentation. The difference from instance segmentation is that instance segmentation only detects and segments the foreground objects in the image, while panoptic segmentation labels and segments everything in the image, including the background.
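A minimal conceptual sketch of how the two outputs can be merged into a panoptic map, using plain NumPy with hypothetical toy inputs (the array sizes and id scheme are illustrative, not from any particular library): background "stuff" pixels keep their semantic class id, and each detected "thing" gets its own instance id.

```python
# Toy panoptic-merging sketch (hypothetical inputs; ids and sizes are illustrative).
import numpy as np

H, W = 4, 6
semantic_map = np.zeros((H, W), dtype=int)        # class id per pixel (0 = grass background)

instance_masks = [                                # one boolean mask per detected sheep
    np.zeros((H, W), dtype=bool),
    np.zeros((H, W), dtype=bool),
]
instance_masks[0][1:3, 1:3] = True                # sheep 1
instance_masks[1][1:3, 4:6] = True                # sheep 2

panoptic = semantic_map.copy()                    # start from the "stuff" labels
for i, mask in enumerate(instance_masks, start=1):
    panoptic[mask] = 100 + i                      # overwrite "thing" pixels with unique instance ids

print(panoptic)   # background keeps its class id; each sheep gets its own id (101, 102, ...)
```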


Source: blog.csdn.net/qq_41931453/article/details/126176972