In computer vision (CV) tasks, image annotation helps the computer better understand images. From the known label information, the model learns patterns that it can then apply to recognize new, unseen data.
CV annotation is commonly divided into the following types:
- Bounding Box Annotation
- Polygon Annotation
- Keypoint Annotation
- Line Annotation
- Cuboid Annotation (3D)
- Semantic Segmentation
Bounding Box Annotation
Bounding boxes are the most common type of image annotation. As the name implies, the annotator draws a rectangular box around the target object according to specific requirements. Object detection models can be trained directly on bounding-box labels.
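As a minimal sketch, a bounding-box annotation can be stored as four coordinates plus a class label. The `[x_min, y_min, x_max, y_max]` convention and the `iou` helper below are illustrative assumptions; some datasets store `[x, y, width, height]` instead.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two [x_min, y_min, x_max, y_max] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A single annotation: class label plus box coordinates.
annotation = {"label": "car", "bbox": [10, 20, 110, 80]}
prediction = [30, 20, 130, 80]
print(round(iou(annotation["bbox"], prediction), 3))  # prints 0.667
```

IoU is the standard metric for judging how well a predicted box matches an annotated one during object-detection training and evaluation.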
Polygon Annotation
Polygon masks are mainly used to label objects with irregular shapes. Annotators must trace the boundaries of objects in an image with high precision, so that the model gets a clear idea of each object's shape and size. Unlike bounding-box labeling, which inevitably encloses unnecessary background around the target and can hurt model training in some tasks, polygon labeling yields more accurate localization thanks to its higher annotation precision.
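To illustrate why polygons fit irregular shapes more tightly than boxes, the sketch below computes a polygon's area with the shoelace formula. The L-shaped example points are invented for demonstration.

```python
def polygon_area(points):
    """Area of a simple polygon given as [(x, y), ...] via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# An L-shaped object: its polygon mask covers 8 units of area,
# while its tight 4x3 bounding box would cover 12.
mask = [(0, 0), (4, 0), (4, 3), (2, 3), (2, 1), (0, 1)]
print(polygon_area(mask))  # prints 8.0
```

The gap between the polygon area (8) and the bounding-box area (12) is exactly the background that a box-based label would wrongly include.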
Keypoint Annotation
Keypoint (landmark) annotation is mainly suited to vision tasks that detect shape changes and small objects, as it helps the model track the motion of each point on the target object. Keypoint annotation supports gesture and facial recognition, and can also be used to detect human body parts and accurately estimate their poses.
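A common way to store keypoints, used for example by the COCO dataset, is an `(x, y, v)` triple per point, where `v` is a visibility flag (0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible). The joint names and coordinates below are made-up illustrations of that convention.

```python
# Hypothetical pose annotation for one person, COCO-style (x, y, visibility).
person = {
    "keypoints": {
        "left_shoulder": (120, 80, 2),
        "right_shoulder": (180, 82, 2),
        "left_elbow": (100, 130, 1),  # labeled but occluded
    }
}

def visible_points(ann):
    """Names of keypoints that are both labeled and visible (v == 2)."""
    return [name for name, (x, y, v) in ann["keypoints"].items() if v == 2]

print(visible_points(person))  # prints ['left_shoulder', 'right_shoulder']
```

Pose-estimation models are typically trained to regress these per-point coordinates, with the visibility flag controlling which points contribute to the loss.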
Line Annotation
Line annotation is used to train vehicle perception models for lane detection by drawing lane-line annotations. Unlike bounding boxes, it avoids large amounts of blank area and extra noise.
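A lane line is naturally represented as a polyline: an ordered list of `(x, y)` points traced along the lane. The sketch below shows that representation with a small length helper; the `lane_left` label and the point values are illustrative assumptions.

```python
import math

def polyline_length(points):
    """Total length of a polyline given as an ordered list of (x, y) points."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))

# A lane-line annotation: a label plus the points along the line.
lane = {"label": "lane_left", "points": [(0, 0), (3, 4), (6, 8)]}
print(polyline_length(lane["points"]))  # prints 10.0
```

Because the annotation is just the line itself, there is no enclosed region at all, which is why line labels avoid the background noise that boxes bring.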
Cuboid Annotation (3D)
3D cuboid annotation is used in vision tasks to estimate the depth of target objects such as vehicles, buildings, or even humans, and thus obtain their total volume. It is mainly used in construction and autonomous-vehicle systems.
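A 3D cuboid annotation typically carries a center position, a size, and a heading angle; the exact fields below (center/size/yaw, meters, sensor frame) are assumptions modeled loosely on conventions in autonomous-driving datasets, and the numbers are invented.

```python
# Hypothetical 3D cuboid annotation for a vehicle.
cuboid = {
    "label": "vehicle",
    "center": (12.0, -3.5, 0.9),  # position in meters, sensor frame (assumed)
    "size": (4.5, 1.8, 1.5),      # length, width, height in meters
    "yaw": 0.3,                   # rotation around the vertical axis, radians
}

def cuboid_volume(ann):
    """Total volume enclosed by the cuboid, from its length x width x height."""
    length, width, height = ann["size"]
    return length * width * height

print(cuboid_volume(cuboid))  # roughly 12.15 cubic meters
```

The depth and volume the article mentions fall directly out of the size field, while the center and yaw place the object in 3D space.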
Semantic Segmentation
In semantic segmentation, or pixel-level annotation, pixels with similar attributes are grouped together. It suits vision tasks that detect and localize specific objects at the pixel level. Unlike polygon annotation, which outlines only specific objects (or regions) of interest, semantic segmentation provides a complete understanding of every pixel in the scene.
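A semantic-segmentation label is a mask the same size as the image, with one class id per pixel. The tiny mask and class ids below (0 = background, 1 = road, 2 = car) are illustrative assumptions.

```python
# A 3x4 per-pixel label mask: every pixel gets a class id,
# so the whole scene is covered, not just objects of interest.
mask = [
    [1, 1, 1, 1],  # road
    [1, 2, 2, 1],  # road with a car in the middle
    [0, 0, 0, 0],  # background
]

def class_frequencies(mask):
    """Count how many pixels belong to each class id."""
    counts = {}
    for row in mask:
        for label in row:
            counts[label] = counts.get(label, 0) + 1
    return counts

print(class_frequencies(mask))  # prints {1: 6, 2: 2, 0: 4}
```

Because every pixel is labeled, even the background gets a class, which is exactly the "complete understanding of the scene" that distinguishes segmentation from polygon or box annotation.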