VoTT Annotation Tool: A Detailed Tutorial

1. Overview

  • I haven't read the source code. I have only read the official README and used VoTT myself, so many details are still unclear to me.

1.1. Functions of VoTT

  • Bounding-box annotation for images and videos
  • Irregular-shape (polygon) annotation, e.g. for image segmentation
  • Multiple tags (categories) on the same bounding box (bbox)
  • Manually set frame extraction rate (fps) for input videos
  • Automatic bbox annotation with a built-in model
  • Statistics and visualization of annotation results
  • Export of annotation results to various formats
  • Association of bboxes between adjacent frames
  • Annotation results for video clips
  • Automatic playback through video clips
  • Import of already-labeled bboxes into a project

1.2. Download and installation

  • Download:
  • Installation:
    • I have only tried the Windows binary; just double-click it to run...

1.3. Impressions

  • I have used VoTT for more than 50 hours of labeling, and here are some thoughts.
  • Advantages:
    • Easy to install: just download the exe file and run it.
    • The interface looks good (this matters a lot, since labeling sessions are usually long and tedious).
    • Functionally it meets my needs (selecting video frames + drawing bboxes + multiple tags on the same bbox).
  • Disadvantages (some might be fixable by modifying the source code, but not in the current release):
    • Existing annotation results (such as bboxes) cannot be imported.
    • The same project cannot be used on different computers.
    • Connections feel pointless to me.
      • The basic purpose of a Connection is to import data from different sources as input.
      • But I only use local files as input and never use the additional Bing/Azure sources that VoTT provides.
    • The Project concept is useless to me and adds a lot of work.
      • Each project corresponds to two Connections. If a Connection contains too many videos, VoTT seems to freeze, so I end up creating many projects.
      • A Project has a Security Token that encrypts part of the project information, and this makes the project impossible to open after switching to another computer.
      • Modifying a Project's parameters can cause unexpected errors. For example, after changing the FPS, some already-labeled bboxes are no longer visible.
    • The program is unstable and frequently reports errors.
      • When assigning multiple tags with the keyboard, pressing keys too quickly triggers an error.
      • With many projects, the program often restarts by itself. My home computer is powerful, so the restart does little harm; the computer at work is weaker, and there the program dies after restarting...
      • Now that I have many projects (70+), when I open an old project some labeled bboxes are sometimes missing (an export at that point contains far fewer bboxes). But if I simply move to a labeled frame, its bboxes reappear (and an export at that point includes the missing bboxes again).

2. Basic usage

  • This chapter walks briefly through the VoTT workflow. Some functions are described in detail in Chapter 3.
  • This chapter covers:
    • Preparation
    • Creating a new project
    • Labeling bboxes
    • Exporting annotation results

2.1. Preparation

  • Put all the videos or images to be annotated into one folder.
    • Note that only the videos and images directly in this folder are processed later; files in subfolders are ignored.
  • Create a new directory for the VoTT project information and the exported results.
  • Prepare the list of categories (tags) to label.
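The folder rule above (only files directly inside the folder are processed) can be checked before creating the connection. The following Python sketch is a hypothetical helper, not part of VoTT: it walks a folder and splits media files into those at the top level, which per the behaviour described above should be picked up, and those in subfolders, which would be skipped. The extension list `MEDIA_EXTENSIONS` is an assumption; adjust it to the formats your VoTT version accepts.

```python
import sys
from pathlib import Path, PurePosixPath

# Assumed set of media extensions; adjust to what your VoTT version accepts.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff",
                    ".mp4", ".webm", ".ogg"}


def split_media(relative_paths):
    """Split '/'-separated relative paths into (top_level, in_subfolders)."""
    top_level, nested = [], []
    for rel in relative_paths:
        path = PurePosixPath(rel)
        if path.suffix.lower() not in MEDIA_EXTENSIONS:
            continue  # not an image or video
        if len(path.parts) == 1:
            top_level.append(rel)  # directly inside the folder
        else:
            nested.append(rel)  # inside a subfolder: skipped per the rule above
    return sorted(top_level), sorted(nested)


def scan_folder(folder):
    """Collect every file under `folder` (recursively) and split it as above."""
    root = Path(folder)
    rels = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    return split_media(rels)


if __name__ == "__main__":
    top_level, nested = scan_folder(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(top_level)} media files at the top level")
    for rel in nested:
        print("in a subfolder, will not be processed:", rel)
```

Run it as `python <script> <folder>`; any file reported as being in a subfolder would have to be moved up into the top-level folder to be labeled.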

2.2. New project

  • After opening VoTT, you will see the New Project option.
    • [Screenshot: VoTT home page with the New Project option]
  • On the new project page, the main fields to fill in are
    • Display Name: the project name; anything will do
    • Security Token: used to encrypt sensitive information. I don't know exactly what it is for, and I usually keep the default
    • Source Connection: the path of the raw data (Connections are introduced below)
    • Target Connection: where the tags and project information are stored (Connections are introduced below)
    • Description: project description
    • Frame Extraction Rate (frames per a video second): how many frames are extracted per second of video
    • Tags (not shown in the screenshot): the list of tags to be labeled (such as the names of the 80 COCO objects). For more on tags, see 3.3
    • [Screenshots: the new project settings page]
  • Connection introduction
    • A Connection is simply a data path.
    • Types: VoTT provides three: Azure Blob Storage, Bing Image Search, and Local File System.
    • I generally use Local File System, which needs the following parameters: Display Name (shown in the Connection drop-downs when creating a project; I usually set it to the folder name), Description, and Folder Path (the local folder path).
      • [Screenshot: Local File System connection settings]
    • Once saved, the connection appears in the Source/Target Connection drop-down menus of a new project
      • [Screenshot: Source/Target Connection drop-down menu]
    • Connections also have their own entry in the left menu bar
      • [Screenshot: Connections entry in the left menu bar]

2.3. Labeling bboxes

  • The basic process consists of the following steps
    • Select a video frame/image
      • For more on video frames, see 3.2
    • Draw a bbox
    • Assign tags; the same bbox can have multiple tags
  • These basic steps are illustrated below
    • Note that the bbox in the figure has three tags (tags 3, 6 and 8)
    • [Screenshot: a bbox with three tags assigned]
  • The built-in SSD model can be used to obtain bboxes automatically
    • The bbox quality may not be very high, but it saves some effort.
    • For more details, see 3.1
    • The schematic is shown below (click the graduation-cap button first to get bboxes automatically)
    • [Screenshot: bboxes detected automatically after clicking the graduation-cap button]
  • Other details:
    • After drawing/deleting boxes or assigning/removing tags, the annotation results are saved automatically; there is no need to save manually.
    • Section 3.5 introduces a handy trick for drawing bboxes.

2.4. Export annotation results

  • On the annotation page, you can export the results quickly, as shown in the figure below.
    • [Screenshot: export button on the annotation page]
  • The export settings are shown below
    • [Screenshot: export settings]
  • For the available export formats and other details, see 3.4

3. Detailed features

3.1. Automatic annotation (Active Learning)

  • Function: automatically draws bboxes on the current image/video frame.
  • Shortcut: Ctrl+D
  • The left menu bar has an entry for automatic labeling with the following options
    • Model Provider: the COCO SSD model is used by default; you can also load a local model or a model from a URL.
    • Predict Tag: whether automatic labeling also assigns a category tag to each bbox. If unchecked, only the boxes are drawn, without tags.
    • Auto Detect: whether to run automatic labeling automatically when switching to another image/video frame.
    • [Screenshot: Active Learning settings]
  • Questions that require reading the source code
    • How do you import a custom model? The model output VoTT expects would have to be worked out from the source code; I don't know whether a PyTorch model would work.

3.2. Video frames

  • VoTT automatically extracts frames from a video at the configured frame rate.
  • Video frames fall into three categories (as shown in the figure below):
    • Category 1: frames containing bboxes (green vertical line)
    • Category 2: frames that have been visited individually but have no annotations (yellow vertical line)
    • Category 3: frames that have not been visited individually (no vertical line)
    • [Screenshot: video timeline with green and yellow frame markers]
  • "Visited individually" means the player has stopped on that particular frame
    • While a video plays, every frame may be shown, but not all of them get a yellow line.
  • Selecting video frames
    • Click directly on the progress bar: if the frame you land on has not been visited individually (category 3), it becomes a frame that has been visited individually but has no annotations (category 2).
    • Previous/Next Frame: select the previous or next frame; the shortcut keys are A/D.
      • Frames are extracted according to the video frame rate set in the project settings.
      • The previous/next frame is simply the adjacent extracted frame, regardless of its category.
    • Previous Tag Frame/Next Tag Frame: the shortcut keys are Q/E
      • These jump to category-1 frames, i.e. frames containing a bbox.
    • [Screenshot: frame navigation buttons]
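To make the frame navigation concrete, here is a small Python model of it. It is a sketch under stated assumptions, not VoTT's actual code: it assumes frames are sampled every 1/fps seconds starting at t = 0 (fps being an integer), and that each bbox is tied to the timestamp of the frame it was drawn on. `frame_timestamps`, `neighbours` (the A/D keys), `next_tagged` (the E key; Q would be the mirror image) and `still_on_grid` are hypothetical names. Exact `Fraction`s are used so timestamp comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction


def frame_timestamps(duration_s, fps):
    """Timestamps (s) of frames sampled every 1/fps seconds from t = 0; fps is an int."""
    step = Fraction(1, fps)
    count = int(Fraction(duration_s) * fps)  # whole steps that fit in the clip
    return [k * step for k in range(count + 1)]


def neighbours(timestamps, current):
    """Previous/next extracted frame (A/D), whatever its category."""
    i = timestamps.index(current)
    prev_ts = timestamps[i - 1] if i > 0 else None
    next_ts = timestamps[i + 1] if i + 1 < len(timestamps) else None
    return prev_ts, next_ts


def next_tagged(timestamps, tagged, current):
    """Next frame after `current` that holds a bbox (E); None if there is none."""
    for ts in timestamps:
        if ts > current and ts in tagged:
            return ts
    return None


def still_on_grid(tagged, new_fps):
    """Tagged timestamps that still coincide with an extracted frame at `new_fps`."""
    return sorted(ts for ts in tagged if (ts * new_fps).denominator == 1)


if __name__ == "__main__":
    labeled_at_15 = set(frame_timestamps(1, 15))  # suppose every frame was labeled
    visible_at_10 = still_on_grid(labeled_at_15, 10)
    print(f"{len(visible_at_10)} of {len(labeled_at_15)} labeled frames still line up")
```

Under these assumptions, `still_on_grid` offers one possible explanation for the FPS problem mentioned in 1.3: bboxes labeled at 15 fps sit at multiples of 1/15 s, and after switching to 10 fps only those at multiples of 1/5 s coincide with an extracted frame, so the others would have no frame to be shown on. I have not confirmed this in the source code.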

3.3. Tag settings

  • How bboxes appear in VoTT (a solid outline means the bbox is selected, a dotted outline means it is not), as shown in the figure below
    • [Screenshot: selected (solid) and unselected (dotted) bboxes]
  • There are two ways to assign a tag to the selected bbox
    • Click a tag in the tags list on the left
    • The first 10 tags in the list can be assigned with shortcut keys (the key is the [x] shown to the right of each tag)
    • [Screenshot: tags list with shortcut keys]
  • When labeling multiple bboxes in the same image
    • If you assign a tag with a shortcut key, the tag is applied to the current bbox and the next bbox is selected automatically.
    • Once the last bbox is selected, VoTT does not wrap around to the first bbox; further key presses keep adding tags to the last bbox.
  • The tag toolbar also includes other functions, such as reordering and locking
    • Reordering changes the order of tags, using the up and down arrows in the figure below
    • I'm not sure what the lock is for. As far as I can tell, a tag can be locked for repeated tagging using the lock icon at the top of the tag editor pane.
    • [Screenshot: tag toolbar with reorder arrows and lock icon]
  • How to set up tags quickly
    • VoTT only supports entering tags one by one in the Tags field when creating a project or in the project settings, which is very inconvenient (for example, when entering all 80 COCO categories...).
    • A programmatic alternative: edit the VoTT project file (e.g. my_project.vott) directly
      • The project file is JSON and contains a tags list. Each tag has two attributes, name and color: name is a string, and color is an RGB hex string such as #008000.
      • The overall structure is "tags": [{"name": "name1", "color": "#595959"}, ...]
    • Note that there are two ways to open an existing VoTT project
      • Method 1: open the .vott file manually.
      • Method 2: select the project directly from the recent projects list on the right.
    • If you add tags by editing the .vott file, you must open the project with method 1 the first time. If you use method 2, the new tags disappear.
      • [Screenshot: the two ways of opening a project]
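The JSON edit described above can be scripted. The Python sketch below assumes only the structure given here (a top-level `"tags"` list of `{"name", "color"}` objects) and leaves everything else in the file untouched. `TAG_NAMES` and `PALETTE` are placeholders (the colours are arbitrary, not VoTT's defaults), and the script writes a `.bak` copy of the project file before saving. Tags that already exist keep their colours; new ones are appended at the end.

```python
import json
import sys
from pathlib import Path

# Placeholder tag names: replace with your own list (e.g. the 80 COCO classes).
TAG_NAMES = ["person", "bicycle", "car", "dog"]

# Arbitrary "#RRGGBB" colours, cycled when there are more tags than colours.
PALETTE = ["#e6194b", "#3cb44b", "#ffe119", "#4363d8", "#f58231",
           "#911eb4", "#46f0f0", "#f032e6", "#bcf60c", "#008080"]


def merge_tags(project, names, palette=PALETTE):
    """Append tags missing from project["tags"]; existing tags keep their colour."""
    tags = project.setdefault("tags", [])
    known = {tag["name"] for tag in tags}
    for name in dict.fromkeys(names):  # drop duplicates, keep order
        if name not in known:
            colour = palette[len(tags) % len(palette)]
            tags.append({"name": name, "color": colour})
            known.add(name)
    return project


def update_vott_file(path, names):
    """Rewrite a .vott project file with the merged tag list (after a backup)."""
    vott = Path(path)
    original = vott.read_text(encoding="utf-8")
    vott.with_name(vott.name + ".bak").write_text(original, encoding="utf-8")
    project = merge_tags(json.loads(original), names)
    vott.write_text(json.dumps(project, indent=2, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    # Usage: python <script> my_project.vott
    update_vott_file(sys.argv[1], TAG_NAMES)
```

Close VoTT before running it, and as noted above, open the project via method 1 the first time afterwards.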

3.4. Export annotation results

  • Before introducing the export function, a few definitions need to be clarified:
    • Video frame categories: see 3.2 for details. There are three categories: the first corresponds to Tagged, the second to Visited, and the third has no label.
    • Each video/image has a Visited/Tagged state, as shown in the figure below (items viewed but not labeled are Visited; labeled items are Tagged)
      • [Screenshot: Visited and Tagged asset states]
  • The export function has its own entry in the menu bar (as shown in the figure below), with three options
    • [Screenshot: export settings page]
  • Export format (six types, not covered in detail here)
    • [Screenshot: list of export formats]
  • Assets to export
    • All Assets: all data
    • Only Visited Assets: only Visited data
    • Only Tagged Assets: only Tagged data
  • Include Images: whether to include the image files in the exported data
  • Example: exporting to PASCAL VOC format produces the following data
    • [Screenshot: PASCAL VOC export output]
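PASCAL VOC stores one XML file per image, with an `<object>` element (holding a `<name>` and a `<bndbox>` with `xmin`/`ymin`/`xmax`/`ymax`) for each box. The Python sketch below, using only the standard library, reads such files and counts boxes per tag, for example to sanity-check an export against the missing-bbox problem mentioned in 1.3. How VoTT writes a bbox that carries several tags is not covered in this article; if it appears as several `<object>` entries with identical coordinates, the hypothetical `group_boxes` helper folds them back into one box with a list of tags.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def parse_voc(xml_text):
    """Return (filename, [(tag, xmin, ymin, xmax, ymax), ...]) for one VOC file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = [float(bndbox.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes


def group_boxes(boxes):
    """Map identical coordinates to the list of tags found on them."""
    grouped = {}
    for tag, *coords in boxes:
        grouped.setdefault(tuple(coords), []).append(tag)
    return grouped


def count_tags(xml_texts):
    """Count boxes per tag over several VOC files."""
    counts = Counter()
    for text in xml_texts:
        _, boxes = parse_voc(text)
        counts.update(tag for tag, *_ in boxes)
    return counts


if __name__ == "__main__":
    # Pass the folder that holds the exported .xml files
    # (Annotations/ in the standard VOC layout).
    folder = Path(sys.argv[1])
    texts = [p.read_bytes() for p in sorted(folder.glob("*.xml"))]
    print(f"{len(texts)} annotation files")
    for tag, n in count_tags(texts).most_common():
        print(f"{tag}: {n}")
```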

3.5. The third way to draw bboxes

  • There are three main ways to create bboxes; this section focuses on the third.
    • Draw them yourself
    • Run Active Learning first, then adjust the results
    • Copy the bboxes of another image/video frame to the current frame: this suits labeling consecutive frames of a video.
  • The annotation page offers several operations on regions (i.e. bboxes): copy/cut/paste/delete regions
    • The shortcut keys for copying/cutting/pasting/selecting all regions are Ctrl+C/X/V/A
    • You can label the current frame, then select all + copy, move to the next frame + paste, and adjust the bboxes.
    • [Screenshot: region copy/cut/paste/delete operations]

3.6. Other features not studied in detail

  • Project Settings shows visualized statistics of the annotation results
    • [Screenshot: annotation statistics in Project Settings]
  • Polygon annotation
  • Zoom in/out (this zooms the whole UI, not just the image; shortcut keys are available, e.g. Ctrl+0 restores the default size)
  • Multi-user collaboration
  • Security Token

Source: blog.csdn.net/qq_36958104/article/details/110482041