PASCAL Visual Object Classes Challenge 2011 (VOC2011) Annotation Guidelines

http://host.robots.ox.ac.uk/pascal/VOC/voc2012/guidelines.html

Guidelines on what and how to label.

What to label
All objects of the defined categories, unless:
- you are unsure what the object is.
- the object is very small (at your discretion).
- less than 10-20% of the object is visible, such that you cannot be sure what class it is. e.g. if only a tyre is visible it may belong to car or truck so cannot be labelled car, but feet / faces can only belong to a person.
If this is not possible because too many objects, mark image as bad.
In short: if the image contains too many objects to label, mark it as bad.

Viewpoint
Record the viewpoint of the ‘bulk’ of the object e.g. the body rather than the head. Allow viewpoints within 10-20 degrees.
If ambiguous, leave as ‘Unspecified’. Unusually rotated objects e.g. upside-down people should be left as ‘Unspecified’.
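The viewpoint is the object's angle relative to the camera. It is useful in theory, but in practice objects also deform and rotate on their own, so its value seems limited.
upside-down [ˈʌpˌsaɪdˈdaʊn]: inverted; in disorder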
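As a rough illustration of the tolerance rule above, the sketch below snaps an estimated yaw angle to a viewpoint label. Only 'Unspecified' comes from the guidelines; the other label names, the angle convention (0 degrees means the object faces the camera) and the 20-degree default tolerance are assumptions made for this example.

```python
# Hypothetical canonical angles in degrees; 0 means the object faces the camera.
CANONICAL_VIEWS = {"Frontal": 0, "Left": 90, "Rear": 180, "Right": 270}


def viewpoint_label(yaw_degrees, tolerance=20, upside_down=False):
    """Return a viewpoint label, or 'Unspecified' when the view is ambiguous."""
    if upside_down:
        # Unusually rotated objects are always left as 'Unspecified'.
        return "Unspecified"
    yaw = yaw_degrees % 360
    for label, angle in CANONICAL_VIEWS.items():
        # Smallest absolute difference between two angles on a circle.
        diff = abs((yaw - angle + 180) % 360 - 180)
        if diff <= tolerance:
            return label
    return "Unspecified"
```

With these assumed values, a yaw of 350 degrees still counts as 'Frontal', while 45 degrees falls between two views and stays 'Unspecified'.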

Bounding box
Mark the bounding box of the visible area of the object (not the estimated total extent of the object).
Bounding box should contain all visible pixels, except where the bounding box would have to be made excessively large to include a few additional pixels (< 5%) e.g. a car aerial.
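The bounding box must contain all of the object's pixels, unless the object has an excessively long, thin protrusion (the protruding part accounts for < 5% of the pixels).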
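The bounding-box rule can be checked mechanically when a visible-pixel mask is available. The sketch below is only an illustration: the mask representation (a list of rows of 0/1 values), the 0-based inclusive coordinates and both function names are assumptions, not part of the guidelines.

```python
def visible_bounding_box(mask):
    """Return (xmin, ymin, xmax, ymax) around every non-zero pixel, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # nothing visible, so there is no box to draw
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


def fraction_left_out(mask, box):
    """Fraction of the object's visible pixels that fall outside `box`."""
    xmin, ymin, xmax, ymax = box
    total = sum(1 for row in mask for value in row if value)
    inside = sum(
        1
        for y in range(ymin, ymax + 1)
        for x in range(xmin, xmax + 1)
        if mask[y][x]
    )
    return (total - inside) / total


# Toy "car with an aerial": a 12 x 6 body plus a 3-pixel aerial above it.
car = [[0] * 12 for _ in range(10)]
for y in range(4, 10):
    for x in range(12):
        car[y][x] = 1
for y in range(1, 4):
    car[y][10] = 1

tight_box = visible_bounding_box(car)       # includes the aerial
body_box = (0, 4, 11, 9)                    # ignores the aerial
left_out = fraction_left_out(car, body_box)  # 3 of 75 pixels
```

Here the body-only box leaves out 3 of 75 pixels (4%), which is under the 5% allowance, so the smaller box would be acceptable.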

Truncation
If more than 15-20% of the object lies outside the bounding box mark as Truncated. The flag indicates that the bounding box does not cover the total extent of the object.
In short: if more than 15-20% of the object's area lies outside the annotated box, flag it as Truncated.

Occlusion
If more than 5% of the object is occluded within the bounding box, mark as Occluded. The flag indicates that the object is not totally visible within the bounding box.
In short: if more than 5% of the object inside the box is occluded, flag it as Occluded.
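The truncation and occlusion rules above both reduce to a threshold on an estimated fraction. A minimal sketch follows, assuming the annotator (or a tool) supplies the two fractions. The function name is hypothetical, and the 0.15 truncation cutoff is one arbitrary pick from the 15-20% range given in the guidelines.

```python
def annotation_flags(fraction_outside_box, fraction_occluded_in_box,
                     truncation_cutoff=0.15, occlusion_cutoff=0.05):
    """Turn two estimated fractions (0.0 to 1.0) into the two Boolean flags."""
    return {
        # "More than" in the guidelines is read as a strict comparison.
        "truncated": fraction_outside_box > truncation_cutoff,
        "occluded": fraction_occluded_in_box > occlusion_cutoff,
    }
```

For example, an object with 30% of its extent outside the box and nothing hidden inside it would be flagged as truncated but not occluded.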

Image quality / illumination
Images which are poor quality (e.g. excessive motion blur) should be marked bad. However, poor illumination (e.g. objects in silhouette) should not count as poor quality unless objects cannot be recognised.
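For poor-quality images (e.g. excessive motion blur or overly strong illumination), besides filtering out the bad samples, one could also consider recognising the scene condition, such as night-time or strong sunlight.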
Images made up of multiple images (e.g. collages) should be marked bad.

Clothing / mud / snow etc.
If an object is ‘occluded’ by a close-fitting occluder e.g. clothing, mud, snow etc., then the occluder should be treated as part of the object.
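If the object is partly covered by something worn on or clinging to it, such as clothing, mud or snow, it does not count as occluded.
mud [mʌd]: wet earth; slander; something worthless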

Transparency
Do label objects visible through glass, but treat reflections on the glass as occlusion.

Mirrors
Do label objects in mirrors.

Pictures
Label objects in pictures / posters / signs only if they are photorealistic but not if cartoons, symbols etc.

Guidelines on categorisation

Categories

Aeroplane
Includes gliders but not hang gliders or helicopters
hang glider: a glider flown by a pilot suspended beneath it; a gliding kite
helicopter [ˈhelɪkɒptə]: a rotary-wing aircraft

Bicycle
Includes tricycles, unicycles
tricycle [ˈtraɪsɪk(ə)l]: a three-wheeled cycle
unicycle [ˈjuːnɪsaɪkl]: a one-wheeled pedal cycle

Bird
All birds

Boat
Ships, rowing boats, pedaloes but not jet skis
jet ski: a small motorised personal watercraft
pedalo [ˈpedələʊ]: a pedal-powered boat

Bottle
Plastic, glass or feeding bottles
feeding bottle: a baby's bottle, a nursing bottle

Bus
Includes minibus but not trams

Car
Includes cars, vans, large family cars for 6-8 people etc.
Excludes go-carts, tractors, emergency vehicles, lorries / trucks etc.
Do not label where only the vehicle interior is shown.
Include toys that look just like real cars, but not ‘cartoony’ toys.
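vehicle interior: the inside of a car (cabin and trim)
go-cart [ˈɡəʊkɑːt]: an early light carriage; a baby walker; a handcart; a miniature single-seat racing car
lorry [ˈlɒrɪ]: a truck, a goods vehicle, a haulage wagon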

Cat
Domestic cats (not lions etc.)

Chair
Includes armchairs, deckchairs but not stools or benches.
Excludes seats in buses, cars etc.
Excludes wheelchairs.
stool [stuːl]: a backless seat; faeces; a toilet
wheelchair [ˈwiːltʃeə]: a chair on wheels for someone who cannot walk

Cow
All cows

Dining table
Only tables for eating at.
Not coffee tables, desks, side tables or picnic benches

Dog
Domestic dogs (not wolves etc.)

Horse
Includes ponies, donkeys, mules etc.
donkey [ˈdɒŋkɪ]: an ass; a fool; a stubborn person

Motorbike
Includes mopeds, scooters, sidecars
scooter [ˈskuːtə]: a small-wheeled motorcycle; a motor scooter; a kick scooter; a child's scooter

People
Includes babies, faces (i.e. truncated people)

Potted plant
Indoor plants excluding flowers in vases, or outdoor plants clearly in a pot.

Sheep
Sheep, not goats

Sofa
Excludes sofas made up as sofa-beds

Train
Includes train carriages, excludes trams
tram [træm]: a streetcar running on rails; a tramway track; a coal wagon in a mine

TV/monitor
Standalone screens (not laptops), not advertising displays
TV/monitor: laptops do not count, and neither do electronic advertising boards.
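The inclusion and exclusion rules above can be kept as a simple lookup so that every annotator resolves an edge case the same way. The table below is a partial, hypothetical encoding of some of the rules listed in this section; the dictionary, the function name and the choice of `None` for "do not label" are assumptions.

```python
# Maps a fine-grained description to the category it is labelled as,
# or to None when the guidelines exclude it from every category.
CATEGORY_OF = {
    "glider": "Aeroplane", "hang glider": None, "helicopter": None,
    "tricycle": "Bicycle", "unicycle": "Bicycle",
    "pedalo": "Boat", "jet ski": None,
    "minibus": "Bus", "tram": None,
    "van": "Car", "lorry": None, "go-cart": None, "tractor": None,
    "armchair": "Chair", "deckchair": "Chair", "stool": None, "wheelchair": None,
    "pony": "Horse", "donkey": "Horse", "mule": "Horse",
    "moped": "Motorbike", "scooter": "Motorbike",
    "goat": None, "laptop": None,
}


def category_for(description):
    """Look up the category for a description; unknown items raise KeyError."""
    key = description.strip().lower()
    if key not in CATEGORY_OF:
        # Fail loudly so that a missing rule gets written down, not guessed.
        raise KeyError(f"no rule recorded for {description!r}")
    return CATEGORY_OF[key]
```

Raising on unknown descriptions, rather than returning a default, is a deliberate choice here: a missing rule should be added to the table once instead of being decided differently by each annotator.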

Guidelines on segmentation

What to segment
Objects whose bounding boxes have been labelled according to the above guidelines.
You may need to exclude backpacks, handbags etc. which were included in the bounding box.
You may also need to include hands, chair legs etc. which were outside the bounding box.
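When drawing segmentation boundaries, exclude carried items such as backpacks and handbags, but do not miss parts that extend outward, such as chair legs and arms.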

Accuracy
Segment within 5 pixels. Labelled pixels MUST be the object; pixels outside the 5-pixel border area MUST be background. Border pixels can be either. Use the tri-map displayed by the segmentation tool to ensure these constraints hold.
This may involve labelling pixels outside the bounding box.
The segmentation outline has 5 pixels of tolerance; make sure that everything outside the border area is definitely background.
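To make the accuracy rule concrete, here is a small sketch of how a tri-map could be derived from a binary mask. It is not the segmentation tool's actual algorithm: the label values, the square neighbourhood and the reading of "5-pixel border" as 5 pixels on each side of the outline are all assumptions.

```python
BACKGROUND, OBJECT, BORDER = 0, 1, 255  # illustrative label values


def make_trimap(mask, border=5):
    """Mark as BORDER every pixel within `border` pixels of a label change."""
    height, width = len(mask), len(mask[0])
    trimap = []
    for y in range(height):
        row = []
        for x in range(width):
            here = bool(mask[y][x])
            # Scan the square neighbourhood, clipped to the image, for any
            # pixel whose label differs from this one.
            near_edge = any(
                bool(mask[ny][nx]) != here
                for ny in range(max(0, y - border), min(height, y + border + 1))
                for nx in range(max(0, x - border), min(width, x + border + 1))
            )
            if near_edge:
                row.append(BORDER)
            else:
                row.append(OBJECT if here else BACKGROUND)
        trimap.append(row)
    return trimap
```

Pixels left as OBJECT must be the object, pixels left as BACKGROUND must be background, and BORDER pixels may be either. The brute-force scan is fine for small examples; a real tool would more likely use a distance transform.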

Mixed pixels / transparency
Pixels which are mixed e.g. due to transparency, motion blur or the presence of a border should be considered to belong to the object whose colour contributes most to the mix.
Mixed pixels (caused by motion, transparency, etc.) should still be labelled if they can be confirmed as belonging to the object.

Thin structures
Aim to capture thin structures where possible, within the accuracy constraints. Structures of around one pixel thickness can be ignored e.g. wires, rigging, whiskers.
rigging: the ropes and cables of a ship; equipment; transmission gear
whisker [ˈwɪskə]: a crystal whisker; facial hair; side-whiskers

Objects on tables etc.
If a number of small objects are occluding an object e.g. cutlery / silverware on a dining table, they can be considered part of that object. The exception is if they are sticking out of the object (e.g. candles) where they should be truncated at the object boundary.
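cutlery [ˈkʌtlərɪ]: eating utensils; the cutlery-making trade
silverware [ˈsɪlvəweə]: silver articles; silver-plated tableware
candle [ˈkænd(ə)l]: a candle; candlelight; a candle-shaped object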

Difficult images
Images which are overly difficult to segment to the required accuracy can be left unlabelled e.g. a nest of bicycles.
That is, objects that are too hard to annotate, for example a pile of bicycles.

Further notes

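Recording a Boolean (True / False) against one fixed threshold ties the annotation to the current project, or even to a particular algorithm, which is clearly a poor trade.
Asking annotators to estimate the percentage of the object that is visible makes annotation very expensive. Moreover, "However, training humans to visually inspect a bounding box with IOU of 0.3 and distinguish it from one with IOU 0.5 is surprisingly difficult." [1], so such estimates also become a source of error. A reasonable compromise is to indicate the object's visibility on a scale of 3-5 levels.
In detection we mark the main region of an object with a rectangular box. In some cases this main region is only part of the object, and boxes may overlap one another. In segmentation we must accurately outline every pixel that belongs to the object.
The definition of an object changes from task to task. Because of the fuzziness of the real world and of humans themselves, and because tasks are diverse, the same kind of object often comes with different annotation requirements, yet we want the annotations to be reused as much as possible.
Consistent rules are a necessary condition for data reuse. The annotation rule set may be complex, but there can be only one!
Rules may evolve, as long as they stay unified and remain forward and backward compatible. Once multiple rule sets exist, they are inevitably not interchangeable, and the data clearly cannot be reused.
Low-level rules should be as atomic and composable as possible, so that they can serve higher-level queries.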
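One way to read the notes above is: store a coarse visibility level once, then derive project-specific Booleans at query time. The sketch below uses four levels. The level names, the record layout (a dict with a "visibility" key) and both helper functions are hypothetical choices, not an established scheme.

```python
from enum import IntEnum


class Visibility(IntEnum):
    # Four illustrative levels; names and boundaries are assumptions.
    MOSTLY_HIDDEN = 0
    PARTLY_VISIBLE = 1
    MOSTLY_VISIBLE = 2
    FULLY_VISIBLE = 3


def is_occluded(level, strictness=Visibility.FULLY_VISIBLE):
    """Derive a project-specific flag: levels below `strictness` are occluded."""
    return level < strictness


def select(annotations, min_level):
    """Return the annotations whose stored visibility is at least `min_level`."""
    return [a for a in annotations if a["visibility"] >= min_level]


annotations = [
    {"id": 1, "visibility": Visibility.FULLY_VISIBLE},
    {"id": 2, "visibility": Visibility.PARTLY_VISIBLE},
    {"id": 3, "visibility": Visibility.MOSTLY_VISIBLE},
]
clean_training_set = select(annotations, Visibility.MOSTLY_VISIBLE)
```

Because the stored level is independent of any one threshold, a stricter or looser project can reuse the same annotations by passing a different `strictness` or `min_level`.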

Wordbook

Visual Object Classes (VOC): the visual object categories used in the challenge
Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL)

References