COCO key point evaluation metric

http://cocodataset.org/#keypoints-eval

1. Keypoint Evaluation

This page describes the keypoint evaluation metrics used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple metrics described below. To obtain results on the COCO test set, for which ground-truth annotations are hidden, generated results must be uploaded to the evaluation server. The exact same evaluation code, described below, is used to evaluate results on the test set.
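For reference, this evaluation code ships with the official COCO API (pycocotools); a minimal sketch of running it locally on the validation set might look like the following, where both file paths are placeholders:

```python
# Minimal sketch of local keypoint evaluation with pycocotools;
# both file paths below are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/person_keypoints_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("my_keypoint_results.json")        # detector output

coco_eval = COCOeval(coco_gt, coco_dt, iouType="keypoints")
coco_eval.evaluate()    # per-image matching via OKS
coco_eval.accumulate()  # aggregate over OKS thresholds and images
coco_eval.summarize()   # print the standard AP/AR summary
```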

1.1. Evaluation Overview

The COCO keypoint task requires simultaneously detecting objects and localizing their keypoints (object locations are not given at test time). As the task of simultaneous detection and keypoint estimation is relatively new, we chose to adopt a novel metric inspired by object detection metrics. For simplicity, we refer to this task as keypoint detection and the prediction algorithm as the keypoint detector. We suggest reviewing the evaluation metrics for object detection before proceeding.

The core idea behind evaluating keypoint detection is to mimic the evaluation metrics used for object detection, namely average precision (AP) and average recall (AR) and their variants. At the heart of these metrics is a similarity measure between ground truth objects and predicted objects. In the case of object detection, the IoU serves as this similarity measure (for both boxes and segments). Thresholding the IoU defines matches between the ground truth and predicted objects and allows computing precision-recall curves. To adopt AP/AR for keypoint detection, we only need to define an analogous similarity measure. We do so by defining an object keypoint similarity (OKS) which plays the same role as the IoU.
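To make the thresholding step concrete, here is a simplified sketch of greedy matching at a single similarity threshold (not the full COCOeval logic; the similarity matrix could hold either IoU or OKS values):

```python
import numpy as np

def match_at_threshold(sim, scores, thresh):
    """Greedily match detections to ground truths at one threshold.
    `sim` is a (num_detections, num_gt) matrix of IoU or OKS values,
    `scores` holds detection confidences. Returns a boolean true-positive
    flag per detection, from which precision-recall curves follow."""
    gt_taken = np.zeros(sim.shape[1], dtype=bool)
    tp = np.zeros(len(scores), dtype=bool)
    for d in np.argsort(-scores):            # highest-confidence detections first
        candidates = np.where(~gt_taken & (sim[d] >= thresh))[0]
        if candidates.size:                  # claim the best remaining ground truth
            g = candidates[np.argmax(sim[d, candidates])]
            gt_taken[g] = True
            tp[d] = True
    return tp
```

Accumulating the true/false-positive flags down the score-sorted list yields a precision-recall curve; AP then averages precision over recall levels and over a range of thresholds.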

1.2. Object Keypoint Similarity

For each object, ground truth keypoints have the form [x1,y1,v1,...,xk,yk,vk], where x,y are the keypoint locations and v is a visibility flag defined as v=0: not labeled, v=1: labeled but not visible, and v=2: labeled and visible. Each ground truth object also has a scale s, which we define as the square root of the object segment area. For details on the ground-truth format, please see the download page.
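As an illustration, a single ground-truth annotation can be unpacked as follows (a sketch reusing coco_gt from the earlier snippet; the image id is hypothetical):

```python
import numpy as np

ann = coco_gt.loadAnns(coco_gt.getAnnIds(imgIds=[42]))[0]  # hypothetical image id

kpts = np.array(ann["keypoints"], dtype=float).reshape(-1, 3)  # one (x, y, v) row per keypoint
xy = kpts[:, :2]           # keypoint locations
v = kpts[:, 2]             # 0: not labeled, 1: labeled but not visible, 2: visible
s = np.sqrt(ann["area"])   # object scale: square root of the segment area
labeled = v > 0            # only these keypoints enter the OKS
```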

For each object, the keypoint detector must output keypoint locations and an object-level confidence. Predicted keypoints for an object should have the same form as the ground truth: [x1,y1,v1,...,xk,yk,vk]. However, the detector's predicted vi are not currently used during evaluation; that is, the keypoint detector is not required to predict per-keypoint visibilities or confidences.
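Concretely, results loaded by loadRes above (or uploaded to the evaluation server) follow the standard COCO results format, one entry per detected person; the numbers here are dummy values:

```python
# One detection in the COCO keypoint results format (dummy values).
detection = {
    "image_id": 42,                         # id of the image being annotated
    "category_id": 1,                       # the COCO person category
    "keypoints": [230.5, 110.2, 2.0] * 17,  # flattened [x1, y1, v1, ..., xk, yk, vk];
                                            # the predicted v entries are ignored
    "score": 0.97,                          # object-level confidence used for ranking
}
```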

We define the object keypoint similarity (OKS) as:

$$\mathrm{OKS} \;=\; \frac{\sum_i \exp\!\big(-d_i^2 \,/\, 2 s^2 \kappa_i^2\big)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

The di are the Euclidean distances between each corresponding ground truth and detected keypoint, and the vi are the visibility flags of the ground truth (the detector's predicted vi are not used). To compute OKS, we pass the di through an unnormalized Gaussian with standard deviation sκi, where s is the object scale and κi is a per-keypoint constant that controls falloff. For each keypoint this yields a keypoint similarity that ranges between 0 and 1. These similarities are averaged over all labeled keypoints (keypoints for which vi>0). Keypoints that are not labeled in the ground truth (vi=0) do not affect the OKS. Perfect predictions will have OKS=1, and predictions for which all keypoints are off by more than a few standard deviations sκi will have OKS~0. The OKS is analogous to the IoU: given the OKS, we can compute AP and AR just as the IoU allows us to compute these metrics for box/segment detection.
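A direct transcription of the formula might look like the following sketch; the per-keypoint constants κi are taken as given here (the official pycocotools code ships tuned values per keypoint type):

```python
import numpy as np

def compute_oks(gt_kpts, dt_kpts, s, kappas):
    """OKS between one ground-truth and one detected object.
    `gt_kpts` and `dt_kpts` are (k, 3) arrays of (x, y, v) rows,
    `s` is the object scale (sqrt of segment area), and `kappas`
    is a length-k array of per-keypoint falloff constants."""
    v = gt_kpts[:, 2]                 # ground-truth visibility flags
    labeled = v > 0                   # unlabeled keypoints do not affect OKS
    if not labeled.any():
        return 0.0                    # simplification: COCOeval ignores such objects
    d2 = np.sum((gt_kpts[:, :2] - dt_kpts[:, :2]) ** 2, axis=1)  # squared distances
    ks = np.exp(-d2 / (2.0 * s**2 * kappas**2))                  # per-keypoint similarity
    return float(ks[labeled].mean())  # average over labeled keypoints only
```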
