Microsoft COCO: Common Objects in Context - Test Guidelines

http://cocodataset.org/#guidelines

Test Guidelines
The COCO data can be obtained from the download page. Each challenge has a different training / validation / testing set; details are provided on the download page and summarized here:
http://cocodataset.org/#download


2014 Train/Val       Detection 2015, Captioning 2015, Detection 2016, Keypoints 2016
2014 Testing         Captioning 2015
2015 Testing         Detection 2015, Detection 2016, Keypoints 2016
2017 Train/Val/Test  Detection 2017, Keypoints 2017, Stuff 2017, Detection 2018, Keypoints 2018, Stuff 2018, Panoptic 2018
2017 Unlabeled       [optional data for any competition]
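
For convenience, a minimal Python sketch for fetching the 2017 archives is shown below. The images.cocodataset.org URLs are an assumption based on the links listed on the download page at the time of writing, so check the download page for the current links before relying on them.

```python
import urllib.request

# Assumed archive URLs from the COCO download page; verify before use.
BASE = 'http://images.cocodataset.org'
archives = [
    f'{BASE}/zips/train2017.zip',                        # ~118K training images
    f'{BASE}/zips/val2017.zip',                          # ~5K validation images
    f'{BASE}/zips/test2017.zip',                         # ~40K test images
    f'{BASE}/annotations/annotations_trainval2017.zip',  # train/val annotations
    f'{BASE}/annotations/image_info_test2017.zip',       # test image info
]

for url in archives:
    filename = url.rsplit('/', 1)[-1]
    print(f'downloading {url} -> {filename}')
    urllib.request.urlretrieve(url, filename)
```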

The recommended training data for participating in any COCO challenge consists of the corresponding COCO training set. Validation data may also be used for training when submitting results on the test set (although starting in 2017 the validation set only has 5K images, so the benefits of this are minimal). Note that the 2017 train/val data includes the same images as the 2014 train/val data, just organized differently, so there is no benefit to using the 2014 training data for the 2017 competition.
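
If both the train and val sets are used, a minimal sketch of loading them together with the pycocotools COCO API might look like the following; the annotation file names follow the standard 2017 release, while the directory layout and the way samples are collected are purely illustrative assumptions.

```python
from pycocotools.coco import COCO

# Load the 2017 train and val annotations; both may be used for training
# when submitting results on the test set.
train = COCO('annotations/instances_train2017.json')
val = COCO('annotations/instances_val2017.json')

# Build one combined list of (image path, annotations) pairs for training.
samples = []
for coco, img_dir in [(train, 'train2017'), (val, 'val2017')]:
    for img_id in coco.getImgIds():
        info = coco.loadImgs(img_id)[0]
        anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
        samples.append((f"{img_dir}/{info['file_name']}", anns))

print(len(samples))  # roughly 118K train + 5K val images
```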

External data of any form is allowed. We emphasize that any form of annotation or use of the COCO test sets for supervised or unsupervised training is strictly forbidden. Note: please explicitly specify any and all external data used for training in the "method description" when uploading results to the evaluation server.

Test Set Splits

Prior to 2017, the test set had four splits (dev / standard / reserve / challenge). Starting in 2017, we simplified the test set to only the dev / challenge splits, with the other two splits removed. The original purpose of the four splits was to protect the integrity of the challenge while giving researchers flexibility to test their system. After multiple years of running the challenges, we saw no evidence of overfitting to specific splits (the output space complexity and the test set size protect against simple attacks such as wacky boosting). Therefore, we simplified participation in the challenges accordingly in 2017.
http://blog.mrtz.org/2015/03/09/competition.html

2017 Test Set Splits

The 2017 COCO test set consists of ~40K test images. The test set is divided into two roughly equally sized splits of ~20K images each: test-dev and test-challenge. Each is described in detail below. Additionally, when uploading to the evaluation servers, we now allow submission of results on the 5K val split for debugging the upload process. Note that the test set guidelines changed in 2017; see the 2015 guidelines for old usage information. The test splits in 2017 are as follows:

split            #imgs   submit limit   scores available   leaderboard
Val              ~5K     no limit       immediate          none
Test-Dev         ~20K    5 per day      immediate          year-round
Test-Challenge   ~20K    5 total        workshop           workshop

Test-Dev: The test-dev split is the default test data for testing under general circumstances. Results in papers should generally be reported on test-dev to allow for fair public comparison. The number of submissions per participant is limited to 5 uploads per day to avoid overfitting. Note that only a single submission per participant can be published to the public leaderboard (a paper, however, may report multiple test-dev results). The test-dev server will remain open year-round.

Test-Challenge: The test-challenge split is used for COCO challenges hosted on a yearly basis. Results are revealed during the relevant workshop (typically at ECCV or ICCV). The number of submissions per participant is limited to a maximum of 5 uploads total over the length of the challenge. If you submit multiple entries, the best result based on test-dev AP is selected as your entry for the competition. Note that only a single submission per participant can be published to the public leaderboard. The test-challenge server will remain open for a fixed amount of time prior to each year's competition.

The images belonging to each split are defined in image_info_test-dev2017 (for test-dev) and image_info_test2017 (for combined test-dev and test-challenge). Info for test-challenge images is not explicitly provided. Instead, results must be submitted on the full test set (both test-dev and test-challenge) when participating in the challenge. This serves two goals. First, participants get automatic feedback on their submission by seeing evaluation results on test-dev prior to the challenge workshop. Second, after the challenge workshop, it gives future participants an opportunity to compare against challenge entries on the test-dev split. We emphasize that when submitting to the full test set (image_info_test2017), results must be generated on all images without differentiating between the splits. Finally, we note that the 2017 dev / challenge splits contain the same images as the 2015 dev / challenge splits, so results across years are directly comparable.
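
To illustrate the expected workflow, here is a minimal sketch that enumerates the full test set from image_info_test2017 and writes a detection results file with the pycocotools COCO API. run_detector and the output file name are illustrative placeholders, not part of the official tooling.

```python
import json
from pycocotools.coco import COCO


def run_detector(file_name):
    # Placeholder: replace with your model's inference. Should return a list
    # of (category_id, [x, y, w, h], score) tuples for the given image.
    return []


# Enumerate every image in the combined test set (test-dev + test-challenge).
test_info = COCO('annotations/image_info_test2017.json')

results = []
for img_id in test_info.getImgIds():
    img = test_info.loadImgs(img_id)[0]
    for cat_id, bbox, score in run_detector(img['file_name']):
        results.append({
            'image_id': img_id,
            'category_id': cat_id,
            'bbox': [round(v, 2) for v in bbox],
            'score': round(float(score), 3),
        })

# The server expects one JSON array covering all test images, with no
# distinction between the test-dev and test-challenge splits.
with open('detections_test2017_mymethod_results.json', 'w') as f:
    json.dump(results, f)
```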

It is not acceptable to create multiple accounts for a single project to circumvent the submission upload limits. If a group publishes two papers describing unrelated methods, separate user accounts may be created. For challenges, a group may create multiple accounts only if submitting substantially different methods to the challenge (e.g., based on different papers). To debug the upload process, we allow participants to submit unlimited evaluation results on the val set.
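
In addition to debugging uploads against the val server, val-set results can also be scored locally. A minimal sketch using the pycocotools COCOeval class (assuming a detection results file in the standard COCO results format; the file name is just an example) is:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth for val2017 and a results file in COCO results format.
coco_gt = COCO('annotations/instances_val2017.json')
coco_dt = coco_gt.loadRes('detections_val2017_mymethod_results.json')

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR metrics reported by the server
```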

2015 Test Set Splits
This test set was used for 2015 and 2016 detection and keypoint challenges. It is no longer used and the evaluation servers are closed. However, for historical reference, you may obtain full information on the 2015 test splits by clicking here.

2014 Test Set Splits
The 2014 test set is only used for the captioning challenge. Please see the caption eval page for details.
http://cocodataset.org/#captions-2015
http://cocodataset.org/#captions-eval






