A Comprehensive List of Computer Vision Datasets - Part 2

Reposted from http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm

Index by Topic

Other helpful sites are:

Academic Torrents - computer vision - a set of 30+ large datasets available in BitTorrent form
Machine learning datasets - see CV tab
YACVID - a tagged index to some computer vision datasets

Hand, Hand Grasp, Hand Action and Gesture Databases

11k Hands - 11,076 hand images (1600 x 1200 pixels) of 190 subjects, of varying ages between 18 - 75, with metadata (id, gender, age, skin color, handedness, which hand, accessories, etc). (Mahmoud Afifi) [Before 28/12/19]
20bn-Jester - densely-labeled video clips that show humans performing predefined hand gestures in front of a laptop camera or webcam (Twenty Billion Neurons GmbH) [Before 28/12/19]
3D Articulated Hand Pose Estimation with Single Depth Images (Tang, Chang, Tejani, Kim, Yu) [Before 28/12/19]
A Dataset of Human Manipulation Actions - RGB-D of 25 objects and 6 actions (Alessandro Pieropan) [Before 28/12/19]
A Hand Gesture Detection Dataset (Javier Molina et al) [Before 28/12/19]
A-STAR Annotated Hand-Depth Image Dataset and its Performance Evaluation - depth data and data glove data, 29 images of 30 volunteers, Chinese number counting and American Sign Language (Xu and Cheng) [Before 28/12/19]
Bosphorus Hand Geometry Database and Hand-Vein Database (Bogazici University) [Before 28/12/19]
DemCare dataset - A diverse collection of data from different sensors, useful for human activity recognition from wearable/depth and static IP cameras, speech recognition for Alzheimer's disease detection, and physiological data for gait analysis and abnormality detection. (K. Avgerinakis, A. Karakostas, S. Vrochidis, I. Kompatsiaris) [Before 28/12/19]
DVS128 Gesture Dataset - Event-based dataset, containing sequences of 11 hand gestures, performed by 29 subjects under several illumination conditions, captured using a DVS128 sensor. Each sequence is annotated with the start and stop times of each gesture. (Amir, Taba, Berg, Melano, McKinstry, Di Nolfo, Nayak, Andreopoulos, Garreau, Mendoza, Kusnitz, Debole, Esser, Delbruck, Flickner, and Modha) [7/1/20]
EgoGesture Dataset - First-person view gestures with 83 classes, 50 subjects, 6 scenes, 24161 RGB-D video samples (Zhang, Cao, Cheng, Lu) [Before 28/12/19]
EgoHands - A large dataset with over 15,000 pixel-level-segmented hands recorded from egocentric cameras of people interacting with each other. (Sven Bambach) [Before 28/12/19]
EgoYouTubeHands dataset - An egocentric hand segmentation dataset consisting of 1290 annotated frames from YouTube videos recorded in unconstrained real-world settings. The videos vary in environment, number of participants, and actions. The dataset is useful for studying the hand segmentation problem in unconstrained settings. (Aisha Urooj, A. Borji) [Before 28/12/19]
FORTH Hand tracking library (FORTH) [Before 28/12/19]
General HANDS: general hand detection and pose challenge - 22 sequences with different gestures, activities and viewpoints (UC Irvine) [Before 28/12/19]
Grasp UNderstanding (GUN-71) dataset - 12,000 first-person RGB-D images of object manipulation scenes annotated using a taxonomy of 71 fine-grained grasps. (Rogez, Supancic and Ramanan) [Before 28/12/19]
Hand gesture and marine silhouettes (Euripides G.M. Petrakis) [Before 28/12/19]
HandNet: annotated depth images of articulated hands - 214971 annotated depth images of hand poses captured by a RealSense RGBD sensor. Annotations: per pixel classes, 6D fingertip pose, heatmap. Train: 202198, Test: 10000, Validation: 2773. Recorded at GIP Lab, Technion. [Before 28/12/19]
HandOverFace dataset - A hand segmentation dataset consisting of 300 annotated frames from the web to study the hand-occluding-face problem. (Aisha Urooj, A. Borji) [Before 28/12/19]
IDIAP Hand pose/gesture datasets (Sebastien Marcel) [Before 28/12/19]
Kinect and Leap motion gesture recognition dataset - The dataset contains 1400 different gestures acquired with both the Leap Motion and the Kinect devices (Giulio Marin, Fabio Dominio, Pietro Zanuttigh) [Before 28/12/19]
Kinect and Leap motion gesture recognition dataset - The dataset contains several different static gestures acquired with the Creative Senz3D camera. (A. Memo, L. Minto, P. Zanuttigh) [Before 28/12/19]
LISA CVRR-HANDS 3D - 19 gestures performed by 8 subjects as car driver and passengers (Ohn-Bar and Trivedi) [Before 28/12/19]
MPI Dexter 1 Dataset for Evaluation of 3D Articulated Hand Motion Tracking - Dexter 1: 7 sequences of challenging, slow and fast hand motions, RGB + depth (Sridhar, Oulasvirta, Theobalt) [Before 28/12/19]
MSR Realtime and Robust Hand Tracking from Depth - (Qian, Sun, Wei, Tang, Sun) [Before 28/12/19]
Mobile and Webcam Hand images database - MOHI and WEHI - 200 people, 30 images each (Ahmad Hassanat) [Before 28/12/19]
NTU-Microsoft Kinect HandGesture Dataset - This is a RGB-D dataset of hand gestures, 10 subjects x 10 hand gestures x 10 variations. (Zhou Ren, Junsong Yuan, Jingjing Meng, and Zhengyou Zhang) [Before 28/12/19]
NUIG_Palm1 - Database of palmprint images acquired in unconstrained conditions using consumer devices for palmprint recognition experiments. (Adrian-Stefan Ungureanu) [Before 28/12/19]
NYU Hand Pose Dataset - 8252 test-set and 72757 training-set frames of captured RGBD data with ground-truth hand-pose, 3 views (Tompson, Stein, Lecun, Perlin) [Before 28/12/19]
PRAXIS gesture dataset - RGB-D upper-body data from 29 gestures, 64 volunteers, several repetitions, many volunteers have some cognitive impairment (Farhood Negin, INRIA) [Before 28/12/19]
Rendered Handpose Dataset - Synthetic dataset for 2D/3D Handpose Estimation with RGB, depth, segmentation masks and 21 keypoints per hand (Christian Zimmermann and Thomas Brox) [Before 28/12/19]
Sahand Dynamic Hand Gesture Database - This database contains 11 Dynamic gestures designed to convey the functions of mouse and touch screens to computers. (Behnam Maleki, Hossein Ebrahimnezhad) [Before 28/12/19]
Sheffield gesture database - 2160 RGBD hand gesture sequences, 6 subjects, 10 gestures, 3 postures, 3 backgrounds, 2 illuminations (Ling Shao) [Before 28/12/19]
UT Grasp Data Set - 4 subjects grasping a variety of objects with a variety of grasps (Cai, Kitani, Sato) [Before 28/12/19]
Yale human grasping data set - 27 hours of video with tagged grasp, object, and task data from two housekeepers and two machinists (Bullock, Feix, Dollar) [Before 28/12/19]

Image, Video and Shape Database Retrieval

2D-to-3D Deformable Sketches - A collection of deformable 2D contours in pointwise correspondence with deformable 3D meshes of the same class; around 10 object classes are provided, including humans and animals. (Lahner, Rodola) [Before 28/12/19]
3D Deformable Objects in Clutter - A dataset for 3D deformable object-in-clutter, with point-wise ground truth correspondence across hundreds of scenes and spanning multiple classes (humans, animals). (Cosmo, Rodola, Masci, Torsello, Bronstein) [Before 28/12/19]
ANN_SIFT1M - 1M Flickr images encoded by 128D SIFT descriptors (Jegou et al) [Before 28/12/19]
Brown Univ 25/99/216 Shape Databases (Ben Kimia) [Before 28/12/19]
CIFAR-10 - 60K 32x32 images from 10 classes, with a 512D GIST descriptor (Alex Krizhevsky) [Before 28/12/19]
CLEF-IP 2011 evaluation on patent images [Before 28/12/19]
Contour Drawing Dataset - a dataset of 5,000 paired images and contour drawings for the study of visual understanding and sketch generation (Li, Lin, Měch, Yumer, and Ramanan) [9/1/20]
DeepFashion - Large-scale Fashion Database (Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang) [Before 28/12/19]
EMODB - Thumbnails of images in the picsearch image search engine together with the picsearch emotion keywords (Reiner Lenz et al.) [Before 28/12/19]
ETU10 Silhouette Dataset - The dataset consists of 720 silhouettes of 10 objects, with 72 views per object. (M. Akimaliev and M.F. Demirci) [Before 28/12/19]
European Flood 2013 - 3,710 images of a flood event in central Europe, annotated with relevance regarding 3 image retrieval tasks (multi-label) and important image regions. (Friedrich Schiller University Jena, Deutsches GeoForschungsZentrum Potsdam) [Before 28/12/19]
Fashion-MNIST - A MNIST-like fashion product database. (Han Xiao, Zalando Research) [Before 28/12/19]
Fish Shape Database - A fish shape database with 100 2D point-set shapes. (Adrian M. Peter) [Before 28/12/19]
Flickr 30K - images, actions and captions (Peter Young et al) [Before 28/12/19]
Flickr15k - Sketch based Image Retrieval (SBIR) Benchmark - Dataset of 330 sketches and 15,024 photos comprising 33 object categories, a benchmark dataset commonly used to evaluate Sketch based Image Retrieval (SBIR) algorithms. (Hu and Collomosse, CVIU 2013) [Before 28/12/19]
Hands in action (HIC) IJCV dataset - Data (images, models, motion) for tracking 1 hand or 2 hands with or without 1 object. Includes both single-view RGB-D sequences (1 subject, >18 annotated sequences, 4 objects, complete RGB image) and multi-view RGB sequences (1 subject, HD, 8 views, 8 sequences - 1 annotated, 2 objects). (Dimitrios Tzionas, Luca Ballan, Abhilash Srikantha, Pablo Aponte, Marc Pollefeys, Juergen Gall) [Before 28/12/19]
IAPR TC-12 Image Benchmark (Michael Grubinger) [Before 28/12/19]
IAPR-TC12 Segmented and annotated image benchmark (SAIAPR TC-12): (Hugo Jair Escalante) [Before 28/12/19]
ImageCLEF 2010 Concept Detection and Annotation Task (Stefanie Nowak) [Before 28/12/19]
ImageCLEF 2011 Concept Detection and Annotation Task - multi-label classification challenge in Flickr photos [Before 28/12/19]
INRIA Copydays dataset - for evaluation of copy detection: JPEG, cropping and "strong" copy attacks. (INRIA) [Before 28/12/19]
INRIA Holidays dataset - for evaluation of image search: 500 queries and 991 corresponding relevant images (Jegou, Douze and Schmid) [Before 28/12/19]
MA14KD (Movie Attraction 14K Dataset) Dataset - 14K movie/TV trailers, 10 features each, links to a rating dataset (Elahi, Moghaddam, Hosseini, Trattner, Tkalčič) [Before 28/12/19]
METU Trademark dataset - The METU Dataset is composed of more than 900K real logos belonging to companies worldwide. (Usta Bilgi Sistemleri A.S. and Grup Ofis Marka Patent A.S) [Before 28/12/19]
McGill 3D Shape Benchmark (Siddiqi, Zhang, Macrini, Shokoufandeh, Bouix, Dickinson) [Before 28/12/19]
MPI MANO & SMPL+H dataset - Models, 4D scans and registrations for the statistical models MANO (hand-only) and SMPL+H (body+hands). For MANO there are ~2k static 3D scans of 31 subjects performing up to 51 poses. For SMPL+H we include 39 4D sequences of 11 subjects. (Javier Romero, Dimitrios Tzionas and Michael J Black) [Before 28/12/19]
Multiview Stereo Evaluation - Each dataset is registered with a "ground-truth" 3D model acquired via a laser scanning process (Steve Seitz et al) [Before 28/12/19]
NIST SHREC - 2014 NIST retrieval contest databases and links (USA National Institute of Standards and Technology) [Before 28/12/19]
NIST SHREC - 2013 NIST retrieval contest databases and links (USA National Institute of Standards and Technology) [Before 28/12/19]
NIST SHREC 2010 - Shape Retrieval Contest of Non-rigid 3D Models (USA National Institute of Standards and Technology) [Before 28/12/19]
NIST TREC Video Retrieval Evaluation Database (USA National Institute of Standards and Technology) [Before 28/12/19]
NUS-WIDE - 269K Flickr images annotated with 81 concept tags, encoded as a 500D BoVW descriptor (Chua et al) [Before 28/12/19]
Princeton Shape Benchmark (Princeton Shape Retrieval and Analysis Group) [Before 28/12/19]
PairedFrames - evaluation of 3D pose tracking error - Synthetic and Real dataset to test 3D pose tracking/refinement with pose initialization close/far to/from minima. Establishes testing frame pairs of increasing difficulty, to measure the pose estimation error separately, without employing a full tracking pipeline. (Dimitrios Tzionas, Juergen Gall) [Before 28/12/19]
Queensland cross media dataset - millions of images and text documents for "cross-media" retrieval (Yi Yang) [Before 28/12/19]
Reconstructing Articulated Rigged Models from RGB-D Videos (RecArt-D) - Dataset of objects deforming during manipulation. Includes 4 RGB-D sequences (RGB image complete), result of deformable tracking for each object, as well as 3D mesh and Ground-Truth 3D skeleton for each object. (Dimitrios Tzionas, Juergen Gall) [Before 28/12/19]
Reconstruction from Hand-Object Interactions (R-HOI) - Dataset of one hand interacting with an unknown object. Includes 4 RGB-D sequences, in total 4 objects, the RGB image is complete. Includes tracked 3D motion and Ground-Truth meshes for the objects. (Dimitrios Tzionas, Juergen Gall) [Before 28/12/19]
Revisiting Oxford and Paris (RevisitOP) - Improved and more challenging version (fixed errors, new annotation and evaluation protocols, new query images) of the well known landmark/building retrieval datasets accompanied with 1M distractor images. (F. Radenovic, A. Iscen, G. Tolias, Y. Avrithis, O. Chum) [Before 28/12/19]
SHREC'16 Deformable Partial Shape Matching - A collection of around 400 3D deformable shapes undergoing strong partiality transformations, with point-to-point ground truth correspondence included. (Cosmo, Rodola, Bronstein, Torsello) [Before 28/12/19]
SHREC 2016 - 3D Sketch-Based 3D Shape Retrieval - data to evaluate the performance of different 3D sketch-based 3D model retrieval algorithms using a hand-drawn 3D sketch query dataset on a generic 3D model dataset (Bo Li) [Before 28/12/19]
SHREC'17 Deformable Partial Shape Retrieval - A collection of around 4000 deformable 3D shapes undergoing severe partiality transformations, in the form of irregular missing parts and range data; ground truth class information is provided. (Lahner, Rodola) [Before 28/12/19]
SHREC Watertight Models Track (of SHREC 2007) - 400 watertight 3D models (Daniela Giorgi) [Before 28/12/19]
SHREC Partial Models Track (of SHREC 2007) - 400 watertight 3D DB models and 30 reduced watertight query models (Daniela Giorgi) [Before 28/12/19]
SBU Captions Dataset - image captions collected for 1 million images from Flickr (Ordonez, Kulkarni and Berg) [Before 28/12/19]
Sketch me That Shoe - Sketch-based object retrieval in a fine-grained setting. Match sketches to specific shoes and chairs. (Qian Yu, QMUL, T. Hospedales Edinburgh/QMUL) [Before 28/12/19]
TOSCA 3D shape database (Bronstein, Bronstein, Kimmel) [Before 28/12/19]
Totally Looks Like - A benchmark for assessment of predicting human-based image similarity (Amir Rosenfeld, Markus D. Solbach, John Tsotsos) [Before 28/12/19]
UCF-CrossView Dataset: Cross-View Image Matching for Geo-localization in Urban Environments - A new dataset of street view and bird's eye view images for cross-view image geo-localization. (Center for Research in Computer Vision, University of Central Florida) [Before 28/12/19]
YouTube-8M Dataset - A Large and Diverse Labeled Video Dataset for Video Understanding Research. (Google Inc.) [Before 28/12/19]

Object Databases

2.5D/3D Datasets of various objects and scenes (Ajmal Mian) [Before 28/12/19]
3D Object Recognition Stereo Dataset - This dataset consists of 9 objects and 80 test images. (Akash Kushal and Jean Ponce) [Before 28/12/19]
3D Photography Dataset - a collection of ten multiview data sets captured in our lab (Yasutaka Furukawa and Jean Ponce) [Before 28/12/19]
3D-Printed RGB-D Object Dataset - 5 objects with groundtruth CAD models and camera trajectories, recorded with various quality RGB-D sensors (Siemens & TUM) [Before 28/12/19]
3DNet Dataset - The 3DNet dataset is a free resource for object class recognition and 6DOF pose estimation from point cloud data. (John Folkesson et al.) [Before 28/12/19]
ABC Dataset - A million CAD models, including ground analytical descriptions (spline patches), dense meshes, point clouds, normals. (Koch, Matveev, Jiang, Williams, Artemov, Burnaev, Alexa, Zorin, Panozzo) [2/1/20]
Aligned 2.5D/3D datasets of various objects - Synthesized and real-world datasets for object reconstruction from a single depth view. (Bo Yang, Stefano Rosa, Andrew Markham, Niki Trigoni, Hongkai Wen) [Before 28/12/19]
Amsterdam Library of Object Images (ALOI): 100K views of 1K objects (University of Amsterdam/Intelligent Sensory Information Systems) [Before 28/12/19]
Animals with Attributes 2 - 37322 (freely licensed) images of 50 animal classes with 85 per-class binary attributes. (Christoph H. Lampert, IST Austria) [Before 28/12/19]
ASU Office-Home Dataset - Object recognition dataset of everyday objects for domain adaptation (Venkateswara, Eusebio, Chakraborty, Panchanathan) [Before 28/12/19]
B3DO: Berkeley 3-D Object Dataset - household object detection (Janoch et al) [Before 28/12/19]
Bristol Egocentric Object Interactions Dataset - egocentric object interactions with synchronised gaze (Dima Damen) [Before 28/12/19]
CIFAR-10H - a new dataset of soft labels reflecting human perceptual uncertainty for the 10,000-image CIFAR-10 test set (Peterson, Battleday, Griffiths, Russakovsky) [14/1/20]
CORE image dataset - to help learn more detailed models and for exploring cross-category generalization in object recognition. (Ali Farhadi, Ian Endres, Derek Hoiem, and David A. Forsyth) [Before 28/12/19]
CTU Color and Depth Image Dataset of Spread Garments - Images of spread garments with annotated corners. (Wagner, L., Krejov D., and Smutn V. (Czech Technical University in Prague)) [Before 28/12/19]
Caltech 101 (now 256) category object recognition database (Li Fei-Fei, Marco Andreeto, Marc'Aurelio Ranzato) [Before 28/12/19]
Catania Fish Species Recognition - 15 fish species, with about 20,000 sample training images and additional test images (Concetto Spampinato) [Before 28/12/19]
COCO-Stuff dataset - 164K images labeled with 'things' and 'stuff' (Caesar, Uijlings, Ferrari) [Before 28/12/19]
Columbia COIL-100 3D object multiple views (Columbia University) [Before 28/12/19]
Country Flags in the Wild - 12,854 train images and 6,110 test images of the flags of 224 different countries manually cropped to loosely fit to the inlying flags. (Jetley) [Before 28/12/19]
COWC - Cars Overhead with Context. 32,716 unique annotated cars. 58,247 unique negative examples. 15 cm per pixel resolution, from six distinct locations. (Lawrence Livermore National Laboratory) [Before 28/12/19]
Deeper, Broader and Artier Domain Generalization - Domain generalisation task dataset. (Da Li, QMUL) [Before 28/12/19]
Densely sampled object views: 2500 views of 2 objects, eg for view-based recognition and modeling (Gabriele Peters, Universiteit Dortmund) [Before 28/12/19]
Edinburgh Kitchen Utensil Database - 897 raw and binary images of 20 categories of kitchen utensil, a resource for training future domestic assistance robots (D. Fullerton, A. Goel, R. B. Fisher) [Before 28/12/19]
EDUB-Obj - Egocentric dataset for object localization and segmentation. (Marc Bolaños and Petia Radeva.) [Before 28/12/19]
Ellipse finding dataset (Dilip K. Prasad et al) [Before 28/12/19]
FGVC-Aircraft Benchmark - 10,200 images of aircraft, with 100 images for each of 102 different aircraft model variants (Maji, Kannala, Rahtu, Blaschko, Vedaldi) [Before 28/12/19]
FIN-Benthic - This is a dataset for automatic fine-grained classification of benthic macroinvertebrates. There are 15074 images from 64 categories. The number of images per category varies from 577 to 7. (Jenni Raitoharju, Ekaterina Riabchenko, Iftikhar Ahmad, Alexandros Iosifidis, Moncef Gabbouj, Serkan Kiranyaz, Ville Tirronen, Johanna Arje) [Before 28/12/19]
GERMS - The object set we use for GERMS data collection consists of 136 stuffed toys of different microorganisms. The toys are divided into 7 smaller categories, formed by semantic division of the toy microbes. The motivation for dividing the objects into smaller categories is to provide benchmarks with different degrees of difficulty. (Malmir M, Sikka K, Forster D, Movellan JR, Cottrell G.) [Before 28/12/19]
GDXray: X-ray images for X-ray testing and Computer Vision - GDXray includes five groups of images: Castings, Welds, Baggage, Nature and Settings. (Domingo Mery, Catholic University of Chile) [Before 28/12/19]
GMU Kitchens Dataset - instance level annotation of 11 common household products from BigBird dataset across 9 different kitchens (George Mason University) [Before 28/12/19]
Grasping In The Wild - Egocentric video dataset of natural everyday life objects. 16 objects in 7 kitchens. (Benois-Pineau, Larrousse, de Rugy) [Before 28/12/19]
GRAZ-02 Database (Bikes, cars, people) (A. Pinz) [Before 28/12/19]
GREYC 3D - The GREYC 3D Colored mesh database is a set of 15 real objects with different colors, geometries and textures that were acquired using a 3D color laser scanner. (Anass Nouri, Christophe Charrier, Olivier Lezoray) [Before 28/12/19]
GTSDB: German Traffic Sign Detection Benchmark and GTSRB: German Traffic Sign Recognition Benchmark (Ruhr-Universitat Bochum) [Before 28/12/19]
ICubWorld - iCubWorld datasets are collections of images acquired by recording from the cameras of the iCub humanoid robot while it observes daily objects. (Giulia Pasquale, Carlo Ciliberto, Giorgio Metta, Lorenzo Natale, Francesca Odone and Lorenzo Rosasco.) [Before 28/12/19]
Industrial 3D Object Detection Dataset (MVTec ITODD) - depth and gray value data of 28 objects in 3500 labeled scenes for 3D object detection and pose estimation with a strong focus on industrial settings and applications (MVTec Software GmbH, Munich) [Before 28/12/19]
Instagram Food Dataset - A database of 800,000 food images and associated metadata posted to Instagram over a 6-week period. Supports food type recognition and social network analysis. (T. Hospedales. Edinburgh/QMUL) [Before 28/12/19]
Keypoint-5 dataset - a dataset of five kinds of furniture with their 2D keypoint labels (Jiajun Wu, Tianfan Xue, Joseph Lim, Yuandong Tian, Josh Tenenbaum, Antonio Torralba, Bill Freeman) [Before 28/12/19]
KTH-3D-TOTAL - RGB-D Data with objects on desktops annotated. 20 Desks, 3 times per day, over 19 days. (John Folkesson et al.) [Before 28/12/19]
Laval 6 DOF Object Tracking Dataset - A Dataset of 297 RGB-D sequences with 11 objects for 6 DOF object Tracking. (Mathieu Garon, Denis Laurendeau, Jean-Francois Lalonde) [Before 28/12/19]
LISA Traffic Light Dataset - 6 light classes in various lighting conditions (Jensen, Philipsen, Mogelmose, Moeslund, and Trivedi) [Before 28/12/19]
LISA Traffic Sign Dataset - video of 47 US sign types with 7855 annotations on 6610 frames (Mogelmose, Trivedi, and Moeslund) [Before 28/12/19]
Linkoping 3D Object Pose Estimation Database (Fredrik Viksten and Per-Erik Forssen) [Before 28/12/19]
Linkoping Traffic Signs Dataset - 3488 traffic signs in 20K images (Larsson and Felsberg) [Before 28/12/19]
Longterm Labeled - This dataset contains a subset of the observations from the longterm dataset (longterm dataset above). (John Folkesson et al.) [Before 28/12/19]
Main Product Detection Dataset - Contains textual metadata of fashion products and their images with bounding boxes of the main product (the one referred by the text). (A. Rubio, L. Yu, E. Simo-Serra and F. Moreno-Noguer) [Before 28/12/19]
MCIndoor20000 - 20,000 digital images from three different indoor object categories: doors, stairs, and hospital signs. (Bashiri, LaRose, Peissig, and Tafti) [Before 28/12/19]
Mexculture142 - Mexican Cultural heritage objects and eye-tracker gaze fixations (Montoya Obeso, Benois-Pineau, Garcia-Vazquez, Ramirez Acosta) [Before 28/12/19]
MIT CBCL Car Data (Center for Biological and Computational Learning) [Before 28/12/19]
MIT CBCL StreetScenes Challenge Framework: (Stan Bileschi) [Before 28/12/19]
Microsoft COCO - Common Objects in Context (Tsung-Yi Lin et al) [Before 28/12/19]
Microsoft Object Class Recognition image databases (Antonio Criminisi, Pushmeet Kohli, Tom Minka, Carsten Rother, Toby Sharp, Jamie Shotton, John Winn) [Before 28/12/19]
Microsoft salient object databases (labeled by bounding boxes) (Liu, Sun Zheng, Tang, Shum) [Before 28/12/19]
Moving Labeled - This dataset extends the longterm dataset with more locations within the same office environment at KTH. (John Folkesson et al.) [Before 28/12/19]
NABirds Dataset - 70,000 annotated photographs of the 400 species of birds commonly observed in North America (Grant Van Horn) [Before 28/12/19]
NEC Toy animal object recognition or categorization database (Hossein Mobahi) [Before 28/12/19]
NORB 50 toy image database (NYU) [Before 28/12/19]
NTU-VOI: NTU Video Object Instance Dataset - video clips with frame-level bounding box annotations of object instances for evaluating object instance search and localization in large scale videos. (Jingjing Meng et al.) [Before 28/12/19]
Object Pose Estimation Database - This database contains 16 objects, each sampled at 5 degree angle increments along two rotational axes (F. Viksten et al.) [Before 28/12/19]
Object Recognition Database - This database features modeling shots of eight objects and 51 cluttered test shots containing multiple objects. (Fred Rothganger, Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce) [Before 28/12/19]
Omniglot - 1623 different handwritten characters from 50 different alphabets (Lake, Salakhutdinov, Tenenbaum) [Before 28/12/19]
Open Images Dataset V4 - 15,440,132 boxes on 600 categories, 30,113,078 image-level labels on 19,794 categories. (Ferrari, Duerig, Gomes) [Before 28/12/19]
Open Museum Identification Challenge (Open MIC) - Open MIC contains photos of exhibits captured in 10 distinct exhibition spaces (painting, sculptures, jewellery, etc.) of several museums and the protocols for the domain adaptation and few-shot learning problems. (P. Koniusz, Y. Tas, H. Zhang, M. Harandi, F. Porikli, R. Zhang) [Before 28/12/19]
Osnabrück Synthetic Scalable Cube Dataset - 830000 different cubes captured from 12 different viewpoints for ANN training (Schöning, Behrens, Faion, Kheiri, Heidemann & Krumnack) [Before 28/12/19]
Princeton ModelNet - 127,915 CAD Models, 662 Object Categories, 10 Categories with Annotated Orientation (Wu, Song, Khosla, Yu, Zhang, Tang, Xiao) [Before 28/12/19]
PacMan datasets - RGB and 3D synthetic and real data for graspable cookware and crockery (Jeremy Wyatt) [Before 28/12/19]
PACS (Photo Art Cartoon Sketch) - An object category recognition dataset for testing domain generalisation: How well can a classifier trained on object images in one domain recognise objects in another domain? (Da Li QMUL, T. Hospedales. Edinburgh/QMUL) [Before 28/12/19]
PASCAL 2007 Challenge Image Database (motorbikes, cars, cows) (PASCAL Consortium) [Before 28/12/19]
PASCAL 2008 Challenge Image Database (PASCAL Consortium) [Before 28/12/19]
PASCAL 2009 Challenge Image Database (PASCAL Consortium) [Before 28/12/19]
PASCAL 2010 Challenge Image Database (PASCAL Consortium) [Before 28/12/19]
PASCAL 2011 Challenge Image Database (PASCAL Consortium) [Before 28/12/19]
PASCAL 2012 Challenge Image Database - Category classification, detection, and segmentation, and still-image action classification (PASCAL Consortium) [Before 28/12/19]
PASCAL Image Database (motorbikes, cars, cows) (PASCAL Consortium) [Before 28/12/19]
PASCAL Parts dataset - PASCAL VOC with segmentation annotation for semantic parts of objects (Alan Yuille) [Before 28/12/19]
PASCAL-Context dataset - annotations for 400+ additional categories (Alan Yuille) [Before 28/12/19]
PASCAL 3D/Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild - 12 classes, 3000+ images each with 3D annotations (Yu Xiang, Roozbeh Mottaghi, Silvio Savarese) [Before 28/12/19]
Physics 101 dataset - a video dataset of 101 objects in five different scenarios (Jiajun Wu, Joseph Lim, Hongyi Zhang, Josh Tenenbaum, Bill Freeman) [Before 28/12/19]
Plant seedlings dataset - High-resolution images of 12 weed species. (Aarhus University) [Before 28/12/19]
Raindrop Detection - Improved Raindrop Detection using Combined Shape and Saliency Descriptors with Scene Context Isolation - Evaluation Dataset (Breckon, Toby P., Webster, Dereck D.) [Before 28/12/19]
ReferIt Dataset (IAPRTC-12 and MS-COCO) - referring expressions for objects in images from the IAPRTC-12 and MS-COCO datasets (Kazemzadeh, Matten, Ordonez, and Berg) [Before 28/12/19]
SAIL-VOS - The Semantic Amodal Instance Level Video Object Segmentation (SAIL-VOS) dataset provides accurate ground truth annotations for developing methods that reason about occluded parts of objects while taking temporal information into account (Hu, Chen, Hui, Huang, Schwing) [29/12/19]
SeaShips - 31455 side images of boats near land, from 7 classes, extracted from surveillance video (Shao, Wu, Wang, Du, Li) [Before 28/12/19]
ShapeNet - 3D models of 55 common object categories with about 51K unique 3D models. Also 12K models over 270 categories. (Princeton, Stanford and TTIC) [Before 28/12/19]
SHORT-100 dataset - 100 categories of products found on a typical shopping list. It aims to benchmark the performance of algorithms for recognising hand-held objects from either snapshots or videos acquired using hand-held or wearable cameras. (Jose Rivera-Rubio, Saad Idrees, Anil A. Bharath) [Before 28/12/19]
SOR3D - The SOR3D dataset consists of over 20k instances of human-object interactions, 14 object types, and 13 object affordances. (Spyridon Thermos) [Before 28/12/19]
Stanford Dogs Dataset - The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization. (Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, Li Fei-Fei, Stanford University) [Before 28/12/19]
SVHN: Street View House Numbers Dataset - like MNIST, but an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). (Netzer, Wang, Coates, Bissacco, Wu, Ng) [Before 28/12/19]
Swedish Leaf Dataset - These images contain leaves from 15 tree classes (Oskar J. O. Söderkvist) [Before 28/12/19]
T-LESS - An RGB-D dataset for 6D pose estimation of texture-less objects. (Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis Lourakis, Xenophon Zabulis) [Before 28/12/19]
Taobao Commodity Dataset - TCD contains 800 commodity images (dresses, jeans, T-shirts, shoes and hats) for image salient object detection from the shops on the Taobao website. (Keze Wang, Keyang Shi, Liang Lin, Chenglong Li) [Before 28/12/19]
tieredImageNet dataset - a larger subset of ILSVRC-12 with 608 classes (779,165 images) grouped into 34 higher-level nodes in the ImageNet human-curated hierarchy. (Ren, Triantafillou, Ravi, Snell, Swersky, Tenenbaum, Larochelle, Zemel) [17/1/20]
ToolArtec point clouds - 50 kitchen tool 3D scans (ply) from an Artec EVA scanner. See also ToolKinect - 13 scans using a Kinect 2 and ToolWeb - 116 point clouds of synthetic household tools with mass and affordance groundtruth for 5 tasks. (Paulo Abelha) [Before 28/12/19]
TUW Object Instance Recognition Dataset - Annotations of object instances and their 6DoF pose for cluttered indoor scenes observed from various viewpoints and represented as Kinect RGB-D point clouds (Thomas, A. Aldoma, M. Zillich, M. Vincze) [Before 28/12/19]
TUW data sets - Several RGB-D Ground truth and annotated data sets from TUW. (John Folkesson et al.) [Before 28/12/19]
UAH Traffic Signs Dataset (Arroyo et al.) [Before 28/12/19]
UIUC Car Image Database (UIUC) [Before 28/12/19]
UIUC Dataset of 3D object categories (S. Savarese and L. Fei-Fei) [Before 28/12/19]
USPS Handwritten Digits dataset - 7291 train and 2007 test images. The images are 16*16 grayscale pixels (Hull) [Before 28/12/19]
VAIS - VAIS contains simultaneously acquired unregistered thermal and visible images of ships acquired from piers, and it was created to facilitate autonomous ship development. (Mabel Zhang, Jean Choi, Michael Wolf, Kostas Daniilidis, Christopher Kanan) [Before 28/12/19]
Venezia 3D object-in-clutter recognition and segmentation (Emanuele Rodola) [Before 28/12/19]
Visual Attributes Dataset - visual attribute annotations for over 500 object classes (animate and inanimate) which are all represented in ImageNet. Each object class is annotated with visual attributes based on a taxonomy of 636 attributes (e.g., has fur, made of metal, is round). [Before 28/12/19]
Visual Hull Data Sets - a collection of visual hull datasets (Svetlana Lazebnik, Yasutaka Furukawa, and Jean Ponce) [Before 28/12/19]
VOC-360 - Dataset for object detection and segmentation in fisheye images (Fu, Bajic, and Vaughan) [29/12/19]
YCB Benchmarks – Object and Model Set - 77 objects in 5 categories (food, kitchen, tool, shape, task) each with 600 RGBD and high-res RGB images, calibration data, segmentation masks, mesh models (Calli, Dollar, Singh, Walsman, Srinivasa, Abbeel) [Before 28/12/19]
YouTube-BoundingBoxes - 5.6 million accurate human-annotated BB from 23 object classes tracked across frames, from 240,000 YouTube videos, with a strong focus on the person class (1.3 million boxes) (Real, Shlens, Pan, Mazzocchi, Vanhoucke, Khan, Kakarla et al) [Before 28/12/19]

People (static and dynamic), human body pose

3D articulated body - 3D reconstruction of an articulated body with rotation and translation. Single camera, varying focal length. Every scene may have an articulated body moving. Four kinds of data sets are included. A sample reconstruction result is included, which uses only four images of the scene. (Prof Jihun Park) [Before 28/12/19]
BUFF dataset - About 10K scans of people in clothing and the estimated body shape of people underneath. Scans contain texture so synthetic videos/images are easy to generate. (Zhang, Pujades, Black and Pons-Moll) [Before 28/12/19]
CASR: Cyclist Arm Sign Recognition - Small clips of ~10 seconds showing cyclists performing arm signs. The videos are acquired with a consumer-grade camera. There are 219 arm sign actions annotated. (Zhijie Fang, Antonio M. Lopez) [13/1/20]
Dynamic Dyna - More than 40K 4D 60fps high resolution scans and models of people very accurately registered. Scans contain texture so synthetic videos/images are easy to generate. (Pons-Moll, Romero, Mahmood and Black) [Before 28/12/19]
Dynamic Faust - More than 40K 4D 60fps high resolution scans of people very accurately registered. Scans contain texture so synthetic videos/images are easy to generate. (Bogo, Romero, Pons-Moll and Black) [Before 28/12/19]
EHF dataset - 100 curated frames (+ code) of one subject in minimal clothing performing various expressive poses involving the body, hands and face. Each frame contains a full-body RGB image, detected 2D OpenPose features (body, hands, face), a 3D scan of the subject, and a 3D SMPL-X mesh as pseudo ground-truth (Pavlakos, Choutas, Ghorbani, Bolkart, Osman, Tzionas, Black) [Before 28/12/19]
Extended Chictopia dataset - 14K image Chictopia dataset with additional processed annotations (face) and SMPL body model fits to the images. (Lassner, Pons-Moll and Gehler) [Before 28/12/19]
Frames Labeled In Cinema (FLIC) - 20928 frames labeled with human pose (Sapp, Taskar) [Before 28/12/19]
GPA: geometric pose affordance dataset - Dataset of real 3D people interacting with real 3D scenes. 300k static RGB frames of 13 subjects in 8 scenes with ground-truth scene meshes. The motion capture scripts focus on the interaction between subject and scene geometry, human dynamics, and mimicking human actions with the surrounding scene geometry. (Wang, Chen, Rathore, Shin, Fowlkes) [29/12/19]
KIDS dataset - A collection of 30 high-resolution 3D shapes undergoing nearly-isometric and non-isometric deformations, with point-to-point ground truth as well as ground truth for left-to-right bilateral symmetry. (Rodola, Rota Bulo, Windheuser, Vestner, Cremers) [Before 28/12/19]
Kinect2 Human Pose Dataset (K2HPD) - Kinect2 Human Pose Dataset (K2HPD) includes about 100K depth images with various human poses under challenging scenarios. (Keze Wang, Liang Lin, Shengfu Zhai, Dengke Dong) [Before 28/12/19]
Leeds Sports Pose Dataset - 2000 pose annotated images of mostly sports people (Johnson, Everingham) [Before 28/12/19]
Look into Person Dataset - 50,000 images with elaborated pixel-wise annotations with 19 semantic human part labels and 2D human poses with 16 key points. (Gong, Liang, Zhang, Shen, Lin) [Before 28/12/19]
Manga109: manga (comic) dataset - 109 volumes, more than 21,000 pages (Kiyoharu Aizawa) [29/12/19]
Mannequin in-bed pose datasets via RGB webcam - This in-bed pose dataset is collected via regular webcam in a simulated hospital room at Northeastern University. (Shuangjun Liu and Sarah Ostadabbas, ACLab) [Before 28/12/19]
Mannequin IRS in-bed dataset - This in-bed pose dataset is collected via our infrared selective (IRS) system in a simulated hospital room at Northeastern University. (Shuangjun Liu and Sarah Ostadabbas, ACLab) [Before 28/12/19]
MoPoTS-3D - Multi-person 3D body pose benchmark for monocular RGB based methods, with 20 sequences in indoor and outdoor settings (MPI For Informatics) [Before 28/12/19]
MPI-INF-3DHP - Single-person 3D body pose dataset and evaluation benchmark, with extensive pose coverage across a broad set of activities, and extensive scope of appearance augmentation. Multi-view RGB frames are available for the training set, and monocular view frames for the test set. (MPI For Informatics) [Before 28/12/19]
MPI MANO & SMPL+H dataset - Models, 4D scans and registrations for the statistical models MANO (hand-only) and SMPL+H (body+hands). For MANO there are ~2k static 3D scans of 31 subjects performing up to 51 poses. For SMPL+H we include 39 4D sequences of 11 subjects. (Javier Romero, Dimitrios Tzionas and Michael J Black) [Before 28/12/19]
MPII Human Pose Dataset - 25K images containing over 40K people with annotated body joints, 410 human activities (Andriluka, Pishchulin, Gehler, Schiele) [Before 28/12/19]
MPII Human Pose Dataset - MPII Human Pose dataset is a de-facto standard benchmark for evaluation of articulated human pose estimation. (Mykhaylo Andriluka, Leonid Pishchulin, Peter Gehler, Bernt Schiele) [Before 28/12/19]
MuCo-3DHP - Large scale dataset of composited multi-person RGB images with 3D pose annotations, generated from MPI-INF-3DHP dataset (MPI For Informatics) [Before 28/12/19]
MVOR: A Multi-view Multi-person RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation - multi-view images captured by 3 RGB-D cameras during real clinical interventions (Padoy) [Before 28/12/19]
People In Photo Albums - Social media photo dataset with images from Flickr, and manual annotations on person heads and their identities. (Ning Zhang and Manohar Paluri and Yaniv Taigman and Rob Fergus and Lubomir Bourdev) [Before 28/12/19]
People Snapshot Dataset - Monocular video of 24 subjects rotating in front of a fixed camera. Annotation in form of segmentation and 2D joint positions is provided. (Alldieck, Magnor, Xu, Theobalt, Pons-Moll) [Before 28/12/19]
Person Recognition in Personal Photo Collections - we introduced three harder splits for evaluation and long-term attribute annotations and per-photo timestamp metadata. (Oh, Seong Joon and Benenson, Rodrigo and Fritz, Mario and Schiele, Bernt) [Before 28/12/19]
Pointing'04 ICPR Workshop Head Pose Image Database [Before 28/12/19]
Pose estimation - This dataset has a total of 155,530 images. These images were obtained through the recording of members of CIDIS, in 4 sessions. In total, 10 videos with a duration of 4 minutes each were obtained. The participants were asked to bring different clothes, in order to give variety to the images. After this, the frames of the videos were separated at a rate of 5 frames per second. All these images were captured from a top view perspective. The original images have a resolution of 1280x720 pixels. (CIDIS) [Before 28/12/19]
PROX dataset - Dataset (+code) of real 3D people interacting with real 3D scenes. "Quantitative PROX": 180 static RGB-D frames of 1 subject in 1 scene with ground-truth SMPL-X meshes. "Qualitative PROX": 100K dynamic RGB-D sequences of 20 subjects in 12 scenes with pseudo ground-truth SMPL-X meshes. (Hassan, Choutas, Tzionas, Black) [Before 28/12/19]
SHREC'16 Topological KIDS - A collection of 40 high-resolution and low-resolution 3D shapes undergoing nearly-isometric deformations in addition to strong topological artifacts, self-contacts and mesh gluing, with point-to-point ground truth. (Lahner, Rodola) [Before 28/12/19]
SURREAL - 60,000 synthetic videos of people under large variations in shape, texture, view-point and pose. (Varol, Romero, Martin, Mahmood, Black, Laptev, Schmid) [Before 28/12/19]
TNT 15 dataset - Several sequences of video synchronised by 10 Inertial Sensors (IMU) worn at the extremities. (von Marcard, Pons-Moll and Rosenhahn) [Before 28/12/19]
UC-3D Motion Database - Available data types encompass high resolution Motion Capture, acquired with MVN Suit from Xsens and Microsoft Kinect RGB and depth images. (Institute of Systems and Robotics, Coimbra, Portugal) [Before 28/12/19]
United People (UP) Dataset - ~8,000 images with keypoint and foreground segmentation annotations as well as 3D body model fits. (Lassner, Romero, Kiefel, Bogo, Black, Gehler) [Before 28/12/19]
VGG Human Pose Estimation datasets including the BBC Pose (20 videos with an overlaid sign language interpreter), Extended BBC Pose (72 additional training videos), Short BBC Pose (5 one hour videos with sign language signers), and ChaLearn Pose (23 hours of Kinect data of 27 persons performing 20 Italian gestures). (Charles, Everingham, Pfister, Magee, Hogg, Simonyan, Zisserman) [Before 28/12/19]
VRLF: Visual Lip Reading Feasibility - audio-visual corpus of 24 speakers recorded in Spanish (Fernandez-Lopez, Martinez and Sukno) [Before 28/12/19]

People Detection and Tracking Databases

3D KINECT Gender Walking data base (L. Igual, A. Lapedriza, R. Borràs from UB, CVC and UOC, Spain) [Before 28/12/19]
AAU VAP Trimodal People Segmentation Dataset - People detection and segmentation dataset captured with depth, RGB, and thermal sensors (Palmero, Clapés, Bahnsen, Møgelmose, Moeslund, Escalera) [Before 28/12/19]
Aerial Gait Dataset - people walking as viewed from an aerial (moving) platform (Perera, Law, Chahl) [Before 28/12/19]
AGORASET: a dataset for crowd video analysis (Nicolas Courty et al) [Before 28/12/19]
CASIA gait database (Chinese Academy of Sciences) [Before 28/12/19]
CAVIAR project video sequences with tracking and behavior ground truth (CAVIAR team/Edinburgh University - EC project IST-2001-37540) [Before 28/12/19]
CMU Panoptic Studio Dataset - Multiple people social interaction dataset captured by 500+ synchronized video cameras, with 3D full body skeletons and calibration data. (H. Joo, T. Simon, Y. Sheikh) [Before 28/12/19]
CUHK Crowd Dataset - 474 video clips from 215 crowded scenes (Shao, Loy, and Wang) [Before 28/12/19]
CUHK01 Dataset : Person re-id dataset with 3,884 images of 972 pedestrians (Rui Zhao et al) [Before 28/12/19]
CUHK02 Dataset : Person re-id dataset with five camera view settings. (Rui Zhao et al) [Before 28/12/19]
CUHK03 Dataset : Person re-id dataset with 13,164 images of 1,360 pedestrians (Rui Zhao et al) [Before 28/12/19]
Caltech Pedestrian Dataset (P. Dollar, C. Wojek, B. Schiele and P. Perona) [Before 28/12/19]
Daimler Pedestrian Detection Benchmark - 21790 images with 56492 pedestrians plus empty scenes. (D. M. Gavrila et al) [Before 28/12/19]
Datasets (Color & Infrared) for Fusion - A series of images in color and infrared captured from a parallel two-camera setup under different environmental conditions. (Juan Serrano-Cuerda, Antonio Fernandez-Caballero, Maria T. Lopez) [Before 28/12/19]
Driver Monitoring Video Dataset (RobeSafe + Jesus Nuevo-Chiquero) [Before 28/12/19]
DukeMTMC: Duke Multi-Target Multi-Camera tracking dataset - 85 min of video from 8 cameras, 2M frames, 2000 people (Ergys Ristani, Francesco Solera, Roger S. Zou, Rita Cucchiara, Carlo Tomasi) [Before 28/12/19]
Edinburgh overhead camera person tracking dataset (Bob Fisher, Bashia Majecka, Gurkirt Singh, Rowland Sillito) [Before 28/12/19]
GVVPerfcapEva - Repository of human shape and performance capture data, including full body skeletal, hand tracking, body shape, face performance, interactions (Christian Theobalt) [Before 28/12/19]
HAT Database of 27 human attributes (Gaurav Sharma, Frederic Jurie) [Before 28/12/19]
Immediacy Dataset - This dataset is designed for estimating personal relationships. (Xiao Chu et al.) [Before 28/12/19]
Inria Dressed human bodies in motion benchmark - Benchmark containing 3D motion sequences of different subjects, motions, and clothing styles that allows the accuracy of body shape estimates to be measured quantitatively. (Jinlong Yang, Jean-Sébastien Franco, Franck Hétroy-Wheeler, and Stefanie Wuhrer) [Before 28/12/19]
INRIA Person Dataset (Navneet Dalal) [Before 28/12/19]
IU ShareView - IU ShareView dataset consists of nine sets of synchronized (two first-person) videos with a total of 1,227 pixel-level ground truth segmentation maps of 2,654 annotated person instances. (Mingze Xu, Chenyou Fan, Yuchen Wang, Michael S. Ryoo, David J. Crandall) [Before 28/12/19]
Izmir - omnidirectional and panoramic image dataset (with annotations) to be used for human and car detection (Yalin Bastanlar) [Before 28/12/19]
Joint Attention in Autonomous Driving (JAAD) - The dataset includes instances of pedestrians and cars intended primarily for the purpose of behavioural studies and detection in the context of autonomous driving. (Iuliia Kotseruba, Amir Rasouli and John K. Tsotsos) [Before 28/12/19]
JTL Stereo Tracking Dataset for Person Following Robots - 11 different indoor and outdoor places for the task of robots following people under challenging situations (Chen, Sahdev, Tsotsos) [Before 28/12/19]
KAIST Multispectral Pedestrian Detection Benchmark - 95k color-thermal pairs (640x480, 20Hz) images, with 103,128 dense annotations and 1,182 unique pedestrians (Hwang, Park, Kim, Choi, Kweon) [Before 28/12/19]
MAHNOB: MHI-Mimicry database - A 2 person, multiple camera and microphone database for studying mimicry in human-human interaction scenarios. (Sun, Lichtenauer, Valstar, Nijholt, and Pantic) [Before 28/12/19]
MIT CBCL Pedestrian Data (Center for Biological and Computational Learning) [Before 28/12/19]
MPI DYNA - A Model of Dynamic Human Shape in Motion (Max Planck Tubingen) [Before 28/12/19]
MPI FAUST Dataset - A data set containing 300 real, high-resolution human scans, with automatically computed ground-truth correspondences (Max Planck Tubingen) [Before 28/12/19]
MPI JHMDB dataset - Joint-annotated Human Motion Data Base - 21 actions, 928 clips, 33183 frames (Jhuang, Gall, Zuffi, Schmid and Black) [Before 28/12/19]
MPI MOSH - Motion and Shape Capture from Markers. MOCAP data, 3D shape meshes, 3D high resolution scans. (Max Planck Tubingen) [Before 28/12/19]
MVHAUS-PI - a multi-view human interaction recognition dataset (Saeid et al.) [Before 28/12/19]
Market-1501 Dataset - 32,668 annotated bounding boxes of 1,501 identities from up to 6 cameras (Liang Zheng et al) [Before 28/12/19]
Modena and Reggio Emilia first person head motion videos (Univ of Modena and Reggio Emilia) [Before 28/12/19]
Multimodal Activities of Daily Living - including video, audio, physiological, sleep, motion and plug sensors. (Alexia Briasouli) [Before 28/12/19]
Multiple Object Tracking Benchmark - A collection of datasets with ground truth, plus a performance league table (ETHZ, U. Adelaide, TU Darmstadt) [Before 28/12/19]
Multispectral visible-NIR video sequences - Annotated multispectral video, visible + NIR (LE2I, Université de Bourgogne) [Before 28/12/19]
NYU Multiple Object Tracking Benchmark (Konrad Schindler et al) [Before 28/12/19]
Occluded Articulated Human Body Dataset - Body pose extraction and tracking under occlusions, 6 RGB-D sequences in total (3500 frames) with one, two and three users, marker-based ground truth data. (Markos Sigalas, Maria Pateraki, Panos Trahanias) [Before 28/12/19]
OxUva - A large-scale long-term tracking dataset composed of 366 long videos of about 14 hours in total, with separate dev (public annotations) and test sets (hidden annotations), featuring target object disappearance and continuous attributes. (Jack Valmadre, Luca Bertinetto, Joao F. Henriques, Ran Tao, Andrea Vedaldi, Arnold Smeulders, Philip Torr, Efstratios Gavves) [Before 28/12/19]
OU-ISIR Gait Database - six video-based gait data sets, two inertial sensor-based gait datasets, and a gait-relevant biometric score data set. (Yasushi Makihara) [Before 28/12/19]
PARSE Dataset Additional Data - facial expression, gaze direction, and gender (Antol, Zitnick, Parikh) [Before 28/12/19]
PARSE Dataset of Articulated Bodies - 300 images of humans and horses (Ramanan) [Before 28/12/19]
PathTrack dataset: a large-scale MOT dataset - PathTrack is a large scale multi-object tracking dataset of more than 15,000 person trajectories in 720 sequences. (Santiago Manen, Michael Gygli, Dengxin Dai, Luc Van Gool) [Before 28/12/19]
PDbm: People Detection benchmark repository - realistic sequences, manually annotated people detection ground truth and a complete evaluation framework (García-Martín, Martínez, Bescós) [Before 28/12/19]
PDds: A Person Detection dataset - several annotated surveillance sequences of different levels of complexity (García-Martín, Martínez, Bescós) [Before 28/12/19]
PETS 2009 Crowd Challenge dataset (Reading University & James Ferryman) [Before 28/12/19]
PETS Winter 2009 workshop data (Reading University & James Ferryman) [Before 28/12/19]
PETS: 2015 Performance Evaluation of Tracking and Surveillance (Reading University & James Ferryman) [Before 28/12/19]
PETS: 2015 Performance Evaluation of Tracking and Surveillance (Reading University & Luis Patino) [Before 28/12/19]
PETS 2016 datasets - multi-camera (including thermal cameras) video recordings of human behavior around a stationary vehicle and around a boat (Thomas Cane) [Before 28/12/19]
PIROPO - People in Indoor ROoms with Perspective and Omnidirectional cameras, with more than 100,000 annotated frames (GTI-UPM, Spain) [Before 28/12/19]
People-Art - a database containing people labelled in photos and artwork (Qi Wu and Hongping Cai) [Before 28/12/19]
Photo-Art-50 - a database containing 50 object classes annotated in photos and artwork (Qi Wu and Hongping Cai) [Before 28/12/19]
Pixel-based change detection benchmark dataset (Goyette et al) [Before 28/12/19]
Precarious Dataset - unusual people detection dataset (Huang) [Before 28/12/19]
RAiD - Re-Identification Across Indoor-Outdoor Dataset: 43 people, 4 cameras, 6920 images (Abir Das et al) [Before 28/12/19]
RPIfield - Person re-identification dataset containing 4108 person images with timestamps. (Meng Zheng, Srikrishna Karanam, Richard J. Radke) [Before 28/12/19]
Singapore Maritime Dataset - Visible range videos and Infrared videos. (Dilip K. Prasad) [Before 28/12/19]
SLP (Simultaneously-collected multimodal Lying Pose) - large scale dataset on in-bed poses includes: 2 Data Collection Settings: (a) Hospital setting: 7 participants, and (b) Home setting: 102 participants (29 females, age range: 20-40). 4 Imaging Modalities: RGB (regular webcam), IR (FLIR LWIR camera), DEPTH (Kinect v2) and Pressure Map (Tekscan Pressure Sensing Map). 3 Cover Conditions: uncover, bed sheet, and blanket. Fully labeled poses with 14 joints. (Ostadabbas and Liu) [2/1/20]
SYNTHIA - Large set (~half million) of virtual-world images for training autonomous cars to see. (ADAS Group at Computer Vision Center) [Before 28/12/19]
Shinpuhkan 2014 - A Person Re-identification dataset containing 22,000 images of 24 people captured by 16 cameras. (Yasutomo Kawanishi et al.) [Before 28/12/19]
Stanford Structured Group Discovery dataset - Discovering Groups of People in Images (W. Choi et al) [Before 28/12/19]
TrackingNet - Large-scale dataset for tracking in the wild: more than 30k annotated sequences for training, more than 500 sequestered sequences for testing, evaluation server and leaderboard for fair ranking. (Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi and Bernard Ghanem) [Before 28/12/19]
Transient Biometrics Nails Dataset V01 (Igor Barros Barbosa) [Before 28/12/19]
Temple Color 128 - Color Tracking Benchmark - Encoding Color Information for Visual Tracking (P. Liang, E. Blasch, H. Ling) [Before 28/12/19]
TUM Gait from Audio, Image and Depth (GAID) database - containing tracked RGB video, tracked depth video, and audio for 305 subjects (Babaee, Hofmann, Geiger, Bachmann, Schuller, Rigoll) [Before 28/12/19]
TVPR (Top View Person Re-identification) dataset - person re-identification using an RGB-D camera in a Top-View configuration: indoor 23 sessions, 100 people, 8 days (Liciotti, Paolanti, Frontoni, Mancini and Zingaretti) [Before 28/12/19]
UCLA Aerial Event Dataset - Human activities in aerial videos with annotations of people, objects, social groups, activities and roles (Shu, Xie, Rothrock, Todorovic, and Zhu) [Before 28/12/19]
Univ of Central Florida - Crowd Dataset (Saad Ali) [Before 28/12/19]
Univ of Central Florida - Crowd Flow Segmentation datasets (Saad Ali) [Before 28/12/19]
VIPeR: Viewpoint Invariant Pedestrian Recognition - 632 pedestrian image pairs taken from arbitrary viewpoints under varying illumination conditions. (Gray, Brennan, and Tao) [Before 28/12/19]
Visual object tracking challenge datasets - The VOT datasets are a collection of fully annotated visual object tracking datasets used in the single-target short-term visual object tracking challenges. (The VOT committee) [Before 28/12/19]
WIDER Attribute Dataset - WIDER Attribute is a large-scale human attribute dataset, with 13789 images belonging to 30 scene categories, and 57524 human bounding boxes each annotated with 14 binary attributes. (Li, Yining and Huang, Chen and Loy, Chen Change and Tang, Xiaoou) [Before 28/12/19]
WUds: Wheelchair Users Dataset - wheelchair users detection data, to extend people detection, providing a more general solution to detect people in environments such as independent and assisted living, hospitals, healthcare centers and senior residences (Martín-Nieto, García-Martín, Martínez) [Before 28/12/19]
xR-EgoPose - Photorealistic synthetic dataset for 3D human pose estimation from an ego-centric perspective (Tome, Peluse, Agapito and Badino) [4/1/20]
YouTube-BoundingBoxes - 5.6 million accurate human-annotated BB from 23 object classes tracked across frames, from 240,000 YouTube videos, with a strong focus on the person class (1.3 million boxes) (Real, Shlens, Pan, Mazzocchi, Vanhoucke, Khan, Kakarla et al) [Before 28/12/19]

Remote Sensing

Aerial Imagery for Roof Segmentation (AIRS) - 457 km2 coverage of orthorectified aerial images with over 220,000 buildings for roof segmentation. (Lei Wang, Qi Chen) [Before 28/12/19]
Brazilian Cerrado-Savanna Scenes Dataset - Composition of IR-R-G scenes taken by RapidEye sensor for vegetation classification in Brazilian Cerrado-Savanna. (K. Nogueira, J. A. dos Santos, T. Fornazari, T. S. Freire, L. P. Morellato, R. da S. Torres) [Before 28/12/19]
Brazilian Coffee Scenes Dataset - Composition of IR-R-G scenes taken by SPOT sensor for identification of coffee crops in Brazilian mountains. (O. A. B. Penatti, K. Nogueira, J. A. dos Santos.) [Before 28/12/19]
Building Detection Benchmark - 14 images acquired from IKONOS (1 m) and QuickBird (60 cm) (Ali Ozgun Ok and Caglar Senaras) [Before 28/12/19]
CBERS-2B, Landsat 5 TM, Geoeye, Ikonos-2 MS and ALOS-PALSAR - land-cover classification using optical images (D. Osaku et al.) [Before 28/12/19]
Data Fusion Contest 2015 (Zeebruges) - This dataset provides a RGB aerial dataset (5cm) and a Lidar point cloud (65pts/m2) over the harbor of the city of Zeebruges (Belgium). It also provided a DSM derived from the point cloud and a semantic segmentation ground truth of five of the seven 10000 x 10000 pixels tiles. An evaluation server is used to evaluate the results on the two other tiles. (Image analysis and Data Fusion Technical Committee, IEEE Geoscience, Remote Sensing Society) [Before 28/12/19]
Data Fusion Contest 2017 - This dataset provides satellite (Landsat, Sentinel 2) and vector GIS layers (e.g. buildings and road footprint) for nine cities worldwide. The task is to predict land use classes useful for climate models at a 100m prediction grid, given data of different resolution and types of features. 5 cities come with labels, 4 others are kept hidden for scoring on an evaluation server. (Image analysis and Data Fusion Technical Committee, IEEE Geoscience, Remote Sensing Society) [Before 28/12/19]
deepGlobe challenge - This dataset comprises three challenges: road extraction, building detection and semantic segmentation of land cover. A series of satellite images from Digital Globe (RGB, 50 cm resolution) and labels over several countries worldwide are provided. The results were presented at the DeepGlobe workshop at CVPR 2018. (Facebook, Digital Globe) [Before 28/12/19]
DeepGlobe Satellite Image Understanding Challenge - Datasets and evaluation platforms for three deep learning tasks on satellite images: road extraction, building detection, and land type classification. (Demir, Ilke and Koperski, Krzysztof and Lindenbaum, David and Pang, Guan and Huang, Jing and Basu, Saikat and Hughes, Forest and Tuia, Devis and Raskar, Ramesh) [Before 28/12/19]
DOTA - 2806 large aerial images with 188,282 instances over 15 categories (Xia, Bai, Ding, Zhu, Belongie, Luo, Datcu, Pelillo, Zhang) [Before 28/12/19]
DublinCity: Annotated LiDAR Point Cloud and its Applications - Annotated (13 labels) aerial lidar scan of central Dublin (Zolanvari, Ruano, Rana, Cummins, da Silva, Rahbar, Smolic) [Before 28/12/19]
FORTH Multispectral Imaging (MSI) datasets - 5 datasets for Multispectral Imaging (MSI), annotated with ground truth data (Polykarpos Karamaoynas) [Before 28/12/19]
Furnas and Tiete - sediment yield classification (Pisani et al.) [Before 28/12/19]
HSRC - High Resolution Optical Satellite Image Dataset for Ship Recognition. 1061 ships images over 3 subclass levels (Liu, Yuan, Weng, Yang) [Before 28/12/19]
ISPRS 2D semantic labeling - Height models and true ortho-images with a ground sampling distance of 5cm have been prepared over the city of Potsdam/Germany (Franz Rottensteiner, Gunho Sohn, Markus Gerke, Jan D. Wegner) [Before 28/12/19]
ISPRS 3D semantic labeling - nine class airborne laser scanning data (Franz Rottensteiner, Gunho Sohn, Markus Gerke, Jan D. Wegner) [Before 28/12/19]
Inria Aerial Image Labeling Dataset - 9000 square kilometers of color aerial imagery over U.S. and Austrian cities. (Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, Pierre Alliez.) [Before 28/12/19]
Lampert's Spectrogram Analysis - Passive sonar spectrogram images derived from time-series data; these spectrograms are generated from recordings of acoustic energy radiated from propeller and engine machinery in underwater sea recordings. (Thomas Lampert) [Before 28/12/19]
Linkoping Thermal InfraRed dataset - The LTIR dataset is a thermal infrared dataset for evaluation of Short-Term Single-Object (STSO) tracking (Linkoping University) [Before 28/12/19]
MASATI: MAritime SATellite Imagery dataset - MASATI is a dataset composed of optical aerial imagery with 6212 samples which were obtained from Microsoft Bing Maps. They were labeled and classified into 7 classes of maritime scenes: land, coast, sea, coast-ship, sea-ship, sea with multi-ship, sea-ship in detail. (University of Alicante) [Before 28/12/19]
MUUFL Gulfport Hyperspectral and LiDAR data set - Co-registered aerial hyperspectral and lidar data over the University of Southern Mississippi Gulfpark campus containing several sub-pixel targets. (Gader, Zare, Close, Aitken, Tuell) [Before 28/12/19]
NWPU-RESISC45 - A large-scale benchmark dataset used for remote sensing image scene classification containing 31500 images covered by 45 scene classes. (Cheng, Han, Lu) [Before 28/12/19]
NWPU VHR-10 dataset - 800 high resolution satellite images of 10 classes (airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, and vehicle) (Cheng, Han, Zhou, Guo) [Before 28/12/19]
RIT-18 - a high-resolution multispectral dataset for semantic segmentation. (Ronald Kemker, Carl Salvaggio, Christopher Kanan) [Before 28/12/19]
SAR SHIP DATASET - 43 Synthetic Aperture Radar images (Schwegmann, Kleynhans, Salmon, Mdakane, Meyer) [Before 28/12/19]
Semantic Drone Dataset - 20 houses from nadir (bird's eye) view acquired at 5 to 30 meters above ground. 400 public and 200 private high resolution images of 6000x4000px (24Mpx). [Before 28/12/19]
UC Merced Land Use Dataset - 21 class land use image dataset with 100 images per class, largely urban, 256x256 resolution, 1 foot pixels (Yang and Newsam) [Before 28/12/19]
UCF-CrossView Dataset: Cross-View Image Matching for Geo-localization in Urban Environments - A new dataset of street view and bird's eye view images for cross-view image geo-localization. (Center for Research in Computer Vision, University of Central Florida) [Before 28/12/19]
Zurich Summer dataset - It is intended for semantic segmentation of very high resolution satellite images of urban scenes, with incomplete ground truth (Michele Volpi and Vittorio Ferrari.) [Before 28/12/19]
Zurich Urban Micro Aerial Vehicle Dataset - time synchronized aerial high-resolution images of 2 km of Zurich, with associated other data (Majdik, Till, Scaramuzza) [Before 28/12/19]

Robotics

Edinburgh Kitchen Utensil Database - 897 raw and binary images of 20 categories of kitchen utensil, a resource for training future domestic assistance robots (D. Fullerton, A. Goel, R. B. Fisher) [Before 28/12/19]
Event-Camera Dataset - This presents the world's first collection of datasets with an event-based camera for high-speed robotics (E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, D. Scaramuzza) [Before 28/12/19]
Improved 3D Sparse Maps for High-performance Structure from Motion with Low-cost Omnidirectional Robots - Evaluation Dataset - Data set used in research paper doi:10.1109/ICIP.2015.7351744 (Breckon, Toby P., Cavestany, Pedro) [Before 28/12/19]
Indoor Place Recognition Dataset for localization of Mobile Robots - The dataset contains 17 different places built from 2 different robots (virtualMe and pioneer) (Raghavender Sahdev, John K. Tsotsos.) [Before 28/12/19]
JTL Stereo Tracking Dataset for Person Following Robots - 11 different indoor and outdoor places for the task of robots following people under challenging situations (Chen, Sahdev, Tsotsos) [Before 28/12/19]
Meta rooms - RGB-D data comprised of 28 aligned depth camera images collected by having a robot go to a specific place and perform a 360-degree pan with various tilts. (John Folkesson et al.) [Before 28/12/19]
PanoNavi dataset - A panoramic dataset for robot navigation, consisting of 5 videos lasting about 1 hour. (Lingyan Ran) [Before 28/12/19]
Robotic 3D Scan Repository - 3D point clouds from robotic experiments of scenes (Osnabruck and Jacobs Universities) [Before 28/12/19]
Solving the Robot-World Hand-Eye(s) Calibration Problem with Iterative Methods - These datasets were generated for calibrating robot-camera systems. (Amy Tabb) [Before 28/12/19]
ViDRILO - ViDRILO is a dataset containing 5 sequences of annotated RGB-D images acquired with a mobile robot in two office buildings under challenging lighting conditions. (Miguel Cazorla, J. Martinez-Gomez, M. Cazorla, I. Garcia-Varea and V. Morell.) [Before 28/12/19]
Witham Wharf - RGB-D data of eight locations collected by a robot every 10 min over ~10 days by the University of Lincoln. (John Folkesson et al.) [Before 28/12/19]

Scenes or Places, Scene Segmentation or Classification

Barcelona - 15,150 images, urban views of Barcelona (Tighe and Lazebnik) [Before 28/12/19]
Cross-modal Landmark Identification Benchmark - Landmark-identification benchmark consisting of 17 landmark images taken under several weather conditions, e.g., sunny, cloudy, snowy, and sunset. (Yonsei University) [Before 28/12/19]
CMU Visual Localization Data Set - Dataset collected over the period of a year using the Navlab 11 equipped with IMU, GPS, INS, Lidars and cameras. (Hernan Badino, Daniel Huber and Takeo Kanade) [Before 28/12/19]
COLD (COsy Localization Database) - place localization (Ullah, Pronobis, Caputo, Luo, and Jensfelt) [Before 28/12/19]
DAVIS: Video Object Segmentation dataset 2016 - A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation (F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung) [Before 28/12/19]
DAVIS: Video Object Segmentation dataset 2017 - The 2017 DAVIS Challenge on Video Object Segmentation (J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbelaez, A. Sorkine-Hornung, and L. Van Gool) [Before 28/12/19]
EDUB-Seg - Egocentric dataset for event segmentation. (Mariella Dimiccoli, Marc Bolaños, Estefania Talavera, Maedeh Aghaei, Stavri G. Nikolov, and Petia Radeva.) [Before 28/12/19]
European Flood 2013 - 3,710 images of a flood event in central Europe, annotated with relevance regarding 3 image retrieval tasks (multi-label) and important image regions. (Friedrich Schiller University Jena, Deutsches GeoForschungsZentrum Potsdam) [Before 28/12/19]
Fieldsafe - A multi-modal dataset for obstacle detection in agriculture. (Aarhus University) [Before 28/12/19]
Fifteen Scene Categories - A dataset of fifteen natural scene categories. (Fei-Fei Li and Aude Oliva) [Before 28/12/19]
FIGRIM (Fine Grained Image Memorability Dataset) - A subset of images from the SUN database used for human memory experiments, and provided along with memorability scores. (Bylinskii, Isola, Bainbridge, Torralba, Oliva) [Before 28/12/19]
Geometric Context - scene interpretation images (Derek Hoiem) [Before 28/12/19]
HyKo: A Spectral Dataset for Scene Understanding - The HyKo dataset was captured with compact, low-cost, snapshot mosaic (SSM) imaging cameras, which are able to capture a whole spectral cube in one shot recorded from a moving vehicle enabling hyperspectral scene analysis for road scene understanding. (Active Vision Group, University of Koblenz-Landau) [Before 28/12/19]
iNaturalist Species Classification and Detection Dataset - The iNaturalist 2017 species classification and detection dataset has been collected and annotated by citizen scientists and contains 859,000 images from over 5,000 different species of plants and animals. (Caltech) [Before 28/12/19]
Indoor Place Recognition Dataset for localization of Mobile Robots - The dataset contains 17 different places built from 2 different robots (virtualMe and pioneer) (Raghavender Sahdev, John K. Tsotsos.) [Before 28/12/19]
Indoor Scene Recognition - 67 Indoor categories, 15620 images (Quattoni and Torralba) [Before 28/12/19]
Intrinsic Images in the Wild (IIW) - Intrinsic Images in the Wild, is a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes (Sean Bell, Kavita Bala, Noah Snavely) [Before 28/12/19]
IRS: Large Synthetic Indoor Robotics Stereo Dataset - 103,316 samples covering a wide range of indoor scenes, such as home, office, store and restaurant (Wang, Zheng, Yan, Deng, Zhao, Chu) [Before 28/12/19]
LM+SUN - 45,676 images, mainly urban or human related scenes (Tighe and Lazebnik) [Before 28/12/19]
Mallscape dataset - a collection of 33K localized and time-stamped images captured in two large shopping malls during two different sessions temporally separated by several months, enabling evaluation of Point-of-Interest (POI) change detection methods in realistic conditions (Revaud, Sampaio De Rezende, Heo, You, Jeong) [2/1/20]
Maritime Imagery in the Visible and Infrared Spectrums - VAIS contains simultaneously acquired unregistered thermal and visible images of ships acquired from piers (Zhang, Choi, Daniilidis, Wolf, & Kanan) [Before 28/12/19]
MASATI: MAritime SATellite Imagery dataset - MASATI is a dataset composed of optical aerial imagery with 6212 samples which were obtained from Microsoft Bing Maps. They were labeled and classified into 7 classes of maritime scenes: land, coast, sea, coast-ship, sea-ship, sea with multi-ship, sea-ship in detail. (University of Alicante) [Before 28/12/19]
Materials in Context (MINC) - The Materials in Context Database (MINC) builds on OpenSurfaces, but includes millions of point annotations of material labels. (Sean Bell, Paul Upchurch, Noah Snavely, Kavita Bala) [Before 28/12/19]
MIT Intrinsic Images - 20 objects (Roger Grosse, Micah K. Johnson, Edward H. Adelson, and William T. Freeman) [Before 28/12/19]
NYU V2 Mixture of Manhattan Frames Dataset - We provide the Mixture of Manhattan Frames (MMF) segmentation and MF rotations on the full NYU depth dataset V2 by Silberman et al. (Straub, Julian and Rosman, Guy and Freifeld, Oren and Leonard, John J. and Fisher III, John W.) [Before 28/12/19]
OpenSurfaces - OpenSurfaces consists of tens of thousands of examples of surfaces segmented from consumer photographs of interiors, and annotated with material parameters, texture information, and contextual information. (Kavita Bala et al.) [Before 28/12/19]
Oxford Audiovisual Segmentation Dataset - Segmentation dataset including audio recordings of objects being struck (Arnab, Sapienza, Golodetz, Miksik and Torr) [Before 28/12/19]
Thermal Road Dataset - Our thermal-road dataset provides around 6000 thermal-infrared images captured in the road scene with manually annotated ground-truth. (3500: general road, 1500: complicated road, 1000: off-road). (Jae Shin Yoon) [Before 28/12/19]
Places 2 Scene Recognition database - 365 scene categories and 8 million images (Zhou, Khosla, Lapedriza, Torralba and Oliva) [Before 28/12/19]
Places Scene Recognition database - 205 scene categories and 2.5 million images (Zhou, Lapedriza, Xiao, Torralba, and Oliva) [Before 28/12/19]
RGB-NIR Scene Dataset - 477 images in 9 categories captured in RGB and Near-infrared (NIR) (Brown and Susstrunk) [Before 28/12/19]
RMS2017 - Reconstruction Meets Semantics outdoor dataset - 500 semantically annotated images with poses and point cloud from a real garden (Tylecek, Sattler) [Before 28/12/19]
RMS2018 - Reconstruction Meets Semantics virtual dataset - 30k semantically annotated images with poses and point cloud from 6 virtual gardens (Le, Tylecek) [Before 28/12/19]
SceneNet RGB-D - 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth including RGB and depth (McCormac, Handa, Leutenegger, Davison) [Before 28/12/19]
Southampton-York Natural Scenes Dataset - 90 scenes, 25 indoor and outdoor scene categories, with spherical LiDAR, HDR intensity, stereo intensity panorama. (Adams, Elder, Graf, Leyland, Lugtigheid, Muryy) [Before 28/12/19]
SUN 2012 - 16,873 fully annotated scene images for scene categorization (Xiao et al) [Before 28/12/19]
SUN 397 - 397 scene categories for scene classification (Xiao et al) [Before 28/12/19]
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite - 10,000 RGB-D images, 146,617 2D polygons and 58,657 3D bounding boxes (Song, Lichtenberg, and Xiao) [Before 28/12/19]
SYNTHIA - Large set (~half million) of virtual-world images for training autonomous cars to see. (ADAS Group at Computer Vision Center) [Before 28/12/19]
Sift Flow (also known as LabelMe Outdoor, LMO) - 2688 images, mainly outdoor natural and urban (Tighe and Lazebnik) [Before 28/12/19]
Stanford Background Dataset - 715 images of outdoor scenes containing at least one foreground object (Gould et al) [Before 28/12/19]
Surface detection - Real-time traversable surface detection by colour space fusion and temporal analysis - Evaluation Dataset (Breckon, Toby P., Katramados, Ioannis) [Before 28/12/19]
Taskonomy - Over 4.5 million real images each with ground truth for 25 semantic, 2D, and 3D tasks. (Zamir, Sax, Shen, Guibas, Malik, Savarese) [Before 28/12/19]
TUM City Campus - Urban point clouds taken by Mobile Laser Scanning (MLS) for classification, object extraction and change detection (Stilla, Hebel, Xu, Gehrung) [3/1/20]
ViDRILO - ViDRILO is a dataset containing 5 sequences of annotated RGB-D images acquired with a mobile robot in two office buildings under challenging lighting conditions. (Miguel Cazorla, J. Martinez-Gomez, M. Cazorla, I. Garcia-Varea and V. Morell.) [Before 28/12/19]
Virtual Gallery - a synthetic dataset that targets multiple challenges such as varying lighting conditions and different occlusion levels for various tasks such as depth estimation, instance segmentation and visual localization (Weinzaepfel, Csurka, Cabon, Humenberger) [7/1/20]
Wireframe dataset - A set of RGB images of man-made scenes are annotated with junctions and lines, which describes the large-scale geometry of the scenes. (Huang et al.) [Before 28/12/19]

Segmentation (General)

A Dataset for Sky Segmentation - This Sky dataset was used to evaluate the method IFT-SLIC and other superpixel algorithms, using the superpixel-based sky segmentation method proposed by Juraj Kostolansky. It contains a collection of 60 images based on the Caltech Airplanes Side dataset by R. Fergus with ground truth for sky segmentation. (Eduardo B. Alexandre, Paulo A. V. Miranda, R. Fergus) [Before 28/12/19]
Aberystwyth Leaf Evaluation Dataset - Timelapse plant images with hand marked up leaf-level segmentations for some time steps, and biological data from plant sacrifice. (Bell, Jonathan; Dee, Hannah M.) [Before 28/12/19]
ADE20K - 22+K hierarchically segmented and labeled scene images (900 scene categories, 3+K classes and subpart classes) (Zhou, Zhao, Puig, Fidler, Barriuso, Torralba) [Before 28/12/19]
Alpert et al. Segmentation evaluation database (Sharon Alpert, Meirav Galun, Ronen Basri, Achi Brandt) [Before 28/12/19]
BMC (Background Model Challenge) - A dataset for comparing background subtraction algorithms, composed of real and synthetic videos (Antoine) [Before 28/12/19]
Berkeley Segmentation Dataset and Benchmark (David Martin and Charless Fowlkes) [Before 28/12/19]
CAD 120 affordance dataset - Pixelwise affordance annotation in human context (Sawatzky, Srikantha, Gall) [Before 28/12/19]
COLT - The dataset contains 40 imagenet categories with manually annotated per-pixel object masks. (Jia Li) [Before 28/12/19]
CO-SKEL dataset - This dataset consists of categorized skeleton and segmentation masks for evaluating co-skeletonization methods. (Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan) [Before 28/12/19]
Crack detection on 2D pavement images - five sets of pavement images that contain cracks with the manual ground truth associated and 5 automatic segmentations obtained with existing approaches (Sylvie Chambon) [Before 28/12/19]
CTU Color and Depth Image Dataset of Spread Garments - Images of spread garments with annotated corners. (Wagner L., Krejčová D., and Smutný V. (Czech Technical University in Prague)) [Before 28/12/19]
CTU Garment Folding Photo Dataset - Color and depth images from various stages of garment folding. (Sushkov R., Melkumov I., Smutný V. (Czech Technical University in Prague)) [Before 28/12/19]
DeformIt 2.0 - Image Data Augmentation Tool: Simulate novel images with ground truth segmentations from a single image-segmentation pair (Brian Booth and Ghassan Hamarneh) [Before 28/12/19]
EVIMO - Dataset for motion segmentation, egomotion estimation and tracking using an event camera; the dataset is collected with DAVIS 346C and provides 3D poses for camera and independently moving objects, and pixelwise motion segmentation masks. (Mitrokhin, Ye, Fermuller, Aloimonos, Delbruck) [14/1/20]
GrabCut Image database (C. Rother, V. Kolmogorov, A. Blake, M. Brown) [Before 28/12/19]
Histology Image Collection Library (HICL) - The HICL is a compilation of 3870 histopathological images (so far) from various diseases, such as brain cancer, breast cancer and HPV (Human Papilloma Virus)-Cervical cancer. (Medical Image and Signal Processing (MEDISP) Lab., Department of Biomedical Engineering, School of Engineering, University of West Attica) [Before 28/12/19]
ICDAR'15 Smartphone document capture and OCR competition - challenge 1 - videos of documents filmed by a user with a smartphone to simulate mobile document capture, and ground truth coordinates of the document corners to detect. (Burie, Chazalon, Coustaty, Eskenazi, Luqman, Mehri, Nayef, Ogier, Prum and Rusinol) [Before 28/12/19]
Intrinsic Images in the Wild (IIW) - Intrinsic Images in the Wild, is a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes (Sean Bell, Kavita Bala, Noah Snavely) [Before 28/12/19]
LabelMe images database and online annotation tool (Bryan Russell, Antonio Torralba, Kevin Murphy, William Freeman) [Before 28/12/19]
LITS Liver Tumor Segmentation - 130 3D CT scans with segmentations of the liver and liver tumor. Public benchmark with leaderboard at Codalab.org (Patrick Christ) [Before 28/12/19]
Materials in Context (MINC) - The Materials in Context Database (MINC) builds on OpenSurfaces, but includes millions of point annotations of material labels. (Sean Bell, Paul Upchurch, Noah Snavely, Kavita Bala) [Before 28/12/19]
Multi-species fruit flower detection - This dataset consists of four sets of flower images, from three different tree species: apple, peach, and pear, and accompanying ground truth images. (Philipe A. Dias, Amy Tabb, Henry Medeiros) [Before 28/12/19]
Objects with thin and elongated parts - The three datasets used to evaluate our method Oriented Image Foresting Transform with Connectivity Constraints, which contain objects with thin and elongated parts. These databases are composed of 280 public images of birds and insects with ground truths. (Lucy A. C. Mansilla (IME-USP), Paulo A. V. Miranda) [Before 28/12/19]
OpenSurfaces - OpenSurfaces consists of tens of thousands of examples of surfaces segmented from consumer photographs of interiors, and annotated with material parameters, texture information, and contextual information. (Kavita Bala et al.) [Before 28/12/19]
Osnabrück gaze tracking data - 318 video sequences from several different gaze tracking data sets with polygon based object annotation. (Schöning, Faion, Heidemann, Krumnack, Gert, Açik, Kietzmann, Heidemann & König) [Before 28/12/19]
PASCAL-Scribble Dataset - Our PASCAL-Scribble Dataset provides scribble-annotations on 59 object/stuff categories. (Di Lin) [Before 28/12/19]
PetroSurf3D - 26 high resolution (sub-millimeter accuracy) 3D scans of rock art with pixelwise labeling of petroglyphs for segmentation. (Poier, Seidl, Zeppelzauer, Reinbacher, Schaich, Bellandi, Marretta, Bischof) [Before 28/12/19]
SAIL-VOS - The Semantic Amodal Instance Level Video Object Segmentation (SAIL-VOS) dataset provides accurate ground truth annotations to develop methods for reasoning about occluded parts of objects while taking temporal information into account (Hu, Chen, Hui, Huang, Schwing) [29/12/19]
Shadow Detection/Texture Segmentation Computer Vision Dataset - Video based sequences for shadow detection/suppression, with ground truth (Newey, C., Jones, O., & Dee, H. M.) [Before 28/12/19]
SYNTHIA - Large set (~half million) of virtual-world images for training autonomous cars to see. (ADAS Group at Computer Vision Center) [Before 28/12/19]
Stony Brook University Shadow Dataset (SBU-Shadow5k) - Large scale shadow detection dataset from a wide variety of scenes and photo types, with human annotations (Tomas F.Y. Vicente, Le Hou, Chen-Ping Yu, Minh Hoai, Dimitris Samaras) [Before 28/12/19]
TRoM: Tsinghua Road Markings - This is a dataset which contributes to the area of road marking segmentation for Automated Driving and ADAS. (Xiaolong Liu, Zhidong Deng, Lele Cao, Hongchao Lu) [Before 28/12/19]
UVA Intrinsic Images and Semantic Segmentation Dataset - RGB dataset with ground-truth albedo, shading, and semantic annotations (Baslamisli, Groenestege, Das, Le, Karaoglu, Gevers) [Before 28/12/19]
VOS - A dataset with 200 Internet videos for video-based salient object detection and segmentation. (Jia Li, Changqun Xia) [Before 28/12/19]
XPIE - An image dataset with 10000 images containing manually annotated salient objects and 8596 containing no salient objects. (Jia Li, Changqun Xia) [Before 28/12/19]

Simultaneous Localization and Mapping

Collaborative SLAM Dataset (CSD) - The dataset consists of four different subsets - Flat, House, Priory and Lab - each containing several RGB-D sequences that can be reconstructed and successfully relocalised against each other to form a combined 3D model. Each sequence was captured using an Asus ZenFone AR, and we provide an accurate local 6D pose for each RGB-D frame in the dataset. We also provide the calibration parameters for the depth and colour sensors, optimised global poses for the sequences in each subset, and a pre-built mesh of each sequence. (Golodetz, Cavallari, Lord, Prisacariu, Murray, Torr) [Before 28/12/19]
Event-Camera Data for Pose Estimation, Visual Odometry, and SLAM - The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. (ETH) [Before 28/12/19]
EVIMO - Dataset for motion segmentation, egomotion estimation and tracking using an event camera; the dataset is collected with DAVIS 346C and provides 3D poses for camera and independently moving objects, and pixelwise motion segmentation masks. (Mitrokhin, Ye, Fermuller, Aloimonos, Delbruck) [14/1/20]
House3D - House3D is a virtual 3D environment which consists of thousands of indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. It consists of over 45k indoor 3D scenes, ranging from studios to two-storied houses with swimming pools and fitness rooms. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views. The renderer runs at thousands of frames per second, making it suitable for large-scale RL training. (Yi Wu, Yuxin Wu, Georgia Gkioxari, Yuandong Tian, Facebook Research) [Before 28/12/19]
Indoor Dataset of Quadrotor with Down-Looking Camera - This dataset contains the recording of the raw images, IMU measurements as well as the ground truth poses of a quadrotor flying a circular trajectory in an office size environment. (Scaramuzza, ETH Zurich, University of Zurich) [Before 28/12/19]
InLoc - Benchmark for evaluating the accuracy of 6DoF visual localization algorithms in challenging indoor scenarios. (Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, Akihiko Torii) [Before 28/12/19]
Long-term visual localization - Benchmark for evaluating visual localization and mapping algorithms under various illumination and seasonal conditions. (Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, Fredrik Kahl, Tomas Pajdla) [Before 28/12/19]
PanoNavi dataset - A panoramic dataset for robot navigation, consisting of 5 videos lasting about 1 hour. (Lingyan Ran) [Before 28/12/19]
RAWSEEDS SLAM benchmark datasets (Rawseeds Project) [Before 28/12/19]
Rijksmuseum Challenge 2014 - It consists of 100K art objects from the Rijksmuseum and comes with extensive XML files describing each object. (Thomas Mensink and Jan van Gemert) [Before 28/12/19]
RSM dataset of Visual Paths - Visual dataset of indoor spaces to benchmark localisation/navigation methods. It consists of 1.5 km of corridors and indoor spaces with ground truth for every frame, measured as distance in centimetres from starting point. Includes a synthetically generated corridor for benchmark. (Jose Rivera-Rubio, Ioannis Alexiou, Anil A. Bharath) [Before 28/12/19]
The Multi Vehicle Stereo Event Camera Dataset - Multiple sequences containing a stereo pair of DAVIS 346b event cameras with ground truth poses, depth maps and optical flow. (Alex Zihao Zhu, Dinesh Thakur, Tolga Ozaslan, Bernd Pfrommer, Vijay Kumar, Kostas Daniilidis) [Before 28/12/19]
TUM RGB-D Benchmark - Dataset and benchmark for the evaluation of RGB-D visual odometry and SLAM algorithms (Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard and Daniel Cremers) [Before 28/12/19]
TUM VI Benchmark - 28 sequences, indoor and outdoor, sensor data from stereo camera and IMU, accurate ground truth at beginning and end segments. (David Schubert, Thore Goll, Nikolaus Demmel, Vladyslav Usenko, Joerg Stueckler, Daniel Cremers) [Before 28/12/19]
Visual Odometry / SLAM Evaluation - The odometry benchmark consists of 22 stereo sequences (Andreas Geiger and Philip Lenz and Raquel Urtasun) [Before 28/12/19]
Visual Odometry Dataset with Plenoptic and Stereo Data - The dataset contains 11 sequences recorded by a hand-held platform consisting of a plenoptic camera and a pair of stereo cameras. These comprise different indoor and outdoor sequences with trajectory lengths ranging from 25 meters up to several hundred meters. The recorded sequences show moving objects as well as changing lighting conditions. (Niclas Zeller and Franz Quint, Hochschule Karlsruhe, Karlsruhe University of Applied Sciences) [Before 28/12/19]

Surveillance and Tracking

A collection of challenging motion segmentation benchmark datasets - These datasets enclose real-life long and short sequences, with increased number of motions and frames per sequence, and also real distortions with missing data. The ground truth is provided on all the frames of all the sequences. (Muhammad Habib Mahmood, Yago Diez, Joaquim Salvi, Xavier Llado) [Before 28/12/19]
ATOMIC GROUP ACTIONS dataset (Ricky J. Sethi et al.) [Before 28/12/19]
AUT MULTIDRONE video dataset for racing bicycle detection/tracking from UAV footage - 7 Youtube videos (resolution: 1920 x 1080) at 25fps (Mademlis) [Before 28/12/19]
AVSS07: Advanced Video and Signal based Surveillance 2007 datasets (Andrea Cavallaro) [Before 28/12/19]
Activity modeling and abnormality detection dataset - The dataset contains a 45-minute video with annotated anomalies. (Jagan Varadarajan and Jean-Marc Odobez) [Before 28/12/19]
Background subtraction - a list of datasets about background subtraction (Thierry BOUWMANS) [Before 28/12/19]
CAMO-UOW Dataset - 10 high resolution videos captured in real scenes for camouflaged background subtraction (Shuai Li and Wanqing Li) [Before 28/12/19]
CCTV-Fights - 1,000 videos picturing real-world fights, recorded from CCTVs or mobile cameras, and temporally annotated at the frame level. (Mauricio Perez, ROSE Lab, NTU) [Before 28/12/19]
CMUSRD: Surveillance Research Dataset - multi-camera video for indoor surveillance scenario (K. Hattori, H. Hattori, et al) [Before 28/12/19]
DukeMTMC: Duke Multi-Target Multi-Camera tracking dataset - 8 cameras, 85 min of video, 2M frames, 2000 people (Ergys Ristani, Francesco Solera, Roger S. Zou, Rita Cucchiara, Carlo Tomasi) [Before 28/12/19]
DukeMTMC-reID - A subset of the DukeMTMC for image-based person re-identification (8 cameras, 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images.) (Zheng, Zheng, and Yang) [Before 28/12/19]
ETISEO Video Surveillance Download Datasets (INRIA Orion Team and others) [Before 28/12/19]
FMO dataset - FMO dataset contains annotated video sequences with Fast Moving Objects - objects which move over a projected distance larger than their size in one frame. (Denys Rozumnyi, Jan Kotera, Lukas Novotny, Ales Hrabalik, Filip Sroubek, Jiri Matas) [Before 28/12/19]
HDA+ Multi-camera Surveillance Dataset - video from a network of 18 heterogeneous cameras (different resolutions and frame rates) distributed over 3 floors of a research institute with 13 fully labeled sequences, 85 persons, and 64028 bounding boxes of persons. (D. Figueira, M. Taiana, A. Nambiar, J. Nascimento and A. Bernardino) [Before 28/12/19]
Human click data - 20K human clicks on a tracking target (including click errors) (Zhu and Porikli) [Before 28/12/19]
Immediacy Dataset - This dataset is designed for estimating personal relationships. (Xiao Chu et al.) [Before 28/12/19]
MAHNOB Databases - including Laughter Database, HCI-tagging Database, MHI-Mimicry Database (M. Pantic et al.) [Before 28/12/19]
Moving INfants In RGB-D (MINI-RGBD) - A synthetic, realistic RGB-D data set for infant pose estimation containing 12 sequences of moving infants with ground truth joint positions. (N. Hesse, C. Bodensteiner, M. Arens, U. G. Hofmann, R. Weinberger, A. S. Schroeder) [Before 28/12/19]
MSMT17 - Person re-identification dataset. 180 hours of videos, 12 outdoor cameras, 3 indoor cameras, and 12 time slots. (Wei Longhui, Zhang Shiliang, Gao Wen, Tian Qi) [Before 28/12/19]
MULTIDRONE boat detection/tracking - 3 HD videos (720p - 1280 x 720) subsampled at 25 fps (Mademlis) [Before 28/12/19]
MVHAUS-PI - a multi-view human interaction recognition dataset (Saeid et al.) [Before 28/12/19]
Multispectral visible-NIR video sequences - Annotated multispectral video, visible + NIR (LE2I, Université de Bourgogne) [Before 28/12/19]
Openvisor - Video surveillance Online Repository (Univ of Modena and Reggio Emilia) [Before 28/12/19]
Parking-Lot dataset - Parking-Lot dataset is a car dataset which focuses on moderate and heavy occlusions of cars in the parking lot scenario. (B. Li, T.F. Wu and S.C. Zhu) [Before 28/12/19]
Pornography Database - The Pornography database is a pornography detection dataset containing nearly 80 hours of 400 pornographic and 400 non-pornographic videos extracted from pornography websites and Youtube. (Avila, Thome, Cord, Valle, de Araujo) [Before 28/12/19]
Princeton Tracking Benchmark - 100 RGBD tracking datasets (Song and Xiao) [Before 28/12/19]
QMUL Junction Dataset 1 and 2 - Videos of busy road junctions. Supports anomaly detection tasks. (T. Hospedales Edinburgh/QMUL) [Before 28/12/19]
Queen Mary Multi-Camera Distributed Traffic Scenes Dataset (QMDTS) - The QMDTS is collected from an urban surveillance environment for the study of surveillance behaviours in distributed scenes. (Dr. Xun Xu, Prof. Shaogang Gong and Dr. Timothy Hospedales) [Before 28/12/19]
Road Anomaly Detection - 22km, 11 vehicles, normal + 4 defect categories (Hameed, Mazhar, Hassan) [Before 28/12/19]
S-Hock dataset - A new Benchmark for Spectator Crowd Analysis. (Francesco Setti, Davide Conigliaro, Paolo Rota, Chiara Bassetti, Nicola Conci, Nicu Sebe, Marco Cristani) [Before 28/12/19]
SALSA: Synergetic sociAL Scene Analysis - A Novel Dataset for Multimodal Group Behavior Analysis (Xavier Alameda-Pineda et al.) [Before 28/12/19]
SBMnet (Scene Background Modeling.NET) - A dataset for testing background estimation algorithms (Jodoin, Maddalena, and Petrosino) [Before 28/12/19]
SBM-RGBD dataset - 35 Kinect indoor RGBD videos to evaluate and compare scene background modelling methods for moving object detection (Camplani, Maddalena, Moyà Alcover, Petrosino, Salgado) [Before 28/12/19]
SCOUTER - video surveillance ground truthing (shifting perspectives, different setups/lighting conditions, large variations of subject). 30 videos and approximately 36,000 manually labeled frames. (Catalin Mitrea) [Before 28/12/19]
SJTU-BEST - A surveillance-specific dataset platform with a realistic, diverse set of surveillance images and videos captured by in-use cameras (Shanghai Jiao Tong University) [Before 28/12/19]
SPEVI: Surveillance Performance EValuation Initiative (Queen Mary University London) [Before 28/12/19]
Shinpuhkan 2014 - A Person Re-identification dataset containing 22,000 images of 24 people captured by 16 cameras. (Yasutomo Kawanishi et al.) [Before 28/12/19]
Stanford Drone Dataset - 60 images and videos of various types of agents (not just pedestrians, but also bicyclists, skateboarders, cars, buses, and golf carts) that navigate in a real world outdoor environment such as a university campus (Robicquet, Sadeghian, Alahi, Savarese) [Before 28/12/19]
Stuttgart Artificial Background Subtraction Dataset [Before 28/12/19]
Tracking in extremely cluttered scenes - this single object tracking dataset has 28 highly cluttered sequences with per frame annotation (Jingjing Xiao, Linbo Qiao, Rustam Stolkin, Aleš Leonardis) [Before 28/12/19]
TrackingNet - Large-scale dataset for tracking in the wild: more than 30k annotated sequences for training, more than 500 sequestered sequences for testing, evaluation server and leaderboard for fair ranking. (Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi and Bernard Ghanem) [Before 28/12/19]
UCF-Crime Dataset: Real-world Anomaly Detection in Surveillance Videos - A large-scale dataset for real-world anomaly detection in surveillance videos. It consists of 1900 long and untrimmed real-world surveillance videos (of 128 hours), with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. (Center for Research in Computer Vision, University of Central Florida) [Before 28/12/19]
UCLA Aerial Event Dataset - Human activities in aerial videos with annotations of people, objects, social groups, activities and roles (Shu, Xie, Rothrock, Todorovic, and Zhu) [Before 28/12/19]
UCSD Anomaly Detection Dataset - a stationary camera mounted at an elevation, overlooking pedestrian walkways, with unusual pedestrian or non-pedestrian motion. [Before 28/12/19]
UCSD trajectory clustering and analysis datasets (Morris and Trivedi) [Before 28/12/19]
USC Information Sciences Institute's ATOMIC PAIR ACTIONS dataset (Ricky J. Sethi et al.) [Before 28/12/19]
Udine Trajectory-based anomalous event detection dataset - synthetic trajectory datasets with outliers (Univ of Udine Artificial Vision and Real Time Systems Laboratory) [Before 28/12/19]
Visual Tracker Benchmark - 100 object tracking sequences with ground truth and benchmark evaluation, including tracking results from a number of trackers (Wu, Lim, Yang) [Before 28/12/19]
WIDER Attribute Dataset - WIDER Attribute is a large-scale human attribute dataset, with 13789 images belonging to 30 scene categories, and 57524 human bounding boxes each annotated with 14 binary attributes. (Li, Yining and Huang, Chen and Loy, Chen Change and Tang, Xiaoou) [Before 28/12/19]

Textures

Brodatz Texture, Normalized Brodatz Texture, Colored Brodatz Texture, Multiband Brodatz Texture - 154 new images plus 112 original images with various transformations (A. Safia, D. He) [Before 28/12/19]
Color texture images by category (textures.forrest.cz) [Before 28/12/19]
Columbia-Utrecht Reflectance and Texture Database (Columbia & Utrecht Universities) [Before 28/12/19]
DynTex: Dynamic texture database (Renaud Péteri, Mark Huiskes and Sandor Fazekas) [Before 28/12/19]
Houses dataset - Benchmark dataset for house prices that contains both visual and textual information about 535 houses. (Ahmed, Eman and Moustafa, Mohamed) [Before 28/12/19]
Intrinsic Images in the Wild (IIW) - Intrinsic Images in the Wild, is a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes (Sean Bell, Kavita Bala, Noah Snavely) [Before 28/12/19]
KTH TIPS & TIPS2 textures - pose/lighting/scale variations (Eric Hayman) [Before 28/12/19]
Materials in Context (MINC) - The Materials in Context Database (MINC) builds on OpenSurfaces, but includes millions of point annotations of material labels. (Sean Bell, Paul Upchurch, Noah Snavely, Kavita Bala) [Before 28/12/19]
OpenSurfaces - OpenSurfaces consists of tens of thousands of examples of surfaces segmented from consumer photographs of interiors, and annotated with material parameters, texture information, and contextual information. (Kavita Bala et al.) [Before 28/12/19]
Oulu Texture Database (Oulu University) [Before 28/12/19]
Oxford Describable Textures Dataset - 5640 images in 47 categories (M.Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, A. Vedaldi) [Before 28/12/19]
Prague Texture Segmentation Data Generator and Benchmark (Mikes, Haindl) [Before 28/12/19]
Salzburg Texture Image Database (STex) - a large collection of 476 color texture images that have been captured around Salzburg, Austria. (Roland Kwitt and Peter Meerwald) [Before 28/12/19]
Synthetic SVBRDFs and renderings - The dataset contains 200000 renderings of 20000 different materials associated with their ground truth representation in the Cook-Torrance model. Distributed under a research-only, non-commercial-use license. ("GraphDeco" team, Inria) [Before 28/12/19]
Texture Database - The texture database features 25 texture classes, 40 samples each (Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce) [Before 28/12/19]
Uppsala texture dataset of surfaces and materials - fabrics, grains, etc. [Before 28/12/19]
Vision Texture (MIT Media Lab) [Before 28/12/19]

Urban Datasets

Barcelona - 15,150 images, urban views of Barcelona (Tighe and Lazebnik) [Before 28/12/19]
Cityscapes - a large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. (Cityscapes Team) [Before 28/12/19]
CMP Facade Database - Includes 606 rectified images of facades from various places with 12 architectural classes annotated. (Radim Tylecek) [Before 28/12/19]
DeepGlobe Satellite Image Understanding Challenge - Datasets and evaluation platforms for three deep learning tasks on satellite images: road extraction, building detection, and land type classification. (Demir, Ilke and Koperski, Krzysztof and Lindenbaum, David and Pang, Guan and Huang, Jing and Basu, Saikat and Hughes, Forest and Tuia, Devis and Raskar, Ramesh) [Before 28/12/19]
DroNet: Learning to Fly by Driving - Videos from a bicycle with labeled collision data used for learning to predict potentially dangerous situations for vehicles. (Loquercio, Maqueda, Del Blanco, Scaramuzza) [Before 28/12/19]
European Flood 2013 - 3,710 images of a flood event in central Europe, annotated with relevance regarding 3 image retrieval tasks (multi-label) and important image regions. (Friedrich Schiller University Jena, Deutsches GeoForschungsZentrum Potsdam) [Before 28/12/19]
Houses dataset - Benchmark dataset for house prices that contains both visual and textual information about 535 houses. (Ahmed, Eman and Moustafa, Mohamed) [Before 28/12/19]
LM+SUN - 45,676 images, mainly urban or human related scenes (Tighe and Lazebnik) [Before 28/12/19]
MIT CBCL StreetScenes Challenge Framework (Stan Bileschi) [Before 28/12/19]
Queen Mary Multi-Camera Distributed Traffic Scenes Dataset (QMDTS) - The QMDTS is collected from an urban surveillance environment for the study of surveillance behaviours in distributed scenes. (Dr. Xun Xu, Prof. Shaogang Gong and Dr. Timothy Hospedales) [Before 28/12/19]
Robust Global Translations with 1DSfM - the numerical data describing global structure from motion problems for each dataset (Kyle Wilson and Noah Snavely) [Before 28/12/19]
Sift Flow (also known as LabelMe Outdoor, LMO) - 2688 images, mainly outdoor natural and urban (Tighe and Lazebnik) [Before 28/12/19]
Street-View Change Detection with Deconvolutional Networks - Database with aligned image pairs from street-view imagery with structural, lighting, weather and seasonal changes. (Pablo F. Alcantarilla, Simon Stent, German Ros, Roberto Arroyo and Riccardo Gherardi) [Before 28/12/19]
SydneyHouse - Streetview house images with accurate 3D house shape, facade object label, dense point correspondence, and annotation toolbox. (Hang Chu, Shenlong Wang, Raquel Urtasun, Sanja Fidler) [Before 28/12/19]
Traffic Signs Dataset - recording sequences from over 350 km of Swedish highways and city roads (Fredrik Larsson) [Before 28/12/19]
nuTonomy scenes dataset (nuScenes) - The nuScenes dataset is a large-scale autonomous driving dataset. It features: Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS), 1000 scenes of 20s each, 1,440,000 camera images, 400,000 lidar sweeps, two diverse cities: Boston and Singapore, left versus right hand traffic, detailed map information, manual annotations for 25 object classes, 1.1M 3D bounding boxes annotated at 2Hz, attributes such as visibility, activity and pose. (Caesar et al) [Before 28/12/19]
TUM City Campus - Urban point clouds taken by Mobile Laser Scanning (MLS) for classification, object extraction and change detection (Stilla, Hebel, Xu, Gehrung) [3/1/20]

Vision and Natural Language

INRIA BL-database - an audio-visual speech corpus for multimodal automatic speech recognition, audio/visual synchronization or speech-driven lip animation systems (Benezeth, Bachman, Lejan, Souviraa-Labastie, Bimbot) [Before 28/12/19]
CrisisMMD: Multimodal Twitter Datasets from Natural Disasters - The CrisisMMD multimodal Twitter dataset consists of several thousand manually annotated tweets and images collected during seven major natural disasters, including earthquakes, hurricanes, wildfires, and floods, that happened in 2017 across different parts of the world. (Firoj Alam, Ferda Ofli, Muhammad Imran) [Before 28/12/19]
DAQUAR - A dataset of human question answer pairs about images, which manifests our vision on a Visual Turing Test. (Mateusz Malinowski, Mario Fritz) [Before 28/12/19]
Dataset of Structured Queries and Spatial Relations - Dataset of structured queries about images with an emphasis on spatial relations. (Mateusz Malinowski, Mario Fritz) [Before 28/12/19]
DVQA: Understanding Data Visualization through Question Answering - a dataset for VQA about bar charts: 3 types of questions, 300,000 images, 3,487,194 question-answer pairs, detailed metadata (Kafle, Cohen, Price, Kanan) [Before 28/12/19]
Multimodal Ferramenta dataset - 88010 images belonging to 52 classes described using more than 20K different words (Gallo, Calefati, Nawaz) [Before 28/12/19]
FigureQA - a dataset for VQA about bar and pie charts, and numerical graphs: 100,000 images, 1,327,368 question-answer pairs, 100 colors and figure plot element names, 15 question types (Kahou, Michalski, Atkinson, Kadar, Trischler, Bengio) [Before 28/12/19]
Hannah and her sisters database - a dense audio-visual person-oriented ground-truth annotation of faces, speech segments, shot boundaries (Patrick Perez, Technicolor) [Before 28/12/19]
Large scale Movie Description Challenge (LSMDC) - A large scale dataset and challenge for movie description, including over 128K video-sentence pairs, mainly sourced from Audio Description (also known as DVS). (Rohrbach, Torabi, Rohrbach, Tandon, Pal, Larochelle, Courville and Schiele) [Before 28/12/19]
MPII dataset - A dataset about correcting inaccurate sentences based on the videos. (Amir Mazaheri) [Before 28/12/19]
MPI Movie Description dataset - text and video - A dataset of movie clips associated with natural language descriptions sourced from movie scripts and Audio Description. (Rohrbach, Rohrbach, Tandon and Schiele) [Before 28/12/19]
nocaps - a large-scale benchmark for novel object captioning; the task of describing images containing visual concepts not seen in paired image-caption training data (Agrawal, Desai, Wang, Chen, Jain, Johnson, Batra, Parikh, Lee, Anderson) [2/1/20]
Recipe1M - A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images - Recipe1M is a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. (Javier Marin, Aritro Biswas, Ferda Ofli, Nicholas Hynes, Amaia Salvador, Yusuf Aytar, Ingmar Weber, Antonio Torralba) [Before 28/12/19]
Room to Room (R2R) dataset for Vision-and-Language Navigation - A corpus of visually-grounded natural language navigation instructions paired with trajectories in reconstructed indoor buildings from the Matterport3D dataset (Anderson, Wu, Teney, Bruce, Johnson, Sunderhauf, Reid, Gould, van den Hengel) [2/1/20]
SemArt dataset - A dataset for semantic art understanding, including 21,384 fine-art painting images with attributes and artistic comments. (Noa Garcia, George Vogiatzis) [Before 28/12/19]
SpatialSense - a dataset of spatial relations in 2D images, which is constructed with the goal of reducing dataset bias and sampling more challenging relations in the long tail (Yang, Russakovsky, Deng) [2/1/20]
TACoS Multi-Level Corpus - Dataset of cooking videos associated with natural language descriptions at three levels of detail (long, short and single sentence). (Rohrbach, Rohrbach, Qiu, Friedrich, Pinkal and Schiele) [Before 28/12/19]
TallyQA - The largest dataset for open-ended counting as of 2018; it includes test sets that evaluate both simple and more advanced capabilities. (Manoj Acharya, Kushal Kafle, Christopher Kanan) [Before 28/12/19]
TDIUC (Task-driven image understanding) - As of 2018, this is the largest VQA dataset, and it facilitates analysis for 12 kinds of questions. (Kushal Kafle, Christopher Kanan) [Before 28/12/19]
TGIF - 100K animated GIFs from Tumblr and 120K natural language descriptions. (Li, Song, Cao, Tetreault, Goldberg, Jaimes, Luo) [Before 28/12/19]
Toronto COCO-QA Dataset - Automatically generated from image captions. 123287 images, 78736 train questions, 38948 test questions, 4 types of questions: object, number, color, location. Answers are all one-word. (Mengye Ren, Ryan Kiros, Richard Zemel) [Before 28/12/19]
Totally Looks Like - A benchmark for assessment of predicting human-based image similarity (Amir Rosenfeld, Markus D. Solbach, John Tsotsos) [Before 28/12/19]
Twitter for Sentiment Analysis (T4SA) - About 1 million tweets (text and associated images) labelled according to the sentiment polarity of the text; the data can be used for sentiment analysis as well as other analysis in the wild, since the tweets were randomly sampled from the stream of all globally produced tweets. (Lucia Vadicamo, Fabio Carrara, Andrea Cimino, Stefano Cresci, Felice Dell'Orletta, Fabrizio Falchi, Maurizio Tesconi) [Before 28/12/19]
UCF-CrossView Dataset: Cross-View Image Matching for Geo-localization in Urban Environments - A new dataset of street view and bird's eye view images for cross-view image geo-localization. (Center for Research in Computer Vision, University of Central Florida) [Before 28/12/19]
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Visual Genome is a dataset, a knowledge base, and an ongoing effort to connect structured image concepts to language. (Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei) [Before 28/12/19]
Visual Relationship Detection with Language Priors - 5000 images, 37,993 relationships, 100 object categories, 70 predicate categories (Lu, Krishna, Bernstein, Fei-Fei) [Before 28/12/19]
VQA: Visual Question Answering - a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. (Yash Goyal, Tejas Khot, Georgia Institute of Technology, Army Research Laboratory, Virginia Tech) [Before 28/12/19]
VQA v1 - VQA: Visual Question Answering - For every image, we collected 3 free-form natural-language questions with 10 concise open-ended answers each. We provide two formats of the VQA task. (Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu) [Before 28/12/19]
YouCook2 - 2000 long YouTube cooking videos, where each recipe step is temporally localized and described by an imperative English sentence. Bounding box annotations are available for the validation & test splits. (Luowei Zhou, Chenliang Xu, and Jason Corso) [Before 28/12/19]
YouTube Movie Summaries - movie summary videos from YouTube, annotated with the correspondence between the video segments and the movie synopsis text at the sentence level and the phrase level. (Pelin Dogan, Boyang Li, Leonid Sigal, Markus Gross) [Before 28/12/19]

Other Collections

4D Light Field Dataset - 24 synthetic scenes with 9x9x512x512x3 input images, depth and disparity ground truth, camera parameters, and evaluation masks. (Katrin Honauer, Ole Johannsen, Daniel Kondermann, Bastian Goldluecke) [Before 28/12/19]
AMADI_LontarSet - Balinese Palm Leaf Manuscript Images Dataset for Binarization, Query-by-Example Word Spotting, and Isolated Character Recognition of Balinese Script. (The AMADI Project et al.) [Before 28/12/19]
Annotated Web Ears Dataset (AWE Dataset) - All images were acquired by cropping ears from images from the internet of known persons. (Ziga Emersic, Vitomir Struc and Peter Peer) [Before 28/12/19]
Biometrics Evaluation and Testing - Evaluation of identification technologies, including Biometrics (European computing e-infrastructure) [Before 28/12/19]
CALVIN research group datasets - object detection with eye tracking, imagenet bounding boxes, synchronised activities, stickman and body poses, youtube objects, faces, horses, toys, visual attributes, shape classes (CALVIN group) [Before 28/12/19]
CANTATA Video and Image Database Index site (Multitel) [Before 28/12/19]
Chinese University of Hong Kong datasets - Face sketch, face alignment, image search, public square observation, occlusion, central station, MIT single and multiple camera trajectories, person re-identification (Multimedia lab) [Before 28/12/19]
Computer Vision Homepage list of test image databases (Carnegie Mellon Univ) [Before 28/12/19]
Computer Vision Lab OCR DataBase (CVL OCR DB) - CVL OCR DB is a public annotated image dataset of 120 binary annotated images of text in natural scenes. (Andrej Ikica and Peter Peer) [Before 28/12/19]
ETHZ various datasets - including ETH 3D head pose, BIWI audiovisual data, ETHZ shape classes, BIWI walking pedestrians, pedestrians, buildings, 4D MRI, personal events, liver ultrasound, Food 101. (ETH Zurich, Computer Vision Lab) [Before 28/12/19]
Event-Camera Dataset - This presents the world's first collection of datasets with an event-based camera for high-speed robotics (E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, D. Scaramuzza) [Before 28/12/19]
Finger Vein USM (FV-USM) Database - An infrared finger image database consisting of finger vein and finger geometry information. (Bakhtiar Affendi Rosdi, Universiti Sains Malaysia) [Before 28/12/19]
General 100 Dataset - General-100 dataset contains 100 bmp-format images (with no compression), which are well-suited for super-resolution training (Dong, Chao and Loy, Chen Change and Tang, Xiaoou) [Before 28/12/19]
GPDS Bengali and Devanagari Synthetic Signature Databases - Dual Off line and On line signature databases of Bengali and Devanagari signatures. (Miguel A. Ferrer, GPDS, ULPGC) [Before 28/12/19]
GPDS Synthetic OnLine and OffLine Signature database - Dual Off line and On line Latin signature database. (Miguel A. Ferrer, GPDS, ULPGC) [Before 28/12/19]
HKU-IS - 4447 images with pixel labeling groundtruth for salient object detection. (Guanbin Li, Yizhou Yu) [Before 28/12/19]
High-res 3D-Models - it includes high-res renderings of these data-sets. (Hubert etc.) [Before 28/12/19]
I3 - Yahoo Flickr Creative Commons 100M - This dataset contains a list of photos and videos. (B. Thomee, D.A. Shamma, G. Friedland et al.) [Before 28/12/19]
The Int. Assoc. for Pattern Recognition's Technical Committee TC11 on Reading Systems index of datasets concerning document text reading [Before 28/12/19]
IDIAP dataset collection - 26 different datasets - multimodal, attack, biometric, cursive characters, discourse, eye gaze, posters, maya codex, MOBIO, face spoofing, game playing, finger vein, youtube-personality traits (IDIAP team) [Before 28/12/19]
Kinect v2 dataset - Dataset for evaluating unwrapping in kinect2 depth decoding (Felix etc.) [Before 28/12/19]
Laval HDR Sky Database - The database contains 800 hemispherical, full HDR photos of the sky that can be used for outdoor lighting analysis. (Jean-Francois Lalonde et al.) [Before 28/12/19]
Leibe's Collection of people/vehicle/object databases (Bastian Leibe) [Before 28/12/19]
Lotus Hill Image Database Collection with Ground Truth (Sealeen Ren, Benjamin Yao, Michael Yang) [Before 28/12/19]
MIT Saliency Benchmark dataset - collection (pointers to 23 datasets) (Bylinskii, Judd, Borji, Itti, Durand, Oliva, Torralba) [Before 28/12/19]
Michael Firman's List of RGBD datasets [Before 28/12/19]
Msspoof: 2D multi-spectral face spoofing - Presentation attack (spoofing) dataset with samples from both real data subjects and spoofed data subjects performed with paper to a NIR and VIS camera (Idiap research institute) [Before 28/12/19]
Multiview Stereo Evaluation - Each dataset is registered with a "ground-truth" 3D model acquired via a laser scanning process (Steve Seitz et al) [Before 28/12/19]
Oxford Misc, including Buffy, Flowers, TV characters, Buildings, etc (Oxford Visual Geometry Group) [Before 28/12/19]
PEIPA Image Database Summary (Pilot European Image Processing Archive) [Before 28/12/19]
PalmVein spoofing - Presentation attack (spoofing) dataset with samples from spoofed data subjects (corresponding to VERA Palmvein) performed with paper (Idiap research institute) [Before 28/12/19]
RSBA dataset - Sequences for evaluating rolling shutter bundle adjustment (Per-Erik etc.) [Before 28/12/19]
Replay Attack:2D face spoofing - Presentation attack (spoofing) dataset with samples from both real data subjects and spoofed data subjects performed with paper, photos and videos from a mobile device to a laptop. (Idiap research institute) [Before 28/12/19]
Replay Mobile:2D face spoofing - Presentation attack (spoofing) dataset with samples from both real data subjects and spoofed data subjects performed with paper, photos and videos to/from a mobile device. (Idiap research institute) [Before 28/12/19]
Synthetic Sequence Generator - Synthetic Sequence Generator (G. Hamarneh) [Before 28/12/19]
USC Annotated Computer Vision Bibliography database publication summary (Keith Price) [Before 28/12/19]
USC-SIPI image databases: texture, aerial, favorites (eg. Lena) (USC Signal and Image Processing Institute) [Before 28/12/19]
Univ of Bern databases on handwriting, online documents, string edit and graph matching (Univ of Bern, Computer Vision and Artificial Intelligence) [Before 28/12/19]
VERA Fingervein spoofing - Presentation attack (spoofing) dataset with samples from spoofed data subjects (corresponding to VERA Fingervein) performed with paper (Idiap research institute) [Before 28/12/19]
VERA Fingervein - Fingervein dataset with data subjects recorded with an open fingervein sensor (Idiap research institute) [Before 28/12/19]
VERA PalmVein - Palmvein dataset with data subjects recorded with an open palmvein sensor (Idiap research institute) [Before 28/12/19]
Vehicle Detection in Aerial Imagery - VEDAI is a dataset for Vehicle Detection in Aerial Imagery, provided as a tool to benchmark automatic target recognition algorithms in unconstrained environments. (Sebastien Razakarivony and Frederic Jurie) [Before 28/12/19]
Video Stacking Dataset - Dataset for evaluating video stacking on cell-phones (Erik Ringaby etc.) [Before 28/12/19]
World from a cat perspective - videos recorded from the head of a freely behaving cat (Belinda Y. Betsch, Wolfgang Einhäuser) [Before 28/12/19]
Wrist-mounted camera video dataset - Activities of Daily Living videos captured from a wrist-mounted camera and a head-mounted camera (Katsunori Ohnishi, Atsushi Kanehira, Asako Kanezaki, Tatsuya Harada) [Before 28/12/19]
Yummly-10k dataset - The goal was to understand human perception, in this case of food taste similarity. (SE(3) Computer Vision Group at Cornell Tech) [Before 28/12/19]

Miscellaneous

3D mesh watermarking benchmark dataset (Guillaume Lavoue) [Before 28/12/19]
4D Light Field Dataset - 24 synthetic scenes with 9x9x512x512x3 input images, depth and disparity ground truth, camera parameters, and evaluation masks. (Katrin Honauer, Ole Johannsen, Daniel Kondermann, Bastian Goldluecke) [Before 28/12/19]
A Dataset for Real Low-Light Image Noise Reduction - It contains pixel and intensity aligned pairs of images corrupted by low-light camera noise and their low-noise counterparts. (J. Anaya, A. Barbu) [Before 28/12/19]
AF 4D dataset - Based on our observations, we settled on 10 representative scenes that are categorized into three types: (1) scenes containing no face (NF), (2) scenes with a face in the foreground (FF), and (3) scenes with faces in the background (FB). For each of these scenes, we allowed different arrangements in terms of textured backgrounds, whether the camera moves, and how many types of objects in the scene change their directions (referred to as motion switches). (Abdullah Abuolaim, York University) [Before 28/12/19]
AMADI_LontarSet - Balinese Palm Leaf Manuscript Images Dataset for Binarization, Query-by-Example Word Spotting, and Isolated Character Recognition of Balinese Script. (The AMADI Project et al.) [Before 28/12/19]
Active Appearance Models datasets (Mikkel B. Stegmann) [Before 28/12/19]
Aircraft tracking (Ajmal Mian) [Before 28/12/19]
Annotated Web Ears Dataset (AWE Dataset) - All images were acquired by cropping ears from images from the internet of known persons. (Ziga Emersic, Vitomir Struc and Peter Peer) [Before 28/12/19]
CITIUS Video Database - A database of 72 videos with eye-tracking data for evaluating dynamic visual saliency models. (Xose) [Before 28/12/19]
CrowdFlow - Optical flow dataset and benchmark for crowd analytics (Gregory Schroeder, Tobias Senst, Erik Bochinski, Thomas Sikora) [Before 28/12/19]
CVSSP 3D data repository - The datasets are designed to evaluate general multi-view reconstruction algorithms. (Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut and Adrian Hilton) [Before 28/12/19]
California-ND - 701 photos from a personal photo collection, including many challenging real-life non-identical near-duplicates (Vassilios Vonikakis) [Before 28/12/19]
Cambridge Motion-based Segmentation and Recognition Dataset (Brostow, Shotton, Fauqueur, Cipolla) [Before 28/12/19]
Catadioptric camera calibration images (Yalin Bastanlar) [Before 28/12/19]
Chars74K dataset - 74 English and Kannada characters (Teo de Campos) [Before 28/12/19]
Coin Image Dataset - The coin image dataset is a dataset of 60 classes of Roman Republican coins (Sebastian Zambanini, Klaus Vondrovec) [Before 28/12/19]
Columbia Camera Response Functions: Database (DoRF) and Model (EMOR) (M.D. Grossberg and S.K. Nayar) [Before 28/12/19]
Columbia Database of Contaminants' Patterns and Scattering Parameters (Jinwei Gu, Ravi Ramamoorthi, Peter Belhumeur, Shree Nayar) [Before 28/12/19]
Conflict Escalation Resolution (CONFER) Database - 120 audio-visual episodes (~142 mins) of naturalistic interactions from televised political debates, annotated frame-by-frame in terms of real-valued conflict intensity. (Christos Georgakis, Yannis Panagakis, Stefanos Zafeiriou, Maja Pantic) [Before 28/12/19]
COVERAGE - copy-move forged (CMFD) images and their originals with similar but genuine objects (SGOs), which highlight and address tamper detection ambiguity of popular methods, caused by self-similarity within natural images (Wen, Zhu, Subramanian, Ng, Shen, and Winkler) [Before 28/12/19]
Crime Scene Footwear Impression Database - crime scene and reference footwear impression images (Adam Kortylewski) [Before 28/12/19]
Curve tracing database for an automatic grading system - The ground truth database of 70 public images used to evaluate our method Bandeirantes and other curve tracing methods in an automatic grading system. (Marcos A. Tejada Condori, Paulo A. V. Miranda) [Before 28/12/19]
D-HAZY - a dataset to evaluate dehazing algorithms quantitatively (Cosmin Ancuti et al.) [Before 28/12/19]
DR(eye)VE - A driver's attention dataset (University of Modena and Reggio Emilia) [Before 28/12/19]
DTU controlled motion and lighting image dataset (135K images) (Henrik Aanaes) [Before 28/12/19]
Database for Visual Eye Movements (DOVES) - A set of eye movements collected from 29 human observers as they viewed 101 natural calibrated images. (van der Linde, I., Rajashekar, U., Bovik, A. C. etc.) [Before 28/12/19]
DeformIt 2.0 - Image Data Augmentation Tool: Simulate novel images with ground truth segmentations from a single image-segmentation pair (Brian Booth and Ghassan Hamarneh) [Before 28/12/19]
Dense outdoor correspondence ground truth datasets, for optical flow and local keypoint evaluation (Christoph Strecha) [Before 28/12/19]
EISATS: .enpeda.. Image Sequence Analysis Test Site (Auckland University Multimedia Imaging Group) [Before 28/12/19]
Featureless object tracking - This dataset contains several video sequences with limited texture, intended for visual tracking, including manually annotated per-frame pose. (Lebeda, Hadfield, Matas, Bowden) [Before 28/12/19]
FlickrLogos-32 - 8240 images of 32 product logos (Stefan Romberg) [Before 28/12/19]
General 100 Dataset - General-100 dataset contains 100 bmp-format images (with no compression), which are well-suited for super-resolution training (Dong, Chao and Loy, Chen Change and Tang, Xiaoou) [Before 28/12/19]
Geometry2view - This dataset contains image pairs for 2-view geometry computation, including manually annotated point coordinates. (Lebeda, Matas, Chum) [Before 28/12/19]
Hannover Region Detector Evaluation Data Set - Feature detector evaluation sequences in multiple image resolutions from 1.5 up to 8 megapixels (Kai Cordes) [Before 28/12/19]
Hillclimb and CubicGlobe datasets - a video of a rally car, separated into several independent shots (for visual tracking and modelling). (Lebeda, Hadfield, Bowden) [Before 28/12/19]
Houston Multimodal Distracted Driving Dataset - 68 volunteers that drove the same simulated highway under four different conditions (Dcosta, Buddharaju, Khatri, and Pavlidis) [Before 28/12/19]
HyperSpectral Salient Object Detection Dataset (HS-SOD Dataset) - Hyperspectral (visible spectrum) image data for benchmarking on salient object detection with a collection of 60 hyperspectral images with their respective ground-truth binary images and representative rendered colour images (rendered in sRGB). (Nevrez Imamoglu, Yu Oishi, Xiaoqiang Zhang, Guanqun Ding, Yuming Fang, Toru Kouyama, Ryosuke Nakamura) [Before 28/12/19]
I3 - Yahoo Flickr Creative Commons 100M - This dataset contains a list of photos and videos. (B. Thomee, D.A. Shamma, G. Friedland et al.) [Before 28/12/19]
ICDAR'15 Smartphone document capture and OCR competition - challenge 2 - pictures of documents captured with smartphones under various conditions of perspective, lighting, etc. The ground truth is the textual content which should be extracted. (Burie, Chazalon, Coustaty, Eskenazi, Luqman, Mehri, Nayef, Ogier, Prum and Rusinol) [Before 28/12/19]
I-HAZE - A dehazing benchmark with real hazy and haze-free indoor images. (ethz) [Before 28/12/19]
Intrinsic Images in the Wild (IIW) - a large-scale public dataset for evaluating intrinsic image decompositions of indoor scenes (Sean Bell, Kavita Bala, Noah Snavely) [Before 28/12/19]
IISc - Dissimilarity between Isolated Objects (IISc-DIO) - The dataset has a total of 26,675 perceived dissimilarity measurements made on 269 human subjects using a Visual Search task with a diverse set of objects. (RT Pramod & SP Arun, IISc) [Before 28/12/19]
INRIA feature detector evaluation sequences (Krystian Mikolajczyk) [Before 28/12/19]
Image/video quality assessment database summary (Stefan Winkler) [Before 28/12/19]
INRIA's PERCEPTION's database of images and videos gathered with several synchronized and calibrated cameras (INRIA Rhone-Alpes) [Before 28/12/19]
KITTI dataset for stereo, optical flow and visual odometry (Geiger, Lenz, Urtasun) [Before 28/12/19]
LabelMe images database and online annotation tool (Bryan Russell, Antonio Torralba, Kevin Murphy, William Freeman) [Before 28/12/19]
Large scale 3D point cloud data from terrestrial LiDAR scanning (Andreas Nuechter) [Before 28/12/19]
LFW-10 dataset for learning relative attributes - A dataset of 10,000 pairs of face images with instance-level annotations for 10 attributes. (CVIT, IIIT Hyderabad) [Before 28/12/19]
Light-field Material Dataset - 1.2k annotated images of 12 material classes taken with the Lytro ILLUM camera (Ting-Chun Wang, Jun-Yan Zhu, Ebi Hiroaki, Manmohan Chandraker, Alexei Efros, Ravi Ramamoorthi) [Before 28/12/19]
Linkoping Rolling Shutter Rectification Dataset (Per-Erik Forssen and Erik Ringaby) [Before 28/12/19]
LIRIS-ACCEDE Dataset - a collection of video excerpts with a large content diversity annotated along affective dimensions (Technicolor) [Before 28/12/19]
MARIS Portofino dataset - A dataset of underwater stereo images depicting cylindrical pipe objects and collected to test object detection and pose estimation algorithms. (RIMLab (Robotics and Intelligent Machines Laboratory), University of Parma.) [Before 28/12/19]
Materials in Context (MINC) - The Materials in Context Database (MINC) builds on OpenSurfaces, but includes millions of point annotations of material labels. (Sean Bell, Paul Upchurch, Noah Snavely, Kavita Bala) [Before 28/12/19]
MASSVIS (Massive Visualization Dataset) - Over 5K different information visualizations from a variety of sources, a subset of which have been categorized, segmented, and come with memorability and eye tracking recordings. (Borkin, Bylinskii, Kim, Oliva, Pfister) [Before 28/12/19]
MPI Sintel Flow Dataset - A data set for the evaluation of optical flow derived from the open source 3D animated short film, Sintel. It has been extended for Stereo and disparity, Depth and camera motion, and Segmentation. (Max Planck Tubingen) [Before 28/12/19]
MPI-Sintel optical flow evaluation dataset (Michael Black) [Before 28/12/19]
MSR-VTT - video to text database of 200K+ video clip/sentence pairs [Before 28/12/19]
Middlebury College stereo vision research datasets (Daniel Scharstein and Richard Szeliski) [Before 28/12/19]
Modelling of 2D Shapes with Ellipses - The dataset contains 4,526 2D shapes included in standard as well as in home-built datasets. (Costas Panagiotakis and Antonis Argyros) [Before 28/12/19]
Multi-FoV - Photo-realistic video sequences that allow benchmarking of the impact of the Field-of-View (FoV) of the camera on various vision tasks. (Zhang, Rebecq, Forster, Scaramuzza) [Before 28/12/19]
Multiview Stereo Evaluation - Each dataset is registered with a "ground-truth" 3D model acquired via a laser scanning process (Steve Seitz et al) [Before 28/12/19]
Multiview stereo images with laser based groundtruth (ESAT-PSI/VISICS,FGAN-FOM,EPFL/IC/ISIM/CVLab) [Before 28/12/19]
Open Video Project (Gary Marchionini, Barbara M. Wildemuth, Gary Geisler, Yaxiao Song) [Before 28/12/19]
NCI Cancer Image Archive - prostate images (National Cancer Institute) [Before 28/12/19]
NIST 3D Interest Point Detection (Helin Dutagaci, Afzal Godil) [Before 28/12/19]
NRCS natural resource/agricultural image database (USDA Natural Resources Conservation Service) [Before 28/12/19]
O-HAZE - A dehazing benchmark with real hazy and haze-free outdoor images. (ethz) [Before 28/12/19]
Object recognition dataset for domain adaptation - Consists of images from 4 different domains: Artistic images, Clip Art, Product images and Real-World images. For each domain, the dataset contains images of 65 object categories found typically in Office and Home settings. (Venkateswara Hemanth, Eusebio Jose, Chakraborty Shayok, Panchanathan Sethuraman) [Before 28/12/19]
Object Removal - Generalized Dynamic Object Removal for Dense Stereo Vision Based Scene Mapping using Synthesised Optical Flow - Evaluation Dataset (Hamilton, O.K., Breckon, Toby P.) [Before 28/12/19]
Occlusion detection test data (Andrew Stein) [Before 28/12/19]
OpenSurfaces - OpenSurfaces consists of tens of thousands of examples of surfaces segmented from consumer photographs of interiors, and annotated with material parameters, texture information, and contextual information. (Kavita Bala et al.) [Before 28/12/19]
OSIE - Object and Semantic Images and Eye-tracking - 700 images, 5551 segmented objects, eye tracking data (Xu, Jiang, Wang, Kankanhalli, Zhao) [Before 28/12/19]
Osnabrück gaze tracking data - 318 video sequences from several different gaze tracking data sets with polygon based object annotation (Schöning, Faion, Heidemann, Krumnack, Gert, Açik, Kietzmann, Heidemann & König) [Before 28/12/19]
OTIS: Open Turbulent Image Set - several sequences (either static or dynamic) of long distance imaging through a turbulent atmosphere (Jerome Gilles, Nicholas B. Ferrante) [Before 28/12/19]
PanoNavi dataset - A panoramic dataset for robot navigation, consisting of 5 videos lasting about 1 hour. (Lingyan Ran) [Before 28/12/19]
PetroSurf3D - 26 high resolution (sub-millimeter accuracy) 3D scans of rock art with pixelwise labeling of petroglyphs for segmentation (Poier, Seidl, Zeppelzauer, Reinbacher, Schaich, Bellandi, Marretta, Bischof) [Before 28/12/19]
PHOS (illumination invariance dataset) - 15 scenes captured under different illumination conditions * 15 images (Vassilios Vonikakis) [Before 28/12/19]
PIRM - perceptual quality of super-resolution benchmark (Blau, Y., Mechrez, R., Timofte, R., Michaeli, T., Zelnik-Manor, L) [Before 28/12/19]
PittsStereo-RGBNIR - A large RGB-NIR stereo dataset collected in Pittsburgh with challenging materials. (Tiancheng Zhi, Bernardo R. Pires, Martial Hebert and Srinivasa G. Narasimhan) [Before 28/12/19]
PRINTART: Artistic images of prints of well known paintings, including detail annotations. A benchmark for automatic annotation and retrieval tasks with this database was published at ECCV. (Nuno Miguel Pinho da Silva) [Before 28/12/19]
Pics 'n' Trails - Dataset of Continuously archived GPS and digital photos (Gamhewage Chaminda de Silva) [Before 28/12/19]
Pitt Image and Video Advertisement Understanding - rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is prompted to take and the reasoning that the ad presents to persuade the viewer (Hussain, Zhang, Zhang, Ye, Thomas, Agha, Ong, Kovashka (University of Pittsburgh)) [Before 28/12/19]
RAWSEEDS SLAM benchmark datasets (Rawseeds Project) [Before 28/12/19]
ROMA (ROad MArkings) : Image database for the evaluation of road markings extraction algorithms (Jean-Philippe Tarel, et al) [Before 28/12/19]
Robotic 3D Scan Repository - 3D point clouds from robotic experiments of scenes (Osnabruck and Jacobs Universities) [Before 28/12/19]
Rolling Shutter Rectification Dataset - Rectifying rolling shutter video from hand-held devices (Per-Erik etc.) [Before 28/12/19]
RRC-60 Roman Republican Coins dataset - contains 6000 images of obverse and corresponding 6000 images of reverse sides of 60 coin types from Roman Republican period (Sinem Aslan) [3/1/20]
SALICON - Saliency in Context eye tracking dataset c. 1000 images with eye-tracking data in 80 image classes. (Jiang, Huang, Duan, Zhao) [Before 28/12/19]
Scripps Plankton Camera System - thousands of images of c. 50 classes of plankton and other small marine objects (Jaffe et al) [Before 28/12/19]
ScriptNet: ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI) - The dataset consists of 4782 handwritten pages written by more than 1100 writers and dating from the 13th to 20th century. (Fiel Stefan, Kleber Florian, Diem Markus, Christlein Vincent, Louloudis Georgios, Stamatopoulos Nikos, Gatos Basili) [Before 28/12/19]
Seam Carving JPEG Image Database - Our seam-carving-based forgery database contains 500 untouched JPEG images and 500 JPEG images that were manipulated by seam-carving, both at the quality of 75 (Qingzhong Liu) [Before 28/12/19]
SIDIRE: Synthetic Image Dataset for Illumination Robustness Evaluation - SIDIRE is a freely available dataset of synthetically generated images for investigating the influence of illumination changes on object appearance (Sebastian Zambanini) [Before 28/12/19]
Smartphone document capture and OCR 2015 - Quality Assessment - pictures of documents captured with smartphones under various conditions (perspective, lighting, etc.). It also features text ground truth and OCR accuracies to train and test document image quality assessment systems. (Nayef, Luqman, Prum, Eskenazi, Chazalon, and Ogier) [Before 28/12/19]
Smartphone document capture and OCR 2017 - mobile video capture - video recording of documents, along with the reference ground truth image to reconstruct using the video stream. (Chazalon, Gomez-Krämer, Burie, Coustaty, Eskenazi, Luqman, Nayef, Rusiñol, Sidère, and Ogier) [Before 28/12/19]
Stony Brook University Real-World Clutter Dataset (SBU-RwC90) - Images with different levels of clutter, ranked by humans (Chen-Ping Yu, Dimitris Samaras, Gregory Zelinsky) [Before 28/12/19]
Street-View Change Detection with Deconvolutional Networks - Database with aligned image pairs from street-view imagery with structural, lighting, weather and seasonal changes. (Pablo F. Alcantarilla, Simon Stent, German Ros, Roberto Arroyo and Riccardo Gherardi) [Before 28/12/19]
SydneyHouse - Streetview house images with accurate 3D house shape, facade object label, dense point correspondence, and annotation toolbox. (Hang Chu, Shenlong Wang, Raquel Urtasun, Sanja Fidler) [Before 28/12/19]
SYNTHIA - Large set (~half million) of virtual-world images for training autonomous cars to see. (ADAS Group at Computer Vision Center) [Before 28/12/19]
Stony Brook University Shadow Dataset (SBU-Shadow5k) - Large scale shadow detection dataset from a wide variety of scenes and photo types, with human annotations (Tomas F.Y. Vicente, Le Hou, Chen-Ping Yu, Minh Hoai, Dimitris Samaras) [Before 28/12/19]
Technicolor Interestingness Dataset - a collection of movie excerpts and key-frames and their corresponding ground-truth files based on the classification into interesting and non-interesting samples (Technicolor) [Before 28/12/19]
Technicolor Hannah Dataset - 153,825 frames from the movie "Hannah and her sisters" annotated for several types of audio and visual information (Technicolor) [Before 28/12/19]
Technicolor HR-EEG4EMO Dataset - EEG and other physiological recordings of 40 subjects collected during the viewing of neutral and emotional videos (Technicolor) [Before 28/12/19]
Technicolor VSD Violent Scenes Dataset - a collection of ground-truth files based on the extraction of violent events in movies (Technicolor) [Before 28/12/19]
TMAGIC dataset - Several video sequences for visual tracking, containing strong out-of-plane rotation (Lebeda, Hadfield, Bowden) [Before 28/12/19]
Totally Looks Like - A benchmark for assessment of predicting human-based image similarity (Amir Rosenfeld, Markus D. Solbach, John Tsotsos) [Before 28/12/19]
Toulouse Vanishing Points Dataset - a dataset of Manhattan scenes for vanishing point estimation which also provides, for each image, the IMU data of the camera orientation. (Vincent Angladon and Simone Gasparini) [Before 28/12/19]
TUM RGB-D Benchmark - Dataset and benchmark for the evaluation of RGB-D visual odometry and SLAM algorithms (Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard and Daniel Cremers) [Before 28/12/19]
UCL Ground Truth Optical Flow Dataset (Oisin Mac Aodha) [Before 28/12/19]
Underwater Single Image Color Restoration - A dataset of forward-looking underwater images, enabling a quantitative evaluation of color restoration using color charts at different distances and ground truth distances using stereo imaging. (Berman, Levy, Avidan, Treibitz) [Before 28/12/19]
Univ of Genoa Datasets for disparity and optic flow evaluation (Manuela Chessa) [Before 28/12/19]
Validation and Verification of Neural Network Systems (Francesco Vivarelli) [Before 28/12/19]
Very Long Baseline Interferometry Image Reconstruction Dataset (MIT CSAIL) [Before 28/12/19]
Virtual KITTI - 40 high-resolution videos (17,008 frames) generated from five different virtual worlds, for object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation (Gaidon, Wang, Cabon, Vig) [Before 28/12/19]
Visual Object Tracking challenge - This challenge is held annually as an ICCV/ECCV workshop, with a new dataset and an updated evaluation kit every year. (Kristan et al.) [Before 28/12/19]
WHOI-Plankton - 3.5 million images of microscopic marine plankton in 103 categories (Olson, Sosik) [Before 28/12/19]
WILD: Weather and Illumination Database (S. Narasimhan, C. Wang, S. Nayar, D. Stolyarov, K. Garg, Y. Schechner, H. Peri) [Before 28/12/19]
YACCLAB dataset - YACCLAB dataset includes both synthetic and real binary images (Grana, Costantino; Bolelli, Federico; Baraldi, Lorenzo; Vezzani, Roberto) [Before 28/12/19]
YtLongTrack - This dataset contains two video sequences with challenges such as low quality, extreme length and full occlusions, including manually annotated per-frame pose. (Lebeda, Hadfield, Matas, Bowden) [Before 28/12/19]

英国老鼠_

