[Computer Vision | Image Classification] Commonly used datasets for image classification and their introduction (1)

1. CIFAR-10

The CIFAR-10 dataset (Canadian Institute For Advanced Research, 10 classes) is a labeled subset of the Tiny Images dataset and consists of 60,000 32x32 color images. Each image is labeled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6,000 images per class: 5,000 training images and 1,000 test images.

The criteria for determining whether an image belongs to a certain class are as follows:

The class name should be high on the list of possible answers to the question "What's in this picture?"
Images should be realistic. Labelers were instructed to reject line drawings.
The image should contain only one prominent instance of the object referenced by the class. The object may be partially occluded or seen from an unusual angle, as long as its identity remains clear to the labeler.
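
In the python version of the download, each batch stores every image as a flat row of 3,072 uint8 values: the first 1,024 entries are the red channel, then green, then blue, each in row-major order (this layout is documented in the dataset's own README). A minimal sketch of turning one such row back into a 32x32x3 image:

```python
import numpy as np

def cifar_row_to_image(row: np.ndarray) -> np.ndarray:
    """Convert one 3072-value CIFAR row (R plane, G plane, B plane,
    each row-major) into a 32x32x3 height-width-channel image."""
    assert row.shape == (3072,)
    # (channel, row, col) -> (row, col, channel)
    return row.reshape(3, 32, 32).transpose(1, 2, 0)

# Usage with a synthetic row:
row = np.arange(3072, dtype=np.uint8)
img = cifar_row_to_image(row)
print(img.shape)  # (32, 32, 3)
```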

2. ImageNet

The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010, the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark for image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, but their annotations are withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotations, binary labels indicating whether an object class is present in the image, such as "there are cars in this image" or "there are no tigers"; and (2) object-level annotations, a tight bounding box and class label around each object instance in the image, for example, "there is a screwdriver centered at (20, 25) with a width of 50 pixels and a height of 30 pixels." The ImageNet project does not own the copyright to the images and therefore distributes only thumbnails and URLs of the images.

Total number of non-empty WordNet synsets: 21,841
Total number of images: 14,197,122
Number of images with bounding box annotations: 1,034,908
Number of synsets with SIFT features: 1,000
Number of images with SIFT features: 1.2 million
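
Synsets are identified by WordNet IDs such as n02084071, and releases commonly ship a plain-text mapping file pairing each wnid with its human-readable names on one line. A hedged sketch of parsing such a file (the exact filename and delimiter vary between releases, so the parser splits on the first run of whitespace):

```python
def parse_synset_mapping(lines):
    """Parse 'wnid  comma-separated names' lines into {wnid: [names]}.
    The one-line-per-synset layout is an assumption based on common
    ImageNet mapping-file distributions."""
    mapping = {}
    for line in lines:
        wnid, names = line.strip().split(maxsplit=1)
        mapping[wnid] = [n.strip() for n in names.split(",")]
    return mapping

sample = ["n02084071\tdog, domestic dog, Canis familiaris"]
mapping = parse_synset_mapping(sample)
print(mapping["n02084071"])  # ['dog', 'domestic dog', 'Canis familiaris']
```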

3. MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is drawn from the larger NIST Special Database 3 (digits written by U.S. Census Bureau employees) and Special Database 1 (digits written by high school students), which contain binary images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black-and-white (bilevel) NIST images were size-normalized to fit a 20x20 pixel box while preserving their aspect ratio; the resulting images contain gray levels due to the anti-aliasing used by the normalization algorithm. Each image was then centered in a 28x28 field by computing the center of mass of the pixels and translating the image so that this point lies at the center of the field.
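
The centering step described above can be sketched with NumPy. This is a minimal sketch assuming integer-pixel shifts; the real pipeline also performs the anti-aliased 20x20 size normalization first:

```python
import numpy as np

def center_by_mass(digit: np.ndarray, size: int = 28) -> np.ndarray:
    """Paste a small grayscale digit into a size x size canvas, shifted
    so its pixel center of mass lands at the canvas center (integer
    shifts only; a sketch of the MNIST centering step, not the full
    normalization pipeline)."""
    h, w = digit.shape
    ys, xs = np.indices((h, w))
    total = digit.sum()
    cy = (ys * digit).sum() / total  # center of mass, row coordinate
    cx = (xs * digit).sum() / total  # center of mass, column coordinate
    # top-left corner that maps (cy, cx) onto the canvas center, clamped
    top = min(max(int(round((size - 1) / 2 - cy)), 0), size - h)
    left = min(max(int(round((size - 1) / 2 - cx)), 0), size - w)
    canvas = np.zeros((size, size), dtype=digit.dtype)
    canvas[top:top + h, left:left + w] = digit
    return canvas

digit = np.zeros((20, 20))
digit[10, 10] = 1.0  # a single bright pixel
out = center_by_mass(digit)
print(out.shape)  # (28, 28)
```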

4. CIFAR-100

The CIFAR-100 dataset (Canadian Institute For Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. The 100 classes in CIFAR-100 are grouped into 20 superclasses, with 600 images per class. Each image carries a "fine" label (the class it belongs to) and a "coarse" label (its superclass). There are 500 training images and 100 test images per class.

The criteria for determining whether an image belongs to a certain class are as follows:

The class name should be high on the list of possible answers to the question "What's in this picture?"
Images should be realistic. Labelers were instructed to reject line drawings.
The image should contain only one prominent instance of the object referenced by the class.
The object may be partially occluded or seen from an unusual angle, as long as its identity remains clear to the labeler.
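
In the python version of CIFAR-100, each batch carries parallel per-image 'fine_labels' and 'coarse_labels' lists; since every fine class belongs to exactly one superclass, a fine-to-coarse lookup table can be rebuilt from them. A minimal sketch (the key names are taken from the dataset's python release):

```python
import numpy as np

def fine_to_coarse_map(fine_labels, coarse_labels):
    """Build an array mapping each fine label to its superclass label,
    given the parallel per-image label lists from a CIFAR-100 batch."""
    fine = np.asarray(fine_labels)
    coarse = np.asarray(coarse_labels)
    mapping = np.full(fine.max() + 1, -1, dtype=int)
    mapping[fine] = coarse  # each fine class always pairs with one superclass
    return mapping

# Synthetic example: 4 images, 3 fine classes in 2 superclasses
m = fine_to_coarse_map([0, 1, 2, 0], [0, 0, 1, 0])
print(m)  # [0 0 1]
```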

5. SVHN (Street View House Numbers)

Street View House Numbers (SVHN) is a digit-classification benchmark dataset containing 600,000 32×32 RGB images of printed digits (0 through 9) cropped from photographs of house-number plates. Each crop is centered on the digit of interest, but neighboring digits and other distractors remain in the image. SVHN has three subsets: a training set, a test set, and an extra set of 530,000 less difficult images that can be used as additional training data.
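
The cropped-digits release ships as MATLAB .mat files in which the image array X has shape (32, 32, 3, N) and the label vector y uses 10 for the digit 0. A sketch of converting to a conventional (N, H, W, C) layout with zero-based labels (array shapes assumed from the standard cropped-digits .mat files):

```python
import numpy as np

def svhn_to_standard(X, y):
    """Move the sample axis of SVHN's (32, 32, 3, N) array to the front
    and remap the label 10 (digit 0 in the .mat files) to 0."""
    images = np.transpose(X, (3, 0, 1, 2))  # -> (N, 32, 32, 3)
    labels = y.reshape(-1) % 10             # 10 -> 0, others unchanged
    return images, labels

# Synthetic check:
X = np.zeros((32, 32, 3, 2), dtype=np.uint8)
y = np.array([[10], [3]])
imgs, labels = svhn_to_standard(X, y)
print(imgs.shape, labels.tolist())  # (2, 32, 32, 3) [0, 3]
```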

6. CelebA (CelebFaces Attributes Dataset)

The CelebFaces Attributes dataset contains 202,599 face images of size 178×218 from 10,177 celebrities. Each image is annotated with 40 binary labels indicating facial attributes such as hair color, gender, and age.
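
The 40 binary attributes are distributed as a plain-text file (list_attr_celeba.txt): the first line is the image count, the second the attribute names, and each following row a filename plus 40 values of ±1. A sketch of parsing that layout (assumed from the standard CelebA annotation release):

```python
def parse_celeba_attrs(lines):
    """Parse the list_attr_celeba.txt layout into
    (attr_names, {filename: {attr: bool}})."""
    n = int(lines[0])            # line 1: number of images
    names = lines[1].split()     # line 2: attribute names
    records = {}
    for line in lines[2:2 + n]:  # one row per image: filename, then ±1 values
        parts = line.split()
        fname, values = parts[0], parts[1:]
        records[fname] = {a: v == "1" for a, v in zip(names, values)}
    return names, records

sample = ["1", "Male Smiling", "000001.jpg 1 -1"]
names, recs = parse_celeba_attrs(sample)
print(recs["000001.jpg"])  # {'Male': True, 'Smiling': False}
```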

7. Fashion-MNIST

Fashion-MNIST is a dataset consisting of 28×28 grayscale images of 70,000 fashion products in 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST shares the same image size, data format, and structure of training and test splits as original MNIST.
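Because Fashion-MNIST reuses MNIST's IDX file format, one parser handles both. A minimal sketch of decoding the image-file layout (big-endian magic number 2051, then the image count and dimensions, then raw unsigned-byte pixels):

```python
import struct
import numpy as np

def parse_idx_images(buf: bytes) -> np.ndarray:
    """Parse an IDX3 image buffer (the format shared by MNIST and
    Fashion-MNIST) into an (n, rows, cols) uint8 array."""
    magic, n, rows, cols = struct.unpack(">IIII", buf[:16])
    assert magic == 2051, "not an IDX3 image file"
    return np.frombuffer(buf, dtype=np.uint8, offset=16).reshape(n, rows, cols)

# Synthetic buffer: one 2x2 image
buf = struct.pack(">IIII", 2051, 1, 2, 2) + bytes([0, 1, 2, 3])
print(parse_idx_images(buf).shape)  # (1, 2, 2)
```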

8. CUB-200-2011 (Caltech-UCSD Birds-200-2011)

The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely used dataset for fine-grained visual classification. It contains 11,788 images belonging to 200 bird subcategories, of which 5,994 are used for training and 5,794 for testing. Each image has detailed annotations: 1 subcategory label, 15 part locations, 312 binary attributes, and 1 bounding box. Accompanying text comes from Reed et al., who extended CUB-200-2011 with fine-grained natural-language descriptions: ten single-sentence descriptions were collected for each image via the Amazon Mechanical Turk (AMT) platform, each required to contain at least 10 words and to avoid mentioning the subcategory name or any actions.
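
The release indexes its images with small plain-text files: images.txt lists '<image_id> <relative_path>' and train_test_split.txt lists '<image_id> <1 for training, 0 for testing>'. A sketch of joining them into train/test path lists (file layouts assumed from the standard dataset release):

```python
def cub_train_test(images_lines, split_lines):
    """Join CUB-200-2011's images.txt with train_test_split.txt into
    (train_paths, test_paths), matching rows by image id."""
    paths = dict(line.split() for line in images_lines)  # id -> path
    train, test = [], []
    for line in split_lines:
        img_id, flag = line.split()
        (train if flag == "1" else test).append(paths[img_id])
    return train, test

tr, te = cub_train_test(
    ["1 001.Black_footed_Albatross/img1.jpg",
     "2 001.Black_footed_Albatross/img2.jpg"],
    ["1 1", "2 0"],
)
print(len(tr), len(te))  # 1 1
```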

9. Places

The Places dataset is designed for scene recognition and contains over 2.5 million images covering over 205 scene categories, with over 5,000 images per category.

10. STL-10 (Self-Taught Learning 10)

STL-10 is an image dataset derived from ImageNet and is widely used to evaluate unsupervised feature learning and self-taught learning algorithms. In addition to 100,000 unlabeled images, it contains 13,000 labeled images from 10 object classes (e.g., birds, cats, trucks): 5,000 for training and the remaining 8,000 for testing. All images are 96×96-pixel color images.
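
The binary release stores each image as 3 × 96 × 96 uint8 values, one channel at a time, with each channel in column-major order (per the dataset's README). A sketch of decoding such a buffer into row-major (N, 96, 96, 3) images:

```python
import numpy as np

def stl10_bytes_to_images(buf: bytes) -> np.ndarray:
    """Decode STL-10 binary image data: per image, 3 channels of
    96x96 uint8 values, each channel stored column-major. Returns
    row-major (N, 96, 96, 3) images."""
    data = np.frombuffer(buf, dtype=np.uint8)
    # (N, channel, col, row) -> (N, row, col, channel)
    return data.reshape(-1, 3, 96, 96).transpose(0, 3, 2, 1)

# Synthetic buffer holding exactly one image (3 * 96 * 96 bytes)
buf = bytes(range(256)) * (3 * 96 * 96 // 256)
print(stl10_bytes_to_images(buf).shape)  # (1, 96, 96, 3)
```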

Origin blog.csdn.net/wzk4869/article/details/133106003