Image Recognition Made Easy丨An Interpretation of the Classic Image Classification Dataset CIFAR-10

Today, I would like to introduce a classic image classification dataset, CIFAR-10, which is widely used for benchmarking computer vision algorithms in machine learning. After more than a decade of development, the recognition problem posed by this dataset has largely been "solved": many models easily reach 80% classification accuracy, and deep convolutional neural networks exceed 90% on the test set. Even so, it remains a good choice for beginners. Let's take a look.

Table of contents

1. Dataset Introduction

2. Dataset details

3. Dataset task definition and introduction

4. Interpretation of data set file structure

5. Dataset download


1. Dataset Introduction

Publisher: Department of Computer Science, University of Toronto

Release time: 2009

Background:

CIFAR-10 is essentially a labeled subset of the 80 million tiny images dataset. The parent dataset was later withdrawn because some of its content proved controversial.

Introduction:

CIFAR-10 is a small dataset for general object recognition compiled by Hinton's students Alex Krizhevsky and Ilya Sutskever. It contains 60,000 32×32 color images divided into 10 classes, with 6,000 images per class.

2. Dataset details

1. Label data volume

Training set: 50,000 images

Test set: 10,000 images

2. Labeling category

The dataset has a total of 10 categories. The specific classification is shown in Figure 1.

3. Visualization

(figure 1)

3. Dataset task definition and introduction

1.  Image Classification

● Task definition

Image classification is a pattern recognition task in computer vision that assigns images to categories based on their semantic content.

●  Evaluation indicators

Accuracy

n_correct/n_total: the proportion of correctly predicted samples among all samples.

Precision of a given class:

TP/(TP+FP): of the samples predicted as this class, the fraction that actually belong to it.

Recall of a given class:

TP/(TP+FN): of the samples that actually belong to this class, the fraction predicted correctly.

Note: in the above metrics, TP stands for True Positive, FP for False Positive, and FN for False Negative; n_correct is the number of correctly predicted samples, and n_total is the total number of samples.
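The three metrics above can be written out directly in a few lines of NumPy. This is a minimal sketch of the formulas themselves, not tied to any particular model's predictions:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """n_correct / n_total over all samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return (y_true == y_pred).mean()

def precision(y_true, y_pred, cls):
    """TP / (TP + FP) for one class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    predicted = y_pred == cls          # samples predicted as this class
    if predicted.sum() == 0:
        return 0.0
    return (y_true[predicted] == cls).mean()

def recall(y_true, y_pred, cls):
    """TP / (TP + FN) for one class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    actual = y_true == cls             # samples actually in this class
    if actual.sum() == 0:
        return 0.0
    return (y_pred[actual] == cls).mean()
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], accuracy is 0.75, precision for class 1 is 2/3, and recall for class 1 is 1.0.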

4. Interpretation of data set file structure

1. Dataset directory structure

dataset_root/
├── batches.meta            # metadata file recording class information
├── data_batch_1            # training batch 1
├── data_batch_2            # training batch 2
├── data_batch_3            # training batch 3
├── data_batch_4            # training batch 4
├── data_batch_5            # training batch 5
├── readme.html             # README file
└── test_batch              # test set file

2. Annotation file format

Since each image in the dataset is a small 32×32 RGB image, the dataset does not store individual image files. Instead, each batch file stores all of its images in a single two-dimensional NumPy array and records the corresponding file names alongside it.

Using the unpickling code provided on the official website, the data in files such as data_batch_1 can be parsed.

Python 3:

```python
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
```

Python 2:

```python
def unpickle(file):
    import cPickle
    with open(file, 'rb') as fo:
        dict = cPickle.load(fo)
    return dict
```
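As a quick sanity check, the Python 3 unpickle helper can be exercised without the real batch files by writing a tiny synthetic "batch" dictionary to a temporary file first. The keys and values below are illustrative stand-ins, not real CIFAR-10 contents:

```python
import pickle
import tempfile

def unpickle(file):
    # Same helper as above (Python 3 version).
    with open(file, 'rb') as fo:
        d = pickle.load(fo, encoding='bytes')
    return d

# A tiny synthetic batch with the same byte-string keys a real batch uses.
fake_batch = {
    b'batch_label': b'training batch 1 of 5',
    b'labels': [6, 9],
    b'data': [[0] * 3072, [255] * 3072],
    b'filenames': [b'img_0.png', b'img_1.png'],
}

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pickle.dump(fake_batch, tmp)
    path = tmp.name

batch = unpickle(path)
print(sorted(batch.keys()))
```

Note that because the batches were pickled with Python 2, the dictionary keys come back as byte strings (e.g. `b'data'`), which is why `encoding='bytes'` is needed in Python 3.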

After parsing, the resulting dictionary has four parts: the batch label identifying which batch this is, the class label of each image, the NumPy array holding all of the images, and the list of file names.

In the label list, each value from 0 to 9 corresponds to one class: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer, 5 dog, 6 frog, 7 horse, 8 ship, 9 truck.

The data entry stores the information of all images in the batch as a 10000×3072 two-dimensional NumPy array, where each row stores the RGB information of one image. Within each 3072-element row, the first 1024 values record the image's R channel, the middle 1024 its G channel, and the last 1024 its B channel.

Within each channel's 1024 values, every group of 32 records one row of 32 pixels: the first group of 32 values records the pixel values of the first row of the image, the second group the second row, and so on.
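This channel-major, row-major layout maps directly onto a NumPy reshape. A minimal sketch, using a synthetic 3072-element row rather than real batch data, that turns one row into a 32×32×3 image in the height-width-channel order most plotting libraries expect:

```python
import numpy as np

# A synthetic image row: R channel all 10, G all 20, B all 30.
row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),   # first 1024 values: R channel
    np.full(1024, 20, dtype=np.uint8),   # middle 1024 values: G channel
    np.full(1024, 30, dtype=np.uint8),   # last 1024 values: B channel
])

# Reshape to (channel, height, width), then move channels last for display.
img = row.reshape(3, 32, 32).transpose(1, 2, 0)

print(img.shape)     # (32, 32, 3)
print(img[0, 0])     # RGB value of the top-left pixel
```

The `reshape(3, 32, 32)` step works because each channel's 1024 values are already stored row by row; the transpose only reorders axes without copying the pixel values themselves.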

3. Meta information format

The batches.meta file records the mapping between label values and their corresponding class names. Parsing it with the same unpickling code yields a dictionary containing, among other things, the ordered list of the ten label names.

5. Dataset download

The OpenDataLab platform provides complete dataset information for CIFAR-10, intuitive data distribution statistics, fast downloads, and convenient visualization scripts. You are welcome to try it; click the original link to view it.

https://opendatalab.org.cn/CIFAR-10

References

[1] Official website: http://www.cs.toronto.edu/~kriz/cifar.html

[2] Dataset download: http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

Author丨Du Kunming

There is a wise man, all things prosper

- End -

That's all for this share; there is plenty more dataset material to come. If there is anything else you would like to see, let the assistant know. More datasets coming online, more comprehensive dataset interpretations, online Q&A, and an active community of peers await: add WeChat opendatalab_yunying to join the official OpenDataLab communication group.

Origin: blog.csdn.net/OpenDataLab/article/details/127787645