Common data sets for transfer learning

Table of contents

Office-31

PACS

Office-Caltech10

MNIST+USPS


Office-31

The Office-31 dataset (often simply called the Office dataset) is a mainstream benchmark in visual transfer learning. It contains 31 categories of objects commonly found in office environments, such as laptops, filing cabinets, and keyboards, with 4652 images in total.

The images come from three domains: Amazon (product images from an online e-commerce site), Webcam (low-resolution images taken with a webcam), and DSLR (high-resolution images taken with a digital SLR camera).

The data set includes:

Amazon: 2817 images, about 90 per category on average, each on a plain background

Webcam: 795 images with significant noise and color and white-balance artifacts

DSLR: 498 images; 5 objects per category, each photographed on average 3 times from different viewpoints
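Transfer learning papers usually evaluate on Office-31 by training on one domain and testing on another, which gives six ordered source→target tasks (A→W, D→W, and so on). The small sketch below enumerates those tasks from the per-domain counts listed above; the dictionary keys are just illustrative names. Note that the three per-domain counts sum to 4,110, which is smaller than the 4652 total given above, so it is worth checking which version of the data a paper used.

```python
from itertools import permutations

# Per-domain image counts as listed above (31 classes in every domain).
OFFICE31_DOMAINS = {"Amazon": 2817, "Webcam": 795, "DSLR": 498}
NUM_CLASSES = 31

def transfer_tasks(domains):
    """Every ordered (source, target) pair of distinct domains."""
    return list(permutations(domains, 2))

tasks = transfer_tasks(OFFICE31_DOMAINS)
total = sum(OFFICE31_DOMAINS.values())

for src, tgt in tasks:
    print(f"{src} -> {tgt}: {OFFICE31_DOMAINS[src]} source / "
          f"{OFFICE31_DOMAINS[tgt]} target images")
print("images across the three domains:", total)
print("average Amazon images per class:",
      round(OFFICE31_DOMAINS["Amazon"] / NUM_CLASSES, 1))
```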


PACS

PACS is an image dataset for domain adaptation and generalization with 4 domains: Photo (1670 images), Art Painting (2048 images), Cartoon (2344 images), and Sketch (3929 images). Every domain contains the same 7 categories.
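PACS is most often used for domain generalization with a leave-one-domain-out protocol: train on three domains and test on the held-out fourth. The sketch below builds those four splits from the per-domain counts above. The lowercase keys resemble the folder names seen in common PACS downloads, but treat them as an assumption rather than a fixed API.

```python
# Per-domain image counts for PACS, as listed above (7 classes in every domain).
PACS_DOMAINS = {"photo": 1670, "art_painting": 2048, "cartoon": 2344, "sketch": 3929}

def leave_one_domain_out(domains):
    """Yield (held-out target domain, list of source domains) for each domain."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield target, sources

splits = list(leave_one_domain_out(PACS_DOMAINS))
for target, sources in splits:
    n_src = sum(PACS_DOMAINS[d] for d in sources)
    print(f"target={target:<13} sources={sources} ({n_src} source images)")
print("total images:", sum(PACS_DOMAINS.values()))
```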

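PACS dataset split:

Training set: 8977 images

Validation set: 1014 images

Test set: 9991 images (training and validation together cover all 9991 images; in the official split, each domain's test list contains every image of that domain)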

Office-Caltech10

The data is available from this site: transferlearning/data at master · jindongwang/transferlearning · GitHub

The Office-Caltech-10 dataset contains 2533 images in four domains, C (Caltech), A (Amazon), W (Webcam), and D (DSLR), with 1123 images in C, 958 in A, 295 in W, and 157 in D. D (DSLR) consists of high-resolution images taken with a digital SLR camera, W (Webcam) of low-resolution images taken with a webcam, A (Amazon) of images from e-commerce product pages, and C (Caltech) of images from the Caltech-256 dataset. The dataset is provided with both SURF and DeCAF features.

The dataset has 10 object categories, namely the categories shared by Office-31 and Caltech-256: backpack, bike, calculator, mouse, projector, mug, monitor, laptop computer, keyboard, and headphones.
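With four domains, Office-Caltech10 yields 12 ordered source→target tasks (C→A, C→W, and so on). The sketch below collects the domain sizes and the 10 shared categories listed above and enumerates the tasks. The integer labels are assigned alphabetically purely for illustration; they need not match the label order used in any released feature file.

```python
from itertools import permutations

# Office-Caltech10 domain sizes, as listed above.
DOMAIN_SIZES = {"C": 1123, "A": 958, "W": 295, "D": 157}

# The 10 categories shared by Office-31 and Caltech-256.
# Hypothetical label order: alphabetical, for illustration only.
CATEGORIES = sorted(["backpack", "bike", "calculator", "headphones", "keyboard",
                     "laptop computer", "monitor", "mouse", "mug", "projector"])
LABEL_OF = {name: idx for idx, name in enumerate(CATEGORIES)}

# Every ordered pair of distinct domains is one transfer task.
tasks = [f"{s}->{t}" for s, t in permutations(DOMAIN_SIZES, 2)]

print(len(CATEGORIES), "classes,", sum(DOMAIN_SIZES.values()), "images")
print(len(tasks), "tasks:", ", ".join(tasks))
```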

SURF and DeCAF features of Office-Caltech10:

Office-Caltech10 is a widely used image classification dataset containing 10 object categories, with images drawn from the Office and Caltech datasets. SURF and DeCAF are two feature extraction methods commonly used on it.
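Once each image is a fixed-length SURF or DeCAF vector, a simple way to compare feature types is a source-only 1-nearest-neighbor baseline: label each target image with the class of its closest source image. The sketch below assumes the features are already loaded as NumPy arrays, one row per image; standardizing each domain separately is one common choice, not the only one. The toy data at the bottom is synthetic, not real Office-Caltech10 features.

```python
import numpy as np

def zscore(X, eps=1e-8):
    """Standardize each feature dimension to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def one_nn_accuracy(Xs, ys, Xt, yt):
    """Source-only baseline: predict each target label from its nearest source sample."""
    Xs, Xt = zscore(Xs), zscore(Xt)  # each domain standardized on its own
    # Squared Euclidean distance between every target row and every source row.
    d2 = (Xt ** 2).sum(1)[:, None] - 2 * Xt @ Xs.T + (Xs ** 2).sum(1)[None, :]
    pred = ys[np.argmin(d2, axis=1)]
    return float((pred == yt).mean())

# Synthetic two-class example: the target domain is the source shifted by +1.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [10.0] * 5])
ys = np.repeat([0, 1], 20)
yt = np.repeat([0, 1], 20)
Xs = centers[ys] + rng.normal(0, 0.5, (40, 5))
Xt = centers[yt] + 1.0 + rng.normal(0, 0.5, (40, 5))
acc = one_nn_accuracy(Xs, ys, Xt, yt)
print("1-NN accuracy:", acc)
```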

SURF features:

The SURF (Speeded Up Robust Features) feature is a local feature based on scale space. SURF detects stable interest points across scales with a fast approximation of the Hessian determinant (box filters evaluated on an integral image) instead of explicitly building a Gaussian pyramid, and then computes a descriptor for each point. On the Office-Caltech10 dataset, SURF can be used to extract the features of each image.
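To make the detection step concrete, the sketch below computes the SURF-style Hessian response at one pixel using the smallest (9×9) box filters and an integral image, so each rectangular sum costs four lookups. It is deliberately simplified: only one filter size, no search over scales, no non-maximum suppression. The 0.9 weight on Dxy is the value suggested in the SURF paper; the sign convention of Dxy does not matter because it is squared.

```python
import numpy as np

def integral_image(img):
    """Zero-padded integral image: ii[r, c] = sum of img[:r, :c]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] (inclusive bounds) with four lookups."""
    return ii[r1 + 1, c1 + 1] - ii[r0, c1 + 1] - ii[r1 + 1, c0] + ii[r0, c0]

def hessian_response(ii, r, c):
    """Approximate det(H) at (r, c) with 9x9 box filters (needs a 4-pixel margin)."""
    # Dyy: three stacked 3x5 lobes weighted +1, -2, +1 = whole 9x5 box minus 3x middle lobe.
    dyy = box_sum(ii, r - 4, c - 2, r + 4, c + 2) - 3 * box_sum(ii, r - 1, c - 2, r + 1, c + 2)
    # Dxx: the same filter rotated by 90 degrees.
    dxx = box_sum(ii, r - 2, c - 4, r + 2, c + 4) - 3 * box_sum(ii, r - 2, c - 1, r + 2, c + 1)
    # Dxy: four 3x3 lobes in the diagonal quadrants, weighted + - - +.
    dxy = (box_sum(ii, r - 3, c - 3, r - 1, c - 1) - box_sum(ii, r - 3, c + 1, r - 1, c + 3)
           - box_sum(ii, r + 1, c - 3, r + 3, c - 1) + box_sum(ii, r + 1, c + 1, r + 3, c + 3))
    area = 81.0  # normalize responses by the 9x9 filter area
    dxx, dyy, dxy = dxx / area, dyy / area, dxy / area
    return dxx * dyy - (0.9 * dxy) ** 2

# A small bright blob gives a positive response; a flat image gives zero.
blob = np.zeros((15, 15))
blob[6:9, 6:9] = 1.0
det_blob = hessian_response(integral_image(blob), 7, 7)
det_flat = hessian_response(integral_image(np.ones((15, 15))), 7, 7)
print(det_blob, det_flat)
```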

For each SURF keypoint detected in an image, the algorithm computes Haar wavelet responses in the surrounding region and uses them to build the SURF descriptor. Each SURF descriptor is a 64-dimensional vector that encodes the scale, orientation, and intensity-difference information of the keypoint and its neighboring pixels.
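The 64 dimensions come from splitting the region around a keypoint into a 4×4 grid of sub-regions and recording four sums of Haar responses in each (Σdx, Σdy, Σ|dx|, Σ|dy|), so 16 × 4 = 64. The sketch below shows that layout on a single patch. It assumes the patch has already been centred on the keypoint, scaled, and rotated to its dominant orientation, and it omits the Gaussian weighting of the real descriptor, so it is SURF-like rather than a faithful SURF implementation.

```python
import numpy as np

def surf_like_descriptor(patch):
    """Simplified 64-D SURF-style descriptor for a 21x21 grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (21, 21):
        raise ValueError("expects a 21x21 patch (20x20 sample grid plus one border pixel)")
    # Smallest Haar wavelet responses: right-minus-left and bottom-minus-top differences.
    dx = patch[:20, 1:] - patch[:20, :-1]
    dy = patch[1:, :20] - patch[:-1, :20]

    def per_region(a):
        # Split the 20x20 grid into 4x4 sub-regions of 5x5 samples and sum each one.
        return a.reshape(4, 5, 4, 5).sum(axis=(1, 3))

    feats = np.stack([per_region(dx), per_region(dy),
                      per_region(np.abs(dx)), per_region(np.abs(dy))], axis=-1)
    vec = feats.reshape(64)  # 16 sub-regions x 4 values each
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec  # unit length for contrast invariance

# A horizontal intensity ramp: only the dx terms are non-zero.
ramp = np.tile(np.arange(21.0), (21, 1))
desc = surf_like_descriptor(ramp)
print(desc.shape)
```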

Features extracted with SURF represent low-level image properties such as texture and shape, which makes them useful in object classification tasks.
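SURF returns a different number of 64-D descriptors for each image, while a classifier needs one fixed-length vector per image. A common fix is a bag of visual words: assign each descriptor to its nearest codeword and use the normalized histogram of assignments. To our knowledge, the widely shared Office-Caltech10 SURF features were built this way with an 800-word codebook, but check the release notes before relying on that. The 3-word codebook below is a toy stand-in; a real one would typically be learned, for example with k-means over descriptors from training images.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize one image's local descriptors into a normalized visual-word histogram."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)  # index of the nearest visual word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum() if hist.sum() > 0 else hist

# Hypothetical toy codebook: 3 words in 3-D (real SURF words would be 64-D).
codebook = np.eye(3)
descs = np.array([[0.9, 0.1, 0.0],
                  [1.0, 0.0, 0.05],
                  [0.0, 0.1, 0.95]])
hist = bovw_histogram(descs, codebook)
print(hist)
```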

DeCAF features:

The DeCAF (Deep Convolutional Activation Features) feature is extracted with a convolutional neural network: a pretrained CNN model (such as AlexNet) is run forward on each image, and the activations of one of its layers form a high-dimensional feature vector. On the Office-Caltech10 dataset, a pretrained AlexNet model can be used to extract the DeCAF features of each image. AlexNet contains 5 convolutional layers and 3 fully connected layers, and its final layer outputs a 1000-dimensional vector that represents the class probability distribution of the image.

For each image, run a forward pass through AlexNet and take the activations of one of the 4096-dimensional fully connected layers (the 6th or the 7th layer, often called DeCAF6 and DeCAF7) as the DeCAF feature; the 8th layer only outputs the 1000 class scores. This 4096-dimensional vector represents the semantic content of the image, such as the objects and scenes it contains.

DeCAF yields higher-level feature representations than hand-crafted descriptors, because it relies on a deep network that learns image representations automatically from data. These features perform well in object classification, especially on images of complex scenes.

Feature representation:

The descriptor that SURF computes for each feature point is a fixed-length vector, usually 64-dimensional. These descriptors represent low-level image properties such as texture and shape.

The feature vector obtained by DeCAF is high-dimensional, usually with 4096 dimensions. It represents the semantic content of the image, such as the objects and scenes it contains.

Applicable scenarios:

SURF is suitable for tasks that need to detect stable feature points quickly, such as object tracking and image stitching.

DeCAF features are suitable for tasks that require high-level semantic analysis of images, such as image classification, object detection, and image retrieval.

Overall, SURF and DeCAF are two different feature extraction methods suited to different applications: SURF fits low-level image processing tasks, while DeCAF fits high-level semantic analysis of images.

MNIST+USPS

MNIST and USPS are two handwritten digit datasets, each covering the 10 digit classes 0 to 9. Each MNIST image is 28*28 pixels, and MNIST has 70,000 images in total (60,000 for training and 10,000 for testing). Each USPS image is 16*16 pixels, and USPS has 9,298 images in total (7,291 for training and 2,007 for testing).
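Because the two datasets use different image sizes, MNIST→USPS experiments commonly rescale both to 16*16 so that every image becomes a 256-dimensional vector. The sketch below downsamples a 28*28 array with plain bilinear interpolation in NumPy; the choice of interpolation is an assumption, not necessarily what any particular data release used, and the input here is a synthetic constant image rather than a real digit.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Bilinear resize of a 2-D array (pixel-centre alignment, edges clamped)."""
    img = np.asarray(img, dtype=float)
    in_h, in_w = img.shape
    # Map output pixel centres to (fractional) input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# A synthetic 28*28 "MNIST-sized" image rescaled to the USPS size.
mnist_like = np.full((28, 28), 0.5)
small = resize_bilinear(mnist_like, 16, 16)
vec = small.reshape(-1)  # 256-D feature vector shared by both domains
print(small.shape, vec.shape)
```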

The data can be downloaded from: sam roweis: data

References for other datasets: Zhihu Portal


Origin blog.csdn.net/m0_55196097/article/details/130059285