Research papers on fine-grained image classification (2011)

The Caltech-UCSD Birds-200-2011 Dataset

Abstract

CUB-200-2011 is an expanded version of CUB-200: it increases the number of images and adds localized annotations (bounding boxes, part locations, and attribute labels).

Dataset specification and collection

| Content | Description |
| --- | --- |
| Bird species | 11,788 images in total, covering 200 bird species |
| Bounding boxes | One bounding box per image |
| Attributes | 28 attribute groups, 312 binary attributes in total (a single color attribute can have multiple options) |
| Part locations | 15 parts per image (pixel position and visibility) |
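The annotations ship as plain whitespace-separated text files (`images.txt`, `bounding_boxes.txt`, `parts/part_locs.txt`, per the dataset's README). A minimal parsing sketch, assuming that layout; the helper names are ours, not from the paper:

```python
def parse_images(lines):
    """images.txt: '<image_id> <relative_path>' per line."""
    return {int(i): path for i, path in (ln.split() for ln in lines)}

def parse_bounding_boxes(lines):
    """bounding_boxes.txt: '<image_id> <x> <y> <width> <height>'."""
    boxes = {}
    for ln in lines:
        img_id, x, y, w, h = ln.split()
        boxes[int(img_id)] = (float(x), float(y), float(w), float(h))
    return boxes

def parse_part_locs(lines):
    """parts/part_locs.txt: '<image_id> <part_id> <x> <y> <visible>'."""
    parts = {}
    for ln in lines:
        img_id, part_id, x, y, vis = ln.split()
        parts.setdefault(int(img_id), {})[int(part_id)] = (
            float(x), float(y), vis == "1")
    return parts
```

For example, `parse_part_locs(["1 2 10.5 20.5 1"])` maps image 1, part 2 to its pixel position plus a visibility flag.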

Applications

| Application | Description |
| --- | --- |
| Subcategory recognition | Classification often fails because the categories are highly similar visually; studying subcategories helps improve both discrimination and localization |
| Multi-class object detection and part-based methods | Provides part annotations for part-based methods, and a richer set of categories for object detection tasks |
| Attribute-based approaches | Provides attribute labels for attribute-based methods, together with the associated part locations |
| Crowdsourcing and user studies | Because the labels were collected from the general public, some annotation errors may be present |

In short, this dataset serves fine-grained image analysis tasks; its main strength is the variety of annotation types, providing both attribute labels and part-level information.

The experiments carried out in the paper are:

  1. Given the part locations, classify the image;
  2. Given the raw image, classify the image;
  3. Given the raw image, predict the part locations and part visibility.

Novel Dataset for Fine-Grained Image Categorization: Stanford Dogs

Introduction

Contains more than 22,000 annotated images in 120 categories.

Each image is annotated with a bounding box and an object class label.

Beyond the usual fine-grained challenges of large intra-class variance and small inter-class variance, this dataset contains more humans and man-made environments than comparable datasets, which makes its backgrounds more varied.

Comparison with other datasets

The categories are diverse, and each category contains a sufficient number of samples.

training and testing

In each category, 100 images are used as the training set, and the rest are used as the test set.
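A minimal sketch of this per-class split, assuming samples arrive as `(image_id, label)` pairs; it takes images in their given order, whereas the dataset itself ships fixed train/test lists, so this is purely illustrative:

```python
from collections import defaultdict

def split_per_class(samples, n_train=100):
    """Take the first n_train images of every class for training
    and the remaining images of that class for testing."""
    by_class = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append(image_id)
    train, test = [], []
    for label, ids in by_class.items():
        train += [(i, label) for i in ids[:n_train]]
        test += [(i, label) for i in ids[n_train:]]
    return train, test
```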

Combining Randomization and Discrimination for Fine-Grained Image Categorization

Abstract

The purpose of the method in this paper is to explore the statistics of fine-grained images and detect discriminative image patches for recognition.

To this end, it combines two techniques: discriminative feature mining and randomization.

Discriminative feature mining models the discriminative details, while randomization makes the huge feature space tractable and prevents overfitting.

This paper proposes a random forest with discriminative decision trees, where each node is a classifier. Notably, each node's classifier is trained using information passed down from the nodes above it.

Introduction

This paper classifies images by finding discriminative image patches (akin to "spot the difference"), but without feature selection this inevitably produces a huge number of candidate patches. Randomization is proposed to address this issue.

This paper proposes a random forest with discriminative decision trees algorithm to discover discriminative image patches and image patch pairs. The algorithm applies a powerful classifier at each node and combines information from different depths of the tree to efficiently mine the very dense sampling space.

The method proposed in this paper significantly improves the ability of decision trees in random forests while maintaining low correlation between trees. This property enables our method to achieve very low generalization error.

The experimental results of the method proposed in this paper are:

  1. Achieved SOTA on two datasets;
  2. The detected image patches are semantically meaningful;
  3. The generated image region structure is from coarse to fine, comparable to the human visual system.

Dense Sampling Space

The purpose of our algorithm is to identify fine-grained image statistics that are useful for classification.

The key is to find distinctive image patches. The algorithm does this by searching over randomly sampled rectangular regions; as a result, the sampled positions and the combinations of widths and heights are very rich.


The paper calls this large set of candidate image regions the dense sampling space. Furthermore, to capture more discriminative features, interactions between pairs of image patches are considered; a pairwise interaction is formed by concatenation, absolute difference, or intersection of the two patches' features.
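The three pairwise operations can be sketched on plain histogram lists (the function name is ours, for illustration):

```python
def pairwise_features(h1, h2, mode):
    """Combine two region histograms with one of the three
    pairwise operations described in the paper."""
    if mode == "concatenation":
        return list(h1) + list(h2)
    if mode == "absolute difference":
        return [abs(a - b) for a, b in zip(h1, h2)]
    if mode == "intersection":
        return [min(a, b) for a, b in zip(h1, h2)]
    raise ValueError(f"unknown mode: {mode}")
```

Note that concatenation doubles the feature dimensionality, while absolute difference and intersection keep it unchanged.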

However, the dense sampling space is very large: it spans all sampled locations, widths, and heights, and the pairwise interactions between patches enlarge it further.

In addition, the feature set contains a great deal of noise and redundancy: many image patches are not discriminative, and the sampled patches overlap heavily.

Random Forest with Discriminative Decision Trees

Two goals are proposed:

  1. Efficiently extract information from image patches through discriminative training;
  2. Efficient exploration of dense feature spaces through randomization.

Specifically, this paper adopts a random forest structure, where each node is a discriminative classifier, trained with one or a pair of image patches.

In our setting discriminative training and randomization can benefit from each other. The advantages of our approach are:

  1. Random forest structures allow us to consider subsets of image regions, allowing us to efficiently explore densely sampled spaces;
  2. Random Forest selects the best image patch in each node, so it is able to eliminate noisy image patches and reduce redundancy in the feature set;
  3. By using discriminative classifiers to train tree nodes, our random forests have powerful discriminative trees. This allows our method to have a smaller generalization error.

The Random Forest Framework

Each tree returns a posterior probability of a sample belonging to a given class. Let $P_{t,l}(c)$ denote the posterior probability of class $c$ at leaf $l$ of tree $t$. The forest predicts

$$c^* = \arg\max_c \frac{1}{T} \sum_{t=1}^{T} P_{t,l_t}(c)$$

where $T$ is the number of trees and $l_t$ is the leaf node that the image falls into in tree $t$.
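Averaging the leaf posteriors over the trees and taking the argmax can be sketched as:

```python
def forest_predict(leaf_posteriors):
    """leaf_posteriors: T rows, where row t holds P_{t, l_t}(c) for the
    leaf the image reached in tree t. Returns the class maximizing the
    average posterior, plus the averaged distribution."""
    T = len(leaf_posteriors)
    C = len(leaf_posteriors[0])
    avg = [sum(row[c] for row in leaf_posteriors) / T for c in range(C)]
    best = max(range(C), key=lambda c: avg[c])
    return best, avg
```

For example, with three trees voting `[0.9, 0.1]`, `[0.2, 0.8]`, and `[0.6, 0.4]` over two classes, the averaged posterior favors class 0.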


Sampling the Dense Feature Space

Each internal node in a decision tree corresponds to one rectangular image region, or a pair of regions, sampled from the dense sampling space. To sample a candidate region, all images are first normalized to unit width and height, and the two diagonal corner locations of the rectangle are then sampled from a uniform distribution over [0, 1].
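The corner-sampling step can be sketched as:

```python
import random

def sample_region(rng):
    """Sample one rectangle in normalized [0, 1] image coordinates by
    drawing its two diagonal corners from a uniform distribution."""
    x1, y1, x2, y2 = (rng.random() for _ in range(4))
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)
```

Because both corners are drawn uniformly, the sampled regions naturally cover a wide range of positions, widths, and heights.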

Each sampled image region is represented by a histogram of visual descriptors. For a pair of regions, the feature representation is formed by combining the two regions' histograms (by concatenation, absolute difference, or intersection).

Additionally, the feature vector is augmented with the decision value the image received at the parent node; the representation at a node therefore combines information from all of its ancestor nodes.

Learning the Splits

Learning a binary split with an SVM consists of two steps:

  1. Randomly assign each class (and thereby all of its samples) to one of two binary labels;
  2. Learn a binary split of the data with a linear SVM.
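A sketch of the two steps. For self-containment, the linear SVM is replaced here by a perceptron-style linear separator, which is our substitution, not the paper's solver:

```python
import random

def random_binary_labels(class_labels, rng):
    """Step 1: randomly map each *class* (not each sample) to +1 or -1,
    so all samples of a class share one binary label."""
    assignment = {c: rng.choice([-1, 1]) for c in set(class_labels)}
    return [assignment[c] for c in class_labels], assignment

def train_linear_split(X, y, epochs=50, lr=0.1):
    """Step 2 stand-in: learn a linear separator w (with bias) for the
    binary labels via perceptron updates."""
    dim = len(X[0])
    w = [0.0] * (dim + 1)  # last entry is the bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]
            score = sum(wj * xj for wj, xj in zip(w, xb))
            if yi * score <= 0:  # misclassified or on the boundary
                w = [wj + lr * yi * xj for wj, xj in zip(w, xb)]
    return w
```

The learned `w` then plays the role of the split direction described below.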

Suppose the images at a given node come from C classes. Each node performs a binary split of the data, which lets us learn a simple binary SVM at every node.

Using the feature representation $f$ of the image region, the corresponding binary split sends a sample to one child if

$$w^\top f \ge 0$$

and to the other child otherwise,

where $w$ is the weight vector learned by the linear SVM. Candidate splits are still evaluated by information gain.
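The information gain of a candidate split can be computed as the parent's entropy minus the size-weighted entropy of the two children:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(parent, left, right):
    """IG = H(parent) - |left|/n * H(left) - |right|/n * H(right)."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A split that separates the classes perfectly yields the maximum gain, while a split that leaves both children with the parent's class mix yields zero.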


Origin blog.csdn.net/weixin_46365033/article/details/127590575