Paper reading (6): Prototypical Networks for Few-shot Learning

1. Abstract

We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.

2. The main idea

(1) Our approach, prototypical networks, is based on the idea that there exists an embedding in which points cluster around a single prototype representation for each class. In order to do this, we learn a non-linear mapping of the input into an embedding space using a neural network and take a class’s prototype to be the mean of its support set in the embedding space. Classification is then performed for an embedded query point by simply finding the nearest class prototype. We follow the same approach to tackle zero-shot learning; here each class comes with meta-data giving a high-level description of the class rather than a small number of labeled examples. We therefore learn an embedding of the meta-data into a shared space to serve as the prototype for each class.
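In the paper's notation, with embedding function f_phi and support set S_k for class k, the prototype is simply the mean of the embedded support points:

```latex
\mathbf{c}_k = \frac{1}{|S_k|} \sum_{(\mathbf{x}_i,\, y_i) \in S_k} f_\phi(\mathbf{x}_i)
```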

(2) In particular, we relate prototypical networks to clustering in order to justify the use of class means as prototypes when distances are computed with a Bregman divergence, such as squared Euclidean distance. We find empirically that the choice of distance is vital, as Euclidean distance greatly outperforms the more commonly used cosine similarity.

(3) A "good" mapping is learned by a neural network that projects every sample into the same embedding space; for each class, the mean of its embedded samples is taken as the prototype. Euclidean distance is used as the distance measure, so that training pushes each sample as close as possible to its own class prototype and as far as possible from the prototypes of the other classes. At test time, the probability that a test sample belongs to each class is obtained by applying a softmax over its distances to the class prototypes. (reference)
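Concretely, the class probabilities come from a softmax over negative distances to the prototypes, with d taken to be the squared Euclidean distance:

```latex
p_\phi(y = k \mid \mathbf{x}) =
  \frac{\exp\left( -d\left( f_\phi(\mathbf{x}), \mathbf{c}_k \right) \right)}
       {\sum_{k'} \exp\left( -d\left( f_\phi(\mathbf{x}), \mathbf{c}_{k'} \right) \right)}
```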

3. Methods

Briefly, the training process runs as follows (a runnable sketch is given after the list):
(1) Randomly select K classes from all the training classes.
(2) For each of the K sampled classes, randomly select a support set and a query set, each containing M examples.
(3) Map each sample to a feature vector with a CNN, and compute the mean of each class's support-set feature vectors as the class prototype (denoted c_k in the paper's figure).
(4) Compute the distances from each query-set feature vector to the K class prototypes, and normalize them with a softmax to obtain a probability distribution over classes for the query samples.
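The four steps can be condensed into a short PyTorch sketch. This is a minimal illustration, not the official implementation (linked in the Code section below): the conv-4 encoder follows the paper's description, while the 28x28 single-channel inputs and the random tensors are placeholder assumptions standing in for a real episode sampler.

```python
# Minimal sketch of one prototypical-network training episode (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # One block of the paper's conv-4 encoder: 3x3 conv -> batchnorm -> ReLU -> 2x2 max-pool
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(1, 64), conv_block(64, 64),
            conv_block(64, 64), conv_block(64, 64),
        )
    def forward(self, x):              # x: (N, 1, 28, 28)
        return self.net(x).flatten(1)  # -> (N, 64)

K, M = 5, 5        # steps (1)+(2): K classes, M support and M query examples each
encoder = Encoder()

# Random stand-ins for one sampled episode (shape: class x example x image)
support = torch.randn(K, M, 1, 28, 28)
query = torch.randn(K, M, 1, 28, 28)

# Step (3): prototypes = per-class mean of the embedded support points
prototypes = encoder(support.view(K * M, 1, 28, 28)).view(K, M, -1).mean(dim=1)  # (K, 64)

# Step (4): squared Euclidean distances from each query embedding to each prototype
z_query = encoder(query.view(K * M, 1, 28, 28))   # (K*M, 64)
dists = torch.cdist(z_query, prototypes) ** 2     # (K*M, K)

# Softmax over negative distances gives class probabilities; train with cross-entropy
labels = torch.arange(K).repeat_interleave(M)     # true class of each query sample
loss = F.cross_entropy(-dists, labels)
loss.backward()
print(loss.item())
```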

4. Summary

The idea of Prototypical Networks (P-net) in this paper is very similar to that of Matching Networks (M-net), but there are a few differences: 1. They use different distance metrics: M-net uses cosine distance, while P-net uses squared Euclidean distance, which belongs to the family of Bregman divergences (see the paper). 2. The two behave differently in the general few-shot setting but are equivalent in the one-shot case (with a single support example per class, the prototype is just that example). 3. In terms of network structure, P-net unifies the encoding and classification layers compared with M-net, so it has fewer parameters and is more convenient to train.
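To make difference 1 concrete, here is a small illustrative comparison of the two metrics on the same embeddings; the random tensors and their shapes are assumptions made for the example, not taken from either paper's code:

```python
# Illustrative comparison of the two distance choices over the same embeddings.
import torch
import torch.nn.functional as F

z_query = torch.randn(10, 64)    # 10 query embeddings (made-up shapes)
prototypes = torch.randn(5, 64)  # 5 class prototypes

# P-net: squared Euclidean distance, a Bregman divergence
sq_euclid = torch.cdist(z_query, prototypes) ** 2  # (10, 5)
p_pnet = F.softmax(-sq_euclid, dim=1)              # smaller distance -> higher probability

# M-net style: cosine similarity, not a Bregman divergence
cosine = F.normalize(z_query, dim=1) @ F.normalize(prototypes, dim=1).T  # (10, 5)
p_mnet = F.softmax(cosine, dim=1)                  # larger similarity -> higher probability
```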

5. Supplement

For an understanding of the Bregman divergence used in the paper, see this answer on Zhihu: "How to understand Bregman divergence?" (answer by 覃含章):
https://www.zhihu.com/question/22426561/answer/209945856
Put simply, in the context of this paper: if the "distance" between two points in some abstract space is defined by a Bregman divergence, then for any probability distribution over a set of points, the mean of those points (the mean point) is the point of the space that minimizes the average distance to them. So when extracting the "prototype", the sample mean (mean point) can be used directly; this appears to be the reason the paper uses Euclidean distance instead of cosine distance.
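In symbols (the standard definition, and the mean-minimizer result of Banerjee et al. that the paper builds on): for a strictly convex function \varphi, the divergence and the property read

```latex
d_\varphi(\mathbf{z}, \mathbf{z}') =
  \varphi(\mathbf{z}) - \varphi(\mathbf{z}')
  - (\mathbf{z} - \mathbf{z}')^{\top} \nabla \varphi(\mathbf{z}'),
\qquad
\arg\min_{\mathbf{c}} \frac{1}{n} \sum_{i=1}^{n} d_\varphi(\mathbf{z}_i, \mathbf{c})
  = \frac{1}{n} \sum_{i=1}^{n} \mathbf{z}_i .
```

Taking \varphi(\mathbf{z}) = \|\mathbf{z}\|^2 recovers the squared Euclidean distance, which is why the class mean is the natural prototype under that metric; cosine distance is not a Bregman divergence, so the mean carries no such guarantee.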

Reference material

[1] Prototypical Networks for Few-shot Learning (the paper)
[2] [Field report] Annual progress review of few-shot learning | VALSE2018
[3] When few-shot learning meets machine learning (fewshot)
[4] A survey of few-shot learning (Few-shot Learning)
[5] Few-shot learning (few-shot learning): prototypical networks

Code

[1] jakesnell/prototypical-networks

Reproduced from: https://www.jianshu.com/p/1eaccf6d6f2d
