Paper Reading (6): Prototypical Networks for Few-shot Learning

1. Abstract

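We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to a prototype representation of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and they achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.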

2. Main Idea

(1) Our approach, prototypical networks, is based on the idea that there exists an embedding in which points cluster around a single prototype representation for each class. In order to do this, we learn a non-linear mapping of the input into an embedding space using a neural network and take a class’s prototype to be the mean of its support set in the embedding space. Classification is then performed for an embedded query point by simply finding the nearest class prototype. We follow the same approach to tackle zero-shot learning; here each class comes with meta-data giving a high-level description of the class rather than a small number of labeled examples. We therefore learn an embedding of the meta-data into a shared space to serve as the prototype for each class.
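In the paper's notation (as I read it), the prototype and the nearest-prototype classifier described above can be written as follows, where f_φ is the embedding network, S_k is the support set of class k, and d is the distance function. Training minimizes J(φ), the negative log-probability of the true class k of a query x:

```latex
\mathbf{c}_k = \frac{1}{|S_k|} \sum_{(\mathbf{x}_i, y_i) \in S_k} f_\phi(\mathbf{x}_i)

p_\phi(y = k \mid \mathbf{x}) = \frac{\exp\left(-d\left(f_\phi(\mathbf{x}), \mathbf{c}_k\right)\right)}{\sum_{k'} \exp\left(-d\left(f_\phi(\mathbf{x}), \mathbf{c}_{k'}\right)\right)}

J(\phi) = -\log p_\phi(y = k \mid \mathbf{x})
```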

我们的方法，原型网络，是基于这样一个想法，即存在一个嵌入，其中点围绕着每个类的单个原型表示进行聚类。为了做到这一点，我们使用神经网络学习了一个输入到嵌入空间的非线性映射，并将类的原型作为其在嵌入空间中支持集的平均值。然后，通过简单地查找最近的类原型，对嵌入的查询点进行分类。我们采用相同的方法来处理零样本学习；在这里，每个类都有元数据，提供对类的高级描述，而不是少量带标签的示例。因此，我们学习将元数据嵌入到共享空间中，作为每个类的原型.

(2) In particular, we relate prototypical networks to clustering in order to justify the use of class means as prototypes when distances are computed with a Bregman divergence, such as squared Euclidean distance. We find empirically that the choice of distance is vital, as Euclidean distance greatly outperforms the more commonly used cosine similarity.
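To see why squared Euclidean distance counts as a Bregman divergence, start from the general definition with a differentiable, strictly convex generator \varphi (written \varphi here to keep it apart from the network parameters φ) and plug in \varphi(z) = ||z||^2:

```latex
d_\varphi(\mathbf{z}, \mathbf{z}') = \varphi(\mathbf{z}) - \varphi(\mathbf{z}') - (\mathbf{z} - \mathbf{z}')^{\top} \nabla\varphi(\mathbf{z}')

\varphi(\mathbf{z}) = \|\mathbf{z}\|^2, \quad \nabla\varphi(\mathbf{z}') = 2\mathbf{z}'

d_\varphi(\mathbf{z}, \mathbf{z}') = \|\mathbf{z}\|^2 - \|\mathbf{z}'\|^2 - 2(\mathbf{z} - \mathbf{z}')^{\top}\mathbf{z}'
  = \|\mathbf{z}\|^2 - 2\mathbf{z}^{\top}\mathbf{z}' + \|\mathbf{z}'\|^2
  = \|\mathbf{z} - \mathbf{z}'\|^2
```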

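(3) In short: a neural network learns a "good" mapping that projects all samples into the same embedding space, and the center (mean) of each class's samples is taken as that class's prototype. With Euclidean distance as the metric, training pulls each query sample as close as possible to its own class prototype and pushes it as far as possible from the prototypes of the other classes. At test time, a softmax over the negative distances to each class prototype gives the class of the test sample (see references).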
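A minimal sketch of the nearest-prototype classifier described above, in plain Python. It assumes the inputs are already embedded (the learned CNN f_φ is replaced by the identity), and the function names (compute_prototypes, class_probabilities, predict) are hypothetical, not taken from the official jakesnell/prototypical-networks code:

```python
import math

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def compute_prototypes(support):
    """Map each class label to the mean of its (already embedded) support vectors."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def class_probabilities(query, prototypes):
    """Softmax over negative squared distances from the query to each prototype."""
    labels = list(prototypes)
    logits = [-squared_euclidean(query, prototypes[k]) for k in labels]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {k: e / total for k, e in zip(labels, exps)}

def predict(query, prototypes):
    """Return the label with the highest probability, i.e. the nearest prototype."""
    probs = class_probabilities(query, prototypes)
    return max(probs, key=probs.get)

support = {"a": [[0.0, 0.0], [2.0, 0.0]], "b": [[10.0, 10.0], [12.0, 10.0]]}
prototypes = compute_prototypes(support)
print(prototypes)                       # {'a': [1.0, 0.0], 'b': [11.0, 10.0]}
print(predict([1.0, 1.0], prototypes))  # a
```

Because the softmax logits are negative squared distances, the nearest prototype always receives the highest probability, so predict is exactly nearest-prototype classification.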

3. Method

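A brief outline of one training episode:
(1) Randomly select K classes from all the training classes.
(2) From the samples of each of the K classes, randomly draw a support set and a disjoint query set, M samples each.
(3) Map the samples to feature vectors with the CNN, and take the mean feature vector of each class's support samples as the class prototype; c(k) in the figure below is the class prototype.
(4) Compute the distances between each query-set feature vector and the K class prototypes, and normalize the negative distances with a softmax to obtain a probability distribution over the classes for each query sample. Training minimizes the negative log-probability of each query sample's true class.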
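The four steps can be sketched as a single training episode under these assumptions: embed is a hypothetical identity placeholder for the CNN, the toy dataset and all function names are made up for illustration, and only the loss is computed (a real implementation would get gradients and an optimizer from a deep learning framework):

```python
import math
import random

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def embed(x):
    # Hypothetical placeholder for the learned CNN f_phi; here it is the identity.
    return x

def sample_episode(dataset, n_way, n_support, n_query, rng):
    """Steps (1)-(2): pick n_way classes, then disjoint support/query sets per class."""
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for c in classes:
        picked = rng.sample(dataset[c], n_support + n_query)
        support[c] = picked[:n_support]
        query[c] = picked[n_support:]
    return support, query

def episode_loss(support, query):
    """Steps (3)-(4): prototypes from the support set, then mean -log p(true class)."""
    prototypes = {c: mean_vector([embed(x) for x in xs]) for c, xs in support.items()}
    labels = list(prototypes)
    total, count = 0.0, 0
    for true_c, xs in query.items():
        for x in xs:
            z = embed(x)
            logits = [-squared_euclidean(z, prototypes[k]) for k in labels]
            m = max(logits)
            log_norm = m + math.log(sum(math.exp(v - m) for v in logits))  # log-sum-exp
            total += log_norm - logits[labels.index(true_c)]  # -log softmax of the true class
            count += 1
    return total / count

# Toy data: 5 well-separated classes, 10 points each along the x-axis.
dataset = {c: [[10.0 * c + 0.1 * i, 0.0] for i in range(10)] for c in range(5)}
rng = random.Random(0)
support, query = sample_episode(dataset, n_way=3, n_support=5, n_query=5, rng=rng)
print(episode_loss(support, query))  # near 0 for such well-separated clusters
```

Here K corresponds to n_way and M to n_support and n_query; the paper's experiments also allow the support and query sizes to differ, which is why they are separate parameters in this sketch.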

4. Summary

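The Prototypical Networks (P-net) proposed in this paper are very similar in spirit to Matching Networks (M-net), but differ in several ways: 1. They use different distance metrics: M-net uses cosine distance, whereas P-net uses squared Euclidean distance, which is a Bregman divergence (see the paper for details). 2. The two differ in the few-shot setting but are equivalent in the one-shot setting when the same distance is used, because with one support sample per class each prototype is simply that support sample. 3. Architecturally, P-net merges the encoding and classification layers into one (classification is just a distance computation to the prototypes), so it has fewer parameters and is easier to train.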
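Point 2 can be checked numerically. The sketch below compares nearest-prototype probabilities with a simplified Matching-Networks-style classifier (softmax attention over every support point, with the weights summed per class). As a simplifying assumption, both use squared Euclidean distance and the Matching Networks side omits full context embeddings; the function names are hypothetical:

```python
import math

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax_neg(dists):
    """Softmax over negative distances, shifted by the minimum distance for stability."""
    m = min(dists)
    exps = [math.exp(-(d - m)) for d in dists]
    s = sum(exps)
    return [e / s for e in exps]

def protonet_probs(query, support):
    """P-net: softmax over negative distances to the class means."""
    labels = list(support)
    protos = [[sum(col) / len(support[k]) for col in zip(*support[k])] for k in labels]
    probs = softmax_neg([squared_euclidean(query, p) for p in protos])
    return dict(zip(labels, probs))

def matching_probs(query, support):
    """Simplified M-net: attention weights over all support points, summed per class."""
    labels, points = [], []
    for k, xs in support.items():
        for x in xs:
            labels.append(k)
            points.append(x)
    weights = softmax_neg([squared_euclidean(query, x) for x in points])
    out = {k: 0.0 for k in support}
    for k, w in zip(labels, weights):
        out[k] += w
    return out

one_shot = {"a": [[0.0]], "b": [[3.0]]}
few_shot = {"a": [[0.0], [2.0]], "b": [[3.0]]}
query = [1.0]
print(protonet_probs(query, one_shot), matching_probs(query, one_shot))  # identical
print(protonet_probs(query, few_shot)["a"], matching_probs(query, few_shot)["a"])  # ~0.982 vs ~0.976
```

With one sample per class the class mean is that sample, so both classifiers compute the same softmax; with two samples in class "a" the mean and the per-point attention give different probabilities.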

5. Supplementary Notes

For an intuitive explanation of the Bregman divergence used in the paper, see 覃含章's answer to the Zhihu question "How to understand Bregman divergence?":
https://www.zhihu.com/question/22426561/answer/209945856
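Relating it to the paper, in simple terms: a Bregman divergence is an abstractly defined "distance" between two points in a given space with the property that, whatever probability distribution the points follow, their mean point is the point in the space with the smallest average distance to them. That is why the authors can directly use the mean point when extracting a "prototype". Cosine distance is not a Bregman divergence (it is zero between a vector and any positive multiple of it, whereas a Bregman divergence with a strictly convex generator vanishes only when the two points coincide), so this guarantee does not cover it; I suspect this is why the authors chose Euclidean distance over cosine distance.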
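A small numerical check of this property on an assumed toy 2-D point set: for squared Euclidean distance and for the generalized KL divergence (the Bregman divergence generated by φ(z) = Σ z log z, defined for positive vectors), the mean gives a smaller total divergence than a handful of nearby candidate points. It only probes a few perturbations, so it illustrates the property rather than proving it; the helper names are hypothetical:

```python
import math

def squared_euclidean(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def generalized_kl(x, c):
    """Bregman divergence generated by phi(z) = sum z log z (positive vectors only)."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, c))

def total_divergence(points, c, div):
    """Sum of divergences from each point to the candidate prototype c."""
    return sum(div(x, c) for x in points)

def mean_point(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def mean_beats_perturbations(points, div, offsets):
    """True if the mean has a strictly smaller total divergence than every perturbed point."""
    mu = mean_point(points)
    best = total_divergence(points, mu, div)
    return all(
        total_divergence(points, [m + e for m, e in zip(mu, d)], div) > best
        for d in offsets
    )

points = [[1.0, 2.0], [3.0, 1.0], [2.0, 6.0]]
offsets = [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1], [0.3, -0.2]]
print(mean_point(points))                                         # [2.0, 3.0]
print(mean_beats_perturbations(points, squared_euclidean, offsets))  # True
print(mean_beats_perturbations(points, generalized_kl, offsets))     # True
```

Note that the candidate prototype is the second argument of the divergence; the mean-minimizer property of Bregman divergences holds in that argument, which matches how the distance d(f_φ(x), c_k) is used in the paper.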

References

[1] Prototypical Networks for Few-shot Learning (paper download)
[2] [Field Report] Annual Progress in Few-shot Learning | VALSE2018
[3] When Few Samples Meet Machine Learning: Few-shot Learning
[4] A Survey of Few-shot Learning
[5] Few-shot Learning: Prototypical Networks (recommended)

Code

[1] jakesnell/prototypical-networks

Reposted from: https://www.jianshu.com/p/1eaccf6d6f2d