[Notes] paper read | Matching networks for one shot learning

  • Paper: Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning[C]//Advances in Neural Information Processing Systems. 2016: 3630-3638.
  • Post author: Veagau
  • Edit Time: January 7, 2020

This article is a NIPS 2016 conference paper by authors from Google DeepMind. In it, the authors propose a new neural network architecture, Matching Networks, that combines metric learning with memory-augmented neural networks. The network uses attention and memory mechanisms to accelerate learning, enabling it to predict labels for unlabeled samples when only a small number of labeled samples are provided.

A schematic of the Matching Networks architecture appears in the original post (figure not reproduced here).

For a given support set \(S\), the probability of label \(\hat{y}\) for a new sample \(\hat{x}\) can be expressed as:
\[P\left(\hat{y} \mid \hat{x}, S\right) = \sum_{i=1}^{k} a\left(\hat{x}, x_i\right) y_i\]
where \(k\) is the number of samples in the support set and \(a\left(\hat{x}, x_i\right)\) is an attention kernel: a softmax over the cosine similarities between the embedded representations of the new sample and the support-set samples. Its expression is:
\[a\left(\hat{x}, x_i\right) = \frac{e^{c\left(f(\hat{x}),\, g(x_i)\right)}}{\sum_{j=1}^{k} e^{c\left(f(\hat{x}),\, g(x_j)\right)}}\]
where \(c\) denotes cosine similarity, and \(f\) and \(g\) are the embedding functions applied to the new sample and to the support-set samples, respectively.
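The two equations above can be sketched directly in NumPy. This is a minimal illustration, not the paper's implementation: the embeddings \(f(\hat{x})\) and \(g(x_i)\) are assumed to be precomputed vectors, and the function names are my own.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity c(a, b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def matching_predict(x_hat_emb, support_embs, support_labels_onehot):
    """Predict P(y_hat | x_hat, S) as an attention-weighted sum of
    support-set labels (one-hot rows), with attention weights given by
    a softmax over cosine similarities."""
    sims = np.array([cosine(x_hat_emb, s) for s in support_embs])
    a = np.exp(sims - sims.max())   # numerically stable softmax
    a /= a.sum()                    # attention weights a(x_hat, x_i)
    return a @ support_labels_onehot  # probability distribution over classes
```

Note that the prediction is non-parametric at test time: adding a new class only requires adding its samples to the support set, with no weight updates.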

To strengthen the sample embeddings used for matching, the authors also propose the Full Context Embedding (FCE) method: the embeddings of the support samples should not be computed independently of one another, and the embedding of a new sample should be conditioned on the distribution of the support-set samples; that is, the embedding process should take place in the context of the entire support set. The authors use a bidirectional LSTM to embed the support set and an LSTM with read attention to embed the new sample. The final results show that introducing FCE significantly improves the performance of Matching Networks.
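The read-attention idea behind FCE's query embedding can be sketched as follows. This is a deliberately simplified illustration: the paper uses an LSTM cell with a fixed number of processing steps, whereas this sketch replaces the gated update with a plain additive one, and all names are my own.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fce_query_embedding(f_x, g_support, steps=3):
    """Simplified sketch of FCE's query embedding f(x_hat, S): at each
    step, attend over the support embeddings g(x_i), read back an
    attention-weighted summary, and fold it into the running state.
    The real method uses an attLSTM; here the LSTM gates are replaced
    by a tanh of an additive update for brevity."""
    h = f_x.copy()
    for _ in range(steps):
        attn = softmax(g_support @ h)   # attention over the support set
        r = attn @ g_support            # read vector (context summary)
        h = np.tanh(f_x + h + r)        # simplified state update (no gates)
    return h
```

The point of the loop is that the final query embedding depends on the whole support set, not on the query alone, which is exactly the conditioning FCE argues for.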

In addition to the new network architecture, the authors further processed the ImageNet dataset to produce the mini-ImageNet dataset, suited to few-shot learning scenarios: 100 classes are extracted from ImageNet, with 600 samples per class. It became the second standard FSL benchmark dataset after Omniglot.

In its experimental design, the paper follows the principle of train-test condition matching: the tasks used during training should be consistent with the task setup at actual test time. This is an important guiding principle for few-shot learning experiments; it reduces the model's generalization error and improves its robustness.
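In practice, train-test condition matching means training episodically: each training batch is itself a small N-way K-shot task of the same shape as the test task. A minimal episode sampler, with illustrative names of my own, might look like this:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=5):
    """Sample one N-way K-shot episode so the training task mirrors
    the test-time task. `dataset` maps class name -> list of samples
    (structure assumed for illustration)."""
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        samples = random.sample(dataset[cls], k_shot + q_queries)
        support += [(s, label) for s in samples[:k_shot]]
        query += [(s, label) for s in samples[k_shot:]]
    return support, query
```

If the model will be evaluated on 5-way 1-shot classification, it is trained on a stream of 5-way 1-shot episodes like these, never on ordinary mini-batches.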

Origin www.cnblogs.com/veagau/p/12164335.html