Python Meta-Learning - Implementation of General Artificial Intelligence Chapter 3 Reading Notes

Code for this book: https://github.com/sudharsan13296/Hands-On-Meta-Learning-With-Python
ISBN: 9787115539670


Chapter 1: Meta-learning

Chapter 2: Metric-based one-shot learning algorithm: Siamese network

Chapter 3: Prototype Networks and Variants

3.1 Prototype network

Like Siamese networks, prototype networks attempt to learn a metric space for classification. The basic idea of the prototype network is to create a prototype representation of each class and to classify query points (new points) based on the distance between each class prototype and the query point.

Suppose we have a support set (a small labeled data set against which queries are classified) consisting of images of a lion, an elephant, and a dog, so we have 3 classes: {Lion, Elephant, Dog}. (This is the multi-class case.)
We use a convolutional network to extract features for each data point and create a prototype representation for each of these 3 classes.
In the above example, there are three categories, each with 2 images, all processed by the convolutional network. After obtaining the feature vectors of the 6 images, we take the mean of the feature vectors within each category, which gives 3 class prototypes (exactly one prototype per class).
When we need to classify a new data point, we use the same convolutional network that was used to create the class prototypes to extract features for this data point.
This lion image does not appear in the support set; it is new data (called a query point).

After obtaining the feature vector of the query point, we compute its distance (Euclidean distance) to the feature vector of every class prototype to determine which class the query point belongs to.
After finding the distances between the class prototypes and the query point embedding, we apply softmax to these distances to obtain probabilities. Since we have 3 classes (lion, elephant, and dog), we get 3 probabilities. The class with the highest probability is the class of the query point.
Note: because softmax is used here, each Euclidean distance must be negated first, so that a smaller distance yields a higher probability.
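
The classification procedure described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the book's code: the 2-D vectors stand in for features that a convolutional network would produce, and the helper names `class_prototypes` and `classify` are made up for this note.

```python
import numpy as np

def class_prototypes(support_emb, support_labels, num_classes):
    """Mean embedding per class; support_emb has shape (num_points, dim)."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(query_emb, prototypes):
    """Softmax over negated Euclidean distances to each prototype."""
    # dists[i, c] = Euclidean distance from query i to prototype c
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :], axis=-1)
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)

# Toy 2-D "embeddings" (assumed values): 3 classes, 2 support images each.
support_emb = np.array([[0.0, 0.0], [0.2, 0.0],    # class 0 (lion)
                        [5.0, 5.0], [5.0, 5.2],    # class 1 (elephant)
                        [0.0, 5.0], [0.2, 5.0]])   # class 2 (dog)
support_labels = np.array([0, 0, 1, 1, 2, 2])

prototypes = class_prototypes(support_emb, support_labels, num_classes=3)
probs = classify(np.array([[0.1, 0.3]]), prototypes)  # one query point
predicted_class = int(probs.argmax(axis=1)[0])
```

The query point lies closest to the lion prototype, so class 0 receives the highest probability.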

3.1.1 Algorithm

  1. Suppose there is a data set D = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x is the feature, y is the class label, and there are k categories in total.

  2. Randomly sample n data points from each category of the data set D to form a support set S, with a total of n×k pictures.

  3. Select m data points to form the query set Q (labels are required).

  4. Use the convolutional neural network to extract the feature vectors of the support set S, and average these feature vectors per category to get the class prototypes.

  5. Use the same convolutional neural network to extract the feature vectors of the query set Q.

  6. Calculate the Euclidean distance d between each query point and each class prototype.

  7. Apply softmax to the negated distances −d to predict the probability p_k that a query point x belongs to class k:

    $$p_k = \frac{\exp(-d(f(x), c_k))}{\sum_{k'} \exp(-d(f(x), c_{k'}))}$$

    where f(x) is the feature vector of the query point and c_k is the prototype of class k.

  8. Calculate the negative log-probability loss (which is simply the cross-entropy loss) L = −log(p_k), where k is the true class of the query point, and minimize the loss using stochastic gradient descent.
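
Steps 6 to 8 can be sketched as follows. This is a hedged NumPy illustration that only evaluates the loss; in real training the embeddings come from the network and the gradient step is left to an autodiff framework. The function name `prototypical_loss` and the toy values are assumptions made for this note.

```python
import numpy as np

def prototypical_loss(query_emb, query_labels, prototypes):
    """Mean of -log p_k over the query set, with p = softmax(-d)."""
    # Step 6: Euclidean distance from every query point to every prototype
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :], axis=-1)
    # Step 7: log-softmax of the negated distances (log-sum-exp trick for stability)
    logits = -dists
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Step 8: negative log-probability of each query point's true class
    true_log_probs = log_probs[np.arange(len(query_labels)), query_labels]
    return -true_log_probs.mean()

prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])   # two assumed class prototypes
query_emb = np.array([[0.5, 0.0], [3.5, 0.0]])    # two assumed query embeddings

loss_correct = prototypical_loss(query_emb, np.array([0, 1]), prototypes)
loss_wrong = prototypical_loss(query_emb, np.array([1, 0]), prototypes)
```

With the correct labels the loss is small; swapping the labels makes it much larger, which is the signal that gradient descent uses to pull embeddings towards their own class prototype.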

3.1.2 Performing classification using prototype networks

Remember, we do not use the entire data set for training. Because we use one-shot (few-shot) learning, we sample a few data points from each class as a support set and train the network episodically. In each episode, we sample data points, build a support set and a query set, and train the model.

# Number of samples per class in the support set
num_shot = 5
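
One possible way to build the support and query sets for a single episode is sketched below. It is not taken from the book's repository: `sample_episode`, `num_way`, and `num_query` are names assumed for this note, and the toy label array stands in for a real data set.

```python
import numpy as np

def sample_episode(labels, num_way, num_shot, num_query, rng):
    """Return (support_idx, query_idx, classes) for one training episode."""
    # pick num_way distinct classes for this episode
    classes = rng.choice(np.unique(labels), size=num_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        # shuffle the indices of class c, then split them into support and query
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:num_shot])
        query_idx.extend(idx[num_shot:num_shot + num_query])
    return np.array(support_idx), np.array(query_idx), classes

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)    # toy data set: 10 classes, 20 points each
num_way, num_shot, num_query = 3, 5, 5   # num_way and num_query are assumed values

support_idx, query_idx, classes = sample_episode(
    labels, num_way, num_shot, num_query, rng)
```

Because the support and query indices are taken from disjoint slices of the same shuffled index array, no data point appears in both sets within an episode.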

3.2 Gaussian prototype network

Reference link: Meta-learning—Gaussian Prototypical Network
Paper: Gaussian Prototypical Networks for Few-Shot Learning on Omniglot
Code: https://github.com/hjhdaniel/cv-prototypical-network


In a Gaussian prototype network, the output of the encoder is the feature vector together with a covariance matrix, represented either by its radius component or by its diagonal components.

  • Radius component: if the radius component of the covariance matrix is used, then the dimension of the covariance output is 1, because the radius is just a single number.
  • Diagonal component: if the diagonal component of the covariance matrix is used, then the dimension of the covariance output is the same as the dimension of the feature (embedding) vector.

Methods for computing the inverse of the covariance matrix: let S_raw be the raw covariance output of the encoder and S be the inverse covariance matrix. Any one of the following can be used:

  1. S = 1 + softplus(S_raw)
  2. S = 1 + sigmoid(S_raw)
  3. S = 1 + 4 × sigmoid(S_raw)
  4. S = offset + scale × softplus(S_raw / div), where offset, scale, and div are trainable parameters initialized to 1.0

where softplus(x) = log(1 + e^x) and sigmoid(x) = 1 / (1 + e^(−x)).
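
The four variants can be written down directly. The sketch below is a plain NumPy illustration: the function name `inverse_covariance` and its `method` argument are assumptions for this note, and `offset`, `scale`, and `div` are ordinary floats here rather than trainable parameters.

```python
import numpy as np

def softplus(x):
    # log(1 + e^x), computed in a numerically stable way
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inverse_covariance(s_raw, method=1, offset=1.0, scale=1.0, div=1.0):
    """Map the raw encoder output s_raw to the inverse covariance S."""
    if method == 1:
        return 1.0 + softplus(s_raw)
    if method == 2:
        return 1.0 + sigmoid(s_raw)
    if method == 3:
        return 1.0 + 4.0 * sigmoid(s_raw)
    if method == 4:
        # in the original, offset, scale and div are trainable; fixed floats here
        return offset + scale * softplus(s_raw / div)
    raise ValueError("method must be 1, 2, 3 or 4")

s_raw = np.array([-2.0, 0.0, 2.0])          # assumed raw encoder outputs
s = inverse_covariance(s_raw, method=1)
```

With the default parameters, all four variants return values greater than 1, so the inverse covariance stays positive whatever the raw encoder output is.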


So what is the covariance matrix used for alongside the embedding? It adds a confidence region around each data point, which is very useful for noisy data. Suppose there are two classes, A and B. In the figure, the black dots represent the embeddings of data points, the circles around the black dots represent their covariance matrices, the large dashed circle represents the overall covariance matrix of a class, and the asterisk in the middle indicates the class prototype. As you can see, the covariance matrix around each embedding gives us a confidence region around the data points and the class prototype.

In the implementation, the final feature map is passed through a convolution to obtain a tensor of size (D+D)×1×1, where 1×1 is the spatial size of the feature map after the convolution and (D+D) is the number of channels. This vector is then split into a D-dimensional feature embedding and a D-dimensional diagonal representation of the covariance matrix:

embeddings, raw_covariance_matrix = tf.split(X_encoded, [embedding_dim, covariance_matrix_dim], 1)

Next, compute the inverse S_i^c of the covariance matrix using any of the methods discussed above. Then compute the class prototype p^c:

$$p^c = \frac{\sum_i s_i^c \circ x_i^c}{\sum_i s_i^c}$$

where s_i^c is the diagonal component of the inverse covariance matrix S_i^c, x_i^c is the embedding, and the superscript c denotes the class.
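
A minimal NumPy sketch of this variance-weighted prototype follows. It assumes the weighted-average form in which each support embedding is multiplied element-wise by its inverse covariance diagonal and normalized by the sum of those diagonals; the function name `gaussian_prototype` and the toy numbers are made up for this note.

```python
import numpy as np

def gaussian_prototype(x, s):
    """Weighted class prototype.

    x: (num_support, dim) embeddings of one class
    s: (num_support, dim) inverse-covariance diagonals, or (num_support, 1)
       in the radius case (broadcast over all dimensions)
    """
    return (s * x).sum(axis=0) / s.sum(axis=0)

x = np.array([[0.0, 0.0], [2.0, 4.0]])

# Diagonal case: one weight per embedding dimension
s_diag = np.array([[1.0, 1.0], [3.0, 1.0]])
p_diag = gaussian_prototype(x, s_diag)      # [(0*1 + 2*3)/4, (0*1 + 4*1)/2]

# Radius case: one weight per support point
s_radius = np.array([[1.0], [3.0]])
p_radius = gaussian_prototype(x, s_radius)  # [(0 + 6)/4, (0 + 12)/4]
```

When all weights are equal, this reduces to the plain mean used by the ordinary prototype network.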

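After computing the prototype of each class, we compute the embedding of the query point. Let x be the embedding of the query point. Its distance to the prototype of class c is then

$$d_c^2(x) = (x - p^c)^\top S^c (x - p^c)$$

where S^c is the inverse covariance matrix of class c.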

3.3 Supplement to the paper

For Gaussian prototype networks, the radius or diagonal of the covariance matrix is output together with the embedding vector (more precisely, in its raw form; see Section 3.1 of the paper for details). These are then used to weight the embedding vectors corresponding to the support points of a particular class, and to calculate the overall covariance matrix for that class. Then, the distance d_c(i) from the prototype of class c to a query point i is calculated as

$$d_c^2(i) = (x_i - p_c)^\top S_c (x_i - p_c)$$

where p_c is the center point, or prototype, of class c, and S_c = Σ_c^{-1} is the inverse of its covariance matrix. Therefore, the Gaussian prototype network is able to learn class-dependent and direction-dependent distance measures in the embedding space. We found that the speed of training and its accuracy depend heavily on how the distance is used to construct the loss. We conclude that the best option is to use the linear Euclidean distance, i.e. d_c(i), rather than its square. The specific form of the loss function used is presented in Algorithm 1 of the paper. Figure 2 shows a diagram of the embedding space of the Gaussian prototype network. Figures 10 and 11 in the Appendix of the paper show a sample of the embedding space during training, illustrating how similar characters cluster for classification.

Figure 2: Diagram showing the embedding space of a Gaussian prototype network. An image is mapped by the encoder to its embedding vector (dark dot). Its covariance matrix (dark ellipse) is also output by the encoder. Then, the overall covariance matrix for each class is calculated (the large light-colored ellipse), as well as the prototype of the class (the stars). The class covariance matrix is used to locally modify the distance metric to the query point (shown in gray).
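
For a diagonal inverse covariance, the distance between a query point and a class prototype needs no matrix product. The sketch below assumes S_c = diag(s_class) and returns the linear (not squared) distance; the function name `gaussian_distance` and the numbers are illustrative only.

```python
import numpy as np

def gaussian_distance(query, prototype, s_class):
    """d_c(i) = sqrt((x_i - p_c)^T S_c (x_i - p_c)) with S_c = diag(s_class)."""
    diff = query - prototype
    # for a diagonal S_c the quadratic form is a weighted sum of squared differences
    return np.sqrt((diff * s_class * diff).sum(axis=-1))

query = np.array([3.0, 4.0])
prototype = np.zeros(2)

# With S_c equal to the identity, this is the ordinary Euclidean distance.
d_identity = gaussian_distance(query, prototype, np.ones(2))

# A larger inverse variance along the first axis stretches distances in that direction.
d_scaled = gaussian_distance(query, prototype, np.array([4.0, 1.0]))
```

This is what makes the metric class-dependent and direction-dependent: the same offset from the prototype counts for more along directions in which the class is tightly concentrated.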

We study the setting where the covariance matrix is diagonal, as summarized in Section 3.1 of the paper. For the radius case, S = sI, where I is the identity matrix and s ∈ R^1 is calculated from the raw encoder output for each image. For the diagonal case, S = diag(s), where the vector s is also calculated from the raw encoder output for each image.

A key part of a prototype network is the creation of a class prototype from the available support points of a specific class. We propose as our solution a variance-weighted linear combination of the embedding vectors of the individual support instances. Let class c have support images I_i, which are encoded as embedding vectors x_i^c and inverse covariance matrices S_i^c, whose diagonals are s_i^c. The prototype, i.e. the center point of the class, is defined as

$$p^c = \frac{\sum_i s_i^c \circ x_i^c}{\sum_i s_i^c} \tag{5}$$

where ∘ represents component-wise multiplication and the division is also component-wise. Then, the diagonal of the inverse class covariance matrix is calculated as

$$s^c = \sum_i s_i^c \tag{6}$$

This corresponds to optimally combining the Gaussians centered at the individual points into an overall class Gaussian, hence the name of the network. The elements of s are actually 1/σ^2. Therefore, Equations 5 and 6 correspond to weighting the examples by 1/σ^2. This allows the network to down-weight examples that are less important for defining a class, thus making our architecture more suitable for noisy, inhomogeneous, or otherwise imperfect data sets.
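
The down-weighting effect can be checked numerically. The sketch below uses made-up embeddings and variances (all values are assumptions for illustration) and follows the reading of Equations 5 and 6 given by the surrounding text: each support embedding is weighted component-wise by 1/σ^2, and the class-level diagonal is the sum of the per-point diagonals.

```python
import numpy as np

# Three support embeddings of one class; the last one is a noisy outlier.
x = np.array([[1.0, 1.0], [1.2, 0.8], [9.0, 9.0]])

# Assumed per-point variances: the outlier is given a very large variance.
sigma2 = np.array([[0.5, 0.5], [0.5, 0.5], [50.0, 50.0]])
s = 1.0 / sigma2                     # elements of s are 1 / sigma^2

plain_mean = x.mean(axis=0)                               # ordinary prototype
weighted_prototype = (s * x).sum(axis=0) / s.sum(axis=0)  # Equation 5
class_s = s.sum(axis=0)                                   # Equation 6

clean_center = x[:2].mean(axis=0)    # center of the two reliable points
```

The plain mean is dragged towards the outlier, while the variance-weighted prototype stays close to the center of the two reliable points.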

In the one-shot regime, which is how our network is trained, there is a single embedding vector x_c defining each class. This means that the vector itself becomes the prototype of the class and its covariance matrix is inherited by the class. The covariance then plays a role in modifying the distance to the query point. The complete algorithm is described in Algorithm 1 of the paper.


Origin blog.csdn.net/qq_56039091/article/details/127527785