
The column introduces in detail the [Introduction to advanced deep learning] must-see series, covering activation functions, optimization strategies, loss functions, model tuning, normalization algorithms, convolutional models, sequence models, pre-trained models, adversarial neural networks, and more.

This column is mainly intended to help beginners grasp the relevant knowledge quickly. Follow-up posts will continue to analyze the principles behind deep learning, so that readers can build up knowledge while practicing on projects, knowing not only what works but also why it works.

Disclaimer: some projects are classic online projects intended for quick learning; practical material (competitions, papers, real-world applications, etc.) will be added in the future.

# Deep learning application - meta-learning [13]: meta-learning concept, learning period, working principle, model classification, etc.

## 1. Meta-learning overview

### 1.1 Meta-Learning Concepts

Meta-Learning is often understood as "Learning-to-Learn": the process of improving a learning algorithm over multiple learning episodes.
During base learning, an inner (or lower/base) learning algorithm solves a task defined by a dataset and an objective.
During meta-learning, an outer (or upper/meta) algorithm updates the inner learning algorithm so that the model it learns improves an outer objective.
The core idea of meta-learning is therefore to learn a prior (prior knowledge) from many tasks.

### 1.2 Meaning of meta-learning

The meaning of meta-learning has two layers. The first is to let the machine learn how to learn, giving it the ability to analyze and solve problems on its own.
The second is to let the learned model generalize well to new domains, so that it can complete new tasks that differ greatly from those seen during training.

Few-Shot Learning is the application of Meta-Learning to supervised learning.
In the meta-training stage, the dataset is decomposed into different tasks so that the model learns to generalize when the set of categories changes.
In the meta-testing stage, when a new category is encountered, the existing model does not need to be changed: one or a few steps of fine-tuning are enough to handle the new task.

### 1.3 Meta-Learning Units

The basic unit of meta-learning is the task; the task structure is shown in Figure 1.
Meta-Training Data, Meta-Validation Data, and Meta-Testing Data are task sets composed of sampled tasks.
The tasks in the meta-training set and the meta-validation set are used to train the meta-learning model, while
the tasks in the meta-testing set are used to measure how well the meta-learning model completes tasks.

In meta-learning, a previously learned task is called a meta-train task, and a
newly encountered task is called a meta-test task.
Each task has its own internal training set and test set, generally called
the support set (Support Set) and the query set (Query Set).
The support set defines an N-Way K-Shot problem: it contains N categories with K examples per category.

Figure 1 Task structure.
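As an illustration, here is a minimal sketch of how a single N-way K-shot task (support set plus query set) might be sampled from a labeled dataset. The function name `sample_task`, the `data_by_class` layout, and the default sizes are assumptions made for this example, not part of any particular library:

```python
import random

def sample_task(data_by_class, n_way=5, k_shot=1, q_query=15):
    """Sample one N-way K-shot task: a support set and a query set.

    data_by_class: dict mapping class label -> list of examples.
    Returns (support, query), each a list of (example, task_label)
    pairs, with labels re-indexed to 0..n_way-1 within the task.
    """
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for task_label, cls in enumerate(classes):
        # Draw K support examples and Q query examples per class
        examples = random.sample(data_by_class[cls], k_shot + q_query)
        support += [(x, task_label) for x in examples[:k_shot]]
        query += [(x, task_label) for x in examples[k_shot:]]
    return support, query
```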

### 1.4 Base learner and meta-learner

Meta-learning is essentially a bilevel optimization problem (Bilevel Optimization Problem),
in which one optimization problem is nested inside another.
The outer and inner optimization problems are usually referred to as the upper-level and lower-level problems, respectively,
as in MAML, shown in Figure 2.

Figure 2. Two-layer optimized meta-learning MAML.
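To make the two levels concrete, the bilevel structure of MAML can be written as follows, using the notation introduced in Section 1.5 ($\theta$ are the meta-parameters, $\alpha$ is the inner-loop learning rate, and $D_n^{\mathrm{tr}}$, $D_n^{\mathrm{te}}$ are the support and query sets of task $n$):

```latex
% Lower-level (base) problem: adapt \theta to task n with one
% (or a few) gradient steps on its support set:
\theta_n^{*} = \theta - \alpha \, \nabla_{\theta}\, l\!\left(f_{\theta}, D_n^{\mathrm{tr}}\right)

% Upper-level (meta) problem: choose \theta so that the adapted
% models perform well on the query sets of all N tasks:
\min_{\theta} \; \sum_{n=1}^{N} l\!\left(f_{\theta_n^{*}}, D_n^{\mathrm{te}}\right)
```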

A bilevel optimization problem involves two participants:

  1. The upper-level participant is the meta-learner.
  2. The lower-level participant is the base learner.

The optimal decision of the meta-learner depends on the response of the base learner, while the base learner optimizes its own internal decisions.
The two levels have their own objective functions, constraints, and decision variables.
The objects and functions of the base learner and the meta-learner are shown in Figure 3.

Figure 3. Base learner and meta-learner. The meta-learner summarizes task experience to learn what tasks have in common, while guiding the base learner to learn the characteristics of new tasks.

#### 1.4.1 Base learner

The base learner (Base-Learner) is the model at the base layer.
Each time the base learner is trained, it only considers the dataset of a single task. Its basic functions are as follows:

  • Train the model on a single task, learn the characteristics of the task, find the rules, and answer the questions that the task needs to solve.

  • Obtain helpful experience for completing a single task from the meta-learner, including the initial model and initial parameters, etc.

  • Use the training data set in a single task to construct a suitable objective function,
    design the optimization problem to be solved, and perform iterative updates from the initial model and initial parameters.

  • After training on a single task is finished, both the trained model and its parameters are fed back to the meta-learner (a minimal sketch of this inner loop follows below).
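Below is a minimal PyTorch-style sketch of the base learner's role: start from the initialization provided by the meta-learner and take a few gradient steps on one task's support set. The names `inner_loop`, `inner_lr`, and `inner_steps` are illustrative; for simplicity the sketch copies the model instead of tracking the higher-order gradients that full MAML requires:

```python
import copy

import torch

def inner_loop(meta_model, support_x, support_y, loss_fn,
               inner_lr=0.01, inner_steps=5):
    """Fine-tune a copy of the meta-initialization on one task's support set."""
    model = copy.deepcopy(meta_model)  # start from the meta-learned init
    optimizer = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):       # a few gradient steps on this task only
        optimizer.zero_grad()
        loss = loss_fn(model(support_x), support_y)
        loss.backward()
        optimizer.step()
    return model                       # the task-adapted base learner
```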

#### 1.4.2 Meta-learner

The meta-learner (Meta-Learner) is the model at the meta layer; it summarizes the training experience across all tasks.
After each round of base-learner training, the meta-learner incorporates the new experience and updates its own parameters. Its basic functions are as follows:

  • Synthesize the results of base learner training on multiple tasks.

  • Summarize the commonality of multiple tasks, perform fast and accurate inference on new tasks,
    and send the result to the base learner as the initial model and initial parameter values,
    or as other quantities that can accelerate base-learner training
    (a concrete example of such a meta-update is sketched after this list).

  • Guide the optimal behavior of the base learner or explore a specific new task.

  • Extract features relevant to the model and training on the task.
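As one concrete example of such a meta-update, here is a Reptile-style outer step (a sketch of one possibility, not the only form a meta-learner can take): the meta-parameters are moved toward the parameters that the base learner reached on each task in the batch. It reuses the `inner_loop` sketch above; `meta_lr` is an assumed hyperparameter:

```python
import torch

def outer_update(meta_model, adapted_models, meta_lr=0.1):
    """Reptile-style meta-update from a batch of task-adapted models."""
    with torch.no_grad():
        for name, meta_param in meta_model.named_parameters():
            # Average, over tasks, of (adapted parameters - meta parameters)
            delta = sum(dict(m.named_parameters())[name] - meta_param
                        for m in adapted_models) / len(adapted_models)
            meta_param += meta_lr * delta  # move the init toward the task solutions
```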

### 1.5 How meta-learning works

The main purpose of meta-learning is to find the meta-learner $F$. Guided by $F$, the base learner $f$ can, after a few steps of fine-tuning on the support set $D^{\mathrm{tr}}$ of a new task, reach the optimal state $f^{*}$ for that task. Optimizing $F$ in turn requires accumulating the losses of all current tasks, i.e., $\nabla \sum_{n=1}^{N} l\left(f_{n}^{*}, D_{n}^{\mathrm{te}}\right)$.
The working principle of meta-learning is shown in Figure 4.

Figure 4. How meta-learning works.

#### 1.5.1 Meta-learning training process

Taking classification as an example, the training process for an N-Way K-Shot problem in meta-learning is as follows.

First, a few-shot dataset is provided, which generally contains many categories,
each with many samples.
The dataset is split by category: several categories are randomly selected to form the training set, and the remaining categories form the test set.

meta-train stage:

  • Randomly select N classes from the training set, with K samples per class; these form the support set,
    and the remaining samples of those classes form the query set.
    The support set and query set together constitute one task.

  • Each sampled task used for training is called an episode;
    several tasks sampled at once form a batch.

  • A meta-training run can contain multiple batches.

  • Training is complete after all batches have been traversed (a minimal loop tying these steps together is sketched below).
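Here is a minimal loop tying the meta-train stage together, reusing the `sample_task`, `inner_loop`, and `outer_update` sketches from earlier sections (the `collate` batching helper, `train_data_by_class`, `loss_fn`, `meta_model`, and the loop sizes are hypothetical placeholders):

```python
for batch in range(num_batches):
    adapted = []
    for _ in range(tasks_per_batch):              # one sampled task = one episode
        support, query = sample_task(train_data_by_class)
        support_x, support_y = collate(support)   # hypothetical batching helper
        adapted.append(inner_loop(meta_model, support_x, support_y, loss_fn))
    outer_update(meta_model, adapted)             # one meta-update per batch
```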

meta-test stage:

  • Randomly select N categories from the test set, with K samples per category, as the support set;
    the remaining samples form the query set.

  • Use the support set to fine-tune the model.

  • Use the query set to evaluate the model (the query set here stands in for the data you actually want the model to classify). A sketch of this stage follows below.
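The meta-test stage then looks like the following sketch: adapt to one unseen task using its support set and evaluate on its query set (`test_data_by_class`, `collate`, and `accuracy` are hypothetical placeholders):

```python
support, query = sample_task(test_data_by_class)   # task over held-out classes
support_x, support_y = collate(support)
query_x, query_y = collate(query)
adapted = inner_loop(meta_model, support_x, support_y, loss_fn)  # few-step fine-tune
print("query accuracy:", accuracy(adapted(query_x), query_y))    # final evaluation
```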

In the training process above, each episode samples a different task,
so overall the training covers many different category combinations.
This mechanism lets the model learn the parts that different tasks have in common,
such as how to extract important features and how to compare the similarity between samples, while discarding the parts that are specific to any single task.
A model trained with this mechanism can therefore classify well even when faced with new, unseen tasks.

### 1.6 Meta-Learning Keys

The key to meta-learning is to discover universal laws shared across different problems, and to solve unknown problems by generalizing those laws. A universal law must strike a balance between expressing what problems have in common and what is specific to each problem. The search for universal laws mainly depends on the following points:

  • Discover the parts that closely connect the solved problems with the new problem, extract the universal laws of the solved problems, and apply them to the new problem;

  • Decompose and simplify the new problem, and find, among the solved problems, the universal laws closely related to the new problem's sub-tasks, along with the scope within which those laws apply;

  • Learn the reasoning logic in the new problem, use it to represent the new problem, look for laws in these representations, and find a solution via the reasoning logic that links the parts of the new problem together.

### 1.7 Meta-Learning Classification

  • Optimization-based meta-learning: such as MAML, Reptile, LEO, …

  • Metric-based meta-learning: such as SNAIL, Relation Networks (RN), Prototypical Networks (PN), Matching Networks (MN), …

  • Model-based meta-learning: such as Learning to learn, Meta-learner LSTM, …
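To make the metric-based family concrete, here is a minimal sketch of the Prototypical Network (PN) classification rule: each class prototype is the mean embedding of its support examples, and a query is assigned to the nearest prototype. The encoder `embed` and the tensor shapes are assumptions for this example:

```python
import torch

def prototypical_predict(embed, support_x, support_y, query_x, n_way):
    """Classify queries by the nearest class-mean in embedding space."""
    z_support = embed(support_x)   # (N*K, d) support embeddings
    z_query = embed(query_x)       # (Q, d) query embeddings
    # Prototype of each class = mean embedding of its K support examples
    prototypes = torch.stack([z_support[support_y == c].mean(dim=0)
                              for c in range(n_way)])
    # Assign each query to the class with the smallest Euclidean distance
    distances = torch.cdist(z_query, prototypes)   # (Q, N) pairwise distances
    return distances.argmin(dim=1)
```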
