Python Meta-Learning: The Implementation of General Artificial Intelligence (Chapter 1 Reading Notes)

The book's code: https://github.com/sudharsan13296/Hands-On-Meta-Learning-With-Python
The book's ISBN: 9787115539670


Chapter 1

1.1 Meta-learning

Meta-learning aims to produce a general artificial intelligence model that learns to perform various tasks without being trained from scratch for each one. A meta-learning model can be trained with very few data points to handle many related tasks, so for a new task it can reuse the knowledge gained from previous related tasks instead of starting training from scratch.

If the dataset contains two classes, say dogs and cats, the setup is called two-way (n = 2) k-shot learning: n is the number of classes in the dataset, and k is the number of data points available per class.

For the model to learn from a small number of data points, we train it in exactly that regime. Given a dataset D, we pick a few data points from each class to form a support set, and a few different data points from each class to form a query set. We train the model on the support set and test it on the query set. Training proceeds episodically: in each episode we draw a small number of data points from D, build a support set and a query set, train on the support set, and test on the query set. After many episodes, the model learns how to learn from small datasets. A sketch of this sampling step follows.
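
As a rough illustration of that sampling step, here is a minimal NumPy sketch. The function name, the dataset layout as a dict of per-class arrays, and the default sizes are assumptions made for this example, not code from the book:

```python
import numpy as np

def sample_episode(data_by_class, n_way=2, k_shot=5, n_query=5, rng=None):
    """Sample one episode: an n-way k-shot support set and a query set.

    data_by_class: dict mapping a class label to an array of its examples.
    Returns (support_x, support_y, query_x, query_y).
    """
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(data_by_class.keys()), size=n_way, replace=False)

    support_x, support_y, query_x, query_y = [], [], [], []
    for label, cls in enumerate(classes):
        examples = data_by_class[cls]
        idx = rng.choice(len(examples), size=k_shot + n_query, replace=False)
        support_x.append(examples[idx[:k_shot]])   # k points per class for training
        query_x.append(examples[idx[k_shot:]])     # different points for testing
        support_y += [label] * k_shot
        query_y += [label] * n_query

    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))
```

Training then repeats this sampling for many episodes: fit on the support set, evaluate on the query set, and update the model.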

1.2 Types of meta-learning

  • Learning metric spaces
  • Learning initialization
  • Learning optimizer

1.2.1 Learning metric spaces

In metric-based meta-learning, we learn a suitable metric space. Suppose we want to learn the similarity between two images: we use a simple neural network to extract features from each image, and judge similarity by the distance between the two feature vectors. This approach is widely used in few-shot learning, where only a few data points are available. The following chapters introduce metric-based learning algorithms such as Siamese networks, prototypical networks, and relation networks.
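
A minimal sketch of this idea, written in PyTorch purely for illustration; the network architecture and the choice of Euclidean distance are assumptions, not the book's exact models. Both images are embedded with one shared network, and similarity is scored by the distance between the embeddings:

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """A small shared encoder that maps an image to a feature vector."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.features(x)

def similarity(net, img_a, img_b):
    """Similarity as the negative Euclidean distance between the two embeddings."""
    emb_a, emb_b = net(img_a), net(img_b)
    return -torch.norm(emb_a - emb_b, dim=1)
```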

1.2.2 Learning initialization

In this method, we try to learn optimal initial parameter values. What does that mean? Say we are building a neural network to classify images. We normally initialize the weights randomly, compute the loss, and minimize it with gradient descent, so gradient descent gradually finds the weights that give the lowest loss. If, instead of random weights, we start from optimal or near-optimal values, the network converges faster and learns quickly. The following chapters show how to find these optimal initial weights with algorithms such as MAML, Reptile, and Meta-SGD.

In MAML and Reptile, we try to find better initial model parameters that generalize across multiple related tasks, so that the model can learn quickly from fewer data points. A simplified sketch follows below.
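
As a loose sketch of what this looks like in practice, here is a Reptile-style meta-update in PyTorch. The model, the per-task batches, the loss function, and the learning rates are all placeholders assumed for this example:

```python
import copy
import torch

def reptile_meta_step(model, task_batches, loss_fn, inner_lr=0.01, meta_lr=0.1):
    """One Reptile meta-update on a single task.

    task_batches: an iterable of (inputs, targets) batches drawn from one task.
    """
    # Inner loop: ordinary gradient descent on this task, using a copy of the model.
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for x, y in task_batches:
        inner_opt.zero_grad()
        loss_fn(adapted(x), y).backward()
        inner_opt.step()

    # Outer (meta) update: move the shared initialization toward the adapted weights,
    # i.e. theta <- theta + meta_lr * (theta_adapted - theta).
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (p_adapted - p))
```

Repeating this across many sampled tasks nudges the initialization toward values from which each new task can be learned in only a few gradient steps.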

1.2.3 Learning the optimizer

In this approach, we try to learn the optimizer itself. How do we normally optimize a neural network? We train it on a large dataset and use gradient descent to minimize the loss. But in a few-shot scenario, gradient descent fails because the dataset is small. So in this case we learn the optimizer itself. We use two networks: a base network that tries to learn, and a meta-network that optimizes the base network.

We use a recurrent neural network, an RNN (an LSTM can also be used), in place of the traditional gradient descent optimizer. Gradient descent is essentially a sequence of updates flowing from the output layer back to the input layer, and these updates are stored in a state, so we can use an RNN and keep the updates in the RNN cell.

The main idea of this algorithm is to replace gradient descent with an RNN: we learn to perform gradient descent through an RNN, and that RNN is itself optimized by gradient descent.

We use the RNN to find the optimal parameters. The RNN (the optimizer) proposes parameters and sends them to the optimizee (the base network). The optimizee applies these parameters, computes the loss, and sends the loss back to the RNN. Based on this loss, the RNN optimizes itself through gradient descent and updates the model parameters θ.

The RNN takes as input the gradient ∇_t of the optimizee and its previous hidden state h_t, and returns an update g_t that minimizes the optimizee's loss. Denoting the RNN by a function m with parameters φ:

(g_t, h_{t+1}) = m(∇_t, h_t, φ)

The update g_t is then used to update the model parameters:

θ_{t+1} = θ_t + g_t
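
A loose sketch of such a meta-network m in PyTorch; a coordinate-wise LSTM is one common choice, and the hidden size and the lack of gradient preprocessing here are simplifications assumed for this example:

```python
import torch
import torch.nn as nn

class RNNOptimizer(nn.Module):
    """Meta-network m: maps the optimizee's gradient and the previous
    hidden state to an update g_t, one coordinate at a time."""
    def __init__(self, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTMCell(1, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, state=None):
        # grad: (num_params, 1) column vector of the optimizee's gradients.
        h, c = self.lstm(grad, state)
        g_t = self.out(h)          # proposed update for each coordinate
        return g_t, (h, c)

# One unrolled step, with theta a flat vector of optimizee parameters
# (computing the optimizee's gradient is left as a placeholder):
#   grad = ...                                    # gradient of the optimizee's loss w.r.t. theta
#   g_t, state = rnn_opt(grad.unsqueeze(1), state)
#   theta = theta + g_t.squeeze(1)                # theta_{t+1} = theta_t + g_t
```

Training the RNN then means unrolling several such steps, summing the optimizee's losses, and backpropagating that sum into the RNN's own parameters φ.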



Source: https://blog.csdn.net/qq_56039091/article/details/127239583