Some NLP terms

pre-training

Train a model on a task, save the parameters once the results are good, and reuse those parameters directly the next time you face a similar task; this usually gives better results than training from scratch. This process is pre-training.
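
As a minimal sketch of the "train, then save" half of this process (the backbone architecture and file name below are hypothetical, and PyTorch is assumed):

```python
import torch
import torch.nn as nn

# Hypothetical backbone; imagine it has been trained on a large generic dataset
backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))

# ... pre-training loop omitted ...

# Save the learned parameters so later tasks can start from them
torch.save(backbone.state_dict(), "pretrained_backbone.pt")
```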

fine-tuning

Suppose you already have a pre-trained model for cat face recognition and you want to use it for a related task, such as human face recognition. You can take the saved parameters of the previous model as the initialization for the new task and then adjust them during training according to the results. This process is fine-tuning.
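
Continuing the sketch above (the architecture, file name, class count, and learning rate are again hypothetical), fine-tuning means loading the saved parameters as initialization and then training on the new task, often with a new head:

```python
import torch
import torch.nn as nn

# Rebuild the same architecture and load the pre-trained parameters as init
backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
backbone.load_state_dict(torch.load("pretrained_backbone.pt"))

# Attach a task-specific head (e.g., face identities instead of cat faces)
model = nn.Sequential(backbone, nn.Linear(128, 10))

# Fine-tune all parameters, typically with a smaller learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```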

downstream task

The task you actually want to solve. The model is first trained on public datasets, which may not match your target task well, so the pre-trained model is then fine-tuned on the dataset of the actual problem. Tasks of this kind are called downstream tasks.

Few-shot Learning

After a model has learned from a large amount of data for some categories, it can learn to recognize new categories from only a small number of samples.

  • Novel class: a category the model has not seen during training
  • Support set: a small training set of C × K samples, where C is the number of classes and K is the number of samples per class; if K is 1, this is one-shot learning
  • Query set (prediction targets): also called the batch; after the model is adapted on the support set, it makes predictions on this set

k-way n-shot support set: the support set is the small sample set that helps the model distinguish new categories. k is the number of categories and n is the number of samples per category. For example, with 3 categories and only one sample per category, it is 3-way one-shot.

k-way: the larger k is, the lower the classification accuracy, since the model must distinguish more categories
n-shot: the larger n is, the higher the classification accuracy, since each category provides more evidence
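
To make the terminology concrete, here is a hypothetical 3-way one-shot episode (the file names are invented for illustration):

```python
# k = 3 classes (3-way), n = 1 sample per class (one-shot)
support_set = {
    "cat":   ["cat_001.jpg"],
    "dog":   ["dog_007.jpg"],
    "horse": ["horse_003.jpg"],
}
# The query set contains the samples to classify into one of the 3 classes
query_set = ["unknown_042.jpg", "unknown_043.jpg"]
```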

A specific method:

  1. Use a pre-trained model f to extract features from all the support samples, obtaining their feature vectors
  2. Combine the feature vectors of the same category (e.g., by averaging) and normalize the result, yielding one vector ui per category
  3. Apply steps 1 and 2 to the image to be predicted (the query) to obtain its vector q
  4. Compare q with each category vector ui; the category whose vector is closest to q is the prediction (see the sketch below)
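
The four steps amount to nearest-mean classification over normalized class prototypes. Below is a minimal sketch, assuming f is a pre-trained feature extractor that maps an image to a 1-D NumPy vector (f and the data loading are not specified in the original):

```python
import numpy as np

def normalize(v):
    # L2-normalize a single vector or each row of a matrix
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def few_shot_classify(f, support_images, support_labels, query_image):
    # Step 1: feature vectors for all support samples
    feats = np.stack([f(img) for img in support_images])
    labels = np.asarray(support_labels)

    # Step 2: average per class, then normalize -> class vectors ui
    classes = np.unique(labels)
    u = normalize(np.stack([feats[labels == c].mean(axis=0)
                            for c in classes]))

    # Step 3: same treatment for the query -> vector q
    q = normalize(f(query_image))

    # Step 4: the closest ui wins; for unit vectors, the largest
    # cosine similarity ui . q corresponds to the smallest distance
    return classes[np.argmax(u @ q)]
```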

Prompt (natural language prompting)

As the size of pre-trained language models keeps growing, the hardware requirements, data requirements, and actual cost of fine-tuning them keep rising. In addition, the rich variety of downstream tasks makes the design of the pre-training and fine-tuning stages cumbersome and complicated. Researchers therefore look for smaller, lighter, more universal, and more efficient methods, and Prompt is one attempt in this direction.

In simple terms, the user provides a task description and a small number of examples as input, and a language model generates the output. This method is called in-context learning or prompting.

Suppose we want to classify the sentiment of the sentence "Best pizza ever!". We can append a template to it:

Best pizza ever! It was ___.

Based on how the model fills in the blank, the predicted probability of "great" is much higher than that of "bad". By constructing a suitable prompt, we turn sentiment classification into a cloze (fill-in-the-blank) task, which lets us tap directly into the potential of the pre-trained model itself.
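
As a sketch of this idea with an off-the-shelf masked language model (the Hugging Face transformers library and bert-base-uncased are my choices here, not the article's):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Restrict the blank to the two label words and compare their probabilities
for r in fill("Best pizza ever! It was [MASK].", targets=["great", "bad"]):
    print(r["token_str"], r["score"])
# Expectation: "great" scores far higher than "bad", so the cloze
# answer doubles as the sentiment label.
```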

https://zhuanlan.zhihu.com/p/386470305
