Paper digest | How large language models perform differently in in-context learning

Notes by: Bi Zhen, Ph.D. student at Zhejiang University; research interests: knowledge graphs and natural language processing.

Link: https://arxiv.org/pdf/2303.03846.pdf

This article covers a recent paper from Google that studies the in-context learning (ICL) ability of large language models: specifically, how ICL is influenced by semantic priors and by the input-label mappings shown in the exemplars. The authors study two complementary settings and run experiments on a range of models, including GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM. They find that for small language models, semantic priors dominate ICL, whereas large language models can override even strong semantic priors and learn tasks with flipped or semantically unrelated labels. The authors also find that instruction tuning strengthens both the use of semantic priors and the ability to learn input-label mappings.

General introduction

Figure 1

This paper examines three variants of in-context learning: regular ICL, ICL with flipped labels, and ICL with semantically unrelated labels. With flipped labels, the model must override its semantic priors in order to follow the input-label mappings shown in the exemplars. With semantically unrelated labels, the labels carry no task semantics, so the model must learn the input-label mapping from the exemplars and can no longer rely on the semantics of natural-language labels. Specifically, the three settings are as follows:

(1) In regular ICL, semantic priors and input-label mappings agree, so both cues help the model perform the task successfully.

(2) In ICL with flipped labels, all exemplar labels are flipped, so semantic priors and input-label mappings disagree. The labels of the evaluation set are held constant, so for a binary classification task, accuracy above 50% in this setting means the model cannot override its semantic priors, while accuracy below 50% means the model has learned the flipped input-label mapping and overridden its semantic priors.

(3) In ICL with semantically unrelated labels (SUL-ICL), the labels are semantically unrelated to the task (e.g., for sentiment analysis the paper uses "foo"/"bar" instead of "negative"/"positive"). Because the labels carry no task semantics, the model must learn the input-label mapping from the exemplars to perform the task. A prompt-construction sketch follows this list.
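
To make the three settings concrete, below is a minimal prompt-construction sketch for a binary sentiment task. This is an illustration under assumed exemplars, label names, and formatting, not the paper's actual code.

```python
def build_prompt(exemplars, query, setting="regular"):
    """Build an ICL prompt for a binary sentiment task under one of the
    three settings. Label names and formatting are illustrative assumptions."""
    label_map = {
        "regular": {"negative": "negative", "positive": "positive"},
        "flipped": {"negative": "positive", "positive": "negative"},
        "sul":     {"negative": "foo",      "positive": "bar"},  # semantically unrelated
    }[setting]
    lines = [f"Input: {text}\nLabel: {label_map[label]}" for text, label in exemplars]
    lines.append(f"Input: {query}\nLabel:")  # query left unlabeled for the model
    return "\n\n".join(lines)

exemplars = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]
print(build_prompt(exemplars, "A moving, beautifully shot film.", setting="sul"))
```

Under the "sul" setting, a model can only succeed by inferring from the exemplars themselves that "bar" plays the role of "positive"; the label string gives it nothing to go on.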

Analysis and discussion

This paper runs experiments on a variety of models spanning different sizes, training data, and instruction tuning (GPT-3, InstructGPT, Codex, PaLM, Flan-PaLM) to analyze the interplay between semantic priors and input-label mappings, paying particular attention to how results vary with model scale. The authors experiment on seven tasks widely used in natural language processing research.

Figure 2

As shown in Figure 2, when labels are flipped, large models can override their semantic priors and learn the flipped input-label mappings, whereas small models cannot flip their predictions and only degrade slightly. Note that the true labels of the evaluation examples are not flipped; therefore, if a model learns to follow the flipped labels, its accuracy should fall below 50% once more than 50% of the exemplar labels are flipped.
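
As an illustration of this protocol (a minimal sketch under assumed data structures, not the paper's implementation), flipping a fraction of exemplar labels while leaving evaluation labels untouched could look like this:

```python
import random

def flip_fraction(exemplars, fraction, labels=("negative", "positive"), seed=0):
    """Flip the labels of `fraction` of the in-context exemplars.

    Evaluation labels stay unchanged, so a model that follows the flipped
    mapping scores below 50% accuracy once fraction > 0.5, while a model
    that sticks to its semantic priors stays above 50%.
    """
    rng = random.Random(seed)
    flipped = list(exemplars)
    for i in rng.sample(range(len(flipped)), round(fraction * len(flipped))):
        text, label = flipped[i]
        flipped[i] = (text, labels[1] if label == labels[0] else labels[0])
    return flipped
```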

Figure 3

Figure 4

As shown in Figure 3, small models rely more heavily on semantic priors than large models: when semantically unrelated labels replace natural-language labels, small models suffer a larger performance drop than large models. Meanwhile, in the SUL-ICL setting of Figure 4, larger models benefit more from additional in-context exemplars than smaller models do.

Figure 5

As shown in Figure 5, in the SUL-ICL setting some tasks emerge only at scale: a model must be sufficiently large before it can perform them successfully.

Figure 6

Figure 6 shows the average performance of the PaLM and Flan-PaLM models across all datasets as a function of the number of in-context exemplars. In the SUL-ICL setting, Flan-PaLM outperforms PaLM, and the effect is most pronounced for small models: Flan-PaLM-8B outperforms PaLM-8B by 9.6%, almost catching up to PaLM-62B. This trend suggests that instruction tuning strengthens the ability to learn input-label mappings.

Figure 7

Figure 7 shows the performance of each PaLM and Flan-PaLM model as the proportion of flipped labels increases. Instruction-tuned models are worse at following flipped labels than models trained with pre-training alone: even with 100% of labels flipped, the Flan-PaLM models never drop below random guessing, i.e., they fail to override their semantic priors, whereas the standard PaLM models can drop to 31% accuracy. These results suggest that instruction tuning either increases a model's reliance on semantic priors or supplies it with more of them, since instruction-tuned models are less able to ignore their natural-language labels. Combined with the results in Figure 6, instruction tuning improves the ability to learn input-label mappings but simultaneously strengthens the use of semantic priors.

Summary

This paper studies how in-context learning draws on the prior knowledge a language model acquires during pre-training versus the input-label mappings presented in the exemplars. The study finds that large language models can learn to override semantic priors, and that this ability scales with model size. To remove the semantic meaning of labels, the authors propose an experimental setting, in-context learning with semantically unrelated labels (SUL-ICL), and find that this ability also scales with model size. The study further analyzes instruction-tuned language models and finds that instruction tuning improves the ability to learn input-label mappings but also strengthens semantic priors. Finally, the study examines performance on high-dimensional linear classification tasks and finds that success there also emerges with model size. Taken together, these results show that the in-context learning behavior of language models changes with scale, and that large language models can map inputs to many types of labels, a form of true symbolic reasoning.
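
For intuition about that last probe, here is a hypothetical sketch of how a linear classification task could be serialized as ICL exemplars; the dimensions, value ranges, and formatting are assumptions, not the paper's exact setup.

```python
import numpy as np

def make_linear_task(n_dims=16, n_exemplars=32, seed=0):
    """Generate a linear classification task as an ICL prompt: random integer
    points labeled by which side of a hidden hyperplane they fall on."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_dims)                        # hidden weight vector
    X = rng.integers(0, 1000, size=(n_exemplars, n_dims))
    scores = X @ w
    y = (scores > np.median(scores)).astype(int)       # balanced 0/1 labels
    return "\n\n".join(
        f"Input: {' '.join(map(str, x))}\nLabel: {label}" for x, label in zip(X, y)
    )

print(make_linear_task(n_dims=4, n_exemplars=3))
```

Because the labels here have no semantic content at all, any above-chance performance must come from learning the input-label mapping in context.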


OpenKG

OpenKG (Chinese Open Knowledge Graph) aims to promote the openness, interconnection, and crowdsourced construction of knowledge graph data with Chinese as its core, and to promote the open-sourcing of knowledge graph algorithms, tools, and platforms.

