[Prompt Learning] AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Paper information

| Name | Content |
| --- | --- |
| Paper title | AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts |
| Paper address | https://arxiv.org/abs/2010.15980 |
| Field of study | NLP, text classification, prompt learning |
| Proposed model | AutoPrompt |
| Source code | http://ucinlp.github.io/autoprompt |

Reading summary

  Prompt-based tasks require building a suitable pattern (template), but writing a good pattern requires manual effort and human guesswork, with a lot of uncertainty. To solve this problem, the AUTOPROMPT model is proposed, which finds patterns via a gradient-guided search.

[0] Summary

  This paper proposes an automated method called AUTOPROMPT for creating appropriate prompt templates (patterns) for diverse tasks.

  Although the traditional MLM task can be used to evaluate the knowledge stored in a language model, it requires manually writing a suitable pattern, which involves a large human cost and guesswork. AUTOPROMPT solves this problem by searching for patterns with a gradient-guided method.

  Without additional parameters or fine-tuning, AUTOPROMPT can sometimes even achieve performance comparable to state-of-the-art supervised models.

[1] Introduction

  The paper points out that traditional analysis methods, such as probing classifiers and attention visualization, have certain limitations. By means of prompting, i.e. transforming the task into the language model's own fill-in form, the knowledge possessed by the model can be elicited more directly.

  However, existing prompting methods require manual construction of the context, which is time-consuming and error-prone, and the model is highly sensitive to the context. To address this issue, the researchers propose AUTOPROMPT, which replaces manual construction with automatically generated prompts, thereby improving the efficiency and broad applicability of the analysis.

  AUTOPROMPT is based on a gradient-guided search strategy: it combines the raw task input with a set of trigger tokens to generate a prompt applicable to all inputs. The language model can then be evaluated as a classifier by mapping its predictions at the prompt's [MASK] position to class probabilities through the associated label tokens.

  The effectiveness of AUTOPROMPT is demonstrated through multiple experiments. First, the researchers used AUTOPROMPT to construct prompts for sentiment analysis and natural language inference; without any fine-tuning, pre-trained masked language models (MLMs) alone achieve good performance, reaching 91% accuracy on the SST-2 dataset, which is better than a fine-tuned ELMo model. Second, the researchers applied AUTOPROMPT to the LAMA fact retrieval task and successfully extracted factual knowledge from MLMs by constructing more effective prompts. Additionally, the researchers introduced a variant of the task similar to relation extraction, testing whether MLMs can extract knowledge from a given text. Experimental results show that MLMs can outperform existing relation extraction models when provided with context sentences containing real facts, but perform poorly when the context sentences come from artificial templates.

  Finally, the paper also points out that AUTOPROMPT has certain practical advantages over fine-tuning. In low-data regimes, AUTOPROMPT achieves higher average and worst-case accuracies. Unlike fine-tuning, prompting does not require a lot of disk space to store model checkpoints for LMs, and once a prompt is found, it can be used on top of the existing pre-trained LM. This is beneficial when serving models for multiple tasks.

[2] Model overview

  Writing pattern templates is time-consuming, and it is unclear whether the same template will work for every model, or what criteria determine whether a template is best at eliciting the desired information. Based on this consideration, AUTOPROMPT is introduced; its structure is shown in the figure below.

[Figure: AUTOPROMPT architecture overview]

[2.1] Background and math notation

  To build the prompt, a distinction is made between the original task input $x_{inp}$, the trigger tokens $x_{trig}$, and the prompt $x_{prompt}$ that is fed to the MLM. A template $\lambda$ maps $x_{inp}$ to $x_{prompt}$.

  [Note] Looking at the lower left of the figure above, the n [T] slots are actually the trigger tokens $x_{trig}$: they are the tokens that will be searched by gradient-guided search, and they are initialized to [MASK]. [P] is the real [MASK] position whose prediction we usually care about.
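  To make the notation concrete, here is a minimal sketch of how a template $\lambda$ might map $x_{inp}$ and the trigger tokens to $x_{prompt}$; the template layout and helper names are my own illustration, not the paper's code:

```python
# Minimal sketch of the template mapping λ : x_inp -> x_prompt.
# The template layout and function name are illustrative assumptions.
MASK = "[MASK]"

def apply_template(x_inp: str, x_trig: list[str]) -> str:
    """Fill the [T] slots with trigger tokens and leave one [MASK]
    at the [P] position, whose prediction is mapped to a class label."""
    return f"{x_inp} {' '.join(x_trig)} {MASK}."

# The trigger tokens are initialized to [MASK] before the search begins.
x_trig = [MASK] * 3
print(apply_template("a real joy.", x_trig))
# -> "a real joy. [MASK] [MASK] [MASK] [MASK]."
```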

  For the verbalizer part, a many-to-one mapping is adopted: the probability of a class is the sum of the probabilities of its label tokens:

$$p(y \mid x_{prompt}) = \sum_{w \in \mathcal{V}_y} p\big([\text{MASK}] = w \mid x_{prompt}\big)$$
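  A minimal runnable sketch of this many-to-one verbalizer using Hugging Face transformers; the label tokens below are illustrative placeholders, not the sets AUTOPROMPT actually finds:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

def class_probability(x_prompt: str, label_tokens: list[str]) -> float:
    """p(y | x_prompt) = sum over w in V_y of p([MASK] = w | x_prompt)."""
    inputs = tokenizer(x_prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)  # distribution over the vocabulary
    ids = tokenizer.convert_tokens_to_ids(label_tokens)
    return probs[ids].sum().item()

# Illustrative single-word label set for a "positive" sentiment class.
print(class_probability("a real joy. [MASK].", ["great", "good", "wonderful"]))
```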

[2.2] Gradient-based prompt template search

  The idea is to add trigger tokens that are shared across all prompts, namely the [T] slots in the template. These tokens are initialized to [MASK] at the beginning, and then iteratively updated to maximize the label likelihood.

  At each step, a first-order approximation is computed of the change in the label log-likelihood caused by replacing the $j$-th trigger token with another token $w$ (where $w$ is a word in the vocabulary). The top-$k$ tokens that cause the largest estimated increase constitute the candidate set $\mathcal{V}_{cand}$:

$$\mathcal{V}_{cand} = \operatorname{top-}k_{w \in \mathcal{V}} \left[ \mathbf{w}_{in}^{\top} \, \nabla \log p(y \mid x_{prompt}) \right]$$

where $\mathbf{w}_{in}$ is the input embedding of $w$ and the gradient is taken with respect to the embedding of the $j$-th trigger token.
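  A sketch of this first-order (HotFlip-style) candidate search; `label_log_prob` stands for any function that computes $\log p(y \mid x_{prompt})$ from input embeddings, and all names are assumptions rather than the authors' implementation:

```python
import torch

def candidate_set(model, input_ids: torch.Tensor, trigger_pos: int,
                  label_log_prob, k: int = 10) -> torch.Tensor:
    """Approximate the change in log p(y | x_prompt) from swapping the
    j-th trigger token for each vocabulary word w, via the dot product
    of w's input embedding with the gradient at that position."""
    embed_matrix = model.get_input_embeddings().weight           # (|V|, d)
    inputs_embeds = embed_matrix[input_ids].detach()             # (1, L, d)
    inputs_embeds.requires_grad_(True)
    log_p = label_log_prob(model, inputs_embeds)                 # scalar
    grad = torch.autograd.grad(log_p, inputs_embeds)[0][0, trigger_pos]  # (d,)
    scores = embed_matrix @ grad                                 # w_in · ∇ log p
    return scores.topk(k).indices                                # V_cand
```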

[2.3] Automatic label token selection

  Although in some tasks the choice of label tokens is quite obvious, for problems with abstract class labels it is not clear which label tokens to choose. Therefore, the authors propose a general two-step method for automatically selecting the label token set $\mathcal{V}_y$.

  In the first step, a logistic classifier is trained to predict the class label, using the contextualized [MASK] token embedding as input.

  Classifier input: the MLM's contextualized embedding of the [MASK] token, denoted $h^{(1)}$.

  Classifier output:

$$p(y \mid h^{(1)}) \propto \exp\left(h^{(1)} \cdot y + \beta_y\right)$$

where $y$ and $\beta_y$ in the formula are the learned weight and bias terms for class $y$.

  In the second step, the MLM's output word embeddings $w_{out}$ are fed into the trained logistic classifier (replacing $h^{(1)}$ in the formula above), yielding a score $s(y, w) = p(y \mid w_{out})$. Intuitively, the score $s(y, w) \propto \exp(w_{out} \cdot y + \beta_y)$ will be larger for words associated with the given label. In this way, the top-$k$ highest-scoring words can be selected:

$$\mathcal{V}_y = \operatorname{top-}k_{w \in \mathcal{V}} \left[ s(y, w) \right]$$
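  A sketch of the two-step selection under stated assumptions: `mask_embeds` are the contextualized [MASK] embeddings $h^{(1)}$ collected over the training set, `w_out` is the MLM's output word-embedding matrix, and all variable names are mine:

```python
import torch
import torch.nn.functional as F

def select_label_tokens(mask_embeds: torch.Tensor,   # (N, d) — h^(1) per example
                        labels: torch.Tensor,        # (N,) class indices
                        w_out: torch.Tensor,         # (|V|, d) output embeddings
                        num_classes: int, k: int = 3, steps: int = 200) -> dict:
    """Step 1: fit a logistic classifier p(y|h) ∝ exp(h·y + β_y) on the
    [MASK] embeddings. Step 2: score each vocabulary word by feeding its
    output embedding through the classifier; keep the top-k per class."""
    clf = torch.nn.Linear(mask_embeds.size(-1), num_classes)  # weights y, biases β_y
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    for _ in range(steps):                                    # step 1: train on h^(1)
        opt.zero_grad()
        F.cross_entropy(clf(mask_embeds), labels).backward()
        opt.step()
    with torch.no_grad():                                     # step 2: score w_out
        scores = clf(w_out)                                   # s(y, w), shape (|V|, C)
    return {y: scores[:, y].topk(k).indices for y in range(num_classes)}  # V_y
```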

  [Note] Patterns come in hard templates and soft templates; AutoPrompt belongs to the hard-template category. Personally, I still think soft prompts, which train continuous pseudo-tokens, would be better: no matter how the hard template is searched, the tokens found are still confined to the PLM's vocabulary, while continuous vectors have more degrees of freedom.

Origin: blog.csdn.net/qq_43592352/article/details/130932095