OpenAI Development Series (8): Advanced Prompt Engineering Based on Chain of Thought (CoT)

The full text runs to more than 9,000 words; expected reading time is about 18~30 minutes | Packed with practical content (including paper reproductions), bookmarking is recommended!

The goal of this article: introduce the basic methods of prompt engineering, the chain-of-thought (CoT) prompting method, and the LtM prompting method, and reproduce them to solve four classic reasoning problems from the papers.


Code download: click here.

Most of this content comes from LLM research published over the past two years, but the published conclusions are mostly solutions for the English-language context. This article therefore systematically organizes these high-value research results and, combining them with the realities of the Chinese-language context, derives solutions for Chinese-language content.

1. Introduction

The emergent abilities of large language models (LLMs) refer to a model's capacity to solve problems in certain domains, without having been trained on task-specific data, once the right technical means are applied.

Example: the game "count three, quit one". Suppose there is a game with the following rules: people stand in a circle and, starting from one person, count off one by one in order; every person who counts to 3 must exit the game, and the process continues until only one person remains. Now, without explicitly telling GPT-3 or GPT-4 how to play this game or providing any specific algorithm, see how it responds:

Prompt: If 10 people play the game "count three, quit one", who is the last person remaining?

Look at ChatGPT's reply:

[Image: ChatGPT's reply]

Judging from the reasoning process above, the large language model does not even know what numbers truly mean; yet after learning from a vast corpus and discovering the latent probabilistic relationships between numbers, it has acquired a remarkably strong capacity for mathematical operations and complex reasoning.
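For reference, the game itself can be checked with a few lines of Python. This is only a plain simulation of the rules above (the function name is ours, and it assumes counting resumes with the next person after each exit); it shows that with 10 players the survivor is person 4:

def count_three_quit_one(n: int) -> int:
    """Simulate 'count three, quit one': n people stand in a circle,
    and whoever counts '3' exits; counting resumes with the next person."""
    people = list(range(1, n + 1))     # players numbered 1..n
    idx = 0                            # position where counting starts
    while len(people) > 1:
        idx = (idx + 2) % len(people)  # the player who counts "3"
        people.pop(idx)                # that player exits the circle
    return people[0]

print(count_three_quit_one(10))  # -> 4, the last person remaining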

As mentioned above, large models can solve problems in certain specific domains given suitable technical means, and current research divides these means into two categories: prompt engineering and fine-tuning. Both are ways of guiding and optimizing the model's emergent abilities, but compared with fine-tuning, prompt engineering is cheaper, more flexible to use, and more effective at improving the model's understanding of complex semantics within a small semantic space.

Prompt engineering can be understood simply: every complete interaction between a user and a large language model (LLM) is an exercise in prompt engineering, and different prompting methods yield results of completely different quality.

2. Four classical reasoning problems

The focus of prompt engineering is solving complex semantic-understanding problems, and to verify whether a model has this ability, you can observe whether it can solve complex logical reasoning problems.

If a model, guided by prompt engineering, can solve reasoning problems it could not solve in its original state, this shows that prompt engineering improves the model's reasoning ability; and the more effective the prompt engineering, the greater the improvement. This can be verified by posing reasoning problems of varying complexity.

Consider the following four classic reasoning problems:

Reasoning questions 1 and 2 are from the paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

[Image: reasoning questions 1 and 2 as presented in the paper]

Switch to the Chinese context:

  • Reasoning question 1

prompt1 = 'Roger has five tennis balls. He bought two more boxes of tennis balls, each containing 3 tennis balls. How many tennis balls does he have in total now?'

The code is as follows:

import os
import openai

# Read the API key from an environment variable
openai.api_key = os.getenv("OPENAI_API_KEY")

prompt1 = '罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?'

# Legacy Completions endpoint (openai-python 0.x, current when this article was written)
response1 = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt1,
            max_tokens=1000,
            )

# Extract and strip the completion text
response1["choices"][0]["text"].strip()

Look at the inference results:

[Image: model output for reasoning question 1]

  • Reasoning question 2

prompt2 = 'There are 23 apples in the cafeteria in total. If they use up 20 apples and then buy 6 more, how many apples does the cafeteria have now?'

Look at the code:

prompt2 = '食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?'
response2 = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt2,
            max_tokens=1000,
            )

response2["choices"][0]["text"].strip()

Look at the inference results:

[Image: model output for reasoning question 2]

This reasoning question is slightly more complicated: the cafeteria not only gains 6 apples but also uses up 20. As soon as both an increase and a decrease are involved, the large model fails to make a correct judgment. The correct answer is that the cafeteria has 23-20+6=9 apples left.

Reasoning question 3 comes from the paper: Large Language Models are Zero-Shot Reasoners

[Image: reasoning question 3 as presented in the paper]

Switch to the Chinese context:

  • Reasoning question 3

prompt3 = 'A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there in total?'

Look at the code:

prompt3 = '杂耍者可以杂耍16个球。一半的球是高尔夫球,一半的高尔夫球是蓝色的。请问总共有多少个蓝色高尔夫球?'
response3 = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt3,
            max_tokens=1000,
            )

response3["choices"][0]["text"].strip()

Look at the inference results:

[Image: model output for reasoning question 3]

The arithmetic in this third problem is not complicated, but it contains a linguistic trap: what is half of a half? The model cannot reason accurately around this question; the correct answer is 16*0.5*0.5=4 blue golf balls.

Reasoning question 4 is from the paper LEAST-TO-MOST PROMPTING ENABLES COMPLEX REASONING IN LARGE LANGUAGE MODELS

[Image: reasoning question 4 as presented in the paper]

  • Reasoning question 4

prompt4 = 'It takes Amy 4 minutes to climb to the top of the slide and 1 minute to slide down. The water slide will close in 15 minutes. How many times can she slide before it closes?'

Look at the code:

prompt4 = '艾米需要4分钟能爬到滑梯顶部,然后需要花费1分钟滑下来,现在水滑梯将在15分钟后关闭,请问在关闭之前她能滑多少次?'
response4 = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt4,
            max_tokens=1000,
            )

response4["choices"][0]["text"].strip()

Look at the inference results:

[Image: model output for reasoning question 4]

This question has the most complicated calculation of the four classic reasoning questions, involving multi-stage calculation and division. The correct process is to first work out that one climb up plus one slide down takes Amy 4+1=5 minutes; since the slide closes in 15 minutes, she can slide 15/5=3 times before it closes.

From the above results, under zero-shot prompting, text-davinci-003 has weak logical reasoning ability and can only solve relatively simple problems with a single linear chain of operations. Of the four reasoning questions, the model answered only the first correctly and got the others wrong; its reasoning ability can fairly be called poor.

The next step is for prompt engineering to show its strength. Below, we reproduce different prompting methods to strengthen the model's logical reasoning and solve these problems.
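Since all of the experiments below make the same kind of API call, it is convenient to wrap the call in a small helper. The helper name get_completion is ours, not from the original code; the sketches later in this article reuse it:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

def get_completion(prompt: str,
                   model: str = "text-davinci-003",
                   max_tokens: int = 1000) -> str:
    """Send one prompt to the legacy Completions endpoint (openai-python 0.x)
    and return the stripped completion text."""
    response = openai.Completion.create(model=model,
                                        prompt=prompt,
                                        max_tokens=max_tokens)
    return response["choices"][0]["text"].strip()

The article's own code keeps the verbose per-call form for fidelity; the helper is only for the sketches that follow.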

3. One-shot & Few-shot Prompting Methods

The simplest prompt-engineering technique is to include some similar questions, together with their answers, in the prompt, let the model learn from them, and append the new question at the end of the same prompt, thereby improving the model's reasoning ability. This is known as the One-shot or Few-shot prompting method.

One-shot and Few-shot were first proposed by the OpenAI research team in the paper "Language Models are Few-Shot Learners". This paper is also the founding work on prompt engineering: it not only introduces these two core prompting methods but also explains in detail the reasoning behind them.

In practical terms, the Few-shot prompting method is not complicated: it only requires including the questions and answers of some similar problems as part of the prompt.

Run an experiment: first, include example 1 (which the model can already answer correctly) in the prompt, and check whether the model can then work out the second question:

The Few-shot prompt format:

When entering multiple questions and answers as the prompt, begin each question with Q and each answer with A ("问题" and "答案" also work), and place the different Q&A pairs on separate lines for clearer display. In the code below this is done with a backslash at the end of each line, which simply continues the Python string literal onto the next line (note that this is for readability in the source; it does not insert an actual newline character into the prompt).
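As a sketch, the same format can also be assembled programmatically from a list of (question, answer) pairs. The function name build_few_shot_prompt is hypothetical, and unlike the string literals below it joins the pairs with real newline characters:

def build_few_shot_prompt(examples: list[tuple[str, str]], new_question: str) -> str:
    """Assemble a Few-shot prompt: each example as a Q/A pair,
    ending with the new question and an open 'A:' for the model to complete."""
    lines = []
    for q, a in examples:
        lines.append(f'Q:“{q}”')
        lines.append(f'A:“{a}”')
    lines.append(f'Q:“{new_question}”')
    lines.append('A:')
    return '\n'.join(lines)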

The code is as follows:

prompt_Few_shot1 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?” \
                  A:“现在罗杰总共有11个网球。” \
                  Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?” \
                  A:'

response_Few_shot1 = openai.Completion.create(
                     model="text-davinci-003",
                     prompt=prompt_Few_shot1,
                     max_tokens=1000,
                     )

response_Few_shot1["choices"][0]["text"].strip()

Look at the inference results:

[Image: Few-shot result for reasoning question 2]

Although it is impossible to know exactly how the model's prediction process changed, after learning from the first example the model does make an accurate judgment on the second question. Evidently, Few-shot prompting can improve the model's logical reasoning ability to a certain extent.

Test again by entering the questions and answers of both examples as the prompt, to see whether the model can correctly answer the third question. The code is as follows:

prompt_Few_shot2 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?” \
                  A:“现在罗杰总共有11个网球。” \
                  Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?” \
                  A:“现在食堂总共有9个苹果。” \
                  Q:“杂耍者可以杂耍16个球。一半的球是高尔夫球,一半的高尔夫球是蓝色的。请问总共有多少个蓝色高尔夫球?” \
                  A:'

response_Few_shot2 = openai.Completion.create(
                     model="text-davinci-003",
                     prompt=prompt_Few_shot2,
                     max_tokens=1000,
                     )

response_Few_shot2["choices"][0]["text"].strip()

Look at the inference results:

[Image: Few-shot result for reasoning question 3]

The model still answered the third question incorrectly. Next, try using the first two questions as part of the prompt and have the model answer the fourth question. Look at the code:

prompt_Few_shot3 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?” \
                  A:“现在罗杰总共有11个网球。” \
                  Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?” \
                  A:“现在食堂总共有9个苹果。” \
                  Q:“艾米需要4分钟能爬到滑梯顶部,然后需要花费1分钟滑下来,现在水滑梯将在15分钟后关闭,请问在关闭之前她能滑多少次?” \
                  A:'

response_Few_shot3 = openai.Completion.create(
                     model="text-davinci-003",
                     prompt=prompt_Few_shot3,
                     max_tokens=1000,
                     )

response_Few_shot3["choices"][0]["text"].strip()

Look at the inference results:

[Image: Few-shot result for reasoning question 4]

The fourth question was also answered incorrectly. This shows that the Few-shot prompting method can improve the model's reasoning ability to a certain extent, but the improvement is limited: for even slightly more complex reasoning questions, the model still cannot answer accurately.

Few-shot is relatively simple to use, but it has many variants. One very important variant modifies the examples themselves: instead of providing only questions and answers, the examples also include "hints" that assist thinking and judgment.

4. Chain-of-Thought (CoT) Prompting Methods

4.1 Zero-shot-CoT Prompting Method

Zero-shot-CoT is an even better prompting method that follows on from the Few-shot idea. It uses a chain-of-thought (CoT) prompt to solve the problem: a very simple and effective trick is to append the sentence "Let's think step by step" to the end of the prompt, which can greatly improve the model's reasoning ability.

This method was first proposed by the University of Tokyo and Google in the paper "Large Language Models are Zero-Shot Reasoners". The approach is called Zero-shot-CoT because only the prompt needs to be modified; there is no need to hand-write successful derivation examples (no chain-of-thought samples are required).

According to the original paper, the authors tried many different groups of prompt suffixes when testing the Zero-shot-CoT method and evaluated them on a benchmark dataset, finally finding that "Let's think step by step" works best. The accuracy ranking of the various instructions is as follows:

[Image: accuracy ranking of candidate Zero-shot-CoT instructions from the paper]

Switching to the Chinese context and translating "Let's think step by step" into Chinese: after extensive testing, the conclusion is that "请一步步进行推理并得出结论" ("Please reason step by step and reach a conclusion") works far better than more literal renderings such as "请让我们一步步思考" ("Please let's think step by step") and other similar phrasings. This offers a profound lesson: the "thinking process" of a large model is a black box, and even prompts with essentially the same meaning can affect the model very differently. Test it yourself; the difference in effect is large.
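In code, Zero-shot-CoT amounts to appending one fixed suffix to the bare question. A minimal sketch, reusing the get_completion helper from earlier (both names are ours):

COT_SUFFIX = '请一步步进行推理并得出结论。'  # "Please reason step by step and reach a conclusion."

def zero_shot_cot(question: str) -> str:
    """Zero-shot-CoT: ask the bare question with the step-by-step suffix appended."""
    return get_completion(question + COT_SUFFIX)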

Use the instruction "请一步步进行推理并得出结论" ("Please reason step by step and reach a conclusion") to test in the Chinese context:

  • Reasoning question 1

Look at the code:

prompt_Zero_shot_CoT1 = '罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?请一步步进行推理并得出结论。'

response_Zero_shot_CoT1 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_CoT1,
                          max_tokens=1000,
                          )

response_Zero_shot_CoT1["choices"][0]["text"].strip()

Look at the inference results:

[Image: Zero-shot-CoT result for reasoning question 1]

  • Reasoning question 2

Look at the code:

prompt_Zero_shot_CoT2 = '食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?请一步步进行推理并得出结论。'

response_Zero_shot_CoT2 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_CoT2,
                          max_tokens=1000,
                          )

response_Zero_shot_CoT2["choices"][0]["text"].strip()

Look at the inference results:

[Image: Zero-shot-CoT result for reasoning question 2]

  • Reasoning question 3

Look at the code:

prompt_Zero_shot_CoT3 = '杂耍者可以杂耍16个球。一半的球是高尔夫球,一半的高尔夫球是蓝色的。请问总共有多少个蓝色高尔夫球?请一步步进行推理并得出结论。'

response_Zero_shot_CoT3 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_CoT3,
                          max_tokens=1000,
                          )

response_Zero_shot_CoT3["choices"][0]["text"].strip()

Look at the inference results:

[Image: Zero-shot-CoT result for reasoning question 3]

  • Reasoning question 4

Look at the code:

prompt_Zero_shot_CoT4 = '艾米需要4分钟能爬到滑梯顶部,然后需要花费1分钟滑下来,现在水滑梯将在15分钟后关闭,请问在关闭之前她能滑多少次?请一步步进行推理并得出结论。'

response_Zero_shot_CoT4 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_CoT4,
                          max_tokens=1000,
                          )

response_Zero_shot_CoT4["choices"][0]["text"].strip()

Look at the inference results:

[Image: Zero-shot-CoT result for reasoning question 4]

In fact, the fourth question is still answered incorrectly.

Verification on the four logical reasoning questions shows that Zero-shot-CoT is indeed more effective than Few-shot: it can greatly improve the model's reasoning ability with a much more concise prompt.

The paper offers two other important conclusions:

The second key conclusion: this paper is the first to propose two-stage reasoning with a large model. In the first stage, the problem is decomposed and reasoned through in segments (Reasoning Extraction); in the second stage, the answer is extracted (Answer Extraction). This idea strongly inspired the later LtM prompting method.

[Image: two-stage reasoning diagram from the paper]
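A minimal sketch of this two-stage process, again reusing get_completion; the Chinese answer-extraction trigger below is our adaptation, not a template from the paper:

def two_stage_zero_shot_cot(question: str) -> str:
    """Two-stage Zero-shot-CoT: stage 1 elicits the reasoning,
    stage 2 feeds the reasoning back and extracts only the final answer."""
    # Stage 1: Reasoning Extraction
    reasoning = get_completion(question + '请一步步进行推理并得出结论。')
    # Stage 2: Answer Extraction (trigger phrase is our assumption)
    return get_completion(question + '\n' + reasoning + '\n因此,最终答案是')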

The third important conclusion: the authors verified through testing that, in actual use, Zero-shot-CoT is slightly weaker than the Few-shot-CoT method.

[Image: Zero-shot-CoT vs. Few-shot-CoT comparison from the paper]

The paper clearly states that the larger the model, the better CoT works; in other words, the larger the model, the more effectively CoT stimulates its "emergent abilities". GPT-3 reaches an accuracy of about 55% on the GSM8K dataset.

GSM8K is a well-known dataset of elementary-school math problems, often used to test a model's reasoning ability.
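For readers who want to inspect GSM8K themselves, it is available on the Hugging Face Hub; this sketch assumes the datasets package, with dataset id "gsm8k" and the "main" configuration:

from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")   # splits: train / test
sample = gsm8k["train"][0]
print(sample["question"])  # a grade-school word problem
print(sample["answer"])    # reference solution including the reasoning steps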

4.2 Few-shot-CoT Prompting Method

Chronologically, Few-shot-CoT actually predates Zero-shot-CoT. The difference between them: Zero-shot-CoT elicits the model's chain of thought by modifying the prompt suffix under zero-shot conditions, whereas Few-shot-CoT writes chain-of-thought samples directly into the prompt, so that the model learns the chain-of-thought style of derivation and completes the derivation task better.

This method was first proposed by the Google Brain team in the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". That paper also introduced the concept of the chain of thought itself, making it the founding work on chain-of-thought prompting.

Compared with plain Few-shot, the only difference in Few-shot-CoT is that each prompt example must give not only the answer but also the derivation process (the chain of thought), so that the model learns the chain-of-thought derivation and applies it to new problems. The paper presents a series of results demonstrating the effectiveness of chains of thought across these domains.

[Image: Few-shot-CoT results from the paper]

For example, for reasoning question 1, manually write a chain of thought as a Few-shot example:

'Q: "Roger has five tennis balls, and he bought two more boxes of tennis balls, each with 3 tennis balls, how many tennis balls does he have in total now?" A: "
Roger had five tennis balls at the beginning, and he bought another Bought two boxes of tennis balls, each box of 3 tennis balls, and bought a total of 6 tennis balls, so now the total is 5+6=11 tennis balls. So the answer is 11."'

Having obtained a chain-of-thought example, use it as the sample and run Few-shot-CoT on the second reasoning problem. Look at the code:

prompt_Few_shot_CoT2 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?” \
                        A:“罗杰起初有五个网球,又买了两盒网球,每盒3个,所以,他总共买了2×3=6个网球,将起始的数量和购买的数量相加,可以得到他现在总共的网球数量:5+6=11,所以罗杰现在总共有11个网球” \
                        Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?” \
                        A:'

response_Few_shot_CoT2 = openai.Completion.create(
                         model="text-davinci-003",
                         prompt=prompt_Few_shot_CoT2,
                         max_tokens=1000,
                         )

response_Few_shot_CoT2["choices"][0]["text"].strip()

Look at the inference results:

[Image: Few-shot-CoT result for reasoning question 2]

From the results, the model answers the second question very well. Next, enter the chains of thought of the first two questions as the prompt to guide the model through the third question. Look at the code:

prompt_Few_shot_CoT3 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?” \
                        A:“罗杰一开始有五个网球,又购买了两盒网球,每盒3个,共购买了6个网球,因此现在总共由5+6=11个网球。因此答案是11。” \
                        Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?” \
                        A:“食堂起初有23个苹果,用掉20个,又买了6个,将苹果的减少量减去使用量,加上购买的数量,可以得到现在食堂总共的苹果数量:23-20+6=9,所以现在食堂总共有9个苹果。” \
                        Q:“杂耍者可以杂耍16个球。一半的球是高尔夫球,一半的高尔夫球是蓝色的。请问总共有多少个蓝色高尔夫球?” \
                        A:'

response_Few_shot_CoT3 = openai.Completion.create(
                         model="text-davinci-003",
                         prompt=prompt_Few_shot_CoT3,
                         max_tokens=1000,
                         )

response_Few_shot_CoT3["choices"][0]["text"].strip()

Look at the inference results:

[Image: Few-shot-CoT result for reasoning question 3]

Continue stacking examples and look at the last question. The code is as follows:

prompt_Few_shot_CoT4 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?” \
                        A:“罗杰一开始有五个网球,又购买了两盒网球,每盒3个,共购买了6个网球,因此现在总共由5+6=11个网球。因此答案是11。” \
                        Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?” \
                        A:“食堂最初有23个苹果,用掉20个,然后又买了6个,总共有23-20+6=9个苹果,答案是9。” \
                        Q:“杂耍者可以杂耍16个球。一半的球是高尔夫球,一半的高尔夫球是蓝色的。请问总共有多少个蓝色高尔夫球?” \
                        A:“杂耍者总共能杂耍16个球,并且一半是高尔夫球,而且高尔夫球有一半是蓝色的。所以蓝色高尔夫球的数量是 8 ÷ 2 = 4,也就是说总共有4个蓝色的高尔夫球。” \
                        Q:“艾米需要4分钟能爬到滑梯顶部,然后需要花费1分钟滑下来,现在水滑梯将在15分钟后关闭,请问在关闭之前她能滑多少次?” \
                        A:'

response_Few_shot_CoT4 = openai.Completion.create(
                         model="text-davinci-003",
                         prompt=prompt_Few_shot_CoT4,
                         max_tokens=1000,
                         )

response_Few_shot_CoT4["choices"][0]["text"].strip()

Look at the results:

[Image: Few-shot-CoT result for reasoning question 4]

With the chains of thought of the first three questions entered as prompt samples, the fourth question can also be answered correctly. But there is a problem: out of 10 test runs, 3 produced wrong answers. In other words, although the Few-shot-CoT method is effective, it is not very stable; to get a consistently correct answer, a more advanced prompting method is required.
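One simple way to quantify this instability is to repeat the same prompt several times and tally the completions. A rough sketch (the helper name is ours; in practice you would parse the final number out of each completion before tallying rather than counting raw texts):

from collections import Counter

def stability_check(prompt: str, n_trials: int = 10) -> Counter:
    """Run the same Few-shot-CoT prompt repeatedly and count the distinct outputs,
    as a crude measure of answer stability."""
    answers = [get_completion(prompt) for _ in range(n_trials)]
    return Counter(answers)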

According to the conclusions in "Large Language Models are Zero-Shot Reasoners", across large-scale test data, Few-shot-CoT is more accurate than Zero-shot-CoT.

The paper also emphasizes the relationship between model size and the effect of the chain of thought. In short, the larger the model, the better Few-shot-CoT works, as shown below:

[Image: model scale vs. CoT performance on GSM8K (LaMDA, GPT-3, PaLM)]

Again taking the GSM8K dataset as an example, model performance ranks LaMDA (137B) < GPT-3 (175B) < PaLM (540B). As with Zero-shot-CoT, the larger the model, the better CoT stimulates the model's latent capabilities.

4.3 A CoT Improvement: Least-to-Most Prompting (the LtM Prompting Method)

Shortly after Google Brain's CoT was verified in practice to greatly improve the reasoning ability of large language models, another team at Google Brain built on it in another heavyweight paper, "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models", which proposes a prompting method called Least-to-Most (LtM) that further improves the reasoning ability of large language models. LtM not only raises model performance on GSM8K to 62%, but in some special semantic-interpretation scenarios even achieves three times the effect of CoT.

This is, so far, the most effective prompt-learning method for improving model reasoning ability.

The original motivation of the LtM prompting method was to solve the insufficient generalization of CoT prompting: a hand-written chain-of-thought example may not transfer well to other problems. In other words, the problem-solving process does not migrate well; generalization is insufficient, and this shortfall means "new problems" cannot be solved with "old templates".

One idea, then, is to let the large model find the chain of thought for the current problem by itself. Based on this idea, Google Brain devised a new prompting process: first prompt the model to work out which sub-problems must be solved step by step to answer the question, and then solve those sub-problems in sequence to solve the original problem.

The whole prompting process has two stages. The first is top-down decomposition (Decompose Question into Subquestions); the second is bottom-up sequential solving (Sequentially Solve Subquestions). Answering the questions one by one can itself be seen as a CoT process, but LtM requires the model to generate a problem-solving chain tailored to each individual problem, so complex reasoning problems can be solved more accurately. This progression from less to more is the origin of the name least-to-most.

[Image: LtM prompting process example from the paper]

The figure above is the example the paper uses to explain the LtM prompting process, and it happens to be the fourth reasoning problem we have been trying to solve.

The paper uses the prompt template "To solve __, we need to first solve:" to guide the model to create sub-questions. From the original question, the model poses the sub-question "How long does it take Amy to climb up and slide down once?", solves that first, and then solves the original problem. It is easy to see that this is a very simple two-stage solving process: the first stage decomposes out just one additional sub-question (so two questions are answered in total).

According to the results given in the paper, in the first stage the model very smoothly answers "how long does one climb up plus one slide down take Amy" (5 minutes), and then correctly concludes that Amy can slide 3 times before the slide closes.

The second stage, Sequentially Solve Subquestions, is not simply solving the two problems in order: after the sub-question is solved, the original question, the sub-question, and the sub-question's answer are all fed back to the large language model as the prompt, and it is then asked to answer the original question.

So the core of LtM is not just guiding the model to split the problem, but also feeding the sub-questions and their answers back to the model in time, so that it can better answer the original question.

In theory, the whole process makes three calls to the large model, with a question-and-answer flow as follows:

[Image: LtM three-call question-and-answer flow]
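Chained together in code, the three theoretical calls could look roughly like this. It is only a sketch reusing the get_completion helper from earlier: the function name is ours, and the Chinese templates are adapted from this article's prompts rather than taken from the paper. In practice (as shown next), a single prompt often suffices:

def least_to_most(question: str, goal: str) -> str:
    """LtM sketch in three model calls: decompose, solve the sub-question,
    then answer the original question given the sub-question and its answer."""
    # Call 1: Decompose — ask which problem must be solved first
    sub_q = get_completion(f'Q:“{question}”\nA:为了解决“{goal}”这个问题,首先要解决的问题是')
    # Call 2: Sequentially solve the sub-question
    sub_a = get_completion(sub_q)
    # Call 3: Feed the sub-question and its answer back, then ask the original question
    return get_completion(f'Q:“{question}”\n子问题:{sub_q}\n子问题的答案:{sub_a}\n请回答原问题:{goal}')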

Try it this way. Look at the code:

prompt_Zero_shot_LtM4 = 'Q:“艾米需要4分钟能爬到滑梯顶部,然后需要花费1分钟滑下来,现在水滑梯将在15分钟后关闭,请问在关闭之前她能滑多少次?”\
                         A:为了解决“在关闭之前她能滑多少次?”这个问题,首先需要解决的问题是'

response_Zero_shot_LtM4 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_LtM4,
                          max_tokens=1000,
                          )

response_Zero_shot_LtM4["choices"][0]["text"].strip()

The inference results of the model are as follows:

[Image: LtM result for reasoning question 4]

The LtM prompting process solves this reasoning problem very well. Moreover, in actual testing, the model not only decomposes the task but also automatically answers the original question based on the answers to the decomposed sub-questions, finally answering the original question accurately within a single prompt.

"In order to solve the problem '__', the first problem we need to solve is __" has also proven to be the most suitable prompt template, producing the most accurate answers; it is worth using often and verifying for yourself.

Similarly, use LtM to test the other three questions:

  • Reasoning question 1

The code is as follows:

prompt_Zero_shot_LtM1 = 'Q:“罗杰有五个网球,他又买了两盒网球,每盒有3个网球,请问他现在总共有多少个网球?”\
                         A:为了解决“罗杰总共有多少个网球?”这个问题,首先要解决的问题是'

# Store the response in a separate variable instead of overwriting the prompt
response_Zero_shot_LtM1 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_LtM1,
                          max_tokens=1000,
                          )

response_Zero_shot_LtM1["choices"][0]["text"].strip()

Look at the inference results:

[Image: LtM result for reasoning question 1]

  • Reasoning question 2

The code is as follows:

prompt_Zero_shot_LtM2 = 'Q:“食堂总共有23个苹果,如果他们用掉20个苹果,然后又买了6个苹果,请问现在食堂总共有多少个苹果?”\
                         A:为了解决“现在食堂总共有多少个苹果”这个问题,首先要解决的问题是'

# Store the response in a separate variable instead of overwriting the prompt
response_Zero_shot_LtM2 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_LtM2,
                          max_tokens=1000,
                          )

response_Zero_shot_LtM2["choices"][0]["text"].strip()

Look at the inference results:

[Image: LtM result for reasoning question 2]

  • Reasoning question 3

The code is as follows:

prompt_Zero_shot_LtM3 = 'Q:“杂耍者可以杂耍16个球。一半的球是高尔夫球,一半的高尔夫球是蓝色的。请问总共有多少个蓝色高尔夫球?”\
                         A:为了解决“总共有多少个蓝色高尔夫球”这个问题,首先要解决的问题是'

# Store the response in a separate variable instead of overwriting the prompt
response_Zero_shot_LtM3 = openai.Completion.create(
                          model="text-davinci-003",
                          prompt=prompt_Zero_shot_LtM3,
                          max_tokens=1000,
                          )

response_Zero_shot_LtM3["choices"][0]["text"].strip()

The reasoning results are as follows:

[Image: LtM result for reasoning question 3]

Extensive testing shows that the LtM prompting process helps the model solve all of the above problems very well, demonstrating that LtM improves the reasoning ability of large language models very significantly. Of everything tried so far, it is the most effective prompting method for solving reasoning problems. Note, however, that even though the effect is very good, the model still occasionally makes calculation errors; this cannot be fully avoided. For now, all one can do is keep exploring prompting methods that maximize the stability of correct output.

5. Summary

This article first introduced four classic reasoning problems, then explained the One-shot and Few-shot prompting methods in detail. The core of the article is chain-of-thought prompting, covering the Zero-shot-CoT method, the Few-shot-CoT method, and the improved CoT method, Least-to-Most Prompting (LtM). This deeper material is intended to help you understand and master more advanced prompt-engineering techniques for more efficient application and optimization in large-model development.

Reference papers:

  • Language Models are Few-Shot Learners
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  • Large Language Models are Zero-Shot Reasoners
  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Finally, thank you for reading this article! If you feel you have gained something, don't forget to like, bookmark, and follow me; that is the motivation for my continued writing. If you have any questions or suggestions, leave a message in the comments and I will do my best to answer and act on your feedback. If there is a particular topic you would like covered, let me know and I will be happy to write an article about it. Thank you for your support; I look forward to growing together with you!

Reprinted from: blog.csdn.net/Lvbaby_/article/details/131811653