Paper Notes: Prompt + Reasoning + Large Model = Chain-of-Thought Prompting

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models


Prompt + reasoning + large model: CoT (chain-of-thought) prompting

Source: Google Brain

Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

1. Summary

This paper explores how generating a chain of thought, a series of intermediate reasoning steps, can significantly improve the ability of large language models to perform complex reasoning. In particular, it shows how this reasoning ability emerges naturally in sufficiently large language models through a simple method called chain-of-thought prompting, in which a few chain-of-thought demonstrations are provided as exemplars in the prompt.

Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For example, PaLM 540B with only eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even a fine-tuned GPT-3 with a verifier.

2. Introduction

In this paper, the authors combine the strengths of two ideas, few-shot prompting and natural-language rationales, while avoiding their limitations. Specifically, they explore the ability of language models to perform few-shot prompted reasoning, given prompts consisting of triples (<input, chain of thought, output>). A chain of thought is a series of intermediate natural-language reasoning steps that lead to a final output; this method is referred to as chain-of-thought prompting.
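Such a prompt can be assembled mechanically from (input, chain of thought, output) triples. The sketch below is illustrative rather than the paper's code; the helper name `build_cot_prompt` is hypothetical, while the demonstration itself is the tennis-ball exemplar used in the paper.

```python
def build_cot_prompt(triples, question):
    """Concatenate (input, chain of thought, output) demonstrations,
    then append the new question for the model to complete."""
    parts = []
    for inp, chain, out in triples:
        # Each demonstration shows the reasoning trace before the answer.
        parts.append(f"Q: {inp}\nA: {chain} The answer is {out}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# One demonstration in the style of the paper's arithmetic exemplars.
demos = [(
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?",
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11.",
    "11",
)]

prompt = build_cot_prompt(
    demos,
    "If there are 3 cars and each car has 4 wheels, how many wheels are there?",
)
print(prompt)
```

The prompt ends with a bare "A:", so the model is induced to produce its own reasoning trace before stating an answer.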


As shown in Figure 1, the paper empirically evaluates on arithmetic, commonsense, and symbolic reasoning benchmarks, showing that chain-of-thought prompting outperforms standard prompting, sometimes to a striking degree.


Figure 2 shows one result on the GSM8K benchmark of math word problems, where chain-of-thought prompting with PaLM 540B greatly outperforms standard prompting and achieves new state-of-the-art performance. A prompting-only approach is important because it requires no large training dataset, and a single model checkpoint can perform many tasks without loss of generality. This work highlights how large language models can learn from a few examples given as natural language data about the task.

Goal

The goal of this paper is to give language models the ability to generate something like a chain of thought—a coherent sequence of intermediate reasoning steps leading to a final answer to a question.

Contribution
  1. First, chains of thought allow models, in principle, to decompose multi-step problems into intermediate steps, meaning that additional computation can be allocated to problems that require more reasoning steps.
  2. Second, a chain of thought provides an interpretable window into the behavior of the model, suggesting how it might have arrived at a particular answer and providing an opportunity to debug where the reasoning path went wrong (although fully characterizing the computations that support a model's answer remains an open question).
  3. Third, chain-of-thought reasoning can be used for tasks such as math word problems, commonsense reasoning, and symbolic manipulation, and may (at least in principle) be applicable to any task that humans can solve through language.
  4. Finally, chain-of-thought reasoning can be readily elicited in sufficiently large off-the-shelf language models simply by including examples of chain-of-thought sequences among the few-shot prompting exemplars.

3. Experiment

To verify that chain-of-thought prompting can substantially improve the reasoning ability of large models, the paper conducts experiments on arithmetic reasoning, commonsense reasoning, and symbolic reasoning.

3.1 Arithmetic Reasoning

When used with a 540B-parameter language model, chain-of-thought prompting performs on par with task-specific fine-tuned models on several tasks, even reaching state of the art on the challenging GSM8K benchmark.
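Because every exemplar ends with a final-answer sentence in the "The answer is X." format, scoring a benchmark like GSM8K reduces to extracting that answer from the model's completion. A minimal extraction sketch, not an official evaluation script:

```python
import re

def extract_answer(completion):
    """Pull the final numeric answer from a chain-of-thought completion.

    Assumes the generation follows the exemplar format and ends with a
    sentence like 'The answer is 11.'. Returns None if no match is found.
    """
    match = re.search(r"The answer is\s+(-?[\d,]+(?:\.\d+)?)", completion)
    if match is None:
        return None
    # Strip thousands separators so '1,000' compares equal to '1000'.
    return match.group(1).replace(",", "")

completion = (
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)
print(extract_answer(completion))  # 11
```

The intermediate arithmetic in the trace is ignored; only the final answer is compared against the gold label.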


3.2 Common sense reasoning

Although chains of thought are particularly suited to math word problems, their language-based nature in fact makes them applicable to a broad class of commonsense reasoning problems, which involve reasoning about physical and human interactions under the assumption of general background knowledge. Commonsense reasoning is key to interacting with the world, yet it remains beyond the reach of current natural language understanding systems.


3.3 Symbolic Reasoning

The final experimental evaluation considers symbolic reasoning, which is simple for humans but can be difficult for language models. Chain-of-thought prompting not only enables language models to perform symbolic reasoning tasks that are challenging in the standard prompting setting, but also facilitates length generalization to inference-time inputs longer than those seen in the few-shot exemplars.

  • Last letter concatenation.

    This task requires the model to concatenate the last letters of the words in a name (e.g., "Amy Brown" → "yn"). It is a more challenging version of first-letter concatenation, which language models can already perform without chains of thought. Full names are generated by randomly combining names from the top one thousand first and last names in census data.

  • Coin flip.

    This task asks the model to answer whether a coin is still heads up after people either flip it or leave it alone (e.g., "A coin is heads up. Phoebe flips the coin. Osvaldo does not flip the coin. Is the coin still heads up?" → "no").

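Ground-truth labels for both symbolic tasks can be computed mechanically, which is how such synthetic evaluation examples are built. A minimal sketch, with illustrative function names not taken from the paper (the census-based name sampling is omitted):

```python
def last_letter_concat(name):
    """Concatenate the last letter of each word in a name,
    e.g. 'Amy Brown' -> 'yn'."""
    return "".join(word[-1] for word in name.split())

def coin_still_heads_up(flips):
    """A coin starts heads up; each True in `flips` is one flip by one
    person. The coin is still heads up iff the flip count is even."""
    return sum(flips) % 2 == 0

print(last_letter_concat("Amy Brown"))     # yn
# Phoebe flips the coin, Osvaldo does not: one flip, no longer heads up.
print(coin_still_heads_up([True, False]))  # False
```

Because the labels are deterministic functions of the input, arbitrarily long test instances can be generated, which is what makes the length-generalization evaluation possible.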

4. Discussion

4.1 Experimental summary

We have explored thought-chain cues as a simple mechanism to elicit multi-step reasoning behavior in large language models.

We first see that chain-of-thought prompting greatly improves performance on arithmetic reasoning, yields stronger improvements than the ablations, and is robust to different annotators, exemplars, and language models. Experiments on commonsense reasoning highlight how the linguistic nature of chain-of-thought reasoning makes it broadly applicable. Finally, we show that for symbolic reasoning, chain-of-thought prompting facilitates out-of-distribution generalization to longer sequence lengths.

4.2 Limitations
  1. First, although a chain of thought mimics the thought process of a human reasoner, this does not answer the open question of whether the neural network is actually "reasoning".
  2. Second, although the cost of manually augmenting exemplars with chains of thought is minimal in the few-shot setting, such annotation costs could be prohibitive for fine-tuning (though this might be overcome with synthetic data generation or zero-shot generalization).
  3. Third, there is no guarantee of correct reasoning paths, which can lead to both correct and incorrect answers; improving factual generations from language models is an open direction for future work.
  4. Finally, the emergence of chain-of-thought reasoning only at large model scales makes it costly to serve in real-world applications; further research could explore how to induce reasoning in smaller models.

5. Summary

We have explored chain-of-thought prompting as a simple and broadly applicable method for enhancing reasoning in language models. Through experiments on arithmetic, symbolic, and commonsense reasoning, we find that chain-of-thought reasoning is an emergent property of model scale that allows sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves. Broadening the range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.

Origin: blog.csdn.net/be_humble/article/details/130061633