The impact of large language models such as ChatGPT on economic research


Author of the text below: Chen Siyu, Educational Economics, Southwest University. Email: [email protected]

Anton Korinek, "Language Models and Cognitive Automation for Economic Research," NBER Working Paper, 2023.

Large language models (LLMs) such as ChatGPT have the potential to revolutionize research in economics and other disciplines. I describe 25 use cases along six domains in which LLMs are starting to become useful as both research assistants and tutors: ideation, writing, background research, data analysis, coding, and mathematical derivations. I provide general instructions and demonstrate specific examples for how to take advantage of each of these, classifying the LLM capabilities from experimental to highly useful. I hypothesize that ongoing advances will improve the performance of LLMs across all of these domains, and that economic researchers who take advantage of LLMs to automate micro tasks will become significantly more productive. Finally, I speculate on the longer-term implications of cognitive automation via LLMs for economic research.

" Language Models and Cognitive Automation for Economics Research"


1. Summary

This paper divides the functions of large language models (LLMs) into six domains: ideation, writing, background research, data analysis, coding, and mathematical derivation, and uses 25 use cases to illustrate the role and practicality of LLMs in each of them. It rates LLM capabilities on three levels: experimental, useful, and very useful. The paper hypothesizes that the continued development of LLMs will improve their performance across all six domains, so that economic researchers who use LLMs to automate micro-tasks will become significantly more productive. Finally, the paper speculates on the longer-term significance of cognitive automation via LLMs for economic research.

2. Introduction

1. Research Background

Recent advances in LLMs could revolutionize research in economics and other disciplines. LLMs have just crossed a threshold of usefulness across a wide range of cognitive tasks. ChatGPT, released by OpenAI on November 30, 2022 and based on the GPT-3.5 model, gained more than 100 million users within two months of launch; by one estimate (Thompson), it produces a volume of text every 14 days equivalent to all of humanity's printed works. Google and Microsoft are planning to give their users access to LLMs as well.

2. Research purpose

1) Improve productivity

Based on recent research, this article illustrates 25 use cases of modern LLMs. Drawing on the author's own experiments, it groups LLM capabilities into six domains: ideation, writing, background research, data analysis, coding, and mathematical derivation. The article provides instructions for using each capability and demonstrates it with concrete examples, and it rates each capability from experimental to very useful (see Table 1 of the paper). The hope is that this description will help other researchers take advantage of LLMs. At present, the paper finds LLMs most useful for automating small "micro-tasks" that researchers perform many times throughout the day but that are too small to assign to human research assistants; LLMs suit such tasks because of their high speed and low transaction costs. LLMs also help with coding and data analysis tasks as well as ideation and writing. Researchers can significantly increase their productivity by incorporating LLMs into their workflows.

2) Predict the function of future LLM

Studying the current functions of LLMs helps predict the capabilities of future generations of LLMs. In recent years, the amount of compute used to train cutting-edge LLMs has doubled on average every six months, driving rapid growth in LLM capabilities. These advances are widely expected to continue, with more capable LLM systems released soon. In the longer term, the paper hypothesizes that LLMs may usher in an era of cognitive automation, which could have profound implications for scientific progress in economics and other disciplines, as well as clear implications for the value of cognitive labor.

3. Literature review

1) Underestimating LLMs

Bender et al. (2021) dismiss LLMs as "stochastic parrots" or "advanced autocomplete". Yet according to Thompson's research, "a former president of Mensa International reported that ChatGPT achieved a score of 147 on a verbal IQ test." And whereas human intelligence is relatively static, LLMs are developing rapidly, with each new iteration becoming more accurate and powerful.

2) Overestimating LLMs

Other observers treat ChatGPT as if it were already artificial general intelligence (AGI). In fact, LLMs can produce authoritative-sounding text even when the content is completely wrong, which can trick readers into believing false claims.

This paper argues that LLMs increasingly have a comparative advantage in generating content, while humans currently have a comparative advantage in evaluating and discriminating among content. LLMs are also superhuman at processing large amounts of text. These complementary strengths can facilitate human-machine cooperation.

3. Large Language Models (LLMs)

1. Foundation models

Bommasani et al. view LLMs as a class of foundation models, which can be seen as a new paradigm for artificial intelligence in the 2020s. Foundation models are large deep learning models with parameter counts on the order of 10^11 and growing. Researchers pre-train the models on rich data to create a foundation, which they then adapt to different applications through a process called fine-tuning. For example, an LLM can be fine-tuned to act as a chatbot (like ChatGPT) or as a system for generating computer code (like Codex). OpenAI's GPT-3.5, DeepMind's Chinchilla, Google's PaLM and LaMDA, and Anthropic's Claude are some cutting-edge LLMs.

Pre-training a foundation model consumes vast amounts of computation and data in a process of self-supervised learning: the model learns the inherent structure of the training data by repeatedly predicting masked data. For example, to train an LLM, researchers feed the model snippets of text with words masked out, and the model learns to predict the missing words. The data come from Wikipedia, scientific articles, books, and other online sources.
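The masked-word objective can be illustrated with a toy sketch (purely illustrative; real LLMs use large neural networks over tokens, not bigram counts): score candidate words for a blank by how often they follow the preceding word in a tiny corpus.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count adjacent word pairs (bigrams) in a toy corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            counts[(a, b)] += 1
    return counts

def predict_masked(counts, prev_word, candidates):
    """Pick the candidate most often observed after prev_word."""
    return max(candidates, key=lambda w: counts[(prev_word, w)])

corpus = [
    "the model predicts the missing word",
    "the model learns structure from data",
    "researchers train the model on text",
]
counts = train_bigrams(corpus)
# Fill the blank in "the model [MASK] the missing word":
print(predict_masked(counts, "model", ["predicts", "data", "text"]))
```

The same idea, predicting held-out pieces of the training data, is what the self-supervised objective does at vastly larger scale.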

2. Scaling Laws proposed by OpenAI

Foundation models and scaled-up LLMs differ from previous generations of deep learning models in that the latest LLMs narrow the gap between the broad capabilities of humans and the narrow capabilities of specific AI systems. Overall LLM performance improves according to predictable scaling laws, empirical regularities observed across several generations of machine learning models. Kaplan et al. observe that the goodness-of-fit of an LLM, measured by log loss, improves predictably with training compute (the number of computations performed to train the model), parameter count, and training data size. Hoffmann et al. argue that scaling parameter count and training data size in proportion to each other is compute-optimal.

For the scaling laws, see the paper "Training Compute-Optimal Large Language Models" (Hoffmann et al., 2022).
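A numerical sketch of such a scaling law, using the parametric loss L(N, D) = E + A/N^alpha + B/D^beta fitted by Hoffmann et al. (2022). The constants below are that paper's approximate fitted values, reproduced here from memory, so treat them as indicative only:

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric training loss L(N, D) in the style of Hoffmann et al. (2022).

    n_params: model parameters N; n_tokens: training tokens D.
    Loss falls as either N or D grows, approaching the irreducible term E.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# A larger model trained on proportionally more data achieves lower loss:
small = chinchilla_loss(1e9, 20e9)     # ~1B params, 20B tokens
large = chinchilla_loss(70e9, 1.4e12)  # Chinchilla-scale: 70B params, 1.4T tokens
print(round(small, 3), round(large, 3))
```

The key qualitative point survives any imprecision in the constants: loss declines smoothly and predictably as parameters and data scale together.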

4. Applications of LLMs

This section presents examples of the use of LLMs in economic research, organized into six areas: ideation, writing, background research, coding, data analysis, and mathematical derivation. For each domain, the paper gives a general explanation and some specific use cases illustrating how to leverage LLM capabilities. Unless otherwise noted, the paper uses the leading publicly available system at the time, GPT-3 text-davinci-003, which is slightly more powerful than ChatGPT but generates similar output. To maximize reproducibility, the model's "temperature" parameter is set to 0, which makes the system's responses deterministic. The system was trained on data up to 2021 and has no internet access, so the generated text is based entirely on the parameters learned during training. The system also has no memory across sessions: information cannot be carried over from one conversation to the next. It can handle at most about 4,000 tokens of text, roughly 3,000 words. The results an LLM generates vary with the prompt; small changes, such as different spacing or punctuation, can produce completely different output.

Common to all the applications presented here are the LLM's fast response times and low transaction costs, which make it very useful for outsourcing micro-tasks, even though it is error-prone on some of them.

1. Ideation

1) Brainstorming

LLMs are trained on vast amounts of data spanning a broad cross-section of human knowledge, so they are very useful for brainstorming ideas and examples related to a defined topic.

This article asks the LLM: "Please brainstorm the economic channels through which advances in artificial intelligence could exacerbate inequality". The channels the LLM lists are not innovative, but they are relevant, largely plausible, and broader than the author had contemplated. After about the fifth point, the paper observes that, much as in human brainstorming, the LLM's creativity declines and responses start to repeat.

2) Evaluate ideas

LLMs can also evaluate different ideas, in particular laying out the strengths and weaknesses of different research programs. The prompt: "I'm writing a paper on the impact of AI on inequality. Which would you find more useful, a paper on how AI can increase inequality or a paper on how AI can reduce inequality?"

The LLM's response suggests that studying how AI increases inequality is more useful for positive work, whereas studying how it reduces inequality is more useful for normative work.

3) Provide counter-evidence

LLMs are good at presenting arguments and counterarguments for a given point of view, which helps counteract the confirmation bias common in human reasoning. The prompt used here: "AI will exacerbate inequality. What are the main objections?"

Some of the counterarguments the LLM gives are good and some are bad, but its output covers the main points known to the author.

2. Writing

The core competency of LLMs is text generation, and they are quite useful for many writing-related tasks, including synthesizing text from bullet points, changing text style, editing text, evaluating style, and generating titles and tweets.

1) Synthetic text

LLMs can translate rough bullet points into well-structured, readable prose. This article uses the example "Please write a topic sentence that integrates the following arguments." The ability to synthesize text lets researchers focus on the ideas in the essay rather than the writing process itself. An LLM can also write in a specified style: the example above changes noticeably if one adds "write in an academic style", "in a colloquial style", "in a style non-economists can understand", or "in the style of your favorite politician". LLMs can also produce text in LaTeX format.

2) Edit text

LLMs can correct grammatical and spelling errors in text, change its style, and improve clarity and conciseness. This set of features is most useful for non-native speakers who want to improve their writing. In the paper's examples, each erroneous word and the correction the system makes are highlighted in boldface. The command used: "Can you correct the following sentence?"

The LLM can explain its edits so that students can learn from them. It can also adapt a text to different reading levels so that different audiences can understand it, for example with "Rewrite the text below so an 8-year-old can understand it."

3) Evaluate the text

LLMs can also assess a text's style, clarity, and similarity to other text. For the following questions about a draft abstract of the paper, however, the LLM's answers were not ideal:

"What are the major stylistic shortcomings of the following passage?" "Could you rewrite this passage to correct these shortcomings?" The author agrees with all of the shortcomings the LLM identified. The rewritten version mitigates some of them, though not perfectly. The system can also answer questions like "Which of the following arguments is the hardest to understand?"

4) Generate title and outline

LLMs can generate catchy titles and outlines. The example below shows three titles the LLM proposed based on the abstract of a recent paper by the author (Korinek and Juelfs, 2022):

1. Future-proofing societies: Preparing for the decline of autonomous machines and the workforce
2. The end of work? Navigating the impact of automated machines on the workforce
3. The impact of autonomous machines on the workforce: How to distribute work and income

All three LLM-suggested titles are well suited to the paper.

5) Generate Tweets

LLMs can generate tweets; in this experiment, the LLM is asked to write five tweets summarizing a piece of text.

3. Background research

1) Summarize the text

The experiment here asks the LLM to condense a passage into a single sentence, using the paper's abstract as input: "Large language models (LLMs) such as ChatGPT have the potential to revolutionize economic research. This article describes six domains in which LLMs are useful: ideation, writing, background research, data analysis, coding, and mathematical derivation. I provide general instructions, demonstrate how to exploit each domain, and classify LLM capabilities as experimental, useful, or very useful. I hypothesize that continued progress will improve LLM performance in these domains and that economists who use LLMs to automate micro-tasks will greatly increase their productivity. Finally, I speculate on the long-term impact of cognitive automation via LLMs on economic research." The one-sentence summary the LLM provides covers all of these points.

2) Literature review

LLMs are of limited use for literature search: they may "retrieve" papers that do not exist. A more reliable tool is available at https://elicit.org, which reports only existing papers.

3) Formatting references

The examples here are "Please convert the following references to bibtex format" and "Now format it in APA format." When the LLM has encountered a highly cited work in its training data, as in "bibtex reference for Stiglitz Weiss", it does the job well, sparing users from copying or typing detailed citation information for well-known works. However, when generating bibtex references for works with low citation counts, it blatantly fabricates article and citation details.
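For illustration, the kind of entry an LLM can reliably reproduce for a highly cited work such as Stiglitz and Weiss (1981) looks like the following (entry reconstructed by this summary's author, not taken from the paper; verify the fields before use):

```latex
@article{stiglitz1981credit,
  author  = {Stiglitz, Joseph E. and Weiss, Andrew},
  title   = {Credit Rationing in Markets with Imperfect Information},
  journal = {American Economic Review},
  year    = {1981},
  volume  = {71},
  number  = {3},
  pages   = {393--410}
}
```

It is precisely for less-cited works, where the training data contains little or no citation information, that the model tends to invent plausible-looking but wrong fields.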

4) Translate text

Jiao et al. (2023) find that LLMs such as ChatGPT are competitive with commercial translation products for resource-rich European languages. For low-resource languages, with fewer digitized texts and fewer digitized translations, their performance is worse.

5) Explanation Theory

The LLM can act as a tutor, explaining many common economic concepts very clearly, which can be very useful for students learning something new, and even for advanced researchers venturing outside their specialty. The example here asks the LLM to explain "Why are instrumental variables useful?" It explains instrumental variables very well, but answers questions about the first and second welfare theorems poorly.

4. Programming

GPT-3.5 was trained on a large amount of computer code, so it is quite capable at coding. OpenAI's Codex can be accessed through the code-davinci-002 model, or via its integration into GitHub as Copilot. The language model text-davinci-003 is a descendant of code-davinci-002, so it can generate not only natural language but also computer code. While the two programming languages the LLM is most proficient in are Python and R, it works in any common programming language, from basic Excel functions to complex C/C++ code.

1) Write code

The first case is Python code to calculate Fibonacci numbers. Another example has the system draw graphics: the prompt is modified to read "Python code to calculate Fibonacci numbers and plot the first 10 numbers and compare to an exponential curve". The resulting code worked fine.
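What such generated code might look like can be sketched as follows (a reconstruction by this summary's author, not the paper's verbatim output; the plotting step is omitted so the sketch stays self-contained). The comparison uses Binet's approximation F(n) ≈ φⁿ/√5, the exponential curve that Fibonacci numbers track:

```python
import math

def fibonacci(n):
    """Return the first n Fibonacci numbers (1, 1, 2, 3, ...)."""
    fibs = [1, 1]
    while len(fibs) < n:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:n]

phi = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618

# Binet's approximation: F(n) is close to phi**n / sqrt(5)
for n, f in enumerate(fibonacci(10), start=1):
    approx = phi**n / math.sqrt(5)
    print(n, f, round(approx, 2))
```

Each printed row shows the exact Fibonacci number next to its exponential approximation, which is what a plot of the two series would visualize.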

But currently available public LLMs are not powerful enough to write complete code that simulates most economic problems without human assistance, even basic problems such as optimal consumption smoothing or optimal monopoly pricing.

2) Explain the code

LLM can also explain in plain English what a given code does, which is especially useful when researchers are working with unfamiliar programming languages.

3) Translate the code

Code in one programming language often needs to be translated into another, for example when porting a project to a different platform. An LLM can adapt code snippets found on online coding forums such as StackExchange that are useful but written in the wrong language.

Current LLMs are fairly reliable at translating short code snippets across most common languages. For longer programs, they still require human assistance.

4) Fix bugs

The case used in this article is "What is wrong with the following code?"

LLMs are very useful for spotting spelling mistakes and violations of basic syntax, and they go somewhat beyond that: for example, they can catch confused indices. For higher-level errors, such as flaws in the code's underlying algorithm, human debugging is still required.
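A hypothetical example of the index confusion an LLM can catch: in the snippet below, the loop bound lets the code read one element past the end of the list.

```python
def pairwise_sums_buggy(xs):
    # Bug: range(len(xs)) makes xs[i + 1] read past the end of the list.
    return [xs[i] + xs[i + 1] for i in range(len(xs))]

def pairwise_sums_fixed(xs):
    # Fix: stop one element early so xs[i + 1] is always a valid index.
    return [xs[i] + xs[i + 1] for i in range(len(xs) - 1)]

print(pairwise_sums_fixed([1, 2, 3, 4]))  # [3, 5, 7]
```

An off-by-one error like this is exactly the level at which current LLMs are dependable; a subtly wrong algorithm wrapped in correct syntax usually is not.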

5. Data Analysis

1) Extract data from text

This paper experiments with using an LLM to extract data from written text: "Mark got an A in economics and a B+ in math. Sally got an A- in both economics and math. Frank got a B in economics and a C in math. Reformat as follows: name, grade in economics, grade in math."

LLMs can also extract the following ten types of data: phone numbers, zip codes, social security numbers, credit card numbers, bank account numbers, dates, times, prices, percentages, and measurements (length, weight, etc.).
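For comparison, the grade-extraction task above can be mimicked with hand-written regular expressions tailored to this exact sentence pattern; the LLM's advantage is precisely that it needs no such brittle rules. A sketch:

```python
import re

text = ("Mark got an A in economics and a B+ in math. "
        "Sally got an A- in both economics and math. "
        "Frank got a B in economics and a C in math.")

rows = []
# Split each sentence into a name and the grade description that follows it.
for name, rest in re.findall(r"(\w+) got an? ([^.]+)\.", text):
    m = re.match(r"([A-C][+-]?) in economics and an? ([A-C][+-]?) in math", rest)
    if m:
        rows.append((name, m.group(1), m.group(2)))
    else:
        # Handle the "X in both economics and math" phrasing.
        both = re.match(r"([A-C][+-]?) in both economics and math", rest)
        if both:
            rows.append((name, both.group(1), both.group(1)))

print(rows)
```

Every new phrasing ("scored", "earned", a third subject) would require another rule here, whereas the LLM handles such variation out of the box.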

2) Organize data by format

LLMs can also convert data into a required format. Building on the previous example, the LLM first formats the data as comma-separated values (CSV) and then as a LaTeX table.
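Continuing the grades example, the CSV-to-LaTeX conversion the LLM performs is mechanical and can be sketched as:

```python
def csv_to_latex(csv_text):
    """Convert simple comma-separated rows into a LaTeX tabular environment."""
    rows = [line.split(",") for line in csv_text.strip().splitlines()]
    ncols = len(rows[0])
    lines = ["\\begin{tabular}{" + "l" * ncols + "}"]
    lines.append(" & ".join(rows[0]) + " \\\\ \\hline")  # header row
    for row in rows[1:]:
        lines.append(" & ".join(row) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)

csv = "Name,Economics,Math\nMark,A,B+\nSally,A-,A-\nFrank,B,C"
print(csv_to_latex(csv))
```

The convenience of the LLM is that it performs this kind of reformatting from a one-line natural-language instruction, with no helper function to write at all.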

3) Text classification

This paper asks GPT-3.5 to classify five tasks from the U.S. Department of Labor's Occupational Information Network (O*NET) database as easy or hard to automate, and to justify its classification. The results show the classification is reasonable but not entirely robust: when the prompt was changed, the system's answer to essentially the same question also changed.

4) Extract sentiment

LLMs can also extract sentiment from text. For example, an LLM can classify tweets as positive or negative. Likewise, it can classify statements by the Federal Open Market Committee (FOMC), which sets U.S. interest rates, as hawkish or dovish. Asked about a statement in which the Committee raised the target range for the federal funds rate, reduced its holdings of Treasuries, agency debt, and agency mortgage-backed securities, and reaffirmed its commitment to returning inflation to its 2 percent objective, the system judged the statement hawkish. That assessment is correct, and the system's reasoning is sound.
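As a crude point of comparison, a keyword-counting baseline for hawkish/dovish classification can be sketched as follows (a toy heuristic invented for illustration; the LLM's assessment draws on far richer context than keyword counts):

```python
# Toy signal-word lists, chosen for illustration only.
HAWKISH = {"raise", "raised", "tighten", "inflation", "reduce", "reducing"}
DOVISH = {"cut", "lower", "accommodative", "stimulus", "easing"}

def classify_fomc(statement):
    """Label a statement hawkish/dovish/neutral by counting signal words."""
    words = statement.lower().replace(".", " ").replace(",", " ").split()
    h = sum(w in HAWKISH for w in words)
    d = sum(w in DOVISH for w in words)
    return "hawkish" if h > d else "dovish" if d > h else "neutral"

stmt = ("The Committee raised the target range for the federal funds rate "
        "and is firmly committed to returning inflation to its 2 percent objective.")
print(classify_fomc(stmt))  # hawkish
```

A baseline like this breaks on negation ("will not raise rates") and context, which is exactly where the LLM's classification adds value.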

5) Simulate human subjects

Argyle et al. (2022) observe that LLM training data contains a great deal of information about humans, and they propose using LLMs to simulate human subjects. Conditioning GPT-3 on the socio-demographic backgrounds of real humans, they show that its responses to survey questions correlate highly with the actual responses of humans from the described backgrounds. Horton (2022) demonstrates applications in economics by replicating and extending several behavioral experiments using simulated subjects.

6. Mathematical derivation

Noorbakhsh et al. (2021) show that LLMs can be fine-tuned for mathematical tasks. Frieder et al. (2023) built a dataset of graduate-level mathematics problems and found ChatGPT's mathematical ability to be significantly below that of the average mathematics graduate student. Current LLMs are mostly trained on text, including mathematical papers.

1) Modeling

This paper asks the LLM to model a consumer whose utility is isoelastic in one commodity and linear in another, written in LaTeX, with variables assigned to prices. From this prompt, the LLM knows to set up the consumer's optimization problem. (In the paper, the left column shows the generated text and the right column the LaTeX-compiled version.) The LLM correctly fills in a suitable budget constraint and sets up the associated maximization problem. In the Lagrangian, however, the system enters the budget constraint with an unusual sign. It correctly derives two of the three first-order conditions, but it gets the derivative of the isoelastic utility function wrong. Although it takes time to read the generated text and spot such errors, the LLM writes out the maximization problem and Lagrangian, and partially solves them, in seconds, which still saves valuable research time.
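Under one reading of this setup (isoelastic utility over good 1, linear utility over good 2, prices p_1 and p_2, wealth w; the notation is this summary's, not necessarily the paper's), the correct problem and first-order conditions are:

```latex
\max_{c_1, c_2} \; \frac{c_1^{1-\gamma}}{1-\gamma} + c_2
\quad \text{s.t.} \quad p_1 c_1 + p_2 c_2 = w

\mathcal{L} = \frac{c_1^{1-\gamma}}{1-\gamma} + c_2
  + \lambda \left( w - p_1 c_1 - p_2 c_2 \right)

\frac{\partial \mathcal{L}}{\partial c_1} = c_1^{-\gamma} - \lambda p_1 = 0,
\qquad
\frac{\partial \mathcal{L}}{\partial c_2} = 1 - \lambda p_2 = 0
\;\;\Longrightarrow\;\;
c_1 = \left( \frac{p_2}{p_1} \right)^{1/\gamma}
```

The derivative the system got wrong is the first one: the isoelastic term differentiates to c_1^{-\gamma}, and combining the two conditions eliminates the multiplier λ = 1/p_2.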

2) Deriving equations

As the previous example shows, LLMs currently have a rather limited ability to derive equations. Continuing that example, the author corrects the error in the first-order condition and asks the system to generate the remainder; the system then correctly derives the solution of the optimization problem.

However, the system's mathematical capabilities remain quite limited: after obtaining the correct solution, the author fixes the sign error in the Lagrangian and tries to regenerate the rest of the derivation, but the system produces garbled output. The author tried several other derivations and found the error rate too high for the system to be useful in this application.

3) Explain the model

Current LLMs are also limited in their ability to explain simple models. One example asks the system to explain the mathematics behind the famous bat-and-ball problem: "Solve the bat and ball problem and state all intermediate steps." Different prompts yield different results. LLMs tend to produce more reliable content when asked to spell out intermediate steps, a technique known as "chain-of-thought prompting" (Wei et al., 2022b), much as students make fewer errors when asked to explain the intermediate steps behind their solutions.
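The bat-and-ball problem itself (a bat and a ball cost $1.10 together; the bat costs $1.00 more than the ball) has a two-line algebraic solution, sketched here with the intermediate steps spelled out:

```python
def solve_bat_and_ball(total=1.10, difference=1.00):
    """Solve: bat + ball = total, bat = ball + difference.

    Substituting the second equation into the first:
        (ball + difference) + ball = total
        2 * ball = total - difference
        ball = (total - difference) / 2
    """
    ball = (total - difference) / 2
    bat = ball + difference
    return round(bat, 2), round(ball, 2)

bat, ball = solve_bat_and_ball()
print(f"bat = ${bat:.2f}, ball = ${ball:.2f}")  # the intuitive but wrong answer is $0.10
```

Spelling out the substitution step is exactly what chain-of-thought prompting asks the model to do, and it is why the technique reduces errors on problems like this one.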

5. Conclusion

The table below summarizes all the example tasks illustrated in this paper, categorized by the six domains of LLM application. The third column reports the author's subjective rating of each capability's usefulness as of February 1, 2023, on a scale from 1 to 3:

1 = Experimental; results are inconsistent and require strict human oversight

2 = Helpful; needs supervision, but may save time

3 = Very useful; incorporating these into your workflow will save time

LLM has emerged as a useful research tool for tasks such as ideation, writing and background research, data analysis, coding, and mathematical derivation.

In the short term, cognitive automation through LLM will allow researchers to become more productive. This may help increase the overall rate of progress in economics.

In the medium term, LLMs will become increasingly useful for generating the content of research papers. Human researchers will focus on their comparative advantage: asking questions, suggesting directions for finding answers, judging which generated content is useful, and compiling information and providing feedback, much as advisors do. As LLMs improve at performing these tasks, less human input, editing, and feedback will be needed.

In the long run, economists should heed what Sutton calls the "bitter lesson" of AI progress: for most of AI's history, researchers tried to make AI systems smarter and stronger by programming domain-specific knowledge into them, a strategy that always helps in the short term but whose benefits eventually level off, while general methods that scale with computation win out in the end.

Given enough computing power, sufficiently advanced AI systems might generate and elucidate superior economic models, and the cognitive work of human economists might eventually become redundant.

6. Prospects

Cognitive automation poses new research questions for economists:

1. What does cognitive automation mean for the labor market? Will it also accelerate the automation of manual labor? How can our society best prepare for the coming changes?

2. What is the impact of cognitive automation on education? Will human capital be devalued?

3. How will cognitive automation affect technological progress and economic growth? If human labor can be automated, what will be the bottleneck for future growth?

4. How can we best solve the AI alignment problem?


Source: blog.csdn.net/pony8181/article/details/130618742