Seventy years of NLP! Chris Manning's long article (paper link included): Can foundation models become AGI within ten years?

Source | Xinzhiyuan (ID: AI-era)

Over the past decade, the field of natural language processing has achieved considerable breakthroughs relying only on simple neural network computation and large-scale training data. The pre-trained language models produced by this training, such as BERT and GPT-3, provide powerful general-purpose language understanding, generation, and reasoning capabilities.

Some time ago, Christopher D. Manning, a professor at Stanford University, published an article titled "Human Language Understanding & Reasoning" in the journal Daedalus, reviewing the development history of natural language processing and analyzing the future prospects of foundation models.


Paper access link: https://direct.mit.edu/daed/article/151/2/127/110621/Human-Language-Understanding-amp-Reasoning


The author, Christopher Manning, is a professor of computer science and linguistics at Stanford University and a leader in applying deep learning to natural language processing. His research focuses on using machine learning methods to tackle problems in computational linguistics so that computers can intelligently process, understand, and generate human language.

Professor Manning is an ACM Fellow, AAAI Fellow, and ACL Fellow. Several of his books, such as Foundations of Statistical Natural Language Processing and Introduction to Information Retrieval, have become classic textbooks, and his Stanford course CS224n, "Natural Language Processing with Deep Learning," is a must-take introduction for countless NLP practitioners.

The four eras of NLP

The first era (1950-1969)

NLP research began with machine translation. At the time, people believed that translation could build on the code-breaking achievements of World War II, and both sides of the Cold War were developing systems to translate the other side's scientific output. During this period, however, almost nothing was known about the structure of natural language, artificial intelligence, or machine learning.

Computing power and available data were scarce, and although the early systems were promoted with great fanfare, they provided only word-level dictionary lookup plus some simple rule-based mechanisms for handling inflectional word forms (morphology) and word order.

The second era (1970-1992)

This period saw a series of NLP demonstration systems that showed sophistication and depth in handling phenomena such as syntax and reference in natural language. Terry Winograd's SHRDLU, Bill Woods' LUNAR, Roger Schank's SAM, Gary Hendrix's LIFER, and Danny Bobrow's GUS were all hand-built, rule-based systems, some of which could even be used for tasks such as database queries.

Linguistics and knowledge-based artificial intelligence were advancing rapidly, and the second decade of this era produced a new generation of hand-built systems with a clear boundary between declarative linguistic knowledge and procedural processing, benefiting from the development of linguistic theory.

The third era (1993-2012)

During this period, the amount of available digital text grew dramatically, and NLP gradually shifted toward deeper language understanding, attempting to extract information such as locations and abstract concepts from tens of millions of words of text. However, such analysis was still largely word-based, so most researchers focused instead on annotated linguistic resources, such as labeled word senses, company names, and treebanks, and used supervised machine learning techniques to build models.

The fourth era (2013-present)

Deep learning, or artificial neural network, methods began to develop and could model context over longer distances. Words and sentences were represented as vectors in real-valued spaces of hundreds or thousands of dimensions, where distance in the vector space corresponds to degree of similarity in meaning or grammar, though task execution still resembled the earlier supervised learning.

In 2018, very large-scale self-supervised neural network learning achieved great success: knowledge can be learned simply by feeding in an enormous amount of text (billions of words). The basic idea is to repeatedly predict the next word given the preceding words, repeat this prediction billions of times, and learn from the errors; the resulting model can then be used for tasks such as question answering or text classification.
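To make this concrete, here is a minimal, illustrative PyTorch sketch of the next-word-prediction objective described above. The tiny model, random "corpus", and hyperparameters are placeholder assumptions, not Manning's or any real system's setup; an actual LPLM is a deep Transformer trained on billions of words.

```python
import torch
import torch.nn as nn

# Toy setup: a tiny vocabulary and a random placeholder "corpus" of token ids.
vocab_size, embed_dim = 100, 32
corpus = torch.randint(0, vocab_size, (1, 64))

# A deliberately tiny stand-in for a language model; a real LPLM would be a
# deep Transformer that attends over the whole preceding context.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised objective: given the words so far, predict the next word.
inputs, targets = corpus[:, :-1], corpus[:, 1:]
for step in range(100):                      # real training repeats billions of times
    logits = model(inputs)                   # (batch, seq_len, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                          # "learn from the errors"
    optimizer.step()
```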

The impact of pre-trained self-supervised methods has been revolutionary: a powerful model can be produced without any human annotation and then applied to a wide variety of natural language tasks with simple subsequent fine-tuning.

Model architecture

Since 2018, the dominant neural network used in NLP applications has been the Transformer. Its core idea is the attention mechanism, in which the representation of a word is computed as a weighted combination of the representations of words at other positions.

A common self-supervised objective for a Transformer is to mask out words in the text, compare the query, key, and value vectors at the masked position with those of other words, compute attention weights and take a weighted average, then pass the result through a fully connected layer, a normalization layer, and a residual connection to produce a new word vector; the process is repeated many times to increase the depth of the network.
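The following is a minimal, illustrative PyTorch sketch of one such layer: query/key/value projections, attention weights, a weighted average, and a feed-forward sublayer, each wrapped with a residual connection and layer normalization. The dimensions are arbitrary, and details such as multi-head attention and the masked-word objective are omitted; this is not the exact recipe of any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTransformerBlock(nn.Module):
    """One illustrative Transformer layer: attention + feed-forward,
    each followed by a residual connection and layer normalization."""

    def __init__(self, d_model=64, d_ff=256):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)  # query vectors
        self.k_proj = nn.Linear(d_model, d_model)  # key vectors
        self.v_proj = nn.Linear(d_model, d_model)  # value vectors
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=-1)           # attention weights
        attended = weights @ v                        # weighted average of values
        x = self.norm1(x + attended)                  # residual + normalization
        x = self.norm2(x + self.ff(x))                # feed-forward sublayer
        return x

# Stacking the block many times increases the depth of the network.
words = torch.randn(1, 10, 64)                        # 10 placeholder word vectors
deep = nn.Sequential(*[TinyTransformerBlock() for _ in range(6)])
new_word_vectors = deep(words)
```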


Although the Transformer's architecture does not look complicated and the computations involved are simple, if the model has enough parameters and is trained on enough data to make these predictions, it can discover most of the structure of natural language, including syntactic structure, word connotations, factual knowledge, and more.

Prompt generation

From 2018 to 2020, the main way researchers used large pre-trained language models (LPLM) was to fine-tune them with a small amount of annotated data to make them suitable for custom tasks.

However, after the release of GPT-3 (Generative Pre-trained Transformer 3), researchers were surprised to find that, given just a prompt, the model could perform well even on new tasks it had never been trained on.
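A hypothetical illustration of prompting: the "training data" for the new task lives entirely inside the text of the prompt, and the model is simply asked to continue it. The pseudo-call to a hosted model below is a placeholder, not a real API.

```python
# Few-shot prompt for a translation task the model was never explicitly trained on.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe =>"
)
# completion = language_model.generate(prompt)   # placeholder call to an LPLM
# A capable model is expected to continue with something like " girafe en peluche".
```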

In contrast, traditional NLP models were assembled as pipelines of multiple carefully designed components: they first captured the sentence structure and low-level entities of the text, then identified higher-level meaning, and finally fed the result into a domain-specific execution component.

Over the past few years, companies have begun replacing these traditional NLP solutions with LPLMs fine-tuned to perform specific tasks.

Machine translation

Early machine translation systems could only cover limited language structures in limited domains.

Google Translate, launched in 2006, was the first to build statistical models from large-scale parallel corpora. In 2016 it switched to a neural machine translation system, and quality improved dramatically. In 2020 it was updated again to a Transformer-based neural translation system that no longer requires a parallel corpus for each pair of languages; instead, a single huge pre-trained network is told which language to produce through a special token.
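The target-language token idea can be sketched as follows; the '<2xx>' token format and the helper function are illustrative assumptions rather than Google's exact scheme.

```python
def prepare_input(source_sentence: str, target_lang: str) -> str:
    """Prepend a special token telling one shared multilingual model
    which language to produce (illustrative format, not Google's exact scheme)."""
    return f"<2{target_lang}> {source_sentence}"

print(prepare_input("How are you?", "de"))   # '<2de> How are you?'
print(prepare_input("How are you?", "ja"))   # '<2ja> How are you?'
# The same pre-trained network decodes the translation; no separate
# English-German or English-Japanese system is needed.
```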

Question answering

A question answering system needs to find relevant information in a collection of texts and then provide answers to specific questions. It has many direct downstream commercial applications, such as pre-sales and after-sales customer support.

Modern neural question answering systems extract answers that are present in a text with high accuracy, and they are also fairly good at recognizing when a text contains no answer.
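Roughly, an extractive question answering model scores every token as a possible start or end of the answer span, on top of the encoder's per-token vectors. The sketch below uses random placeholder vectors and illustrative heads; real systems also constrain the span and often reserve a special position to mean "no answer".

```python
import torch
import torch.nn as nn

seq_len, hidden = 128, 64
# Placeholder for the encoder output over the concatenated [question ; passage].
token_vectors = torch.randn(1, seq_len, hidden)

start_head = nn.Linear(hidden, 1)   # scores each token as a possible answer start
end_head = nn.Linear(hidden, 1)     # scores each token as a possible answer end

start_logits = start_head(token_vectors).squeeze(-1)   # (1, seq_len)
end_logits = end_head(token_vectors).squeeze(-1)

start, end = start_logits.argmax(dim=-1), end_logits.argmax(dim=-1)
print(f"predicted answer span: tokens {start.item()}..{end.item()}")
# Real systems enforce start <= end and can use a special position (e.g. token 0)
# to signal that the passage contains no answer.
```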

Classification tasks

For common traditional NLP tasks, such as identifying person or organization names in a piece of text, or classifying the sentiment (positive or negative) expressed about a product, the best current systems are still based on fine-tuned LPLMs.
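A minimal sketch of that fine-tuning recipe, under toy assumptions: a new classification head sits on top of a stand-in for the pre-trained encoder (here just a randomly initialized linear layer), and both are updated on a small labeled dataset.

```python
import torch
import torch.nn as nn

hidden, num_labels = 64, 2                    # e.g. positive vs. negative sentiment
encoder = nn.Linear(100, hidden)              # stand-in for a pre-trained LPLM encoder
classifier = nn.Linear(hidden, num_labels)    # new task-specific head

features = torch.randn(8, 100)                # placeholder sentence representations
labels = torch.randint(0, num_labels, (8,))   # small annotated dataset

params = list(encoder.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(classifier(encoder(features)), labels)
loss.backward()                               # fine-tuning updates encoder and head
optimizer.step()
```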

Text generation

Beyond many creative uses, generative systems can also write formulaic news articles such as sports reports, produce automated summaries, and even generate reports from radiologists' findings.

But while these systems work well, researchers still wonder whether they actually understand what they are doing or are merely producing elaborate but meaningless rewrites.

Meaning

Linguistics, the philosophy of language, and programming languages all study approaches to describing meaning, namely denotational semantics or the theory of reference: the meaning of a word, phrase, or sentence is the set of objects or situations in the world that it describes (or a mathematical abstraction thereof).

The simple distributional semantics of modern NLP holds that the meaning of a word is just a description of the contexts it appears in. Manning argues that meaning arises from understanding the network of connections between a linguistic form and other things; if that network is dense enough, the meaning of the linguistic form can be understood well.
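A toy illustration of the distributional idea: if two words appear in similar contexts, the vectors built from those contexts are close, and cosine similarity serves as a rough proxy for similarity in meaning. The counts below are invented for illustration.

```python
import numpy as np

# Toy co-occurrence counts: each word is described by how often it appears
# near the context words ["drink", "bark", "pet", "sweet"].
cooccur = {
    "coffee": np.array([10.0, 0.0, 0.0, 6.0]),
    "tea":    np.array([ 9.0, 0.0, 0.0, 7.0]),
    "dog":    np.array([ 1.0, 8.0, 9.0, 0.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(cooccur["coffee"], cooccur["tea"]))   # high: similar contexts
print(cosine(cooccur["coffee"], cooccur["dog"]))   # low: different contexts
```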

The success of LPLMs on language understanding tasks, together with the broad prospects for extending large-scale self-supervised learning to other modalities such as vision, robotics, knowledge graphs, bioinformatics, and multimodal data, points toward more general-purpose AI.

Foundation models

Beyond early foundation models such as BERT and GPT-3, language models can also be connected to knowledge graph neural networks, structured data, or other sensory data to enable multimodal learning. The DALL-E model, for example, after self-supervised training on a corpus of paired images and text, can express the meaning of new text by generating a corresponding picture.

We are still in the early days of foundation models, but in the future most information processing and analysis tasks, perhaps even tasks like robot control, may be handled by a relatively small number of foundation models.

Although training a large foundation model is expensive and time-consuming, once training is complete it is quite easy to adapt the model to different tasks, and its output can even be steered directly with natural language.

But this approach also has risks:

1. Institutions capable of training foundation models may wield excessive power and influence;

2. Many end users may be harmed by biases absorbed during model training;

3. Because the models and their training data are so large, it is difficult to judge whether a model is safe to use in a particular setting.

Although these models ultimately have only a vague understanding of the world and lack human-level careful logical or causal reasoning, the broad effectiveness of foundation models means they have many applicable scenarios, and within the next decade they may be developed into true artificial general intelligence.

References:

https://direct.mit.edu/daed/article/151/2/127/110621/Human-Language-Understanding-amp-Reasoning


Original post: blog.csdn.net/lqfarmer/article/details/132765997