Visualizing GPT's language-learning trajectory: it looks a lot like a human child's | ACL 2023

Think back to how we acquire a language as children. Roughly speaking, in their first year, babies can only imitate individual phonemes and produce a few of the simplest, most basic words or fragments. Around one year of age, they begin to master the most basic syntactic structures and splice those early imitative fragments into short sentences such as "The boy sang" or "The boy fell". As they grow older, children gradually learn more complex nested syntactic structures, such as "The boy that I saw sang". Although the exact ages are only approximate, this is roughly the order of the stages children go through.



Recently, researchers from Meta AI, Université PSL (Paris Sciences et Lettres) and Université Paris-Saclay reported an interesting phenomenon: the order in which GPT models learn language skills closely mirrors the order in which human children learn them, following an easy-to-hard, shallow-to-deep progression in which simple expressions are mastered first and complex long sentences are composed later. This similarity between GPT, a statistical model, and the language acquisition of human children makes it possible to analyse the two together and draw further interesting conclusions.

Paper title:

Language acquisition: do children and language models follow similar learning stages?

Paper link:

https://arxiv.org/pdf/2306.03586.pdf

The learning of language skills can be described by two models, "sequential" and "parallel". Sequential learning means that a complex skill is not learned until the simpler skills have been fully mastered, whereas parallel learning means that simple and complex skills are learned at the same time, just at different rates. The difference between the two is shown in the figure below:

[Figure: sequential vs. parallel learning of language skills]
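To make the distinction concrete, the toy simulation below sketches what the two patterns would look like as probe-accuracy curves over training. It is purely illustrative and not taken from the paper; the curve shapes, rates and delays are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

steps = np.arange(0, 3000, 10)

def rising_accuracy(steps, rate, delay=0):
    """Accuracy that starts at chance (50%) and saturates towards 100%."""
    t = np.clip(steps - delay, 0, None)
    return 0.5 + 0.5 * (1 - np.exp(-rate * t))

# Parallel learning: both skills improve from step 0, the complex one just more slowly.
parallel_simple = rising_accuracy(steps, rate=3e-3)
parallel_complex = rising_accuracy(steps, rate=8e-4)

# Sequential learning: the complex skill stays at chance until the simple one is mastered.
sequential_simple = rising_accuracy(steps, rate=3e-3)
sequential_complex = rising_accuracy(steps, rate=3e-3, delay=1500)

for name, curve in [("simple (parallel)", parallel_simple),
                    ("complex (parallel)", parallel_complex),
                    ("simple (sequential)", sequential_simple),
                    ("complex (sequential)", sequential_complex)]:
    plt.plot(steps, curve, label=name)
plt.xlabel("training steps"); plt.ylabel("probe accuracy"); plt.legend(); plt.show()
```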

Citing previous studies on the stages of children's language acquisition, the article divides children's language acquisition into three stages: an initial stage of simple sentences, then more complex sentences introduced by "what", "how", and the like, and finally the most complex constructions, such as sentences introduced by "why" and relative clauses:

[Figure: the three stages of children's language acquisition]

Based on these three stages, the authors select a set of linguistic probes for each stage to serve as a "stage ability test", as shown in the figure below:

[Figure: linguistic probes assigned to each of the three stages]

In terms of training, the authors' main idea is to train 48 GPT-2 models from scratch, evaluate each model every 100 training steps, and observe the "language ability" of these 48 models over time. To evaluate such an abstract notion of language ability, the team selected 96 linguistic probes, covering the different language skills they wanted to test, from three open-source benchmarks: BLiMP, Zorro and BIG-Bench. Each probe tests GPT-2 on grammatical and ungrammatical sentences, and the model's output probabilities over the pair are compared to judge whether it has mastered the ability the probe represents. To keep the results general, the authors also ran tests on the acquisition data from all 48 GPT-2 models to verify that the order in which these language skills are acquired is shared across every model.
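As a rough sketch of how such a minimal-pair probe can be scored with a causal language model (this is not the authors' evaluation code, and the probe sentences below are made-up illustrations), one can compare the log-probability the model assigns to the grammatical and ungrammatical members of a pair:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# A pretrained GPT-2 checkpoint stands in for the authors' from-scratch models;
# only the scoring step is illustrated here, not their training setup.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids the model returns the mean cross-entropy over the
        # (len - 1) predicted tokens; undo the averaging and flip the sign.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

def probe_pair_correct(grammatical: str, ungrammatical: str) -> bool:
    """A minimal pair counts as passed if the grammatical variant is more probable."""
    return sentence_logprob(grammatical) > sentence_logprob(ungrammatical)

# Made-up minimal pair in the style of a subject-verb agreement probe.
print(probe_pair_correct("The boys sing.", "The boys sings."))
```

A probe's accuracy is then the fraction of its minimal pairs for which the grammatical sentence receives the higher probability, and a skill can be considered acquired once that accuracy stays reliably above the 50% chance level.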

The resulting learning trajectories of the language skills are shown in the figure below:

[Figure: learning trajectories of the linguistic skills across training]

From the right column of the figure above, it is clear that the time at which a skill is acquired is directly related to the three stages of language skills: skills from later stages take longer to acquire, so the model, like human children, shows a systematic easy-to-hard learning trajectory. However, when the 64 linguistic probes are divided into early, middle and late groups according to acquisition time, and the within-group accuracy is tracked as training proceeds (as shown in the figure below), all three groups clearly start improving from the very beginning of training. This shows that GPT-2's learning trajectory is in fact parallel; the difference lies in the learning rate, which is significantly faster for the early group and slower for the late group.

[Figure: accuracy over training for the early, middle and late probe groups]
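A possible way to operationalise this grouping is sketched below: define a probe's acquisition step as the first checkpoint at which its accuracy crosses a threshold, then split the probes into thirds by that step. The 0.75 threshold and the tertile split are assumptions for illustration, not necessarily the exact criteria used in the paper.

```python
import numpy as np

def acquisition_step(accuracies, checkpoints, threshold=0.75):
    """First checkpoint at which a probe's accuracy exceeds `threshold`,
    or None if it never does. The 0.75 threshold is an illustrative choice."""
    above = np.where(np.asarray(accuracies) > threshold)[0]
    return checkpoints[above[0]] if above.size else None

def early_middle_late(probe_accuracies, checkpoints):
    """Split probes into early/middle/late thirds by acquisition step.

    `probe_accuracies` maps each probe name to its list of accuracies,
    one value per saved checkpoint."""
    steps = {p: acquisition_step(a, checkpoints) for p, a in probe_accuracies.items()}
    acquired = sorted((p for p, s in steps.items() if s is not None), key=steps.get)
    k = len(acquired) // 3
    return acquired[:k], acquired[k:2 * k], acquired[2 * k:]

# Toy usage with three made-up probes evaluated at checkpoints every 100 steps.
checkpoints = list(range(0, 1000, 100))
probe_accuracies = {
    "simple_agreement": [0.5, 0.62, 0.8, 0.9, 0.92, 0.93, 0.94, 0.94, 0.95, 0.95],
    "wh_question":      [0.5, 0.52, 0.6, 0.7, 0.78, 0.84, 0.88, 0.9, 0.91, 0.92],
    "relative_clause":  [0.5, 0.5, 0.5, 0.55, 0.6, 0.65, 0.7, 0.78, 0.82, 0.85],
}
print(early_middle_late(probe_accuracies, checkpoints))
```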

Finally, the training trajectory of the GPT-2 model is compared with the behaviour of human children. The learning order observed in children roughly matches the learning order of GPT-2, suggesting that the model and children acquire language skills in a similar order. The result is shown below:

[Figure: comparison of the acquisition order in GPT-2 and in children]

Summary and Discussion

As a "statistical model", it is undeniable that the time of these language learning is related to the frequency of occurrence of linguistic phenomena in natural language, so it seems that this learning strategy from easy to difficult is directly related to the 28 rule of the model training data . And the learning process of GPT-2 shows that some phenomena may not be consistent with some linguistic intuitions. For example, when using the "Simple" probe to check the consistency of the subject and predicate in simple sentences and using the "Wh Questions Subject Gap Long Distance" probe, Intuitively, it is much easier to judge subject-verb agreement than to calculate the distance between the problem and the subject of the problem, but the learning time is similar between the two. At the same time, recall that the training goal of the unsupervised pre-training of the GPT model is not very consistent with the goal orientation of children learning to "speak" , although they showed a similar learning sequence in the experiment.

Thinking a bit more deeply, the similarities and differences between GPT-2, a statistical model, and children, as "human intelligence", in learning language echo a long-standing debate in linguistics: does language learning come from the continuous input of experienced corpus data, or, as Chomsky argued, are humans born with an innate "language structure", so that language acquisition essentially depends on this innate structure rather than on a large amount of training? Studying the language acquisition process of GPT models, which appear to have mastered language skills in a general sense, may help us discover why humans can learn language extremely quickly and at low cost, while models can only manage it with an enormous number of parameters. In general, the similarity between the model's language acquisition and humans' may not only help us analyse human language acquisition, but also help us exploit that similarity to improve the model's own acquisition, which gives it important reference value.

