Article by Gary Marcus and Ernest Davis of New York University: 11 Insights from the Human Mind for AI

Authors: Gary Marcus (professor emeritus, New York University) and Ernest Davis (professor of computer science, New York University)

Original article: Insights for AI from the Human Mind, Communications of the ACM

Translator: Zhu Yanrui

Marvin Minsky wrote in The Society of Mind: "What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle."

In recent years, artificial intelligence has defeated world champions at Go and poker and has made extraordinary progress in machine translation, object classification, and speech recognition. Yet most AI systems are extremely narrow. AlphaGo does not know that Go is a game played by placing stones on a board; it does not even know what a "stone" or a "board" is. If the board were rectangular rather than square, it would need a completely different algorithm.

For AI to understand open-ended text or control a household robot, we need to go further. The human mind is a good place to start, because it still far surpasses machines in comprehension and flexible reasoning.

To that end, this article offers 11 clues drawn from the cognitive sciences: psychology, linguistics, and philosophy.

1. There is no panacea

From behaviorism to Bayesian inference to deep learning, we keep proposing simple theories to explain human intelligence. But as Chaz Firestone and Brian J. Scholl put it, "there is no one way the mind works, because the mind is not one thing. Instead, the mind has parts, and the different parts of the mind operate in different ways: Seeing a color works differently than planning a vacation, which works differently than understanding a sentence, moving a limb, remembering a fact, or feeling an emotion."

The human brain is extremely complex, with more than 150 distinguishable brain areas and roughly 86 billion neurons of hundreds of different types, joined by trillions of synapses, each of which contains a large number of distinct proteins.

A truly flexible, intelligent system is likely to be brain-like in its complexity. Any theory that reduces intelligence to a single principle is doomed to fail.

2. Rich internal representations

Cognitive psychology attaches great importance to internal representations such as beliefs, desires, and goals, and classical AI systems did the same. To represent "President Kennedy's famous 1963 visit to Berlin," a classical system would add a set of explicit facts, such as part-of(Berlin, Germany) and visited(Kennedy, Berlin, June 1963). Knowledge accrues through the accumulation of such facts, and inference builds on that foundation.

Current deep learning techniques try to circumvent this approach, capturing events diffusely in sets of vectors rather than representing semantics directly and precisely, as in part-of(Berlin, Germany) or visited(Kennedy, Berlin, June 1963). Deep learning struggles with abstract reasoning precisely because it was never designed to represent precise factual knowledge in the first place, and once the facts are fuzzy it is hard to reason about them correctly. The much-hyped GPT-3 system illustrates the problem, and the related BERT system cannot reliably answer questions such as "If you put two trophies on a table and then add another, how many trophies do you have?"
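
As a concrete illustration, here is a minimal sketch (ours, not the authors' code) of the classical symbolic style described above: facts are stored as explicit relation tuples that later inference can consult directly.

```python
# A minimal sketch of explicit, symbolic fact storage: each fact is a
# relation tuple, so knowledge accrues simply by adding tuples.
facts = {
    ("part_of", "Berlin", "Germany"),
    ("visited", "Kennedy", "Berlin", "June 1963"),
}

def holds(*fact):
    """Check whether an explicit fact has been recorded."""
    return tuple(fact) in facts

print(holds("part_of", "Berlin", "Germany"))         # True
print(holds("visited", "Kennedy", "Paris", "1963"))  # False
```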

3. Abstraction and generalization

Most of what we know is abstract. The relation "A is B's sister," for instance, holds between many different pairs of people: Malia is Sasha's sister, Princess Anne is Prince Charles's sister, and so on. We do not know every sibling relationship in the world, but we know what "sister" means and can recognize particular instances of it. If two people share a parent, we can infer that they are siblings; if we know that Laura is the daughter of Charles and Caroline, and that Mary is also their daughter, we can infer that Mary and Laura are sisters.
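
This kind of rule-plus-facts inference is easy to make concrete. The sketch below (our illustration, not from the article) applies the general sibling rule to specific stored facts.

```python
# A toy illustration of abstract relational knowledge: a general rule
# ("people who share a parent are siblings") applied to specific facts.
parents = {
    "Laura": {"Charles", "Caroline"},
    "Mary": {"Charles", "Caroline"},
    "Anne": {"Elizabeth", "Philip"},
}

def are_siblings(a, b):
    """Two distinct people with a common parent are siblings."""
    return a != b and bool(parents.get(a, set()) & parents.get(b, set()))

print(are_siblings("Mary", "Laura"))  # True, inferred rather than stored
print(are_siblings("Mary", "Anne"))   # False, no shared parent
```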

The representations behind cognitive models and common sense are built out of abstract relations, combined in complex structures. We can abstract almost anything: times ("10:35 PM"), places ("the Arctic"), particular events ("the assassination of Abraham Lincoln"), sociopolitical organizations ("the U.S. State Department"), and theoretical constructs ("grammar"). We use these abstractions to explain events and tell stories, distilling the essentials from complex situations, which gives them enormous power in interpreting the world.

4. Highly structured cognitive systems

Marvin Minsky argued that we should view human cognition as a "society of mind," made up of dozens or hundreds of distinct "agents," each specialized for a different kind of task. Drinking a cup of tea, for example, requires the interaction of grasping agents, balancing agents, thirst agents, and moving agents. Many findings in evolutionary and developmental psychology likewise suggest that the mind is not a single whole but is composed of many distinct parts.

Ironically, current machine learning research tends in the opposite direction from human thinking: it favors end-to-end models that use a single mechanism with little internal structure. Nvidia's 2016 driving model is an example. It abandoned classical modules such as perception, prediction, and decision-making in favor of a single neural network trained on the mapping from inputs (image pixels) to outputs (steering and acceleration commands).

Proponents of machine learning point to the advantage of training the whole system "jointly," without having to engineer each module separately. If it is so easy to build one big network, why bother constructing many individual modules?

The drawback is that such systems are hard to debug and inflexible. Nvidia's system typically performed well for only a few hours before a human driver had to intervene, not for thousands of hours. Waymo's multi-module system can navigate from point A to point B and handle lane changes, whereas Nvidia's could do little more than stay in its lane.

When top AI researchers tackle hard problems, they usually use hybrid systems. Winning at Go required combining deep learning, reinforcement learning, game-tree search, and Monte Carlo search. Watson's victory at Jeopardy!, question-answering assistants such as Siri and Alexa, and web search engines all use "kitchen sink" architectures that throw in every available tool, integrating many different kinds of methods. The study by Mao et al., "The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision," likewise shows how a system integrating deep learning with symbolic techniques can produce good results in visual question answering and image-text retrieval. Many other kinds of hybrid systems have been explored as well.
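
The sketch below is a schematic of the modular style contrasted here with end-to-end learning; the module names and stubbed outputs are hypothetical, not taken from any real driving stack.

```python
# Hypothetical modular pipeline: separate, individually debuggable stages,
# as opposed to one network mapping pixels straight to controls.
def perceive(pixels):
    """Estimate lane position and obstacles from raw camera input (stub)."""
    return {"lane_offset": 0.2, "obstacle_ahead": False}

def predict(scene):
    """Forecast how long the lane ahead stays clear, in seconds (stub)."""
    return {"clear_for": 5.0}

def plan(scene, forecast):
    """Choose steering and acceleration from structured estimates."""
    steer = -scene["lane_offset"]                        # steer back to center
    accel = 1.0 if forecast["clear_for"] > 2.0 else -1.0
    return steer, accel

pixels = object()  # stand-in for a camera frame
scene = perceive(pixels)
print(plan(scene, predict(scene)))  # (-0.2, 1.0)
```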

5. There are many tools for simple tasks

Even at a fine grain, cognition often relies on multiple mechanisms. Take verbs and their past tenses. In English and many other languages, some verbs form the past tense by a simple rule ("walk-walked," "talk-talked," "perambulate-perambulated"), while others are irregular ("sing-sang," "ring-rang," "bring-brought," "go-went"). Drawing on data about children's overregularization errors, Gary Marcus and Steven Pinker proposed a hybrid model in which regular verbs are generalized by a rule while irregular verbs are handled by an associative network. Even in something this simple, there is structure.
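
The hybrid idea is easy to sketch. The toy code below (ours, not Marcus and Pinker's actual model) looks up irregular forms in stored associations and falls back to the regular rule for everything else.

```python
# Toy hybrid past-tense model: associative lookup for irregular verbs,
# a general "-ed" rule for everything else.
irregular = {"sing": "sang", "ring": "rang", "bring": "brought", "go": "went"}

def past_tense(verb):
    """Irregular lookup first; otherwise apply the regular rule."""
    return irregular.get(verb, verb + "ed")

for v in ["walk", "perambulate", "sing", "go"]:
    print(v, "->", past_tense(v))
```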

6. Compositionality

In the words of the linguist Wilhelm von Humboldt, the essence of language is that it makes "infinite use of finite means." With finite brains and finite linguistic experience, we manage to acquire a grammar that builds sentences out of smaller parts such as words and phrases, letting us express and understand an unbounded range of sentences. From "the sailor loved the girl" we can build "Maria imagined that the sailor loved the girl," and from that "Chris wrote an essay about how Maria imagined that the sailor loved the girl," and so on, and we can understand each of them.
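
Recursion of this kind can be shown in a few lines. The sketch below (our illustration) uses a single embedding rule to generate an unbounded family of sentences from finite parts.

```python
# A minimal sketch of compositionality: one recursive rule builds ever
# larger sentences out of a finite vocabulary.
def embed(sentence, subject, verb):
    """Wrap an existing sentence inside a new clause."""
    return f"{subject} {verb} that {sentence}"

s = "the sailor loved the girl"
for subject, verb in [("Maria", "imagined"), ("Chris", "wrote")]:
    s = embed(s, subject, verb)
print(s)  # "Chris wrote that Maria imagined that the sailor loved the girl"
```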

By contrast, the neural network pioneer Geoffrey Hinton has long argued that the meaning of a sentence should be encoded in what he calls a "thought vector." But the subtle relationship between a sentence and the meaning it expresses is difficult to capture this way. A system built on such encodings can produce grammatical sentences while having no understanding of the meaning of the text it creates.

7. Top-down and bottom-up integration of information

[Figure 1. A shape that could be a number or a letter.]

Is the shape in Figure 1 a letter or a number? It can be either, depending on the context (see Figure 2).

Cognitive psychologists distinguish two kinds of information: bottom-up information, which comes directly from the senses, and top-down information, which comes from our prior knowledge of the world. Letters and numbers belong to different categories, and words and numerals are built from elements of those categories. When we see the fuller images in Figure 2, the same ambiguous shape reads as a letter in one context and as a number in another.

[Figure 2. Context-sensitive interpretations of the same shape.]

Whatever we see, we place it within a cognitive model and interpret it in light of our overall understanding.
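
A toy version of this integration (our example, not from the article) is easy to write: an ambiguous glyph is read as a digit or a letter depending on what surrounds it.

```python
# Toy integration of bottom-up and top-down information: the same
# ambiguous glyph is resolved differently by its context.
def interpret(glyph, context):
    """Resolve an ambiguous glyph using its neighbors."""
    readings = {"ambiguous": {"digit": "13", "letter": "B"}}
    kind = "digit" if all(c.isdigit() for c in context) else "letter"
    return readings[glyph][kind]

print(interpret("ambiguous", ["12", "14"]))  # '13' among numbers
print(interpret("ambiguous", ["A", "C"]))    # 'B' among letters
```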

8. Embedding concepts into theory

In a classic experiment, the developmental psychologist Frank Keil asked children whether a raccoon that had cosmetic surgery to look like a skunk, and was given a very unpleasant smell, would be a skunk. The children judged that it was still a raccoon, presumably drawing on their framework understanding of biology: what a creature is depends on its underlying essence, not its appearance.

Faced with artifacts, however, the children judged differently: a coffee pot converted into a bird feeder they accepted as a bird feeder.

Concepts embedded in theories are essential for effective learning. Suppose a preschooler sees a photo of an iguana for the first time. Almost immediately the child can recognize other photos of iguanas, iguanas in videos, and iguanas in real life, and can easily distinguish them from kangaroos. Likewise, the child can infer that, like other animals, iguanas eat and breathe, and that they grow, reproduce, and die.

A theory with no facts in it is barren. To succeed, an agent must continuously embed the facts it acquires in its theories, enriching the overall theory and better organizing the facts.

9. Causality

As Judea Pearl has emphasized, a deep understanding of causality is a pervasive and essential part of human cognition. If the world were simple and we knew everything about it, perhaps the only causality we would need would be physics, and we could determine effects by simulation: if I apply a force of XX micronewtons, what will happen next?

But such detailed simulation is unrealistic: there are far too many particles to track, events unfold too quickly, and our information is too imprecise.

Instead, we usually work with approximations. We know that certain things are causally related even when we do not know why. We take aspirin because we know it relieves discomfort, without needing to understand the biochemistry; we know that sex can lead to pregnancy even without understanding the mechanics of embryogenesis. Causal knowledge is everywhere, and it underlies human action.

10. Tracking individuals

In everyday life, we attend to individual objects and track their properties and histories: your spouse used to be a reporter; your car has a dent in the trunk and you replaced its transmission last year. Our experience is made up of entities that persist and change over time, and much of what we know is organized around those entities, their histories, and their characteristics.

Strangely, deep learning systems have no such view. For the most part, they learn general, category-level associations rather than facts about particular individuals. Lacking anything like a database that records changes over time, they find it difficult to track individuals as distinct from the categories they belong to.
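
A minimal sketch of what such tracking involves (our illustration, not a proposal from the article) is just a record per individual, separate from its category, with a dated history.

```python
# A minimal sketch of tracking individuals: each entity carries its own
# attributes and dated history, distinct from its category.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    category: str
    attributes: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def record(self, date, event):
        """Append a dated event to this individual's history."""
        self.history.append((date, event))

car = Entity("my car", category="car", attributes={"trunk": "dented"})
car.record("last year", "transmission replaced")
print(car.category, car.attributes, car.history)
```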

11. Innate knowledge

How much of the brain's structure (and ability) is innate, and how much is learned? The old view that "nature" and "nurture" operate independently is wrong: evidence from developmental psychology and developmental neuroscience shows that innate endowment and learning work together.

Most machine learning researchers want to train algorithms from a blank slate, but this makes the job harder, because it relies entirely on learning and forgoes the advantages of built-in structure. The most effective approach combines the two. Humans are probably born understanding that the world consists of objects that persist and are continuous in time and space, with an innate sense of geometry and quantity and the rudiments of an intuitive psychology.

The same should hold for AI systems: rather than learning everything from associations between pixels and actions, they should start with a core understanding of the world as the foundation for everything else they develop.

Summary

The findings of cognitive science encourage us to build artificial intelligence with the flexibility and generality of human thought. Machines need not replicate the human mind, but a thorough understanding of the human mind can contribute to major advances in AI.

We believe AI research should start from the core frameworks of human knowledge, such as time, space, causality, and the interrelationships among people and things. These should be embedded in an architecture that can be freely extended to every kind of knowledge, and that holds to the principles of abstraction, compositionality, and attention to individual entities.

We also need to develop powerful reasoning techniques that can handle knowledge that is complex, uncertain, and incomplete, that can balance top-down and bottom-up processing, and that connect with perception, manipulation, and language to build rich cognitive models. The focus would then be on building human-inspired learning systems that use all the knowledge and cognitive abilities the AI has: systems that incorporate new knowledge into their prior knowledge and that, like a child, learn from every possible source of information, including interacting with the world, interacting with people, reading, watching videos, and receiving explicit instruction.

This is a difficult task, but it must be done.

References:

  1. Brown, T.B. et al. Language models are few-shot learners. (2020); arXiv preprint arXiv:2005.14165.

  2. Darwiche, A. Human-level intelligence or animal-like abilities? Commun. ACM 61, 10 (Oct. 2018), 56–67.

  3. Devlin, J. et al. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-2019 (2019), 4171–4186.

  4. Firestone, C. and Scholl, B.J. Cognition does not affect perception: Evaluating the evidence for ‘top-down’ effects. Behavioral and Brain Sciences 39, e229 (2016).

  5. Keil, F.C. Concepts, Kinds, and Cognitive Development. MIT Press, Cambridge, MA, 1992.

  6. Lupyan, G. and Clark, A. Words and the world: Predictive coding and the language-perception-cognition interface. Current Directions in Psychological Science 24, 4 (2015), 279–284.

  7. Marcus, G. Innateness, AlphaZero, and artificial intelligence. (2018); arXiv preprint arXiv:1801.05667.

  8. Marcus, G. Deep Understanding: The Next Challenge for AI. NeurIPS-2019 (2019).

  9. Marcus, G. GPT-2 and the nature of intelligence. The Gradient. (Jan. 25, 2020).

  10. Marcus, G. The next decade in AI: four steps towards robust artificial intelligence. (2020); arXiv preprint arXiv:2002.06177.

  11. Marcus, G. and Davis, E. GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about. Technology Review (Aug. 22, 2020).

  12. Mao, J. et al. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. (2019); arXiv preprint arXiv:1904.12584.

  13. Murphy, G. The Big Book of Concepts. MIT Press, 2002.

  14. Pearl, J. and MacKenzie, D. The Book of Why: The New Science of Cause and Effect. Basic Books, New York, 2018.

  15. Spelke, E. Initial knowledge: Six suggestions. Cognition 50, 1–3 (1994), 431–445.

  16. van Harmelen, F., Lifschitz, V., and Porter, B., Eds. The Handbook of Knowledge Representation. Elsevier, Amsterdam, 2008.
