[NeurIPS 2019] Professor Tang Jie of the Department of Computer Science at Tsinghua University interprets Yoshua Bengio's report: How can deep learning realize System 2?

Author: Tang Jie

Recently, Turing Award winner Yoshua Bengio gave a talk entitled "From System 1 Deep Learning to System 2 Deep Learning" at NeurIPS 2019, which raised several very interesting points.

This article offers a detailed interpretation of Yoshua Bengio's report.

Yoshua believes that AI has made amazing progress this century, but is it enough to simply keep increasing dataset size, model size, and compute? In fact, current AI is still far from true artificial intelligence!

Yoshua's first point is that the human cognitive system contains two subsystems (a consensus view in cognitive science): System 1, the intuitive system, is responsible for fast, unconscious, nonverbal cognition, and it is what current deep learning mainly accomplishes; System 2 is the logical analysis system, a conscious system capable of logic, planning, reasoning, and verbal expression, and it is the focus of future deep learning. Of course, Yoshua also mentioned that since the computer acts as the agent realizing AI, some issues must be considered from the computer's point of view, such as better models and knowledge search.

How can deep learning realize System 2?

Yoshua believes that, for computers, the most critical challenge is handling changes in data distribution. For System 2, the basic elements are attention and consciousness. Attention has already been implemented and widely discussed in current deep learning models, for example in GAT (graph attention networks); consciousness is much harder, and the crux is how to define its boundary. Yoshua mentioned that the consciousness prior can be realized with a sparse factor graph model, an idea that connects to causality. In the overall theoretical framework, one can consider meta-learning, the localized change hypothesis, and causal discovery; finally, at the architecture level, one can consider how to learn to operate on different objects.

Regarding data distribution, traditional machine learning rests on the IID (independent and identically distributed) assumption, but in many real scenarios the data we actually care about appears very rarely; that is, what we need to handle is OOD (out-of-distribution) data, data that is poorly covered by the training distribution. This requires new data assumptions in machine learning algorithms. In particular, from the agent's perspective, we need to consider which factors cause the data distribution to change, and how to generalize beyond the IID setting, for example by composing different distributions. Compared with traditional symbolic AI systems, today's AI needs machine learning capabilities with stronger generalization.
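
To make the IID vs. OOD contrast concrete, here is a minimal sketch (our own illustration, not from the talk). A linear model is fit to a mildly nonlinear mechanism; it looks acceptable on inputs drawn from the training distribution, but its error explodes once the input distribution shifts, even though the underlying mechanism never changed.

```python
# Minimal IID-vs-OOD sketch (illustrative only, not from the report).
# A linear model fit under the IID assumption fails once the input
# distribution shifts, because it only approximated the true mechanism
# locally, around the training distribution.
import numpy as np

rng = np.random.default_rng(0)

def mechanism(x):
    return x + 0.5 * x ** 2  # the true (mildly nonlinear) data-generating process

# Training inputs: x ~ N(0, 1)
x_train = rng.normal(0.0, 1.0, 1000)
y_train = mechanism(x_train) + rng.normal(0.0, 0.1, 1000)

# Ordinary least squares with intercept: y ≈ w*x + b
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

def mse(x):
    return np.mean((w * x + b - mechanism(x)) ** 2)

x_iid = rng.normal(0.0, 1.0, 1000)  # same distribution as training
x_ood = rng.normal(4.0, 1.0, 1000)  # shifted ("out of distribution") inputs
print(f"IID MSE: {mse(x_iid):.2f}")  # small
print(f"OOD MSE: {mse(x_ood):.2f}")  # orders of magnitude larger
```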

The attention mechanism is an important technique developed in deep learning in recent years and has been widely applied in many systems. It can be seen as a first step toward realizing consciousness. In the human brain, there is both top-down attention and bottom-up attention.
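
For reference, below is a minimal NumPy sketch of scaled dot-product attention, the basic computation behind attention mechanisms such as those in Transformers and GAT; the function name and toy shapes are our own, not from the report.

```python
# Scaled dot-product attention in plain NumPy (a minimal sketch).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> output: (n_q, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of values

# Toy usage: 2 queries attend over 3 key/value pairs.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```

The softmax weights act as a soft, content-based selection: each query focuses on a few keys, which is one reason attention is viewed as a first step toward consciousness.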

From a cognitive perspective, consciousness is a very complex mechanism. Global Workspace Theory is a cognitive neuroscience theory proposed by Baars in 1988; its core idea is that conscious content is globally available to various cognitive processes, including attention, evaluation, memory, and verbal report. These concepts sound a bit abstract; later, Dehaene, Changeux, and colleagues proposed an implementation model, the Global Workspace Architecture. Global Workspace Theory is very similar to the System 2 introduced earlier. Other cognitive theories of consciousness include the Multiple Drafts theory, proposed by Daniel Dennett in 1991.

The key to combining machine learning with models of consciousness is how to realize consciousness in machine learning, or how consciousness-related theories and models can help machine learning. For example, one can construct hypotheses based on theories of consciousness and then use machine learning methods to verify them. Of course, viewed from a human standpoint, the high-level representation can be said to be language, which requires organically combining the two human cognitive systems, System 1 and System 2; in other words, combining low-level representation with high-level decision-making.

Yoshua also mentioned the preconsciousness / consciousness prior; concretely, it can be realized with sparse factor graphs. Factor graphs are not new: the basic idea is a unified graphical model in which both directed and undirected graphs can be expressed. A sparse factor graph can be used to learn the causal relationships between variables, thereby constructing the causal structure among them (finding the true causal relationships rather than merely assigning weights to all variables, which is exactly why sparsity is required).
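
The talk gives no code, but the role of sparsity can be illustrated with a toy structure-recovery sketch: among many candidate dependencies, an L1 penalty keeps only the few factors that truly link the variables. The hand-rolled lasso below is our own stand-in for this idea, not Bengio's actual model.

```python
# Sparsity-driven structure selection (illustrative stand-in for the
# "sparse" in sparse factor graphs): only 2 of 10 candidate parents
# truly influence the target, and the L1 penalty recovers exactly them.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[[2, 7]] = [1.5, -2.0]       # only variables 2 and 7 matter
y = X @ true_w + rng.normal(0.0, 0.1, n)

def lasso(X, y, lam=0.1, iters=200):
    """Coordinate descent for min_w ||y - Xw||^2/(2n) + lam*||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]      # residual without feature j
            rho = X[:, j] @ r / len(y)
            z = X[:, j] @ X[:, j] / len(y)
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft-threshold
    return w

print(np.round(lasso(X, y), 2))  # near zero everywhere except indices 2 and 7
```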

Meta-learning (learning to learn) is a path toward handling OOD data and achieving fast model transfer. OOD arises because behavior changes, that is, because of interventions on the data-generating process. The knowledge representations learned by meta-learning can effectively help overcome OOD, for example by learning the causal relationships between variables through a meta-transfer objective. The challenge here is how to learn causal features when the intervened variables are unknown. The final question is how to learn the possible operations on objects, similar in spirit to automated machine learning, but here at the level of operations on objects.
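
A highly simplified sketch of this meta-transfer idea follows (after Bengio et al., "A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms", 2019, reduced here to two binary variables with count-based estimators of our own choosing): after an intervention on P(A), the model factorized in the true causal direction A→B only needs to re-estimate one factor, so it fits the few post-intervention samples better than the anti-causal factorization.

```python
# Why adaptation speed reveals causal direction (simplified sketch).
# True mechanism: A -> B. An intervention changes only P(A); the
# correctly factorized model p(A)p(B|A) adapts by re-estimating one
# factor, while p(B)p(A|B) is left with two stale, now-wrong factors.
import numpy as np

rng = np.random.default_rng(7)

def sample(p_a, p_b_given_a, n):
    a = (rng.random(n) < p_a).astype(int)
    b = (rng.random(n) < np.where(a == 1, p_b_given_a[1], p_b_given_a[0])).astype(int)
    return a, b

p_b_given_a = np.array([0.9, 0.2])            # P(B=1|A=0), P(B=1|A=1): fixed mechanism
a_tr, b_tr = sample(0.3, p_b_given_a, 10000)  # training regime
a_tx, b_tx = sample(0.8, p_b_given_a, 20)     # few samples after intervening on P(A)

def fit(x, y):
    """Smoothed count estimates of p(x=1) and p(y=1|x)."""
    px = (x.sum() + 1) / (len(x) + 2)
    py_x = np.array([(y[x == v].sum() + 1) / ((x == v).sum() + 2) for v in (0, 1)])
    return px, py_x

def loglik(x, y, px, py_x):
    pxv = np.where(x == 1, px, 1 - px)
    pyv = np.where(y == 1, py_x[x], 1 - py_x[x])
    return np.log(pxv).sum() + np.log(pyv).sum()

# Causal model A->B: keep p(B|A) from training, re-estimate only p(A)
_, pb_a = fit(a_tr, b_tr)
pa_new = (a_tx.sum() + 1) / (len(a_tx) + 2)
ll_causal = loglik(a_tx, b_tx, pa_new, pb_a)

# Anti-causal model B->A: p(A|B) from training is now stale and wrong
_, pa_b = fit(b_tr, a_tr)
pb_new = (b_tx.sum() + 1) / (len(b_tx) + 2)
ll_anti = loglik(b_tx, a_tx, pb_new, pa_b)

print(f"causal A->B  log-likelihood: {ll_causal:.1f}")   # typically higher
print(f"anti-causal B->A log-likelihood: {ll_anti:.1f}")
```

In the full method, this adaptation-speed signal becomes a differentiable meta-learning objective for scoring candidate causal structures; the sketch above only shows why the signal exists.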
