What are the first principles of artificial intelligence?


Author: Guo Ping, Director of the Image Processing Research Center, Beijing Normal University

Source: Qingzhan Artificial Intelligence Research Institute, Turing Artificial Intelligence

This article, by Professor Guo Ping, uses the device of "four questions" to explain the first principles of artificial intelligence. It proposes using first-principles thinking to remedy AI's lack of basic natural-science knowledge within the field of physics-based basic AI research, and suggests adopting the principle of least action as the first principle of artificial intelligence.


Achieving artificial general intelligence (AGI) is a long-term goal, and we need to explore the path to artificial intelligence (AI) starting from basic research. "Basic research is the source of the entire scientific system and the master switch of all technical problems." This underscores the significance and importance of basic theoretical research on AI. To strengthen basic research on the mathematics and physics of AI, we can take "first principles" as the starting point for developing a new generation of basic AI theory.

Are there first principles in the field of artificial intelligence?

The ancient Greek philosopher Aristotle expressed the first principle as follows: "In every systematic inquiry there is a first principle: the most basic proposition or assumption, which cannot be omitted or deleted, nor violated." Before the 20th century, first principles were used mainly in philosophy, mathematics, and theoretical physics. In mathematics, a first principle is one or more axioms that cannot be derived from any other axiom within the system. In theoretical physics, a first-principles calculation is one established directly from physical laws, without assumptions such as empirical models or fitted parameters. The first principle of biology is Darwin's theory of natural selection and survival of the fittest. In modern times, first principles have been extended to many disciplines, including the life sciences, chemistry, economics, and the social sciences.

As human cognition has developed, first principles have differentiated from the original philosophical terminology into more specialized expressions, some of which no longer use the term "first principles" but its synonyms: philosophy speaks of "a priori principles," mathematics uniformly uses the normative term "axioms," and physics retains "first principles."

Whether first principles exist in the field of AI is a controversial topic. Some believe AI has no first principles, on the grounds that first principles define the boundaries of a problem space within a domain delimited by philosophical, mathematical, or physical rules; first principles in AI would only make sense once "intelligence" itself is clearly defined.

There is currently no clear definition of "intelligence," and hence no precise, generally accepted definition of AI. Two definitions from academia are worth noting. The first, from Professor Nils J. Nilsson of the Stanford University Artificial Intelligence Research Center, is that "AI is the science concerned with knowledge: how to represent knowledge, how to acquire knowledge, and how to use knowledge." The second, from Professor Patrick Winston of the Massachusetts Institute of Technology, is that "AI is the study of how to make computers do intelligent things that, in the past, only humans could do."

Some people believe AI has no first principles, basing this on the book Principles of Artificial Intelligence by Professor Nilsson [1]. On page 2 of the book, a passage presents this view clearly: "There is currently no general theory of AI, so next I will show you some applications." In other words, AI currently has no first principles, and attention should instead be directed to derived principles relevant to engineering goals. Derived principles tell us some simple consequences of complex systems, whether natural or artificial, and the same may be true of the nature of intelligence: intelligence is the result of many parallel, interacting processes that cannot easily be traced back to a single fundamental physical principle.

We believe this view regards AI as a technology and looks at the problem from an engineering perspective; that is, it treats AI as a discipline akin to those grounded in experiment.

The physicist Zhang Shoucheng (Shou-Cheng Zhang) described first-principles thinking in a speech: before the 20th century, the concept of first principles belonged to the logically self-consistent disciplines produced by human induction and deduction, including mathematics, philosophy, and theoretical physics; the cornerstones of their theoretical systems can be called first principles. These disciplines can be clearly distinguished from experiment-based disciplines such as chemistry and biology.

In the 21st century, human cognition and science and technology have changed greatly, and experiment-based disciplines now also produce results grounded in first principles. In the biological sciences, for example, first principles have been rediscovered. Recently David Krakauer, current director of the Santa Fe Institute in the United States, published an article titled "The Information Theory of Individuality" in the journal Theory in Biosciences: a mathematical formal theory based on first principles, capable of rigorously defining many different forms of individuality by capturing the flow of information from the past to the future. Some, however, have raised doubts: "The author tries to give a general framework for 'computing' life from scratch, and his ambition is great. But he introduces a tunable parameter γ, which makes one doubt his 'scientific stance.'"

It is normal for a viewpoint to attract differing opinions. The currently prevailing view is that AI dominated by deep learning has no theory. But AI is realized on computer technology, and computing, too, had its technology first and its scientific theory later. ACM Turing Award winner Yann LeCun has observed that theories are often constructed after inventions: the steam engine preceded thermodynamics, the programmable computer preceded computer science, and so on. With a theoretical foundation, even only a conceptual one, research progress in a field is greatly accelerated.

Professor Nilsson's Principles of Artificial Intelligence was published more than 40 years ago. AI theory has continued to develop since then, and our thinking and level of understanding have also improved, so it is time to reconsider whether AI has first principles. Academician Li Guojie believes that AI and computer science are essentially the same discipline, and that AI systems are systems that use computer technology to process information. Since they are systems, by definition first principles should exist for them, as for every system.

We know that machine learning is a subset of AI, and that basic AI research rests on mathematics and physics. Yu Jian, a professor at Beijing Jiaotong University, published the book Machine Learning: From Axioms to Algorithms, which studies learning algorithms on an axiomatic basis. It in effect applies the first principles of mathematics to machine learning, though this is not stated explicitly; the book can be regarded as an example of applying first principles to machine learning.

Since physics is a basic science on which many disciplines rest, the first principles of physics can be applied to those disciplines. First-principles methods in physics are also called "ab initio" methods: only the most basic laws of physics are used, no empirical parameters are introduced, and only a few measured constants, such as the electron mass, the speed of light, and the proton and neutron masses, are taken as input to quantum-mechanical calculations. If we study AI on the basis of physics, the first principles of AI can be borrowed from the first principles of physics: applying "calculation from scratch" to AI can be regarded as AI's first principle. But "calculation from scratch" is the first principle in the narrow sense; in the broad sense, the first principle is the principle of least action.
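As a concrete illustration of this ab initio spirit (an example added here for clarity, not a calculation from the original article): the ground-state energy of the hydrogen atom follows from fundamental constants alone, with no fitted parameters:

$$
E_1 = -\frac{m_e e^4}{8\,\varepsilon_0^2 h^2} \approx -13.6\ \text{eV}
$$

Only the electron mass, the elementary charge, the vacuum permittivity, and Planck's constant enter; the measured ionization energy of hydrogen then comes out of the theory rather than being put into it.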

Why physics-based artificial intelligence?

Mathematics and physics are not only the foundation of other disciplines but also the foundation of AI. Why study the basic theory of AI on the basis of physics? Because physics studies the most general laws of the motion of matter and the basic structure of matter; it is the leading discipline of the natural sciences, the research foundations of the other natural sciences rest on it, and its relationship with philosophy is also very close. The famous physicist Stephen Hawking made the startling declaration on the first page of The Grand Design that "philosophy is dead," because philosophy "has not kept up with modern developments in science, particularly physics. Scientists have become the bearers of the torch of discovery in our quest for knowledge." Although this "manifesto" has been criticized as extremely arrogant, it also shows how physics has driven the development of philosophy.

In his talk at IJCAI 2018 (International Joint Conference on Artificial Intelligence), Yann LeCun pointed out several shortcomings of current AI systems: lack of task-independent background knowledge, lack of common sense, lack of the ability to predict the consequences of actions, and lack of the capacity for long-term planning and reasoning. In short, they have no world model and no general background knowledge about how the world works. We need to learn a world model with commonsense reasoning and predictive ability. Future AI research therefore needs a new kind of theory aimed at building a realizable world model. Some scholars also believe that to better describe neural networks and neural systems we need a new mathematical language and framework, but the academic community has not yet reached consensus on what that framework should be. We believe physics-based AI may be the most promising candidate.

For the problem of AI's lack of common sense, a physics-based AI framework may provide a solution. To give AI common sense, we first need to understand what common sense is. In lay terms, common sense is the shared knowledge that most people possess. According to online encyclopedias, it is the basic knowledge that a mentally sound member of society should have, including survival skills (self-care), basic work skills, and basic knowledge of the natural sciences, humanities, and social sciences. A more technical definition: common sense generally refers to the basic knowledge of a field required to do its work or to conduct academic research, knowledge distilled from natural laws, natural phenomena, or human social activity.

How to make artificial intelligence have common sense?

Yann LeCun has explained why AI lacks common sense: "We do not have the means for machines to learn the huge body of background knowledge that babies acquire about the world in the first few months after birth." That is to say, for AI to master common sense, it must understand how the physical world works and make reasonable decisions: it must be able to acquire a large amount of background knowledge and grasp the rules by which the world operates before it can predict and plan accurately. It is not hard to see that this is essentially inductive thinking; most of our common sense is obtained by induction.

Why is it so hard to give AI common sense? Research has made little progress for decades, and one possible reason is a failure to think in terms of first principles. When discussing AI's lack of common sense, most scholars, judging from the examples they cite, subconsciously assume that common sense for AI must include basic knowledge of all fields. In fact, common sense is domain-dependent, spanning everyday life, basic work skills, and basic natural science. Trying to give AI all common sense at once, without classification and without considering its domain dependence, is effectively demanding AGI. Yet the mainstream AI community's current efforts have never been directed at AGI, and the development of existing technology will not automatically make AGI possible; what has been achieved so far addresses specific kinds of intelligent behavior, the so-called "weak artificial intelligence." Indeed, we have every reason to believe that even if we could precisely observe and imitate the behavior of nerve cells through analogical thinking, we could not thereby reproduce intelligent behavior.

Therefore, only by thinking from first principles and finding the most fundamental principles within complex phenomena can we solve the fundamental problems. First-principles thinking requires calculating from scratch: first train the AI to learn basic natural-science knowledge. This is the "baby learning" approach proposed by Professor Yan Shuicheng of the National University of Singapore, a method that imitates how infants teach themselves and acquire knowledge step by step.

For AI to have common sense, we need to simplify the complex and restrict common sense to specific domains. For example, mastering common sense in the physical sciences should be the primary goal at this stage: use first-principles thinking to instill physics-based scientific knowledge into AI. We therefore need to change our way of thinking, moving from pure data-processing logic to some form of "common sense"; that is, starting from basic physical principles, let AI first master scientific common sense and then learn to reason.

Why should AI learn basic natural-science knowledge first, rather than everyday common sense or knowledge of other fields? Because the physical principles behind basic natural science are clearly defined and can be described by mathematical formulas. First principles deduce the current state of things from a few axioms, and the laws of physics are typically expressed as partial differential equations. Newton's Mathematical Principles of Natural Philosophy defined a set of basic concepts for classical mechanics and proposed the three laws of motion and the law of universal gravitation, making classical mechanics a complete theoretical system. Starting from the laws of physics and using the formulas of Newtonian mechanics to deduce various phenomena of motion, AI can at least acquire scientific common sense about the natural phenomena that classical mechanics explains.
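As a minimal sketch of this kind of deduction (an illustrative example with assumed initial conditions, not anything from the original article), one can integrate Newton's second law numerically to predict a projectile's trajectory, using no empirical fit beyond the constant g:

```python
import numpy as np

# Deduce a projectile's trajectory purely from Newton's second law (F = m*a):
# no curve-fitting, only the law that near Earth's surface a = -g.
g = 9.8                        # m/s^2, gravitational acceleration
dt = 0.01                      # s, integration time step
pos = np.array([0.0, 10.0])    # initial position (x, y) in meters
vel = np.array([3.0, 0.0])     # initial velocity (vx, vy) in m/s

trajectory = [pos.copy()]
while pos[1] > 0:              # integrate until the object reaches the ground
    acc = np.array([0.0, -g])  # the only force acting is gravity
    vel = vel + acc * dt       # dv = a * dt
    pos = pos + vel * dt       # dx = v * dt
    trajectory.append(pos.copy())

print(f"Flight time ~ {len(trajectory) * dt:.2f} s, range ~ {pos[0]:.2f} m")
```

Every point of the trajectory is deduced from the law itself, which is exactly the deductive, calculate-from-scratch spirit described above.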

There is already a precedent for this. The AAAI 2017 best paper, "Label-Free Supervision of Neural Networks with Physics and Domain Knowledge," computes the trajectory of a thrown pillow from the physics of free fall under gravity and trains the neural network by requiring its outputs to satisfy that physical-law constraint, thereby achieving label-free supervised learning. The common sense here is that an object with no other external force acting on it, such as the supporting force of a tabletop, undergoes free fall under gravity. Our own IJCNN 2017 paper is in essence based on the Huygens-Fresnel principle of imaging and implements label-free supervised learning of a neural network for spectral image correction.
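A minimal sketch of the idea (my illustration, not the authors' code): suppose a network predicts the object's height in each video frame; a physics loss can then penalize any deviation of the predicted sequence from constant downward acceleration g, so no labels are needed (the actual paper adds further terms to rule out trivial solutions):

```python
import torch

def free_fall_loss(heights: torch.Tensor, dt: float, g: float = 9.8) -> torch.Tensor:
    """Label-free physics loss: predicted heights h(t) of a falling object
    should satisfy h''(t) = -g. Penalize deviation of the discrete second
    derivative from -g (units: meters, seconds)."""
    # discrete second derivative: (h[t+1] - 2*h[t] + h[t-1]) / dt^2
    accel = (heights[2:] - 2 * heights[1:-1] + heights[:-2]) / dt**2
    return ((accel + g) ** 2).mean()

# Toy check: an exact free-fall parabola h(t) = h0 - 0.5*g*t^2 gives ~zero loss.
t = torch.arange(0.0, 1.0, 0.02)
h = 10.0 - 0.5 * 9.8 * t**2
print(free_fall_loss(h, dt=0.02))  # ~0; in training, h would be network output
```

In actual training the heights would come from a network applied to video frames, and the gradient of this loss would supervise the network without a single labeled example.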

Thinking from first principles requires more effort, and building a world model from first principles may demand more computation than imitation-based approaches. On the one hand, we currently lack the computing power for machines to learn unbounded background knowledge, but restricting the goal to basic natural-science background knowledge remains feasible. Recent literature shows that GPT-3 (the third version of OpenAI's Generative Pre-trained Transformer language model, released in May 2020) has 175 billion parameters and was trained on a 45 TB dataset, indicating how greatly computing power has improved. On the other hand, we can use physical thinking to make reasonable approximations, simplifying problems and reducing the uncomputable to the computable; for example, mean-field theory approximates a many-body problem by an effective single-body problem. Mathematicians want to solve a problem exactly; physicists approximate when an exact solution is out of reach. Hence the joke that mathematicians like to complicate simple problems while physicists do their best to simplify complex ones. If one asks why we should study physics-based AI, this can be counted as one reason.

The pursuit of harmony, unity, and perfection is the highest aspiration of physicists, and it is also the aspiration of AI scientists and of all scientists; the first principle of AI should likewise be a model of this pursuit of perfection. The principle of least action in physics is an extremely simple and elegant principle that can be regarded as the first principle of all of physics. It lies at the core of modern physics and mathematics and is applied throughout thermodynamics, fluid mechanics, relativity, quantum mechanics, particle physics, and string theory. For a more detailed introduction, see the literature; Richard Feynman gave a wonderful exposition of it, which I will not repeat here. In terms of concrete implementation and operability, we believe the principle of least action should serve as the first principle of AI, and we hope to erect the grand edifice of physics-based AI on this cornerstone.
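In its standard form, the principle states that a physical trajectory makes the action, the time integral of the Lagrangian, stationary, from which the equations of motion follow:

$$
S[q] = \int_{t_1}^{t_2} L(q, \dot{q}, t)\,dt, \qquad \delta S = 0 \;\Longrightarrow\; \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0
$$

With L = T - V (kinetic minus potential energy), the Euler-Lagrange equation reproduces Newton's second law, so the least-action formulation contains classical mechanics as a special case while extending far beyond it.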

Why and how to apply first principles?

Over the past several centuries, scientific giants such as Copernicus, Newton, Einstein, and Darwin made great contributions to the scientific revolution. The technological progress it brought drove the rapid development of productivity and of society and culture, with an enormous impact on human civilization. Their common mode of thinking was the simple and beautiful first principle. Einstein said that the inductive methods appropriate to the youth of science were giving way to exploratory deduction, and that, guided by empirical data, the researcher would rather develop a system of thought built up logically from a small number of fundamental assumptions, the so-called axioms. This tells us not only that Einstein's method of research was first-principles thinking, but also that we should use deduction: the essence of first principles is deductive logic.

We know that deep learning is a subset of machine learning, and machine learning a subset of artificial intelligence. One of deep learning's limitations is that it cannot explain causal relationships. Causation is the relation between one event and another in which the former is the cause and the latter is taken to be its effect. In general, an event may be the result of many causes occurring earlier in time, and may in turn be the cause of other events occurring later. Causality is also called the "law of cause and effect." In philosophy there is a saying about first principles: "The first principle is the first cause, which transcends the law of cause and effect and is the only cause; at the same time, the first principle must be abstract." First-principles thinking is thus closely connected with causality, which may offer a new way of thinking about AI's inability to explain causation.

Since the logic of one's thinking and one's angle of observation directly affect how a problem is understood, thinking from first principles undoubtedly helps one understand a problem deeply. A model of applying first principles to business success is "Iron Man" Elon Musk. In a TED interview he said the secret of his success was first-principles thinking. We can understand this way of thinking as looking at the world from the perspective of physics: peeling away the appearances of things layer by layer to see the essence inside, and then working back up from the essence layer by layer. Musk's first-principles thinking caused a sensation in the business world, prompting entrepreneurs to reason from first principles and pursue disruptive innovation.

In basic AI research, building a world model from first principles is a scientific question. In natural language processing (NLP), the GPT-3 model, which achieves stunning results on more than 50 tasks, only demonstrates the scalability of existing technology and is unlikely to lead to AGI. Judging from the literature and reports, GPT-3's infrastructure has changed little: it is still neural-network AI built on the three elements of big data (45 TB of training data), big models (175 billion parameters), and big compute (a supercomputer with more than 285,000 CPU cores, 10,000 GPUs, and 400 Gbps network connectivity). The GPT-3 paper confirms that the larger the data and the parameter count, the better the model performs; it also hints at the limits of merely increasing computing power, with no breakthrough in algorithm design.

Although GPT-3 shows great potential, deep-learning-based AI still has problems, including bias, dependence on pre-training data, lack of common sense, absence of causal reasoning, and lack of explainability. GPT-3 cannot understand the tasks people give it and cannot judge whether a proposition is meaningful. Kevin Lacker's blog describes giving GPT-3 a Turing test; one of the questions was "How many eyes does my foot have?", to which GPT-3 answered, "Your foot has two eyes." When a sentence involves more than two objects, GPT-3 exhibits the defects of limited memory, an inability to generalize from one case to another, and difficulty with reasoning.

First-principles thinking is a deductive mode of thought: it demands an unremitting pursuit of the essence of a problem, then uses the basic knowledge obtained by tracing back to the source to solve it. On this basis, we can analyze the GPT-3 system at three levels: macroscopic, mesoscopic, and microscopic. At the macro level, an AI system is composed of software and hardware; software is the soul of the system, hardware its physical body.

From the hardware perspective, the computer GPT-3 runs on is still a von Neumann architecture: its number system is binary, and it executes program sequences according to human instructions. Binary is used because, in components made of semiconductor materials, a high voltage level represents 1 and a low level represents 0. From the basic components of computers and memories to integrated circuits and modern supercomputers, everything is designed and manufactured by humans. Computer instructions are encoded in binary within a deterministic machine instruction set; the random numbers computers generate are pseudo-random, and a computer cannot autonomously produce consciousness the way higher intelligent creatures do. Existing AI chips are merely hardware implementations of human-designed algorithms: the core algorithms of AI have not been broken through, so committing them to hardware only accelerates existing algorithms rather than yielding a genuinely intelligent chip.

From the software perspective, software is computer programs plus documents and data, and programs contain algorithms. Algorithmically, GPT-3 uses the same Transformer architecture as GPT-2, with the difference that it incorporates a sparse self-attention mechanism. Self-attention effectively speeds up training and remedies the slow learning of recurrent neural networks (RNNs). Therefore, under the von Neumann architecture and current deep learning algorithms, by the "Infinite Monkey Theorem" it would take infinite time for random generation to produce a Dream of Red Mansions, and the probability that GPT-3 produces a work comparable to Dream of Red Mansions in finite time is vanishingly small; even if it produced a work people could understand, GPT-3 would comprehend nothing of its meaning. Under the current architecture, then, GPT-3 will not move toward AGI, nor toward "the rise of silicon-based civilization" as some say. This is the conclusion that first-principles thinking yields.
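For reference, here is a minimal sketch (my illustration, in plain NumPy) of dense scaled dot-product self-attention, the core Transformer operation mentioned above; GPT-3 itself alternates dense and locally banded sparse attention patterns, which this sketch omits:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (the core Transformer operation).
    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # each token: weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                           # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # -> (5, 8)
```

Because every token attends to every other token in one matrix operation, training parallelizes well, which is the advantage over sequential RNNs noted above.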

An article in MIT Technology Review commented that OpenAI's new language generator GPT-3 is "shockingly good" and "completely mindless." On whether GPT-3 will move toward AGI, a report from the technology news site The Verge put it this way: "This concept of improvement through scale is hugely important, and it sits at the heart of a big debate about the future of AI: can we build AGI using current tools, or do we need to make new fundamental discoveries? There is no consensus among AI practitioners on this, and much debate. The field divides into two main camps. One camp argues that we are missing key components needed to create artificial minds; computers must first understand things like cause and effect before they can approach human intelligence. The other camp says that if the history of the field shows anything, it is that problems in AI are largely solved by throwing more data and processing power at them."

OpenAI belongs to the latter camp; it has long believed that enormous computing power combined with reinforcement learning is the path to AGI. But most AI scholars, including ACM Turing Award winners Yoshua Bengio and Yann LeCun, broadly belong to the former camp and believe AGI cannot be created with current tools alone. Starting from first principles, we likewise conclude that the present approach cannot achieve AGI. On this point we should be clear-eyed: constrained by the laws of physics, the ceiling of the deep learning framework will soon be reached. Without a breakthrough in basic theory, a deep-learning-based framework cannot develop into an AGI of silicon-based civilization; the so-called silicon-based civilization is science fiction, not scientific fact. GPT-3 did not produce a technological revolution, but it did make a major breakthrough in application. Many problems remain for the future, and we need to start from first principles and rebuild the basic theoretical framework of AI in order to endow AI with common sense and develop explainable AI.

Conclusion

As Academician Zhang Bo of Tsinghua University has said of the road to AGI, "We are not far along now; we are near the starting point." Mao Zedong once said, "The line is the key link; once it is grasped, everything falls into place," and "more people and more guns cannot replace a correct line." Even with many AI practitioners and very powerful computing, if the route is wrong we may take many detours and even fall into the pit of local optima, unable to climb out. In basic AI research, one correct route may be to abandon analogical thinking and adopt first-principles thinking.

It is hoped that, taking first principles as the starting point, we can achieve a modest goal in the near future: first give AI scientific common sense grounded in physical laws, so that artificial intelligence is no longer "artificial stupidity." This article also hopes to inspire new ideas, to innovate within the deductive mode of thinking, and to make disruptive breakthroughs in the basic theory of AI.


Professor Guo Ping is the director of the Image Processing Research Center and head of the Department of Computer Science and Technology at Beijing Normal University. He received a master's degree from the Department of Physics at Peking University in 1983 and a Ph.D. from the Department of Computer Science and Engineering at the Chinese University of Hong Kong in 2001. From 1993 to 1994 he was a visitor in the Department of Computer Science and Engineering at Wright State University in the United States, and from May to August 2000 he visited the State Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences.




Original post: blog.csdn.net/Datawhale/article/details/133397084