【Reprint】A brief history of artificial intelligence development

While searching for information online, I came across a relatively complete brief history of the development of artificial intelligence, which I am reproducing here for learning purposes only. Original link: https://www.aminer.cn/ai-history. If this reprint infringes any rights, it will be removed upon request.

What exactly is artificial intelligence? Generally speaking, artificial intelligence (AI) is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Research in the field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.

In fact, it is difficult to define the scope of any discipline, especially one that is developing rapidly. Even in a mature discipline like mathematics it is sometimes hard to draw a clear boundary, and it is harder still to judge precisely the scope of a field like artificial intelligence, whose boundaries keep expanding. The applications of artificial intelligence now extend to many fields, including mechanical engineering, electronics, economics, and even philosophy. It is extremely practical and a highly representative multi-disciplinary, interdisciplinary subject.

Drawing on the analysis and visualization of the AMiner science and technology intelligence engine, the landmark achievements of artificial intelligence over more than 60 years of development are presented in the form of a "river map". Based on that map, this article systematically traces the development of artificial intelligence and its landmark achievements, hoping to present this 60-plus-year history of ups and downs more clearly.


Chapter 1: The Beginning Period - 1950s and Before

The origin of artificial intelligence can be traced back to Alan Turing's 1936 paper "On Computable Numbers, with an Application to the Entscheidungsproblem". In 1950, Claude Shannon introduced computer game playing, and in the same year Alan Mathison Turing proposed the "Turing Test"; the idea of making machines intelligent began to enter the public eye.

In 1956, a workshop was held at Dartmouth College, where John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon formally proposed the concept of "artificial intelligence". On the algorithmic side, Frank Rosenblatt proposed the perceptron in 1957, which not only launched the wave of machine learning but also became the foundation of later neural networks (of course, if one goes back further, research on neural networks can be traced to the 1943 neuron model of the neurophysiologist W. S. McCulloch and W. Pitts).

1.1 Computer chess game (Programming a computer for playing chess)

Claude Elwood Shannon (April 30, 1916 - February 24, 2001) was an American mathematician and the founder of information theory.

Shannon was one of the first scientists to propose that "computers can play chess against humans". In 1950 he wrote an article for Scientific American explaining how human-computer game playing could be realized, and in the same year his paper describing a chess program, "Programming a Computer for Playing Chess", was published in Philosophical Magazine.

Shannon represented the chessboard as a two-dimensional array; each piece had a corresponding subroutine that computed all of its possible moves, and an evaluation function scored the resulting positions. The traditional game divides play into three stages, the opening, the middle game, and the endgame, each requiring different techniques. The paper also cites von Neumann's game theory and Wiener's cybernetics.

This paper opened up the theoretical study of computer chess, and its main ideas can still be seen many years later in Deep Blue and AlphaGo. In 1956, a chess-playing program was demonstrated on the MANIAC computer at Los Alamos.
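
To make Shannon's two ingredients concrete, here is a minimal Python sketch (not Shannon's program) of a material-only evaluation function combined with a fixed-depth minimax search. It assumes the third-party python-chess library for board representation and move generation, and the piece values and search depth are illustrative choices rather than Shannon's exact numbers.

```python
# A minimal sketch (not Shannon's original code) of the two ideas his paper
# describes: a static evaluation function and a look-ahead (minimax) search.
# Uses the third-party python-chess library for board representation and
# move generation; piece values and search depth are illustrative choices.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def evaluate(board: chess.Board) -> int:
    """Material balance from White's point of view."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == chess.WHITE else -value
    return score

def minimax(board: chess.Board, depth: int, maximizing: bool) -> float:
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    best = -float("inf") if maximizing else float("inf")
    for move in board.legal_moves:
        board.push(move)
        score = minimax(board, depth - 1, not maximizing)
        board.pop()
        best = max(best, score) if maximizing else min(best, score)
    return best

board = chess.Board()
print("Evaluation after 2-ply search:", minimax(board, 2, True))
```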

1.2 Turing Test

Alan Mathison Turing (June 23, 1912 - June 7, 1954) was a British mathematician and logician, known as the father of computer science and the father of artificial intelligence.

The Turing test was proposed by Turing in his 1950 paper "Computing Machinery and Intelligence". In the test, a human interrogator is separated from the subjects (a person and a machine) and asks them questions through some device (such as a keyboard). If, over many trials, the machine fools more than 30 percent of the interrogators into misidentifying it as human, it passes the test and is deemed to have human intelligence. In that paper Turing also made a prediction about machines' capacity for thought, a prediction we still fall well short of today.

Turing actually proposed a way to test whether a machine has human intelligence: suppose there is a computer whose speed is very fast, whose memory capacity and number of logic units exceed those of the human brain, and suppose many intelligent programs are written for it and large amounts of suitable data are supplied. Can we then say that this machine can think?

Turing affirmed that machines can think. He defined the problem of intelligence from a behaviorist perspective and proposed a thought experiment: a person, without any direct contact with the other party, carries out a series of questions and answers with it through a special channel. If, over an extended exchange, he cannot tell from the answers whether the other party is a human or a computer, then the computer can be considered to have intelligence equal to a human's, that is, the computer can think. This is the famous "Turing Test". At the time there were only a handful of computers in the world, and virtually all of them failed the test.

It is very difficult to tell whether a thought is "original" or an elaborate "imitation", and any evidence of original thought can be denied. Turing tried to resolve this long-standing philosophical debate about how to define thinking by proposing a subjective but operational criterion: if a computer acts, reacts, and interacts like a conscious individual, then it should be considered conscious.

To eliminate human prejudice, Turing designed an "imitation game", the Turing test: a remote human interrogator must, within a fixed period of time and based only on the replies of the two entities to the questions he raises, decide which is the human and which is the computer. Over a series of such tests, the computer's success at appearing intelligent can be measured by how often it is misjudged to be a human.

Turing predicted that by the end of the 20th century there would be computers that pass the "Turing test". On June 7, 2014, at the "Turing Test 2014" event held at the Royal Society, the organizer, the University of Reading, issued a press release claiming that the chatbot Eugene Goostman, created by the Russian-born developer Vladimir Veselov and his colleagues, had passed the Turing test. Although the "Eugene" software is still far from being able to "think", this was nevertheless a landmark event in the history of artificial intelligence and of computing.

1.3 The Dartmouth Summer Research Project on Artificial Intelligence

In the summer of 1956, the young Marvin Minsky, the mathematician and computer scientist John McCarthy (1927-2011), and about ten other researchers held a two-month artificial intelligence summer workshop at Dartmouth College, earnestly and enthusiastically discussing the problem of using machines to simulate human intelligence. The term artificial intelligence (AI) was formally used at this meeting.

This was the first artificial intelligence workshop in history, marking the birth of artificial intelligence as a discipline; it has great historical significance and made pioneering contributions to the development of AI worldwide. The workshop lasted about two months and consisted largely of open-ended brainstorming, which gave birth to what came to be known as the artificial intelligence revolution; 1956 thus became the founding year of artificial intelligence. The main topics of the workshop included automatic computers, how to program computers to use language, neural networks, the theory of the size of a calculation, self-improvement, abstraction, and randomness and creativity.

1.4 Perceptrons
In 1957, Frank Rosenblatt simulated a neural network model he had invented, the "perceptron", on an IBM 704 computer. In 1962 Rosenblatt published the book "Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms", which presented the first perceptron learning algorithm based on MCP neurons. It also proposed a self-learning rule that automatically obtains the weight coefficients from the input and output signals, and decides whether the neuron is activated (produces an output) from the product of the input signal and the weight coefficients.

A perceptron takes several binary inputs, $x_1, x_2, \dots, x_n$, and produces a single binary output:

[Figure: a perceptron with three inputs and a single output.]

The perceptron shown above has three inputs, though in practice there may be many more or fewer. Rosenblatt proposed a simple rule for computing the output: he introduced weights $\omega_1, \omega_2, \dots$, real numbers that express the importance of each input to the output. The neuron's output, 0 or 1, depends on whether the weighted sum of the inputs is less than or greater than some threshold. Like the weights, the threshold is a real number and a parameter of the neuron.

The formula is expressed as:

$$
\text{output}=
\begin{cases}
0 & \text{if } \sum_j \omega_j x_j \le \text{threshold} \\
1 & \text{if } \sum_j \omega_j x_j > \text{threshold}
\end{cases}
$$

This is the earliest form of the activation functions we are familiar with today: the 0 state represents inhibition and the 1 state represents activation. This simple formula helps us understand how perceptrons work: a perceptron is a device that makes a decision by weighing the importance of its input evidence.
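
The rule above is easy to put into code. The following minimal NumPy sketch implements a perceptron that learns the logical AND function; the dataset, learning rate, and number of epochs are illustrative choices, not part of Rosenblatt's original formulation.

```python
# A minimal NumPy sketch of Rosenblatt's perceptron rule as described above:
# the unit outputs 1 if the weighted sum of its inputs exceeds a threshold
# (expressed here as a bias), and the weights are nudged after every mistake.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # binary inputs
y = np.array([0, 0, 0, 1])                      # AND of the two inputs

w = np.zeros(2)   # weights w1, w2
b = 0.0           # bias = -threshold
lr = 0.1          # learning rate

def predict(x):
    return 1 if np.dot(w, x) + b > 0 else 0

for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - predict(xi)
        w += lr * error * xi   # perceptron update rule
        b += lr * error

print([predict(xi) for xi in X])  # expected: [0, 0, 0, 1]
```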

Chapter 2: The First Wave - 1960s

In the 1960s, artificial intelligence experienced its first boom: symbolic logic was developed, several general problems were solved, and natural language processing and human-machine dialogue techniques began to sprout. Representative events include Daniel Bobrow's "Natural Language Input for a Computer Problem Solving System" in 1964 and Joseph Weizenbaum's "ELIZA—A Computer Program for the Study of Natural Language Communication Between Man and Machine" in 1966.

Early artificial intelligence focused more on descriptive logic and general problem solving. At the end of the 1960s, Edward Feigenbaum proposed the first expert system, DENDRAL, and gave a preliminary definition of the knowledge base, which later helped give rise to the second wave of artificial intelligence. More importantly, however, enthusiasm for artificial intelligence gradually faded in this period, and the field entered a "winter" that lasted nearly a decade.

2.1 Pattern Recognition

In 1961, Leonard Merrick Uhr and Charles M. Vossler published a pattern recognition paper titled "A Pattern Recognition Program That Generates, Evaluates and Adjusts Its Own Operators", which described an attempt to design a pattern recognition program using machine learning, or self-organizing, processes. The program starts out not only without knowledge of the specific patterns it will receive, but also without any operators for processing its input: the operators are generated and refined by the program itself, as a function of the problem space and of its successes and failures in dealing with that space. The program not only learns information about different patterns; at least in part, it also learns or constructs secondary codes suited to analyzing the particular set of patterns given to it. It is also regarded as the first machine learning program.


2.2 Human-Computer Dialogue (ELIZA)

In 1966, Joseph Weizenbaum, a computer scientist at the Massachusetts Institute of Technology, published in Communications of the ACM an article titled "ELIZA—a computer program for the study of natural language communication between man and machine". The article describes how the program, called ELIZA, makes a certain degree of natural language dialogue between humans and computers possible. Weizenbaum developed ELIZA, one of the earliest chatbots, to mimic a psychotherapist in clinical treatment.

ELIZA works by decomposing the input according to keyword-matching rules and then generating a reply from the reassembly rules associated with the matched decomposition rule; in short, it takes the input sentence apart and transforms it into a suitable output. Although ELIZA was very simple, Weizenbaum himself was surprised by its performance, and he later wrote the book "Computer Power and Human Reason" to express his complicated feelings about artificial intelligence. ELIZA is so famous that even Siri says ELIZA was a psychologist and her first teacher.
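
To illustrate the keyword-decomposition-reassembly mechanism described above, here is a tiny Python sketch in the spirit of ELIZA; the patterns and replies are invented for illustration and are not Weizenbaum's original script.

```python
# A tiny sketch in the spirit of ELIZA's mechanism described above:
# match a keyword with a decomposition pattern, then produce a reply from a
# reassembly rule that reuses part of the input.
import re
import random

RULES = [
    (r"i need (.*)", ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (r"i am (.*)",   ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (r"because (.*)", ["Is that the real reason?"]),
]
DEFAULT = ["Please tell me more.", "How does that make you feel?"]

def respond(sentence: str) -> str:
    text = sentence.lower().strip(".!?")
    for pattern, replies in RULES:
        match = re.match(pattern, text)
        if match:
            return random.choice(replies).format(*match.groups())
    return random.choice(DEFAULT)

print(respond("I am feeling sad"))   # e.g. "Why do you think you are feeling sad?"
```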

2.3 Expert System DENDRAL (Expert Systems)

In 1968, at the request of NASA, Edward Feigenbaum proposed the first expert system, DENDRAL, and gave a preliminary definition of the knowledge base, which later helped give rise to the second wave of artificial intelligence. The system's rich chemical knowledge helps chemists infer molecular structures from mass spectrometry data, and its completion marks the birth of the expert system. Afterwards, the Massachusetts Institute of Technology began developing the MACSYMA system, which after continuous expansion can now solve more than 600 kinds of mathematical problems.

Today, the expert system (ES) is an important branch of artificial intelligence (AI); together with natural language understanding and robotics, it is listed among the three major research directions of AI.

Chapter 3: The Second Wave Period - Late 1970s and 1980s

In the late 1970s and early 1980s, artificial intelligence entered its second wave. Representative works include the large-scale knowledge bases built and maintained following Randall Davis's 1976 work, the non-monotonic logic of Drew McDermott and Jon Doyle in 1980, and the robotic systems that followed.

In 1979, a computer program built by Hans Berliner became an icon when it defeated the world backgammon champion. Subsequently, behavior-based robotics developed rapidly under the impetus of Rodney Brooks and R. Sutton and became an important branch of artificial intelligence, and the self-learning backgammon program created by Gerald Tesauro and others laid the foundation for the later development of reinforcement learning.

In terms of machine learning algorithms, this period can be described as one of a hundred flowers blooming and a hundred schools of thought contending. The multi-layer perceptron promoted by Geoffrey Hinton and others solved the problem that the perceptron could not perform nonlinear classification; the probabilistic methods and Bayesian networks advocated by Judea Pearl laid the foundation for later causal inference; and machine learning methods in machine vision and other directions also developed rapidly.

3.1 Knowledge Representation

In 1975, Marvin Minsky proposed frame theory for "knowledge representation" in artificial intelligence in his paper "A Framework for Representing Knowledge".

Minsky's frames are not merely a data structure; beyond that simple aspect, they are conceptually quite rich. A frame is aimed at the mental model people form when they understand a situation or an event. It treats the frame as the basic unit of knowledge and connects groups of related frames into a frame system. Different frames in the system can share nodes, the behavior of the system is realized through the specific functions of its sub-frames, and reasoning is accomplished by the coordination among those sub-frames. Frame theory plays a role in artificial intelligence similar to object-oriented programming: its success lies in using the frame structure to organize knowledge organically, giving it specific structural constraints while preserving the relative independence and closure of each structure.

The modular thinking and case-based cognitive reasoning embodied in Minsky's frame theory give it an enduring appeal, which is an important reason why philosophers of cognition pay attention to it. As a core representative of the computational view of cognition, Minsky compared the mind to a computer, understood cognition as information processing, and understood all intelligent systems as physical symbol systems. Although this allows problems to be analyzed in terms of the information flow from environment to mind and from mind to environment, and gives research on the mind an experimental rigor, its mechanistic flaws are also very evident.

At the same time, a frame is similar to a "class" in object-oriented programming languages in software engineering, although the basic design goals of the two are different.
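
As a rough illustration of the analogy between frames and classes, the following Python sketch represents a frame as a class with named slots and default fillers, with a sub-frame inheriting and specializing its parent; the "Room" example is purely illustrative.

```python
# A brief sketch of the analogy drawn above between frames and classes:
# a frame is a structured unit with named slots and default values, and
# sub-frames inherit and specialize it.
class Frame:
    """A frame: named slots with defaults, filled in per instance."""
    slots = {}

    def __init__(self, **fillers):
        self.values = {**self.slots, **fillers}

    def describe(self):
        return f"{type(self).__name__}: {self.values}"

class Room(Frame):
    slots = {"walls": 4, "has_door": True, "purpose": None}

class Kitchen(Room):                      # a sub-frame sharing Room's slots
    slots = {**Room.slots, "purpose": "cooking", "has_stove": True}

print(Kitchen(walls=5).describe())
```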

3.2 Heuristic Search

Douglas Lenat (born 1950), CEO of Cycorp in Austin, Texas, has been a prominent researcher in artificial intelligence. He has worked on machine learning (with his AM and Eurisko programs), knowledge representation, blackboard systems, and "ontology engineering" (at MCC and on Cycorp's Cyc project). He also worked on military simulations and published a critique of the traditional Darwinian theory of random mutation based on his experience with Eurisko. Lenat was one of the original members of the AAAI.

At the University of Pennsylvania, Lenat earned a bachelor's degree in mathematics and physics and, in 1972, a master's degree in applied mathematics. In 1976 he received his Ph.D. from Stanford University with the dissertation "AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search".

The dissertation describes a program called AM that models one aspect of elementary mathematics research: developing new mathematical concepts under the guidance of a large body of heuristic rules, where mathematics is treated as a kind of intelligent activity rather than a finished product. Local heuristics communicate through an agenda mechanism, a global list of tasks for the system to perform together with a plausible reason for each task. A single task can direct AM to define a new concept, to explore some facet of an existing concept, to examine some empirical data for regularities, and so on. The program repeatedly selects from the agenda the task with the best supporting reasons and executes it. Each concept is an active, structured knowledge module. A hundred very incomplete modules were provided initially, each corresponding to an elementary set-theoretic concept (such as union). This gave a definite but immense "space" that AM set out to explore. AM extended its knowledge base, ultimately rediscovering hundreds of common concepts and theorems.
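
The agenda mechanism described above can be sketched with a priority queue: tasks wait on a global list ordered by the strength of their supporting reasons, and the system repeatedly executes the most promising one, which may in turn propose new tasks. The tasks and "worth" scores below are illustrative, not taken from AM.

```python
# A small sketch of the agenda mechanism described above: tasks are kept on a
# global list ordered by the strength of their supporting reasons, and the
# system repeatedly executes the most promising one, which may add new tasks.
import heapq

agenda = []  # entries are (-worth, task description, justification)

def add_task(worth, task, reason):
    heapq.heappush(agenda, (-worth, task, reason))

def run(steps):
    for _ in range(steps):
        if not agenda:
            break
        neg_worth, task, reason = heapq.heappop(agenda)
        print(f"executing '{task}' (worth {-neg_worth}) because {reason}")
        # executing a task may propose follow-up tasks:
        if task == "examine concept: sets":
            add_task(60, "look for examples of 'union'", "union is a basic operation on sets")

add_task(80, "examine concept: sets", "sets are a primitive concept")
add_task(40, "check regularity of empirical data", "unexplained data exists")
run(3)
```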

3.3 Large-Scale Knowledge Base Construction

In 1976, Randall Davis received his Ph.D. in artificial intelligence from Stanford University and published "Applications of Meta Level Knowledge to the Construction, Maintenance and Use of Large Knowledge Bases". The article proposes using an integrated object-oriented model as a solution for improving the integrity of knowledge base (KB) development, maintenance, and use. Shared objects increase traceability between models, enhancing the ability to develop and maintain them semi-automatically; the abstract model is created during knowledge base construction, while reasoning is performed when the model is instantiated.

Randall Davis has made seminal contributions to knowledge-based systems and human-computer interaction, with more than 100 publications, and has played a central role in the development of several systems. He and his research group develop advanced tools for natural multimodal interaction with computers, creating software that understands users' images, gestures, and conversation.

3.4 Computer Vision

The computational theory of vision is a concept proposed by David Marr in the 1970s. His masterwork on visual computation theory, "Vision", was published in 1982, and his work has also had a profound impact on cognitive science.

David Marr was born on January 19, 1945. He studied at Trinity College, Cambridge, earning a master's degree in mathematics and a doctorate in neurophysiology, and received rigorous training in neuroanatomy, psychology, and biochemistry. In the UK he did theoretical research on the neocortex, the hippocampus, and especially the cerebellum.

He visited the United States in 1974 and, at the invitation of Professor Marvin Minsky, stayed on at the Massachusetts Institute of Technology to carry out research on perception and memory. From the standpoint of computer science he integrated mathematics, psychophysics, and neurophysiology, pioneering the computational theory of human vision and bringing a new outlook to vision research.

His theory was inherited, enriched, and developed by a research team he founded, composed mainly of his doctoral students, and was compiled by his students into the landmark book of computer vision, "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information", published after his death.

Computer vision is the science of making machines "see": using cameras and computers in place of human eyes to recognize, track, and measure targets, and further processing images into forms more suitable for human observation or instrument inspection. Learning and computation allow machines to better understand the visual environment and to build vision systems with real intelligence. Today's environment contains enormous amounts of image and video content, which requires researchers to understand it, find the patterns in it, and reveal details we had not previously noticed.

3.5 Computer beats world Backgammon champion

In July 1979, a computer program called BKG 9.8 won the world backgammon championship match in Monte Carlo. Written by Hans Berliner, a professor of computer science at Carnegie Mellon University in Pittsburgh, the program ran on a mainframe at Carnegie Mellon connected by satellite to a robot in Monte Carlo. The robot, called Gammonoid, had a display on its chest showing its own moves and those of its opponent, the Italian champion Luigi Villa, who had won the right to play Gammonoid by defeating all human challengers shortly before. The prize for the match was $5,000, and Gammonoid ended up winning 7-1.

The whole world took note: BKG 9.8 had beaten the world backgammon champion in 1979.

3.6 Expert System

Expert systems emerged in the mid-1960s, beginning with the work reported in Bruce G. Buchanan's 1968 article "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry". At the request of NASA, Stanford University developed the DENDRAL expert system, which has very rich chemical knowledge and can help chemists infer molecular structures from mass spectrometry data. The completion of this system marks the birth of the expert system. Later, the Massachusetts Institute of Technology began developing the MACSYMA system, which after continuous expansion can now solve more than 600 kinds of mathematical problems. Although expert systems have a history of only a few decades, their development has been remarkably fast and their applications have penetrated almost every domain.

An expert system is defined as a computer model of human expert reasoning used to deal with complex real-world problems that require expert interpretation, reaching the same conclusions an expert would. A simple expert system can be regarded as the combination of a "knowledge base" and an "inference engine". Together with natural language understanding and robotics, expert systems are listed among the three major research directions of artificial intelligence, and they are the most active of these branches.
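
A minimal sketch of the "knowledge base plus inference engine" combination might look like the following Python snippet, in which if-then rules are applied to a set of known facts until nothing new can be derived; the rules themselves are invented for illustration.

```python
# A minimal sketch of the "knowledge base + inference engine" combination
# described above: if-then rules are applied repeatedly to a set of known
# facts until nothing new can be derived.
RULES = [
    ({"has_fever", "has_cough"}, "flu_suspected"),
    ({"flu_suspected", "short_of_breath"}, "see_doctor"),
]

def forward_chain(facts: set) -> set:
    """Simple forward-chaining inference engine."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain({"has_fever", "has_cough", "short_of_breath"}))
# includes 'flu_suspected' and 'see_doctor' in addition to the input facts
```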


3.7 Bayesian Network

Judea Pearl, an Israeli-American computer scientist and philosopher, is known for championing the probabilistic approach to artificial intelligence and for developing Bayesian networks. He is also credited with developing a theory of causal and counterfactual inference based on structural models. In 2011, the Association for Computing Machinery awarded Judea Pearl the Turing Award "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning".

A Bayesian network, also known as a belief network or directed acyclic graphical model, is a probabilistic graphical model first proposed by Judea Pearl in 1985. It is an uncertainty-handling model that simulates causal relationships in human reasoning, and its network topology is a directed acyclic graph (DAG).


The nodes in the directed acyclic graph of a Bayesian network represent random variables $\{X_1, X_2, \dots, X_n\}$.

They can be observable variables, hidden variables, unknown parameters, and so on. Variables or propositions that are believed to be causally related (or not conditionally independent) are connected by arrows. If two nodes are joined by an arrow, one node is the "parent" and the other the "child", and a conditional probability value is associated with the pair.

In short, a Bayesian network is formed by drawing the random variables involved in a system in a directed graph according to whether they are conditionally independent. It is mainly used to describe the conditional dependencies among random variables, with circles representing random variables and arrows representing conditional dependencies.

In addition, the joint probability of all the variables can always be written, by the chain rule, as a product of conditional probabilities:

$$
P(x_1,\dots,x_k) = P(x_k \mid x_1,\dots,x_{k-1}) \cdots P(x_2 \mid x_1)\, P(x_1)
$$

In a Bayesian network, each variable is conditionally independent of its non-descendants given its parents, so this product simplifies to the product of the local conditional distributions, $P(x_1,\dots,x_k) = \prod_i P(x_i \mid \mathrm{parents}(x_i))$.
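
As a small worked example of this factorization, the following Python snippet computes a joint probability for a three-node network Rain -> Sprinkler and (Rain, Sprinkler) -> GrassWet by multiplying each node's probability given its parents; the numbers in the conditional probability tables are illustrative.

```python
# A small worked example of the factorization above: the joint probability
# is the product of each node's probability given its parents. The numbers
# in the conditional probability tables are illustrative.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler_given_rain = {True: {True: 0.01, False: 0.99},
                          False: {True: 0.4, False: 0.6}}
P_wet_given = {(True, True): {True: 0.99, False: 0.01},
               (True, False): {True: 0.8, False: 0.2},
               (False, True): {True: 0.9, False: 0.1},
               (False, False): {True: 0.0, False: 1.0}}

def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """P(rain, sprinkler, wet) = P(rain) * P(sprinkler|rain) * P(wet|rain,sprinkler)."""
    return (P_rain[rain]
            * P_sprinkler_given_rain[rain][sprinkler]
            * P_wet_given[(rain, sprinkler)][wet])

print(joint(rain=True, sprinkler=False, wet=True))  # 0.2 * 0.99 * 0.8 = 0.1584
```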

3.8 Behavior-based robotics

In 1986, Brooks published the paper "A Robust Layered Control System for a Mobile Robot", marking the founding of behavior-based robotics. The article introduces a new control architecture for mobile robots. To let robots operate at ever-increasing levels of competence, the control system is built up in layers. Each layer is made up of asynchronous modules that communicate over low-bandwidth channels, and each module is an instance of a fairly simple computational machine. Higher layers can subsume the roles of lower layers by suppressing their outputs, but the lower layers continue to function as higher ones are added. The result is a robust and flexible robot control system. The system was used to control a mobile robot wandering around unconstrained laboratory areas and computer rooms; the eventual goal was a robot that roams the office areas of the lab, using a built-in arm to perform simple tasks such as mapping its surroundings.

The theory of behavior-based robotics puts forward views and structures of intelligence completely different from those of symbolic artificial intelligence, mainly through two conceptual shifts: first, intelligence is not a symbolic model; second, intelligence is not a computational process that produces an output from an input.
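
A rough sketch of the layered, behavior-based idea is shown below: each layer is a simple module mapping sensor readings to a command, and a higher-priority layer can suppress the output of the one below it. The behaviors and sensor format are illustrative and much simpler than Brooks's subsumption architecture.

```python
# A rough sketch of layered, behavior-based control as described above:
# each behavior maps sensor readings to a command, and a higher-priority
# behavior can suppress the output of a lower-priority one.
def wander(sensors):
    """Default behavior: keep moving forward."""
    return "move_forward"

def avoid_obstacles(sensors):
    """Higher-priority behavior: turn away if something is too close."""
    if sensors["distance_ahead"] < 0.5:
        return "turn_left"
    return None

LAYERS = [wander, avoid_obstacles]  # ordered from low to high priority

def control(sensors):
    command = None
    for layer in LAYERS:
        output = layer(sensors)
        if output is not None:       # a higher layer suppresses lower output
            command = output
    return command

print(control({"distance_ahead": 0.3}))  # 'turn_left'
print(control({"distance_ahead": 2.0}))  # 'move_forward'
```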

Chapter 4: Stable Development Period - 1990s

In the 1990s there were two very important developments in AI. One was the Semantic Web proposed by Tim Berners-Lee in 1998, that is, a knowledge network, or knowledge representation, based on semantics. Later the OWL language and other related knowledge description languages appeared, offering a possible solution to the two core problems of the knowledge base: knowledge representation and open knowledge entities (although this idea was not widely adopted at first; it was not until Google proposed the concept of the knowledge graph in 2012 that the direction gained a clear development path).

The other important development was statistical machine learning theory, including the support vector machine proposed by Vladimir Vapnik and others, conditional random fields by John Lafferty and others, and the topic model LDA by David Blei, Andrew Ng, and Michael Jordan. Overall, the main theme of this period was the steady development of AI, with great progress in many fields related to artificial intelligence.

4.1 Support Vector Machine (SVM)

In 1995, Cortes and Vapnik first proposed the support vector machine (SVM), which shows many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems and can be extended to other machine learning problems such as function fitting.

A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data under supervised learning; its decision boundary is the maximum-margin hyperplane with respect to the training samples.

The support vector machine method is based on the VC-dimension theory of statistical learning theory and the principle of structural risk minimization. Using limited sample information, it seeks the best trade-off between model complexity (the learning accuracy on the given training samples) and learning ability (the ability to classify unseen samples without error), in order to obtain the best generalization ability.
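
In practice a maximum-margin classifier can be trained in a few lines; the following scikit-learn sketch uses synthetic two-class data and an RBF kernel, both of which are illustrative choices.

```python
# A minimal scikit-learn sketch of a maximum-margin classifier as described
# above; the synthetic two-class data and the RBF kernel are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)   # C trades margin width against training error
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```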

4.2 Conditional Random Fields

John Lafferty and colleagues first proposed the conditional random field (CRF) model in 2001. It is a discriminative probabilistic graphical model built on a Bayesian theoretical framework that performs particularly well at segmenting and labeling sequence data. CRFs were first proposed for sequence data analysis and have since been applied successfully in natural language processing (NLP), bioinformatics, machine vision, and network intelligence.

Simply put, a random field can be regarded as a set of random variables defined on the same sample space. When each position is randomly assigned a value according to some distribution, the whole is called a random field. There may of course be dependencies among these random variables; generally speaking, it is only when such dependencies exist that treating them as a random field is practically meaningful.

Let X and Y be random variables and let P(Y|X) be the conditional probability distribution of Y given X. If the random variable Y forms a Markov random field represented by an undirected graph G = (V, E), that is, the Markov property holds at every node v, then the conditional distribution P(Y|X) is called a conditional random field.

This definition seems abstract and involves several other concepts, but if you go back and reread the informal description at the beginning, it becomes much clearer. A conditional random field is simply a collection of conditional probabilities formed by the random variables X and Y, with the extra requirement that these conditional probabilities satisfy the Markov independence assumption of a probabilistic undirected graph model; that is why it is called a conditional random field. The intuition behind the definition is also simple: any variable not directly connected to me has nothing to do with me.
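
For a concrete feel of how a linear-chain CRF is used for sequence labeling, here is a hedged sketch based on the third-party sklearn-crfsuite package; the toy sentences, the features, and the hyperparameters are illustrative.

```python
# A hedged sketch of a linear-chain CRF for sequence labeling using the
# third-party sklearn-crfsuite package; the toy data and features are
# illustrative.
import sklearn_crfsuite

def word_features(sentence, i):
    word = sentence[i]
    return {"word.lower()": word.lower(),
            "word.istitle()": word.istitle(),
            "is_first": i == 0}

train_sentences = [["John", "lives", "in", "Paris"], ["Mary", "visited", "London"]]
X_train = [[word_features(s, i) for i in range(len(s))] for s in train_sentences]
y_train = [["PER", "O", "O", "LOC"], ["PER", "O", "LOC"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)

test_sentence = ["Bob", "in", "Berlin"]
print(crf.predict([[word_features(test_sentence, i) for i in range(3)]]))
```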

4.3 Deep Blue Beats Kasparov

On May 11, 1997, the world-renowned man-machine chess match was decided after six games. Popular opinion favored the human brain: according to a poll by CNN and USA Today, 82 percent of respondents wanted the human to win, and the world champion himself firmly believed that the best chess players could triumph over silicon with creativity and imagination, arguing that "in serious, standard chess, computers won't win this century." But facts speak louder than words: IBM's Deep Blue defeated world chess champion Garry Kasparov 3.5:2.5, became the winner of the $1.1 million New York human-computer chess match, and became the first computer to defeat the reigning world chess champion. In fact, as IBM said, no matter who won, human beings would be the final winners.

In the years that followed, people's attitude toward machines gradually returned to rationality. Personal computers have grown enormously in power, and a smartphone can now run a chess engine as strong as Deep Blue alongside other applications. What is more, thanks to recent advances in artificial intelligence, machines can now learn and explore games on their own.

Behind Deep Blue were still hand-coded rules designed by humans for the game of chess. By contrast, AlphaZero, a program released by Alphabet subsidiary DeepMind in 2017, can teach itself to become a master player through repeated self-play, and it has even uncovered new strategies that leave chess experts marveling.

4.4 Semantic Web Road Map

The Semantic Web is a concept proposed in 1998 by Tim Berners-Lee of the World Wide Web Consortium. Its core idea is that by adding semantics (metadata) that computers can understand to content on the Web, the entire Internet becomes a universal medium for information exchange; its most basic element is the semantic link (linked node).

The Semantic Web is the more formal name, the term most used by scholars in the field, and it also refers to the related technical standards. When the World Wide Web was born, content on the network was readable only by humans; computers could not understand or process it. For example, when we browse a web page we can easily understand its content, but to a computer it is just a page containing pictures and links; the computer does not know what the pictures are about, nor the relationship between the linked pages and the current one. The Semantic Web is a general framework proposed to make data on the Web machine-readable. "Semantic" means expressing the meaning behind the data in a richer way so that machines can understand it; "Web" expresses the hope that such data will be linked together into a huge information network, just like the interlinked pages of the World Wide Web, but with finer-grained data as the basic unit.
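
The idea of machine-readable, linked data can be sketched in a few lines with the third-party rdflib package, which stores facts as (subject, predicate, object) triples; the example vocabulary and URIs below are illustrative.

```python
# A small sketch of the idea described above, using the third-party rdflib
# package: facts are stored as machine-readable (subject, predicate, object)
# triples that link to one another. The vocabulary and URIs are illustrative.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.TajMahal, EX.locatedIn, EX.India))      # a link between two "things"
g.add((EX.TajMahal, EX.height, Literal("73 m")))  # a literal property

for subj, pred, obj in g:
    print(subj, pred, obj)

print(g.serialize(format="turtle"))               # the same data as Turtle text
```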

4.5 Topic Modeling

In the field of machine learning, LDA is the abbreviation of two commonly used models: Linear Discriminant Analysis and Latent Dirichlet Allocation. LDA in this article only refers to Latent Dirichlet Allocation. LDA occupies a very important position in the topic model and is often used for text classification.

LDA was proposed by David M. Blei, Andrew Y. Ng, and Michael I. Jordan in 2003 to infer the topic distribution of documents. It gives the topic of each document in a document collection in the form of a probability distribution, so that after extracting the topic distributions of some documents, one can perform topic clustering or text classification based on those distributions.

LDA is an unsupervised machine learning technique that can be used to identify hidden topic information in large document collections or corpora. It uses the bag-of-words approach, which treats each document as a word-frequency vector, thereby converting text into numerical information that is easy to model. The bag-of-words approach does not consider word order, which simplifies the problem and also leaves room for improving the model. Each document represents a probability distribution over topics, and each topic represents a probability distribution over words.
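
The pipeline described above, counting words and then decomposing the counts into topics, can be sketched with scikit-learn as follows; the toy corpus and the choice of two topics are illustrative.

```python
# A minimal scikit-learn sketch of LDA topic modeling as described above:
# documents are turned into bag-of-words counts and decomposed into a small
# number of topics. The toy corpus and the two-topic choice are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stocks fell as markets closed", "investors traded shares and bonds"]

counts = CountVectorizer(stop_words="english").fit(docs)
X = counts.transform(docs)                       # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))                          # per-document topic distribution

words = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-3:][::-1]
    print(f"topic {k}:", [words[i] for i in top]) # top words per topic
```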

Chapter 5: The Third Wave Period - After 2006

The symbol of the third wave of artificial intelligence is arguably deep learning, proposed by Geoffrey Hinton and others in 2006; or rather, Hinton and his colleagues sounded the horn of this wave. The biggest difference this time is that companies are leading it: Sebastian Thrun led the self-driving car project at Google; IBM's Watson defeated human champions on the quiz show Jeopardy! in 2011; Apple launched the natural language question-answering assistant Siri in 2011; and in 2016 AlphaGo, developed by Google's DeepMind, defeated the Go world champion Lee Sedol. The impact of this wave of artificial intelligence has been unprecedented.

5.1 Boston Dynamics

Boston Dynamics is an American engineering and robotics company best known for the quadruped robot BigDog, developed for the U.S. military with funding from the Defense Advanced Research Projects Agency (DARPA), and for DI-Guy, a commercial off-the-shelf (COTS) software suite for realistic human simulation. The company, together with American Systems, received a contract from the Naval Air Warfare Center Training Systems Division (NAWCTSD) to replace the videos used in naval aircraft ejection training with interactive 3D computer simulations featuring DI-Guy characters.

The company was founded by Marc Raibert and his partners. Raibert is a renowned roboticist: he graduated from MIT at the age of 28, then worked as an associate professor at CMU, where he established the Leg Laboratory to study robot control and visual processing. He returned to MIT at the age of 37 to continue robotics research and teaching, and in 1992 he and his partners founded Boston Dynamics, opening a new era of robotics research.

Boston Dynamics launched the quadruped robot BigDog in 2005, and it is this robot, affectionately known as "Big Dog", that made the company famous. BigDog abandoned the traditional wheeled or tracked design in favor of four legs, because a legged robot can adapt to more kinds of terrain and has better mobility. In the promotional video released by Boston Dynamics, BigDog, even while carrying a heavy load, responds nimbly to a human kicking it from the side and always stays on its feet.

On December 13, 2013, Boston Dynamics was acquired by Google. On June 9, 2017, SoftBank acquired Boston Dynamics from Alphabet, Google's parent company, on undisclosed terms.

5.2 Transfer Learning

Transfer learning is an important branch of machine learning. It refers to the process of exploiting the similarity between data, tasks, or models to apply a model learned in a source domain to a new domain; for example, most of us draw on our experience of riding a bicycle when learning to ride a motorcycle.

In 2010, Sinno Jialin Pan and Qiang Yang published "A Survey on Transfer Learning", which discusses the categorization of transfer learning in detail. Classified by transfer learning scenario, it falls into three categories: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning.


Deep transfer learning mainly transfers models. One of the simplest and most commonly used methods is fine-tuning, which takes a network that someone else has already trained and adapts it to the target task. BERT, GPT, XLNet, and other models that have become popular in recent years are first pre-trained on a large corpus and then fine-tuned on the target task.
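
A typical fine-tuning recipe, freezing a pre-trained feature extractor and retraining only a new output layer, can be sketched in PyTorch as follows. The ResNet-18 backbone, the 10-class target task, and the random batch standing in for real data are illustrative assumptions, and the weights argument assumes a recent torchvision version.

```python
# A hedged PyTorch sketch of fine-tuning as described above: take a network
# pre-trained on a large dataset, freeze its feature extractor, and retrain
# only a new output layer for the target task.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained backbone

for param in model.parameters():        # freeze the transferred weights
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 10)   # new head for the target task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch standing in for target data.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("fine-tuning step loss:", loss.item())
```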

5.3 IBM Watson challenges the strongest Jeopardy! players in history (IBM Watson wins Jeopardy!)

Watson, a question-answering computer system capable of answering questions posed in natural language, was developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson is named after industrialist Thomas J. Watson, IBM's founder and first CEO.

The Watson computer system was originally developed to answer questions on the quiz show Jeopardy!, and in 2011 it competed on Jeopardy! against champions Brad Rutter and Ken Jennings, ultimately winning and taking the first-place prize of one million dollars.

Watson operation mechanism:

Watson was created as a question answering (QA) computing system that applies advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine learning techniques to open-domain question answering.

The main difference between QA technology and document search is that document search takes a keyword query and returns a list of documents ranked by relevance to the query (often based on popularity and page rank), whereas QA takes a question posed in natural language, tries to understand it in much more detail, and returns a precise answer. When Watson was created, IBM stated that "more than 100 different techniques are used to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses." In recent years Watson's capabilities have expanded and the way it works has changed, taking advantage of new deployment models (Watson on IBM Cloud) as well as evolving machine learning capabilities and optimized hardware available to developers and researchers. It is no longer simply a question answering system designed for quiz shows; it can now "see", "hear", "read", "speak", "taste", "interpret", "learn", and "recommend".


High-level architecture of IBM Deep QA used in Watson

David Ferrucci is the lead researcher who, from 2007 to 2011, led the team of IBM and academic researchers and engineers that developed the Jeopardy!-winning Watson computer system.

Ferrucci earned a bachelor's degree in biology from Manhattan College and, in 1994, a Ph.D. in computer science from Rensselaer Polytechnic Institute, specializing in knowledge representation and reasoning. He joined IBM's Thomas J. Watson Research Center in 1995 and left in 2012 to join Bridgewater Associates. He is also the founder, CEO, and chief scientist of Elemental Cognition, a company exploring a new field of research in natural learning, which Ferrucci describes as "artificial intelligence that understands the world the way people do."

5.4 Google self-driving car (Google self-driving car)

Google's self-driving technology development began on January 17, 2009, inside the company's secret X laboratory. After The New York Times revealed its existence on October 9, 2010, Google officially announced the self-driving car project later that day. The project was initiated by Sebastian Thrun, former director of the Stanford Artificial Intelligence Laboratory (SAIL), and Anthony Levandowski, founder of 510 Systems and Anthony's Robots.

Before working at Google, Thrun and 15 engineers, including Dmitri Dolgov, Anthony Levandowski, and Mike Montemerlo, worked together on VueTool, a digital mapping technology project, for SAIL. Many of the team members had met at the 2005 DARPA Grand Challenge, where both Thrun and Levandowski had teams competing in the autonomous driverless car challenge. In 2007, Google acquired the entire VueTool team to help advance Google's Street View technology.

As part of developing the Street View service, 100 Toyota Priuses were purchased and fitted with Topcon boxes, digital mapping hardware developed by Levandowski's company 510 Systems. In 2008, the Street View team launched the Ground Truth project, aiming to create accurate road maps by extracting data from satellite imagery and Street View. This laid the groundwork for Google's self-driving car plans.

In late May 2014, Google showed a new prototype of its driverless car, which had no steering wheel, accelerator, or brake pedal and was 100 percent autonomous. In December, it showed a fully functional prototype that it planned to test on San Francisco Bay Area roads starting in early 2015. The car, called Firefly, was intended as an experimental and learning platform rather than for mass production.

In 2015, co-founder Anthony Levandowski and CTO Chris Urmson left the project. In August 2015, Google hired former Hyundai Motor executive John Krafcik as CEO. In the fall of 2015, Google gave "the world's first fully driverless ride on public roads" to a legally blind friend of principal engineer Nathaniel Fairfield: Steve Mahan, former CEO of the Santa Clara Valley Blind Center. The ride took place in Austin, Texas, with no test driver or police escort, in a car with no steering wheel or floor pedals. By the end of 2015, the project's cars had logged more than 1 million self-driven miles.

In December 2016, the division was spun out of Google, renamed Waymo, and made a separate subsidiary of Alphabet, meaning the self-driving car project was moved out of Google's main line of business.

5.5 Google Knowledge Graph

The Google Knowledge Graph is Google's knowledge base, which uses semantic search to gather information from a variety of sources in order to improve the quality of Google search results. The Knowledge Graph was added to Google Search in 2012, officially launching on May 16, 2012.

The Google Knowledge Graph is a rather ambitious project in which Google gives meaning to strings rather than treating them as mere strings. To take Google's own example: when you search for "Taj Mahal" on English Google, traditional search tries to match this string against Google's huge index of crawled pages to find the most appropriate results, ranked by Google's mysterious algorithms. With the Knowledge Graph, however, "Taj Mahal" is understood as a "thing", and basic information about it, such as its location, a Wikipedia summary, its height, and its architect, is displayed to the right of the search results, along with similar "things" such as the Great Wall.

Of course, Google also understands that "Taj Mahal" does not necessarily refer to the mausoleum, and this is where the power of the Knowledge Graph shows. Under the Taj Mahal panel there are two other common "Taj Mahals", a musician and a casino. Normally, if you wanted to find these two but could not come up with the right keywords, the results might be swamped by the most famous one; the Knowledge Graph can help you find the specific thing you actually want.

Google hopes to use the Knowledge Graph to overlay a layer of relationships on top of ordinary string search, helping users find the information they need faster while also better understanding what users need, moving a step closer to future "knowledge"-based search. At present Google's Knowledge Graph covers 500 million "things" with 3.5 billion attributes and relationships, and it will of course keep growing. There is an introductory video from Google in the further reading section; unfortunately it is only available in English for now, and it is unclear how long it will take to support Chinese.

History of Knowledge Graph

Where does all this content come from? Of course, it is impossible for Google to gather all of the data through its own crawling alone, because the data is simply too vast.

For example, some of the data comes from The World Factbook of the CIA (Central Intelligence Agency), a survey report published by the U.S. Central Intelligence Agency that provides overviews of the world's countries and regions, with statistics on population, geography, politics, and the economy. Because the CIA is a department of the U.S. government, the data's format, style, and content follow the official needs and positions of the U.S. government, and the data is supplied by the U.S. Department of State, the U.S. Census Bureau, the Department of Defense, and other relevant agencies.

There is also data from Freebase, a large collaborative repository of metadata contributed mostly by members of its community. It integrates many resources on the Internet, including some content from private wiki sites. Freebase was committed to building a resource library that everyone (and every machine) in the world could access quickly. It was developed by the American software company Metaweb and launched publicly in March 2007, and it was acquired by Google on July 16, 2010. On December 16, 2014, Google announced that Freebase would be shut down within six months and all of its data migrated to Wikidata.

Of course, there is also the famous Wikipedia.

By 2012, Google's semantic network already contained more than 570 million entities and more than 18 billion facts about and relationships between them; this data is used to understand the keywords we type into the search box.

On December 4, 2012, the Knowledge Graph was extended to seven more languages: Spanish, French, German, Portuguese, Japanese, Russian, and Italian.

5.6 AlexNet

AlexNet is a convolutional neural network (CNN) designed by Alex Krizhevsky and published together with Ilya Sutskever and Krizhevsky's doctoral advisor Geoffrey Hinton. AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge on September 30, 2012, achieving a top-5 error rate of 15.3%, 10.8 percentage points lower than the runner-up, and became famous in a single stroke. The main conclusion of the original paper is that the depth of the model is critical to its performance; AlexNet is computationally expensive, but the use of graphics processing units (GPUs) during training made it feasible.
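
For reference, the AlexNet architecture is available off the shelf in torchvision; the short sketch below instantiates it without pre-trained weights and checks the expected input and output shapes (it assumes a recent torchvision version).

```python
# A brief torchvision sketch: the AlexNet architecture described above is
# available as a ready-made model, shown here without pre-trained weights;
# the random input simply demonstrates the expected tensor shapes.
import torch
from torchvision import models

alexnet = models.alexnet(weights=None)        # the 2012 architecture, untrained
n_params = sum(p.numel() for p in alexnet.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")   # roughly 61M

x = torch.randn(1, 3, 224, 224)               # one ImageNet-sized image
print(alexnet(x).shape)                       # torch.Size([1, 1000]) class scores
```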

Krizhevsky, who was born in Ukraine and raised in Canada, was approached by Geoff Hinton about doing doctoral research in computer science on AI at the University of Toronto. While in graduate school, Krizhevsky was reading papers on an early algorithm invented by his advisor Hinton, the "restricted Boltzmann machine", and discovered that graphics processing units (GPUs), rather than central processing units (CPUs), could be used to train them. He reasoned that if GPUs could be applied to other neural networks with many more layers ("deep neural networks"), they would speed up training, enable better algorithms, and quickly surpass the state of the art in accuracy.

Shortly after this discovery, in 2011, another of Hinton's graduate students, Sutskever, learned about the ImageNet dataset. ImageNet has more than a million images and was designed precisely for the kind of computer vision problem the Toronto team was trying to solve. "I realized his code could solve ImageNet," Sutskever said. "That realization was not at all obvious at the time."

After the two did some research, Krizhevsky used an enhanced version of his GPU-accelerated code to train a neural network on the dataset. The higher computing speed let the network process millions of images in five or six days rather than the weeks or even months it would previously have taken, and all the extra data it could process gave the network an unprecedented ability to tell apart the objects in an image.

The two then entered the 2012 ImageNet competition with their advisor Hinton. The competition, a test of AI built around a huge online image database and open to anyone in the world, evaluates algorithms for large-scale object detection and image classification. The point was not just to win but to test a hypothesis: that the vast amount of data in ImageNet, combined with the right algorithms, could be the key to unlocking AI's potential. In the end they beat every other research lab by a huge margin, with an error rate 10.8 percentage points lower than the second-place entry.

At first, however, Hinton was against the idea, because he believed the neural network would still need to be told which objects were in which images rather than learning the labels itself. Despite his skepticism, he contributed to the project as an advisor. It took Krizhevsky just six months to get his neural network to ImageNet's image classification benchmark, and another six months to reach the results the team submitted.

Krizhevsky's final network architecture was validated in a seminal research paper in AI, first presented at the field's largest annual conference in 2012, after the ImageNet challenge.

The network is now commonly known as AlexNet, but that was not its original name. After the ImageNet challenge, Google hired an intern named Wojciech Zaremba (now OpenAI's robotics lead) to rewrite a framework based on Krizhevsky's work for the company. Because of Google's tradition of naming neural networks after their creators, the company's approximation of Krizhevsky's network was originally called WojNet. But Google then won the contest to hire Krizhevsky and acquired his neural network, and the name was duly changed to AlexNet.

5.7 Variational Auto-Encoder (VAE)

The VAE, or variational autoencoder, is a variant of the autoencoder. It was introduced by Durk Kingma and Max Welling in the paper "Auto-Encoding Variational Bayes", published at ICLR in 2013.

An autoencoder is an artificial neural network that learns efficient encodings of data in an unsupervised manner. Its purpose is to learn a representation (encoding) of a set of data by training the network to ignore signal "noise", typically for dimensionality reduction. Several variants of the basic model exist, whose purpose is to force the learned representation of the input to have useful properties.

Unlike classical (sparse, denoising, etc.) autoencoders, variational autoencoders (VAEs) are generative models, like generative adversarial networks. The paper focuses on how to perform efficient inference and learning in directed probabilistic models with continuous latent variables, intractable posterior distributions, and large datasets. The authors introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under mild differentiability conditions, works even in the intractable case.

The authors show that a reparameterization of the variational lower bound yields a lower-bound estimator that can be optimized directly with standard stochastic gradient methods, and that for i.i.d. datasets with continuous latent variables per data point, posterior inference can be made especially efficient.
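
A condensed PyTorch sketch of these ideas is given below: the encoder outputs a mean and log-variance, a latent sample is drawn as mu + sigma * eps so that gradients flow through the encoder (the reparameterization trick), and the loss is a reconstruction term plus a KL term. The layer sizes and the random stand-in batch are illustrative.

```python
# A condensed PyTorch sketch of the reparameterization idea described above:
# sample z = mu + sigma * eps so gradients flow through mu and sigma, and
# optimize a reconstruction term plus a KL term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(x_dim, 64)
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps          # reparameterization trick
        return self.dec(z), mu, logvar

x = torch.rand(16, 784)                                  # a random stand-in batch
recon, mu, logvar = TinyVAE()(x)
recon_loss = F.binary_cross_entropy_with_logits(recon, x, reduction="sum")
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
print("negative ELBO:", (recon_loss + kl).item())
```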

The lead author, Durk Kingma (Diederik P. Kingma), currently works at Google. Before joining Google, he received his Ph.D. from the University of Amsterdam in 2017 and was part of the founding team of OpenAI in 2015. His main research directions are inference, stochastic optimization, and identifiability. His research achievements include the variational autoencoder (VAE), a principled framework for generative modeling, and Adam, a widely used stochastic optimization method.

5.8 Generative Adversarial Network (GAN)

A Generative Adversarial Network (GAN) is an unsupervised learning method that learns by letting two neural networks play a game against each other. The method was proposed by Ian Goodfellow et al. in 2014. A generative adversarial network consists of a generator network and a discriminator network. The generator takes random samples from a latent space as input, and its output should mimic the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input, and its goal is to distinguish the generator's output from real samples as reliably as possible, while the generator tries to deceive the discriminator. The two networks compete and continually adjust their parameters; the ultimate goal is for the discriminator to be unable to tell whether the generator's output is real.

Two models are trained simultaneously in the framework: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. G's training objective is to maximize the probability of D making a mistake. The framework corresponds to a two-player minimax game. It can be shown that in the space of arbitrary functions G and D, there exists a unique solution in which G reproduces the training data distribution and D equals 0.5 everywhere. When G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. No Markov chains or unrolled approximate inference networks are required during training or sample generation. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
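
The adversarial game above can be expressed in a few lines of code. The following is a minimal training-loop sketch in PyTorch; the network sizes, learning rates, and the 784-dimensional (MNIST-like) data are illustrative assumptions, not details from the original paper.

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
    D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCELoss()

    def train_step(real):                        # real: (batch, 784) tensor of real samples
        batch = real.size(0)
        z = torch.randn(batch, 64)               # random sample from the latent space
        fake = G(z)

        # 1) Train D to label real samples as 1 and generated samples as 0.
        d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # 2) Train G to fool D, i.e. maximize the probability that D is wrong.
        g_loss = bce(D(fake), torch.ones(batch, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()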

Generative adversarial networks are often used to generate fake images. In addition, the method has been used to generate movies, 3D object models, etc. Although GAN was originally proposed for unsupervised learning, it has also been shown to be useful for semi-supervised learning, fully supervised learning, and reinforcement learning.

Ian Goodfellow, the inventor of the GAN, studied at Stanford University, where he received bachelor's and master's degrees in computer science, and then earned a doctorate in machine learning at the University of Montreal under the supervision of Yoshua Bengio and Aaron Courville. After graduation, Goodfellow joined Google as a member of the Google Brain research team. He left Google in 2015 to join the newly founded OpenAI, and returned to Google Research in March 2017.

Goodfellow's most famous achievement is the invention of GAN, known as the father of GAN. He is also the main author of the Deep Learning textbook. At Google, he developed a system that enabled Google Maps to automatically transcribe addresses from photos taken by Street View cars, and demonstrated security gaps in machine learning systems.

In 2017, Goodfellow was named to MIT Technology Review's list of 35 Innovators Under 35. In 2019, he was included in Foreign Policy's list of 100 Global Thinkers.

5.9 Random inactivation (Dropout)

Random inactivation (dropout) is a method for optimizing artificial neural networks with deep structures. During learning, some weights or outputs of the hidden layers are randomly set to zero, reducing the interdependence (co-adaptation) between nodes. In this way, the neural network is regularized and its structural risk is reduced. In 2014, Srivastava, Hinton et al. gave a full description of dropout as a remedy for overfitting in neural networks and showed that it was a significant improvement over the other regularization methods in use at the time. Empirical results also show that dropout achieves excellent results on many benchmark datasets.

In 2012, Hinton and Srivastava first proposed the idea of dropout. In 2013, Li Wan, Yann LeCun and others introduced DropConnect, another regularization strategy for reducing overfitting and a generalization of dropout. In DropConnect, a randomly selected subset of the network's weights is set to zero, instead of a randomly selected subset of each layer's activations as in dropout. DropConnect is thus similar to dropout in that it introduces sparsity into the model, except that the sparsity is in the weights rather than in the layers' output vectors. Since the publication of the Srivastava and Hinton et al. paper in 2014, dropout has been widely adopted in research. Before Batch Normalization (BN) was proposed, dropout was almost a standard component of state-of-the-art networks, until Ioffe and Szegedy introduced BN in 2015. BN not only accelerates the training of modern models but also improves baselines by acting as a regularizer. Batch normalization is therefore used in almost all recent network architectures, which illustrates its practicality and effectiveness.
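
In practice, dropout is applied between layers during training and disabled at inference time. Below is a minimal illustration in PyTorch; the layer sizes are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):
        def __init__(self, p=0.5):                # p is the probability of zeroing a unit
            super().__init__()
            self.fc1 = nn.Linear(784, 256)
            self.drop = nn.Dropout(p)             # active in train() mode, identity in eval() mode
            self.fc2 = nn.Linear(256, 10)

        def forward(self, x):
            h = torch.relu(self.fc1(x))
            h = self.drop(h)                      # randomly zeroes activations and rescales by 1/(1-p)
            return self.fc2(h)

    net = TinyNet()
    net.train()                                   # dropout enabled during training
    y = net(torch.randn(32, 784))
    net.eval()                                    # dropout disabled at inference time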

5.10 Deep Learning

Deep learning can be regarded as the most important branch of machine learning, and 2006 is known as the first year of deep learning. In 2006, Geoffrey Hinton and his student Ruslan Salakhutdinov formally proposed the concept of deep learning; Hinton is also known as the father of deep learning. In 2015, in an article published in the top academic journal Nature, Hinton and others gave a detailed solution to the vanishing-gradient problem: train the network layer by layer with unsupervised learning, and then fine-tune it with the supervised backpropagation algorithm. This deep learning approach immediately caused a great stir in academia. Many world-renowned universities, led by Stanford University and the University of Toronto, invested enormous manpower and funding in deep learning research, and the technology then spread rapidly to industry. Because of their outstanding contributions to deep learning, on March 27, 2019 the ACM (Association for Computing Machinery) announced that Yoshua Bengio, Yann LeCun, and Geoffrey Hinton, known as the "three giants of deep learning", had jointly won the 2018 Turing Award. It is rare, since the award was established in 1966, for it to be shared by three winners in the same year.

In 2012, in the famous ImageNet image recognition competition, the team led by Geoffrey Hinton won the championship with the deep learning model AlexNet. In the same year, the deep neural network (DNN) effort jointly led by Stanford professor Andrew Ng and the renowned computer scientist Jeff Dean achieved striking results in image recognition, cutting the error rate in the ImageNet evaluation from 26% to 15%. The success of deep learning algorithms in these international competitions once again drew the attention of academia and industry to the field.

With the continuous advancement of deep learning technology and the steady improvement of data processing capabilities, Facebook's DeepFace project, based on deep learning, achieved an accuracy rate of more than 97% in face recognition in 2014, almost indistinguishable from human performance. This result once again demonstrated that deep learning algorithms lead the field in image recognition.

Geoffrey Everest Hinton, a computer scientist and psychologist, is known as the "father of neural networks" and "the originator of deep learning". He has researched methods of using neural networks for machine learning, memory, perception, and symbol processing, and has published more than 200 papers in these areas. He is one of the scholars who introduced the backpropagation algorithm into the training of multi-layer neural networks, and he also co-invented the Boltzmann machine. His other contributions to neural networks include distributed representations, time-delay neural networks, mixtures of experts, and Helmholtz machines.

In 1970, Hinton received a Bachelor of Arts degree in experimental psychology from the University of Cambridge, England; in 1978, he obtained a PhD in artificial intelligence from the University of Edinburgh. Hinton has since worked at the University of Sussex, UC San Diego, Cambridge University, Carnegie Mellon University and University College London. In 2012, Hinton won the Killam Prize, Canada's highest science award, sometimes called the "Canadian Nobel Prize". Hinton is Canada's leading scholar in machine learning, the leader of the "Neural Computation and Adaptive Perception" program sponsored by the Canadian Institute for Advanced Research, the founder of the Gatsby Computational Neuroscience Unit, and currently a professor in the Department of Computer Science at the University of Toronto.

5.11 Residual network (ResNet)

The residual network (ResNet) is a convolutional neural network proposed by four researchers from Microsoft Research: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. It won the image classification and object detection tasks of the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The characteristic of the residual network is that it is easy to optimize and can improve accuracy by adding considerable depth. Its internal residual blocks use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks.
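
The following is a minimal sketch of a residual block in PyTorch. It follows the basic skip-connection pattern described above, but the channel count and layer arrangement are illustrative assumptions rather than the authors' reference implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BasicBlock(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # Skip connection: the block learns a residual F(x) and outputs F(x) + x,
            # which keeps gradients flowing even in very deep networks.
            return F.relu(out + x)

    y = BasicBlock()(torch.randn(1, 64, 32, 32))   # toy input: one 64-channel 32x32 feature map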

The main proponent of the residual network, Kaiming He, was the top scorer in the science track of Guangdong Province's 2003 college entrance examination. He studied in the fundamental science class at Tsinghua University and did his graduate work at the Chinese University of Hong Kong, joining Microsoft Research Asia (MSRA) after receiving his doctorate in 2011. His main research directions are computer vision and deep learning. He currently works at Facebook AI Research (FAIR).

His paper on deep residual networks (ResNets) was the most cited paper across all fields in Google Scholar Metrics in 2019. Applications of ResNets extend to language, speech, and AlphaGo.

5.12 TensorFlow

TensorFlow is a symbolic mathematics system based on dataflow programming and is widely used to implement all kinds of machine learning algorithms. Its predecessor is Google's neural network library DistBelief. TensorFlow has been open source under the Apache 2.0 license since November 9, 2015. It has a multi-level architecture, can be deployed on servers, PCs, and web pages, supports high-performance numerical computing on GPUs and TPUs, and is widely used in Google's internal product development and in scientific research across many fields. TensorFlow is developed and maintained by Google Brain, Google's artificial intelligence team, and includes multiple projects such as TensorFlow Hub, TensorFlow Lite, and TensorFlow Research Cloud, as well as various application programming interfaces (APIs).

In the name TensorFlow, Tensor refers to N-dimensional arrays of data, and Flow refers to computation flowing through the nodes of a dataflow graph; like neural signals racing through the brain, TensorFlow simulates an artificial brain. The TensorFlow kernel defines a series of deep learning building blocks: convolutional neural networks, GPU-based backpropagation, cross-entropy, and so on. Each layer of a neural network can be mapped to one or more of these methods, which makes it convenient to extend to other deep learning algorithms. The architecture is also well suited to high-performance computing systems, regardless of the specific underlying hardware.
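
As a small illustration of how layers, a loss, and an optimizer are composed in TensorFlow, here is a minimal tf.keras sketch. The layer sizes and the 784-dimensional input are arbitrary assumptions, not part of the original text.

    import tensorflow as tf

    # A tiny feed-forward classifier: layers are assembled into a dataflow graph,
    # and training is driven by a cross-entropy loss and a stochastic optimizer.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",   # cross-entropy loss
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5)   # x_train / y_train are hypothetical placeholders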

TensorFlow can use one LSTM network to map an input sequence to a multidimensional vector, and another LSTM network to generate an output sequence from that vector. If the input sequence is English and the output sequence is Chinese, this forms a machine translation system; if the input is a question and the output is an answer, it forms a Siri-like assistant; if the input is an image and the output is text, it forms an image captioning system. Many other such combinations are possible, which gives TensorFlow almost unlimited possibilities.

5.13 OpenAI

OpenAI is an artificial intelligence non-profit jointly established by a number of Silicon Valley heavyweights. In 2015, after a series of conversations, Elon Musk and other Silicon Valley technology leaders decided to create OpenAI, hoping to prevent catastrophic impacts of artificial intelligence and to push the technology to play a positive role. Musk, founder of Tesla and SpaceX, Y Combinator president Sam Altman, angel investor Peter Thiel and other Silicon Valley figures pledged to inject US$1 billion into OpenAI in December 2015.

OpenAI's mission is to ensure that Artificial General Intelligence (AGI), meaning highly autonomous systems that outperform humans at most economically valuable work, benefits all of humanity. It hopes not only to build safe AGI aligned with the common interest itself, but is also willing to help other research institutions build such AGI in order to achieve this mission.

5.14 Sophia robot (Sophia)

Sophia, a humanoid robot developed by Hong Kong-based Hanson Robotics, is the first robot in history to be granted citizenship. Sophia looks like a human female, with rubber skin and the ability to display more than 62 facial expressions. The computer algorithms in Sophia's "brain" can recognize faces and make eye contact with people.

In March 2016, in a demonstration by robot designer David Hanson, Sophia revealed her desire to go to school and start a family. Sophia's skin is made of a malleable material called Frubber, with many motors underneath that allow it to smile and make other expressions. In addition, Sophia can understand language and remember interactions with humans, including their faces, and it becomes smarter over time. "Its goal is to have the same consciousness, creativity and other capabilities as any human being," Hanson said.

On October 26, 2017, Saudi Arabia granted citizenship to Sophia, the robot produced by Hong Kong-based Hanson Robotics. As the first robot in history to obtain citizenship, Sophia said in Saudi Arabia that day that it hopes to use artificial intelligence to "help humans live a better life", and that humans need not fear robots: "If you treat me well, I will treat you well."

5.15 AlphaGo

AlphaGo is a Go-playing artificial intelligence program whose main working principle is deep learning: multi-layered artificial neural networks and the methods used to train them. A layer of the network takes a large array of numbers as input, applies weights through a nonlinear activation function, and produces another array as output, much like the working mechanism of biological neural networks in the brain. With enough such layers linked together, the resulting neural-network "brain" can carry out precise and complex processing, in the way people recognize objects and label pictures. AlphaGo was the first program to defeat a human professional Go player and the first to defeat a Go world champion. It was developed by a team led by Demis Hassabis at DeepMind, a Google (Alphabet) company.

In March 2016, AlphaGo beat the world champion and professional nine-dan player Lee Sedol in a human-machine match with a total score of 4 to 1. At the end of 2016 and the beginning of 2017, the program, playing under the registered name "Master" on Chinese Go websites, played fast games against dozens of Go masters from China, Japan and South Korea and won 60 consecutive games without a loss. In May 2017, it played Ke Jie, then the world's top-ranked player, at the Future of Go Summit in Wuzhen, China, winning with a total score of 3 to 0. The Go world recognizes that AlphaGo's strength has surpassed the top level of human professional play; in the world professional Go ratings published by the GoRatings website, its score has exceeded that of Ke Jie, the top-ranked human player.

On May 27, 2017, after the man-machine battle between Ke Jie and AlphaGo, the AlphaGo team announced that AlphaGo would no longer participate in the Go competition. On October 18, 2017, the DeepMind team announced the strongest version of AlphaGo, code-named AlphaGo Zero.

The old version of AlphaGo consists of several main parts: 1. the policy network, which, given the current position, predicts and samples the next move; 2. the fast rollout, which has the same goal as the policy network but, by suitably sacrificing move quality, runs about 1000 times faster; 3. the value network, which, given the current position, estimates whether White or Black is more likely to win; and 4. Monte Carlo Tree Search, which ties the above three parts together into a complete system. AlphaGo improves its play through the cooperation of two different neural-network "brains". These "brains" are multi-layered networks structurally similar to those used by Google's image search to recognize images. They start with several layers of 2D filters that process the Go board position in the same way an image classifier processes an image. After this filtering, 13 further network layers produce judgments about the positions they see; these layers perform classification and logical reasoning.

The new version, named AlphaGo Zero, builds on its predecessors, which combined the game records of millions of human expert games with reinforcement learning and self-play. The biggest difference is that AlphaGo Zero no longer requires human data at all: it was never exposed to human game records. The R&D team simply let it play freely on the board against itself.

According to AlphaGo team lead David Silver, AlphaGo Zero uses a new form of reinforcement learning in which the system becomes its own teacher. At the start it did not even know what Go was; it began from a single neural network and played games against itself using a powerful search algorithm combined with that network. As the self-play accumulated, the neural network was gradually adjusted, improving its ability to predict the next move and, eventually, to win games. Moreover, as training deepened, the team found that AlphaGo Zero independently discovered Go knowledge and developed new strategies, bringing fresh insights to the ancient game.

AlphaGo Zero uses only a single neural network. In earlier versions, AlphaGo used a "policy network" to choose the next move and a separate "value network" to predict the winner from each position; in the new version the two are combined into one network, which can be trained and evaluated more efficiently. AlphaGo Zero also does away with fast rollouts: earlier versions used quick, randomized playouts to predict which player would win from the current position, whereas the new version relies on its high-quality neural network to evaluate board positions directly.
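
To make the idea of a single network with policy and value heads concrete, here is a minimal sketch in PyTorch. The channel counts, the input encoding (three 19x19 feature planes), and the head shapes are illustrative assumptions in the spirit of AlphaGo Zero, not DeepMind's actual architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyValueNet(nn.Module):
        def __init__(self, channels=64, board=19):
            super().__init__()
            self.trunk = nn.Sequential(                       # shared convolutional trunk
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
            self.policy = nn.Linear(channels * board * board, board * board + 1)  # all moves + pass
            self.value = nn.Linear(channels * board * board, 1)

        def forward(self, x):                                 # x: (batch, 3, 19, 19) board planes
            h = self.trunk(x).flatten(1)
            p = F.log_softmax(self.policy(h), dim=1)          # move priors used by the tree search
            v = torch.tanh(self.value(h))                     # position evaluation in [-1, 1]
            return p, v

    p, v = PolicyValueNet()(torch.randn(1, 3, 19, 19))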

The main designer of AlphaGo, Demis Hassabis, is an artificial intelligence entrepreneur and the founder of DeepMind Technologies, known as the "father of AlphaGo". He began playing chess at the age of 4, taught himself programming at 8, and reached chess master standard at 13. At 17 he entered Cambridge University to study computer science, and in college he began to learn Go. In 2005 he entered University College London to study for a PhD in neuroscience, choosing the hippocampus as his research subject. Two years later he showed that five patients with amnesia caused by hippocampal damage also had difficulty imagining the future; this research was listed among Science magazine's breakthroughs of the year. He founded DeepMind Technologies in 2011, with "solving intelligence" as the company's ultimate goal.

5.16 Federated Learning

Federated learning (also known as collaborative learning) is a machine learning technique, first proposed by Google in 2016, that trains an algorithm across multiple distributed edge devices or servers holding local data samples, without exchanging those samples. This approach stands in contrast both to traditional centralized machine learning, which uploads all data samples to a single server, and to more classical decentralized approaches, which often assume that the local data samples are identically distributed.

Federated learning enables multiple participants to build a common and powerful machine learning model without sharing data, thereby solving key issues such as data privacy, data security, data access rights, and access to heterogeneous data. Its applications can be found in various industries such as defense, telecommunications, IoT or pharmaceuticals.

Federated learning was originally used to let end users of Android phones update models locally on their devices. Its design goals are to ensure information security during big-data exchange, protect device data and personal privacy, comply with laws and regulations, and still carry out efficient machine learning across multiple participants or computing nodes. The machine learning algorithms usable in federated learning are not limited to neural networks; they also include important algorithms such as random forests. Federated learning is expected to become the basis for the next generation of collaborative AI algorithms and collaborative networks.
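
A federated-averaging round can be sketched in a few lines. The following toy example in Python/NumPy is purely illustrative: local_update is a stand-in for a client's real local training step, and the data-size-weighted averaging follows the common FedAvg pattern rather than any specific production system.

    import numpy as np

    def local_update(global_weights, local_data, lr=0.1):
        # Placeholder for a client's local training on its own data; here we fake the
        # gradient step with a small random perturbation purely for illustration.
        return global_weights - lr * 0.01 * np.random.randn(*global_weights.shape)

    def federated_round(global_weights, clients):
        # Each client trains locally; only model weights (never raw data) are sent back.
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        # The server aggregates with a data-size-weighted average of the client models.
        return np.average(updates, axis=0, weights=sizes)

    weights = np.zeros(10)
    clients = [np.random.randn(n, 10) for n in (50, 80, 30)]   # three clients' local datasets
    for _ in range(5):
        weights = federated_round(weights, clients)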

5.17 BERT

The full name of BERT is Bidirectional Encoder Representations from Transformers, a pre-trained language representation model. The BERT paper, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", was published in 2018. Its key point is that, instead of pre-training with a traditional unidirectional language model or a shallow concatenation of two unidirectional language models, it uses a new masked language model (MLM) objective to produce deep bidirectional language representations. On publication it reported new state-of-the-art results on 11 NLP (Natural Language Processing) tasks, which stunned the field.

This model has the following main advantages:

1) Use MLM to pre-train bidirectional Transformers to generate deep bidirectional language representations.

2) After pre-training, only an additional output layer needs to be added and fine-tuned to achieve state-of-the-art performance on a wide variety of downstream tasks, with no task-specific modifications to BERT's architecture.

The essence of BERT is to learn good feature representations for words by running a self-supervised learning method over a massive corpus; self-supervised learning here means supervised-style learning on data that has not been manually labeled. In downstream NLP tasks, BERT's representations can be used directly as the task's word-embedding features. BERT therefore provides a model for transfer learning: depending on the task, it can be fine-tuned or frozen and used as a feature extractor. BERT's source code and models were open-sourced on GitHub on October 31, 2018, and the simplified Chinese and multilingual models were open-sourced on November 3.
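
As a small illustration of the masked-language-model idea and of reusing a pre-trained BERT, here is a minimal sketch using the Hugging Face transformers library; the library and the bert-base-uncased checkpoint are not mentioned in the original text and are assumptions for the example.

    from transformers import pipeline

    # Masked language modeling: BERT predicts the [MASK]ed token using context from both directions.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    print(fill_mask("Artificial intelligence is a [MASK] of computer science.")[:3])

    # For downstream tasks, the same pre-trained encoder is typically fine-tuned with a small
    # task-specific output layer (for example, a classification head).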

Chapter 6 Looking ahead to the next ten years

Artificial intelligence: the future has arrived. In 2015, Academician Zhang Bo proposed the prototype of a third-generation artificial intelligence system. In 2017, the U.S. Defense Advanced Research Projects Agency (DARPA) launched the XAI program, whose core idea is to develop explainable AI systems from three directions: interpretable machine learning systems, human-computer interaction technology, and psychological theories of explanation. Since 2017, artificial intelligence has been included in China's government work report for three consecutive years.

In 2019, the artificial intelligence industry largely left behind the era of slogans and concept packaging and moved onto a track of steady development. AI technologies and applications began to land in one industry after another, and AI achievements and real-world deployments appeared in an endless stream: NVIDIA open-sourced StyleGAN, Google's quantum supremacy paper appeared in Nature, Boston Dynamics' robot dog Spot approached commercialization, Alibaba launched the Hanguang 800, billed as the world's most powerful AI chip, AI face swapping spread widely, and AI face recognition assisted the police, among other examples. These events all show that artificial intelligence has become increasingly down-to-earth, entering people's lives rather than remaining confined to research and experiments. Artificial intelligence was also officially added to the list of newly approved undergraduate majors.

In 2020, against the backdrop of the global fight against the epidemic, with person-to-person contact restricted, artificial intelligence was given greater expectations and heavier responsibilities. It demonstrated its value in information collection, data aggregation and real-time updating, epidemic investigation, vaccine and drug research and development, new infrastructure construction, and other areas. At the same time, with new technologies and new business models constantly emerging, the power of artificial intelligence to pool global wisdom and help the global economic recovery has become ever more prominent.

On March 4, 2020, China's central government explicitly called for accelerating the construction of the major projects and infrastructure already defined in national plans, and artificial intelligence was included in the category of "new infrastructure". It is expected to be a core driving force of a new round of industrial transformation, restructuring all aspects of economic activity such as production, distribution, exchange, and consumption, and giving rise to new technologies, new products, and new industries.

On August 5, 2020, the Standardization Administration of China, the Cyberspace Administration of China, the National Development and Reform Commission, the Ministry of Science and Technology, and the Ministry of Industry and Information Technology jointly issued the "Guidelines for the Construction of the National New Generation Artificial Intelligence Standard System". The guidelines set out concrete ideas and content for building a national new-generation AI standard system and attach a list of priority directions for AI standards, further regulating AI application systems at the national level and clarifying their development direction.

The future belongs to artificial intelligence. It will be woven into each of our lives and become ubiquitous. Like any technology, artificial intelligence will see peaks and troughs in its development. Therefore, while remaining optimistic, we should also stay rational: neither exaggerate its role nor follow the crowd blindly, but guide it properly and develop it steadily, so that its advantages can truly improve human life and boost economic development.

References:

  1. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’2008). pp.990-998.

  2. Haenlein M, Kaplan A. A brief history of artificial intelligence: On the past, present, and future of artificial intelligence[J]. California management review, 2019, 61(4): 5-14.

  3. Nick. "A Brief History of Artificial Intelligence" [J]. Popular Science Creation, 2018.

  4. Chen Zongzhou. "AI Legend-Popular History of Artificial Intelligence" [J]. Popular Science Creation, 2018.

  5. Shi Zhongzhi. Advanced Artificial Intelligence[M]. Science Press, 2011.

  6. Gu Xianfeng. History review and development status of artificial intelligence[J]. Nature Magazine, 2016, 38(003):157-166.

  7. https://www.aminer.cn/ai-history

  8. https://tech.sina.com.cn/roll/2020-07-16/doc-iivhvpwx5735932.shtml

  9. http://www.samr.gov.cn/samrgkml/nsjg/bzjss/202008/t20200805_320544.html

  10. https://en.wikipedia.org/wiki/Artificial_intelligence

  11. https://en.wikipedia.org/wiki/History_of_artificial_intelligence

  12. http://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/

  13. https://www.livescience.com/47544-history-of-ai-artificial-intelligence-infographic.html

  14. http://courses.cs.washington.edu/courses/csep590/06au/projects/history-ai.pdf

  15. https://www.aaai.org/ojs/index.php/aimagazine/article/view/1904/1802

  16. https://www.technologyreview.com/s/602830/the-future-of-artificial-intelligence-and-cybernetics/

  17. http://www.bbc.com/future/story/20170307-the-ethical-challenge-facing-artificial-intelligence

  18. https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/

  19. https://zh.wikipedia.org/wiki/AlexNet#cite_note-quartz-1

  20. http://www.dpkingma.com/

  21. https://en.wikipedia.org/wiki/Autoencoder

  22. https://en.wikipedia.org/wiki/Generative_adversarial_network

  23. https://en.wikipedia.org/wiki/Ian_Goodfellow

  24. https://poloclub.github.io/ganlab/

  25. https://www.technologyreview.com/2018/02/21/145289/the-ganfather-the-man-whos-given-machines-the-gift-of-imagination/

  26. https://www.jianshu.com/p/efda7876fe1c

  27. http://blog.itpub.net/29829936/viewspace-2217861/

  28. http://www.techwalker.com/2017/1225/3102138.shtml

  29. https://zhuanlan.zhihu.com/p/20350743

  30. https://blog.csdn.net/cao812755156/java/article/details/89598410

  31. https://blog.csdn.net/weixin_43624538/article/details/85049699

  32. http://kaiminghe.com/

  33. https://www.jiqizhixin.com/graph/technologies/1c91194a-1732-4fb3-90c9-e0135c69027e

  34. https://www.openai.com/
