New Book Launch | "This Is ChatGPT" Preface by Liu Jiang, Editor-in-Chief

A Miracle

The subject of this book, ChatGPT, is a miracle.

It has been almost half a year since its release in November 2022, and in that time ChatGPT has attracted attention and created impact surpassing nearly every previous sensation in the history of information technology.

Its user base reached 1 million in 2 days and 100 million in 2 months, breaking the record previously held by TikTok. After the iOS app was released in May 2023, it unsurprisingly topped the overall Apple App Store chart as well.

For the first time in their lives, many people encountered a highly intelligent dialogue system that can correct its own mistakes. When writing, it will often deliver "serious nonsense" with great confidence, and may even get simple addition and subtraction wrong; but if you point out the error, or ask it to work step by step, it becomes remarkably reliable, laying out the steps in order and arriving at the correct answer. On some complicated tasks, you wait to watch it embarrass itself, only to be surprised when it calmly gives you a reasonable answer.

Many industry experts have also been won over by it:

Bill Gates, who was initially pessimistic and even voted against Microsoft's decision to invest in OpenAI in 2019, now compares ChatGPT to the PC and the Internet. Jensen Huang called it AI's "iPhone moment," OpenAI's Sam Altman compared it to the printing press, and Google CEO Sundar Pichai likened it to fire and electricity. Alibaba's Daniel Zhang (Zhang Yong) suggested: "Every industry, application, software product, and service is worth redoing on top of large-model capabilities." Many experts, with Elon Musk as a representative, have called for a pause in the development of powerful AI models, because ChatGPT's breakthrough capabilities may pose a threat to humanity.

At the recently concluded 2023 Zhiyuan Conference, hosted by the Beijing Academy of Artificial Intelligence (BAAI), Sam Altman confidently said that AGI is likely to arrive within ten years, and that global cooperation is needed to solve the problems it will bring. The three scientists who shared the Turing Award for pushing deep learning from the margins to center stage hold markedly different opinions:

  • Yann LeCun stated plainly that autoregressive large models such as GPT have fundamental flaws, and that a new path must be found centered on world models; consequently, he is not worried about an AI threat.

  • Yoshua Bengio, who appeared via video during another speaker's session, does not agree that the GPT route alone can lead to AGI (he favors combining Bayesian reasoning with neural networks), but he admitted that large models hold great potential, with no obvious ceiling from first principles. He therefore signed the open letter calling for a moratorium on AI development.

  • Geoffrey Hinton, who gave the closing speech, clearly agrees with his student Ilya Sutskever's view that large models can learn a compressed representation of the real world. He has realized that artificial neural networks, equipped with backpropagation (loosely speaking, a built-in mechanism for recognizing and correcting errors) and easy to scale, may soon surpass human intelligence. He has therefore also joined those warning about AI risks.

The comeback story of artificial neural networks, now represented by ChatGPT, counts among the most dramatic in the entire history of science and technology. The field was repeatedly dismissed and attacked within an AI community full of competing schools, and more than one pioneering genius came to a tragic end:

In 1943, Walter Pitts, then only 20 years old, proposed with Warren McCulloch a mathematical model of neural networks. Pitts never finished secondary school; he later broke with academia after falling out with his mentor Norbert Wiener, and died prematurely at 46 from excessive drinking;

In 1958, Frank Rosenblatt, then 30, actually realized a neural network in hardware with the perceptron; he drowned on his 43rd birthday;

David Rumelhart, the main proponent of backpropagation, was struck in his prime by a rare incurable disease: he began to suffer from dementia in 1998, in his 50s, and died in 2011 after fighting the illness for more than a decade.

…

Top conferences and academic giants like Marvin Minsky unceremoniously opposed or even rejected neural networks, forcing Hinton and others to adopt more neutral or obscure terms such as "associative memory," "parallel distributed processing," "convolutional networks," and "deep learning" to carve out room for the field to survive.

Hinton himself persisted in this unpopular direction for decades starting in the 1970s, moving from the United Kingdom to the United States before finally settling in Canada, then an academic backwater, where he worked with a small band of elite researchers to build a school of thought despite scarce funding.

It was not until 2012, when his doctoral student Ilya Sutskever and others used new methods to dominate the ImageNet competition, that deep learning became the mainstream of AI and spread into every industry.

In 2020, Sutskever led the team at OpenAI that opened the era of large models with the hundred-billion-parameter GPT-3.

ChatGPT's own origin story is also highly dramatic.

In 2015, 30-year-old Sam Altman and 28-year-old Greg Brockman teamed up with Elon Musk to recruit 30-year-old Ilya Sutskever and other top AI talents to co-found OpenAI, hoping to establish a neutral frontier AI research force outside Google, Facebook, and the other giants, ambitiously setting human-level artificial intelligence as its goal.

At the time, media coverage mostly ran under headlines about Musk backing a non-profit AI organization, and few people were optimistic about OpenAI. Even a key figure like Ilya Sutskever went through some soul-searching before joining.

Over the next three years, they attacked on multiple fronts (reinforcement learning, robotics, multi-agent systems, AI safety) without achieving particularly convincing results. The main backer, Musk, grew dissatisfied with the progress and wanted to take over management directly; after the board rejected him, he walked away entirely.

In March 2019, Sam Altman took over as CEO of OpenAI, and within a few months he had set up a commercial entity and secured a US$1 billion investment from Microsoft, laying the groundwork for what followed.

On the research side, Alec Radford, who graduated from Olin College of Engineering in 2014 and joined OpenAI two years later, began to shine. As a core author, under the guidance of Ilya Sutskever and others, he contributed a string of pioneering works: PPO (2017), GPT-1 (2018), GPT-2 (2019), Jukebox (2020), ImageGPT (2020), CLIP (2021), Whisper (2022), and more. In particular, the 2017 "sentiment neuron" work established the minimalist "predict the next character" architecture together with the technical route of large models, massive compute, and big data, which had a decisive influence on the later GPT series.

The development of GPT has not been smooth sailing.

As Figure 1 below clearly shows, after the GPT-1 paper was published, OpenAI's deliberately simpler decoder-only architecture (strictly speaking, an autoregressive decoder) received little attention. The limelight was stolen a few months later by Google's BERT (an encoder-only architecture, strictly speaking a non-autoregressive encoder), which spawned a long line of highly influential xxBERT-style work.


Figure 1: The large-model evolutionary tree, from the April 2023 paper "Harnessing the Power of LLMs in Practice" by Jingfeng Yang et al. of Amazon

Even today, BERT has accumulated more than 68,000 citations, still an order of magnitude more than GPT-1's fewer than 6,000. The two papers took different technical routes, and at the time almost everyone, in academia and industry alike, chose the BERT camp.

GPT-2, released in February 2019, increased the maximum parameter count to 1.5 billion. Trained on larger-scale, higher-quality, and more diverse data, the model began to show strong general-purpose capabilities.

What put GPT-2 in the headlines of the technical community was not the research itself (even today its paper's citations are only in the low 6,000s, far fewer than BERT's), but OpenAI's decision, on safety grounds, to open-source only a small 345-million-parameter model, which caused an uproar. The community's impression that OpenAI was "not open" dates from here.

Around the same time, OpenAI also studied how scale affects language-model capability and proposed the "scaling law," which set the main direction of the whole organization: large models. Other lines such as reinforcement learning and robotics were cut. Commendably, most of the core R&D staff chose to stay, switch research directions, set aside their egos, and concentrate on the big goal. Many turned to engineering and data work, or repositioned their research around large models (for example, reinforcement learning later played a major role in GPT-3.5 and its successors). This organizational flexibility was also an important factor in OpenAI's success.

With the emergence of GPT-3 in 2020, perceptive people in the small NLP circle began to realize the great potential of OpenAI's technical route. In China, the Beijing Academy of Artificial Intelligence (BAAI) jointly launched models such as GLM and CPM with Tsinghua University and other institutions, actively promoting the concept of large models in domestic academic circles. As Figure 1 shows, after 2021 the GPT route completely gained the upper hand, while the BERT branch of the evolutionary tree almost stopped growing.

At the end of 2020, Dario and Daniela Amodei, sibling vice presidents at OpenAI, left with a number of colleagues from the GPT-3 and safety teams to found Anthropic. Dario Amodei's position at OpenAI had been extraordinary: besides Ilya Sutskever, he was the other architect of the technology roadmap, and he directed the GPT-2 and GPT-3 projects as well as the safety effort. Many core authors of the GPT-3 and scaling-law papers went with him.

A year later, Anthropic published the paper "A General Language Assistant as a Laboratory for Alignment," studying alignment problems through a chat assistant. That work gradually evolved into the intelligent chat product Claude.


In June 2022, the paper "Emergent Abilities of Large Language Models" was released. Its first author was Jason Wei, a Google researcher only two years out of his undergraduate degree at Dartmouth College (he moved to OpenAI in February this year amid the wave of departures from Google). The paper studies the emergent abilities of large models: abilities absent in small models that appear only once model scale passes a certain threshold. In other words, the familiar "quantitative change leads to qualitative change."

By mid-November, the OpenAI employees developing GPT-4 received instructions from management to suspend all work and build a chat tool at full speed, because of competition. Two weeks later, ChatGPT was born. What happened next is history.

The industry speculates that OpenAI's management had learned of Anthropic's progress on Claude, realized the product's huge potential, and decided to strike first. This shows the core team's remarkable strategic judgment. Bear in mind that even ChatGPT's core developers did not know why the product became such a hit after launch ("My parents finally know what I'm doing"), and had felt nothing amazing when trying it themselves.

In March 2023, after half a year of "evaluation, adversarial testing, and iterative improvements to models and system-level mitigations," GPT-4 was released.

Microsoft Research's study of an internal version (more capable than the publicly released online version) concludes: "Across all these tasks, GPT-4 performs surprisingly close to human performance...Given the breadth and depth of GPT-4, we believe that it can reasonably be regarded as an early (but still incomplete) version of artificial general intelligence (AGI) systems."

Since then, enterprises and research institutions at home and abroad have followed suit, with one or more new models launching almost every week; but OpenAI still leads in comprehensive capability, and Anthropic is the only one that can compete with it.

Many people ask: why didn't China produce ChatGPT? In fact, the correct question (prompt) is: why was OpenAI the only organization in the world able to make ChatGPT? What explains their success? That question is still worth pondering today.

ChatGPT, what a miracle.

A Strange Man

The author of this book, Stephen Wolfram, is a strange man.


Although he is not a household-name technology celebrity like Musk, within the circle of technology geeks he is famous indeed, and has been called "the smartest person alive."

Sergey Brin, one of Google's founders, was drawn to intern at Wolfram's company during college. Wang Xiaochuan, founder of Sogou and Baichuan Intelligence, is also a famous die-hard fan, "following him for many years with reverence and fanaticism."

Wolfram was known as a child prodigy. He disdained to read the "stupid books" recommended by school, was bad at arithmetic, and refused to work on problems that had already been solved; at first, his teachers thought the child would not amount to much.

Then, at the age of 13, he wrote several physics books on his own, one of them titled "Subatomic Particle Physics."

At 15, he published a serious high-energy physics paper, "Hadronic Electrons?", in the Australian Journal of Physics, proposing a new form of high-energy electron-hadron coupling. The paper has even garnered 5 citations.


Wolfram spent a few years at famous British institutions, Eton College and then Oxford, rarely attending classes. Hating problems that others had already solved, he left both before graduating, and at the age of 20 received a Ph.D. outright from Caltech.

He then stayed on at Caltech as a faculty member.

In 1981, Wolfram received a MacArthur Fellowship (the "Genius Grant") in its very first class, as its youngest recipient. His cohort was full of masters from many disciplines, including Derek Walcott, winner of the 1992 Nobel Prize in Literature.

He quickly lost interest in pure physics. In 1983 he moved to the Institute for Advanced Study in Princeton and began studying cellular automata, hoping to find deeper laws underlying natural and social phenomena.

This turn had a huge impact. He became one of the founders of the discipline of complex systems, and some consider the work worthy of a Nobel Prize. In his 20s, he took part in the early work of the Santa Fe Institute alongside Nobel laureates Murray Gell-Mann and Philip Anderson (it was Anderson who published the 1972 article "More Is Different," proposing the concept of emergence), founded the Center for Complex Systems Research at UIUC, and launched the academic journal Complex Systems.

To make computer experiments on cellular automata more convenient, he developed the mathematical software Mathematica (the name was suggested by his friend Steve Jobs), and then founded the software company Wolfram Research, becoming a successful entrepreneur.

The power of Mathematica can be felt intuitively in the highly abstract yet clear code the book later uses to explain ChatGPT. Honestly, it made me want to study this software and its related technology seriously.

In 1991, Wolfram returned to research mode, working in seclusion at night for ten years, buried in experiments and writing, and published a masterwork of more than 1,000 pages: A New Kind of Science.

The book's central claim: everything is computation, and the universe's many complex phenomena, whether man-made or naturally occurring, can be simulated by simple computations following a few rules.

An Amazon book review may put it more intuitively: "Galileo once claimed that nature is written in the language of mathematics, but Wolfram believes that nature is written in a programming language (and a very simple one at that)."

Moreover, these phenomena or systems, from the workings of the human brain to the evolution of weather systems, are computationally equivalent, with the same degree of complexity. He calls this the "principle of computational equivalence."
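The flavor of such "simple programs" can be seen in Wolfram's favorite example, the Rule 30 cellular automaton: each cell's next state depends only on itself and its two neighbors, yet the pattern it generates looks intricate and random. A minimal sketch (the grid width and step count here are arbitrary choices for display):

```python
# Rule 30: new cell = left XOR (center OR right).
# A one-line update rule that nonetheless produces complex behavior.
def step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

cells = [0] * 15 + [1] + [0] * 15   # start from a single black cell
for _ in range(12):                 # print successive generations
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

Run it and a triangular, chaotic-looking pattern grows from a single cell, a small taste of "simple rules, complex phenomena."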

The book sold very well, thanks to its accessible language and nearly a thousand illustrations, but it also drew much criticism from academia, especially his old physics colleagues. The criticisms concentrate on a few points: the book's theories are not original (Turing's work on computability, Conway's Game of Life, and so on cover similar ground); it lacks mathematical rigor, so many conclusions do not hold up (for example, the claim that natural selection is not the root cause of biological complexity); and Scott Aaronson, author of Quantum Computing Since Democritus, pointed out that Wolfram's framework cannot explain the Bell test results at the very core of quantum mechanics.

Wolfram responded to the criticism by launching Wolfram|Alpha, a knowledge computation engine that many consider the first truly practical artificial intelligence technology. Combining knowledge with algorithms, it lets users issue commands in natural language and returns answers directly. Users worldwide can access this powerful system through the web, Siri, Alexa, and the ChatGPT plugin.

If we look at Wolfram's theory through the lens of neural networks like ChatGPT, we find a striking coincidence: GPT's underlying autoregressive architecture, compared with many machine-learning models, can indeed be classified as "computation with simple rules," and its abilities emerge through the accumulation of quantitative change.

Wolfram often provides technical support for Hollywood science-fiction films, using Mathematica and the Wolfram Language to generate realistic effects. Famous examples include the gravitational lensing of the black hole in Interstellar, and the magical alien language in Arrival which, once mastered, lets its speakers transcend time and space, all highly imaginative.


His eventual departure from academia was tied to a feud with colleagues at Princeton. His teacher Feynman wrote to persuade him: "You don't understand ordinary people; to you they are just fools."

He did his own thing, and lived a wonderful life.

Stephen Wolfram is amazing.

A Strange Book

A strange subject plus a strange man: this is, of course, a strange book.

That a master like Stephen Wolfram would write a popular book on a topic of such broad interest is itself something of a miracle.

He turned from pure physics to complex systems 40 years ago precisely because he wanted to get at the first principles of phenomena such as human intelligence, and he has deep accumulated insight. Thanks to his extensive network, he has communicated with key figures such as Geoffrey Hinton, Ilya Sutskever, and Dario Amodei and has first-hand information, which ensures the book's technical accuracy. No wonder OpenAI's CEO called it "the best explanation of ChatGPT's principles" after it was published.

The book is divided into two parts. It is short, but it touches all the most important points about ChatGPT, and the explanations are accessible and thorough.

When I launched the "ChatGPT Learning Camp" in the Turing community, I had many exchanges with students of various technical levels and professional backgrounds, and found that correctly establishing a few core concepts is essential to understanding large models. Without these pillars, even a senior algorithm engineer's understanding may be badly skewed.

For example, one core idea of the GPT technical route is to use the simplest autoregressive generative architecture to solve the problem of unsupervised learning: learning the mapping from raw, unlabeled data to the world. The autoregressive generative architecture is the book's oft-quoted "just add one word at a time." The crucial point is that this architecture was chosen not for the sake of generation tasks, but for understanding and learning, to realize general-purpose capability in the model. In the years before (and even after) 2020, many professionals in the industry took it for granted that GPT was merely for generation tasks and chose to ignore it. Little did they note that the very title of the GPT-1 paper is "Improving Language Understanding by Generative Pre-Training."
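The "just add one word at a time" loop can be sketched in a few lines. Below is a purely illustrative toy (the probability table is invented; a real model like GPT computes next-token probabilities with a neural network over the entire context):

```python
# Toy sketch of autoregressive generation: "just add one word at a time".
# The probability table is invented for illustration only.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(first_word, max_words=5):
    """Greedy decoding: always append the most probable next word."""
    words = [first_word]
    while len(words) < max_words and words[-1] in NEXT_WORD_PROBS:
        candidates = NEXT_WORD_PROBS[words[-1]]
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

print(generate("the"))  # the cat sat down
```

The point of the sketch is the loop structure, not the table: score every possible continuation, append one word, repeat. Everything GPT "says" is produced this way.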

Another example: readers without much technical or machine-learning background, trying to follow the latest AI developments, often stumble at the very first step, the venerable basic concepts of "model" and "parameters" (the weights of a neural network), which are not so easy to explain well. In this book, the author thoughtfully explains them with intuitive examples (functions and knobs). (See the section "What Is a Model?")
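The functions-and-knobs intuition can be made concrete with a toy sketch (the straight-line model and the data points below are invented for illustration): a model is just a function with adjustable parameters, and a loss measures how badly a given knob setting fits the data.

```python
# A "model" is a function with tunable parameters (the "knobs").
# Here the model is a straight line with two knobs, w and b.
def model(x, w, b):
    return w * x + b

# Training data the model should fit (invented for illustration).
data = [(1, 3), (2, 5), (3, 7)]  # points on the line y = 2x + 1

def loss(w, b):
    """Mean squared error: how badly the current knob settings fit."""
    return sum((model(x, w, b) - y) ** 2 for x, y in data) / len(data)

print(loss(2, 1))  # perfect knob settings -> 0.0
print(loss(0, 0))  # bad knob settings -> large error
```

A neural network is the same picture scaled up: a function with billions of knobs instead of two, and "training" means turning them until the loss is small.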

The several sections on neural networks are rich in figures and prose. I believe they will help readers of all kinds gain a deeper grasp of the nature of neural networks and their training process, along with concepts such as loss functions and gradient descent.
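Gradient descent itself fits in a few lines. A minimal one-knob sketch (the quadratic loss here is invented for illustration; real networks obtain gradients for millions of weights via backpropagation):

```python
# Minimal gradient descent on a one-parameter loss L(w) = (w - 3)^2,
# whose minimum is at w = 3. The derivative is dL/dw = 2 * (w - 3).
def grad(w):
    return 2 * (w - 3)

w = 0.0                  # initial knob setting
learning_rate = 0.1
for step in range(100):  # repeatedly step downhill along the gradient
    w -= learning_rate * grad(w)

print(round(w, 4))       # converges close to the minimum at 3
```

Each iteration nudges the knob opposite to the slope of the loss, which is the whole idea behind training, just repeated across a vast number of weights at once.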

The author does not neglect the ideas behind the techniques. For example, the following passages nicely capture the significance of deep learning:

The big breakthrough in "deep learning" around 2012 was related to the discovery that, in some sense, it can be easier to do (at least approximate) minimization when many weights are involved than when there are relatively few.

In other words, somewhat counterintuitively, complex problems can be easier to solve with neural networks than simple ones. The rough reason is that with many "weight variables" there are "many different directions" in the high-dimensional space that can lead us toward a minimum; with fewer variables, it is easy to get stuck in a local minimum, a "mountain lake" with no "direction out."

This paragraph makes clear the value of end-to-end learning:

In the early days of neural nets, people tended to think that one should "make the neural net do as little as possible." For example, when converting speech to text, it was believed that one should first analyze the audio, break it into phonemes, and so on. But it turned out that (at least for "human-like tasks") it is usually better to train the neural net "end to end" on the problem, letting it "discover" the necessary intermediate features, encodings, etc. for itself.

Mastering the "why" of these concepts helps in understanding the background of GPT.

The concept of embedding is crucial for algorithm researchers developing large models, for programmers building applications on top of them, and for general readers who want to understand GPT in depth. It is also the "central idea of ChatGPT," yet it is relatively abstract and not particularly easy to grasp. The section "The Concept of Embeddings" in this book is the best explanation of the concept I have ever seen, approaching it in three ways: diagrams, code, and prose. The many color figures in the later section "Meaning Space and Semantic Laws of Motion" deepen the concept further.
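The core intuition, that an embedding places each word at a point in a "meaning space" where similar words sit close together, can be sketched with toy numbers (the 3-dimensional vectors below are invented; real embeddings are learned and have hundreds or thousands of dimensions):

```python
import math

# Toy embeddings: each word is a point in a 3-dimensional "meaning space".
# The vectors are invented for illustration; real embeddings are learned.
embeddings = {
    "cat":    [0.9, 0.8, 0.1],
    "dog":    [0.8, 0.9, 0.1],
    "banana": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Near 1.0 means similar direction (similar meaning); near 0, unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))     # high
print(cosine_similarity(embeddings["cat"], embeddings["banana"])) # low
```

"cat" and "dog" point in nearly the same direction, while "banana" points elsewhere, which is exactly the geometric picture the book's diagrams convey.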

The same section closes with an introduction to tokens, the commonly used word fragments, with several intuitive English examples.

The subsequent introduction to ChatGPT's working principles and training process is likewise accessible yet rigorous. The more complex Transformer technology is covered in detail, and the book honestly acknowledges that current theory cannot yet explain why it works so well.

The first part closes by bringing in the author's theory of computational irreducibility, raising the understanding of ChatGPT to a higher level, similar to the general idea Ilya Sutskever has emphasized in multiple interviews: that GPT obtains a compressed representation of the world model through generation.

In my opinion, this passage is very thought-provoking:

What does it take to produce "meaningful human language"? In the past, we might have thought the human brain was essential. But now we know that ChatGPT's neural network can also do a very good job. …I strongly suspect that the success of ChatGPT hints at an important "scientific" fact: that meaningful human language is actually more structured and simpler than we knew, and that in the end there may be fairly simple rules describing how such language can be put together.

Language is a tool for serious thinking, decision-making, and communication. Judging by the order and difficulty with which children acquire it, compared with perception and motor action, it should be the hardest task in intelligence. Yet ChatGPT may well have cracked the code, as Wolfram suggests. This indicates that in the future we may further raise the overall level of intelligence substantially through a computational language or other representations.

Extending from this, progress in artificial intelligence may have similar effects across disciplines: subjects once considered hard may turn out, from another perspective, not to be so hard after all. With the added help of a general-purpose intelligent assistant like GPT, "some tasks have moved from basically impossible to basically feasible," and the technological level of all humanity may reach a new height.

The second part of the book introduces the comparison and combination of ChatGPT with the Wolfram|Alpha system, with many examples. If GPT's general intelligence makes it more human-like, then, like most humans, it is not naturally good at precise calculation and rigorous reasoning. Combining general-purpose models with special-purpose tools should also be a promising direction for the future.

One small regret is that the book covers only ChatGPT's pre-training, not the equally important fine-tuning steps that follow: supervised fine-tuning (SFT), reward modeling, and reinforcement learning. A better learning resource on that front is "State of GPT," the talk given at the Microsoft Build conference in May 2023 by Andrej Karpathy, a founding member of OpenAI and former head of AI at Tesla.


I have posted the video of that talk, along with refined Chinese-text slides, in the Turing community's "ChatGPT Learning Camp." A guided reading course for this book is also planned; everyone is welcome to join.

Liu Jiang and Wan Weigang will share content related to "This Is ChatGPT" on July 25. You are welcome to register and tune in!

Click to read the original text and join the ChatGPT Learning Camp.


Source: blog.csdn.net/turingbooks/article/details/131714280