What is the "intelligent emergence" of AI, and why understanding it is of great value to entrepreneurs, practitioners, and ordinary people...

Note: This article is the first in the series "AI30 Lectures". It is written for AI entrepreneurs, AI practitioners, and AI followers. Every article follows the principle of brevity and strives to explain complex insights clearly, in plain human language. In time, you may find these articles far-sighted and valuable, perhaps worth tens of billions.

1

Large AI models exhibit a phenomenon of intelligence emergence: when their scale exceeds roughly 60 billion parameters, they display new capabilities they never had before. This is what we call the "emergence of intelligence."

Why does this happen? The industry is still debating it, and there is no settled answer yet.

In today's article, I will use the simplest and most understandable language to help you understand the phenomenon of intelligence emergence in large AI models.

You will probably say: "Oh, come on. Even the experts haven't figured this out, and you want to explain it to an ordinary person like me in one short article?"

Don't be like that; have some faith in yourself. Just because academia has not settled a question does not mean ordinary people cannot understand it, because understanding a phenomenon and proving it with formulas are two different things.

In mathematics there are many propositions that are hard to prove but easy to grasp. You understand that 1+1=2, yet proving that equation rigorously requires graduate-level mathematics.

In physics this is even more common. Humans exploit many phenomena without understanding their principles. Take riding a bicycle: why doesn't the bicycle fall over once it exceeds a certain speed? The physics of this is still not fully settled, yet hundreds of millions of people ride bicycles every day.

I would also stress that, compared with proving something mathematically, understanding it intuitively is not "superficial". A correct intuitive understanding can guide the mathematical proof that follows.

Einstein realized that in order to derive relativity he first had to "experience" it, so he spent a great deal of time imagining what a person would see if he were riding on a beam of light.

Conversely, if you can describe a phenomenon with equations but cannot understand it intuitively, it often means you do not really understand it. As Richard Feynman put it, we can calculate the answers with the formulas, and yet "nobody really understands quantum mechanics."

The real challenge is this: to use concise, easy-to-understand language so that ordinary people can intuitively grasp "intelligence emergence" is itself an ambitious and difficult goal.

If that's the case, why challenge it? Because understanding the concept of "intelligent emergence" is of great value to you.

After reading this article, you will see that when you understand the concept of "intelligence emergence" from an intuitive level, it will become the basis for you to think about other AI problems; it will become the "Lego building blocks" for you to build larger and more complex thinking.

This means you will have more insight and foresight about the development of AI. You will be able to understand and use the related concepts, and you can also use them to show off in front of your friends.

Okay, enough preamble. Hop on, let's go!

2

In order for you to understand "intelligence emergence", I will do an experiment below, which I call a "picture experiment".

Step 1: Please see the picture below

[Image: Step 1, a 2x2 grid of four color blocks]

What do you see? Four color blocks, right?

Anything else? Look carefully, concentrate, look hard!

"Still just four color blocks," you say.

But if I say there is a secret hidden in the picture, can you see it?

Step 2: Can’t see it? Look at this picture again

[Image: Step 2, the same picture at 4x4 pixels]

Found anything? More patches of color? Anything else?

——Nothing else, you say.

Step 3: Let’s continue with this picture

[Image: Step 3, the same picture at 8x8 pixels]

Did you find anything this time? Look deeper! There is a secret in the picture!

Still can't see it?

Step 4: Let’s speed up and look at this picture again

[Image: Step 4, the same picture at 16x16 pixels]

Did you see anything?

Do you...seem to see something? But what exactly?

Look carefully and take a guess! (When conducting this experiment offline, no one has been able to guess it correctly so far)

Step 5: OK, keep up the good work, this picture

[Image: Step 5, the same picture at 32x32 pixels]

What is it? Can you see it? (When this experiment was conducted offline, some people could vaguely see it.)

Step 6: Last chance. Look at this picture:

[Image: Step 6, the same picture at 64x64 pixels]

Wow! You can see that, right? It's a no-brainer!

Yes, and not just you: at this point everyone can see it, effortlessly: a boat, water, distant mountains, the sun, clouds, trees, and the circular composition of the whole scene.

Congratulations, you just experienced an emergence yourself.

From step 1 to step 6, the pixel count of the image is quadrupled at each step (a small code sketch after the list shows how such images can be produced):

Step 1: 2x2 = 4 pixels

Step 2: 4x4 = 16 pixels

Step 3: 8x8 = 64 pixels

Step 4: 16x16 = 256 pixels

Step 5: 32x32 = 1024 pixels

Step 6: 64x64 = 4096 pixels
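If you want to reproduce the experiment yourself, here is a minimal sketch, assuming you have Pillow installed and some source image saved as "boat.png" (a hypothetical filename). It simply downsamples the picture to each of the six sizes, then blows the result back up so the blocks stay visible:

```python
# A minimal sketch of the "picture experiment": downsample one image to the
# six pixel scales listed above. "boat.png" is a placeholder filename.
from PIL import Image

src = Image.open("boat.png").convert("RGB")

for side in (2, 4, 8, 16, 32, 64):  # steps 1 through 6
    # Downsample with BOX averaging; each step quadruples the pixel count.
    tiny = src.resize((side, side), resample=Image.BOX)
    # Enlarge with nearest-neighbour so the blocks remain visible on screen;
    # this changes only the display size, not the information content.
    view = tiny.resize((256, 256), resample=Image.NEAREST)
    view.save(f"step_{side}x{side}.png")
```

The nearest-neighbour upscaling at the end is purely cosmetic; the information content is fixed by the small pixel count, which is exactly the point of the experiment.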

The result of the "picture experiment" is: no one can see the first 4 steps, very few people can vaguely recognize the 5th step, and at the 6th step, suddenly everyone can understand it.

The picture suddenly has meaning in step 6. This is emergence, a sudden understanding and sudden acquisition; there is no trace before emergence, and it is easy after emergence.

You may question: It’s not very sudden, after all, someone vaguely saw it in step 5.

You are right, and this is precisely a characteristic of emergence: emergence has a critical point (below, also called the "threshold"). Near this point, emergence neither reliably happens nor reliably fails to happen; it occurs unstably.

Conversely, when we observe that emergence occurs erratically, we can determine that the current feature size (e.g., pixel size) is just around the threshold of emergence.

In the above-mentioned "picture experiment", because the pixel size of the picture is around 32x32, the display becomes unstable. The specific manifestations are: some people can see the content in the picture, and some people cannot; sometimes this It can be seen in the 32x32 picture, but not in the other 32x32 picture. We can determine based on this: the pixel size of 32x32 is the threshold for a picture to emerge with meaning.

Please note that whether the image can express meaning is only related to the pixel size and has nothing to do with the image size. The image is too small to see clearly. It is a simple vision problem. Just use a magnifying glass. However, the small pixel size causes "cannot see clearly". ”, no mirror can help you. In addition, 32x32 is set as the threshold for the image to emerge from the image. This is an empirical value, and we will use it repeatedly later.

At the beginning of this article we said: The threshold for the emergence of new capabilities in AI is 60 billion parameters, which means that a large model with a parameter scale of about 60 billion will show unstable advanced intelligence: it can answer some questions, But if you slightly change the way you ask, it may become confused; it may answer the same question correctly this time, but it will get it wrong next time...and so on.

This is exactly the situation many people currently encounter when working with large models.

You may also object: the pixel count at step 5 is 32x32 and at step 6 it is 64x64. That is not a small gap. How can this be called a "sudden" emergence? With that many more pixels, it is not surprising at all!

Very good question. There are many academic explanations for "Why does emergence happen?" One of the explanations is:

Macroscopic emergence may simply be the result of linear changes in the microscopic factors that make up the system.

In human terms: microscopic gradual changes lead to macroscopic qualitative changes .

But let me remind you: microscopic gradual change does not negate macroscopic qualitative change (that is, sudden emergence), nor can macroscopic qualitative change be reduced to microscopic gradual change.

For example: a sandcastle is made of grains of sand. The sandcastle can collapse, but a single grain of sand cannot. "Collapse" is a macroscopic property that emerges from a large number of sand grains.

Above, we intuitively experienced what emergence is through the "image experiment", and then answered two key questions, one involving the stability and threshold of emergence, and the other involving the relationship between macro and micro.

Congratulations, you have explored the nature of emergence in some depth. Perhaps you did not realize it, but at the intuitive level your understanding of emergence is almost the same as that of the top scientists in this field. They will go on to study it further in mathematical language, while you stay at the intuitive level, and that is enough. Think of Einstein.

3

You may ask: Is the emergence in "image experiments" the same thing as the intelligent emergence of large AI models?

It is exactly the same thing.

The only difference is that one of them produces emergence by expanding the pixel scale , and the other produces emergence by expanding the parameter scale .

You may also be worried: So, are we just using analogies to approximately understand the emergence of intelligence in large AI models?

Of course not .

Analogy is an important way of thinking, but used improperly it leads to absurd conclusions, such as linking the motion of celestial bodies to the fate of human lives.

Our discussion above is not an analogy, but an intuitive, accurate, and essential experience and understanding of the intelligent emergence of large AI models.

In this process, the only technique we use is - "dimensionality reduction".

Humans can barely imagine high-dimensional systems, and AI models have an enormous number of dimensions, so their emergence cannot be grasped intuitively in the original space. We therefore reduce the dimensionality and use a two-dimensional image to walk through the phenomenon of emergence.

Does it feel easy to understand? You might even suspect that something this easy to grasp cannot be worth much, right?

So let's talk about the value of understanding emergence.

4

So, what value does it have for you to intuitively understand the emergence of AI intelligence ?

It is very, very, extremely valuable.

For entrepreneurs, this will help you see clearly the prospects and even the end of AI development - there is no need to emphasize how important end-game thinking is.

For practitioners, this will help you understand the nature of AI capabilities and guide your work.

For everyone, this will help you have better prediction and ability to adapt to changes in the AI ​​era.

Sounds like a very pragmatic thing? Let’s be specific.

Whether you are an entrepreneur, a practitioner, or an ordinary person, when you intuitively understand the emergence of intelligence, you will be able to better understand the following 7 profound questions.

Question 1: In addition to the 60 billion parameter scale, are there other thresholds that can allow AI to emerge intelligently?

There is indeed a view that says: no, from the scale of hundreds of billions of parameters onward, AI will have no new "aha moments".

But when you understand emergence, you realize that this view must be wrong.

We still use "picture experiments" to answer this question-insert a sentence, good experiments are just like this, they can answer different phenomena and different questions.

Look at the 32x32 picture below:

[Image: a 32x32 pixel picture of a person holding a red fan]

You can perhaps roughly make out that there is a person holding a red fan in the picture.

But no matter how hard you look, you cannot "see" whether this person is waving the fan, because for the event "waving a fan" to emerge, at least two pictures are needed, like the ones below:

[Image: first frame of the fan-waving motion]

[Image: second frame of the fan-waving motion]

Put together as an animation, it looks like this:

[Animated GIF: the two frames combined, showing the fan being waved]

As we said earlier, a picture must have at least 32x32 pixels to express the meaning of the image.

And this example tells us: to present the event of a fan being waved, at least two 32x32 images are required; that is, a pixel count of at least 2x32x32 = 2048 is needed for the event to emerge.

Therefore, there is more than one threshold at which meaning emerges from pixels. For different capabilities to emerge, different thresholds must be crossed.

This is true for pictures, and it is equally true for AI models, which are the same thing in higher dimensions. So 60 billion will not be the only emergence threshold, and it may not even be the most important one.

And this leads to the second question——

Question 2: As the scale of AI parameters increases, AI capabilities will continue to improve, but is there an upper limit?

The answer is yes and no.

What does that mean?

First, AI capability will certainly keep improving as the parameter scale expands, all the way up, crossing one capability-emergence threshold after another.

Second, human perception of AI progress will be profoundly shaped by these thresholds. It works like this: when a threshold is first crossed, people are excited; then they grow more and more accustomed to it, until the excitement spikes again when the next threshold is crossed.

Third, humans will eventually hit a critical point of their own: beyond it, we will no longer be able to perceive AI's progress at all, even if it is still making substantial progress and crossing new thresholds.

To make this point vivid, we come to the third question——

Question 3: How will you feel about AI in the next twenty years?

The following will be your future with AI:

Over the next twenty years, every time AI's parameter scale crosses a new threshold and new capabilities emerge, you will feel the improvement strongly and clearly. After that, even as the parameter scale keeps growing, your perception of its progress will get weaker and weaker, until AI crosses the next threshold and you go "Wow!" again.

To better help you imagine such a future, let’s use an image analogy (well, we really used an analogy this time):

Over the past decade or so, iPhone screen resolution has kept improving, but the most amazing moment was the year the iPhone first shipped a Retina display. After that, although every new iPhone launch emphasized that the screen had been upgraded, it became harder and harder for you to notice the changes.

Especially in the last couple of years: when you pick up a new iPhone, you probably say, "The screen seems... better than the previous generation, right? But specifically how... I can't tell."

The most interesting part is that after you have used the new iPhone for a few months, if you switch on the old one again you will probably sigh: "Ah! The old iPhone's display is terrible. Why didn't I feel that when I was using it?"

The two pictures below illustrate this situation:

[Image: a picture at one pixel count]

[Image: the same picture at a pixel count differing by a factor of 4]

The pixel counts of the two pictures above differ by a factor of 4. Can you tell?

Most of the time it’s hard for you to feel the difference, but you feel like “it seems a little different”.

In the future, your story with AI will also be like this:

Every time AI emerges with a new capability you will say "Wow!"; after that, although each generation of AI keeps improving, it becomes harder and harder to feel the difference, and you gradually settle into a feeling of "it's a bit different, but I can't say how."

In the future you will say: "ChatGPT14 does seem more considerate than ChatGPT13, but I can't tell exactly what is better about it."

But even so, nothing could make you go back to the previous generation of AI.

This is what is happening today: after using GPT4, many people will find it difficult to go back to GPT3.5, even if the performance of the two is similar on some tasks.

This situation of not feeling the progress of AI will continue until the parameter scale of AI crosses the next threshold, and then new capabilities emerge, and you will say "Wow!" loudly again.

The story does not end here. One day, you and the entire human race will be completely unable to perceive the progress of AI, even if it is still making rapid progress and crossing the threshold.

Why?

Let’s take pictures again as an example.

There is a hard upper limit on how finely humans can perceive an image: the resolution of the human eye. Its exact value is still debated, but the limit exists rigidly. Likewise, human perception of intelligence also has an upper limit. How high is it? Nobody knows yet, but my guess is that it is somewhere around the order of 100 trillion parameters.

This is the order of magnitude of neural connections in the human cerebral cortex. We will further discuss the details in subsequent articles.

There is a saying that goes like this: if you meet someone who understands everything you say, communication feels effortlessly smooth, being around them makes you feel smart, confident, and charming, and you feel you have found your soulmate, then 99% of the time you have simply met someone with a much higher EQ and IQ who is being "downward compatible" with you.

This is your future with AI.

When AI's intelligence exceeds the upper limit of human perception, you will not feel inferior to it; you will feel more confident. You will feel smarter, you will prefer spending time with AI, and you may even love AI more than humans. "He/she" will be your soulmate.

……

The above three questions are relatively mild. Now let’s get to the hardcore ones.

Question 4: What is "knowledge compression" and how does it relate to AI?

"Knowledge compression" is a concept that has been discussed a lot in the past few months, and understanding it has huge implications for understanding AI. But it is difficult to explain clearly in human words. The general idea is: the minimum description length of an effective method to complete a certain task represents the maximum understanding of the task. Therefore, we can compare the description length of the same task by different AI models (i.e. compression Efficiency) to evaluate the AI ​​large model's ability to understand the task.

See, it’s hard to understand, right?

However, when you intuitively understand intelligence emergence, the situation is different.

Let's go back to the "picture experiment" again - again, good experiments can explain different phenomena and answer different questions.

We already know:

  • When the pixel size of the image continues to increase and exceeds the threshold, the image will suddenly appear meaningful.

  • The "Image Experiment" tells us that the threshold for this emergence is in the 32x32=1024 pixel range.

  • When the pixel count is near the threshold, the picture's meaning emerges unstably. Concretely, only a small number of people can vaguely make out the content.

But please look at the picture below:

[Image: a heavily pixelated 16x16 picture]

Do you see what it is?

Almost everyone can see it: the Mona Lisa!

The problem is, the pixel size of this image is only 16x16, which is far below the threshold.

By rights, even at 32x32 pixels the meaning should emerge unstably and be visible only to a few people. Why is this picture so special that most people can see it?

Because you've seen it.

If you show this picture to someone who has never seen the Mona Lisa, they won't be able to tell the difference.

Just like most people can’t see the picture below——

[Image: another extremely pixelated picture]

Did you see it? Most likely not.

But if you have seen this famous painting, you will recognize it at a glance: this is Klimt's famous painting "The Kiss".

You can even easily point out where the two figures are and where the kiss is. People who have never seen the painting are baffled: Who am I? Where am I? What are you talking about?

If you can see both pictures above, we can say: you have more art knowledge than those who can only see one picture.

Furthermore, if there are a thousand such pictures and you can recognize them (even if you don't know who the artist is or what the name of the painting is), we can say: you have a wealth of knowledge in painting.

What are we talking about?

We are saying that there is something that allows images to steadily emerge with meaning far below the critical point of pixel scale.

This thing is called - knowledge .

In fact, by experiencing the fuzzy Mona Lisa, we can intuitively understand the relationship between knowledge, emergence, and compression:

If you have knowledge about an image, then even if the image is over-compressed, it can still emerge with meaning before your eyes.

And surely you know another saying:

Knowledge is the summary of the laws behind thousands of phenomena.

This sentence essentially says:

All knowledge, without exception, is essentially a "compression method" .

Because knowledge is a regular summary of phenomena. When we say "this piece of knowledge summarizes and generalizes a whole series of phenomena," we are essentially saying that this knowledge compresses those phenomena!

Each of these two sentences is easy to understand on its own, but connecting them is very powerful. It not only lets you understand why you can recognize the extremely blurry Mona Lisa, it also lets you understand the essence of AI engineering, and that is exactly what we are here for.

Here are some important conclusions drawn from these two sentences:

  • Knowledge is the method of compression.

  • The purpose of training large models is to find compression methods (knowledge).

  • The process of training a large model is essentially a "try-and-verify" process: the large model guesses a possible compression method and then verifies whether it is correct. This process will be repeated many times, consuming huge computing power.

  • So how do we verify whether a compression method (knowledge) is correct? Two conditions must be met: first, the method (knowledge) must actually compress the data, such as a picture of the Mona Lisa; second, when the method is run in reverse, the compressed picture must still recover its original meaning. (A small code sketch after this list illustrates the two conditions.)

  • If a compression method (knowledge) has a high compression ratio and a high degree of restoration, we can say it is closer to the essence.

  • In addition to AI, your brain also works in this way. When you first see the Mona Lisa, your brain completes all the above steps.
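To make the two verification conditions concrete, here is a toy sketch that uses zlib, the kind of algorithm inside the little compression program on your computer, as a stand-in for a compression method (knowledge). Condition one is that the output is shorter; condition two is that running the method in reverse restores the original exactly. This is only an analogy for the idea, not how large models are actually trained or evaluated.

```python
# A toy stand-in for "a compression method (knowledge)": zlib, the algorithm
# behind everyday compression software. We check the two conditions above.
import zlib

data = ("the sun rises in the east and sets in the west. " * 200).encode()

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

# Condition 1: the method really does compress the data (shorter description).
print(f"{len(data)} bytes -> {len(compressed)} bytes")

# Condition 2: running the method in reverse restores the original meaning.
print("restored exactly:", restored == data)
```

Notice that this little program can only restore; it never generates anything new. That limitation is exactly where Question 6 picks up the story of compression software.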

There is much more to say on the topic of "knowledge and compression", but space here is limited; later articles will take it further.

But we cannot help asking: does finding compression methods (knowledge) therefore amount to achieving artificial intelligence?

No.

Knowledge is not equal to intelligence, and this leads to the fifth important question below——

Question 5: What is intelligence?

The following contains some original ideas (of course, someone may have said them before without my knowing).

Let me try to define intelligence recklessly:

Intelligence is not a state but a process. And not just any process: it must be a generative process.

How to understand this sentence? Let’s go back to our “image experiment” and the Mona Lisa in question 4 above.

If you can recognize the Mona Lisa in a picture that others cannot identify, the process can actually be described as: from your knowledge plus a blurry picture, you generated the Mona Lisa.

In fact, this is how your brain works. Think about it, what happened when you saw the 16x16 pixel picture above?

Has the general appearance of the Mona Lisa appeared (generated) in your mind? And some details appeared (generated), such as the position of her hands, her eyes, and even the mysterious corners of her mouth?

There is no doubt that this generation process is an intelligent expression.

But this is not high-level intelligence, because it is more restoration than generation.

What does that mean?

An intuitive example: if student A can correctly solve a worked example after studying it, we can say he has mastered some knowledge; but if that example is the only problem he can solve, we obviously cannot say he has learned well.

Advanced intelligence must not only be able to answer example questions, but must also be able to use this knowledge to answer other questions. In terminology, this is called "generalization ability."

What is "generalization ability"?

Next, let us intuitively understand what "generalization ability" is.

Suppose another student, B, can not only solve the original worked example after studying it but can also solve many other problems. We would obviously say that he has learned better than A.

But why can B solve more questions than A? What happened? The only explanation is:

Student B found more compression methods (knowledge), so compared to A, student B was able to compress (summarize, generalize, and answer) more questions.

So, what is generalization? Do you understand it? Here you need to think back to what you already intuitively understand about “emergence”…

Here is my definition of generalization:

The so-called generalization ability is essentially the result of large-scale knowledge emergence.

I don’t know if anyone else has defined generalization from this perspective (ChatGPT told me no), but I really think it is a definition that goes straight to the essence.

And most importantly, it fits people’s intuitive understanding:

How do we usually describe a person who has some intelligence but not much?
We say: "This person can only copy mechanically" (in Chinese, "drawing a ladle by copying the gourd"). And what is mechanical copying? Exactly: restoration.

How do we describe a person with advanced intelligence?
We say: "This person has truly digested and integrated everything they have learned!"

Digested and integrated what?
A large amount of knowledge.

Following this line of understanding, we could write n more articles on the nature of intelligence. Again, space is limited (in Fermat's words, "this margin is too narrow to contain it"), so let us summarize and give several very important conclusions directly:

  • Knowledge is a method of compressing data. Looking for compression methods is looking for knowledge.

  • Used in the forward direction, a compression method (knowledge) compresses (summarizes) data; used in reverse, it lets the compressed data re-emerge in its original form.

  • Intelligence is not a state but a process. And not just any process: it must be a generative process.

  • Low-level intelligence mainly performs restoration. High-level intelligence has stronger generalization ability, and generalization is essentially the result of knowledge emerging at large scale.

  • The little compression program on your computer is a kind of artificial intelligence, but a primitive, low-level one. It has only a limited set of compression algorithms (knowledge), which is why it mainly restores rather than generates.

  • In the study of evolution there is the notion of the "protobiont", a form that sits evolutionarily between "living and non-living": extremely simple, lacking parts that even the most primitive cells have, yet already exhibiting the exchange of matter characteristic of life, and the earliest ancestor of all living things. Compression software is, likewise, a kind of "pre-intelligent agent", sitting between "intelligent and non-intelligent".

  • In training large AI models there is no lossless compression, and lossless compression (i.e., error-free prediction) should not be pursued. Error (loss) is a necessary condition for the emergence of intelligence; its essence is the unavoidable "gaps" between pieces of knowledge and the fact that knowledge cannot be fully reconciled.

For those of you who already intuitively understand concepts such as emergence, compression, knowledge, generalization, etc., these conclusions should be easy to understand.

A special reminder:

If you are a practitioner in the AI ​​field, the above understanding of generalization can guide your work well. For example, techniques for selecting training data sets can be deduced from this conclusion. At the same time, this conclusion also allows you to understand training error (Loss) from a new perspective. We will discuss the details at another time later.

Anyway, when you understand what intelligence is, you can understand the next question——

Question 6: What did OpenAI do right?

A lot has been said about the success of OpenAI, but you must have never seen what I’m going to say below (if you have, pretend I didn’t say it).

Let’s take another look at that compression software…

In Question 5 we said that the little compression program on your computer is the most primitive kind of agent. We also said it is primitive because it has only a limited set of compression algorithms (knowledge), which is why it mainly restores rather than generates.

We can't help but ask, what caused the tragedy of compression software? Is all this a distortion of human nature or a loss of morality?

In fact, the biggest problem with compression software is that its compression method (knowledge) is written into the code by the developer. This means that it does not have too many compression algorithms (knowledge), only a few.

At the same time, in order to have commercial value, compression software was designed and built from the start for "exact restoration". You obviously don't want to unzip "Beauties and Handsome Guys.rar" and get "Calabash Brothers Complete Works.mp4" instead.

Compression software does not have the "blessing" of having vast amounts of knowledge, nor does it have the "mission" of generating data.

Compared with compression software, the biggest feature of AI engineering is that it builds a training architecture that lets the large AI model find possible compression methods (knowledge) on its own. This architecture has two advantages (a small code sketch after the two points makes the idea concrete):

  • First, as long as the computing power and data are enough, it can search for all compression methods (knowledge) as much as possible, which will include a large number of compression methods (knowledge) that humans have not found;

  • Second, because AI looks for compression methods (knowledge) from data on its own, AI naturally has the ability to reversely use compression algorithms to generate!
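To make the two advantages tangible, here is a deliberately tiny sketch in which a character-bigram model stands in for a large model and simple counting stands in for training. It only illustrates the principle (extract regularities from raw data on your own, then run the same statistics in reverse to generate); it is not the actual transformer training pipeline.

```python
# A minimal sketch: a tiny model "finds compression in the data on its own"
# (shorter code length per character) and the same statistics, run in reverse,
# generate new text. A bigram counter stands in for a large language model.
import math
import random
from collections import defaultdict

text = "the cat sat on the mat. the cat sat on the sofa. " * 50

# "Training": the model extracts statistics (its knowledge) from raw data alone.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def prob(a, b):
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0

# Forward use = compression: average code length in bits per character.
bits = -sum(math.log2(max(prob(a, b), 1e-12)) for a, b in zip(text, text[1:]))
print(f"avg code length: {bits / (len(text) - 1):.2f} bits/char (about 8 bits raw)")

# Reverse use = generation: sample the next character from the same statistics.
ch, out = "t", ["t"]
for _ in range(40):
    nxt = counts[ch]
    ch = random.choices(list(nxt), weights=nxt.values())[0]
    out.append(ch)
print("generated:", "".join(out))
```

Even at this toy scale you can see both halves of the story: the statistics were extracted from the raw text alone, and the very same statistics, read backwards, produce new text.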

Even before ChatGPT, AI training architectures already worked this way, and many AI applications were born from them, such as face recognition and machine translation. Their intelligence far exceeded that of compression software, and the past 10 years built on them are called the AI 1.0 era.

But why are they far less impressive than ChatGPT? Why do we say that ChatGPT has opened the AI2.0 era?

The reason is precisely that most people underestimated, or never even recognized, the two advantages above. Start with the first.

Over the past 10 years, most researchers and engineers did not realize how extremely important and valuable it is to let AI find compression methods (knowledge) on its own. Almost all researchers arrogantly assumed that humans should teach compression algorithms (knowledge) to AI, rather than letting AI find them itself.

Why did people think this way? Stupidity? Obviously not.

The reason behind this is: it is much easier to teach AI compression algorithms (knowledge) than to let AI discover the algorithms on its own.

Anyone who has managed people will understand this easily: sometimes it is far easier to do a job yourself than to hand it to a novice.

This is especially true once you consider the computing power and data needed to let AI do it by itself; then you will see why that choice was reasonable. Suppose a job costs 10 million yuan if an intern does it, but only 1,000 yuan if you do it yourself. How do you decide?

Not to mention that at the outset you have no idea whether it will take 10 million or 100 billion. The risk you face is: hand the job to the intern, and the cost could bankrupt the company a hundred times over with nothing to show for it.

In essence, researchers were caught in a kind of "job seeker's paradox": you will never give the intern the job unless he already has the ability; but unless you give him the job, he will never gain that ability.

In addition to the above reason, there is a second reason that is equally important:

Over the past 10 years, people did not realize that advanced intelligence must be a generative process, and that generation and advanced intelligence are essentially the same thing. Which means people did not realize that artificial intelligence must be built with generation as the goal; otherwise the artificial intelligence will not be intelligent, or at least not very intelligent.

Why did no one realize this? Stupidity? Again, no.

The reason behind it: people had always assumed, mistakenly, that generation is merely one of the business scenarios where artificial intelligence can deliver value after it reaches a certain level.

Precisely for this reason, while generative ability was still immature, no one thought of making generation the first and primary goal. People instead chose mature abilities such as understanding and recognition as the goal, because those are the abilities paying clients will pay for.

Since generation was not set as the goal, AI's intelligence could not improve; and since the intelligence did not improve, no one would consider making generation the goal.

This is another "job search paradox."

To break this paradox, we need a group of people:

On the one hand, they must invest in computing power and data regardless of the cost, so that AI can be given the opportunity to discover compression algorithms (knowledge) on a large scale.

On the other hand, they must not think about commercial returns or aim at any specific usage scenario; they must take generation itself as the first and primary goal. Only then does the AI get the chance to try running its compression algorithms in reverse to generate, and then to keep tuning from there.

And this is the story of OpenAI!

Many people do not see the paradoxes behind this and simply attribute OpenAI's success to "brute force working miracles" and to faith in AGI. The truth is that OpenAI broke two paradoxes.

Especially the second paradox, according to current clues, OpenAI itself did not fully realize the importance of generation until the eve of the launch of ChatGPT - they realized that it was very important, but still underestimated it.

As for the first paradox, many well-known experts still find it hard to accept. Rich Sutton, a master of the field, published a famous short essay in 2019 called "The Bitter Lesson", in which he lamented that 70 years of AI history prove the most effective approach is to bet on computing power.

Sutton used several famous cases from AI history to support his point, but reading the whole essay, at least at the time he wrote it, he did not understand why brute force can work miracles.

I don't know if Sutton has figured it out now, but I think you should figure it out by now. Perhaps the most important thing to realize in this entire story is:

Powerful generalization capabilities emerge from large-scale knowledge , and the only way to obtain large-scale knowledge is to let AI find it on its own, which requires computing power.

Therefore, wherever brute force works miracles, massive computing power is the foundation and prerequisite, just as water will not boil until it reaches 100 degrees. It took courageous people, of course, to recognize this necessity.

So, are you saying that by letting AI discover massive amounts of knowledge on its own and pile them into a pile, powerful and generative intelligence will emerge?

Obviously not.

This brings us to the last topic of this article -

Question 7: What is a "world model"?

Let's go back to the "picture experiment" again.

A little clarification is needed:

We said that an image needs about 32x32 pixels for meaning to emerge, but that does not mean a 32x32 image necessarily carries meaning. It may just be a 32x32 pile of garbage pixels.

Scale is a necessary condition, not a sufficient one.

Similarly, if you want advanced intelligence to emerge from a pile of knowledge , you also need to organize the knowledge according to a certain hierarchical structure.

In fact, existing large language models including ChatGPT have completed this action very well.

But is ChatGPT the final form of AI? Is the remaining work simply to keep optimizing on the existing foundation, letting AI dig more knowledge out of language and even images (from an encoding perspective, images are also a kind of language), so that better intelligence emerges?

Definitely Not.

Why so sure?

Large models such as ChatGPT are trained on text, so what they mainly master is knowledge about language itself (such as grammar) and knowledge that can be described in language (including mathematical language). That is a great achievement, but looking back ten years from now, we will realize it was only a small beginning.

Because a large amount of knowledge cannot be expressed in words, and the amount is far greater than what can be expressed in words.

Do not believe? Please use words to describe accurately what is meant by: "sad", "happy", "suddenly enlightened", "green"...

This leads to a question, if we can exhaust all knowledge, what form should we use to make this knowledge emerge with more powerful intelligence?

The answer is the world model.

This is roughly a model architecture that organizes large-scale knowledge at a fine level.

The world model is not a new concept; researchers were thinking about it decades ago. So much so that when Yann LeCun, Turing Award winner and Meta's chief AI scientist, declared that "world models are the future of AI," many people mocked him for "putting old wine in new bottles."

Ilya Sutskever, OpenAI's chief scientist, said in an interview that the world model is not a concept worth pursuing further. The most he concedes is that a world model might be valuable and efficient, but he insists it is not necessary. He even happened to use how AI understands "green" as the example to prove his point.

He said: Even if an AI does not have eyes, it can eventually understand the color green by understanding massive amounts of text.

I think that if Ilya is not simply bluffing to mislead competitors and really believes this, then OpenAI is actually in danger. Yes, a blind person can come to understand "green", but they are still blind.

"What this world fears most is precisely old wine in new bottles." — Mr. Ma, spokesperson for rural teachers

So, having said all that, what exactly is a world model? Can you define it?

Since humans are still on the road to understanding it, there are many definitions of it, but I think one of them is the best and most concise:

A world model is an internal model of the external world.


What does that mean? What is "external"? And why "internal"?

Don’t worry, I will still try to help you understand this sentence intuitively.

Think about a question: How do billiard players play billiards?

A billiards player does not need to know the "physics of sphere collision", but he can hit the ball or even win the game by relying only on his intuitive grasp of sphere collision without calculating kinetic energy and momentum.

How did he do this? In two steps:

In the first step, he accumulated an intuitive understanding of the collision and movement of billiard balls through practice (essentially, repeatedly observing the phenomenon of billiard ball collision).

In the second step, before striking, he does not pull out a notebook to work through physics formulas; he simulates the ball's path and the positions after the collision in his mind, colloquially "running the movie in his head", and then: swing, and strike the ball!

Every professional billiards player has a table with balls on it inside his mind. That mental table is a model, inside the head, of the real table outside.

The world model is just an enlarged version of the billiard model. It builds a model of the external world inside the head.
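If it helps to see the idea in code, here is a minimal sketch of such an internal model under made-up assumptions (a single ball on a frictionless 2 m by 1 m table with perfectly elastic cushions): the "mental table" is nothing more than a state plus a step rule that can be rolled forward before acting in the real world.

```python
# A minimal sketch of "an internal model of the external world": a mental
# billiard table that can be rolled forward ("running the movie") before
# the player ever touches the real table. All numbers here are made up.
from dataclasses import dataclass

@dataclass
class Ball:
    x: float   # position in metres
    y: float
    vx: float  # velocity in metres per second
    vy: float

def step(ball: Ball, dt: float = 0.01, w: float = 2.0, h: float = 1.0) -> Ball:
    """Advance the internal model by one tick, bouncing off the cushions."""
    x, y, vx, vy = ball.x + ball.vx * dt, ball.y + ball.vy * dt, ball.vx, ball.vy
    if not 0.0 <= x <= w:
        vx, x = -vx, min(max(x, 0.0), w)
    if not 0.0 <= y <= h:
        vy, y = -vy, min(max(y, 0.0), h)
    return Ball(x, y, vx, vy)

def imagine(shot: Ball, ticks: int = 300) -> Ball:
    """Simulate a candidate shot entirely inside the model."""
    ball = shot
    for _ in range(ticks):
        ball = step(ball)
    return ball

# Try two candidate shots in the head, then act on the better one in reality.
print(imagine(Ball(0.5, 0.5, 1.0, 0.3)))
print(imagine(Ball(0.5, 0.5, 0.4, 1.2)))
```

The point is not the physics; a real player never computes it explicitly. The point is that the decision is made by letting an internal copy of the world "move" first, which is exactly what the rest of this section describes.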

Each of us has a model like this in our head.

Just like a billiard player does not calculate physical formulas, but lets the model in his mind "move" to simulate real billiard balls, the same is true when we think.

Imagine, when you think about "how to talk to your boss about a salary increase tomorrow", what comes to mind? ——The boss's office, the boss, yourself..., and then you will let these things "move" in your mind.

In short, your mind is "playing a movie".

In this movie, everything you know about the world is organized into specific hierarchies, such as:

The actions of you and your boss must comply with the laws of physics. Your boss will not fly up with a wave of his hand.

The behavior of you and your boss must conform to the social laws you understand. Your boss will not suddenly take out a box of gold bars and give it to you (even if you really want this to happen).

Even if you end up arguing with your boss, the argument must obey physiology: your boss will not start emitting ultrasound from his throat.

……

Ultimately you act on the results of the simulation. Is language involved in this process? Yes, but it is not everything, and perhaps not even the most important part.

By simulating a billiard table you can win a match; by simulating an office, a boss, and yourself, you can negotiate a raise. So what would happen if there were an AI that could exhaust the knowledge of the entire world, organize that knowledge at specific levels inside its neural network, and then set the world "in motion"?

It will have far higher intelligence than existing AI.

How do you understand this "much higher than"?

For example: In literature, such an AI can write a work on the level of "Dream of Red Mansions".

In fact, when Cao Xueqin wrote Dream of Red Mansions, he carried a detailed and complete model of the Rong and Ning mansions in his mind, so much so that when all the scattered fragments of income and expenditure in the book are pieced together, they form an account clear enough for later scholars to base economic research on.

Another example: In science, such AI can discover new laws of physics on its own.

An AI at the level of Einstein + Cao Xueqin...it’s exciting just thinking about it. Perhaps the next criterion for evaluating AI intelligence is whether it can continue to write the next 40 chapters of "Dream of Red Mansions".

This is not talking about science fiction. You have to know that in the eyes of people ten or even five years ago, today's AI is already very science fiction. What you need to know more is that it is optimistically predicted that AGI (artificial general intelligence) will appear in 2027...

Similarly, we can talk a lot about the topic of world models, so let’s stop here and give some valuable conclusions directly here, many of which are my original thinking (others may have said it, but I don’t know):

  • World models are the future, large language models are not.

  • A world model is multimodal, but a multimodal AI model is not necessarily a world model.

  • Video data will become extremely important in the story of the world model. Whether training or generating.

  • The barrier to entry for world models is likely to be low.

  • The world model must be a collaborative structure between large and small models, and in many places, small models direct large models, not the other way around.

  • "Brain-like architecture" is a possible option for the world model, but it is certainly not the end result.

  • OpenAI's RLHF (Reinforcement Learning from Human Feedback) for ChatGPT is essentially the construction of a rudimentary world model: it hand-feeds the AI knowledge of human social taboos and requires the AI to place that knowledge beneath all its other knowledge, as its underlying "beliefs".

  • The world model is not the end of AI. Later intelligence will emerge in higher dimensions, such as emerging from the collaboration of 10 million agents.

  • The idea of ​​evolution needs to be introduced more into the field of AI. So does emergence, complexity science, brain science, psychology, etc.

  • The world model is "simulated" rather than calculated. It will have a hierarchical and intuitive understanding of the various laws of the world.

The article is written here, showing a wonderful closed loop:

At the beginning, we set out with the goal of "understanding AI intuitively". After a long detour, we came back and found that the endgame of AI is precisely to "let AI understand the world intuitively."

Regarding the issue of world model, we will discuss it further in a later article. The only thing that needs a few more words here is:

If you are a company that claims to be committed to large models, please think seriously about the topic of "world models" and don't put it aside just because it is still far away.

It is important that you consider it from multiple angles: computer science, cognitive science, brain science, and more.

In fact, once you get involved in this field, you are ahead of the game. Because even Ilya from OpenAI or Yann Lecun from Meta still have little understanding of this issue.

Yann LeCun recently proposed a world model architecture that is essentially brain-like, and it is very inspiring. The only problem is that his understanding of brain physiology is not very deep. As someone whose "three majors" span AI, brain science, and psychology, I feel entitled to make that assessment.

It is also important to note that if you are truly committed to building large models, you may have to resist the temptation to build AI applications. You should give your AI more general business value, such as generation, intent fulfillment, and so on.

But do not chase AI applications for very specific business scenarios. The main reason is not about what kind of ecosystem you want to build, but a core judgment: existing AI capabilities are nowhere near their ceiling and are likely to cross the next threshold soon.

When you realize you can make missiles next year, don't spend so much time selling bows and arrows this year.

On the other hand, companies that do not have the intention/ability to build large models should make AI applications in a down-to-earth manner. You must believe in the power of open source and not worry that large model companies will erode the application layer. Since you can't invent a "locomotive", then start a company that "lays railroad tracks". Historically, the latter is often equally profitable and has a higher success rate.

5

A summary and conclusion

It is a huge challenge to clearly explain the issues involved in this article in the most concise language. During this process, my own understanding of AI has also deepened a lot.

Questions 4 and 5 were inspired by "Compression for AGI" by Jack Rae of OpenAI. His perspective is extremely illuminating, but I also think he is seriously wrong in places, so some of my views here disagree with his.

Some of the ideas in this article can be explained from information theory, but I think they are not necessary for ordinary people, and they lack intuition and are micro rather than macro.

This article expresses opinions on the development trend of AI. These opinions do not represent any commercial advice. Please use them at your own discretion.

The discussion in this article focuses on large language models (LLMs), but it also applies to the understanding of image generation models. In fact, this article reveals the underlying unity between the two. There will be articles to discuss this further later.

Due to my limited ability and energy, many topics in this article could not be fully developed. I could only state some conclusions directly and leave the rest for later articles. Please forgive me.

This article actually has an important implicit issue that has not been discussed, and that is the issue of "observer and observed object" - are you aware of it? "Emergence" requires an observer. And this is precisely the underlying logic that I believe intelligence must be a generative process . I will talk about it in detail later.

If you want to quote the views of this article, please cite the source, because my views are likely to be wrong. ——Yes, everything I said is wrong.

Intuitive understanding is the first understanding.

Richard Feynman was right.

Who is Wenjun?
A serial entrepreneur in the technology field, GPT strategic advisor to two Chinese and American AI companies, former head of Alibaba’s early mobile core products, and CEO mental and strategic coach.

What is MindCode?
A very niche public account that just decided to write something good. Focus on: AI, brain science, psychology, entrepreneurship, etc. Due to his in-depth thinking in several related fields, there are many top talents among his followers, such as xxx and xx.

In the age of AI, it is important for you to read what others have not read. Send "1" to join the discussion group.


Origin blog.csdn.net/fogdragon/article/details/132703598