Let's talk about large models

A breakthrough technology usually goes through the following stages from its first appearance to widespread adoption:

1. Blind worship in the initial stage. The new technology falls completely outside people's prior experience and cognition, so it is regarded as a "miracle". Both the technology and the people who created it are praised to the skies and bathed in the spotlight.

2. Chaos in the follow-up stage. The new technology attracts a crowd of followers, all hoping for a share of the blue ocean it has opened up. As the saying goes, the world bustles and hustles, all for profit. Some of the hype is well-founded, some is just noise made to show a presence, and some is meant to pump up a stock price.

3. Doubt once the heat fades. A technology at birth is inevitably imperfect: it may be costly, unstable, or underperforming in any number of ways. People seize on small problems, magnify them into sweeping doubts, and some gloat and dismiss the technology altogether.

4. Quiet in the product-polishing stage. It is hard to polish a product under the spotlight; the noise of the stage and the hype of the media make people giddy. Only when the tide of hype recedes can those who truly see the value of the new technology, and are committed to using it to change the world, keep investing and keep pushing forward.

5. Blossoming in the success stage. When the technical shortcomings have been made up and products integrate the new technology well enough to create real value, technical success becomes commercial success, and the spotlight, flowers, and salutes once again greet the winners.

Electricity in the industrial age, and e-commerce, blockchain, and the metaverse in the Internet age, all went this way. Large models are on the same road.


In the field of AI, the most popular and successful approach today is the neural network. Neural networks are an interesting thing. I first encountered them in college and helped senior classmates write a few pieces of implementation code, then changed direction and moved into application software; at the time neural networks were still criticized for lacking scientific innovation. A senior classmate who stayed on this track devoted himself to robotics research and was elected an academician of the Academy of Engineering a few years ago. Sutskever, the chief scientist of OpenAI, came into contact with neural networks at about the same time; he has kept researching and practicing for twenty years and now stands at the top of the field worldwide.

If you open up the mechanism of a large model and visualize it, you see a pile of circles and connecting lines. The circles are neurons, which serve as computation and storage units; the connecting lines carry different weights, also called parameters, which are adjusted through training. A neuron applies a function to its inputs and the corresponding weights to produce an output. If the output is not good enough, the weights are adjusted until it is.

Obviously, a single neuron cannot do much; it performs only a simple classification. But when the number of neurons and layers keeps expanding and deepening, magical effects appear.
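As a minimal sketch of this idea (a single sigmoid neuron trained on a toy task with NumPy; all numbers here are made up for illustration, not taken from any real model):

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: weighted sum of inputs plus bias, squashed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

# Toy task: learn the OR function on two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(1000):
    for x, target in zip(X, y):
        out = neuron(x, w, b)
        err = target - out    # how far the output is from what we want
        w += lr * err * x     # nudge the weights toward a better output
        b += lr * err

print([round(neuron(x, w, b)) for x in X])  # -> [0, 1, 1, 1] after training
```

One neuron like this can only draw a single decision boundary; stacking many of them in layers is what produces the "magical effects" described above.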

More is different, quantitative change leads to qualitative change.

GPT-1 has 12 layers and roughly 117 million parameters; GPT-2 doubles the depth to 24 layers, with the parameter count rising to about 350 million; GPT-3 reaches 96 layers, and the parameter count jumps to 175 billion; GPT-4 is reported to push past one trillion parameters.
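As a rough back-of-the-envelope check on those figures (a common approximation for decoder-only Transformers that ignores embeddings and biases), each layer contributes about 12·d² parameters, where d is the hidden size:

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a decoder-only Transformer:
    ~4*d^2 for the attention projections plus ~8*d^2 for the MLP block, per layer."""
    return 12 * n_layers * d_model ** 2

# GPT-3's published configuration: 96 layers, hidden size 12288.
print(f"{approx_params(96, 12288) / 1e9:.0f}B")  # ~174B, close to the quoted 175 billion
```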

As the layers get deeper and the parameters get larger, the neural network exhibits surprising effects, as if it had a mind of its own. After training on a high-quality corpus and tuning the parameters into place, a large model not only understands human expression but also responds logically in ways that fit the context.

The machine does not truly understand; it represents text as vectors and, driven by the neural network, recognizes and generates content in a statistical, probabilistic way. With the attention mechanism layered on top, long passages of text can be integrated and presented in a logically coherent manner.
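A minimal sketch of the attention idea itself (scaled dot-product attention over toy vectors; the shapes and random numbers are invented purely for illustration):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each position mixes the value vectors V,
    weighted by how well its query matches every key."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ V

# Three token positions with 4-dimensional toy embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (3, 4): each token's output blends all tokens
```

This is how long passages get integrated: every position can draw on every other position when producing its output.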

Most importantly, after training, the comprehension, generalization, and reasoning abilities displayed by large models have reached a human-like level. Once a machine reaches that level it has a form of general intelligence, and combined with its effectively unlimited storage (another key is the compression of knowledge into the model), unlimited input, tireless energy, and continuous evolution, society really can change dramatically.


After large models came out, they caused a great deal of anxiety about being replaced. Many others, meanwhile, saw opportunities to reshape products and even industries on top of large models.

OpenAI is at the forefront, and its released features and papers reveal the direction of the technical route. Walking a path that has already been shown to work shortens the time it takes to get there.

In the competition over foundation models themselves, however, perhaps only three to five companies will win in the end; it comes down to financial resources and research strength. Wrapping a shell around an open-source model and calling it "self-developed" is fooling others and fooling oneself, whether for political credit or for the stock price; in the end it is all about self-interest.

Applications built on large models, or reshaped by them, on the other hand, are bound to bloom in all their variety.

The applications most directly rebuilt by large models are intelligent question-answering and consulting chatbots. One can either train a vertical, industry-specific large model, or manage industry knowledge in an external vector database and let a general-purpose large model summarize and generate the answers. Compared with the previous knowledge base and word-segmentation search, the experience is clearly better.
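A minimal sketch of the second approach, retrieval-augmented answering (here `embed`, `vector_db`, and `llm` are hypothetical stand-ins for whatever embedding model, vector store, and chat model are actually used):

```python
def answer(question: str, embed, vector_db, llm, top_k: int = 3) -> str:
    """Fetch the most relevant industry documents, then let a general-purpose
    large model compose the answer from them."""
    query_vec = embed(question)                       # embed the user question
    docs = vector_db.search(query_vec, top_k=top_k)   # nearest-neighbor lookup
    context = "\n\n".join(d.text for d in docs)
    prompt = (
        "Answer the question using only the reference material below.\n"
        f"Reference material:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)                                # general model summarizes
```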

Then there are applications that automatically generate marketing copy with AIGC. This type of application was popular for a while, but its core value comes from the underlying large model: the barrier to entry is low, it is easy to copy, and it is hard to sustain. The early star company Jasper is a typical example.

If, however, the generalization and reasoning abilities of large models can be effectively combined and woven into industry applications, large models may truly shine.

Industry applications usually talk about the DIKW model: data, information, knowledge, wisdom. Raw recorded data is processed and connected to form information; large amounts of information are refined and summarized into knowledge that reflects the essence of things; and deductive reasoning over information and knowledge rises to wisdom.

For example: (20, 24, 50...) is raw, messy data; "installation and maintenance worker Zhang San completes 30 work orders a day" is readable, understandable information; from the records of a large number of workers one can distill the knowledge that "a normal day's workload is 15 orders"; and deciding, given how busy Zhang San is, how to dynamically dispatch other workers to take on new orders requires the corresponding wisdom.
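A minimal sketch of how the first three DIKW layers might look over such work-order records (the field names and figures are invented for illustration):

```python
from statistics import median

# Data: raw work-order counts logged per worker per day.
records = [
    {"worker": "Zhang San", "day": "2023-07-01", "orders": 30},
    {"worker": "Li Si",     "day": "2023-07-01", "orders": 15},
    {"worker": "Wang Wu",   "day": "2023-07-01", "orders": 12},
]

# Information: a readable statement about one worker.
zs = next(r for r in records if r["worker"] == "Zhang San")
print(f"{zs['worker']} completed {zs['orders']} orders on {zs['day']}")

# Knowledge: a summary distilled from many workers' records.
print(f"A normal daily workload is about {median(r['orders'] for r in records)} orders")

# Wisdom -- deciding who should take the next order given everyone's load --
# is where reasoning, and potentially a large model, comes in.
```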

Large models, trained on domain knowledge, are good at reasoning over it. Order dispatching, for example, involves a set of domain rules: orders are assigned along dimensions such as skills, region, nearest route, current workload, and satisfaction. After feeding these rules to a large model, the model can be turned into a dispatch engine.
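One minimal way to sketch that (a prompt-driven dispatcher; `llm`, the rule text, and the candidate fields are all hypothetical):

```python
import json

DISPATCH_RULES = """
1. The worker must have the skill the order requires.
2. Prefer workers in the same region, then the nearest route.
3. Prefer workers with fewer open orders and higher satisfaction scores.
"""

def dispatch(order: dict, candidates: list[dict], llm) -> dict:
    """Ask a large model to pick a worker for the order according to the domain rules."""
    prompt = (
        f"Dispatch rules:\n{DISPATCH_RULES}\n"
        f"New order: {json.dumps(order, ensure_ascii=False)}\n"
        f"Candidate workers: {json.dumps(candidates, ensure_ascii=False)}\n"
        'Reply with JSON: {"worker": ..., "reason": ...}'
    )
    return json.loads(llm(prompt))   # the model acts as the dispatch engine
```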

In addition to installation and maintenance scheduling, there are many similar scenarios, such as fault diagnosis, project cutover, and so on.

In the traditional mode, each scenario needs its own hand-written scheduling engine. If instead a large model can perform the general intelligent analysis, decompose the generated answer into steps to form a chain of thought, schedule work on that basis, and integrate the data queries and service calls of the enterprise's IT systems, an intelligent brain for the enterprise naturally takes shape; through continuous learning, its coverage of the enterprise's production processes will grow wider and wider.
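A minimal sketch of that loop (step-by-step planning plus tool calls; `llm` and the two enterprise tools are hypothetical stand-ins for real IT-system integrations):

```python
import json

TOOLS = {
    # Hypothetical enterprise IT integrations.
    "query_orders": lambda args: [{"id": 101, "region": "East"}],
    "assign_worker": lambda args: {"status": "ok", **args},
}

def enterprise_brain(task: str, llm, max_steps: int = 5):
    """Let the model break a task into steps and call enterprise tools for each one."""
    history = f"Task: {task}\nAvailable tools: {list(TOOLS)}"
    for _ in range(max_steps):
        raw = llm(history + '\nNext step as JSON {"tool": ..., "args": ...} '
                            'or {"finish": <answer>}:')
        step = json.loads(raw)
        if "finish" in step:
            return step["finish"]
        result = TOOLS[step["tool"]](step["args"])      # execute the chosen tool
        history += f"\nStep: {raw}\nResult: {result}"   # feed the result back in
    return history
```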

In this way, the enterprise IT architecture changes fundamentally: an intelligent brain built on the large model sits at the center, continuously enriched with business strategies and management rules, and uniformly directs the various business systems that execute on the periphery.

There is a brain in the cloud and a copilot assistant at the edge, changing how existing IT systems are used. You no longer hunt for the right module in deeply nested menus; you simply tell the system what you want to do in plain dialogue, and the copilot understands, executes the corresponding actions to complete the production logic, or summarizes and presents a report. Having a capable assistant at your side like this is exhilarating.


With Baichuan, GLM, and LLAMA now free for commercial use, industry applications built on large models should accelerate further.

