New changes in the era of large models

The technological breakthrough of large models has opened a new corner of the AI universe to human view, letting people immediately imagine a kaleidoscope of possible futures. When belief becomes a force, it drives immeasurable leaps in development. At the Jiuzhang Yunji DataCanvas new product launch conference, Fang Lei, chairman of Jiuzhang Yunji DataCanvas, explained the company's world view of large models from the perspective of an AI technology enterprise.


Fang Lei, Chairman of Jiuzhang Yunji DataCanvas Company

Speech Transcript

Thank you, host. I am very happy to have this time this afternoon to share with you our recent thinking, our recent work, and the exciting answers that will unfold shortly. Thank you again to all the leaders, guests, friends, and online viewers who are here today. Welcome to Jiuzhang Yunji DataCanvas' "New AI, New Data, New Software" product launch.

Today's topic naturally revolves around large models. Large models are the hottest topic of the moment and a direction everyone is watching, and opinions about them abound. Jiuzhang Yunji DataCanvas will start by laying out our world view: how do we see large models? There is a great deal of logic behind the industrial development of large models, but the most important question is how to treat them.

For us there is one fundamental point about large models: they require a complete upgrade of the infrastructure. Having a model today does not mean it will automatically solve your problems for you. Infrastructure really matters. It took humanity perhaps a hundred years for infrastructure like electricity to reach every village, perhaps fifty years for highways to reach every town and village, and perhaps twenty or thirty years for the Internet to reach everyone. At a press conference two years ago I shared a view with everyone: software is also infrastructure. There is no doubt that in the era of artificial intelligence software is infrastructure, and for large models it is an especially important piece of infrastructure. When we say large models require a complete infrastructure upgrade, that includes software, hardware, transmission, and various other conditions, all of which need to keep evolving with the technology so that practical problems can ultimately be solved. Later I will walk through how we think this infrastructure is changing.

Second, large models may create some misconceptions. Because large models have become more powerful, it can seem that solving a problem is very simple: just interact with a large model like ChatGPT and the problem is solved. Is that true? It is, for some simple tasks, such as writing a short essay. But the problems we need to solve are harder, problems with a profound impact on society, not simple summaries and text descriptions. With large model technology, the problems we set out to solve have also become more complex. For example, for the first time we have used the multimodal capabilities of large models to control a robot, unifying its movement, reasoning, feedback, and expression, so that we can really build a robot that works like a human. That was very difficult to achieve before. The point of these profound changes is that although large models provide greater capabilities, the challenges have grown as well. Do not simply take this as something that has become easier. If we want to truly use large model technology, we need to upgrade all kinds of infrastructure; in fact, it is a harder thing, not an easier one.

Difficult things call for good solutions. Facing the challenges of large models, and wanting to use them to do more impactful things, let us look at how the infrastructure should change. Infrastructure has many parts, including highways and power facilities; those are the foundation of civilization itself and need no elaboration. For large models, we believe the most relevant infrastructure has three aspects:

The first aspect is computing power. There is no doubt that computing power is the basis of the model. Everyone knows that a model comes from data plus an algorithm: after computation, it becomes a model, and that model is a crystallization and embodiment of intelligence. Computing power is of course indispensable; only with computing power can we turn data into models. There is a popular view that a model is a compression of historical data. For future matters, by querying the model and using the model, we can find similar patterns and arrive at answers.

The second aspect is basic software. Once you have the hardware, the raw computing power, how do you use it? If we have a high-performance computer, we hope to have a good operating system; if we use that computer to write articles, we hope to have good office software. Basic software is the carrier of the work you want to do. In the era of artificial intelligence, Jiuzhang Yunji DataCanvas positions itself as a supplier of basic artificial intelligence software. Basic software has undoubtedly become more important in the era of large models: it represents the direction of algorithm evolution and determines how effective the hardware is. If you install a very slow, crash-prone operating system on the latest computer with the latest CPU, the experience may still be terrible. To a large extent, basic software determines the effectiveness of both algorithms and hardware.

The third aspect is data. How can data be stored and computed more efficiently? For artificial intelligence models, data is the raw material and the source. In the future, data will not only be the source of models but also the object that models serve. Beyond computation and storage, data also needs to be shared. In the era of large models we have data of different scopes, including social data, industry data, and enterprise data. If data can be shared and exchanged across these boundaries, the resulting models will also be more intelligent.


These are the infrastructure changes we are looking forward to. In the era of large models, to realize the two basic points of our world view just mentioned and to solve harder problems, large models must cope with greater challenges, and all of these infrastructure changes are needed.

The new Moore's Law of computing power: everyone knows the term Moore's Law, and the idea here is similar. Let me give you an example. The chart you see shows the reduction in cost driven by computing power, as forecast by a consulting firm. Computing power is a real challenge right now: everyone wants it and it is very expensive, but its cost drops rapidly over time. In 2020, training a model like GPT-3 required roughly 4.6 million US dollars of computing, and by the end of last year that had fallen to around 450,000 US dollars, a drop of an order of magnitude. There were many news reports yesterday: CoreWeave, a US cloud company that provides GPUs, worked with NVIDIA to train a GPT-3 model on 3,584 state-of-the-art H100 cards, finishing in only 11 minutes. What used to take several months has become a matter of ten minutes, less than an hour. That is a shock in terms of time, but how much do those 11 minutes cost? At CoreWeave's current prices, about 20,000 US dollars. June 30, 2023 is a link between past and future: a GPT-3 model can now be trained for only 20,000 US dollars.

The new Moore's Law of computing power: every 18 to 24 months, performance improves by an order of magnitude and cost drops by an order of magnitude. This is no exaggeration. Look at the benchmark: BERT-Large, which in 2020 was still considered a fairly large model, now trains in about 0.13 minutes, roughly 8 seconds. More than 3,000 cards were used in that test, and some of it is record-breaking for its own sake, but the message is clear: in this era of the new Moore's Law of computing power, computing power is in short supply and badly needed, yet its growth, its performance improvements, and its cost reductions are equally astounding, which makes for a great infrastructure change. We will have abundant computing power; we will not live in an era of computing-power scarcity. We may pass through a period of shortage, but the era of large models is an era of abundant computing power. Of course we need to invest in building it, but the development of technology will give us ample computing power to build better, more powerful, and more flexible large models.
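As a rough illustration of what such a rule implies, the sketch below is our own back-of-envelope arithmetic, not a figure from the talk: it projects training cost under the assumption that cost falls by an order of magnitude every 24 months, anchored on the 4.6 million US dollar figure quoted above for 2020.

```python
# Illustrative arithmetic only: project training cost under the "new Moore's Law"
# described above -- cost falling by roughly 10x every 18-24 months.
def projected_cost(initial_cost_usd: float, months_elapsed: float,
                   months_per_10x: float = 24.0) -> float:
    """Cost after `months_elapsed` months if cost drops 10x every `months_per_10x` months."""
    return initial_cost_usd * 10 ** (-months_elapsed / months_per_10x)

c0 = 4.6e6  # quoted cost of training a GPT-3-class model in 2020, in USD
for months in (0, 12, 24, 36):
    print(f"after {months:2d} months: ${projected_cost(c0, months):,.0f}")
# Compare against the figures quoted in the talk: roughly $450,000 by late 2022
# and roughly $20,000 (3,584 H100s for 11 minutes) by mid-2023.
```

Whether the anchor is 18 or 24 months changes the curve somewhat, but the direction is the same: training cost falls by orders of magnitude within a few years.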


Now let us look at data on top of that computing power. The more exciting news here concerns general-purpose large models versus the rest. Our judgment, which many people in the industry share, is that the number of industry and vertical large models will greatly exceed the number of general-purpose large models. Imagine a fresh college graduate: he is relatively general-purpose and has learned world knowledge. He then goes to work in an aircraft factory and, entering the industry, learns industry knowledge. Knowledge has boundaries, and data has boundaries; knowledge is learned from the data one gets access to. Moving from world knowledge to industry knowledge, he finally accumulates something within the company, perhaps even the secrets of how the business is run: that is enterprise knowledge.

In this world we naturally have general world knowledge, industry knowledge, and enterprise knowledge. These kinds of knowledge have boundaries, the corresponding data has boundaries, and so boundaries naturally appear when different enterprises and organizations use large model capabilities. It is easy to reach this conclusion, and we believe in this judgment: the number of industry and vertical large models used inside enterprises and industries will ultimately far exceed the number of general-purpose large models, and we judge that the computing power they consume will also be much greater. Everyone may think that OpenAI's model is a general-purpose large model that dominates everything, but within the boundaries of data that is not the case. The adoption of large models will be reflected more in industries and enterprises. This follows from our belief that data has boundaries.

As the infrastructure changes, we hope data will flow across certain enterprise and industry boundaries and bring new applications. For example, if data can flow, our models can learn not only world knowledge and industry knowledge but also part of an enterprise's knowledge, linked together in series. That is the change we are looking forward to, but in the end these boundaries will still exist.

Regarding basic software, we emphasize that software is the core of differentiation. Why? Everyone knows that hardware performance matters a great deal; the improvement in computing power just mentioned comes largely from hardware progress. But hardware is relatively homogeneous. Put simply, if you buy an H100 card today, I can buy one too, and from a hardware perspective we end up looking much the same. When you actually train, however, the final results can differ enormously: my training run succeeds while yours fails; I train for 1,000 hours to finish while you train for only 200; your model is not as smart as mine. Software determines performance and cost. Under the same hardware conditions, software is the key differentiator of performance and cost.

From another perspective, if we need better, more flexible, and more powerful models, then under the same hardware conditions the optimization space that software offers on top of the hardware is huge. Everyone knows that large models are built on the Transformer attention mechanism. The Transformer was invented only a few years ago, and today's hardware may not yet be optimized for the Transformer structure, so there is enormous room for jointly optimizing software, hardware, and models. Within that space, I believe the main driving force of innovation still comes from software: software will further adapt to the hardware and accelerate specific algorithm structures. Of course the hardware will improve as well.
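For readers unfamiliar with the mechanism named here, below is a minimal NumPy sketch of scaled dot-product attention, the core Transformer operation; production implementations add multi-head projections, masking, and heavily hardware-optimized kernels, which is exactly where the software-hardware co-optimization described above takes place.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d). Returns the attention output, shape (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of the values

# Tiny example: 4 tokens with 8-dimensional embeddings, attending to themselves.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                     # (4, 8)
```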

We believe software will embody the differentiation of today's hardware and, working together with hardware progress, will be the most active arena of innovation for improving performance. We just talked about the new Moore's Law of computing power, an order-of-magnitude gain every 18 to 24 months, and software acceleration comes on top of that. OpenAI has in fact made a similar statement: every 18 months, software alone doubles the speed. A factor of two may not sound as exciting as a factor of ten, but doubling on top of a tenfold gain gives a factor of twenty, which is also very fast. On top of the improvement in computing power, software will bring the differentiating power that end users, end customers, and end businesses need. A more efficient AI infrastructure will train smarter models.

If we assume that data keeps getting better and can open up some of its boundaries, that computing power keeps getting faster and more abundant along the new Moore's Law, and that software keeps getting more optimized, then if everything improves, what challenge remains in putting it all into practice? I want to focus on the last mile. In the era of small models, the last mile of deployment was difficult: data changed constantly and the models did not generalize well enough, so everyone said last-mile adaptation was a headache. Now that large models have arrived, has the last mile disappeared? Wave after wave of technology has made the last mile shorter and simpler, 995 meters or even 95, but the last mile will not disappear, and it will remain a challenge for deploying large model technology. In the last mile, our knowledge, whether world knowledge, industry knowledge, or enterprise knowledge, along with the software and hardware, still has to be combined with the business, and the space for that combination is precisely the space for innovation. We do not expect a model trained on some historical data to automatically bridge every gap; we do not believe that is possible. The last mile is still there. It is not a matter of simply taking things off the shelf and buying a large model to solve every business problem; that possibility does not exist. Consider reality: there are so many smart people in the world, and as a species humanity has produced extraordinarily brilliant minds such as Einstein and Newton, yet no company today can solve everything simply by hiring one person. That is not realistic.

Large models are very similar. At the boundaries of knowledge and data, a large model cannot simply be taken off the shelf; the last mile still has to be solved. How do we solve it? Benefiting from the three elements just mentioned, more computing power, cheaper computing power, and a very flexible, open, white-box model, we can adjust the model to a company's situation, adapt it to the company, let it learn the company's knowledge, and ultimately put it to use for that enterprise and its users. We call this an open, elastic white-box model, and it makes crossing the chasm of the last mile simpler and cheaper. Equally, we need practitioners who understand the business. Such a practitioner may not previously have been a sophisticated algorithm expert; he may be an analyst who understands the business, or even a business practitioner himself, but he remains an indispensable element in crossing the gap. Powerful and flexible basic software, an open and elastic white-box model, and practitioners who understand the business can together bridge the last mile. We must be clear-headed: deployment remains the biggest challenge for large models, and every step of our work is aimed at making the last mile crossable and simpler.
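To make the "open, elastic white-box model" idea concrete, here is a hypothetical sketch, not Jiuzhang Yunji's actual product or pipeline, of last-mile adaptation: taking a small open-weight model and fine-tuning lightweight LoRA adapters on a stand-in enterprise corpus with the Hugging Face transformers, datasets, and peft libraries. The model name "gpt2", the toy texts, and the hyperparameters are placeholders chosen only for illustration.

```python
# Hypothetical last-mile adaptation sketch: fine-tune small LoRA adapters on
# enterprise text while the open-weight base model stays frozen.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # stand-in for any open-weight model
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Low-rank adapters on the attention projection; only these small weights are trained,
# which keeps last-mile adaptation cheap on modest hardware.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()                   # only a small fraction is trainable

# Toy stand-in for tokenized enterprise documents (manuals, policies, tickets, ...).
texts = ["Internal maintenance procedure for line 3 ...",
         "Customer escalation policy: respond within 4 hours ..."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapter-out", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapter-out")                 # only the adapter needs to be shipped
```

Only the small adapter weights need to be stored and deployed, which is what keeps this kind of enterprise adaptation far cheaper than training a model from scratch.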

Let's look at a macro picture. We believe there are three major driving forces in the artificial intelligence industry. The build-out of computing power is a very important force, and the large models discussed today are another. I would also add a third, very powerful force, which I call the central-enterprise cloud. The cloud computing market went through the public cloud era in the United States; the comparable public cloud in China does not seem to have been as successful, or at least still needs to develop. Our domestic cloud computing market has now entered a new era in which, for example, central state-owned enterprises are at the core: they have built their own clouds and clearly occupy their own position in the market, even a leading position. Their operating methods, customer-acquisition capabilities, and construction scale differ somewhat from before. Today is not a special session on cloud computing, so I will not go into details, but everyone realizes that our cloud computing market is undergoing profound change and an upgrade. Computing power is being built out on an unprecedented scale, and large models have brought an unprecedented technological shift. When these forces come together, it truly is an unprecedented change. For Jiuzhang Yunji DataCanvas it is an opportunity, and it is a huge opportunity for every person, every company, and every individual.

Summing up these driving forces: what do we need to do, what actions should we take, what should Jiuzhang Yunji do? Through our cloud-in-cloud strategy, we hope to embed our basic software capability, AI Foundation Software, as a core capability into the cloud vendors in this market and into the intelligent computing centers, treating the intelligent computing centers as GPU clouds. Many partners in the cloud market are focused on improving the GPU clouds of central enterprises. This is the cloud-in-cloud strategy that Jiuzhang Yunji has discussed with everyone many times before. Through it, we will provide one-stop services together with partners such as cloud vendors and intelligent computing centers, and thereby move from AIFS the product (AI Foundation Software) to AIFS the service (AI Foundation Service).


Origin blog.csdn.net/weixin_46880696/article/details/131837898