QPS up nearly 10x! Reading Wenxin Yiyan's one-month report card, powered by PaddlePaddle (Flying Paddle)

Recently, domestic AI vendors that had long kept their cards close to the chest have rushed out to release large models. For a while it has been a "war of a hundred models," with offerings of wildly varying quality.

Having been the world's first major company to officially release a large-model product, Baidu and its Wenxin Yiyan have had their every move scrutinized by the industry.

On April 19, one month and three days after Wenxin Yiyan's release, a screenshot of the minutes of a "Baidu PaddlePaddle regular meeting on customizing and optimizing Wenxin Yiyan" surfaced online and drew wide attention.

Three statistics, eye-catching performance

The meeting minutes show that within a month of Wenxin Yiyan opening its invitation-only test, Baidu's PaddlePaddle inference engine iterated four times and has now reached version 3.5. It is the industry's first distributed inference engine to support dynamic batch insertion, and its single-machine QPS (queries per second) is 123% higher than the previously deployed version. In terms of the numbers, the joint optimization of PaddlePaddle and Wenxin Yiyan produced three results:

1. Model inference efficiency increased 10x: compared with the first version of the large-model inference service, cumulative single-machine QPS has risen nearly 10x. This means the cost of large-model inference drops to roughly one tenth of the original; equivalently, the same hardware can serve nearly 10 times as many concurrent users;

2. Model inference performance improved by 50%: better performance here translates into better model results, which also confirms that Wenxin Yiyan is evolving and learning faster and better;

3. The model's compute utilization doubled: this reflects the co-optimization of the PaddlePaddle framework with large-model training and deployment; only by unlocking the chips' full potential can the model's compute utilization be raised.
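The arithmetic behind item 1 is straightforward. A back-of-envelope sketch (the dollar figures and QPS values below are hypothetical, chosen only to illustrate the 10x relationship):

```python
# Hypothetical figures: at a fixed machine cost, a ~10x QPS gain cuts
# per-query inference cost to ~1/10 -- equivalently, the same machine
# can serve ~10x as many concurrent users at the same cost.

def cost_per_query(machine_cost_per_hour, qps):
    """Amortized cost of one query on a machine sustaining the given QPS."""
    queries_per_hour = qps * 3600
    return machine_cost_per_hour / queries_per_hour

baseline = cost_per_query(10.0, 20)    # hypothetical: $10/hour, 20 QPS
optimized = cost_per_query(10.0, 200)  # same machine after a 10x QPS gain
print(round(baseline / optimized))     # cost ratio is exactly 10
```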

All of this is the result of collaboration between Wenxin Yiyan and the PaddlePaddle deep learning framework. Behind these numbers lie two deeper implications, and perhaps that is the ground on which the various companies' large models will ultimately fight it out.

Full-stack layout, self-reliant and secure

Technical indicators reflect, to some extent, the quality of the top-level design strategy behind the technology. In the AI era, the IT stack has fundamentally changed, from a three-layer architecture to a four-layer one: chip, framework, model, application. At the same time, securing large models, widely recognized as the core assets of the AI era, has become a top priority, and independently developing all four layers in-house is a viable path to that end.

Globally, few companies have leading products at every layer of this four-layer architecture. A full-stack, four-layer layout keeps the initiative for a large model's sustained development entirely in one's own hands; in other words, it builds a technical moat.

From the high-end Kunlunxin chip, to the PaddlePaddle deep learning framework, to Wenxin Yiyan, to applications such as Search, Smart Cloud, autonomous driving, and Xiaodu, Baidu has a presence at every layer, along with a wealth of end-user application scenarios. This full-stack layout and end-to-end coverage provide the soil and nutrients for large models to learn, grow, and develop safely and sustainably.

Model and framework integration, a match made in heaven

Take the engine analogy: if the large model is an engine, the framework is the engine manufacturer, able to fit the engine's components together more precisely, deliver more power, and iterate and evolve on its own. Conversely, without a framework for training, inference, and joint optimization, a large model is like an engine that cannot evolve autonomously, and its power may fall short.

The industry has also demonstrated how important AI frameworks are to large models. Light Years Beyond, the large-model startup founded by former Meituan co-founder Wang Huiwen, has reached an acquisition agreement with the AI-framework startup OneFlow, precisely to shore up its weakness at the framework layer.

Only a handful of companies in the industry have both their own large models and their own frameworks. Most vendors rely on TensorFlow or PyTorch, or have no large model of their own. It is said that the deep learning frameworks from Google and Meta were not designed around large models, so TensorFlow and PyTorch cannot be applied directly when large-model requirements arise; plug-ins must be developed on top of them.

Baidu is different: Wenxin Yiyan and the PaddlePaddle deep learning framework come from the same family and work together, complementing each other.

On the one hand, Wenxin Yiyan's rapid iteration stimulates and feeds back into the development of the framework layer, the chip layer, and even the application layer.

On the other hand, beyond the engine analogy above, PaddlePaddle, the open-source deep learning framework Baidu launched in 2016, was born for parallel GPU training. Large-scale distributed training has always been one of PaddlePaddle's signature capabilities, realizing parallel training with hundreds of billions of sparse features, trillions of parameters, and hundreds of nodes. For example, it supports a rich set of parallel modes and acceleration strategies, including model parallelism and pipeline parallelism, and introduced the industry's first general-purpose heterogeneous parameter-server architecture, a 4D hybrid parallel strategy, and end-to-end adaptive distributed training, leading the development of large-scale distributed training technology.
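Of the strategies named above, pipeline parallelism is the easiest to visualize. The toy scheduler below is a simplified fill-drain schedule for illustration only (not PaddlePaddle's actual implementation): it shows how micro-batches flow through model stages so that different devices work concurrently instead of idling.

```python
# Toy fill-drain pipeline-parallel schedule (illustrative sketch only).
# Each model stage lives on a different device; micro-batch mb enters
# stage s at time step s + mb, so stages overlap on different batches.

def pipeline_schedule(num_stages, num_microbatches):
    """Return a list of time steps; each step lists the (stage, microbatch)
    pairs that execute concurrently in a simple fill-drain pipeline."""
    steps = []
    total_steps = num_stages + num_microbatches - 1  # fill + steady + drain
    for t in range(total_steps):
        active = []
        for stage in range(num_stages):
            mb = t - stage  # micro-batch this stage works on at time t
            if 0 <= mb < num_microbatches:
                active.append((stage, mb))
        steps.append(active)
    return steps

if __name__ == "__main__":
    # 4 stages, 8 micro-batches: as the micro-batch count grows relative
    # to the stage count, the idle "bubble" fraction shrinks.
    for t, active in enumerate(pipeline_schedule(4, 8)):
        print(t, active)
```

At step 0 only stage 0 is busy (the pipeline is filling); by step 3 all four stages run in parallel on different micro-batches, which is the overlap that makes the strategy pay off.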

Baidu's early bet on the deep learning framework, and the foresight behind it, give its large-model development a head start and a firm foundation.

Looking ahead, what we can expect is that thousands upon thousands of "models" will keep emerging rapidly; who ends up as the final "model king" will depend on full-stack layout, independence and security, and model-framework collaboration.

Origin blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/130331535