Huawei's latest large model is here! Pangu 3.0 arrives at 100-billion-parameter scale with 3 trillion training tokens, vowing to "do real work, not write poetry"

Source | Qubit (public account QbitAI)

Huawei's large-model move is finally here!

Pangu Large Model 3.0 is officially released today.

The underlying base models come in four versions: 10 billion, 38 billion, 71 billion, and 100 billion parameters, pre-trained on more than 3 trillion tokens.


But contrary to earlier rumors, Pangu 3.0 is not a Pangu-flavored ChatGPT; it is a series of industry-oriented large models.

In Huawei's own words:

The Pangu large model does not write poetry; it does things.

(The keyword "generative" was not mentioned once during the entire event.)

Accordingly, in the on-site demonstrations, Huawei let the industry models do the talking.

For example, the government-affairs model was asked to judge which vehicles in a photo, besides the truck, had violated regulations. The model marked three cars and gave its reasoning.


At the same time, the Ascend AI cloud service, offering 2000 PFLOPS in a single cluster, went live simultaneously in Ulanqab and Gui'an.

"100-billion-scale models show emergent abilities and chain-of-thought reasoning"

Pangu 3.0, which does not want to write poetry, wants to serve industries.

This is reflected in its structure. Pangu 3.0 is divided into three layers:

  • L0: Basic large models, including natural language, vision, multimodality, prediction, and scientific computing;

  • L1: N industry-specific large models, covering fields such as government affairs, finance, manufacturing, mining, and meteorology;

  • L2: Finer-grained scenario models, providing "out-of-the-box" model services.

Among them, the L0-layer base models are responsible for providing general capabilities.

These divide into natural-language and multimodal large models, with capabilities covering dialogue Q&A, copywriting, image generation, image understanding, and more.


The pre-training data contains more than 3 trillion tokens, drawn from over 1,000 TB of raw data, and the instruction fine-tuning data runs into the tens of millions of examples.


The Pangu base model is a highly extensible sparse-plus-dense language model.

The 100-billion-parameter dense model already exhibits emergent abilities and chain-of-thought reasoning, forming the base; through sparsification, it can be turned into different "industry experts," which makes inference more efficient.
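The article does not describe Pangu's internals, but turning one dense base into sparse "industry experts" resembles mixture-of-experts routing, where a gate activates only a few experts per input. A toy sketch of top-k gating (all names and numbers hypothetical, for illustration only):

```python
# Toy sketch of sparse top-k expert routing (illustrative only; not
# Pangu's actual architecture, which the article does not detail).

def route_top_k(gate_scores, k=2):
    """Pick the k highest-scoring experts; only they run at inference."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

def sparse_forward(x, experts, gate_scores, k=2):
    """Combine the selected experts' outputs, weighted by gate score."""
    chosen = route_top_k(gate_scores, k)
    total = sum(gate_scores[i] for i in chosen)
    return sum(gate_scores[i] / total * experts[i](x) for i in chosen)

# Example: 4 "industry experts", but only 2 are evaluated per input,
# so inference cost stays roughly constant as experts are added.
experts = [lambda x, s=s: x * s for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 0.5, 0.1, 0.3]  # gate output for this input
print(sparse_forward(10.0, experts, scores))  # only experts 1 and 3 run
```

The point of the sketch is the efficiency claim in the text: adding experts grows capacity, but each input still pays for only k of them.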


The L1 layer consists of the N industry-specific large models.

Here, Huawei has used public industry data to train general-purpose large models for multiple industries, including government affairs, finance, manufacturing, mining, and meteorology.

In meteorology, for example, the latest results of the Pangu weather model have just been published in the main journal of Nature; it takes only 1.4 seconds to complete a 24-hour global weather forecast.

In addition, on top of L0 and L1, an industry can fine-tune or retrain its own dedicated large model using its own data.


The L2 layer provides finer-grained scenario models that emphasize "out-of-the-box" use, targeting specific industry applications or business scenarios such as government-affairs hotlines, network assistants, lead-drug screening, foreign-object detection on conveyor belts, and typhoon-track prediction.

It is understood that, to adapt quickly to industry needs, the Pangu large model adopts a fully decoupled layered design.

On top of the L0 and L1 models, HUAWEI CLOUD also provides a large-model industry development kit; through secondary training on your own data, you can obtain your own dedicated industry large model.

At the same time, to meet customers' differing data-security and compliance requirements, the Pangu large model offers diversified deployment forms: public cloud, dedicated large-model cloud zone, and hybrid cloud.

At the bottom layer, Huawei has built an AI computing-power cloud platform on Kunpeng and Ascend, together with the heterogeneous computing architecture CANN, the full-scenario AI framework MindSpore, and the AI development pipeline ModelArts, providing key capabilities for large-model development and operation such as distributed parallel acceleration, operator and compilation optimization, and cluster-level communication optimization.

Based on Huawei's AI root technologies, large-model training performance can be tuned to 1.1 times that of mainstream GPUs in the industry.


HUAWEI CLOUD's Ascend AI cloud service, with 2000 PFLOPS per cluster, went live simultaneously in Ulanqab and Gui'an.

Disclosed figures show that for thousand-card training runs, the Ascend AI cloud service achieves a 30-day long-run stability rate of 90%, with breakpoint recovery taking no more than 10 minutes.

In addition to Huawei's own MindSpore, it supports mainstream AI frameworks such as PyTorch and TensorFlow; 90% of the operators in these frameworks can be smoothly migrated from GPU to Ascend with Huawei's migration tool.

For example, Meitu migrated 70 models to Ascend in just 30 days. HUAWEI CLOUD and the Meitu team also jointly optimized more than 30 operators and parallelized the pipeline, improving AI performance by 30% over the original solution.

The weather large model is published in the main journal of Nature

After demonstrating the basic capabilities of Pangu 3.0, Huawei also disclosed data on a series of its industry applications.

Recently, the Pangu weather model was featured in Nature.

It is reported that the Pangu weather model is the first AI forecasting model whose accuracy exceeds that of traditional numerical forecasting methods, while forecasting speed is also greatly improved.

Previously, predicting a typhoon's track over the next 10 days required 5 hours of simulation on a high-performance computing cluster of 3,000 servers. Now, with the pre-trained Pangu weather model, researchers need only a single card on a single server to obtain more accurate predictions via AI inference within 10 seconds.
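Taken at face value, the quoted figures imply a speedup of more than three orders of magnitude; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope speedup implied by the article's figures.
old_seconds = 5 * 3600   # 5 hours on a 3,000-server HPC cluster
new_seconds = 10         # single card, single server, AI inference
speedup = old_seconds / new_seconds
print(f"~{speedup:.0f}x faster")  # ~1800x faster
```

And that is before counting the hardware reduction from a 3,000-server cluster to a single card.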


In drug R&D, developing an original new drug takes an average of 10 years and costs US$1 billion. The Pangu drug-molecule large model helped Professor Liu Bing's team at the First Affiliated Hospital of Xi'an Jiaotong University discover the world's first new-target, new-class antibiotic in 40 years, shortening the lead-drug development cycle to one month and cutting development costs by 70%.

In railways, the Pangu railway large model can accurately identify the 67 types of freight cars and more than 430 fault types running on the current network, with a screening rate for fault-free images as high as 95%, freeing inspectors from sifting through millions of images every day.


Zhang Ping'an, executive director of Huawei and CEO of Huawei Cloud, gave the most concise summary of the latest developments:

The Pangu large model will give every industry, every company, and every person their own expert assistant, making work more efficient and easier.

We will always adhere to the AI for Industries strategy and keep moving forward on the road of industry depth. I firmly believe that large models will reshape thousands of industries, and every developer will be a hero who changes the world.



Origin blog.csdn.net/lqfarmer/article/details/131742505