In the GPT Era, Searching for the Computing-Power Fulcrum of the iFLYTEK Spark Large Model

Source | Insights New Research Institute

The "emergence" of large models continues.

At the Zhongguancun Forum held at the end of May, experts disclosed that China had already released 79 large models with more than 1 billion parameters. At the just-concluded World Artificial Intelligence Conference, yet another batch of large models was unveiled.

The popularity of large models remains high: the World Artificial Intelligence Conference even set up an exhibition area themed "Towards General Artificial Intelligence," showcasing more than 30 large models from home and abroad.

The "emergence" of the large model we see is the surface. What is behind the "emergence" of the large model?

At the Ascend AI Industry Summit Forum, Hu Guoping, Senior Vice President of iFLYTEK and Director of the National Key Laboratory of Cognitive Intelligence, demonstrated the capabilities of the Spark (Xinghuo) large model live on stage. Behind Spark stands a supporting computing-power base, and that base has become the focus of competition among large models.

1. The latecomer moves first: the Spark model squeezes into the first echelon

It has to be said that iFLYTEK has a keen nose for opportunity.

Only 15 days after OpenAI released ChatGPT on November 30 last year, iFLYTEK launched a dedicated research program on its "1+N" cognitive intelligence model (December 15). A little over five months later, on May 6, the Spark Cognitive Model was officially released, and one month after that, on June 9, Spark Cognitive Model V1.5 followed.

According to iFLYTEK's plan, the Spark model will receive two more major upgrades within the year:

On August 15, an upgrade to its code capability and improved multimodal interaction;

On October 24, benchmarking of the general model against ChatGPT, with its Chinese performance surpassing the then-current version of ChatGPT.

One point worth noting: unlike other large models, the Spark cognitive model adopts a "1+N" structure, where "1" refers to the general cognitive intelligence model and "N" refers to its deployment in vertical industry domains.

According to Hu Guoping, the Spark model has already been deployed in education, office work, automobiles, healthcare, industry, and other fields, achieving 0-to-1 innovative applications in multiple industry scenarios.

Talk is cheap, so let's look at how the Spark model actually performed in Hu Guoping's live demonstration.

The first test was Spark's text generation. Hu Guoping set the task of "imagining, in a poetic way, the world after general artificial intelligence is realized," and the model immediately answered: "When general artificial intelligence arrives, the world changes like the wind; boundless wisdom is within reach, human life takes on a new look, and autonomous driving gallops across the land..."

In language understanding, Spark can not only untangle the relationships between contexts but also grasp the dialectical meaning and situational use of idioms such as "rather die than surrender" and "able to bend and stretch."

In knowledge question answering, Spark can combine search results with the large model's language understanding and synthesis capabilities to give more targeted answers.

Logical reasoning is the key test of a large model's intelligence. After two version iterations, iFLYTEK Spark can now cleanly complete complex constraint-based reasoning such as the classic puzzle of the farmer crossing the river with a wolf, a goat, and a cabbage.
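For context on what that reasoning task involves: the river-crossing puzzle is a small constraint-satisfaction search that a breadth-first search solves in a few lines. The sketch below is our own illustration of the puzzle, not iFLYTEK's code:

```python
# A minimal BFS solver for the classic wolf-goat-cabbage river-crossing puzzle.
# State: the set of items on the left bank, plus which bank the farmer is on.
from collections import deque

ITEMS = {"wolf", "goat", "cabbage"}

def unsafe(bank, farmer_here):
    # Without the farmer, the wolf eats the goat and the goat eats the cabbage.
    if farmer_here:
        return False
    return {"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank

def solve():
    start = (frozenset(ITEMS), True)   # everything (and the farmer) on the left
    goal = (frozenset(), False)        # everything (and the farmer) on the right
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if (left, farmer_left) == goal:
            return path
        here = left if farmer_left else ITEMS - left
        # The farmer crosses alone (None) or with one item from his own bank.
        for cargo in [None] + sorted(here):
            new_left = set(left)
            if cargo is not None:
                (new_left.remove if farmer_left else new_left.add)(cargo)
            state = (frozenset(new_left), not farmer_left)
            right = ITEMS - state[0]
            if unsafe(state[0], state[1]) or unsafe(right, not state[1]):
                continue  # prune states where something gets eaten
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "nothing"]))

print(solve())  # ['goat', 'nothing', 'wolf', 'goat', 'cabbage', 'nothing', 'goat']
```

The interesting point is not that the puzzle is hard for a computer, but that a language model must hold all the constraints in natural language and reason through them step by step, which earlier versions of such models routinely failed to do.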

According to Hu Guoping, Spark's mathematical and coding abilities have also advanced significantly since release. Its math ability can now give accurate step-by-step solutions to high-school geometry and algebra problems, and its coding ability has made new breakthroughs, with Python code generation in particular reaching a fairly high level.

In the final multimodal demonstration, following Hu Guoping's instructions, Spark quickly generated a piece of prose and, at the same time, had a virtual anchor in the image of a young woman recite it.

Clearly, Spark performs very well. After scientific and systematic evaluation, the iFLYTEK Spark cognitive model stands at the leading level among measurable domestic systems.

From project launch to release to iteration, each stage of Spark's R&D and training took remarkably little time, yet judging by the capabilities it has demonstrated, it sits firmly in the first echelon of China's large models. What secret lies behind this?

2. Beyond the amazement, a clear view of the Ascend computing-power base

Besides iFLYTEK's deep technical reserves accumulated over years in cognitive intelligence, the computing-power base provided by Ascend AI is particularly critical.

The first requirement of large-model training is massive computing power.

Industry experts have run the numbers on training a model with hundreds of billions of parameters. GPT-3, for example, requires about 314 ZFLOPs of total compute for training. With a single card delivering only 312 TFLOPS, training the model on one card would take roughly 32 years.
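A quick back-of-envelope check of that figure (our own arithmetic, assuming 100% sustained utilization):

```python
# 314 ZFLOPs of total training compute on a single 312 TFLOPS card.
total_flops = 314e21          # 314 ZFLOPs = 3.14e23 floating-point operations
card_flops_per_s = 312e12     # one card at 312 TFLOPS, fully utilized
seconds = total_flops / card_flops_per_s
print(seconds / (3600 * 24 * 365))  # ≈ 31.9 years, matching the "32 years" claim
```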

Hence, introducing distributed training and building AI chip clusters to accelerate model training has become the industry mainstream.

However, as chip clusters grow larger, the model must be sliced and distributed across the cluster in parallel, generating heavy multi-card and inter-node communication between the model slices. This places much higher demands on the cluster's communication capability.
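To make that communication concrete, here is a minimal sketch of data-parallel training in PyTorch, used purely as a generic stand-in (Ascend's actual stack, built on CANN and MindSpore with its own collective-communication library, differs): after every backward pass, gradients are synchronized across all cards with an all-reduce, and it is exactly this kind of traffic that grows as the cluster scales.

```python
# Minimal data-parallel training sketch with PyTorch DDP (generic illustration).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # wraps gradient all-reduce
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()   # gradients are all-reduced across every card here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, say, `torchrun --nproc_per_node=8 train.py`, every backward pass moves the full gradient volume between cards; once a model no longer fits on one card and must also be split by tensor or pipeline parallelism, activations and parameter shards travel between nodes as well, which is why interconnect bandwidth becomes the bottleneck the text describes.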

It is clear, then, that large-model training tests not only the size of the computing power but also the engineering and systems capabilities of the computing cluster.

Taking the Spark model as an example again: its overall training time is very short and its iteration speed very fast, which implies that beyond raw computing power, the stability and scalability of its training must also be excellent.

Let's take a look at how the Ascend AI cluster does it.

First, after a full-system upgrade, computing, storage, networking, and power are integrated together, effectively turning the AI data center into an AI supercomputer and doubling energy efficiency.

Second, an architecture built around a backplane bus enables blind-mate insertion of all nodes and precision liquid cooling, delivering higher computing-power density and a PUE below 1.15 (PUE, power usage effectiveness, is total facility power divided by IT equipment power, so 1.15 means only 15% overhead beyond the computing load itself). This makes the computing center greener and allows more flexible expansion and deployment.

Finally, multi-level reliability design at the node, cabinet, cluster, and job levels makes system-level faults diagnosable, predictable, measurable, and recoverable, sustaining stable training runs of more than 30 days for high availability.

In fact, Ascend AI began exploring thousand-card clusters as early as 2019, when the scale was about 4,000 cards, and put them into commercial use in 2020. At the just-concluded Ascend AI Industry Summit Forum, Huawei announced a comprehensive upgrade of the Ascend AI cluster, expanding it to 16,000 cards, which means a 175-billion-parameter model trained on 100B tokens of data can complete one training run in about half a day.
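A rough sanity check of the half-day claim (our own estimate, assuming the common ~6·N·D FLOPs rule for transformer training, 312 TFLOPS per card, and about 50% sustained utilization; none of these assumptions are Huawei's published figures):

```python
# Estimated wall-clock time for one training run on a 16,000-card cluster.
params, tokens = 175e9, 100e9
total_flops = 6 * params * tokens            # ≈ 1.05e23 FLOPs for one pass
cluster_flops = 16_000 * 312e12 * 0.5        # 16,000 cards at 312 TFLOPS, 50% util
print(total_flops / cluster_flops / 3600)    # ≈ 11.7 hours, i.e. about half a day
```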

Supporting the R&D and training of the Spark model is only one microcosm of Ascend AI's capabilities. At a higher level, Ascend AI has participated in building more than 20 artificial intelligence computing centers across the country; seven cities, including Wuhan, Beijing, Xi'an, Chengdu, Dalian, and Shenyang, have received national recognition, becoming the Ministry of Science and Technology's first batch of national open innovation platforms for new-generation artificial intelligence public computing power.

At the same time, Ascend AI supports the development of nearly half of China's home-grown models. According to the "China Artificial Intelligence Large Model Map Research Report" released in May this year, more than 30 domestic large models with over 1 billion parameters were natively developed on, open-sourced for, or adapted to Ascend, covering NLP, multimodal, cloud, speech, and other fields.

Having supported so many projects, Ascend AI has accumulated deep experience. In promoting the deployment of large-model applications, Ascend AI is therefore not only a provider of computing power but also, starting from efficiency, a shaper of the large-model development process.

Large-model development initially followed the traditional API-based mode. By providing a series of large-model development kits, Ascend AI has moved it toward a model-based mode in which full-process script development takes only a few dozen lines of code, lowering the threshold for large-model development.
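As an illustration of what "a few dozen lines" of full-process development can look like, here is a sketch using the open-source Hugging Face Transformers Trainer as a generic stand-in; it is not Ascend's own development kit, whose APIs we do not reproduce here, and the base model and corpus file are placeholders:

```python
# Full-process fine-tuning in a few dozen lines (generic toolkit illustration).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "corpus.txt" is a hypothetical plain-text training corpus.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # data prep, training loop, and checkpointing handled by the kit
```

The point of such kits is exactly what the paragraph above describes: the developer writes configuration-level script code, and the toolkit supplies the training loop, distribution, and checkpointing underneath.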

Clearly, facing the many difficulties of large-model development and training, Ascend AI chose to take the challenge head-on. For Ascend AI itself, this is an early claim staked in the computing-power competition of the large-model era; for the domestic large-model landscape as a whole, building on independently innovated software and hardware is a true demonstration of the country's scientific and technological strength.

3. On the road of innovation, China's AI needs more fellow travelers

The era of large models has only just begun, and many uncertainties remain. The only certainty is continuous demand for computing power.

Hu Guoping predicted three trends in the development of large models.

First, more new large models will appear, and as existing models keep iterating, their data scale will grow further, driving up computing-power requirements.

Second, as large models grow more capable, they will take input from and produce output for more sensors and actuators, and the boundary of the large model will spread further, consuming still more computing power.

Third, in the future everyone may have an exclusive large model or assistant of their own. Personal assistants built around individual learning and life will evolve and upgrade continuously, which poses challenges for system-level solutions.

It is not hard to see that all three trends hinge on computing power. In Hu Guoping's view, the large model resembles the brain at the level of principle: both combine on the order of 100 billion neurons, receive input stimuli, and then generate intelligent output, sharing a similar mechanism of activation and operation.

This also means that "what the brain can do, the large model can also achieve." The potential of large models is unlimited, and the exploration of the computing-power base is endless.

Of course, computing power alone is not enough to make a large model.

Zhang Bo, academician of the Chinese Academy of Sciences, professor in Tsinghua University's Department of Computer Science, and honorary dean of Tsinghua's Institute for Artificial Intelligence, believes that ChatGPT's success rests not merely on the three elements of data, computing power, and algorithms, but on four: knowledge, data, algorithms, and computing power.

That is, we obtain data from text and then extract knowledge from that data. This transformation led to today's ChatGPT, and it was realized through breakthroughs in three technologies: text semantic representation based on word embeddings, the Transformer based on the attention mechanism, and self-supervised learning based on predicting the next word.
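For readers who want the third of those breakthroughs made concrete, here is a minimal, self-contained PyTorch sketch (our own illustration, not any specific model's code) of the next-word-prediction objective, in which the training target is simply the input sequence shifted by one token, so no human labels are needed:

```python
# Word embeddings + attention + next-token prediction, in miniature.
import torch
import torch.nn.functional as F

vocab, d_model, seq_len = 1000, 64, 16
embed = torch.nn.Embedding(vocab, d_model)            # word-embedding representation
block = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
head = torch.nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (2, seq_len))        # a dummy batch of token ids
causal = torch.nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
hidden = block(embed(tokens[:, :-1]), src_mask=causal)  # attention over the prefix
logits = head(hidden)

# Cross-entropy between position t's prediction and the true token at t+1:
# the text itself supplies the supervision signal.
loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
print(loss.item())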

Seen this way, data, algorithms, and computing power appear independent but are tightly interwoven in a large model, which is exactly why building an industrial ecosystem matters so much.

Ascend's AI industry ecosystem is developing rapidly. To date it has attracted more than 30 hardware partners and more than 1,200 ISVs, and has jointly launched more than 2,500 industry AI solutions. This ecosystem can transfer directly to the large-model industry.

In talent cultivation, more than 300 colleges and universities have partnered with Ascend AI, training over 100,000 professional AI practitioners every year. The number of Ascend AI developers is also growing fast, exceeding 1.8 million this year.

On this foundation, Ascend AI announced at the forum that four ecosystem partners, iFLYTEK, Zhipu AI, CloudWalk Technology, and ModelBest (Mianbi Intelligence), have jointly released an integrated training-and-inference solution for large models, accelerating deployment so that large models can deliver value in more vertical industries such as smart cities, smart finance, smart coal mining, and smart manufacturing.

There is no doubt that large models will usher in an era of their own. But if that era has arrived, its decisive period is certainly not the opening year. Like other disruptive industrial technologies, the development of large models is destined to be a race of time and endurance.

Of course, while the bullets are still flying, before the decisive phase of the large-model era arrives, we need more companies like iFLYTEK, and we urgently need an Ascend AI that can supply powerful computing power.

Origin blog.csdn.net/DJXYS0309/article/details/131649910