National-level applications, how can 1.4 billion terminals use AI?


Text | Hao Xin; Editors | Liu Yuqi, Wang Yisu

The application of AI has always had two extremes:

One is to use the strongest computing power to run the most complex models in data centers; the other is to use the smallest models to bring the most universal applications to smart terminals.

Beyond pursuing the emergence of intelligence through massive computing power, the industry keeps chasing another "extreme challenge": how to compress AI models so they can run on all kinds of smart terminals.

Among the world's top AI conferences, there is a category of competitions dedicated precisely to this: running AI on terminals with high performance and efficiency.

ICCV, held every two years, is one of the three top conferences in computer vision. ICCV 2023 represents the most advanced work in the field and has been called "the strongest on Earth."

At the end of August, the Alipay xNN team, driven by "technical self-motivation" and armed with solid skills, entered the conference's CV model inference and training competitions for the first time and came away with multiple championship and runner-up finishes. In the Inference Challenge, its two teams, shanwei_zsw and zhaoyu_x, took first and second place, leading the third-place entry by more than 45%. In the Training Challenge, shanwei_zsw finished second by a margin of just 0.04.

The difficulty of the competition lies in squeezing the most performance out of a model under tight resource limits: GPU memory capped below 6GB at runtime, training cost held within nine hours, image-recognition accuracy above 95%, and inference over thousands of images completed within one minute.

"The time we invested was relatively short, and we approached the competition calmly; the contest topics are our daily work," said Wang Shihao, an Alipay xNN engine engineer who took part in the challenge.

Through the xNN team's technical practice, we can also glimpse the latest progress of AI on the device side: from 1.0 to 2.0, device-side AI has climbed to a "small peak," with "good and fast" becoming the main logic of technology delivery.

The gears of history are starting to turn again. With the advent of large models, the implementation of AI in smart terminals has ushered in a new stage.

01 Device-side deployment problem: limited resources VS large parameter AI model

Maybe you don't know the xNN team, but you must have used their technology.

Scanning a QR code to pay, scanning for the Five Blessings during Spring Festival, wealth-management recommendations, short-video livestreams... the xNN team is behind all of these familiar scenes.

Zhou Dajiang, leader of the xNN team, told Light Cone Intelligence that the team is responsible for all device-side intelligence applications within Ant Group and provides the on-device engine that supports them.

Deep learning had only just taken off in 2016. In 2017, the computer-vision-based "Scan for Five Blessings" officially launched in the Alipay app, which also became the starting point for the xNN technical team.

"The blessing-scanning activity was first tried on the cloud, that is, using AI on the server to recognize the character 福. But the cost and effect turned out to be unsatisfactory, so we moved it to the device," Zhou Dajiang said.

"Scanning for Five Blessings" involves a chain of AI steps: scanning, recognition, judgment, computation, inference, and feedback. Implemented via the cloud path, the camera must first scan and then transmit the image to a server, which returns results after processing. That adds network communication overhead, cloud transmission time, and feedback latency; overall, users wait too long and the experience suffers.

After exploration, the xNN team found that the cloud could not meet users' real-time needs. For the application to reach ordinary households, it had to be deployed on the device side.

As Wang Shihao put it, machine learning on the device has many advantages. First, it cuts deployment cost: everything from model inference to decision-making happens on the device, saving cloud operation and maintenance resources. Second, once resources are loaded, there is no need to repeatedly request services or transfer data; AI results are obtained on the device, which suits high-frequency scenarios such as image recognition. Finally, data does not need to be sent to the cloud, which helps protect user privacy.

The technical direction has been verified to be correct, but the implementation process is full of complications.

Taking the blessing-scanning scene as an example, it brings together several AI technologies: image recognition, speech recognition, natural language processing, and sensor data fusion. Image recognition, the most common, quickly and accurately identifies the character "福" that the user scans and determines whether it is correct; sensor fusion combines the phone's motion sensors to help users who rely on accessibility features scan the character; speech synthesis and recognition support voice-interaction gameplay such as "Voice Fighting the Nian Beast."

The difficulty of device-side deployment is equally prominent: compared with cloud servers, device resources are extremely limited and cannot host overly complex models.

Resource constraints on the device side run through the entire pipeline. Recognition relies on the camera, and phone cameras vary in quality, some sharp, some blurry; the GPU and CPU used for model training and inference determine processing speed; and limited storage space constrains model size.

But AI models are large. A simplified model is like the systems of equations in three unknowns from middle school: training and inference can be understood as solving the equations, except that these equations have millions of unknowns. Each unknown is a "parameter," and each system of simultaneous equations is one layer of the "neural network." Commonly used simple models may have 3-6 layers, and complex ones 50 or more, which gives a sense of how large the parameter counts are.
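The equation analogy can be made concrete by counting the unknowns in a small fully connected network; the layer sizes below are illustrative, not any particular production model:

```python
# Parameter count of a fully connected network: each layer's weight
# matrix contributes in_dim * out_dim unknowns, plus out_dim biases.
def count_params(layer_dims):
    total = 0
    for in_dim, out_dim in zip(layer_dims, layer_dims[1:]):
        total += in_dim * out_dim + out_dim
    return total

# Even a modest 5-layer image model has hundreds of thousands of unknowns.
dims = [784, 256, 128, 128, 64, 10]
print(count_params(dims))  # 259274
```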

"As a result, a gap formed between limited resources and large-parameter AI models." Li Jiajia, head of Alipay's terminal technology department, put the xNN team's direction plainly: fitting an AI model onto the device is like "holding a ceremony inside a snail shell."

But for a national-level APP with a billion users, the technical problems to be solved are far more than that.

Alipay serves people of all ages. Among its billion users, more than 55% come from third- and fourth-tier cities, where low-end phones priced at a few hundred yuan have long been mainstream. That means the technical bar cannot be set too high: the design must adapt to all kinds of older phone hardware and operating systems.

Alipay also faces massive concurrency. Promotional events such as the Spring Festival blessing scan often involve hundreds of millions of people; by one statistic, one in every two to three Chinese people has taken part. The instantaneous surge of traffic is another challenge for the device side.

Given the nature of device-side AI itself and Alipay's status as a national-level application, two heavy mountains weigh on the xNN team's shoulders.

02 From AI 1.0 to 2.0, find the optimal solution for the end-side model

To break the task down further, the xNN team decided to first solve the common problems of device-side AI, then Alipay's own specific ones. Following this logic, its technology has iterated through two stages: AI 1.0, model lightweighting, and AI 2.0, scalable modeling.

Model lightweighting, as the name suggests, means optimizing the AI model to be as small as possible. To this end, the xNN team developed model compression technology.

(Figure: Several basic methods of model compression)

“The core of model compression is to make the model as small as possible, so that it can fit into a mobile phone,” Wang Shihao said.

Compressing a model smaller does not have to mean worse performance. According to the xNN team, the key is whether the model can shrink further while achieving the same task quality, for example, still recognizing the character "福" with high accuracy after compression.
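As one illustration of the basic methods in the figure above, int8 post-training quantization stores each weight in a quarter of the space of float32 while keeping values close to the originals. This is a generic textbook sketch, not xNN's proprietary pipeline:

```python
import numpy as np

# Minimal symmetric int8 post-training quantization of a weight matrix.
def quantize_int8(weights):
    scale = np.abs(weights).max() / 127.0   # map the largest weight to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

np.random.seed(0)
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes / w.nbytes)  # 0.25: int8 storage is a quarter of float32
```

Rounding error per weight is bounded by half the scale, which is why accuracy on tasks like recognizing 福 can often be preserved.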

This raises a question: how small should an on-device AI model be?

Zhou Dajiang told Light Cone Intelligence that the industry has not yet formed a standard on this, which means each app may have its own soft target. "In practice, the larger the size, the bigger the problem. The industry generally keeps models to around ten megabytes, but Alipay prefers an order of magnitude smaller: most applications are kept within one megabyte," Zhou Dajiang said.

Compressing the model mainly solves the "good" problem: making it possible to run complex models on device-side hardware at all. Device-side high-performance computing engine technology solves the "fast" problem, using various optimizations to make model computation efficient.

The purpose of the high-performance computing engine is to let models make full use of a device's computing resources. A core indicator is computing-power utilization: how efficiently the hardware is used for model computation. If a phone can theoretically perform 10,000 operations per second, the engine determines whether those 10,000 operations are fully used.

Thanks to the two trump cards of model compression plus a high-performance computing engine, the xNN team achieved device-side deployment for most scenario applications in the 1.0 stage. The "slimming down" of 1.0 also laid the foundation for the "flexibility" of 2.0.

"In the past, one model was used all over the world, but now it is about finding the most suitable model for different devices," Wang Shihao said.

Scalable modeling addresses a national-level application's pain points: a broad user base and diverse hardware environments.

For a long time, different phone models shared the same AI model, which led to poor coverage: either adapt to low-end phones, wasting the resources of high-end ones and compromising business results and user experience, or cater to high-end phones and simply lose some low-end users.

Now, instead of producing one model, many models are produced at once through technical means. Each can match a different user's phone, making the most of its computing power and delivering the model's full effect to the user. It is like resolution settings in a game: you can choose ultra-high definition or low resolution. That is the meaning of "scalability."
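The game-resolution analogy can be sketched as a device-tier lookup: pick the largest model variant a phone can afford. The tier thresholds and variant names below are hypothetical illustrations, not xNN's actual policy:

```python
# Scalable modeling sketch: choose a model variant by device capability.
# Thresholds ordered from most to least demanding; (min_ram_mb, min_gflops, name).
VARIANTS = [
    (6144, 800, "large"),
    (3072, 300, "medium"),
    (0,    0,   "small"),
]

def pick_variant(ram_mb, gflops):
    # Return the first (largest) variant whose requirements the device meets.
    for min_ram, min_flops, name in VARIANTS:
        if ram_mb >= min_ram and gflops >= min_flops:
            return name
    return "small"

print(pick_variant(8192, 1200))  # high-end phone -> large
print(pick_variant(2048, 100))   # low-end phone  -> small
```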

The conditions for a "small and fast" model are tightly coupled, and scalable modeling can find the optimal solution in one step. At the same time, the tools and parameter tuning involved grow ever more complex. Having passed through model lightweighting into the AI 2.0 era, the real technical competition has only just begun.

With the advent of large models, the implementation of AI in smart terminals will usher in a new technological stage.

03 How far can a large model run on a mobile phone?

Keep optimizing; make the AI model smaller step by step so it runs well and fast on phones.

Looking back at the xNN team's technological iterations, this thread running through them is also a microcosm of device-side intelligence as a whole.

However, in the new stage of "larger-parameter, more complex" large-model applications, the technology has both changed and stayed the same.

What remains unchanged, the xNN team told Light Cone Intelligence, is the core contradiction: achieving good and fast results with limited resources, which large models only deepen further. The specific evaluation indicators are also essentially the same as before. Once device deployment is involved, there is no escaping model size, performance, and speed.

The biggest variable is large models with billions or tens of billions of parameters. The current reality is that phone computing power and storage have not improved by leaps and bounds, while huge models can't wait to become the protagonists of a new era. This is what makes it hard for large-model applications to cross the "last mile."

(Figure: Parameter statistics of several pre-trained models)

"Most of the tasks requiring extreme optimization in stages 1.0 and 2.0 have been completed; the sudden jump in model size has become the biggest technical challenge for device-side AI today."

The xNN team believes there is technological continuity from 1.0 to the current stage of large-model deployment. The exploration and building capabilities from past work, the experience of finding "good and fast" optimal solutions, and the accumulated scenario operations are all cornerstones of the new large-model stage.

Li Jiajia, head of Alipay's terminal technology department, believes that on the one hand we need to stand on the "shoulders of giants" and keep optimizing existing technology toward higher standards; on the other, we may need new algorithm designs and breakthroughs tailored to the characteristics of large models.

In their telling, although we hold the key to device-side AI, there is still a long way to go before large models truly run on phones.

By Light Cone Intelligence's observation, large-model applications on phones at this stage still rely almost entirely on cloud computing power; device-side adaptation remains exploratory.

Currently, phone makers such as Huawei, Xiaomi, Vivo, and OPPO are optimizing chip hardware and operating systems around large models, working from the bottom framework to upper-layer applications to provide more computing resources and better performance support for large models running on phones.

Li Jiajia explained that how large models will land on phones is still under discussion.

"Ant Group recently released a series of self-developed foundational large models. To apply them on terminal devices such as phones, we focus on two key technologies. On the one hand, we further improve the computing efficiency of the on-device inference engine, fully tapping the phone's distinctive computing resources to raise large-model execution efficiency, which is crucial to user experience. On the other hand, we use technical means to compress large models, reducing their memory and storage consumption without degrading their effect. Beyond these two key technologies, we also attach great importance to collaboration with phone manufacturers, combining the iteration of their hardware capabilities with our own scenario and technology advantages to accelerate the application of large models on phones."

With the technical ideas and paths clear, when will large models actually run on phones?

"The next one to three years will be critical," judged Li Jiajia. "The key indicator is still model size. Some large models currently consume several gigabytes of physical storage and running memory; that should be brought down to at least the 500MB-1GB level."
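The arithmetic behind that target is straightforward: a model's storage footprint is roughly its parameter count times the bytes stored per weight. The parameter count and bit widths below are illustrative, not figures for any specific Ant model:

```python
# Rough model storage footprint in MB: parameters x bits per weight / 8.
def model_size_mb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / (1024 ** 2)

# A hypothetical 7B-parameter model in float16 vs. 4-bit quantization:
print(round(model_size_mb(7e9, 16)))  # 13351 MB: far beyond a phone
print(round(model_size_mb(7e9, 4)))   # 3338 MB: still above the 1 GB target
```

The gap shows why both aggressive compression and smaller base models are needed to reach the 500MB-1GB level quoted above.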

The singularity moment that opens up the "last mile" and lets 1.4 billion people use large models on their phones may be arriving soon.



Origin blog.csdn.net/GZZN2019/article/details/132911594