A senior technical expert from Alibaba's Taobao technology department tells InfoQ: on-device intelligence will definitely become a core driving force of business innovation...

In recent years, more and more companies have turned their attention to on-device intelligence. Some leading companies have explored new approaches in this area and achieved good results, and on-device intelligence has gradually become one of the core drivers of business innovation in mobile apps. What challenges arise when advancing on-device intelligence, and what are the core solutions?

Recently, InfoQ invited Lu Chengfei, a senior technical expert in Alibaba's Taobao technology department, to talk about how on-device intelligence is applied at Taobao and the technical challenges behind Double Eleven. He will also give a talk titled "Building On-Device Intelligence Technology and Driving Business Innovation at Taobao" at the QCon Global Software Development Conference (Shenzhen) on December 6-7. This article was compiled and published by the technical WeChat official account.

Lu Chengfei (nickname: Lu Xing) is a senior wireless development expert at Alibaba with in-depth thinking and hands-on experience in mobile development, super-app architecture, and on-device AI. He joined Baidu after graduating in 2011 and took part in building Baidu Input Method from zero to one. He joined Taobao in 2013 and went through the full technical evolution of the Mobile Taobao super app, leading the Taobao iOS architecture upgrade, architecture governance, and stability and performance work. In 2017 he began exploring on-device intelligence, building the open source on-device inference engine MNN, the on-device computing framework Walle, an AR technology framework, and innovative applications such as beauty AR.


The following is the interview record:

InfoQ: Hello, Lu Xing, and thank you for speaking with us. You started exploring on-device intelligence in 2017. How do you think the field has changed over the past three years?

Lu Xing: At the macro level, on-device intelligence applications are gradually moving beyond exploration and experimentation, and in the future they will surely become one of the core technical driving forces for commercial applications and business innovation. More specifically, the industry's progress in on-device intelligence can be viewed from three perspectives:

  • From a technical point of view, the problems being solved have advanced step by step, from the basics of running models to questions of efficiency and large-scale application. Specifically: How does an algorithm model run on the device? How can algorithm models be iterated and deployed quickly? How can the barrier to on-device AI be lowered so that it can be applied broadly?

  • From an algorithm point of view, on-device algorithms keep maturing. Starting with face detection, then human pose, gestures, OCR, and so on, vision models have gradually matured. Beyond vision models, it has gradually become possible to run deep search and recommendation models, speech recognition (ASR) models, and NLP models on the device. For example, this year we implemented a real-time speech recognition solution on mobile devices based on MNN, and it achieved very good business results in the "Guess the End" event on Taobao Live during Double 11.

  • From an application point of view, the scope of application keeps expanding and deepening, from initial single-point scenarios such as Pailitao in Taobao to full deployment across multiple apps and scenarios. According to incomplete statistics, Alibaba has more than 30 on-device intelligence applications based on MNN.

InfoQ: What are the main stages in the development of on-device intelligence at Taobao?

Lu Xing: The on-device AI application process at Taobao is shown above. Every node in it involves many problems, and we have been solving them over the past three years, in roughly three stages:

  1. On-device inference engine stage: On-device intelligence must first solve the problem of running algorithm models on the device; without that, nothing else is possible. The inference engine is the jewel in the crown of on-device intelligence. At this stage we built the on-device inference engine MNN so that models run efficiently on the device.

  2. Algorithm model service stage: Beyond running the model, bringing on-device intelligence into a business involves model conversion, release and update, version management, and operations monitoring. At this stage we built the on-device AI server to solve the release and update of algorithm models. In particular, an algorithm task involves pre- and post-processing code in addition to the model, so we built a PythonVM-based runtime container for algorithm tasks, which lets algorithm engineers write tasks in Python and iterate quickly.

  3. On-device AI R&D paradigm stage: Applying on-device intelligence at scale requires a systematic solution to R&D iteration. On the one hand, shipping an on-device intelligence application requires algorithm developers and mobile developers to collaborate, but there is a natural gap between the two; collaboration relies entirely on verbal communication and is inefficient. On the other hand, AI application scenarios are long-tailed and fragmented. Many scenarios never ship for lack of professional algorithm support, and without unified technical foundations it is hard to consolidate and reuse existing solutions. We therefore built the "on-device AI R&D paradigm", which consists of the MNN Workbench, the MNN runtime, and the on-device AI server. It has two core ideas: first, decouple algorithm development from mobile development so that algorithm development can iterate independently; second, lower the barrier to AI so that it becomes a powerful tool for ordinary developers solving business problems. I will share the details at this QCon.
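To make the second stage more concrete, here is a minimal sketch of what an "algorithm task" that bundles a model with its pre- and post-processing code might look like in Python. Everything in it is an assumption for illustration: the `AlgorithmTask` class, its fields, and the toy model are hypothetical and are not the API of MNN or of the PythonVM container described in the interview.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlgorithmTask:
    """Hypothetical task bundle: a model call plus pre- and post-processing."""
    name: str
    version: str  # tasks are versioned so they can be released and updated independently
    preprocess: Callable[[Sequence[int]], List[float]]
    model: Callable[[List[float]], List[float]]  # stand-in for an inference engine call
    postprocess: Callable[[List[float]], str]

    def run(self, raw_input: Sequence[int]) -> str:
        features = self.preprocess(raw_input)
        scores = self.model(features)
        return self.postprocess(scores)


def normalize_pixels(pixels: Sequence[int]) -> List[float]:
    # Pre-processing: scale 0..255 pixel values into 0.0..1.0.
    return [p / 255.0 for p in pixels]


def toy_model(features: List[float]) -> List[float]:
    # Stand-in for real inference: returns [dark score, bright score].
    mean = sum(features) / len(features)
    return [1.0 - mean, mean]


def pick_label(scores: List[float]) -> str:
    # Post-processing: map the highest score to a human-readable label.
    labels = ["dark", "bright"]
    return labels[scores.index(max(scores))]


task = AlgorithmTask("brightness", "1.0.0", normalize_pixels, toy_model, pick_label)
print(task.run([250, 240, 255, 230]))
```

Because the whole task is plain Python, a new version of the pre- or post-processing logic could in principle be shipped together with the model without rebuilding the host app, which is the kind of rapid iteration the answer describes.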

InfoQ: What difficulties has Taobao's technology team encountered in putting on-device intelligence into production? What do you think is the biggest challenge, and how was it resolved?

Lu Xing: Taobao's rich business scenarios have always been fertile ground for innovative technology, and our on-device intelligence technology and practice have been at the forefront of the industry. We have the open source inference engine MNN and the openly available MNN Workbench. At present, Taobao has 25+ application scenarios and 65+ algorithm models running daily, with more than 10 billion inference runs per day, covering core scenarios such as product search and recommendation, user reach, Pailitao, and live streaming. These have been tested by three Double 11 events and have delivered great business value. The applications fall roughly into the following categories:

  • Vision: mainly used in scenarios such as Pailitao, Taobao Live, shooting tools, and reviews.

  • Recommendation: mainly used in recommendation scenarios such as the homepage feed, the post-purchase page, and the product detail page.

  • User reach: mainly used in scenarios such as Push, messaging, and various business bulletins.

  • Voice: mainly used in scenarios such as Taobao Live and intelligent noise reduction.

So far, the biggest challenges have come from the inference engine MNN, such as:

  • Fragmentation of mobile devices and systems;

  • Limited mobile computing power and resources;

  • Diversified algorithm models such as vision and speech;

  • ……

I will not go into detail here on how we solved these challenges; I will focus on the core solutions in my talk at QCon Shenzhen 2020.

InfoQ: In the Double 11 that just ended, where did on-device intelligence stand out in real applications? Could you walk us through some actual cases?

Lu Xing: On-device intelligence has gradually changed from trial application into one of the core driving forces of business innovation, and related applications could be seen in the hottest business scenarios of Double 11. Live streaming, which was very popular this year, had many such applications. Relying on Taobao's self-developed MNN, Taobao Live launched a "guess the price by voice" challenge: viewers could interact by voice in the live room and respond to the host's price-guessing task simply by speaking. On-device intelligence greatly improved the interactivity of live content and the accuracy of content understanding.

With precise user perception built on on-device AI, we made full use of on-device computing power and data during the Double 11 traffic peak and greatly improved the experience and effectiveness of proactively reaching users. On November 1st alone, on-device AI decisions were executed 27.7 billion times.

Real-time perception of user behavior, intent recognition, product list re-ranking, and intelligent refresh were applied at scale in scenarios such as the Taobao feed, and DPV and GMV improved significantly.
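As a rough illustration of on-device re-ranking driven by real-time behavior, the sketch below boosts candidate items whose category the user has clicked recently. The scoring rule, the `boost` parameter, and the data layout are assumptions made for this example; the interview does not describe the actual models or features used at Taobao.

```python
from collections import Counter
from typing import Dict, List


def rerank(candidates: List[Dict], recent_clicks: List[str], boost: float = 0.5) -> List[Dict]:
    """Re-order candidates by base score plus a boost for recently clicked categories."""
    if not recent_clicks:
        # No local behavior signal yet: fall back to the server-side scores.
        return sorted(candidates, key=lambda item: item["score"], reverse=True)

    clicks = Counter(recent_clicks)
    total = len(recent_clicks)

    def adjusted(item: Dict) -> float:
        # Share of recent clicks that fall in this item's category (0.0 if none).
        share = clicks[item["category"]] / total
        return item["score"] + boost * share

    return sorted(candidates, key=adjusted, reverse=True)


candidates = [
    {"id": "A", "category": "shoes", "score": 0.80},
    {"id": "B", "category": "phones", "score": 0.75},
    {"id": "C", "category": "books", "score": 0.60},
]
recent_clicks = ["phones", "phones", "books"]

reranked = rerank(candidates, recent_clicks)
print([item["id"] for item in reranked])  # prints ['B', 'A', 'C']
```

The point of doing this on the device is that the click signal is available immediately and never has to leave the phone; a production system would replace the hand-written boost with a learned model.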

InfoQ: Could you briefly talk about what is next for MNN?

Lu Xing: In essence, the inference engine MNN does one thing: it makes [different types of models] run [as efficiently as possible] on [different heterogeneous devices]. There are three key points here, and we keep evolving and exploring each of them.

  1. Supporting different types of models: from CV and data algorithm models to ASR and NLP models. MNN has recently been upgraded for control flow and dynamic graphs, and now supports network models such as Transformer.

  2. Supporting different heterogeneous devices: support has kept evolving, from client CPUs (ARMv7/64/v8.2) to GPUs (OpenCL/Vulkan/Metal). MNN has also begun to support server-side inference on Intel x86 and NVIDIA GPUs, providing a unified device-cloud inference service. Because implementing and optimizing every OP for each heterogeneous device makes development too costly, we proposed a geometric computing architecture that converges the OPs to about 20 core operators, so that each heterogeneous backend can be covered at low cost. MNN should be the inference engine with the most comprehensive heterogeneous backend coverage in the industry.

  3. Achieving the most efficient operation: high performance has always been one of MNN's core advantages and is widely recognized in the industry. Offline, models are optimized through model compression, graph fusion, and similar methods; online, they are optimized through assembly, SIMD and parallelization, matrix algorithms, and scheduling. In addition, MNN works with PAI on an integrated device-cloud solution that spans training, quantization, and MNN deployment, adding compression schemes such as sparse pruning and Overflow-Aware quantization.
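The idea in point 2, converging many operators onto a small set of core operators, can be sketched in a few lines of Python. In this toy version, three different data-movement operators are all expressed through one index-mapping primitive, so only that primitive would need a hand-optimized implementation per backend. The name `raster` and every function below are assumptions for illustration only; this is not MNN's actual implementation or API.

```python
from typing import Callable, List, Tuple

Shape = Tuple[int, int]  # (rows, cols) of a row-major 2-D tensor stored as a flat list


def raster(src: List[float], dst_size: int, mapping: Callable[[int], int]) -> List[float]:
    """The single core primitive: dst[i] = src[mapping(i)]."""
    return [src[mapping(i)] for i in range(dst_size)]


def transpose(src: List[float], shape: Shape) -> List[float]:
    rows, cols = shape
    # Output is cols x rows; output element (r, c) reads input element (c, r).
    return raster(src, rows * cols, lambda i: (i % rows) * cols + (i // rows))


def slice_rows(src: List[float], shape: Shape, start: int, stop: int) -> List[float]:
    _, cols = shape
    # Rows start..stop-1 form one contiguous region beginning at start * cols.
    return raster(src, (stop - start) * cols, lambda i: start * cols + i)


def tile_rows(row: List[float], times: int) -> List[float]:
    # Broadcast one row into `times` identical rows.
    return raster(row, len(row) * times, lambda i: i % len(row))


matrix = [1, 2, 3, 4, 5, 6]  # 2 x 3: [[1, 2, 3], [4, 5, 6]]
print(transpose(matrix, (2, 3)))         # prints [1, 4, 2, 5, 3, 6]
print(slice_rows(matrix, (2, 3), 1, 2))  # prints [4, 5, 6]
print(tile_rows([1, 2, 3], 2))           # prints [1, 2, 3, 1, 2, 3]
```

With this structure, adding a new backend means optimizing `raster` once rather than porting every composite operator, which is the cost saving the answer points to.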

MNN will keep evolving in these three directions. From the perspective of the whole on-device intelligence application pipeline, however, MNN only solves the single-point problem of running algorithm models efficiently on the device. We are now moving from MNN as a single-point technology toward a systematic, productized on-device intelligence technology stack. As mentioned above, we built the on-device AI R&D paradigm, and the MNN Workbench handles conversion, optimization, debugging, release, and other steps in deploying an algorithm model, so that algorithm development can even iterate independently. The MNN Workbench is currently in free public beta; anyone interested can download it from our official website, www.mnn.zone.

InfoQ: In your opinion, which other technical directions in the mobile field deserve attention in the future?

Lu Xing: Technical progress is closely tied to business development. With the rapid growth of live streaming, multimedia technology should see further development. Personally, I pay more attention to things related to on-device intelligence:

  • AR + edge AI + 3D

I think combining these technologies can produce many interesting applications. AR blends virtual and real scenes, on-device AI provides interaction within AR, 3D models and AR assets supply the content, and 5G networks provide the capacity to transfer large resource packages. These technologies are not yet mature; for example, low-cost, high-quality 3D modeling is still hard to achieve. In addition, the mobile phone is not the most suitable carrier for AR applications, so consumer-grade AR glasses are something to look forward to.

  • Device-cloud collaborative intelligence

At present the cloud does the training and the client does the inference, so device-cloud integration is still fairly shallow. We are also exploring on-device training and building a distributed device-cloud collaborative intelligence system to achieve personalized understanding of users, protect data privacy, and save cloud costs.


Guest | Lu Chengfei (Lu Xing)

Editor | Orange

Produced by | InfoQ & Alibaba New Retail Technology


Origin blog.csdn.net/Taobaojishu/article/details/110848554