Alibaba Vision AI's Road to an Open Platform

Author: Xingtong, Alibaba DAMO Academy

AI open platforms explore how to bring technological capabilities to more industries and, in turn, drive the technology itself forward, promoting the sustainable development of AI. Taking the Alibaba Cloud Visual Intelligence Open Platform as an example, this article describes the positioning, architecture, implementation, operation, and evolution of an AI platform.

1. Introduction to Alibaba Vision AI

As a major branch of AI, vision technology has driven many technical innovations and applications across a large number of business scenarios (e-commerce and retail, finance and logistics, entertainment and marketing, enterprise services, and other industries). Meanwhile, as the Alibaba Group's central base for exploring and developing advanced technology, DAMO Academy has accumulated many strong vision algorithm capabilities, distributed across the following scenarios and links:

These products and technologies cover almost every aspect of vision technology:

Finding a suitable way to release these internally accumulated capabilities, bring them to thousands of industries, and in turn advance the technology itself, moving from the inside out and from closed to open, is of great value from the perspective of both technological development and social value. Based on our exploration and practice in recent years, this article explains how Alibaba's visual AI has moved toward openness and platformization.

2. Platformization of visual AI

AI technology has made great progress, but it still has a long way to go before it meets social expectations and practical needs. This tension can be abstracted as the contradiction between customers' diverse AI needs and the limited supply of AI capabilities. Demand is effectively unlimited and naturally cannot be met with limited resources. Add the particular nature of AI capabilities (R&D and operation have real thresholds, and results are uncertain), and even if all of Alibaba's AI capabilities and strengths were pooled, they could provide only some core capabilities and typical cases. Providing tools and services is one way to narrow this supply-demand gap, as shown in the figure below.

Therefore, if a relatively general AI platform really exists (public and proprietary AI capabilities each have their place; here we first discuss the public-cloud AI open platform), its core value comes down to two points: providing core AI capabilities and typical cases, and providing tools that narrow the gap between supply and demand.

Let's look at how to build the shortest path between AI "supply" and "demand":

  • Available: provide diverse, standardized capabilities that cover the fundamentals of vision, improve the capability supply chain and matching system, and meet users' mainstream AI needs in one place to the greatest extent possible;

  • Easy to use: provide a full-lifecycle experience and workflow for capabilities, backed by stable and efficient platform infrastructure, to achieve fast onboarding, a low barrier to entry, and stable operation;

  • Affordable: lower platform costs through aggressive single-capability optimization and system-level optimization across capabilities; meet the needs of small and medium-sized AI users for free through quotas, so that users obtain capabilities at low cost and achieve a high ROI;

  • Practical: provide practical, professional capabilities; AI comes from industry, is refined and consolidated, and is fed back to industry in a systematic way.

A successful, sustainable platform must let every participant find what it needs. A platform generally has three key stakeholders: the demand side (AI developers, or university teachers and students), the supply side (algorithm capability providers, such as DAMO Academy's algorithm engineers), and the platform itself. The platform must weigh the needs and value of both supply and demand. Building on a set of core AI capabilities, market-oriented mechanisms and systematic means can more efficiently ease the contradiction between diverse AI needs and scenarios on one hand and limited methods, data, and resources on the other.

There are two core points here. First, a set of existing AI capabilities for the cold/initial start, solving problems that can be standardized and have a certain generality. Second, systematic mechanisms that form a multi-dimensional online evolution system with rapid adaptation, economies of scale, and closed feedback loops.

3. Alibaba Visual Intelligence Open Platform

Initiated by the Alibaba Vision Technology Group, the Alibaba Cloud Visual Intelligence Open Platform (vision.aliyun.com) was developed and launched based on the considerations above. Since its launch, it has gone through three major versions:

Drawing on the strength of the Alibaba Vision Group, dozens of teams joined forces to support it, integrating or bringing in capabilities from multiple Alibaba Cloud products, and the platform in turn supports multiple business units inside and outside the group.

Since its launch, the visual open platform has held to its vision: to free the world from hard-to-use visual AI. This is both the starting point and the yardstick for the platform's development, and from it the platform derives its characteristics of being "comprehensive, professional, easy to use, and effective":

Structurally, the visual open platform is a multi-level, multi-dimensional system that can be roughly divided into three layers (the basic layer, the capability layer, and the application layer) plus a set of user and operation tools. As a platform, each layer requires substantial R&D effort, large and sustained resource investment, and the perseverance to face endless difficulties and endure loneliness.

For easier understanding, it can also be described more concisely:

Here are the three layers of the platform:

1. Basic platform

First and foremost, the visual AI open platform is a platform. As a cloud-based AI product, it cannot avoid resource management (mainly GPUs), an inference platform, stability assurance, monitoring and tracing, and cost-efficiency improvements. All of these serve the platform's primary task: launching and operating AI capabilities online. The most important piece can be abstracted as full-link lifecycle management of AI capabilities (covering their operation, not their R&D and production), including: planning --> selection --> evaluation --> release --> operation --> monitoring --> update --> taking offline, and so on.
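The lifecycle above is essentially an ordered state machine. The sketch below models it under simple assumptions: stages advance strictly in the listed order, and, reflecting the emphasis on evaluation, a capability cannot leave the evaluation stage without a passing report. `Stage` and `CapabilityLifecycle` are hypothetical names for illustration, not the platform's actual API, and a real platform might loop back from update to evaluation rather than treat the flow as strictly linear.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order listed in the text."""
    PLANNING = 1
    SELECTION = 2
    EVALUATION = 3
    RELEASE = 4
    OPERATION = 5
    MONITORING = 6
    UPDATE = 7
    OFFLINE = 8


class CapabilityLifecycle:
    """Tracks one AI capability through the lifecycle (hypothetical helper)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.PLANNING
        self.history = [Stage.PLANNING]

    def advance(self, evaluation_passed: bool = False) -> Stage:
        """Move to the next stage; leaving EVALUATION requires a passing report."""
        if self.stage is Stage.OFFLINE:
            raise ValueError(f"{self.name} is already offline")
        if self.stage is Stage.EVALUATION and not evaluation_passed:
            raise ValueError(f"{self.name} cannot be released without a passing evaluation")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage
```

For example, `CapabilityLifecycle("face-detection")` moves from planning to evaluation with two plain `advance()` calls, and only `advance(evaluation_passed=True)` lets it reach release; the history list records every stage it has passed through.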

The evaluation step deserves emphasis. Guaranteeing algorithm quality (whether a model is good or not) requires a standard, measurable evaluation mechanism. This is also a way to turn the uncertainty of AI algorithms into certainty: it includes horizontal head-to-head comparison (PK) with similar capabilities, vertical comparison against existing capabilities, and a standardized evaluation report.
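To make the horizontal and vertical comparisons concrete, here is a minimal sketch of an evaluation report. It assumes every capability is scored by plain accuracy on one shared labeled test set and that a candidate may be released only if it does not score below the incumbent online version; both the metric and the release rule are illustrative assumptions, not the platform's actual evaluation standard.

```python
from typing import Dict, Optional, Sequence


def accuracy(predictions: Sequence, labels: Sequence) -> float:
    """Fraction of predictions that match the labels on a shared test set."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def evaluation_report(candidate: Sequence, peers: Dict[str, Sequence],
                      incumbent: Optional[Sequence], labels: Sequence) -> dict:
    """Horizontal PK against peer capabilities, vertical comparison against the incumbent."""
    score = accuracy(candidate, labels)
    peer_scores = {name: accuracy(preds, labels) for name, preds in peers.items()}
    incumbent_score = None if incumbent is None else accuracy(incumbent, labels)
    return {
        "candidate": score,
        "peers": peer_scores,
        # Rank 1 means no peer scored strictly higher than the candidate.
        "horizontal_rank": 1 + sum(s > score for s in peer_scores.values()),
        "vertical_delta": None if incumbent_score is None else score - incumbent_score,
        # Release rule (assumption): never ship something worse than what is online.
        "release_ok": incumbent_score is None or score >= incumbent_score,
    }
```

Returning a plain dictionary keeps the report easy to serialize into the kind of standardized evaluation report the text describes; a production system would track several metrics and confidence intervals rather than a single accuracy number.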

In addition, the second characteristic (online tools that narrow the gap between supply and demand) depends mainly on the "capability reproduction" module, which deserves a brief expansion here. Beyond ready-to-use offerings (capabilities or case templates), users need to develop capabilities further, once or many times. All of this counts as reproduction or redevelopment, and it generally takes one of three modes:

  • Combination and orchestration: recombining atomic capabilities into capability clusters, which can be called molecular capabilities. This can be done in code or by composing graphs in so-called "low-code" form. The combination can be a simple series-parallel connection, a somewhat more complex DAG, or even a full multi-level nested graphical development approach similar to the G language (as in LabVIEW);

  • Reproduction of existing atomic capabilities (generally pre-trained models): this refers to work users do online through platform tools (offline work, or work outside the open platform, is out of scope), including changes to model structure, tuning of parameter weights, quantization and acceleration, migrating large models to small models, few-shot tuning, and adaptation to data from different domains or scenarios;

  • Online iterative evolution of AI capabilities: this mode is widely used in the Internet's classic "search, advertising, and recommendation" capabilities, but it has not yet matured on AI platforms; examples include online learning and incremental learning. Once problems such as data security and privacy are solved, I believe these Internet-style algorithm evolution modes will sooner or later be applied to visual AI.
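The first mode, orchestrating atomic capabilities as a DAG, can be sketched with Python's standard `graphlib` module. The node functions below are hypothetical stand-ins loosely modeled on the mask-wearing case study (they read fields from a dictionary instead of calling real face-detection or mask-recognition services), and `run_pipeline` is an illustrative helper, not a platform API.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_pipeline(nodes: Dict[str, Callable[..., Any]],
                 deps: Dict[str, List[str]],
                 source: Any) -> Dict[str, Any]:
    """Run atomic capabilities as a DAG; nodes without upstream inputs receive `source`."""
    referenced = set(deps) | {u for ups in deps.values() for u in ups}
    unknown = referenced - set(nodes)
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    graph = {name: set(deps.get(name, [])) for name in nodes}
    results: Dict[str, Any] = {}
    # static_order() yields each node after all of its dependencies
    # and raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(graph).static_order():
        upstream = deps.get(name, [])
        args = [results[u] for u in upstream] if upstream else [source]
        results[name] = nodes[name](*args)
    return results


# Hypothetical stand-ins for face-detection and mask-recognition capabilities.
nodes = {
    "detect_faces": lambda frame: frame["faces"],
    "classify_masks": lambda faces: [face["masked"] for face in faces],
    "count_unmasked": lambda faces, masks: sum(not m for m in masks),
}
deps = {
    "classify_masks": ["detect_faces"],
    "count_unmasked": ["detect_faces", "classify_masks"],
}

frame = {"faces": [{"masked": True}, {"masked": False}, {"masked": False}]}
print(run_pipeline(nodes, deps, frame)["count_unmasked"])  # prints 2
```

Describing the composition as data (a node table plus a dependency table) is what makes a "low-code" graphical editor possible: the editor only has to produce these two tables, and the same runner executes a simple chain or a more complex DAG.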

Q: How does the platform relate to PAI?

A: PAI can be understood as a set of infrastructure and tools. Given the open platform's product positioning, we bring in PAI's capabilities to implement the reproduction of visual AI capabilities. In short, PAI is our foundation and toolset.

2. Capability Center

First of all, the visual open platform is a capability center. It currently gathers most of the group's visual AI capabilities (mainly from DAMO Academy), totaling 200+ capabilities in 15 categories, as follows:

Returning to the earlier classification of vision technology, we find an almost one-to-one correspondence between categories, which shows from another angle that the open platform is indeed the first platform in Alibaba's vision field to truly cover every category. Although there are many capabilities, they fall broadly into three groups: fundamental capabilities, superior capabilities, and industry application capabilities.

  • Fundamental capabilities: including face and body, OCR, detection, tagging, etc. These AI capabilities are widely used and the platform must have them; at the very least their performance must not be an obvious weakness, or the platform becomes a niche AI platform;

  • Superior capabilities: AI capabilities rooted in Alibaba's own scenarios, with certain technical advantages and differentiation, that establish the platform's competitiveness, such as segmentation, keypoints, super-resolution, and product recognition. Some of these superior capabilities may also be fundamental capabilities;

  • Industry application capabilities: the platform first provides relatively general AI capabilities, mostly in the pan-Internet field. But other scenarios (such as overseas scenarios, enhanced image editing, and identity verification) and industry-specific capabilities (such as healthcare and education) are also very valuable. This also reflects the versatility and openness of the platform.

Selecting these capabilities also requires a strategy. Here the idea of Voronoi quantization comes in handy again: in an infinite, continuous space of possibilities, reasonable and quantifiable representative points can be chosen. The choice can be measured along dimensions such as demand size and capability advantage, and it must also take into account suitability for the public cloud and the value for capability reproduction (for example, some capabilities can use large or pre-trained models to make it easier to produce small models later).
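One way to read the Voronoi analogy is as classic vector quantization: treat each user demand as a point in some feature space, weight it by demand size, choose k representative capabilities, and let each demand be served by its nearest representative (its Voronoi cell). Lloyd's algorithm, i.e. weighted k-means, is a standard way to place such points. The sketch below is our illustration under those assumptions, not the platform's actual selection procedure, and it ignores the other dimensions the text mentions (capability advantage, public-cloud fit, reproduction value).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Sequence[float], reps: List[Point]) -> int:
    """Index of the representative whose Voronoi cell contains `point`."""
    return min(range(len(reps)), key=lambda i: math.dist(point, reps[i]))


def select_representatives(points: List[Point], weights: List[float],
                           k: int, iterations: int = 20) -> List[Point]:
    """Weighted Lloyd iterations: assign demands to Voronoi cells, then move each
    representative to the demand-weighted centroid of its cell."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of demand points")
    reps = [tuple(map(float, p)) for p in points[:k]]  # naive init: first k demand points
    dim = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            i = nearest(p, reps)
            totals[i] += w
            for d in range(dim):
                sums[i][d] += w * p[d]
        # Keep a representative in place if its cell received no demand.
        reps = [tuple(s / totals[i] for s in sums[i]) if totals[i] else reps[i]
                for i in range(k)]
    return reps
```

Weighting by demand size pulls a representative toward heavily requested needs: a demand with weight 4 counts four times as much as a demand with weight 1 when the cell's centroid is computed.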

3. Scenario applications

In terms of positioning, the platform needs to provide some typical AI solutions. In terms of development stage, when the platform is in its cold-start phase without large-scale users, the R&D team itself can be treated as a special customer: eat its own dog food first and see whether typical cases can be built on the platform, such as old film restoration, identity verification, and cloud image editing. The open platform itself sits at the PaaS (AIaaS) layer; on top of it, SaaS-layer application examples can be built for users to reference, or to copy and modify.

Here are a few examples that show how applications can be built quickly on the platform:

1) Mask wearing monitoring

This case arose from the COVID-19 outbreak in 2020. Because the need was urgent, the system had to be delivered quickly: it reminds people who are not wearing masks in real time and assists managers with on-site management. The epidemic prevention and control command center can also keep track, in a timely manner, of mask wearing in public places and of how preventive measures are being implemented, improving the accuracy of management decisions.

The solution combines the face recognition and mask recognition capabilities provided by the open platform with DingTalk mini-program reminders and Tmall Genie voice broadcasts to create a system for mask-wearing detection, statistics, and early warning. After one month of intensive development, it met real deployment requirements. Installation was easy: ordinary workers who install surveillance equipment, or the property's own staff, could deploy it, and the deployment cycle was short, at about one hour per device.

2) Video ad placement

Video implantation adds content that does not exist in the original video and blends it with the context, so that viewers feel "it" belongs there. Its most common use is ad placement. Video implantation is a very complex technology that must handle many aspects, such as ad-slot detection and ad-slot tracking. Sometimes it faces difficult tracking cases such as occlusion or objects moving off screen, and after implantation it must also consider whether the ad matches the video's details, light-and-shadow rendering, and other issues.

Solution: based on the precise segmentation provided by the open platform, combined with ad-slot detection, recognition and tracking, video segmentation, implantation, and rendering, a fully automatic system for detecting video ad slots and implanting ads can be built. It supports batch delivery and, combined with scenario-based ad placement, maximizes content value.

3) Visual content design generation

While developing visual design generation products (Luban and AlibabaWood) earlier, we accumulated a series of capabilities related to visual understanding and generation, and these capabilities became "seed" capabilities on the open platform. After a series of transformations, these two SaaS products now also use the open platform's infrastructure and atomic AI capabilities, which lets them focus more on their business capabilities.

4. Evolution: From OpenAPI to OpenSDK, from public cloud to device-cloud collaboration

The public cloud is the starting point and main front of an open platform, and its earliest form is API services, which we call OpenAPI. Just as in-house capabilities cannot meet every need and must be complemented by reproduction tools, public-cloud delivery cannot cover every computing scenario. In scenarios such as real-time interaction or strict data security requirements, device-cloud collaboration is the trend, and AI platforms are no exception. We call this OpenSDK.

In terms of product form, cloud and device are just different deployment and runtime forms of the open platform, and they require a unified product experience and environment, including:

  • Business logic: demand collection, R&D and launch, business communication, business processes, control and upgrades, scenario simulation and delivery, and so on, are all carried out on the unified public-cloud platform, with the same users and the same experience.

  • Technical logic: device-side effects, compute optimization, runtime frameworks, permissions and security, and a series of abstraction layers over different software and hardware environments all have device-side characteristics.
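The split between cloud (OpenAPI) and device (OpenSDK) execution can be illustrated with a simple routing rule built on the two drivers named above: real-time interaction and data security. The policy, the field names, and the 150 ms default cloud round-trip time are all placeholder assumptions for illustration, not measured values or the platform's actual logic.

```python
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    latency_budget_ms: float      # how quickly the result is needed
    data_must_stay_local: bool    # e.g. strict privacy or data security requirements
    available_on_device: bool     # whether a device-side build of the capability exists


def choose_target(req: InferenceRequest, cloud_round_trip_ms: float = 150.0) -> str:
    """Return "device" or "cloud" for a request (illustrative policy only)."""
    needs_device = req.data_must_stay_local or req.latency_budget_ms < cloud_round_trip_ms
    if needs_device:
        if not req.available_on_device:
            raise ValueError("request needs on-device execution, but no device build exists")
        return "device"
    return "cloud"
```

Only the execution target changes here; consistent with the business-logic point above, the capability itself, its users, and its management would still live on the unified cloud platform.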

Developing OpenSDK is a step-by-step process, especially when the corresponding resources are extremely scarce. We have summarized a progressive path (built on the support of underlying frameworks such as Alibaba Group's MNN):

Basic AI capabilities (such as segmentation, detection, face, and keypoints) --> capabilities that need 2D rendering and material tooling (such as beauty, makeup, and stickers) --> capabilities that need 3D rendering (virtual humans, AR/VR, etc.).

After more than half a year of R&D, OpenSDK has taken initial shape. Here are a few examples built on OpenSDK:

On-device enhancement: finding a wider range of application scenarios for image enhancement

The value of AI in the field of sports and fitness

5. Evolution: From Inclusive AI and a Land of Opportunity to OpenSOTA

As one type of platform, an AI platform follows the evolutionary laws of platforms in general while also having its own AI-specific characteristics:

Alibaba's AI open platform hopes to contribute to social value (foundational, hard-core, inclusive, etc.). Starting from "let more people use better AI", it aims to bring real changes in efficiency and effectiveness to both the demand and supply sides. In addition, as an emerging discipline, AI is very active in academia. "SOTA" methods keep emerging, but they are often hard to reproduce and use, vary widely in quality, and remain far from real public use. For this reason, the OpenVision team has long wanted to build an "out-of-the-box" OpenSOTA mechanism:

  • OpenSOTA carries the goal of making the platform "the place where industry and academic SOTA AI gathers and is used";

  • Gather SOTA, reproduce SOTA, and use SOTA: offer more comprehensive and up-to-date SOTA capabilities and, more importantly, make them reproducible, runnable online, and integrable.

6. Reality and future

Ideals are lofty, but reality is harsh. For a product like an AI platform, which yields no big returns in the short term and inherently requires large-scale collaboration, how to perform better in a constrained environment has always been a question we must think about. Besides a guiding vision, a pragmatic two-pronged approach is needed: a clear long-term plan together with steady, well-paced output in the present. Just as with AI capabilities themselves, the ability to keep evolving is what holds the most imagination and promise.

Finally, let us imagine what the visual AI open platform could ultimately become:

  • Influence: an industry-leading AI open platform and brand, and a leader in models for AI development and use;

  • Value: serving millions of developers, tens of billions of calls, thousands of capabilities, and access within seconds;

  • AI capabilities: the place where industry and academic SOTA AI gathers and is used, and where original AI algorithms are incubated;

  • Case applications: a place to share and experience excellent AI cases, and to practice large-scale AI applications;

  • User ecosystem: an inclusive land of opportunity for mid- and long-tail AI users.

We hope the visual AI open platform will truly become the gateway and stronghold of Alibaba AI, deliver both business and social value, establish Alibaba AI's standing in the industry, and help the AI ecosystem thrive.


Origin blog.csdn.net/AlibabaTech1024/article/details/123817807