Didi open-sources AoE: a terminal runtime environment SDK for fast AI integration

AoE (AI on Edge) is an open-source, terminal-side AI Integrated Runtime Environment (IRE) from Didi. Built on the design principles of "stability, ease of use, security", it helps developers deploy deep learning algorithms from different frameworks onto terminal devices and run them efficiently.

There are two reasons for building such a runtime framework:

  • First, with the rapid development of artificial intelligence, many inference frameworks that run on terminals have appeared over the past two years. This gives developers more choices, but it also raises the cost of deploying AI on terminals;
  • Second, integrating an AI inference framework directly is a fairly complicated process. It involves dynamic library integration, resource loading, pre-processing, post-processing, resource release, and model upgrades, as well as the question of how to guarantee stability.

According to reports, eight mainstream inference frameworks currently run on terminal devices.

In essence, every inference framework goes through the same five processes: initialization, pre-processing, inference execution, post-processing, and resource release. Abstracting these processes is the basis on which AoE supports multiple inference frameworks. AoE currently supports two of them, NCNN and TensorFlow Lite.

Specifically, the core of the AoE integrated runtime environment is an abstraction of the inference operations, designed around dependency inversion: the business layer depends only on AoE's upper-level abstraction and does not need to care how a specific inference framework is integrated. The greatest advantage of this design is that developers can add support for a new inference framework at any time without modifying the framework implementation, so business development and AoE SDK development are completely decoupled.
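
The dependency-inversion idea can be illustrated with a minimal sketch. Everything in it (InferenceEngine, EngineRegistry, BusinessLogic, and the two toy engines) is a hypothetical name chosen for illustration and is not part of the AoE SDK. A real adapter would wrap NCNN or TensorFlow Lite, where these stand-ins only transform an array.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction that business code depends on (not the AoE SDK API).
interface InferenceEngine {
    float[] infer(float[] input);
}

// Toy stand-in for a framework-specific adapter: doubles every value.
class DoublingEngine implements InferenceEngine {
    @Override
    public float[] infer(float[] input) {
        float[] out = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * 2f;
        }
        return out;
    }
}

// A second toy adapter, added without touching any existing class: negates every value.
class NegatingEngine implements InferenceEngine {
    @Override
    public float[] infer(float[] input) {
        float[] out = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = -input[i];
        }
        return out;
    }
}

// Hypothetical registry: new engines are plugged in by name.
class EngineRegistry {
    private final Map<String, InferenceEngine> engines = new HashMap<>();

    void register(String name, InferenceEngine engine) {
        engines.put(name, engine);
    }

    InferenceEngine get(String name) {
        InferenceEngine engine = engines.get(name);
        if (engine == null) {
            throw new IllegalArgumentException("Unknown engine: " + name);
        }
        return engine;
    }
}

// Business code sees only the InferenceEngine abstraction.
class BusinessLogic {
    static float sumOfOutputs(InferenceEngine engine, float[] input) {
        float sum = 0f;
        for (float value : engine.infer(input)) {
            sum += value;
        }
        return sum;
    }
}

class DependencyInversionDemo {
    public static void main(String[] args) {
        EngineRegistry registry = new EngineRegistry();
        registry.register("doubling", new DoublingEngine());
        registry.register("negating", new NegatingEngine());
        float[] input = {1f, 2f};
        System.out.println(BusinessLogic.sumOfOutputs(registry.get("doubling"), input));
        System.out.println(BusinessLogic.sumOfOutputs(registry.get("negating"), input));
    }
}
```

Registering NegatingEngine required no change to BusinessLogic or EngineRegistry, which is the decoupling the paragraph above describes.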

In the AoE SDK, this abstraction consists of:

  • InterpreterComponent: handles model initialization, inference execution, and resource release.
  • Convertor: handles pre-processing of the model input and post-processing of the model output.

InterpreterComponent is defined as follows:

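/**
 * Model interpreter component
 */
interface InterpreterComponent<TInput, TOutput> extends Component {
    /**
     * Initialization: the inference framework loads the model resources
     *
     * @param context      context, used to bind the service
     * @param modelOptions list of model options
     * @return whether the inference framework loaded the model
     */
    boolean init(@NonNull Context context, @NonNull List<AoeModelOption> modelOptions);
 
    /**
     * Runs inference
     *
     * @param input business input data
     * @return business output data
     */
    @Nullable
    TOutput run(@NonNull TInput input);
 
    /**
     * Releases resources
     */
    void release();
 
    /**
     * Whether the model has been loaded correctly
     *
     * @return true if the model was loaded correctly
     */
    boolean isReady();
}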

Convertor is defined as follows:

interface Convertor<TInput, TOutput, TModelInput, TModelOutput> {
    /**
     * Pre-processing: converts the input data into model input data
     *
     * @param input business input data
     * @return model input data
     */
    @Nullable
    TModelInput preProcess(@NonNull TInput input);
 
    /**
     * Post-processing: converts the model output data into business output data
     *
     * @param modelOutput model output data
     * @return business output data
     */
    @Nullable
    TOutput postProcess(@Nullable TModelOutput modelOutput);
}
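
How the two abstractions fit together can be sketched in plain Java. This is an assumption-laden illustration and not AoE's implementation. SketchConvertor and SketchInterpreter are simplified, hypothetical copies of the interfaces above with the Android Context, the model options, and the nullability annotations dropped. The "model" is an ordinary function standing in for a real inference framework, and LengthConvertor is a toy convertor.

```java
import java.util.function.Function;

// Simplified, hypothetical variant of Convertor (no Android annotations).
interface SketchConvertor<TInput, TOutput, TModelInput, TModelOutput> {
    TModelInput preProcess(TInput input);

    TOutput postProcess(TModelOutput modelOutput);
}

// Hypothetical interpreter: init -> preProcess -> inference -> postProcess -> release.
class SketchInterpreter<TInput, TOutput, TModelInput, TModelOutput> {
    private final SketchConvertor<TInput, TOutput, TModelInput, TModelOutput> convertor;
    private final Function<TModelInput, TModelOutput> model; // stand-in for the framework call
    private boolean ready;

    SketchInterpreter(SketchConvertor<TInput, TOutput, TModelInput, TModelOutput> convertor,
                      Function<TModelInput, TModelOutput> model) {
        this.convertor = convertor;
        this.model = model;
    }

    // A real implementation would load model resources here.
    boolean init() {
        ready = true;
        return ready;
    }

    // Returns null when the model is not loaded or pre-processing yields nothing.
    TOutput run(TInput input) {
        if (!ready) {
            return null;
        }
        TModelInput modelInput = convertor.preProcess(input);
        if (modelInput == null) {
            return null;
        }
        TModelOutput modelOutput = model.apply(modelInput);
        return convertor.postProcess(modelOutput);
    }

    // A real implementation would free native resources here.
    void release() {
        ready = false;
    }

    boolean isReady() {
        return ready;
    }
}

// Toy convertor: turns a string into character codes, and a score into a label.
class LengthConvertor implements SketchConvertor<String, String, float[], Float> {
    @Override
    public float[] preProcess(String input) {
        float[] codes = new float[input.length()];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = input.charAt(i);
        }
        return codes;
    }

    @Override
    public String postProcess(Float modelOutput) {
        return modelOutput == null ? null : "score=" + modelOutput;
    }
}

class PipelineDemo {
    // Toy "model": sums its inputs.
    static Float sum(float[] values) {
        float total = 0f;
        for (float value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        SketchInterpreter<String, String, float[], Float> interpreter =
                new SketchInterpreter<>(new LengthConvertor(), PipelineDemo::sum);
        interpreter.init();
        System.out.println(interpreter.run("ab"));
        interpreter.release();
    }
}
```

The business side passes and receives only its own types (String here); the model-side types (float[] and Float) stay inside the convertor and the interpreter.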

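Another feature of AoE is its stability guarantee. As is well known, device compatibility is a major problem in Android development, and it matters most in scenarios with a large number of native operations: once an app crashes on a particular device model, the damage to the user experience is enormous.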

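Data suggests that performance problems cost mobile apps 5% of their active users every day. Of these lost users, 60% go silent and stop using the app, 30% switch to a competitor, and the rest uninstall the app outright. For a mobile app with a large user base, keeping the app's main flow available at all times is therefore the most basic and most important requirement.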

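In AI inference, a large number of operations inevitably take place in native code. This includes not only the inference itself but also some pre-processing and resource-reclamation operations, which are prone to compatibility problems. For this reason, the AoE runtime environment SDK implements an independent-process mechanism on Android so that native operations run in a separate process. This protects both the stability of inference (an occasional crash does not affect subsequent inference operations) and the stability of the main process (the main process is never brought down by a native crash).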

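The implementation has three main parts: registering the independent process, rebinding the process after an abnormal exit, and optimizing cross-process communication.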

The first part is registering the independent process: a RemoteService component is added to the Manifest, as follows:

<application>
    <service
        android:name=".AoeProcessService"
        android:exported="false"
        android:process=":aoeProcessor" />
 
</application>

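The second part is rebinding the independent process after an abnormal exit. If RemoteService is found to have terminated at inference time, the “bindService()” method is called to restart it.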

@Override
public Object run(@NonNull Object input) {
    if (isServiceRunning()) {
        ... (code omitted) // run inference
    } else {
        bindService(); // restart the independent process
    }
    return null;
}
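
The rebinding behaviour can be simulated without Android. In this minimal sketch, FakeRemoteService and RebindingClient are hypothetical stand-ins and not AoE classes. Binding is modelled as a synchronous call for simplicity, whereas on Android bindService() completes asynchronously through a ServiceConnection callback.

```java
// Hypothetical stand-in for the service running in the separate process.
class FakeRemoteService {
    private boolean running;
    private int bindCount;

    boolean isRunning() {
        return running;
    }

    // Simulates (re)starting the remote process.
    void bind() {
        running = true;
        bindCount++;
    }

    // Simulates a native crash that kills only the remote process.
    void crash() {
        running = false;
    }

    int getBindCount() {
        return bindCount;
    }

    String infer(String input) {
        return "result:" + input;
    }
}

// Hypothetical client living in the main process.
class RebindingClient {
    private final FakeRemoteService service;

    RebindingClient(FakeRemoteService service) {
        this.service = service;
    }

    // Mirrors the pattern above: infer if the service is alive,
    // otherwise restart it and return null for this call only.
    String run(String input) {
        if (service.isRunning()) {
            return service.infer(input);
        }
        service.bind();
        return null;
    }
}

class RebindDemo {
    public static void main(String[] args) {
        FakeRemoteService service = new FakeRemoteService();
        RebindingClient client = new RebindingClient(service);
        System.out.println(client.run("a")); // service not yet bound: triggers bind
        System.out.println(client.run("a")); // service is up: inference succeeds
        service.crash();
        System.out.println(client.run("b")); // crash detected: triggers rebind
        System.out.println(client.run("b")); // inference succeeds again
    }
}
```

The call that detects the dead service yields no result, but the next call succeeds, and the caller itself never crashes.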

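The third part is optimizing cross-process communication. Using an independent process necessarily involves cross-process communication, and its biggest problem is time overhead. Two factors contribute to that overhead:

  • Transmission time
  • Serialization/deserialization time

Compared with the transmission time of the binder mechanism, serialization/deserialization accounts for 90% of the total communication time, so optimizing it is the focus of cross-process communication optimization. After comparing the mainstream serialization/deserialization tools, the AoE integrated runtime environment adopted the kryo library. The comparison data comes from "Performance comparison test results of various Java serialization libraries".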
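
The serialization step at the process boundary can be sketched as a pluggable interface. Everything here is an assumption made for illustration: PayloadSerializer, JdkSerializer, InferenceRequest, and SerializationTiming are hypothetical names, and the implementation uses the JDK's built-in object streams so that the example runs with no dependencies. It does not use kryo and does not reproduce AoE's code; a kryo-backed serializer would be another implementation of the same interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical seam where a faster library could be swapped in.
interface PayloadSerializer {
    byte[] serialize(Object value);

    Object deserialize(byte[] bytes);
}

// Baseline implementation using JDK object streams (not kryo).
class JdkSerializer implements PayloadSerializer {
    @Override
    public byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @Override
    public Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical payload sent from the main process to the inference process.
class InferenceRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String modelName;
    final float[] pixels;

    InferenceRequest(String modelName, float[] pixels) {
        this.modelName = modelName;
        this.pixels = pixels;
    }
}

class SerializationTiming {
    // Measures one serialize + deserialize round trip in nanoseconds.
    static long roundTripNanos(PayloadSerializer serializer, Object payload) {
        long start = System.nanoTime();
        serializer.deserialize(serializer.serialize(payload));
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        PayloadSerializer serializer = new JdkSerializer();
        InferenceRequest request = new InferenceRequest("ocr", new float[224 * 224]);
        System.out.println("bytes: " + serializer.serialize(request).length);
        System.out.println("round trip ns: " + roundTripNanos(serializer, request));
    }
}
```

Timing the round trip through the interface makes it possible to compare candidate libraries on the payloads an app actually sends. A single measurement like this is noisy, so a real comparison would need warm-up and repeated runs.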

AoE SDK is already in use in Didi's bank card OCR. The business integration diagram shows the relationship between AoE, the inference framework, and the host app.

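The open-sourced runtime environment SDK covers the Android and iOS platforms. A runtime environment SDK for Linux is under active development and is expected to be released by the end of September.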

For details, see:

Origin www.oschina.net/news/109461/didi-opensourced-aoe