Introduction to the Alibaba MNN Inference Framework

1. Reference materials

MNN official website

Chinese documentation (Yuque)

Welcome to the MNN Documentation — MNN-Doc 2.1.1 documentation

English documentation

MNN knowledge base

MNN official repository

2. Overview

1. Introduction to MNN

MNN is a lightweight deep learning on-device inference engine. Its core goal is to run deep neural network models for inference on end devices, covering model optimization, conversion, and inference, and it supports both inference and training. It is applicable to servers, personal computers, mobile phones, and embedded devices. MNN was open-sourced somewhat later than its peers, but it has become an influential inference framework for mobile devices.

AI scientist Jia Yangqing commented: "Compared with general-purpose frameworks such as TensorFlow and Caffe2 that cover both training and inference, MNN focuses more on the acceleration and optimization of inference and solves the efficiency problem at the model deployment stage, so that the business behind the model can be realized efficiently. This coincides with the idea of server-side inference engines such as TensorRT. In large-scale machine learning applications, considering large-scale model deployment, the amount of computation on the inference side is often more than ten times that of the training side, so optimization of the inference side is particularly important."

2. MNN architecture

[Figures: MNN architecture overview]

3. MNN module design

[Figure: MNN module design]

4. MNN operation process

[Figure: MNN inference workflow]
MNN is responsible for loading the network model, running inference, and returning the results. The whole inference process can be divided into three stages (a minimal usage sketch follows the list):

  1. Loading and parsing the model;
  2. Scheduling the computation graph;
  3. Efficient execution on heterogeneous backends.
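As a rough illustration of these three stages, here is a minimal sketch using the MNN Python bindings (the classic Interpreter API); the model file name and input shape are hypothetical, and details may differ between MNN versions.

```python
import numpy as np
import MNN

# 1. Load and parse the model (hypothetical file name)
interpreter = MNN.Interpreter("mobilenet_v2.mnn")

# 2. Schedule the computation graph: create a session
session = interpreter.createSession()
input_tensor = interpreter.getSessionInput(session)

# Prepare a dummy NCHW input and copy it into the session's input tensor
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
tmp_input = MNN.Tensor((1, 3, 224, 224), MNN.Halide_Type_Float, data,
                       MNN.Tensor_DimensionType_Caffe)
input_tensor.copyFrom(tmp_input)

# 3. Run on the chosen backend and copy the result back to the host
interpreter.runSession(session)
output_tensor = interpreter.getSessionOutput(session)
tmp_output = MNN.Tensor(output_tensor.getShape(), MNN.Halide_Type_Float,
                        np.zeros(output_tensor.getShape(), dtype=np.float32),
                        MNN.Tensor_DimensionType_Caffe)
output_tensor.copyToHostTensor(tmp_output)
print(tmp_output.getShape(), tmp_output.getData()[:5])
```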

5. MNN application scenarios

At present, MNN has been integrated into more than 30 apps at Alibaba, including Mobile Taobao, Mobile Tmall, and Youku, covering scenarios such as live streaming, short video, search and recommendation, product image search, interactive marketing, rights distribution, and security risk control, and it runs stably hundreds of millions of times per day. It is also used in IoT devices such as Cainiao self-pickup lockers. During the 2018 Double Eleven Shopping Festival, MNN was used at the Tmall gala in scenarios such as Smiling Face Red Envelopes, scan-based interactions, and star-guessing games.

5.1 Pailitao (image search)

Pailitao is the image search and recognition product in Mobile Taobao. Since its first launch in 2014, it has grown through continuous iteration into an application with more than 10 million UVs, and its technology has kept evolving as well. It started with cloud-side recognition of photos taken and uploaded from the phone, and has evolved to performing object detection and cropping on the device before uploading to the cloud for recognition, which improves the user experience and saves server-side computing costs. For simple object classification, general object recognition, and logo recognition, it also supports real-time recognition directly with on-device models.

5.2 Smiling Face Red Envelope

The Smiling Face Red Envelope was the opening program of the 2018 Double Eleven Tmall gala. The game is built on real-time face detection and expression recognition. Compared with earlier interactive games based on touching the screen, this event used the camera and real-time face detection algorithms, moving from traditional touch-based interaction to natural interaction and bringing users a new experience.

5.3 Mobile Taobao Spring Festival Activity: Scan New Year's Goods, Collect the Five Fortunes

Collecting the Five Fortunes (Jiwufu) was the 2019 Spring Festival campaign, and it was the first time Mobile Taobao joined it with the "scan New Year's goods" gameplay. By using product recognition to identify New Year's goods through the camera, users could obtain not only Fu cards but also physical prizes such as down quilts, Wuliangye, Moutai, and king crab, turning the feature into a "hen that lays golden eggs".

First, to support scanning New Year's goods, Taobao trained a deep neural network model on the server with millions of pictures of New Year's goods. When the user points the camera at the goods, Taobao obtains the photo data from the camera and then preprocesses it, including image scaling and color space conversion.

Scanning New Year's goods is a camera-based scenario. Using cloud-side AI would consume a lot of user traffic to transmit frame-by-frame photos as well as server-side computing resources, and the response speed would depend on network conditions. On-device AI with MNN avoids the network overhead and keeps the overall experience smooth and stable: it combines the trained model with the preprocessed data, quickly performs the computation, and estimates the probability that the photo contains New Year's goods. If that probability reaches the threshold set by Taobao, the user is judged to have scanned New Year's goods and the corresponding rewards are issued. A rough sketch of this per-frame decision is shown below.
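A minimal, hypothetical sketch of that per-frame decision (the normalization constants, class index, and threshold are made up for illustration; the inference call itself would follow the Interpreter pattern shown earlier):

```python
import numpy as np

THRESHOLD = 0.8        # hypothetical probability bar ("standard set by Taobao")
GOODS_CLASS = 42       # hypothetical class index for "New Year's goods"
MEAN, SCALE = 127.5, 1.0 / 127.5   # hypothetical normalization constants

def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Normalize a camera frame; frame_rgb is an HWC uint8 array already
    scaled and color-converted to the model's expected size and color space."""
    x = (frame_rgb.astype(np.float32) - MEAN) * SCALE
    return np.transpose(x, (2, 0, 1))[None, ...]   # HWC -> NCHW

def is_new_year_goods(probs: np.ndarray) -> bool:
    """probs: per-class probabilities produced by the on-device MNN model."""
    return float(probs.flatten()[GOODS_CLASS]) >= THRESHOLD
```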

6. MNN characteristics

6.1 Lightweight

  • The core functionality (CPU + GPU model inference) has no external dependencies and the code is lean, so it can easily be deployed to mobile devices and all kinds of embedded devices.
    • iOS platform: the full-featured MNN static library (armv7 + arm64) is about 12 MB, and linking it increases the executable size by about 2 MB. With the main functionality trimmed, the static library is 6.1 MB, linking increases the executable size by about 600 KB, and the metallib file is about 600 KB.
    • Android platform: the core armv7a - c++_shared dynamic library is about 800 KB.
  • A Mini compile option is supported to further reduce the package size by about 25% on top of the library sizes above.
  • FP16 / Int8 model compression and quantization are supported, which can reduce model size by 50% - 75%.
  • MNN is deeply customized and tailored for the characteristics of end-side devices, has no dependencies, and can easily be deployed to mobile devices and various embedded devices.

6.2 Versatility

  • Supports mainstream model file formats such as TensorFlow, Caffe, ONNX, and TorchScript, and mainstream network structures such as CNN / RNN / GAN / Transformer;

  • Supports multiple inputs and multiple outputs, inputs and outputs of any dimension, dynamic inputs (variable input sizes), and models with control flow;

  • Rich operator coverage: 178 TensorFlow ops, 52 Caffe ops, 163 TorchScript ops, and 158 ONNX ops (ONNX is essentially fully supported);

  • Supports servers, personal computers, mobile phones, and embedded devices with POSIX interfaces; supports CPU/GPU computing on these devices, and NPU computing on some devices (iOS 11+ CoreML / Huawei HiAI);

  • Supports Windows / iOS 8.0+ / Android 4.3+ / Linux and other operating systems with POSIX interfaces;

  • Supports hybrid computing on heterogeneous devices, currently CPU and GPU, and GPU op plug-ins can be loaded dynamically to replace CPU op implementations;

6.3 High Performance

  • The CPU architectures of iOS / Android / PC / Server are specifically adapted: core computations are implemented with SIMD code or hand-written assembly to make full use of the CPU, so common CV models run close to the device's peak compute on a single thread;
  • GPU acceleration (Metal) can be enabled on iOS devices (iOS 8.0 and above), and common models run faster than with Apple's native CoreML;
  • On Android, three backends are provided: OpenCL, Vulkan, and OpenGL, to cover as many devices as possible, with deep tuning for mainstream GPUs (Adreno and Mali);
  • On PC/Server, CUDA-based inference on NVIDIA GPUs is supported for faster inference;
  • The convolution and transposed-convolution algorithms are efficient and stable and run efficiently for convolutions of any shape; the Winograd algorithm is widely used to speed up symmetric convolutions from 3x3 up to 7x7. MNN was the first in industry engineering practice to apply Winograd optimization to transposed convolution and Strassen optimization to matrix multiplication, with measurable speedups;
  • Low-precision computation (int8 / fp16 / bf16) is supported to improve inference performance, with adaptations for the ARMv8.2 and AVX512 instruction sets, which bring better speedups on those architectures; on new ARMv8.2 devices, half-precision computation can be used for further acceleration (a configuration sketch follows this list);
  • MNN does not rely on any third-party compute library; core operations are implemented with a large amount of hand-written assembly to make full use of the ARM CPU;
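For illustration only, here is a sketch of how a backend and precision might be requested from the Python bindings; the configuration keys and accepted values below are assumptions modeled on the C++ ScheduleConfig/BackendConfig and may differ between MNN versions:

```python
import MNN

interpreter = MNN.Interpreter("model.mnn")   # hypothetical model file

# Assumed config keys: backend selection (e.g. OpenCL on Android),
# thread count for the CPU backend, and a low-precision (fp16) request.
config = {
    "backend": "OPENCL",    # assumption: the engine falls back to CPU if unavailable
    "numThread": 4,
    "precision": "low",
}
session = interpreter.createSession(config)
```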

6.4 Ease of use

  • Complete documentation and examples;
  • MNN operators can be used for general numerical computation, covering the common functions of numpy;
  • An MNN CV module is provided, supporting image affine transformation, normalization, and other common image processing (less than 100 KB under the armv7a architecture), so in most cases there is no need to pull in libyuv or OpenCV to process images (see the sketch after this list);
  • A callback mechanism is supported, making it convenient to extract intermediate data or control the execution path;
  • Running only part of the paths in a model is supported, as is splitting execution in parallel between CPU and GPU;
  • Model training is supported on various platforms, especially on mobile devices;
  • Python bindings are provided;
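A small sketch of the numpy-like and CV helpers exposed by the Python bindings (MNN.numpy and MNN.cv); the image file name is hypothetical and the exact function set may vary with the MNN version.

```python
import MNN.cv as cv
import MNN.numpy as np

# Image decoding and preprocessing without pulling in OpenCV or libyuv
img = cv.imread("cat.jpg")                  # hypothetical input file
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
img = cv.resize(img, (224, 224))
print(img.shape)

# numpy-style numerical computation backed by MNN operators
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.ones([2, 2])
print(np.sum(np.matmul(a, b)))
```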

7. MNN Innovation


MNN proposes three core innovations:

  • Semi-automatic search of computation schemes at runtime
  • Innovations in convolution algorithm optimization
  • Mixed scheduling of heterogeneous devices

Semi-automatic search

Semi-automatic search means that, when the model structure is known, the most suitable computation scheme for the model is searched for and assembled, according to certain rules, from a set of existing high-performance computing modules. It is a design point between the fully automatic search represented by TVM (automatic tuning) and the fully manual approach represented by NCNN (implementing every case by hand). Its core insight is that TVM's automatic compilation and optimization has difficulty matching assembly hand-written for specific hardware and operators, while at the same time the combinations of model operators and parameters are effectively unlimited, so NCNN cannot hand-optimize every case. In the final "data demonstration" section, experimental data is used to show MNN's advantages over fully automatic search (TVM) and fully manual search (NCNN).

To support semi-automatic search at runtime, MNN introduces a special stage called "pre-inference". During pre-inference, the computation strategy for each operator is selected and resources are allocated in advance.

In general, the input size in deep learning applications changes infrequently, or can be normalized to a relatively fixed size by a specific preprocessing stage. Once the input size is determined, the output size, the cost of different computation strategies, and the resource requirements of every op in the model can be computed, and these are used to decide each op's computation strategy and resource allocation. The sketch below shows how this looks from the API side.
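Fixing the input shape up front lets this pre-inference work (shape propagation, strategy selection, and memory allocation) happen once, outside the per-frame loop. A minimal sketch with the Python bindings, assuming resizeTensor / resizeSession behave as in the C++ API:

```python
import numpy as np
import MNN

interpreter = MNN.Interpreter("model.mnn")        # hypothetical model file
session = interpreter.createSession()
input_tensor = interpreter.getSessionInput(session)

# Fix the input shape once; resizeSession triggers the pre-inference step:
# output shapes are derived, compute strategies chosen, memory allocated.
interpreter.resizeTensor(input_tensor, (1, 3, 256, 256))
interpreter.resizeSession(session)

# Afterwards, every frame of the same size reuses the prepared plan.
for _ in range(10):
    frame = np.random.rand(1, 3, 256, 256).astype(np.float32)
    tmp = MNN.Tensor((1, 3, 256, 256), MNN.Halide_Type_Float, frame,
                     MNN.Tensor_DimensionType_Caffe)
    input_tensor.copyFrom(tmp)
    interpreter.runSession(session)
```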

8. MNN open source

At the beginning of 2017, before we started developing the engine, we investigated system solutions and open-source solutions, analyzing them in depth in terms of versatility, lightweight, high performance, and security. CoreML is Apple's system framework, and ML Kit and NNAPI are the corresponding frameworks on Android. The biggest advantage of the system frameworks is their lightness: the package size cost is small. Their biggest disadvantage is versatility: CoreML requires iOS 11+, ML Kit and NNAPI require Android 8.1+, the models they can cover are very limited, and it is difficult to support embedded-device scenarios. In addition, the system frameworks support fewer network types and op types, are hard to extend, do not fully exploit the device's computing power, and have model security problems. In summary, the system frameworks were not a good choice. Among the open-source solutions, TensorFlow Lite had not yet been announced, Caffe was mature but was not designed and developed for end-side scenarios, and NCNN had just been released and was not mature enough. In short, we could not find a simple, efficient, and secure end-side inference engine that worked across different training frameworks and different deployment environments.

Therefore, we wanted to provide a simple, efficient, and secure end-side inference engine, MNN, for different business algorithm scenarios, different training frameworks, and different deployment environments. It smooths over the differences between Android and iOS, between fragmented devices, and between training frameworks, enables fast deployment and execution on the device side, and allows flexibly adding ops and doing deep performance optimization for heterogeneous devices such as CPU/GPU according to the business model.

Over time, NCNN, TensorFlow Lite, MACE, Anakin, and others have been gradually upgraded or open-sourced, giving us useful input and references. We kept iterating and optimizing along with business needs, went through the test of Double Eleven, and the engine became relatively mature and complete, so we open-sourced it to the community, hoping to contribute to mobile application and IoT developers.

9. MNN Tools

Based on MNN (the tensor computing engine), a series of tools is provided to support model inference, training, and general computation:

  • MNN-Converter: a model conversion tool consisting of frontends and graph optimization. The frontends are responsible for supporting different training frameworks; MNN currently supports TensorFlow (Lite), Caffe, ONNX (PyTorch/MXNet models can first be converted to ONNX and then to MNN), and TorchScript. Graph optimization rewrites the graph through operator fusion, operator replacement, layout adjustment, and so on, and generally runs offline.
  • MNN-Compress: a model compression tool that compresses an MNN model within a given accuracy tolerance, reducing model size and improving runtime performance.
  • MNN-Express: supports running models with control flow and calling MNN operators for custom computation.
  • MNN-CV: a library similar to OpenCV, but with its core image processing implemented on top of MNN.
  • MNN-Train: MNN's training module, supporting training on various platforms.

10. Compiling MNN

From the root of the MNN source tree (https://github.com/alibaba/MNN):

mkdir build && cd build && cmake .. && make -j8

Origin: blog.csdn.net/m0_37605642/article/details/128984605