Paddle Lite v2.3 released: model size reduced by up to 75%, inference more than 20% faster

Paddle Lite v2.3 has been released. The new features include:

  • Support for post-training quantization without calibration data, reducing model size by up to 75%.

  • Optimized network structures and OPs, improving ARM CPU inference speed by more than 20%.

  • A simplified model optimization tool workflow that supports one-click operation, making the tool easier to use.

This upgrade brings changes in the following areas.

Post-training quantization without calibration data reduces model size by up to 75%

When deploying deep learning models on mobile phones and other terminal devices, both inference speed and storage footprint usually have to be considered: inference should be as fast as possible, and the model as lightweight as possible. Model quantization is a key technique for meeting both requirements.

Quantization represents the weights and activations of a neural network with a smaller number of bits, which greatly reduces model size, easing the limited storage of terminal devices, while also accelerating inference. Converting the weights of specific OPs from FP32 to INT8 or INT16 shrinks the model substantially: in our validation, converting weights to INT16 reduces model size by 50%, and converting them to INT8 reduces it by 75%.
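As a rough, framework-agnostic illustration of what weight quantization does (this is a minimal sketch, not Paddle Lite's actual implementation), the following pure-Python snippet maps FP32 weights to INT8 with a symmetric abs-max scale and compares the storage cost:

```python
from array import array

def quantize_symmetric(weights, num_bits=8):
    """Map float weights to signed integers using a symmetric abs-max scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8, 32767 for INT16
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.52, -1.30, 0.07, 0.91, -0.24, 1.18]
q8, s8 = quantize_symmetric(weights, num_bits=8)
recovered = dequantize(q8, s8)

# Storage per weight: FP32 = 4 bytes, INT16 = 2 bytes (50% smaller),
# INT8 = 1 byte (75% smaller).
fp32_bytes = len(weights) * array("f", weights).itemsize
int8_bytes = len(q8) * array("b", q8).itemsize
print(fp32_bytes, int8_bytes)  # 24 6
```

The per-weight reconstruction error is bounded by half the scale, which is consistent with the accuracy results reported below: INT16 is effectively lossless, while INT8 trades a small amount of accuracy for the larger size reduction.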

For quantization, Paddle Lite works together with PaddleSlim, the PaddlePaddle model-compression tool, to offer developers three ways of producing a quantized model: quantization-aware training, post-training quantization with calibration data, and post-training quantization without calibration data.

Post-training quantization without calibration data is one of the important additions in this version of Paddle Lite.

Figure 1: Workflow of the three methods for producing a quantized model

Post-training quantization without calibration data keeps accuracy almost unchanged while requiring no sample data, which makes it more convenient for developers and applicable to a wider range of scenarios.

Of course, developers who want to reduce model size and accelerate inference at the same time can try PaddleSlim's post-training quantization with calibration data and quantization-aware training methods.
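To make the distinction concrete, here is a hypothetical pure-Python sketch (illustrative only, not PaddleSlim's API; all numbers are made up). Without calibration data, quantization scales are computed from the weights alone; with calibration data, a few sample batches are run through the network so that activation scales can be estimated as well, which is what enables integer execution at inference time:

```python
def absmax_scale(values, qmax=127):
    """Symmetric INT8 scale from the largest absolute value seen."""
    return max(abs(v) for v in values) / qmax

# Without calibration data: scales come from the weights themselves.
weights = [0.8, -1.2, 0.3, 0.05]
weight_scale = absmax_scale(weights)

# With calibration data: activations observed on sample batches
# (made-up numbers) also get a scale, enabling INT8 execution of the OP.
calibration_batches = [
    [0.1, 2.4, 0.9],   # activations seen on batch 1
    [1.7, 0.2, 3.1],   # activations seen on batch 2
]
activation_scale = max(absmax_scale(batch) for batch in calibration_batches)

print(weight_scale, activation_scale)
```

This is why the no-calibration method mainly shrinks storage, while the calibration-based methods can also speed up inference.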

Besides quantization, PaddleSlim also integrates common model-compression techniques such as pruning, distillation, model-structure search, and hardware-aware model search.

Taking the MobileNetV1, MobileNetV2, and ResNet50 models as examples, the results obtained with this method are shown below.

Figure 2: Model-size comparison for post-training quantization without calibration data

As Figure 2 shows, the INT16 quantized models are 50% smaller than the FP32 models, and the INT8 quantized models are 75% smaller.

Figure 3: Accuracy comparison for post-training quantization without calibration data

As Figure 3 shows, the INT16 quantized models match the FP32 models in accuracy, while the INT8 quantized models are only slightly less accurate.

ARM CPU inference speed improved by more than 20%

The major ARM CPU performance optimizations in Paddle Lite v2.3 include:

  • Implemented the Winograd method for 3×3 convolutions, including the F(6,3) and F(2,3) variants. Because the Winograd algorithm needs far fewer multiplications than direct convolution, this significantly improves the performance of models dominated by such OPs, for example ResNet50 and SqueezeNet.

  • For models whose convolutions are followed by ReLU6 or LeakyReLU activations, added Conv+ReLU6/LeakyReLU fusion, which eliminates the memory-access overhead of running the activation function as a separate OP.

  • Followed up the OP upgrades in PaddlePaddle 1.6, such as support for arbitrary padding in Conv and Pooling, by adding the corresponding support in Paddle Lite. As a result, when a TensorFlow model is converted, one TensorFlow Conv maps to one Paddle Conv instead of a Pad + Conv pair of OPs, which improves inference performance for converted TensorFlow models.
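To see where the Winograd savings in the first bullet come from, here is a minimal 1-D sketch of F(2,3) (the production kernels apply the tiled 2-D forms with NEON intrinsics; this only shows the arithmetic idea): it produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct convolution.

```python
def conv1d_direct(d, g):
    """Direct 1-D convolution: 6 multiplications for 2 outputs of a 3-tap filter."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(len(d) - 2)]

def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs with only 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0, u1 = g0, (g0 + g1 + g2) / 2
    u2, u3 = (g0 - g1 + g2) / 2, g2
    # Input transform.
    v0, v1, v2, v3 = d0 - d2, d1 + d2, d2 - d1, d1 - d3
    # Element-wise products: the only 4 multiplications.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]

d = [1.0, 2.0, -1.0, 3.0]
g = [0.5, -1.0, 2.0]
print(conv1d_direct(d, g))  # [-3.5, 8.0]
print(winograd_f23(d, g))   # [-3.5, 8.0]
```

For 3×3 kernels the 2-D versions of these transforms cut the multiplication count per output tile in the same way, which is why Winograd helps convolution-heavy models so much.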
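The activation fusion in the second bullet can likewise be illustrated with a toy 1-D example (hypothetical code, not Paddle Lite's kernels): both variants compute identical results, but the fused version applies ReLU6 while each output value is still being produced, instead of writing the entire Conv output to memory and re-reading it in a second pass.

```python
def relu6(x):
    """Clamp to [0, 6], as used by many mobile networks."""
    return min(max(x, 0.0), 6.0)

def conv_then_relu6(d, g):
    """Unfused: Conv writes its full output, then a separate activation OP re-reads it."""
    n = len(g)
    conv_out = [sum(d[i + j] * g[j] for j in range(n))
                for i in range(len(d) - n + 1)]
    return [relu6(x) for x in conv_out]      # second pass over memory

def conv_relu6_fused(d, g):
    """Fused: the activation is applied while each element is still in a register."""
    n = len(g)
    return [relu6(sum(d[i + j] * g[j] for j in range(n)))
            for i in range(len(d) - n + 1)]

d = [1.0, 4.0, -2.0, 3.0, 0.5]
g = [1.0, 1.0, 1.0]
print(conv_then_relu6(d, g))  # [3.0, 5.0, 1.5]
print(conv_relu6_fused(d, g))  # [3.0, 5.0, 1.5]
```

On mobile CPUs the saved pass over the output buffer is the point: the activation is nearly free once its input is already in registers.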

Figure 4 compares the inference latency of Paddle Lite, NCNN, and MNN on three Caffe models: MobileNetV1, MobileNetV2, and ResNet50.

Figure 4: Inference-latency comparison on Caffe models

As Figure 4 shows, Paddle Lite's overall performance is better than that of the other frameworks. For example, for the ResNet50 model on a Qualcomm Snapdragon 845, Paddle Lite is 10.259% faster than MNN and 17.094% faster than NCNN.

For public ONNX models such as ShuffleNet, SqueezeNet, and ResNet50, the inference-latency comparison between Paddle Lite, MNN, and NCNN is shown in Figure 5.

Figure 5: Inference-latency comparison on ONNX models

As Figure 5 shows, Paddle Lite's overall performance is better than that of the other frameworks. For example, for the ShuffleNet model on a Qualcomm Snapdragon 845, Paddle Lite is 21.185% faster than MNN and 26.36% faster than NCNN.

For public TensorFlow models such as MnasNet, MobileNetV1, and ResNet101, the inference-latency comparison between Paddle Lite and MNN is shown in Figure 6.

Figure 6: Inference-latency comparison on TensorFlow models

As Figure 6 shows, Paddle Lite's overall performance is better than MNN's. For example, for the MnasNet model, Paddle Lite is 12.06% faster than MNN on a Qualcomm Snapdragon 855, 18.91% faster on a Snapdragon 845, and 18.61% faster on a Snapdragon 835.

For more detailed performance data on the new version, see the benchmark on GitHub:

https://paddle-lite.readthedocs.io/zh/latest/benchmark/benchmark.html

Simplified model optimization tool workflow with one-click operation

For models from third-party frameworks (TensorFlow, Caffe, ONNX), two conversions are generally needed to obtain an optimized Paddle Lite model: first the third-party model is converted into PaddlePaddle format with the X2Paddle tool, and then the result is converted with the model optimization tool into a model supported by Paddle Lite. In addition, a converted Paddle Lite model usually consists of two files, one for the model structure and one for the parameters. The whole process was cumbersome and the user experience poor.

To address these problems, Paddle Lite v2.3 upgrades the original model optimization tool model_optimize_tool and introduces a new version of it, opt. The highlights of opt include the following three points:

  • Provides a one-click script (auto_transformer.sh) that performs, in a single operation, all the steps needed to go from a model in any supported framework to a Paddle Lite model (including OP fusion, memory-reuse optimization, and so on).

  • The final optimized model is a single .nb file containing both the network structure and the parameters. An API for loading a .nb file, set_model_from_file(nb_path), is provided; for the specific interface, see [Model Load API]. Loading models in the original format is still supported.

  • Provides rich log information. For example, you can check which operators a model uses, which hardware platforms Paddle Lite supports, and which operators are supported on each platform (Figure 7), and thereby understand how well Paddle Lite supports a given model.

Figure 7: Log information

For a more detailed description of opt, see its introduction and usage guide on GitHub:

https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html

Other upgrades

1. Documentation website upgrade

To improve the readability and visual presentation of the documentation, and to help users find documents quickly and get started with Paddle Lite easily, the Paddle Lite documentation has been comprehensively upgraded. The documentation directory is clearly visible and the search function is more powerful, providing users with a better reading experience.

Figure 8: The new documentation interface

Paddle Lite v2.3 also improves parts of the existing documentation and adds new documents, such as the usage guides for post-training quantization with calibration data and post-training quantization without calibration data.

2. Paddle Lite Demo repository upgrade

The existing cases in the Paddle Lite Demo repository have been updated, and new demos have been added. For example, the Android demos now include a face-detection demo, a YOLOv3 object-detection demo, and a human-segmentation demo. Users can easily run the demos and use them as reference implementations when developing new applications. In addition, a mask-recognition case has been added to the C++ demos in the Paddle Lite repository as a contribution to fighting the epidemic. Anyone interested can download the mask-recognition demo from the Paddle Lite repository and try it out.

Figure 9: Mask recognition

To make the APIs easier to use, the C++ and Java API interfaces have also been upgraded. The Java API adds new input and output data types to support different kinds of input.

To learn more, see the PaddlePaddle official website and the release notes.


Source: www.oschina.net/news/114105/paddle-lite-2-3-released