Mobile Deep Learning Deployment: TFLite

1. Introduction to TFLite

( 1 ) TFLite concept

  • TFLite is Google's lightweight inference library, mainly used on mobile devices.
  • The core idea of TFLite is to convert a pre-trained model into a .tflite model file and deploy it on the mobile device.
  • The source model can be a TensorFlow SavedModel, a frozen graph, or a Keras model; the converter provides an entry point for each (see the sketch below).
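As a quick orientation, here is a minimal sketch of the converter entry points for these three sources (the paths, the keras_model object, and the tensor names are placeholders; the frozen-graph variant assumes the TF 1.x API reachable through tf.compat.v1):

import tensorflow as tf

# From a SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

# From an in-memory Keras model (TF 2.x).
# converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

# From a frozen GraphDef (TF 1.x API).
# converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
#     "frozen.pb", input_arrays=["input"], output_arrays=["output"])

tflite_model = converter.convert()
open("model.tflite", "wb").write(tflite_model)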

( 2 ) Advantages of TFLite

  • Uses FlatBuffers to serialize model files; this format takes up less disk space and loads faster.
  • Models can be quantized: float parameters can be quantized to uint8, so the model file is smaller and computation is faster.
  • Models can be pruned, have structures merged, and be distilled.
  • Supports NNAPI, so the underlying Android interface can be called to take advantage of heterogeneous computing capabilities.

( 3 ) TFLite quantization

a. Benefits of quantization

  • Smaller storage size: a small model takes up less storage space on the user's device. For example, an Android application that uses a small model takes up less space on the user's phone.
  • Smaller download size: a small model takes less time and less bandwidth to download to the user's device.
  • Less memory usage: a small model uses less memory at runtime, freeing up memory for other parts of the app, which can translate into better performance and stability.

b. The quantization process

        TFLite quantization does not compute in uint8 throughout. Instead, it stores the maximum and minimum values of each layer and divides that interval linearly into 256 discrete values, so every floating-point number in the range can be represented by an eight-bit integer approximating its nearest discrete value. For example, if the minimum is -3 and the maximum is 6, a byte value of 0 represents -3, 255 represents 6, and 128 represents 1.5. Each operation is computed with integers first and recast to floating point on output. The figure below is a schematic diagram of quantized ReLU.
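The affine mapping described above is easy to reproduce; below is a minimal numpy sketch of the -3 to 6 example (illustrative only, not TFLite's internal implementation):

import numpy as np

def quantize(x, x_min=-3.0, x_max=6.0):
    # Linearly map the float range [x_min, x_max] onto the 256 uint8 levels.
    scale = (x_max - x_min) / 255.0
    return np.round((x - x_min) / scale).astype(np.uint8), scale

def dequantize(q, scale, x_min=-3.0):
    # Recover an approximate float value from the 8-bit code.
    return q.astype(np.float32) * scale + x_min

q, scale = quantize(np.array([-3.0, 1.5, 6.0]))
print(q)                     # [  0 128 255]
print(dequantize(q, scale))  # roughly [-3.  1.5  6.]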

TensorFlow official quantization documentation

[Figure: schematic diagram of quantized ReLU]

c. Implementing quantization

Post-training dynamic range quantization

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# OPTIMIZE_FOR_SIZE behaves the same as DEFAULT and is deprecated in newer TF versions.
#converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_model1 = converter.convert()
open("xxx.tflite", "wb").write(tflite_model1)

Post-training float16 quantization

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
open("xxx.tflite", "wb").write(tflite_quant_model)

Post-training int8 quantization

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
  # num_calibration_steps and input are placeholders for your own calibration loop.
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    yield [input]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_model1 = converter.convert()
open("xxx.tflite", "wb").write(tflite_model1)

Note: float32 and float16 models can run on the GPU, while int8-quantized models can only run on the CPU.
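Whichever option is used, the converted model can be sanity-checked from Python with tf.lite.Interpreter before it is shipped to a device. A minimal sketch, assuming the "xxx.tflite" file written above:

import numpy as np
import tensorflow as tf

# Load the converted model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="xxx.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed dummy data that matches the model's expected input shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]))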

2. TFLite model conversion

( 1 ) Save the TFLite model during training

import tensorflow as tf
# TF 1.x APIs: tf.placeholder, tf.Session and tf.lite.toco_convert are deprecated in TF 2.x.
img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
val = img + tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
out = tf.identity(val, name="out")
with tf.Session() as sess:
  tflite_model = tf.lite.toco_convert(sess.graph_def, [img], [out])
  open("converted_model.tflite", "wb").write(tflite_model)

( 2 ) Convert TensorFlow models in other formats to TFLite models

        First, install Bazel (see https://docs.bazel.build/versions/master/install-ubuntu.html; only the "Installing using binary installer" part is needed). Then clone the TensorFlow source code:

git clone https://github.com/tensorflow/tensorflow.git

        Then compile the conversion tools; this may take a long time:

cd tensorflow/
bazel build tensorflow/python/tools:freeze_graph
bazel build tensorflow/lite/toco:toco

        With the tools built, start converting the model. The following command freezes the graph:

  • input_graph corresponds to the .pb file;
  • input_checkpoint corresponds to mobilenet_v1_1.0_224.ckpt.data-00000-of-00001; the suffix is removed when it is used.
  • output_node_names can be obtained from mobilenet_v1_1.0_224_info.txt.

./freeze_graph --input_graph=/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224_frozen.pb \
  --input_checkpoint=/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt \
  --input_binary=true \
  --output_graph=/tmp/frozen_mobilenet_v1_224.pb \
  --output_node_names=MobilenetV1/Predictions/Reshape_1

Convert the frozen graph to a TFLite model:

  • input_file is the graph that has already been frozen;
  • output_file is the output path of the converted model;
  • output_arrays can be obtained from mobilenet_v1_1.0_224_info.txt;
  • input_shapes is the shape of the prediction data.

./toco --input_file=/tmp/mobilenet_v1_1.0_224_frozen.pb \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --output_file=/tmp/mobilenet_v1_1.0_224.tflite \
  --inference_type=FLOAT \
  --input_type=FLOAT \
  --input_arrays=input \
  --output_arrays=MobilenetV1/Predictions/Reshape_1 \
  --input_shapes=1,224,224,3

( 3 ) Model conversion using checkpoints

  • Save the TensorFlow model as a .pb file

import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.platform import gfile
 
if __name__ == "__main__":
    a = tf.Variable(tf.constant(5., shape=[1]), name="a")
    b = tf.Variable(tf.constant(6., shape=[1]), name="b")
    c = a + b
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    # Export the GraphDef part of the current computation graph
    graph_def = tf.get_default_graph().as_graph_def()
    # Keep only the specified nodes and convert their variables to constants
    output_graph_def = graph_util.convert_variables_to_constants(sess, graph_def, ['add'])
    # Write the graph to the model file
    model_f = tf.gfile.GFile("model.pb", "wb")
    model_f.write(output_graph_def.SerializeToString())

  • Reading the model file

Python
import tensorflow as tf
from tensorflow.python.platform import gfile
 
sess = tf.Session()
# Parse the saved model file into a GraphDef
model_f = gfile.FastGFile("model.pb", 'rb')
graph_def = tf.GraphDef()
graph_def.ParseFromString(model_f.read())
c = tf.import_graph_def(graph_def, return_elements=["add:0"])
print(sess.run(c))
# [array([ 11.], dtype=float32)]

  • Convert the .pb file to a TFLite model

Python
import tensorflow as tf
 
in_path = r'D:\tmp_mobilenet_v1_100_224_classification_3\output_graph.pb'
out_path = r'D:\tmp_mobilenet_v1_100_224_classification_3\output_graph.tflite'
 
input_tensor_name = ['Placeholder']
input_tensor_shape = {'Placeholder': [1, 224, 224, 3]}
 
class_tensor_name = ['final_result']
 
# TF 1.x API; in TF 2.x use tf.compat.v1.lite.TFLiteConverter.from_frozen_graph.
converter = tf.lite.TFLiteConverter.from_frozen_graph(in_path,
                                                      input_arrays=input_tensor_name,
                                                      output_arrays=class_tensor_name,
                                                      input_shapes=input_tensor_shape)
 
# converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir=in_path, input_arrays=input_tensor_name, output_arrays=class_tensor_name)
tflite_model = converter.convert()
 
with open(out_path, 'wb') as f:
    f.write(tflite_model)

 

3. Calling the TFLite model file on the Android side

( 1 ) The process of calling the TFLite model in Android Studio to implement inference

  • Define an interpreter
  • Initialize the interpreter (load the .tflite model)
  • Load the image into a buffer in Android
  • Execute the graph with the interpreter (inference)
  • Display the inference results in the app

( 2 ) Steps to import a TFLite model in Android Studio

  • Create a new Android project or open an existing one.
  • Open the TFLite model import dialog via the menu item File > New > Other > TensorFlow Lite Model.
  • Select the model file with the .tflite extension. Model files can be downloaded from the Internet or trained by yourself.
  • The imported .tflite model file is placed under the ml/ folder of the project.

The model viewer mainly shows the following three kinds of information:

  • Model: the model name, description, version, author, etc.
  • Tensors: the input and output tensors. For example, images need to be pre-processed to the expected size before inference can be performed.
  • Sample code: example code showing how to invoke the model from the app.

 
