[Keras model quantization] Post-Training Quantization (TFLite)

Three quantization approaches are commonly used:

  1. tf.lite.TFLiteConverter

https://tensorflow.google.cn/lite/api_docs/python/tf/lite/TFLiteConverter

This is post-training quantization: it can quantize directly from a Keras model object or from a saved model file (SavedModel).

Quantization can target the weights only; the weights and activations; or the weights, activations, and the input/output tensors.

  2. tensorflow_model_optimization.quantization

https://tensorflow.google.cn/model_optimization/guide/quantization/training_comprehensive_guide

Quantization-aware training. Post-training quantization is easier to use, but quantization-aware training usually yields better model accuracy.

  3. tf.quantization

Applies fake quantization to the parameters: values are rounded to, e.g., 8-bit levels, but the parameter dtype remains float, as sketched below.
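For example, a minimal sketch of fake quantization using tf.quantization.fake_quant_with_min_max_args (the input values here are purely illustrative):

import tensorflow as tf

x = tf.constant([-3.2, -0.1, 0.0, 1.7, 5.9])
# Round values to one of 2^8 levels in [min, max]; the output dtype
# is still float32, i.e., the quantization is only simulated.
y = tf.quantization.fake_quant_with_min_max_args(x, min=-6.0, max=6.0, num_bits=8)
print(y.dtype)  # float32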

TensorFlow 2.x official documentation:

tf.lite: https://tensorflow.google.cn/lite/api_docs/python/tf/lite

tf.lite.TFLiteConverter: https://tensorflow.google.cn/lite/api_docs/python/tf/lite/TFLiteConverter

Quantization-aware training: https://tensorflow.google.cn/model_optimization/guide/quantization/training

Post-training quantization: https://tensorflow.google.cn/lite/performance/post_training_quantization

Post-training integer quantization: https://tensorflow.google.cn/lite/performance/post_training_integer_quant

Note: the URLs above use tensorflow.google.cn rather than tensorflow.org, so they can be opened directly from mainland China.

1. TFLite overview

ref: https://zhuanlan.zhihu.com/p/66346329

TFLite is Google's lightweight inference library, aimed mainly at mobile devices. The usual workflow is to convert a pre-trained model into a TFLite model file and deploy it on the device. The source model can come from a TensorFlow SavedModel, a frozen model, or a Keras model.

TFLite is a toolkit for deploying deep learning models on mobile and embedded devices: a trained TF model goes through conversion, deployment, and optimization, improving inference speed and reducing memory footprint.

TFLite consists mainly of the Converter and the Interpreter. The Converter converts a trained TensorFlow model and outputs a .tflite file (FlatBuffer format); during conversion it also optimizes the network, e.g., via quantization. The Interpreter deploys the .tflite file to mobile devices, embedded Linux devices, and microcontrollers, executes inference efficiently, and provides APIs for Python, Objective-C, Swift, Java, and other languages. In short, the Converter packages and optimizes the model, while the Interpreter executes inference efficiently and conveniently.
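As an illustration of the Interpreter side, a minimal inference sketch (the model path is a placeholder):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random input with the model's expected shape and dtype.
dummy_input = np.random.random_sample(input_details[0]['shape']).astype(input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])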


2. Generating a TFLite model

2.1 Conversion APIs

The official documentation lists three main approaches:

2.1.1 Converting a SavedModel to a TensorFlow Lite model.

# Converting a SavedModel to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

2.1.2 Converting a tf.Keras model to a TensorFlow Lite model.

# Converting a tf.Keras model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

2.1.3 Converting ConcreteFunctions to a TensorFlow Lite model.

# Converting ConcreteFunctions to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_concrete_functions([func])
tflite_model = converter.convert()
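For a Keras model, a ConcreteFunction can be obtained with tf.function; a sketch, where the input shape (None, 28, 28) is an assumption for illustration:

# Wrap the Keras model's call in a tf.function and trace it once.
run_model = tf.function(lambda x: model(x))
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([None, 28, 28], tf.float32))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()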

2.2 Examples

Suppose we already have a Keras model, e.g., DS-CNN (ref: https://blog.csdn.net/u010637291/article/details/108257312; the ARM ML-KWS-for-MCU repo on GitHub: https://github.com/ARM-software/ML-KWS-for-MCU).

2.2.1 Converting an unquantized Keras model directly to TFLite

## Convert to tflite without quantization
## Approach 1: from a Keras model in memory
def convert_from_unquant_keras_model_to_tflite(model, filename_tflite):
    '''
    Convert an unquantized Keras model to a .tflite file.
    :param model: the trained tf.keras model
    :param filename_tflite: output path for the .tflite file
    '''
    # Converting a tf.Keras model to a TensorFlow Lite model.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Quantizing the weights here caused an error later when running vela:
    # converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    open(filename_tflite, "wb").write(tflite_model)

# Usage:
models.convert_from_unquant_keras_model_to_tflite(
    model, './saved_model/tflite/converted_from_unquant_keras_model.tflite')

2.2.2 Converting an unquantized saved model (.pb) to TFLite

## Approach 2:
model.save('./saved_model/saved_keras_model')
models.convert_from_unquant_saved_model_to_tflite('./saved_model/saved_keras_model', './saved_model/tflite/converted_from_unquant_saved_model.tflite')

For saving models, see: https://blog.csdn.net/u010637291/article/details/107357308

Calling model.save directly saves both the model architecture and its weights (a SavedModel directory containing saved_model.pb, an assets/ folder, and a variables/ folder).

def convert_from_unquant_saved_model_to_tflite(saved_model_dir, filename_tflite):
    '''
    Convert an unquantized SavedModel directory to a .tflite file.
    :param saved_model_dir: path to the SavedModel directory
    :param filename_tflite: output path for the .tflite file
    '''
    if saved_model_dir != '':
        converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # only supports .pb/.pbtxt
        # Quantizing the weights here was unsuccessful (both Optimize.DEFAULT
        # and Optimize.OPTIMIZE_FOR_LATENCY raised errors):
        # converter.optimizations = [tf.lite.Optimize.DEFAULT]
        tflite_model = converter.convert()
        open(filename_tflite, "wb").write(tflite_model)
    else:
        print('saved_model_dir is empty, please specify it!')

Note: all the TFLite files generated above are unquantized. In practice you may need to quantize the model first and then convert it to TFLite.

3. Model quantization

Model quantization generally falls into quantization-aware training (during training) and post-training quantization.

Quantization-aware training: https://tensorflow.google.cn/model_optimization/guide/quantization/training

Post-training quantization: https://tensorflow.google.cn/lite/performance/post_training_quantization

3.1 Quantization approaches

3.1.1 Quantization-aware training: tfmot.quantization.keras.quantize_model

ref:https://tensorflow.google.cn/model_optimization/guide/quantization/training_comprehensive_guide

Install the required packages:

pip uninstall -y tensorflow
pip install -q tf-nightly
pip install -q tensorflow-model-optimization

Suppose we have trained a Keras model:

import tensorflow as tf
from tensorflow import keras

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_split=0.1,
)

Quantize it as follows:

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

q_aware_model.summary()

In practice, quantize_model does not support models with custom layers, and I have not found a workaround.
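For models without custom layers, once the quantization-aware model has been fine-tuned it can itself be converted to a quantized TFLite model; a sketch following the quantization-aware training guide linked above:

# Fine-tune the annotated model briefly, then convert it.
q_aware_model.fit(train_images, train_labels,
                  batch_size=500, epochs=1, validation_split=0.1)

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()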

3.1.2 Post-training quantization: tf.lite.TFLiteConverter

An overview of post-training quantization:

[Figure: overview of the post-training quantization options, from the TFLite documentation]

3.1.2.1 Quantizing only the weights

For a trained model, you can quantize only the weights (the fixed parameters), i.e., apply dynamic range quantization:

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# This line enables dynamic range quantization of the weights
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model_quant = converter.convert()

That is, compared with the unquantized conversion, only one line is added: converter.optimizations = [tf.lite.Optimize.DEFAULT].
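A quick way to see the effect is to compare the sizes of the serialized models (a sketch; model is the Keras model from above):

float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_float_model = float_converter.convert()

quant_converter = tf.lite.TFLiteConverter.from_keras_model(model)
quant_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = quant_converter.convert()

# Dynamic range quantization stores weights in 8 bits, so the quantized
# file should be roughly 4x smaller than the float32 version.
print('float model:     %d bytes' % len(tflite_float_model))
print('quantized model: %d bytes' % len(tflite_quant_model))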

3.1.2.2 Quantizing weights and activations (variable data)

def representative_data_gen():
  for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
    # Model has only one input so each data point has one element.
    yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

tflite_model_quant = converter.convert()

That is, a representative dataset must additionally be provided: converter.representative_dataset = representative_data_gen.

3.1.2.3 Quantizing weights, activations, and the input/output tensors

def representative_data_gen():
  for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
    yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model_quant = converter.convert()

That is, on top of the previous settings, also specify:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
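To confirm that the input and output tensors really are uint8, the converted model can be inspected with the Interpreter (a sketch mirroring the post-training integer quantization tutorial linked above):

interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)
input_type = interpreter.get_input_details()[0]['dtype']
output_type = interpreter.get_output_details()[0]['dtype']
print('input dtype: ', input_type)   # expected: <class 'numpy.uint8'>
print('output dtype:', output_type)  # expected: <class 'numpy.uint8'>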

3.2 Examples

3.2.1 Quantizing a Keras model and converting it to TFLite

When converting a trained Keras model to TFLite with tf.lite.TFLiteConverter, quantization can be applied at the same time:

def convert_from_quant_keras_model_to_tflite(model, rep_dataset, filename_tflite):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Quantize weights: set the optimizations flag
    # (OPTIMIZE_FOR_SIZE is a deprecated alias of Optimize.DEFAULT)
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    # Quantize activations: provide a representative dataset
    def representative_data_gen():
        for input_value in rep_dataset.take(100):
            yield [input_value]
    converter.representative_dataset = representative_data_gen
    # Quantize input/output tensors
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    tflite_quant_model = converter.convert()

    # Write to tflite file
    open(filename_tflite, "wb").write(tflite_quant_model)

# Usage:
convert_from_quant_keras_model_to_tflite(
    model, rep_dataset, './saved_model/tflite/converted_from_quant_keras_model.tflite')

3.2.2 Quantizing a saved model (.pb) and converting it to TFLite

model.save("./saved_model/saved_keras_model")
convert_from_quant_saved_model_to_tflite('./saved_model/saved_keras_model', rep_dataset, './saved_model/tflite/converted_from_quant_saved_model.tflite')
def convert_from_quant_saved_model_to_tflite(saved_model_dir, rep_dataset, filename_tflite):
    '''
    OK
    :param saved_model_dir:
    :param rep_dataset:
    :param filename_tflite:
    :return:
    '''
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Quantize weighjts: first set the optimizations flag to optimize for size
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    # Quantize variables
    def representative_data_gen():
        for input_value in rep_dataset.take(100):
            yield [input_value]
    converter.representative_dataset = representative_data_gen
    # Quantize input/output tensors
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    tflite_quant_model = converter.convert()
    # Write to tflite file
    open(filename_tflite, "wb").write(tflite_quant_model)

That is, first save the model, then load it with tf.lite.TFLiteConverter.from_saved_model, quantize it, and generate the .tflite file.
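Since the fully quantized model expects uint8 input, a float input must first be quantized with the input tensor's scale and zero point when running inference; a sketch, where test_image is a placeholder float32 array of the model's input shape:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path='./saved_model/tflite/converted_from_quant_saved_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Quantize the float input using the tensor's quantization parameters.
scale, zero_point = input_details['quantization']
quantized_input = (test_image / scale + zero_point).astype(np.uint8)

interpreter.set_tensor(input_details['index'],
                       np.expand_dims(quantized_input, axis=0))
interpreter.invoke()
output = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])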

4. The Netron tool

The Netron tool makes it easy to view many kinds of TensorFlow models, including TensorFlow Lite models:

Homepage: https://www.electronjs.org/apps/netron
GitHub: https://github.com/lutzroeder/netron
Download: https://github.com/lutzroeder/netron/releases/tag/v4.5.1

4.1 Installation

Ubuntu:

snap install netron

Installing from a downloaded .AppImage did not work for me; the download command used was:

wget https://github.com/lutzroeder/netron/releases/tag/v4.5.1 -o Netron-4.5.1.AppImage

(Note that this URL points at the release page rather than the .AppImage asset itself, and wget's lowercase -o writes the log file rather than the output file, which is -O; that likely explains the failure.)

4.2 Usage

On Ubuntu, in a newly created virtual environment, run:

netron

then choose the model file. As an example, viewing a TensorFlow Lite model (.tflite):

[Figure: a .tflite model opened in Netron]

You can inspect the overall model architecture as well as each layer's parameters: for a convolution layer, for instance, the padding mode, stride, filter/bias parameters, and the data and dtype of the input/output tensors (float32 indicates an unquantized model).
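Netron is also published on PyPI; assuming the netron package is installed (pip install netron), it can be launched from Python as an alternative to the desktop app:

import netron

# Starts a local server and opens a browser-based viewer for the model.
netron.start('model.tflite')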

5. The Vela tool

Homepage: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/heads/master

Vela options: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/heads/master/OPTIONS.md

5.1 Installation

Requirements:
TensorFlow: Vela supports TensorFlow 2.1.0.
OS: Vela runs on the Linux operating system.
Others: Python >= 3.6, pip3, and a GNU toolchain (GCC, Binutils and libraries) or an alternative C compiler/linker toolchain.

Installation on Ubuntu:

# Create a virtual environment pinned to Python 3.6 (since python3.6 is not
# the default python3, specify its full path)
virtualenv -p /usr/local/bin/python3.6 ~/venv/vela
# Activate the virtual environment
source ~/venv/vela/bin/activate

# Install ethos-u-vela
/usr/local/bin/python3.6 -m pip install ethos-u-vela

# In my tests, installing from a cloned git repo did not succeed.

5.2 Usage

vela path/to/network.tflite

A before/after comparison of a TFLite model processed with Vela:
1) The original .tflite model (viewed with Netron):

[Figure: the original .tflite model in Netron]

2) After Vela optimization (viewed with Netron):

[Figure: the Vela-optimized model in Netron]


Reposted from blog.csdn.net/u010637291/article/details/108649829