[Keras model quantization] Post-Training Quantization (TFLite)

Three quantization approaches are commonly used:

  1. tf.lite.TFLiteConverter

https://tensorflow.google.cn/lite/api_docs/python/tf/lite/TFLiteConverter

This is post-training quantization: it can quantize directly from a Keras model object or from a saved model file (SavedModel).

Quantization can target the weights only; the weights and activations; or the weights, activations, and the input/output tensors.

  2. tensorflow_model_optimization.quantization

https://tensorflow.google.cn/model_optimization/guide/quantization/training_comprehensive_guide

Quantization-aware training. Post-training quantization is easier to use, but quantization-aware training usually yields better model accuracy.

  3. tf.quantization

Applies fake quantization to the parameters: values are rounded to, e.g., 8-bit levels, but the parameter dtype remains float, as sketched below.
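For example, a minimal sketch of fake quantization using tf.quantization.fake_quant_with_min_max_args (the input values here are purely illustrative):

import tensorflow as tf

x = tf.constant([-3.2, -0.1, 0.0, 1.7, 5.9])
# Round values to one of 2^8 levels in [min, max]; the output dtype
# is still float32, i.e., the quantization is only simulated.
y = tf.quantization.fake_quant_with_min_max_args(x, min=-6.0, max=6.0, num_bits=8)
print(y.dtype)  # float32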

TensorFlow 2.x official documentation:

tf.lite: https://tensorflow.google.cn/lite/api_docs/python/tf/lite

tf.lite.TFLiteConverter: https://tensorflow.google.cn/lite/api_docs/python/tf/lite/TFLiteConverter

Quantization-aware training: https://tensorflow.google.cn/model_optimization/guide/quantization/training

Post-training quantization: https://tensorflow.google.cn/lite/performance/post_training_quantization

Post-training integer quantization: https://tensorflow.google.cn/lite/performance/post_training_integer_quant

Note: the URLs above use tensorflow.google.cn rather than tensorflow.org, so they can be opened directly from mainland China.

1. TFLite overview

ref: https://zhuanlan.zhihu.com/p/66346329

TFLite is Google's lightweight inference library, aimed mainly at mobile devices. The usual workflow is to convert a pre-trained model into a TFLite model file and deploy it on the device. The source model can come from a TensorFlow SavedModel, a frozen model, or a Keras model.

TFLite is a toolkit for deploying deep learning models on mobile and embedded devices: a trained TF model goes through conversion, deployment, and optimization, improving inference speed and reducing memory footprint.

TFLite consists mainly of the Converter and the Interpreter. The Converter converts a trained TensorFlow model and outputs a .tflite file (FlatBuffer format); during conversion it also optimizes the network, e.g., via quantization. The Interpreter deploys the .tflite file to mobile devices, embedded Linux devices, and microcontrollers, executes inference efficiently, and provides APIs for Python, Objective-C, Swift, Java, and other languages. In short, the Converter packages and optimizes the model, while the Interpreter executes inference efficiently and conveniently.
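As an illustration of the Interpreter side, a minimal inference sketch (the model path is a placeholder):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random input with the model's expected shape and dtype.
dummy_input = np.random.random_sample(input_details[0]['shape']).astype(input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])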


2. Generating a TFLite model

2.1 Conversion APIs

The official documentation lists three main approaches:

2.1.1 Converting a SavedModel to a TensorFlow Lite model.

# Converting a SavedModel to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

2.1.2 Converting a tf.Keras model to a TensorFlow Lite model.

# Converting a tf.Keras model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

2.1.3 Converting ConcreteFunctions to a TensorFlow Lite model.

# Converting ConcreteFunctions to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_concrete_functions([func])
tflite_model = converter.convert()
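For a Keras model, a ConcreteFunction can be obtained with tf.function; a sketch, where the input shape (None, 28, 28) is an assumption for illustration:

# Wrap the Keras model's call in a tf.function and trace it once.
run_model = tf.function(lambda x: model(x))
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([None, 28, 28], tf.float32))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()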

2.2 Examples

Suppose we already have a Keras model, e.g., DS-CNN (ref: https://blog.csdn.net/u010637291/article/details/108257312; the ARM ML-KWS-for-MCU repo on GitHub: https://github.com/ARM-software/ML-KWS-for-MCU).

2.2.1 Converting an unquantized Keras model directly to TFLite

## Convert to tflite without quantization
## Approach 1: from a Keras model in memory
def convert_from_unquant_keras_model_to_tflite(model, filename_tflite):
    '''
    Convert an unquantized Keras model to a .tflite file.
    :param model: the trained tf.keras model
    :param filename_tflite: output path for the .tflite file
    '''
    # Converting a tf.Keras model to a TensorFlow Lite model.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Quantizing the weights here caused an error later when running vela:
    # converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    open(filename_tflite, "wb").write(tflite_model)

# Usage:
models.convert_from_unquant_keras_model_to_tflite(
    model, './saved_model/tflite/converted_from_unquant_keras_model.tflite')

2.2.2 Converting an unquantized saved model (.pb) to TFLite

## Approach 2:
model.save('./saved_model/saved_keras_model')
models.convert_from_unquant_saved_model_to_tflite('./saved_model/saved_keras_model', './saved_model/tflite/converted_from_unquant_saved_model.tflite')

For saving models, see: https://blog.csdn.net/u010637291/article/details/107357308

Calling model.save directly saves both the model architecture and its weights (a SavedModel directory containing saved_model.pb, an assets/ folder, and a variables/ folder).

def convert_from_unquant_saved_model_to_tflite(saved_model_dir, filename_tflite):
    '''
    Convert an unquantized SavedModel directory to a .tflite file.
    :param saved_model_dir: path to the SavedModel directory
    :param filename_tflite: output path for the .tflite file
    '''
    if saved_model_dir != '':
        converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # only supports .pb/.pbtxt
        # Quantizing the weights here was unsuccessful (both Optimize.DEFAULT
        # and Optimize.OPTIMIZE_FOR_LATENCY raised errors):
        # converter.optimizations = [tf.lite.Optimize.DEFAULT]
        tflite_model = converter.convert()
        open(filename_tflite, "wb").write(tflite_model)
    else:
        print('saved_model_dir is empty, please specify it!')

Note: all the TFLite files generated above are unquantized. In practice you may need to quantize the model first and then convert it to TFLite.

3. Model quantization

Model quantization generally falls into quantization-aware training (during training) and post-training quantization.

Quantization-aware training: https://tensorflow.google.cn/model_optimization/guide/quantization/training

Post-training quantization: https://tensorflow.google.cn/lite/performance/post_training_quantization

3.1 Quantization approaches

3.1.1 Quantization-aware training: tfmot.quantization.keras.quantize_model

ref:https://tensorflow.google.cn/model_optimization/guide/quantization/training_comprehensive_guide

Install the required packages:

pip uninstall -y tensorflow
pip install -q tf-nightly
pip install -q tensorflow-model-optimization

Suppose we have trained a Keras model:

import tensorflow as tf
from tensorflow import keras

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_split=0.1,
)

Quantize it as follows:

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

q_aware_model.summary()

In practice, quantize_model does not support models with custom layers, and I have not found a workaround.
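For models without custom layers, once the quantization-aware model has been fine-tuned it can itself be converted to a quantized TFLite model; a sketch following the quantization-aware training guide linked above:

# Fine-tune the annotated model briefly, then convert it.
q_aware_model.fit(train_images, train_labels,
                  batch_size=500, epochs=1, validation_split=0.1)

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()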

3.1.2 Post-training quantization: tf.lite.TFLiteConverter

An overview of post-training quantization:

[Figure: overview of the post-training quantization options, from the TFLite documentation]

3.1.2.1 Quantizing only the weights

For a trained model, you can quantize only the weights (the fixed parameters), i.e., apply dynamic range quantization:

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# This line enables dynamic range quantization of the weights
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model_quant = converter.convert()

That is, compared with the unquantized conversion, only one line is added: converter.optimizations = [tf.lite.Optimize.DEFAULT].
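A quick way to see the effect is to compare the sizes of the serialized models (a sketch; model is the Keras model from above):

float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_float_model = float_converter.convert()

quant_converter = tf.lite.TFLiteConverter.from_keras_model(model)
quant_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = quant_converter.convert()

# Dynamic range quantization stores weights in 8 bits, so the quantized
# file should be roughly 4x smaller than the float32 version.
print('float model:     %d bytes' % len(tflite_float_model))
print('quantized model: %d bytes' % len(tflite_quant_model))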

3.1.2.2 Quantizing weights and activations (variable data)

def representative_data_gen():
  for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
    # Model has only one input so each data point has one element.
    yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

tflite_model_quant = converter.convert()

That is, a representative dataset must additionally be provided: converter.representative_dataset = representative_data_gen.

3.1.2.3 Quantizing weights, activations, and the input/output tensors

def representative_data_gen():
  for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
    yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model_quant = converter.convert()

That is, on top of the previous settings, also specify:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
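To confirm that the input and output tensors really are uint8, the converted model can be inspected with the Interpreter (a sketch mirroring the post-training integer quantization tutorial linked above):

interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)
input_type = interpreter.get_input_details()[0]['dtype']
output_type = interpreter.get_output_details()[0]['dtype']
print('input dtype: ', input_type)   # expected: <class 'numpy.uint8'>
print('output dtype:', output_type)  # expected: <class 'numpy.uint8'>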

3.2 Examples

3.2.1 Quantizing a Keras model and converting it to TFLite

When converting a trained Keras model to TFLite with tf.lite.TFLiteConverter, quantization can be applied at the same time:

def convert_from_quant_keras_model_to_tflite(model, rep_dataset, filename_tflite):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Quantize weights: set the optimizations flag
    # (OPTIMIZE_FOR_SIZE is a deprecated alias of Optimize.DEFAULT)
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    # Quantize activations: provide a representative dataset
    def representative_data_gen():
        for input_value in rep_dataset.take(100):
            yield [input_value]
    converter.representative_dataset = representative_data_gen
    # Quantize input/output tensors
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    tflite_quant_model = converter.convert()

    # Write to tflite file
    open(filename_tflite, "wb").write(tflite_quant_model)

# Usage:
convert_from_quant_keras_model_to_tflite(
    model, rep_dataset, './saved_model/tflite/converted_from_quant_keras_model.tflite')

3.2.2 Quantizing a saved model (.pb) and converting it to TFLite

model.save("./saved_model/saved_keras_model")
convert_from_quant_saved_model_to_tflite('./saved_model/saved_keras_model', rep_dataset, './saved_model/tflite/converted_from_quant_saved_model.tflite')
def convert_from_quant_saved_model_to_tflite(saved_model_dir, rep_dataset, filename_tflite):
    '''
    OK
    :param saved_model_dir:
    :param rep_dataset:
    :param filename_tflite:
    :return:
    '''
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Quantize weighjts: first set the optimizations flag to optimize for size
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    # Quantize variables
    def representative_data_gen():
        for input_value in rep_dataset.take(100):
            yield [input_value]
    converter.representative_dataset = representative_data_gen
    # Quantize input/output tensors
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    tflite_quant_model = converter.convert()
    # Write to tflite file
    open(filename_tflite, "wb").write(tflite_quant_model)

That is, first save the model, then load it with tf.lite.TFLiteConverter.from_saved_model, quantize it, and generate the .tflite file.
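Since the fully quantized model expects uint8 input, a float input must first be quantized with the input tensor's scale and zero point when running inference; a sketch, where test_image is a placeholder float32 array of the model's input shape:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path='./saved_model/tflite/converted_from_quant_saved_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Quantize the float input using the tensor's quantization parameters.
scale, zero_point = input_details['quantization']
quantized_input = (test_image / scale + zero_point).astype(np.uint8)

interpreter.set_tensor(input_details['index'],
                       np.expand_dims(quantized_input, axis=0))
interpreter.invoke()
output = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])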

4. The Netron tool

The Netron tool makes it easy to view many kinds of TensorFlow models, including TensorFlow Lite models:

Homepage: https://www.electronjs.org/apps/netron
GitHub: https://github.com/lutzroeder/netron
Download: https://github.com/lutzroeder/netron/releases/tag/v4.5.1

4.1 Installation

Ubuntu:

snap install netron

Installing from a downloaded .AppImage did not work for me; the download command used was:

wget https://github.com/lutzroeder/netron/releases/tag/v4.5.1 -o Netron-4.5.1.AppImage

(Note that this URL points at the release page rather than the .AppImage asset itself, and wget's lowercase -o writes the log file rather than the output file, which is -O; that likely explains the failure.)

4.2 Usage

On Ubuntu, in a newly created virtual environment, run:

netron

then choose the model file. As an example, viewing a TensorFlow Lite model (.tflite):

[Figure: a .tflite model opened in Netron]

You can inspect the overall model architecture as well as each layer's parameters: for a convolution layer, for instance, the padding mode, stride, filter/bias parameters, and the data and dtype of the input/output tensors (float32 indicates an unquantized model).
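Netron is also published on PyPI; assuming the netron package is installed (pip install netron), it can be launched from Python as an alternative to the desktop app:

import netron

# Starts a local server and opens a browser-based viewer for the model.
netron.start('model.tflite')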

5. The Vela tool

Homepage: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/heads/master

Vela options: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/heads/master/OPTIONS.md

5.1 Installation

Requirements:
TensorFlow: Vela supports TensorFlow 2.1.0.
OS: Vela runs on the Linux operating system.
Others: Python >= 3.6, pip3, and a GNU toolchain (GCC, Binutils and libraries) or an alternative C compiler/linker toolchain.

Installation on Ubuntu:

# Create a virtual environment pinned to Python 3.6 (since python3.6 is not
# the default python3, specify its full path)
virtualenv -p /usr/local/bin/python3.6 ~/venv/vela
# Activate the virtual environment
source ~/venv/vela/bin/activate

# Install ethos-u-vela
/usr/local/bin/python3.6 -m pip install ethos-u-vela

# In my tests, installing from a cloned git repo did not succeed.

5.2 Usage

vela path/to/network.tflite

A before/after comparison of a TFLite model processed with Vela:
1) The original .tflite model (viewed with Netron):

[Figure: the original .tflite model in Netron]

2) After Vela optimization (viewed with Netron):

[Figure: the Vela-optimized model in Netron]


Reposted from blog.csdn.net/u010637291/article/details/108649829