Fully 8-bit quantized model (quantize_weights + quantize_nodes):
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
add_default_attributes
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
quantize_weights
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
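The timing numbers below came from the on-device benchmark tool. The exact invocation is not in these notes; a plausible sketch (device paths are assumptions, flags match the standard tensorflow/tools/benchmark/benchmark_model tool) looks like:

```shell
# Push the transformed graph and the benchmark binary to the device
# (paths are illustrative assumptions, not taken from the log).
adb push optimized_inception_graph.pb /data/local/tmp/
adb push bazel-bin/tensorflow/tools/benchmark/benchmark_model /data/local/tmp/

# Run the benchmark with the same input/output names used in the transform.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/optimized_inception_graph.pb \
  --input_layer='Mul' \
  --input_layer_shape='1,299,299,3' \
  --input_layer_type='float' \
  --output_layer='softmax' \
  --show_run_order=false
```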
result:
native : benchmark_model.cc:600 Average inference timings in us: Warmup: 1187581, no stats: 592301, with stats: 674790
native : stats_calculator.cc:170 Number of nodes executed: 924
native : stats_calculator.cc:280 ============================== Top by Memory Use ==============================
native : stats_calculator.cc:280 [node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
native : stats_calculator.cc:280 DepthwiseConv2dNative 438.627 40.317 40.317 0.414% 0.414% 8462.336 0 conv_dw_17/depthwise
native : stats_calculator.cc:280 QuantizedAdd 506.209 9.001 9.001 0.093% 0.507% 8388.616 0 conv_dw_17_bn/batchnorm/add_1/eightbit
native : stats_calculator.cc:280 QuantizedMul 488.168 3.781 3.781 0.039% 0.546% 8388.616 0 conv_dw_17_bn/batchnorm/mul_1/eightbit
native : stats_calculator.cc:280 Dequantize 532.036 5.687 5.687 0.058% 0.604% 8388.608 0 conv_dw_17_relu/Relu
native : stats_calculator.cc:280 ConcatV2 432.934 5.647 5.647 0.058% 0.662% 8388.608 0 concatenate_4/concat
native : stats_calculator.cc:280 Conv2DBackpropInput 402.799 7.861 7.861 0.081% 0.743% 5242.880 0 conv2d_transpose_4/conv2d_transpose
native : stats_calculator.cc:280 QuantizedBiasAdd 416.214 5.535 5.535 0.057% 0.800% 4194.312 0 conv2d_transpose_4/BiasAdd/eightbit
native : stats_calculator.cc:280 QuantizedAdd 103.805 4.708 4.708 0.048% 0.849% 4194.312 0 conv_pw_1_bn/batchnorm/add_1/eightbit
native : stats_calculator.cc:280 QuantizedConv2D 77.902 18.453 18.453 0.190% 1.038% 4194.312 0 conv_pw_1_bn/batchnorm/mul_1/eightbit
native : stats_calculator.cc:280 Dequantize 429.111 3.780 3.780 0.039% 1.077% 4194.304 0 conv2d_transpose_4/BiasAdd
native : stats_calculator.cc:280 … (remainder of log truncated)
Benchmark of the model converted to tflite while ignoring the special ops — it then fails with "unknown ops" errors:
STARTING!
Num runs: [50]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Warmup runs: [1]
Graph: [/data/local/tmp/models/op_MS512.tflite]
Input layers: [input_1]
Input shapes: [1,512,512,3]
Use nnapi : [0]
nnapi error: unable to open library libneuralnetworks.so
Loaded model /data/local/tmp/models/op_MS512.tflite
resolved reporter
Didn't find custom op for name 'Stack' with version 1
Didn't find custom op for name 'TensorFlowMax' with version 1
Didn't find custom op for name 'TensorFlowMin' with version 1
Didn't find custom op for name 'TensorFlowShape' with version 1
Didn't find custom op for name 'Dequantize' with version 1
Didn't find custom op for name 'QuantizeV2' with version 1
Didn't find custom op for name 'QuantizedAdd' with version 1
Didn't find custom op for name 'QuantizedBiasAdd' with version 1
Didn't find custom op for name 'QuantizedConv2D' with version 1
Didn't find custom op for name 'QuantizedMul' with version 1
Didn't find custom op for name 'QuantizedRelu' with version 1
Didn't find custom op for name 'QuantizedResizeBilinear' with version 1
Didn't find custom op for name 'RequantizationRange' with version 1
Didn't find custom op for name 'Requantize' with version 1
Didn't find custom op for name 'ReorderAxes' with version 1
Registration failed.
Failed to construct interpreter
[2] + Stopped (signal) /data/local/tmp/benchmark_model_lite --graph=/data/local/tmp/models/op_MS512.tflite --input_layer="input_1" --input_layer_shape="1,512,512,3" --input_layer_type="float" --output_layer="proba/Sigmoid:0" --show_run_order=false --
[1] - Aborted /data/local/tmp/benchmark_model_lite --graph=/data/local/tmp/models/op_MS512.tflite --input_layer="input_1" --input_layer_shape="1,512,512,3" --input_layer_type="float" --output_layer="proba/Sigmoid:0" --show_run_order=false --
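The failures above are expected: quantize_nodes rewrites the graph with TensorFlow quantized ops (QuantizedConv2D, QuantizedAdd, Requantize, …) that are not TFLite builtins, so the TFLite interpreter cannot resolve them. A quick way to pull the unresolved op names out of a benchmark log (the heredoc stands in for the real log captured from the device):

```shell
# Collect the unresolved op names reported by the TFLite benchmark.
# The heredoc below is an inlined excerpt for illustration; normally you
# would redirect the benchmark's stderr to this file instead.
cat <<'EOF' > /tmp/bench.log
Didn't find custom op for name 'Stack' with version 1
Didn't find custom op for name 'QuantizedConv2D' with version 1
Didn't find custom op for name 'Requantize' with version 1
EOF

# Extract the quoted op name from each failure line, deduplicated.
MISSING_OPS=$(sed -n "s/.*custom op for name '\([A-Za-z0-9]*\)'.*/\1/p" /tmp/bench.log | sort -u)
echo "$MISSING_OPS"
```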
Shrunk model (round_weights only):
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=MS512_O.pb \
--out_graph=optimized_MS512_O2.pb \
--inputs='input_1' \
--outputs='proba/Sigmoid' \
--transforms='
strip_unused_nodes(type=float, shape="1,512,512,3")
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
round_weights'
result:
native : benchmark_model.cc:600 Average inference timings in us: Warmup: 625681, no stats: 451567, with stats: 497953
native : stats_calculator.cc:170 Number of nodes executed: 401
native : stats_calculator.cc:280 ============================== Top by Memory Use ==============================
native : stats_calculator.cc:280 [node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
native : stats_calculator.cc:280 DepthwiseConv2dNative 337.052 43.531 43.531 0.424% 0.424% 8462.336 0 conv_dw_17/depthwise
native : stats_calculator.cc:280 ConcatV2 329.394 7.611 7.611 0.074% 0.498% 8388.608 0 concatenate_4/concat
native : stats_calculator.cc:280 Conv2DBackpropInput 316.519 9.466 9.466 0.092% 0.590% 5242.880 0 conv2d_transpose_4/conv2d_transpose
native : stats_calculator.cc:280 Conv2D 42.898 12.100 12.100 0.118% 0.707% 4194.304 0 conv_pw_1_bn/batchnorm/mul_1
native : stats_calculator.cc:280 ConcatV2 419.190 2.549 2.549 0.025% 0.732% 3145.728 0 concatenate_5/concat
native : stats_calculator.cc:280 DepthwiseConv2dNative 22.917 12.987 12.987 0.126% 0.859% 2115.584 0 conv_dw_1/depthwise
native : stats_calculator.cc:280 Conv2D 5.609 12.207 12.207 0.119% 0.977% 2097.152 0 conv_0_bn/batchnorm/mul_1
native : stats_calculator.cc:280 DepthwiseConv2dNative 301.827 6.540 6.540 0.064% 1.041% 1196.032 0 conv_dw_16/depthwise
native : stats_calculator.cc:280 ResizeBilinear 466.228 8.946 8.946 0.087% 1.128% 1048.576 0 logits/ResizeBilinear
native : stats_calculator.cc:280 Conv2D 404.017 12.080 12.080 0.118% 1.246% 1048.576 0 conv_pw_17_bn/batchnorm/mul_1
native : stats_calculator.cc:280
native : stats_calculator.cc:280 ====================
Model with quantize_weights only:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=MS512_O.pb \
--out_graph=optimized_MS512_O2.pb \
--inputs='input_1' \
--outputs='proba/Sigmoid' \
--transforms='
strip_unused_nodes(type=float, shape="1,512,512,3")
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
quantize_weights'
result:
native : benchmark_model.cc:600 Average inference timings in us: Warmup: 836670, no stats: 449421, with stats: 493879
native : stats_calculator.cc:170 Number of nodes executed: 401
native : stats_calculator.cc:280 ============================== Top by Memory Use ==============================
native : stats_calculator.cc:280 [node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
native : stats_calculator.cc:280 DepthwiseConv2dNative 319.694 50.856 50.856 0.500% 0.500% 8462.336 0 conv_dw_17/depthwise
native : stats_calculator.cc:280 ConcatV2 313.943 5.702 5.702 0.056% 0.556% 8388.608 0 concatenate_4/concat
native : stats_calculator.cc:280 Conv2DBackpropInput 302.511 8.018 8.018 0.079% 0.635% 5242.880 0 conv2d_transpose_4/conv2d_transpose
native : stats_calculator.cc:280 Conv2D 42.193 11.882 11.882 0.117% 0.751% 4194.304 0 conv_pw_1_bn/batchnorm/mul_1
native : stats_calculator.cc:280 ConcatV2 411.013 2.256 2.256 0.022% 0.774% 3145.728 0 concatenate_5/concat
native : stats_calculator.cc:280 DepthwiseConv2dNative 21.453 12.857 12.857 0.126% 0.900% 2115.584 0 conv_dw_1/depthwise
native : stats_calculator.cc:280 Conv2D 4.266 12.116 12.116 0.119% 1.019% 2097.152 0 conv_0_bn/batchnorm/mul_1
native : stats_calculator.cc:280 DepthwiseConv2dNative 287.924 6.535 6.535 0.064% 1.083% 1196.032 0 conv_dw_16/depthwise
native : stats_calculator.cc:280 ResizeBilinear 456.436 8.882 8.882 0.087% 1.171% 1048.576 0 logits/ResizeBilinear
native : stats_calculator.cc:280 Conv2D 393.573 14.375 14.375 0.141% 1.312% 1048.576 0 conv_pw_17_bn/batchnorm/mul_1
native : stats_calculator.cc:280
native : stats_calculator.cc:280 ====================
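Pulling the three "no stats" averages together: the fully quantized graph runs at 592301 µs, round_weights at 451567 µs, and quantize_weights-only at 449421 µs — so full quantization is roughly 1.3x slower on this device, presumably because the quantized kernels plus the inserted Quantize/Dequantize/Requantize nodes cost more than the plain float path. A quick check of the ratio:

```shell
# Ratio of the fully quantized graph's average ("no stats") inference time
# to the quantize_weights-only graph's, using the numbers logged above.
awk 'BEGIN { printf "%.2f\n", 592301 / 449421 }'
```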