Fully 8-bit quantized model (quantize_weights + quantize_nodes):
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
add_default_attributes
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
quantize_weights
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
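The timing numbers below came from the on-device benchmark tool. The exact invocation is not in these notes; a plausible sketch (device paths are assumptions, flags match the standard tensorflow/tools/benchmark/benchmark_model tool) looks like:

```shell
# Push the transformed graph and the benchmark binary to the device
# (paths are illustrative assumptions, not taken from the log).
adb push optimized_inception_graph.pb /data/local/tmp/
adb push bazel-bin/tensorflow/tools/benchmark/benchmark_model /data/local/tmp/

# Run the benchmark with the same input/output names used in the transform.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/optimized_inception_graph.pb \
  --input_layer='Mul' \
  --input_layer_shape='1,299,299,3' \
  --input_layer_type='float' \
  --output_layer='softmax' \
  --show_run_order=false
```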
result:
native : benchmark_model.cc:600 Average inference timings in us: Warmup: 1187581, no stats: 592301, with stats: 674790
native : stats_calculator.cc:170 Number of nodes executed: 924
native : stats_calculator.cc:280 ============================== Top by Memory Use ==============================
native : stats_calculator.cc:280 [node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
native : stats_calculator.cc:280 DepthwiseConv2dNative 438.627 40.317 40.317 0.414% 0.414% 8462.336 0 conv_dw_17/depthwise
native : stats_calculator.cc:280 QuantizedAdd 506.209 9.001 9.001 0.093% 0.507% 8388.616 0 conv_dw_17_bn/batchnorm/add_1/eightbit
native : stats_calculator.cc:280 QuantizedMul 488.168 3.781 3.781 0.039% 0.546% 8388.616 0 conv_dw_17_bn/batchnorm/mul_1/eightbit
native : stats_calculator.cc:280 Dequantize 532.036 5.687 5.687 0.058% 0.604% 8388.608 0 conv_dw_17_relu/Relu
native : stats_calculator.cc:280 ConcatV2 432.934 5.647 5.647 0.058% 0.662% 8388.608 0 concatenate_4/concat
native : stats_calculator.cc:280 Conv2DBackpropInput 402.799 7.861 7.861 0.081% 0.743% 5242.880 0 conv2d_transpose_4/conv2d_transpose
native : stats_calculator.cc:280 QuantizedBiasAdd 416.214 5.535 5.535 0.057% 0.800% 4194.312 0 conv2d_transpose_4/BiasAdd/eightbit
native : stats_calculator.cc:280 QuantizedAdd 103.805 4.708 4.708 0.048% 0.849% 4194.312 0 conv_pw_1_bn/batchnorm/add_1/eightbit
native : stats_calculator.cc:280 QuantizedConv2D 77.902 18.453 18.453 0.190% 1.038% 4194.312 0 conv_pw_1_bn/batchnorm/mul_1/eightbit
native : stats_calculator.cc:280 Dequantize 429.111 3.780 3.780 0.039% 1.077% 4194.304 0 conv2d_transpose_4/BiasAdd
native : stats_calculator.cc:280 … (remainder of log truncated)
Benchmark of the model converted to tflite while ignoring the special ops — it then fails with "unknown ops" errors:
STARTING!
Num runs: [50]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Warmup runs: [1]
Graph: [/data/local/tmp/models/op_MS512.tflite]
Input layers: [input_1]
Input shapes: [1,512,512,3]
Use nnapi : [0]
nnapi error: unable to open library libneuralnetworks.so
Loaded model /data/local/tmp/models/op_MS512.tflite
resolved reporter
Didn't find custom op for name 'Stack' with version 1
Didn't find custom op for name 'TensorFlowMax' with version 1
Didn't find custom op for name 'TensorFlowMin' with version 1
Didn't find custom op for name 'TensorFlowShape' with version 1
Didn't find custom op for name 'Dequantize' with version 1
Didn't find custom op for name 'QuantizeV2' with version 1
Didn't find custom op for name 'QuantizedAdd' with version 1
Didn't find custom op for name 'QuantizedBiasAdd' with version 1
Didn't find custom op for name 'QuantizedConv2D' with version 1
Didn't find custom op for name 'QuantizedMul' with version 1
Didn't find custom op for name 'QuantizedRelu' with version 1
Didn't find custom op for name 'QuantizedResizeBilinear' with version 1
Didn't find custom op for name 'RequantizationRange' with version 1
Didn't find custom op for name 'Requantize' with version 1
Didn't find custom op for name 'ReorderAxes' with version 1
Registration failed.
Failed to construct interpreter
[2] + Stopped (signal) /data/local/tmp/benchmark_model_lite --graph=/data/local/tmp/models/op_MS512.tflite --input_layer="input_1" --input_layer_shape="1,512,512,3" --input_layer_type="float" --output_layer="proba/Sigmoid:0" --show_run_order=false --
[1] - Aborted /data/local/tmp/benchmark_model_lite --graph=/data/local/tmp/models/op_MS512.tflite --input_layer="input_1" --input_layer_shape="1,512,512,3" --input_layer_type="float" --output_layer="proba/Sigmoid:0" --show_run_order=false --
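The failures above are expected: quantize_nodes rewrites the graph with TensorFlow quantized ops (QuantizedConv2D, QuantizedAdd, Requantize, …) that are not TFLite builtins, so the TFLite interpreter cannot resolve them. A quick way to pull the unresolved op names out of a benchmark log (the heredoc stands in for the real log captured from the device):

```shell
# Collect the unresolved op names reported by the TFLite benchmark.
# The heredoc below is an inlined excerpt for illustration; normally you
# would redirect the benchmark's stderr to this file instead.
cat <<'EOF' > /tmp/bench.log
Didn't find custom op for name 'Stack' with version 1
Didn't find custom op for name 'QuantizedConv2D' with version 1
Didn't find custom op for name 'Requantize' with version 1
EOF

# Extract the quoted op name from each failure line, deduplicated.
MISSING_OPS=$(sed -n "s/.*custom op for name '\([A-Za-z0-9]*\)'.*/\1/p" /tmp/bench.log | sort -u)
echo "$MISSING_OPS"
```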
Shrunk model (round_weights only):
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=MS512_O.pb \
--out_graph=optimized_MS512_O2.pb \
--inputs='input_1' \
--outputs='proba/Sigmoid' \
--transforms='
strip_unused_nodes(type=float, shape="1,512,512,3")
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
round_weights'
result:
native : benchmark_model.cc:600 Average inference timings in us: Warmup: 625681, no stats: 451567, with stats: 497953
native : stats_calculator.cc:170 Number of nodes executed: 401
native : stats_calculator.cc:280 ============================== Top by Memory Use ==============================
native : stats_calculator.cc:280 [node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
native : stats_calculator.cc:280 DepthwiseConv2dNative 337.052 43.531 43.531 0.424% 0.424% 8462.336 0 conv_dw_17/depthwise
native : stats_calculator.cc:280 ConcatV2 329.394 7.611 7.611 0.074% 0.498% 8388.608 0 concatenate_4/concat
native : stats_calculator.cc:280 Conv2DBackpropInput 316.519 9.466 9.466 0.092% 0.590% 5242.880 0 conv2d_transpose_4/conv2d_transpose
native : stats_calculator.cc:280 Conv2D 42.898 12.100 12.100 0.118% 0.707% 4194.304 0 conv_pw_1_bn/batchnorm/mul_1
native : stats_calculator.cc:280 ConcatV2 419.190 2.549 2.549 0.025% 0.732% 3145.728 0 concatenate_5/concat
native : stats_calculator.cc:280 DepthwiseConv2dNative 22.917 12.987 12.987 0.126% 0.859% 2115.584 0 conv_dw_1/depthwise
native : stats_calculator.cc:280 Conv2D 5.609 12.207 12.207 0.119% 0.977% 2097.152 0 conv_0_bn/batchnorm/mul_1
native : stats_calculator.cc:280 DepthwiseConv2dNative 301.827 6.540 6.540 0.064% 1.041% 1196.032 0 conv_dw_16/depthwise
native : stats_calculator.cc:280 ResizeBilinear 466.228 8.946 8.946 0.087% 1.128% 1048.576 0 logits/ResizeBilinear
native : stats_calculator.cc:280 Conv2D 404.017 12.080 12.080 0.118% 1.246% 1048.576 0 conv_pw_17_bn/batchnorm/mul_1
native : stats_calculator.cc:280
native : stats_calculator.cc:280 ====================
Model with quantize_weights only:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=MS512_O.pb \
--out_graph=optimized_MS512_O2.pb \
--inputs='input_1' \
--outputs='proba/Sigmoid' \
--transforms='
strip_unused_nodes(type=float, shape="1,512,512,3")
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
quantize_weights'
result:
native : benchmark_model.cc:600 Average inference timings in us: Warmup: 836670, no stats: 449421, with stats: 493879
native : stats_calculator.cc:170 Number of nodes executed: 401
native : stats_calculator.cc:280 ============================== Top by Memory Use ==============================
native : stats_calculator.cc:280 [node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
native : stats_calculator.cc:280 DepthwiseConv2dNative 319.694 50.856 50.856 0.500% 0.500% 8462.336 0 conv_dw_17/depthwise
native : stats_calculator.cc:280 ConcatV2 313.943 5.702 5.702 0.056% 0.556% 8388.608 0 concatenate_4/concat
native : stats_calculator.cc:280 Conv2DBackpropInput 302.511 8.018 8.018 0.079% 0.635% 5242.880 0 conv2d_transpose_4/conv2d_transpose
native : stats_calculator.cc:280 Conv2D 42.193 11.882 11.882 0.117% 0.751% 4194.304 0 conv_pw_1_bn/batchnorm/mul_1
native : stats_calculator.cc:280 ConcatV2 411.013 2.256 2.256 0.022% 0.774% 3145.728 0 concatenate_5/concat
native : stats_calculator.cc:280 DepthwiseConv2dNative 21.453 12.857 12.857 0.126% 0.900% 2115.584 0 conv_dw_1/depthwise
native : stats_calculator.cc:280 Conv2D 4.266 12.116 12.116 0.119% 1.019% 2097.152 0 conv_0_bn/batchnorm/mul_1
native : stats_calculator.cc:280 DepthwiseConv2dNative 287.924 6.535 6.535 0.064% 1.083% 1196.032 0 conv_dw_16/depthwise
native : stats_calculator.cc:280 ResizeBilinear 456.436 8.882 8.882 0.087% 1.171% 1048.576 0 logits/ResizeBilinear
native : stats_calculator.cc:280 Conv2D 393.573 14.375 14.375 0.141% 1.312% 1048.576 0 conv_pw_17_bn/batchnorm/mul_1
native : stats_calculator.cc:280
native : stats_calculator.cc:280 ====================
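Pulling the three "no stats" averages together: the fully quantized graph runs at 592301 µs, round_weights at 451567 µs, and quantize_weights-only at 449421 µs — so full quantization is roughly 1.3x slower on this device, presumably because the quantized kernels plus the inserted Quantize/Dequantize/Requantize nodes cost more than the plain float path. A quick check of the ratio:

```shell
# Ratio of the fully quantized graph's average ("no stats") inference time
# to the quantize_weights-only graph's, using the numbers logged above.
awk 'BEGIN { printf "%.2f\n", 592301 / 449421 }'
```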