Making a high-version TensorFlow model compatible with a low-version toolchain

 

When you train a model with a newer version of an AI framework, export it, and then convert it to the OM format of the Ascend310 chip, you may run into operators that are not supported. This article shows how to work around those operators sensibly.

Taking a model trained on TensorFlow-2.x as the example, we walk through how to convert it into an OM model usable on a lower-version Ascend310 toolchain (such as the C32 version). The same tricks can be adapted to other situations.

Preface

TF-2.x has dropped the Frozen Graph workflow in favor of Keras models, which are exported in the saved_model or h5 format. To convert to an OM model, you first need the kind of Frozen Graph model that TensorFlow-1.x produced.

Export Frozen Graph under TF-2.x

Suppose you have a keras model under TF-2.x

model = tf.keras.Model(input_nodes, output_nodes)

It can be converted into a Frozen Graph with the following code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Convert Keras model to ConcreteFunction
full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# Get frozen ConcreteFunction
frozen_func = convert_variables_to_constants_v2(full_model)
frozen_graph_def = frozen_func.graph.as_graph_def()

# Remove the trailing Identity nodes generated by Keras; they carry extra
# dependency inputs. This lets the model be opened in netron and converted to OM.
frozen_sub_graph_def = tf.compat.v1.graph_util.extract_sub_graph(
  frozen_graph_def, dest_nodes=[out_node.name[:-2] for out_node in output_nodes])
  
# Save frozen graph from frozen ConcreteFunction to hard drive
tf.io.write_graph(graph_or_graph_def=frozen_sub_graph_def,
                  logdir="/tmp/frozen_graph/",
                  name="model.pb",
                  as_text=False)

The model exported by Keras has Identity nodes appended at the end, and these extra dependencies cause the OM conversion to fail and prevent netron from loading the graph.

Calling the following function in the middle of the export,

tf.compat.v1.graph_util.extract_sub_graph(...)

effectively removes those trailing Identity nodes.
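
If you are not sure which names to pass as dest_nodes, a quick way to find them is to list every node's name and op type and look for the real outputs sitting just before the trailing Identity nodes (a small sketch using the frozen_graph_def from the code above):

# Inspect the frozen graph to locate the real output nodes.
for node in frozen_graph_def.node:
    print(node.name, node.op)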

1. The operator version is too new

When converting to OM, you may see errors saying an operator does not exist, for example `FusedBatchNormV3`. This is because the lower version may only support `FusedBatchNorm` and has no V3 variant.

In that case you only need to edit the Frozen Graph file and replace the operator name in the PB model, e.g. `FusedBatchNormV3` with `FusedBatchNorm`. The computation is the same, so accuracy is unaffected; only performance may differ. The same goes for replacing `AddV2` with `Add`, and for other operators: if you can find the equivalent operator from the earlier version, the error can be avoided.
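
If you would rather not touch the pbtxt by hand, a minimal sketch of the same renaming through the GraphDef protobuf API (not the script shown later in this article; the paths and the mapping are illustrative) looks like this:

from tensorflow.core.framework import graph_pb2

OP_RENAMES = {'FusedBatchNormV3': 'FusedBatchNorm', 'AddV2': 'Add'}

def rename_ops(pb_in, pb_out):
    graph_def = graph_pb2.GraphDef()
    with open(pb_in, 'rb') as f:
        graph_def.ParseFromString(f.read())     # load the frozen graph
    for node in graph_def.node:
        if node.op in OP_RENAMES:
            node.op = OP_RENAMES[node.op]       # e.g. FusedBatchNormV3 -> FusedBatchNorm
    with open(pb_out, 'wb') as f:
        f.write(graph_def.SerializeToString())  # save the patched graph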

2. A new operator attribute is not supported

For example, newer versions of the TF engine add the explicit_paddings option to the Conv2D operator and write it into the graph. The OM conversion then fails with an error that the explicit_paddings attribute cannot be found. In that case, edit the Frozen Graph file and remove this attr from the Conv2D op. Attributes like this, which the low version does not have, are new features of the high version that are disabled by default for compatibility, so removing the attr does not affect accuracy.
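
The same protobuf approach also covers dropping such attributes (a sketch continuing the rename_ops example above; the attribute names are the ones discussed in this article, and note that FusedBatchNormV3 carries the extra U attribute):

# Drop attributes that the lower-version toolchain does not recognize.
UNSUPPORTED_ATTRS = ('explicit_paddings', 'U', 'half_pixel_centers')

def strip_new_attrs(graph_def):
    for node in graph_def.node:
        for attr_name in UNSUPPORTED_ATTRS:
            if attr_name in node.attr:
                del node.attr[attr_name]  # removing these matches the older op definitions
    return graph_def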

The script below implements both of the workarounds above by editing the text (pbtxt) form of the Frozen Graph:

import os
import tempfile
import tensorflow as tf

from google.protobuf import text_format
from tensorflow.core.framework import graph_pb2

TMP_PBTXT = 'tmp_model.pbtxt'
TMP_COMPAT_PBTXT = 'tmp_compat_model.pbtxt'

# Merge a pbtxt file into a GraphDef one top-level block (e.g. "node { ... }") at a time.
def merge_line_by_line(fo, input_graph_def):
  entry = 0
  item_lines = []
  for line in fo:
    if not line.strip():
      continue
    item_lines.append(line)
    if line.strip().endswith('{'):
      entry += 1
    elif line.strip().endswith('}'):
      entry -= 1
      if entry == 0:
        text_format.MergeLines(item_lines, input_graph_def)
        del item_lines[:]
        
# Load a GraphDef from either a binary .pb or a text .pbtxt file.
def parse_input_graph_proto(input_graph, input_binary):
  if not os.path.exists(input_graph):
    raise ValueError('invalid input path')
  input_graph_def = graph_pb2.GraphDef()
  if input_binary:
    with open(input_graph, 'rb') as f:
      input_graph_content = f.read()
    input_graph_def.ParseFromString(input_graph_content)
  else:
    with open(input_graph, 'r') as f:
      merge_line_by_line(f, input_graph_def)
  return input_graph_def
  
# Rewrite a frozen .pb for an older toolchain: dump it to pbtxt, drop
# unsupported attrs line by line, downgrade op names, and re-serialize it.
def compat_pb(input_pb_path, replace=True):
  tmp_dir = tempfile.mkdtemp()
  tmp_pbtxt_file = os.path.join(tmp_dir, TMP_PBTXT)
  graph_def = parse_input_graph_proto(input_pb_path, input_binary=True)
  tf.io.write_graph(graph_or_graph_def=graph_def, logdir=tmp_dir, name=TMP_PBTXT, as_text=True)
  del graph_def
  new_graph_def_str = ''
  lines_to_cache = []
  num_lines_to_skip = 0
  with open(tmp_pbtxt_file, 'r') as f:
    for line in f:
      if num_lines_to_skip > 0:
        num_lines_to_skip -= 1
        continue
      if 'attr {' in line.strip():
        lines_to_cache.append(line)
        continue
      if line.strip().startswith('key: "explicit_paddings"'):
        del lines_to_cache[:]
        num_lines_to_skip = 5
        continue
      elif line.strip().startswith('key: "U"'):
        del lines_to_cache[:]
        num_lines_to_skip = 4
        continue
      elif line.strip().startswith('key: "half_pixel_centers"'):
        del lines_to_cache[:]
        num_lines_to_skip = 4
        continue
      if lines_to_cache:
        new_graph_def_str += ''.join(lines_to_cache)
        del lines_to_cache[:]
      new_graph_def_str += line
      
  new_graph_def_str = new_graph_def_str.replace('FusedBatchNormV3', 'FusedBatchNorm').replace('AddV2', 'Add')
  tmp_compat_pbtxt_file = os.path.join(tmp_dir, TMP_COMPAT_PBTXT)
  with open(tmp_compat_pbtxt_file, 'w') as f:
    f.write(new_graph_def_str)
    
  del new_graph_def_str
  graph_def_compat = parse_input_graph_proto(tmp_compat_pbtxt_file, input_binary=False)
  input_pb_dir, input_pb_name = os.path.split(input_pb_path)
  output_pb_dir = input_pb_dir
  if replace:
    output_pb_name = input_pb_name
  else:
    output_pb_name = 'compat_' + input_pb_name
    
  tf.io.write_graph(graph_or_graph_def=graph_def_compat, logdir=output_pb_dir, name=output_pb_name, as_text=False)
  
if __name__ == '__main__':
  compat_pb('/tmp/model.pb', replace=False)

There are some hard-coded values here, but as an offline tool it does the job.

This fragment, for instance, deletes the group of proto text lines that describes the half_pixel_centers attribute:

      elif line.strip().startswith('key: "half_pixel_centers"'):
        del lines_to_cache[:]
        num_lines_to_skip = 4

Once you understand this script, you can handle low-version compatibility and attribute removal for all kinds of operators.

At present, this script covers many of the cases needed to make TF-1.15 graphs compatible with Ascend310-C32.

3. The operator itself is not supported

For example, leaky_relu has no corresponding operator on the Ascend310-C32 version, so it has to be pieced together from other operators. This is too complicated to do by editing the Frozen Graph file, so it is better to modify the model source code directly. For example,

y = tf.nn.leaky_relu(x, alpha=alpha)

can be replaced with

y = tf.maximum(alpha * x, x)

Similarly, if the mish activation function is not available, you can express it with the following operators:

y = x * tf.tanh(tf.math.log(1 + tf.exp(x)))
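
Wrapped as drop-in helpers, the two replacements above might look like this (a sketch under the assumption that only basic element-wise ops are available; the function names are ours, and the exp form of softplus can overflow for large inputs, so check the numerics on your own model):

import tensorflow as tf

def leaky_relu_compat(x, alpha=0.2):
    # max(alpha * x, x) equals leaky_relu for 0 < alpha < 1
    return tf.maximum(alpha * x, x)

def mish_compat(x):
    # mish(x) = x * tanh(softplus(x)), with softplus written out explicitly
    return x * tf.tanh(tf.math.log(1.0 + tf.exp(x)))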

4. Work around operators in pre/post-processing

If an operator cannot be handled with any of the methods above, but it sits at the head or the tail of the model, congratulations, there is still hope.

When exporting the model, you can move the computation of that operator out of the model and into the pre-processing or post-processing of the inference script. Take the following pseudo-code as an example.

Suppose your model is:

def model(x):
    y = op1(x)
    y = op2(y)
    y = op3(y)
    return y

Assuming op1 and op3 are not supported and involve no network weights, export only the op2 part of the model, re-implement op1 with the numpy API in the pre-processing, and re-implement op3 with the numpy API in the post-processing.
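
For a Keras model that is a simple sequential stack of layers, one way to export only the middle part (a sketch reusing the model variable from the export section; the layer indices are illustrative and assume the unsupported op1 and op3 each live in a single boundary layer) is:

import tensorflow as tf

# Keep only the supported middle layers and freeze/export this sub-model
# with the same procedure shown earlier.
core_model = tf.keras.Sequential(model.layers[1:-1])
core_model.build(model.layers[1].input_shape)  # propagate the core input shape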

For example:

In sound classification, the data must be Fourier-transformed at the head of the model, but the Fourier transform operator is not supported on Ascend310-C32. So remove the Fourier transform from the start of the model when exporting it, implement it with numpy, and put it in the pre-processing of the inference script.
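
Such a pre-processing step might look roughly like this (a sketch; the window, frame length, hop size, and magnitude-spectrum layout are illustrative rather than taken from any particular model):

import numpy as np

def fft_preprocess(waveform, frame_len=512, hop=256):
    # Split a 1-D waveform into windowed frames and return their magnitude spectra.
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = waveform[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames, axis=0)  # shape: [num_frames, frame_len // 2 + 1]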

In object detection, the raw outputs must go through NMS at the end of the model, which involves dynamic shapes that are not supported on Ascend310-C32. So strip the post-processing from the exported model, let the model output the feature maps directly, and run NMS in the post-processing of the inference script (NMS in numpy is fast too, so don't worry about performance).
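
A minimal greedy NMS in numpy for such a post-processing step could look like this (a sketch assuming boxes are given as (x1, y1, x2, y2) with one score per box; the IoU threshold is illustrative):

import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy non-maximum suppression; returns the indices of the kept boxes.
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]           # process boxes from highest score down
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[np.where(iou <= iou_thresh)[0] + 1]  # drop overlapping boxes
    return keep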

 

 

Finally, one more tip: TF provides a forward-compatibility switch that, inside its context, makes graph construction fall back to older op variants (for example FusedBatchNorm instead of FusedBatchNormV3), so some of the newer operators can be avoided at export time:

from tensorflow.python.compat import compat

with compat.forward_compatibility_horizon(2019, 5, 1):
    y = model(x)

 


Original article: blog.csdn.net/yxpandjay/article/details/108780776