AIMET API Documentation (1)


1 AIMET API for PyTorch

1.1 AIMET PyTorch Quantization API

To take full advantage of AIMET's quantization features, users are encouraged to follow several guidelines when defining PyTorch models. AIMET also provides APIs that can automate some of the required model definition changes and check whether AIMET quantization features can be applied to a given PyTorch model.

Before using any AIMET quantization feature, users should first call the model preparer API.

AIMET quantization for PyTorch models provides the following functionality.

  • Quant Analyzer API : Analyzes the model and points out layers that are sensitive to quantization
  • Quantization Simulation API : Simulates inference and training on quantized hardware
  • Adaptive Rounding API : Post-training quantization technique that optimizes the rounding of weight tensors
  • Cross-Layer Equalization API : Post-training quantization technique that equalizes layer parameters
  • Bias Correction API : Post-training quantization technique that corrects the shift in layer outputs caused by quantization noise
  • AutoQuant API : Unified API that integrates the post-training quantization techniques provided by AIMET
  • BN Re-estimation API : API for re-estimating BN layer statistics and folding the BN layers

Users who want to use multiple GPUs with Cross-Layer Equalization (CLE) or Quantization-Aware Training (QAT) can refer to:

  • Multi-GPU Guide : A guide to using the PyTorch DataParallel API with AIMET functionality

1.1.1 PyTorch Model Guide

To take full advantage of AIMET capabilities, users are encouraged to follow several guidelines when defining PyTorch models.

Models should support conversion to ONNX

The model definition should support conversion to ONNX. Users can check a model's compatibility with ONNX conversion as follows:

...
model = Model()
torch.onnx.export(model, <dummy_input>, <onnx_file_name>)

Models should be JIT traceable

The model definition should be JIT traceable. Users can check whether a model can be JIT traced as follows:

...
model = Model()
torch.jit.trace(model, <dummy_input>)
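As a concrete illustration of the traceability check, the sketch below traces a small model and compares the traced output with the eager output. TinyModel and its layer sizes are hypothetical and chosen only for this example; they are not part of AIMET.

```python
import torch


class TinyModel(torch.nn.Module):
    """Hypothetical two-layer model used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


model = TinyModel().eval()
dummy_input = torch.rand(1, 3, 32, 32)

# torch.jit.trace raises an exception if the forward pass cannot be traced.
traced = torch.jit.trace(model, dummy_input)

# Sanity check: the traced module should reproduce the eager output.
with torch.no_grad():
    outputs_match = torch.allclose(model(dummy_input), traced(dummy_input))
print("traced output matches eager output:", outputs_match)
```

If tracing fails or the outputs differ, the forward pass probably contains data-dependent control flow or unsupported input types that need to be reworked.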

Define layers as modules instead of using torch.nn.functional equivalents

For activation functions and other stateless layers, PyTorch allows users to either:

  • Define the layer as a module (instantiated in the constructor and used in the forward pass), or
  • Use the torch.nn.functional equivalent purely in the forward pass

To add simulation nodes to the quantization simulation model, AIMET requires the former (layers defined as modules). Changing the model definition to use modules instead of functionals is mathematically equivalent and does not require retraining the model.

For example, if the user has:

def forward(...):
    ...
    x = torch.nn.functional.relu(x)
    ...

Users should define their models as:

def __init__(self,...):
    ...
    self.relu = torch.nn.ReLU()
    ...

def forward(...):
    ...
    x = self.relu(x)
    ...

In some cases this is not possible because an operation can only be expressed in functional form and has no module equivalent, but the guideline should be followed wherever possible.

Additionally, users can automate this change using the model preparer API.
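The claim that the module form is mathematically equivalent can be checked directly. The following sketch builds a functional and a module variant of the same small network, copies the weights across, and compares the outputs. FunctionalNet and ModuleNet are hypothetical names used only for this illustration.

```python
import torch
import torch.nn.functional as F


class FunctionalNet(torch.nn.Module):
    """Uses the functional ReLU inside forward()."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return F.relu(self.fc(x))


class ModuleNet(torch.nn.Module):
    """Same computation, with ReLU declared as a module."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc(x))


torch.manual_seed(0)
functional_net = FunctionalNet()
module_net = ModuleNet()
# ReLU has no parameters, so the functional model's state dict loads as-is.
module_net.load_state_dict(functional_net.state_dict())

x = torch.randn(2, 4)
same_output = torch.equal(functional_net(x), module_net(x))
print("outputs identical:", same_output)
```

Because the stateless layer contributes nothing to the state dict, existing checkpoints remain loadable after the rewrite.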

Avoid reuse of modules defined in the class definition

A module defined in the class definition should be used only once in the forward pass. If a module is reused, define a new, identical module in the class definition instead. For example, if the user has:

def __init__(self,...):
    ...
    self.relu = torch.nn.ReLU()
    ...

def forward(...):
    ...
    x = self.relu(x)
    ...
    x2 = self.relu(x2)
    ...

Users should define their models as:

def __init__(self,...):
    ...
    self.relu = torch.nn.ReLU()
    self.relu2 = torch.nn.ReLU()
    ...

def forward(...):
    ...
    x = self.relu(x)
    ...
    x2 = self.relu2(x2)
    ...

Additionally, users can automate this change using the model preparer API.
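Reused modules can be hard to spot in a large model. One way to find them, sketched here with plain PyTorch forward hooks, is to count how often each leaf module is called during a single forward pass. The helper find_reused_modules is hypothetical and is not an AIMET API.

```python
from collections import Counter

import torch


def find_reused_modules(model, *dummy_inputs):
    """Return names of leaf modules called more than once in one forward pass."""
    call_counts = Counter()
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            # Counter.update returns None, so the hook leaves the output unchanged.
            handles.append(module.register_forward_hook(
                lambda _m, _inp, _out, name=name: call_counts.update([name])))
    try:
        with torch.no_grad():
            model(*dummy_inputs)
    finally:
        for handle in handles:
            handle.remove()
    return sorted(n for n, count in call_counts.items() if count > 1)


class ReusedRelu(torch.nn.Module):
    """Hypothetical model that calls the same ReLU module twice."""

    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(4, 4)
        self.fc2 = torch.nn.Linear(4, 4)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        return self.relu(self.fc2(x))


reused = find_reused_modules(ReusedRelu(), torch.randn(1, 4))
print(reused)  # ['relu']
```

Each name reported by the helper is a candidate for being split into separate, identical modules as shown in the example above.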

Use only torch.Tensor or tuples of torch.Tensors as model/submodule input and output

Modules should take tensors or tuples of tensors as input and return them as output, so that the model can be converted to ONNX. For example, if the user has:

def __init__(self,...):
    ...

def forward(self, inputs: Dict[str, torch.Tensor]):
    ...
    x = self.conv1(inputs['image_rgb'])
    rgb_output = self.relu1(x)
    ...
    x = self.conv2(inputs['image_bw'])
    bw_output = self.relu2(x)
    ...
    return {'rgb': rgb_output, 'bw': bw_output}

Users should define their models as:

def __init__(self,...):
    ...

def forward(self, image_rgb, image_bw):
    ...
    x = self.conv1(image_rgb)
    rgb_output = self.relu1(x)
    ...
    x = self.conv2(image_bw)
    bw_output = self.relu2(x)
    ...
    return rgb_output, bw_output
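For reference, a complete, runnable version of the tensor-in, tensor-out form might look like the sketch below. The class name TwoStreamModel, the layer sizes, and the input shapes are assumptions made for illustration; only the forward signature follows the example above.

```python
import torch


class TwoStreamModel(torch.nn.Module):
    """Hypothetical model with two tensor inputs and a tuple of tensor outputs."""

    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu1 = torch.nn.ReLU()
        self.conv2 = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.relu2 = torch.nn.ReLU()

    def forward(self, image_rgb, image_bw):
        rgb_output = self.relu1(self.conv1(image_rgb))
        bw_output = self.relu2(self.conv2(image_bw))
        return rgb_output, bw_output


model = TwoStreamModel().eval()

# Callers that hold their data in a dict can unpack it at the call site.
inputs = {'image_rgb': torch.rand(1, 3, 16, 16), 'image_bw': torch.rand(1, 1, 16, 16)}
dummy_input = (inputs['image_rgb'], inputs['image_bw'])

# A tuple of tensors can be passed directly as the example input for tracing.
traced = torch.jit.trace(model, dummy_input)
rgb_out, bw_out = traced(*dummy_input)
print(rgb_out.shape, bw_out.shape)
```

Keeping the dict handling outside the model leaves the model interface limited to tensors and tuples of tensors.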

1.1.2 Architecture Checker API

aimet_torch.arch_checker.arch_checker.ArchChecker.check_model_arch(model, dummy_input, result_dir=None)

Checks each node in the model against the checks registered in _node_check_dict. Only nodes that fail a check are logged, together with the checks they failed.

Parameters :

  • model (Module) – The Torch model to be checked.
  • dummy_input (Union[Tensor, Tuple]) – Input to pass to the model. Can be a tensor or a tuple of tensors.

Returns : arch_checker_report: {op.dotted_name_op: NodeErrorReportObject}
Return type : ArchCheckerReport

The AIMET PyTorch Architecture Checker helps detect suboptimal model constructs and suggests potential changes to the model that improve performance. The architecture checker currently checks for the following conditions:

  • Convolutional layers whose channel sizes are not optimal.
  • Activation functions that are not performant.
  • Batch normalization layers that cannot be folded.
  • Intermediate convolutional layers, in a sequence of convolutional layers, that use padding.

In this section, we present models that fail architecture checks and show how to run the architecture checker.

Example 1: Model with insufficient channels

We start with the following model, which contains a convolutional layer with fewer than 32 channels.

class ModelWithNotEnoughChannels(torch.nn.Module):
    """ Model whose convolution has fewer than 32 channels. Expects input of shape (1, 3, 32, 32) """

    def __init__(self):
        super(ModelWithNotEnoughChannels, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 31, kernel_size=2, stride=2, padding=2, bias=False)
        self.bn1 = torch.nn.BatchNorm2d(31)

    def forward(self, *inputs):
        x = self.conv1(inputs[0])
        x = self.bn1(x)
        return x

Import the architecture checker:

from aimet_torch.arch_checker.arch_checker import ArchChecker

Run the model checker by passing in the model along with the model input:

def example_check_for_number_of_conv_channels():

    model = ModelWithNotEnoughChannels()
    ArchChecker.check_model_arch(model, dummy_input=torch.rand(1, 3, 32, 32))

The convolutional layer in this model has 31 output channels, one short of 32, so the following log message appears:

Utils - INFO - Graph/Node: ModelWithNotEnoughChannels.conv1: Conv2d(3, 31, kernel_size=(2, 2), stride=(2, 2), padding=(2, 2), bias=False) fails check: {'_check_conv_channel_32_base', '_check_conv_channel_larger_than_32'}

An HTML file with the following content is also generated.

HTML report content:

Graph/Layer name | Issue | Recommendation
ModelWithNotEnoughChannels.conv1 | The channel size of the input/output tensors of this convolution is smaller than 32. | Try adjusting the channels to a multiple of 32 for better performance.
ModelWithNotEnoughChannels.conv1 | The channel size of the input/output tensors of this convolution is not a multiple of 32. | Try adjusting the channels to a multiple of 32 for better performance.
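The two channel conditions reported for conv1 can be approximated in plain PyTorch. The sketch below is not ArchChecker's implementation; find_conv_channel_issues is a hypothetical helper, and checking both input and output channels is an assumption based on the wording of the report.

```python
import torch


def find_conv_channel_issues(model, base=32):
    """Map each Conv2d name to the channel conditions it violates."""
    issues = {}
    for name, module in model.named_modules():
        if not isinstance(module, torch.nn.Conv2d):
            continue
        channels = (module.in_channels, module.out_channels)
        found = []
        if any(c < base for c in channels):
            found.append(f"channel size smaller than {base}")
        if any(c % base != 0 for c in channels):
            found.append(f"channel size not a multiple of {base}")
        if found:
            issues[name] = found
    return issues


# Same layer configuration as ModelWithNotEnoughChannels, built as a Sequential.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 31, kernel_size=2, stride=2, padding=2, bias=False),
    torch.nn.BatchNorm2d(31),
)
print(find_conv_channel_issues(model))
# {'0': ['channel size smaller than 32', 'channel size not a multiple of 32']}
```

Under this assumption, a first layer that consumes a 3-channel image is always flagged, so findings on input layers usually need a judgment call.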

Example 2: Model with non-performant activations

We start with the following model, which uses a PReLU activation.

class ModelWithPrelu(torch.nn.Module):
    """ Model that uses a PReLU module. Expects input of shape (1, 32, 32, 32) """

    def __init__(self):
        super(ModelWithPrelu, self).__init__()
        self.conv1 = torch.nn.Conv2d(32, 32, kernel_size=2, stride=2, padding=2, bias=False)
        self.bn1 = torch.nn.BatchNorm2d(32)
        self.prelu1 = torch.nn.PReLU()

    def forward(self, *inputs):
        x = self.conv1(inputs[0])
        x = self.bn1(x)
        x = self.prelu1(x)
        return x

Run the model checker by passing in the model along with the model input:

def example_check_for_non_performant_activations():

    model = ModelWithPrelu()
    ArchChecker.check_model_arch(model, dummy_input=torch.rand(1, 32, 32, 32))

The PReLU layer in the model is considered non-performant compared to ReLU, so the following log message appears:

Utils - INFO - Graph/Node: ModelWithPrelu.prelu1: PReLU(num_parameters=1) fails check: {'_activation_checks'}

Example 3: Model with a standalone batch normalization layer

We start with the following model, which contains a batch normalization layer that cannot be folded into the preceding convolution.

class ModelWithNonfoldableBN(torch.nn.Module):
    """ Model that has non-foldable batch norm. """

    def __init__(self):
        super(ModelWithNonfoldableBN, self).__init__()
        self.conv1 = torch.nn.Conv2d(32, 32, kernel_size=2, stride=2, padding=2, bias=False)
        self.avg_pool1 = torch.nn.AvgPool2d(3, padding=1)
        self.bn1 = torch.nn.BatchNorm2d(32)

    def forward(self, *inputs):
        x = self.conv1(inputs[0])
        x = self.avg_pool1(x)
        x = self.bn1(x)
        return x

Run the model checker by passing in the model along with the model input:

def example_check_for_standalone_bn():

    model = ModelWithNonfoldableBN()
    ArchChecker.check_model_arch(model, dummy_input=torch.rand(1, 32, 32, 32))

The AvgPool2d layer sits between the convolution and the batch normalization layer and prevents the two from being folded, so the following log message appears:

Utils - INFO - Graph/Node: ModelWithNonfoldableBN.bn1: BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) fails check: {'_check_batch_norm_fold'}
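To see why adjacency matters, it helps to look at what folding does when a batch normalization layer directly follows a convolution. The sketch below applies the standard folding identity in plain PyTorch; it is not AIMET's implementation, and fold_bn_into_conv is a hypothetical helper that assumes eval mode and the default zero padding mode.

```python
import torch


def fold_bn_into_conv(conv, bn):
    """Return a Conv2d equivalent to bn(conv(x)) when bn is in eval mode.

    Assumes bn directly follows conv, has affine parameters and running
    statistics, and that conv uses the default zero padding mode.
    """
    fused = torch.nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding, dilation=conv.dilation,
        groups=conv.groups, bias=True)
    with torch.no_grad():
        # Per-output-channel scale: gamma / sqrt(running_var + eps)
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused


torch.manual_seed(0)
conv = torch.nn.Conv2d(32, 32, kernel_size=2, stride=2, padding=2, bias=False)
bn = torch.nn.BatchNorm2d(32)
with torch.no_grad():
    # Give the BN layer non-trivial statistics so the check is meaningful.
    bn.running_mean.normal_()
    bn.running_var.uniform_(0.5, 1.5)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.normal_()
conv.eval()
bn.eval()

fused = fold_bn_into_conv(conv, bn).eval()
x = torch.rand(1, 32, 32, 32)
with torch.no_grad():
    outputs_match = torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
print("folded conv matches conv + BN:", outputs_match)
```

The identity rewrites only the weights and bias of the convolution that feeds the batch normalization layer directly, which is why a layer placed between the two, such as the pooling layer in Example 3, leaves the batch normalization layer standalone.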
