PyTorch model quantization

Model quantization reduces a model's size and compute cost so it can run on edge devices.
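
To see the core idea, here is a minimal sketch (my own illustration, not from the original post): a float32 tensor is mapped to 8-bit integers through a scale and zero point, cutting storage from 4 bytes to 1 byte per value at the cost of some rounding error. The scale and zero point below are arbitrary values chosen for the demo.

import torch

x = torch.randn(4)  # float32: 4 bytes per value
# Map the floats to int8 with the affine transform q = round(x / scale + zero_point)
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
print(xq.int_repr())    # the raw int8 values actually stored
print(xq.dequantize())  # approximate reconstruction of the original floats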

First, build the network:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

device = torch.device("cpu")

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(in_features=16 * 16 * 24, out_features=num_classes)

    def forward(self, input):
        output = F.relu(self.conv1(input))
        output = F.relu(self.conv2(output))
        output = self.pool(output)               # halves the spatial dimensions
        output = F.relu(self.conv3(output))
        output = F.relu(self.conv4(output))
        output = output.view(-1, 16 * 16 * 24)   # flatten for the linear layer
        output = self.fc(output)
        return output

model = SimpleNet().to(device=device)
print(model)

result:

SimpleNet(
  (conv1): Conv2d(3, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc): Linear(in_features=6144, out_features=10, bias=True)
)
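
Note that fc expects 16 * 16 * 24 = 6144 input features, which matches a 32 x 32 input image after the single 2x2 pooling step. A quick sanity check (my own addition):

# One dummy 3-channel 32x32 image, CIFAR-10 sized
dummy = torch.randn(1, 3, 32, 32)
print(model(dummy).shape)  # torch.Size([1, 10])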

Quantization step:

Dynamic quantization is used here. The set {nn.LSTM, nn.Linear} tells quantize_dynamic which module types to replace with dynamically quantized versions; all other layers are left in float32:

quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
print(quantized_model)

result:

SimpleNet(
  (conv1): Conv2d(3, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc): DynamicQuantizedLinear(in_features=6144, out_features=10, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
)
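
Only fc was replaced with DynamicQuantizedLinear; the Conv2d layers are untouched because only the types listed in {nn.LSTM, nn.Linear} get converted (this model has no LSTM, so the entry is simply unused). As a sketch, assuming the weight() accessor that I believe the dynamic quantized Linear exposes, we can peek at the converted layer:

wq = quantized_model.fc.weight()         # the quantized int8 weight tensor
print(wq.dtype)                          # torch.qint8
print(wq.q_scale(), wq.q_zero_point())   # per-tensor affine parameters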

Let's see how much the model was compressed:

import os

def print_size_of_model(model):
    # Serialize the state_dict to disk and report the file size
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p") / 1e6)
    os.remove("temp.p")

print_size_of_model(model)
print_size_of_model(quantized_model)

result:

Size (MB): 0.287049
Size (MB): 0.103451

The compression is substantial: the checkpoint shrinks from 0.287 MB to 0.103 MB, roughly a 2.8x reduction.
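
The arithmetic checks out: fc holds 6144 * 10 = 61,440 weights, and storing each as int8 instead of float32 saves 3 bytes per weight, about 61,440 * 3 / 1e6 ≈ 0.184 MB, which is almost exactly the 0.287 - 0.103 MB difference above. A per-tensor breakdown of the fp32 model (my own sketch) makes the same point:

# Rough storage cost of each tensor in the fp32 model
for name, t in model.state_dict().items():
    print(f'{name:12s} {t.numel() * t.element_size() / 1e6:.4f} MB')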

Dynamic quantization only optimizes the model's weight storage, and the comparison above covers weights alone, where it looks good. But how much of the model's total memory do the weights actually account for? torchsummary gives the full picture:

summary(model.cuda(), input_size=(3, 512, 512))  # torchsummary expects a CUDA model by default

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 12, 512, 512]             336
            Conv2d-2         [-1, 12, 512, 512]           1,308
         MaxPool2d-3         [-1, 12, 256, 256]               0
            Conv2d-4         [-1, 24, 256, 256]           2,616
            Conv2d-5         [-1, 24, 256, 256]           5,208
            Linear-6                   [-1, 10]          61,450
================================================================
Total params: 70,918
Trainable params: 70,918
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 3.00
Forward/backward pass size (MB): 78.00
Params size (MB): 0.27
Estimated Total Size (MB): 81.27
----------------------------------------------------------------

As the data above shows, the model's weights (Params size (MB): 0.27) account for only a small share of the total; at this input size the estimated footprint is dominated by the forward/backward activations (78.00 MB).

Since torchsummary cannot be applied to the quantized model, its footprint cannot be observed directly in the same way.
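
What we can still do is run both models on the same input and confirm that int8 rounding barely changes the output (my own sketch; the model was moved to CUDA for summary above, so it goes back to CPU first):

model_cpu = model.cpu()
dummy = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out_fp32 = model_cpu(dummy)
    out_int8 = quantized_model(dummy)
print((out_fp32 - out_int8).abs().max())  # should be small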
