Model quantization reduces the size of a model so that it can run on edge devices.
First build the network:
import torch
import torch.nn as nn
from torchsummary import summary

device = torch.device("cpu")

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(in_features=16 * 16 * 24, out_features=num_classes)

    def forward(self, input):
        output = self.conv1(input)
        output = nn.ReLU()(output)
        output = self.conv2(output)
        output = nn.ReLU()(output)
        output = self.pool(output)
        output = self.conv3(output)
        output = nn.ReLU()(output)
        output = self.conv4(output)
        output = nn.ReLU()(output)
        output = output.view(-1, 16 * 16 * 24)
        output = self.fc(output)
        return output

model = SimpleNet().to(device=device)
print(model)
result:
SimpleNet(
  (conv1): Conv2d(3, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc): Linear(in_features=6144, out_features=10, bias=True)
)
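Quantization:
Dynamic quantization is used here. The set {nn.LSTM, nn.Linear} names the module types to quantize; this network has no LSTM, so only the fc layer is converted: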
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
print(quantized_model)
result:
SimpleNet(
  (conv1): Conv2d(3, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc): DynamicQuantizedLinear(in_features=6144, out_features=10, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
)
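The output above reports qscheme=torch.per_tensor_affine for the fc layer. As a rough illustration of what a per-tensor affine scheme does, here is a minimal sketch in plain Python. The helper names affine_quantize and dequantize and the sample weights are made up for this example, and the code is not PyTorch's actual implementation; it only shows the idea of mapping floats to 8-bit integers with a single scale and zero point.

```python
def affine_quantize(values, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax] using one scale and one zero point."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:  # every value is zero; any positive scale works
        scale = 1.0
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.51, -0.2, 0.0, 0.13, 0.74]  # made-up values, for illustration only
q, scale, zero_point = affine_quantize(weights)
restored = dequantize(q, scale, zero_point)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("int8 codes:", q)
print("scale:", scale, "zero point:", zero_point)
print("largest round-trip error:", max(errors))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of about half a quantization step per value.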
Check how much the model was compressed:
import os

def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)
print_size_of_model(quantized_model)
result:
Size (MB): 0.287049
Size (MB): 0.103451
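The two sizes above can be cross-checked by hand. The sketch below counts parameters from the layer definitions and assumes that only the fc weights drop from 4 bytes to 1 byte each, while the fc bias and all convolution parameters stay in float32. That assumption matches the printed model, where only fc was converted, but the exact storage layout of the quantized layer is not verified here.

```python
# Parameter counts (weights + biases) taken from the layer definitions.
conv_params = (
    (3 * 12 * 9 + 12)      # conv1
    + (12 * 12 * 9 + 12)   # conv2
    + (12 * 24 * 9 + 24)   # conv3
    + (24 * 24 * 9 + 24)   # conv4
)
fc_weights = 16 * 16 * 24 * 10
fc_bias = 10

# Float model: every parameter is a 4-byte float32.
float_bytes = (conv_params + fc_weights + fc_bias) * 4

# Assumption: after dynamic quantization only the fc weights become 1-byte int8.
quant_bytes = (conv_params + fc_bias) * 4 + fc_weights * 1

saved = float_bytes - quant_bytes
print("float32 parameters (MB):", float_bytes / 1e6)
print("with int8 fc weights (MB):", quant_bytes / 1e6)
print("predicted saving (MB):", saved / 1e6)
print("measured saving (MB):", 0.287049 - 0.103451)
```

The predicted saving of 184,320 bytes is close to the measured difference between the two files; the saved files themselves are a few kilobytes larger than the raw parameter bytes because they also contain serialization metadata.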
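The compression is substantial: the saved state dict shrinks from about 0.287 MB to about 0.103 MB, roughly 36% of its original size.
Dynamic quantization here only changes how the model weights are stored, so the comparison above covers weight storage alone. For a fuller picture of memory use, look at the summary of the original model: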
summary(model.cuda(), input_size=(3, 512, 512))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 12, 512, 512]             336
            Conv2d-2         [-1, 12, 512, 512]           1,308
         MaxPool2d-3         [-1, 12, 256, 256]               0
            Conv2d-4         [-1, 24, 256, 256]           2,616
            Conv2d-5         [-1, 24, 256, 256]           5,208
            Linear-6                   [-1, 10]          61,450
================================================================
Total params: 70,918
Trainable params: 70,918
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 3.00
Forward/backward pass size (MB): 78.00
Params size (MB): 0.27
Estimated Total Size (MB): 81.27
----------------------------------------------------------------
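As the data above show, the model weights (Params size (MB): 0.27) account for only a small part of the estimated total of 81.27 MB; most of it is the 78.00 MB of activations. Note that this summary was run with a 512 x 512 input, while the fc layer is sized for 32 x 32 inputs (16 * 16 * 24 features after pooling). It only runs because view(-1, 16 * 16 * 24) folds the extra activations into the batch dimension, so the activation figure is much larger than it would be for a 32 x 32 input.
Since summary cannot be used on the quantized model, its footprint cannot be observed directly in the same way.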
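Although summary cannot be run on the quantized model, its figures can be reproduced by hand and then adjusted. The sketch below recomputes the numbers in the summary table from the listed output shapes, assuming float32 activations and a factor of 2 for the forward and backward passes, and then swaps in a hypothetical parameter size in which only the fc weights are stored as int8. The quantized total is an estimate under those assumptions, not a measurement.

```python
MB = 1024 ** 2       # the summary figures are consistent with 1024 * 1024 bytes per MB
BYTES_FP32 = 4

# Per-sample output shapes copied from the summary table above.
output_shapes = [
    (12, 512, 512),  # Conv2d-1
    (12, 512, 512),  # Conv2d-2
    (12, 256, 256),  # MaxPool2d-3
    (24, 256, 256),  # Conv2d-4
    (24, 256, 256),  # Conv2d-5
    (10,),           # Linear-6
]


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


activations = sum(numel(shape) for shape in output_shapes)
forward_backward_mb = 2 * activations * BYTES_FP32 / MB  # x2: forward and backward
input_mb = numel((3, 512, 512)) * BYTES_FP32 / MB
params_mb = 70918 * BYTES_FP32 / MB
total_mb = input_mb + forward_backward_mb + params_mb

# Assumption: only the 61,440 fc weights become 1-byte int8 values.
quant_params_mb = ((70918 - 61440) * BYTES_FP32 + 61440 * 1) / MB
quant_total_mb = input_mb + forward_backward_mb + quant_params_mb

print("forward/backward (MB):", round(forward_backward_mb, 2))
print("params (MB):", round(params_mb, 2))
print("total, float32 (MB):", round(total_mb, 2))
print("total, int8 fc weights (MB, estimate):", round(quant_total_mb, 2))
```

Under these assumptions the estimated total barely moves, because the convolution activations stay in float32 and dominate the footprint.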