Implementation of Deep Learning Model Pruning

    In deep learning, more complex models usually achieve better recognition results, but they also demand more computing power. In scenarios with strict real-time requirements or limited compute, a complex model often cannot deliver the expected results, so the model needs to be pruned to speed up its computation. Pruning sets selected parameters to 0, which removes the connections between those nodes and the following layer and thereby reduces the amount of computation. This article focuses on the hands-on side of model pruning.
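    As a minimal illustration of that idea (a sketch, not code from the reference), pruning amounts to multiplying a weight tensor element-wise by a binary mask, so the masked positions contribute nothing to the computation:

import torch

weight = torch.randn(3, 3)           # a small weight matrix
mask = torch.tensor([[1., 0., 1.],   # 0 marks a pruned connection
                     [0., 1., 0.],
                     [1., 0., 1.]])
pruned_weight = weight * mask        # pruned positions become exactly 0
print(pruned_weight)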

Reference for this article: "Model Compression (Pruning, Quantization) in Deep Learning", CV Algorithm Enqiulu's blog on CSDN.

Table of contents

model construction

Necessary function explanation

module.named_parameters()

module.named_buffers()

model.state_dict().keys()

module._forward_pre_hooks

single layer pruning

Continuous single layer pruning

global pruning

custom pruning 

model construction

    This part introduces the model used in the examples below: the well-known LeNet. Any other model works just as well, as long as it is a network with the basic layer types.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # 1: input channels (1 = grayscale image), 6: output channels, 3x3: convolution kernel size
        self.conv1 = nn.Conv2d(1, 6, 3)
        # self.conv1 = nn.Conv2d(2, 3, 3)  # alternative with 2 input channels (see the discussion below)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5x5 is the feature-map size after the conv/pool stages
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, int(x.nelement() / x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LeNet().to(device=device)

Necessary function explanation

    First, I need to briefly introduce some functions that appear frequently below; without an explanation they can be quite confusing. I searched for them myself but did not find a very intuitive description, so the following reflects my own understanding. You can also skip this part for now and come back when you meet these functions later. The explanations all use the convolutional layer as the example; the other layers are similar.

 module.named_parameters()

    Here module refers to one layer of the model built above, for example the first convolutional layer (module = model.conv1), although any layer works. For a convolutional layer, this function returns the layer's named parameters, i.e. the convolution kernel weights and the bias.

    What confused me at first is why a Conv2d(1, 6, 3) layer contains so many 3*3 convolution kernels. Take Conv2d(1, 6, 3) from the model above: the input has 1 channel, the output has 6 channels, and the kernel size is 3, so the smallest unit of the parameter tensor is a 3*3 kernel. Each of the 6 output channels is produced by convolving the 1 input channel with its own kernel, so there are 6*1 kernels of size 3*3. If the input had 2 channels instead, each output channel would need one kernel per input channel, with the results combined into that output channel, giving 6*2 kernels of size 3*3.
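    A quick way to check this (a small sketch using the LeNet model defined above) is to print the parameter shapes: the weight of Conv2d(1, 6, 3) is stored as a single tensor of shape [out_channels, in_channels, kernel_h, kernel_w], i.e. 6*1 kernels of size 3x3.

module = model.conv1
for name, param in module.named_parameters():
    print(name, tuple(param.shape))
# expected output: weight (6, 1, 3, 3) and bias (6,)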

module.named_buffers()

    This function returns the mask buffers. The pruning below targets the convolution kernel parameters, so the positions to be removed must be marked somewhere; the entries of this buffer correspond one to one with the kernel parameters. If a parameter is pruned, its position in the mask is 0, otherwise it is 1. Multiplying the mask element-wise by the parameter tensor turns the pruned positions into 0.
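    A small sketch (not from the reference) makes this concrete: before pruning the buffer list is empty, and after pruning a weight_mask buffer with the same shape as the weight appears, containing only 0s and 1s. A throwaway layer is used here so the model from the examples below is not modified.

tmp = nn.Conv2d(1, 6, 3)
print(list(tmp.named_buffers()))              # [] -- no buffers before pruning
prune.random_unstructured(tmp, name="weight", amount=0.3)
for name, buf in tmp.named_buffers():
    print(name, tuple(buf.shape))             # weight_mask (6, 1, 3, 3)
print(tmp.weight_mask.unique())               # values are only 0. and 1.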

model.state_dict().keys()

    This outputs the names of everything stored in the model's state dictionary, i.e. the parameters and buffers that are saved and loaded with the model. Before pruning, each layer has a single weight entry; after pruning, that entry is replaced by a backup of the original weight (weight_orig) and the mask (weight_mask).
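    A small sketch (again on a throwaway layer, not from the reference) shows the change in keys directly:

tmp = nn.Linear(4, 2)
print(tmp.state_dict().keys())   # odict_keys(['weight', 'bias'])
prune.random_unstructured(tmp, name="weight", amount=0.5)
print(tmp.state_dict().keys())   # odict_keys(['bias', 'weight_orig', 'weight_mask'])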

module._forward_pre_hooks

This attribute is an ordered dictionary of pre-forward hooks registered on a layer. After pruning, it holds the pruning-method object that was applied to the layer (for example L1 unstructured pruning), which recomputes the effective weight from the backup and the mask before every forward pass.
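    A small sketch (not from the reference) shows what the hook does: after pruning, the effective weight is always the backup multiplied by the mask, and the hook that performs this multiplication before each forward pass sits in _forward_pre_hooks.

tmp = nn.Linear(4, 2)
prune.random_unstructured(tmp, name="weight", amount=0.5)
print(tmp._forward_pre_hooks)  # an OrderedDict holding the applied pruning-method object
print(torch.allclose(tmp.weight, tmp.weight_orig * tmp.weight_mask))  # True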

The following introduces four pruning methods:

1. Single-layer pruning (pruning a specific layer, e.g. a convolutional layer)

2. Continuous single-layer pruning (single-layer pruning applied to multiple layers)

3. Global pruning (pruning across the whole model)

4. Custom pruning (user-defined pruning rules)

single layer pruning

    The first method prunes one specific layer. The prune.random_unstructured() function is given the layer to prune (for example a convolutional layer), the name of the parameter to prune (the weight or the bias), and the pruning ratio; it then prunes randomly according to those settings. Assuming the weight is pruned, a mask is placed in the buffer that marks which positions are pruned: those positions are 0 and the rest are 1.

module = model.conv1
print("--- state dict before pruning")
print(model.state_dict().keys())  # before pruning there is a plain weight entry
print("--- parameters before pruning")
print(list(module.named_parameters()))
print("--- buffers before pruning")
print(list(module.named_buffers()))
prune.random_unstructured(module, name="weight", amount=0.3)  # prune 30% of the weights
print("*" * 50)
print("--- state dict after pruning")
print(model.state_dict().keys())  # after pruning, weight_orig and weight_mask appear
print("--- parameters after pruning")
print(list(module.named_parameters()))  # actually still unchanged; explained below
print("--- buffers after pruning")
print(list(module.named_buffers()))  # this is the mask
print("--- pruning hooks")
print(module._forward_pre_hooks)  # each element stored here is a pruning method

    From the state dictionary you can see the difference before and after pruning: weight has been replaced by weight_orig and weight_mask. The mask marks which positions are pruned; it lives in the buffer, which is why the buffer is empty before pruning and holds data afterwards. weight_orig is a backup of the original weight, and multiplying the mask by weight_orig element-wise gives the pruned result. The parameters printed after pruning are actually the same as before pruning, so to make the change permanent you need the remove function. remove is like the button that confirms the pruning: once executed, it deletes the mask from the buffer and sets the pruned parameters to 0. This step is irreversible; after it runs, unless you kept a separate backup, the parameters are permanently changed. (The following code continues from the code above.)

prune.remove(module, 'weight')
print("--- parameters after remove")
print(list(module.named_parameters()))  # now the parameters actually change
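    A quick check (a sketch, not in the reference) confirms what remove did: the _orig backup and the mask are gone from the state dictionary, and the zeros now live directly in the weight, roughly 30% of it as requested.

print(model.state_dict().keys())  # conv1.weight is back; no more _orig / _mask entries
sparsity = float(torch.sum(module.weight == 0)) / module.weight.nelement()
print("conv1.weight sparsity: {:.2f}%".format(100. * sparsity))  # about 30%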

Continuous single layer pruning

    Continuous single-layer pruning is essentially the same as single-layer pruning. The only difference is that above we pruned one convolutional layer, while here a loop prunes all convolutional and fully connected layers; each iteration of the loop is still just single-layer pruning.

print(model.state_dict().keys())  # all state-dict entries of the model before this step
print(dict(model.named_buffers()).keys())  # mask buffer names: empty, since nothing is currently pruned
for name, module in model.named_modules():
    # apply l1_unstructured pruning to every convolutional layer, pruning 20% of the weights
    if isinstance(module, torch.nn.Conv2d):  # check whether this module is a convolutional layer
        prune.l1_unstructured(module, name="weight", amount=0.2)
    # apply ln_structured pruning to every fully connected layer, pruning 40% of the weights
    elif isinstance(module, torch.nn.Linear):
        prune.ln_structured(module, name="weight", amount=0.4, n=2, dim=0)

# print the mask buffer names after pruning the multi-parameter modules
print(dict(model.named_buffers()).keys())  # buffers now hold a weight_mask for each layer
print(model.state_dict().keys())  # all state-dict entry names after pruning

    You can see that the buffer now contains a weight mask for each layer, and the weight entries in the state dictionary have been replaced by weight_orig and weight_mask.
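    As with single-layer pruning, the masks can be folded into the weights once you are satisfied. One possible follow-up (a sketch, not part of the reference) is to call prune.remove on every pruned layer; this assumes every Conv2d and Linear layer in the model was pruned in the loop above.

for name, module in model.named_modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        prune.remove(module, "weight")
print(model.state_dict().keys())  # back to plain weight / bias entries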

global pruning

The two approaches above prune specific layers, while global pruning targets the entire model: a given fraction of parameters is pruned across the whole model, shrinking it as a whole. (The code here is mostly ported from the reference; thanks to the author mentioned at the start of the article.)

model = LeNet().to(device=device)
parameters_to_prune = (
    (model.conv1, 'weight'),
    (model.conv2, 'weight'),
    (model.fc1, 'weight'),
    (model.fc2, 'weight'),
    (model.fc3, 'weight'))
prune.global_unstructured(parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2)
# compute the pruned percentage of each layer (i.e. the fraction of entries equal to 0)
print(
    "Sparsity in conv1.weight: {:.2f}%".format(
    100. * float(torch.sum(model.conv1.weight == 0))
    / float(model.conv1.weight.nelement())
    ))

print(
    "Sparsity in conv2.weight: {:.2f}%".format(
    100. * float(torch.sum(model.conv2.weight == 0))
    / float(model.conv2.weight.nelement())
    ))

print(
    "Sparsity in fc1.weight: {:.2f}%".format(
    100. * float(torch.sum(model.fc1.weight == 0))
    / float(model.fc1.weight.nelement())
    ))

print(
    "Sparsity in fc2.weight: {:.2f}%".format(
    100. * float(torch.sum(model.fc2.weight == 0))
    / float(model.fc2.weight.nelement())
    ))

print(
    "Sparsity in fc3.weight: {:.2f}%".format(
    100. * float(torch.sum(model.fc3.weight == 0))
    / float(model.fc3.weight.nelement())
    ))

print(
    "Global sparsity: {:.2f}%".format(
    100. * float(torch.sum(model.conv1.weight == 0)
               + torch.sum(model.conv2.weight == 0)
               + torch.sum(model.fc1.weight == 0)
               + torch.sum(model.fc2.weight == 0)
               + torch.sum(model.fc3.weight == 0))
         / float(model.conv1.weight.nelement()
               + model.conv2.weight.nelement()
               + model.fc1.weight.nelement()
               + model.fc2.weight.nelement()
               + model.fc3.weight.nelement())
    ))

    After running this, you can see that each layer has been pruned to a different degree. The percentages are computed as the proportion of zeros in each layer's effective weight (the original weight multiplied by its mask).
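    The repeated print blocks above can be condensed into a small helper (a sketch, not part of the reference) that computes the same percentages:

def sparsity(tensor):
    # fraction of elements that are exactly zero, as a percentage
    return 100. * float(torch.sum(tensor == 0)) / tensor.nelement()

for layer_name in ["conv1", "conv2", "fc1", "fc2", "fc3"]:
    layer = getattr(model, layer_name)
    print("Sparsity in {}.weight: {:.2f}%".format(layer_name, sparsity(layer.weight)))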

custom pruning

    The "custom" part of custom pruning lies mainly in the pruning rule. For example, parameters that are close to 0 or relatively small probably contribute little, so pruning them should not affect the model much. The example below prunes alternately, i.e. masks every other parameter. Of course the rule can be changed; this one is used simply because the reference writes it this way.

import time


class myself_pruning_method(prune.BasePruningMethod):
    PRUNING_TYPE = "unstructured"

    # implement compute_mask to encode your own pruning rule,
    # i.e. decide how the weight parameters get masked out
    def compute_mask(self, t, default_mask):
        mask = default_mask.clone()
        # the rule here masks every other parameter, so in the end 50% of the
        # pruned tensor is masked out; you can of course define your own rule
        mask.view(-1)[::2] = 0
        return mask

# helper for the custom pruning method: it simply calls the pruning class's apply method
def myself_unstructured_pruning(module, name):
    myself_pruning_method.apply(module, name)
    return module

# now start pruning
# instantiate the model
model = LeNet().to(device=device)

start = time.time()  # timing
# call the custom pruning helper on the bias of the third fully connected layer, fc3
myself_unstructured_pruning(model.fc3, name="bias")

# the clearest sign that pruning succeeded is the existence of the bias_mask buffer
print(model.fc3.bias_mask)

# print how long the custom pruning took
duration = time.time() - start
print(duration * 1000, 'ms')
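    To verify the rule (a short check, not in the reference), print the effective bias: every other entry is 0. Calling prune.remove afterwards folds the mask into the parameter and makes the result permanent.

print(model.fc3.bias)            # effective bias: every other entry is 0
prune.remove(model.fc3, "bias")  # confirm the pruning, as in the single-layer example
print(model.fc3.bias)            # now a plain parameter with the zeros baked in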
