PyTorch: freely loading part of a model's parameters and freezing them

PyTorch's torch.load and load_state_dict methods can only handle fairly rigid parameter files: they require every key of the loaded state_dict to match the corresponding key in model.state_dict().

In transfer learning, however, we may only want to use part of a pre-trained network, combine several networks into one, or split up the Sequential blocks of a pre-trained model to get the outputs of intermediate layers. In these cases the traditional loading method does not work well.

For example, if we want to use only the first 7 convolution blocks of MobileNet and freeze them, then attach a different structure afterwards or rewrite the network as an FCN, the traditional method fails.

The most universal method is to build a dictionary whose keys are the same as those of the network we created, fill it with the desired parameters taken from whatever pre-trained networks we like, and then load this new state_dict. At present, this is the only approach I know of for handling more complex network changes.

Online searches for "load part of the model" or "freeze part of the model" mostly turn up examples that only swap out the fully connected layer, which is not much help. I stepped on a few pits while hand-crafting state_dicts as a beginner, so I am writing this post to record them.


1. Loading partial pre-trained parameters

Let’s take a look at the structure of Mobilenet first

(Source: GitHub, with the pre-trained model mobilenet_sgd_rmsprop_69.526.tar)

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                # depthwise convolution
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),

                # pointwise convolution
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.model = nn.Sequential(
            conv_bn(   3,   32, 2),
            conv_dw(  32,   64, 1),
            conv_dw(  64,  128, 2),
            conv_dw( 128,  128, 1),
            conv_dw( 128,  256, 2),
            conv_dw( 256,  256, 1),
            conv_dw( 256,  512, 2),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(7),
        )
        self.fc = nn.Linear(1024, 1000)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x

We only need the first 7 convolution blocks, and to make later concat operations easier we break up the Sequential, turning it into the following:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),

                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.conv1 = conv_bn(  3,  32, 2)
        self.conv2 = conv_dw( 32,  64, 1)
        self.conv3 = conv_dw( 64, 128, 2)
        self.conv4 = conv_dw(128, 128, 1)
        self.conv5 = conv_dw(128, 256, 2)
        self.conv6 = conv_dw(256, 256, 1)
        self.conv7 = conv_dw(256, 512, 2)

        # The original layers below are no longer needed;
        # you can attach your own structure here instead.
        '''
        self.features = nn.Sequential(
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(7),
        )

        self.fc = nn.Linear(1024, 1000)
        '''

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x3 = self.conv3(x2)
        x4 = self.conv4(x3)
        x5 = self.conv5(x4)
        x6 = self.conv6(x5)
        x7 = self.conv7(x6)
        # x8 = self.features(x7)
        # out = self.fc(x8)
        return (x1, x2, x3, x4, x5, x6, x7)
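As a quick sanity-check sketch, we can feed a dummy 224x224 input through the modified network and print the intermediate feature-map shapes that a later concat would rely on:

# A minimal sketch, assuming the standard 224x224 MobileNet input size.
net = Net()
x = torch.randn(1, 3, 224, 224)
feats = net(x)
for i, f in enumerate(feats, 1):
    print("x{}: {}".format(i, tuple(f.shape)))
# With this input, x1 comes out as (1, 32, 112, 112) and x7 as (1, 512, 14, 14).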

Let's instantiate the modified network and see how its state_dict differs from the state_dict in our pre-trained file.

net = Net()

# My machine has no GPU and the checkpoint was trained on GPU (CUDA tensors),
# so map the storages to CPU when loading.
dict_trained = torch.load("mobilenet_sgd_rmsprop_69.526.tar",
                          map_location=lambda storage, loc: storage)["state_dict"]
dict_new = net.state_dict().copy()

new_list = list(net.state_dict().keys())
trained_list = list(dict_trained.keys())
print("new_state_dict size: {} trained state_dict size: {}".format(len(new_list), len(trained_list)))
print("New state_dict first 10th parameters names")
print(new_list[:10])
print("trained state_dict first 10th parameters names")
print(trained_list[:10])

print(type(dict_new))
print(type(dict_trained))

The output is as follows. After cutting the network roughly in half, the number of parameters drops from 137 to 65. The first ten names show that the key names have changed but their order has not, and a state_dict is an OrderedDict, so it can be manipulated with the usual dict operations.

new_state_dict size: 65 trained state_dict size: 137
New state_dict first 10th parameters names
['conv1.0.weight', 'conv1.1.weight', 'conv1.1.bias', 'conv1.1.running_mean', 'conv1.1.running_var', 'conv2.0.weight', 'conv2.1.weight', 'conv2.1.bias', 'conv2.1.running_mean', 'conv2.1.running_var']
trained state_dict first 10th parameters names
['module.model.0.0.weight', 'module.model.0.1.weight', 'module.model.0.1.bias', 'module.model.0.1.running_mean', 'module.model.0.1.running_var', 'module.model.1.0.weight', 'module.model.1.1.weight', 'module.model.1.1.bias', 'module.model.1.1.running_mean', 'module.model.1.1.running_var']
<class 'collections.OrderedDict'>
<class 'collections.OrderedDict'>

We can see that as long as we build a dictionary whose keys are the same as those of the network we created, we can obtain a new state_dict by filling in the desired parameters from any pre-trained networks, and then load this new state_dict. This is the most universal method and works for any change of network structure.

for i in range(65):
    dict_new[new_list[i]] = dict_trained[trained_list[i]]

net.load_state_dict(dict_new)
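This loop pairs parameters purely by position, which is fragile: newer PyTorch versions, for instance, also register num_batches_tracked entries for BatchNorm layers, which would silently shift the alignment. A slightly safer sketch of the same idea checks that each paired tensor has an identical shape before copying:

# Positional copy with shape checks; zip stops at the shorter key list.
for new_key, trained_key in zip(new_list, trained_list):
    assert dict_new[new_key].shape == dict_trained[trained_key].shape, \
        "shape mismatch: {} vs {}".format(new_key, trained_key)
    dict_new[new_key] = dict_trained[trained_key]

net.load_state_dict(dict_new)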

There are other situations. For example, if we only added some layers at the end without changing the names or structure of the original layers, we can use the following simpler filter:

loaded_dict = {k: v for k, v in loaded_dict.items() if k in model.state_dict()}
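To actually apply it, the filtered dictionary still has to be merged into the model's own state_dict (or loaded with strict=False). A minimal sketch of that common pattern, with illustrative names:

model_dict = model.state_dict()
# Keep only pre-trained entries whose keys and shapes still match the new model.
pretrained_dict = {k: v for k, v in loaded_dict.items()
                   if k in model_dict and v.shape == model_dict[k].shape}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)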

2. Freezing the parameters of these layers

There are many ways to do this; here we use the freezing approach that corresponds to the loading method above.

(I later found problems with my earlier way of freezing, so I still recommend reading these discussions:)

https://discuss.pytorch.org/t/how-the-pytorch-freeze-network-in-some-layers-only-the-rest-of-the-training/7088

or

https://discuss.pytorch.org/t/correct-way-to-freeze-layers/26714
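The approach that matches the loading method above is simply to set requires_grad = False on the parameters of the layers we just loaded. A minimal sketch for the modified net, following the linked discussions; note that requires_grad only stops gradient updates, while the BatchNorm running statistics are buffers that keep updating in train mode:

frozen_blocks = [net.conv1, net.conv2, net.conv3, net.conv4,
                 net.conv5, net.conv6, net.conv7]

# Stop gradients for the loaded blocks; only the newly added layers stay trainable.
for block in frozen_blocks:
    for p in block.parameters():
        p.requires_grad = False

# Also keep the frozen BatchNorm layers in eval mode so running_mean / running_var
# stop updating; re-apply this after every net.train() call, which resets the mode.
for block in frozen_blocks:
    block.eval()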

Correspondingly, during training, only parameters with requires_grad = True should be passed to the optimizer, so:

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=lr)

 

 


 

 


Origin blog.csdn.net/weixin_36670529/article/details/113903515