PyTorch Study Notes (7): Advanced Training Skills

Contents

Custom loss function

Defined as a function

Defined as a class

Dynamically adjusting the learning rate

Using the official scheduler

Custom scheduler

Model fine-tuning with torchvision

The process of model fine-tuning

Using an existing model structure

Training a specific layer

Model fine-tuning with timm

Using and modifying pretrained models

Saving the model

Half-precision training

Using argparse to adjust parameters

Introduction to argparse

Using argparse

Using argparse to modify hyperparameters more efficiently

Summary


Custom loss function

Defined as a function

import torch

def my_loss(output, target):
    # mean squared error built from PyTorch tensor ops, so autograd handles the gradient
    loss = torch.mean((output - target) ** 2)
    return loss
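
A minimal usage sketch (model, x and y are placeholders for your own network and data batch, not defined above):

output = model(x)
loss = my_loss(output, y)
loss.backward()   # gradients flow through the custom loss via autograd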

Defined as a class

Dice Loss is a common loss function in the field of segmentation, defined as follows:

DSC = \frac{2|X \cap Y|}{|X| + |Y|}

import torch
import torch.nn as nn

class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        # weight and size_average are kept for interface compatibility but are unused here
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)        # map logits to probabilities
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()
        dice = (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
        return 1 - dice

criterion = DiceLoss()
loss = criterion(inputs, targets)

For more loss functions, refer to  Loss Function Library - Keras & PyTorch | Kaggle
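
As one more illustration of the class-based pattern, here is a sketch of a combined BCE + Dice loss, a common variant in segmentation (the class name and smoothing constant are chosen for illustration, not taken from the notes above):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BCEDiceLoss(nn.Module):
    def __init__(self, smooth=1.0):
        super().__init__()
        self.smooth = smooth

    def forward(self, inputs, targets):
        # binary cross-entropy computed directly on the logits
        bce = F.binary_cross_entropy_with_logits(inputs, targets)
        # Dice term, built the same way as in DiceLoss above
        probs = torch.sigmoid(inputs).view(-1)
        targets = targets.view(-1)
        intersection = (probs * targets).sum()
        dice = (2. * intersection + self.smooth) / (probs.sum() + targets.sum() + self.smooth)
        return bce + (1 - dice)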

Note: when writing a custom loss function, implement the mathematical operations with PyTorch's tensor interfaces. That way autograd handles differentiation automatically, and the loss can run on CUDA without any extra work.

Dynamically adjusting the learning rate

Choosing a learning rate has long been a headache in deep learning. If the learning rate is too small, convergence slows down dramatically and training time grows; if it is too large, the parameters may oscillate around the optimum. Even with a well-chosen initial learning rate, after many epochs the accuracy may plateau or the loss may stop decreasing, which means the current learning rate no longer suits the model. An appropriate learning-rate decay strategy can improve this, and in PyTorch such strategies are implemented as schedulers.

Using the official scheduler

  • Learn about the official API

The learning rate is one of the most important hyperparameters when training a neural network. PyTorch provides a number of ready-made methods for adjusting it dynamically in torch.optim.lr_scheduler (each is described in the PyTorch 1.12 documentation):

  1. lr_scheduler.LambdaLR
  2. lr_scheduler.MultiplicativeLR
  3. lr_scheduler.StepLR
  4. lr_scheduler.MultiStepLR
  5. lr_scheduler.ExponentialLR
  6. lr_scheduler.CosineAnnealingLR
  7. lr_scheduler.ReduceLROnPlateau
  8. lr_scheduler.CyclicLR
  9. lr_scheduler.OneCycleLR
  10. lr_scheduler.CosineAnnealingWarmRestarts
  • Use the official API
# choose one optimizer
optimizer = torch.optim.Adam(...)
# choose one or more methods for dynamically adjusting the learning rate
scheduler1 = torch.optim.lr_scheduler....
scheduler2 = torch.optim.lr_scheduler....
...
schedulern = torch.optim.lr_scheduler....
# training loop
for epoch in range(100):
    train(...)
    validate(...)
    optimizer.step(...)
    # the learning rate is adjusted only after the optimizer has updated the parameters
    scheduler1.step()
    ...
    schedulern.step()

Note: when using the official torch.optim.lr_scheduler classes, scheduler.step() should be called after optimizer.step().
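
A concrete sketch of this template using StepLR (the stand-in model, the omitted loss computation, and the step_size/gamma values are illustrative choices, not part of the original notes):

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)                                  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)    # multiply lr by 0.1 every 30 epochs

for epoch in range(100):
    # ... forward pass, loss.backward(), etc. ...
    optimizer.step()
    scheduler.step()                                      # called after optimizer.step()
    # the current lr can be read from optimizer.param_groups[0]['lr']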

Custom scheduler

Another option is a custom function, here called adjust_learning_rate, that directly modifies the lr value of each param_group in the optimizer.

Simple example: the learning rate should be reduced to 1/10 of its value every 30 epochs, which the custom function below implements.

def adjust_learning_rate(optimizer, epoch):
    lr = args.lr * (0.1 ** (epoch // 30))   # divide the lr by 10 every 30 epochs
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
for epoch in range(100):
    train(...)
    validate(...)
    adjust_learning_rate(optimizer, epoch)
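
For reference, this particular schedule can also be expressed with the built-in class shown earlier, torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1); a hand-written function is mainly worthwhile for schedules that the official classes do not cover.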

Model fine-tuning with torchvision

The process of model fine-tuning

  1. Pre-train a neural network model on a source dataset; this is the source model.
  2. Create a new neural network model, the target model, which copies the design and parameters of the source model except for its output layer.
  3. Add to the target model an output layer whose size equals the number of classes in the target dataset, and initialize this layer's parameters randomly.
  4. Train the target model on the target dataset. The new output layer is trained from scratch, while the parameters of the remaining layers are fine-tuned from the source model's parameters.

Using an existing model structure

  • Instantiate a network
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
shufflenet = models.shufflenet_v2_x1_0()
mobilenet_v2 = models.mobilenet_v2()
mobilenet_v3_large = models.mobilenet_v3_large()
mobilenet_v3_small = models.mobilenet_v3_small()
resnext50_32x4d = models.resnext50_32x4d()
wide_resnet50_2 = models.wide_resnet50_2()
mnasnet = models.mnasnet1_0()
  • Pass pretrained parameters

Use the boolean pretrained argument to decide whether to load pre-trained weights. By default pretrained=False, meaning no pre-trained weights are loaded; with pretrained=True, weights obtained by pre-training on a dataset (ImageNet for most torchvision models) are downloaded and used.

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v2 = models.mobilenet_v2(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
mnasnet = models.mnasnet1_0(pretrained=True)

Note:

  1. Weight files have the extension .pt or .pth.
  2. The weight download URL/location can be set via torch.utils.model_zoo.load_url().
  3. You can also download the weights manually, place them in a local folder, and load the parameters into the network from there.
  4. If a download is interrupted midway, delete the partially downloaded weight file from the corresponding path before retrying, otherwise an error may be raised.
self.model = models.resnet50(pretrained=False)
self.model.load_state_dict(torch.load('./model/resnet50-19c8e357.pth'))
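
A sketch of point 2 above: torch.utils.model_zoo.load_url() can download a state dict into a directory of your choice and return it. The URL below is the standard torchvision download URL matching the file name used above, and the ./model directory is just an example:

import torch
import torchvision.models as models

# download the weights into ./model (or reuse a cached copy), then load them
state_dict = torch.utils.model_zoo.load_url(
    'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    model_dir='./model')
model = models.resnet50(pretrained=False)
model.load_state_dict(state_dict)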

Training a specific layer

By default every parameter has requires_grad=True; when training from scratch or fine-tuning all layers you do not need to worry about this. But if you only want to extract features and compute gradients for the newly initialized layer, leaving the other parameters unchanged, you need to freeze those layers by setting requires_grad=False.

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

Taking resnet18 as an example, we change the 1000-class output to 4 classes while only updating the parameters of the last layer, leaving the feature-extraction parameters untouched. Note the order: first freeze the gradients of all model parameters, then replace the fully connected output layer; only the parameters of the newly created fully connected layer will then have gradients computed.

import torch.nn as nn
import torchvision.models as models

# freeze the gradients of the pretrained parameters
feature_extract = True
model = models.resnet18(pretrained=True)
set_parameter_requires_grad(model, feature_extract)

# replace the output layer: only the new fc layer will have requires_grad=True
num_ftrs = model.fc.in_features
model.fc = nn.Linear(in_features=num_ftrs, out_features=4, bias=True)

During training the model still backpropagates through the whole network, but parameter updates only happen in the fc layer. Setting the requires_grad attribute of parameters is how we restrict training to specific layers, which is essential for model fine-tuning.
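
In practice, it is also common to hand the optimizer only the parameters that are still trainable; a minimal sketch (the optimizer choice and learning rate are arbitrary):

import torch

# collect only the parameters that still require gradients (here, the new fc layer)
params_to_update = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params_to_update, lr=1e-3, momentum=0.9)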

​​​​​​​​Model fine-tuning-timm

GitHub link: GitHub - rwightman/pytorch-image-models: PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more

Official website link:

https://fastai.github.io/timmdocs/

https://rwightman.github.io/pytorch-image-models/

Timm installation: pip install timm

How to view the available pretrained models

  • List the pretrained models provided by timm

The number of pretrained models provided by timm has reached 592. They can be listed with timm.list_models(pretrained=True):

import timm
avail_pretrained_models = timm.list_models(pretrained=True)
len(avail_pretrained_models)
  • View all variants of a particular model family

Each model family may contain variants of several designs; the ResNet family, for example, includes ResNet18 and others. You can pass the model name you want to look up to timm.list_models() as a wildcard (fuzzy) query:

all_densnet_models = timm.list_models("*densenet*")
all_densnet_models

All models in the DenseNet family are returned as a list:

['densenet121',
 'densenet121d',
 'densenet161',
 'densenet169',
 'densenet201',
 'densenet264',
 'densenet264d_iabn',
 'densenetblur121d',
 'tv_densenet121']
  • View a model's default configuration

To view a model's default configuration (input size, normalization statistics, classifier name, and so on), access its default_cfg attribute:

model = timm.create_model('resnet34',num_classes=10, pretrained=True)
model.default_cfg
{'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/resnet34-43635321.pth',
 'num_classes': 1000,
 'input_size': (3, 224, 224),
 'pool_size': (7, 7),
 'crop_pct': 0.875,
 'interpolation': 'bilinear',
 'mean': (0.485, 0.456, 0.406),
 'std': (0.229, 0.224, 0.225),
 'first_conv': 'conv1',
 'classifier': 'fc',
 'architecture': 'resnet34'}

Using and modifying pretrained models

Once you have chosen a pretrained model, create it with timm.create_model(), passing pretrained=True to load the pretrained weights. The resulting model can then be inspected in the same way as a torchvision model.

import timm, torch

model = timm.create_model('resnet34',pretrained=True)
x = torch.randn(1,3,224,224)
output = model(x)
output.shape
torch.Size([1, 1000])
  • Check the model parameters of a certain layer (take the first layer of convolution as an example)
model = timm.create_model('resnet34', pretrained=True)
list(dict(model.named_children())['conv1'].parameters())
  • Modify the model (change the 1000 class to 10 class output)
model = timm.create_model('resnet34', num_classes=10, pretrained=True)
x = torch.randn(1, 3, 224, 224)
output = model(x)
output.shape
torch.Size([1, 10])
  • Change the number of input channels (for example, when the input is a single-channel image but the model expects three channels) by passing in_chans=1.
model = timm.create_model('resnet34', num_classes=10, pretrained=True, in_chans=1)
x = torch.randn(1, 1, 224, 224)
output = model(x)

Saving the model

Models created by timm are subclasses of torch.nn.Module, so the standard PyTorch methods for saving and loading model parameters can be used directly.

torch.save(model.state_dict(),'./checkpoint/timm_model.pth')
model.load_state_dict(torch.load('./checkpoint/timm_model.pth'))
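
When loading the checkpoint in a fresh process, the model first has to be re-created with the same architecture and head; a minimal sketch (the path and class count follow the earlier examples):

import timm
import torch

# recreate the same architecture; pretrained ImageNet weights are not needed here
# because they are immediately overwritten by the checkpoint
model = timm.create_model('resnet34', num_classes=10, pretrained=False)
model.load_state_dict(torch.load('./checkpoint/timm_model.pth', map_location='cpu'))
model.eval()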

Half-precision training

GPU performance comes down to two things: compute power and memory. The former determines how fast the card computes, the latter how much data it can hold for computation at once. For a given amount of GPU memory, loading more data per training step (a larger batch_size) improves training efficiency, so using GPU memory sensibly is important.

PyTorch stores floating-point numbers as torch.float32 by default. In most cases this much precision is not needed, and keeping only half of the bits, i.e. the torch.float16 format, does not noticeably affect the result. Since the number of bits is halved, this is called "half precision".

Obviously, half precision reduces memory usage, so the GPU can hold more data for computation at the same time.

Settings for half-precision training

In PyTorch, half-precision training is configured with autocast, which needs to be set up in the following three places:

  • import autocast
from torch.cuda.amp import autocast
  • Model definition

In the model definition, decorate the forward method with autocast, using Python's decorator syntax:

@autocast()
def forward(self, x):
    ...
    return x
  • Training loop

During training, wrap the forward pass (after the data has been moved to the GPU) in a with autocast(): block:

for x in train_loader:
    x = x.cuda()
    with autocast():
        output = model(x)
        ...

Note: half-precision training mainly helps when the individual data items are large; when they are small, it may not bring a significant improvement.
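
Putting the three pieces together, a minimal sketch of a half-precision training step (the model, loader, criterion and optimizer are placeholders; in practice autocast is often paired with torch.cuda.amp.GradScaler for loss scaling, which these notes do not cover):

import torch
from torch.cuda.amp import autocast

# assumed to exist: model (with @autocast() on forward), train_loader, criterion, optimizer
for x, y in train_loader:
    x, y = x.cuda(), y.cuda()
    optimizer.zero_grad()
    with autocast():              # forward pass runs in float16 where it is safe to do so
        output = model(x)
        loss = criterion(output, y)
    loss.backward()
    optimizer.step()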

Using argparse to adjust parameters

Introduction to argparse

argparse is Python's standard module for command-line parsing. It is built into Python and does not need to be installed. It lets us pass arguments to a program directly from the command line, and it parses, stores and exposes those arguments for use. With argparse, common hyperparameters can be set at launch time, for example python file.py --lr 1e-4 --batch_size 32.

Using argparse

In general, the use of argparse can be summarized into three steps:

  • Create an ArgumentParser() object
  • Call the add_argument() method to add parameters
  • Use parse_args() to parse arguments
import argparse

# create an ArgumentParser() object
parser = argparse.ArgumentParser()

# add arguments
parser.add_argument('-o', '--output', action='store_true', help='shows output')
# action='store_true' records the output argument as True when the flag is present
# type specifies the type of the argument
# default specifies the default value
parser.add_argument('--lr', type=float, default=3e-5, help='select the learning rate, default=3e-5')
parser.add_argument('--batch_size', type=int, required=True, help='input batch size')

# parse the arguments with parse_args()
args = parser.parse_args()

if args.output:
    print('This is some output')
    print(f'learning rate: {args.lr}')

Running python demo.py -o --lr 3e-4 --batch_size 32 on the command line (the -o flag enables the output prints) gives the following output:

This is some output
learning rate: 0.0003

argparse arguments fall into two groups: optional and required. Optional arguments fall back to their default value when they are not supplied. Required arguments, such as batch_size above, are declared with required=True and must be passed in, otherwise argparse reports an error. There are also positional arguments:

#positional.py
import argparse

# positional arguments
parser = argparse.ArgumentParser()

parser.add_argument('name')
parser.add_argument('age')

args = parser.parse_args()
print(f'{args.name} is {args.age} years old')

When the -- prefix is not used, arguments are parsed strictly by their position on the command line.

$ python positional.py Peter 23
Peter is 23 years old

In general, argparse is very simple to use, and the operations above are enough to adjust parameters from the command line.

Using argparse to modify hyperparameters more efficiently

Usually, to keep the code concise and modular, the hyperparameter handling is written in a separate config.py and then imported in train.py or other files. A possible config.py looks like the following.

import argparse

def get_options(parser=argparse.ArgumentParser()):
    parser.add_argument('--workers', type=int, default=0, help='number of data loading workers (a common rule of thumb is 4 per GPU)')
    parser.add_argument('--batch_size', type=int, default=4, help='input batch size, default=4')
    parser.add_argument('--niter', type=int, default=10, help='number of epochs to train for, default=10')
    parser.add_argument('--lr', type=float, default=3e-5, help='select the learning rate, default=3e-5')
    parser.add_argument('--seed', type=int, default=118, help='random seed')
    parser.add_argument('--cuda', action='store_true', default=True, help='enables cuda')
    parser.add_argument('--checkpoint_path', type=str, default='', help='path to load a previous trained model if not empty (default empty)')
    parser.add_argument('--output', action='store_true', default=True, help='shows output')
    
    opt = parser.parse_args()
    
    if opt.output:
        print(f'num_workers:{opt.workers}')
        print(f'batch_size:{opt.batch_size}')
        print(f'epochs(niters):{opt.niter}')
        print(f'learning rate:{opt.lr}')
        print(f'manual_seed:{opt.seed}')
        print(f'cuda enabled: {opt.cuda}')
        print(f'checkpoint_path:{opt.checkpoint_path}')

    return opt

if __name__=='__main__':
    opt = get_options()
$ python config.py

num_workers: 0
batch_size: 4
epochs (niters) : 10
learning rate : 3e-05
manual_seed: 118
cuda enabled: True
checkpoint_path:

Then, in other files such as train.py, the parameters can be accessed with the following structure.

# import the necessary libraries
import random
import numpy as np
import torch
import config

opt = config.get_options()

manual_seed = opt.seed
num_workers = opt.workers
batch_size = opt.batch_size
lr = opt.lr
niters = opt.niter          # note: the argument defined in config.py is --niter
checkpoint_path = opt.checkpoint_path

# set random seeds to ensure reproducible results
def set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

...

if __name__=='__main__':
    set_seed(manual_seed)
    for epoch in range(niters):
        train(model, lr, batch_size, num_workers, checkpoint_path)
        val(model, lr, batch_size, num_workers, checkpoint_path)

Summary

argparse offers a convenient way to manage hyperparameters, and it can be combined with other Python standard libraries (pickle, json, logging) to save parameter settings and record model output.
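
For example, a minimal sketch of dumping the parsed options to a JSON file so a run's configuration can be reproduced later (the file name is arbitrary):

import json
import config

opt = config.get_options()                # the parser defined in config.py above
with open('run_config.json', 'w') as f:
    json.dump(vars(opt), f, indent=2)     # vars() turns the argparse Namespace into a dict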

Reference link: Chapter 6: PyTorch Advanced Training Skills - PyTorch in a nutshell

Source: blog.csdn.net/zhangmeizi1996/article/details/126387480