Application of Deep Learning Techniques 9: Learning Rate Adjustment and Fake Data Generation in Model Training

Hello everyone, I am Weixue AI. Today I will introduce application of deep learning techniques 9: adjusting the learning rate during model training and generating fake data. When we train a model, we often want to check whether it is feasible before a large amount of data has been labeled. For this situation, I will show you how to generate fake data (test data) for model debugging, and how to adjust the learning rate to improve model performance and speed up convergence.

1. Model training techniques under the PyTorch framework

In the process of training a PyTorch model, there are many techniques that can improve the training results and training speed. Here are some common ones:
1. Learning rate adjustment
2. Weight decay
3. Using pre-trained models
4. Data augmentation
5. Early stopping
6. Using different optimizers
7. Layer normalization
8. Using deeper models
9. Model ensembling

These will be introduced over the course of this series; two of them, weight decay and early stopping, are previewed in the short sketch below, and the technique covered in this article is learning rate adjustment.
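As a hedged preview only, not part of this article's training script: the sketch below shows how weight decay and early stopping commonly look in PyTorch, using a placeholder nn.Linear model and fabricated validation losses.

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # placeholder model, for illustration only

# Weight decay: L2 regularization is passed directly to the optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

# Early stopping: stop when the validation loss has not improved for `patience` epochs.
patience, best_loss, bad_epochs = 3, float("inf"), 0
for epoch, val_loss in enumerate([0.9, 0.7, 0.6, 0.61, 0.62, 0.63]):  # stand-in values
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:
        print(f"Early stopping at epoch {epoch + 1}")
        break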

2. Learning rate adjustment techniques

Learning rate adjustment means optimizing model training by dynamically changing the learning rate as training proceeds. In the early stage of training, the learning rate can be set higher so that the model converges quickly; when training is close to convergence, the learning rate can be reduced to dampen the oscillation of the parameters and make the model more stable. PyTorch provides ready-made learning rate adjustment strategies such as StepLR, ExponentialLR, and ReduceLROnPlateau.
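A minimal sketch of how these three strategies are constructed, assuming a throwaway optimizer rather than the one defined later in this article:

import torch
import torch.optim as optim

# A throwaway parameter so the optimizer has something to manage.
opt = optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)

# Multiply the learning rate by gamma every `step_size` scheduler steps.
step_sched = optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)
# Multiply the learning rate by gamma after every scheduler step.
exp_sched = optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)
# Reduce the learning rate when a monitored metric (e.g. validation loss) stops improving.
plateau_sched = optim.lr_scheduler.ReduceLROnPlateau(opt, mode='min', factor=0.1, patience=10)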

In the training process, the step size should become smaller as an extreme point is approached, to keep the optimizer from jumping over it; but if the learning rate is too small from the beginning, convergence will be too slow. StepLR learning rate adjustment handles this trade-off. The principle of StepLR: decay the learning rate on a fixed schedule. After every step_size scheduler steps (typically one step per epoch), the learning rate is multiplied by a coefficient (specified by gamma). For example, if the initial learning rate is 0.1, step_size is 30, and gamma is 0.1, then after every 30 steps the learning rate is multiplied by 0.1, becoming one tenth of its previous value. Using StepLR lets the model use a larger learning rate in the early stage of training, which is conducive to rapid convergence; as training progresses, the learning rate gradually becomes smaller, which helps fine-tune the model parameters and improve the generalization performance of the model.
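The following self-contained sketch (separate from the training script below, using a throwaway parameter) reproduces this worked example and prints the decayed learning rate:

import torch
import torch.optim as optim

params = [torch.nn.Parameter(torch.zeros(1))]
opt = optim.SGD(params, lr=0.1)
sched = optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)

for epoch in range(90):
    opt.step()    # in real training: forward pass, loss, backward(), then optimizer.step()
    sched.step()  # multiplies the learning rate by gamma every 30 calls
    if (epoch + 1) % 30 == 0:
        print(f"after epoch {epoch + 1}: lr = {sched.get_last_lr()[0]:.6f}")
# Prints 0.010000, then 0.001000, then 0.000100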

import torch
import torch.optim as optim
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import torch.utils.data as data

# Use a pre-trained model
model = models.resnet18(pretrained=True)
# Replace the output of the final fully connected layer (2 classes)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)
# GPU support
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Use StepLR to adjust the learning rate
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Define the data preprocessing and augmentation pipeline (used with ImageFolder below)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

3. Generating fake data with FakeData

FakeData can be used to generate various types of fake datasets, including image data. It has built-in options that let us quickly generate fake image data matching our needs. For example, below we create an object called fake_data that contains 20 samples and 2 categories, where each image has a size of 3x224x224: 3 means the image has three color channels (RGB), and 224x224 is the spatial size of the image. At the same time, we use transforms.ToTensor() to convert the generated images into tensor form, which is convenient for training deep learning models. Next, we use data.DataLoader to load the data in fake_data for training. DataLoader lets us load data in batches on demand and supports multiple data loading workers. Here each batch contains 4 samples, and shuffle=True shuffles the order of the dataset to improve the training effect.

fake_data = datasets.FakeData(size=20, num_classes=2, image_size=(3, 224, 224), transform=transforms.ToTensor())
train_loader = data.DataLoader(fake_data, batch_size=4, shuffle=True)
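A quick sanity check, assuming the fake_data and train_loader defined above, to confirm the shape of one batch:

# Inspect one batch drawn from the fake dataset.
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([4, 3, 224, 224])
print(labels.shape)  # torch.Size([4])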

4. Real data example

If we have image data, we can create the corresponding folders, place the images of each category under the folder for that category, and load the dataset with ImageFolder. ImageFolder requires a folder structure in which each subfolder contains the images of one category. This is a very common dataset structure for training classification models.

train_dataset = datasets.ImageFolder('train_data', transform=transform)
train_loader = data.DataLoader(train_dataset, batch_size=64, shuffle=True)

Suppose we have a simple dataset with two classes: cats and dogs. The folder structure is as follows:
train_data/
    cat/
        cat1.jpg
        cat2.jpg
    dog/
        dog1.jpg
        dog2.jpg
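With this layout, ImageFolder assigns integer labels in alphabetical order of the subfolder names. A quick check, assuming the train_dataset defined above:

print(train_dataset.classes)       # ['cat', 'dog']
print(train_dataset.class_to_idx)  # {'cat': 0, 'dog': 1}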

5. Model Training

# Train the model
num_epochs = 20
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()

        optimizer.step()
        running_loss += loss.item()

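    # Step the scheduler once per epoch so StepLR decays the learning rate on schedule (step_size=7, gamma=0.1)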
    scheduler.step()
    print(f"Epoch {epoch + 1}, Loss: {running_loss / len(train_loader)}")

This training process fine-tunes a pre-trained ResNet18 model. We use the SGD optimizer and define a StepLR scheduling policy: every 7 training epochs, the learning rate is multiplied by a coefficient of 0.1. We train with a simple loop and call scheduler.step() at the end of each training epoch to update the learning rate. During training, the learning rate is therefore adjusted dynamically according to the predetermined learning rate schedule.
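If you want to watch the decay happen, one extra line (my addition, not in the original loop) after scheduler.step() prints the learning rate that will be used in the next epoch:

    # Inside the epoch loop, right after scheduler.step():
    print(f"Epoch {epoch + 1}, LR: {scheduler.get_last_lr()[0]}")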

I hope everyone has now mastered these learning rate adjustment techniques.

Origin: blog.csdn.net/weixin_42878111/article/details/130363364