[Dive into Deep Learning (d2l) PyCharm Implementation 9] Fine-Tuning: Hot Dog Recognition with a Pre-trained Model

Foreword

The detailed lecture notes can be found on the official d2l website, in the section on fine-tuning.
In short, fine-tuning takes a model trained on a source dataset (usually a relatively large one), together with its learned parameters, and continues training it on the target dataset. The last layer has to be replaced to match the number of categories in the target dataset, because the source and target datasets generally have different numbers of categories.
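To make the idea concrete on a toy scale, here is a minimal sketch of what "reuse everything except the output layer" looks like in PyTorch. It assumes a small made-up nn.Sequential network in place of ResNet-18 and a hypothetical 10-class source task; nothing here comes from d2l.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Toy "source" model; pretend it was already trained on a source dataset with 10 classes
source_net = nn.Sequential(
    nn.Linear(8, 16),   # feature extractor (to be reused)
    nn.ReLU(),
    nn.Linear(16, 10),  # output layer for the 10 source classes
)

# Target model: start from a copy of the source architecture and parameters ...
target_net = copy.deepcopy(source_net)
# ... but replace the output layer, because the target task has only 2 classes
target_net[2] = nn.Linear(target_net[2].in_features, 2)
nn.init.xavier_uniform_(target_net[2].weight)  # only the new layer gets a fresh initialization

# The feature-extractor parameters are identical copies of the source parameters
same_features = torch.equal(target_net[0].weight, source_net[0].weight)
print(same_features)  # True

out = target_net(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 2])
```

The full code does the same thing with ResNet-18, whose output layer is called fc.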


1. Code implementation

The environment is as follows:

Python version: 3.8.6
PyTorch version: 1.11.0
d2l version: 0.17.5

The code is as follows (with comments):

import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l


# The dataset contains "positive" images of hot dogs and "negative" images of as many other foods as possible
#@save
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip',
                         'fba480ffa8aa7e0febbb511d181409f899b9baa5')

data_dir = d2l.download_extract('hotdog')

train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'))   # training set
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test'))     # test set


# Show the first 8 positive examples and the last 8 negative examples
hotdogs = [train_imgs[i][0] for i in range(8)]  # first 8 images (hot dogs)
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]  # last 8 images (not hot dogs)
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4)
d2l.plt.show()


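# Normalize each channel using the per-channel mean and standard deviation of the RGB channels
normalize = torchvision.transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # mean and standard deviation of the R, G and B channels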

# Training: random crop rescaled to 224x224, plus random horizontal flips, for data augmentation
train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    normalize])

# Testing: resize the shorter side to 256, then take the central 224x224 crop
test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    normalize])

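# Define and initialize the models
pretrained_net = torchvision.models.resnet18(pretrained=True)  # ResNet-18 pre-trained on ImageNet, used as the source model
finetune_net = torchvision.models.resnet18(pretrained=True)  # target model, initialized with the same pre-trained weights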


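# Replace the last fully connected layer of ResNet-18: ImageNet has 1000 classes, but here we only have 2 (hot dog / not hot dog)
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2)
nn.init.xavier_uniform_(finetune_net.fc.weight)  # Xavier initialization is only needed for the new last layer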


# Training function; if param_group=True, the parameters of the output layer use ten times the learning rate
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5,
                      param_group=True):
    train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'train'), transform=train_augs),
        batch_size=batch_size, shuffle=True)
    test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'test'), transform=test_augs),
        batch_size=batch_size)
    devices = d2l.try_all_gpus()
    loss = nn.CrossEntropyLoss(reduction="none")
    if param_group:
        # All parameters except those of the output layer use the base learning rate
        params_1x = [param for name, param in net.named_parameters()
                     if name not in ["fc.weight", "fc.bias"]]
        trainer = torch.optim.SGD([{'params': params_1x},
                                   {'params': net.fc.parameters(),
                                    'lr': learning_rate * 10}],
                                  lr=learning_rate, weight_decay=0.001)
    else:
        trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                                  weight_decay=0.001)
    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
                   devices)

# Fine-tune the model with a small base learning rate
train_fine_tuning(finetune_net, 5e-5)

2. Results

The final accuracy is quite high after only 5 epochs of fine-tuning.


Original post: blog.csdn.net/weixin_45887062/article/details/126137971