FINETUNING TORCHVISION MODELS

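In this tutorial we take a deeper look at how to finetune and feature extract the torchvision models, all of which have been pretrained on the 1000-class Imagenet dataset. The tutorial gives an in-depth look at how to work with several modern CNN architectures and builds an intuition for finetuning any PyTorch model. Since each model architecture is different, there is no boilerplate finetuning code that works in all scenarios. Rather, the researcher must look at the existing architecture and make custom adjustments for each model.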

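In this document we perform two types of transfer learning: finetuning and feature extraction. In finetuning, we start with a pretrained model and update all of the model's parameters for our new task, in essence retraining the whole model. In feature extraction, we start with a pretrained model and only update the final layer weights from which we derive predictions. It is called feature extraction because we use the pretrained CNN as a fixed feature extractor and only change the output layer. For more technical information about transfer learning, see here and here.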

In general, both transfer learning methods follow the same few steps:

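Initialize the pretrained model
Reshape the final layer(s) to have the same number of outputs as the number of classes in the new dataset
Define for the optimization algorithm which parameters we want to update during training
Run the training step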

from __future__ import print_function
from __future__ import division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

Out:

PyTorch Version:  1.0.0.dev20190318
Torchvision Version:  0.2.3

Inputs

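Here are all of the parameters to change for the run. We will use the hymenoptera_data dataset, which can be downloaded here. This dataset contains two classes, bees and ants, and is structured such that we can use the ImageFolder dataset rather than writing our own custom dataset. Download the data and set the data_dir input to the root directory of the dataset. The model_name input is the name of the model you wish to use and must be selected from this list: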

[resnet, alexnet, vgg, squeezenet, densenet, inception]

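The other inputs are as follows: num_classes is the number of classes in the dataset, batch_size is the batch size used for training and may be adjusted according to the capability of your machine, num_epochs is the number of training epochs we want to run, and feature_extract is a boolean that defines whether we are finetuning or feature extracting. If feature_extract = False, the model is finetuned and all model parameters are updated. If feature_extract = True, only the last layer parameters are updated and the others remain fixed.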

# Top level data directory. Here we assume the format of the directory conforms
#   to the ImageFolder structure
data_dir = "./data/hymenoptera_data"

# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "squeezenet"

# Number of classes in the dataset
num_classes = 2

# Batch size for training (change depending on how much memory you have)
batch_size = 8

# Number of epochs to train for
num_epochs = 15

# Flag for feature extracting. When False, we finetune the whole model,
#   when True we only update the reshaped layer params
feature_extract = True

Helper Functions

Before we write the code for adjusting the models, let's define a few helper functions.

Model Training and Validation Code

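The train_model function handles the training and validation of a given model. As input, it takes a PyTorch model, a dictionary of dataloaders, a loss function, an optimizer, a specified number of epochs to train and validate for, and a boolean flag for when the model is an Inception model. The is_inception flag is used to accommodate the Inception v3 model, as that architecture uses an auxiliary output and the overall model loss respects both the auxiliary output and the final output, as described in the discussion thread referenced in the code comment. The function trains for the specified number of epochs and runs a full validation step after each epoch. It also keeps track of the best performing model (in terms of validation accuracy), and at the end of training returns the best performing model. After each epoch, the training and validation accuracies are printed.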

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
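
The reason train_model wraps model.state_dict() in copy.deepcopy can be illustrated with a minimal sketch. It assumes a hypothetical one-layer model and simulates "further training" with an in-place update; it is not part of the tutorial's pipeline. The tensors returned by state_dict() share storage with the live parameters, so a snapshot taken without a deep copy would silently follow later updates.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # hypothetical stand-in for a real network

# state_dict() returns tensors that share storage with the live parameters.
shallow_snapshot = model.state_dict()
# deepcopy produces independent tensors, as train_model does for best_model_wts.
deep_snapshot = copy.deepcopy(model.state_dict())
weight_before = model.weight.detach().clone()

# Simulate further training by changing the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

# The shallow snapshot followed the update; the deep copy did not.
shallow_changed = not torch.equal(shallow_snapshot["weight"], weight_before)
deep_unchanged = torch.equal(deep_snapshot["weight"], weight_before)
print(shallow_changed, deep_unchanged)

# Restoring from the deep copy brings the earlier weights back.
model.load_state_dict(deep_snapshot)
restored = torch.equal(model.weight, weight_before)
```

Without the deep copy, loading the "best" weights at the end of training would simply reload the final weights.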

Set Model Parameters’ .requires_grad attribute

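This helper function sets the .requires_grad attribute of the parameters in the model to False when we are feature extracting. By default, when we load a pretrained model all of the parameters have .requires_grad = True, which is fine if we are training from scratch or finetuning. However, if we are feature extracting and only want to compute gradients for the newly initialized layer, then we want all of the other parameters to not require gradients. This will make more sense later.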

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
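
As a rough illustration of the effect, the sketch below freezes a small stand-in network and then replaces its last layer. The two-layer model and the count_trainable helper are hypothetical and exist only for this example; the freezing loop mirrors what set_parameter_requires_grad does when feature_extracting is True.

```python
import torch.nn as nn


def count_trainable(model):
    """Return the number of parameter elements that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Hypothetical stand-in for a pretrained network: a "backbone" plus a "head".
model = nn.Sequential(
    nn.Linear(8, 16),   # backbone: 8*16 + 16 = 144 parameters
    nn.ReLU(),
    nn.Linear(16, 10),  # original head: 16*10 + 10 = 170 parameters
)
total_before = count_trainable(model)

# Freeze everything, as the helper does when feature extracting.
for param in model.parameters():
    param.requires_grad = False
frozen = count_trainable(model)

# A freshly constructed layer has requires_grad=True by default.
model[2] = nn.Linear(16, 2)  # new head: 16*2 + 2 = 34 parameters
after_reshape = count_trainable(model)

print(total_before, frozen, after_reshape)  # 314 0 34
```

Only the 34 parameters of the new head remain trainable, which is the situation feature extraction relies on.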

Initialize and Reshape the Networks

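Now to the most interesting part. Here is where we handle the reshaping of each network. Note that this is not an automatic procedure and is unique to each model. Recall that the final layer of a CNN model, which is often an FC layer, has the same number of nodes as the number of output classes in the dataset. Since all of the models have been pretrained on Imagenet, they all have output layers of size 1000, one node for each class. The goal here is to reshape the last layer to have the same number of inputs as before, and to have the same number of outputs as the number of classes in the dataset. In the following sections we discuss how to alter the architecture of each model individually. But first, there is one important detail regarding the difference between finetuning and feature extraction.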

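When feature extracting, we only want to update the parameters of the last layer, or in other words, the parameters of the layer(s) we are reshaping. Therefore, we do not need to compute the gradients of the parameters that we are not changing, so for efficiency we set their .requires_grad attribute to False. This is important because by default this attribute is set to True. Then, when we initialize the new layer, its parameters have .requires_grad = True by default, so only the new layer's parameters are updated. When we are finetuning, we can leave all of the .requires_grad attributes set to the default of True.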

Finally, notice that inception_v3 requires an input size of (299,299), whereas all of the other models expect (224,224).

Resnet

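Resnet was introduced in the paper Deep Residual Learning for Image Recognition. There are several variants of different sizes, including Resnet18, Resnet34, Resnet50, Resnet101, and Resnet152, all of which are available from torchvision models. Here we use Resnet18, as our dataset is small and only has two classes. When we print the model, we see that the last layer is a fully connected layer as shown below: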

(fc): Linear(in_features=512, out_features=1000, bias=True)

Thus, we must reinitialize model.fc to be a Linear layer with 512 input features and 2 output features with:

model.fc = nn.Linear(512, num_classes)

Alexnet

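Alexnet was introduced in the paper ImageNet Classification with Deep Convolutional Neural Networks and was the first very successful CNN on the ImageNet dataset. When we print the model architecture, we see the model output comes from the 6th layer of the classifier: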

(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
 )

To use the model with our dataset we reinitialize this layer as

model.classifier[6] = nn.Linear(4096,num_classes)

VGG

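VGG was introduced in the paper Very Deep Convolutional Networks for Large-Scale Image Recognition. Torchvision offers eight versions of VGG with various lengths, some of which have batch normalization layers. Here we use VGG-11 with batch normalization. The output layer is similar to Alexnet, i.e.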

(classifier): Sequential(
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
 )

Therefore, we use the same technique to modify the output layer

model.classifier[6] = nn.Linear(4096,num_classes)

Squeezenet

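The Squeezenet architecture is described in the paper SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size and uses a different output structure than any of the other models shown here. Torchvision has two versions of Squeezenet; we use version 1.0. The output comes from a 1x1 convolutional layer, which sits at index 1 of the classifier: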

(classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
 )

To modify the network, we reinitialize the Conv2d layer to have an output feature map of depth 2 as

model.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))

Densenet

Densenet was introduced in the paper Densely Connected Convolutional Networks. Torchvision has four variants of Densenet, but here we only use Densenet-121. The output layer is a linear layer with 1024 input features:

(classifier): Linear(in_features=1024, out_features=1000, bias=True)

To reshape the network, we reinitialize the classifier's linear layer as

model.classifier = nn.Linear(1024, num_classes)

Inception v3

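Finally, Inception v3 was first described in Rethinking the Inception Architecture for Computer Vision. This network is unique because it has two output layers when training. The second output is known as an auxiliary output and is contained in the AuxLogits part of the network. The primary output is a linear layer at the end of the network. Note that when testing we only consider the primary output. The auxiliary output and primary output of the loaded model are printed as: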

(AuxLogits): InceptionAux(
    ...
    (fc): Linear(in_features=768, out_features=1000, bias=True)
 )
 ...
(fc): Linear(in_features=2048, out_features=1000, bias=True)

To finetune this model we must reshape both layers. This is accomplished with the following

model.AuxLogits.fc = nn.Linear(768, num_classes)
model.fc = nn.Linear(2048, num_classes)
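
During training the two outputs are combined into a single loss, using the same 0.4 weighting that train_model applies. The sketch below shows only that arithmetic: the logits are random tensors standing in for the two Inception v3 outputs, so no Inception model is built here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# Hypothetical stand-ins for the two training-mode outputs (batch of 4, 2 classes).
outputs = torch.randn(4, 2)      # primary classifier logits
aux_outputs = torch.randn(4, 2)  # auxiliary classifier logits
labels = torch.tensor([0, 1, 1, 0])

# Weighted sum of the primary and auxiliary losses, as in train_model.
loss1 = criterion(outputs, labels)
loss2 = criterion(aux_outputs, labels)
loss = loss1 + 0.4 * loss2

# Predictions are taken from the primary output only.
_, preds = torch.max(outputs, 1)
print(loss.item(), preds.tolist())
```

At validation time the auxiliary term is dropped and only the primary loss is computed.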

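Notice that many of the models have similar output structures, but each must be handled slightly differently. Also, check the printed model architecture of the reshaped network and make sure the number of output features is the same as the number of classes in the dataset.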

def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        """ Resnet18
        """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG11_bn
        """
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxilary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299

    else:
        print("Invalid model name, exiting...")
        exit()

    return model_ft, input_size

# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Print the model we just instantiated
print(model_ft)

Out:

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 96, kernel_size=(7, 7), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(96, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (6): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (9): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 2, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)

Load Data

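Now that we know what the input size must be, we can initialize the data transforms, image datasets, and the dataloaders. Notice that the models were pretrained with hard-coded normalization values (the per-channel mean and standard deviation passed to transforms.Normalize), as described in the torchvision models documentation.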

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Out:

Initializing Datasets and Dataloaders...
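
To make the normalization step concrete, the sketch below applies the same per-channel mean and standard deviation by hand, which is the arithmetic transforms.Normalize performs on a tensor image: each channel has its mean subtracted and is divided by its standard deviation. The constant-valued 2x2 image is a hypothetical input chosen only to keep the numbers easy to follow.

```python
import torch

# Per-channel statistics used in the transforms above, shaped for broadcasting.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical image as produced by ToTensor(): shape (C, H, W), values in [0, 1].
image = torch.full((3, 2, 2), 0.5)

# Subtract the channel mean and divide by the channel standard deviation.
normalized = (image - mean) / std

# One pixel, three channels: each channel is shifted and scaled differently.
print(normalized[:, 0, 0])
```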

Create the Optimizer

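Now that the model structure is correct, the final step for finetuning and feature extracting is to create an optimizer that only updates the desired parameters. Recall that after loading the pretrained model, but before reshaping, if feature_extract = True we manually set all of the parameters' .requires_grad attributes to False. Then the reinitialized layer's parameters have .requires_grad = True by default. So now we know that all parameters that have .requires_grad = True should be optimized. Next, we make a list of such parameters and pass this list to the SGD algorithm constructor.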

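To verify this, check the printed parameters to learn. When finetuning, this list should be long and include all of the model parameters. However, when feature extracting this list should be short and only include the weights and biases of the reshaped layers.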

# Send the model to GPU
model_ft = model_ft.to(device)

# Gather the parameters to be optimized/updated in this run. If we are
#  finetuning we will be updating all parameters. However, if we are
#  doing feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  is True.
params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad == True:
            print("\t",name)

# Optimize only the parameters gathered above (all of them when finetuning)
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)

Out:

Params to learn:
         classifier.1.weight
         classifier.1.bias
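
The effect of passing only the requires_grad parameters to the optimizer can be checked on a small scale. The following sketch assumes a hypothetical two-layer stand-in model and a random batch, takes a single SGD step, and compares the weights before and after; the learning rate of 0.1 is arbitrary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in: frozen "backbone" (index 0) and a new trainable head (index 2).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 10))
for param in model.parameters():
    param.requires_grad = False
model[2] = nn.Linear(16, 2)

# Gather only the parameters that require gradients, as in the code above.
params_to_update = [p for p in model.parameters() if p.requires_grad]
names_to_update = [n for n, p in model.named_parameters() if p.requires_grad]
optimizer = optim.SGD(params_to_update, lr=0.1, momentum=0.9)

backbone_before = model[0].weight.detach().clone()
head_before = model[2].weight.detach().clone()

# One training step on a random batch of 4 samples.
criterion = nn.CrossEntropyLoss()
loss = criterion(model(torch.randn(4, 8)), torch.tensor([0, 1, 0, 1]))
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(model[0].weight, backbone_before)
head_changed = not torch.equal(model[2].weight, head_before)
print(names_to_update, backbone_unchanged, head_changed)
```

The frozen layer never receives a gradient, so its weights stay exactly as they were, while the new head moves after the step.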

Run Training and Validation Step

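Finally, the last step is to set up the loss for the model, then run the training and validation function for the set number of epochs. Notice that, depending on the number of epochs, this step may take a while on a CPU. Also, the default learning rate is not optimal for all of the models, so to achieve maximum accuracy it would be necessary to tune it for each model separately.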

# Setup the loss fxn
criterion = nn.CrossEntropyLoss()

# Train and evaluate
model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))

Out:

Epoch 0/14
----------
train Loss: 0.6181 Acc: 0.6803
val Loss: 0.3712 Acc: 0.8954

Epoch 1/14
----------
train Loss: 0.3051 Acc: 0.8648
val Loss: 0.3308 Acc: 0.9020

Epoch 2/14
----------
train Loss: 0.2560 Acc: 0.8852
val Loss: 0.3174 Acc: 0.9150

Epoch 3/14
----------
train Loss: 0.2221 Acc: 0.8975
val Loss: 0.3219 Acc: 0.9216

Comparison with Model Trained from Scratch

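Just for fun, let's see how the model learns if we do not use transfer learning. The performance of finetuning vs. feature extracting depends largely on the dataset, but in general both transfer learning methods produce favorable results in terms of training time and overall accuracy versus a model trained from scratch.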

# Initialize the non-pretrained version of the model used for this run
scratch_model,_ = initialize_model(model_name, num_classes, feature_extract=False, use_pretrained=False)
scratch_model = scratch_model.to(device)
scratch_optimizer = optim.SGD(scratch_model.parameters(), lr=0.001, momentum=0.9)
scratch_criterion = nn.CrossEntropyLoss()
_,scratch_hist = train_model(scratch_model, dataloaders_dict, scratch_criterion, scratch_optimizer, num_epochs=num_epochs, is_inception=(model_name=="inception"))

# Plot the training curves of validation accuracy vs. number
#  of training epochs for the transfer learning method and
#  the model trained from scratch
ohist = []
shist = []

ohist = [h.cpu().numpy() for h in hist]
shist = [h.cpu().numpy() for h in scratch_hist]

plt.title("Validation Accuracy vs. Number of Training Epochs")
plt.xlabel("Training Epochs")
plt.ylabel("Validation Accuracy")
plt.plot(range(1,num_epochs+1),ohist,label="Pretrained")
plt.plot(range(1,num_epochs+1),shist,label="Scratch")
plt.ylim((0,1.))
plt.xticks(np.arange(1, num_epochs+1, 1.0))
plt.legend()
plt.show()

Out:

Epoch 0/14
----------
train Loss: 0.6963 Acc: 0.4918
val Loss: 0.6931 Acc: 0.4641

Epoch 1/14
----------
train Loss: 0.6936 Acc: 0.4959
val Loss: 0.6931 Acc: 0.4575

Epoch 2/14
----------
train Loss: 0.6925 Acc: 0.4959
val Loss: 0.6931 Acc: 0.4575

Epoch 3/14
----------
train Loss: 0.6942 Acc: 0.4959
val Loss: 0.6931 Acc: 0.4575

Final Thoughts and Where to Go Next

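Try running some of the other models and see how good the accuracy gets. Also, notice that feature extracting takes less time because in the backward pass we do not have to calculate most of the gradients. There are many places to go from here. You could: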

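Run this code with a harder dataset and see some more benefits of transfer learning
Using the methods described here, use transfer learning to update a different model, perhaps in a new domain (e.g. NLP, audio, etc.)
Once you are happy with a model, you can export it as an ONNX model, or trace it using the hybrid frontend for more speed and optimization opportunities.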
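
As a rough sketch of the tracing option mentioned in the last item, torch.jit.trace records the operations a model performs on an example input and returns a traced module. The small convolutional network and the 32x32 input below are hypothetical stand-ins for a finetuned model; ONNX export is not shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a finetuned model with a 2-class head.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)
model.eval()

# Trace the model with an example input of the shape it will see at inference.
example = torch.randn(1, 3, 32, 32)
traced = torch.jit.trace(model, example)

# The traced module should reproduce the original model's output.
with torch.no_grad():
    same = torch.allclose(model(example), traced(example))
print(same)
```

Tracing only records the path taken for the example input, so a model whose control flow depends on the data may need a different approach.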