CV transfer learning based on torchvision

We have used CIFAR-10 before. Since the model here is larger and can understand more complex data, we use a more complex dataset this time: CIFAR-100. As the name implies, it is a 100-category image dataset, with more classes and more variety than CIFAR-10.
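As a quick sanity check (a minimal sketch, reusing the same 'data' root as the code below; it downloads the dataset on first run), the label space can be inspected directly:

import torchvision

#CIFAR-100 ships 100 class names, versus CIFAR-10's 10
cifar100 = torchvision.datasets.CIFAR100(root='data', train=True, download=True)
print(len(cifar100.classes))  #100
print(cifar100.classes[:5])   #e.g. ['apple', 'aquarium_fish', 'baby', 'bear', 'beaver']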

define dataset

import torchvision
import torch


#define the dataset
class Dataset(torch.utils.data.Dataset):

    def __init__(self, train):

        #load the dataset online
        #more datasets: https://pytorch.org/vision/stable/datasets.html
        self.data = torchvision.datasets.CIFAR100(root='data',
                                                  train=train,
                                                  download=True)

        #more augmentations: https://pytorch.org/vision/stable/transforms.html
        self.compose = torchvision.transforms.Compose([

            #originally 32x32, scaled up to 300x300 to match what the pretrained
            #model expects, making it easier for it to extract image features
            torchvision.transforms.Resize(300),

            #random horizontal flip, a form of augmentation; obviously,
            #flipping left-right does not change the class of the image
            torchvision.transforms.RandomHorizontalFlip(p=0.5),

            #convert the image to a tensor with values in [0, 1]
            torchvision.transforms.ToTensor(),

            #normalize each of the 3 channels to its own normal distribution;
            #these statistics were computed on a large dataset (ImageNet),
            #and this projection also matches what the pretrained model expects
            torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                             std=[0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        #fetch one sample
        x, y = self.data[i]

        #apply compose, converting the image to a tensor
        x = self.compose(x)

        return x, y


dataset = Dataset(train=True)

x, y = dataset[0]

print(len(dataset), x.shape, y)

Here we load the dataset through torchvision, which can download datasets online; more datasets can be found through the link in the comments above. root='data' is the path where the downloaded dataset is cached on local disk. The train parameter is a Boolean that selects between the training split and the test split; a variable is used here because we will need both.

The compose variable uses another facility provided by torchvision: image augmentation (the full list is behind the link in the comments). The commonly used ones are demonstrated here. The first is Resize, which scales the image: it was originally 32x32 and is scaled up to 300x300. This matches what the pretrained model expects and makes feature extraction easier, because the pretrained model was trained on 300x300 images. The second is a random horizontal flip with probability 0.5; for CIFAR-100, flipping an image left-right does not change its class, so this augmentation simply makes the dataset richer. ToTensor then converts the image into a tensor with values between 0 and 1. Finally we normalize the data, so that the three channels each follow a normal distribution with the mean and standard deviation given above; these statistics come from ImageNet and, again, match what the pretrained model expects.

Then come __len__ and __getitem__. Note that __getitem__ returns a single sample, not a batch: it fetches one image, applies compose to augment it, and converts it into a tensor.
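As a quick check of what compose actually produces, a minimal sketch that reuses the dataset defined above:

#one sample before and after the transforms: PIL image in, normalized tensor out
raw_img, label = dataset.data[0]  #the underlying CIFAR100 object returns a PIL image
x, y = dataset[0]                 #after Resize/flip/ToTensor/Normalize

print(raw_img.size)      #(32, 32)
print(x.shape, x.dtype)  #torch.Size([3, 300, 300]) torch.float32
#after Normalize, values are no longer confined to [0, 1]
print(x.min().item(), x.max().item())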

define loader

#called back every time the loader assembles a batch; data tidying can be done here
#what is written here is only an example; in fact this callback changes nothing,
#it just reproduces what the default collate function already does
def collate_fn(data):
    #unpack the samples
    x = [i[0] for i in data]
    y = [i[1] for i in data]

    #for example, data formats can be converted by hand here
    x = torch.stack(x)
    y = torch.LongTensor(y)

    return x, y


#data loader
loader = torch.utils.data.DataLoader(dataset=dataset,
                                     batch_size=8,
                                     shuffle=True,
                                     drop_last=True,
                                     collate_fn=collate_fn)

x, y = next(iter(loader))

print(len(loader), x.shape, y)

There is not much to say about the loader itself. The only thing worth mentioning is collate_fn: it is called back every time a batch is assembled from the loader, so any per-batch data tidying can be done inside it.

6250 torch.Size([8, 3, 300, 300]) tensor([50, 54, 98, 51, 77, 96, 72, 81])

As expected, x is 8 images and y is 8 integers, with values between 0 and 99.
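If collate_fn ever needs to do real work, here is a hedged sketch of one possible variant; the dict keys are made up for illustration, and the training loop below expects a plain (x, y) pair, not this format:

#a variant that does more than the default: builds a dict batch with one-hot labels
def collate_fn_dict(data):
    x = torch.stack([i[0] for i in data])
    y = torch.LongTensor([i[1] for i in data])
    return {
        'pixel_values': x,
        'labels': y,
        'labels_onehot': torch.nn.functional.one_hot(y, num_classes=100).float(),
    }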

transfer learning

A typical model first reads in the data, then extracts features layer by layer, finally condensing the data into a vector that is fed into a fully connected network for classification. For a trained model, many of those layers can be reused. Suppose, for example, an existing model was trained for regression and I no longer want regression. The fix is simple: cut off the last layer, attach a few new layers in its place, and let those new layers decide whether the model does classification or regression. The earlier layers do not need to be trained again, or are at least mostly trained already, so even retraining them would be much easier.

This is transfer learning. Its core idea is to reuse a previously trained model, and in particular the parameters of its shallower layers, because those layers are responsible for extracting features from the image data. They can be reused in my new model because it, too, needs feature extraction.
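The "cut off the head, keep the body" idea fits in a few lines. A minimal sketch using torchvision's resnet18 purely as an example (the model actually used in this post, EfficientNetV2-S, comes next):

import torch
import torchvision

#take a pretrained backbone and replace only its final layer
backbone = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)

#resnet18's head is a single Linear(512, 1000); swap in a head for our own task
backbone.fc = torch.nn.Linear(512, 100)  #classification with 100 classes
#backbone.fc = torch.nn.Linear(512, 1)   #regression would just change the head

#freeze everything except the new head
for name, param in backbone.named_parameters():
    param.requires_grad_(name.startswith('fc'))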

define model

As described above, we need a pretrained model, and torchvision handles this: it provides many pretrained models, with more options behind the link in the comments. After loading one through torchvision, we reassemble it, keeping only the feature-extraction part, and attach a fully connected output layer at the end; that layer decides whether the model classifies or regresses.

class Model(torch.nn.Module):

    def __init__(self):
        super().__init__()

        #load the pretrained model
        #more models: https://pytorch.org/vision/stable/models.html#table-of-all-available-classification-weights
        pretrained = torchvision.models.efficientnet_v2_s(
            weights=torchvision.models.EfficientNet_V2_S_Weights.IMAGENET1K_V1)

        #reassemble the model, keeping only the feature extraction part
        pretrained = torch.nn.Sequential(
            pretrained.features,
            pretrained.avgpool,
            torch.nn.Flatten(start_dim=1),
        )

        #freeze the parameters, they will not be trained
        for param in pretrained.parameters():
            param.requires_grad_(False)

        pretrained.eval()
        self.pretrained = pretrained

        #linear output layers, this part will be trained from scratch
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(1280, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 100),
        )

    def forward(self, x):
        #run the pretrained model to extract features; since it is frozen,
        #no gradients need to be computed here
        with torch.no_grad():
            #[8, 3, 300, 300] -> [8, 1280]
            x = self.pretrained(x)

        #compute the linear output
        #[8, 1280] -> [8, 100]
        return self.fc(x)


model = Model()

x = torch.randn(8, 3, 300, 300)

print(model.pretrained(x).shape, model(x).shape)
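One way to confirm the freeze actually took is to count frozen versus trainable parameters; a small sketch against the model defined above:

#only the fc head should carry gradients, the backbone is frozen
frozen = sum(p.numel() for p in model.pretrained.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('frozen:', frozen, 'trainable:', trainable)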

model training

#training
def train():
    #note the parameter list here: it only needs to include the parameters to train
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fun = torch.nn.CrossEntropyLoss()
    model.fc.train()

    #define the compute device, preferring the gpu
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)

    print('device=', device)

    for i, (x, y) in enumerate(loader):
        #when using the gpu, the data has to be moved to gpu memory
        x = x.to(device)
        y = y.to(device)

        out = model(x)
        loss = loss_fun(out, y)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if i % 500 == 0:
            acc = (out.argmax(dim=1) == y).sum().item() / len(y)
            print(i, loss.item(), acc)

    #save the model; only the trained part needs to be saved
    torch.save(model.fc.to('cpu'), 'model/8.model')

Defining the device is the usual pattern: if a GPU is available, use it for computation; otherwise fall back to the CPU.
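To actually run it, one caveat worth hedging: torch.save does not create directories, so make sure the model directory exists first.

import os

os.makedirs('model', exist_ok=True)  #torch.save will not create the directory itself
train()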

test

@torch.no_grad()
def test():

    #load the saved model
    model.fc = torch.load('model/8.model')
    model.fc.eval()

    #make sure the whole model is on the cpu; after train() the frozen
    #backbone may still be sitting on the gpu
    model.to('cpu')

    #load the test split, 10000 samples in total
    loader_test = torch.utils.data.DataLoader(dataset=Dataset(train=False),
                                              batch_size=8,
                                              shuffle=True,
                                              drop_last=True)

    correct = 0
    total = 0
    #iterate over the loader once, stopping after 100 batches;
    #note: re-creating the iterator with next(iter(...)) on every step
    #would reshuffle each time and could sample the same batch twice
    for i, (x, y) in enumerate(loader_test):
        if i == 100:
            break

        #the amount of data is small, so computing on the cpu is fine
        out = model(x).argmax(dim=1)

        correct += (out == y).sum().item()
        total += len(y)

    print(correct / total)

Here the test split is loaded; note that train is False this time. The final accuracy is about 70%, which is quite good for a 100-class problem where only the output head was trained.
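A note on the save format: this post saves the whole fc module object, which is convenient but ties the file to the exact class layout. A common alternative, shown here as a hedged sketch with a made-up filename, is to save only the weights:

#alternative: save/load only the state_dict (weights), not the module object
torch.save(model.fc.state_dict(), 'model/8.state')

#loading then requires constructing the same fc architecture first
model.fc.load_state_dict(torch.load('model/8.state'))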
