PyTorch transfer learning: training ResNet50 to classify cats and dogs

Table of contents

1. ResNet residual network

1.1 ResNet definition

 1.2 Several network configurations of ResNet

 1.3 ResNet50 network structure

1.3.1 The first few layers of convolution and pooling

1.3.2 Residual Block: Building a Deep Residual Network

1.3.3 ResNet main body: stacking multiple residual blocks

1.4 Transfer learning in practice: cat vs. dog binary classification

1.4.1 Transfer Learning

1.4.2 Model training

1.4.3 Model Prediction


1. ResNet residual network

1.1 ResNet definition

Deep learning has made major breakthroughs in areas such as image classification, object detection, and speech recognition. However, as networks get deeper, the problems of vanishing and exploding gradients become more prominent. With vanishing gradients, the gradient signal shrinks as it is backpropagated through many layers, so the early layers learn very slowly and the network struggles to converge. With exploding gradients, the parameter updates become too large and training also fails to converge.

To address these problems, ResNet introduces the residual block. Instead of learning the full mapping H(x) directly, the stacked layers in a residual block learn the residual mapping F(x) = H(x) - x, and the block outputs F(x) + x. This alleviates the vanishing gradient problem and makes deep networks easier to train.
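A toy calculation can make the effect of the shortcut concrete. The sketch below assumes a chain of scalar layers that all share the same small local derivative, which is a simplification for illustration and not the actual ResNet computation; the helper name `chain_gradient` is hypothetical.

```python
def chain_gradient(local_derivs, residual=False):
    """Gradient of the output w.r.t. the input of a chain of scalar layers.

    Plain layer:    y = f(x)      ->  dy/dx = f'(x)
    Residual layer: y = x + f(x)  ->  dy/dx = 1 + f'(x)
    """
    grad = 1.0
    for d in local_derivs:
        grad *= (1.0 + d) if residual else d
    return grad


derivs = [0.1] * 20  # 20 layers, each with a small local derivative (assumed)
plain_grad = chain_gradient(derivs)
residual_grad = chain_gradient(derivs, residual=True)
print(f"plain: {plain_grad:.3e}, residual: {residual_grad:.3f}")
```

In the plain chain the gradient is a product of small numbers and collapses toward zero, while the identity term contributed by each shortcut keeps the residual chain's gradient from vanishing.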

The figure below shows a basic residual block. A skip (shortcut) connection carries the input of a layer past one or more layers and adds it to their output just before the activation function, so the activation is applied to the sum of the two.
 

[Figure: a basic residual block]

 1.2 Several network configurations of ResNet

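The standard configurations differ in the block type they use and in how many blocks each of the four stages stacks:

Model        Block type    Blocks per stage (conv2_x to conv5_x)
ResNet-18    basic         2, 2, 2, 2
ResNet-34    basic         3, 4, 6, 3
ResNet-50    bottleneck    3, 4, 6, 3
ResNet-101   bottleneck    3, 4, 23, 3
ResNet-152   bottleneck    3, 8, 36, 3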

 1.3 ResNet50 network structure

ResNet-50 is a deep residual network with 50 weighted layers: 49 convolutional layers plus one fully connected layer. The structure looks complex, but it can be divided into the following modules:
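The layer count can be checked with a little arithmetic. This sketch follows the usual counting convention, which is an assumption worth stating: the 1x1 projection convolutions on the shortcut paths and the pooling layers are not counted.

```python
# Bottleneck blocks per stage (conv2_x .. conv5_x) in ResNet-50
blocks_per_stage = [3, 4, 6, 3]
convs_per_bottleneck = 3  # 1x1 -> 3x3 -> 1x1

stem_convs = 1  # the initial 7x7 convolution
fc_layers = 1   # the final fully connected classifier

conv_layers = stem_convs + convs_per_bottleneck * sum(blocks_per_stage)
total_layers = conv_layers + fc_layers
print(conv_layers, total_layers)  # 49 50
```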

1.3.1 The first few layers of convolution and pooling

import torch
import torch.nn as nn

class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()
        # Stem: 7x7 convolution with stride 2, followed by batch norm and ReLU
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        # 3x3 max pooling with stride 2 halves the spatial size again
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
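To see what these first layers do to the spatial size, the standard output-size formula can be applied by hand. The sketch below assumes a 224x224 input (the size used later in this article); the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224  # assumed input height/width
after_conv1 = conv_output_size(size, kernel_size=7, stride=2, padding=3)
after_maxpool = conv_output_size(after_conv1, kernel_size=3, stride=2, padding=1)
print(after_conv1, after_maxpool)  # 112 56
```

So the stem reduces a 224x224 image to a 56x56 feature map with 64 channels before the first residual stage.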

1.3.2 Residual Block: Building a Deep Residual Network

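class ResidualBlock(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand."""
    expansion = 4  # the block outputs out_channels * expansion channels

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        # 1x1 convolution reduces the channel count (and applies the stride)
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 convolution works on the reduced number of channels
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution expands the channels by a factor of 4
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        # Optional projection so the shortcut matches the shape of the main path
        self.downsample = downsample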

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
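One way to see why the block uses the 1x1, 3x3, 1x1 bottleneck layout is to count convolution weights. The sketch below compares a bottleneck with 256 input channels against a hypothetical alternative of two full-width 3x3 convolutions; batch norm parameters are ignored, and the helper name `conv_weights` is an assumption for illustration.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a bias-free 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size


# Bottleneck, as in ResidualBlock(in_channels=256, out_channels=64)
bottleneck = (
    conv_weights(256, 64, 1)     # conv1: reduce channels
    + conv_weights(64, 64, 3)    # conv2: 3x3 on the reduced width
    + conv_weights(64, 256, 1)   # conv3: expand back by 4x
)

# Hypothetical alternative: two 3x3 convolutions at full width
two_full_3x3 = 2 * conv_weights(256, 256, 3)

print(bottleneck, two_full_3x3)  # 69632 1179648
```

The bottleneck needs roughly 17 times fewer weights here, which is what makes a 50-layer network affordable.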

1.3.3 ResNet main body: stacking multiple residual blocks

In ResNet-50, we stack multiple residual blocks to build the whole network. Each residual block processes the input feature map and outputs a richer feature map. Stacking blocks lets the network extract information layer by layer along its depth and so obtain higher-level semantic features. The code is as follows:

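class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        # ... stem layers from section 1.3.1 ...

        # Number of channels entering the first stage; updated by _make_layer
        self.in_channel = 64

        # Stage 1 of 4 (conv2_x): 3 residual blocks
        self.layer1 = self._make_layer(ResidualBlock, 64, 3, stride=1)
        # Stage 2 of 4 (conv3_x): 4 residual blocks
        self.layer2 = self._make_layer(ResidualBlock, 128, 4, stride=2)
        # Stage 3 of 4 (conv4_x): 6 residual blocks
        self.layer3 = self._make_layer(ResidualBlock, 256, 6, stride=2)
        # Stage 4 of 4 (conv5_x): 3 residual blocks
        self.layer4 = self._make_layer(ResidualBlock, 512, 3, stride=2)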

The _make_layer method stacks the basic bottleneck residual block to build each stage. The code is as follows:

def _make_layer(self, block, channel, block_num, stride=1):
    """
    block: the basic module to stack; it must define a class attribute `expansion` (4 for the bottleneck)
    channel: number of filters in the first convolution of each block in this stage; for ResNet50: 64, 128, 256, 512
    block_num: number of blocks stacked in the current stage
    stride: stride of the first block in the stage (default 1)
    """
    downsample = None  # projection applied on the shortcut path, if needed
    # For ResNet50: conv2_x keeps H and W but multiplies the channels by 4, so the
    # shortcut must also expand its channels by 4. conv3_x, conv4_x and conv5_x
    # halve H and W and also expand the shortcut channels by 4.
    if stride != 1 or self.in_channel != channel * block.expansion:
        downsample = nn.Sequential(
            # out_channels expands the channels by 4; stride halves H and W
            nn.Conv2d(in_channels=self.in_channel, out_channels=channel * block.expansion, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(num_features=channel * block.expansion))

    layers = []  # the blocks of stage conv{i}_x, i in {2, 3, 4, 5}
    # First block of the stage: the only one that takes downsample and stride
    layers.append(block(self.in_channel, channel, stride=stride, downsample=downsample))
    # Later blocks, and the next call to _make_layer, see 4x the channels
    self.in_channel = channel * block.expansion

    for _ in range(1, block_num):  # stack the remaining block_num - 1 blocks
        layers.append(block(self.in_channel, channel))

    return nn.Sequential(*layers)  # '*' unpacks the list into positional arguments
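The channel bookkeeping in _make_layer can be traced without building any layers. The sketch below mirrors its logic in plain Python under the same assumptions (64 channels after the stem, expansion factor 4); `plan_stage` is a hypothetical helper, not part of the model code.

```python
EXPANSION = 4  # a bottleneck outputs 4x its base channel count


def plan_stage(in_channel, channel, block_num, stride):
    """Return (blocks, new_in_channel); each block is (in, out, stride, has_downsample)."""
    out_channel = channel * EXPANSION
    needs_downsample = stride != 1 or in_channel != out_channel
    blocks = [(in_channel, out_channel, stride, needs_downsample)]
    # The remaining blocks keep the shape: stride 1 and an identity shortcut
    for _ in range(1, block_num):
        blocks.append((out_channel, out_channel, 1, False))
    return blocks, out_channel


in_channel = 64  # channels after the stem
stages = [(64, 3, 1), (128, 4, 2), (256, 6, 2), (512, 3, 2)]
first_blocks = []
for channel, block_num, stride in stages:
    blocks, in_channel = plan_stage(in_channel, channel, block_num, stride)
    first_blocks.append(blocks[0])
    print(blocks[0], f"+ {block_num - 1} identity blocks")
```

Only the first block of every stage needs a projection shortcut: in stage 1 because the channels grow from 64 to 256, and in the later stages because the stride is 2 and the channels double.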

1.4 Transfer learning in practice: cat vs. dog binary classification

1.4.1 Transfer Learning

Transfer learning is a machine learning technique that transfers the knowledge or features learned on one task to another, related task, which speeds up training and often improves performance. In practice, we take a model that has been pretrained on a large-scale dataset (the source task) and reuse its weights for a new task (the target task), instead of training a new model from scratch.

The core idea of transfer learning is that, before solving a new task, we can reuse general features or knowledge from tasks that have already been learned. Because the source model has been trained thoroughly on a large dataset, it has already learned many general-purpose features, such as edges and textures, that are useful for many other tasks.

1.4.2 Model training

First, we need to prepare a dataset for binary classification of cats and dogs. The dataset can be downloaded from Kaggle and contains a large number of cat and dog pictures.

After downloading the dataset, split it into a training set and a test set. Create a folder named train containing two subfolders, cat and dog, each holding the pictures of that class. Create a folder named test with the same layout. We then fine-tune the ResNet50 model on the GPU, save the trained model locally, and measure its prediction accuracy on the test set once training is complete.
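The folder layout described above can be produced with a short script. This is a minimal sketch under stated assumptions: the downloaded images sit in one folder and are named like cat.0.jpg and dog.0.jpg, and 20% of each class goes to the test set. The function name `split_dataset`, the ratio, and the demo paths are all hypothetical. The demo at the end runs on empty placeholder files in a temporary directory.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_dataset(source_dir, output_dir, test_ratio=0.2, seed=42):
    """Copy images into output_dir/{train,test}/{cat,dog} based on the file name prefix."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for label in ("cat", "dog"):
        files = sorted(source_dir.glob(f"{label}.*.jpg"))
        rng.shuffle(files)
        n_test = int(len(files) * test_ratio)
        for split, subset in (("test", files[:n_test]), ("train", files[n_test:])):
            target = output_dir / split / label
            target.mkdir(parents=True, exist_ok=True)
            for path in subset:
                shutil.copy2(path, target / path.name)
            counts[(split, label)] = len(subset)
    return counts


# Demo on placeholder files (not real images)
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    for i in range(10):
        (raw / f"cat.{i}.jpg").touch()
        (raw / f"dog.{i}.jpg").touch()
    demo_counts = split_dataset(raw, Path(tmp) / "data")
print(demo_counts)
```

With this layout, ImageFolder assigns class indices in alphabetical order of the folder names, so cat becomes 0 and dog becomes 1.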

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
from torchvision.datasets import ImageFolder
from torchvision.models import resnet50

# Set the random seed for reproducibility
torch.manual_seed(42)

# Define the hyperparameters
batch_size = 32
learning_rate = 0.001
num_epochs = 10

# Define the data transforms: resize, convert to a tensor, normalize to [-1, 1]
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load the datasets; ImageFolder uses each subfolder name as a class label
train_dataset = ImageFolder("train", transform=transform)
test_dataset = ImageFolder("test", transform=transform)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

# Load the pretrained ResNet-50 model
# (on torchvision >= 0.13, pretrained=True is deprecated in favor of weights=ResNet50_Weights.DEFAULT)
model = resnet50(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)  # Replace the final fully connected layer to fit the two-class problem

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

# Train the model
model.train()
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}")

# Save the entire model object; the "model" directory must already exist
torch.save(model, 'model/c.pth')

# Evaluate the model on the full test set
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score per image
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f"Accuracy on test images: {(correct / total) * 100:.2f}%")

1.4.3 Model Prediction

First, load the model we saved. Here we predict the class of a single image and print the result.

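import os

import matplotlib.pyplot as plt
import torch
import torchvision.transforms as transforms
from PIL import Image

# Workaround for the "duplicate OpenMP runtime" error seen on some setups
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the full model saved during training; map_location lets a GPU-trained model load on CPU.
# On PyTorch 2.6+, torch.load defaults to weights_only=True, so pass weights_only=False
# to load a whole pickled model (only for files you trust).
model = torch.load('model/c.pth', map_location=device)
print(model)
model.to(device)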

test_image_path = 'test/dogs/dog.4001.jpg'  # Replace with your test image path
image = Image.open(test_image_path).convert('RGB')  # make sure the image has 3 channels
# The preprocessing must match the transform used during training
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
input_tensor = transform(image).unsqueeze(0).to(device)  # Add a batch dimension and move to the device

# Set the model to evaluation mode
model.eval()


with torch.no_grad():
    outputs = model(input_tensor)
    _, predicted = torch.max(outputs, 1)
    predicted_label = predicted.item()


# ImageFolder assigns class indices in alphabetical order of the folder names
label = ['cat', 'dog']
print(label[predicted_label])
plt.axis('off')
plt.imshow(image)
plt.show()
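The script above reports only the winning class. If a confidence value is also wanted, the model's raw scores (logits) can be turned into probabilities with softmax. The sketch below does this in plain Python on made-up logits; in PyTorch the equivalent call would be torch.softmax(outputs, dim=1). The helper name `softmax` and the example values are assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cat", "dog"]
example_logits = [-1.2, 2.3]  # made-up values, not real model output
probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"{labels[best]}: {probs[best]:.1%}")
```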

[Figure: screenshot of the prediction run]

This concludes the article.


Origin blog.csdn.net/qq_43649937/article/details/131870303