[DL series] Detailed explanation of ResNet network structure and complete code implementation

Name: Deep Residual Learning for Image Recognition

Author: He Yuming team

Publiced: 2015.12_CVPR



foreword

1. Problems to be solved in residual network

  • network degradation

insert image description here

As the depth of the network increases, the amount of information that the network can obtain increases, and the extracted features become more abundant. However, before the residual structure was proposed, according to experiments, as the network layer continues to deepen, the accuracy of the model will continue to increase at first, reaching the maximum saturation value, and then as the depth of the network continues to increase, the accuracy of the model will not only decrease. Continue to increase, but there will be a significant decrease, that is, the error of the model training process and testing process is higher than that of the shallow model. This is due to the problem that the previous network model will cause gradient explosion and gradient disappearance as the network layer continues to deepen.

2. Highlights of the ResNet model

  1. Propose the Residual module
  2. Use Batch Normalization to speed up training (discard dropout)
  3. Residual network: easy to converge, solve the degradation problem well, the model can be very deep, and the accuracy rate is greatly improved

ResNet model structure

1. Residual Learning

  • Residual Structure 1

insert image description here

  • Residual structure 2

insert image description here

The residual network is a very effective network to alleviate the problem of gradient disappearance, which greatly improves the depth of the network that can be effectively trained.

  • Residual structure 1: The original paper will input X (that is, input) after a series of processing to obtain the residual F(X). If it does not undergo downsample processing on the shortcut branch (that is, does not undergo conv1x1 convolution + BN processing), then in The final residual mapping function is F(X) + X. This structure is generally used in the layer after the first layer in the conv_x block, that is, it is used in the layer that does not need to change the input and output dimensions before and after.

  • Residual structure 2: The structure mentioned in residual structure 2 that requires downsample processing on the shortcut branch is generally used in the first layer of each conv_x block, that is, the output out_channel of the previous layer does not conform to this The in_channel required by the layer, at this time, it is necessary to use conv1x1 convolution to perform the dimension-up operation, and the residual mapping function obtained at this time is F(X) + G(X) , and G(X) is the shortcut branch to process the input X The resulting identity map.

2. Residual module

  • Original paper Residual structure

insert image description here

  • Residual detailed structure
    insert image description here

There are two types of Residual structures, BasicBlock and BottleNeck . BasicBlock application and ResNet-18, ResNet-34 model, BottleNeck application and ResNet-50, ResNet-101, ResNet-152 model.

  • In the two structures, the shortcut branch of downsample is applied, that is, the first layer in each conv_x block mentioned above. At this time, the input out_channel of the previous layer is not equal to the in_channel required by this layer, so through downsample Make dimension adjustments.

  • The dotted line shortcut branch, that is, in_channel = out_channel after the downsample adjustment of the first layer in the conv_x block, the subsequent layers in this block conv_x no longer need to adjust the channel_size, so it can directly map X (ie input).

  • Precautions for BN layer
    insert image description here

[Note]: Here is a reference to the notes of the big up [ Thunderbolt Wz ] at station b. Through the formula, it is more intuitive to explain that adding the BN layer does not need to set the bias.

It can be seen from the above series of derivation formulas that after adding the BN layer, the bias will be offset in the calculation (that is, b in the formula), so set bias=False in the code .

3. ResNet model

  • ResNet-18, Res-50 model structure
    insert image description here

  • Original paper ResNet-layer model

insert image description here

The general structure of all models in ResNet-layer is basically similar. The difference between the models of different layers lies in the choice of using BasicBlock or BottleNeck, and the second is the number of Residuals used in each Conv_x block.

The changes in the number of channels in the ResNet-18 and ResNet-50 models in the above figure, the changes in image size, and whether to use downsample have been marked, and can be compared with the specific ResNet_layer model given in the original paper for mutual learning. Other layer models are basically consistent with these two models.


ResNet-layers model complete code

insert image description here

[Note]: This model code is basically consistent with the logic of Pytorch's official source code structure. For specific details, explanations and instructions have been given in the comments in the code.

1. Basic Block

import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """搭建BasicBlock模块"""
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()

        # 使用BN层是不需要使用bias的,bias最后会抵消掉
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1, stride=stride, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)    # BN层, BN层放在conv层和relu层中间使用
        self.conv2 = nn.Conv2d(out_channel, out_channel, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)

        self.downsample = downsample
        self.relu = nn.ReLU(inplace=True)

    # 前向传播
    def forward(self, X):
        identity = X
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))

        if self.downsample is not None:    # 保证原始输入X的size与主分支卷积后的输出size叠加时维度相同
            identity = self.downsample(X)

        return self.relu(Y + identity)

2. BottleNeck

class BottleNeck(nn.Module):
    """搭建BottleNeck模块"""
    # BottleNeck模块最终输出out_channel是Residual模块输入in_channel的size的4倍(Residual模块输入为64),shortcut分支in_channel
    # 为Residual的输入64,因此需要在shortcut分支上将Residual模块的in_channel扩张4倍,使之与原始输入图片X的size一致
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BottleNeck, self).__init__()

        # 默认原始输入为256,经过7x7层和3x3层之后BottleNeck的输入降至64
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)    # BN层, BN层放在conv层和relu层中间使用
        self.conv2 = nn.Conv2d(out_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(out_channel, out_channel * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)  # Residual中第三层out_channel扩张到in_channel的4倍

        self.downsample = downsample
        self.relu = nn.ReLU(inplace=True)

    # 前向传播
    def forward(self, X):
        identity = X

        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.relu(self.bn2(self.conv2(Y)))
        Y = self.bn3(self.conv3(Y))

        if self.downsample is not None:    # 保证原始输入X的size与主分支卷积后的输出size叠加时维度相同
            identity = self.downsample(X)

        return self.relu(Y + identity)

3. ResNet

class ResNet(nn.Module):
    """搭建ResNet-layer通用框架"""
    # num_classes是训练集的分类个数,include_top是在ResNet的基础上搭建更加复杂的网络时用到,此处用不到
    def __init__(self, residual, num_residuals, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()

        self.out_channel = 64    # 输出通道数(即卷积核个数),会生成与设定的输出通道数相同的卷积核个数
        self.include_top = include_top

        self.conv1 = nn.Conv2d(3, self.out_channel, kernel_size=7, stride=2, padding=3,
                               bias=False)    # 3表示输入特征图像的RGB通道数为3,即图片数据的输入通道为3
        self.bn1 = nn.BatchNorm2d(self.out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = self.residual_block(residual, 64, num_residuals[0])
        self.conv3 = self.residual_block(residual, 128, num_residuals[1], stride=2)
        self.conv4 = self.residual_block(residual, 256, num_residuals[2], stride=2)
        self.conv5 = self.residual_block(residual, 512, num_residuals[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))    # output_size = (1, 1)
            self.fc = nn.Linear(512 * residual.expansion, num_classes)

        # 对conv层进行初始化操作
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def residual_block(self, residual, channel, num_residuals, stride=1):
        downsample = None

        # 用在每个conv_x组块的第一层的shortcut分支上,此时上个conv_x输出out_channel与本conv_x所要求的输入in_channel通道数不同,
        # 所以用downsample调整进行升维,使输出out_channel调整到本conv_x后续处理所要求的维度。
        # 同时stride=2进行下采样减小尺寸size,(注:conv2时没有进行下采样,conv3-5进行下采样,size=56、28、14、7)。
        if stride != 1 or self.out_channel != channel * residual.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.out_channel, channel * residual.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * residual.expansion))

        block = []    # block列表保存某个conv_x组块里for循环生成的所有层
        # 添加每一个conv_x组块里的第一层,第一层决定此组块是否需要下采样(后续层不需要)
        block.append(residual(self.out_channel, channel, downsample=downsample, stride=stride))
        self.out_channel = channel * residual.expansion    # 输出通道out_channel扩张

        for _ in range(1, num_residuals):
            block.append(residual(self.out_channel, channel))

        # 非关键字参数的特征是一个星号*加上参数名,比如*number,定义后,number可以接收任意数量的参数,并将它们储存在一个tuple中
        return nn.Sequential(*block)

    # 前向传播
    def forward(self, X):
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.maxpool(Y)
        Y = self.conv5(self.conv4(self.conv3(self.conv2(Y))))

        if self.include_top:
            Y = self.avgpool(Y)
            Y = torch.flatten(Y, 1)
            Y = self.fc(Y)

        return Y

4. Build ResNet-34, ResNet-50 models

# 构建ResNet-34模型
def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


# 构建ResNet-50模型
def resnet50(num_classes=1000, include_top=True):
    return ResNet(BottleNeck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


# 模型网络结构可视化
net = resnet34()

5. Network structure visualization

# 1. 使用torchsummary中的summary查看模型的输入输出形状、顺序结构,网络参数量,网络模型大小等信息
from torchsummary import summary

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = net.to(device)
summary(model, (3, 224, 224))    # 3是RGB通道数,即表示输入224 * 224的3通道的数据


# 2. 使用torchviz中的make_dot生成模型的网络结构,pdf图包括计算路径、网络各层的权重、偏移量
from torchviz import make_dot

X = torch.rand(size=(1, 3, 224, 224))    # 3是RGB通道数,即表示输入224 * 224的3通道的数据
Y = net(X)
vise = make_dot(Y, params=dict(net.named_parameters()))
vise.view()

6. View the official source code of Pytorch

# Pytorch官方ResNet模型
# 导入resnet包,(ctrl + 左键)点击resnet34,即可查看resnet-layer模型源码
from torchvision.models import resnet34

7. split_dataset.py

File: split_dataset.py

Function: dataset partition script. Divide the original data set flower_photos intotwo data sets of train and test , and change the picture size=224x224.

Dataset download address: http://download.tensorflow.org/example_images/flower_photos.tgz

Dataset save path: root directory\data_set\flower_photos

"""
# 数据集划分脚本
#
"""

import os
import glob
import random
from PIL import Image


if __name__ == '__main__':
    split_rate = 0.1    # 训练集和验证集划分比率
    resize_image = 224    # 图片缩放后统一大小
    file_path = '.\\data_set\\flower_photos'    # 获取原始数据集路径

    # 找到文件中所有文件夹的目录,即类文件夹名
    dirs = glob.glob(os.path.join(file_path, '*'))
    dirs = [d for d in dirs if os.path.isdir(d)]

    print("Totally {} classes: {}".format(len(dirs), dirs))    # 打印花类文件夹名称

    for path in dirs:
        # 对每个类别进行单独处理
        path = path.split('\\')[-1]  # -1表示以分隔符/保留后面的一段字符

        # 在根目录中创建两个文件夹,train/test
        os.makedirs("data_set\\train\\{}".format(path), exist_ok=True)
        os.makedirs("data_set\\test\\{}".format(path), exist_ok=True)

        # 读取原始数据集中path类中对应类型的图片,并添加到files中
        files = glob.glob(os.path.join(file_path, path, '*jpg'))
        files += glob.glob(os.path.join(file_path, path, '*jpeg'))
        files += glob.glob(os.path.join(file_path, path, '*png'))

        random.shuffle(files)    # 打乱图片顺序
        split_boundary = int(len(files) * split_rate)  # 训练集和测试集的划分边界

        for i, file in enumerate(files):
            img = Image.open(file).convert('RGB')

            # 更改原始图片尺寸
            old_size = img.size  # (wight, height)
            ratio = float(resize_image) / max(old_size)  # 通过最长的size计算原始图片缩放比率
            # 把原始图片最长的size缩放到resize_pic,短的边等比率缩放,等比例缩放不会改变图片的原始长宽比
            new_size = tuple([int(x * ratio) for x in old_size])

            im = img.resize(new_size, Image.ANTIALIAS)  # 更改原始图片的尺寸,并设置图片高质量,保存成新图片im
            new_im = Image.new("RGB", (resize_image, resize_image))  # 创建一个resize_pic尺寸的黑色背景
            # 把新图片im贴到黑色背景上,并通过'地板除//'设置居中放置
            new_im.paste(im, ((resize_image - new_size[0]) // 2, (resize_image - new_size[1]) // 2))

            # 先划分0.1_rate的测试集,剩下的再划分为0.9_ate的训练集,同时直接更改图片后缀为.jpg
            assert new_im.mode == "RGB"
            if i < split_boundary:
                new_im.save(os.path.join("data_set\\test\\{}".format(path),
                                         file.split('\\')[-1].split('.')[0] + '.jpg'))
            else:
                new_im.save(os.path.join("data_set\\train\\{}".format(path),
                                         file.split('\\')[-1].split('.')[0] + '.jpg'))

    # 统计划分好的训练集和测试集中.jpg图片的数量
    train_files = glob.glob(os.path.join('data_set', 'train', '*', '*.jpg'))
    test_files = glob.glob(os.path.join('data_set', 'test', '*', '*.jpg'))

    print("Totally {} files for train".format(len(train_files)))
    print("Totally {} files for test".format(len(test_files)))

8. train.py

File: train.py

Function: Train the model and verify the accuracy of the trained model.

ResNet-34 pre-training weight file download address: https://download.pytorch.org/models/resnet34-b627a593.pth

[Note]: Download address of other ResNet-layer pre-training weights:

model_urls = {
     
     
    'resnet18': 'https://download.pytorch.org/models/resnet18-f37072fd.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-b627a593.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-0676ba61.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-63fe2227.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-394f9c45.pth',
    'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}
"""
# 训练脚本
#
"""

import os
import sys
import json
import time

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
from torchvision import transforms, datasets
from tqdm import tqdm

from model import resnet34


def train_model():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("Using {} device.".format(device))

    # 数据预处理。transforms提供一系列数据预处理方法
    data_transform = {
    
    
        "train": transforms.Compose([transforms.RandomResizedCrop(224),    # 随机裁剪
                                     transforms.RandomHorizontalFlip(),    # 水平方向随机反转
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),    # 标准化
        "val": transforms.Compose([transforms.Resize(256),    # 图像缩放
                                   transforms.CenterCrop(224),    # 中心裁剪
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

    # 获取数据集根目录(即当前代码文件夹路径)
    data_root = os.path.abspath(os.path.join(os.getcwd(), ".\\"))
    # 获取flower图片数据集路径
    image_path = os.path.join(data_root, "data_set")
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)

    # ImageFolder是一个通用的数据加载器,它要求我们以root/class/xxx.png格式来组织数据集的训练、验证或者测试图片。
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"), transform=data_transform["train"])
    train_num = len(train_dataset)
    val_dataset = datasets.ImageFolder(root=os.path.join(image_path, "test"), transform=data_transform["val"])
    val_num = len(val_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    class_dict = dict((val, key) for key, val in flower_list.items())    # 将字典中键值对翻转。此处翻转为 {'0':daisy,...}

    # 将class_dict编码成json格式文件
    json_str = json.dumps(class_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 4    # 设置批大小。batch_size太大会报错OSError: [WinError 1455] 页面文件太小,无法完成操作。
    num_workers = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print("Using batch_size={} dataloader workers every process.".format(num_workers))

    # 加载训练集和测试集
    train_loader = Data.DataLoader(train_dataset, batch_size=batch_size,
                                   num_workers=num_workers, shuffle=True)
    val_loader = Data.DataLoader(val_dataset, batch_size=batch_size,
                                 num_workers=num_workers, shuffle=True)
    print("Using {} train_images for training, {} test_images for validation.".format(train_num, val_num))
    print()

    # 加载预训练权重
    # download url: https://download.pytorch.org/models/resnet34-b627a593.pth
    net = resnet34()
    model_weight_path = ".\\resnet34_pre.pth"    # 预训练权重
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    # torch.load_state_dict()函数就是用于将预训练的参数权重加载到新的模型之中
    net.load_state_dict(torch.load(model_weight_path, map_location='cpu'), strict=False)

    # 改变in_channel符合fc层的要求,调整output为数据集类别5
    in_channel = net.fc.in_features
    net.fc = nn.Linear(in_channel, 5)
    net.to(device)

    # 损失函数
    loss_function = nn.CrossEntropyLoss()

    # 优化器
    params = [p for p in net.parameters() if p.requires_grad]
    optimizer = optim.Adam(params, lr=0.0001)

    epochs = 10    # 训练迭代次数
    best_acc = 0.0
    save_path = '.\\resNet34.pth'    # 当前模型训练好后的权重参数文件保存路径
    batch_num = len(train_loader)    # 一个batch中数据的数量
    total_time = 0    # 统计训练过程总时间

    for epoch in range(epochs):
        # 开始迭代训练和测试
        start_time = time.perf_counter()  # 计算训练一个epoch的时间

        # train
        net.train()
        train_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)    # tqdm是Python进度条库,可以在Python长循环中添加一个进度条提示信息。

        for step, data in enumerate(train_bar):
            train_images, train_labels = data
            train_images = train_images.to(device)
            train_labels = train_labels.to(device)

            optimizer.zero_grad()    # 梯度置零。清空之前的梯度信息
            outputs = net(train_images)    # 前向传播
            loss = loss_function(outputs, train_labels)    # 计算损失
            loss.backward()    # 反向传播
            optimizer.step()    # 参数更新
            train_loss += loss.item()    # 将计算的loss累加到train_loss中

            # desc:str类型,作为进度条说明,在进度条右边
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}.".format(epoch+1, epochs, loss)

        # validate
        net.eval()
        val_acc = 0.0
        val_bar = tqdm(val_loader, file=sys.stdout)

        with torch.no_grad():
            for val_data in val_bar:
                val_images, val_labels = val_data
                val_images = val_images.to(device)
                val_labels = val_labels.to(device)

                val_y = net(val_images)    # 前向传播
                predict_y = torch.max(val_y, dim=1)[1]    # 在维度为1上找到预测Y的最大值,第0个维度是batch
                # 计算测试集精度。predict_y与val_labels进行比较(true=1, False=0)的一个batch求和,所有batch的累加精度值
                val_acc += torch.eq(predict_y, val_labels).sum().item()

                val_bar.desc = "valid epoch[{}/{}].".format(epoch+1, epochs)

        # 打印epoch数据结果
        val_accurate = val_acc / val_num
        print("[epoch {:.0f}] train_loss: {:.3f}  val_accuracy: {:.3f}"
              .format(epoch+1, train_loss/batch_num, val_accurate))

        epoch_time = time.perf_counter() - start_time    # 计算训练一个epoch的时间
        print("epoch_time: {}".format(epoch_time))
        total_time += epoch_time    # 统计训练过程总时间
        print()

        # 调整测试集最优精度
        if val_accurate > best_acc:
            best_acc = val_accurate
            # model.state_dict()保存学习到的参数
            torch.save(net.state_dict(), save_path)    # 保存当前最高的准确度

    # 将训练过程总时间转换为h:m:s格式打印
    m, s = divmod(total_time, 60)
    h, m = divmod(m, 60)
    print("Total_time: {:.0f}:{:.0f}:{:.0f}".format(h, m, s))

    print('Finished Training!')


if __name__ == '__main__':
    train_model()


[Appendix: ResNet_layer model code]

File: model.py

"""
# 搭建resnet-layer模型
#
"""
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """搭建BasicBlock模块"""
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()

        # 使用BN层是不需要使用bias的,bias最后会抵消掉
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1, stride=stride, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)    # BN层, BN层放在conv层和relu层中间使用
        self.conv2 = nn.Conv2d(out_channel, out_channel, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)

        self.downsample = downsample
        self.relu = nn.ReLU(inplace=True)

    # 前向传播
    def forward(self, X):
        identity = X
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))

        if self.downsample is not None:    # 保证原始输入X的size与主分支卷积后的输出size叠加时维度相同
            identity = self.downsample(X)

        return self.relu(Y + identity)


class BottleNeck(nn.Module):
    """搭建BottleNeck模块"""
    # BottleNeck模块最终输出out_channel是Residual模块输入in_channel的size的4倍(Residual模块输入为64),shortcut分支in_channel
    # 为Residual的输入64,因此需要在shortcut分支上将Residual模块的in_channel扩张4倍,使之与原始输入图片X的size一致
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BottleNeck, self).__init__()
        # 默认原始输入为224,经过7x7层和3x3层之后BottleNeck的输入降至64
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)    # BN层, BN层放在conv层和relu层中间使用
        self.conv2 = nn.Conv2d(out_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(out_channel, out_channel * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)  # Residual中第三层out_channel扩张到in_channel的4倍

        self.downsample = downsample
        self.relu = nn.ReLU(inplace=True)

    # 前向传播
    def forward(self, X):
        identity = X

        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.relu(self.bn2(self.conv2(Y)))
        Y = self.bn3(self.conv3(Y))

        if self.downsample is not None:    # 保证原始输入X的size与主分支卷积后的输出size叠加时维度相同
            identity = self.downsample(X)

        return self.relu(Y + identity)


class ResNet(nn.Module):
    """搭建ResNet-layer通用框架"""
    # num_classes是训练集的分类个数,include_top是在ResNet的基础上搭建更加复杂的网络时用到,此处用不到
    def __init__(self, residual, num_residuals, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()

        self.out_channel = 64    # 输出通道数(即卷积核个数),会生成与设定的输出通道数相同的卷积核个数
        self.include_top = include_top

        self.conv1 = nn.Conv2d(3, self.out_channel, kernel_size=7, stride=2, padding=3,
                               bias=False)    # 3表示输入特征图像的RGB通道数为3,即图片数据的输入通道为3
        self.bn1 = nn.BatchNorm2d(self.out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = self.residual_block(residual, 64, num_residuals[0])
        self.conv3 = self.residual_block(residual, 128, num_residuals[1], stride=2)
        self.conv4 = self.residual_block(residual, 256, num_residuals[2], stride=2)
        self.conv5 = self.residual_block(residual, 512, num_residuals[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))    # output_size = (1, 1)
            self.fc = nn.Linear(512 * residual.expansion, num_classes)

        # 对conv层进行初始化操作
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def residual_block(self, residual, channel, num_residuals, stride=1):
        downsample = None

        # 用在每个conv_x组块的第一层的shortcut分支上,此时上个conv_x输出out_channel与本conv_x所要求的输入in_channel通道数不同,
        # 所以用downsample调整进行升维,使输出out_channel调整到本conv_x后续处理所要求的维度。
        # 同时stride=2进行下采样减小尺寸size,(注:conv2时没有进行下采样,conv3-5进行下采样,size=56、28、14、7)。
        if stride != 1 or self.out_channel != channel * residual.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.out_channel, channel * residual.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * residual.expansion))

        block = []    # block列表保存某个conv_x组块里for循环生成的所有层
        # 添加每一个conv_x组块里的第一层,第一层决定此组块是否需要下采样(后续层不需要)
        block.append(residual(self.out_channel, channel, downsample=downsample, stride=stride))
        self.out_channel = channel * residual.expansion    # 输出通道out_channel扩张

        for _ in range(1, num_residuals):
            block.append(residual(self.out_channel, channel))

        # 非关键字参数的特征是一个星号*加上参数名,比如*number,定义后,number可以接收任意数量的参数,并将它们储存在一个tuple中
        return nn.Sequential(*block)

    # 前向传播
    def forward(self, X):
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.maxpool(Y)
        Y = self.conv5(self.conv4(self.conv3(self.conv2(Y))))

        if self.include_top:
            Y = self.avgpool(Y)
            Y = torch.flatten(Y, 1)
            Y = self.fc(Y)

        return Y


# 构建ResNet-34模型
def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


# 构建ResNet-50模型
def resnet50(num_classes=1000, include_top=True):
    return ResNet(BottleNeck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


# 模型网络结构可视化
net = resnet34()

"""
# 1. 使用torchsummary中的summary查看模型的输入输出形状、顺序结构,网络参数量,网络模型大小等信息
from torchsummary import summary

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = net.to(device)
summary(model, (3, 224, 224))    # 3是RGB通道数,即表示输入224 * 224的3通道的数据
"""


"""
# 2. 使用torchviz中的make_dot生成模型的网络结构,pdf图包括计算路径、网络各层的权重、偏移量
from torchviz import make_dot

X = torch.rand(size=(1, 3, 224, 224))    # 3是RGB通道数,即表示输入224 * 224的3通道的数据
Y = net(X)
vise = make_dot(Y, params=dict(net.named_parameters()))
vise.view()
"""


"""
# Pytorch官方ResNet模型
from torchvision.models import resnet34
"""



[Note]: Due to my limited level, if there are any mistakes, please correct me! ! !

Guess you like

Origin blog.csdn.net/qq_39770163/article/details/126169080