[Check-in] Dive into Deep Learning: Fourth Check-in

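Coding assignment, Fashion-MNIST classification: design, build, and train a machine learning model on the Fashion-MNIST dataset that predicts the test-set labels as accurately as possible.
(Because this fell during term time and my time and skills are limited, I only completed and submitted the leaderboard task. I did not finish the open-ended task in time to submit it, which is a real pity!)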
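Image classification has long been one of the most active research topics in deep learning. The convolutional neural network (CNN) is an efficient recognition algorithm that has developed and drawn wide attention in recent years. A CNN is a type of artificial neural network and an important deep learning algorithm, especially for pattern classification: because the network can take the raw image as input and avoids complex hand-crafted preprocessing, it is widely used. In the 1960s, while studying the neurons in the cat visual cortex responsible for local sensitivity and orientation selectivity, Hubel and Wiesel found that their distinctive connectivity could effectively reduce the complexity of feedback neural networks. That finding later inspired the convolutional neural network.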

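Compared with an ordinary neural network, the basic structure of a CNN consists of two kinds of layers: feature extraction layers and feature mapping layers. As a model, a CNN has two distinguishing properties: the connections between neurons are local rather than fully connected, and some neurons in the same layer share their connection weights, which lowers the model's complexity. This structure gives CNNs a distinctive advantage in speech recognition and image processing, so they are mainly used to recognize two-dimensional patterns in a way that is invariant to shift, scaling, and other forms of distortion. As researchers studied and optimized CNNs, many strong models were proposed and achieved good results in image processing.

As early as 1998, Yann LeCun, Léon Bottou, and colleagues proposed LeNet-5, a classic convolutional network for digit recognition that US banks used at the time to read handwritten digits on checks. LeNet-5 is an early CNN with 7 layers in total, each with a different number of trainable parameters: 2 convolutional layers, 2 pooling layers, and 3 fully connected layers.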
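To make the point about local connections and weight sharing concrete, here is a minimal sketch (not part of the assignment code) that compares the parameter count of one convolutional layer with a fully connected layer producing the same number of outputs. The layer sizes are illustrative assumptions, loosely modeled on the first layer of a LeNet-style network applied to a 28×28 image.

```python
import torch
from torch import nn


def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# A 5x5 convolution mapping 1 input channel to 6 feature maps.
# The same 5x5 kernel is reused at every image position (weight sharing).
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

# A fully connected layer that produces the same number of outputs from a
# 28x28 image: 6 feature maps of 24x24 each, since 28 - 5 + 1 = 24.
dense = nn.Linear(28 * 28, 6 * 24 * 24)

x = torch.rand(1, 1, 28, 28)
conv_out = conv(x)
dense_out = dense(x.flatten(start_dim=1))

conv_params = count_params(conv)    # 6 * (5 * 5 * 1) weights + 6 biases
dense_params = count_params(dense)  # 784 * 3456 weights + 3456 biases
print('conv :', tuple(conv_out.shape), conv_params)
print('dense:', tuple(dense_out.shape), dense_params)
```

Both layers emit 3456 values per image, but the convolution needs only a few hundred parameters while the dense layer needs millions.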

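AlexNet, proposed by Alex Krizhevsky and colleagues, is the classic model that brought convolutional networks into large-scale use. It has 5 convolutional layers and 3 fully connected layers. Its main strengths are: (1) it uses the ReLU activation function, which trains faster than Sigmoid because the derivative of Sigmoid becomes very small in its saturated regions, so the weights barely update and the gradient vanishes; (2) it made effective use of Dropout, applying it in the first two fully connected layers to reduce overfitting. Its main weakness reflects the thinking of the time: the larger the convolution kernel, the larger the receptive field, the more of the image the network sees, and the better the features. AlexNet therefore uses some very large kernels, such as 11×11 and 5×5, which increases the amount of computation, lowers computational performance, and makes it harder to increase the depth of the model.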
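The saturation argument can be checked numerically. The sketch below (an illustration, not part of the assignment code) uses autograd to evaluate the derivative of Sigmoid and ReLU at a few points; the helper name `grad_at` and the sample points are assumptions chosen for the example.

```python
import torch


def grad_at(fn, value: float) -> float:
    """Derivative of a scalar activation `fn` at one point, via autograd."""
    x = torch.tensor(value, requires_grad=True)
    fn(x).backward()
    return x.grad.item()


# Compare the two activations close to zero and far into the positive range.
for v in (0.5, 5.0, 10.0):
    print('x = %4.1f  sigmoid grad = %.6f  relu grad = %.1f'
          % (v, grad_at(torch.sigmoid, v), grad_at(torch.relu, v)))

sigmoid_grad_far = grad_at(torch.sigmoid, 10.0)  # deep in the saturated region
relu_grad_far = grad_at(torch.relu, 10.0)        # ReLU keeps a slope of 1 for x > 0
```

The Sigmoid derivative peaks at 0.25 at the origin and shrinks rapidly as the input grows, while ReLU passes the gradient through unchanged for any positive input.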

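VGGNet was proposed by the Visual Geometry Group at the University of Oxford. Its biggest improvement over AlexNet is that it replaces AlexNet's large kernels with stacks of consecutive 3×3 kernels. This adds nonlinearity and makes it practical to increase depth, so the network can learn more complex patterns and achieve better results.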
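A rough way to see why stacked 3×3 kernels are attractive: two stacked 3×3 convolutions with stride 1 cover the same 5×5 receptive field as a single 5×5 convolution, with fewer parameters and one extra nonlinearity. The sketch below is an illustration only; the channel count of 64 is an assumption.

```python
import torch
from torch import nn


def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


channels = 64  # illustrative choice, not taken from the assignment code

# One large kernel: 5x5 receptive field, a single nonlinearity would follow it.
one_5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

# Two stacked small kernels: also a 5x5 receptive field, but two nonlinearities.
two_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(),
)

x = torch.rand(1, channels, 28, 28)
out_large = one_5x5(x)
out_stacked = two_3x3(x)

print('one 5x5 :', count_params(one_5x5))   # 25 * 64 * 64 + 64
print('two 3x3 :', count_params(two_3x3))   # 2 * (9 * 64 * 64 + 64)
print('same output shape:', out_large.shape == out_stacked.shape)
```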

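GoogLeNet[8] is a convolutional neural network architecture proposed by Christian Szegedy and colleagues in 2014. Its main contribution is the Inception module; several Inception modules together make up the Inception network. Compared with AlexNet, GoogLeNet also uses small kernels. Compared with VGGNet, GoogLeNet convolves at several kernel sizes and then aggregates the results, which captures more features, and richer features make the final decision more accurate.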
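The "convolve at several sizes, then aggregate" idea can be sketched as a small multi-branch module. This is a simplified illustration, not the exact GoogLeNet module and not part of the assignment code; the class name `InceptionSketch` and all channel counts are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F


class InceptionSketch(nn.Module):
    """Four parallel branches whose outputs are concatenated along channels."""

    def __init__(self, in_c, c1, c2, c3, c4):
        super().__init__()
        # Branch 1: a single 1x1 convolution
        self.p1 = nn.Conv2d(in_c, c1, kernel_size=1)
        # Branch 2: 1x1 convolution, then 3x3 convolution
        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Branch 3: 1x1 convolution, then 5x5 convolution
        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Branch 4: 3x3 max pooling, then 1x1 convolution
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # Padding keeps height and width equal across branches,
        # so the outputs can be joined on the channel dimension.
        return torch.cat((p1, p2, p3, p4), dim=1)


block = InceptionSketch(32, 16, (16, 32), (4, 8), 8)
out = block(torch.rand(2, 32, 28, 28))
print(out.shape)  # channels: 16 + 32 + 8 + 8 = 64
```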

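ResNet was proposed by Kaiming He's team at MSRA. Deep networks face the following problems: (1) the deeper the network, the worse plain layer stacking performs; (2) the deeper the network, the more pronounced the vanishing gradient becomes, so training also suffers; (3) a shallow network, on the other hand, cannot noticeably improve recognition performance. ResNet therefore introduces the residual structure, whose main benefit is that it addresses the performance degradation problem of very deep convolutional neural networks.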

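For this assignment I used the ResNet34 model. After reading the paper, I changed the original model in the following ways:
(1) Replace the ReLU activation function with the newer SELU activation function, forming a new residual structure.
(2) Modify the data pooling layer of the traditional deep residual network.
(3) Compensate for dimension mismatches by means of convolution.
(4) Decrease the learning rate as the number of iterations grows.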
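As a rough illustration of points (1) and (3), the sketch below shows one way a residual block with SELU activations and a 1×1 convolution on the shortcut could look. It is an assumption-laden sketch rather than the exact block used for the submitted result: the class name `SELUResidual`, the placement of batch normalization, and the rule for when to add the 1×1 convolution are all illustrative choices.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SELUResidual(nn.Module):
    """Residual block with SELU in place of ReLU (illustrative sketch)."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               kernel_size=3, padding=1, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Dimension compensation: when the main path changes the channel count
        # or the spatial size, a 1x1 convolution reshapes the shortcut to match.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels,
                                      kernel_size=1, stride=stride)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        y = F.selu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.selu(y + self.shortcut(x))


x = torch.rand(4, 32, 28, 28)
same_out = SELUResidual(32, 32)(x)            # shapes unchanged, identity shortcut
down_out = SELUResidual(32, 64, stride=2)(x)  # channels doubled, size halved
print(same_out.shape, down_out.shape)
```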

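The modified network is structured as follows. First, the modified data pooling layer replaces the initial convolution and pooling layers of the traditional deep residual network. Next come 4 residual stages, each made up of several residual blocks: the first stage has 3 residual blocks, the second has 4, the third has 6, and the fourth has 3. Every residual block uses the SELU activation function in place of ReLU, together with batch normalization. Finally, average pooling and a linear classifier produce the output. For training, the loss function is cross-entropy and the optimizer is Adam.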
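The training setup described above (cross-entropy loss, Adam, and a learning rate that shrinks over time) could be wired up roughly as follows. This is a sketch under stated assumptions: the tiny linear model and random batch are stand-ins so the snippet runs on its own, and the `StepLR` schedule with `step_size=2` and `gamma=0.5` is a hypothetical choice, since the post does not say which decay rule was used. Only the initial learning rate of 0.01 matches the value used in the full code.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Stand-in model and data so the sketch is self-contained (assumptions).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
X = torch.rand(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Hypothetical schedule: halve the learning rate every 2 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lr_history = []
for epoch in range(6):
    lr_history.append(optimizer.param_groups[0]['lr'])
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # once per epoch, after the optimizer update

print(lr_history)
```

In a real run the inner part of the loop would iterate over every batch of the training iterator, with `scheduler.step()` still called once at the end of each epoch.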
The full code follows:

# -*- coding: utf-8 -*-
"""
Created on Fri Feb 28 12:09:23 2020

FashionMNIST classification assignment

@author: Tian YJ
"""
import sys
import time
import math
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt


class GlobalAvgPool2d(nn.Module):
    """
    Global average pooling layer.
    Implemented as ordinary average pooling whose window equals the input's height and width.
    """
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])


class FlattenLayer(torch.nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x): # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)

#class Inception(nn.Module):
#    # c1 - c3 are the output channel counts of the layers on each path
#    def __init__(self, in_c, c1, c2, c3):
#        super(Inception, self).__init__()
#        # Path 1: four 3 x 3 convolutional layers
#        self.p1_1 = nn.Conv2d(in_c, c1[0], kernel_size=3)
#        self.p1_2 = nn.Conv2d(c1[0], c1[1], kernel_size=3)
#        self.p1_3 = nn.Conv2d(c1[1], c1[2], kernel_size=3)
#        self.p1_4 = nn.Conv2d(c1[2], c1[3], kernel_size=3)
#        # Path 2: three 3 x 3 convolutional layers
#        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=3)
#        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3)
#        self.p2_3 = nn.Conv2d(c2[1], c2[2], kernel_size=3)
#        # Path 3: two 3 x 3 convolutional layers
#        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=3)
#        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=3)
        
#    def forward(self, x):
#        p1 = self.p1_1(x)
#        p1 = self.p1_2(p1)
#        p1 = self.p1_3(p1)
#        p1 = self.p1_4(p1)

#        p2 = self.p2_1(x)
#        p2 = self.p2_2(p2)
#        p2 = self.p2_3(p2)

#        p3 = self.p3_1(x)
#        p3 = self.p3_2(p3)

#        outputs = [p1, p2, p3]
#        return torch.cat(outputs, 1)  # concatenate the outputs along the channel dimension

class Residual(nn.Module): 
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        """
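            use_1x1conv: whether to add an extra 1x1 convolution that changes the channel count
            stride: stride of the convolution; ResNet uses stride-2 convolutions in place of pooling, which is a neat idea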
        """
        super(Residual, self).__init__()
        self.conv1 = nn.Conv2d(in_channels,
                               out_channels,
                               kernel_size=3,
                               padding=1,
                               stride=stride)
        self.conv2 = nn.Conv2d(out_channels,
                               out_channels,
                               kernel_size=3,
                               padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels,
                                   out_channels,
                                   kernel_size=1,
                                   stride=stride)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

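    def forward(self, X):
        # The write-up describes SELU inside the residual blocks; this listing
        # applies ReLU here (F.selu would be the drop-in replacement to match the text)
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)   # 1x1 convolution so the shortcut matches Y's shape
        return F.relu(Y + X)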


def resnet_block(in_channels, out_channels, num_residuals, first_block=False):
    '''
    resnet block

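    num_residuals: number of residual blocks in this stage
    first_block: whether this is the first stage

    One ResNet stage consists of num_residuals residual blocks.
    The first residual block changes the channel count and downsamples (the role pooling would play);
    the remaining residual blocks perform ordinary feature extraction.
    '''
    if first_block:
        assert in_channels == out_channels # the first stage keeps the channel count unchanged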
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))
        else:
            blk.append(Residual(out_channels, out_channels))
    return nn.Sequential(*blk)



# Define the ResNet model structure
#net = nn.Sequential(Inception(1, (4, 8, 16, 32), (4, 8, 16), (8, 16)),
#                   nn.BatchNorm2d(64))
net = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),   # TODO: smaller receptive field, fewer channels
        nn.BatchNorm2d(32),
        nn.SELU())
        #nn.MaxPool2d(kernel_size=2, stride=2))   # TODO: max pooling removed to keep the receptive field small



# Then four consecutive stages
net.add_module("resnet_block1", resnet_block(32, 32, 3, first_block=True))   # TODO: all channel counts halved
net.add_module("resnet_block2", resnet_block(32, 64, 4))
net.add_module("resnet_block3", resnet_block(64, 128, 6))
net.add_module("resnet_block4", resnet_block(128, 256, 3))
# global average pooling
net.add_module("global_avg_pool", GlobalAvgPool2d()) 
# fc layer
net.add_module("fc", nn.Sequential(FlattenLayer(), nn.Linear(256, 10)))

print('Network structure (mainly to confirm what to adjust)')
print(net)


print('Output shape after each module for a 1*1*28*28 input')
X = torch.rand((1, 1, 28, 28))
for name, layer in net.named_children():
    X = layer(X)
    print(name, ' output shape:\t', X.shape)


# Define the function that loads the dataset
def load_data_fashion_mnist(batch_size,
                            root='.',
                            use_normalize=False,
                            mean=None,
                            std=None):
    """Download the fashion mnist dataset and then load into memory."""
    if use_normalize:
        normalize = transforms.Normalize(mean=[mean], std=[std])
        train_augs = transforms.Compose([transforms.RandomCrop(28, padding=2),
                    transforms.RandomHorizontalFlip(),
                    transforms.ToTensor(), 
                    normalize])
        test_augs = transforms.Compose([transforms.ToTensor(), normalize])
    else:
        train_augs = transforms.Compose([transforms.ToTensor()])
        test_augs = transforms.Compose([transforms.ToTensor()])
    
    # download=True fetches the data only if it is not already under root
    mnist_train = torchvision.datasets.FashionMNIST(root=root,
                                                    train=True,
                                                    download=True,
                                                    transform=train_augs)
    mnist_test = torchvision.datasets.FashionMNIST(root=root,
                                                   train=False,
                                                   download=True,
                                                   transform=test_augs)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes are used to load data
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train,
                                             batch_size=batch_size,
                                             shuffle=True,
                                             num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test,
                                            batch_size=batch_size,
                                            shuffle=False,
                                            num_workers=num_workers)

    return train_iter, test_iter



print('Computing the dataset mean and standard deviation')
batch_size = 64
train_iter, test_iter = load_data_fashion_mnist(batch_size,
                                                root='.',
                                                use_normalize=False)
# Mean over the whole training set
pixel_sum = 0.0
pixel_cnt = 0
for X, y in train_iter:
    pixel_sum += X.sum().item()   # single channel, so sum over every pixel
    pixel_cnt += X.numel()        # count pixels so the shorter last batch is weighted correctly
dataset_global_mean = pixel_sum / pixel_cnt
print('Pixel mean of the whole dataset: {}'.format(dataset_global_mean))

# Standard deviation over the whole training set
sq_dev_sum = 0.0
pixel_cnt = 0
for X, y in train_iter:
    sq_dev_sum += ((X - dataset_global_mean) ** 2).sum().item()
    pixel_cnt += X.numel()
# Take the square root once, after averaging the squared deviations over all pixels;
# averaging per-batch standard deviations would not give the dataset standard deviation
dataset_global_std = math.sqrt(sq_dev_sum / pixel_cnt)
print('Pixel standard deviation of the whole dataset: {}'.format(dataset_global_std))


# Rebuild the data iterators with normalization applied
batch_size = 64
train_iter, test_iter = load_data_fashion_mnist(batch_size,
                                                root='.',
                                                use_normalize=True,
                                                mean=dataset_global_mean,
                                                std=dataset_global_std)


def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, torch.nn.Module):
        # if no device is given, use the device that net is on
        device = list(net.parameters())[0].device
    net.eval() 
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
            n += y.shape[0]
    net.train() # switch back to training mode
    return acc_sum / n


def train_model(net,
                train_iter,
                test_iter,
                batch_size,
                optimizer,
                device,
                num_epochs):
    net = net.to(device)
    print("training on ", device)
    loss = torch.nn.CrossEntropyLoss()
    best_test_acc = 0
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)

        Loss_list.append(train_l_sum / batch_count)
        Accuracy_list.append(train_acc_sum / n)

        print('epoch %d, loss %.4f, train acc %.4f, test acc %.4f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))
        if test_acc > best_test_acc:
            print('find best! save at model_best.pth')
            best_test_acc = test_acc
            torch.save(net.state_dict(), 'model_best.pth')
            #utils.save_model({
            #    'arch': args.model,
            #    'state_dict': net.state_dict()
            #}, 'saved-models/{}-run-{}.pth.tar'.format(args.model, run))


print('Training...')
lr, num_epochs = 0.01, 100
optimizer = optim.Adam(net.parameters(), lr=lr)
#optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)   # TODO:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Lists that record the per-epoch training loss and training accuracy
Loss_list = []
Accuracy_list = []

train_model(net,
            train_iter,
            test_iter,
            batch_size,
            optimizer,
            device,
            num_epochs)


print('Loading the best model')
net.load_state_dict(torch.load('model_best.pth', map_location=device))
net = net.to(device)

print('Running inference on the test set')
net.eval()
sample_id = 0   # running index of test samples (avoids shadowing the built-in id)
preds_list = []
with torch.no_grad():
    for X, y in test_iter:
        batch_pred = net(X.to(device)).argmax(dim=1).cpu().tolist()
        for y_pred in batch_pred:
            preds_list.append((sample_id, y_pred))
            sample_id += 1

print('Writing the submission file')
with open('submission.csv', 'w') as f:
    f.write('ID,Prediction\n')
    for sample_id, pred in preds_list:
        f.write('{},{}\n'.format(sample_id, pred))

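### Plot how accuracy and loss change during training
# Training ran for 100 epochs, so x ranges over (0, 100); each epoch's training accuracy and loss is plotted against it
x1 = range(0, num_epochs)
x2 = range(0, num_epochs)
y1 = Accuracy_list   # training accuracy per epoch
y2 = Loss_list       # mean training loss per epoch
plt.subplot(2, 1, 1)
plt.plot(x1, y1, 'o-')
plt.title('Train accuracy vs. epochs')
plt.ylabel('Train accuracy')
plt.subplot(2, 1, 2)
plt.plot(x2, y2, '.-')
plt.xlabel('Train loss vs. epochs')
plt.ylabel('Train loss')
# Save before show(): once the window is closed the figure is empty
plt.savefig("accuracy_loss.jpg")
plt.show()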


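I am still too much of a beginner to push the accuracy any higher, so the final result is 94.82%. (My own lack of skill nearly brought me to tears...) Keep going!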

田纳尔多
