Building an AlexNet model in PyTorch for 10-class classification on CIFAR-10

Model overview

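LeNet performs very well on MNIST, but it was far less impressive on larger, real-world datasets. That changed in 2012 with AlexNet: the model won that year's ImageNet competition and set off a wave of interest in deep learning.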

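ImageNet is a large image dataset created by Fei-Fei Li's team, containing more than 14 million labeled images. Starting in 2010, ImageNet hosted an annual image classification and object detection competition, ILSVRC (ImageNet Large Scale Visual Recognition Challenge). The classification task covers 1,000 categories, with roughly 1,000 training images per category collected from different sources.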

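AlexNet consists of five convolutional layers followed by three fully connected layers. The output of the last fully connected layer is passed through a softmax to give the input image's scores over the 1,000 classes.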

Model architecture

The figure shows the architecture diagram from the AlexNet paper.

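The input layer takes a (227, 227, 3) image.

Layer 1: convolutional layer 1. The input is a 227 × 227 × 3 image (the paper states 224 × 224, but the output size computed here only works out with 227 × 227). There are 96 kernels, which the paper splits across two GPUs with 48 kernels each. Each kernel is 11 × 11 × 3. stride = 4 is the step size, and pad = 0 means the borders are not padded.

What size is the feature map after this convolution?

width = (227 + 2 * padding - kernel_size) / stride + 1 = (227 + 0 - 11) / 4 + 1 = 55

height = (227 + 2 * padding - kernel_size) / stride + 1 = (227 + 0 - 11) / 4 + 1 = 55

dimension = 96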
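As a quick check of the output-size formula, here is a minimal sketch in plain Python. The helper name `conv_output_size` is hypothetical (it is not a PyTorch function); it floors the division, which matches how PyTorch's Conv2d computes its output shape.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Layer 1 of AlexNet: 11 x 11 kernel, stride 4, no padding.
print(conv_output_size(227, kernel_size=11, stride=4, padding=0))  # 55

# With a 224 x 224 input, (224 - 11) / 4 is not an integer, which is why
# 227 x 227 is the size that makes the arithmetic come out to 55.
print((224 - 11) / 4)  # 53.25
```

The same helper applies to pooling windows, since they follow the same sliding-window arithmetic.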

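This is followed by Local Response Normalization (LRN),

then max pooling with pool_size = (3, 3), stride = 2, pad = 0.

The convolution itself produces 55 × 55 × 96. After pooling, (55 - 3) / 2 + 1 = 27, so the final output of the first layer is 27 × 27 × 96.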
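To make the LRN step concrete, the sketch below normalizes the channel values at a single spatial position, following the formula in the AlexNet paper with its reported hyperparameters (n = 5, k = 2, alpha = 1e-4, beta = 0.75). The function name and the list-based input are assumptions made for illustration. PyTorch ships its own nn.LocalResponseNorm layer, whose scaling convention may differ slightly from this sketch.

```python
def local_response_norm(activations, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize one spatial position across channels.

    `activations` holds the value of each channel at a single (x, y)
    location. Each value is divided by a term built from the squared
    activations of up to n neighbouring channels.
    """
    num_channels = len(activations)
    half = n // 2
    normalized = []
    for i, a in enumerate(activations):
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        sum_sq = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(a / (k + alpha * sum_sq) ** beta)
    return normalized


# Channel 0 sits next to a strongly activated channel, so it is damped
# slightly more than channel 6, which has only quiet neighbours.
print(local_response_norm([1.0, 50.0, 1.0, 0.0, 0.0, 0.0, 1.0]))
```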

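Layer 2: convolutional layer 2. The input is the feature map from the previous layer. There are 256 kernels, 128 on each of the two GPUs in the paper. Each kernel is 5 × 5 × 48 (each GPU sees only its own 48 input channels), with pad = 2 and stride = 1. LRN follows, then max pooling with pool_size = (3, 3), stride = 2.

Layer 3: convolutional layer 3. The input is the output of layer 2. There are 384 kernels with kernel_size = (3 × 3 × 256) and padding = 1. Layer 3 has no LRN and no pooling.

Layer 4: convolutional layer 4. The input is the output of layer 3. There are 384 kernels with kernel_size = (3 × 3) and padding = 1. As in layer 3, there is no LRN and no pooling.

Layer 5: convolutional layer 5. The input is the output of layer 4. There are 256 kernels with kernel_size = (3 × 3) and padding = 1, followed directly by max pooling with pool_size = (3, 3), stride = 2.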

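Layers 6, 7, and 8 are fully connected. Layers 6 and 7 each have 4096 neurons, and layer 8 feeds a 1000-way softmax because, as noted above, the ImageNet competition has 1,000 classes. The fully connected layers use ReLU and Dropout (on layers 6 and 7).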
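The layer sizes described above can be traced end to end to see where the 9216 inputs of the first fully connected layer (nn.Linear(9216, 4096) in the implementation later in this post) come from. This is a plain-Python sketch; the `LAYERS` table and the helper names are illustrative, and it assumes the 227 × 227 input.

```python
def out_size(size, kernel_size, stride, padding=0):
    """Output size of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (name, output channels or None for pooling, kernel_size, stride, padding)
LAYERS = [
    ("conv1", 96, 11, 4, 0),
    ("pool1", None, 3, 2, 0),
    ("conv2", 256, 5, 1, 2),
    ("pool2", None, 3, 2, 0),
    ("conv3", 384, 3, 1, 1),
    ("conv4", 384, 3, 1, 1),
    ("conv5", 256, 3, 1, 1),
    ("pool5", None, 3, 2, 0),
]


def trace_shapes(size=227, channels=3):
    """Return [(layer_name, spatial_size, channels)] after each layer."""
    shapes = []
    for name, out_channels, kernel_size, stride, padding in LAYERS:
        size = out_size(size, kernel_size, stride, padding)
        if out_channels is not None:  # pooling keeps the channel count
            channels = out_channels
        shapes.append((name, size, channels))
    return shapes


for name, size, channels in trace_shapes():
    print(f"{name}: {size} x {size} x {channels}")

_, final_size, final_channels = trace_shapes()[-1]
print(final_size * final_size * final_channels)  # 9216
```

The spatial size goes 227, 55, 27, 27, 13, 13, 13, 13, 6, so the flattened feature vector has 6 × 6 × 256 = 9216 entries.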

ReLU as the activation function

To speed up training, AlexNet uses ReLU as its activation function.
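One way to see why ReLU can speed up training: its gradient stays at 1 for any positive input, whereas a saturating function such as tanh has a gradient that shrinks toward 0 as the input grows. The following plain-Python sketch compares the two; the function names are illustrative.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, which vanishes for large |x|.
    return 1.0 - math.tanh(x) ** 2


for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu={relu(x)}, relu'={relu_grad(x)}, tanh'={tanh_grad(x):.6f}")
```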

Multiple techniques to reduce overfitting

Data augmentation and dropout are used to reduce overfitting.
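As an illustration of the dropout idea, here is a minimal plain-Python sketch of "inverted" dropout, the variant PyTorch's nn.Dropout uses: during training each value is zeroed with probability p and the survivors are scaled by 1 / (1 - p), and at evaluation time the input passes through unchanged. The function name, the list-based input, and the injectable `rng` are assumptions made for this example. The original paper instead scales outputs by 0.5 at test time.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of floats (assumes 0 <= p < 1)."""
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng))   # some entries zeroed, others doubled
print(dropout([1.0, 2.0, 3.0, 4.0], training=False))   # unchanged at evaluation time
```

This is also why the evaluation code has to switch the model out of training mode: otherwise units would still be dropped at random during prediction.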

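Training on multiple GPUs

The architecture diagram is split into an upper and a lower path because the network was trained across two GPUs.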
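On a single device, the two-path split corresponds to a grouped convolution. The arithmetic below is a plain-Python sketch with a hypothetical helper `conv_params`. It shows how splitting layer 2 into two groups yields 5 × 5 × 48 kernels and roughly halves the parameter count. In PyTorch this would map to the `groups` argument of nn.Conv2d. The implementation in this post does not use it and runs full, ungrouped convolutions.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights plus biases in a 2D convolution layer."""
    weights = out_channels * (in_channels // groups) * kernel_size * kernel_size
    return weights + out_channels


# Layer 2: 96 input channels, 256 kernels of spatial size 5 x 5.
print(96 // 2)                            # 48 input channels seen by each kernel
print(conv_params(96, 256, 5, groups=2))  # 307456 (two-path split)
print(conv_params(96, 256, 5, groups=1))  # 614656 (single path)
```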

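Implementation

Dataset

The CIFAR-10 dataset consists of 60,000 color images of size 32 × 32 in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images are resized to 227 × 227 to match the network input and normalized per channel: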

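import torchvision.transforms as transforms

# Per-channel mean and standard deviation used for normalization
# (commonly cited CIFAR-10 training-set statistics).
CIFAR10_MEAN = [0.49139968, 0.48215841, 0.44653091]
CIFAR10_STD = [0.24703223, 0.24348513, 0.26158784]

# Training: resize to the AlexNet input size and add a random horizontal flip.
train_tf = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
# Validation/test: same preprocessing without augmentation.
valid_tf = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])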
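For reference, transforms.Normalize subtracts the per-channel mean and divides by the per-channel standard deviation, after ToTensor has scaled pixel values to [0, 1]. The plain-Python sketch below applies that arithmetic to a single RGB pixel; the helper `normalize_pixel` is hypothetical and exists only for illustration.

```python
MEAN = [0.49139968, 0.48215841, 0.44653091]
STD = [0.24703223, 0.24348513, 0.26158784]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


print(normalize_pixel([1.0, 1.0, 1.0]))  # a white pixel
print(normalize_pixel(MEAN))             # the mean colour maps to [0.0, 0.0, 0.0]
```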

import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
import torchvision.transforms as transforms
# torchvision.transforms provides a set of data transforms: conversions between
# PIL images, NumPy arrays, and tensors, plus preprocessing operations.
batch_size = 64
# CIFAR-10 dataset
train_dataset = dsets.CIFAR10(root='/ml/cifar',    # root directory for the data
                              train=True,          # use the training split
                              transform=train_tf,  # training transforms (resize, flip, normalize)
                              download=True)       # download the data if not already present
test_dataset = dsets.CIFAR10(root='/ml/cifar',     # root directory for the data
                             train=False,          # use the test split
                             transform=valid_tf,   # test transforms (resize, normalize)
                             download=True)        # download the data if not already present
# Wrap the datasets in loaders that yield mini-batches
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)   # shuffle the training data every epoch
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
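A small piece of arithmetic helps when reading the training log later: with batch_size = 64 and the default DataLoader behaviour of keeping the last, smaller batch, the number of batches per epoch works out as follows. This is a plain-Python sketch using the dataset sizes stated earlier.

```python
import math

batch_size = 64
train_batches = math.ceil(50_000 / batch_size)
test_batches = math.ceil(10_000 / batch_size)
print(train_batches, test_batches)  # 782 157
print(50_000 % batch_size)          # 16 images in the last training batch
```

Since the training loop prints the loss every 100 batches, each epoch produces 8 loss lines (at batches 0, 100, ..., 700).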

# Define the AlexNet model
import torch.nn as nn

class Alexnet(nn.Module):
    def __init__(self, in_dim, n_class):
        super().__init__()
        # Feature extractor: five convolutional layers. The paper's LRN layers are
        # omitted; the commented-out BatchNorm2d lines are an optional alternative.
        self.conv = nn.Sequential(
            nn.Conv2d(in_dim, 96, 11, stride=4, padding=0),   # 227 -> 55
            # nn.BatchNorm2d(96),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2),                               # 55 -> 27

            nn.Conv2d(96, 256, 5, stride=1, padding=2),       # 27 -> 27
            # nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2),                               # 27 -> 13

            nn.Conv2d(256, 384, 3, stride=1, padding=1),      # 13 -> 13
            # nn.BatchNorm2d(384),
            nn.ReLU(True),

            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            # nn.BatchNorm2d(384),
            nn.ReLU(True),

            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            # nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2)                                # 13 -> 6
        )

        # Classifier: three fully connected layers (9216 = 256 * 6 * 6)
        self.fc = nn.Sequential(
            nn.Linear(9216, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, n_class)
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)  # flatten (batch, 256, 6, 6) to (batch, 256 * 6 * 6)
        output = self.fc(x)
        return output
        

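# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = Alexnet(3, 10)  # 3 input channels (RGB), 10 CIFAR-10 classes
model.to(device)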
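To get a feel for the size of this network, the parameter count can be worked out by hand from the layer definitions. The sketch below is plain Python and mirrors the Conv2d and Linear shapes used in the Alexnet class; the helper names are illustrative.

```python
def conv_params(c_in, c_out, k):
    return c_out * c_in * k * k + c_out  # weights + biases


def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


def alexnet_param_count(in_dim=3, n_class=10):
    """Total trainable parameters for the layer shapes used in this post."""
    convs = [(in_dim, 96, 11), (96, 256, 5), (256, 384, 3),
             (384, 384, 3), (384, 256, 3)]
    fcs = [(9216, 4096), (4096, 4096), (4096, n_class)]
    return (sum(conv_params(*c) for c in convs)
            + sum(linear_params(*f) for f in fcs))


print(alexnet_param_count())  # 58322314
```

Most of the roughly 58 million parameters sit in the fully connected layers. With PyTorch available, sum(p.numel() for p in model.parameters()) should give the same total for the model defined above.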



learning_rate = 1e-2  # learning rate
num_epochs = 20
criterion = nn.CrossEntropyLoss()
# Stochastic gradient descent with momentum
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
model.train()  # switch to training mode (enables dropout)
for epoch in range(num_epochs):
    print('current epoch = %d' % epoch)
    for i, (images, labels) in enumerate(train_loader):  # iterate over mini-batches
        images = images.to(device)
        labels = labels.to(device)
        # print(images.shape)  # [batch, channel, height, width]
        outputs = model(images)            # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        optimizer.zero_grad()              # clear gradients left over from the previous step
        loss.backward()                    # backpropagate the loss
        optimizer.step()                   # update the parameters

        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())

print('finished training')



# Evaluate on the test set
total = 0
correct = 0
model.eval()  # switch to evaluation mode (disables dropout)

with torch.no_grad():  # gradients are not needed for inference
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicts = torch.max(outputs, 1)  # index of the highest score per image
        total += labels.size(0)
        correct += (predicts == labels).sum().item()

print(total)
print('Accuracy = %.2f' % (100 * correct / total))


# Output:
10000
Accuracy = 83.83
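A single overall accuracy hides which classes the model confuses. A possible extension is a per-class breakdown, sketched below in plain Python. It assumes the true labels and predictions have been collected into integer lists (for example with .tolist() inside the evaluation loop). The helper `per_class_accuracy` is hypothetical, and the sample labels are toy values, not real model output.

```python
from collections import Counter

# CIFAR-10 class names in label-index order.
CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def per_class_accuracy(labels, predictions):
    """Return {class_name: accuracy in percent} for classes present in labels."""
    total = Counter(labels)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    return {CLASSES[c]: 100.0 * correct[c] / total[c] for c in sorted(total)}


# Toy labels and predictions for illustration only.
print(per_class_accuracy([0, 0, 3, 3, 5], [0, 1, 3, 3, 3]))
# {'airplane': 50.0, 'cat': 100.0, 'dog': 0.0}
```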

References

AlexNet paper: papers.nips.cc/paper/4824-…

CIFAR dataset: www.cs.toronto.edu/~kriz/cifar…