A Hands-On Guide to Building Your First Convolutional Neural Network Model

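Overview

This article briefly discusses convolutional neural networks (CNNs), a class of neural networks designed for image-related tasks.

The main focus is on the implementation of a CNN.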

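Introduction
Main components of the CNN model architecture
- Convolutional layer
- Pooling layer
- Fully connected layer
Code implementation
- Step 1: Importing the necessary libraries
- Step 2: Downloading the training and test datasets
- Step 3: Splitting the training set for training and validation
- Step 4: Loading the datasets into memory using DataLoader
- Step 5: Defining the architecture
- Step 6: Defining the loss function
- Step 7: Implementing the training and validation algorithm
- Step 8: Training and evaluation phase
- Step 9: Testing phase
- Step 10: Testing with a sample
Conclusion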

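Introduction

Convolutional neural networks were introduced by Yann LeCun and Yoshua Bengio in 1995 and have since shown remarkable results in the image domain.

So what makes them special compared with ordinary neural networks when applied to images?

I will explain one of the reasons with a simple example. Suppose the task is to classify images of handwritten digits; some samples from the training set are given below.

If you look closely, you will notice that every digit appears at the center of its image. If the test images are of a similar kind, an ordinary neural network trained on these images may give good results.

But what if the test image looks like the one below?

Here the digit nine appears in a corner of the image. If we use a simple neural network model to classify this image, the model may well fail.

However, if the same test image is given to a CNN model, it is likely to classify it correctly. The reason for the better performance is that a CNN looks for spatial features in the image.

In the case above, even though the digit 9 sits in the left corner of the frame, a trained CNN model captures the features in the image and will most likely predict that the digit is a 9. An ordinary neural network cannot do this to the same degree.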
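
The difference can be illustrated with a toy sketch. It is not taken from the article's model: the 2x2 "blob" pattern, the 6x6 image size, and the helper names are all hypothetical. A position-specific weight template stands in for an ordinary network, and a sliding filter followed by a global maximum stands in for a CNN.

```python
# Toy illustration (all names and sizes are hypothetical): a 2x2 blob stands in for a digit.
PATTERN = [[1, 1],
           [1, 1]]

def place(pattern, size, top, left):
    """Return a size x size image of zeros with `pattern` pasted at (top, left)."""
    img = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, value in enumerate(row):
            img[top + i][left + j] = value
    return img

def template_score(img, template):
    """Ordinary-network style: one weight per pixel position, summed over the image."""
    return sum(p * w
               for img_row, t_row in zip(img, template)
               for p, w in zip(img_row, t_row))

def conv_max_score(img, kernel):
    """CNN style: slide the kernel over every position and keep the strongest response."""
    k, n = len(kernel), len(img)
    best = float("-inf")
    for top in range(n - k + 1):
        for left in range(n - k + 1):
            response = sum(img[top + i][left + j] * kernel[i][j]
                           for i in range(k) for j in range(k))
            best = max(best, response)
    return best

centered = place(PATTERN, 6, 2, 2)   # pattern in the middle, like the training samples
cornered = place(PATTERN, 6, 0, 0)   # same pattern moved to a corner

# Assume the position-specific weights were fitted to centered data.
template = centered

print(template_score(centered, template), template_score(cornered, template))  # 4 0
print(conv_max_score(centered, PATTERN), conv_max_score(cornered, PATTERN))    # 4 4
```

The template responds only when the pattern is where it was during training, while the sliding filter gives the same peak response wherever the pattern appears.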

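Now let us briefly discuss the main building blocks of a CNN.

Main components of the CNN model architecture

Here is a simple CNN model that classifies whether an image contains a cat.

The main components of a CNN are:

Convolutional layer
Pooling layer
Fully connected layer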

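Convolutional layer

The convolutional layer helps us extract the features present in an image. This extraction is done with the help of filters.

Observe the following operation.

Here we can see a window sliding over the whole image, where the image is represented as a grid.

Now let us see how the convolution operation is carried out.

Suppose the input feature map is our image and the convolution filter is the window we slide over it.

Now let us look at a single instance of the convolution operation.

When the convolution filter is overlaid on the image, the corresponding elements are multiplied. The products are then summed to give a single value, which is filled into the output feature map.

This operation is repeated as the window slides over the input feature map, until the output feature map is completely filled.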
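
The multiply-and-sum procedure described above can be sketched in plain Python. This is a minimal illustration with no padding and a stride of 1; the 5x5 image and 3x3 filter values are made up for the example, and the function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the output feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply overlapping elements and add the products together
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Example values (illustrative only)
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5x5 input with a 3x3 filter gives a 3x3 output, because the window fits in only three positions along each axis.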

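Pooling layer

The idea behind the pooling layer is to reduce the dimensions of the feature map. In the representation given below, we use a 2*2 max pooling layer. Each time the window slides over the image, we take the maximum value inside the window.

Finally, after the max pooling operation, the input dimension of 4*4 has been reduced to 2*2.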
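
As a rough sketch of the same idea in plain Python, the function below applies 2*2 max pooling with a stride of 2 to a 4*4 feature map. The input values and the name `max_pool2d` are illustrative assumptions, not taken from the article's figures.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window, moving `stride` cells at a time."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, stride):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Example 4*4 feature map (illustrative values)
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```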

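Fully connected layer

This layer sits at the tail of the CNN model architecture. The input to the fully connected layers is the set of rich features extracted by the convolution filters. These are propagated forward to the output layer, where we obtain the probability of the input image belonging to each class. The predicted output is the class to which the model assigns the highest probability.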
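
A small sketch of this last stage, under stated assumptions: the two input features, the weights, and the three class names below are invented for illustration. A dense layer turns features into one score per class, softmax turns scores into probabilities, and the prediction is the class with the highest probability.

```python
import math

def dense(features, weights, bias):
    """One fully connected layer: a weighted sum of the features per output unit."""
    return [sum(w * f for w, f in zip(unit_weights, features)) + b
            for unit_weights, b in zip(weights, bias)]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values for illustration only
features = [1.0, 2.0]
weights = [[1.0, 0.5], [0.2, 0.1], [-1.0, 0.3]]
bias = [0.0, 0.0, 0.0]
labels = ["cat", "dog", "bird"]

scores = dense(features, weights, bias)
probs = softmax(scores)
predicted = labels[probs.index(max(probs))]
print(predicted)  # cat
```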

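Code implementation

Here we use Fashion MNIST as our problem dataset.

The dataset contains images of T-shirts, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots. The task is to train a model that classifies a given image into one of these categories.

We will implement the code in Google Colab, since it offers free use of GPU resources for a limited period of time.

If you are not familiar with the Colab environment and GPUs, check out this blog (https://www.analyticsvidhya.com/blog/2021/05/a-complete-hands-on-guide-to-train-your-neural-network-model-on-google-colab-gpu/) to get a better idea.

The architecture of the CNN we are going to build is given below.

Step 1: Importing the necessary libraries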

import numpy as np
import matplotlib.pyplot as plt  # used to display images
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import random_split
from torch.utils.data.dataloader import DataLoader
import torch.nn as nn
from torch.nn import functional as F

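Step 2: Downloading the training and test datasets

# Both splits are stored under ./data; ToTensor converts images to tensors scaled to [0, 1]
train_set = torchvision.datasets.FashionMNIST("./data", download=True, train=True, transform=
                                                transforms.Compose([transforms.ToTensor()]))
test_set = torchvision.datasets.FashionMNIST("./data", download=True, train=False, transform=
                                               transforms.Compose([transforms.ToTensor()]))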

Step 3: Splitting the training set for training and validation

train_size = 48000
val_size = 60000 - train_size
train_ds,val_ds = random_split(train_set,[train_size,val_size])
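
Conceptually, a random split shuffles the sample indices and cuts the shuffled list at the requested size. The sketch below shows that idea in plain Python for the 48000/12000 split used here. The function `split_indices` and its `seed` argument are hypothetical, and this is not the library's actual implementation.

```python
import random

def split_indices(n_items, train_size, seed=0):
    """Shuffle the indices 0..n_items-1 and cut them into a training and a validation part."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # a fixed seed keeps the split reproducible
    return indices[:train_size], indices[train_size:]

train_idx, val_idx = split_indices(60000, 48000)
print(len(train_idx), len(val_idx))  # 48000 12000
```

Every sample ends up in exactly one of the two parts, so the validation set stays unseen during training.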

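Step 4: Loading the datasets into memory using DataLoader

train_dl = DataLoader(train_ds, batch_size=20, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=20, shuffle=True)
test_dl = DataLoader(test_set, batch_size=20, shuffle=False)
# Dictionary of loaders expected by the training and testing functions
loaders_scratch = {'train': train_dl, 'valid': val_dl, 'test': test_dl}
classes = train_set.classes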

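Now let us visualize the loaded data:

for imgs, labels in train_dl:
  for img in imgs:
    arr_ = img.squeeze()  # drop the channel dimension: (1, 28, 28) -> (28, 28)
    plt.imshow(arr_, cmap='gray')
    plt.show()
    break
  break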

Step 5: Defining the architecture

import torch.nn as nn
import torch.nn.functional as F

# Define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional layer 1: 1 input channel, 6 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(1, 6, 5, padding=0)
        # Convolutional layer 2: 6 input channels, 10 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 10, 5, padding=0)
        # Max pooling layer: 2x2 window, stride 2
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer 1
        self.ff1 = nn.Linear(4 * 4 * 10, 56)
        # Fully connected layer 2: one output per class
        self.ff2 = nn.Linear(56, 10)

    def forward(self, x):
        # Sequence of convolutional and max pooling layers
        # Input dim: 28*28*1
        x = self.conv1(x)
        # After the convolution operation, output dim: 24*24*6
        x = self.pool(x)
        # After the max pool operation, output dim: 12*12*6
        x = self.conv2(x)
        # After the convolution operation, output dim: 8*8*10
        x = self.pool(x)
        # After the max pool operation, output dim: 4*4*10
        x = x.view(-1, 4 * 4 * 10)  # Flatten to the input shape of the fully connected layer
        x = F.relu(self.ff1(x))  # Apply ReLU to the output of the first fully connected layer
        # Return raw scores (logits): nn.CrossEntropyLoss applies log-softmax itself,
        # so no sigmoid or softmax is applied here
        x = self.ff2(x)
        return x

# create a complete CNN
model_scratch = Net()
print(model_scratch)

# move tensors to GPU if CUDA is available
use_cuda = torch.cuda.is_available()
if use_cuda:
    model_scratch.cuda()
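
The dimensions noted in the comments of the forward pass follow from the standard output-size formula, output = (input + 2 * padding - kernel) / stride + 1, rounded down. The helper names below are hypothetical; the sketch only checks the arithmetic for a 28*28 input, 5*5 kernels without padding, and 2*2 pooling with stride 2.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a pooling layer along one axis."""
    return (size - kernel) // stride + 1

size = 28
trace = [size]
size = conv_out(size, kernel=5); trace.append(size)   # first convolution
size = pool_out(size);           trace.append(size)   # first max pool
size = conv_out(size, kernel=5); trace.append(size)   # second convolution
size = pool_out(size);           trace.append(size)   # second max pool

flat_features = size * size * 10   # 10 channels after the second convolution
print(trace, flat_features)        # [28, 24, 12, 8, 4] 160
```

The final value, 160, is the input size of the first fully connected layer (4*4*10).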

Step 6: Defining the loss function

# Loss function 
import torch.nn as nn
import torch.optim as optim
criterion_scratch = nn.CrossEntropyLoss()
def get_optimizer_scratch(model):
    optimizer = optim.SGD(model.parameters(),lr = 0.04)
    return optimizer
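
For intuition, the cross-entropy loss for a single sample can be written as log-sum-exp of the scores minus the score of the true class. nn.CrossEntropyLoss applies the log-softmax step internally, so it expects raw scores. The function below is a plain-Python sketch of that formula with made-up scores, not the library implementation.

```python
import math

def cross_entropy(scores, target):
    """Cross-entropy for one sample: log(sum(exp(scores))) - scores[target]."""
    peak = max(scores)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum_exp - scores[target]

scores = [2.0, 0.5, -1.0]                 # illustrative raw scores for 3 classes
loss_correct = cross_entropy(scores, 0)   # true class has the highest score
loss_wrong = cross_entropy(scores, 2)     # true class has the lowest score
print(round(loss_correct, 4), round(loss_wrong, 4))  # 0.2413 3.2413

# If the scores were squashed into (0, 1) first, the best case for 10 classes
# approaches scores of [1, 0, ..., 0], and the loss cannot drop below about 1.46
squashed_floor = cross_entropy([1.0] + [0.0] * 9, 0)
print(round(squashed_floor, 4))  # 1.4612
```

The loss is small when the true class gets the highest score and grows as the model favors the wrong class.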

Step 7: Implementing the training and validation algorithm

# Implementing the training algorithm
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = float('inf')
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        # train phase #
        # setting the module to training mode
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - train_loss))
        # validate the model #
        # set the model to evaluation mode
        model.eval()
        # gradients are not needed for validation, so disable tracking to save memory
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                output = model(data)
                loss = criterion(output, target)
                # update the running average of the validation loss
                valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))
        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch,
            train_loss,
            valid_loss
            ))
        # if the validation loss has decreased, save the model
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss 
    return model
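
The loss update used in the loops above, loss_avg = loss_avg + (1 / (batch_idx + 1)) * (loss - loss_avg), is an incremental way of computing the mean of the batch losses without storing them. A small plain-Python check with made-up loss values (the function name `running_mean` is hypothetical):

```python
def running_mean(values):
    """Incrementally average `values` using the same update rule as the training loop."""
    mean = 0.0
    for batch_idx, value in enumerate(values):
        mean = mean + (1 / (batch_idx + 1)) * (value - mean)
    return mean

batch_losses = [0.9, 0.7, 0.5, 0.3]   # illustrative per-batch losses
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)
print(round(incremental, 6), round(direct, 6))  # 0.6 0.6
```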

Step 8: Training and evaluation phase

num_epochs = 15
model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), 
                      criterion_scratch, use_cuda, 'model_scratch.pt')

Note that we save the state of the model every time the validation loss decreases.
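
The checkpointing rule can be traced on a hypothetical list of validation losses. The sketch below records which epochs would trigger a save, using the same "less than or equal to the best so far" comparison as the training function. The loss values and the name `epochs_that_save` are invented for illustration.

```python
def epochs_that_save(valid_losses):
    """Return the epochs at which the model would be saved, and the best loss seen."""
    best = float("inf")
    saved_epochs = []
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss <= best:
            saved_epochs.append(epoch)   # stands in for torch.save(...)
            best = loss
    return saved_epochs, best

history = [0.62, 0.55, 0.58, 0.51, 0.53]   # hypothetical validation losses per epoch
saved_epochs, best_loss = epochs_that_save(history)
print(saved_epochs, best_loss)  # [1, 2, 4] 0.51
```

The file on disk therefore always holds the weights from the epoch with the lowest validation loss, even if later epochs get worse.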

Step 9: Testing phase

def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0
    total = 0
    # set the module to evaluation mode
    model.eval()
    # gradients are not needed at test time
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loaders['test']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update the running average of the test loss
            test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.item() - test_loss))
            # the predicted class is the index of the highest score
            pred = output.max(1, keepdim=True)[1]
            # compare predictions to the true labels
            correct += pred.eq(target.view_as(pred)).sum().item()
            total += data.size(0)
    print('Test Loss: {:.6f}\n'.format(test_loss))
    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

# load the model that achieved the lowest validation loss
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
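
The accuracy bookkeeping in the test function amounts to taking the index of the highest score for each sample and counting how often it matches the label. A plain-Python sketch with invented scores and labels (the helper names are hypothetical):

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)

def count_correct(batch_scores, targets):
    """Return (number of correct predictions, number of samples) for one batch."""
    predictions = [argmax(scores) for scores in batch_scores]
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return correct, len(targets)

# Illustrative batch: 4 samples, 3 classes
batch_scores = [
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.4, 0.3, 0.2],
]
targets = [1, 0, 1, 0]

correct, total = count_correct(batch_scores, targets)
print('Accuracy: %d%% (%d/%d)' % (100 * correct / total, correct, total))  # Accuracy: 75% (3/4)
```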

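Step 10: Testing with a sample

The following function tests the model with a single image.

def predict_image(img, model):
    # Convert to a batch of 1 and move it to the same device as the model
    xb = img.unsqueeze(0).to(next(model.parameters()).device)
    # Get predictions from the model without tracking gradients
    model.eval()
    with torch.no_grad():
        yb = model(xb)
    # Pick the index with the highest score
    _, preds = torch.max(yb, dim=1)
    # Show the image
    plt.imshow(img.squeeze(), cmap='gray')
    # Return the class label predicted for the image
    return train_set.classes[preds[0].item()]

img, label = test_set[9]
print('Label:', test_set.classes[label], '| Predicted:', predict_image(img, model_scratch))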

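Conclusion

In this article we briefly discussed the main operations in a convolutional neural network and its architecture. We also implemented a simple convolutional neural network model to get a better feel for a practical use case.

You can find the code in my GitHub repo: https://github.com/radathan1/Hands-on-Guide-to-Built-Your-First-Convolutional-Neural-Network-model

In addition, you can improve the performance of the implemented model by augmenting the dataset and by using regularization techniques such as batch normalization and dropout in the fully connected layers of the architecture.

Keep in mind that pre-trained CNN models, which have been trained on large datasets, are also available. Using these state-of-the-art models is likely to give you better metric scores on a given problem.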

