Deep Learning: Image Classification Based on the ResNet18 Network

1. Foreword

In this task, we use the ResNet18 network to work through a typical image classification problem.

The ResNet family is among the best-known architectures in image classification. It has stood the test of time and still has broad research value and application scenarios today; industry has produced many improved variants, and ResNets remain a common choice for image recognition tasks.

Here I mainly introduce the ResNet-18 network structure as a case study; the other, deeper networks in the series can be derived analogously.

In ResNet-18, the number denotes the depth of the network. Does that mean the network has 18 layers in total? Not quite: the 18 counts only the layers with weights, i.e. the convolutional and fully connected layers, and excludes the pooling and BN layers.
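
Concretely, the count works out as follows: one initial 7×7 convolution, then four stages of two residual blocks each, where every block contains two 3×3 convolutions (4 × 2 × 2 = 16), plus one final fully connected layer, giving 1 + 16 + 1 = 18. The 1×1 convolutions in the downsampling shortcuts carry weights too, but by convention they are not included in the count.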

Image classification is a basic task in computer vision: it assigns images to different categories according to their semantic content. Many other tasks can also be converted into image classification. For example, face detection asks whether a region contains a face, which can be treated as a binary image classification task.

  • Dataset: the classic CIFAR-10 dataset from computer vision
  • Network: the ResNet18 model
  • Optimizer: Adam
  • Loss function: cross-entropy loss
  • Evaluation metric: accuracy

Introduction to the ResNet network: ResNet is built from residual blocks, in which a shortcut (skip) connection adds each block's input to its output; this residual formulation mitigates the degradation problem when training very deep networks.

(Figure: ResNet network architecture, omitted)

2. Data Preprocessing

2.1 Dataset introduction

The CIFAR-10 dataset contains 60,000 images in 10 categories, with 6,000 images per category, each of size 32 × 32 pixels. The data ships as five training batches (data_batch_1 to data_batch_5) and one test batch (test_batch), each holding 10,000 images.
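
For later reference, the ten classes are indexed 0–9 in the following fixed order defined by the dataset (this mapping is used when interpreting predicted labels in Section 6):

# CIFAR-10 class names, in label-index order (0-9)
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']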

2.2 Data reading

In this experiment, the original training set is split into two parts, train_set and dev_set, with 40,000 and 10,000 samples respectively: data_batch_1 through data_batch_4 serve as the training set, data_batch_5 as the validation set, and test_batch as the test set. The final datasets are:

  • Training set: 40,000 samples.
  • Validation set: 10,000 samples.
  • Test set: 10,000 samples.

The code to read a batch of data is as follows:

import os
import pickle
import numpy as np

def load_cifar10_batch(folder_path, batch_id=1, mode='train'):
    if mode == 'test':
        file_path = os.path.join(folder_path, 'test_batch')
    else:
        file_path = os.path.join(folder_path, 'data_batch_'+str(batch_id))

    # load the pickled batch file (CIFAR-10 batches are encoded with latin1)
    with open(file_path, 'rb') as batch_file:
        batch = pickle.load(batch_file, encoding='latin1')

    # reshape flat pixel rows to (N, 3, 32, 32) and scale values to [0, 1]
    imgs = batch['data'].reshape((len(batch['data']), 3, 32, 32)) / 255.
    labels = batch['labels']

    return np.array(imgs, dtype='float32'), np.array(labels)

imgs_batch, labels_batch = load_cifar10_batch(folder_path='datasets/cifar-10-batches-py', 
                                                batch_id=1, mode='train')

View the dimensions of the data:

# print the shapes of X and y in one batch
print ("batch of imgs shape: ",imgs_batch.shape, "batch of labels shape: ", labels_batch.shape)

batch of imgs shape: (10000, 3, 32, 32) batch of labels shape: (10000,)

Let's visually inspect one sample image and its corresponding label; the code is as follows:

%matplotlib inline
import matplotlib.pyplot as plt

image, label = imgs_batch[1], labels_batch[1]
print("The label in the picture is {}".format(label))
plt.figure(figsize=(2, 2))
plt.imshow(image.transpose(1,2,0))
plt.savefig('cnn-car.pdf')

2.3 Constructing the Dataset class

Construct a CIFAR10Dataset class that inherits from the paddle.io.Dataset class and returns samples one at a time. The code is implemented as follows:

import paddle
import paddle.io as io
from paddle.vision.transforms import Normalize

class CIFAR10Dataset(io.Dataset):
    def __init__(self, folder_path='/home/aistudio/cifar-10-batches-py', mode='train'):
        if mode == 'train':
            # load data_batch_1 to data_batch_4 as the training set
            self.imgs, self.labels = load_cifar10_batch(folder_path=folder_path, batch_id=1, mode='train')
            for i in range(2, 5):
                imgs_batch, labels_batch = load_cifar10_batch(folder_path=folder_path, batch_id=i, mode='train')
                self.imgs, self.labels = np.concatenate([self.imgs, imgs_batch]), np.concatenate([self.labels, labels_batch])
        elif mode == 'dev':
            # load data_batch_5 as the validation set
            self.imgs, self.labels = load_cifar10_batch(folder_path=folder_path, batch_id=5, mode='dev')
        elif mode == 'test':
            # load the test set
            self.imgs, self.labels = load_cifar10_batch(folder_path=folder_path, mode='test')
        # normalize with per-channel mean/std values commonly used for CIFAR-10
        self.transform = Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010], data_format='CHW')

    def __getitem__(self, idx):
        img, label = self.imgs[idx], self.labels[idx]
        img = self.transform(img)
        return img, label

    def __len__(self):
        return len(self.imgs)

paddle.seed(100)
train_dataset = CIFAR10Dataset(folder_path='datasets/cifar-10-batches-py', mode='train')
dev_dataset = CIFAR10Dataset(folder_path='datasets/cifar-10-batches-py', mode='dev')
test_dataset = CIFAR10Dataset(folder_path='datasets/cifar-10-batches-py', mode='test')
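
As a quick sanity check, you can confirm that the dataset sizes match the split described in Section 2.2:

# expected output: 40000 10000 10000
print(len(train_dataset), len(dev_dataset), len(test_dataset))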

3. Model Construction

We build the image classification model with the ResNet18 provided by PaddlePaddle's high-level API.

from paddle.vision.models import resnet18

resnet18_model = resnet18()
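
Note that resnet18() builds its final fully connected layer with 1,000 outputs by default (the ImageNet setting). For a 10-class problem such as CIFAR-10, the head can be sized to match by passing num_classes; a minimal sketch of this alternative is shown below (the experiment in this post keeps the default model):

from paddle.vision.models import resnet18

# ResNet18 with a 10-way classification head for CIFAR-10
resnet18_model = resnet18(num_classes=10)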

PaddlePaddle's high-level API is a further encapsulation and upgrade of the basic paddle API: it provides a more concise, easier-to-use interface and lowers the learning curve. The high-level API encapsulates the following modules:

  1. A Model class, which supports model training with only a few lines of code (see the sketch after this list);
  2. An image preprocessing module with dozens of data processing functions, covering most common data processing and data augmentation methods;
  3. Common models from computer vision and natural language processing, including but not limited to mobilenet, resnet, yolov3, cyclegan, bert, transformer, seq2seq, etc., together with their pre-trained weights; these models can be used directly or fine-tuned for secondary development.
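
As an illustration of point 1, a minimal training sketch with the high-level paddle.Model API might look as follows (the hyperparameters here are placeholders, not the configuration used in this experiment):

import paddle
from paddle.vision.models import resnet18

# wrap the network in the high-level Model class
model = paddle.Model(resnet18(num_classes=10))
model.prepare(optimizer=paddle.optimizer.Adam(parameters=model.parameters()),
              loss=paddle.nn.CrossEntropyLoss(),
              metrics=paddle.metric.Accuracy())
# train on the Dataset objects built in Section 2.3
model.fit(train_dataset, dev_dataset, epochs=1, batch_size=64)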

4. Model Training

Reuse the RunnerV3 class: instantiate it and pass in the training configuration. Train with the training and validation sets for 30 epochs, saving the model with the highest validation accuracy as the best model. The code is implemented as follows:

import paddle.nn.functional as F
import paddle.optimizer as opt
from nndl import RunnerV3, Accuracy

# select the running device
use_gpu = paddle.get_device().startswith("gpu")
if use_gpu:
    paddle.set_device('gpu:0')
# learning rate
lr = 0.001
# batch size
batch_size = 64
# build the data loaders
train_loader = io.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = io.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = io.DataLoader(test_dataset, batch_size=batch_size)
# define the network
model = resnet18_model
# define the optimizer: Adam with L2 regularization (weight decay);
# Sections 7.3.3.2 and 7.6.2 cover these in detail
optimizer = opt.Adam(learning_rate=lr, parameters=model.parameters(), weight_decay=0.005)
# define the loss function
loss_fn = F.cross_entropy
# define the evaluation metric
metric = Accuracy(is_logist=True)
# instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# start training
log_steps = 3000
eval_steps = 3000
runner.train(train_loader, dev_loader, num_epochs=30, log_steps=log_steps, 
                eval_steps=eval_steps, save_path="best_model.pdparams")

Visualize the accuracy and loss curves on the training and validation sets:

from nndl import plot

plot(runner, fig_name='cnn-loss4.pdf')

In this experiment, the Adam optimizer introduced in Chapter 7 is used for network optimization. With plain SGD, the model tends to overfit and fails to converge to a good result on the validation set. You can try other optimization strategies from Chapter 7 in the training configuration to reach higher model accuracy.
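
For instance, an alternative configuration with SGD plus momentum might look as follows (the learning rate and momentum values here are illustrative placeholders, not tuned settings):

# hypothetical alternative: SGD with momentum instead of Adam
optimizer = opt.Momentum(learning_rate=0.01, momentum=0.9,
                         parameters=model.parameters(), weight_decay=0.005)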

5. Model Evaluation

Use the test data to evaluate the best model saved during the training process, and observe the accuracy and loss of the model on the test set. The code is implemented as follows:

# load the best model saved during training
runner.load_model('best_model.pdparams')
# evaluate the model
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

[Test] accuracy/loss: 0.7234/0.8324

6. Model Prediction

Similarly, you can use the saved model to run prediction on the test set and observe the model's behavior. The specific code is implemented as follows:

# fetch one batch of data from the test set
X, label = next(test_loader())
logits = runner.predict(X)
# multi-class task: use softmax to turn logits into predicted probabilities
pred = F.softmax(logits)
# take the class with the highest probability (here for sample index 2 in the batch)
pred_class = paddle.argmax(pred[2]).numpy()
label = label[2][0].numpy()
# print the true and predicted classes
print("The true category is {} and the predicted category is {}".format(label[0], pred_class[0]))
# visualize the corresponding image
plt.figure(figsize=(2, 2))
imgs, labels = load_cifar10_batch(folder_path='datasets/cifar-10-batches-py', mode='test')
plt.imshow(imgs[2].transpose(1,2,0))
plt.savefig('cnn-test-vis.pdf')

The true category is 8 and the predicted category is 8

The true label is 8 and the prediction is 8, i.e. the class "ship".
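
Using the CIFAR10_CLASSES list defined in Section 2.1, the numeric label can be turned into a readable name:

# map the predicted index to its class name (CIFAR10_CLASSES is defined in Section 2.1)
print(CIFAR10_CLASSES[int(pred_class[0])])  # ship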

Source: blog.csdn.net/m0_59596937/article/details/127354485