PyTorch image classification

Table of Contents

One, torch and torchvision

1、torchvision.datasets

2、torchvision.models

3、torchvision.transforms

4、torchvision.utils

Two, MNIST handwritten digit recognition

1, Obtaining the MNIST training and test sets

2, Data loading

3, Data preview

4, Building a convolutional neural network model

5, Model training and parameter optimization

6, Saving and loading the trained model

7, Complete code for MNIST handwritten digit recognition

Three, CIFAR10 image classification

1, Introduction to the CIFAR10 dataset

2, Implementing CIFAR10 image classification

3, Running the neural network on the GPU


One, torch and torchvision

PyTorch has two core packages: torch and torchvision.

Within torch, the torch.nn package provides classes for building the various kinds of neural network layers, the torch.optim package provides classes that implement automatic parameter optimization, and torch.autograd provides automatic gradient computation.

torchvision bundles popular datasets, model architectures, and common image transformation tools. Its main job is to handle importing, processing, and previewing data, so for computer vision problems you can lean on the many classes in the torchvision package to do the work.

1、torchvision.datasets

torchvision.datasets includes the following datasets, among others: MNIST, COCO, LSUN Classification, ImageFolder, Imagenet-12, CIFAR10 and CIFAR100, and STL10.

2、torchvision.models

The torchvision.models submodule contains the following model architectures: AlexNet, VGG, ResNet, SqueezeNet, and DenseNet.

3、torchvision.transforms

(1)torchvision.transforms.Compose(transforms)

The torchvision.transforms.Compose class can be seen as a container that combines several data transformations into one. Its argument is a list whose elements are the transformation operations to apply, in order, to the loaded data.

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize(mean=[0.5,0.5,0.5],
                                std=[0.5,0.5,0.5])])

This Compose uses only a type conversion, transforms.ToTensor, and a data normalization, transforms.Normalize. The normalization used here is also called standardization: the mean is subtracted from each value and the result is divided by the standard deviation. When the true mean and standard deviation of the original data are used, the transformed data has mean 0 and standard deviation 1. With the values 0.5 and 0.5 used above, pixel values in [0, 1] are simply mapped to [-1, 1].
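To make the arithmetic concrete, here is a minimal plain-Python sketch of the per-value computation (x - mean) / std. The helper name normalize_value is hypothetical and is not part of torchvision; it only mirrors the formula for a single channel value.

```python
def normalize_value(x, mean=0.5, std=0.5):
    """Apply (x - mean) / std to one value; illustrative helper, not a torchvision API."""
    return (x - mean) / std

# ToTensor yields values in [0.0, 1.0]; with mean=0.5 and std=0.5
# the end points of that range map to -1.0 and 1.0.
low = normalize_value(0.0)
mid = normalize_value(0.5)
high = normalize_value(1.0)
print(low, mid, high)  # -1.0 0.0 1.0
```

So with these settings the result lies in [-1, 1]; it has mean 0 and standard deviation 1 only if the mean and std passed in are those of the data itself.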

(2)torchvision.transforms.Resize

Scales the loaded image to the size we need. The argument can be an integer or a sequence of the form (h, w), where h is the height and w is the width. If an integer is passed, the shorter edge of the image is scaled to that value and the longer edge is scaled by the same ratio, so the aspect ratio is preserved.
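The following sketch computes the output size of a resize in plain Python. It assumes the rule used by torchvision's Resize: an integer size is matched to the shorter edge and the aspect ratio is kept, while an (h, w) sequence sets both edges. The helper resized_hw is hypothetical, and the exact rounding of the longer edge may differ between torchvision versions.

```python
def resized_hw(height, width, size):
    """Return (height, width) after resizing; hypothetical helper, not a torchvision API."""
    if isinstance(size, int):
        # Integer: the shorter edge becomes `size`; the longer edge is scaled
        # by the same factor so the aspect ratio is kept.
        if height <= width:
            return size, int(size * width / height)
        return int(size * height / width), size
    # Sequence (h, w): both edges are set explicitly.
    target_h, target_w = size
    return target_h, target_w

print(resized_hw(480, 640, 224))         # (224, 298)
print(resized_hw(480, 640, (224, 224)))  # (224, 224)
```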

(3)torchvision.transforms.Scale

Scales the loaded image to the size we need. Its usage is the same as torchvision.transforms.Resize; Scale is the older name and has been deprecated in favor of Resize.

(4)torchvision.transforms.CenterCrop

Crops the loaded image to the size we need, using the center of the image as the reference point. The argument can be an integer, which gives a square crop, or a tuple of the form (h, w).

(5)torchvision.transforms.RandomCrop

Crops the loaded image to the size we need at a random position. The argument can be an integer, which gives a square crop, or a tuple of the form (h, w).

(6)torchvision.transforms.RandomHorizontalFlip

Flips the loaded image horizontally with a given probability. The probability can be passed as an argument; if it is not specified, the default value of 0.5 is used.

(7)torchvision.transforms.RandomVerticalFlip

Flips the loaded image vertically with a given probability. The probability can be passed as an argument; if it is not specified, the default value of 0.5 is used.

(8)torchvision.transforms.ToTensor

Converts the type of the loaded image data, turning a PIL image into a Tensor. More precisely, it converts a PIL.Image with values in the range [0, 255], or a numpy.ndarray of shape (H, W, C), into a torch.FloatTensor of shape [C, H, W] with values in the range [0, 1.0], so that PyTorch can compute on and process it.
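The layout and range change can be reproduced with plain torch operations. This is only a sketch of an equivalent conversion on a hand-made tensor, not the torchvision implementation; the tiny 2x2 "image" is made up for illustration.

```python
import torch

# A tiny fake "image" in (H, W, C) layout with uint8 values in [0, 255].
hwc = torch.tensor(
    [[[0, 128, 255], [255, 0, 0]],
     [[10, 20, 30], [40, 50, 60]]],
    dtype=torch.uint8,
)  # shape (2, 2, 3)

# Move the channel axis to the front, then scale to floats in [0.0, 1.0].
chw = hwc.permute(2, 0, 1).float() / 255.0

print(hwc.shape, chw.shape)  # torch.Size([2, 2, 3]) torch.Size([3, 2, 2])
print(chw.min().item(), chw.max().item())
```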

(9) torchvision.transforms.ToPILImage

Converts Tensor data into a PIL image, mainly so that the image content can be displayed conveniently.

(10)torchvision.transforms.RandomSizedCrop(size, interpolation=2)

Crops a randomly chosen region of the image and then resizes the crop to the given size. RandomSizedCrop is the older name of this transform; it has been deprecated in favor of RandomResizedCrop.

(11)torchvision.transforms.Pad(padding, fill=0)

Pads all sides of the image with the given value. padding: the number of pixels to add on each side; fill: the value used for the padding.

(12)torchvision.transforms.Normalize(mean, std)

Normalizes the image with the given mean and standard deviation, that is, normalized_image = (image - mean) / std.

(13) General transform: use a lambda as the transformer, transforms.Lambda(lambd)

4、torchvision.utils

(1)torchvision.utils.make_grid

utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False): given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images, arranges the images into a single grid image with nrow images per row, that is, B / nrow rows (rounded up). Parameters: normalize=True shifts the image pixels to the range (0, 1); range=(min, max), where min and max are numbers, gives the bounds used for that normalization (newer torchvision releases name this parameter value_range); scale_each=True normalizes each image independently.

(2)torchvision.utils.save_image

utils.save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False): saves the given Tensor as an image file. If the Tensor is a mini-batch, make_grid is called first and the resulting grid image is saved.

Two, MNIST handwritten digit recognition

1, Obtaining the MNIST training and test sets

from torchvision import datasets, transforms

# Load the data and apply the transformations. Compose acts as a container that combines several transformations.
# Its argument is a list whose elements are the transformations applied to the loaded data (MNIST has a single color channel).
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=[0.5], std=[0.5])])
# Get the MNIST training and test sets
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
data_test = datasets.MNIST(root='data/', transform=transform, train=False)

Here, root specifies the path where the dataset is stored after downloading; transform specifies the transformations applied to the data when it is loaded, which must be defined beforehand; train specifies which part of the dataset to load once the download is complete: if it is True, the training set is loaded; if it is False, the test set is loaded.

2, Data loading

After the data has been downloaded and read in, it still needs to be loaded into batches. Loading can be understood as packaging the images: the images are sent to the model for training one package (batch) at a time, and the loader does this packaging. When loading, batch_size sets the size of each package and shuffle decides whether the image order is scrambled.

The class used for loading is torch.utils.data.DataLoader. The dataset parameter specifies the dataset to load, and the batch_size parameter sets the number of images in each package; it is 64 in the code, so each package contains 64 images. With shuffle set to True, the data is shuffled before it is packaged.

# Data loading
data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
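The packaging behaviour can be checked on a toy dataset without downloading MNIST. The sketch below uses random tensors and a batch size of 4, both chosen only for illustration; TensorDataset stands in for the real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake 1x28x28 "images" with labels 0..9 (random data, for illustration only).
features = torch.randn(10, 1, 28, 28)
targets = torch.arange(10)
toy_dataset = TensorDataset(features, targets)

toy_loader = DataLoader(dataset=toy_dataset, batch_size=4, shuffle=True)

batch_sizes = []
seen_labels = []
for x_batch, y_batch in toy_loader:
    batch_sizes.append(x_batch.shape[0])
    seen_labels.extend(y_batch.tolist())

print(batch_sizes)          # [4, 4, 2]
print(sorted(seen_labels))  # every sample appears exactly once per epoch
```

Note that the last batch is smaller when the dataset size is not a multiple of batch_size, and that shuffling changes the order of the samples but not which samples are visited.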

3, Data preview

# Data preview and image display
images, labels = next(iter(data_loader_train))
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1, 2, 0)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean  # undo the normalization so the values are back in [0, 1]
print([labels[i] for i in range(64)])
plt.imshow(img)
plt.show()

Use iter and next to obtain one batch of image data and the corresponding labels.

Use torchvision.utils.make_grid to arrange a batch of images into a grid. Its argument is one batch of loaded data, which is 4-dimensional: from front to back the dimensions are batch_size, channel, height, and width, that is, the number of images in the batch, the number of color channels of each image, and the height and width of each image. After torchvision.utils.make_grid the image has the dimensions (channel, height, width): the whole batch is merged into one image, so the height and width differ from before. The number of color channels stays the same, except that single-channel images such as MNIST are expanded to three channels, which is why mean and std above have three entries.
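The shape of the grid image can be worked out by hand. The sketch below assumes the layout rule used by recent torchvision releases: each cell is the image plus padding, one extra padding border closes the grid, and single-channel images are repeated to three channels. The helper grid_shape is hypothetical and only does the arithmetic.

```python
import math

def grid_shape(batch, channels, height, width, nrow=8, padding=2):
    """Hypothetical helper: shape of the grid image for a (B, C, H, W) batch."""
    cols = min(nrow, batch)
    rows = math.ceil(batch / cols)
    # Single-channel images are repeated to 3 channels in the grid.
    out_channels = 3 if channels == 1 else channels
    return (out_channels,
            rows * (height + padding) + padding,
            cols * (width + padding) + padding)

print(grid_shape(64, 1, 28, 28))  # (3, 242, 242)
```

For a batch of 64 MNIST images this gives an 8 x 8 grid of 28 x 28 tiles, each separated by 2 pixels of padding.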

To display the image data properly with Matplotlib, the data must first be an array, and the array dimensions must be (height, width, channel), that is, the color channel comes last. So we use numpy and transpose to convert the data type and swap the dimensions, so that Matplotlib can draw the image correctly.

The labels of all images in the batch are printed as follows:

[tensor(5), tensor(2), tensor(1), tensor(7), tensor(8), tensor(4), tensor(2), tensor(3), tensor(3), tensor(9), tensor(2), tensor(1), tensor(6), tensor(3), tensor(2), tensor(7), tensor(8), tensor(7), tensor(4), tensor(6), tensor(7), tensor(3), tensor(6), tensor(7), tensor(4), tensor(6), tensor(4), tensor(3), tensor(8), tensor(7), tensor(2), tensor(4), tensor(3), tensor(7), tensor(0), tensor(2), tensor(1), tensor(4), tensor(1), tensor(0), tensor(5), tensor(0), tensor(6), tensor(3), tensor(5), tensor(9), tensor(8), tensor(0), tensor(9), tensor(0), tensor(8), tensor(3), tensor(8), tensor(2), tensor(0), tensor(5), tensor(7), tensor(6), tensor(9), tensor(1), tensor(6), tensor(0), tensor(2), tensor(9)]

[Figure: all images in this batch, displayed as a grid]

4, Building a convolutional neural network model

A CNN generally has the following structure:

  • Input layer: takes the input data
  • Convolution layer: uses convolution kernels for feature extraction and feature mapping
  • Activation layer: convolution is itself a linear operation, so a nonlinear mapping has to be added
  • Pooling layer: downsamples the feature maps, making them sparser while keeping the loss of feature information small
  • Output layer: produces the output

The CNN model built here consists of: convolution layers, batch normalization layers, activation layers, max pooling layers, and fully connected layers.

# Build the convolutional neural network model
class CNN_Model(torch.nn.Module):
    def __init__(self):
        super(CNN_Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2))
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.BatchNorm2d(128),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2))
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(7 * 7 * 128, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 10))
    # Forward pass
    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)  # flatten to (batch, 7*7*128)
        x = self.dense(x)
        return x

torch.nn.Conv2d: builds the convolution layers of a convolutional neural network. Its main parameters are the number of input channels, the number of output channels, the kernel size, the stride of the kernel, and the padding value. The number of input channels is an integer giving the depth of the input data; the number of output channels is an integer giving the depth of the output data; the kernel size is an integer giving the size of the convolution kernel; the stride is an integer giving how far the kernel moves at each step; the padding value is an integer, where 0 means the border is not padded and a value greater than 0 adds that many rings of pixels around the border.

torch.nn.MaxPool2d: implements the max pooling layers of a convolutional neural network. Its main parameters are the pooling window size, the stride of the pooling window, and the padding value. The window size is an integer giving the size of the pooling window, and the stride is an integer giving how far the window moves at each step. The padding value is used and interpreted in the same way as the padding value of torch.nn.Conv2d.
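These parameters determine the spatial size of each layer's output through the standard formula floor((size + 2 * padding - kernel_size) / stride) + 1 (ignoring dilation). The sketch below applies it to the layers of CNN_Model above to show where the 7 * 7 * 128 input size of the first Linear layer comes from; the helper out_size is hypothetical.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                  # MNIST input is 28x28
size = out_size(size, kernel_size=3, stride=1, padding=1)  # conv1 -> 28
size = out_size(size, kernel_size=2, stride=2)             # pool  -> 14
size = out_size(size, kernel_size=3, stride=1, padding=1)  # conv2 -> 14
size = out_size(size, kernel_size=2, stride=2)             # pool  -> 7
print(size, size * size * 128)  # 7 6272
```

A 3x3 kernel with padding 1 and stride 1 keeps the size unchanged, and each 2x2 pooling with stride 2 halves it, so 28 becomes 14 and then 7.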

torch.nn.Dropout: helps prevent overfitting during the training of a convolutional neural network. Put simply, during training it sets each input of the layer to zero with a given probability, which reduces the connections between the neurons of two adjacent layers; in evaluation mode it does nothing. The probability can be set freely; if it is not set, the default value of 0.5 is used.
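The difference between training and evaluation mode can be seen on a vector of ones. This is a small sketch, with the seed and the vector length picked arbitrarily; in training mode the surviving values are scaled by 1 / (1 - p), so with p=0.5 they become 2.0.

```python
import torch

torch.manual_seed(0)           # only to make the sketch repeatable
drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                   # training mode: dropout is active
y_train = drop(x)

drop.eval()                    # evaluation mode: dropout is a no-op
y_eval = drop(x)

kept = (y_train != 0).float().mean().item()
print(kept)                    # roughly 0.5
print(torch.equal(y_eval, x))  # True
```

This is why the model should be switched with train() before training and eval() before testing.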

x = x2.view(-1, 7 * 7 * 128): flattens the output of the convolution layers, and is followed immediately by the fully connected layers. Without the flattening, the actual dimensions of the data would not match the input dimensions defined for the fully connected layer, and the program would raise an error.

5, Model training and parameter optimization

# Train the model and optimize its parameters
learning_rate = 0.001
n_epochs = 5
cnn_model = CNN_Model()
loss_func = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=learning_rate)
for epoch in range(n_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch  {}/{}".format(epoch + 1, n_epochs))
    cnn_model.train()  # enable Dropout and batch statistics in BatchNorm
    for X_train, y_train in data_loader_train:
        outputs = cnn_model(X_train)
        _, pred = torch.max(outputs, 1)
        optimizer.zero_grad()
        loss = loss_func(outputs, y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        running_correct += torch.sum(pred == y_train).item()
    testing_correct = 0
    cnn_model.eval()  # disable Dropout and use running statistics in BatchNorm
    with torch.no_grad():  # no gradients are needed for evaluation
        for X_test, y_test in data_loader_test:
            outputs = cnn_model(X_test)
            _, pred = torch.max(outputs, 1)  # largest value in each row and its index
            testing_correct += torch.sum(pred == y_test).item()
    # loss.item() is already a per-batch mean, so average over the number of batches
    print("Loss is :{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss / len(data_loader_train), 100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
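The prediction and accuracy bookkeeping used in the loops above can be tried on a tiny hand-made tensor. The scores and labels below are made-up numbers; torch.max along dimension 1 returns both the largest score of each row and its index, and the index is the predicted class.

```python
import torch

# Fake scores for 4 samples and 3 classes (illustrative numbers).
outputs = torch.tensor([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.0, 0.1, 3.0],
                        [0.9, 0.8, 0.7]])
labels = torch.tensor([1, 0, 2, 1])

values, pred = torch.max(outputs, 1)     # max score and its index per row
correct = (pred == labels).sum().item()  # plain Python int
accuracy = 100.0 * correct / labels.size(0)

print(pred.tolist())  # [1, 0, 2, 0]
print(accuracy)       # 75.0
```

Calling .item() turns the count into a plain Python number, which keeps the running totals and the final division free of tensor types.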

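6, Saving and loading the trained model

# Save the model (the whole model object is pickled)
torch.save(cnn_model, 'data/cnn_model.pt')
# Load the model; recent PyTorch releases may require torch.load(..., weights_only=False) for a whole pickled model
cnn_model = torch.load('data/cnn_model.pt')
cnn_model.eval()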
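A commonly recommended alternative to pickling the whole model is to save only its parameters through state_dict and rebuild the architecture before loading. The sketch below is not the method used in this article; it uses a small Linear layer as a stand-in for CNN_Model and a temporary file whose name is made up for illustration.

```python
import os
import tempfile
import torch

# A small stand-in model; in the tutorial this would be CNN_Model().
model = torch.nn.Linear(4, 2)

path = os.path.join(tempfile.mkdtemp(), "weights.pt")  # hypothetical file name
torch.save(model.state_dict(), path)                   # save the parameters only

restored = torch.nn.Linear(4, 2)                       # rebuild the same architecture
restored.load_state_dict(torch.load(path))             # then load the parameters
restored.eval()

x = torch.randn(3, 4)
print(torch.allclose(model(x), restored(x)))  # True
```

The saved file then depends only on the parameter names and shapes, not on the Python class being importable under the same path when the file is loaded.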

7, Complete code for MNIST handwritten digit recognition

import torch
import torchvision
import matplotlib.pyplot as plt
from torchvision import datasets
from torchvision import transforms

# Parameter settings
num_epochs = 10
batch_size = 64
learning_rate = 0.001

# Move a tensor to the GPU if one is available
# (wrapping tensors in Variable has not been needed since PyTorch 0.4)
def get_variable(x):
    return x.cuda() if torch.cuda.is_available() else x

# Load the data and apply the transformations. Compose acts as a container that combines several transformations.
# Its argument is a list whose elements are the transformations applied to the loaded data (MNIST has a single color channel).
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=[0.5], std=[0.5])])
# Get the MNIST training and test sets
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
data_test = datasets.MNIST(root='data/', transform=transform, train=False)

# Data loading
data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=batch_size, shuffle=True)

# Data preview and image display
images, labels = next(iter(data_loader_train))
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1, 2, 0)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean  # undo the normalization so the values are back in [0, 1]
print([labels[i] for i in range(64)])
plt.imshow(img)
plt.show()

# Build the convolutional neural network model
class CNN_Model(torch.nn.Module):
    def __init__(self):
        super(CNN_Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2))
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.BatchNorm2d(128),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2))
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(7 * 7 * 128, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 10))
    # Forward pass
    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)  # flatten to (batch, 7*7*128)
        x = self.dense(x)
        return x

# Train the model and optimize its parameters
cnn_model = CNN_Model()
# Move all model parameters to the GPU
if torch.cuda.is_available():
    cnn_model = cnn_model.cuda()
loss_func = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=learning_rate)
for epoch in range(num_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch  {}/{}".format(epoch + 1, num_epochs))
    cnn_model.train()  # enable Dropout and batch statistics in BatchNorm
    for X_train, y_train in data_loader_train:
        X_train, y_train = get_variable(X_train), get_variable(y_train)
        outputs = cnn_model(X_train)
        _, pred = torch.max(outputs, 1)
        optimizer.zero_grad()
        loss = loss_func(outputs, y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        running_correct += torch.sum(pred == y_train).item()
    testing_correct = 0
    cnn_model.eval()  # disable Dropout and use running statistics in BatchNorm
    with torch.no_grad():  # no gradients are needed for evaluation
        for X_test, y_test in data_loader_test:
            X_test, y_test = get_variable(X_test), get_variable(y_test)
            outputs = cnn_model(X_test)
            _, pred = torch.max(outputs, 1)  # largest value in each row and its index
            testing_correct += torch.sum(pred == y_test).item()
    # loss.item() is already a per-batch mean, so average over the number of batches
    print("Loss is :{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss / len(data_loader_train), 100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
# Save the model (the whole model object is pickled)
torch.save(cnn_model, 'data/cnn_model.pt')
# Load the model; recent PyTorch releases may require torch.load(..., weights_only=False) for a whole pickled model
cnn_model = torch.load('data/cnn_model.pt')
cnn_model.eval()

[Figure: output of the complete MNIST script]

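Three, CIFAR10 image classification

1, Introduction to the CIFAR10 dataset

CIFAR-10 is a small dataset for recognizing everyday objects, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains RGB color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each image is 32×32 pixels, and the dataset has 50000 training images and 10000 test images. Compared with the MNIST dataset, CIFAR-10 differs in the following ways:

  • CIFAR-10 images are 3-channel RGB color images, while MNIST images are grayscale.

  • CIFAR-10 images are 32×32, slightly larger than the 28×28 MNIST images.

  • Unlike handwritten characters, CIFAR-10 contains real objects from the real world. The images are noisy, and the objects vary in scale and features, which makes recognition much harder.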

2, Implementing CIFAR10 image classification

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Parameter settings
num_epochs = 15
batch_size = 64
learning_rate = 0.001
# Build the CNN model
class CNNNet(nn.Module):
    def __init__(self):
        super(CNNNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 128, 5)
        self.fc1 = nn.Linear(128 * 5 * 5, 1024)
        self.fc2 = nn.Linear(1024, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 32x32 -> 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))  # 14x14 -> 10x10 -> 5x5
        x = x.view(-1, 128 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
# Image display
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# The torchvision datasets yield PIL images; ToTensor converts them to tensors in [0, 1]
# and Normalize then maps them to the range [-1, 1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Get the CIFAR10 training and test sets
trainset = torchvision.datasets.CIFAR10(root='data/', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='data/', train=False, download=True, transform=transform)
# Load the CIFAR10 training and test sets
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=0)
# Image classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Image display
images, labels = next(iter(trainloader))
imshow(torchvision.utils.make_grid(images))

# Define the loss function and the optimizer
cnn_model = CNNNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(cnn_model.parameters(), lr=learning_rate, momentum=0.9)

# Train the model
for epoch in range(num_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch  {}/{}".format(epoch + 1, num_epochs))
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = cnn_model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, pred = torch.max(outputs, 1)
        running_correct += torch.sum(pred == labels).item()
    # loss.item() is already a per-batch mean, so average over the number of batches
    print("Loss is :{:.4f},Train Accuracy is:{:.4f}%".format(running_loss / len(trainloader), 100 * running_correct / len(trainset)))
# Save the trained model (the whole model object is pickled)
torch.save(cnn_model, 'data/cnn_model.pt')

# Load the trained model; recent PyTorch releases may require torch.load(..., weights_only=False) here
cnn_model = torch.load('data/cnn_model.pt')
cnn_model.eval()
# Evaluate the model on the test set
correct = 0
total = 0
with torch.no_grad():   # so that the computations below do not build a graph and take up memory
    for data in testloader:
        images, labels = data
        outputs = cnn_model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print("Test Average accuracy is:{:.4f}%".format(100 * correct / total))

# Compute the accuracy of each class
class_correct = [0.0 for _ in range(10)]
class_total = [0.0 for _ in range(10)]
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = cnn_model(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        # The last batch may hold fewer than batch_size samples, so use its actual size
        for i in range(labels.size(0)):
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %4f %%' % (classes[i], 100 * class_correct[i] / class_total[i]))
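The per-class tally at the end of the script can be isolated into a few lines of plain Python. The helper per_class_accuracy and the sample predictions below are hypothetical and only illustrate the counting; they are not taken from a real training run.

```python
def per_class_accuracy(predictions, labels, num_classes):
    """Hypothetical helper: percentage of correct predictions per class."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    # Classes with no samples get None instead of dividing by zero.
    return [100.0 * c / t if t else None for c, t in zip(correct, total)]

preds = [0, 1, 1, 2, 2, 2]
truth = [0, 1, 0, 2, 2, 1]
print(per_class_accuracy(preds, truth, 3))  # [50.0, 50.0, 100.0]
```

Each sample is counted under its true label, so a class that is often confused with another one shows up as a low percentage for that class even when the overall accuracy looks acceptable.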

[Figure: image display result]

[Figure: model training result]

[Figure: average test-set accuracy and per-class accuracy]

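3, Running the neural network on the GPU

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
# Recursively go over all modules and convert their parameters and buffers to CUDA tensors
cnn_model.to(device)
# The inputs and targets must also be sent to the GPU at every step
inputs, labels = inputs.to(device), labels.to(device)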
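Put together, one step on the selected device might look like the sketch below. A small Linear layer and random tensors stand in for cnn_model and a real batch, so the names and shapes here are assumptions for illustration only; the code falls back to the CPU when no GPU is available.

```python
import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(8, 2).to(device)        # stand-in for cnn_model
inputs = torch.randn(4, 8).to(device)           # stand-in for one batch of images
labels = torch.tensor([0, 1, 0, 1]).to(device)  # stand-in for the batch labels

outputs = model(inputs)
loss = torch.nn.functional.cross_entropy(outputs, labels)
print(outputs.device, loss.item())
```

The model and every batch must live on the same device; mixing a GPU model with CPU inputs raises a runtime error.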

 


Origin blog.csdn.net/asialee_bird/article/details/103978166