Handwritten digit recognition based on the PyTorch framework (MNIST dataset)

Handwritten digit recognition

Some time ago I started learning PyTorch. After picking up a bit of the syntax, I found an introductory PyTorch CNN example online, read through it, and added comments to get a better feel for PyTorch; the comments in the code are fairly clear. Before the code, I list the things I learned while reading it, which are exactly the statements I had not come across before. Most of them are about how torch reads data, so going through this part first will make the code easier to follow.

Knowledge gained:

1.torch.manual_seed()

In a neural network, the initial parameters (like the weights in a BP neural network) start out random. torch.manual_seed() seeds the random number generator, so the "random" numbers it produces are fixed and the results can be reproduced.

import torch
torch.manual_seed(1)
print(torch.rand(2))

Run this twice in a row: both runs print the same two random numbers.

import torch
#torch.manual_seed(1)
print(torch.rand(2))

With the seed commented out, the two runs print different random numbers.

2.train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

At first I did not know what Data means here; it is simply the alias used for torch.utils.data. From a blog post I learned that torch.utils.data.DataLoader is an important interface in PyTorch: most code that feeds training data to a model uses it to read the data. Its signature (also taken from the blog post I read):

torch.utils.data.DataLoader(
                            dataset,               # the dataset to load the data from
                            batch_size = 1,        # how many samples per batch
                            shuffle = False,       # whether to shuffle the data every epoch
                            sampler = None,        # defines the sequence of indices/keys used to draw samples
                            batch_sampler = None,  # like sampler, but returns a batch of indices at a time
                            num_workers = 0,       # how many subprocesses to use for data loading (0 = load in the main process)
                            collate_fn = None,     # merges a list of samples into a mini-batch of Tensors
                            pin_memory = False,    # if True, the loader copies Tensors into CUDA pinned memory before returning them
                            drop_last = False,     # if True, drop the last incomplete batch when the dataset size is not divisible by the batch size
                            timeout = 0,           # if positive, the timeout value for collecting a batch from the workers
                            worker_init_fn = None )

The purpose of this interface is to take the dataset, apply the settings you need (batch_size, shuffle, and so on), and wrap the data into batches of Tensors of the given batch size for the subsequent training.
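To see this concretely, here is a minimal sketch of my own (not from the original post) that batches a made-up TensorDataset; the tensor names are invented for illustration:

import torch
import torch.utils.data as Data

# a toy dataset: 10 samples with 3 features each (made-up data)
features = torch.randn(10, 3)
labels = torch.randint(0, 2, (10,))
toy_dataset = Data.TensorDataset(features, labels)

toy_loader = Data.DataLoader(dataset=toy_dataset, batch_size=4, shuffle=True)

for step, (batch_x, batch_y) in enumerate(toy_loader):
    # the last batch has only 2 samples because drop_last defaults to False
    print(step, batch_x.shape, batch_y.shape)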

3.torchvision.datasets.MNIST(root='./mnist/', train=False)

Then I searched Baidu again and found a blog post by the same author whose torch.utils.data.DataLoader article I read above. From his blog I learned that PyTorch comes with a very important and easy-to-use package: torchvision, which is mainly made up of three sub-packages: torchvision.datasets, torchvision.models, and torchvision.transforms. The torchvision.datasets package contains commonly used datasets such as MNIST, FakeData, COCO, LSUN, ImageFolder, DatasetFolder, ImageNet, CIFAR, etc., and exposes a few important parameters, so a dataset can be obtained with a simple call.

  torchvision.datasets.MNIST(root, train = True, transform = None, target_transform = None, download = False)
#  Parameters:
 # root (string) - root directory of the dataset, where MNIST/processed/training.pt and MNIST/processed/test.pt are stored.
 # train (bool, optional) - if True, create the dataset from training.pt, otherwise from test.pt.
 # download (bool, optional) - if True, download the dataset from the Internet and put it in the root directory; if it has already been downloaded it is not downloaded again.
 # transform (callable, optional) - a function/transform that takes a PIL image and returns a transformed version, e.g. transforms.RandomCrop.
 # target_transform (callable, optional) - a function/transform that takes the target and transforms it.

So the line in the heading above loads the MNIST data under the root directory; with train=False it gives the test set, while train=True (as in the training code below) gives the training set.
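A small sketch of my own (assuming the same ./mnist/ directory used later in this post) that loads both splits and inspects one sample:

import torchvision

# download=True fetches the files into ./mnist/ if they are not there yet
train_set = torchvision.datasets.MNIST(root='./mnist/', train=True,
                                        transform=torchvision.transforms.ToTensor(),
                                        download=True)
test_set = torchvision.datasets.MNIST(root='./mnist/', train=False,
                                      transform=torchvision.transforms.ToTensor())

print(len(train_set), len(test_set))   # 60000 10000
img, label = train_set[0]              # ToTensor gives a (1, 28, 28) float tensor in [0.0, 1.0]
print(img.shape, label)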

4.torch.unsqueeze

Judging from the English word, I guessed that this operation is the opposite of compressing, i.e. it adds a dimension. There is a pair of operations for adding and removing dimensions: torch.squeeze removes the specified dimension if its size is 1, while torch.unsqueeze inserts a new dimension of size 1 at the specified position.
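A quick sketch of my own showing the shape changes (the same trick is used on the test set later in the code):

import torch

t = torch.rand(2000, 28, 28)
t1 = torch.unsqueeze(t, dim=1)   # insert a new dimension of size 1 at position 1
print(t1.shape)                  # torch.Size([2000, 1, 28, 28])

t2 = torch.squeeze(t1, dim=1)    # remove that size-1 dimension again
print(t2.shape)                  # torch.Size([2000, 28, 28])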

5.nn.Conv2d (two-dimensional convolution in pytorch)

Generally the one-dimensional convolution nn.Conv1d is used for text processing, while the two-dimensional convolution is mostly used for image processing. Take the second convolution layer in the code below as an example:
nn.Conv2d(16, 32, 5, 1, 2),
i.e. in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2.

The output size follows the usual convolution formula W_out = (W_in + 2*padding - kernel_size) / stride + 1. With kernel_size=5, stride=1 and padding=2:
W2 = (28 + 2*2 - 5) / 1 + 1 = 28
H2 = (28 + 2*2 - 5) / 1 + 1 = 28
so the first convolution keeps a 28x28 input at 28x28 (and the second one likewise keeps its 14x14 input at 14x14); only the pooling layers shrink the feature maps.
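A short sketch of my own that checks those shapes with the same layer settings as the code below:

import torch
import torch.nn as nn

x = torch.rand(1, 1, 28, 28)     # a fake batch: one 1-channel 28x28 image
conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(kernel_size=2)

h1 = pool(conv1(x))              # convolution keeps 28x28, pooling halves it
print(h1.shape)                  # torch.Size([1, 16, 14, 14])
h2 = pool(conv2(h1))
print(h2.shape)                  # torch.Size([1, 32, 7, 7])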
The convolution layers are commented in the code below. One thing that confused me at first is that calling this network returns two values. The reason is that the code also does data visualization: the CNN's forward pass returns the actual output (used for the loss and the predictions) together with the flattened features used for visualization (at this stage I mainly want to understand PyTorch, so I am not going into the visualization part).

Code part:

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
 
torch.manual_seed(1)    # reproducible
 
# Hyper Parameters
EPOCH = 1               # train for only one epoch to save time
BATCH_SIZE = 50
LR = 0.001              # learning rate
DOWNLOAD_MNIST = True   # set to False if you have already downloaded the dataset
 
 
# Mnist digits dataset
train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                     # this is training data
    transform=torchvision.transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to
                                                    # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
    download=DOWNLOAD_MNIST,                        # download it if you don't have it
)
 
# plot one example
print(train_data.train_data.size())                 # (60000, 28, 28)
print(train_data.train_labels.size())               # (60000)
plt.imshow(train_data.train_data[0].numpy(), cmap='gray')   # show the first handwritten digit in the dataset as a grayscale image
plt.title('%i' % train_data.train_labels[0])

plt.show()

 
# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
# the data loader returns mini-batches during training; 28x28 are the pixels of each grayscale image, packed into a (50, 1, 28, 28) tensor
# shuffle=True shuffles the data
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
 
# convert test data into Variable, pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)   # ---> test data

# ToTensor scaled train_data from [0, 255] to [0.0, 1.0], so the test data is also divided by 255
test_x = Variable(torch.unsqueeze(test_data.test_data, dim=1), volatile=True).type(torch.FloatTensor)[:2000]/255.
# shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
# only 2000 test samples are taken, to save time
test_y = test_data.test_labels[:2000]
 
 
class CNN(nn.Module):   # inherit from nn.Module
    def __init__(self):
        # initialize the attributes inherited from the parent class
        super(CNN, self).__init__()
        # define block 1 of the network in forward order
        self.conv1 = nn.Sequential(         # input shape (1, 28, 28)
            nn.Conv2d(                      # a 2-D convolution
                in_channels=1,      # number of input channels (1 for a grayscale image)
                out_channels=16,    # number of channels the convolution produces, i.e. the number of convolution kernels
                kernel_size=5,      # size of each convolution kernel (5x5); every kernel also spans all in_channels
                stride=1,           # how many pixels the kernel moves per step
                padding=2,          # number of zero-padding rows/columns added to each side of the input
            ),                              # output shape (16, 28, 28)
            nn.ReLU(),                      # activation function ReLU
            # take the maximum value inside each 2x2 pooling window
            nn.MaxPool2d(kernel_size=2),    # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        # define block 2 of the network in forward order
        self.conv2 = nn.Sequential(         # input shape (16, 14, 14)
            # another 2-D convolution
            nn.Conv2d(16, 32, 5, 1, 2),     # output shape (32, 14, 14)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(2),                # output shape (32, 7, 7)
        )
        # fully connected layer
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layers in PyTorch take 2-D tensors (batch, features), unlike conv layers which take 4-D tensors
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)           # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        # output is the actual prediction, x is kept only for data visualization
        return output, x    # return x for visualization
 
# instantiate the CNN class
cnn = CNN()
print(cnn)  # net architecture
# set up an optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)   # optimize all cnn parameters
# set up the loss function
loss_func = nn.CrossEntropyLoss()  # the target label is not one-hotted


# this part is for data visualization; I am not going into it for now
# following function (plot_with_labels) is for visualization, can be ignored if not interested
from matplotlib import cm
try: from sklearn.manifold import TSNE; HAS_SK = True
except: HAS_SK = False; print('Please install sklearn for layer visualization')
def plot_with_labels(lowDWeights, labels):
    plt.cla()
    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):
        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)
 
plt.ion()


# training and testing
for epoch in range(EPOCH):   # to save time we loop only once, i.e. train for one epoch
    # process the data in batches as we iterate over the dataset
    # step is the iteration count; (x, y) are the images and labels of each batch from train_loader
    for step, (x, y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader
        # the Variable type supports automatic differentiation on its data
        b_x = Variable(x)   # batch x
        b_y = Variable(y)   # batch y
        # take the first value returned by the network as the prediction
        output = cnn(b_x)[0]               # cnn output
        # the loss function compares the prediction with the target (which digit this image really is)
        loss = loss_func(output, b_y)   # cross entropy loss
        # clear the gradients of this step, otherwise they would accumulate into the next iteration
        optimizer.zero_grad()           # clear gradients for this training step
        # backpropagation computes the gradients
        loss.backward()                 # backpropagation, compute gradients
        # optimizer.step() updates every parameter of the model
        optimizer.step()                # apply gradients
        # every 50 training steps, use the current model to predict on the test set
        if step % 50 == 0:
            # run the test data through the network; it returns the prediction plus the features used for visualization
            test_output, last_layer = cnn(test_x)
            # take the class with the largest score as the predicted digit
            pred_y = torch.max(test_output, 1)[1].data.squeeze()
            # test_y holds the labels of the test images; accuracy = (number of correct predictions) / (number of test samples)
            accuracy = (pred_y == test_y).sum().item() / float(test_y.size(0))   # sum as a Python number to avoid overflow of the uint8 comparison result
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.item(), '| test accuracy: %.2f' % accuracy)
            # the part below is data visualization; for now I focus on the PyTorch framework itself
            if HAS_SK:
                # Visualization of trained flatten layer (T-SNE)
                tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
                plot_only = 500
                low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])
                labels = test_y.numpy()[:plot_only]
                plot_with_labels(low_dim_embs, labels)
plt.ioff()
 
# print 10 predictions from test data
# print the first ten predictions and the corresponding labels to compare the prediction quality
test_output, _ = cnn(test_x[:10])
pred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')

Run screenshots (I ran this in Jupyter):

(Screenshots: the printed network architecture, the training log of loss and test accuracy, and the output of the final prediction.)
The last one uses the trained model to predict the first ten handwritten digits: the first line is the predicted values, the second line is the labels, and all ten are recognized correctly. Quite successful.

Origin www.cnblogs.com/Eldq/p/12758073.html