Deep Learning and Computer Vision in Practice (4): The Simplest Image Classification Task, Handwritten Digit Recognition (with MXNet)

Goal: Implement handwritten digit recognition in MXNet and, in doing so, learn the basic steps of using MXNet for image classification tasks.

 

1. Create Image RecordIO data

Where Caffe uses LMDB to store large amounts of data, MXNet uses Image RecordIO for high-volume data IO. It is an efficient storage format developed by DMLC that is easy to use in distributed settings and, like LMDB, is based on memory mapping. Because images can be stored on disk in a compressed encoding such as JPG, it takes far less space than storing cv::Mat data in LMDB as Caffe does. Decoding is not slow either, so it also compares well with Caffe's ImageDataLayer, which reads image files directly. As in Caffe, the first step in creating this format is to build a list of file paths and labels, in the following format:

0    5    mnist/train/000000_5.jpg
1    0    mnist/train/000001_0.jpg
2    4    mnist/train/000002_4.jpg
3    1    mnist/train/000003_1.jpg
...

Each line has three tab-separated fields: an integer index, the label, and the file name or path. Although the indices in the example are sequential, they do not have to be.
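As a small illustration of this format, the sketch below parses one list line into its three fields. The helper name parse_lst_line is hypothetical and not part of MXNet; it only assumes tab-separated fields as described above.

```python
def parse_lst_line(line):
    """Split one tab-separated list line into (index, label, path)."""
    index, label, path = line.rstrip('\n').split('\t')
    return int(index), int(label), path

# Example line in the single-label format used for MNIST here
print(parse_lst_line('0\t5\tmnist/train/000000_5.jpg\n'))
```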

The following code generates such a list from the previously generated MNIST images:

import os
import sys

# Usage: python gen_mxnet_imglist.py <image_dir> <output_list>
input_path = sys.argv[1].rstrip(os.sep)
output_path = sys.argv[2]
filenames = os.listdir(input_path)
with open(output_path, 'w') as f:
    for i, filename in enumerate(filenames):
        filepath = os.sep.join([input_path, filename])
        # File names look like 000000_5.jpg; the label follows the underscore
        label = filename[:filename.rfind('.')].split('_')[1]
        line = '{}\t{}\t{}\n'.format(i, label, filepath)
        f.write(line)

Save this code as gen_mxnet_imglist.py, and then execute the following commands in sequence:

>> python gen_mxnet_imglist.py mnist/train train.lst
>> python gen_mxnet_imglist.py mnist/val val.lst
>> python gen_mxnet_imglist.py mnist/test test.lst

The second step is to convert the data with MXNet's official tool, mxnet/bin/im2rec, by running the following commands:

>> /path/to/mxnet/bin/im2rec train.lst ./ train.rec color=0
>> /path/to/mxnet/bin/im2rec val.lst ./ val.rec color=0
>> /path/to/mxnet/bin/im2rec test.lst ./ test.rec color=0

Note that a line in the list file can carry multiple labels in place of the single label field. In that case, set the label_width parameter to the number of labels when running im2rec. For the meaning of the other parameters, see http://mxnet.io/zh/api/python/io.html.
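A multi-label list line can be sketched as follows, assuming each extra label is simply another tab-separated field between the index and the path, with label_width then set to the number of labels. The helper format_lst_line and the example labels are hypothetical.

```python
def format_lst_line(index, labels, path):
    """Build one list line: index, then every label, then the path, tab-separated."""
    fields = [str(index)] + [str(label) for label in labels] + [path]
    return '\t'.join(fields) + '\n'

# Two labels per image, so label_width would be 2 in this hypothetical case
line = format_lst_line(0, [5, 1], 'mnist/train/000000_5.jpg')
print(repr(line))
```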

MXNet 0.9 introduced the image module, which provides ImageIter, a more flexible interface implemented in Python. See MXNet's GitHub page for details.

2. Use the Module API to train LeNet-5

For MXNet's simplest interface, the Model module, see Deep Learning and Computer Vision Practical Learning (1): Implementing a Neural Network with MXNet, on Fan0920's CSDN blog, which introduces the basic use of MXNet and implements a neural network with it.

This section covers the more flexible Module module. Following the structure of the LeNet-5 version in Caffe, the code that defines the network and wraps it in a Module is as follows:

import mxnet as mx
data = mx.symbol.Variable('data')
conv1 = mx.symbol.Convolution(data=data, kernel=(5, 5), num_filter=20)
pool1 = mx.symbol.Pooling(data=conv1, pool_type="max", kernel=(2, 2), stride=(2, 2))
conv2 = mx.symbol.Convolution(data=pool1, kernel=(5, 5), num_filter=50)
pool2 = mx.symbol.Pooling(data=conv2, pool_type="max", kernel=(2, 2), stride=(2, 2))
flatten = mx.symbol.Flatten(data=pool2)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
relu1 = mx.symbol.Activation(data=fc1, act_type="relu")
fc2 = mx.symbol.FullyConnected(data=relu1, num_hidden=10)
lenet5 = mx.symbol.SoftmaxOutput(data=fc2, name='softmax')
mod = mx.mod.Module(lenet5, context=mx.gpu(0))
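To see how the feature maps shrink through this network, the sketch below works out the spatial sizes by hand. It assumes the convolutions use stride 1 with no padding (none is specified in the code above) and that pooling rounds down; conv_out is a hypothetical helper, not an MXNet function.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution or pooling layer (rounding down)."""
    return (size + 2 * pad - kernel) // stride + 1

size = 28                                   # input is 1 x 28 x 28
size = conv_out(size, kernel=5)             # conv1: 20 x 24 x 24
size = conv_out(size, kernel=2, stride=2)   # pool1: 20 x 12 x 12
size = conv_out(size, kernel=5)             # conv2: 50 x 8 x 8
size = conv_out(size, kernel=2, stride=2)   # pool2: 50 x 4 x 4
flat_features = 50 * size * size            # number of inputs to fc1
print(size, flat_features)
```

Under these assumptions the flatten layer hands 800 values to fc1.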

With the model in place, the next step is to define the data. MXNet's data iterators are more powerful than Caffe's Data Layer and offer richer data augmentation options. Here, random cropping and random rotation are used. Note that MXNet's built-in random cropping produces square crops, and random rotation keeps the original image size, so the rotated image has empty regions, which are filled with black here. The corresponding code is as follows:

train_dataiter = mx.io.ImageRecordIter(
    path_imgrec ="../data/train.rec",
    data_shape = (1, 28, 28),
    batch_size = 50,
    mean_r = 128,
    scale = 0.00390625,
    rand_crop = True,
    min_crop_size = 24,
    max_crop_size = 28,
    max_rotate_angle = 15,
    fill_value = 0
)
val_dataiter = mx.io.ImageRecordIter(
    path_imgrec = "../data/val.rec",
    data_shape = (1, 28, 28),
    batch_size = 100,
    mean_r = 128,
    scale = 0.00390625,
)
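The effect of the mean_r and scale settings can be checked with a little arithmetic. The sketch below assumes the iterator subtracts the mean first and then multiplies by the scale; normalize is a hypothetical helper that mirrors that assumption rather than MXNet's actual implementation.

```python
MEAN = 128
SCALE = 0.00390625  # exactly 1 / 256

def normalize(pixel):
    """Map a raw 8-bit gray value to the value fed to the network."""
    return (pixel - MEAN) * SCALE

# Darkest, middle, and brightest gray values
print(normalize(0), normalize(128), normalize(255))
```

With these settings the inputs fall roughly in the range -0.5 to 0.5.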

With the model and the data ready, training can start as follows:

import logging
logging.getLogger().setLevel(logging.DEBUG)
fh = logging.FileHandler('train_mnist_lenet.log')
logging.getLogger().addHandler(fh)
lr_scheduler = mx.lr_scheduler.FactorScheduler(1000,factor=0.95)
optimizer_params = {
    'learning_rate': 0.01,
    'momentum': 0.9,
    'wd': 0.0005,
    'lr_scheduler': lr_scheduler
    }
checkpoint = mx.callback.do_checkpoint('mnist_lenet', period=5)
mod.fit(train_dataiter,
        eval_data=val_dataiter,
        optimizer_params=optimizer_params,
        num_epoch=36,
        epoch_end_callback=checkpoint)
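The learning rate schedule configured above multiplies the rate by 0.95 every 1000 updates. The sketch below approximates the resulting value; approx_lr is a hypothetical helper, and the exact update at which MXNet applies each decay may differ by one from this formula.

```python
def approx_lr(base_lr, num_update, step=1000, factor=0.95):
    """Approximate learning rate after num_update parameter updates."""
    return base_lr * factor ** (num_update // step)

# Learning rate at a few illustrative update counts
for n in (0, 1000, 5000, 20000):
    print(n, approx_lr(0.01, n))
```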

Put all of the code above together, save it as train_lenet5.py, and run it:

>> python train_lenet5.py

This starts the training. Training writes a mnist_lenet-symbol.json file, which describes the model structure. As configured, the model parameters are saved every 5 epochs, in files named mnist_lenet-[epoch number].params. A training log is written as well. It mainly records the training accuracy, validation accuracy, learning rate changes, and information about the saved models.
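As a rough guide to which parameter files to expect, the sketch below lists them for period=5 and num_epoch=36. It assumes a checkpoint is written whenever the number of finished epochs is a multiple of the period, and that the epoch number in the file name is zero-padded to four digits.

```python
prefix, period, num_epoch = 'mnist_lenet', 5, 36

# Epochs are counted from 1 in the file names (assumption stated above)
saved_epochs = [e for e in range(1, num_epoch + 1) if e % period == 0]
param_files = ['{}-{:04d}.params'.format(prefix, e) for e in saved_epochs]
print(param_files[0], param_files[-1])
```

Under these assumptions the last checkpoint is the one for epoch 35.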

MXNet has no counterpart to Caffe's tool for plotting training curves, but the bundled examples include mxnet/example/kaggle-ndsb1/training_curves.py, which works for training logs that only report accuracy:

>> python /path/to/mxnet/example/kaggle-ndsb1/training_curves.py --log-file=train_mnist_lenet.log

Run the program to get the visualization result as shown in the figure:

3. Testing and Evaluation

Testing model accuracy: evaluating on the test set is very simple in MXNet. Load the trained model into a Module, load the test data into an ImageRecordIter, and then call the Module's score() function.

import mxnet as mx

test_dataiter = mx.io.ImageRecordIter(
        path_imgrec="../data/test.rec",
        data_shape=(1,28,28),
        batch_size=100,
        mean_r=128,
        scale=0.00390625,
)
# Load the checkpoint saved at epoch 35
mod = mx.mod.Module.load('mnist_lenet', 35, context=mx.gpu(0))
# To resume training from this checkpoint instead: mod.fit(..., begin_epoch=35)
mod.bind(
        data_shapes=test_dataiter.provide_data,
        label_shapes=test_dataiter.provide_label,
        for_training=False)
metric = mx.metric.create('acc')
mod.score(test_dataiter, metric)
for name, val in metric.get_name_value():
    # Accuracy is a fraction in [0, 1]; scale it to a percentage
    print('{}={:.2f}%'.format(name, val*100))

Save and run the code above to get the evaluation results on the test set.

Evaluating model performance: a rough way to measure forward-pass speed is to use Python's time module to time a fixed number of iterations, then divide the total by the iteration count to get the time per forward pass.
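A minimal sketch of this timing approach is shown below. The helper time_forward and the stand-in workload dummy_forward are hypothetical; in practice the callable would run one forward pass of the loaded model on a batch. Because MXNet executes operations asynchronously, the callable should also wait for the result (for example by reading the outputs back as a NumPy array), otherwise the measured time may be too short.

```python
import time

def time_forward(forward_fn, num_iters=100, warmup=10):
    """Return the average number of seconds per call of forward_fn."""
    for _ in range(warmup):          # warm-up calls are not timed
        forward_fn()
    start = time.time()
    for _ in range(num_iters):
        forward_fn()
    return (time.time() - start) / num_iters

# Stand-in workload; replace with a real forward pass on one batch
def dummy_forward():
    return sum(i * i for i in range(1000))

avg = time_forward(dummy_forward)
print('{:.3f} ms per forward pass'.format(avg * 1000))
```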

Origin blog.csdn.net/Fan0920/article/details/107716910