Deep learning and computer vision practical learning (3) The simplest image classification - handwritten digit recognition (based on Caffe and LeNet-5)

The simplest image classification - handwritten digit recognition

Training LeNet-5 on the MNIST data set to recognize handwritten digits is the "Hello World" of image classification.

1. Prepare data - MNIST

In most frameworks, the example of training LeNet-5 on MNIST is heavily wrapped in scripts: running one script takes care of everything from downloading the data to training. In MXNet, for example, you just go to mxnet/example and run train_mnist.py, and Caffe has a similar shell script.

However, this does not help beginners understand what is going on. This article therefore separates out the data preparation, breaks the training data down to individual image files, and then walks through the whole process from scratch. Once you understand this process, you basically understand how to go from image data to a trained classification model.

On Linux, the MNIST data can be downloaded directly with wget:

>> wget http://deeplearning.net/data/mnist/mnist.pkl.gz

The downloaded archive mnist.pkl.gz contains the training set, validation set, and test set (50,000, 10,000, and 10,000 images respectively). The data was exported with pickle and then compressed with gzip, so Python's gzip module can open it like an ordinary file. Each data set is a tuple of two elements. The first element stores the handwritten digit images: each image is a one-dimensional floating-point numpy array of length 28*28=784, obtained by flattening a single-channel grayscale image row by row; the maximum value 1 is white and the minimum value 0 is black. The second element holds the labels, a one-dimensional integer numpy array whose entry at each index is the digit shown in the image at the same index. Based on this, the code that converts the data sets into images is as follows:

import os
import pickle, gzip
from matplotlib import pyplot
print('Loading data from mnist.pkl.gz ...')
with gzip.open('mnist.pkl.gz', 'rb') as f:
    # The file was pickled under Python 2; encoding='latin1' lets Python 3 read it
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
imgs_dir = 'mnist'
datasets = {'train': train_set, 'val': valid_set, 'test': test_set}
for dataname, dataset in datasets.items():
    # One subfolder per split: mnist/train, mnist/val, mnist/test
    data_dir = os.sep.join([imgs_dir, dataname])
    os.makedirs(data_dir, exist_ok=True)
    print('Converting {} dataset ...'.format(dataname))
    for i, (img, label) in enumerate(zip(*dataset)):
        # File name: <6-digit serial number>_<digit label>.jpg
        filename = '{:0>6d}_{}.jpg'.format(i, label)
        filepath = os.sep.join([data_dir, filename])
        # Restore the flattened 784-vector to a 28x28 image
        img = img.reshape((28, 28))
        pyplot.imsave(filepath, img, cmap='gray')
        if (i + 1) % 10000 == 0:
            print('{} images converted!'.format(i + 1))

This script first creates a folder named mnist and, under it, three subfolders train, val, and test, which hold the converted training, validation, and test images respectively. Each file name consists of two fields separated by an underscore: the first is the serial number and the second is the digit shown in the image. The images are saved in JPG format.
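
The row-by-row flattening of each image can be checked on a synthetic array, without the real data. This is only a minimal sketch: the array `vec` and the helper `to_uint8_image` are made up for illustration, and the explicit scaling to 0..255 is an assumption about how one might inspect pixel values, not a step the conversion script performs itself.

```python
import numpy as np

def to_uint8_image(vec, side=28):
    """Turn a flat float vector in [0, 1] into a side x side uint8 image."""
    img = np.asarray(vec, dtype=np.float64).reshape((side, side))  # row-major
    return np.rint(img * 255).astype(np.uint8)

# Synthetic stand-in for one MNIST row: element k has value k / 783
vec = np.arange(784) / 783.0
img = to_uint8_image(vec)

# Element (r, c) of the image is element r * 28 + c of the flat vector
r, c = 3, 5
assert img[r, c] == int(np.rint(vec[r * 28 + c] * 255))
print(img.shape, img.dtype, img.min(), img.max())
```

The index relation `r * 28 + c` is what "flattened row by row" means in practice, and it is the reason a plain `reshape((28, 28))` is enough to recover the picture.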

2. Train a model for handwritten digit recognition based on Caffe and LeNet-5, and evaluate and test the model

(1) Make LMDB

If you work with Caffe, you first need to create LMDB data. LMDB, whose full name is Lightning Memory-Mapped Database, is the most commonly used database format in Caffe. Besides being fast, LMDB lets multiple processes read the data at the same time, which is its advantage over LevelDB, the format Caffe supported earlier. LMDB is now the usual data format for training on images in Caffe.

Caffe provides an official tool for converting images to LMDB for image classification tasks; its path is caffe/build/tools/convert_imageset. To use this tool, the first step is to generate a list of image file paths. Each line holds a file path and the corresponding label (the class index), separated by a space or a tab.

The image paths and labels in the three folders generated earlier under the mnist folder (train, val, and test) need to be converted into this format. The code is as follows:

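import os
import sys
# Usage: python gen_caffe_imglist.py <image folder> <output list file>
input_path = sys.argv[1].rstrip(os.sep)
output_path = sys.argv[2]
filenames = os.listdir(input_path)
with open(output_path, 'w') as f:
    for filename in filenames:
        filepath = os.sep.join([input_path, filename])
        # '000123_7.jpg' -> drop the extension, take the field after '_' as the label
        label = filename[:filename.rfind('.')].split('_')[1]
        # One line per image: "<file path> <label>"
        line = '{} {}\n'.format(filepath, label)
        f.write(line)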

Save this file as gen_caffe_imglist.py, and then execute the following commands in sequence:

>> python gen_caffe_imglist.py mnist/train train.txt

>> python gen_caffe_imglist.py mnist/val val.txt

>> python gen_caffe_imglist.py mnist/test test.txt

This generates file lists and corresponding labels for the three data sets. Then call convert_imageset directly to create lmdb.

>> /path/to/caffe/build/tools/convert_imageset ./ train.txt train_lmdb --gray --shuffle

>> /path/to/caffe/build/tools/convert_imageset ./ val.txt val_lmdb --gray --shuffle

>> /path/to/caffe/build/tools/convert_imageset ./ test.txt test_lmdb --gray --shuffle

Here --gray reads the images as single-channel grayscale, and --shuffle randomizes the order of the file list. Shuffling is commonly used but optional in this example, because the original order is already random. The tool reads each image as an OpenCV Mat and then saves it to LMDB. For more on the usage of convert_imageset, run the following command or refer to the source code:

>> /path/to/caffe/build/tools/convert_imageset -h
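
Before moving on to training, it may be worth checking that each list file written by gen_caffe_imglist.py has the expected number of lines and a reasonable balance of labels. The following is a minimal sketch under that assumption; the helper name `count_labels` and the three demo lines are invented for illustration, and real use would pass train.txt, val.txt, or test.txt instead.

```python
import collections
import os
import tempfile

def count_labels(list_path):
    """Count how many lines of a 'path label' list file carry each label."""
    counts = collections.Counter()
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The label is the last whitespace-separated field on the line
            path, label = line.rsplit(None, 1)
            counts[int(label)] += 1
    return counts

# Tiny made-up list file, only to show the call
demo_lines = ['mnist/train/000000_5.jpg 5\n',
              'mnist/train/000001_0.jpg 0\n',
              'mnist/train/000002_5.jpg 5\n']
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.writelines(demo_lines)
demo_counts = count_labels(tmp.name)
os.remove(tmp.name)
print(sorted(demo_counts.items()))
```

Splitting from the right keeps the check working even if a file path happens to contain spaces.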

(2) Training LeNet-5

The network is the same as in Caffe's official example, except that the input data layers read the self-made LMDB. The file lenet_train_val.prototxt, which describes the data source and network structure, is as follows:

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }

  transform_param {
    mean_value: 128
    scale: 0.00390625
  }
  data_param {
    source: "../data/train_lmdb"
    batch_size: 50
    backend: LMDB
  }
}

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }

  transform_param {
    mean_value: 128
    scale: 0.00390625
  }
  data_param {
    source: "../data/val_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
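
As a sanity check on the convolution and pooling layers defined above, the feature-map sizes can be traced by hand from the kernel sizes and strides. The sketch below assumes 28x28 single-channel input and no padding; the helper names `conv_out` and `pool_out` are illustrative, and the rounding rules (floor for convolution, ceiling for pooling) follow how Caffe is understood to compute output sizes.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a convolution layer (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Output side length of a pooling layer (rounds up)."""
    return -(-(size - kernel) // stride) + 1

# (name, kind, num_output, kernel_size, stride), copied from the prototxt
layers = [
    ('conv1', 'conv', 20, 5, 1),
    ('pool1', 'pool', None, 2, 2),
    ('conv2', 'conv', 50, 5, 1),
    ('pool2', 'pool', None, 2, 2),
]

side, channels = 28, 1
shapes = {'data': (channels, side, side)}
for name, kind, num_output, kernel, stride in layers:
    if kind == 'conv':
        side = conv_out(side, kernel, stride)
        channels = num_output      # one output channel per filter
    else:
        side = pool_out(side, kernel, stride)
    shapes[name] = (channels, side, side)

# Number of values that pool2 hands to the first fully connected layer ip1
ip1_inputs = channels * side * side
for name, shape in shapes.items():
    print(name, shape)
print('ip1 inputs:', ip1_inputs)
```

Tracing the sizes this way makes it clear how many weights the first InnerProduct layer needs per output unit.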

In the data layer, transform_param specifies a mean value and a scaling ratio: mean_value is subtracted from the data, and the result is multiplied by scale. For MNIST images, this maps values in 0~255 to roughly -0.5~0.5, which helps convergence. In the convolution layers, the learning rate of the kernel weights is the base learning rate multiplied by the first lr_mult, and the learning rate of the bias is the base learning rate multiplied by the second lr_mult. weight_filler initializes the parameters; xavier is an initialization method from the 2010 paper "Understanding the difficulty of training deep feedforward neural networks" by Bengio's group. Each convolution layer is followed by a Pooling layer. The ReLU unit converges better than the originally used Sigmoid. The Accuracy layer is used only in the validation/testing phase to calculate the classification accuracy.
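
The arithmetic behind transform_param and lr_mult is small enough to check directly. This is a minimal sketch, not Caffe code: the function name `transform` is made up, and base_lr = 0.01 is taken from the solver file used in this article.

```python
MEAN_VALUE = 128
SCALE = 0.00390625  # exactly 1 / 256

def transform(pixel):
    """Subtract the mean first, then multiply by the scale."""
    return (pixel - MEAN_VALUE) * SCALE

lo, hi = transform(0), transform(255)
print('pixel range after transform:', lo, hi)

# Effective learning rates: base learning rate times each lr_mult
base_lr = 0.01
weight_lr = base_lr * 1   # first param block (weights)
bias_lr = base_lr * 2     # second param block (bias)
print('weight lr:', weight_lr, 'bias lr:', bias_lr)
```

The upper end of the range comes out slightly below 0.5 because 255 - 128 = 127 rather than 128.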

In addition to the network structure and data, you also need to configure a lenet_solver.prototxt.

net: "lenet_train_val.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 36000
snapshot: 5000
snapshot_prefix: "mnist_lenet"
solver_mode: GPU

For more details on Solver, please refer to the Caffe official website http://caffe.berkeleyvision.org/tutorial/solver.html
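
To get a feel for the solver settings, the "inv" learning rate policy can be evaluated by hand. According to the Caffe solver documentation, it computes base_lr * (1 + gamma * iter) ^ (-power). The sketch below plugs in the values from lenet_solver.prototxt; the function name `inv_lr` is illustrative, and the epoch count assumes 50,000 training images and the batch sizes from the network definition.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the 'inv' policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

for it in (0, 5000, 10000, 36000):
    print('iter {:>5d}: lr = {:.6f}'.format(it, inv_lr(it)))

# How much data the solver settings cover
max_iter, train_batch = 36000, 50
test_iter, test_batch = 100, 100
epochs = max_iter * train_batch / 50000.0   # passes over the training set
val_images = test_iter * test_batch         # images seen in each test round
print('epochs:', epochs, 'validation images per test:', val_images)
```

With test_iter 100 and a TEST batch size of 100, each test round covers the validation set exactly once.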

Next, you can call the following command for training:

>> /path/to/caffe/build/tools/caffe train -solver lenet_solver.prototxt -gpu 0 -log_dir ./

Alternatively, the parameters can be given with double dashes:

>> /path/to/caffe/build/tools/caffe train --solver=lenet_solver.prototxt --gpu=0 --log_dir=./

However, the second form cannot use the terminal's automatic completion, so it is less convenient than the first.

Here the gpu parameter specifies which GPU to use for training when several are available, for example on a multi-GPU server; if necessary, -gpu all trains on all of them. The log_dir parameter specifies the directory for the output log file, and this directory must already exist. After executing the command, you will see the training output.

 

Note that because a TEST data layer is specified, the accuracy and loss of the current model on val_lmdb are printed at the interval specified in the solver. After training, several files ending in caffemodel and solverstate are generated. These are the archives of the model parameters and of the solver state at the specified iteration counts and at the end of training, and their names start with the prefix specified in lenet_solver.prototxt. A log file is also generated at the same time, with a name of the form:

caffe.[Host Name].[Domain Name].[User Name].log.INFO.[Year, Month, Day]-[Hours, Minutes, Seconds].[Microseconds]

Caffe also provides an official tool for visualizing log files: plot_training_log.py.example under caffe/tools/extra. Copy this file and name it plot_training_log.py, and it can be used to draw plots. The script takes three input parameters: the chart type, the path of the image to generate, and the path to the log.

The chart type numbers and the corresponding charts are as follows:

0: Test accuracy vs. number of iterations

1: Test accuracy vs. training time (seconds)

2: Test loss vs. number of iterations

3: Test loss vs. training time (seconds)

4: Training learning rate vs. number of iterations

5: Training learning rate vs. training time (seconds)

6: Training loss vs. number of iterations

7: Training loss vs. training time (seconds)

In addition, the script requires the log file name to end with .log, so we use the mv command to rename the log file to mnist_train.log. For example, to see how the test accuracy and test loss change with the number of iterations, execute the following in sequence:

>> python plot_training_log.py 0 test_acc_vs_iters.png mnist_train.log

>> python plot_training_log.py 2 test_loss_vs_iters.png mnist_train.log
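
If only the numbers are needed rather than a plot, the test accuracy can also be pulled out of the log with a few lines of Python. The sketch below is an assumption-laden illustration: the sample log lines are invented, the line format ("Iteration N, Testing net" followed by "Test net output #k: accuracy = x") is what Caffe logs are assumed to look like, and `parse_test_accuracy` is a made-up helper, not part of Caffe.

```python
import re

ITER_RE = re.compile(r'Iteration (\d+), Testing net')
ACC_RE = re.compile(r'Test net output #\d+: accuracy = ([0-9.eE+-]+)')

def parse_test_accuracy(lines):
    """Return (iteration, accuracy) pairs found in Caffe-style log lines."""
    results = []
    current_iter = None
    for line in lines:
        m = ITER_RE.search(line)
        if m:
            current_iter = int(m.group(1))   # remember which test round this is
            continue
        m = ACC_RE.search(line)
        if m and current_iter is not None:
            results.append((current_iter, float(m.group(1))))
    return results

# Invented sample lines in the assumed format, for illustration only
sample_log = [
    'I0721 10:00:00.000000 1234 solver.cpp:337] Iteration 0, Testing net (#0)',
    'I0721 10:00:01.000000 1234 solver.cpp:404]     Test net output #0: accuracy = 0.1021',
    'I0721 10:00:01.000000 1234 solver.cpp:404]     Test net output #1: loss = 2.3412 (* 1 = 2.3412 loss)',
    'I0721 10:00:09.000000 1234 solver.cpp:337] Iteration 500, Testing net (#0)',
    'I0721 10:00:10.000000 1234 solver.cpp:404]     Test net output #0: accuracy = 0.9712',
]
parsed = parse_test_accuracy(sample_log)
print(parsed)
```

For a real log, the lines could be read with `open('mnist_train.log')` and passed to the same function.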

 

(3) Testing and evaluation

Test model accuracy

After training, the model needs to be tested and evaluated. In fact, during training the accuracy of the model was already evaluated on val_lmdb every 500 iterations. However, MNIST has a test set in addition to the validation set, and the final evaluation is done on the test set.

Evaluate model performance

Generally speaking, performance evaluation mainly covers speed and memory usage.

(4) Recognize handwritten digits

The trained model can now be used to recognize handwritten digits. We test it on the images of the test data set, using the list file generated earlier.
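
A rough sketch of how a single image might be classified with pycaffe is given below. It is not the author's code. It assumes a deploy version of the network exists (the file name lenet_deploy.prototxt is hypothetical), that its input blob is named data and its output blob ip2, and that the image is already a 28x28 grayscale array with values 0..255. The helper names are made up; pycaffe is imported lazily so the preprocessing can be tried without Caffe installed.

```python
import numpy as np

def preprocess(gray_img):
    """Apply the training-time transform to a 28x28 image with values 0..255."""
    x = np.asarray(gray_img, dtype=np.float32)
    x = (x - 128.0) * 0.00390625        # same mean_value and scale as in training
    return x.reshape((1, 1, 28, 28))    # batch, channel, height, width

def predict_digit(net, gray_img, output_blob='ip2'):
    """Run one forward pass and return the index of the largest score."""
    net.blobs['data'].reshape(1, 1, 28, 28)
    net.blobs['data'].data[...] = preprocess(gray_img)
    scores = net.forward()[output_blob]
    return int(np.argmax(scores))

def load_net(deploy_prototxt, weights):
    """Load a trained network in TEST phase (requires pycaffe)."""
    import caffe  # imported here so the helpers above work without pycaffe
    return caffe.Net(deploy_prototxt, weights, caffe.TEST)
```

Usage would look like `net = load_net('lenet_deploy.prototxt', 'mnist_lenet_iter_36000.caffemodel')` followed by `predict_digit(net, img)`. The weights file name is an assumption based on the snapshot prefix and max_iter in the solver.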

 

(5) Add translational and rotational disturbances

 

Perturbing the samples offline to enlarge the data set is only one method of data augmentation, and not the best one: the amount of added data is limited, and the perturbed copies take up extra disk space on top of the original samples. A better way is to perturb the data in real time during training, which is equivalent to an unlimited supply of random perturbations. Caffe's data layer does come with basic data perturbation, but it is limited to random cropping and random mirroring, which is not very useful here. There are open-source third-party Caffe layers for real-time perturbation on GitHub that cover the common perturbation methods; searching for "caffe augmentation" on GitHub turns up many of them.
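
To make the idea of translational and rotational perturbation concrete, here is a minimal numpy sketch. It is an illustration only, not one of the third-party Caffe layers mentioned above: the function names, the nearest-neighbour sampling, and the default ranges (2 pixels, 15 degrees) are all assumptions chosen for simplicity.

```python
import numpy as np

def shift_and_rotate(img, dx, dy, angle_deg):
    """Rotate a 2-D image about its center, then shift it by (dx, dy) pixels.

    Uses inverse mapping with nearest-neighbour sampling; pixels that fall
    outside the source image are filled with 0 (black).
    """
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # For every output pixel: undo the shift, then rotate back by -angle
    x0 = xs - cx - dx
    y0 = ys - cy - dy
    src_x = np.cos(theta) * x0 + np.sin(theta) * y0 + cx
    src_y = -np.sin(theta) * x0 + np.cos(theta) * y0 + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

def random_perturb(img, max_shift=2, max_angle=15.0, rng=None):
    """Apply a random small shift and rotation to one image."""
    rng = np.random.default_rng() if rng is None else rng
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    angle = rng.uniform(-max_angle, max_angle)
    return shift_and_rotate(img, int(dx), int(dy), angle)

# Tiny demonstration on a synthetic image with a single bright pixel
demo = np.zeros((28, 28), dtype=np.float32)
demo[5, 5] = 1.0
shifted = shift_and_rotate(demo, dx=2, dy=1, angle_deg=0.0)
print(np.argwhere(shifted > 0))
```

Calling `random_perturb` on each image as it is loaded gives a fresh perturbation every time, which is the real-time behaviour described above; calling it once per image and saving the results corresponds to the offline approach.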

 

(The missing parts will be added later)


Origin blog.csdn.net/Fan0920/article/details/107562357