Interesting Convolutional Neural Networks

1. Preface

    I have been studying deep learning recently, and looking back at what I learned before, I marvel that mathematics is a simple yet magical science. F = G·m₁·m₂/r² describes the gravitational attraction that governs the galaxies; E = mc² describes the mystery of how stars shine; Hubble's law, v = H₀·D, describes the expansion of the universe. Most phenomena and laws of nature can be described mathematically; in other words, a function can be found for them.

    A neural network (see "Simple and Complex Artificial Neural Networks") can approximate any continuous function, which gives it enormous expressive power. Most classification problems boil down to finding a function y = f(x). For image recognition, the goal is a function y = F(pixel tensor) whose input is the pixel tensor and whose output y is cat, dog, flower, car, and so on; for text sentiment analysis, the goal is a function y = F(word vector) whose input is a word vector or paragraph vector and whose output y is positive, negative, and so on.

2. Overview

    This post covers the following:

    1. The principle of convolutional neural networks

    2. Training a convolutional neural network for handwritten digit recognition with deeplearning4j (dl4j)

3. The Principle of Convolutional Neural Networks

    On the left of the figure below is a 9×9 grid in which the red cells form the digit 7. Fill each red cell with 1 and each blank cell with 0 to get a two-dimensional matrix. A traditional approach would compare this matrix against a reference by vector distance; if every digit were written in exactly the same position on the grid, this would be accurate. In practice, however, a handwritten 7 may sit a little to the left or right, or be deformed and distorted, and then raw pixel matching becomes unreliable. In addition, the number of pixels in an image is large: a 50×50 image has 2,500 pixels, and each pixel has three color channels (R, G, B), so there are 7,500 input values, which makes the computation expensive.

    (Figure: a 9×9 grid with the digit 7 filled in red)
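As a toy illustration of the raw pixel-matching approach just described (not part of the original post's code; the bitmap-drawing helper is made up for this sketch), the snippet below encodes 9×9 bitmaps of a "7" as flat 0/1 vectors and compares them by Euclidean distance. A slightly shifted copy of the same digit already lands far from the reference, which is exactly the weakness noted above:

```java
// Toy illustration: compare 9x9 digit bitmaps by raw vector distance.
// A shifted "7" ends up far from the reference even though a human reads
// both as the same digit.
public class PixelDistance {

  // Euclidean distance between two flattened bitmaps
  static double distance(int[] a, int[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Draw a crude "7": a horizontal bar on `row` plus a vertical stroke at `col`
  static int[] seven(int row, int col) {
    int[] img = new int[9 * 9];
    for (int c = 0; c < 9; c++) img[row * 9 + c] = 1;   // top bar
    for (int r = row; r < 9; r++) img[r * 9 + col] = 1; // down stroke
    return img;
  }

  public static void main(String[] args) {
    int[] reference = seven(0, 6);
    int[] shifted = seven(1, 5); // same digit, drawn slightly lower and to the left
    System.out.println("distance to itself:  " + distance(reference, reference));
    System.out.println("distance to shifted: " + distance(shifted, reference));
  }
}
```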

    We therefore need a way to extract abstract features and reduce the dimensionality of the data. This is what the convolution operation does: a convolution kernel smaller than the image is swept across the entire image, computing a dot product at each position. The process is illustrated below (image from https://my.oschina.net/u/876354/blog/1620906).


    The convolution operation picks out the salient features of the image while reducing its dimensionality. The whole process is analogous to one function sweeping across another: at each position, the integral of the overlapping part of the two functions is taken. Sweeping in this way preserves the characteristic shapes in the image while shrinking it, and the extracted features can then be combined and assembled in later layers.

(Figure: one function sweeping across another; the integral of their overlap is the convolution)
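The sliding dot product can be sketched in a few lines of plain Java. This is a minimal illustration with made-up matrix and kernel values (real frameworks such as dl4j implement this far more efficiently and with more options, such as padding and strides):

```java
// Minimal 2D convolution (valid padding, stride 1): scan the image with a
// small kernel and take the dot product of each overlapping window.
public class Conv2D {

  static double[][] convolve(double[][] img, double[][] k) {
    int oh = img.length - k.length + 1;        // output height
    int ow = img[0].length - k[0].length + 1;  // output width
    double[][] out = new double[oh][ow];
    for (int i = 0; i < oh; i++)
      for (int j = 0; j < ow; j++)
        for (int ki = 0; ki < k.length; ki++)
          for (int kj = 0; kj < k[0].length; kj++)
            out[i][j] += img[i + ki][j + kj] * k[ki][kj]; // dot product of window
    return out;
  }

  public static void main(String[] args) {
    double[][] img = {
        {1, 1, 1, 0},
        {0, 1, 1, 1},
        {0, 0, 1, 1},
        {0, 0, 1, 1}
    };
    // A vertical-edge kernel: responds where left and right columns differ
    double[][] kernel = {
        {1, 0, -1},
        {1, 0, -1},
        {1, 0, -1}
    };
    double[][] fmap = convolve(img, kernel);
    // a 4x4 input convolved with a 3x3 kernel gives a 2x2 feature map
    for (double[] row : fmap) System.out.println(java.util.Arrays.toString(row));
  }
}
```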

    To reduce the dimensionality further, pooling is introduced. There are several kinds of pooling, such as max pooling and average pooling. The figure below shows 2×2 max pooling with a stride of 2: a 2×2 window is scanned across the matrix and the maximum of each window is kept. Four windows are scanned in total, and their maxima are 6, 8, 3, and 4.

(Figure: 2×2 max pooling with a stride of 2)
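The 2×2, stride-2 max pooling step is easy to reproduce directly. In the sketch below the input matrix is made up, chosen so that the four window maxima come out to the 6, 8, 3, 4 of the figure:

```java
// 2x2 max pooling with stride 2: keep only the maximum of each 2x2 block.
public class MaxPool {

  static double[][] maxPool2x2(double[][] in) {
    double[][] out = new double[in.length / 2][in[0].length / 2];
    for (int i = 0; i < out.length; i++)
      for (int j = 0; j < out[0].length; j++) {
        double m = in[2 * i][2 * j]; // max over the 2x2 window
        m = Math.max(m, in[2 * i][2 * j + 1]);
        m = Math.max(m, in[2 * i + 1][2 * j]);
        m = Math.max(m, in[2 * i + 1][2 * j + 1]);
        out[i][j] = m;
      }
    return out;
  }

  public static void main(String[] args) {
    // Input chosen so each 2x2 block's maximum matches the 6, 8, 3, 4 in the text
    double[][] in = {
        {1, 6, 2, 8},
        {3, 4, 5, 7},
        {1, 3, 0, 2},
        {2, 0, 1, 4}
    };
    double[][] out = maxPool2x2(in);
    // prints [6.0, 8.0] then [3.0, 4.0]
    for (double[] row : out) System.out.println(java.util.Arrays.toString(row));
  }
}
```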

    Finally, after several layers of convolution and pooling, we obtain a matrix, which is fed as input into a fully connected network; that network approximates the function that recognizes the digit. Taking the figure above as an example, the pooled values 6, 8, 3, and 4 would be the inputs from which the fully connected network learns such a function.
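A minimal sketch of this final step, using made-up weights (in a real network they are learned during training): the pooled features are passed through one fully connected layer and a softmax to produce class probabilities.

```java
// One fully connected layer followed by softmax, fed with the pooled
// features [6, 8, 3, 4] from the example above. The weights here are
// arbitrary illustrative numbers, not trained values.
public class DenseSoftmax {

  // Numerically stable softmax: shift by the max before exponentiating
  static double[] softmax(double[] z) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : z) max = Math.max(max, v);
    double sum = 0;
    double[] out = new double[z.length];
    for (int i = 0; i < z.length; i++) { out[i] = Math.exp(z[i] - max); sum += out[i]; }
    for (int i = 0; i < z.length; i++) out[i] /= sum;
    return out;
  }

  // Fully connected layer: z = W x + b
  static double[] dense(double[] x, double[][] w, double[] b) {
    double[] z = new double[b.length];
    for (int o = 0; o < b.length; o++) {
      z[o] = b[o];
      for (int i = 0; i < x.length; i++) z[o] += w[o][i] * x[i];
    }
    return z;
  }

  public static void main(String[] args) {
    double[] features = {6, 8, 3, 4};       // pooled feature vector
    double[][] w = {{0.1, 0.2, -0.1, 0.0},  // arbitrary weights for 2 classes
                    {-0.1, 0.1, 0.2, 0.1}};
    double[] b = {0.0, 0.0};
    double[] probs = softmax(dense(features, w, b));
    System.out.println(java.util.Arrays.toString(probs)); // probabilities sum to 1
  }
}
```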

4. Handwritten Digit Recognition with deeplearning4j

    1. First download the MNIST dataset from the following address:

       http://github.com/myleott/mnist_png/raw/master/mnist_png.tar.gz

    2. Unzip it (I unzipped it to the E: drive).

    3. Train and evaluate the network (the trickier parts are commented):

// Imports assume a 0.9.x-era deeplearning4j / datavec / nd4j setup, matching the API used below
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.api.split.FileSplit;
import org.datavec.image.loader.NativeImageLoader;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.LearningRatePolicy;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;
import org.nd4j.linalg.lossfunctions.LossFunctions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MnistClassifier {

  private static final Logger log = LoggerFactory.getLogger(MnistClassifier.class);
  private static final String basePath = "E:";

  public static void main(String[] args) throws Exception {
    int height = 28;
    int width = 28;
    int channels = 1; // grayscale images, so a single channel rather than separate R, G, B channels
    int outputNum = 10; // ten digits, so ten output classes
    int batchSize = 54; // mini-batch size: 54 images per iteration; see mini-batch gradient averaging in neural network optimization
    int nEpochs = 1; // train over the full dataset only once
    int iterations = 1;

    int seed = 1234;
    Random randNumGen = new Random(seed);

    File trainData = new File(basePath + "/mnist_png/training");
    FileSplit trainSplit = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
    ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator(); // use the parent directory name as the class label
    ImageRecordReader trainRR = new ImageRecordReader(height, width, channels, labelMaker); // reader that loads the image files
    trainRR.initialize(trainSplit);
    DataSetIterator trainIter = new RecordReaderDataSetIterator(trainRR, batchSize, 1, outputNum);

    // Scale pixel values from the 0-255 range down to the 0-1 range
    DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
    scaler.fit(trainIter);
    trainIter.setPreProcessor(scaler);
    

    // Vectorize the test set
    File testData = new File(basePath + "/mnist_png/testing");
    FileSplit testSplit = new FileSplit(testData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
    ImageRecordReader testRR = new ImageRecordReader(height, width, channels, labelMaker);
    testRR.initialize(testSplit);
    DataSetIterator testIter = new RecordReaderDataSetIterator(testRR, batchSize, 1, outputNum);
    testIter.setPreProcessor(scaler); // same normalization for better results

    log.info("Network configuration and training...");
    Map<Integer, Double> lrSchedule = new HashMap<>(); // learning rate decay schedule; the key is the mini-batch iteration at which the new rate takes effect
    lrSchedule.put(0, 0.06); 
    lrSchedule.put(200, 0.05);
    lrSchedule.put(600, 0.028);
    lrSchedule.put(800, 0.0060);
    lrSchedule.put(1000, 0.001);

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .iterations(iterations)
        .regularization(true).l2(0.0005)
        .learningRate(.01)
        .learningRateDecayPolicy(LearningRatePolicy.Schedule)
        .learningRateSchedule(lrSchedule) 
        .weightInit(WeightInit.XAVIER)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(Updater.NESTEROVS)
        .list()
        .layer(0, new ConvolutionLayer.Builder(5, 5)
            .nIn(channels)
            .stride(1, 1)
            .nOut(20)
            .activation(Activation.IDENTITY)
            .build())
        .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(2, 2)
            .stride(2, 2)
            .build())
        .layer(2, new ConvolutionLayer.Builder(5, 5)
            .stride(1, 1) 
            .nOut(50)
            .activation(Activation.IDENTITY)
            .build())
        .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
            .kernelSize(2, 2)
            .stride(2, 2)
            .build())
        .layer(4, new DenseLayer.Builder().activation(Activation.RELU)
            .nOut(500).build())
        .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nOut(outputNum)
            .activation(Activation.SOFTMAX)
            .build())
        .setInputType(InputType.convolutionalFlat(28, 28, 1)) 
        .backprop(true).pretrain(false).build();

    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.init();
    net.setListeners(new ScoreIterationListener(10));
    log.debug("Total num of params: {}", net.numParams());

    // Train, evaluating on the test set after each epoch
    for (int i = 0; i < nEpochs; i++) {
      net.fit(trainIter);
      Evaluation eval = net.evaluate(testIter);
      log.info(eval.stats());
      trainIter.reset();
      testIter.reset();
    }
    ModelSerializer.writeModel(net, new File(basePath + "/minist-model.zip"), true); // save the trained network
  }
}

Running the main method yields the following evaluation results:

 # of classes:    10
 Accuracy:        0.9897
 Precision:       0.9897
 Recall:          0.9897
 F1 Score:        0.9896

    The overall results are quite good. With the trained network saved, it can be used to recognize handwritten digits. The next post will show how to load the saved model and build a handwriting recognition application with Spring MVC.

