deeplearning4j - A Convolutional Neural Network for CAPTCHA Recognition

I. Introduction  

  For a long time there was no major breakthrough in computer vision; the emergence of convolutional neural networks changed that. In this blog, we will look at how convolutional neural networks are applied to image recognition through a concrete example.

    Outline

    1. Problem description

    2. Approach to the solution

    3. Implementation with DL4J

II. The Problem

    There is a set of CAPTCHA images like the following. Each image is 60*160 pixels, and the CAPTCHA consists of 5 digits in the range 0 to 9, with interfering background noise added to each image. The file name of each image encodes the digits shown on the CAPTCHA. A sample image is shown below:

    It is practically impossible to enumerate every possible image, so this problem cannot be solved with traditional programming. Instead, the computer must acquire the ability to recognize CAPTCHAs through learning: we show it a large number of CAPTCHA images, tell it the correct answer for each one, and let it learn on its own until it can recognize CAPTCHAs reliably.

III. Solution

    1. Features

    Each digit has a distinctive shape. "Features" here refers to the different aspects of that shape, such as the direction of strokes and the degree of curvature. For detecting shapes in an image, a convolutional neural network with ReLU activations and max pooling can do this very accurately. The underlying reason is that when a convolution kernel is flattened into a vector, applying it is essentially a dot product, i.e., the projection of one vector onto another. For the principles of convolutional neural networks, see "Interesting Convolutional Neural Networks".
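The dot-product view of convolution can be sketched in a few lines of plain Java (an illustration only, not DL4J code): sliding a kernel to one position and summing elementwise products is exactly the dot product of the flattened kernel with the image patch under it.

```java
public class ConvAsDotProduct {
    // Dot product of the flattened kernel with the image patch at (row, col).
    static double convolveAt(double[][] image, double[][] kernel, int row, int col) {
        double sum = 0.0;
        for (int i = 0; i < kernel.length; i++) {
            for (int j = 0; j < kernel[0].length; j++) {
                sum += image[row + i][col + j] * kernel[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // A patch containing a vertical stroke...
        double[][] image = {
            {0, 1, 0},
            {0, 1, 0},
            {0, 1, 0}
        };
        // ...projected onto a vertical-line kernel gives a large response.
        double[][] vertical = {
            {0, 1, 0},
            {0, 1, 0},
            {0, 1, 0}
        };
        System.out.println(convolveAt(image, vertical, 0, 0)); // 3.0
    }
}
```

A kernel whose shape matches the stroke produces a large projection; a mismatched kernel produces a small one, which is what lets the network detect line direction and curvature.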

    2. Network structure design

    Each image yields one digit per position as output, so we need a deep neural network with one output layer per digit position (the implementation below defines six output layers, out1 through out6, and pads shorter labels with a trailing "0"). First, stacked blocks of convolution kernels + max-pooling layers extract salient features; the resulting features are then passed through two fully connected layers. The fully connected layers serve two purposes: first, the sigmoid function compresses values into the range 0 to 1, which is convenient for the softmax computation; second, fully connected layers build more abstract features, which makes approximating the target function easier. The final network structure is as follows:

    3. Tensor representation

    Labels are represented as one-hot vectors, which pair well with softmax. The figure below shows the representations of the digits 0 to 9: reading along the rows from top to bottom, they represent 0 through 9 respectively.

    [Figure: one-hot label matrix, one row per digit 0-9]

    The pixel values of the images range from 0 to 255. A color image would have three channels, but these images are all black and white, so there is only one channel. We take each pixel's raw value and divide by 255 to normalize.
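The two encodings just described can be sketched in plain Java (standing in for the ND4J arrays used later): a one-hot vector of length 10 for each digit, and pixel values scaled into [0, 1].

```java
import java.util.Arrays;

public class Encodings {
    // One-hot vector of length 10 for a digit 0-9.
    static double[] oneHot(int digit) {
        double[] v = new double[10];
        v[digit] = 1.0;
        return v;
    }

    // Scale a raw 0-255 pixel value into [0, 1].
    static double normalize(int pixel) {
        return pixel / 255.0;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(oneHot(3)));
        // [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        System.out.println(normalize(255)); // 1.0
    }
}
```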

IV. Code Implementation

    1. Network structure

public static ComputationGraph createModel() {
        ComputationGraphConfiguration config = new NeuralNetConfiguration.Builder()
            .seed(seed)
            .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
            .l2(1e-3)
            .updater(new Adam(1e-3))
            .weightInit(WeightInit.XAVIER_UNIFORM)
            .graphBuilder()
            .addInputs("trainFeatures")
            .setInputTypes(InputType.convolutional(60, 160, 1))
            .setOutputs("out1", "out2", "out3", "out4", "out5", "out6")
            // Four convolution + max-pooling blocks extract features
            .addLayer("cnn1", new ConvolutionLayer.Builder(new int[]{5, 5}, new int[]{1, 1}, new int[]{0, 0})
                .nIn(1).nOut(48).activation(Activation.RELU).build(), "trainFeatures")
            .addLayer("maxpool1", new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}, new int[]{2, 2}, new int[]{0, 0})
                .build(), "cnn1")
            .addLayer("cnn2", new ConvolutionLayer.Builder(new int[]{5, 5}, new int[]{1, 1}, new int[]{0, 0})
                .nOut(64).activation(Activation.RELU).build(), "maxpool1")
            .addLayer("maxpool2", new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 1}, new int[]{2, 1}, new int[]{0, 0})
                .build(), "cnn2")
            .addLayer("cnn3", new ConvolutionLayer.Builder(new int[]{3, 3}, new int[]{1, 1}, new int[]{0, 0})
                .nOut(128).activation(Activation.RELU).build(), "maxpool2")
            .addLayer("maxpool3", new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}, new int[]{2, 2}, new int[]{0, 0})
                .build(), "cnn3")
            .addLayer("cnn4", new ConvolutionLayer.Builder(new int[]{4, 4}, new int[]{1, 1}, new int[]{0, 0})
                .nOut(256).activation(Activation.RELU).build(), "maxpool3")
            .addLayer("maxpool4", new SubsamplingLayer.Builder(PoolingType.MAX, new int[]{2, 2}, new int[]{2, 2}, new int[]{0, 0})
                .build(), "cnn4")
            // Two fully connected layers refine the extracted features
            .addLayer("ffn0", new DenseLayer.Builder().nOut(3072)
                .build(), "maxpool4")
            .addLayer("ffn1", new DenseLayer.Builder().nOut(3072)
                .build(), "ffn0")
            // One softmax output layer per digit position
            .addLayer("out1", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build(), "ffn1")
            .addLayer("out2", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build(), "ffn1")
            .addLayer("out3", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build(), "ffn1")
            .addLayer("out4", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build(), "ffn1")
            .addLayer("out5", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build(), "ffn1")
            .addLayer("out6", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(10).activation(Activation.SOFTMAX).build(), "ffn1")
            .pretrain(false).backprop(true)
            .build();
        ComputationGraph model = new ComputationGraph(config);
        model.init();

        return model;
    }

    2. Training set construction

 public MultiDataSet convertDataSet(int num) throws Exception {
        int batchNumCount = 0;

        INDArray[] featuresMask = null;
        INDArray[] labelMask = null;

        List<MultiDataSet> multiDataSets = new ArrayList<>();

        while (batchNumCount != num && fileIterator.hasNext()) {
            File image = fileIterator.next();
            // The file name (without extension) is the label, e.g. "123450.jpg"
            String imageName = image.getName().substring(0, image.getName().lastIndexOf('.'));
            String[] imageNames = imageName.split("");
            INDArray feature = asMatrix(image);
            INDArray[] features = new INDArray[]{feature};
            INDArray[] labels = new INDArray[6];

            Nd4j.getAffinityManager().ensureLocation(feature, AffinityManager.Location.DEVICE);
            // Pad five-digit labels with a trailing "0" to fill all six outputs
            if (imageName.length() < 6) {
                imageName = imageName + "0";
                imageNames = imageName.split("");
            }
            // One-hot encode each digit into a 1x10 label row
            for (int i = 0; i < imageNames.length; i++) {
                int digit = Integer.parseInt(imageNames[i]);
                labels[i] = Nd4j.zeros(1, 10).putScalar(new int[]{0, digit}, 1);
            }
            // Normalize pixel values into [0, 1]
            feature = feature.muli(1.0 / 255.0);

            multiDataSets.add(new MultiDataSet(features, labels, featuresMask, labelMask));

            batchNumCount++;
        }
        return MultiDataSet.merge(multiDataSets);
    }
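The label-building step above can be sketched without ND4J: the file name is split into single digits, padded with trailing "0"s to six characters when needed, and each digit becomes a one-hot row. Here plain double[] arrays stand in for the INDArray labels; this is an illustration of the logic, not the DL4J code.

```java
import java.util.Arrays;

public class LabelBuilder {
    // Pad a label string to 6 digits and one-hot encode each digit.
    static double[][] buildLabels(String imageName) {
        while (imageName.length() < 6) {
            imageName = imageName + "0"; // same idea as the padding in convertDataSet
        }
        double[][] labels = new double[6][10];
        for (int i = 0; i < 6; i++) {
            int digit = Character.getNumericValue(imageName.charAt(i));
            labels[i][digit] = 1.0;
        }
        return labels;
    }

    public static void main(String[] args) {
        // A five-digit file name "12345" is padded to "123450".
        double[][] labels = buildLabels("12345");
        System.out.println(Arrays.toString(labels[0])); // one-hot for '1'
        System.out.println(Arrays.toString(labels[5])); // one-hot for the padded '0'
    }
}
```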

V. Postscript

    Building a deep neural network with deeplearning4j involves almost no redundant code and solves a complex image-recognition problem quite elegantly. A few notes on the code above:

    1. The input size of each DenseLayer is not set explicitly; dl4j infers it internally (from setInputTypes).

    2. Adam is used for gradient updates. Adam combines momentum with an adaptive learning rate and usually gives better results.

    3. The negative log-likelihood loss used here has the same effect as cross entropy when combined with softmax.

    4. After the model is trained, ModelSerializer.writeModel(model, modelPath, true) can be used to save it; the saved model can then be loaded for image recognition.

    For the complete code, see the deeplearning4j examples.
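To make note 2 concrete, a single Adam update for one scalar parameter looks roughly like this. This is a textbook sketch with the usual defaults (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8), not DL4J's internal implementation:

```java
public class AdamStep {
    // One Adam update for a single scalar parameter.
    // m[0], v[0] hold the running first/second moment estimates; t is the step count (from 1).
    static double step(double param, double grad, double[] m, double[] v, int t,
                       double lr, double beta1, double beta2, double eps) {
        m[0] = beta1 * m[0] + (1 - beta1) * grad;          // momentum (first moment)
        v[0] = beta2 * v[0] + (1 - beta2) * grad * grad;   // adaptive term (second moment)
        double mHat = m[0] / (1 - Math.pow(beta1, t));     // bias correction
        double vHat = v[0] / (1 - Math.pow(beta2, t));
        return param - lr * mHat / (Math.sqrt(vHat) + eps);
    }

    public static void main(String[] args) {
        double[] m = {0}, v = {0};
        double p = 1.0;
        // One step with learning rate 1e-3 (matching the updater above) and gradient 0.5.
        p = step(p, 0.5, m, v, 1, 1e-3, 0.9, 0.999, 1e-8);
        System.out.println(p); // slightly below 1.0
    }
}
```

Because the step size is scaled by the running moment estimates, Adam adapts the effective learning rate per parameter, which is why it often converges faster than plain SGD on networks like this one.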

 

