Encog3 in Java: Chapter 9 Using Image Data

Reprinted in: Chapter 9 Using Image Data - A Code Farmer

• Image processing

• Finding boundaries

• Downsampling

• Working with image datasets

Recognizing images with neural networks is a very common task. This chapter explores how Encog uses images. Using the same feed-forward neural networks seen in previous chapters, networks can be designed to recognize certain images. Dedicated dataset classes simplify the process of feeding image data into a neural network.

This chapter introduces the ImageMLDataSet class, which accepts a list of loaded images and processes them into an Encog-friendly form. ImageMLDataSet is based on BasicMLDataSet, which is essentially just arrays of double values for the input and ideal data; ImageMLDataSet simply adds special functions for loading images into those double arrays.

There are two important issues to consider when loading image data into a neural network with ImageMLDataSet. The first is detecting and recognizing boundaries; the second is that source images are usually high resolution and must be uniformly converted to a lower resolution before being fed into the neural network.

9.1 Finding the Boundary

An image is a rectangular region, but only part of that region may actually matter to the neural network. Ideally, the image the neural network recognizes should fill the entire physical image, not just a small part of it. Consider the example in Figure 9.1.

As you can see in Figure 9.1, the letter "X" is drawn over almost the entire physical image; this image requires minimal boundary detection.

Images are not always drawn so conveniently; consider the image in Figure 9.2.

Here the letter "X" is sized differently than in the previous image. To identify it correctly, we have to find its bounds. Figure 9.3 shows a bounding box around the letter X. Only the data inside the bounding box is used to recognize the image.

As you can see, once the boundary of the letter X has been detected, only the data inside the bounding box is recognized, and the proportions of the X are now roughly the same as in Figure 9.1.

9.2 Downsampling of images

Even with bounding boxes, image sizes are not necessarily consistent: the letter X in Figure 9.3 is much smaller than the one in Figure 9.1. To recognize an image, we draw a grid over it, with each grid cell corresponding to one input neuron. For this to work, every image must be the same size. Furthermore, most images have far too high a resolution to be used directly with a neural network.

Downsampling solves both problems by reducing the image resolution and scaling all images to a common size. To see this in action, consider Figure 9.4, which shows the Encog logo at high resolution.

Figure 9.5 shows the downsampled image.

Did you notice the grid pattern? The image has been reduced to 32x32 pixels. These pixels form the input to a neural network, which therefore requires 1,024 input neurons, provided the network only looks at the intensity of each square. Looking only at intensity means the network sees in "black and white".
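To make the idea concrete, here is a minimal, self-contained sketch of intensity downsampling in plain Java. This is an illustration of the technique only, not Encog's actual implementation: a grayscale image, given as a 2D array of pixel intensities in 0..255, is divided into a grid, and each cell's mean intensity is normalized into the range [-1, 1], producing one value per input neuron.

```java
public class IntensityDownsampleDemo {

    /** Downsample a grayscale image to rows*cols values in [-1, 1]. */
    public static double[] downsample(int[][] image, int rows, int cols) {
        int height = image.length;
        int width = image[0].length;
        double[] result = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Pixel region covered by this grid cell.
                int y0 = r * height / rows, y1 = (r + 1) * height / rows;
                int x0 = c * width / cols,  x1 = (c + 1) * width / cols;
                double sum = 0;
                int count = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        sum += image[y][x];
                        count++;
                    }
                }
                double mean = sum / count;                      // average intensity, 0..255
                result[r * cols + c] = (mean / 255.0) * 2.0 - 1.0; // map to [-1, 1]
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A 4x4 image: top half black (0), bottom half white (255).
        int[][] img = {
            {0, 0, 0, 0},
            {0, 0, 0, 0},
            {255, 255, 255, 255},
            {255, 255, 255, 255}
        };
        // Downsample to a 2x2 grid: two black cells (-1), two white cells (+1).
        double[] out = downsample(img, 2, 2);
        System.out.println(java.util.Arrays.toString(out)); // prints [-1.0, -1.0, 1.0, 1.0]
    }
}
```

Encog's own downsample classes follow the same principle, but also handle bounds detection and color.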

If you want the neural network to see color, it must be given red, green, and blue (RGB) values for each pixel. That means three input neurons per pixel, for a total of 3,072 input neurons.

The Encog image dataset provides boundary detection, as well as RGB and intensity downsampling. In the next section, the Encog image dataset will be introduced.

9.2.1 What to Do with the Output Neurons

The output neurons should represent the groups the images will fall into. For example, an OCR application might use one output neuron for each character to be recognized. Equilateral encoding, discussed in Chapter 2, "Obtaining Data for Encog," can also be useful here.

Supervised training also requires ideal output data for each image. For a simple OCR there might be 26 output neurons, one for each letter of the alphabet; these ideal outputs are what the neural network is actually trained on. Whether training is supervised or unsupervised, the output neurons communicate how the neural network interprets each image.

9.3 Using the Encog Image Dataset

Before an ImageMLDataSet object can be instantiated, a downsample object must be created. This is an Encog utility used to perform downsampling; all Encog downsample objects must implement the Downsample interface. Encog currently provides two downsample classes, as follows:

SimpleIntensityDownsample does not take color into account; it simply calculates the intensity, or darkness, of each pixel. The number of input neurons is the height times the width, because only one input neuron is needed per pixel.

RGBDownsample is more advanced than SimpleIntensityDownsample. It downsamples the image to the resolution you specify and produces three color (RGB) inputs for each pixel, so the total number of input neurons is height times width times three. The following code instantiates a SimpleIntensityDownsample object, which will be used to create the training set.

Now that a downsample object has been created, it is time to create the dataset using the ImageMLDataSet class. Its constructor takes several parameters. The code is as follows:
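The original listings did not survive in this reprint. Based on the Encog 3 API, the two steps described above look roughly like the following sketch (the variable names are illustrative; the classes live in org.encog.util.downsample and org.encog.ml.data.image):

```java
// Create the downsample tool, then the dataset that will use it.
Downsample downsample = new SimpleIntensityDownsample();

// Arguments: the downsample object, whether to attempt boundary detection,
// and the high/low values of the normalization range (here 1 to -1).
ImageMLDataSet training = new ImageMLDataSet(downsample, false, 1, -1);
```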

The parameters 1 and -1 specify the range into which the colors, whether intensity values or three separate RGB values, are normalized. The false value means the dataset should not attempt to detect boundaries; if this value were true, Encog would attempt boundary detection.

The current Encog boundary detection is not particularly advanced. It looks for a consistent color around the edges of the image and attempts to remove as much of that area as possible. If more advanced boundary detection is needed, it is best to crop the images before passing them to the ImageMLDataSet object; more advanced boundary detection may appear in future versions of Encog.

Now that the ImageMLDataSet object has been created, it is time to add some images. To add an image to the dataset, an ImageMLData object must be created for each image. The following code adds an image from a file.

The image is loaded from a file using the Java ImageIO class. Any valid Java Image object can be used with the dataset.

When using supervised training, the desired output should be specified. For unsupervised training, this parameter can be omitted. Once the ImageMLData object is instantiated, add it to the dataset. Repeat these steps for each image.
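Putting these steps together, the code likely resembles the following sketch (the `training` dataset and `ideal` output object are assumed to exist from the surrounding example, and the file name is illustrative):

```java
// Read any valid Java image via ImageIO and wrap it for Encog.
Image img = ImageIO.read(new File("x.png"));
ImageMLData data = new ImageMLData(img);

// For supervised training, pass the ideal output as well;
// for unsupervised training the ideal parameter can be omitted.
training.add(data, ideal);
```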

Once all the images are loaded, they are ready to be downsampled. To downsample the images, call the downsample method.

Specify the height and width of the downsampled images; all images will be downsampled to this size. After the downsample method is called, the training data is generated and a neural network can be trained.
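Based on the Encog 3 API, the call is a single line; the 32x32 size here is an assumption for illustration:

```java
// Downsample every image in the dataset to a uniform height and width.
training.downsample(32, 32);
```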

9.4 Image Recognition Example

Now we will see how all of these Encog classes fit together, using a general-purpose image recognition program as an example. This program could easily become the basis for a more complex image recognition application. The example is driven by a script file; Listing 9.1 shows such a script file.

This script file uses a very simple syntax: a command, followed by a colon, followed by a comma-separated list of parameters. Each parameter is itself a name and value separated by a colon. There are five commands: CreateTraining, Input, Network, Train and WhatIs.

The CreateTraining command creates a new training set. To do this, specify the sample height, width and type (RGB or intensity).

The Input command inputs new images for the training set. Each input command specifies the image and the identity of the image. Multiple images can have the same identity.

The Network command creates a new neural network for training and recognition. It has two parameters that specify the size of the first and second hidden layers. If you don't want the second hidden layer, specify the hidden2 parameter as 0.

The Train command trains the neural network. The mode parameter selects console or GUI training. The minutes parameter specifies how long to train; it is only used in console mode, and for GUI training it should be set to 0. The strategy parameters tell the training algorithm how many cycles to wait before resetting the neural network if the error has not dropped by the specified amount. The WhatIs command takes an image and attempts to recognize it; the example prints the identity that the network considers most similar.
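The original Listing 9.1 did not survive in this reprint. Based on the command descriptions above, a script for this example might look like the following sketch (the file paths, identities, and parameter values are purely illustrative assumptions):

```
CreateTraining: width:16, height:16, type:RGB
Input: image:./coins/dime.png, identity:dime
Input: image:./coins/quarter.png, identity:quarter
Network: hidden1:100, hidden2:0
Train: mode:console, minutes:1, strategyerror:0.25, strategycycles:50
WhatIs: image:./coins/dime.png
```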

Let's now take a look at this image recognition example, which is in the following location.

The example also contains code for parsing the script file and its parameters. String parsing is not really the focus of this book, so we will concentrate on how each command is executed and how the neural network is constructed. The next sections discuss the implementation of these commands.

9.4.1 Creating the Training Set

The CreateTraining command is implemented through the processCreateTraining method, which is as follows:

The CreateTraining command takes three parameters; the following lines read them:

Width and height are two integer parameters and need to be parsed.

We now create the downsample object: if the mode is RGB, an RGBDownsample is used; otherwise a SimpleIntensityDownsample is used.

The ImageMLDataSet can now be created.
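The method's listing did not survive this reprint; the following is a reconstruction from the steps just described, not the book's verbatim code. The ParseLine helper and the fields (downsampleHeight, downsampleWidth, downsample, training) are assumed from the example's scaffolding:

```java
private void processCreateTraining(final ParseLine line) {
    // Read the three parameters of the CreateTraining command.
    final String strWidth = line.getArg("width");
    final String strHeight = line.getArg("height");
    final String strType = line.getArg("type");

    // Width and height are integer parameters and must be parsed.
    this.downsampleHeight = Integer.parseInt(strHeight);
    this.downsampleWidth = Integer.parseInt(strWidth);

    // Choose the downsample object based on the requested mode.
    if (strType.equals("RGB")) {
        this.downsample = new RGBDownsample();
    } else {
        this.downsample = new SimpleIntensityDownsample();
    }

    // Create the dataset: no boundary detection, normalize to [-1, 1].
    this.training = new ImageMLDataSet(this.downsample, false, 1, -1);
    System.out.println("Training set created");
}
```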

Now that the training set is created, we can input images, the next section describes how to do this.

 

9.4.2 Inputting Images

Input commands are implemented through processInput. The method looks like this:

The Input command takes two parameters; the following lines read them:

The identity is a text string specifying what the image is. We keep track of the number of unique identities and assign each one an incrementing number. These identities form the output layer of the neural network: each unique identity is assigned to an output neuron. When an image is later presented to the neural network, the output neuron with the highest activation indicates the identity the network has chosen for that image. The assignIdentity method is a simple method that increments this count and maintains a map from identity strings to neuron index numbers.

A File object is created to hold the image; it will be used to read the image later.

At this point we do not want to actually load the individual images; instead we simply hold them with ImagePair objects. An ImagePair links an image to its output neuron index number. The ImagePair class is not built into Encog; it is a helper used by this example to map images to their identities.

Finally, we display a message telling us the image was added.
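Reconstructing the method from the steps above (again, not the verbatim listing; ParseLine, assignIdentity, ImagePair, and the imageList field are part of the example's scaffolding):

```java
private void processInput(final ParseLine line) {
    // Read the two parameters of the Input command.
    final String image = line.getArg("image");
    final String identity = line.getArg("identity");

    // Map the identity string to an output neuron index.
    final int idx = assignIdentity(identity);

    // Hold the image as a file/neuron-index pair; it is loaded later.
    final File file = new File(image);
    this.imageList.add(new ImagePair(file, idx));

    System.out.println("Added input image: " + image);
}
```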

Once all the images have been added, the number of output neurons is apparent, and the actual neural network can be created, which is covered in the next section.

 

9.4.3 Creating a Neural Network

The Network command is implemented by the processNetwork method, as shown below:

We begin downsampling the images by looping over each previously created ImagePair.

Create a BasicMLData to hold the desired output for each output neuron.

The output neuron corresponding to the identity of the image currently being trained will be set to 1, and the other output neurons will be set to -1.

The input data for the training set is the downsampled image. First, the image is loaded into a Java Image object.

Create an ImageMLData object to hold the image and add it to the training set.

The Network command provides two parameters that specify the number of neurons in the two hidden layers. Here the second hidden layer has no neurons, so there is a single hidden layer.

We can now downsample all of the images.

Finally, a new neural network is created with the specified parameters; the last true parameter specifies that we want to use the hyperbolic tangent activation function.

Once the neural network is created, report completion by printing a message.
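Assembled from the steps above, a reconstruction of processNetwork might look like this (EncogUtility is part of Encog; the fields, ImagePair, and ParseLine come from the example's scaffolding and are assumptions here):

```java
private void processNetwork(final ParseLine line) throws IOException {
    System.out.println("Downsampling images...");

    for (final ImagePair pair : this.imageList) {
        // Ideal output: 1 for this image's identity neuron, -1 for the rest.
        final BasicMLData ideal = new BasicMLData(this.outputCount);
        final int idx = pair.getIdentity();
        for (int i = 0; i < this.outputCount; i++) {
            ideal.setData(i, i == idx ? 1 : -1);
        }

        // Load the image and add it to the training set.
        final Image img = ImageIO.read(pair.getFile());
        final ImageMLData data = new ImageMLData(img);
        this.training.add(data, ideal);
    }

    final int hidden1 = Integer.parseInt(line.getArg("hidden1"));
    final int hidden2 = Integer.parseInt(line.getArg("hidden2"));

    // Downsample every image, then build the feed-forward network;
    // the final true selects the hyperbolic tangent activation function.
    this.training.downsample(this.downsampleHeight, this.downsampleWidth);
    this.network = EncogUtility.simpleFeedForward(
        this.training.getInputSize(), hidden1, hidden2,
        this.training.getIdealSize(), true);

    System.out.println("Created network: " + this.network.toString());
}
```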

Now that the neural network is created, it can be trained, which is covered in the next section.

9.4.4 Training Neural Networks

The Train command is implemented through the processTrain method, which is shown below.

The Train command has four parameters, which are read in the following lines.

Once the parameters are read, a message is displayed indicating that training has started.

Parse the two strategy parameters.

The neural network is initialized with random weights and thresholds. Sometimes these random values cause training to stagnate; in that case, the weights and thresholds are randomized again and training restarts.

Training is initialized by creating a new ResilientPropagation trainer, RPROP training is covered in Chapter 5, "Propagation Training".

Encog allows training strategies to be added to handle this situation. One particularly useful training strategy is ResetStrategy, which takes two parameters. The first is the minimum error the network must achieve before it is automatically reset with new random values. The second specifies the number of cycles the network is allowed to take to achieve this error rate. If the specified number of cycles passes and the network has not achieved the desired error rate, the weights and thresholds are randomized.

Encog supports many different training strategies. Training strategies augment a training method, allowing small adjustments to be made as training progresses. Encog supports the following strategies:
Greedy
HybridStrategy
ResetStrategy
SmartLearningRate
SmartMomentum
StopTrainingStrategy

The Greedy strategy only allows a training iteration to keep its weight and threshold changes if the error rate improved. The HybridStrategy allows a backup training method to be used if the main training method stalls; Chapter 7, "Other Neural Network Types," explains hybrid strategies. The ResetStrategy resets the network if it becomes stuck, as described above. The SmartLearningRate and SmartMomentum strategies are used with backpropagation training to attempt to automatically adjust the learning rate and momentum. The StopTrainingStrategy stops training once a specified error level has been reached.

The following line of code adds the reset strategy.

If we use GUI training, then we must use trainDialog, otherwise we should use trainConsole.

The program indicates that training has stopped by displaying a message like the one below. The training process stops when the allotted time elapses or, in GUI mode, when the dialog is dismissed.
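A reconstruction of processTrain from the description above (ParseLine and the network/training/outputCount fields are from the example's scaffolding; ResilientPropagation, ResetStrategy, and EncogUtility are Encog classes):

```java
private void processTrain(final ParseLine line) {
    // Read the four parameters of the Train command.
    final String strMode = line.getArg("mode");
    final String strMinutes = line.getArg("minutes");
    final String strStrategyError = line.getArg("strategyerror");
    final String strStrategyCycles = line.getArg("strategycycles");

    System.out.println("Training beginning... Output patterns=" + this.outputCount);

    // Parse the two strategy parameters.
    final double strategyError = Double.parseDouble(strStrategyError);
    final int strategyCycles = Integer.parseInt(strStrategyCycles);

    // RPROP training with a reset strategy: randomize the network again if the
    // error has not dropped below strategyError within strategyCycles iterations.
    final ResilientPropagation train =
        new ResilientPropagation(this.network, this.training);
    train.addStrategy(new ResetStrategy(strategyError, strategyCycles));

    // GUI mode uses trainDialog; console mode uses trainConsole with a time limit.
    if (strMode.equalsIgnoreCase("gui")) {
        EncogUtility.trainDialog(train, this.network, this.training);
    } else {
        final int minutes = Integer.parseInt(strMinutes);
        EncogUtility.trainConsole(train, this.network, this.training, minutes);
    }

    System.out.println("Training stopped...");
}
```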

Once the neural network is trained, it can recognize images. This will be discussed in the next section.

9.4.5 Image recognition

The WhatIs command is implemented through the processWhatIs method, which is shown below:

The WhatIs command takes a single parameter; the following line reads it.

The image is loaded into an ImageMLData object.

The image is downsampled to the appropriate size.

The downsampled image is presented to the neural network, and the "winning" neuron is selected. The winning neuron is the neuron with the largest output for the presented pattern. This is simply one of the normalization techniques discussed in Chapter 2; Chapter 2 also introduced equilateral normalization, which could be used here as well.

Finally, we show the patterns recognized by the neural network.
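A reconstruction of processWhatIs following the steps above (identity2neuron, which maps a winning neuron index back to its identity string, and ParseLine belong to the example's scaffolding and are assumed here):

```java
private void processWhatIs(final ParseLine line) throws IOException {
    final String filename = line.getArg("image");

    // Load the image and downsample it to the same size and normalization
    // range as the training images.
    final Image img = ImageIO.read(new File(filename));
    final ImageMLData input = new ImageMLData(img);
    input.downsample(this.downsample, false,
        this.downsampleHeight, this.downsampleWidth, 1, -1);

    // The "winning" output neuron identifies the image.
    final int winner = this.network.winner(input);
    System.out.println("What is: " + filename
        + ", it seems to be: " + this.identity2neuron.get(winner));
}
```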

This example demonstrates a simple script-based image recognition program. This application can easily be used as a starting point for other more advanced image recognition applications. A very useful extension to this application could be the ability to load and save trained neural networks.

9.5 Summary

This chapter demonstrated how to use image data as input to Encog. Any of the neural network types discussed in this book can be used to recognize images. Encog handles images primarily through dedicated dataset classes that prepare the data for a neural network, rather than by defining a special neural network structure.

The Encog image processing classes provide some very important functionality including boundary detection and downsampling.

Boundary detection trims the unimportant parts of an image. Encog supports simple boundary detection, which removes a consistent background color surrounding the image's subject. Thanks to this, it does not matter whether the object to be recognized sits toward the upper left or the lower right of the input image; inconsistent positioning does not undermine the neural network's ability to process the image.

Downsampling is the process of reducing the resolution of an image. Source images often have very high resolution and contain a great deal of color. Encog provides downsampling to handle both issues: images can be reduced to a very low resolution before being sent to the input neurons, and downsampling can also discard color information and process only intensity.

In this book we have looked at a number of different neural network types. This chapter showed how feed-forward neural networks can be used with images. The self-organizing map (SOM) is another neural network type used with images; the next chapter will look at SOMs.


Origin blog.csdn.net/u012970287/article/details/79528608