Convolutional Neural Network Study Notes and Experience (2) Dataset

In the field of machine learning, many ready-made datasets are available: produced and organized by individuals or organizations, and released for public download. For example, character recognition has the MNIST dataset, and data mining has the Iris and Adult datasets. These datasets are a great convenience for researchers in related fields; with such resources, researchers can focus on the models themselves. It is fair to say that the creators of these datasets have contributed enormously to the development of data mining and machine learning.
In many cases, however, ready-made datasets do not meet our needs, and we have to build our own. I have to say that building the dataset for this character recognition project really made me appreciate the effort of those data organizers. Below I describe the production process, in the hope that it helps and inspires you.

There are 110 characters to recognize this time: 52 uppercase and lowercase letters, 10 digits, 46 Chinese characters, and 2 punctuation marks. As we all know, deep neural networks are fed by massive amounts of data, so each character needs a large number of samples. To solve the sample-count problem, I did the following:
* 1. First, render each character in turn on a white background and save it in .jpg format.
* 2. Crop away the blank pixels around the character so that the character fills the image to its edges.
* 3. Compare the number of rows and columns, and pad pixels evenly on both sides of the smaller dimension until the two are equal. This way the character still touches the edges in at least one direction (rows or columns), and the whole picture is a square. Then, as appropriate, add background pixels evenly around the image.
* 4. Rotate each image from -10° to 10° in steps of 2° about its normal, and apply perspective transforms from -10° to 10° in steps of 1° about the x- and y-axes of the image plane. Each character thus yields 11 × 21 × 21 = 4851 sample images.
* 5. Of these 4851 images, 147 are randomly selected for validation and testing, and the remaining 4704 are used for training.

You may have doubts about the second and third steps. The reason is actually simple: in the caffe framework, a network accepts input images of only one size, which raises several problems:
* 1. The character occupies a different proportion of each image.
* 2. The aspect ratio of each picture is different; if they are all resized to one resolution, some will be distorted.
* 3. If the character fills the image to its edges, handling the border pixels can be troublesome. This involves the specific implementation of the convolution operation in the model, which will be explained in detail later.
Figure 1: The character '3'

Therefore, the second step makes the character occupy the largest possible proportion of the image; I will introduce the benefit of this in a later article. The third step avoids the image distortion that would hurt the training effect.
The fourth step improves the generalization ability of the model. If you have done your homework before reading this article, you should know that convolutional neural networks have some resistance to affine transformations.
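The augmentation grid from step 4 and the random split from step 5 can be enumerated directly; the variable names below are illustrative, and the 147-sample hold-out is drawn with a fixed seed only for reproducibility of the sketch.

```python
import itertools
import random

# Step 4's parameter grid: in-plane rotation about the normal,
# -10°..10° in 2° steps (11 angles); perspective tilt about the
# x- and y-axes, -10°..10° in 1° steps (21 angles each).
rot = range(-10, 11, 2)      # 11 values
tilt_x = range(-10, 11, 1)   # 21 values
tilt_y = range(-10, 11, 1)   # 21 values

grid = list(itertools.product(rot, tilt_x, tilt_y))
assert len(grid) == 11 * 21 * 21 == 4851

# Step 5: hold out 147 random parameter combinations for
# validation/testing; the remaining 4704 go to training.
random.seed(0)
held_out = set(random.sample(grid, 147))
train = [p for p in grid if p not in held_out]
assert len(train) == 4704
```

Each `(rot, tilt_x, tilt_y)` triple would then be applied to the base image of a character to generate one sample.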
With this, the training set of images is ready. Of course, caffe cannot train directly on the pictures; they must first be converted into an LMDB database (.mdb files). There is plenty of information about this online, so I won't go into the details here.
