A Simple Recognition Network for Four-Operation Arithmetic Formulas

mathAI: a mathematical formula recognition model that an expert built with LeNet5 and trained on the CROHME dataset.

Recently I needed to do simple formula recognition for the four arithmetic operations. I don't need to recognize complicated formulas; it only has to work at an elementary-school level. At first I used PaddleOCR for detection (ocr_det) and recognition (ocr_rec) (see the figure below), but since it is not designed for mathematical formulas, it does not recognize symbols such as the plus and division signs very well. So I want to train a simple model that recognizes digits and mathematical symbols, to improve the accuracy a little.

[Figure: PaddleOCR detection and recognition result]

The project above is implemented with LeNet5, but I want a smaller model, so I plan to try MobileNets. Still, implementing it with LeNet5 first is a good way to get familiar with the process.

CROHME dataset processing

The first problem is that this is my first time working with the CROHME dataset. When I opened it, all I saw were annotation files with the *.inkML suffix.

I want to recognize the mathematical characters in an image one by one and then combine the results.

So there is no need to process whole formulas; it is enough to extract the characters in each formula as the training set. This amounts to a simple classification model.

After looking at the various extraction approaches online, I decided I would have to extract the data in my own way.

All samples of the same character are stored in one folder named after that character, so each image does not need a separate label file.
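A minimal sketch of this folder-per-character layout, using only the standard library. The names `SAFE_NAMES`, `class_folder` and `next_sample_path` are hypothetical, and the mapping of awkward labels to folder names is an assumption, not something the dataset prescribes.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping: some labels cannot be used directly as folder names.
SAFE_NAMES = {"/": "slash", "<": "lt", ">": "gt", "|": "bar", ".": "dot"}

def class_folder(root, label):
    """Return (and create) the folder that holds all samples of one label."""
    # LaTeX-style labels such as \div lose their leading backslash.
    name = SAFE_NAMES.get(label, label.lstrip("\\"))
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    return folder

def next_sample_path(root, label, ext=".png"):
    """Pick the next free file name inside the label's folder."""
    folder = class_folder(root, label)
    index = len(list(folder.glob(f"*{ext}")))
    return folder / f"{index:06d}{ext}"

# Demo in a temporary directory; empty files stand in for real images.
with tempfile.TemporaryDirectory() as tmp:
    first = next_sample_path(tmp, "\\div")
    first.write_bytes(b"")
    second = next_sample_path(tmp, "\\div")
    demo = (first.parent.name, first.name, second.name)
```

With this layout the folder name is the label. One thing to watch on case-insensitive file systems is that labels such as x and X would end up in the same folder unless they are renamed.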

The data I use is CROHME2013_data; after the final conversion there are more than 90,000 samples.

Structure of a *.inkML file:

trace

<trace id="18">
766 127, 766 127, 766 127, 766 127, 766 127, 764 127, 764 127, 764 127, 764 127, 764 127, 764 127, 763 127, 763 127, 763 127, 763 127, 763 127, 763 127, 763 127, 762 128, 762 128, 762 128, 762 128, 762 128, 763 127, 763 127, 763 127, 763 127, 765 127, 770 125, 770 125, 770 125, 770 125, 770 125, 777 124, 784 122, 784 122, 784 122, 784 122, 791 121, 791 121, 791 121, 797 120, 801 119, 801 119, 801 119, 803 119, 803 119, 803 119, 803 119, 803 119, 803 119, 803 119, 803 119, 803 119, 803 119, 803 119, 802 120, 802 120, 800 121
</trace>
<trace id="19">
768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 123, 768 124, 768 124, 768 124, 768 126, 768 126, 768 126, 768 126, 768 126, 768 130, 768 134, 768 134, 768 134, 768 134, 767 138, 767 138, 767 138, 766 143, 765 146, 765 146, 765 146, 765 148, 765 148, 765 148, 765 149, 765 149, 765 149, 765 149, 765 149, 765 149, 765 149, 765 149, 764 151, 764 151, 764 151, 764 151, 764 151, 764 151, 764 151, 764 151, 764 151, 764 151, 764 151, 764 149, 764 149, 764 149, 764 149, 766 148, 766 148, 766 148, 768 146, 768 146, 768 146, 768 146, 768 146, 768 146, 768 146, 770 144, 772 144, 772 144, 774 143, 774 143, 774 143, 774 143, 775 143, 776 143, 776 143, 776 143, 777 143, 777 143, 777 143, 780 144, 780 144, 780 144, 780 144, 781 144, 781 144, 781 144, 781 144, 781 144, 783 145, 783 145, 783 145, 783 145, 785 145, 786 147, 786 147, 786 147, 786 147, 786 147, 788 148, 788 148, 788 148, 790 149, 790 149, 790 150, 790 150, 790 150, 790 150, 790 150, 790 150, 790 153, 790 153, 790 153, 790 153, 790 153, 790 153, 789 155, 789 155, 789 155, 789 156, 789 156, 789 156, 789 156, 789 156, 789 157, 789 157, 789 157, 789 157, 789 159, 789 159, 789 159, 788 160, 788 160, 788 160, 788 160, 788 160, 788 160, 786 161, 786 161, 786 161, 786 161, 786 161, 786 163, 786 163, 786 163, 785 164, 785 164, 784 165, 784 165, 784 165, 784 165, 784 165, 782 166, 782 166, 782 166, 782 166, 782 166, 780 167, 780 167, 780 167, 779 168, 779 168, 779 168, 779 168, 777 168, 777 168, 777 168, 777 168, 775 168, 775 168, 775 168, 773 168, 773 168, 773 168, 773 168, 773 168, 773 168, 773 168, 773 168, 771 169, 771 169, 771 169, 771 169, 769 169, 767 169, 767 169, 767 169, 767 169, 767 169, 766 169, 766 169, 766 169, 766 169, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 765 168, 764 168, 764 168, 764 168, 764 168, 764 168, 764 168, 764 168, 764 168, 764 168, 764 168, 
762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 762 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 761 168, 762 168
</trace>

<trace></trace>: The coordinates inside this tag are the points of the handwriting, and some processing is needed to draw an image from them. Each trace represents only one stroke. For the character shown here, it would in theory be fine to put everything in a single trace tag, but its two strokes are still stored as two separate traces.
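As a rough sketch, the text of one trace can be turned into a list of points like this. The function name `parse_trace` is hypothetical. Values are read as floats because the coordinates are not always integers, and consecutive repeated points (very common in the sample above) are dropped.

```python
def parse_trace(text):
    """Turn the text of one <trace> into a list of (x, y) float pairs."""
    points = []
    for pair in text.split(","):
        fields = pair.split()
        if len(fields) < 2:
            continue  # skip empty chunks, e.g. after a trailing comma
        # Assumption: x and y come first if a point carries extra values.
        point = (float(fields[0]), float(fields[1]))
        # The pen often reports the same position several times in a row.
        if not points or points[-1] != point:
            points.append(point)
    return points

# A shortened piece of trace 18 from the sample above.
stroke = parse_trace(
    "766 127, 766 127, 764 127, 763 127, 762 128, 763 127, 765 127, 770 125,"
)
```

Only consecutive duplicates are removed, so a stroke that later passes through an earlier position again keeps that point and the drawing order stays intact.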

traceGroup

<traceGroup xml:id="29">
	<annotation type="truth">5</annotation>
	<traceView traceDataRef="18"/>
	<traceView traceDataRef="19"/>
	<annotationXML href="5_1"/>
</traceGroup>

A <traceGroup> puts together the strokes that belong to one math character, forming the complete character. The digit 5 here, for example, consists of two strokes.

The traceDataRef attribute holds the id of one of the traces shown earlier; we use it to connect the annotation with the strokes.

Why is the data labeled this way? My guess is that there are operators such as cos and log, which are made up of several letters but need to be recognized as a single math symbol.

Generate pictures based on pixels

First, check each traceGroup to find out which traces (via traceDataRef) belong to the same group. Then look up the corresponding strokes among the trace elements and combine them into one character.
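The lookup described above could be sketched with the standard library's xml.etree.ElementTree. The sample is an abridged copy of the snippets shown earlier; the surrounding <ink> root and its namespace are assumptions about the file, and `local_name` and `extract_characters` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Abridged sample; the <ink> wrapper and its namespace are assumed.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
<trace id="18">766 127, 764 127, 803 119</trace>
<trace id="19">768 123, 768 134, 762 168</trace>
<traceGroup xml:id="29">
    <annotation type="truth">5</annotation>
    <traceView traceDataRef="18"/>
    <traceView traceDataRef="19"/>
    <annotationXML href="5_1"/>
</traceGroup>
</ink>"""

def local_name(element):
    """Tag name without the XML namespace prefix, if any."""
    return element.tag.rsplit("}", 1)[-1]

def extract_characters(inkml_text):
    """Return a list of (label, [trace text, ...]) tuples, one per character."""
    root = ET.fromstring(inkml_text)
    # Index every stroke by its id so that traceDataRef can be resolved.
    traces = {el.get("id"): (el.text or "").strip()
              for el in root.iter() if local_name(el) == "trace"}
    characters = []
    for group in root.iter():
        if local_name(group) != "traceGroup":
            continue
        label, refs = None, []
        for child in group:  # direct children only
            name = local_name(child)
            if name == "annotation" and child.get("type") == "truth":
                label = (child.text or "").strip()
            elif name == "traceView":
                refs.append(child.get("traceDataRef"))
        # A group that only wraps other groups has no traceView of its own.
        if label and refs:
            characters.append((label, [traces[r] for r in refs if r in traces]))
    return characters

characters = extract_characters(SAMPLE)
```

Matching on the local tag name keeps the sketch working whether or not a file declares a namespace. The result still holds the raw point strings of each stroke, which have to be parsed into coordinates before drawing.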

This amounts to splitting the formula in the original inkML file into its individual characters, which then form a dataset of separate character classes.

I wrote a script that generates images from an inkml file and saves them to the corresponding class folders.

There were many problems during the conversion. First, the CROHME dataset seems to be a combination of several handwriting datasets, and the coordinate scales of the traces are not the same. At first I saw values in the range of about 200~300; then I opened a few more archives and saw values around 10,000, as well as decimals, and a few files even had negative values. I felt like I was being toyed with.
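One way to cope with the different scales is to normalize each character by its own bounding box before drawing. The sketch below is an assumption about how this could be done, not the conversion script itself; `normalize_strokes`, the 64-pixel canvas and the 4-pixel padding are all made-up choices.

```python
def normalize_strokes(strokes, size=64, pad=4):
    """Map the strokes of one character onto a size x size canvas."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    inner = size - 1 - 2 * pad  # drawable extent in pixels
    # One scale for both axes keeps the aspect ratio;
    # "or 1.0" guards against a character that is a single dot.
    scale = inner / (max(width, height) or 1.0)
    # Center the shorter side inside the drawable area.
    off_x = pad + (inner - width * scale) / 2
    off_y = pad + (inner - height * scale) / 2
    return [[(round((x - min_x) * scale + off_x),
              round((y - min_y) * scale + off_y))
             for x, y in stroke] for stroke in strokes]

# Small integer coordinates and large negative decimals both fit the canvas.
small = normalize_strokes([[(0, 0), (10, 0)]])
large = normalize_strokes([[(-9000.5, 12000.0), (-8000.5, 12500.0)],
                           [(-8500.5, 12100.0)]])
```

Because only the minimum and the extent of the coordinates are used, the absolute range (hundreds, ten thousands, decimals or negative values) no longer matters. The relative size between characters is lost, which is acceptable for a per-character classifier.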

And the division sign I most want for elementary school, \div, is only available in one of the datasets (HAMEX); most of the others seem to be aimed at advanced mathematics.

Some parameter settings in the conversion are also quite delicate, because I first draw the points and then use dilation to connect them. The canvas size, the coordinate scale, and the dilation kernel size chosen at the start all affect the generated images:

[Figure: examples of generated character images]
Some of the images above came out well, but in the lower ones the points are still somewhat disconnected.

I didn't know what else to do to connect the points smoothly. After an hour I finally solved it, and I also came to appreciate why the dataset builds characters from strokes: the points inside a trace are listed in stroke order, so connecting each point to the next one gives a reasonably continuous image. No closing operation is needed.

I use cv2.line to draw the connections, and it seems to work fine.

[Figure: connecting the discrete points]
My code for converting inkml to images is still too messy, so I will upload it after some cleanup.

TODO: images smaller than a certain size (20 x 20 pixels) also need to be removed.
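A possible sketch of this filter, assuming the images are NumPy arrays cropped to the character. The helper `is_too_small` is hypothetical, and so is the choice to drop an image only when both sides are below the threshold; otherwise naturally thin characters such as 1 or the minus sign would be thrown away.

```python
import numpy as np

def is_too_small(img, min_side=20):
    """True when both the height and the width are below min_side."""
    height, width = img.shape[:2]
    return height < min_side and width < min_side

# Blank arrays stand in for cropped character images (made-up sizes).
samples = [
    ("1", np.zeros((40, 8), np.uint8)),   # tall and thin: kept
    ("-", np.zeros((4, 36), np.uint8)),   # wide and flat: kept
    (".", np.zeros((5, 5), np.uint8)),    # tiny in both directions: dropped
]
kept = [label for label, img in samples if not is_too_small(img)]
```

Whether tiny samples such as dots should really be discarded depends on which symbols the classifier needs, so the threshold and the rule are worth checking against the class list.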
[Figure: structure of the dataset]
Next, we can finally start preparing the network model.

MobileNet network model


Origin blog.csdn.net/Jiangnan_Cai/article/details/127276236