Andrew Ng Machine Learning Notes (58) - Machine Image Recognition (Application Example)

Chapter 18, Application Example: Image Character Recognition (Photo OCR)

1, Problem description and flow chart

An image character recognition (photo OCR) application has to identify the text in a given picture. This is much harder than recognizing the text in a scanned document.
To accomplish this, the following steps are needed:
1. Text detection - separate the text in the picture from the other objects in the scene
2. Character segmentation - split the text into individual characters
3. Character classification - determine what each character is
The task can be expressed as a flow chart: image, text detection, character segmentation, character classification. Each stage can be handed to a separate team to solve.

2, Sliding windows

A sliding window is a technique for extracting objects from an image. Suppose we need to find pedestrians in a picture. The first step is to train a model on many fixed-size images so that it can accurately decide whether a patch contains a pedestrian. We then crop a patch of that same size from the picture we want to search and pass it to the model, which decides whether it shows a pedestrian. Next we slide the crop region one step across the picture, crop again, and pass the new patch to the model, and so on until the whole picture has been covered.
Once this pass is done, we enlarge the crop region and cut the picture again at the new size. Each new patch is shrunk to the size the model expects before it is passed to the model, and the cycle repeats.
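The procedure above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the image is a plain list of lists, `all_bright` is a hypothetical stand-in for the trained model, and the function names, step size, and scales are invented for the example rather than taken from the course.

```python
def sliding_windows(image, win_h, win_w, step):
    """Yield (top, left, patch) for every window position in a 2D list image."""
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - win_h + 1, step):
        for left in range(0, cols - win_w + 1, step):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            yield top, left, patch


def resize_nearest(patch, out_h, out_w):
    """Resize a 2D list patch to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(patch), len(patch[0])
    return [[patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def detect(image, classify, model_h, model_w, scales=(1, 2), step=1):
    """Run the classifier over windows of several sizes; return positive boxes."""
    hits = []
    for scale in scales:
        win_h, win_w = model_h * scale, model_w * scale
        if win_h > len(image) or win_w > len(image[0]):
            continue
        for top, left, patch in sliding_windows(image, win_h, win_w, step * scale):
            # Larger windows are shrunk back to the size the model was trained on.
            if classify(resize_nearest(patch, model_h, model_w)):
                hits.append((top, left, win_h, win_w))
    return hits


def all_bright(patch):
    """Hypothetical stand-in for a trained model: fires on an all-bright patch."""
    return all(all(v == 1 for v in row) for row in patch)


# Toy 6x6 image with one bright 2x2 blob.
image = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        image[r][c] = 1

hits = detect(image, all_bright, model_h=2, model_w=2)
```

A smaller step examines more positions and costs more model evaluations, so in practice the step is a trade-off between coverage and speed.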
The sliding window technique is also used for character recognition. We first train a model to distinguish characters from non-characters, then run it over the picture with a sliding window. Once that is done, we expand the regions the model identified and merge the regions that overlap. We then filter by aspect ratio, discarding regions that are taller than they are wide (a word is generally longer than it is tall). In the course's example figure, the green regions are the ones accepted as text after these steps, while the red regions are ignored.
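The expand, merge, and filter steps can be sketched as follows. This is an illustration only: it assumes the classifier output has already been collected into a grid of 0/1 cells, merges regions by 4-connectivity, and uses an expansion radius of one cell. None of these choices come from the course, and the function names are invented for the example.

```python
from collections import deque


def expand(mask, radius=1):
    """Mark every cell within `radius` cells of a positive cell."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def bounding_boxes(mask):
    """Return (top, left, bottom, right) for each 4-connected region of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def text_like(box):
    """Keep boxes that are wider than they are tall."""
    top, left, bottom, right = box
    return (right - left + 1) > (bottom - top + 1)


# Toy grid of positive detections: a horizontal run and a vertical run, both with gaps.
mask = [[0] * 10 for _ in range(11)]
for c in (1, 3, 5):
    mask[1][c] = 1
for r in (5, 7, 9):
    mask[r][8] = 1

boxes = bounding_boxes(expand(mask, radius=1))
kept = [b for b in boxes if text_like(b)]
```

In this toy grid the expansion closes the gaps in both runs, so each becomes a single region; only the wide one passes the aspect-ratio filter.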
The above is the text detection stage. The next step is to train a model that splits the text into individual characters. Its training set consists of pictures of single characters and pictures of the boundary between two adjacent characters.
After training the model, we again use the sliding window technique, this time moving the window along the detected text to segment it into characters.
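A one-dimensional sliding window for segmentation might look like the sketch below. The assumptions are loud ones: `blank_centre` is a hypothetical stand-in for the trained boundary classifier, the text line is a small list-of-lists image, and the window width is chosen for the toy data.

```python
def split_positions(line_image, win_w, is_split):
    """Slide a full-height window left to right; collect centres flagged as splits."""
    cols = len(line_image[0])
    splits = []
    for left in range(cols - win_w + 1):
        window = [row[left:left + win_w] for row in line_image]
        if is_split(window):
            splits.append(left + win_w // 2)
    return splits


def blank_centre(window):
    """Hypothetical stand-in for a trained model: fires when the centre column has no ink."""
    mid = len(window[0]) // 2
    return all(row[mid] == 0 for row in window)


def cut(line_image, splits):
    """Turn split columns into per-character column ranges (start, end), end exclusive."""
    cols = len(line_image[0])
    ranges, start = [], 0
    for s in sorted(splits) + [cols]:
        if s > start:
            ranges.append((start, s))
        start = s + 1
    return ranges


# Toy text line: two 3-column characters separated by one blank column.
line = [[1, 1, 1, 0, 1, 1, 1] for _ in range(3)]

splits = split_positions(line, win_w=3, is_split=blank_centre)
chars = cut(line, splits)
```

Each range in `chars` can then be cropped out and passed to the character classifier.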

3, Getting lots of data and artificial data

If our model has low bias, training it on more data will give better results. The question is how to get that data: it cannot always be obtained directly, and we may need to create some of it artificially.
Taking our character recognition application as an example, we can download a variety of fonts from font websites and render characters in these fonts over a variety of random background images to create training examples. This gives us an effectively unlimited training set. This is an example of creating data from scratch.
Another method is to take existing data and modify it, for example by distorting, rotating, or blurring existing character images. As long as we believe the modified data resembles what could appear in real data, we can use this method to create large amounts of data.
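As a rough illustration of modifying existing data, the sketch below creates variants of one labeled character image by a small random translation and an optional blur. The tiny 5x5 glyph, the function names, the choice of translation as the distortion, and the 50% blur probability are all assumptions made for the example, not part of the course.

```python
import random


def shift(glyph, dx, dy, fill=0):
    """Translate a 2D list glyph by (dx, dy), padding with `fill`."""
    rows, cols = len(glyph), len(glyph[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = glyph[r][c]
    return out


def box_blur(glyph):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    rows, cols = len(glyph), len(glyph[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            vals = [glyph[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out


def augment(glyph, label, n, rng, max_shift=1):
    """Create n modified copies of one labeled example; the label is unchanged."""
    examples = []
    for _ in range(n):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        variant = shift(glyph, dx, dy)
        if rng.random() < 0.5:
            variant = box_blur(variant)
        examples.append((variant, label))
    return examples


# Toy 5x5 image of the letter "T".
glyph = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

examples = augment(glyph, "T", n=5, rng=random.Random(0))
```

The modifications are kept small on purpose: a variant is only useful if it still looks like something that could appear in real data.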
There are several ways to get more data:
1. Artificial data synthesis
2. Collecting and labeling data by hand
3. Crowdsourcing

4, Ceiling analysis: which part of the pipeline to work on next

In a machine learning application, a final prediction usually passes through several steps. How can we know which part is most worth our time and effort to improve? Ceiling analysis can answer this question.
Returning to our character recognition application, recall its flow chart: text detection, character segmentation, character classification.
The output of each part of the flow chart is the input of the next. In ceiling analysis, we pick one part, manually supply it with 100% correct output, and see how much the overall performance of the application improves. Suppose that in our example the overall accuracy is 72%.
If we make the output of the text detection part 100% correct, the overall accuracy of the system rises from 72% to 89%. This means text detection is probably worth investing time and effort in.
Next we also hand-pick the data so that the character segmentation output is 100% correct, and find that the overall accuracy rises by only 1%, which means our character segmentation part is probably already good enough.
Finally, we hand-pick the data so that the character classification output is also 100% correct, and the overall accuracy rises by 10%, which means character classification probably also deserves more time and effort if we want to improve the overall performance of the application.
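The arithmetic of this analysis is simple enough to write down. In the sketch below, 72 and 89 are the figures quoted above; 90 and 100 are derived from the stated gains of 1% and 10%, so treat them as an inference rather than numbers given in the notes. The function name is made up for the example.

```python
# Overall accuracy (in %) once each stage, cumulatively, is given perfect output.
# 72 and 89 are quoted in the text; 90 and 100 follow from the +1% and +10% gains.
stages = [
    ("baseline", 72),
    ("text detection", 89),
    ("character segmentation", 90),
    ("character classification", 100),
]


def ceiling_gains(stages):
    """Gain in overall accuracy from perfecting each stage, in pipeline order."""
    gains = {}
    for (_, prev_acc), (name, acc) in zip(stages, stages[1:]):
        gains[name] = acc - prev_acc
    return gains


gains = ceiling_gains(stages)
# The stage with the largest gain has the most headroom for improvement.
best = max(gains, key=gains.get)
```

Here `best` comes out as text detection, matching the conclusion drawn above.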

Thanks to Dr. Huang Haiguang's team for the translation and notes


Origin blog.csdn.net/linjpg/article/details/104562978