OCR study notes

1 Acknowledgements

Thanks to the difflib library in the Python standard library~

2 Preface

We hope to create our own OCR model~

3 Preprocessing

3.1 Obtaining image contours: cv2.findContours()

For a detailed explanation of this function, see the blog post "findContours() function (explanation)".

4 Post-processing

The text produced by OCR recognition may contain errors, so it needs post-processing.

4.1 Similarity matching: difflib

We use a similarity measure to find the closest name in the vocabulary for each recognized string. The library used is difflib.

4.1.1 Use set_seq1() and set_seq2() to optimize performance

The difflib documentation gives the following optimization tip:

SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use set_seq2() to set the commonly used sequence once and call set_seq1() repeatedly, once for each of the other sequences.

We can apply the same optimization here: when matching one OCR result against the whole vocabulary, set the OCR result once with set_seq2() and pass each vocabulary entry in turn to set_seq1().
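A minimal sketch of this optimization applied to OCR post-processing. Assumptions not from the post: each OCR output string is matched against a list of vocabulary entries, the function name best_match is hypothetical, the 0.6 cutoff simply borrows the default of difflib.get_close_matches, and the sample names are made up.

```python
import difflib
from typing import Iterable, Optional, Tuple


def best_match(
    ocr_text: str, vocabulary: Iterable[str], cutoff: float = 0.6
) -> Optional[Tuple[str, float]]:
    """Return (entry, score) for the vocabulary entry most similar to ocr_text.

    Hypothetical helper: returns None if no entry reaches `cutoff`.
    """
    matcher = difflib.SequenceMatcher()
    # The OCR string is compared against every candidate, so set it once as
    # seq2: SequenceMatcher caches its analysis of the second sequence.
    matcher.set_seq2(ocr_text)
    best: Optional[Tuple[str, float]] = None
    for entry in vocabulary:
        matcher.set_seq1(entry)  # only seq1 changes inside the loop
        # real_quick_ratio() and quick_ratio() are cheap upper bounds on
        # ratio(), so they can rule out hopeless candidates early.
        if matcher.real_quick_ratio() < cutoff or matcher.quick_ratio() < cutoff:
            continue
        score = matcher.ratio()
        if score >= cutoff and (best is None or score > best[1]):
            best = (entry, score)
    return best


if __name__ == "__main__":
    names = ["Alice Smith", "Alicia Smyth", "Bob Jones"]  # made-up vocabulary
    print(best_match("A1ice Smith", names))  # OCR misread "l" as "1"
```

For a plain lookup, difflib.get_close_matches(ocr_text, vocabulary, n=1) returns the closest entries directly, and CPython's implementation uses the same set_seq2()/set_seq1() loop internally. Writing the loop by hand is mainly useful when you also want the similarity score, which get_close_matches does not return.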


Source: blog.csdn.net/songyuc/article/details/107081814