1 Acknowledgements
Thanks to the difflib library in the Python standard library.
2 Preface
We hope to build our own OCR model.
3 Preprocessing
3.1 Obtain the image contours: cv2.findContours()
For details, see the blog post "findContours() function (explanation)".
3 Post-processing
The text produced by OCR may contain recognition errors, so a post-processing step is required.
3.1 Similarity matching: difflib
We use string similarity to find the names in the vocabulary that are most similar to the recognized text. The library used is difflib.
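As a rough illustration of this lookup, here is a minimal sketch built on difflib.get_close_matches(). The helper name closest_name, the sample vocabulary, and the cutoff of 0.6 (difflib's default) are assumptions made for the example, not part of the original project.

```python
import difflib


def closest_name(recognized, vocabulary, cutoff=0.6):
    """Return the vocabulary entry most similar to `recognized`, or None.

    `cutoff` is the minimum similarity ratio (0..1) a candidate must reach;
    0.6 is difflib's default and only an assumed starting point here.
    """
    matches = difflib.get_close_matches(recognized, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical vocabulary of known names.
vocabulary = ["Alice Johnson", "Bob Smith", "Carol White"]

# OCR output with two misread characters ("l" -> "1", "n" -> "m").
print(closest_name("A1ice Johnsom", vocabulary))
```

Returning None when nothing reaches the cutoff lets the caller keep the raw OCR text instead of forcing a poor match.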
3.1.1 Use set_seq1() and set_seq2() to optimize performance
The difflib documentation gives the following optimization advice:
SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use set_seq2() to set the commonly used sequence once and call set_seq1() repeatedly, once for each of the other sequences.
We can apply the same optimization here.
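A minimal sketch of that pattern, assuming one recognized string is compared against every vocabulary entry: the recognized text is set once with set_seq2(), and each candidate is passed to set_seq1(). The function name best_match and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(recognized, vocabulary):
    """Return (best_word, best_score) for `recognized` against `vocabulary`."""
    matcher = SequenceMatcher()
    # The recognized text is the commonly used sequence, so set it once;
    # SequenceMatcher caches its analysis of the second sequence.
    matcher.set_seq2(recognized)

    best_word, best_score = None, 0.0
    for word in vocabulary:
        # Only the first sequence changes on each iteration.
        matcher.set_seq1(word)
        score = matcher.ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical vocabulary and OCR output.
vocabulary = ["Bob Smith", "Alice Johnson", "Carol White"]
word, score = best_match("A1ice Johnsom", vocabulary)
print(word, round(score, 3))
```

If ratio() turns out to be too slow for a large vocabulary, quick_ratio() or real_quick_ratio() can serve as cheaper upper-bound filters before calling ratio().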