3.1.1. CRNN Introduction
CRNN first uses a CNN to extract a feature sequence from the image, then uses an RNN to make predictions over that sequence, and finally obtains the result through a CTC transcription layer. In short, it is a CNN + RNN + CTC structure.
Git address: https://github.com/bgshih/crnn
Paper: http://arxiv.org/abs/1507.05717
3.1.2. CNN Introduction
The CNN uses a VGG-style structure, with some adjustments to the network described in the VGG paper.
3.1.3. RNN Introduction
The RNN runs over the sequence output by the CNN, and each input has a corresponding output yt. To prevent vanishing gradients during training, the paper uses LSTM cells as the RNN units. The paper argues that, for sequence prediction, both the preceding and the following context help predict each element, so it uses a bidirectional RNN. The structure of the LSTM unit and the bidirectional RNN configuration are shown in the figure.
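The bidirectional idea can be illustrated with a toy recurrence. This is only a minimal sketch: a scalar tanh cell is swapped in for the LSTM to keep it short, and the weights `w_in` and `w_rec` and the sample inputs are made-up values, not taken from the paper.

```python
import math

def rnn_pass(xs, w_in=0.5, w_rec=0.8):
    """Run a scalar tanh recurrence over xs and return the state at each step."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional(xs):
    """Pair each step's forward state (past context) with its backward state (future context)."""
    forward = rnn_pass(xs)
    backward = rnn_pass(xs[::-1])[::-1]  # run right to left, then restore the original order
    return list(zip(forward, backward))

features = [1.0, 0.0, 0.0, 2.0]  # stand-in for one CNN feature per time step
for t, (f, b) in enumerate(bidirectional(features)):
    print(t, round(f, 3), round(b, 3))
```

The forward state at step 0 only sees the first input, while the backward state at step 0 has already seen every later input, which is the extra context the bidirectional network provides.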
3.1.4. CTC transcription layer
At test time, transcription comes in two modes: one with a dictionary and one without.
With a dictionary, the test set comes with a lexicon. The output probability is computed for every entry in the lexicon, and the string with the highest probability is taken as the final prediction.
Without a dictionary, no candidate strings are given for the test set, and the output string with the highest predicted probability is taken as the final prediction.
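The two modes can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from either repository: the blank index `BLANK = 0`, the three-symbol alphabet, and the probability table are assumptions made for the example. The dictionary-free mode is approximated here by greedy (best-path) decoding, and the dictionary mode scores each lexicon entry with the standard CTC forward algorithm.

```python
BLANK = 0  # assumed index of the CTC blank class

def greedy_decode(probs, alphabet):
    """Without a dictionary: best class per step, merge repeats, drop blanks."""
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    out, prev = [], BLANK
    for k in best:
        if k != prev and k != BLANK:
            out.append(alphabet[k])
        prev = k
    return "".join(out)

def ctc_probability(probs, labels):
    """Probability of a label index sequence under CTC (forward algorithm)."""
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]  # interleave blanks: -, l1, -, l2, -, ...
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][BLANK]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for step in probs[1:]:
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * step[ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)

def lexicon_decode(probs, alphabet, lexicon):
    """With a dictionary: return the lexicon word with the highest CTC probability."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return max(lexicon, key=lambda w: ctc_probability(probs, [index[c] for c in w]))

alphabet = "-ab"  # position 0 is a placeholder for the blank
probs = [[0.1, 0.6, 0.3], [0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]  # made-up per-step outputs
print(greedy_decode(probs, alphabet))
print(lexicon_decode(probs, alphabet, ["aa", "bb"]))
```

With this table the greedy result is "ab"; when the lexicon contains only "aa" and "bb", the dictionary mode is forced to pick the more probable of those two instead.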
3.1.5. Debugging CRNN on TensorFlow
1. First download the code from git
Git address: https://github.com/MaybeShewill-CV/CRNN_Tensorflow
2. Download the pre-trained model. If you train the model yourself you do not need to download it, but the training data is several GB.
The pretrained crnn model weights on Synth90k dataset can be found here
3. Once downloaded, the model can be used directly with the following command:
python tools/test_shadownet.py --image_path data/test_images/test_01.jpg --weights_path model/crnn_synth90k/shadownet.ckpt --char_dict_path data/char_dict/char_dict_en.json --ord_map_dict_path data/char_dict/ord_map_en.json
4. Some pitfalls
(1) Modify the py files under tools and add the following code. I am running directly on Windows, so this is mainly to locate the relevant directories.
The os.getcwd() method returns the current working directory.
(2) On Windows, processes cannot be created with the fork method; the default is spawn, whereas on Linux the default is fork. You will get the following error:
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Modify the file data_provider/tf_io_pipline_fast_tools.py and add "if __name__ == '__main__':", as follows:
if __name__ == '__main__':
    _SAMPLE_INFO_QUEUE = Manager().Queue()
    _SENTINEL = ("", [])
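The reason for the guard is that under spawn the child process imports the main module again, so any module-level code that starts another process would run a second time. A minimal standalone sketch of the same pattern follows; the function names `make_queue_and_sentinel` and `main` are hypothetical and not part of the repository, and the arrangement (creating the queue inside a function) differs slightly from the one-line fix above.

```python
import multiprocessing as mp

def make_queue_and_sentinel(manager):
    """Create the shared queue and the end-of-data marker from a running manager."""
    return manager.Queue(), ("", [])

def main():
    # The manager starts a helper process, so it should only be created
    # from code that runs under the __main__ guard below.
    with mp.Manager() as manager:
        sample_queue, sentinel = make_queue_and_sentinel(manager)
        sample_queue.put(sentinel)
        print(sample_queue.get() == sentinel)

if __name__ == "__main__":
    mp.freeze_support()  # harmless unless the program is frozen into an executable
    main()
```

Nothing that spawns a process runs at import time in this layout, so the script behaves the same whether the start method is fork or spawn.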
3.1.6. Running English OCR successfully
python tools/test_shadownet.py --image_path data/test_images/test_01.jpg --weights_path model/crnn_synth90k/shadownet.ckpt --char_dict_path data/char_dict/char_dict_en.json --ord_map_dict_path data/char_dict/ord_map_en.json
I found that recognition only works well on images that resemble the training data.
3.1.7. Running Chinese OCR successfully
1. Download the pre-trained model
I have uploaded a newly trained crnn model on chinese dataset which can be found here. Sorry for not knowing the owner of the dataset. But thanks for his great work. If someone knows it you’re welcome to let me know. The pretrained weights can be found here
2. Modify the configuration file config/global_config.py:
__C.ARCH.NUM_CLASSES = 5825 # cn dataset
#__C.ARCH.NUM_CLASSES = 37 # synth90k dataset
3. Run the demo
python tools/recongnize_chinese_pdf.py -c ./data/char_dict/char_dict_cn.json -o ./data/char_dict/ord_map_cn.json --weights_path model/crnn_chinese/shadownet.ckpt --image_path data/test_images/test_pdf.png --save_path data/test_images/pdf_recognize_result.txt
I did not get recognition results as good as those shown on the git page. Preparing your own data and training on it yourself is the more reliable approach.
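The NUM_CLASSES value changed in step 2 has to match the character set of the model being loaded. As a rough sanity check, and assuming the count is one class per character plus one CTC blank, the English value of 37 can be reproduced from 26 letters and 10 digits. The helper `num_classes` and the in-memory dictionary below are hypothetical; the real JSON files in the repository may be laid out differently.

```python
import string

def num_classes(char_dict):
    """One output class per character plus one CTC blank (assumed convention)."""
    return len(char_dict) + 1

# Hypothetical stand-in for an English character dictionary:
# 26 lowercase letters plus 10 digits.
english_chars = {str(ord(c)): c for c in string.ascii_lowercase + string.digits}
print(num_classes(english_chars))
```

Running the same count on the Chinese character dictionary before editing the configuration is a quick way to catch a mismatch between the weights and the setting.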