Bank card number recognition: CTPN for text detection, CRNN for character recognition, and a Flask web front end
GitHub address
Machine learning is not my field; I completed this project only to satisfy a school course requirement, so this article covers just getting started and finishing the project. For the deeper principles, I recommend two Chinese blog posts:
[OCR] Technical Series Part 5: A Survey of Natural-Scene Text Detection (CTPN, SegLink, EAST)
[OCR] Technical Series Part 7: CRNN, the End-to-End Variable-Length Character Recognition Algorithm, Explained
Setting up the base environment
The hardware, driver, and software dependencies are as follows:
Ubuntu 18.04 + CUDA 8.0.61 + NVIDIA GeForce GTX 960M + NVIDIA driver 430.14 + TensorFlow-GPU + Python 3.6
NVIDIA 430.14 driver download
CUDA 8.0 download
After installation, you can verify the setup with nvidia-smi (driver version and detected GPU) and nvcc --version (CUDA version).
Clone the source code and create a Python 3 virtual environment
git clone https://github.com/bay1/card-crnn-ctpn.git
python3 -m virtualenv venv
source venv/bin/activate # activate the virtual environment
pip install -r requirements.txt # install the project dependencies
Configuring warpctc-pytorch
The project uses warpctc-pytorch, which must be installed manually.
Note that the following commands must be executed inside the Python virtual environment:
git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc
mkdir build; cd build
cmake ..
make
You may encounter the following error; it means your gcc version is too new, since CUDA 8.0 requires gcc older than 5.0:
/usr/local/cuda-8.0/include/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 5 are not supported!
#error -- unsupported GNU version! gcc versions later than 5 are not supported!
^~~~~
If multiple gcc versions are installed on your system, you can relink the gcc command to a specific version.
For example, here I point it at another gcc version present on my system, gcc-4.9:
sudo rm /usr/bin/gcc
sudo ln -s /usr/bin/gcc-4.9 /usr/bin/gcc
You may also encounter the following error
/usr/bin/ld: CMakeFiles/test_gpu.dir/tests/test_gpu_generated_test_gpu.cu.o: relocation R_X86_64_32S against `.bss' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
CMakeFiles/test_gpu.dir/build.make:98: recipe for target 'test_gpu' failed
make[2]: *** [test_gpu] Error 1
CMakeFiles/Makefile2:146: recipe for target 'CMakeFiles/test_gpu.dir/all' failed
make[1]: *** [CMakeFiles/test_gpu.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
Based on the error message, we can edit CMakeCache.txt in the build directory, adding -fPIC to the C++ compiler flags, and then re-run make:
CMAKE_CXX_FLAGS:STRING=-fPIC # line 39
Then, following the warp-ctc instructions, we execute the following commands:
cd ../pytorch_binding
python setup.py install
At this point you may encounter the following error
src/binding.cpp:6:10: fatal error: torch/extension.h: No such file or directory
#include <torch/extension.h>
^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Following previous experience in https://github.com/SeanNaren/warp-ctc/issues/101,
we simply check out an earlier version:
git checkout ac045b6
CTPN
For text localization I directly reuse the text-detection-ctpn project and its trained model.
The trained model's accuracy is actually not great, but I had no data of my own; training it yourself requires downloading the data used by that project's author.
So we simply use the trained model. The ckpt files can be downloaded from the author in two ways:
Google Drive
Baidu Yun
Put the downloaded folder into ctpn/, then execute the following commands:
cd ctpn/utils/bbox
chmod +x make.sh
./make.sh
PS: If you want to train the CTPN model on your own data, run the train.py file in the ctpn folder.
text-detection-ctpn only locates text regions in general, while we specifically want to locate the bank card number, so some custom post-processing is still needed.
My idea is to compute the width and height of every detected box, then use the width-to-height ratio to pick out the card-number region.
Of course, this assumes the detector can at least partially find the card number, but in practice the results are quite good.
def get_wh(box_coordinate):
    """
    Compute the width and height of a box.
    Box format: [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, score]
    """
    xmin = box_coordinate[0]
    xmax = box_coordinate[2]
    ymin = box_coordinate[1]
    ymax = box_coordinate[5]
    width = xmax - xmin
    height = ymax - ymin
    return width, height
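The aspect-ratio selection described above can be sketched as follows. This helper and its min_ratio threshold are my own illustrative assumptions, not project code, and the width/height computation is inlined so the sketch is self-contained:

```python
# Illustrative sketch (not project code): pick the detected box whose
# width-to-height ratio best matches a long, flat card-number strip.
# The min_ratio threshold of 5.0 is a hypothetical value to tune on real data.

def pick_card_number_box(boxes, min_ratio=5.0):
    """Return the box with the largest width/height ratio above min_ratio.
    Box format: [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, score]
    """
    best, best_ratio = None, min_ratio
    for box in boxes:
        width = box[2] - box[0]   # xmax - xmin
        height = box[5] - box[1]  # ymax - ymin
        if height > 0 and width / height > best_ratio:
            best, best_ratio = box, width / height
    return best
```

If no box is flat enough, the helper returns None, which the caller can treat as "card number not found".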
CRNN
Now return to the project root. First I apply some simple preprocessing to the image data:
the raw images in data/images are converted into the form needed to generate the lmdb files:
python crnn/handle_images.py
Text file of image paths and their correct labels: crnn/to_lmdb/train.txt
New image path after preprocessing: crnn/to_lmdb/train_images
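For illustration, a label file in an assumed "image_path label" one-pair-per-line layout could be generated like this; the layout, file names, and card numbers here are made up, so check crnn/to_lmdb for the exact format the project expects:

```python
# Hypothetical sketch: write a train.txt-style label file, assuming one
# "image_path label" pair per line (verify against crnn/to_lmdb first).

def write_label_file(samples, out_path):
    """samples: iterable of (image_path, label) pairs."""
    with open(out_path, "w", encoding="utf-8") as f:
        for image_path, label in samples:
            f.write(f"{image_path} {label}\n")

write_label_file(
    [("train_images/0001.jpg", "6222021234567890"),
     ("train_images/0002.jpg", "6217851234567890")],
    "train.txt",
)
```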
Then we need to convert our own data into the lmdb files required for training.
Execute the following command:
python crnn/to_lmdb/to_lmdb_py3.py # or to_lmdb_py2.py under Python 2
Generated lmdb file directory: crnn/to_lmdb/lmdb
Now we can train the model on our own data:
python crnn/train.py
Model save directory: crnn/expr
Custom parameters
The project has various configurable data directories as well as training parameters.
If you want to modify these parameters or data paths, edit the following two files:
- ctpn/params.py
- crnn/params.py
The CRNN training parameters in detail:
--random_sample 	whether to sample the dataset with a random sampler, action='store_true'
--keep_ratio 	scale images while keeping the aspect ratio, action='store_true'
--adam 	use the Adam optimizer, action='store_true'
--adadelta 	use the Adadelta optimizer, action='store_true'
--saveInterval 	how many iterations between model checkpoints
--valInterval 	how many iterations between validation runs
--n_test_disp 	number of samples displayed per validation
--displayInterval 	how many iterations between progress displays
--experiment 	model save directory
--alphabet 	the character set to recognize
--crnn 	path to a pretrained model
--beta1
--lr 	learning rate
--niter 	number of training epochs
--nh 	number of LSTM hidden units
--imgW 	image width
--imgH 	image height, default=32
--batchSize 	batch size, default=64
--workers 	number of worker processes, default=2
--trainroot 	training set path
--valroot 	validation set path
--cuda 	use the GPU, action='store_true'
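As a rough illustration of how options like these are typically declared with argparse (a sketch with assumed defaults, not the project's actual crnn/params.py; only --imgH and --batchSize defaults come from the list above):

```python
# Sketch only: argparse declarations mirroring a few of the options above.
# The default for --lr is an assumption; consult crnn/params.py for real values.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--adam', action='store_true', help='use the Adam optimizer')
parser.add_argument('--lr', type=float, default=0.01, help='learning rate')
parser.add_argument('--imgH', type=int, default=32, help='image height')
parser.add_argument('--batchSize', type=int, default=64, help='batch size')
parser.add_argument('--cuda', action='store_true', help='use the GPU')

# Example: the flags you might pass on the command line
opt = parser.parse_args(['--adam', '--lr', '0.001', '--cuda'])
```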
Visualization
After CRNN training finishes, the test code loads the model from the default path crnn/trained_models/crnn_Rec_done.pth,
so we need to rename our trained model and move it into that directory.
Then we can execute the following command from the project root:
python run.py
Open the link in a browser: http://127.0.0.1:5000
Results
This is the result after 60 training iterations locally;
the bank card numbers in the test images are recognized completely correctly.