[AI in Practice] Hands-on text detection (part 2: the AdvancedEAST and PixelLink methods)

 
Text detection in natural scenes is an important application of deep learning. Previous articles introduced text detection methods for both simple and complex scenes, including MSER+NMS, CTPN, SegLink, and EAST. For details, see:

[AI in Practice] Hands-on text detection (part 1: the MSER, CTPN, SegLink, and EAST methods)

This article continues with deep-learning-based text detection in complex scenes, and walks through how to use AdvancedEAST and PixelLink for text detection.

1. The AdvancedEAST method in practice
The previous practice article on text detection introduced the EAST method, which achieves good results but is not very satisfactory on long text. Some researchers therefore improved EAST, obtained better prediction accuracy (especially on long text), and open-sourced the result as AdvancedEAST. The network structure is as follows:
 
AdvancedEAST's network structure is similar to EAST's (for EAST's technical principles, see the earlier article in the Dahua series on classic text detection models: EAST). It uses VGG as the network backbone, is written in Keras, increases the number of channels of the later convolutional layers in the feature-extraction stage, and optimizes the post-processing. Let's try out AdvancedEAST's actual detection performance.
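As a rough illustration of the idea just described (a VGG backbone, a feature-merging branch, and a 7-channel per-pixel output: 1 inside-score channel, 2 vertex-code channels, 4 vertex-geometry channels), here is a schematic Keras sketch. It is not the repo's actual network.py; the layer sizes and channel counts are simplified for readability.

# Schematic sketch of an AdvancedEAST-style network (NOT the repo's network.py)
from keras.applications.vgg16 import VGG16
from keras.layers import Conv2D, Concatenate, UpSampling2D, BatchNormalization
from keras.models import Model

def east_style_network(input_shape=(736, 736, 3)):
    # VGG16 backbone with ImageNet weights, classification head removed
    vgg = VGG16(input_shape=input_shape, weights='imagenet', include_top=False)
    # pooling outputs from shallow to deep (layer names as in keras.applications)
    feats = [vgg.get_layer(name).output
             for name in ['block2_pool', 'block3_pool', 'block4_pool', 'block5_pool']]
    x = feats[-1]
    for f in reversed(feats[:-1]):
        x = UpSampling2D(2)(x)        # upsample the deeper features
        x = Concatenate()([x, f])     # merge with the shallower features
        x = BatchNormalization()(Conv2D(128, 1, padding='same', activation='relu')(x))
        x = BatchNormalization()(Conv2D(128, 3, padding='same', activation='relu')(x))
    x = Conv2D(64, 3, padding='same', activation='relu')(x)
    # 7 channels per pixel: inside score (1) + vertex code (2) + vertex geometry (4)
    output = Conv2D(7, 1, padding='same')(x)
    return Model(vgg.input, output)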
(1) Download the source code
 
First, download the AdvancedEAST source code from GitHub (https://github.com/huoyijie/AdvancedEAST), either as a zip archive or via git clone:

git clone https://github.com/huoyijie/AdvancedEAST.git

(2) Download the model file
Download the AdvancedEAST pre-trained model from https://pan.baidu.com/s/1KO7tR_MW767ggmbTjIJpuQ (extraction code: kpm2).
Create a folder named saved_model and extract the downloaded model files into it.
 
Modify train_task_id in cfg.py so that the id matches the downloaded pre-trained model; this way the model is loaded automatically when the program runs. The modification is as follows:

train_task_id = '3T736'

Download the Keras VGG pre-trained model. Because AdvancedEAST uses VGG as the network backbone, Keras will load the VGG pre-trained weights when the model is built. The download address is https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5. Put the file in the default path where Keras looks for models:

~/.keras/models

If you do not download it manually, the program will also download the Keras VGG model automatically when it is loaded, but this is usually very slow and often times out.
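If you prefer to let Keras fetch the weights itself ahead of time, a one-off Python snippet like the following (a minimal sketch, assuming Keras 2.x with the TensorFlow backend) downloads the same "notop" VGG16 weight file into ~/.keras/models:

# One-off download of the VGG16 "notop" weights into the Keras cache
from keras.applications.vgg16 import VGG16

# include_top=False fetches vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
VGG16(weights='imagenet', include_top=False)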

(3) Prepare the basic environment
AdvancedEAST depends on the following basic environment; install the packages with conda or pip.

  • python 3.6.3+
  • tensorflow-gpu 1.5.0+(or tensorflow 1.5.0+)
  • Keras 2.1.4+
  • numpy 1.14.1+
  • tqdm 4.19.7+

(4) Detect text with AdvancedEAST
Run python predict.py to perform text detection.
 
By default, the demo/012.png image that ships with the project is read, and the following files are generated after detection:

  • 012.png_act.jpg: the activation result of the detection process
  • 012.png_predict.jpg: the final text box detection result
  • 012.txt: the position coordinates (4 vertices) of each detected text box

When the model runs, some detection results may be incomplete (a box with fewer than 4 vertices); by default these are reported. To silence these warnings about incomplete detections, change the last line of predict.py to:

predict(east_detect, img_path, threshold, quiet=True)

If you want to detect a specific image, pass the image path as a parameter when running python predict.py. You can also specify the threshold used to decide whether a pixel belongs to text; the default is 0.9. The command looks like this:

python predict.py --path=/data/work/tensorflow/data/icdar_datasets/ICDAR2015/ch4_test_images/img_364.jpg --threshold=0.9

The execution effect is as follows:
 

(5) AdvancedEAST interface encapsulation
To make it easy for other programs to call AdvancedEAST's text detection capability, the code below adapts predict.py and wraps AdvancedEAST behind a simple interface. The core code is as follows:

# The imports below assume the AdvancedEAST repo layout (cfg.py, network.py,
# preprocess.py, nms.py) and that this file sits in the repo root
import numpy as np
from PIL import Image
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

import cfg
from network import East
from preprocess import resize_image
from nms import nms

# sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Build the AdvancedEAST model and load the trained weights
def east_detect():
    east = East()
    east_detect = east.east_network()
    east_detect.load_weights(cfg.saved_model_weights_file_path)
    return east_detect

# Text detection based on AdvancedEAST
# Input: AdvancedEAST model, image path, pixel classification threshold
# Returns: position information of the detected text boxes
def text_detect(east_detect,img_path,pixel_threshold=0.9):
    img = image.load_img(img_path)
    d_wight, d_height = resize_image(img, cfg.max_predict_img_size)
    scale_ratio_w = d_wight / img.width
    scale_ratio_h = d_height / img.height
    img = img.resize((d_wight, d_height), Image.NEAREST).convert('RGB')
    img = image.img_to_array(img)
    img = preprocess_input(img, mode='tf')

    x = np.expand_dims(img, axis=0)
    y = east_detect.predict(x)

    y = np.squeeze(y, axis=0)
    y[:, :, :3] = sigmoid(y[:, :, :3])
    cond = np.greater_equal(y[:, :, 0], pixel_threshold)
    activation_pixels = np.where(cond)
    quad_scores, quad_after_nms = nms(y, activation_pixels)

    bboxes = []
    for score, geo in zip(quad_scores, quad_after_nms):
        if np.amin(score) > 0:
            rescaled_geo = geo / [scale_ratio_w, scale_ratio_h]
            rescaled_geo_list = np.reshape(rescaled_geo, (8,)).tolist()
            bboxes.append(rescaled_geo_list)

    return bboxes
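A usage sketch for the wrapper above (assuming the functions are saved in one module inside the AdvancedEAST repo; the image paths are only examples):

# Usage sketch: build the model once, detect, and draw the resulting quads
from PIL import Image, ImageDraw

model = east_detect()                        # build the network and load weights
boxes = text_detect(model, 'demo/012.png')   # list of [x1, y1, ..., x4, y4] quads

img = Image.open('demo/012.png').convert('RGB')
draw = ImageDraw.Draw(img)
for box in boxes:
    draw.polygon(box, outline=(255, 0, 0))   # draw each detected quadrilateral
img.save('012_boxes.png')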

 

2. The PixelLink method in practice
The text detection methods introduced so far generally make two kinds of prediction: a classification that decides text/non-text, and a regression that determines the position and angle of the bounding box. The regression is far more time-consuming than the classification. PixelLink ("pixel link") instead obtains everything through classification alone: it classifies pixels and the links between them, and derives the position and angle of the text boxes from the result. The technical principles are detailed in the earlier article in the Dahua series on classic text detection models: PixelLink.
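To make the idea concrete, here is a heavily simplified sketch of this pixel-plus-link decoding. It is not the repo's decode_batch implementation; the thresholds, the link-channel ordering, and the OpenCV 3+ calls are assumptions made purely for illustration.

# Simplified sketch of PixelLink-style decoding: threshold the per-pixel text
# scores, union-find pixels whose links are positive, then fit a rotated
# rectangle per connected component.
import numpy as np
import cv2

def decode_sketch(pixel_scores, link_scores, pixel_thr=0.8, link_thr=0.8):
    # pixel_scores: (H, W) text probability per pixel
    # link_scores:  (H, W, 8) probability of linking to each of the 8 neighbours
    h, w = pixel_scores.shape
    text_mask = pixel_scores >= pixel_thr
    parent = {p: p for p in zip(*np.where(text_mask))}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]    # path compression
            p = parent[p]
        return p

    def union(p, q):
        parent[find(p)] = find(q)

    # 8-neighbourhood offsets (assumed to match the link-channel ordering)
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    for (y, x) in list(parent):
        for k, (dy, dx) in enumerate(neighbours):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and text_mask[ny, nx] \
                    and link_scores[y, x, k] >= link_thr:
                union((y, x), (ny, nx))

    # group text pixels by their union-find root: one text instance per group
    groups = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)

    boxes = []
    for pts in groups.values():
        pts = np.array([(x, y) for (y, x) in pts], dtype=np.float32)
        boxes.append(cv2.boxPoints(cv2.minAreaRect(pts)))  # 4 corner points
    return boxes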
The overall framework of PixelLink is as follows:
 
The following describes how to use the PixelLink model to detect text.
(1) Download the source code and model
 
First, download the PixelLink source code from GitHub (https://github.com/ZJULearning/pixel_link), either as a zip archive or via git clone:

git clone https://github.com/ZJULearning/pixel_link.git

Download pylib from https://github.com/dengdan/pylib/tree/e749559c9a4bcee3339081ec2d159a6dcf41636e. After decompression, move the util folder from pylib's src directory up into the pylib directory, then add it to the Python path: at the top of test_pixel_link.py, test_pixel_link_on_any_image.py, visualize_detection_result.py, and datasets/dataset_utils.py, add

import sys
sys.path.append('/data/work/tensorflow/model/pixel_link/pixel_link-master/pylib')
sys.path.append('/data/work/tensorflow/model/pixel_link/pixel_link-master/pylib/util')

Alternatively, run the following command in the current shell, or add it to /etc/profile or ~/.bashrc (replace xx with the pylib path):

export PYTHONPATH=xx:$PYTHONPATH

Download the pre-trained models trained on the IC15 (ICDAR2015) dataset. The author provides two pre-trained models: PixelLink + VGG16 4s (download address: https://pan.baidu.com/s/1jsOc-cutC4GyF-wMMyj5-w) and PixelLink + VGG16 2s (download address: https://pan.baidu.com/s/1asSFsRSgviU2GnvGt2lAUw).

Create the folders models/4s and models/2s, decompress the model archives, and put the two models into the corresponding directories so they are easy to call.

(2) Install the basic environment
The pixel_link_env.txt file in the downloaded source tree provides a conda environment specification. Since Tsinghua University's conda mirror has stopped serving, it is replaced here with the mirror of the University of Science and Technology of China. The modified environment specification is as follows:

name: pixel_link
channels:
- menpo
- https://mirrors.ustc.edu.cn/anaconda/pkgs/free
- https://mirrors.ustc.edu.cn/anaconda/pkgs/main
- defaults
dependencies:
- certifi=2016.2.28=py27_0
- cudatoolkit=7.5=2
- cudnn=5.1=0
- funcsigs=1.0.2=py27_0
- libprotobuf=3.4.0=0
- mkl=2017.0.3=0
- mock=2.0.0=py27_0
- numpy=1.12.1=py27_0
- openssl=1.0.2l=0
- pbr=1.10.0=py27_0
- pip=9.0.1=py27_1
- protobuf=3.4.0=py27_0
- python=2.7.13=0
- readline=6.2=2
- setuptools=36.4.0=py27_1
- six=1.10.0=py27_0
- sqlite=3.13.0=0
- tensorflow-gpu=1.1.0=np112py27_0
- tk=8.5.18=0
- werkzeug=0.12.2=py27_0
- wheel=0.29.0=py27_0
- zlib=1.2.11=0
- opencv=2.4.11=nppy27_0
- pip:
  - backports.functools-lru-cache==1.5
  - bottle==0.12.13
  - cycler==0.10.0
  - cython==0.28.2
  - enum34==1.1.6
  - kiwisolver==1.0.1
  - matplotlib==2.2.2
  - olefile==0.44
  - pillow==4.3.0
  - polygon2==2.0.8
  - pyparsing==2.2.0
  - python-dateutil==2.7.2
  - pytz==2018.4
  - setproctitle==1.1.10
  - subprocess32==3.2.7
  - tensorflow==1.1.0
  - virtualenv==15.1.0

Create the pixel_link conda virtual environment and install the basic dependencies with the following command:

conda env create --file pixel_link_env.txt

 
After the environment packages are installed, switch to the pixel_link virtual environment with the following command and perform the remaining operations inside it:

source activate pixel_link

The environment provided with the source code is based on Python 2. If you already have a matching environment installed, you can use it directly. Note that if you use Python 3 instead, you need to modify the following scripts:

  • Modify test_pixel_link.py line 94, line 156, line 164, line 166, and add parentheses after print
  • Modify line 112 of datasets/dataset_util.py and add parentheses after print
  • Modify line 174 of pylib/util/plt.py and add parentheses after print
  • Modify line 8 of pylib/util/img.py and add parentheses after print
  • Modify lines 29 and 30 of pylib/util/proc.py and add parentheses after print; on line 35, add parentheses after raise
  • Modify line 39 of pylib/util/thread_.py and add parentheses after raise
  • Modify lines 29, 46, 47, and 50 of pylib/util/caffe_.py, and add parentheses after print
  • Modify line 187 of pixel_link.py and add parentheses after raise
  • Modify line 46 of pylib/util/tf.py and change xrange to range
  • Modify line 153 of models/4s/config.py and change xrange to range
  • Modify pixel_link.py line 257, line 353, change xrange to range
  • Since there is no cPickle in python3, in pylib/util/io_.py, line 11, change import cPickle as pkl to import _pickle as pkl
  • Since commands have been replaced by subprocess in python3, in pylib/util/io_.py, on line 12, change import commands to import subprocess as commands. In pylib/util/cmd.py, line 4, change import commands to import subprocess as commands
  • Add import util.cmd at the top of test_pixel_link.py

(3) PixelLink text detection test (batch of images)
Run the test with the following command:

./scripts/test.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${image_dir}

This command takes three parameters: the first is the GPU id, the second is the model path, and the third is the directory containing the test images.
Here we use the PixelLink + VGG16 4s pre-trained model downloaded above, together with the ICDAR2015 scene text image dataset (download address: http://rrc.cvc.uab.es/?ch=4&com=downloads). You can also use your own test images; just put them into the specified directory.
Execute the script as follows:

./scripts/test.sh 0 models/4s/model.ckpt-38055 /data/work/tensorflow/data/icdar_datasets/ICDAR2015/ch4_test_images


 
The detection results are saved in the model directory; each result file contains the text box positions (4 coordinate points) found in the corresponding test image, as shown in the following figure:
 
All results are also packed into a zip archive, as shown in the following figure:

If you want to visualize the detection results, call the scripts/vis.sh script; the detected text boxes will be drawn directly on the images. The calling command is:

./scripts/vis.sh ${image_dir} ${det_dir}

The first parameter is the path of the original images, and the second is the directory containing the detected text box position files. The annotated images are saved in the ~/temp/no-use/pixel_result directory.
Execute the script as follows:

./scripts/vis.sh /data/work/tensorflow/data/icdar_datasets/ICDAR2015/ch4_test_images models/4s/test/icdar2015_test/model.ckpt-38055/txt


 

The output result picture is as follows

 

(4) PixelLink text detection test (arbitrary images)
For convenience, you can test arbitrary images directly with the following command:

./scripts/test_any.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${image_dir}

This command also takes three parameters: the GPU id, the model path, and the directory of images to test.
For example, still using the ICDAR2015 test image set, execute the command as follows:

./scripts/test_any.sh 0 models/4s/model.ckpt-38055 /data/work/tensorflow/data/icdar_datasets/ICDAR2015/ch4_test_images


 
After execution, the text detection results are drawn directly on the images, as shown below:

 

Some readers may wonder: isn't test_any.sh just test.sh and vis.sh combined, so what is the difference? The main differences are:
a. test_any.sh calls test_pixel_link_on_any_image.py, while test.sh calls test_pixel_link.py. When running the detection model, test_pixel_link_on_any_image.py places the union-find post-processing inside the model graph, while test_pixel_link.py performs it outside the model. In terms of detection speed, test_pixel_link_on_any_image.py is therefore much slower: the union-find post-processing involves a lot of computation, and running it on the CPU outside the model is faster.
b. test_pixel_link.py only outputs the position data of the text boxes, while test_pixel_link_on_any_image.py draws the detected text boxes directly on the images.

(5) Encapsulating PixelLink's text detection capability
To make it easy for other programs to call PixelLink's text detection capability, the code from test_pixel_link.py and visualize_detection_result.py is adapted and wrapped here. The core code is as follows:

# The imports below assume the pixel_link repo layout (config.py, pixel_link.py,
# nets/pixel_link_symbol.py, preprocessing/ssd_vgg_preprocessing.py) and that
# pylib's util package is on the Python path
import tensorflow as tf
slim = tf.contrib.slim

import config
import util
import pixel_link
from nets import pixel_link_symbol
from preprocessing import ssd_vgg_preprocessing

# Model parameters
checkpoint_dir='/data/work/tensorflow/model/pixel_link/pixel_link-master/models/4s/model.ckpt-38055'
image_width = 1280
image_height = 768

# Configuration initialization
def config_initialization():
    image_shape = (image_height, image_width)
        
    config.init_config(image_shape, 
                       batch_size = 1, 
                       pixel_conf_threshold = 0.8,
                       link_conf_threshold = 0.8,
                       num_gpus = 1, 
                   )
    
# Text detection
def text_detect(img):
    with tf.name_scope('eval'):
        image = tf.placeholder(dtype=tf.int32, shape = [None, None, 3])
        image_shape = tf.placeholder(dtype = tf.int32, shape = [3, ])
        processed_image, _, _, _, _ = ssd_vgg_preprocessing.preprocess_image(image, None, None, None, None, 
                                                   out_shape = config.image_shape,
                                                   data_format = config.data_format, 
                                                   is_training = False)
        b_image = tf.expand_dims(processed_image, axis = 0)
        net = pixel_link_symbol.PixelLinkNet(b_image, is_training = True)
        global_step = slim.get_or_create_global_step()
    
    sess_config = tf.ConfigProto(log_device_placement = False, allow_soft_placement = True)
    sess_config.gpu_options.allow_growth = True
    
    saver = tf.train.Saver()
            
    checkpoint = util.tf.get_latest_ckpt(checkpoint_dir)
    bboxes = []

    with tf.Session(config = sess_config) as sess:
        saver.restore(sess, checkpoint)
        image_data = img

        pixel_pos_scores, link_pos_scores = sess.run(
            [net.pixel_pos_scores, net.link_pos_scores], 
            feed_dict = {
                image:image_data
        })

        mask = pixel_link.decode_batch(pixel_pos_scores, link_pos_scores)[0, ...]
        bboxes = pixel_link.mask_to_bboxes(mask, image_data.shape)

    return bboxes
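A usage sketch for this wrapper (assuming the code above lives in one module inside the pixel_link repo; the image path is only an example):

# Usage sketch: read an image, run PixelLink detection, print the boxes
import cv2

config_initialization()                              # set image size and thresholds
img = cv2.imread('ch4_test_images/img_364.jpg')      # HxWx3 uint8 image (example path)
boxes = text_detect(img)                             # one 4-point box per text region
for box in boxes:
    print(box)

Note that text_detect() as written rebuilds the graph and restores the checkpoint on every call; for repeated calls you would normally build the session once and reuse it, as discussed at the end of this article.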

(6) Keras version of PixelLink
The PixelLink just introduced is based on TensorFlow. A contributor rewrote the core code in Keras and open-sourced a Keras version of PixelLink on GitHub. Usage is as follows:
a. Download the source code
 
The download address is https://github.com/opconty/pixellink_keras; the code can be downloaded as a zip archive or via git clone:

git clone https://github.com/opconty/pixellink_keras.git

b. Download the pre-trained model
The author did not retrain the model; instead, the PixelLink-VGG 4s weights from the TensorFlow version were converted directly into a Keras weight file. The download address is https://drive.google.com/file/d/1MK0AkvBMPZ-VfKN5m4QtSWWSUqoMHY33/view?usp=sharing

c. Install the basic environment
In addition to Keras, you also need to install the imutils dependency package:

pip install imutils

If you use OpenCV 4.x, you need to modify line 220 of pixellink_utils.py: cv2.findContours returns 2 values instead of 3, so change _,cnts,_ to cnts,_.
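If you would rather keep the script compatible with both OpenCV 3.x and 4.x instead of editing it for one version, a common pattern (a generic sketch, not a patch taken from pixellink_keras) is to check how many values cv2.findContours returned:

# Version-agnostic contour extraction: OpenCV 3.x returns (image, contours,
# hierarchy), while OpenCV 4.x returns (contours, hierarchy)
import cv2
import numpy as np

def find_contours_compat(mask):
    ret = cv2.findContours(mask.astype(np.uint8),
                           cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return ret[0] if len(ret) == 2 else ret[1]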

d. Run the model
Execute pixellink_eval.py to perform text detection:

python pixellink_eval.py


 
By default, the sample image that ships with the project (./samples/img_1099.jpg) is detected, and the result looks like this:

 


If you want to detect a specific image, modify line 21 of pixellink_eval.py, set img_path to your image path, and then run python pixellink_eval.py again to detect that image.

For ease of presentation, the AdvancedEAST and PixelLink wrappers above keep model loading, text box prediction, and drawing the boxes on the image together in one place. In real production use these are usually separated. If you want to know more about production usage, feel free to reach out by private message.
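As a rough illustration of that separation (the module name and structure below are assumptions for illustration, not the article's production code), the AdvancedEAST wrapper could be split so that the model is loaded once and prediction is exposed on its own:

# detector.py: sketch of separating model loading from prediction,
# building on the east_detect/text_detect functions defined earlier
_model = None

def get_model():
    # build the network and load the weights only once, then reuse it
    global _model
    if _model is None:
        _model = east_detect()
    return _model

def detect(img_path, threshold=0.9):
    # prediction-only entry point that other services can call
    return text_detect(get_model(), img_path, pixel_threshold=threshold)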

 

Welcome to follow my WeChat public account "Big Data and Artificial Intelligence Lab" (BigdataAILab) to get the complete source code.

 

Recommended related reading

1. AI combat series

2. Dahua Deep Learning Series

3. Graphical AI series

4. AI talk

5. Big data super detailed series

 
