[AI in Practice] Hands-on deep learning text recognition (text detection: based on MSER, CTPN, SegLink, EAST, etc.)

Text detection is a very important step in the text recognition pipeline. Its main goal is to locate the text regions in an image so that the subsequent recognition step can work on them: only after the text regions have been found can their content be recognized.

Text detection scenarios fall into two broad categories: simple scenes and complex scenes. Text detection in simple scenes, such as book scans, screenshots, or clear, well-aligned photos, is relatively easy. Complex scenes mainly refer to natural scenes, such as street billboards, product packaging, labels and descriptions on equipment, and trademarks, where complex backgrounds, varying illumination, tilt, distortion, and blur make text detection considerably harder. As shown below:

This article introduces common text detection methods for both simple and complex scenes, including morphological operations, MSER+NMS, CTPN, SegLink, and EAST, and focuses on how to apply these methods to the ICDAR scene text image dataset, as shown below:

1. Simple scene: morphological operation method

Text detection in simple scenes can be achieved with basic image morphological operations from computer vision, namely dilation and erosion, for example to locate the text regions in a screenshot, as shown in the following figure:

Here, dilation expands the bright (highlighted) parts of the image, producing larger white areas, while erosion shrinks the bright parts, producing larger black areas. A sequence of dilation and erosion operations highlights the outlines of the text regions and removes stray border lines, after which contour detection can be used to compute the positions of the text regions. The main steps are as follows:

  • Read the image and convert it to grayscale
  • Binarize the image (optionally denoise first) to simplify processing
  • Apply dilation and erosion to highlight outlines and remove border lines
  • Find contours and discard those that do not match the characteristics of text
  • Return the bounding boxes of the detected text

This process is easy to implement with OpenCV. The core code is as follows:

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Read the image
imagePath = '/data/download/test1.jpg'
img = cv2.imread(imagePath)

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Generate a binary image using Sobel edge detection
sobel = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)
# Binarize
ret, binary = cv2.threshold(sobel, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)

# Structuring elements for dilation and erosion
element1 = cv2.getStructuringElement(cv2.MORPH_RECT, (30, 9))
element2 = cv2.getStructuringElement(cv2.MORPH_RECT, (24, 6))

# Dilate once to make the outlines stand out
dilation = cv2.dilate(binary, element2, iterations=1)

# Erode once to remove fine details
erosion = cv2.erode(dilation, element1, iterations=1)

# Dilate again to make the outlines more prominent
dilation2 = cv2.dilate(erosion, element2, iterations=2)

# Find contours and filter out non-text regions
region = []
contours, hierarchy = cv2.findContours(dilation2, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(contours)):
    cnt = contours[i]

    # Compute the contour area and skip small ones
    area = cv2.contourArea(cnt)
    if (area < 1000):
        continue

    # Find the minimum-area bounding rectangle
    rect = cv2.minAreaRect(cnt)
    print("rect is: ")
    print(rect)

    # box holds the coordinates of the four corner points
    box = cv2.boxPoints(rect)
    box = np.int0(box)

    # Compute the height and width
    height = abs(box[0][1] - box[2][1])
    width = abs(box[0][0] - box[2][0])

    # Text lines are wide and flat, so drop rectangles that are too tall and narrow
    if (height > width * 1.3):
        continue

    region.append(box)

# Draw the detected boxes
for box in region:
    cv2.drawContours(img, [box], 0, (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The image processing pipeline is shown in the following figure:

As can be seen, the text regions in the image are detected successfully.

This method has the advantage of being computationally simple and very fast, but its applicability to text detection is very limited: it only handles a narrow range of cases, and the results are often not good, as shown below. For example, the code above targets dark text on a light background; for light text on a dark background the code needs to be adjusted. If needed, feel free to message me privately to discuss.
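
For example, for light text on a dark background, one simple adjustment (a minimal sketch under that assumption, not the author's code; the file path is a placeholder) is to invert the grayscale image first, after which the rest of the pipeline above can be reused unchanged:

import cv2

# Placeholder path: an image with light text on a dark background
img = cv2.imread('/data/download/test1_dark.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Invert the grayscale image so the text becomes dark on a light background,
# then apply the same Sobel / threshold / dilate / erode / contour steps as above
gray = cv2.bitwise_not(gray)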

 

2. Simple scene: MSER+NMS detection method

MSER (Maximally Stable Extremal Regions) is a popular traditional text detection method (as opposed to deep-learning-based detection) that is widely used in traditional OCR; in some scenarios it is both fast and accurate.

The MSER algorithm was proposed in 2002 and is closely related to the watershed idea. The watershed algorithm borrows from topography: the image is treated as a landscape in which the gray value of each pixel represents its altitude, each local minimum and its surrounding area forms a catchment basin, and the boundary between two catchment basins is a watershed, as shown below:

MSER proceeds as follows: a grayscale image is binarized with thresholds increasing from 0 to 255. This rising threshold is like water gradually rising over a landscape; as the level rises, the lower areas are submerged one by one, and viewed from above the image splits into land and water, with the water area continuously expanding. During this "flooding", some connected regions change very little or not at all; these are the maximally stable extremal regions. In an image containing text, a text region has a consistent gray value, so as the water level (threshold) keeps rising it is not submerged at first, and only gets "flooded" when the threshold reaches the gray level of the text itself. This property can be used to roughly locate the text regions in an image.

This process sounds complicated, but fortunately MSER is built into OpenCV and can be called directly, which simplifies things greatly.

The detection effect is as follows:

The raw detections are irregularly shaped regions; by post-processing their coordinates they can be turned into rectangular boxes, as shown below:

The core code is as follows:

import cv2

# Read the image
imagePath = '/data/download/test2.jpg'
img = cv2.imread(imagePath)

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
orig = img.copy()

# Run the MSER algorithm
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(gray)  # detect the text regions
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]  # convex hulls of the regions
cv2.polylines(img, hulls, 1, (0, 255, 0))
cv2.imshow('img', img)

# Convert the irregular detections into rectangular boxes
keep = []
for c in hulls:
    x, y, w, h = cv2.boundingRect(c)
    keep.append([x, y, x + w, y + h])
    cv2.rectangle(vis, (x, y), (x + w, y + h), (255, 255, 0), 1)
cv2.imshow("hulls", vis)

As the figure above shows, many of the detected boxes overlap: small boxes sit inside larger ones, boxes intersect each other, and some boxes only enclose the radicals or individual strokes of Chinese characters, whereas what we want is the outer bounding box of each piece of text, which makes subsequent recognition easier. To deal with these many overlapping boxes of different sizes, NMS (Non-Maximum Suppression) is commonly used: non-maximal boxes are suppressed, i.e. boxes that are not the locally best are removed. This effectively discards the small boxes contained inside larger ones, removes duplicate regions, and keeps the best detection positions.

The main process of the NMS algorithm is as follows:

  • Sort all boxes by confidence score (or by coordinates if the boxes have no confidence scores)
  • Take the box with the highest score
  • Compute the overlap (IoU) between this box and each of the remaining boxes
  • Delete boxes whose IoU exceeds a chosen threshold (set as needed, e.g. 0.3, 0.5, or 0.8)
  • Take the next highest-scoring remaining box and repeat the process

After these steps, only non-overlapping text detection boxes remain. The core code is as follows:

import numpy as np

# NMS (Non-Maximum Suppression)
def nms(boxes, overlapThresh):
    if len(boxes) == 0:
        return []

    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    pick = []

    # Extract the four coordinate arrays
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    # Compute the area of each box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)

    # Sort by score (without confidence scores, sort by a coordinate instead, e.g. the bottom-right y)
    idxs = np.argsort(y2)

    # Iterate, keeping the best boxes and deleting overlapping ones
    while len(idxs) > 0:
        # Put the last box in the sorted order into the pick list
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # Find the largest top-left and smallest bottom-right coordinates w.r.t. the remaining boxes
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # Compute the ratio of the overlap area to the area of each remaining box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:last]]

        # Delete boxes whose overlap exceeds the threshold
        idxs = np.delete(idxs, np.concatenate(([last], np.where(overlap > overlapThresh)[0])))

    return boxes[pick].astype("int")
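
For reference, here is a minimal sketch (not from the original article) of how the keep boxes produced by the MSER step above can be passed through nms and drawn; the 0.3 threshold is just an illustrative value:

import numpy as np
import cv2

# keep comes from the MSER step above: a list of [x1, y1, x2, y2] boxes
boxes = nms(np.array(keep), 0.3)

# Draw the surviving boxes on a copy of the original image
result = orig.copy()
for (x1, y1, x2, y2) in boxes:
    cv2.rectangle(result, (x1, y1), (x2, y2), (0, 255, 0), 1)

cv2.imshow("after NMS", result)
cv2.waitKey(0)
cv2.destroyAllWindows()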

The detection results after NMS processing are as follows:

As can be seen from the above figure, after MSER+NMS the text regions are detected and boxed quite well.

The MSER+NMS detection method is widely used in traditional OCR applications, and detection is very fast, which is enough for certain text recognition scenarios. However, in complex natural scenes, especially those with cluttered backgrounds, the results are not satisfactory and irrelevant elements get detected as well, as shown in the following figure:

【The key point is here】

The methods introduced next are deep-learning-based text detection methods that can be applied to complex natural scenes.

 

3. Complex scene: CTPN detection method

CTPN (Connectionist Text Proposal Network, from the paper "Detecting Text in Natural Image with Connectionist Text Proposal Network") is a text detection method based on convolutional and recurrent neural networks. Its basic idea is to generate a series of appropriately sized text proposals (candidate boxes) and connect them to detect text lines. The schematic diagram is as follows; for the technical details, please refer to the earlier article in this series (Classic text detection models explained: CTPN).

CTPN adapts to more complex natural scenes and is one of the commonly used deep learning methods for text detection. The original author provides the source code (https://github.com/tianzhi0549/CTPN), which is based on the Caffe framework. Since many readers are more familiar with TensorFlow, a TensorFlow version of CTPN is also available on GitHub (https://github.com/eragonruan/text-detection-ctpn). The following describes how to use that program for text detection.

(1) Download source code and model

① First, download the source code of the TensorFlow version of CTPN, either as a zip archive or via git clone:

git clone https://github.com/eragonruan/text-detection-ctpn.git

② Next, compile and install by executing the following commands:

cd utils/bbox

chmod +x make.sh

./make.sh

③ Download the pre-trained model from https://pan.baidu.com/s/1BNHt_9fiqRPGmEXPaxaFXw. The downloaded archive is checkpoints_mlt.zip; extract it and place the checkpoints_mlt folder in the text-detection-ctpn directory.

(2) CTPN text detection test

Put the images to be detected in the data/demo directory (a test image is provided by default; to detect your own images, place them in data/demo), then execute the following command to run CTPN text detection:

python ./main/demo.py

The detection results are stored in the data/res directory. Each result consists of two files: the image with the detected boxes drawn on it, and a text file with the box positions and confidence scores, as shown in the following figure:

Opening the result image, as shown below, we can see that the text has been detected well:

Results on other images are shown below; the detection quality is quite good:

(3) Wrapping the CTPN text detection capability

With small modifications to main/demo.py, the CTPN detection capability can be wrapped up so that other programs can call it. The core code is as follows:

# CTPN-based text detection
# Input: an image
# Returns: text box positions and confidence scores
def text_detect(image):

    with tf.get_default_graph().as_default():
        # Model inputs and parameters
        input_image = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='input_image')
        input_im_info = tf.placeholder(tf.float32, shape=[None, 3], name='input_im_info')

        global_step = tf.get_variable('global_step', [], initializer=tf.constant_initializer(0), trainable=False)

        bbox_pred, cls_pred, cls_prob = model.model(input_image)

        variable_averages = tf.train.ExponentialMovingAverage(0.997, global_step)
        saver = tf.train.Saver(variable_averages.variables_to_restore())

        with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
            # Load the model
            ckpt_state = tf.train.get_checkpoint_state(checkpoint_dir)
            model_path = os.path.join(checkpoint_dir, os.path.basename(ckpt_state.model_checkpoint_path))
            saver.restore(sess, model_path)

            # Predict the text box positions
            img = image
            h, w, c = img.shape
            im_info = np.array([h, w, c]).reshape([1, 3])
            bbox_pred_val, cls_prob_val = sess.run([bbox_pred, cls_prob],
                                                   feed_dict={input_image: [img],
                                                              input_im_info: im_info})

            textsegs, _ = proposal_layer(cls_prob_val, bbox_pred_val, im_info)
            scores = textsegs[:, 0]
            textsegs = textsegs[:, 1:5]

            textdetector = TextDetector(DETECT_MODE='H')
            boxes = textdetector.detect(textsegs, scores[:, np.newaxis], img.shape[:2])
            boxes = np.array(boxes, dtype=np.int)

    return boxes, scores
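
For reference, here is a minimal usage sketch (not from the original article); the image path is a placeholder, and it assumes each returned box holds the four corner points (x1, y1, ..., x4, y4) followed by a score, as in the original demo.py:

import cv2
import numpy as np

img = cv2.imread('/data/download/test3.jpg')  # placeholder path
boxes, scores = text_detect(img)

# Draw each detected text box (the first 8 values of each box) on the image
for box in boxes:
    pts = np.array(box[:8], dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)

cv2.imwrite('/data/download/test3_ctpn.jpg', img)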

As the results above show, CTPN performs well on complex natural scenes.

 

4. Complex scene: SegLink detection method

Although CTPN performs reasonably well on natural scenes, it is designed for horizontal text and does not handle non-horizontal text well. Natural scenes contain a lot of rotated and tilted text, such as street billboards. The SegLink method introduced next can detect rotated text at multiple angles. The model detects text through segments (slices) and the links between them; the schematic diagram is as follows. For the technical details, please refer to the earlier article in this series (Classic text detection models explained: SegLink).

https://oscimg.oschina.net/oscnet/c8917ae95caadfb03dc1339ce8989a0594e.jpg

The following describes how to use SegLink to detect text.

(1) Download source code and model

① First, download the TensorFlow version of the SegLink source code from GitHub (https://github.com/dengdan/seglink), either as a zip archive or via git clone:

git clone https://github.com/dengdan/seglink.git

② Download pylib from https://github.com/dengdan/pylib/tree/f7f5c5503fbb3d9593e6ac3bbf0b8508f53ee1cf. After extracting it, put the util folder from src under the pylib directory, then add it to the environment path, either by adding the following at the top of test_seglink.py:

import sys

sys.path.append('/data/PycharmProjects/tensorflow/ocr/seglink/util')

or by executing the following command in the current shell (or adding it to /etc/profile or ~/.bashrc):

export PYTHONPATH=xx:$PYTHONPATH

③ Download the pre-trained models (trained on the SynthText and IC15 datasets). The author provides two: seglink-384 (for 384x384 images) and seglink-512 (for 512x512 images). The download address is https://pan.baidu.com/s/1slqaYux

④ Install dependencies

conda install -c cachemeorg setproctitle

# or the following command

#pip install setproctitle

⑤ If you are using Python 3, make the following modifications (skip this step for Python 2.x):

  • In test_seglink.py, add parentheses after print on lines 69, 133, 139, 144, 145, and 146
  • In pylib/util/io_.py, line 11, change import cPickle as pkl to import pickle as pkl
  • In pylib/util/io_.py, line 12, change import commands to import subprocess as commands
  • In pylib/util/caffe_.py, add parentheses after print on lines 29, 46, 47, and 50
  • In pylib/util/tf.py, line 41, change xrange to range
  • In config.py, line 129, change xrange to range
  • In tf_extended/seglink.py, change xrange to range on lines 337, 625, 626, 759, and 761
  • In test_seglink.py, line 153, comment out print(util.cmd.cmd(cmd))

⑥ In ./tf_extended/seglink.py, line 808: OpenCV 3 no longer provides cv2.cv.BoxPoints(), so change the call as follows:

# points = cv2.cv.BoxPoints(bbox)   #opencv2.4.9

points = cv2.boxPoints(bbox)       #opencv3.1.0

 

(2) SegLink text detection test (text box coordinates)

Test by running the following command

./scripts/test.sh GPU_ID CKPT_PATH DATASET_DIR

The command takes three parameters: the first is the GPU ID, the second is the model path, and the third is the data directory. For example, using the seglink-512 pre-trained model downloaded above, put the images to be detected in the chosen directory (you can use your own images, or the ICDAR2015 scene text dataset, available at http://rrc.cvc.uab.es/?ch=4&com=downloads); the command then looks like this:

./scripts/test.sh 0 ./models/seglink-512/model.ckpt-217867 ./dataset/ICDAR2015/ch4_test_images

After detection, the detected text box positions (8 coordinate values per box) for each image are written to txt files, as shown below:

These raw coordinates are not very intuitive on their own; it is hard to tell from them how good the detection actually is on the images.

(3) SegLink text detection test (visualizing the results)

To display the text detection results on the images, use the following command; its format is:

python visualize_detection_result.py \
    --image=directory containing the images that were detected \
    --det=directory of the text box coordinates output by test_seglink.py \
    --output=output directory for the images with the text boxes drawn on them

The command takes three parameters: the first is the input image directory, the second is the text output of the detection step, and the third is the output directory for the visualized result images.

① Add the environment path in visualize_detection_result.py as well:

import sys

sys.path.append('/data/PycharmProjects/tensorflow/ocr/seglink/util')

② If you are using Python 3, add parentheses after print on line 65 of visualize_detection_result.py

 

To visualize the detection results just produced, call the command as follows (using the ICDAR2015 test image set as an example; to use your own photos, replace the image directory):

python visualize_detection_result.py \
    --image=./dataset/ICDAR2015/ch4_test_images/ \
    --det=./models/seglink-512/model.ckpt-217867/test/icdar2015_test/model.ckpt-217867/seg_link_conf_th_0.800000_0.500000/txt \
    --output=./dataset/output

After execution, the visualized result images are produced directly, as shown below:

The detection results on other images are as follows:

As these results show, text in natural scenes is detected well, including text with a certain tilt or rotation angle.

(4) Wrapping the SegLink text detection capability

To make SegLink's detection capability easy to call from other programs, the code in test_seglink.py and visualize_detection_result.py can be refactored into a reusable function. The core code is as follows:

# SegLink-based text detection
# Input: an image
# Returns: text box positions
def text_detect(img):
    with tf.name_scope('eval'):
        with tf.variable_scope(tf.get_variable_scope(), reuse=True):
            # Model inputs
            image = tf.placeholder(dtype=tf.int32, shape=[None, None, 3])
            image_shape = tf.placeholder(dtype=tf.int32, shape=[3, ])
            # Preprocess the image
            processed_image, _, _, _, _ = ssd_vgg_preprocessing.preprocess_image(image, None, None, None, None,
                                                                                 out_shape=config.image_shape,
                                                                                 data_format=config.data_format,
                                                                                 is_training=False)
            b_image = tf.expand_dims(processed_image, axis=0)
            b_shape = tf.expand_dims(image_shape, axis=0)
            # Predict the text boxes
            net = seglink_symbol.SegLinkNet(inputs=b_image, data_format=config.data_format)
            bboxes_pred = seglink.tf_seglink_to_bbox(net.seg_scores, net.link_scores,
                                                     net.seg_offsets,
                                                     image_shape=b_shape,
                                                     seg_conf_threshold=config.seg_conf_threshold,
                                                     link_conf_threshold=config.link_conf_threshold)

    sess_config = tf.ConfigProto(log_device_placement=False, allow_soft_placement=True)
    sess_config.gpu_options.allow_growth = True

    saver = tf.train.Saver()
    if util.io.is_dir(checkpoint_dir):
        checkpoint = util.tf.get_latest_ckpt(checkpoint_dir)
    else:
        checkpoint = checkpoint_dir

    with tf.Session(config=sess_config) as sess:
        # Load the model
        saver.restore(sess, checkpoint)
        # Predict the text boxes
        image_data = img
        image_bboxes = sess.run([bboxes_pred], feed_dict={image: image_data, image_shape: image_data.shape})
        bboxes = image_bboxes[0]

    return bboxes
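
For reference, a minimal usage sketch (not from the original article). Depending on the repository version, the returned boxes may be rotated rectangles (cx, cy, w, h, theta), which tf_extended/seglink.py converts to corner points with cv2.boxPoints (the function touched in step ⑥ above), or already 8 corner coordinates; the sketch below handles both, and the image path is a placeholder:

import cv2
import numpy as np

img = cv2.imread('/data/download/test4.jpg')  # placeholder path
bboxes = text_detect(img)

vis = img.copy()
for bbox in bboxes:
    if len(bbox) == 5:
        # Rotated rectangle (cx, cy, w, h, theta) -> 4 corner points (angle assumed in degrees)
        points = cv2.boxPoints(((bbox[0], bbox[1]), (bbox[2], bbox[3]), bbox[4]))
    else:
        # Already 8 coordinate values: x1, y1, ..., x4, y4
        points = np.array(bbox).reshape(4, 2)
    points = np.int32(points).reshape(-1, 1, 2)
    cv2.polylines(vis, [points], isClosed=True, color=(0, 255, 0), thickness=2)

cv2.imwrite('/data/download/test4_seglink.jpg', vis)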

 

5. Complex scene: EAST detection method

CTPN and SegLink detect text by first predicting proposals (candidate boxes) or segments (slices) and then regressing and merging them, so the pipeline involves quite a few intermediate steps. The EAST method introduced next reduces the pipeline to just two stages: an FCN (fully convolutional network) and NMS (non-maximum suppression). Its output supports multi-angle detection of both text lines and words, and it is efficient, accurate, and adaptable to a variety of natural scenes, as shown in the figure below. For the technical details, please refer to the earlier article in this series (Classic text detection models explained: EAST).

https://oscimg.oschina.net/oscnet/e0c69cf042328840c3312b25619d6fe4b76.jpg

The following describes how to use EAST to detect text.

(1) Download source code and model

① First, download the EAST source code from GitHub (https://github.com/argman/EAST), either as a zip archive or via git clone:

git clone https://github.com/argman/EAST.git

② Download the pre-trained model file (trained on the ICDAR 2013 and ICDAR 2015 datasets) from Baidu Netdisk at http://pan.baidu.com/s/1jHWDrYQ

③ Install the shapely dependency by executing the following command:

conda install shapely

# or run the following command

# pip install shapely

(2) EAST text detection test (demo page)

Enter the EAST-master directory and execute the following command to start the demo page

python run_demo_server.py --checkpoint_path model/east_icdar2015_resnet_v1_50_rbox/

The page tries to load previously generated result images by default; the first time it is opened there are no results yet, so a 404 may be reported, which does not affect subsequent use.

Once the command has started the web service, enter http://localhost:8769 in a browser to open the demo page, as shown below:

Click "Select File" to select the image to be tested, and click "Submit" to submit for testing. After testing, the detected image will be displayed on the page. Three images are randomly selected, and the testing effect is as follows:

The author also thoughtfully provides an online demo page that can be used directly, in the same way as the local demo page above: http://east.zxytim.com/

(3) EAST text detection test (batch detection)

You can also run detection on a batch of images from the command line, for example on the ICDAR dataset used above (replace the data directory to detect your own images). The command is as follows:

python eval.py --test_data_path=/data/work/tensorflow/model/seglink/ICDAR2015/ch4_test_images/ --checkpoint_path=model/east_icdar2015_resnet_v1_50_rbox/ --output_dir=/tmp/east

This command reads the images in batches, runs detection, and outputs the results, including the detected text box positions and the images with the text boxes drawn on them, as shown in the following figure:

As the figure shows, EAST also detects natural scene text well, including text at a rotation angle.

(4) Wrapping the EAST text detection capability

To make EAST easy to call from other code, eval.py can be modified so that EAST text detection is wrapped in a function that other code can call directly. The code is as follows:

# EAST-based text detection
# Input: an image
# Returns: text box positions and related information
def text_detect(img):
    # Model path
    checkpoint_path = '/data/PycharmProjects/tensorflow/ocr/east/model/east_icdar2015_resnet_v1_50_rbox/'

    # Model inputs and parameters
    input_images = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='input_images')
    global_step = tf.get_variable('global_step', [], initializer=tf.constant_initializer(0), trainable=False)

    f_score, f_geometry = model.model(input_images, is_training=False)

    variable_averages = tf.train.ExponentialMovingAverage(0.997, global_step)
    saver = tf.train.Saver(variable_averages.variables_to_restore())

    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))

    # Load the model
    ckpt_state = tf.train.get_checkpoint_state(checkpoint_path)
    model_path = os.path.join(checkpoint_path, os.path.basename(ckpt_state.model_checkpoint_path))
    saver.restore(sess, model_path)

    # Predict the text boxes
    im_resized, (ratio_h, ratio_w) = resize_image(img)
    score, geometry = sess.run(
        [f_score, f_geometry],
        feed_dict={input_images: [im_resized[:, :, ::-1]]})

    boxes, _ = detect(score_map=score, geo_map=geometry, timer=collections.OrderedDict([('net', 0), ('restore', 0), ('nms', 0)]))

    if boxes is not None:
        scores = boxes[:, 8].reshape(-1)
        boxes = boxes[:, :8].reshape((-1, 4, 2))
        boxes[:, :, 0] /= ratio_w
        boxes[:, :, 1] /= ratio_h

    text_lines = []
    if boxes is not None:
        for box, score in zip(boxes, scores):
            box = sort_poly(box.astype(np.int32))
            if np.linalg.norm(box[0] - box[1]) < 5 or np.linalg.norm(box[3] - box[0]) < 5:
                continue
            tl = collections.OrderedDict(zip(
                ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3'],
                map(float, box.flatten())))
            tl['score'] = float(score)
            text_lines.append(tl)
    ret = {
        'text_lines': text_lines,
    }
    return ret
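
For reference, a minimal usage sketch (not from the original article) of calling this function and drawing the returned quadrilaterals; the image path is a placeholder:

import cv2
import numpy as np

img = cv2.imread('/data/download/test5.jpg')  # placeholder path
result = text_detect(img)

# Each entry holds the four corner points and a confidence score
for tl in result['text_lines']:
    pts = np.array([[tl['x0'], tl['y0']], [tl['x1'], tl['y1']],
                    [tl['x2'], tl['y2']], [tl['x3'], tl['y3']]], dtype=np.int32)
    cv2.polylines(img, [pts.reshape(-1, 1, 2)], isClosed=True, color=(0, 255, 0), thickness=2)
    print('score: %.3f' % tl['score'])

cv2.imwrite('/data/download/test5_east.jpg', img)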

 

For ease of presentation, the CTPN, SegLink, and EAST wrappers above put model loading, text box prediction, and drawing the boxes on the image into a single function. In production these are usually separated: the model is preloaded when the OCR service starts, and only the core text detection and recognition capability is exposed; whether the boxes are drawn on the image depends on the specific use case. If you would like to discuss how to encapsulate AI capabilities more effectively in a production environment, feel free to message me privately.
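
As an illustration of that separation, here is a minimal sketch (an outline under assumptions, not production code): a detector class that builds the graph and restores the checkpoint once when the service starts, and then only runs inference per request. It reuses the names from the EAST wrapper above (model.model, resize_image, detect); the checkpoint path is a placeholder.

import collections
import tensorflow as tf

class EastDetector:
    """Load the EAST model once at start-up, then serve many detection requests."""

    def __init__(self, checkpoint_path):
        # Build the graph and restore the weights a single time
        self.input_images = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='input_images')
        global_step = tf.get_variable('global_step', [], initializer=tf.constant_initializer(0), trainable=False)
        self.f_score, self.f_geometry = model.model(self.input_images, is_training=False)

        variable_averages = tf.train.ExponentialMovingAverage(0.997, global_step)
        saver = tf.train.Saver(variable_averages.variables_to_restore())

        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
        ckpt_state = tf.train.get_checkpoint_state(checkpoint_path)
        saver.restore(self.sess, ckpt_state.model_checkpoint_path)

    def predict(self, img):
        # Per-request inference only: no graph building or checkpoint loading here
        im_resized, (ratio_h, ratio_w) = resize_image(img)
        score, geometry = self.sess.run(
            [self.f_score, self.f_geometry],
            feed_dict={self.input_images: [im_resized[:, :, ::-1]]})
        boxes, _ = detect(score_map=score, geo_map=geometry,
                          timer=collections.OrderedDict([('net', 0), ('restore', 0), ('nms', 0)]))
        if boxes is not None:
            boxes[:, :8:2] /= ratio_w   # rescale x coordinates back to the original image
            boxes[:, 1:8:2] /= ratio_h  # rescale y coordinates back to the original image
        return boxes

# Usage (placeholder path):
# detector = EastDetector('/path/to/east_icdar2015_resnet_v1_50_rbox/')  # once, at service start-up
# boxes = detector.predict(some_image)                                   # per request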

 

Welcome to follow my WeChat public account "Big Data and Artificial Intelligence Lab" (BigdataAILab) to get the complete source code.

 

Recommended related reading

1. AI in Practice series

2. Dahua Deep Learning Series

3. AI talk

4. Big data super detailed series
