OpenCV multilingual natural scene text recognition system (source code & tutorial)

1. Research Background

Humans can quickly locate and read text in natural scenes, but the same task is much harder for computers, and developers have long wanted machines to recognize text in images as well. In natural scenes the image content is complex, and much of it is irrelevant to the text. Objects next to the text, light and shadow, and the color, size, and style of the font all affect machine recognition to varying degrees. Image text recognition is usually divided into two parts: text detection and text recognition.
Text detection is the first and indispensable step. Before a machine can recognize text in a natural scene, it must first know where the text is. Many text detection solutions exist, but their robustness to interference on complex images is often unsatisfactory. Such images also tend to contain varied fonts (such as artistic lettering) and irregular text shapes, which seriously affect both detection and recognition.

2. Multilingual selection module

9.png

3. Recognition effect display

4.png

6.png
2.png

3.png

7.png

8.png

4. Effect video demonstration

OpenCV multilingual natural scene text recognition system (source code & tutorial) (Bilibili video)

5. Text recognition algorithm CRNN

Text recognition converts an image of text into computer-readable text. The input is a candidate region cropped from the original image, and the output is the text sequence contained in that region. Current methods treat recognition as a sequence recognition task, so explicit character segmentation can be omitted. Unlike general image classification, the output of text recognition is a sequence of characters whose length is not fixed. As shown in the figure, current text recognition methods fall into two categories according to how they model the sequence:
1. Text recognition algorithm based on CTC (Connectionist Temporal Classification)
2. Text recognition algorithm based on attention mechanism
image.png

The purpose of text recognition is to turn the text pattern in a candidate region into standard text. Since the recognition algorithm is not the research focus of this article, this section only introduces the CRNN text recognition algorithm used here.
The main idea of CRNN is to treat text recognition as sequence prediction rather than treating each character as an independent target, so an RNN is used to predict the sequence. The algorithm has three stages: a CNN extracts image features, a BiLSTM (bidirectional long short-term memory) network predicts the sequence, and a CTC transcription layer produces the final result.
image.png
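
To make the role of the CTC transcription layer more concrete, here is a minimal sketch of best-path (greedy) CTC decoding in plain Python. It is only an illustration, not the decoder used by this project: the function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 for the blank class are all assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Collapse per-frame scores into a string with best-path CTC decoding.

    frame_scores: list of T rows, each a list of scores over (blank + alphabet).
    alphabet: string whose i-th character corresponds to class index i + 1
              (assumption: class 0 is the CTC blank).
    """
    # 1) Pick the highest-scoring class at every time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]

    decoded = []
    previous = blank
    for cls in best_path:
        # 2) Merge consecutive repeats and 3) drop blanks.
        if cls != previous and cls != blank:
            decoded.append(alphabet[cls - 1])
        previous = cls
    return "".join(decoded)


# Toy example: 6 time steps, classes = [blank, 'a', 'b'].
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.2, 0.1, 0.7],    # b (repeat, merged)
]
print(ctc_greedy_decode(scores, "ab"))  # aab
```

The blank between the second and fourth frames is what lets the decoder keep two separate "a" characters instead of merging them into one.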

Code

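import torch.nn as nn
import torch.nn.functional as F

# VGG and BidirectionalLSTM are defined elsewhere in the project.


class CRNN(nn.Module):
    def __init__(self, characters_classes, hidden=256, pretrain=True):
        super(CRNN, self).__init__()
        self.characters_class = characters_classes
        self.body = VGG()  # convolutional backbone
        # Kernel width 2 with no padding shrinks the last dimension by 1
        self.stage5 = nn.Conv2d(512, 512, kernel_size=(3, 2), padding=(1, 0))
        self.hidden = hidden
        # Two stacked bidirectional LSTMs; the second outputs one score per character class
        self.rnn = nn.Sequential(BidirectionalLSTM(512, self.hidden, self.hidden),
                                 BidirectionalLSTM(self.hidden, self.hidden, self.characters_class))

        self.pretrain = pretrain
        if self.pretrain:
            # Copy the pretrained VGG16 weights whose names match the backbone
            import torchvision.models.vgg as vgg
            pre_net = vgg.vgg16(pretrained=True)
            pretrained_dict = pre_net.state_dict()
            model_dict = self.body.state_dict()
            pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
            model_dict.update(pretrained_dict)
            self.body.load_state_dict(model_dict)

            # Freeze the backbone so only stage5 and the RNN are trained
            for param in self.body.parameters():
                param.requires_grad = False

    def forward(self, x):
        x = self.body(x)
        x = self.stage5(x)
        x = x.squeeze(3)                     # drop the size-1 last dimension: (N, 512, T)
        x = x.permute(2, 0, 1).contiguous()  # sequence-first layout: (T, N, 512)
        x = self.rnn(x)                      # (T, N, characters_class)
        x = F.log_softmax(x, dim=2)          # per-step log-probabilities for CTC
        return x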
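
The `squeeze(3)` call in `forward` only works if `stage5` leaves the last dimension with size 1. The following sketch checks that with the standard convolution output-size formula (dilation 1). The helper name `conv2d_output_hw` and the 32 x 2 feature-map size are assumptions made for illustration; the real size depends on `VGG()`, which is not shown in this article.

```python
def conv2d_output_hw(height, width, kernel, padding, stride=(1, 1)):
    """Output (height, width) of a 2-D convolution with dilation 1."""
    out_h = (height + 2 * padding[0] - kernel[0]) // stride[0] + 1
    out_w = (width + 2 * padding[1] - kernel[1]) // stride[1] + 1
    return out_h, out_w


# stage5 in the class above: kernel_size=(3, 2), padding=(1, 0).
# Hypothetical backbone output of 32 x 2 (assumption, not taken from the source).
print(conv2d_output_hw(32, 2, kernel=(3, 2), padding=(1, 0)))  # (32, 1)
```

With these numbers the first spatial dimension is preserved and becomes the sequence length T after `permute(2, 0, 1)`, while the second collapses to 1 and is removed by `squeeze(3)`.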

6. Method based on improved CTC

CTC-based methods convert the character feature sequence directly into per-character probabilities and compute the recognition loss with CTC loss. Inspired by speech recognition, CRNN introduced CTC into image-based sequence recognition. CRNN is an end-to-end text sequence recognition system consisting of a convolutional module, a recurrent module, and a transcription module. To extract contextual information it uses an LSTM, a recurrent neural network in which "gates" control how much historical information is forgotten and how the current state is updated. STAR-Net combines a spatial transformer with CRNN, introducing a spatial attention mechanism to rectify geometrically distorted text images so that distorted scene text can be recognized.
To avoid vanishing and exploding gradients during RNN training, Gao proposed an end-to-end fully convolutional text recognition network that uses a CNN instead of an RNN to capture long-term dependencies and generate sequence features. This model greatly improves recognition speed. References [1]-[2] also combine neural networks with CTC to achieve accurate and robust recognition of inclined text in natural scenes, as shown in the figure below.

image.png
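
As a rough illustration of how CTC loss is obtained from per-frame character probabilities, the sketch below implements the standard CTC forward recursion in plain Python. It sums the probability of every frame-level alignment that collapses to the target label; the loss is the negative log of that sum. The function name `ctc_label_probability` and the blank index 0 are assumptions, and a real training setup would use a numerically stable library implementation in log space instead.

```python
import math


def ctc_label_probability(probs, label, blank=0):
    """P(label | probs), summed over all CTC alignments (forward algorithm).

    probs: list of T rows of per-class probabilities (each row sums to 1).
    label: list of class indices without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    extended = [blank]
    for cls in label:
        extended += [cls, blank]
    size = len(extended)

    # At t = 0 an alignment may start on the leading blank or the first character.
    alpha = [0.0] * size
    alpha[0] = probs[0][extended[0]]
    if size > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one symbol
            # Skipping the blank between two characters is allowed only if they differ.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][extended[s]]
        alpha = new_alpha

    # Valid alignments end on the last character or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Two frames, classes = [blank, 'a']; alignments "a-", "-a" and "aa" all collapse to "a".
p = ctc_label_probability([[0.5, 0.5], [0.5, 0.5]], [1])
print(p)              # 0.75
print(-math.log(p))   # the corresponding CTC loss
```

Because every valid alignment contributes to the sum, the network is never told which frame belongs to which character, which is why character segmentation is not needed during training.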

Code

from math import atan2, degrees

# dumpRotateImage and recognizer are defined elsewhere in the project.


def ctc(img, text_recs, adjust=False):
    """
    Run character recognition with the CTC model on each detected text box
    """
    results = {}
    xDim, yDim = img.shape[1], img.shape[0]

    for index, rec in enumerate(text_recs):
        # Margins used to enlarge the box when adjust=True
        xlength = int((rec[6] - rec[0]) * 0.1)
        ylength = int((rec[7] - rec[1]) * 0.2)
        if adjust:
            pt1 = (max(1, rec[0] - xlength), max(1, rec[1] - ylength))
            pt2 = (rec[2], rec[3])
            pt3 = (min(rec[6] + xlength, xDim - 2), min(yDim - 2, rec[7] + ylength))
            pt4 = (rec[4], rec[5])
        else:
            pt1 = (max(1, rec[0]), max(1, rec[1]))
            pt2 = (rec[2], rec[3])
            pt3 = (min(rec[6], xDim - 2), min(yDim - 2, rec[7]))
            pt4 = (rec[4], rec[5])

        degree = degrees(atan2(pt2[1] - pt1[1], pt2[0] - pt1[0]))  # tilt angle of the text box

        partImg = dumpRotateImage(img, degree, pt1, pt2, pt3, pt4)
        # dis(partImg)
        if partImg.shape[0] < 1 or partImg.shape[1] < 1 or partImg.shape[0] > partImg.shape[1]:  # skip empty or taller-than-wide crops
            continue
        text = recognizer.recognize(partImg)
        if len(text) > 0:
            results[index] = [rec]
            results[index].append(text)  # recognized text

    return results
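
The geometry in `ctc()` is easier to follow with numbers. The sketch below recomputes the tilt angle and the two margins for a single box. It assumes the corner order implied by the indexing in `ctc()` (top-left, top-right, bottom-left, bottom-right); the helper name `tilt_and_margins` and the sample box are made up for illustration.

```python
from math import atan2, degrees


def tilt_and_margins(rec):
    """Return (tilt in degrees, x margin, y margin) for an 8-value box.

    Assumed corner order, inferred from ctc() above:
    (x1, y1) top-left, (x2, y2) top-right, (x3, y3) bottom-left, (x4, y4) bottom-right.
    """
    x1, y1, x2, y2, _, _, x4, y4 = rec
    tilt = degrees(atan2(y2 - y1, x2 - x1))  # slope of the top edge
    x_margin = int((x4 - x1) * 0.1)          # 10% of the horizontal extent
    y_margin = int((y4 - y1) * 0.2)          # 20% of the vertical extent
    return tilt, x_margin, y_margin


# Hypothetical box whose top edge rises 100 px over 100 px (exaggerated for clarity).
tilt, x_margin, y_margin = tilt_and_margins([10, 20, 110, 120, 10, 60, 110, 160])
print(round(tilt, 1), x_margin, y_margin)  # 45.0 10 28
```

In `ctc()` the angle is passed to `dumpRotateImage` so that the crop can be rotated upright before recognition, and the margins are only applied when `adjust=True`.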

7. System integration

The source code, the environment deployment video tutorial, and the custom UI are packaged as shown below:
1.png

Reference: "OpenCV Multilingual Natural Scene Text Recognition System (Source Code & Tutorial)"

8. References

[1] Han Yu. Text recognition application based on CNN and RPN technology [J]. Electromechanical Information. 2019, (21). 90-91, 93. DOI: 10.3969/j.issn.1671-0797.2019.21.046.
[2] Li Ying, Liu Juhua, Yi Yaohua. Character recognition method for natural scene images [J]. Packaging Engineering. 2018, (5). 168-172.
[3] Li Wenxuan, Sun Jifeng. Road sign text image recognition algorithm based on a composite-optimized deep Boltzmann machine [J]. Computer Engineering and Science. 2018, (1).
[4] Research on natural scene text detection and recognition methods based on learning [D]. 2019.
[5] Chen Guian. Research and implementation of end-to-end natural scene text detection and recognition neural network [D]. 2019.
[6] Baoguang Shi, Xiang Bai, Cong Yao. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence. 2017, 39(11). 2298-2304.

Origin blog.csdn.net/qunmasj/article/details/127700561