Text detection with Tesseract and OpenCV

I'm not dawdling, it's just that no one sees me when I'm working hard.

1. What is Tesseract?

Tesseract is an open source OCR (Optical Character Recognition) engine. OCR is a technology that identifies and parses the text in images so that computers can process it.
Tesseract provides a wealth of configuration options and interfaces, so developers can customize and integrate it to suit their own needs and scenarios.
You give Tesseract an image that contains text (such as a scanned document, photo, or screenshot), and it extracts the text through a series of image-processing and pattern-recognition steps. The result is plain text that can be edited and searched.

Simply put, Tesseract is a powerful OCR engine suitable for extracting text from images and converting it into computer-processable text form. It is widely used in many fields and applications such as scanning and digitizing documents, automated data entry, library and archive management, etc.


2. Create a development environment

Use conda to create a development environment named openCV, then activate it

conda create -n openCV python
conda activate openCV

Install the OpenCV package

pip install opencv-python

Install the pytesseract package

pip install pytesseract
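
Note that pytesseract is only a Python wrapper: it calls the Tesseract executable, which has to be installed separately on the system. A minimal sketch for checking that everything can be found before running the examples (check_environment is a hypothetical helper name, and it assumes the executable is called tesseract and is on PATH):

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the pieces this tutorial relies on can be found."""
    return {
        # Python packages installed with pip
        "cv2": importlib.util.find_spec("cv2") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract executable itself, looked up on PATH (assumed name)
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the executable is installed but not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to its full path.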

3. Code practice

Detect strings in pictures and print them

First, prepare a picture like the following

Write the code

testDectection.py

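import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV (channels are in BGR order)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # pytesseract expects RGB, so convert a copy for OCR
print(pytesseract.image_to_string(rgb))  # run Tesseract and print the recognized text
cv2.imshow('result', img)  # imshow expects BGR, so display the original image
cv2.waitKey(0)  # wait for a key press before closing the window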

output

This is the simplest way to extract the raw text from an image with pytesseract.

Detect characters in the picture and mark them with red boxes

code

import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV (channels are in BGR order)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # pytesseract expects RGB, so convert a copy for OCR

# Detecting Characters
hImg, wImg, _ = img.shape  # image height and width
boxes = pytesseract.image_to_boxes(rgb)  # one line per character: char left bottom right top page
for c in boxes.splitlines():
    c = c.split(' ')
    print(c)
    x, y, w, h = int(c[1]), int(c[2]), int(c[3]), int(c[4])  # left, bottom, right, top (origin at bottom-left)
    cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (0, 0, 255), 3)  # flip y for OpenCV; red in BGR, thickness 3

cv2.imshow('result', img)  # imshow expects BGR, so draw on and display the original image
cv2.waitKey(0)

Input: two pictures

1.png

2.png

output

The coordinates of each detected character
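
Each line returned by image_to_boxes has the form char left bottom right top page, with the origin at the bottom-left corner of the image, whereas OpenCV measures y from the top. That is why the code subtracts the y values from hImg. A minimal sketch of that conversion as a standalone function (box_to_cv_rect is a hypothetical helper name, not part of pytesseract or OpenCV):

```python
def box_to_cv_rect(line, img_height):
    """Convert one image_to_boxes line to (char, top_left, bottom_right).

    The input coordinates use a bottom-left origin; the returned points
    use OpenCV's top-left origin and can be passed to cv2.rectangle.
    """
    char, left, bottom, right, top = line.split(" ")[:5]
    left, bottom, right, top = int(left), int(bottom), int(right), int(top)
    top_left = (left, img_height - top)          # flip the y axis
    bottom_right = (right, img_height - bottom)
    return char, top_left, bottom_right


# A made-up sample line for a 100-pixel-high image
print(box_to_cv_rect("T 10 20 30 50 0", 100))
```

The tutorial code passes the bottom-left and top-right corners instead; cv2.rectangle draws the same box from either pair of opposite corners.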

Add recognized text content to images

import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV (channels are in BGR order)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # pytesseract expects RGB, so convert a copy for OCR

# Detecting Characters
hImg, wImg, _ = img.shape  # image height and width
boxes = pytesseract.image_to_boxes(rgb)  # one line per character: char left bottom right top page
for c in boxes.splitlines():
    c = c.split(' ')
    print(c)
    x, y, w, h = int(c[1]), int(c[2]), int(c[3]), int(c[4])  # left, bottom, right, top (origin at bottom-left)
    cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (0, 0, 255), 3)  # flip y for OpenCV; red in BGR, thickness 3
    cv2.putText(img, c[0], (x, hImg - y + 25), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)   # label the box with the recognized character

cv2.imshow('result', img)  # imshow expects BGR, so draw on and display the original image
cv2.waitKey(0)

The key line

cv2.putText(img, c[0], (x, hImg - y + 25), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)

This line uses OpenCV's putText function to draw text on the image.

The arguments are as follows:

img: The image to draw the text on.
c[0]: The text to draw. Here c[0] is the first field of the box line, that is, the recognized character.
(x, hImg - y + 25): The position of the bottom-left corner of the text, as an (x, y) tuple. x is the left edge of the character box. hImg is the height of the image and y is the bottom edge of the box measured from the bottom of the image, so hImg - y converts it to OpenCV's top-down coordinates. Adding 25 places the text 25 pixels below the box.
cv2.FONT_HERSHEY_COMPLEX: The font to use, here one of OpenCV's built-in Hershey fonts (a serif face).
1: The font scale factor. 1 keeps the font's base size.
(50, 50, 255): The color of the text as a (B, G, R) tuple, where B, G, and R are the values of the blue, green, and red channels. In this example the text is red.
2: The thickness, in pixels, of the strokes used to draw the text. A value of 2 makes the strokes bolder.
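
One consequence of drawing the label 25 pixels below the box is that, for characters near the bottom of the picture, the label can fall outside the image. A minimal sketch of the same position arithmetic with a clamp added (label_origin is a hypothetical helper name; the clamp is an addition for illustration, not something the code above does):

```python
def label_origin(left, bottom, img_height, offset=25):
    """Return the putText origin for a label drawn below a character box.

    left and bottom come from image_to_boxes (bottom-left origin).
    The result is in OpenCV coordinates (top-left origin).
    """
    y = img_height - bottom + offset  # flip the y axis, then move down
    y = min(y, img_height - 1)        # keep the text baseline inside the image
    return left, y


print(label_origin(10, 60, 100))  # box well above the bottom edge
print(label_origin(10, 20, 100))  # box near the bottom edge, so y is clamped
```

The returned tuple can be passed as the third argument of cv2.putText in place of (x, hImg - y + 25).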

output

Detect whole words

In practice, we generally care less about individual characters than about whole words and strings.

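import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV (channels are in BGR order)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # pytesseract expects RGB, so convert a copy for OCR

# Detecting Words
hImg, wImg, _ = img.shape  # image height and width
boxes = pytesseract.image_to_data(rgb)  # tab-separated table: a header line, then one row per detected item
for i, c in enumerate(boxes.splitlines()):
    if i != 0:  # skip the header line
        c = c.split()
        print(c)
        if len(c) == 12:  # only rows that contain recognized text have all 12 fields
            x, y, w, h = int(c[6]), int(c[7]), int(c[8]), int(c[9])  # left, top, width, height (origin at top-left)
            cv2.rectangle(img, (x, y), (x + w, h + y), (0, 0, 255), 3)  # red box in BGR, thickness 3
            cv2.putText(img, c[11], (x, y), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)  # label the box with the recognized word

cv2.imshow('result', img)  # imshow expects BGR, so draw on and display the original image
cv2.waitKey(0)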

output

The meaning of each field:

level: The hierarchy level of the row: 1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word.
page_num: The number of the page the item is on. In a multi-page document, each page has its own number.
block_num: The number of the text block within the page. A block is a rectangular region that contains one or more paragraphs.
par_num: The number of the paragraph within the block.
line_num: The number of the line within the paragraph.
word_num: The number of the word within the line.
left: The x coordinate of the left edge of the bounding box, in pixels from the left edge of the image.
top: The y coordinate of the top edge of the bounding box, in pixels from the top edge of the image.
width: The width of the bounding box in pixels.
height: The height of the bounding box in pixels.
conf: The confidence of the recognition for a word, from 0 to 100. Rows that are not words have a value of -1.
text: The recognized text. It is empty for rows above the word level.
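
Because the output is a tab-separated table with these column names, it can also be parsed by name rather than by position. A minimal sketch using only the standard library, run on a made-up sample instead of a real OCR result (parse_words, SAMPLE_TSV, and the min_conf threshold are hypothetical names introduced for illustration):

```python
import csv
import io

# Made-up sample in the same layout as the image_to_data output
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tHello\n"
    "5\t1\t1\t1\t1\t2\t110\t92\t70\t24\t41.0\tw0rld\n"
)


def parse_words(tsv_text, min_conf=0.0):
    """Return word-level rows whose confidence is at least min_conf."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Keep only word rows (level 5) that carry text and pass the threshold
        if row["level"] != "5" or not text or conf < min_conf:
            continue
        words.append({
            "text": text,
            "conf": conf,
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
        })
    return words


for word in parse_words(SAMPLE_TSV, min_conf=60):
    print(word)
```

With a real image, the string returned by pytesseract.image_to_data can be passed in place of SAMPLE_TSV. pytesseract can also return the same table as a dictionary via output_type=pytesseract.Output.DICT, which avoids the manual parsing.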

Recognize only the digits in pictures

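import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV (channels are in BGR order)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # pytesseract expects RGB, so convert a copy for OCR

# Detecting Digits
hImg, wImg, _ = img.shape  # image height and width
cong = r'--oem 3 --psm 6 outputbase digits'  # Tesseract options: engine mode, page segmentation mode, digits only
boxes = pytesseract.image_to_data(rgb, config=cong)  # tab-separated table: a header line, then one row per detected item
for i, c in enumerate(boxes.splitlines()):
    if i != 0:  # skip the header line
        c = c.split()
        print(c)
        if len(c) == 12:  # only rows that contain recognized text have all 12 fields
            x, y, w, h = int(c[6]), int(c[7]), int(c[8]), int(c[9])  # left, top, width, height (origin at top-left)
            cv2.rectangle(img, (x, y), (x + w, h + y), (0, 0, 255), 3)  # red box in BGR, thickness 3
            cv2.putText(img, c[11], (x, y), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)  # label the box with the recognized digits

cv2.imshow('result', img)  # imshow expects BGR, so draw on and display the original image
cv2.waitKey(0)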

The key lines

cong = r'--oem 3 --psm 6 outputbase digits'
boxes = pytesseract.image_to_data(img, config=cong)

Parameter explanation:

oem: The OCR Engine Mode, which selects the recognition engine Tesseract uses. 0 is the legacy engine, 1 is the LSTM neural-network engine, 2 combines both, and 3 (used here) picks the default based on what is available.
psm: The Page Segmentation Mode, which tells Tesseract what layout to assume when it splits the image into blocks of text, lines, words, and characters. Mode 6 (used here) assumes a single uniform block of text.
digits: Refers to Tesseract's digits configuration, which limits recognition to numeric characters.
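
Since the config argument is just a string of command-line options, it can be assembled from its parts. A minimal sketch, assuming Tesseract 4 or later, where --oem accepts 0 to 3 and --psm accepts 0 to 13 (build_config is a hypothetical helper name; the tessedit_char_whitelist variable is shown as an alternative way to restrict the character set, and whether it is honored can depend on the Tesseract version and engine):

```python
def build_config(oem=3, psm=6, whitelist=None):
    """Build a Tesseract option string for pytesseract's config argument."""
    if oem not in range(4):
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(14):
        raise ValueError("psm must be between 0 and 13")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to the given characters (assumed alternative
        # to the digits configuration used in the tutorial code)
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(build_config())
print(build_config(whitelist="0123456789"))
```

The resulting string can be passed as config=build_config(...) to image_to_data or image_to_string.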