1. What is Tesseract?
- Tesseract is an open source OCR (Optical Character Recognition) engine. OCR is a technology that identifies and parses the text in images so that computers can understand and process it.
- Tesseract provides a wealth of configuration options and interfaces, allowing developers to customize and integrate according to their own needs and scenarios.
- You give Tesseract an image containing text (such as a scanned document, photo or screenshot), and it extracts the text through a series of image processing and pattern recognition steps. The result is text that can be edited and searched on a computer.
Simply put, Tesseract is a powerful OCR engine suitable for extracting text from images and converting it into computer-processable text form. It is widely used in many fields and applications such as scanning and digitizing documents, automated data entry, library and archive management, etc.
2. Create a development environment
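Use conda to create a development environment named openCV and activate it
conda create -n openCV python
conda activate openCV
Install the opencv-python package
pip install opencv-python
Install the pytesseract package
pip install pytesseract
Note that pytesseract is only a Python wrapper. The Tesseract engine itself must also be installed and available on the PATH.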
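Before moving on, it can help to confirm that everything is visible from the new environment. The following is a minimal sketch using only the standard library; the function name check_ocr_environment is made up for this example, and it assumes the Tesseract executable is called tesseract on your system.

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report which parts of the OCR toolchain this interpreter can see."""
    return {
        # Python packages installed with pip
        "cv2": importlib.util.find_spec("cv2") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract executable itself, which pip does not install
        "tesseract_binary": shutil.which("tesseract") is not None,
    }


for name, found in check_ocr_environment().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If tesseract_binary is reported as missing but Tesseract is installed somewhere else, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.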
3. Code practice
Detect strings in pictures and print them
First, prepare a picture like the following
Then write the code
testDectection.py
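import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV (channels are in BGR order)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert BGR to the RGB order that pytesseract expects
print(pytesseract.image_to_string(img))  # run Tesseract and print the recognized text
cv2.imshow('result', img)  # display the image (imshow assumes BGR, so the colors of this RGB image appear swapped)
cv2.waitKey(0)  # wait for a key press before closing the window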
output
The above is a simple way to extract the raw text from an image with pytesseract.
Detect characters in the picture and mark them with red boxes
code
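import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert BGR to the RGB order that pytesseract expects

# Detecting characters
hImg, wImg, _ = img.shape  # height and width of the image in pixels
boxes = pytesseract.image_to_boxes(img)  # one line per character: char left bottom right top page
for c in boxes.splitlines():
    c = c.split(' ')
    print(c)
    # x, y are the left and bottom edges; w, h are the right and top edges (not a width and height).
    # All four are measured from the bottom-left corner of the image.
    x, y, w, h = int(c[1]), int(c[2]), int(c[3]), int(c[4])
    # flip the y axis for OpenCV and draw a box with thickness 3; imshow shows (0, 0, 255) as red
    cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (0, 0, 255), 3)
cv2.imshow('result', img)  # display the annotated image
cv2.waitKey(0)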
Two input pictures
1.png
2.png
output
The coordinates of each detected character
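Each line returned by image_to_boxes has the form "char left bottom right top page", with y values measured upward from the bottom of the image. The sketch below shows one way to turn that text into boxes in OpenCV's top-left coordinate system. The names CharBox and parse_boxes and the sample string are made up for illustration; real output would come from pytesseract.image_to_boxes.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_boxes(boxes_text: str, img_height: int) -> List[CharBox]:
    """Convert image_to_boxes text into boxes with a top-left origin."""
    result = []
    for line in boxes_text.splitlines():
        parts = line.split(" ")
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(v) for v in parts[1:5])
        # Tesseract measures y upward from the bottom edge,
        # OpenCV measures y downward from the top edge.
        result.append(
            CharBox(parts[0], left, img_height - top, right, img_height - bottom)
        )
    return result


# Hypothetical sample for an image that is 100 pixels high.
sample = "H 10 20 30 50 0\ni 35 20 42 48 0"
for box in parse_boxes(sample, img_height=100):
    print(box)
```

With boxes in this form, (box.left, box.top) and (box.right, box.bottom) can be passed directly to cv2.rectangle as the two corners.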
Add recognized text content to images
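import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert BGR to the RGB order that pytesseract expects

# Detecting characters
hImg, wImg, _ = img.shape  # height and width of the image in pixels
boxes = pytesseract.image_to_boxes(img)  # one line per character: char left bottom right top page
for c in boxes.splitlines():
    c = c.split(' ')
    print(c)
    # x, y are the left and bottom edges; w, h are the right and top edges, measured from the bottom-left corner
    x, y, w, h = int(c[1]), int(c[2]), int(c[3]), int(c[4])
    # draw a box with thickness 3; imshow shows (0, 0, 255) as red
    cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (0, 0, 255), 3)
    # write the recognized character 25 pixels below the bottom edge of its box
    cv2.putText(img, c[0], (x, hImg - y + 25), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)
cv2.imshow('result', img)  # display the annotated image
cv2.waitKey(0)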
The key line
cv2.putText(img, c[0], (x, hImg - y + 25), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)
This line of code adds text to the image using the putText function from the OpenCV library.
The arguments are as follows:
- img: the image to which the text is added.
- c[0]: the text to draw. Here it is the first field of the box line, that is, the recognized character.
- (x, hImg - y + 25): the position of the bottom-left corner of the text, as an (x, y) tuple. x is the horizontal coordinate and hImg - y + 25 is the vertical coordinate. hImg is the height of the whole image and y is the bottom edge of the character box measured from the bottom of the image, so hImg - y is that same edge in OpenCV's top-down coordinates. Adding 25 places the text a short distance below the box.
- cv2.FONT_HERSHEY_COMPLEX: the font. This is one of OpenCV's built-in Hershey fonts, a serif style.
- 1: the font scale factor. 1 keeps the font's base size.
- (50, 50, 255): the color of the text as a (B, G, R) tuple, where B, G and R are the values of the blue, green and red channels. In this example the text is displayed in red.
- 2: the thickness of the strokes used to draw the text, in pixels. 2 makes the strokes thicker than the default of 1.
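The position arithmetic above can be checked with a small worked example. This is only a sketch: label_origin is a hypothetical helper, and the margin clamp is an addition of this example (not part of the original code) that keeps labels for characters near the bottom of the picture from being drawn outside it.

```python
def label_origin(x, y, img_height, offset=25, margin=5):
    """Return the (x, y) origin for a label drawn below a character box.

    x and y are the left and bottom values reported by image_to_boxes,
    where y is measured upward from the bottom of the image.
    """
    baseline = img_height - y + offset  # flip the y axis, then move down
    baseline = min(baseline, img_height - margin)  # stay inside the image
    return x, baseline


# A box whose bottom edge is 100 px above the bottom of a 400 px high image.
print(label_origin(40, 100, img_height=400))  # (40, 325)
# A box very close to the bottom edge: the label is clamped to 395.
print(label_origin(40, 10, img_height=400))   # (40, 395)
```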
output
Detect consecutive strings
In practice, we usually care less about individual characters and more about whole words and strings.
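import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert BGR to the RGB order that pytesseract expects

# Detecting words
hImg, wImg, _ = img.shape  # height and width of the image in pixels
boxes = pytesseract.image_to_data(img)  # tab-separated table with one row per page, block, paragraph, line and word
for i, line in enumerate(boxes.splitlines()):
    if i != 0:  # skip the header row
        c = line.split()
        print(c)
        if len(c) == 12:  # only rows that contain recognized text have a 12th field
            x, y, w, h = int(c[6]), int(c[7]), int(c[8]), int(c[9])  # left, top, width, height
            # draw a box with thickness 3; imshow shows (0, 0, 255) as red
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 3)
            # write the recognized word at the top-left corner of its box
            cv2.putText(img, c[11], (x, y), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)
cv2.imshow('result', img)  # display the annotated image
cv2.waitKey(0)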
output
The meaning of each field:
- level: the level of the row in the layout hierarchy: 1 is a page, 2 a block, 3 a paragraph, 4 a line and 5 a word.
- page_num: the number of the page on which the text is located. In a multi-page document, each page has its own page number.
- block_num: the number of the text block that contains the text. A text block is a rectangular area of the page that contains one or more paragraphs or lines.
- par_num: the number of the paragraph within its block.
- line_num: the number of the line within its paragraph.
- word_num: the number of the word within its line. A word is the smallest unit in this table and consists of one or more characters.
- left: the position of the left edge of the text area, in pixels from the left edge of the image.
- top: the position of the top edge of the text area, in pixels from the top edge of the image.
- width: the width of the text area in pixels.
- height: the height of the text area in pixels.
- conf: the confidence of the recognition, between 0 and 100 for words. It indicates how sure the OCR engine is about the text it recognized. Rows that do not describe a word have a confidence of -1.
- text: the recognized text.
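Because the result is a tab-separated table with a header row, it can also be parsed by column name instead of by position. The following is a minimal sketch using the standard csv module; parse_words, group_lines, the confidence threshold of 60 and the sample table are all assumptions made for this example, and real input would come from pytesseract.image_to_data.

```python
import csv
import io
from collections import defaultdict


def parse_words(tsv_text, min_conf=0.0):
    """Parse image_to_data output and keep word rows with conf >= min_conf."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue  # page/block/paragraph/line rows have no text and conf -1
        words.append({
            "line_key": (int(row["block_num"]), int(row["par_num"]), int(row["line_num"])),
            "left": int(row["left"]), "top": int(row["top"]),
            "width": int(row["width"]), "height": int(row["height"]),
            "conf": float(row["conf"]), "text": text,
        })
    return words


def group_lines(words):
    """Join the words that share block, paragraph and line numbers."""
    lines = defaultdict(list)
    for word in words:
        lines[word["line_key"]].append(word["text"])
    return [" ".join(texts) for _, texts in sorted(lines.items())]


# Hypothetical sample in the same column layout as image_to_data.
sample = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
    "5\t1\t1\t1\t1\t1\t10\t12\t40\t15\t96\tHello",
    "5\t1\t1\t1\t1\t2\t55\t12\t45\t15\t91.5\tworld",
    "4\t1\t1\t1\t2\t0\t10\t40\t90\t15\t-1\t",
    "5\t1\t1\t1\t2\t1\t10\t40\t30\t15\t35\tf0o",
])
print(group_lines(parse_words(sample, min_conf=60)))  # ['Hello world']
```

pytesseract can also return the same table as a dictionary of columns via image_to_data(img, output_type=pytesseract.Output.DICT), which avoids manual parsing altogether.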
Only recognize numbers in pictures
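import cv2
import pytesseract

img = cv2.imread('1.png')  # read the image with OpenCV
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert BGR to the RGB order that pytesseract expects

# Detecting digits
hImg, wImg, _ = img.shape  # height and width of the image in pixels
cong = r'--oem 3 --psm 6 outputbase digits'  # extra command-line options passed to Tesseract
boxes = pytesseract.image_to_data(img, config=cong)  # tab-separated table with one row per page, block, paragraph, line and word
for i, line in enumerate(boxes.splitlines()):
    if i != 0:  # skip the header row
        c = line.split()
        print(c)
        if len(c) == 12:  # only rows that contain recognized text have a 12th field
            x, y, w, h = int(c[6]), int(c[7]), int(c[8]), int(c[9])  # left, top, width, height
            # draw a box with thickness 3; imshow shows (0, 0, 255) as red
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 3)
            # write the recognized text at the top-left corner of its box
            cv2.putText(img, c[11], (x, y), cv2.FONT_HERSHEY_COMPLEX, 1, (50, 50, 255), 2)
cv2.imshow('result', img)  # display the annotated image
cv2.waitKey(0)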
The key lines
cong = r'--oem 3 --psm 6 outputbase digits'
boxes = pytesseract.image_to_data(img, config=cong)
Parameter explanation:
- oem: the OCR Engine Mode. It selects which recognition engine Tesseract uses: 0 is the legacy engine, 1 the LSTM engine, 2 both combined, and 3 the default, which picks whatever is available.
- psm: the Page Segmentation Mode. It controls how Tesseract splits the image into blocks of text, lines, words and characters before recognizing them. Mode 6 assumes a single uniform block of text.
- digits: the name of a Tesseract config file that is meant to restrict recognition to numeric characters. This is the part that limits the output to numbers.
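The config string can also be assembled programmatically. The sketch below uses a hypothetical helper, build_config, and uses Tesseract's tessedit_char_whitelist variable (set with the -c option) as an alternative way to limit the character set. Whether the whitelist is honored depends on the Tesseract version and engine in use, so treat this as an assumption to verify on your installation.

```python
def build_config(oem=3, psm=6, whitelist=None):
    """Build a Tesseract option string for pytesseract's config argument."""
    if oem not in range(4):
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(14):
        raise ValueError("psm must be between 0 and 13")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to the given characters.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(build_config(whitelist="0123456789"))
# --oem 3 --psm 6 -c tessedit_char_whitelist=0123456789
```

The returned string would be passed in the same way as cong above, for example pytesseract.image_to_data(img, config=build_config(whitelist="0123456789")).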