[pytesseract recognition]

Recognizing special characters with OCR, such as a five-pointed star or a heart symbol (picture)


ERROR: Could not find a version that satisfies the requirement pytesseract


pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.


foreword


1. Solution – pytesseract

ModuleNotFoundError: No module named 'pytesseract'

Running pip install pytesseract failed repeatedly, so the package was installed manually instead.

Open the download page in a browser: Links for pytesseract (pypi.org)

If the file cannot be downloaded directly, press F12 and take the download link from the page elements.


  1. Download pytesseract-0.3.7.tar.gz

  2. Unzip it into "python installation path"\Lib\site-packages\

  3. Enter the pytesseract folder, which contains setup.py

  4. Open cmd in that folder and run the command: python setup.py install
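
After the manual install, it can help to confirm that the interpreter actually sees the package. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this example and is not part of pytesseract.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same Python that was used for "python setup.py install"
    print("pytesseract importable:", is_installed("pytesseract"))
```

If this prints False, the package most likely went into a different Python installation than the one running the script.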

2. Solution – Tesseract

Tesseract is an open-source OCR (Optical Character Recognition) engine.

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

Cause:

Only the pytesseract library was installed through pip. pytesseract is a wrapper: the Tesseract OCR engine that it calls was never installed. Download and install Tesseract, then add its installation directory to the PATH environment variable.

Check the version: open CMD and enter tesseract --version

C:\Users\Administrator>tesseract --version
tesseract v5.0.1.20220107
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
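
The same check can be done from Python before calling pytesseract. This is only a sketch using the standard library; the helper names find_tesseract and tesseract_version are invented for illustration.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def tesseract_version(path: str) -> str:
    """Run '<path> --version' and return the first line of its output."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the version banner to stderr instead of stdout
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print(path, "->", tesseract_version(path))
```

If the path comes back as None, either Tesseract is not installed or the PATH change has not taken effect yet (a new CMD window is usually needed).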

problem solved

code demo

Extract image text

from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

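# Load the model
# PaddleOCR currently supports Chinese+English, English, French, German, Korean and Japanese;
# switch between them by changing the lang parameter
# Values for lang, in the same order: `ch`, `en`, `french`, `german`, `korean`, `japan`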
ocr = PaddleOCR(lang="ch",
                use_gpu=False,
                det_model_dir="../../paddleORC_model/ch_ppocr_server_v2.0_det_infer/",
                cls_model_dir="../ch_ppocr_mobile_v2.0_cls_infer/",
                rec_model_dir="../ch_ppocr_server_v2.0_rec_infer/")

# Run OCR on the image and print each result line
img_path = 'image2.png'
result = ocr.ocr(img_path)
for line in result:
    print(line)

Problem: the heart symbol is not extracted
Solution: use pytesseract

import pytesseract
from PIL import Image

# Read the image
img = Image.open('image2.png')
# Point pytesseract at the Tesseract executable (adjust to your own install path)
pytesseract.pytesseract.tesseract_cmd = r'D:\Nlp_Room\Tesseract-OCR\tesseract.exe'
# Recognize the text in the image with pytesseract
text = pytesseract.image_to_string(img)

# Print the recognition result
print(text)

The symbols can then be counted from the final output.
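
Counting could look like the sketch below. The helper count_symbols and the sample string are made up for illustration; which character Tesseract actually emits for a heart or a star depends on the image and the language data, so the symbols passed in are an assumption to adjust against your own output.

```python
from collections import Counter


def count_symbols(text: str, symbols: str) -> dict:
    """Count how often each character in `symbols` occurs in `text`."""
    counts = Counter(text)
    # Counter returns 0 for characters that never appear
    return {symbol: counts[symbol] for symbol in symbols}


# Hypothetical sample standing in for the string returned by image_to_string
sample = "♥ A ★ B ♥\n★ ♥"
print(count_symbols(sample, "♥★"))  # {'♥': 3, '★': 2}
```

In practice, pass the text variable from the recognition step instead of the sample string.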

Origin blog.csdn.net/qq_42700796/article/details/130358584