Recognizing special characters in images with OCR, such as a five-pointed star (★) or a heart (♥)
Error: Could not find a version that satisfies the requirement pytesseract
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
Foreword
1. Solution: pytesseract
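ModuleNotFoundError: No module named 'pytesseract'
Running pip install pytesseract failed repeatedly, so install it from the source package instead.
Open the package index page in a browser: Links for pytesseract (pypi.org)
If clicking a link does not start the download, press F12 to open the browser's developer tools and copy the download link from the Elements panel.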
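As an alternative to digging through the page source, the link to the source package can also be read from PyPI's JSON API (https://pypi.org/pypi/<project>/<version>/json), which lists every file of a release together with its packagetype and url. The following is a minimal sketch, assuming the machine can reach pypi.org; the helper names are hypothetical and not part of any library.

```python
import json
import urllib.request
from typing import Optional, Tuple


def pypi_release_url(project: str, version: str) -> str:
    """Build the PyPI JSON API URL for one release of a project."""
    return f"https://pypi.org/pypi/{project}/{version}/json"


def find_sdist(metadata: dict) -> Optional[Tuple[str, str]]:
    """Return (filename, url) of the source distribution (.tar.gz), or None if there is none."""
    for file_info in metadata.get("urls", []):
        if file_info.get("packagetype") == "sdist":
            return file_info["filename"], file_info["url"]
    return None


def fetch_release(project: str, version: str, timeout: float = 10.0) -> dict:
    """Download and decode the release metadata (requires network access)."""
    with urllib.request.urlopen(pypi_release_url(project, version), timeout=timeout) as resp:
        return json.load(resp)
```

For example, find_sdist(fetch_release("pytesseract", "0.3.7")) should return the file name and direct download URL of pytesseract-0.3.7.tar.gz, which can then be opened in the browser.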
- Download pytesseract-0.3.7.tar.gz.
- Put it in "python installation path"\Lib\site-packages\ and unzip it there.
- Enter the extracted pytesseract folder, which contains setup.py.
- Open cmd in that folder and run the command:
  python setup.py install
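After the install finishes, a quick way to confirm that the ModuleNotFoundError is gone is to ask Python's import system whether it can locate the module, without actually importing it. A minimal sketch using only the standard library; the helper name is hypothetical.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module of this name on sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Report the two packages the OCR code in this article needs.
for name in ("pytesseract", "PIL"):
    status = "found" if is_importable(name) else "NOT found (check the install step)"
    print(f"{name}: {status}")
```

Note that this only checks that something named pytesseract is on sys.path; an unzipped source folder left in site-packages can also satisfy the lookup, so a real import is still the final test.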
2. Solution: Tesseract
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
Cause:
Only the pytesseract library was installed with pip; the Tesseract OCR engine it depends on was not. pytesseract is a wrapper that calls the tesseract executable, so Tesseract itself must be downloaded and installed, and its installation directory added to the PATH environment variable.
Check the version
Open CMD and run tesseract --version to confirm the installation:
C:\Users\Administrator>tesseract --version
tesseract v5.0.1.20220107
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
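The same check can be scripted from Python: look the executable up on PATH (the same lookup the shell performs when you type tesseract), run tesseract --version, and read the version from the first line. This is a minimal sketch; the helper names are hypothetical, and the regular expression assumes the first line looks like the output above (tesseract v5.0.1.20220107), with the leading v treated as optional because builds differ.

```python
import re
import shutil
import subprocess
from typing import Optional

# Assumed format of the first output line, e.g. "tesseract v5.0.1.20220107".
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+(?:\.\d+)*)", re.IGNORECASE)


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version number from `tesseract --version` output."""
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    match = _VERSION_RE.search(first_line)
    return match.group(1) if match else None


def tesseract_version() -> Optional[str]:
    """Return the installed Tesseract version, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        # Same situation as TesseractNotFoundError: not installed or not on PATH.
        return None
    # Some Tesseract builds may print version info to stderr, so merge it into stdout.
    proc = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True, errors="replace")
    return parse_tesseract_version(proc.stdout)
```

If tesseract_version() returns None even though Tesseract is installed, the installation directory is most likely missing from PATH in the environment that launched Python.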
Problem solved.
Code demo
Extracting text from an image with PaddleOCR
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image
# Load the models
# PaddleOCR currently supports Chinese and English, English, French, German, Korean and Japanese; switch languages with the lang parameter
# The corresponding lang values are `ch`, `en`, `french`, `german`, `korean`, `japan`
ocr = PaddleOCR(lang="ch",
                use_gpu=False,
                det_model_dir="../../paddleORC_model/ch_ppocr_server_v2.0_det_infer/",
                cls_model_dir="../ch_ppocr_mobile_v2.0_cls_infer/",
                rec_model_dir="../ch_ppocr_server_v2.0_rec_infer/")
# Load the image
img_path = 'image2.png'
result = ocr.ocr(img_path)
# Print each recognized entry
for line in result:
    print(line)
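Printing raw entries makes it hard to see what was actually recognized. The sketch below flattens the result into (text, score) pairs. It assumes the PaddleOCR 2.x layout, where each detected line is [box, (text, score)] and newer 2.x releases wrap the lines in one list per image; other major versions may return a different structure, so check it against your installed version. The helper names are hypothetical.

```python
from typing import Any, List, Tuple


def _is_line(item: Any) -> bool:
    """True if item looks like one detected line: [box, (text, score)] (assumed 2.x layout)."""
    return (
        isinstance(item, (list, tuple))
        and len(item) == 2
        and isinstance(item[1], (list, tuple))
        and len(item[1]) == 2
        and isinstance(item[1][0], str)
    )


def extract_texts(result: Any, min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Flatten an ocr.ocr() result into (text, score) pairs at or above min_score."""
    pairs: List[Tuple[str, float]] = []
    for item in result or []:
        # Accept both shapes: a bare line, or a per-image list of lines (None if empty).
        lines = [item] if _is_line(item) else (item or [])
        for line in lines:
            if not _is_line(line):
                continue
            text, score = line[1]
            if score >= min_score:
                pairs.append((text, float(score)))
    return pairs
```

For the demo above, print(extract_texts(result)) lists each recognized string with its confidence; with min_score left at 0.0 this shows whether a symbol such as the heart was missed entirely or only recognized with low confidence.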
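Problem: the PaddleOCR code above cannot extract the heart.
Solution: recognize the image with pytesseract instead:
import pytesseract
from PIL import Image
# Read the image
img = Image.open('image2.png')
# Path to tesseract.exe; only needed if the Tesseract folder is not on PATH
pytesseract.pytesseract.tesseract_cmd = r'D:\Nlp_Room\Tesseract-OCR\tesseract.exe'
# Recognize the text in the image with pytesseract
text = pytesseract.image_to_string(img)
# Print the recognition result
print(text)
With Tesseract, the heart is finally recognized.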
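To check mechanically whether the special characters survived OCR, the recognized string can be scanned for the expected symbols. This is a minimal sketch; the look-alike table is a hypothetical assumption (an engine may emit U+2764 ❤ instead of U+2665 ♥, or append the emoji variation selector U+FE0F), so adjust it to the symbols in your own images.

```python
import unicodedata

# Hypothetical table of code points treated as the same symbol for this check.
LOOKALIKES = {
    "\u2764": "\u2665",  # HEAVY BLACK HEART -> BLACK HEART SUIT
}


def normalize_symbols(text: str) -> str:
    """Apply NFC, drop the emoji variation selector, and map look-alike symbols."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\ufe0f", "")  # VARIATION SELECTOR-16
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


def missing_symbols(text: str, expected: str = "\u2605\u2665") -> list:
    """Return the expected symbols (default: star and heart) not found in text."""
    normalized = normalize_symbols(text)
    return [ch for ch in expected if ch not in normalized]
```

For example, missing_symbols(text) on the pytesseract output above returns an empty list when both the star and the heart were recognized, and otherwise lists the symbols that are still missing.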