Calling free Chinese character recognition models from Python: a roundup

Several free models for recognizing Chinese characters are now available, including:

Tesseract OCR: An open-source, free OCR engine that can recognize Chinese text.
EasyOCR: An open-source, free OCR library that covers multiple languages and character sets, including Chinese characters.
PaddleOCR: Developed by PaddlePaddle, it supports multiple languages and character sets, including Chinese characters. It is open source and free.
CRNN: A convolutional recurrent neural network, which combines convolutional layers with a bidirectional long short-term memory (LSTM) network. It can be used for text recognition, including Chinese characters, and open-source implementations are freely available.

Note that the performance and accuracy of these models depend on many factors, such as image quality, font, and layout, so choose a model according to your specific needs and application scenario.
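One way to make that choice concrete is to run each engine on a few images whose text you already know and compare the character error rate. The following is a minimal sketch in pure Python; the engine names and output strings are made up for illustration, and in practice they would come from the OCR calls shown in this article.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: inserts, deletes, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    """Edit distance divided by the reference length; 0.0 is a perfect match."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(predicted, reference) / len(reference)


# Hypothetical outputs from two engines for the same image.
reference = "欢迎使用文字识别"
outputs = {"engine_a": "欢迎使用文字识别", "engine_b": "欢迎便用文宇识别"}
for name, text in outputs.items():
    print(name, character_error_rate(text, reference))
```

A lower value is better. Averaging the rate over a representative set of your own images gives a fairer comparison than a single picture.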

The following code examples show how to call each of the above models from Python:

Tesseract OCR:

First, install the Tesseract OCR engine together with its Simplified Chinese language data (chi_sim) and the pytesseract library, then call it from Python as follows:

import pytesseract
from PIL import Image

# Read the image
image = Image.open('sample.jpg')

# Recognize the text in the image (chi_sim = Simplified Chinese)
text = pytesseract.image_to_string(image, lang='chi_sim')

# Print the result
print(text)
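The recognized string may contain spaces between adjacent Chinese characters as well as empty lines. The sketch below shows one possible cleanup step using only the standard library; the function name tidy_chinese_text is an assumption for illustration, and the pattern only covers the basic CJK block (U+4E00 to U+9FFF).

```python
import re

# Whitespace that sits directly between two CJK ideographs (basic block only).
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])[ \t]+(?=[\u4e00-\u9fff])")


def tidy_chinese_text(raw: str) -> str:
    """Remove gaps between Chinese characters and drop empty lines."""
    lines = (_CJK_GAP.sub("", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


print(tidy_chinese_text("你 好 世 界\n\n Python 3 教 程 \n"))
```

Spaces next to Latin letters or digits are left alone, so mixed Chinese and English text keeps its word boundaries.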

EasyOCR:

Install the EasyOCR library, then call it from Python as follows:

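import easyocr

# Load the model (ch_sim = Simplified Chinese)
reader = easyocr.Reader(['ch_sim'])

# Path to the image
image = 'sample.jpg'

# Recognize the text in the image
results = reader.readtext(image)

# Print the results; each entry is (bounding box, text, confidence)
for result in results:
    print(result[1])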
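With its default settings, readtext returns one (bounding box, text, confidence) entry per detected text region, which is why the loop above prints result[1]. As a minimal sketch, low-confidence regions can be filtered out before the text is joined. The helper name, the 0.5 threshold, and the sample data are assumptions for illustration rather than part of EasyOCR.

```python
def confident_text(results, min_confidence=0.5):
    """Join the text of all regions whose confidence reaches the threshold."""
    kept = [text for _bbox, text, confidence in results
            if confidence >= min_confidence]
    return "\n".join(kept)


# Hand-written sample in the same shape as the readtext output.
sample = [
    ([[0, 0], [80, 0], [80, 20], [0, 20]], "你好", 0.98),
    ([[0, 30], [80, 30], [80, 50], [0, 50]], "世界", 0.31),
]
print(confident_text(sample))
```

A suitable threshold depends on your images, so it is worth checking a few results by hand before fixing a value.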

PaddleOCR:

Install the PaddleOCR library, then call it from Python as follows:

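from paddleocr import PaddleOCR

# Load the model (lang='ch' selects the Chinese models; PaddleOCR 2.x API)
ocr = PaddleOCR(lang='ch')

# Path to the image
image = 'sample.jpg'

# Extract the text from the image
results = ocr.ocr(image)

# Print the results: one list per image, each entry is [box, (text, confidence)]
for line in results:
    for word in line or []:
        print(word[1][0])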
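The nested result can be awkward to work with directly. The sketch below flattens it into (text, confidence) pairs, assuming the PaddleOCR 2.x layout of one list per image whose entries are [box, (text, confidence)]. Other releases may return a different structure, so treat the layout and the helper name as assumptions to verify against your installed version.

```python
def paddle_lines(results):
    """Flatten a 2.x-style PaddleOCR result into (text, confidence) pairs."""
    lines = []
    for page in results:
        if not page:  # an image with no detected text may yield None
            continue
        for _box, (text, confidence) in page:
            lines.append((text, confidence))
    return lines


# Hand-written sample in the assumed layout: two images, the second one empty.
sample = [
    [
        [[[0, 0], [80, 0], [80, 20], [0, 20]], ("你好", 0.99)],
        [[[0, 30], [80, 30], [80, 50], [0, 50]], ("世界", 0.87)],
    ],
    None,
]
for text, confidence in paddle_lines(sample):
    print(text, confidence)
```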

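CRNN:

CRNN is an architecture rather than a single package, so first install TensorFlow and Keras along with a CRNN implementation of your choice. The crnn module below stands in for that implementation, and its class and method names will differ between projects: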

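# Placeholder import: replace with the module of the CRNN implementation you use
from crnn import crnn

# Initialize the model
model = crnn.CRNN()

# Path to the image
image = 'sample.jpg'

# Recognize the text in the image
text = model.predict(image)

# Print the result
print(text)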

Before running each example, replace the value of the image variable with the path to your own image. Each of the libraries in these four examples also offers further options and parameters that can be customized; see their official documentation for details.
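When there are many images, the path can be supplied by a loop instead of being edited by hand. The following sketch applies any recognizer to every image in a folder. The function name recognize_folder, the suffix list, and the recognize callback (path string in, text out) are assumptions for illustration.

```python
from pathlib import Path

# File extensions treated as images; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def recognize_folder(folder, recognize):
    """Apply `recognize` (path string -> text) to every image in `folder`.

    Returns a dict mapping file name to recognized text, in name order.
    """
    texts = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            texts[path.name] = recognize(str(path))
    return texts
```

For the Tesseract example, the callback could be lambda p: pytesseract.image_to_string(Image.open(p), lang='chi_sim'). For EasyOCR or PaddleOCR, create the reader once outside the callback so the model is not reloaded for each file.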

These and similar models can also be called from Python as follows:

For EasyOCR ¹, install it with pip, then create a reader object and use it to recognize the text in the image:

import easyocr
reader = easyocr.Reader(['ch_sim','en']) # specify languages
result = reader.readtext('chinese.jpg') # read text from image

For Handwriting-Chinese-Characters-Recognition ², download it from GitHub, then load the model and use it to recognize handwritten Chinese characters:

import tensorflow as tf

model = tf.keras.models.load_model('model.h5')  # load model
# Load as grayscale and resize to 64x64 so that the reshape below is valid
image = tf.keras.preprocessing.image.load_img('handwriting.jpg', color_mode='grayscale', target_size=(64, 64))
image = tf.keras.preprocessing.image.img_to_array(image)  # convert image to array
image = image.reshape(1, 64, 64, 1)  # add the batch dimension
prediction = model.predict(image)  # predict character
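The prediction is an array of scores, not a character. Assuming the model outputs one probability per class and that you have the label list in the same order used for training (both are assumptions to check against the project), the most likely characters can be read off as in this sketch. The labels and probabilities here are made up for illustration.

```python
def top_k_characters(probabilities, labels, k=3):
    """Return the k most likely (character, probability) pairs, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one probability per label")
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical label list and model output for a four-class model.
labels = ["一", "二", "三", "十"]
probabilities = [0.05, 0.10, 0.70, 0.15]
print(top_k_characters(probabilities, labels, k=2))
```

With the model above, the input would be prediction[0].tolist(), since predict returns one row per image in the batch.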

For Scanner & Translator ³, which is available from the App Store, the following code calls its API to recognize and translate the text in a picture. Check the endpoint, token, and response fields against the service's own documentation before relying on them:

import requests

url = 'https://api.scanner-translator.com/v1/ocr'  # API URL
headers = {'Authorization': 'Bearer <your_token>'}  # API token
params = {'lang': 'zh-CN'}  # language code

# Open the image in a with-block so the file is closed after the upload
with open('chinese.jpg', 'rb') as image_file:
    response = requests.post(url, headers=headers, files={'file': image_file}, params=params)  # send request

response.raise_for_status()  # stop here on an HTTP error
data = response.json()  # get response data
text = data['text']  # get text from data
translation = data['translation']  # get translation from data
