0. Preface: Introduction to Baidu OCR Text Recognition
Baidu OCR text recognition is a service from Baidu that uses artificial intelligence to recognize the text in a picture and convert it into editable text.
It can be applied in many fields: recognizing documents such as ID cards, bank cards, and business licenses, reading license plate numbers, reading barcodes and QR codes in pictures, and so on.
Baidu OCR text recognition has the following characteristics:
- High accuracy: uses multi-layer deep neural networks to achieve high text-recognition accuracy.
- Multilingual support: recognizes text in multiple languages, including Chinese, English, Japanese, and Korean.
- Image preprocessing: for pictures taken in poor conditions, or that are blurry or tilted, preprocessing the image can improve recognition accuracy.
- Rich scene support: covers text recognition for different scenarios, including documents, license plates, QR codes, and bills.
- Multi-platform support: provided as an API-based service that can be integrated into application platforms such as web pages and mobile apps.
Baidu OCR text recognition has a wide range of application scenarios across many industries.
In the financial industry, it can be used to quickly read bank cards, ID cards, and other certificate information; in the logistics industry, it can be used to read the waybill number on a parcel; in the retail industry, it can be used to read product barcodes.
Text recognition can improve work efficiency and accuracy and reduce the cost and risk of manual data entry.
Overall, Baidu OCR offers accurate and reliable text recognition and processing, and supports digital transformation and intelligent applications in various industries.
OS: Windows 10 Home Edition
Development environment: PyCharm Community 2022.3
Python interpreter version: Python 3.8
1. Claim the free recognition resources
Go here to register a free account:
Baidu text recognition
Click "Use Now",
then go to claim the free resources.
Don't claim all of them: some have expiry dates, so you may not get to use them up before they lapse.
The ones I left unselected all come with a 365-day usage limit.
Claimed successfully.
2. Create an instance
Go to create one.
Choose "Personal" as the attribution, then click "Create Now".
Keep this page open afterwards; you will need it later.
You can read the API documentation here to learn how to use the service.
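The application page lists an API Key and a Secret Key, which the demo in section 3 expects. If you would rather not paste them into the source file, one option is to read them from environment variables. This is only a minimal sketch: the variable names BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY and the helper load_credentials are names I made up for illustration, not anything defined by Baidu.

```python
import os


def load_credentials(env=os.environ):
    """Return (api_key, secret_key) read from environment variables.

    BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY are hypothetical names
    chosen for this sketch; Baidu does not define them.
    """
    api_key = env.get("BAIDU_OCR_API_KEY", "").strip()
    secret_key = env.get("BAIDU_OCR_SECRET_KEY", "").strip()
    if not api_key or not secret_key:
        # Fail early with a clear message instead of sending empty keys
        raise RuntimeError(
            "set BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY first")
    return api_key, secret_key
```

With this in place, the two constants in the demo could be filled with `API_KEY, SECRET_KEY = load_credentials()` instead of literal strings.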
3. Code testing
Now let's use it. The following is the official demo; you only need to fill in your own API Key and Secret Key:
main.py
# coding=utf-8
import sys
import json
import base64

# Stay compatible with both Python 2 and Python 3
IS_PY3 = sys.version_info.major == 3
if IS_PY3:
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import URLError
    from urllib.parse import urlencode
    from urllib.parse import quote_plus
else:
    import urllib2
    from urllib import quote_plus
    from urllib2 import urlopen
    from urllib2 import Request
    from urllib2 import URLError
    from urllib import urlencode

# Work around HTTPS certificate verification failures.
# Note: this turns certificate verification off for the whole process.
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

API_KEY = 'replace with your own'
SECRET_KEY = 'replace with your own'

OCR_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"

# ---- TOKEN start ----
TOKEN_URL = 'https://aip.baidubce.com/oauth/2.0/token'


def fetch_token():
    """Fetch an access token."""
    params = {
        'grant_type': 'client_credentials',
        'client_id': API_KEY,
        'client_secret': SECRET_KEY}
    post_data = urlencode(params)
    if IS_PY3:
        post_data = post_data.encode('utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        f = urlopen(req, timeout=5)
        result_str = f.read()
    except URLError as err:
        # Without a response there is nothing to parse, so stop here
        print(err)
        sys.exit(1)
    if IS_PY3:
        result_str = result_str.decode()

    result = json.loads(result_str)

    if 'access_token' in result and 'scope' in result:
        if 'brain_all_scope' not in result['scope'].split(' '):
            print('please ensure has check the ability')
            sys.exit(1)
        return result['access_token']
    else:
        print('please overwrite the correct API_KEY and SECRET_KEY')
        sys.exit(1)


def read_file(image_path):
    """Read a file and return its bytes, or None on failure."""
    f = None
    try:
        f = open(image_path, 'rb')
        return f.read()
    except IOError:
        print('read image file fail')
        return None
    finally:
        if f:
            f.close()


def request(url, data):
    """Call the remote service; return the response text, or None on failure."""
    req = Request(url, data.encode('utf-8'))
    try:
        f = urlopen(req)
        result_str = f.read()
        if IS_PY3:
            result_str = result_str.decode()
        return result_str
    except URLError as err:
        print(err)
        return None


if __name__ == '__main__':
    # Get the access token
    token = fetch_token()

    # Build the URL of the high-accuracy general text recognition endpoint
    image_url = OCR_URL + "?access_token=" + token

    text = ""

    # Read the test image
    file_content = read_file('./text.png')
    if file_content is None:
        sys.exit(1)

    # Call the text recognition service
    result = request(image_url, urlencode({
        'image': base64.b64encode(file_content)}))
    if result is None:
        sys.exit(1)

    # Parse the response
    result_json = json.loads(result)
    for words_result in result_json["words_result"]:
        text = text + words_result["words"]

    # Print the recognized text
    print(text)
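To see the request step in isolation, here is a minimal Python 3 sketch of how the URL and the POST body are put together before urlopen is called. It mirrors what the demo does (access token in the query string, base64-encoded image in a form-encoded body) but does no network I/O. build_ocr_request is a helper name I made up for illustration; it is not part of the official demo.

```python
import base64
from urllib.parse import urlencode

OCR_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"


def build_ocr_request(token, image_bytes):
    """Return (url, body) for one OCR call; hypothetical helper, no I/O."""
    # The access token travels in the query string
    url = OCR_URL + "?" + urlencode({"access_token": token})
    # The image is base64-encoded, then form-encoded so that '+', '/'
    # and '=' in the base64 text survive transport
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    body = urlencode({"image": image_b64}).encode("ascii")
    return url, body
```

The returned pair can be passed straight to `Request(url, body)`. Keeping this step free of network calls makes it easy to check the encoding on its own.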
This is my picture:
text.png
4. Recognition results
Running main.py prints the text recognized in text.png.
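The demo glues all recognized lines together without a separator and assumes the call succeeded. As a rough sketch of a slightly more careful way to handle the response, the helper below keeps one recognized line per output line and raises if the service reports an error. It assumes that failures come back as JSON with error_code and error_msg fields; check the API documentation for the exact format. extract_text and the SAMPLE payload are made up for illustration and are not real output.

```python
import json

# Made-up payload in the shape the demo parses ("words_result" -> "words")
SAMPLE = '{"words_result": [{"words": "Hello"}, {"words": "World"}]}'


def extract_text(response_str):
    """Return recognized lines joined by newlines; hypothetical helper."""
    payload = json.loads(response_str)
    if "error_code" in payload:
        # Assumed error shape: {"error_code": ..., "error_msg": ...}
        raise RuntimeError("OCR failed: %s %s" % (
            payload["error_code"], payload.get("error_msg", "")))
    return "\n".join(item["words"] for item in payload.get("words_result", []))
```

In the demo, `print(extract_text(result))` could replace the parsing loop at the end of main.py.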