1. Register on the Baidu AI Open Platform, enter the console and create an application
1.1. Log in to the Baidu AI Open Platform
Baidu AI Open Platform: https://ai.baidu.com
You can also log in directly with an existing Baidu account.
1.2. Enter the console: Text Recognition
After entering the console, you can choose among the services the platform provides, such as text recognition, speech recognition and face recognition. This example uses text recognition: select it to open the Text Recognition console overview.
1.3. Create an application
The console overview shows the platform's step-by-step guide. Follow it in order, starting with claiming the free resource quota, which is sufficient for personal testing.
Accounts that have completed personal identity verification receive a monthly quota of free calls, which is enough for small-scale personal use. If you need more, you can purchase additional calls.
Before first use, you need to create an application: fill in the required information and submit the form to create it.
After the application is created, you can view its details in the application list under the public cloud service section of the Text Recognition console. Each application is assigned a unique AppID, API Key and Secret Key; these credentials are required to call the baidu-aip interface.
1.4. View help documentation
The help documentation describes how to use each API. Below, general text recognition is used as an example to show how to use the baidu-aip library in Python.
Help document: https://cloud.baidu.com/doc/OCR/s/Ck3h7y2ia
2. Install the baidu-aip library for Python
In code the library is imported as aip, but the package to install is named baidu-aip; many people get this wrong during installation. The command below installs it from the Tsinghua PyPI mirror:
pip install baidu-aip -i https://pypi.tuna.tsinghua.edu.cn/simple
You can also search for and install the package from within an IDE such as PyCharm.
3. Two ways to call the API and get text recognition results
3.1. Using the AipOcr client
3.1.1. Create an AipOcr client
AipOcr is the Python SDK client for OCR; it provides a set of methods for developers who use the OCR service.
Create an AipOcr client with the following code:
from aip import AipOcr
# Your App ID, API Key and Secret Key, from the application list in the console
APP_ID = 'your App ID'
API_KEY = 'your API Key'
SECRET_KEY = 'your Secret Key'
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
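Hard-coding the keys as above is fine for a quick test. A common alternative is to read them from environment variables so they stay out of source code. The sketch below does that; the helper load_credentials and the variable names BAIDU_OCR_APP_ID, BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY are hypothetical choices, not part of baidu-aip.

```python
import os


def load_credentials(prefix="BAIDU_OCR_"):
    """Read (App ID, API Key, Secret Key) from environment variables.

    Hypothetical helper: the names PREFIX + APP_ID / API_KEY / SECRET_KEY
    are an illustrative convention, not something baidu-aip requires.
    """
    names = ("APP_ID", "API_KEY", "SECRET_KEY")
    missing = [prefix + n for n in names if not os.environ.get(prefix + n)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(os.environ[prefix + n] for n in names)


# Usage (requires baidu-aip to be installed):
# from aip import AipOcr
# APP_ID, API_KEY, SECRET_KEY = load_credentials()
# client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
```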
3.1.2. Configure AipOcr
If you need to configure AipOcr's network request parameters (usually unnecessary), call the corresponding setter after constructing the client. Currently only the following parameters are supported:
Interface | Description
setConnectionTimeoutInMillis | Timeout for establishing a connection (milliseconds)
setSocketTimeoutInMillis | Timeout for transferring data over an open connection (milliseconds)
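The two setters above are called on the client after it is constructed. A minimal sketch, where configure_timeouts is a hypothetical wrapper and the timeout values are arbitrary examples rather than recommended defaults; it accepts any object with those two methods, so it can be exercised without network access.

```python
def configure_timeouts(client, connect_ms=5000, socket_ms=30000):
    """Set both network timeouts on an AipOcr client and return it.

    Hypothetical helper; the values are illustrative, not SDK defaults.
    """
    # Timeout for establishing the connection, in milliseconds
    client.setConnectionTimeoutInMillis(connect_ms)
    # Timeout for transferring data over the open connection, in milliseconds
    client.setSocketTimeoutInMillis(socket_ms)
    return client


# Usage (requires baidu-aip):
# from aip import AipOcr
# client = configure_timeouts(AipOcr(APP_ID, API_KEY, SECRET_KEY))
```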
3.1.3. Recognize all text in an image
Call the recognition methods on the AipOcr object; the available method names are listed in the interface documentation.
Interface description: https://cloud.baidu.com/doc/OCR/s/7kibizyfm
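# Optional parameters
options = {}
options["language_type"] = "CHN_ENG"
options["detect_direction"] = "true"
options["detect_language"] = "true"
options["probability"] = "true"
# Read a local file as raw bytes; the SDK handles the base64 encoding
def get_file_content(file_path):
    with open(file_path, "rb") as fp:
        return fp.read()
image = get_file_content("example.jpg")  # hypothetical local image path
pdf_file = get_file_content("example.pdf")  # hypothetical local PDF path
url = "https://example.com/example.jpg"  # hypothetical image URL
# Call general text recognition (standard edition); each call returns a dictionary
res_image = client.basicGeneral(image, options)
res_url = client.basicGeneralUrl(url, options)
res_pdf = client.basicGeneralPdf(pdf_file, options)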
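The options dictionary uses the string values "true"/"false" listed for these parameters in section 4.1, not Python booleans. A small helper can build it from ordinary keyword arguments; build_options is hypothetical and not part of baidu-aip.

```python
def build_options(language_type="CHN_ENG", detect_direction=False,
                  detect_language=False, probability=False, paragraph=False):
    """Build the optional-parameter dict for the general OCR methods.

    Hypothetical helper: converts Python booleans into the "true"/"false"
    strings that the request parameters are documented to take.
    """
    flags = {
        "detect_direction": detect_direction,
        "detect_language": detect_language,
        "probability": probability,
        "paragraph": paragraph,
    }
    options = {"language_type": language_type}
    for name, enabled in flags.items():
        options[name] = "true" if enabled else "false"
    return options


# Usage: res_url = client.basicGeneralUrl(url, build_options(probability=True))
```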
3.1.4. Recognition results
Each method returns a dictionary; read the fields you need by key (see section 4.2 for the field list).
The example below recognizes a test image loaded from a URL:
# Recognize an image on the web
url = "https://img.zcool.cn/community/01a7195d65df7ca8012187f435d2b7.jpg@1280w_1l_2o_100sh.jpg"
# Standard edition
res_url = client.basicGeneralUrl(url)
# High-accuracy edition
# res_url = client.accurateUrl(url)
# The result is a dictionary
for key, value in res_url.items():
    print(key, ":", value)
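Usually only the recognized text is needed. Per section 4.2, each element of words_result carries one line in its words field. A minimal sketch, assuming that a failed call returns a dictionary with error_code and error_msg instead of words_result; extract_text is a hypothetical helper.

```python
def extract_text(result):
    """Return the recognized lines from an OCR result dictionary.

    Assumption: a failed call returns "error_code"/"error_msg" keys
    instead of "words_result".
    """
    if "error_code" in result:
        raise RuntimeError(f"OCR failed ({result['error_code']}): {result.get('error_msg', '')}")
    # Each element of words_result holds one recognized line under "words"
    return [item["words"] for item in result.get("words_result", [])]


# Usage: print("\n".join(extract_text(res_url)))
```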
3.2. Sending HTTP requests directly to the API endpoint
3.2.1. Obtain an access token
An access_token is obtained with your API Key and Secret Key.
Note: an access_token is valid for 30 days, so you need to obtain a new one at least every 30 days.
import requests
API_KEY = 'your API Key'
SECRET_KEY = 'your Secret Key'
# The access_token must be obtained with the API Key and Secret Key
host = 'https://aip.baidubce.com/oauth/2.0/token'
token_params = {'grant_type': 'client_credentials', 'client_id': API_KEY, 'client_secret': SECRET_KEY}
response = requests.get(host, params=token_params)
response.raise_for_status()  # stop early on an HTTP error, e.g. invalid keys
access_token = response.json()["access_token"]
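Because a token expires after 30 days, a long-running program can cache it and fetch a new one shortly before expiry. A minimal sketch, assuming the token response also carries an expires_in field in seconds (the usual OAuth 2.0 convention); TokenCache and fetch_baidu_token are hypothetical helpers, and the one-hour refresh margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Cache an access_token and refresh it shortly before it expires.

    fetch() must return (access_token, expires_in_seconds); clock defaults
    to time.monotonic and can be replaced in tests.
    """

    def __init__(self, fetch, margin=3600, clock=time.monotonic):
        self._fetch = fetch
        self._margin = margin  # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token


def fetch_baidu_token(api_key, secret_key):
    """Request a new token; assumes the JSON reply includes expires_in."""
    import requests  # imported lazily so TokenCache itself has no dependency
    resp = requests.get(
        "https://aip.baidubce.com/oauth/2.0/token",
        params={"grant_type": "client_credentials",
                "client_id": api_key, "client_secret": secret_key},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["access_token"], int(data["expires_in"])


# Usage:
# cache = TokenCache(lambda: fetch_baidu_token(API_KEY, SECRET_KEY))
# access_token = cache.get()
```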
3.2.2. Send a POST request to the API endpoint
The POST request URL must include the access_token query parameter.
Other request parameters (see section 4.1) are sent in the form-encoded request body.
# General text recognition, high-accuracy edition: endpoint URL
request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"
# Append the access_token parameter
request_url = request_url + "?access_token=" + access_token
headers = {'content-type': 'application/x-www-form-urlencoded'}
# Request parameters
url = "https://img.zcool.cn/community/01a7195d65df7ca8012187f435d2b7.jpg@1280w_1l_2o_100sh.jpg"
params = {"url": url, "language_type": "ENG"}
# Returns a requests.models.Response object
result = requests.post(request_url, data=params, headers=headers)
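The request above passes an image URL. To send a local image instead, section 4.1 says the image field carries data that is base64-encoded and then urlencoded. A minimal sketch, where image_payload and the file name are hypothetical: the helper performs the base64 step, and requests performs the urlencoding itself when a dictionary is passed as data=.

```python
import base64


def image_payload(image_bytes, **extra):
    """Build the form body for a POST that sends a local image.

    base64 encoding happens here; requests urlencodes the dict when it
    is passed as data=..., which covers the second encoding step.
    """
    body = {"image": base64.b64encode(image_bytes).decode("ascii")}
    body.update(extra)  # other request parameters, e.g. language_type="ENG"
    return body


# Usage:
# with open("example.jpg", "rb") as f:  # hypothetical local file
#     data = image_payload(f.read(), language_type="ENG")
# result = requests.post(request_url, data=data, headers=headers)
```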
3.2.3. Recognition results
requests.post returns a requests.models.Response object. Call its json() method to get a dictionary, then read the fields you need by key.
if result:  # a Response is truthy when its HTTP status code is below 400
    for key, value in result.json().items():
        print(key, ":", value)
4. Others
4.1. Details of general text recognition request parameters
Parameter | Type | Allowed values | Description
image / url / pdf_file (required; provide exactly one) | string | - | image: image data, base64-encoded and then urlencoded. url: the full URL of the image, no longer than 1024 bytes. pdf_file: the PDF file, base64-encoded and then urlencoded. After base64 encoding and urlencoding the image must not exceed 4M; its shortest side must be at least 15px and its longest side at …
pdf_file_num | string | - | Page number of the PDF page to recognize. Takes effect only when pdf_file is provided; if omitted, the first page is recognized.
language_type | string | CHN_ENG: mixed Chinese and English; ENG: English; JAP: Japanese; KOR: Korean; FRE: French; SPA: Spanish; POR: Portuguese; GER: German; ITA: Italian; RUS: Russian | Language to recognize; default CHN_ENG
detect_direction | string | true: detect orientation; false: do not detect | Whether to detect image orientation; default false. Orientation means whether the input image is upright or rotated 90, 180 or 270 degrees counterclockwise.
detect_language | string | true/false | Whether to detect the language; default false. Currently supports Chinese, English, Japanese and Korean.
paragraph | string | true/false | Whether to output paragraph information
probability | string | true/false | Whether to return the confidence of each line in the recognition result
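Several of these constraints can be checked locally before a request is sent. A minimal sketch with a hypothetical validate_request helper; it assumes "4M" means 4 MiB, expects image or pdf_file to be already base64-encoded strings, and does not check pixel dimensions, since that would require decoding the image.

```python
from urllib.parse import quote_plus

SOURCE_KEYS = ("image", "url", "pdf_file")
LANGUAGES = {"CHN_ENG", "ENG", "JAP", "KOR", "FRE", "SPA", "POR", "GER", "ITA", "RUS"}
FLAG_KEYS = ("detect_direction", "detect_language", "paragraph", "probability")
MAX_ENCODED_BYTES = 4 * 1024 * 1024  # assumption: "4M" read as 4 MiB


def validate_request(params):
    """Return a list of problems found in a request dict (empty if none)."""
    problems = []
    # Exactly one input source must be supplied
    sources = [k for k in SOURCE_KEYS if params.get(k)]
    if len(sources) != 1:
        problems.append("exactly one of image, url, pdf_file is required")
    if "url" in params and len(params["url"].encode("utf-8")) > 1024:
        problems.append("url longer than 1024 bytes")
    # Size limit applies after base64 encoding and urlencoding
    for key in ("image", "pdf_file"):
        if key in params and len(quote_plus(params[key])) > MAX_ENCODED_BYTES:
            problems.append(key + " exceeds 4M after base64 + urlencoding")
    lang = params.get("language_type", "CHN_ENG")
    if lang not in LANGUAGES:
        problems.append("unsupported language_type: " + str(lang))
    # Flags are strings, not Python booleans
    for key in FLAG_KEYS:
        if key in params and params[key] not in ("true", "false"):
            problems.append(key + ' must be "true" or "false"')
    return problems
```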
4.2. General text recognition response field details
Field | Required | Type | Description
direction | no | int32 | Image orientation; returned when detect_direction=true
log_id | yes | uint64 | Unique log ID, used to locate problems
words_result_num | yes | uint32 | Number of recognition results, i.e. the number of elements in words_result
words_result | yes | array[] | Array of recognition results
+ words | no | string | Recognized text string
+ probability | no | object | Confidence of the line: average (mean confidence), variance (confidence variance), min (minimum confidence); returned when probability=true
paragraphs_result | no | array[] | Paragraph detection results; returned when paragraph=true
+ words_result_idx | no | array[] | Indices of the lines (in words_result) that make up the paragraph; returned when paragraph=true
language | no | int32 | Detected language; returned when detect_language=true
pdf_file_size | no | string | Total number of pages in the submitted PDF; returned when the pdf_file parameter is valid
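The nested fields above can be combined: with paragraph=true each paragraph lists the lines it contains, and with probability=true each line carries confidence statistics. A minimal sketch, assuming words_result_idx holds 0-based indices into words_result; paragraphs_from_result and average_confidence are hypothetical helpers.

```python
def paragraphs_from_result(result, sep="\n"):
    """Rebuild paragraph texts from a result requested with paragraph=true.

    Assumption: words_result_idx contains 0-based indices into words_result.
    """
    lines = [item["words"] for item in result.get("words_result", [])]
    return [sep.join(lines[i] for i in para["words_result_idx"])
            for para in result.get("paragraphs_result", [])]


def average_confidence(result):
    """Mean of the per-line "average" confidences (needs probability=true).

    Returns None when no line carries a probability object.
    """
    scores = [item["probability"]["average"]
              for item in result.get("words_result", []) if "probability" in item]
    return sum(scores) / len(scores) if scores else None
```

For Chinese text, passing sep="" joins the lines of a paragraph without line breaks.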