Using Baidu OCR General Text Recognition

Table of contents

1. Register a Baidu Smart Cloud account on the Baidu AI Open Platform, enter the console and create an application

1.1. Log in to the Baidu AI Open Platform

1.2. Enter the console: text recognition

1.3. Create an application

1.4. View the help documentation

2. Install the baidu-aip library for Python

3. Two ways to call the API and get the text recognition result

3.1. Interaction through AipOcr

3.1.1. Create a new AipOcr

3.1.2. Configure AipOcr

3.1.3. Request recognition of all the text in an image

3.1.4. Recognition results

3.2. Send a network request to the API service address

3.2.1. Obtain the access token

3.2.2. Send a POST request to the API service address

3.2.3. Recognition results

4. Others

4.1. General text recognition request parameters

4.2. General text recognition response fields


1. Register a Baidu Smart Cloud account on the Baidu AI Open Platform, enter the console and create an application

1.1. Log in to the Baidu AI Open Platform

Baidu AI Open Platform: https://ai.baidu.com

You can also log in to the platform directly with a Baidu account.

1.2. Enter the console: text recognition

After entering the console, you can choose among the services the platform provides, including text recognition, speech recognition and face recognition. This article uses text recognition as the example: select text recognition to open its console overview.

1.3. Create an application

The console overview shows the operation guide suggested by the platform. Following its order, first claim the free resources, which can be used for personal testing.

Accounts that have completed personal verification receive a monthly quota of free calls, which is enough for small-scale personal use. If you need more, you can purchase additional capacity.

Before first use you need to create an application. Fill in the requested information, then create the application.

After creating an application, you can check its details in the application list under the public cloud services section of the text recognition console. Once creation succeeds, a unique AppID, API Key and Secret Key are generated; these credentials are required for calling the baidu-aip interface.
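Because the AppID, API Key and Secret Key are secrets, one option is to keep them out of the source code. The following is a minimal sketch, assuming the three values are stored in environment variables. The variable names BAIDU_OCR_APP_ID, BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY and the helper load_credentials are hypothetical; the platform does not define them.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_NAMES = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")


def load_credentials(environ=os.environ):
    """Return (app_id, api_key, secret_key), or raise if any value is missing."""
    missing = [name for name in ENV_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in ENV_NAMES)
```

The returned tuple can then be unpacked into the three values that the later examples expect.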

1.4. View the help documentation

The help documentation describes how to use each API. The rest of this article uses general text recognition as the example to show how to work with Python's baidu-aip library.

Help document: https://cloud.baidu.com/doc/OCR/s/Ck3h7y2ia

2. Install the baidu-aip library for Python

In code the library is imported as aip, but the package to install is named baidu-aip. Installing under the wrong name is a common mistake.

pip install baidu-aip -i https://pypi.tuna.tsinghua.edu.cn/simple

Alternatively, you can search for and install the package from within an IDE such as PyCharm.
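To confirm that the installation worked, you can check whether the module name aip (not baidu-aip) can be found. This is a small sketch using only the standard library; the helper name is_importable is made up for illustration.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# The import name is "aip", even though the package is installed as "baidu-aip".
print("aip importable:", is_importable("aip"))
```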

3. Two ways to call the API and get the text recognition result

3.1. Interaction through AipOcr

3.1.1. Create a new AipOcr

AipOcr is the Python SDK client for OCR; it provides developers with a set of methods for interacting with the OCR service.

Create a new AipOcr as follows:

from aip import AipOcr

# Your APP ID, API Key (AK) and Secret Key (SK)
APP_ID = 'your App ID'
API_KEY = 'your Api Key'
SECRET_KEY = 'your Secret Key'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

3.1.2. Configure AipOcr

If you need to configure AipOcr's network request parameters (usually not necessary), call the following methods after constructing AipOcr. Currently only these parameters are supported:

| Interface | Description |
| --- | --- |
| setConnectionTimeoutInMillis | Timeout for establishing a connection (in milliseconds) |
| setSocketTimeoutInMillis | Timeout for transferring data over an open connection (in milliseconds) |
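The two setters can be applied together right after the client is constructed. The sketch below wraps them in a helper; the name apply_timeouts and the default values are illustrative assumptions, and only the two method names come from the table.

```python
def apply_timeouts(client, connect_ms=5000, socket_ms=10000):
    """Set both timeouts (in milliseconds) on an AipOcr-style client."""
    client.setConnectionTimeoutInMillis(connect_ms)
    client.setSocketTimeoutInMillis(socket_ms)
    return client


# Usage, assuming `client` was created as in 3.1.1:
# apply_timeouts(client, connect_ms=3000, socket_ms=8000)
```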

3.1.3. Request recognition of all the text in an image

Call the recognition methods on the AipOcr object; the method names are listed in the interface documentation.

Interface description: https://cloud.baidu.com/doc/OCR/s/7kibizyfm

# Set optional parameters
options = {}
options["language_type"] = "CHN_ENG"
options["detect_direction"] = "true"
options["detect_language"] = "true"
options["probability"] = "true"
# Call general text recognition (standard version); each call returns a dictionary.
# image, url and pdf_file must be defined beforehand.
res_image = client.basicGeneral(image, options)
res_url = client.basicGeneralUrl(url, options)
res_pdf = client.basicGeneralPdf(pdf_file, options)
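The calls above assume that image is already defined. In the SDK's own examples, the image is passed as the raw content of a file read in binary mode. A minimal sketch of that step follows; the helper name read_file_bytes and the file name example.jpg are hypothetical.

```python
def read_file_bytes(path):
    """Read a file in binary mode; used here for a local image."""
    with open(path, "rb") as f:
        return f.read()


# Usage, assuming `client` from 3.1.1 and a local file named example.jpg:
# image = read_file_bytes("example.jpg")
# res_image = client.basicGeneral(image, {"language_type": "CHN_ENG"})
```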

3.1.4. Recognition results

Each of these methods returns a dictionary; read the values you need by key.

Test image: the online picture at the URL used in the code below.

# Recognize an online image
url = "https://img.zcool.cn/community/01a7195d65df7ca8012187f435d2b7.jpg@1280w_1l_2o_100sh.jpg"
# Standard version
res_url = client.basicGeneralUrl(url)
# High-accuracy version
# res_url = client.accurateUrl(url)
# The return value is a dictionary
for key, value in res_url.items():
    print(key, ":", value)
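In practice you usually want only the recognized text rather than every key. The sketch below pulls the words entries out of a result dictionary. It assumes that a failed request returns error_code and error_msg keys instead of words_result; the helper name extract_lines and the sample values are made up for illustration.

```python
def extract_lines(result):
    """Return the recognized text lines from a response dictionary.

    Raises RuntimeError if the dictionary carries an error_code instead.
    """
    if "error_code" in result:
        raise RuntimeError(
            "OCR request failed: {} {}".format(
                result["error_code"], result.get("error_msg", "")
            )
        )
    return [item["words"] for item in result.get("words_result", [])]


# Sample dictionary shaped like the fields listed in section 4.2 (values are made up).
sample = {
    "log_id": 1234567890,
    "words_result_num": 2,
    "words_result": [{"words": "Hello"}, {"words": "World"}],
}
print("\n".join(extract_lines(sample)))
```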

3.2. Send a network request to the API service address

3.2.1. Obtain the access token

The access_token must be obtained using the API Key and Secret Key.

Note: an access_token is valid for 30 days, so a new one must be obtained every 30 days.

import requests

API_KEY = 'your Api Key'
SECRET_KEY = 'your Secret Key'
# The access_token must be obtained with the API Key and Secret Key
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=' + API_KEY + '&client_secret=' + SECRET_KEY
response = requests.get(host)
access_token = response.json()["access_token"]
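Since the token stays valid for 30 days, there is no need to request a new one before every call. The following is a sketch of a small in-memory cache, not part of the SDK. It assumes the token response also carries an expires_in field in seconds and falls back to 30 days if it does not; the class name TokenCache and its parameters are hypothetical.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch, clock=time.time, margin_seconds=3600):
        self._fetch = fetch            # callable returning the token response dict
        self._clock = clock            # injectable clock, useful for testing
        self._margin = margin_seconds  # refresh this long before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            data = self._fetch()
            self._token = data["access_token"]
            # Fall back to 30 days if expires_in is absent (an assumption).
            self._expires_at = now + float(data.get("expires_in", 30 * 24 * 3600))
        return self._token


# Usage with the request shown above:
# cache = TokenCache(lambda: requests.get(host).json())
# access_token = cache.get()
```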

3.2.2. Send a POST request to the API service address

When sending a POST request to the API service address, the URL must include the access_token parameter.

You can also set request parameters to control what data is returned.

# General text recognition (high-accuracy version) service address
request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"
# Append the access_token parameter
request_url = request_url + "?access_token=" + access_token
headers = {'content-type': 'application/x-www-form-urlencoded'}
# Request parameters
url = "https://img.zcool.cn/community/01a7195d65df7ca8012187f435d2b7.jpg@1280w_1l_2o_100sh.jpg"
params = {"url": url, "language_type": "ENG"}
# Returns a requests.models.Response object
result = requests.post(request_url, data=params, headers=headers)
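The request above sends an image URL. To send a local image instead, the image parameter has to be base64-encoded and then urlencoded, as described in section 4.1. Here is a sketch of building that form data with the standard library; the helper name build_image_payload is hypothetical, and the three bytes b"abc" merely stand in for real image content.

```python
import base64
from urllib.parse import urlencode


def build_image_payload(image_bytes, language_type="CHN_ENG"):
    """Build form fields for a request that sends image data instead of a URL."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "language_type": language_type,
    }


payload = build_image_payload(b"abc", "ENG")
# requests.post(request_url, data=payload, headers=headers) urlencodes the
# fields itself; urlencode is used here only to show the encoded body.
print(urlencode(payload))
```

Passing the dictionary through data= lets requests handle the urlencoding step, so only the base64 step needs to be done by hand.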

3.2.3. Recognition results

The service returns a Response object. Its json method returns a dictionary, from which you can read the values you need by key.

# A Response is truthy when the HTTP status code indicates success
if result:
    for key, value in result.json().items():
        print(key, ":", value)

4. Others

4.1. General text recognition request parameters

One of image, url and pdf_file is required.

| Parameter | Type | Optional values | Description |
| --- | --- | --- | --- |
| image | string | - | Image data, base64-encoded and then urlencoded |
| url | string | - | Complete URL of the image; the URL must not exceed 1024 bytes |
| pdf_file | string | - | PDF file, base64-encoded and then urlencoded |
| pdf_file_num | string | - | Page number of the PDF file to recognize. When the pdf_file parameter is valid, the content of the given page is recognized; if omitted, the first page is recognized by default |
| language_type | string | CHN_ENG: mixed Chinese and English, ENG: English, JAP: Japanese, KOR: Korean, FRE: French, SPA: Spanish, POR: Portuguese, GER: German, ITA: Italian, RUS: Russian | Language to recognize; the default is CHN_ENG |
| detect_direction | string | true: detect orientation; false: do not detect orientation | Whether to detect the image orientation; the default is false. The orientation is either upright or rotated 90/180/270 degrees counterclockwise |
| detect_language | string | true/false | Whether to detect the language; not detected by default. Currently supports Chinese, English, Japanese and Korean |
| paragraph | string | true/false | Whether to output paragraph information |
| probability | string | true/false | Whether to return the confidence of each line in the recognition result |

After base64 encoding and urlencoding, the image must not exceed 4M in size; its shortest side must be at least 15px and its longest side at most 4096px.
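Note that the switches are strings ("true"/"false"), not Python booleans. The sketch below builds an options dictionary from booleans and checks language_type against the values listed above. The helper name build_options is hypothetical; it is a convenience for illustration, not part of the SDK.

```python
# Language codes taken from the parameter list above.
SUPPORTED_LANGUAGES = {
    "CHN_ENG", "ENG", "JAP", "KOR", "FRE", "SPA", "POR", "GER", "ITA", "RUS",
}


def build_options(language_type="CHN_ENG", detect_direction=False,
                  detect_language=False, paragraph=False, probability=False):
    """Return an options dict whose switches are the strings 'true'/'false'."""
    if language_type not in SUPPORTED_LANGUAGES:
        raise ValueError("Unsupported language_type: " + language_type)
    flags = {
        "detect_direction": detect_direction,
        "detect_language": detect_language,
        "paragraph": paragraph,
        "probability": probability,
    }
    options = {"language_type": language_type}
    for name, enabled in flags.items():
        options[name] = "true" if enabled else "false"
    return options


# Usage, assuming `client` and `image` from section 3.1:
# res_image = client.basicGeneral(image, build_options(probability=True))
```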

4.2. General text recognition response fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| direction | no | int32 | Image orientation, returned when detect_direction=true. -1: undefined, 0: upright, 1: 90 degrees counterclockwise, 2: 180 degrees counterclockwise, 3: 270 degrees counterclockwise |
| log_id | yes | uint64 | Unique log id, used for locating problems |
| words_result_num | yes | uint32 | Number of recognition results, that is, the number of elements in words_result |
| words_result | yes | array[] | Array of recognition results |
| + words | no | string | Recognized text string |
| + probability | no | object | Confidence of each line in the recognition result, containing average (average line confidence), variance (line confidence variance) and min (minimum line confidence); returned when probability=true |
| paragraphs_result | no | array[] | Paragraph detection result, returned when paragraph=true |
| + words_result_idx | no | array[] | Line numbers contained in a paragraph, returned when paragraph=true |
| language | no | int32 | Returned when detect_language=true |
| pdf_file_size | no | string | Total number of pages in the submitted PDF file, returned when the pdf_file parameter is valid |
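As an illustration of how these fields fit together, the sketch below translates the direction code and regroups recognized lines into paragraphs. It assumes that words_result_idx holds zero-based indices into words_result, which the field description does not state explicitly. The helper names and sample values are made up.

```python
# Direction codes as listed in the field description above.
DIRECTION_NAMES = {
    -1: "undefined",
    0: "upright",
    1: "rotated 90 degrees counterclockwise",
    2: "rotated 180 degrees counterclockwise",
    3: "rotated 270 degrees counterclockwise",
}


def describe_direction(result):
    """Return a readable orientation, or None if direction was not returned."""
    if "direction" not in result:
        return None
    return DIRECTION_NAMES.get(result["direction"], "unknown")


def group_paragraphs(result, separator=" "):
    """Join recognized lines into paragraphs using words_result_idx."""
    lines = [item["words"] for item in result.get("words_result", [])]
    return [
        separator.join(lines[i] for i in paragraph["words_result_idx"])
        for paragraph in result.get("paragraphs_result", [])
    ]


# Sample response shaped like the table above (values are made up).
sample_response = {
    "direction": 0,
    "words_result_num": 3,
    "words_result": [{"words": "first"}, {"words": "line"}, {"words": "second"}],
    "paragraphs_result": [{"words_result_idx": [0, 1]}, {"words_result_idx": [2]}],
}
print(describe_direction(sample_response))
print(group_paragraphs(sample_response))
```

For Chinese text, passing separator="" avoids inserting spaces between joined lines.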


Origin blog.csdn.net/CNDefoliation/article/details/127611048