Use the Baidu developer platform for speech processing

Preparation

Update time: 2023-01-13

Become a developer

Complete the basic registration and authentication of your account in three steps:

STEP1: Open the console and select the AI service you want to use. If you are not logged in, you will be redirected to the login page; log in with your Baidu account. If you do not have a Baidu account yet, register one first.

STEP2: On first use, you will land on the developer certification page after logging in. Fill in the required information to complete developer certification. Note: if you are already a Baidu Cloud user or a Baidu Developer Center user, you can skip this step.

STEP3: In the left navigation of the console, select voice technology to open the voice technology control panel, then carry out the relevant operations.

Get free quota

New users of voice technology can claim a free test quota for each interface in the console and use it to make interface calls. The validity period of the free quota starts on the date it is successfully claimed. When the validity period expires, any remaining free calls are cleared. For details, see the free quota pages for speech recognition, speech synthesis, and call center voice.

Create app

You need to create an application before you can formally call voice technology capabilities. The application is the basic unit through which you call services: interface calls and related configuration are based on the API Key and Secret Key issued once the application is created. Create the application in the console and fill in the fields described below.

Application name: The name used to identify the application you create. It supports Chinese and English characters, numbers, underscores, and hyphens. Once the application is created, this name cannot be modified.

Interface selection: For each application, you can check the interface permissions of all AI services your business requires (only interfaces with free trial permission can be checked). All interfaces under voice technology are checked by default. After the application is created, it has permission to call the selected services.

Voice package name: If you need to use the voice technology SDK (iOS/Android), you must bind the package name so that an authorization license can be generated.

Application ownership: Choose whether the service is for personal use or company use. For company use, you can contact a dedicated business manager for professional pre-sales support.

Application description: Describe the business scenario of this application.

Get key

After you create the application, the platform assigns it a set of credentials: AppID, API Key, and Secret Key. These three values are the credentials you will use during actual development, so keep them safe.
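One common way to keep the three credentials out of source code is to read them from environment variables. The sketch below is only an illustration: the variable names BAIDU_APP_ID, BAIDU_API_KEY, and BAIDU_SECRET_KEY and the BaiduCredentials type are hypothetical, not something the platform prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BaiduCredentials:
    """The three values shown on the application details page."""
    app_id: str
    api_key: str
    secret_key: str


def load_credentials(env=os.environ) -> BaiduCredentials:
    """Read the credentials from environment variables.

    The variable names are hypothetical; pick whatever fits your deployment.
    """
    names = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return BaiduCredentials(*(env[name] for name in names))
```

Calling load_credentials() at startup fails early with a clear message if any of the three values has not been configured.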

Generate signature

Use the AppID, API Key, and Secret Key assigned to the application to generate an Access Token (the credential for user authentication and authorization). For details, see Access Token Obtaining, which includes sample request code in several common languages.

Reminder: the Access Token is valid for 30 days (the validity period is expressed in seconds). When integrating, make sure your program requests a new token regularly.
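Because the token expires, a program typically caches it and requests a new one shortly before the validity period ends. Below is a minimal sketch of that idea. The TokenCache class, the fetch callback, and the one-hour safety margin are illustrative assumptions; fetch stands for whatever code actually requests the token and returns it together with its lifetime in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(self,
                 fetch: Callable[[], Tuple[str, int]],
                 margin_seconds: int = 3600,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # returns (access_token, lifetime_in_seconds)
        self._margin = margin_seconds  # refresh this long before expiry (assumed value)
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if it is missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

The rest of the program calls cache.get() before each request instead of storing the token itself, so renewal happens in one place.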

Start development

Currently, there are two main ways to use voice technology: the API and the SDK. The documentation for each product describes the specific usage and parameters. For details, see the speech recognition tour, the speech synthesis tour, and the call center audio tour.

 

How to use visual tools to call

Update time: 2023-01-13

How to use Postman to call the speech technology service API

This article provides an example of calling the short text speech synthesis API through the visual tool Postman to help you quickly experience and become familiar with speech technology services without coding.

1. Download and install the interface calling tool

1.1 Download the interface calling tool—Postman

Postman installers are available for Mac and Windows.

1.2 Postman installation tutorial

(1) Double-click the installation package.

(2) On first launch, if you do not have an account, you can go directly to the Postman main interface without logging in.

2. Obtain Access Token

Change the request method to "POST" and fill in the request address: https://aip.baidubce.com/oauth/2.0/token

Click Body, select "x-www-form-urlencoded", and enter the following three request parameters in key and value respectively.

grant_type : required parameter, fixed to  client_credentials

client_id : required parameter, the API Key of the application

client_secret : required parameter, the Secret Key of the application

Click the blue "Send" button in the upper right corner. The access_token appears in the response area below.
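For readers who later move from Postman to code, the same token request can be sketched with Python's standard library. The URL and the three form parameters come from the steps above; the function names are illustrative, and error handling is reduced to checking that access_token is present in the returned JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_request(api_key: str, secret_key: str) -> Request:
    """Build the POST request with an x-www-form-urlencoded body."""
    body = urlencode({
        "grant_type": "client_credentials",  # fixed value
        "client_id": api_key,                # API Key of the application
        "client_secret": secret_key,         # Secret Key of the application
    }).encode("utf-8")
    return Request(TOKEN_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def parse_token_response(raw: bytes) -> str:
    """Extract access_token from the JSON response body."""
    payload = json.loads(raw.decode("utf-8"))
    if "access_token" not in payload:
        raise ValueError(f"token request failed: {payload}")
    return payload["access_token"]
```

To actually send it, pass the request to urllib.request.urlopen and hand the bytes from resp.read() to parse_token_response.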

3. Make interface calls

3.1 The specific operations of interface calling are as follows:

(1) Change the request method to "POST" and fill in the request address (taking short text speech synthesis as an example): https://tsn.baidu.com/text2audio

(2) Click Body, select "x-www-form-urlencoded", and enter the following request parameters in key and value respectively.

tex : required parameter, synthesized text

tok : required parameter, obtained access_token parameter

cuid : required parameter, user’s unique identifier

ctp : required parameter, client type; for web clients, fill in the fixed value 1

lan : required parameter, fixed value zh

(For more parameters, see the short text speech synthesis documentation page.)

(3) Modify the request header: click Headers and enter one key/value pair.

Key: Content-Type

Value: application/x-www-form-urlencoded

(4) Click the blue "Send" button in the upper right corner. The synthesized audio appears in the response area below.
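The synthesis call above can also be sketched in Python. The URL, the header, and the five parameters are the ones entered in Postman; the function names are illustrative. The check on the response Content-Type is an assumption made for this sketch (audio on success, something else such as JSON on failure), so confirm it against the short text speech synthesis documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

TTS_URL = "https://tsn.baidu.com/text2audio"


def build_tts_request(text: str, access_token: str, cuid: str) -> Request:
    """Build the POST request for short text speech synthesis."""
    body = urlencode({
        "tex": text,          # text to synthesize
        "tok": access_token,  # access_token obtained earlier
        "cuid": cuid,         # unique user identifier
        "ctp": 1,             # client type, fixed value 1 for web
        "lan": "zh",          # fixed value
    }).encode("utf-8")
    return Request(TTS_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def save_if_audio(content_type: str, body: bytes, path: str) -> bool:
    """Write body to path when the response looks like audio; return False otherwise."""
    # Assumption: successful responses carry an audio/* Content-Type.
    if not content_type.startswith("audio"):
        return False
    with open(path, "wb") as f:
        f.write(body)
    return True
```

After sending the request with urllib.request.urlopen, pass the response's Content-Type header and body to save_if_audio; a False result means the body should be inspected as an error message instead.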

Speech recognition SDK

Update time: 2023-01-13

Android SDK Quick Integration Guide

In just four steps, you can complete the application integration of the speech recognition SDK, giving your application a stable and consistent recognition experience.

Step1: Become a developer of Baidu AI open platform

To use the speech recognition capabilities of Baidu AI open platform, you must first become a developer of Baidu AI open platform. First, let us spend 5 minutes to register as a developer of Baidu AI open platform and create a new Baidu speech recognition application.

1. Create an account

First, register a Baidu account.

2. Create an application

After creating an account, log in to the Baidu AI open platform and create an application.

When creating the application, be sure to enter the application name, the voice package name (for the sample demo, enter com.baidu.speech.recognizerdemo), and the other required information. After creation is complete, the console shows the information for the new application.

Step2: Apply for speech recognition quota

1. Real-name authentication

The speech recognition SDK can be used only after real-name authentication is complete. First, complete personal or enterprise authentication by following the prompts. Users who complete real-name authentication can claim the free quota.

2. Get free quota

After completing real-name authentication, claim the free quota for speech recognition.

Step3: Download the speech recognition SDK and fill in the authorization information

1. Obtain authentication information

Prepare the three pieces of authentication information obtained after creating the application: AppID, API Key, and Secret Key. To find them, log in to the console and open the application details.

2. Download the speech recognition SDK

Download the speech recognition Android SDK file from the SDK download page of the Baidu AI open platform.

3. Run directly without modifying the SDK

Unzip the SDK file and, without making any modifications, install and run the program directly. The demo app's main interface should appear.

4. Fill in the authentication information

Fill in the three pieces of authentication information (AppID, API Key, and Secret Key) for testing. Make sure all three values are replaced in the demo project.

Step4: Test the speech recognition function

1. After making the changes described above, install the app, open it, and enter online recognition. Tap Start Recording to perform online speech recognition.

This completes a simple test of the speech recognition Android SDK. Other sub-functions can be integrated by following the detailed technical documents.


 

iOS SDK Quick Integration Guide

In just four steps, you can complete the application integration of the speech recognition SDK, giving your application a stable and consistent recognition experience.

Step1: Become a developer of Baidu AI open platform

To use the speech recognition capabilities of Baidu AI open platform, you must first become a developer of Baidu AI open platform. First, let us spend 5 minutes to register as a developer of Baidu AI open platform and create a new Baidu speech recognition application.

1. Create an account

First, register a Baidu account.

2. Create an application

After creating an account, log in to the Baidu AI open platform and create an application.

When creating the application, be sure to enter the application name, the voice package name (for the sample demo, enter com.baidu.speech.BDSClientSample), and the other required information. After creation is complete, the console shows the information for the new application.

Step2: Apply for speech recognition quota

1. Real-name authentication

The speech recognition SDK can be used only after real-name authentication is complete. First, complete personal or enterprise authentication by following the prompts. Users who complete real-name authentication can claim the free quota.

2. Get free quota

After completing real-name authentication, claim the free quota for speech recognition.

Step3: Download the speech recognition SDK and fill in the authorization information

1. Obtain authentication information

Prepare the three pieces of authentication information obtained after creating the application: AppID, API Key, and Secret Key. To find them, log in to the console and open the application details.

2. Download the speech recognition SDK

Download the speech recognition iOS SDK file from the SDK download page of the Baidu AI open platform.

3. Fill in the authentication information

Fill in the three pieces of authentication information (AppID, API Key, and Secret Key) for testing. Make sure all three values are replaced in the demo project.

Step4: Test the speech recognition function

1. After making the changes described above, install the app and open it to perform online speech recognition.

This completes a simple test of the speech recognition iOS SDK. Other sub-functions can be integrated by following the detailed technical documents.

 

Baidu AI open platform  voice document~

Origin blog.csdn.net/s_sos0/article/details/134792854