Baidu Speech Recognition - REST API (Java) (1)


Baidu's official REST API speech recognition demo: https://github.com/Baidu-AIP/speech-demo/tree/master/rest-api-asr

The REST API does not support streaming recognition, but it has no operating-system or programming-language restrictions.

Features:

  1. The REST API provides an HTTP interface: you upload (input) the entire audio file and receive (output) the recognition result. Recognition time is proportional to the audio duration.

  2. Recognition models: the search model, the input method model and the far-field model are supported. The Mandarin search model can also recognize commonly used English words.

  3. Audio formats:

       - pcm (uncompressed): sampling rate fixed at 16000 Hz, 16-bit encoding, mono, little-endian;

       - wav (uncompressed, PCM-encoded): that is, PCM data with a WAV header added;

       - amr (compressed): 16 kHz, mono.

  4. Custom vocabulary: words in the custom vocabulary take priority during recognition and word segmentation. The REST API's custom vocabulary only works with dev_pid=1536. The vocabulary text file must not exceed 5 MB and should preferably stay within 10,000 lines.

  5. REST API speech recognition supports two request methods: JSON and Raw.
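Item 3 describes wav as PCM with a header added. The Java sketch below shows what that means in practice. It assumes the standard 44-byte RIFF/WAVE header layout and the fixed parameters listed above: 16000 Hz, 16-bit, mono, little-endian. It also computes how long a PCM buffer plays, which is useful given that recognition time scales with audio duration. The class and method names are illustrative and are not part of Baidu's demo.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: wraps raw PCM (16000 Hz, 16-bit, mono, little-endian) in a
 * standard 44-byte RIFF/WAVE header, i.e. "PCM plus a header".
 */
public class PcmToWav {
    static final int SAMPLE_RATE = 16000;      // fixed sampling rate required by the API
    static final short BITS_PER_SAMPLE = 16;   // 16-bit samples
    static final short CHANNELS = 1;           // mono only
    static final int BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 8; // 32000

    /** Duration in seconds of a PCM buffer in the required format. */
    static double pcmDurationSeconds(int pcmBytes) {
        return (double) pcmBytes / BYTES_PER_SECOND;
    }

    /** Returns a WAV image: the 44-byte header followed by the unchanged PCM samples. */
    static byte[] wrapPcm(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + pcm.length);                         // RIFF chunk size (file size - 8)
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                                      // fmt chunk size for plain PCM
        buf.putShort((short) 1);                             // audio format 1 = uncompressed PCM
        buf.putShort(CHANNELS);
        buf.putInt(SAMPLE_RATE);
        buf.putInt(BYTES_PER_SECOND);                        // byte rate
        buf.putShort((short) (CHANNELS * BITS_PER_SAMPLE / 8)); // block align = 2 bytes
        buf.putShort(BITS_PER_SAMPLE);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(pcm.length);                              // data chunk size
        buf.put(pcm);                                        // samples are already little-endian
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[64000];                        // two seconds of silence
        System.out.println(pcmDurationSeconds(pcm.length));  // prints 2.0
        System.out.println(wrapPcm(pcm).length);             // prints 64044
    }
}
```

At 32000 bytes per second, a one-minute PCM recording is about 1.9 MB, and the WAV version is only 44 bytes larger.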

How the demo works:

It makes two HTTP requests. The first exchanges the application's API Key and Secret Key for an access token. The second uploads the audio file and returns the recognition result.

1. Get the access token

The request URL is:

https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=<your API Key>&client_secret=<your Secret Key>
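The Java 11+ sketch below builds the token URL shown above and reads the `access_token` field from the JSON reply. Several details are assumptions. It sends a GET with the query string, so check Baidu's current documentation in case it requires POST. It pulls out the token with a regex instead of a JSON library to stay dependency-free. The environment variable names `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFetcher {
    static final String TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token";
    static final Pattern ACCESS_TOKEN = Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds the token URL with URL-encoded credentials. */
    static String buildTokenUrl(String apiKey, String secretKey) {
        return TOKEN_ENDPOINT
                + "?grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(secretKey, StandardCharsets.UTF_8);
    }

    /** Extracts the access_token value from the JSON response (regex, no JSON library). */
    static String extractAccessToken(String json) {
        Matcher m = ACCESS_TOKEN.matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("no access_token in response: " + json);
        }
        return m.group(1);
    }

    /** Performs the first HTTP request of the flow and returns the token. */
    static String fetchToken(String apiKey, String secretKey)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildTokenUrl(apiKey, secretKey)))
                .GET() // assumption: GET with query parameters, as in the URL above
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return extractAccessToken(response.body());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical variable names; put your application's credentials here.
        String apiKey = System.getenv("BAIDU_API_KEY");
        String secretKey = System.getenv("BAIDU_SECRET_KEY");
        if (apiKey == null || secretKey == null) {
            System.out.println("set BAIDU_API_KEY and BAIDU_SECRET_KEY first");
            return;
        }
        System.out.println(fetchToken(apiKey, secretKey));
    }
}
```

The token is valid for a period of time, so a real client would normally cache it rather than request a new one for every recognition call.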

2. Upload the audio file

    Request parameters:

format (string, required): The audio file format: pcm, wav or amr, case-insensitive. pcm is recommended.
rate (string, required): The sampling rate, fixed at 16000.
channel (string, required): The number of channels. Only mono is supported, so use the fixed value 1.
cuid (string, required): A unique user identifier, used to distinguish users and count UV. Use something that identifies the user, such as the machine's MAC address or IMEI, up to 60 characters.
token (string, required): The developer access_token obtained from the open platform (see step 1, getting the access token).
dev_pid (int, optional): Used when lan is not set. If omitted, it defaults to 1537 (Mandarin input method model). For the other dev_pid values, see Baidu's documentation.
lan (string, optional, obsolete): Kept for backward compatibility; use dev_pid instead. If dev_pid is set, it overrides this parameter. Selects the language of the input method model. The default is Chinese. Values: zh (Chinese), ct (Cantonese), en (English), case-insensitive.
speech (string, optional): The binary data of the local audio file, Base64-encoded. Used together with len.
len (int, optional): The size of the local audio file in bytes.
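For the JSON request method, the parameters above go into a single JSON body. This Java 11+ sketch shows `speech` as the Base64 text of the file and `len` as the raw byte count. Several details are assumptions taken from Baidu's demo and should be checked against the current documentation: the endpoint `http://vop.baidu.com/server_api`, the `application/json` content type, and sending `rate` and `channel` as JSON numbers. The cuid value `my-device-id` is a placeholder.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class JsonModeAsr {
    // Assumed endpoint, as used by Baidu's demo; verify against the current docs.
    static final String ASR_ENDPOINT = "http://vop.baidu.com/server_api";

    /**
     * Builds the JSON-mode request body. Assumes cuid and token contain no quotes
     * or backslashes, so no JSON escaping is needed; Base64 output never does.
     */
    static String buildJsonBody(byte[] audio, String format, String cuid, String token, int devPid) {
        String speech = Base64.getEncoder().encodeToString(audio); // standard Base64
        return "{"
                + "\"format\":\"" + format + "\","
                + "\"rate\":16000,"                 // fixed sampling rate (sent as a number here)
                + "\"channel\":1,"                  // mono only
                + "\"cuid\":\"" + cuid + "\","
                + "\"token\":\"" + token + "\","
                + "\"dev_pid\":" + devPid + ","
                + "\"speech\":\"" + speech + "\","
                + "\"len\":" + audio.length         // bytes of the audio file, not of the Base64 text
                + "}";
    }

    /** Second HTTP request of the flow: uploads the file and returns the raw JSON reply. */
    static String recognize(Path audioFile, String format, String cuid, String token, int devPid)
            throws IOException, InterruptedException {
        byte[] audio = Files.readAllBytes(audioFile);
        HttpRequest request = HttpRequest.newBuilder(URI.create(ASR_ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildJsonBody(audio, format, cuid, token, devPid)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java JsonModeAsr <audio.pcm> <access_token>");
            return;
        }
        // "my-device-id" is a placeholder cuid; 1537 is the default dev_pid.
        System.out.println(recognize(Path.of(args[0]), "pcm", "my-device-id", args[1], 1537));
    }
}
```

Base64 inflates the payload by about a third, which is one reason to prefer the Raw method for larger files.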

For detailed usage, please refer to the demo.
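The Raw request method, the second of the two methods listed in the features, skips Base64 and posts the audio bytes directly. In the sketch below, dev_pid, cuid and token travel in the query string, and the format and sampling rate go in the `Content-Type` header (for example `audio/pcm;rate=16000`). This layout and the endpoint are assumptions based on Baidu's demo, not confirmed by this post, so verify them against the official documentation before use.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawModeAsr {
    // Assumed endpoint, as used by Baidu's demo; verify against the current docs.
    static final String ASR_ENDPOINT = "http://vop.baidu.com/server_api";

    /** Raw mode: the request parameters go in the query string instead of a JSON body. */
    static String buildRawUrl(String cuid, String token, int devPid) {
        return ASR_ENDPOINT
                + "?dev_pid=" + devPid
                + "&cuid=" + URLEncoder.encode(cuid, StandardCharsets.UTF_8)
                + "&token=" + URLEncoder.encode(token, StandardCharsets.UTF_8);
    }

    /** Audio format and sampling rate are carried in the Content-Type header. */
    static String contentType(String format, int rate) {
        return "audio/" + format + ";rate=" + rate;
    }

    /** Uploads a 16 kHz PCM file as raw bytes and returns the raw JSON reply. */
    static String recognize(Path pcmFile, String cuid, String token, int devPid)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildRawUrl(cuid, token, devPid)))
                .header("Content-Type", contentType("pcm", 16000))
                .POST(HttpRequest.BodyPublishers.ofByteArray(Files.readAllBytes(pcmFile))) // no Base64
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java RawModeAsr <audio.pcm> <access_token>");
            return;
        }
        // "my-device-id" is a placeholder cuid; 1537 is the default dev_pid.
        System.out.println(recognize(Path.of(args[0]), "my-device-id", args[1], 1537));
    }
}
```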

Origin blog.csdn.net/weixin_39147807/article/details/83823759