Calling the Huawei API for Chinese speech recognition

1. About the authors

Zhang Nan, female, School of Electronic Information, Xi'an Polytechnic University, 2022 graduate student
Research direction: image processing

Lu Zhidong, male, School of Electronic Information, Xi'an Polytechnic University, 2022 graduate student, member of Zhang Hongwei's artificial intelligence research group
Research direction: machine vision and artificial intelligence

2. HUAWEI CLOUD Chinese Speech Recognition

2.1 Recording file recognition and result retrieval

Because recognizing a recording file usually takes a long time, recognition is asynchronous: the API is split into two interfaces, one that creates a recognition task and one that queries the task status. After the task is created, the create-task interface returns a job_id parameter. The user then obtains the transcription status and results by calling the recording file recognition status query interface with that job_id.
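The create-then-query flow described above can be sketched as a small polling loop. This is a minimal sketch, not the SDK's own API: wait_for_job and fetch_status are hypothetical names, fetch_status stands in for whatever call queries the task status, and the status values "WAITING", "FINISHED" and "ERROR" are assumptions about the response.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(fetch_status: Callable[[str], Dict[str, Any]],
                 job_id: str,
                 interval_s: float = 2.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch_status(job_id) repeatedly until the job is done or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(job_id)
        # Assumed status values: "WAITING" while queued or running,
        # "FINISHED" or "ERROR" once the job has ended.
        if result.get("status") in ("FINISHED", "ERROR"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s} s")
        sleep(interval_s)
```

Passing the status query as a callable keeps the loop independent of any particular client, so it can be exercised with a fake function before being wired to the real query interface.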

2.2 Restrictions

1. For short audio (one-sentence) recognition, the audio duration should not exceed 1 minute.
2. Supported regions: "North China-Beijing 1", "North China-Beijing 4" and "East China-Shanghai 1".
3. Supported audio formats: pcm16k16bit, pcm8k16bit, ulaw16k8bit, ulaw8k8bit, alaw16k8bit, alaw8k8bit, vox8k4bit, v3_8k4bit, WAV (with pcm/ulaw/alaw/adpcm encoding), MP3, M4A, ogg-speex, ogg-opus, AMR.
4. For recording file recognition, the audio duration should not exceed 5 hours and the file size should not exceed 300M.
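The size and duration limits for recording files can be checked locally before uploading. The following is a minimal sketch: check_recording is a hypothetical helper, 300M is assumed to mean 300 MiB, and the duration must be supplied by the caller because it is not computed here.

```python
import os

MAX_BYTES = 300 * 1024 * 1024   # assumption: "300M" is read as 300 MiB
MAX_SECONDS = 5 * 60 * 60       # 5 hours


def check_recording(path, duration_s=None, max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS):
    """Return a list of human-readable problems; an empty list means the file passes."""
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    # The duration is optional because this sketch does not decode the audio.
    if duration_s is not None and duration_s > max_seconds:
        problems.append(f"audio lasts {duration_s} s, limit is {max_seconds} s")
    return problems
```

An empty return value means neither limit is exceeded; any other result lists what to fix before submitting the task.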

2.3 Introduction to Object Storage Service (OBS)

Object Storage Service (OBS) is an object-based mass storage service that provides customers with massive, secure, highly reliable, and low-cost data storage capabilities.

For multimedia files such as images and audio, the service can process data stored in Huawei Cloud OBS directly, which reduces usage costs, shortens response time, and improves the service experience.

Currently, only links to the user's own audio on OBS are supported. Links to data that belongs to other users cannot be read.
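As an illustration of what such an OBS link looks like, the sketch below builds an object URL from a bucket name, a region and an object key. It assumes the virtual-hosted form https://<bucket>.obs.<region>.myhuaweicloud.com/<key>; build_obs_url is a hypothetical helper, and the link copied from the console remains the authoritative value.

```python
from urllib.parse import quote


def build_obs_url(bucket: str, region: str, object_key: str) -> str:
    """Build an OBS object URL, percent-encoding the key but keeping '/' separators."""
    # Assumed URL layout: <bucket>.obs.<region>.myhuaweicloud.com
    return f"https://{bucket}.obs.{region}.myhuaweicloud.com/{quote(object_key)}"
```

For example, build_obs_url("my-bucket", "cn-north-4", "audio/test 1.wav") yields a URL whose key part is audio/test%201.wav.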

3. Experimental process and results

3.1 Get the API

1. Register a HUAWEI CLOUD account, complete real-name authentication, and activate the service.
2. Obtain the AK/SK for authentication by downloading the file credentials.csv:
   Log in to the console. Click your username in the upper right corner of the page and select "My Credentials" to open the "My Credentials" page.
   Click "Add Access Key" under the "Access Key" tab. The "Add Access Key" dialog box pops up.
   Enter the "Login Password". If a mobile phone or email is bound to the account, you also need to obtain a verification code and verify it. After successful verification, the access key download dialog box pops up.
   Click OK and follow the prompts to download and save the access key. If an AK/SK has already been generated, find the previously downloaded AK/SK file, which is generally named credentials.csv.
3. Create a "public read" bucket. Open the console and find Object Storage Service (OBS). On the "Bucket List" page, create a bucket and fill in the fields as required. Note: select "public read and write" for the bucket policy.
4. Upload the audio. On the OBS "Bucket List" page, click the created OBS bucket to enter the "Object" page, and upload the audio data to the bucket.
5. Obtain the audio URL. Click the name of the uploaded data to enter the data details page, then copy the link to get the data URL.
6. Debug the interface in API Explorer.
In this case, a recording file in pcm16k16bit format is used. API Explorer also displays the corresponding Python implementation code.
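Instead of pasting the AK and SK into the script by hand, they can be read from the downloaded credentials.csv. This is a minimal sketch: load_access_key is a hypothetical helper, and the column names "Access Key Id" and "Secret Access Key" are assumptions about the file layout that should be checked against the actual download.

```python
import csv


def load_access_key(csv_path):
    """Return (ak, sk) from the first data row of a credentials.csv file."""
    # utf-8-sig strips a byte order mark if the file starts with one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        row = next(csv.DictReader(f), None)
    if row is None:
        raise ValueError(f"no access key row found in {csv_path}")
    try:
        # Assumed column names; adjust them if the downloaded file differs.
        return row["Access Key Id"].strip(), row["Secret Access Key"].strip()
    except KeyError as exc:
        raise ValueError(f"unexpected column layout, missing {exc}") from exc
```

A typical call would be ak, sk = load_access_key("credentials.csv"), which keeps the keys out of the source code.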

3.2 Code implementation

1. Submit the recording file recognition task

# Import the required libraries
from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdksis.v1.region.sis_region import SisRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdksis.v1 import *

if __name__ == "__main__":
    # Set the AK and SK
    ak = "<YOUR AK>"
    sk = "<YOUR SK>"

    credentials = BasicCredentials(ak, sk)

    # Build the SIS client for the cn-north-4 region
    client = SisClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(SisRegion.value_of("cn-north-4")) \
        .build()

    try:
        request = PushTranscriberJobsRequest()
        configbody = TranscriberConfig(
            audio_format="auto",
            _property="chinese_16k_media",
            add_punc="yes"
        )
        request.body = PostTranscriberJobs(
            data_url="YOUR URL",  # the audio URL obtained in Section 3.1
            config=configbody
        )
        response = client.push_transcriber_jobs(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)

Running this code returns a response that contains the job_id.
2. Obtain the recording file recognition result

# Import the required libraries
from huaweicloudsdkcore.auth.credentials import BasicCredentials
from huaweicloudsdksis.v1.region.sis_region import SisRegion
from huaweicloudsdkcore.exceptions import exceptions
from huaweicloudsdksis.v1 import *

if __name__ == "__main__":
    # Set the AK and SK
    ak = "<YOUR AK>"
    sk = "<YOUR SK>"

    credentials = BasicCredentials(ak, sk)

    # Build the SIS client for the cn-north-4 region
    client = SisClient.new_builder() \
        .with_credentials(credentials) \
        .with_region(SisRegion.value_of("cn-north-4")) \
        .build()

    try:
        request = CollectTranscriberJobRequest()
        request.job_id = "YOUR JOB_ID"  # the job_id obtained in the previous step
        response = client.collect_transcriber_job(request)
        print(response)
    except exceptions.ClientRequestException as e:
        print(e.status_code)
        print(e.request_id)
        print(e.error_code)
        print(e.error_msg)
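The printed response is a nested structure, so a small helper can pull out just the recognized text. The sketch below works on a plain dictionary and assumes the result has a "segments" list whose items carry "start_time", "end_time" and a "result" object with a "text" field. Both the field names and the helper transcript_text are assumptions to be checked against the actual response.

```python
from typing import Any, Dict, List, Tuple


def transcript_text(result: Dict[str, Any]) -> Tuple[str, List[Tuple[Any, Any, str]]]:
    """Return the joined transcript and a list of (start, end, text) per segment."""
    rows = []
    for seg in result.get("segments") or []:
        # Assumed layout: each segment holds its text under result.text.
        text = (seg.get("result") or {}).get("text", "")
        rows.append((seg.get("start_time"), seg.get("end_time"), text))
    # Chinese text is joined without separators between segments.
    return "".join(text for _, _, text in rows), rows
```

The per-segment list keeps the timing information, while the joined string is convenient for printing or saving the full transcript.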

3.3 Running results

[Figure: screenshot of the running results]


Origin blog.csdn.net/m0_37758063/article/details/130968069