Abstract: With advances in AI technology, intelligent voice is starting to free human-computer interaction from the traditional hands-and-eyes mode. It gives people a more convenient, engaging and human experience, so that the device being operated is no longer just a rigid tool but something closer to a living assistant. "Help me turn on the air conditioner", "Do I need to bring an umbrella to work tomorrow?", "Help me pay 100 yuan for the phone bill", "Where is my courier?" In the era of the Internet of Everything, all your needs can be met with a single sentence.

Original article: http://click.aliyun.com/m/43694/

With the Link Voice SDK integrated into AliOS Things, a device can support intelligent voice interaction.

About Alibaba Smart Voice Service

Alibaba's intelligent voice service gives devices voice interaction capabilities, rich music content, smart home control and more, and custom skills can be built for proprietary equipment (such as voice-controlled treadmills, massage chairs and other devices). The services include:

General services: searching for songs, columns and radio stations, weather queries, encyclopedia lookups, basic arithmetic, etc.;
Alibaba services: smart home control, mobile phone top-ups, Tmall Supermarket shopping, electricity bill queries, etc. (these require access to the account system; please refer to SDS access);
Private services: device control, after-sales hotline queries, etc. (these require skill customization; please provide your product requirements when signing the contract).

Function integration

To access Alibaba's voice service, a device must integrate both the Alink SDK and the Link-Voice SDK. The Alink SDK gives the device the connectivity, account system, network provisioning, OTA and other capabilities it needs to access Alibaba's IoT platform, while the Link-Voice SDK gives the device Alibaba's intelligent voice service. The device must first integrate the Alink SDK to become a device on the SDS platform; only then can it use the Alibaba intelligent voice service by integrating the Link-Voice SDK.
Besides relying on Alink for platform access and device management, the Link-Voice SDK needs the following modules to do its work. Opus encoding is required because the server only accepts Opus audio, and mbedtls secures data transmission on both the Alink and the websockets connections.
Dependent components:
websockets: exchange of voice data with the cloud
opus: conversion of recorded PCM audio to Opus format
cjson: JSON parsing
mbedtls: encryption of the underlying Alink and websockets connections

AliOS Things has already ported, adapted and integrated all of the modules above, so you can use AliOS Things directly and get straight to intelligent voice development. Recommended main MCU resources:

Flash: >= 512 KB
RAM: >= 200 KB
CPU: >= 180 MHz

Single-shot speech recognition flow chart

The flow chart omits details such as buffering.
[Figure: single-shot speech recognition flow chart]
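As a rough companion to the flow chart, the C sketch below walks through the same single-shot flow using the calls described in the API introduction: pal_asr_start, then pal_asr_send_buffer in 640-byte PCM chunks until the cloud reports PAL_VAD_STATUS_STOP, then pal_asr_stop and pal_rec_result_destroy. It is a minimal sketch under stated assumptions, not the SDK's sample code: the SDK functions are replaced by mocks so the file compiles on its own, and the constant value, the trimmed pal_rec_result layout and the helper recognize_once are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- Mock of the Link Voice SDK (assumption, for illustration only) ----
 * Function names follow the API introduction; the constant value, the
 * struct layout and the mock behaviour are invented so the sketch compiles. */
#define PAL_VAD_STATUS_STOP 1          /* value assumed */

struct pal_rec_result {
    char *asr_result;                  /* only the field this sketch reads */
};

static int mock_frames_sent;           /* frames the mock "cloud" received */

int pal_asr_start(void) { mock_frames_sent = 0; return 0; }

int pal_asr_send_buffer(const char *buffer, int buffer_len)
{
    (void)buffer;
    (void)buffer_len;
    /* Pretend the cloud VAD detects the end of speech after 3 frames. */
    return ++mock_frames_sent >= 3 ? PAL_VAD_STATUS_STOP : 0;
}

struct pal_rec_result *pal_asr_stop(void)
{
    static char text[] = "turn on the air conditioner";
    struct pal_rec_result *r = calloc(1, sizeof *r);
    if (r != NULL)
        r->asr_result = text;
    return r;
}

void pal_rec_result_destroy(struct pal_rec_result *result) { free(result); }

/* ---- Single-shot recognition: start -> send frames -> stop -> destroy ---- */
#define PCM_FRAME_BYTES 640            /* PCM chunk size per send */

/* Streams whole 640-byte frames until the cloud reports end of speech or the
 * recording runs out, then fetches the text. Returns 0 on success, -1 otherwise. */
int recognize_once(const char *pcm, size_t pcm_len, char *out, size_t out_len)
{
    size_t off;
    struct pal_rec_result *res;
    int ok;

    if (pal_asr_start() != 0)
        return -1;
    for (off = 0; off + PCM_FRAME_BYTES <= pcm_len; off += PCM_FRAME_BYTES) {
        if (pal_asr_send_buffer(pcm + off, PCM_FRAME_BYTES) == PAL_VAD_STATUS_STOP)
            break;                     /* cloud VAD: the user stopped talking */
    }
    res = pal_asr_stop();              /* synchronous: returns the result */
    if (res == NULL)
        return -1;
    ok = res->asr_result != NULL;
    if (ok)
        snprintf(out, out_len, "%s", res->asr_result);
    pal_rec_result_destroy(res);       /* results must always be released */
    return ok ? 0 : -1;
}
```

On a real board the PCM would come from the recording driver frame by frame rather than from one buffer; this sketch simply drops any trailing partial frame.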

Development platform preparation

Any development board that meets the performance requirements in the Function integration section and supports audio recording and playback will do, provided AliOS Things has already been ported to it. This article uses the Allwinner xr871evb (already ported) as the example.
Resources on this platform:

CPU: 200 MHz Cortex-M4F
RAM: 448 KB (some hardware-related code must be loaded into RAM to run, so about 280 KB is actually available)
Flash: 2 MB SPI flash

Environment construction and code compilation

First set up the AliOS Things development environment (Linux is used as the example), following the guide:
AliOS-Things-Linux-Environment-Setup

Then clone the latest AliOS Things source code from Alibaba's official open-source repository on GitHub (https://github.com/alibaba/AliOS-Things).
On Linux:

git clone git@github.com:alibaba/AliOS-Things.git

Switch to the master branch:

git checkout master

It is recommended to create your own development branch from master (replace xxx with your name):

git checkout -b dev-xxx

The environment is now installed and the code is ready; all that remains is to compile it and flash it to the board for testing.
Compile the Link Voice test application:

aos make linkvoiceapp@xr871evb xr871=1

Flashing the code:

cd platform/mcu/xr871/tools/

Modify the serial port configuration:

vim settings.ini

Set the serial port to the one your board is connected to (list the candidates with ls /dev/tty*), then save and exit.
[Figure: UART setting]
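Before flashing, set the boot-selection DIP switch on the Allwinner development board to the NO position, as shown:
[Figure: DIP switch setting for flashing]
Then run the following command to start flashing:

./phoenixMC_linux

When flashing is complete, move the DIP switch back to the position near the serial port and reboot the board.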

Open minicom or another serial terminal at 115200 baud to monitor the device's output.
Function demonstration:
On first power-up, configure the network:

netmgr connect ssid psswd

Replace ssid and psswd with your wireless network name and password.
Because there is no local wake-word recognition, every conversation currently has to be triggered with a button.
After the network is connected, wait for the following prompt in the terminal:
[Figure: "press key" prompt]
Press button 2 (AK2) to trigger voice recognition.
[Figure: "talking" prompt]
Now speak to the development board. The voice data is encoded and uploaded to the cloud; when recognition succeeds, the corresponding response is returned, and when it fails, a corresponding prompt is shown.

Examples:
1. Do I need to bring an umbrella to work tomorrow?
2. Tell me a ghost story.
3. Recommend a suspenseful movie.
4. Turn on the air conditioner.

Here is a small demo video:

http://v.youku.com/v_show/id_XMzQ1NjQ4MjIyOA==.html?spm=a2h3j.8428770.3416059.1

API introduction

1) Initialize

int pal_init(const struct pal_config *config);

Description: Initializes the SDK; it only needs to be called once.
Input parameters: config, a structure that passes the required parameters to the SDK
Returns: 0 for success; -1 for failure

2) Destroy

void pal_destroy();

Description: Destroys the SDK and releases its resources.
Input: none
Returns: none

3) Get SDK version

int pal_version();

Description: Returns the SDK version number.
Input: none
Returns: the SDK version number

4) Set the log level

void pal_set_log_level(int level);

Description: Sets the SDK log level. During development, set it to PAL_LOG_LEVEL_DEBUG to make problems easier to diagnose; once the device is stable, set it to PAL_LOG_LEVEL_ERROR before release.
Input parameters: level, the SDK log level
Returns: none
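The lifecycle calls above are usually wrapped once in application code. The C sketch below is one hypothetical way to do it: it picks the log level from a build flag (VOICE_RELEASE_BUILD is an invented macro), calls pal_init only once, prints pal_version, and pairs it with pal_destroy. The SDK functions are mocked, the PAL_LOG_LEVEL_* values and the contents of struct pal_config are placeholders for what the real SDK header defines, and calling pal_set_log_level before pal_init is our own choice rather than a documented requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ---- Mock SDK: names from the article, values and bodies are assumptions ---- */
#define PAL_LOG_LEVEL_DEBUG 0          /* placeholder value */
#define PAL_LOG_LEVEL_ERROR 3          /* placeholder value */

struct pal_config { int placeholder; }; /* real fields come from the SDK header */

static int mock_log_level = -1;        /* last level passed to the SDK */
static int mock_init_calls;            /* how many times pal_init ran */

void pal_set_log_level(int level) { mock_log_level = level; }

int pal_init(const struct pal_config *config)
{
    mock_init_calls++;
    return config != NULL ? 0 : -1;
}

void pal_destroy(void) { }
int pal_version(void) { return 1; }    /* arbitrary mock version number */

/* ---- Application-side bring-up and teardown ---- */
#ifdef VOICE_RELEASE_BUILD                 /* hypothetical build flag */
#define APP_LOG_LEVEL PAL_LOG_LEVEL_ERROR  /* release: errors only */
#else
#define APP_LOG_LEVEL PAL_LOG_LEVEL_DEBUG  /* development: verbose logs */
#endif

static int voice_started;              /* guards against a second pal_init */

/* Initializes the SDK once; returns 0 on success, -1 on failure. */
int voice_sdk_start(const struct pal_config *cfg)
{
    if (voice_started)
        return 0;                      /* pal_init only needs one call */
    pal_set_log_level(APP_LOG_LEVEL);
    if (pal_init(cfg) != 0)
        return -1;
    voice_started = 1;
    printf("Link Voice SDK version %d\n", pal_version());
    return 0;
}

/* Releases SDK resources if voice_sdk_start succeeded earlier. */
void voice_sdk_stop(void)
{
    if (voice_started) {
        pal_destroy();
        voice_started = 0;
    }
}
```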

5) Set up the environment

void pal_set_env(int env);

Description: Sets the SDK environment. The default is PAL_ENV_RELEASE, which connects manufacturers' devices in the field to Alibaba's online (production) environment, so manufacturers do not need to call this interface.
Input parameters: env, the SDK environment
Returns: none

6) The manufacturer's player reports messages to the SDK

int pal_notify_msg(const char *msg);

Description: The manufacturer's player status changes and key events must be reported to the SDK in the JSON format defined in Link_Voice_SDK_播控协议_v1.0.0.xlsx (the Link Voice SDK playback control protocol), so that the SDK and the upper-layer application stay in sync.
Input parameters: msg, the event message that the manufacturer's player passes to the SDK
Returns: 0 for success; -1 for failure

7) Pass the manufacturer's ALink messages through the SDK

int pal_post_alink_msg(const char *msg);

Description: During initialization, the SDK initializes ALink and maintains a persistent connection to the Alibaba platform. Messages that the manufacturer's device needs to report to ALink can be sent through this interface; the SDK forwards them unchanged to the ALink server.
Input parameters: msg, the message to forward to ALink, formatted according to the message format defined by ALink
Returns: 0 for success; -1 for failure

8) Start a speech recognition

int pal_asr_start();

Description: Call this when the device triggers voice recognition, either by a button press or by far-field wake-up.
Input: none
Returns: 0 for success; -1 for failure

9) Send voice data

int pal_asr_send_buffer(const char *buffer, int buffer_len);

Description: Call this interface to send voice data after pal_asr_start has succeeded. PCM data must be sent 640 bytes at a time. A return value of PAL_VAD_STATUS_STOP means the cloud has detected the end of speech; the manufacturer can then call pal_asr_stop or pal_asr_stop_async to get the recognition result.
Input parameters: buffer, the voice data; buffer_len, the length of the voice data in bytes
Returns: the VAD (voice activity detection) status detected by the cloud
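The 640-byte requirement is easiest to see as arithmetic. Assuming the recorder captures 16 kHz, 16-bit, mono PCM (the article does not state the format), 640 bytes is exactly 20 ms of audio: 16000 samples/s × 0.02 s × 2 bytes = 640 bytes. The C sketch below encodes that calculation and cuts a recording into 640-byte frames, zero-padding a short final frame; the padding policy and the helper names are assumptions, not SDK behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed capture format: 16 kHz, 16-bit (2-byte) samples, mono. */
#define SAMPLE_RATE_HZ   16000
#define BYTES_PER_SAMPLE 2
#define CHANNELS         1
#define FRAME_MS         20

/* 16000 / 1000 * 20 * 2 * 1 = 640 bytes per frame. */
#define PCM_FRAME_BYTES ((size_t)(SAMPLE_RATE_HZ / 1000 * FRAME_MS * BYTES_PER_SAMPLE * CHANNELS))

/* Compile-time check that the assumed format matches the 640-byte rule. */
typedef char pcm_frame_is_640_bytes[(PCM_FRAME_BYTES == 640) ? 1 : -1];

/* Number of frames needed to send len bytes (the last one may be partial). */
size_t pcm_frame_count(size_t len)
{
    return (len + PCM_FRAME_BYTES - 1) / PCM_FRAME_BYTES;
}

/* Copies frame i of a recording into out (at least PCM_FRAME_BYTES long),
 * zero-padding a short final frame so every pal_asr_send_buffer call gets
 * exactly PCM_FRAME_BYTES bytes. Padding with silence is an assumption. */
void pcm_get_frame(const char *pcm, size_t len, size_t i, char *out)
{
    size_t start = i * PCM_FRAME_BYTES;
    size_t n = 0;

    if (start < len)
        n = (len - start < PCM_FRAME_BYTES) ? len - start : PCM_FRAME_BYTES;
    if (n > 0)
        memcpy(out, pcm + start, n);
    memset(out + n, 0, PCM_FRAME_BYTES - n);
}
```

If your recorder uses a different sample rate or sample width, the compile-time check fails, which is a cue to resample or to confirm the expected input format with the SDK documentation.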

10) End the speech recognition (synchronous interface)

struct pal_rec_result* pal_asr_stop();

Description: Synchronous interface that returns the result of this speech recognition in a pal_rec_result structure. In the structure, status indicates the status of this recognition; should_restore_player_status indicates whether the manufacturer's player should restore its previous state after handling this recognition event (0: do not restore, 1: restore); asr_result holds the text recognized by ASR; and task_status indicates the status of the recognition task. PAL_REC_TASK_STATUS_END marks the end of a single recognition session, while PAL_REC_TASK_STATUS_WAITING indicates a multi-turn dialogue: after the TTS reply has been played, the device should automatically return to the listening state and start a new speech recognition.
Input: none
Returns: the speech recognition result structure
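The fields of pal_rec_result map naturally onto the device's next step. The hypothetical C helper below turns should_restore_player_status and task_status into application-level action flags. The ACTION_* flags are invented for this sketch, the struct is re-declared only with the fields named above, and the numeric values of the PAL_REC_TASK_STATUS_* constants are placeholders for the real ones in the SDK header.

```c
#include <assert.h>

/* Mock declarations (assumption): field names from the article, values invented. */
#define PAL_REC_TASK_STATUS_END     0
#define PAL_REC_TASK_STATUS_WAITING 1

struct pal_rec_result {
    int status;                        /* recognition status (not interpreted here) */
    int should_restore_player_status;  /* 1: restore the player afterwards */
    char *asr_result;                  /* recognized text */
    int task_status;                   /* END or WAITING (multi-turn) */
};

/* Hypothetical application-level action flags, not part of the SDK. */
#define ACTION_RESTORE_PLAYER 0x1
#define ACTION_LISTEN_AGAIN   0x2

/* Decides what the device should do once the reply (TTS) has been played. */
int next_actions(const struct pal_rec_result *r)
{
    int actions = 0;

    if (r->should_restore_player_status == 1)
        actions |= ACTION_RESTORE_PLAYER;  /* e.g. resume paused music */
    if (r->task_status == PAL_REC_TASK_STATUS_WAITING)
        actions |= ACTION_LISTEN_AGAIN;    /* multi-turn: call pal_asr_start again */
    return actions;
}
```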

11) End the speech recognition (asynchronous interface)

void pal_asr_stop_async(pal_asr_callback callback, void *user);

Description: Asynchronous version of pal_asr_stop; it returns the same result through a callback.
Input parameters: callback, the registered callback function used to return the speech recognition result; user, a user-defined pointer that is passed back to the caller in callback
Returns: none
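The article does not give the prototype of pal_asr_callback. Assuming it receives the result and the user pointer, the C sketch below shows the usual pattern: pass an application context as user, recover it inside the callback, copy out the text, and release the result with pal_rec_result_destroy. The mock pal_asr_stop_async invokes the callback immediately, struct voice_ctx is hypothetical, and having the callback own (and destroy) the result is an assumption to check against the SDK documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- Mock SDK (assumption): callback prototype and behaviour are guesses ---- */
struct pal_rec_result { char *asr_result; };
typedef void (*pal_asr_callback)(struct pal_rec_result *result, void *user);

void pal_rec_result_destroy(struct pal_rec_result *result) { free(result); }

void pal_asr_stop_async(pal_asr_callback callback, void *user)
{
    /* The real SDK presumably calls back later; this mock calls back at once. */
    static char text[] = "tell me a ghost story";
    struct pal_rec_result *r = calloc(1, sizeof *r);
    if (r != NULL)
        r->asr_result = text;
    callback(r, user);
}

/* ---- Application side ---- */
struct voice_ctx {                     /* hypothetical app state passed as user */
    int done;
    char text[128];
};

static void on_asr_result(struct pal_rec_result *result, void *user)
{
    struct voice_ctx *ctx = user;      /* recover our context from user */

    if (result != NULL && result->asr_result != NULL)
        snprintf(ctx->text, sizeof ctx->text, "%s", result->asr_result);
    ctx->done = 1;
    if (result != NULL)
        pal_rec_result_destroy(result); /* assumed: the callback releases it */
}
```

Usage: `struct voice_ctx ctx = {0}; pal_asr_stop_async(on_asr_result, &ctx);`. If the real SDK calls back from another task, the done flag should be replaced by a semaphore or a similar synchronization primitive.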

12) Cancel this speech recognition

void pal_asr_cancel();

Description: Cancel this speech recognition.
Input: none
Returns: none
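One situation where pal_asr_cancel is useful is a recognition that never ends, for instance when background noise keeps the cloud VAD from reporting PAL_VAD_STATUS_STOP. The C sketch below adds a hypothetical 10-second limit (MAX_UTTERANCE_MS is an application choice, not an SDK setting) and cancels once it is exceeded. It assumes 16 kHz, 16-bit mono PCM so that each 640-byte frame is 20 ms, and the SDK calls are mocked.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Mock SDK (assumption): names from the article, behaviour invented ---- */
#define PAL_VAD_STATUS_STOP 1          /* value assumed */

static int mock_cancelled;

int pal_asr_start(void) { mock_cancelled = 0; return 0; }

/* This mock cloud never detects the end of speech (e.g. constant noise). */
int pal_asr_send_buffer(const char *buffer, int buffer_len)
{
    (void)buffer;
    (void)buffer_len;
    return 0;
}

void pal_asr_cancel(void) { mock_cancelled = 1; }

/* ---- Application-side guard ---- */
#define PCM_FRAME_BYTES  640
#define FRAME_MS         20            /* assumes 16 kHz, 16-bit mono PCM */
#define MAX_UTTERANCE_MS 10000         /* hypothetical limit chosen by the app */

/* Streams frames. Returns 1 if the cloud reported end of speech (call
 * pal_asr_stop next), 0 if the recording ended first, or -1 if starting
 * failed or the recognition was cancelled for running too long. */
int stream_with_limit(const char *pcm, size_t pcm_len)
{
    size_t off;
    int elapsed_ms = 0;

    if (pal_asr_start() != 0)
        return -1;
    for (off = 0; off + PCM_FRAME_BYTES <= pcm_len; off += PCM_FRAME_BYTES) {
        if (elapsed_ms >= MAX_UTTERANCE_MS) {
            pal_asr_cancel();          /* abandon this recognition */
            return -1;
        }
        if (pal_asr_send_buffer(pcm + off, PCM_FRAME_BYTES) == PAL_VAD_STATUS_STOP)
            return 1;
        elapsed_ms += FRAME_MS;
    }
    return 0;
}
```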

13) Destroy speech recognition results

void pal_rec_result_destroy(struct pal_rec_result *result);

Description: The speech recognition results returned by pal_asr_stop and pal_asr_stop_async need to be released through this interface.
Input parameters: result, the result to be destroyed
Returns: none

14) Text-to-speech (synchronous interface)

struct pal_rec_result* pal_get_tts(const char *text);

Description: Provides the function of text-to-speech. The returned result pal_rec_result needs to be destroyed by pal_rec_result_destroy.
Input parameters: text, the text to be converted
Returns: a result structure whose tts field holds the text-to-speech output as a playable URL
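A typical use of pal_get_tts is to have the device speak a locally generated sentence. In the C sketch below, player_play_url is a hypothetical hook into the manufacturer's player, pal_get_tts is mocked to return a fixed example.com URL, and the result is always released with pal_rec_result_destroy, as the description requires. The struct is re-declared with only the tts field.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- Mock SDK (assumption): only the tts field named in the article ---- */
struct pal_rec_result { char *tts; };

struct pal_rec_result *pal_get_tts(const char *text)
{
    static char url[] = "http://example.com/tts.mp3";  /* fake URL for the mock */
    struct pal_rec_result *r;

    if (text == NULL || text[0] == '\0')
        return NULL;                   /* mock behaviour: nothing to speak */
    r = calloc(1, sizeof *r);
    if (r != NULL)
        r->tts = url;
    return r;
}

void pal_rec_result_destroy(struct pal_rec_result *result) { free(result); }

/* Hypothetical player hook; it copies the URL so it survives destroy. */
static char last_played[256];

static int player_play_url(const char *url)
{
    snprintf(last_played, sizeof last_played, "%s", url);
    return 0;
}

/* Speaks text through the device player; returns 0 on success, -1 otherwise. */
int speak(const char *text)
{
    struct pal_rec_result *r = pal_get_tts(text);  /* synchronous call */
    int rc = -1;

    if (r == NULL)
        return -1;
    if (r->tts != NULL)
        rc = player_play_url(r->tts);
    pal_rec_result_destroy(r);         /* release even if there was no URL */
    return rc;
}
```

Because the URL lives inside the result structure, a player that streams asynchronously should copy the URL before the result is destroyed, as the mock hook does.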

For more information about AliOS Things, please refer to https://github.com/alibaba/AliOS-Things/wiki
For more information about Link Voice, please refer to https://iot.aliyun.com/product/voice?spm=a2c2j.8959409.5007732.14.666018deKqxNU7

Playing with intelligent voice on AliOS Things