Connecting to the Xiaodu App: Baidu's DMA + Bluetooth voice solution

Preface

  Human-computer interaction (HCI) has gone through three stages: mouse and keyboard, touch screen, and voice interaction. Abroad, the competition among Google, Amazon, Apple and other giants has reached a white-hot state; in China, Baidu's DuerOS, with its early entry and heavy investment, has become the standard-bearer of voice interaction. In both technical strength and pace of commercialization, it is at the forefront of domestic AI companies. For a company that wants to work on voice AI, following Baidu's lead is a viable path.

The current state of Bluetooth smart voice

  At present, most Bluetooth audio devices face many problems. They must be triggered by a touch or a button press, which is very inconvenient in a car; with voice interaction devices, the voice input delay is obvious, and there are even App crosstalk problems. As a result, voice devices based on the current Bluetooth protocols have been heavily criticized.
 Most of this comes down to the limited chip resources and cost of Bluetooth devices: wake words cannot be introduced, so there is no hands-free experience; playback and voice input are implemented with A2DP and HFP; and there is no standard protocol for Bluetooth voice services, so voice input occupies the phone's call recording channel, resulting in a very poor user experience.

DMA protocol

 To deliver a better voice interaction experience, Baidu has opened up the DMA Bluetooth protocol. Third-party solution vendors and product companies can adopt it and use it together with the Xiaodu app.
 What is DMA? The DMA (DuerOS Mobile Accessory) protocol optimizes the voice solution through technical choices in three areas: for Bluetooth transport, it uses dual-mode BLE and RFCOMM; for audio compression, it recommends Opus, which does not degrade the voice interaction result; for wake-up, it supports touch, buttons, wake words and other interaction methods.
 When the peripheral receives a request and replies with information such as its version, the transport is chosen as follows: if the phone supports only BLE, pairing goes over BLE; if it supports RFCOMM, RFCOMM is chosen for pairing.
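 The selection rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the DMA SDK: the names `Transport` and `choose_transport` and the two boolean parameters are hypothetical, and the error case for a phone that supports neither transport is an assumption.

```python
from enum import Enum


class Transport(Enum):
    """Hypothetical labels for the two transports named in the text."""
    BLE = "BLE"
    RFCOMM = "RFCOMM"


def choose_transport(phone_supports_ble: bool, phone_supports_rfcomm: bool) -> Transport:
    """Pick the pairing transport following the rule described in the text."""
    # RFCOMM is chosen whenever the phone supports it.
    if phone_supports_rfcomm:
        return Transport.RFCOMM
    # If the phone supports only BLE, pair over BLE.
    if phone_supports_ble:
        return Transport.BLE
    # Assumption: the text does not cover this case, so it is treated as an error.
    raise ValueError("phone supports neither BLE nor RFCOMM")


print(choose_transport(phone_supports_ble=True, phone_supports_rfcomm=False).value)
print(choose_transport(phone_supports_ble=True, phone_supports_rfcomm=True).value)
```

 A real accessory would learn the phone's capabilities from the pairing exchange rather than from two booleans; the sketch only fixes the order of preference.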

Baidu's business model

 Baidu is opening this up because it wants to use voice to seize the traffic entry point. It is unwilling to get into hardware, which is not its area of expertise; for Baidu, electronic hardware is a hard business to make money in, far less profitable than selling advertising.
 Baidu offers its services through the Xiaodu App, and provides customized solutions and device support through the DMA SDK and source code. The aim is to open up the whole industry chain so that every device can go online using the Xiaodu app's services. There are two main solutions under this model:
 Basic version: taking the car Bluetooth cooperation program as an example, Baidu first provides a PCBA board that partners can develop on. If a vendor needs only the basic voice interaction functions plus 5 W wireless charging, this solution can deliver them within a few days;
 Customized version: if more customization is needed, Baidu provides a function baseboard based on the DMA protocol and a Bluetooth module; the baseboard exposes open interfaces for custom feature development. Third-party solution companies can also provide the Baidu solution.
 Of course, by Baidu's own account, custom development is limited to very large customers. If you are an SME and want Baidu's help with custom development, it is likely to be very difficult.

User experience

 DMA effectively solves three pain points of the classic Bluetooth protocols:

1 The headset must be operated with buttons: with DMA, the device can be woken by voice in real time to make phone calls, play music and perform other specified functions.
2 Low local wake-word recognition rate: with semantic-based wake-up and an optimized model, the wake-up rate can reach 97% in complex environments.
3 Obvious voice input delay: the DMA protocol uses a BLE / RFCOMM channel rather than A2DP alone, and the two can be used simultaneously. This effectively reduces the time spent in Bluetooth encoding and decoding and greatly cuts audio delay, from 500ms ~ 2000ms down to 200ms ~ 300ms, a qualitative improvement in the real-time voice interaction experience.
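 To put the delay figures in proportion, the short calculation below derives the smallest and largest relative reduction implied by the two quoted ranges. The function name `reduction_range` is hypothetical; only the four millisecond values come from the text.

```python
def reduction_range(before_ms, after_ms):
    """Return (smallest, largest) fractional delay reduction for two (low, high) ranges."""
    before_low, before_high = before_ms
    after_low, after_high = after_ms
    # Smallest reduction: best case before vs. worst case after.
    smallest = 1 - after_high / before_low
    # Largest reduction: worst case before vs. best case after.
    largest = 1 - after_low / before_high
    return smallest, largest


# Delay ranges quoted in the text, in milliseconds.
smallest, largest = reduction_range((500, 2000), (200, 300))
print(f"{smallest:.0%} to {largest:.0%}")  # prints "40% to 90%"
```

 In other words, the quoted ranges correspond to a delay reduction of between 40% and 90%, depending on which ends of the ranges are compared.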


Origin www.cnblogs.com/dylancao/p/12116161.html