"Voice First": Interactive Interface Design and Voice Robot Design Driven by Intelligent Voice Technology (Translator's Preface)...

"Words are the voice of the heart, and words are the state of mind." Language and dialogue are important ways for us to communicate and collaborate. Intelligent voice technology is a form of voice interaction built on artificial intelligence and natural language processing. It converts the user's spoken instructions into text through speech recognition, analyzes and understands that text through natural language processing, and finally generates a response or performs the corresponding operation.
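The "understand the text, then respond" part of this flow can be sketched in a few lines. This is only an illustration under stated assumptions: speech recognition is assumed to have already produced the text, and the names `parse_intent` and `respond`, the intent labels, and the regular-expression rules are all hypothetical stand-ins for a real natural language understanding model.

```python
import re

# Hypothetical rule table: intent name -> pattern with named slots.
# A real system would use a trained NLU model rather than regular expressions.
INTENT_PATTERNS = {
    "set_timer": re.compile(r"set (?:a )?timer for (?P<minutes>\d+) minutes?"),
    "play_music": re.compile(r"play (?P<song>.+)"),
}


def parse_intent(text):
    """Map recognized text to (intent, slots); ("unknown", {}) if no rule matches."""
    normalized = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.fullmatch(normalized)
        if match:
            return intent, match.groupdict()
    return "unknown", {}


def respond(intent, slots):
    """Generate a text response for the understood intent."""
    if intent == "set_timer":
        return f"Timer set for {slots['minutes']} minutes."
    if intent == "play_music":
        return f"Playing {slots['song']}."
    return "Sorry, I did not understand that."


# Text as it might come out of speech recognition (assumed input).
recognized = "Set a timer for 10 minutes"
intent, slots = parse_intent(recognized)
print(respond(intent, slots))
```

Keeping understanding (`parse_intent`) separate from response generation (`respond`) mirrors the separation of steps in the paragraph above, so either part could be swapped out without touching the other.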

Although intelligent voice technology has been around for a long time, it was not until the arrival of the Amazon Echo smart speaker that it once again attracted widespread attention across the industry. Such smart speakers give people a user experience called "voice first", offering a more convenient and efficient way to interact.

So, what is "voice first"?

Voice first means treating voice interaction as the primary user interface when designing a product or service, so that users can complete operations through voice commands. This design approach can improve the user experience, especially when the user's hands are busy, because voice is then the more convenient and faster way to interact. The advantages of voice-first interaction are:

The first is speed: a person can speak 120-150 words per minute.

The second is hands-free operation: you can get things done by voice while, for example, cooking.

The third is intuitiveness: language is an innate human ability and a natural means of human communication.

The fourth is empathy: speech carries tone, volume, intonation, and speed, and these characteristics convey a large amount of information.

Intelligent voice interaction is inseparable from artificial intelligence technology. The artificial intelligence technology involved in voice-first interaction is shown in the figure below.

[Figure: artificial intelligence technologies involved in voice-first interaction]

Today, behind the intelligent voice devices that serve us, there is a whole chain of technologies and processes: voice wake-up, then automatic speech recognition, then natural language understanding, and finally feedback through natural language generation and speech synthesis. The whole process is supported by many specialized artificial intelligence technologies, such as dialogue management, deep learning, DNN, CNN, NLP, and TTS.
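The order of these stages can be made concrete with a toy pipeline. This is a minimal sketch, not how any real device is implemented: every stage is a stub, the "audio" is already a string, the wake word `hello speaker` and all function names are hypothetical, and dialogue management is omitted for brevity.

```python
from dataclasses import dataclass, field

WAKE_WORD = "hello speaker"  # hypothetical wake word, chosen for illustration


@dataclass
class Turn:
    """State carried through one voice interaction turn."""
    audio_text: str            # stand-in for raw audio; here it is already a string
    transcript: str = ""
    intent: str = ""
    reply_text: str = ""
    reply_audio: bytes = b""
    log: list = field(default_factory=list)


def wake_up(turn):
    # Voice wake-up: continue only if the utterance starts with the wake word.
    turn.log.append("wake-up")
    return turn.audio_text.lower().startswith(WAKE_WORD)


def recognize(turn):
    # Automatic speech recognition (stub): strip the wake word, keep the rest.
    turn.log.append("ASR")
    turn.transcript = turn.audio_text[len(WAKE_WORD):].strip(" ,.").lower()


def understand(turn):
    # Natural language understanding (stub): keyword lookup instead of a model.
    turn.log.append("NLU")
    turn.intent = "get_weather" if "weather" in turn.transcript else "unknown"


def generate(turn):
    # Natural language generation (stub): canned replies per intent.
    turn.log.append("NLG")
    replies = {"get_weather": "Here is today's weather."}
    turn.reply_text = replies.get(turn.intent, "Sorry, I did not catch that.")


def synthesize(turn):
    # Speech synthesis (stub): encode text as bytes in place of audio.
    turn.log.append("TTS")
    turn.reply_audio = turn.reply_text.encode("utf-8")


def run_turn(audio_text):
    turn = Turn(audio_text=audio_text)
    if not wake_up(turn):
        return turn  # device stays idle without the wake word
    for stage in (recognize, understand, generate, synthesize):
        stage(turn)
    return turn


turn = run_turn("Hello speaker, what is the weather today?")
print(" -> ".join(turn.log))
print(turn.reply_text)
```

Each stage reads what the previous one wrote into the shared `Turn` object, which is one simple way to show that an error early in the chain (for example, a misrecognized word) propagates to everything after it.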

I was fortunate to join Baidu in 2017, when the company declared itself "All in AI", and I was responsible for the research and development of smart speakers. My teammates and I endured 88 grueling days and finally delivered Baidu's first smart speaker, Raven-H. Later, I also took part in the research and development of Xiaodu speakers, Xiaodu Home, and other products. After that, as the chief evangelist of DuerOS, I was responsible for building the DuerOS ecosystem. The DuerOS open platform gives developers tools for building intelligent voice applications, which makes developing intelligent voice services more convenient. Google and Amazon abroad, and Chinese manufacturers such as Xiaomi and Alibaba, also run developer communities similar to the DuerOS open platform.

When developers build intelligent voice services on the various open voice platforms, they generally lack an understanding of intelligent voice interaction design, especially of the scenarios where voice-first design applies and of what makes it unique. As an evangelist, I very much wanted to write a book on the design and implementation of intelligent voice interaction, but for various reasons I was never able to.


This is exactly that kind of book. It not only explains dialogue technology in an accessible yet in-depth way, but also walks us step by step through the details of intelligent voice interaction design. It is a true design guide and practical handbook. I thank the Machinery Industry Press for letting me take part in translating this book, which made up for my regret.

The translation team was born from a meeting of several kindred spirits. Wang Tonglin and Lu Jian are senior product managers with more than ten years of rich product design experience, and their strong thirst for knowledge was what drove them to join the team. Rigorous writing and careful verification ran through our entire translation process. Even so, I still feel as if I am walking on thin ice. If there are any inaccuracies in the translation, corrections are welcome.


Origin blog.csdn.net/wireless_com/article/details/133980536