[blog] Avoid These 11 Common Mistakes When Building Voice Applications 语音应用开发中的 11 个常见错误

链接：https://voicebot.ai/2017/06/02/avoid-11-common-mistakes-building-voice-applications/

Now that voice applications such as Alexa and Cortana Skills and Google Actions are proliferating, there is increased discussion about best practices and common mistakes. Voicebot reached out to 11 experts in voice application design and development to find out what you should avoid. Our goal? Fewer voice applications that suck. Here is the question and our expert answers.

随着基于 Alexa、Cortana Skills 及 Google Actions 的语音应用数量的快速增加，语音应用开发中的最佳实践与常见错误，成为了一个热点议题。Voicebot 采访了 11 位语音应用设计和开发领域的专业人士，向他们请教开发者应该避开的误区。本文的目的在于使世界上少一些难用的语音应用。下面是问题及专家的回答。

What is a common mistake that voice application publishers make that undermines the quality of their apps?

开发者普遍会犯什么样的错误，导致语音应用的用户体验糟糕？

Jess Williams, Opearlo
Not running a beta test before publishing the voice app. The new Amazon beta testing tool makes this really easy and it’s a great way of identifying any weaknesses in the voice interaction model.

发布语音应用之前，不做 beta 测试。Amazon 新推出的 beta 测试工具让这件事变得非常简单，这也是发现语音交互模型缺陷的好方法。

Dan Whaley, Sabio
Number one issue is with the use case. Failing to think about whether the task suits voice as a medium. Many tasks are either too complex or rely on visual hierarchies and relationships that make it difficult for the user to understand the feedback and achieve their goal on a voice interface. A classic example is flight bookings. A good rule of thumb is, if you can’t (or wouldn’t want to) do a task while washing up, it’s probably not a great use case for voice.

首要的问题在于“应用场景”：开发应用前，没有考虑特定任务是否适合用语音的方式完成。许多任务不是过于复杂，就是依赖于视觉上的层次关系。这使得用户难以通过语音接口去理解系统的反馈，最终导致任务失败。典型的例子是机票预订。关于应用场景的判断，一个比较好的经验法则是，如果你不能（或不愿意）一边洗碗一边做某件事，那么这项任务很可能就不适合做成语音应用。

Paul Burden, Our Voice
Voice assistants respond almost immediately to simple queries like the time of day, weather, today’s headlines, and so on. However, when interacting with a skill that requires more time to respond, users can get impatient. As such, they frequently interrupt the process by blurting out, “Alexa.” This of course terminates the previous process as the Echo awaits a new command.

Here’s a specific example. The Washington Metropolitan Area Transit Authority (WMATA) has a fantastic skill that provides real-time updates on the Metro subway system. To query WMATA, the user says, “Alexa, ask DC Metro when the next train arrives at the Bethesda stop.” The Echo usually responds in 8 or 9 seconds. It doesn’t sound like much time, but users often interrupt the process. I’d recommend adding functionality enabling Alexa to respond to the interruption by saying something like: “Do you still want me to get the Metro information for you?”

对于简单的任务（例如，查询时间、天气、热点新闻等），语音助手能够很快地回复。但是，有些复杂的任务则需要较长的处理时间，用户可能会缺乏足够的耐心。这时，他们常常会脱口喊出“Alexa”，从而打断处理进程。对于 Echo，这会中断当前进行中的任务，转而等待新的指令。

举个具体的例子。“华盛顿大都会地区运输管理局”（Washington Metropolitan Area Transit Authority，WMATA）开发了一款非常棒的技能，能够查询地铁系统的实时状况。想要查询地铁信息，用户只需要说：“Alexa, ask DC Metro when the next train arrives at the Bethesda stop.” Echo 通常需要 8、9 秒的时间回应。8、9 秒似乎不是特别久，但是用户经常打断处理进程。我建议这项技能增加一个设计，在 Alexa 被打断时，向用户确认是否中止查询进程（例如：“Do you still want me to get the Metro information for you?”）。

Jo Jaquinta, Tza Tza Tzu
Verbosity. You can’t fast-forward, skim, or skip past content in an audio stream. You have to listen to all of it. So when a voice app drones on with an overly long response, the user has to either resentfully sit through it, abort the app, or just forget the first half of what is said. Replies have to be terse and crisp, and deliver the most compelling information possible. But they should also allow access to more information to let people dive down on precisely what they want (You can learn more on this topic through Jo’s tutorial).

话痨。对于音频流，你不能快进、略读或跳过。你必须把音频内容全部听完。因此，当语音应用扔出一段非常长的语音回复时，用户要么怨念地等着音频播完，要么直接退出应用，要么听了后半段却忘了前半段。语音应用的回复要简洁明了、传达信息要点。但同时，用户如果愿意，他们应该有办法进一步了解更精确、详细的信息。

Scott Werner, SaySpring
As far as common mistakes go, I think the biggest one I made early on in a project, and the one I see a lot of people make when they get started, is not prompting their users well with what they’re able to do. When I’m building and spending a lot of time thinking about an application, a question like “what would you like to do next?” or “what else?” is fine to me, because I’ve memorized all the available things my skill or action can do. But most users don’t have that kind of familiarity with the skill and can easily get lost.

在很多常见错误中，我早期（以及许多新手）在项目中犯过的最大的一个是：没能让用户了解，他们能用这款语音应用做些什么。开发者曾经花了大量时间思考、构建应用，对于应用的技能烂熟于心。因此，当应用反馈“你接下来想要做什么？”或“还有别的吗？”，开发者知道应该如何回复。但是，对于普通用户而言，他们并不熟悉这些技能，很容易如堕五里雾中。

Adam Marchick, VoiceLabs
I think an understandable issue is that voice application publishers try to do too many things in version one. Given that conversational AI is so powerful, it is enticing to try to offer 10+ features and functions.

My recommendation is to release 1-3 key features that you think will resonate and build a daily or weekly habit, then analyze the Voice Pathing to see how your consumers are successfully navigating the skill, evaluate your Speech Finder to learn both where people are getting confused and what additional functionality people want, and plan to evaluate your retention metrics weekly. Many of our users are figuring out how to use these two screens together to quickly improve their application. Once you have a successful ‘sticky’ user experience, then add your next five features. Voice programming is much more iterative than mobile or web.

我认为一个常见的（但可以理解的）错误在于，语音应用的开发者，试图在第一版应用中塞进太多的东西。现在的“对话 AI”非常强大，以至于开发者情不自禁地想在应用中一口气添加 10 项甚至更多的功能。

我的建议是，在语音应用的第一版中，提供 1 到 3 项关键功能，满足用户的潜在高频需求（每天或每周都会使用的）。然后，在运营过程中，不断分析总结，哪些功能设计是成功的，哪些地方会使用户困惑，哪些新技能是用户希望增加的。在用户对已有功能产生黏性的基础上，逐渐增加更多的功能。相比于网页或移动应用开发，语音应用开发更加具有迭代性。

Pat Higbie, XAPPmedia
Giving users too many choices on any single turn of a conversation. Voice is different from web and mobile because the user needs to remember the choices provided in order to respond effectively. So, it’s important to limit the choices on any single turn in the conversation to 3 or 4 at the most.

在一轮对话中给用户太多的选项。不同于网页和手机，语音交互场景下，用户需要记住所有的选项才能进行有效的回复。因此，每轮对话提供的选择不能超过 3 项或 4 项。
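One way to apply the 3-or-4 limit is to split a long list of choices into small groups and offer each group on its own turn. The following is only a rough sketch in plain Python, not tied to any voice platform's API; the function names, the `max_per_turn` default, and the station list are made up for illustration.

```python
def chunk_options(options, max_per_turn=3):
    """Split a list of choices into groups small enough to remember."""
    return [options[i:i + max_per_turn]
            for i in range(0, len(options), max_per_turn)]


def build_prompt(group, has_more):
    """Render one turn: a few spoken choices, plus an offer to hear more."""
    if len(group) == 1:
        spoken = group[0]
    else:
        spoken = ", ".join(group[:-1]) + " or " + group[-1]
    prompt = f"You can choose {spoken}."
    if has_more:
        prompt += " Or say 'more' to hear other options."
    return prompt


# Placeholder sample data, not taken from any real skill.
stations = ["Bethesda", "Dupont Circle", "Metro Center", "Union Station", "Rosslyn"]
turns = chunk_options(stations)
for index, group in enumerate(turns):
    print(build_prompt(group, has_more=index < len(turns) - 1))
```

Each turn then reads out at most three options, and the remaining ones are only spoken if the user asks for them.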

Stephane Nguyen, Assist
Voice application publishers try to build tree flows, which function like IVR systems. This is not how people converse in general, especially over voice. How many times have you changed the subject during a conversation with your friend: “Oh by the way, I think I don’t want to go to this restaurant anymore”, while talking about another thing. This comes back to the article Shane (our CEO) has published. You should never have to “go back” to the restaurant step to change your mind.

语音交互逻辑通常是基于树状流程开发，类似于互动式语音应答（IVR，Interactive Voice Response）系统（图1）。但是，人类实际的交流过程（特别是语音交流）是非常跳脱的。有多少次，你正在和朋友谈论热映中的电视剧时，朋友突然来了句“哦，对了，晚上我不想吃麻辣烫了，去吃披萨吧”。你当然不需要再重走一遍“订餐”流程来改变心意。

图1. IVR 系统示例【src】
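A common alternative to a fixed tree is to keep the conversation state as a set of slots that the user may fill or overwrite on any turn. The sketch below assumes a made-up dinner-booking scenario; `DinnerPlan`, `REQUIRED_SLOTS`, and the slot names are hypothetical and do not come from Assist or any particular framework.

```python
from dataclasses import dataclass, field

# Hypothetical slots for an illustrative dinner-booking conversation.
REQUIRED_SLOTS = ("restaurant", "time", "party_size")


@dataclass
class DinnerPlan:
    slots: dict = field(default_factory=dict)

    def update(self, **heard):
        """Accept any slot on any turn, including one that is already filled."""
        for name, value in heard.items():
            if name not in REQUIRED_SLOTS:
                raise KeyError(f"unknown slot: {name}")
            self.slots[name] = value

    def next_prompt(self):
        """Ask only for what is still missing; never force a step backwards."""
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return f"What {name.replace('_', ' ')} would you like?"
        return "Booking {restaurant} at {time} for {party_size}.".format(**self.slots)


plan = DinnerPlan()
plan.update(restaurant="the noodle bar")
plan.update(time="7 pm")
# The user changes their mind mid-conversation; the earlier time is kept.
plan.update(restaurant="the pizza place")
print(plan.next_prompt())
```

Because the restaurant is just another slot, changing it does not reset the time that was already given, and the next prompt simply moves on to whatever is still missing.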

Ahmed Bouzid, Witlingo
They don’t fully understand that the voice medium is fundamentally different from the visual/tactile interface. In the visual interface, you can have as many options as needed, and you have devices such as drop downs, checkbox lists, radio buttons, and images to guide the users. The list can be as long as it can be and the cognitive load on the user is minimal. They can quickly scan the options and make a decision. In voice, the spoken word is linear, and so:

Users have to wait for the options to be read one at a time
They need to remember what the options are, [which means] that users need to remember, creating cognitive load
Users can’t find out about the meaning of an option without disrupting the flow (no hovering over a question mark to find out what the option means)

Worse: Voice is ephemeral and doesn’t persist – so you can’t easily pause the interaction and get back to it later. For example, you are leisurely filling out a mortgage application form on your comfortable laptop and someone rings the bell. In the visual interface, you can just go ahead and answer the door and then come back and pick things up where you left off. A quick scan tells you where you were and what’s left to do. In voice, you just can’t do that without enduring a horrible experience.

开发者并不完全理解，语音作为一种交流媒介，同视觉和触觉有着本质上的不同。在图形界面下，我们可以使用诸如下拉菜单、复选框列表、单选按钮或图片等方式来指导和帮助用户，从而可以提供任意多的选项供用户选择。用户也没有过多的认知负担，他们可以快速地浏览各个选项，然后做出决定。但在语音交互中，信息（文字）是线性、依次传达给用户的，因此：

用户每次只能听取一个选项。

用户必须记住所有的选项，而这会产生认知负担。

用户希望澄清某个选项时，必须打断当前的播报流程；而在图形界面中，使用 hover 的方式可以对选项提供解释。

更糟糕的是：语音是暂态的，转瞬即逝。因此，暂停进行中的交互流程，然后在适当时候恢复，是难以实现的。例如，你正在听着小曲，逛着淘宝时，门铃响了。在图形界面下，你只需要离开电脑去应门；取完外卖后，你可以接着逛逛逛，买买买。在语音接口下，相同场景的使用体验不要太糟心了，有没有。

Nick Schwab, Independent Developer
Giving users a lengthy response is the most common pitfall that I see in voice applications. For most apps (with the exception of games and news-related apps), users don’t want to be forced into hearing more information than they need. Asking the user whether they would like additional information is often better than assuming that all your users will want the extra information.

在我见到过的语音应用中，最常见的错误是给用户一个冗长的语音反馈。对大部分应用（游戏和新闻类除外）而言，用户不想被迫听需求之外的信息。相比于无选择性地将信息抛给用户，询问用户是否需要更详细的额外信息是一种更好的做法。
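The "short answer first, detail on request" pattern can be sketched as a two-turn exchange. This is a minimal illustration in plain Python, assuming a hypothetical `respond` helper and made-up sample text; a real skill would route the yes/no answer through its platform's intent handling.

```python
def respond(summary, details, wants_more=None):
    """Return (speech, expects_reply) for one turn.

    wants_more is None on the first turn, then True or False once the
    user has answered the follow-up question.
    """
    if wants_more is None:
        if details:
            return f"{summary} Would you like more detail?", True
        return summary, False
    if wants_more:
        return details, False
    return "Okay.", False


# Made-up sample text for illustration only.
summary = "The next train to Bethesda arrives in 4 minutes."
details = "After that, trains arrive in 12 and 20 minutes."
print(respond(summary, details))
print(respond(summary, details, wants_more=True))
```

The first turn stays brief, and the longer response is only spoken when the user opts in.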

John Kelvie, Bespoken
Not keeping a close eye on their voice apps once they go live. Outages lead to bad reviews from customers, and no publisher wants that – whether they are a hobbyist, brand or enterprise. Even if the cause of the outage is outside the developer’s control, knowing what is happening and mitigating or resolving it as quickly as possible is essential.

应用一旦上线，便疏于管理。服务中断会招致用户的差评，这是任何发布者（不论是业余爱好者、品牌方还是企业）都不愿见到的。即使故障的原因不在开发者的控制范围之内，搞清事故的来龙去脉，及时缓解或解决问题才是王道。
