Advice on smart product design

We will finally communicate with machines in a natural way

At the 2012 Sundance Film Festival, the film "Robot and Frank" won a special award. The film tells the story of a robot taking care of Frank, who has Alzheimer's disease. Consider two stills from the film: one shows communication between two people, the other shows communication between a person and a machine.

In 2017, the year of the AI boom, the question is whether people can communicate with machines the way they communicate with each other, as Frank does in the film. In the discipline of human-computer interaction, this is defined as "natural human-computer interaction".

What is natural interaction? In short, it means interacting with a computer the way we communicate every day: through speech, body movement, gestures, gaze, facial expressions and other forms.

Human-computer interaction is reaching a new level

The development of human-computer interaction (HCI) is a process that moves from humans adapting to computers toward computers continually adapting to humans. It is divided into four stages: code instruction interaction, graphical user interface interaction, natural human-computer interaction and emotional human-computer interaction. [Quoted from "Human-Computer Emotional Interaction"]

Each stage builds on advances in technology, which make human-computer interaction more intuitive and bring it closer to natural interaction between people. At the same time, each stage opens up more usage scenarios and reaches more people, until all age groups are covered.

As shown in the figure below: with instructions, professional technicians operated early computers; with mouse and keyboard, educated ordinary people use PCs for study and work; with touch screens, a much wider group of people use smartphones for socializing, information, entertainment and so on; with natural interaction, everyone will interact with smart products through natural behavior.

With the development of AI technology, smart products are becoming stronger at the perception level. They can perceive people's voice, body language, gestures, facial expressions, gaze and more, which makes natural human-computer interaction possible. This is already happening.

The future trend is for smart products to have affective computing capabilities. By recognizing a person's voice, facial expressions, body movements and so on, a product can adjust its own feedback to meet that person's needs at that moment. Interaction becomes easier, and the product understands you better.

Smart products can perceive people's natural movements and understand people's emotions, both of which belong to the level of information input.

At the information output level, how should smart products be designed so that interaction feels like "natural communication between people"? We give design suggestions along six dimensions: persona (character setting), appearance, voice, action, interface and light effects. These correspond to the temperament, appearance, voice, body language, facial expressions and gaze of the other person in a human conversation.

Design suggestions for smart products with voice interaction as the core function

1. Design suggestions for the persona (character setting)

① The persona exists to serve users;

Persona design is a high-level design of the product's image. It should not be driven by the designer's personal preferences; it must fully consider the target users the product serves. For example, patients want to see an expert doctor, passengers want attentive service from flight attendants, and diners want a warm, hospitable greeter at the door. These images come readily to mind.

For example, Amazon Echo presents the image of a mature professional woman (as do Google Home, Tmall Genie, Jingdong Dingdong and others), while Olly conveys a sense of trendy design. Each sets its image for its own target audience.

② The persona can be conveyed through abstract means, not necessarily concrete ones;

For example, "Xiao Ai classmate", the persona defined for Xiaomi's smart speaker, is an anime-style ("two-dimensional") character. At the end of November this year, a limited-edition figurine will be released, turning the character into a concrete physical object.

Opinions on this differ. Some people say that she is not the "Xiao Ai" they had in mind. Therefore, when communicating the persona, we suggest using artistic means such as music, painting, literature and film. The persona is packaged and abstracted into a set of visuals, to reach the realm of "no one is seen in the empty mountain, yet voices are heard".

③ The persona and the product should be designed as a whole;

For smart products with voice interaction as the core function, the "voice" leads users to associate it automatically with a corresponding image. The voice must therefore also match the product's appearance in order to meet user expectations.

Some smart products have motion output. Jibo, for example, has a persona defined as cute, so its movements should be playful and cute as well.

If this is not considered carefully, the result is cognitive dissonance. For example, when a user asks the product Xiaoyu Zaijia how old it is, it answers "I am two years old this year" in its mature female voice, whereas Amazon Echo answers "Counting the way humans count age from birth, I am two years old this year."

The latter is easier to accept and understand. Even a trivial line of text can pull users out of the illusion, so the persona should be designed together with every information output channel of the product.

2. Design suggestions for appearance

① Fully consider the aesthetics and preferences of target users;

Create an appearance that the target customer group likes. For example, children will like Jibo more than Echo, and fashion-conscious people prefer the Raven R because they can sing and dance with it.

Unlike with screen-based smartphones, users cannot switch themes or skins to suit their own taste. Google Home can only "change its pants" (swap its base) to match user preferences and home styles.

② Base the form design on usage scenarios;

Consider the real environment in which the product will be used. At present, most smart voice products on the market sit on a table or desk, so their size must be carefully considered. If the product is meant for multiple usage scenarios, it must be portable.

For example, the "dot-matrix touch screen" cover on top of the Raven H can be easily detached, and users can interact with it by voice and by touch, so it is not tied to a fixed position.

 

③ Avoid falling into the Uncanny Valley;

Avoid excessive resemblance to human features. We recommend extracting anthropomorphic elements in an abstract way for design expression. This helps the product convey emotional information to users and effectively increases their affinity for it.

For example, NOMI, the AI assistant in the NIO (Weilai) ES8, and Baidu's Dumi both use this design approach to create intelligent emotional interaction, turning an industrial product into a new companion with life and emotion.

However, if the features are too close to human ones, current technology cannot make the form truly lifelike, nor can voice, expression and movement be matched naturally and perfectly. Such a neither-here-nor-there design gives users an unsettling experience. Buddy from Blue Frog Robotics, for example, can easily push users into the "uncanny valley".

The "Uncanny Valley" theory was proposed by the Japanese roboticist Masahiro Mori. He argued that the more closely a humanoid toy or robot resembles a human, the more affinity people feel for it, but that once a critical point is reached this affinity drops suddenly.

Beyond that point, the more human-like the object is, the more revulsion and fear it evokes, down to a low point called the uncanny valley. As shown in the picture, a moving zombie at the bottom of the valley is more frightening than a still corpse, although a corpse is frightening enough.

3. Voice Design Suggestions

① Sound natural;

Avoid monotony and sound as natural as human speech, with a tone that is engaged and willing. Words and sentences synthesized from phonemes should be clear, intelligible, natural and fluent.

The information in human speech comprises the acoustic features of the speech and the semantics of the text. The acoustic features are mainly prosodic features (the way phonemes are combined into utterances), including tone, stress, pauses and speech rate. Chinese is a tonal language, and tone carries very important emotional information. Voice is a form of natural interaction, so it needs to feel "natural" before users perceive it as usable.
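Two of the prosodic features named above, pauses and speech rate, can be measured directly once each word has a timestamp. The following is only a minimal sketch under stated assumptions: the `TimedWord` type, the `prosody_summary` function and the 0.3-second pause threshold are hypothetical names and values chosen for illustration, not part of any product mentioned in this article. Tone and stress would require analysis of the audio signal itself and are not covered.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word with its timing (hypothetical input format)."""
    text: str
    start: float  # seconds from the start of the utterance
    end: float


def prosody_summary(words, pause_threshold=0.3):
    """Summarize speech rate and pauses for a single utterance.

    pause_threshold is an assumed value: silences between words that
    last at least this many seconds are counted as pauses.
    """
    if not words:
        return {"speech_rate": 0.0, "pause_count": 0, "longest_pause": 0.0}
    duration = words[-1].end - words[0].start
    # Silence between the end of one word and the start of the next.
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    pauses = [gap for gap in gaps if gap >= pause_threshold]
    return {
        "speech_rate": len(words) / duration if duration > 0 else 0.0,  # words per second
        "pause_count": len(pauses),
        "longest_pause": max(pauses, default=0.0),
    }


utterance = [
    TimedWord("you", 0.0, 0.2),
    TimedWord("sound", 0.25, 0.6),
    TimedWord("tired", 1.1, 1.6),
]
summary = prosody_summary(utterance)
print(summary)
```

A slow speech rate combined with long pauses is the kind of signal a product could compare against a user's usual pattern, although mapping such numbers to a mood would need real data and a trained model.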

How to make Siri sound more natural?

The upgrade goal for Siri in iOS 11 is to "make Siri sound more natural, like a human". The method is deep learning. Each expression has a slightly different waveform, and each sentence contains dozens or hundreds of phonemes.

Siri finds the best combination of sounds for each utterance. The phonemes are recorded from speakers selected by Apple for pronunciation collection, and the emotional corpus is obtained by Apple through anonymous listening. Both are then used to train Siri with deep learning.

② Once the "voice" is set, it should not be changed casually;

Once the persona's voice has taken root in users' ears, it should not be changed casually. Changing the wallpaper on a phone is like a person putting on new clothes, but replacing the voice of a product whose core function is voice interaction is like getting to know a stranger all over again. As the old saying goes, "to hear the voice is to see the person". People naturally associate a voice with a particular person, and a new voice forces them to build up the character again from scratch.

③ Converse like a human being;

First, the dialogue should flow smoothly, with timely feedback. If there is a pause, it should not be too long. Responses should be short and to the point. The product should not end the conversation on its own initiative, and should try to keep the exchange going. Of course, it should not order users to complete a task in the form of commands. That is not appropriate conversation, and it will provoke resentment and resistance.

④Try to initiate a dialogue after sensing the user;

Before long, Amazon Echo may be able to recognize the emotion in a speaker's voice and, through prosodic features (intonation, loudness, rhythm, voice quality and so on), better understand the user's mood at the moment of speaking. Like the line "You sound a little unhappy today" in the film "Her", it can sense you and try to start a conversation.
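The idea of sensing the user first and only then starting a conversation can be sketched as a simple rule. This is a hedged illustration, not how Amazon Echo or any other product works: the `opening_line` function, the emotion labels, the confidence score and the 0.7 threshold are all hypothetical, and the emotion recognizer that would produce those inputs is assumed to exist elsewhere.

```python
# Hypothetical mapping from a perceived emotion to a conversation opener.
# Only the "unhappy" line comes from the article; "excited" is illustrative.
OPENERS = {
    "unhappy": "You sound a little unhappy today.",
    "excited": "You sound excited today.",
}


def opening_line(emotion, confidence, min_confidence=0.7):
    """Return a conversation opener, or None if the product should stay silent.

    emotion and confidence are assumed to come from a speech emotion
    recognizer; min_confidence is an assumed threshold that keeps the
    product from commenting on a mood it has probably misread.
    """
    if confidence < min_confidence:
        return None
    # Emotions without a defined opener (for example "neutral") yield None.
    return OPENERS.get(emotion)


print(opening_line("unhappy", 0.9))
print(opening_line("unhappy", 0.4))
```

Staying silent when confidence is low reflects the advice in this section: the product should try to start a conversation, but a misjudged remark about the user's mood would feel less natural than no remark at all.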

 
