Is your girlfriend angry again? When it comes to understanding girlfriends, are straight men no match for algorithms?

Original: HyperAI Super Neural

Scenario: AI typically judges a person's emotions in one of two ways: from facial expressions or from the voice. The former is relatively mature, while research on emotion recognition in speech is developing rapidly. Recently, several research teams have proposed new methods to identify emotions in users' voices more accurately.

Keywords: speech emotion recognition, emotion classification


There are many questions on Zhihu along the lines of "how do I tell whether my girlfriend is angry?". Some answer: the fewer the words, the bigger the problem. Others say: if she's really angry, she won't contact you for a month; if she says she's "angry", she probably isn't.
"Is your girlfriend angry?" is an eternal problem
"Is your girlfriend angry?" is an eternal problem

So a girlfriend's "I'm not angry / really, I'm not angry" = "very angry", while "I'm angry" = "just being cute, not actually angry, a hug will fix it". This emotional logic drives straight men crazy.
I can't feel my girlfriend's emotions at all.

So how can you tell whether your girlfriend is angry? AI has reportedly made real progress in hearing emotion in speech, and it may be more accurate than a boyfriend scratching his head for half a day.

The Alexa voice assistant: practicing to become a "warm man"

Amazon's voice assistant Alexa may be smarter than your boyfriend when it comes to sensing emotions.

After its latest upgrade this year, Alexa can identify emotions such as happiness, joy, anger, sadness, irritability, fear, disgust, boredom, and even stress by analyzing features of the user's spoken commands, such as pitch and volume, and then respond accordingly.
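As a rough illustration of the acoustic cues involved, the sketch below extracts a pitch contour and a loudness estimate with librosa. The file name is a placeholder and this is not Amazon's actual pipeline, just an example of the kind of signals such a system could look at.

```python
# A small sketch of the acoustic cues mentioned above (pitch and volume),
# extracted with librosa. "command.wav" is a placeholder file name.
import librosa
import numpy as np

y, sr = librosa.load("command.wav", sr=16000)

# Pitch contour: fundamental frequency estimated with the pYIN algorithm.
f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                  fmax=librosa.note_to_hz("C7"), sr=sr)
mean_pitch_hz = float(np.nanmean(f0))

# Volume: short-time RMS energy converted to decibels.
level_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0])

print(f"mean pitch: {mean_pitch_hz:.1f} Hz, mean level: {level_db.mean():.1f} dB")
```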
For example, if a girl tells Alexa she is a little hungry while sniffling and coughing, Alexa will analyze her tone of voice (weak, low) and the background sounds (coughing, nose-blowing), infer that she may be ill, and then offer some machine-made care: would you like a bowl of chicken soup, or to order takeout? It might even order a bottle of cough syrup online and have it delivered within the hour.

Isn't that more attentive than a clueless "steel-straight" boyfriend?

Using artificial intelligence to classify emotions is nothing new. Recently, however, the Amazon Alexa Speech team broke with the traditional approach and published new research results.

Traditional methods are supervised: the training data is labeled according to the speaker's emotional state. Scientists on Amazon's Alexa Speech team took a different approach, presenting their paper "Improving Emotion Classification through Variational Inference of Latent Variables" at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Instead of training the system on an exhaustively annotated corpus of emotion labels, they used an adversarial autoencoder (AAE), trained on a publicly available dataset of 10,000 utterances from 10 different speakers.

They found that the network was 4 percent more accurate at judging valence, the positive or negative character of the emotion, in people's voices. With this approach, a user's mood or emotional state can be determined more reliably from their voice.
Schematic of the AAE model

Viktor Rozgic, a co-author of the paper and senior applied scientist in the Alexa Speech group, explained that an adversarial autoencoder is a two-part model consisting of an encoder and a decoder. The encoder learns to produce a compact (latent) representation of the input speech that encodes the properties of the training examples, while the decoder reconstructs the input from that compact representation.
Adversarial autoencoder architecture
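To make the two-part structure concrete, here is a minimal encoder/decoder sketch in PyTorch. The layer sizes and feature dimensions are assumptions for illustration, not the Alexa team's actual architecture.

```python
# A minimal sketch of the two parts described above: an encoder that maps speech
# features to a compact latent code, and a decoder that reconstructs the input.
# FEAT_DIM and LATENT_DIM are assumed sizes, not values from the paper.
import torch
import torch.nn as nn

FEAT_DIM = 128     # assumed size of an utterance-level acoustic feature vector
LATENT_DIM = 16    # assumed size of the compact (latent) representation

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, LATENT_DIM))

    def forward(self, x):
        return self.net(x)          # compact representation of the input speech

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, FEAT_DIM))

    def forward(self, z):
        return self.net(z)          # reconstruction of the original features

x = torch.randn(4, FEAT_DIM)        # a toy batch of acoustic feature vectors
z = Encoder()(x)
x_hat = Decoder()(z)
print(z.shape, x_hat.shape)         # (4, 16) latent codes, (4, 128) reconstructions
```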

The researchers' emotion representation consists of three network nodes, one for each of three emotion measures: valence, activation (whether the speaker is alert and engaged or passive), and dominance (whether the speaker feels in control of the surrounding situation or controlled by it).

Training takes place in three phases. In the first phase, the encoder and decoder are trained on unlabeled data. The second phase is adversarial training, in which an adversarial discriminator tries to distinguish genuine representations produced by the encoder from artificial ones, and this signal is used to adjust the encoder. In the third phase, the encoder is tuned so that the latent emotion representation predicts the emotion labels of the training data.
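The three phases can be sketched roughly as follows, with compact stand-in networks and made-up dimensions, losses, and optimizers. This is an illustration of the training scheme described above, not the paper's actual recipe.

```python
# A rough sketch of the three training phases, with compact stand-ins for the
# encoder, decoder, and discriminator. All sizes and losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, LATENT_DIM, EMOTION_DIMS = 128, 16, 3          # assumed sizes

encoder = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
discriminator = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
emotion_head = nn.Linear(LATENT_DIM, EMOTION_DIMS)       # valence, activation, dominance

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
opt_sup = torch.optim.Adam(list(encoder.parameters()) + list(emotion_head.parameters()), lr=1e-3)

def phase1_reconstruction(x):
    """Phase 1: train encoder + decoder to reconstruct unlabeled speech features."""
    loss = F.mse_loss(decoder(encoder(x)), x)
    opt_ae.zero_grad(); loss.backward(); opt_ae.step()

def phase2_adversarial(x):
    """Phase 2: the discriminator learns to tell encoder outputs apart from
    samples of a reference distribution; the encoder is then adjusted to fool it."""
    z_ref = torch.randn(x.size(0), LATENT_DIM)            # reference samples
    z_enc = encoder(x).detach()
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(z_ref), torch.ones(x.size(0), 1))
              + F.binary_cross_entropy_with_logits(discriminator(z_enc), torch.zeros(x.size(0), 1)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    g_loss = F.binary_cross_entropy_with_logits(discriminator(encoder(x)), torch.ones(x.size(0), 1))
    opt_ae.zero_grad(); g_loss.backward(); opt_ae.step()

def phase3_supervised(x, y):
    """Phase 3: tune the encoder so the latent representation predicts emotion labels."""
    loss = F.mse_loss(emotion_head(encoder(x)), y)
    opt_sup.zero_grad(); loss.backward(); opt_sup.step()

# Toy usage with random tensors in place of real acoustic features and labels.
x_unlabeled = torch.randn(8, FEAT_DIM)
x_labeled, y_labels = torch.randn(8, FEAT_DIM), torch.rand(8, EMOTION_DIMS)
phase1_reconstruction(x_unlabeled); phase2_adversarial(x_unlabeled); phase3_supervised(x_labeled, y_labels)
```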

In experiments with "hand-engineered" sentence-level feature representations designed to capture information about the speech signal, their system assessed valence 3% more accurately than a traditionally trained network.

Furthermore, when the network was instead fed a sequence of representations of the acoustic characteristics of 20-millisecond frames (short audio snippets), the improvement rose to 4%.
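The difference between the two input styles can be illustrated with a short feature-extraction sketch. The example below uses librosa; the file name, sample rate, and the choice of MFCC features are assumptions for illustration, not the exact features used in the paper.

```python
# A sketch of the two kinds of inputs discussed above: a sequence of ~20 ms
# frame-level features versus a single sentence-level feature vector.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)   # placeholder file name

# Frame-level features: MFCCs over ~20 ms frames, kept as a sequence.
frame_len = int(0.020 * sr)                       # 20 ms -> 320 samples at 16 kHz
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=frame_len, hop_length=frame_len)
frame_sequence = mfcc.T                           # shape: (num_frames, 13)

# Sentence-level ("hand-engineered") features: pool statistics over the utterance.
sentence_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

print(frame_sequence.shape, sentence_vector.shape)
```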

MIT Media Lab spinoff builds a neural network that perceives anger in 1.2 seconds

Amazon isn't the only company working on better voice-based emotion detection. Affectiva, a spinoff of the MIT Media Lab, recently demonstrated a neural network, SoundNet, that can classify anger from audio data, regardless of language, in just 1.2 seconds (just over the time it takes humans to perceive anger).
Researchers at Affectiva describe the system in a new paper, "Transfer Learning From Sound Representations For Anger Detection in Speech." The company builds emotional profiles from both voice and facial data.
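Conceptually, transfer learning from sound representations means using a frozen, pretrained audio network as a feature extractor and training a small anger classifier on top. The sketch below illustrates that pattern with a placeholder embedding function and toy data; it is not Affectiva's actual SoundNet pipeline.

```python
# Conceptual sketch of transfer learning for anger detection: a frozen audio
# embedding serves as a fixed feature extractor, and a small classifier is
# trained on top. The embedding function and the data are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def frozen_sound_embedding(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen, pretrained audio network (SoundNet-style)."""
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % (2**32))
    return rng.standard_normal(256)

# Toy "utterances" and anger labels (1 = angry); real work would use speech corpora.
waveforms = [np.random.randn(16000).astype(np.float32) for _ in range(40)]
labels = np.random.randint(0, 2, size=40)

X = np.stack([frozen_sound_embedding(w) for w in waveforms])
anger_clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Because the embedding works directly on audio, the same classifier can later be
# evaluated on utterances in another language (e.g. a Mandarin corpus).
print(anger_clf.predict_proba(X[:3])[:, 1])   # toy anger probabilities
```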

To test how well the AI model generalizes, the team evaluated the English-trained model on Mandarin speech emotion data (the Mandarin Affective Speech Corpus, or MASC). It turned out that the model not only generalized well to English speech data but also performed well on the Chinese data, albeit with a slight drop in performance.
ROC curves for the English and Mandarin evaluations; the dotted line is the ROC of a random classifier
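For readers unfamiliar with this kind of plot, the sketch below shows how such an ROC curve, including the random-classifier baseline, can be drawn with scikit-learn, using made-up anger scores and labels in place of the paper's results.

```python
# Minimal ROC-curve sketch with toy data (not the paper's results).
import numpy as np
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                    # 1 = angry utterance
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.9, 0.6, 0.2])   # classifier scores

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"anger classifier (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "--", label="random classifier")      # the dotted baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```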

"Recognizing anger has a wide range of applications, including conversational interfaces and social bots, interactive voice response (IVR) systems, market research, customer agent assessment and training, and virtual and augmented reality," the team said.

In future work, the team plans to develop other large public corpora and train AI systems for related speech-based tasks, such as recognizing other types of emotions and affective states.

Israeli app recognizes emotions: 80% accuracy

The Israeli startup Beyond Verbal has developed an app called Moodies that captures the speaker's voice through a microphone and, after about 20 seconds of analysis, judges the speaker's emotional characteristics.
Moodies uses a set of proprietary algorithms to analyze emotional dimensions of speech such as rhythm, timing, volume, pauses, and energy.
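Some of these dimensions are easy to approximate. The sketch below computes short-time energy, loudness, and a crude pause ratio with librosa; the file name and the silence threshold are arbitrary assumptions, not Beyond Verbal's algorithm.

```python
# A rough sketch of a few of the speech dimensions listed above (volume, energy,
# pauses). "speaker.wav" and the -30 dB silence threshold are placeholders.
import librosa

y, sr = librosa.load("speaker.wav", sr=16000)

rms = librosa.feature.rms(y=y, frame_length=400, hop_length=160)[0]   # short-time energy
volume_db = librosa.amplitude_to_db(rms)

silent = volume_db < (volume_db.max() - 30.0)   # crude pause detection
pause_ratio = float(silent.mean())              # fraction of frames spent in pauses

print(f"mean energy: {rms.mean():.4f}, "
      f"dynamic range: {volume_db.max() - volume_db.min():.1f} dB, "
      f"pause ratio: {pause_ratio:.2f}")
```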

While speech analysis experts acknowledge a correlation between voice and emotion, many question the accuracy of such real-time measurements: these tools collect very limited voice samples, whereas meaningful analysis may require samples gathered over years.

"At the current level of cognitive neuroscience, we simply don't have the technology to truly understand a person's thoughts or emotions." said Andrew Baron, an assistant professor of psychology at Columbia University.

However, Dan Emodi, Beyond Verbal's vice president of marketing, said that Moodies has been under development for more than three years and that, according to user feedback, its analyses are about 80% accurate.

Beyond Verbal says Moodies can be used for self-diagnosis of one's emotions, for customer service centers managing client relations, and even for detecting whether job applicants are lying. Of course, you could also bring it on a date to see whether the other person is really into you.

Speech emotion recognition still faces challenges

Although many technology companies have been researching this area for years and have achieved good results, the technology still faces several challenges, as Andrew Baron's skepticism above suggests.

Just as a girlfriend's calm "I'm not angry" doesn't mean she really isn't angry, a single utterance can carry multiple emotions, and the boundaries between emotions are hard to define. Which one is the dominant emotion at that moment?

A Chinese speech emotion recognition product once released an amusing demo video of this.

Not every tone of voice is as obvious and intense as in that video. Expressing emotion is highly personal, varying greatly across individuals, environments, and even cultures.

In addition, an emotion may last a long time while shorter-lived emotions come and go within it. Should an emotion recognition system detect the long-term emotion or the short-term one? For example, someone mired in the pain of unemployment may be momentarily cheered up by a friend's concern, yet still, fundamentally, be sad. How should AI define his state?

Another worry: once these products can read people's emotions, will they exploit users' reliance on them to pry into their privacy and gather all kinds of personal information, turning a "service" into a "transaction"?

May you have a Baymax, and someone who truly understands you

Many people want a warm, caring Baymax of their own. Will this kind of emotionally intelligent robot, so far found only in sci-fi animation, ever become real?
Talking to Xiaobing (Microsoft's XiaoIce) in a low, slow voice still gets a heartless answer

At present, many chatbots still lack emotional intelligence: they cannot perceive a user's subtle moods and often kill the conversation. So the one who truly understands you is still the person by your side who listens.


Source: blog.csdn.net/HyperAI/article/details/94737043