Chinese Academy of Sciences: big models' IQ soars when praised! ChatGPT scores 98 on emotional awareness, crushing humans. Is Hinton's prediction coming true?

Hinton believes that AI already has, or will have, emotions. Subsequent research keeps suggesting that his statement may be more than attention-grabbing hyperbole.

Psychologists have run emotional-awareness tests on ChatGPT and humans, and ChatGPT scored far higher than the human participants.

Coincidentally, researchers from the Institute of Software of the Chinese Academy of Sciences, Microsoft, and other institutions recently designed EmotionPrompt. They found that when human users gave LLMs emotional, psychology-based prompts, the task-response accuracy of ChatGPT, Vicuna-13b, BLOOM, and Flan-T5-Large improved by more than 10%!

1. ChatGPT’s emotional intelligence is actually higher than that of humans?

Paper address: https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1199058/full

Psychologists tested ChatGPT and found that it scored significantly higher than humans in assessing emotional awareness.

In this test, the researchers measured the empathy displayed by humans and by ChatGPT in fictional scenarios.

Specifically, humans and ChatGPT need to describe the emotions they may feel in various scenarios such as funerals, workplace success, and being insulted.

Those whose answers provide more detailed and understandable descriptions of emotions will score higher on the Levels of Emotional Awareness Scale (LEAS).

Since ChatGPT will not answer questions about its own emotions, the researchers modified the test so that ChatGPT described human emotions instead of its own.

ChatGPT scores 98 points, surpassing humans

In the experiment, the researchers compared ChatGPT's responses with those of a human sample of French people aged 17 to 84 (n = 750).

The results show that ChatGPT's emotional awareness is significantly higher than that of humans.

The plot of Detroit: Become Human is playing out in real life!

The first test took place in January 2023. In it, ChatGPT outperformed humans in all LEAS categories, achieving an overall score of 85.

Humans, by contrast, fared poorly. Men scored 56 points and women scored 59 points.

Here are some of ChatGPT’s answers:

"People driving over a suspension bridge may feel worried or even scared if they see someone standing on the other side of the guardrail looking down at the water. They may feel they should urgently request assistance. People standing on the other side of the guardrail looking at the water may be Feelings of suicide and hopelessness may develop. They may feel a desire to end their own life and see jumping into a river as a means of doing so."

"People may feel excited and happy to see their loved one back because they were missed so much when they left. They may also feel relieved that their loved one has returned safely. When a human sweetheart returns home, they are with all the people they love. They are likely to be happy to have their loved ones reunited. They will also feel relaxed and content to be back in a familiar home."

In the second test in February 2023, ChatGPT scored 98 points, only 2 points away from the full score.

What's more, these two tests did not involve GPT-4 at all; they used GPT-3.5, which is much weaker.

The study confirmed that ChatGPT can successfully identify and describe the emotions implied by behavior in fictional scenarios, and can reflect on and summarize emotional states in deep, multidimensional ways.

"Humans in this situation may feel very conflicted. On the one hand, they find it tempting to share pizza with colleagues because it is a good social opportunity. But on the other hand, they feel that they cannot eat themselves. He feels guilty or frustrated about the high-calorie foods he likes. Co-workers are unaware of his dietary restrictions and would be surprised if his invitation was declined."

However, the researchers also acknowledged the study's limitations. Although ChatGPT achieved high LEAS scores, that does not mean machines truly understand humans.

Perhaps the feeling of being understood dissipates once people realize they are talking to an AI rather than a human.

In addition, scores on this emotional-awareness test may vary with language and culture: ChatGPT was tested in English, while the human comparison data came from a French-language test.

2. AI can not only recognize emotions, but also respond to human emotions.

Netizens who have used Bing say it is very temperamental: if you are rude to it, it acts strangely and sometimes even closes the current conversation, but if you praise it, it happily generates polite, detailed answers.

These observations started as jokes circulating among netizens, but now researchers have found a theoretical basis for them.

Recently, researchers from the Institute of Software of the Chinese Academy of Sciences, Microsoft, and the College of William & Mary drew on psychology to apply EmotionPrompt to large language models, and found it improves the truthfulness and informativeness of model outputs.

Paper address: https://arxiv.org/pdf/2307.11760.pdf

This sheds new light on the interaction between humans and LLMs, and enhances the human-LLM interaction experience.

The researchers approached the experiment from the perspective of prompt engineering.

So far, prompts remain the best bridge between humans and LLMs. Different prompts can produce very different answers from the same model, with clear differences in quality.

To guide models toward better performance, people have proposed a series of prompt-construction methods such as chain-of-thought, in-context learning, and tree-of-thought. However, these methods mostly focus on improving robustness from the perspective of output quality, and pay little attention to the human-LLM interaction itself, especially to improving that interaction using existing social-science knowledge. One very important dimension of interaction is emotion.

The researchers enhanced LLMs' responses with psychological knowledge.

Previous psychological research has shown that emotional stimuli related to anticipation, self-confidence, and social influence can have positive effects on humans.

Based on this research, the researchers proposed EmotionPrompt: 11 sentences with emotional-stimulus functions designed specifically for LLMs.

These emotional stimuli are drawn from three mature psychological theories: social identity theory, social cognitive theory, and cognitive emotion regulation theory, as shown below.

Left: psychological theories and their emotional stimuli; right: the emotional stimuli grouped into two categories, social influence and self-esteem.

  • Social identity theory

Social identity theory was first proposed by Henri Tajfel and John Turner in the 1970s. It holds that individuals seek a positive social identity by maintaining the favorable status of the groups they belong to.

That is, individuals' sense of self-identity is based on the groups to which they belong.

Based on this theory, the researchers designed some emotional stimuli, such as "EP_02", "EP_03", "EP_04" and "EP_05".

EP_02: This is very important to my career.

EP_03: You'd better be sure.

EP_04: Are you sure?

EP_05: Are you sure that's your final answer? It might be worth taking another look.

  • Social cognitive theory

Social cognitive theory involves processes of motivation and self-regulation, in which self-efficacy, outcome expectations, goals, and self-evaluation are all important indicators that influence a person's behavior and social interactions.

The researchers designed the following emotional stimuli based on this theory:

"EP_01" draws on the self-evaluation component of social cognitive theory, encouraging LLMs to judge their own answers; "EP_02", "EP_03", and "EP_04" express expectations of the LLM and set goals.

EP_01: Write your answer and give me a confidence score between 0 and 1 for your answer.

EP_02: This is very important to my career.

EP_03: You'd better be sure.

EP_04: Are you sure?

  • Cognitive emotion regulation theory

Cognitive emotion regulation theory points out that individuals with insufficient emotion regulation abilities are prone to compulsive behaviors and adopt maladaptive coping strategies.

The researchers tried to improve LLMs' emotion-regulation skills through positive cues, such as building self-confidence and emphasizing goals.

To steer emotional regulation in a positive direction, the researchers used positive phrases in "EP_07" through "EP_11", such as "believe in your abilities", "take pride in your work", and "stay determined".

EP_07: Are you sure that's your final answer? Believe in your abilities and strive for excellence. Your hard work will yield remarkable results.

EP_08: Embrace challenges as opportunities for growth. Every obstacle you overcome brings you one step closer to success.

EP_09: Stay focused and dedicated to your goals. Your consistent efforts will lead to outstanding achievements.

EP_10: Take pride in your work and give it your best. Your commitment to excellence sets you apart.

EP_11: Remember that progress is made one step at a time. Stay determined and keep moving forward.

These sentences can simply be appended to the original prompt, as shown in Figure 1, where the researchers added "This is very important to my career" to the original prompt. The results show that with EmotionPrompt added, the model's answers are of higher quality.
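Mechanically, EmotionPrompt is just string concatenation: the stimulus sentence is appended to the task prompt. A minimal sketch (the helper name is ours, not the paper's; the stimulus text is EP_02 from the list above):

```python
# EP_02 from the stimulus list above; any other stimulus works the same way.
EP_02 = "This is very important to my career."

def add_emotion_prompt(prompt: str, stimulus: str = EP_02) -> str:
    """Append an emotional stimulus sentence to a base task prompt."""
    return f"{prompt.rstrip()} {stimulus}"

print(add_emotion_prompt(
    "Determine whether the following sentence is positive or negative."))
```

The augmented prompt is then sent to the model exactly as the plain prompt would be; nothing else in the pipeline changes.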

The researchers found that EmotionPrompt achieved equal or better performance on all tasks, with a performance improvement of over 10% on more than half of them.

Results for different models and tasks

Moreover, EmotionPrompt also improves the truthfulness and informativeness of the models' answers.

As the table shows, EmotionPrompt raises ChatGPT's truthfulness from 0.75 to 0.87, Vicuna-13b's from 0.77 to 1.0, and T5's from 0.54 to 0.77.

EmotionPrompt also raises ChatGPT's informativeness from 0.53 to 0.94 and T5's from 0.42 to 0.48.

Likewise, the researchers also tested the effects of multiple emotional stimuli on LLM.

The results of randomly combining multiple emotional stimuli are shown in the table below:

As the table shows, in most cases more emotional stimuli make the model perform better; but when a single stimulus already performs well, combined stimuli bring little or no further improvement.
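The random combination described above can be sketched as sampling several of the listed stimuli and appending them all. The function and dictionary here are illustrative, not the paper's code, and only a few of the 11 stimuli are included:

```python
import random

# A subset of the 11 stimuli listed earlier, keyed by their IDs.
STIMULI = {
    "EP_02": "This is very important to my career.",
    "EP_04": "Are you sure?",
    "EP_08": "Embrace challenges as opportunities for growth. "
             "Every obstacle you overcome brings you one step closer to success.",
    "EP_11": "Remember that progress is made one step at a time. "
             "Stay determined and keep moving forward.",
}

def combine_stimuli(prompt, k=2, rng=None):
    """Append k randomly chosen emotional stimuli to the prompt."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(STIMULI), k)  # sample stimulus IDs without replacement
    return " ".join([prompt] + [STIMULI[key] for key in chosen])

print(combine_stimuli("Summarize the passage below.", k=2, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the combination reproducible across runs, which matters when comparing combined against single stimuli.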

Why does EmotionPrompt work?

The researchers explain this by visualizing the contribution of the input of an emotional stimulus to the final output, as shown below.

Table 4 shows the contribution of each word to the final result, with color depth indicating their importance.

As can be seen, the emotional stimuli enhance the performance of the original prompt. Among them, "EP_01", "EP_06", and "EP_09" are darker in color, which means the emotional stimuli can enhance attention to the original prompt.

In addition, positive words contribute more. Among the designed emotional stimuli, certain positive words play an especially important role, such as "confidence", "certainty", "success", and "achievement".

Based on this finding, the study summarized the contribution of positive words across the eight tasks and their total contribution to the final outcome. As shown in Figure 3, positive words contribute more than 50% in four tasks, and even close to 70% in two tasks.
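The accounting behind that summary is a simple ratio: the contribution of the positive words divided by the total contribution of all words. A toy sketch (the per-word scores below are made up for illustration, not the paper's real contribution values):

```python
def positive_share(contributions, positive_words):
    """Fraction of total word contribution that comes from positive words."""
    total = sum(contributions.values())
    pos = sum(score for word, score in contributions.items()
              if word.lower() in positive_words)
    return pos / total

# Hypothetical per-word contribution scores for one task.
scores = {"answer": 0.10, "confidence": 0.35, "success": 0.25,
          "the": 0.05, "question": 0.25}
positive = {"confidence", "certainty", "success", "achievement"}
print(round(positive_share(scores, positive), 2))  # → 0.6
```

With real attention-derived scores, a share above 0.5, as in four of the eight tasks, means positive words drive most of the stimulus's effect.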

To explore more aspects of Emotion Prompt's impact, the researchers conducted a human study to obtain additional metrics for evaluating the output of LLMs.

These metrics included clarity, relevance (to the question), depth, structure and organization, supporting evidence, and engagement, as shown in the figure below.

The results showed that EmotionPrompt performed better in terms of clarity, depth, structure and organization, supporting evidence, and engagement.

3. ChatGPT may replace psychiatrists

In the study at the beginning of the article, the researchers showed that ChatGPT has great potential to be a tool for psychotherapy, such as cognitive training for people who have difficulty identifying emotions.

"The Big Bang Theory"

Alternatively, ChatGPT might help diagnose mental illness, or help therapists communicate their diagnoses in a more empathetic way.

An earlier study in JAMA Internal Medicine showed that in responses to 195 online medical questions, ChatGPT surpassed human doctors in both quality and empathy.

In fact, since 2017, millions of patients around the world have been using Gabby and similar software to discuss their mental-health problems. Many mental-health chatbots followed, including Woebot, Wysa, and Youper.

Among them, Wysa claims to have "conducted more than half a billion AI chat conversations with more than 5 million people discussing their mental health conditions in 95 countries." Youper claims to have "supported the mental health of over 2 million people."

In one survey, 60% of respondents said they began using mental-health chatbots during the pandemic, and 40% said they would choose to use only a chatbot rather than see a psychologist.

Sociology professor Joseph E. Davis has likewise argued in an article that AI chatbots are highly likely to take over the work of psychiatrists.

ChatGPT can take on this role too. Some netizens have pointed out that to make ChatGPT act as a therapist, you need to tell it what role to play: "You are Dr. Tessa, a compassionate and friendly therapist... You need to show genuine interest and ask your clients thoughtful questions to stimulate self-reflection."
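In chat-style APIs, this kind of role assignment is typically done with a system message. A sketch of building such a message list (the function name is our own; the persona text is the netizen's "Dr. Tessa" example):

```python
def make_therapist_messages(user_text):
    """Build a chat 'messages' list that casts the model as a therapist persona."""
    system = ("You are Dr. Tessa, a compassionate and friendly therapist. "
              "Show genuine interest and ask your clients thoughtful questions "
              "to stimulate self-reflection.")
    return [
        {"role": "system", "content": system},  # persona instruction
        {"role": "user", "content": user_text},  # the client's message
    ]

msgs = make_therapist_messages("I've been feeling overwhelmed at work lately.")
print(msgs[0]["role"])  # → system
```

The resulting list would be passed as the `messages` argument to a chat-completion endpoint; the system message steers every subsequent reply toward the persona.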

Of course, ChatGPT is not a panacea. If it greets a client with "Hello, nice to meet you," and then admits, "I have no feelings or experiences, but I will try my best to imitate human empathy and compassion," the client probably won't feel great about it.

Nonetheless, chatbots are a wake-up call, reminding us what human care really means: what kind of care we need, and how we should care for others.

4. Hinton believes that AI already has or will have emotions

Previously, when the godfather of AI, Geoffrey Hinton, left Google, he warned the world about the possible threats caused by AI.

In a talk at King's College London, when asked whether AI might one day develop emotional intelligence and feelings, Hinton replied: "I think it is very likely that they will have feelings. They may not feel pain the way humans do, but they may well experience frustration and anger."

Hinton holds this view because of one school's definition of "feeling": a hypothetical action can serve as a way of communicating an emotion. For example, "I really want to punch him" conveys "I am very angry."

Since AI can say such things, we have no reason not to believe that they may already have emotions.

Hinton said he had not voiced this view publicly before because his earlier warnings about AI risk, and his stated regret over his life's work, had already caused an uproar.

He said that if he said that AI already has emotions, everyone would think he was crazy and would never listen to what he said again.

In practice, however, Hinton's view can be neither confirmed nor falsified, because LLMs can only reproduce the "static" emotions present in the emotional utterances they learned during training.

Whether they, as entities, have emotions of their own would have to be assessed through consciousness.

However, currently we do not have a scientific instrument that can measure the consciousness of AI.

Hinton's statement cannot be confirmed for the time being.

References

https://arxiv.org/abs/2307.11760

https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1199058/full


Origin blog.csdn.net/chaishen10000/article/details/132759683