Large language models can't cure the "Internet mob"


Words can kill. In the Internet age, I believe no one would deny this.

Verbal abuse is one of the most recognizable forms of cyber violence. Cursing a mother who has just lost her son, vilifying a young woman for her pink hair, mocking a man as "too effeminate", fabricating baseless pornographic rumors... countless insulting words run rampant across the Internet and inflict endless psychological harm on others.

Verbal violence has become a global challenge for Internet governance.


Many remedies have been tried, yet none has effectively stopped the growth of "Internet mobs" or the spread of verbal violence. On the technical side, the usual approach is to use AI algorithms to automatically detect toxic language, assign toxicity scores according to how aggressive it is, and take preventive action against highly toxic content, such as blocking it or triggering psychological intervention.
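To make the score-and-act idea concrete, here is a minimal sketch of threshold-based moderation. The scorer is a toy keyword heuristic standing in for a real toxicity model, and the thresholds and terms are purely illustrative, not values used by any actual platform:

```python
# Toy sketch of score-based moderation. score_toxicity() is a stand-in
# for a real classifier; thresholds and terms are illustrative only.
TOXIC_TERMS = {"idiot", "kill yourself", "trash"}  # illustrative only

def score_toxicity(text: str) -> float:
    """Toy scorer: score in [0, 1] based on matched toxic terms."""
    lowered = text.lower()
    hits = sum(term in lowered for term in TOXIC_TERMS)
    return min(1.0, hits / 2)

def moderate(text: str, block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Map a toxicity score to a preventive action."""
    score = score_toxicity(text)
    if score >= block_at:
        return "block"          # shield highly toxic content
    if score >= review_at:
        return "human_review"   # escalate for intervention
    return "publish"

print(moderate("you are trash, kill yourself"))  # -> "block"
print(moderate("have a nice day"))               # -> "publish"
```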

However, because language is ambiguous, earlier machine learning algorithms were not robust and easily made wrong judgments, so detection and intervention fell short and still required large numbers of human moderators. Not only is the process inefficient, but prolonged exposure to toxic language can also damage moderators' mental health.

Large language models such as ChatGPT, with their strong robustness and generalization capabilities, have demonstrated unprecedented language understanding.

In the spirit of "technology for good", it stands to reason that large language models should already be used to prevent cyber violence more effectively and efficiently. So why have we not seen such applications so far? On the contrary, "technology for evil", using large language models to generate even more harmful content, is thriving.

If large language models can't cure the "Internet mob", are we destined to a "digital survival" in a toxic online environment?


Large language models: a big step forward for content detection technology

Prevention is the most important step in governing cyber violence, and the use of AI content detection to prevent cyberbullying has been studied for years.

As early as 2015, researchers proposed that an individual's emotional state is significantly correlated with harmful intent, and that using machine learning to detect such signals in social media could serve as a good indicator of online violence.

In other words, when a person is going through upheaval or setbacks in life, or is feeling depressed or otherwise emotionally low, they are more likely to post hateful, aggressive, or slanderous language online.

In 2017, Google's Jigsaw built Conversation AI to detect toxic comments online. Many tech giants have been incorporating algorithms into content moderation for years and now have means of identifying and filtering online content. For example, a short-video platform in China has developed more than 100 intelligent recognition models to block abusive content in advance, yet the platform remains a "hardest-hit area" of cyberbullying. A Q&A platform analyzes comments before they are posted, warns users about risky content, and withholds publication until the user revises it.
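For reference, Jigsaw's Conversation AI technology is publicly exposed through the Perspective API. A minimal sketch of requesting a toxicity score might look like the following; the endpoint and payload shape are my assumption from the public documentation, so verify against the official docs before relying on them:

```python
# Sketch: requesting a TOXICITY score from the Perspective API.
# Endpoint and payload shape are assumptions; check the official docs.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=payload, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return data["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# print(toxicity_score("you are an idiot"))  # e.g. a score close to 1.0
```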


Obviously, though, these AI detection algorithms have not eradicated cyberbullying, and netizens still criticize platforms' handling of it as "inaction" with "no effect". One reason is that traditional machine learning algorithms cannot meet the demands of reviewing online content:

1. Insufficient understanding. Harmful language is hard to distinguish, and the semantic understanding of these algorithms is not strong enough. They often give similar scores to harmful and harmless comments, failing to filter out genuinely abusive remarks, or they score neutral sentences as toxic and filter normal comments that should not be filtered, hindering communication between bloggers and their followers.


2. Insufficient flexibility. Some sites may need to detect offensive language but not gossip, while others may need the opposite. Traditional detection tools usually produce a single, general-purpose "toxicity score", which is not flexible enough to meet the different needs of different platforms (a per-platform policy sketch follows this list).

3. Slow updating. Many detection algorithms work through APIs, backed by models trained on large amounts of web data, and they perform well on examples similar to that training data. But they are likely to fail on unfamiliar toxic language, such as fan-community slang, coded "black words", pinyin abbreviations like "yyds", and the constant stream of newly coined terms. One social media platform started with just over 100 banned keywords, including certain swear words, "green tea bitch", and "why don't you die", and has since expanded the list to more than 700. Without efficient, real-time human feedback, the models cannot be fine-tuned and iterated quickly, so automatic detection performs poorly (the static-blocklist sketch after this list illustrates the failure mode).
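On the flexibility point, one alternative to a single global score is to let each platform declare which dimensions it cares about and set its own thresholds. The attribute names, platforms, and numbers below are illustrative assumptions, not any vendor's real configuration:

```python
# Sketch: per-platform moderation policies over multiple attribute scores
# instead of one global "toxicity" number. All values are illustrative.
from typing import Dict

POLICIES: Dict[str, Dict[str, float]] = {
    "news_site": {"insult": 0.7, "threat": 0.5},   # strict on attacks
    "forum":     {"insult": 0.9, "gossip": 0.6},   # strict on gossip
}

def violates(platform: str, scores: Dict[str, float]) -> bool:
    """Return True if any attribute score exceeds the platform's threshold."""
    policy = POLICIES.get(platform, {})
    return any(scores.get(attr, 0.0) >= limit for attr, limit in policy.items())

print(violates("forum", {"insult": 0.3, "gossip": 0.8}))      # True
print(violates("news_site", {"insult": 0.3, "gossip": 0.8}))  # False
```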
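And on the updating point, a static blocklist makes the weakness obvious: it catches only what it already knows, and newly coined abbreviations slip straight through. The entries and the abbreviation "gtb" below are made up for illustration:

```python
# Sketch: a static keyword blocklist fails on newly coined slang.
BLOCKLIST = {"green tea bitch", "why don't you die"}  # illustrative entries

def blocked(comment: str) -> bool:
    lowered = comment.lower()
    return any(term in lowered for term in BLOCKLIST)

print(blocked("why don't you die"))        # True  - known phrase, caught
print(blocked("u r a gtb, just vanish"))   # False - new abbreviation, missed
```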

Large language models, with their emergent abilities, large-scale pre-training, and reinforcement learning from human feedback, address many of these weaknesses: they understand language far better, a general model can be quickly adapted into a more accurate customized model, and human feedback can be used to find and fill the remaining gaps, yielding better and faster detection.
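As a rough illustration of how a general model could be steered to this task through instructions rather than retraining, the sketch below wraps a hypothetical chat-completion function with a moderation prompt and a structured verdict. `ask_llm` is a placeholder for whatever LLM client a platform actually uses, and the labels are assumptions for the example:

```python
# Sketch: prompting a general-purpose LLM as a toxicity classifier.
# ask_llm() is a hypothetical stand-in for a real chat-completion client.
import json

MODERATION_PROMPT = """You are a content moderator. Classify the comment.
Return JSON like {"label": "toxic" | "borderline" | "ok", "reason": "..."}.
Comment: {comment}"""

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM provider here and return its text output."""
    raise NotImplementedError

def classify(comment: str) -> dict:
    raw = ask_llm(MODERATION_PROMPT.replace("{comment}", comment))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # If the model output is malformed, fall back to human review.
        return {"label": "borderline", "reason": "unparseable model output"}

# classify("why does she still have the heart to dress up?")
# -> e.g. {"label": "borderline", "reason": "implied moral judgment of a victim"}
```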

Preventing cyber violence has become a focus of Internet governance in many countries, and platforms also want to build a healthier community atmosphere, so large language models should be able to play a major role in detecting harmful language.

So why, in this wave of large language models, do we rarely see exploration of using LLMs to prevent cyber violence?

AI: a small step toward preventing verbal violence


Within the AI technology stack, the move from traditional NLP to large language models is a leap forward in natural language understanding. But against the larger reality, AI's big step moves the actual problem forward only a little.

The effect is not nothing, but it is very limited. Against verbal attacks on the Internet, the power of AI is still weak.

First, the perpetrators are simply too numerous.

Ordinary people like you and me often become accomplices in cyberbullying, as Dennis Cu of Cornell University's Department of Information Science has noted. When large numbers of Internet users cannot relieve their own grievances and dissatisfaction, they grow irritable about everything around them and attack others online with words to vent their negative emotions.

Sanlian Life Weekly once reported on a victim of cyber violence who later died; some of the perpetrators the reporter contacted responded that they had "forgotten what they did at the time".

Many cyberbullies usually seem perfectly normal. At certain moments, triggered by chance events, they briefly turn into "verbal demons", then "brush off their sleeves and vanish, hiding their names". Even AI finds it difficult to determine, promptly and accurately, where the next attack will come from.

Second, verbal attacks are increasingly covert.

With today's automatic detection technology, obviously harmful speech such as threats, obscenity, and insults can be blocked outright. But human "creativity" in hurting people with words is immense, and much language that looks neutral to a machine can still be full of malice.

For example, the mother who lost her child in a school accident, mentioned earlier, was flooded with comments like "Why doesn't she look sad?" and "Why does she still have the heart to dress up?" None of this contains overtly insulting words, yet taken together these questions amount to a "moral trial" of the victim.


Current NLP models still have significant limitations with covert offensive language; the real, subtle meaning behind the words is hard to capture, and human intervention is still required.

Moreover, platforms have no common standard for judging verbal violence; each decides for itself. Zhihu, for example, sanctions behaviors such as maliciously exposing private information, insults and abuse, and pinning negative labels on people; Douban deals with sarcasm, trolling, provocation, discrimination, and prejudice. But these standards contain a large subjective element, so users see both over-blocking and under-blocking: perfectly normal speech gets removed while obviously inflammatory speech is not handled in time.

Third, the "Balkanization" of online information.

Balkanization originally describes a region splintering into small states that are hostile or uncooperative toward one another. Research has shown that although the Internet removes geographical barriers and lets people in different regions communicate at low cost, it has produced a "Balkanization" of ideas: public opinion is increasingly separated and fragmented.

Recommendation systems that push online content are often poorly designed, with preference settings that are far too narrow. They rely on filters such as keyword association, contact-list association, and graph networks, following logic like "you ate one steamed bun, so you must love steamed buns, here are a hundred more", "your mother loves steamed buns, so you must too", or "steamed buns are like toast, so here is some toast". People stay inside a narrow band of information for long periods, rarely read anything outside their interests, and the conceptual gap between them and other groups grows wider and wider.
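As a toy illustration of how association-only filtering narrows a feed (all titles and keywords here are made up), ranking purely by overlap with past clicks quickly locks the user into a single topic:

```python
# Toy sketch: recommending only items that share a keyword with past clicks
# narrows the feed to one topic. Catalog and keywords are made up.
from collections import Counter

CATALOG = [
    ("steamed buns 101", {"buns"}),
    ("100 ways to eat buns", {"buns"}),
    ("buns vs toast", {"buns", "toast"}),
    ("local election explainer", {"politics"}),
    ("hiking trails nearby", {"outdoors"}),
]

def recommend(history_keywords: Counter, k: int = 3):
    """Rank items purely by overlap with the user's past keywords."""
    scored = [(sum(history_keywords[w] for w in kws), title)
              for title, kws in CATALOG]
    return [t for s, t in sorted(scored, reverse=True)[:k] if s > 0]

history = Counter({"buns": 1})   # the user clicked one bun article
print(recommend(history))        # only bun-related items come back
```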

This "Balkanization" of how information is acquired leads to "polarization" of public opinion: a single viewpoint ferments over and over, triggering large-scale follow-the-crowd behavior and raising the risk of cyberbullying.

Sheer numbers, hard-to-identify attacks, and severe polarization have turned the Internet into a playground for toxic language.


Beyond technology, do more

Of course, using AI to prevent online violence is difficult and will take a long time, but that is no reason to give up the effort.

The emergence of large language models brings more powerful potential for automatic detection. On top of a general model, media organizations and platforms can train domain-specific models with higher precision and stronger recognition, use human expertise to improve them, and build detection systems that combine AI with human judgment to support more complex content understanding and review decisions, raising the efficiency of harmful-content detection.
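One way to picture this human-in-the-loop setup is a review queue that trusts confident model verdicts, escalates uncertain cases to people, and keeps the human decisions as labeled data for the next fine-tuning round. This is a sketch under assumptions; the names and thresholds are placeholders, not any platform's real pipeline:

```python
# Sketch: route model verdicts, collect human corrections, and keep them
# as labeled data for later fine-tuning. All names are placeholders.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ReviewQueue:
    labeled: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, comment: str, model_label: str, model_conf: float) -> str:
        if model_conf >= 0.95:
            decision = model_label                # trust confident verdicts
        else:
            decision = self.ask_human(comment)    # escalate uncertain cases
        self.labeled.append((comment, decision))  # feedback for fine-tuning
        return decision

    def ask_human(self, comment: str) -> str:
        """Placeholder for a moderator UI; here we just flag for review."""
        return "needs_human_review"

queue = ReviewQueue()
print(queue.handle("normal comment", "ok", 0.99))       # -> "ok"
print(queue.handle("ambiguous remark", "toxic", 0.60))  # -> "needs_human_review"
print(len(queue.labeled))                               # 2 examples collected
```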

Beyond upgrading the technology, more must be done. Preventing cyber violence is not so much a technical problem as a social one. If the online information environment does not change, attack language will keep mutating, driving up the difficulty and cost of technical detection to levels that users, platforms, and society cannot bear.

But so far, many governance methods have not been very effective.


For example, online anonymity is an "invisibility cloak" for abusers, so real-name registration has become an important governance tool, but its effect has been unsatisfactory. South Korea was the first country to adopt an Internet real-name system, proposed in October 2005, yet Korean statistics show that after its introduction the rate of online abuse fell only from 13.9% to 12.2%, a drop of just 1.7 percentage points.

Legislation carries expectations too. Countries keep introducing laws and regulations: South Korea's criminal law provides up to seven years in prison for cyber violence, and China's criminal and civil law also contain corresponding provisions, so governing cyber violence is not legally impossible. The hard part is not legislating but enforcing.

The online environment is complex, and it is hard to pin down who started an attack. Cyberbullying usually builds up from a large accumulation of posts, comments, and other abusive acts; evidence is hard to collect and easily lost. The cost of defending victims' rights is too high, most cases end with nothing, and actual punishment rarely reaches the perpetrators, which feeds the opportunistic mentality that "the law cannot punish the crowd".

To change the problem that "the law cannot punish the crowd", the fundamental solution is to shrink the "crowd" that follows the trend unthinkingly.

Cyberbullying is never the work of one person alone. Apart from a few originators, the great mass of abusive speech comes from people riding the wave, the result of netizens' collective irrational behavior.

In the one-way communication of the newspaper and television era, only a few groups had the chance to speak and comment publicly, and when people communicate face to face they do not easily insult or attack one another. In the Internet era, with smartphones everywhere, anyone can voice an opinion online at any time. When media literacy fails to keep up and people cannot judge information well, then faced with online content that blurs truth and falsehood and with inflammatory language, it is easy to lose control of one's impulses and unconsciously join the ranks of cyberbullies.

Many people do not think rationally or form their own judgment before commenting. They simply see what a blogger they follow has said, or that a crowd is already on a crusade, and join the criticism, escalating the violence.

In this respect, hurling extreme abuse back at the "online mob" only creates a new round of cyber violence; "fighting magic with magic" seriously damages the ecology of online discourse. Much of this occasional "verbal violence" could be avoided by raising individual media literacy.

This requires professional media organizations and relevant authorities to invest more resources in helping people build media literacy, so that we can achieve a more civilized and friendly "digital survival" in the Internet age.


Deep inside every human being lies some violent impulse. As Luo Xiang put it, "We are far more hypocritical and darker than we imagine; everyone has a Zhang San hidden in their heart."

When rationality prevails and people acquire self-control, "irrational" violence will certainly decline. More than the reins of AI, what can truly eliminate cyberbullying is the moral law within each of us.



Origin: blog.csdn.net/R5A81qHe857X8/article/details/131671680