Identifying "ChatGPT fraud", the effect surpasses OpenAI: Peking University and Huawei's AI generation detectors are here

This article is from the Machine Heart (机器之心) editorial department.

AI-powered fraud can be alarmingly effective: just days ago, a case of "4.3 million yuan scammed in 10 minutes" was trending on search. Targeting today's hottest large language models, researchers have recently explored a method for recognizing their output.

As generative large models keep improving, the text they produce is getting ever closer to human writing. While large models free people from a great deal of writing work, their ability to pass off fake content as real has also been exploited by bad actors, causing a series of social problems.


Researchers from Peking University and Huawei have proposed a reliable text detector that recognizes many kinds of AI-generated text. Based on the different characteristics of long and short texts, they propose a multi-scale, PU-learning-based training method for AI-generated text detectors. By improving the detector's training process, the method achieves a considerable boost in detecting both long and short ChatGPT-generated text under the same conditions, addressing the pain point that current detectors have low accuracy on short texts.


  • Paper address: https://arxiv.org/abs/2305.18149

  • Code address (MindSpore): https://github.com/mindspore-lab/mindone/tree/master/examples/detect_chatgpt

  • Code address (PyTorch): https://github.com/YuchuanTian/AIGC_text_detector

Introduction

As large language models produce increasingly realistic output, there is an urgent need across industries for reliable AI-generated text detectors. However, different industries have different requirements for the text being detected. In academia, for example, detectors typically need to handle long, complete academic passages; on social platforms, they must handle relatively short, fragmented fake news. Existing detectors often cannot meet these varied requirements; in particular, mainstream AI text detectors generally predict poorly on shorter texts.

Regarding the different detection performance on texts of different lengths, the authors observe that there is inherent "uncertainty" in attributing shorter AI-generated texts; put more bluntly, many short sentences generated by AI are also sentences that humans write all the time. It is therefore hard to determine whether a short AI-generated text comes from a human or from AI. Here are a few examples where humans and AI answer the same question:

[Figure: sample questions with short answers written by humans and by ChatGPT]

These examples show how hard it is to identify short AI-generated answers: such text differs so little from human writing that its true origin can hardly be judged with certainty. It is therefore inappropriate to simply label short texts as human/AI and treat text detection as a traditional binary classification problem.

To address this problem, this study reformulates the human/AI binary classification task as a partial PU (Positive-Unlabeled) learning problem: in shorter sentences, human text is treated as the positive class and machine text as unlabeled, and the training loss function is improved accordingly. This improvement considerably boosts the detector's classification performance on a variety of corpora.

Algorithm details

Under the traditional PU learning setting, a binary classification model can only learn from positive training samples and unlabeled training samples. A commonly used PU learning method is to estimate the binary classification loss corresponding to the negative sample by formulating the PU loss:

$$\mathcal{L}_{\mathrm{PU}} \;=\; \pi_P\,\mathcal{L}_P^{+} \;+\; \big(\mathcal{L}_U^{-} - \pi_P\,\mathcal{L}_P^{-}\big)$$

Here, $\mathcal{L}_P^{+}$ denotes the binary classification loss computed on positive samples against the positive label; $\mathcal{L}_U^{-}$ denotes the loss computed by treating all unlabeled samples as negative; $\mathcal{L}_P^{-}$ denotes the loss computed by treating the positive samples as negative; and $\pi_P$ denotes the prior positive-sample probability, i.e., the estimated proportion of positive samples among all PU samples. In traditional PU learning, the prior $\pi_P$ is usually set as a fixed hyperparameter. In the text detection scenario, however, the detector must handle texts of different lengths, and for texts of different lengths the estimated proportion of positive samples among PU samples of that length also differs. This study therefore improves the PU loss and proposes a length-sensitive multi-scale PU (MPU) loss function.
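To make the formulation concrete, here is a minimal PyTorch sketch of the unbiased PU estimator described above (PyTorch matching one of the linked codebases). It is only an illustration of the general formula, not the authors' released implementation; the names `pu_loss`, `logits`, `pu_labels`, and `prior` are chosen for this sketch.

```python
import torch
import torch.nn.functional as F

def pu_loss(logits, pu_labels, prior):
    """Unbiased PU risk estimator: L_PU = pi_P * L_P^+ + (L_U^- - pi_P * L_P^-).

    logits:    (N,) raw detector scores; > 0 means "positive" (human-written).
    pu_labels: (N,) 1 for labeled-positive samples, 0 for unlabeled samples.
    prior:     scalar estimate of the positive-class prior pi_P.
    """
    pos = pu_labels == 1
    unl = ~pos

    # Per-sample losses against a positive target and a negative target.
    loss_as_pos = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits), reduction="none")
    loss_as_neg = F.binary_cross_entropy_with_logits(
        logits, torch.zeros_like(logits), reduction="none")

    zero = logits.new_zeros(())
    l_p_pos = loss_as_pos[pos].mean() if pos.any() else zero  # L_P^+
    l_p_neg = loss_as_neg[pos].mean() if pos.any() else zero  # L_P^-
    l_u_neg = loss_as_neg[unl].mean() if unl.any() else zero  # L_U^-

    return prior * l_p_pos + (l_u_neg - prior * l_p_neg)
```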

Specifically, this study proposes an abstract recurrent model to model the detection of shorter texts. When a traditional NLP model processes a sequence, it usually has a Markov-chain-like structure, as in RNNs and LSTMs. The behavior of this kind of recurrent model can be understood as a step-by-step iterative process: the prediction for each output token is obtained by transforming and merging the prediction carried over from the previous token and the preceding sequence with the prediction for the current token, i.e., the following process:

[Equation: the step-by-step recurrent update of the abstract detection model]

To estimate the prior probability under this abstract model, the model's output is assumed to be the confidence that a given sentence is positive, i.e., the probability that it was written by a human. Each token is assumed to contribute in inverse proportion to the sentence length, and each token is either positive or unlabeled, with the probability of being unlabeled far greater than that of being positive: as large models' vocabulary usage approaches that of humans, most words appear in both AI and human text. Based on this simplified model and the assumed positive-token probability, the final prior estimate is obtained by taking the total expectation of the model's output confidence over the different possible inputs.

[Equation: the length-dependent prior, obtained as the total expectation of the model's output confidence]

Theoretical derivation and experiments show that the estimated prior probability increases with text length and then gradually stabilizes. This is expected: as the text gets longer, the detector can capture more information, and the "source uncertainty" of the text gradually weakens.

[Figure: the estimated prior probability rises with text length and then gradually plateaus]
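The closed-form prior from the paper's derivation is not reproduced in this article, so the sketch below only mimics the qualitative behavior described above, a prior that grows with length and then saturates, using a simple exponential schedule. The function name and the parameters `pi_max` and `tau` are hypothetical and are not taken from the paper.

```python
import math

def length_dependent_prior(num_tokens, pi_max=0.5, tau=50.0):
    """Illustrative saturating prior: near 0 for very short texts,
    approaching pi_max as the text gets longer (tau sets the growth rate)."""
    return pi_max * (1.0 - math.exp(-num_tokens / tau))
```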

Afterwards, for each positive sample the PU loss is computed with the prior derived from that sample's length. Finally, since shorter texts carry only partial "uncertainty" (that is, a short text still contains some human or AI features), a weighted sum of the ordinary binary classification loss and the MPU loss is used as the final optimization objective:

[Equation: final training objective, a weighted sum of the standard binary classification loss and the MPU loss]
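As a rough sketch of how such an objective could be assembled, the snippet below reuses the `pu_loss` and `length_dependent_prior` helpers sketched earlier and adds an ordinary cross-entropy term. The weighting coefficient `lam` is a hypothetical hyperparameter, and this is not the authors' released training code.

```python
import torch
import torch.nn.functional as F

# pu_loss and length_dependent_prior are the helpers sketched above.

def mpu_objective(logits, labels, lengths, lam=0.4):
    """Weighted sum of the ordinary binary classification loss and the MPU loss.

    logits:  (N,) detector scores (> 0 means "human-written").
    labels:  (N,) 1 for human-written text, 0 for AI-generated text.
    lengths: (N,) token counts, used to pick a per-sample prior.
    lam:     weight of the MPU term (a hypothetical value).
    """
    # Standard binary cross-entropy on the hard human/AI labels.
    ce = F.binary_cross_entropy_with_logits(logits, labels.float())

    # MPU term: human text is the positive class, AI text is treated as
    # unlabeled, and each text length gets its own estimated prior.
    mpu_terms = []
    for n in torch.unique(lengths):
        mask = lengths == n
        prior = length_dependent_prior(int(n))
        mpu_terms.append(pu_loss(logits[mask], labels[mask], prior))
    mpu = torch.stack(mpu_terms).mean()

    return ce + lam * mpu
```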

It should also be noted that the MPU loss is designed for training corpora with a variety of lengths. If the available training data is homogeneous, with most of the corpus consisting of long texts, the MPU method cannot realize its full benefit. To diversify the length of the training corpus, this study also introduces a sentence-level multi-scale module: it randomly masks out some sentences in a training text and re-joins the remaining sentences while preserving their original order. After this multi-scale operation, the training texts cover a much richer range of lengths, making full use of PU learning when training the AI text detector.
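The sentence-level multiscale module is described only at a high level in the text, so the following is a rough sketch of that idea; the sentence-splitting regex and the drop probability `p_drop` are assumptions for illustration, not the paper's exact procedure.

```python
import random
import re

def multiscale_augment(text, p_drop=0.5, rng=None):
    """Randomly mask out sentences and re-join the rest in their original order."""
    rng = rng or random.Random()
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return text
    kept = [s for s in sentences if rng.random() > p_drop]
    # Keep at least one sentence so the augmented sample is never empty.
    if not kept:
        kept = [rng.choice(sentences)]
    return " ".join(kept)
```

Applied on the fly during training, an operation like this turns a long-text corpus into a mix of long and short fragments, which is exactly the length diversity the MPU loss relies on.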

Experimental results

[Table: detection results on the Tweep-Fake dataset]

As shown in the table above, the authors first tested the effect of the MPU loss on Tweep-Fake, a dataset of shorter AI-generated texts consisting of relatively short snippets from Twitter. Starting from conventional language-model fine-tuning, they replaced the traditional binary classification loss with the optimization objective that includes the MPU loss; the improved language-model detector outperforms the other baseline algorithms.

[Table: detection results on ChatGPT-generated text]

The authors also tested ChatGPT-generated text. A language-model detector obtained through conventional fine-tuning performs poorly on short sentences, whereas the detector trained with the MPU objective performs considerably better on short texts and also improves on the complete corpus, raising the F1-score by 1% and surpassing SOTA algorithms such as OpenAI's detector and DetectGPT.

[Table: ablation study of each component]

As shown in the table above, the ablation study measures the gain contributed by each component: the MPU loss strengthens classification on both long and short data.

[Table: comparison of traditional PU loss and multiscale PU (MPU) loss]

The authors also compared traditional PU with multiscale PU (MPU). The table above shows that MPU is more effective and better suited to the task of multi-scale AI text detection.

Summary

By proposing a scheme based on multi-scale PU learning, the authors address the problem of short-text recognition for text detectors. As AIGC models proliferate, detecting such content will only become more important. This research takes a solid step forward on AI text detection; hopefully more work of this kind will follow, to better govern AIGC content and prevent the abuse of AI-generated text.


Source: blog.csdn.net/lgzlgz3102/article/details/131058770