Identifying "ChatGPT fraud", the effect surpasses OpenAI, and the AI generated detector is here!


AI-Generated Text Detection | Editor: Heart of the Machine

AI-powered fraud has a very high success rate: just days ago, "defrauded of 4.3 million yuan in 10 minutes" was a trending topic. Targeting the most popular large language models, researchers have recently explored a recognition method.

As generative large models keep improving, the text they produce is getting ever closer to human writing. While large models are freeing the hands of countless writers, their strong ability to make fakes look real has also been exploited by criminals, causing a series of social problems:

[Screenshots: news reports of incidents caused by AI-generated content]

Researchers from Peking University and Huawei have proposed a reliable text detector that recognizes various kinds of AI-generated text. Based on the different characteristics of long and short texts, they propose a multi-scale, PU-learning-based training method for AI-generated text detectors. By improving the detector's training process, considerable gains in detecting both long and short ChatGPT-generated text are achieved under the same conditions, addressing the pain point that current detectors have low recognition accuracy on short texts.


  • Paper address: https://arxiv.org/abs/2305.18149

  • Code address (MindSpore): https://github.com/mindspore-lab/mindone/tree/master/examples/detect_chatgpt

  • Code address (PyTorch): https://github.com/YuchuanTian/AIGC_text_detector

Introduction

As the output of large language models becomes more and more realistic, all walks of life urgently need a reliable AI-generated text detector. However, different industries have different requirements for the text to be detected: in academia, for example, long and complete academic texts usually need to be checked, while on social platforms, relatively short and fragmented fake news must be caught. Existing detectors often cannot accommodate both needs; for instance, some mainstream AI text detectors generally have poor predictive ability on shorter texts.

Regarding the differing detection performance on texts of different lengths, the authors observed that there may be some "uncertainty" in the attribution of shorter AI-generated texts; or, more bluntly, some short sentences generated by AI are also frequently used by humans, so it is hard to determine whether a short text generated by AI actually comes from a human or from AI. Here are a few examples where humans and AI answer the same question:

[Examples: the same questions answered by a human and by AI, with nearly indistinguishable short answers]

These examples show that short AI-generated answers are hard to identify: the difference between such text and human text is too small to strictly judge its true attribution. It is therefore inappropriate to simply label short texts as human/AI and treat detection as a traditional binary classification problem.

In response to this problem, this study reformulates part of the human/AI binary detection task as a PU (Positive-Unlabeled) learning problem: for shorter sentences, human-written text is treated as the positive class and machine-generated text as unlabeled, and the training loss function is improved accordingly. This improvement considerably boosts the detector's classification performance on various corpora.

Algorithm details

Under the traditional PU learning setting, a binary classification model can only learn from positive training samples and unlabeled training samples. A commonly used PU learning method estimates the binary classification loss corresponding to negative samples by formulating the following PU loss:

$$\mathcal{L}_{\mathrm{PU}} = \pi_{\mathrm{P}}\,\mathcal{L}_{\mathrm{P}}^{\mathrm{P}} + \max\!\left\{0,\;\mathcal{L}_{\mathrm{U}}^{\mathrm{N}} - \pi_{\mathrm{P}}\,\mathcal{L}_{\mathrm{P}}^{\mathrm{N}}\right\}$$

Here, $\mathcal{L}_{\mathrm{P}}^{\mathrm{P}}$ denotes the binary classification loss computed on positive samples with positive labels; $\mathcal{L}_{\mathrm{U}}^{\mathrm{N}}$ denotes the loss computed by treating all unlabeled samples as negatives; $\mathcal{L}_{\mathrm{P}}^{\mathrm{N}}$ denotes the loss computed by treating positive samples as negatives; and $\pi_{\mathrm{P}}$ denotes the positive-class prior, i.e., the estimated proportion of positive samples among all PU samples. In traditional PU learning, the prior $\pi_{\mathrm{P}}$ is usually set as a fixed hyperparameter. In the text detection scenario, however, the detector must handle texts of various lengths, and for texts of different lengths, the estimated proportion of positive samples among all PU samples of that length also differs. This study therefore improves the PU loss and proposes a length-sensitive multi-scale PU (MPU) loss function.
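To make the structure of this loss concrete, here is a minimal PyTorch sketch of the non-negative PU loss described above. The variable names and the assumption that every batch contains both positive and unlabeled samples are ours, not the paper's:

```python
import torch
import torch.nn.functional as F

def pu_loss(logits, labels, prior):
    """Non-negative PU loss, a minimal sketch.

    logits: detector scores, shape (batch,); positive class = human-written.
    labels: 1 for labeled positive samples, 0 for unlabeled samples.
    prior:  estimated positive-class prior pi_P (a float).
    Assumes the batch contains at least one sample of each kind.
    """
    # Per-sample sigmoid losses for the "positive" and "negative" decisions.
    loss_pos = F.softplus(-logits)  # -log sigmoid(logits)
    loss_neg = F.softplus(logits)   # -log(1 - sigmoid(logits))

    p_mask = labels == 1
    u_mask = ~p_mask

    l_pp = loss_pos[p_mask].mean()  # positives scored with positive labels
    l_pn = loss_neg[p_mask].mean()  # positives scored with negative labels
    l_un = loss_neg[u_mask].mean()  # unlabeled scored with negative labels

    # Estimated negative risk, clamped at zero (non-negative PU learning).
    return prior * l_pp + torch.clamp(l_un - prior * l_pn, min=0.0)
```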

Specifically, this study proposes an abstract recurrent model to model the detection of shorter texts. Traditional NLP models such as RNNs and LSTMs process sequences with a Markov-chain-like structure. This kind of recurrent process can be understood as a gradual iteration: the prediction after each token is obtained by transforming the prediction carried over from the preceding sequence and merging it with the contribution of the current token, i.e., the following process:

$$\mathrm{conf}_i = g\!\left(\mathrm{conf}_{i-1},\,\delta_i\right), \qquad i = 1,\dots,L,$$

where $\mathrm{conf}_i$ is the positive-class confidence after the $i$-th token, $\delta_i$ is the contribution of token $i$, and $g$ merges the two.

To estimate the prior probability under this abstract model, it is assumed that the model's output is the confidence that a given sentence is positive, i.e., the probability that it is judged to be human-written. Each token's contribution is assumed to be inversely proportional to the sentence length; each token is either positive or unlabeled, and the probability of being unlabeled is far greater than the probability of being positive, because as the vocabulary of large models gradually approaches that of humans, most words appear in both AI and human corpora. From this simplified model and the assumed positive-token probability, the final prior estimate is obtained by taking the total expectation of the model's output confidence over the different input configurations:

$$\tilde{\pi}_{\mathrm{P}}(L) = \mathbb{E}\!\left[\mathrm{conf}_L\right],$$

i.e., the expectation of the final confidence over all token-label configurations of a length-$L$ input.

Through theoretical derivation and experiments, the estimated prior probability is found to increase with text length and then gradually stabilize. This is as expected: as the text gets longer, the detector can capture more information, and the "source uncertainty" of the text gradually weakens:

[Figure: estimated positive prior vs. text length; the prior rises with length and gradually levels off]
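The trend can be reproduced with a toy simulation. The sketch below is one simple instantiation of the assumptions above (tokens independently positive with a small probability, per-token contribution of size 1/L), not the paper's exact derivation; the probability `p_pos` and the decision rule are illustrative choices of ours:

```python
import numpy as np

def toy_prior(length, p_pos=0.02, n_trials=50_000, seed=0):
    """Toy Monte-Carlo estimate of the length-dependent positive prior.

    Each token is independently positive with probability p_pos,
    otherwise unlabeled; a sentence counts as positive once it carries
    any positive evidence (illustrative rule, not from the paper).
    """
    rng = np.random.default_rng(seed)
    tokens = rng.random((n_trials, length)) < p_pos  # token-level labels
    evidence = tokens.sum(axis=1) / length           # contributions of size 1/length
    return float((evidence > 0).mean())              # P(any positive evidence)

for L in (5, 10, 20, 50, 100, 200):
    print(f"L={L:4d}  prior ~ {toy_prior(L):.3f}")
# The estimate rises with length and gradually saturates.
```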

Afterwards, for each positive sample, the PU loss is computed using the prior determined by that sample's length. Finally, since shorter texts possess only partial "uncertainty" (that is, short texts still contain some human or AI textual features), the weighted sum of the binary classification loss and the MPU loss is used as the final optimization objective:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \gamma\,\mathcal{L}_{\mathrm{MPU}},$$

where $\gamma$ is the weighting coefficient of the MPU term.
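As a sketch, the final objective might be assembled as follows, reusing the `pu_loss` sketch above. The weight `gamma` and the use of a single scalar prior are simplifications of ours; per the paper's description, the MPU term uses each sample's length-dependent prior:

```python
import torch.nn.functional as F

def final_loss(logits, labels, prior, gamma=0.4):
    """Weighted sum of standard cross-entropy and the PU term.

    gamma is a hypothetical weighting coefficient; the paper computes
    the MPU term with a per-sample, length-dependent prior rather than
    the single scalar passed here.
    """
    ce = F.binary_cross_entropy_with_logits(logits, labels.float())
    return ce + gamma * pu_loss(logits, labels, prior)
```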

In addition, it should be noted that the MPU loss is suited to training corpora of diverse lengths. If the existing training data is obviously homogeneous, with most of the corpus consisting of long texts, the MPU method cannot exert its full effectiveness. To diversify the lengths of the training corpus, this study also introduces a multi-scale module at the sentence level. This module randomly masks some sentences in a training text and reassembles the remaining sentences while preserving their original order. After this multi-scale operation, the lengths of the training texts are greatly enriched, making full use of PU learning for training the AI text detector.
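A minimal sketch of such a sentence-level multi-scale operation is below; the regex sentence splitter and the keep probability are our illustrative choices, not necessarily the paper's:

```python
import random
import re

def multiscale(text, keep_prob=0.7, rng=random):
    """Randomly drop whole sentences, keeping the survivors in order.

    keep_prob is illustrative; the paper's masking ratio may differ.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if rng.random() < keep_prob]
    if not kept:  # guarantee a non-empty output
        kept = [rng.choice(sentences)]
    return " ".join(kept)

# One long training text yields shorter variants each epoch, e.g.:
# multiscale("First sentence. Second one! A third? Fourth.")
```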

Experimental results

[Table: detection results on the Tweep-Fake dataset]

As shown in the above table, the authors first tested the effect of the MPU loss on Tweep-Fake, a dataset of shorter AI-generated texts; the samples in this dataset are relatively short posts from Twitter. Starting from traditional language-model fine-tuning, the authors replaced the standard binary classification loss with the optimization objective containing the MPU loss. The improved language-model detector outperforms the other baseline algorithms.

[Table: detection results on ChatGPT-generated text]

The authors also tested detection of ChatGPT-generated text. A language-model detector obtained by traditional fine-tuning performed poorly on short sentences, while a detector trained under the same conditions with the MPU method performed well on short sentences and also achieved considerable improvement on the full corpus, raising the F1-score by 1% and surpassing SOTA algorithms including OpenAI's detector and DetectGPT.

[Table: ablation study of each component]

As shown in the above table, the ablation experiments measure the gain contributed by each component; the MPU loss strengthens classification on both long and short data.

[Table: comparison between traditional PU and multiscale PU (MPU)]

The authors also compared traditional PU with multiscale PU (MPU). The above table shows that MPU is more effective and better adapts to the task of multi-scale AI-generated text detection.

Summary

By proposing a scheme based on multi-scale PU learning, the authors address the problem of short-text recognition for text detectors. As AIGC models proliferate, detecting such content will become more and more important. This research takes a solid step forward on AI text detection, and it is hoped that more similar research will follow to better govern AIGC content and prevent the abuse of AI-generated text.


Origin: blog.csdn.net/Datawhale/article/details/131027532