[Paper reading] (04) Is artificial intelligence really safe? Zhejiang University team shares AI adversarial sample technology at Bund Conference

AI Security at the Bund Conference - Offensive and Defensive Ways in the Intelligent Era
Deep Learning Security: From the NLP Perspective
Zhejiang University

[Figure]

The "Xiuzhang takes you to read papers" series is mainly meant to push myself to read excellent papers and attend academic talks, and to share them with everyone. I hope you like it. Since my English and academic skills are limited and still need continuous improvement, I ask everyone to criticize and correct me. You are very welcome to leave me comments; I look forward to walking the academic road together with you. Come on~


AI technology is booming. Whether in financial services, offline life, or healthcare, AI is involved everywhere, so protecting the security of these AI systems is both necessary and important. At present, AI security is a very new field and a hot topic of common concern to academia and industry. This forum invited experts in AI security to share and exchange their achievements in the intelligent era, and to promote and lead the industry's development in the field of AI security.

The forum was titled "AI Security - Offensive and Defensive Ways in the Intelligent Era". Dean Wang Qian of Wuhan University shared adversarial attacks and defenses for speech systems, researcher Ji Shouling of Zhejiang University shared security in NLP, researcher Qin Zhan of Zhejiang University shared new attacks and defenses for data security in deep learning, Mr. Zong Zhiyuan from Ant Group shared their AI security adversarial defense system, and Dean Ren Kui shared the AI Security White Paper. This article mainly explains AI security in NLP and the white paper; I hope it helps you. These experts are truly worth learning from, and I can only offer my humble admiration~ fighting!

[Figure]

PS: By the way, do you like this kind of sharing in conference-lecture format?
I am worried it may not work well; if not, I will stop sharing and summarizing similar conference content. Feel free to leave me a comment.


Previous article recommendations:
[Xiuzhang takes you to read papers] (01) What can save my procrastination? How beginners can improve their interest in programming and get started with LaTeX in detail
[Xiuzhang takes you to read papers] (02) S&P 2019 - Neural Cleanse: Identifying and Mitigating Backdoor Attacks in DNN
[Xiuzhang takes you to read papers] (03) Teacher Zhang Chao of Tsinghua University - GreyOne: Discover Vulnerabilities with Data Flow Sensitive Fuzzing
[Xiuzhang takes you to read papers] (04) Is artificial intelligence really safe? Zhejiang University team shares AI adversarial sample technology at the Bund Conference, plus a detailed explanation of machine-learning-based malicious code detection



1. AI Security White Paper

With the rapid development of artificial intelligence, technologies such as autonomous driving, face recognition, and speech recognition are widely used, which also brings serious AI security issues. Common problems include:

  • Autonomous driving systems incorrectly recognizing road signs
  • Natural language processing systems misinterpreting semantics
  • Speech recognition systems misrecognizing user commands

[Figure]

Today's AI security places great emphasis on four properties:

  • Confidentiality:
    the data and model information involved must not be disclosed to unauthorized parties
  • Integrity:
    algorithm models, data, infrastructure, and products must not be maliciously implanted, tampered with, replaced, or forged
  • Robustness:
    the model can withstand complex environmental conditions as well as abnormal and malicious interference
  • Privacy:
    the AI model protects the data privacy of data subjects during use

AI attacks targeting these four properties keep emerging, such as inference attacks, adversarial examples, poisoning attacks, and model theft.

[Figure]

Therefore, Dean Ren Kui shared the "AI Security White Paper".

[Figure]

Zhejiang University worked with Ant Group to survey more than 300 offensive and defensive research results published in recent years at international conferences and journals in security, artificial intelligence, and related fields. Focusing on the security threats and challenges along the three dimensions of model, data, and carrier system, they sorted out AI attack and defense techniques, and based on the security issues AI faces in real scenarios, summarized and proposed a one-stop security solution for AI application systems (AI SDL), jointly releasing the "AI Security White Paper". The overall framework is shown below:

[Figure]

After this review, they classified the threats faced by AI technology into three major categories:

  • AI model security issues
    Model integrity threats => data poisoning attacks
    Model robustness threats => adversarial example attacks
  • AI data security issues
    Model parameter leakage => model substitution attacks
    Data privacy leakage => model inversion attacks
  • AI carrier system security issues
    Hardware device security issues => circuit disturbance attacks
    System software security issues => code injection attacks

[Figure]

Before introducing these three categories of security issues, let me first explain what an adversarial example is.
An adversarial example is an input sample that, after small perturbations, causes a machine learning model to output an incorrect result. In image recognition, a picture originally classified by a convolutional neural network (CNN) as one category (such as "panda") is suddenly misclassified as another category (such as "gibbon") after changes so subtle that they are almost imperceptible to the human eye. As another example, if a self-driving model is attacked, a Stop sign may be recognized by the car as "go straight" or "turn".

[Figure]

[Figure]

A classic example is shown in the figure below: BadNets, proposed by Gu et al.
It injects a backdoor by poisoning the training data set as follows:

  • First, the attacker selects a target label and a trigger pattern, which is a collection of pixels and associated color intensities. The pattern may be any shape, such as a square.
  • Next, a random subset of training images is stamped with the trigger pattern, and their labels are changed to the target label.
  • The DNN is then trained on the modified training data to inject the backdoor.

Since the attacker has full access to the training process, they can adjust the training configuration, such as the learning rate and the ratio of modified images, so that the backdoored DNN performs well on both clean and adversarial inputs. BadNets achieved an attack success rate of over 99% (the percentage of adversarial inputs that are misclassified) on MNIST without affecting model performance. The trigger (backdoor) in the lower-right corner of the figure below causes the network to learn the wrong category, predicting Label 5 and Label 7 as Label 4.
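To make the poisoning step above more concrete, here is a minimal sketch of BadNets-style data poisoning on MNIST-like data. The trigger shape, poison ratio, and target label are illustrative assumptions, not the settings from the original paper.

```python
import numpy as np

def poison_dataset(images, labels, target_label=4, poison_ratio=0.1, seed=0):
    """BadNets-style poisoning sketch: stamp a small bright square trigger
    into a random subset of images and relabel them as the target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_ratio)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # trigger pattern: a 3x3 bright square in the lower-right corner
    images[idx, -4:-1, -4:-1] = 255
    labels[idx] = target_label
    return images, labels, idx

# toy usage with random 28x28 grayscale "digits" standing in for MNIST
X = np.random.randint(0, 256, size=(1000, 28, 28), dtype=np.uint8)
y = np.random.randint(0, 10, size=1000)
X_poisoned, y_poisoned, poisoned_idx = poison_dataset(X, y)
print(f"poisoned {len(poisoned_idx)} of {len(X)} samples")
# A DNN trained on (X_poisoned, y_poisoned) would learn to map the trigger
# to the target label while still behaving normally on clean inputs.
```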

[Figure]

PS: In the next article, we will explain the AI data security and AI voice security papers in detail. This article focuses on adversarial examples for NLP text. I hope you like it!


1. AI model security issues

(1) Model integrity threat => data poisoning attack:
The attacker adds a small amount of poisoned data to the normal training set to undermine model integrity and manipulate the AI's judgment. Model drift biases the model's classification of good and bad inputs, reducing its accuracy. Backdoor attacks, by contrast, do not affect normal use of the model; they only make the model err in special scenarios set up by the attacker.

[Figure]

(2) Model robustness threat => adversarial example attack:
During the testing phase, the attacker adds adversarial perturbations to input samples to undermine model robustness and manipulate the AI's judgment.

  • Different constraints:
    perturbation-based attacks, adversarial patches, unrestricted adversarial attacks
  • Different threat models:
    white-box attacks, gray-box attacks, black-box attacks
  • Different application scenarios:
    image recognition, 3D object recognition, audio recognition, text classification

[Figure]

Deep learning models usually lack robustness. On the one hand, environmental factors keep changing: during real-world use, AI models perform unstably because they are affected by light intensity, viewing angle and distance, affine transformations of the image, image resolution, and other factors, so training data can hardly cover all real-life scenarios. On the other hand, model interpretability is insufficient: a deep learning model is a black box with a huge number of parameters and a complex structure, so even without malicious attacks, unexpected security risks may arise, hindering the use of AI in highly sensitive scenarios such as healthcare and transportation.

Professor Ren and his team's related work includes distributed adversarial attacks and adversarial attacks on 3D point clouds.

[Figure]


2. AI data security issues

AI data security threats, simply put, involve obtaining the parameters or training data of a deep learning model by constructing specific inputs and combining them with the model's prediction results. As shown in the figure below, a model inversion attack reconstructs images, causing the deep learning model to leak sensitive information from its training data.

[Figure]

AI data security covers model parameter leakage and training data leakage, as shown in the figure below. Model parameter leakage attacks include equation-solving attacks, meta-model-based model theft, and model substitution attacks; training data leakage includes leakage through output vectors and through gradient updates, with methods such as membership inference attacks, model inversion attacks, and distributed model gradient attacks.

[Figure]

Professor Ren's related work includes:

  • Data leakage based on gradient updates:
    in federated learning frameworks, attackers can reconstruct a specific user's private data from the gradient updates that user uploads (a toy sketch of this idea is given after this list).

[Figure]

  • Model inversion attack:
    the first inversion attack against a commercial user-recognition model (CCS '19)

[Figure]
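As a rough illustration of the gradient-leakage idea mentioned above (in the spirit of gradient-matching reconstruction attacks such as "deep leakage from gradients"), here is a minimal PyTorch sketch. The tiny linear model, dimensions, and optimizer settings are illustrative assumptions, not details from the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(10, 2)                      # toy shared model
x_true = torch.randn(1, 10)                   # victim's private sample
y_true = torch.tensor([1])                    # victim's private label

# gradient the victim would upload in federated learning
loss = F.cross_entropy(model(x_true), y_true)
true_grads = [g.detach() for g in torch.autograd.grad(loss, model.parameters())]

# attacker optimizes dummy data and label so their gradient matches the upload
x_dummy = torch.randn(1, 10, requires_grad=True)
y_dummy = torch.randn(1, 2, requires_grad=True)
optimizer = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    optimizer.zero_grad()
    dummy_loss = torch.sum(-F.softmax(y_dummy, dim=-1)
                           * F.log_softmax(model(x_dummy), dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                      create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(50):
    optimizer.step(closure)

print("reconstruction error:", (x_dummy.detach() - x_true).norm().item())
```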


3. Security issues of AI carrier systems

(1) Hardware device security issues

  • Attackers with direct access to hardware devices can add circuit-level disturbances and forge data, leading to serious consequences such as model misjudgment, instruction jumps, and system crashes. The disturbed values are then overwritten with correct data, making the attack covert and hard to detect.
  • Attackers can measure the electromagnetic and power leakage of the hardware system to obtain coarse-grained model hyperparameters, providing prior knowledge for model theft. The information leaked while different layers, activation functions, and so on are running follows fixed patterns, so side-channel analysis can be used to recover model hyperparameters.

(2) System and software security issues

  • Security vulnerabilities in AI systems and software can lead to serious consequences such as tampering with key data, model misjudgment, system crashes, or hijacked control flow.
  • Multi-dimensional attacks such as code injection, control-flow hijacking, and data-flow attacks keep emerging and continue to evolve in new environments. At the same time, AI systems have many modules, complex structures, and limited scalability, making attack detection and the discovery of security threats in complex scenarios very difficult.

[Figure]


4. Defense methods

(1) Model security enhancement
Defense against model integrity threats:

  • Data poisoning: use spectral feature comparison, clustering algorithms, and other means to detect input data containing backdoors
  • Model poisoning: use pruning, fine-tuning, detection, and retraining to remove the model's backdoor features

Defense against model robustness threats:

  • Adversarial training: incorporate both benign and adversarial samples into the training phase to train the neural network (see the sketch after this list)
  • Input preprocessing: eliminate adversarial perturbations in input data through operations such as filtering, bit-depth reduction, and input cleaning
  • Specific defense algorithms: use distillation, feature pruning, randomization, and other algorithms to harden deep learning models
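As a concrete illustration of the adversarial-training idea above, here is a minimal PyTorch sketch that crafts FGSM perturbations on the fly and trains on benign and adversarial samples together. The model, epsilon, and random data are placeholder assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def fgsm(x, y, epsilon=0.1):
    """Craft an FGSM adversarial example for input x."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(x, y):
    """Train on benign and adversarial samples simultaneously."""
    x_adv = fgsm(x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage with random data standing in for a real dataset
x_batch, y_batch = torch.randn(32, 20), torch.randint(0, 2, (32,))
for step in range(5):
    print("loss:", adversarial_training_step(x_batch, y_batch))
```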

[Figure]

(2) Data security enhancement

  • Model structure defense
    Reduce the model's degree of overfitting to mitigate model leakage and data leakage
  • Information obfuscation defense
    Blur the model's prediction results to reduce the effective information contained in the output and limit the leakage of private information
  • Query control defense
    Extract features from user queries to distinguish attackers from ordinary users, and then restrict the attacker's behavior or deny service

[Figure]


(3) System security defense
Hardware security protection

  • Critical data encryption: ensure the security of key data within the system and prevent side-channel attacks
  • Hardware fault detection: detect circuit faults in real time and respond accordingly, so the system cannot be damaged or hijacked by attackers

Software security protection

  • Hierarchical permission management: ensure that model data can only be accessed and called by trusted programs
  • Traceable operation behavior: keep operation records throughout the core data life cycle

[Figure]

Finally, together with Ant Group they proposed an AI model security development life cycle (AI SDL), which introduces security and privacy protection principles at each stage to achieve a security-assured AI development process.

[Figure]


Final summary:

  • The white paper introduces the security threats and defense methods for models, data, and carrier systems, and provides a one-stop security solution for AI applications.
  • Security technologies are iteratively upgraded through offensive-defensive confrontation, raising the industry's entry threshold.
  • It helps reduce compliance costs, reduce business losses, and open up new business.

[Figure]



2. Machine learning model security from an NLP perspective

Adversarial attacks are common in both the image and speech domains. For example, after noise is added, the spoken phrase "How are you" may be recognized as "Open the door", and noise played at smart speakers can be used to launch voice attacks.

[Figure]

So, do adversarial example attacks also exist in the text domain? Are natural language processing (NLP) machine learning services (MLaaS) also vulnerable to adversarial examples?

[Figure]

First, a quick primer on natural language processing. Common applications include:

  • Machine translation
  • Information retrieval
  • Sentiment analysis
  • Question answering
  • Automatic summarization
  • Knowledge graphs

[Figure]

This post mainly introduces adversarial text for sentiment classification, so let me first cover the basics of sentiment classification. When deep learning processes text, NLP typically performs word segmentation, data cleaning, and word frequency statistics, converts the text into word vectors or a TF-IDF matrix, and then performs similarity computation or text classification. When the characteristic words of a certain sentiment (positive or negative) appear more often, the text is predicted to carry that sentiment. So, can a deep learning model be made to always predict incorrectly?
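For readers unfamiliar with this pipeline, here is a minimal scikit-learn sketch of TF-IDF based sentiment classification; the tiny toy corpus and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy corpus: 1 = positive, 0 = negative (labels are illustrative)
texts = [
    "this movie is great and touching",
    "wonderful acting, I love it",
    "terrible plot, a complete waste of time",
    "the worst film I have ever seen",
]
labels = [1, 1, 0, 0]

# tokenization/cleaning is handled by the vectorizer here;
# for Chinese text a segmenter such as jieba would be applied first
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["what a great and wonderful movie"]))  # expected: [1]
print(clf.predict(["awful, terrible, the worst"]))         # expected: [0]
```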

[Figure]

NLP adversarial example attacks differ greatly from image or speech adversarial examples. The specific differences are:

  • Images (pixels) are continuous vs. text is discrete
  • Small pixel changes cause barely noticeable disturbances vs. changes to text are easily noticed
  • Many optimization methods exist in continuous space vs. optimization in discrete space is inconvenient
  • Text also has semantic and ambiguity issues

Due to these inherent differences between image and text data, adversarial attack methods for images cannot be applied directly to text. First, image data (pixel values) are continuous, while text data are discrete. Second, small changes to pixel values can perturb an image in ways the human eye can hardly detect, whereas in text even small perturbations are easily noticed, although humans can usually still "guess" the original meaning. Therefore, an NLP model needs to be robust to identifiable features, unlike vision models, which only need to be robust to "less important" features.

DeepWordBug
The figure below is an example of DeepWordBug's attack on a deep network (taken from arXiv:1902.07285), showing the basic process of a text adversarial example. The sentiment predicted by the deep learning model is originally positive, but after a few keywords are modified (such as "place" and "heart"), the sentiment classification result becomes negative.

[Figure]

As in the image domain, where there is attack there is defense, and many studies now try to build more robust natural language processing models. I recommend the adversarial misspelling paper from CMU (arXiv:1905.11268), in which the researchers delete, add, or rearrange characters within words to build a more robust text classification model. These insertions, deletions, and reorderings are perturbations, just like the typos humans are likely to make. Through these perturbations, the model learns to handle typos so that they do not affect the classification result.

Next, let's introduce the work carried out by Professor Ji and his collaborators.



3. Adversarial text: TextBugger

TextBugger: Generating Adversarial Text Against Real-world Applications
This paper was published at NDSS 2019. It proposes TextBugger, a framework for generating adversarial text examples. Its advantages are:

  • Effective: its attack success rate exceeds that of previous models
  • Evasive: the generated text preserves the characteristics of normal text
  • Efficient: it generates adversarial text efficiently, with runtime sub-linear in the text length

Original address:

[Figure]


1. Paper contribution

Adversarial text is becoming increasingly important in applications, yet image-domain adversarial methods cannot be used directly on text. Previous adversarial text generation models have the following shortcomings:

  • Not computationally efficient enough
  • Only work in a white-box setting
  • Require manual intervention
  • Target one specific model and do not generalize

This paper proposes a new framework, TextBugger, which can generate adversarial examples that preserve the original meaning in both black-box and white-box scenarios. In the white-box scenario, the keywords in a sentence are found by computing the Jacobian matrix; in the black-box scenario, the most important sentences are found first, and a scoring function then finds the keywords within them. The adversarial examples were applied to real-world classifiers and achieved good results. The specific contributions include:

  • Proposed the TextBugger framework, which can efficiently generate adversarial examples in both black-box and white-box scenarios
  • Evaluated the TextBugger framework, demonstrating its efficiency and effectiveness
  • Showed that TextBugger has only a slight impact on human understanding
  • Discussed two defense strategies to enhance the robustness of text classification models

The experimental setup is shown in the figure below. The datasets are the IMDB and Rotten Tomatoes Movie Reviews datasets, both sentiment analysis datasets of movie reviews. The target models are:

  • White-box attacks: against LR, CNN, and LSTM models
  • Black-box attacks: real online models with unknown parameters, such as Google Cloud NLP, IBM Watson Natural Language Understanding, Microsoft Azure Text Analytics, Amazon AWS Comprehend, Facebook fastText, ParallelDots, TheySay Sentiment, Aylien Sentiment, TextProcessing, and Mashape Sentiment

The baseline algorithms are:

  • Random: for each sentence, randomly select 10% of the words to modify
  • FGSM+NNS: use the fast gradient sign method to find the optimal perturbation in the word embedding layer, then find the closest word in the dictionary via nearest-neighbor search (a toy sketch of this search step follows this list)
  • DeepFool+NNS: use DeepFool to find the direction crossing the decision boundary of the multi-class problem and the corresponding optimal perturbation, then use nearest-neighbor search to find the closest word in the dictionary
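Both baselines end with a nearest-neighbor search that projects the perturbed embedding back onto a real word. A minimal NumPy sketch of that search step, with a made-up toy vocabulary and random embeddings, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "great", "bad", "awful", "movie", "film"]   # toy vocabulary
embeddings = rng.normal(size=(len(vocab), 50))               # toy word vectors

def nearest_words(vector, k=1):
    """Return the k vocabulary words whose embeddings are closest
    (by cosine similarity) to the given perturbed vector."""
    sims = embeddings @ vector / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(vector) + 1e-12)
    return [vocab[i] for i in np.argsort(-sims)[:k]]

# e.g. take the embedding of "bad", add an FGSM-style perturbation,
# then snap back to the closest real words in the dictionary
perturbed = embeddings[vocab.index("bad")] + 0.1 * np.sign(rng.normal(size=50))
print(nearest_words(perturbed, k=3))
```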

PS: This part draws on the write-up by the blogger "Even handsome people should read more books".

[Figure]

Classification of adversarial attacks
Adversarial attacks can be classified in many ways. By attack environment, they can be divided into black-box, white-box, and gray-box attacks.

  • Black-box attack: the attacker knows nothing about the internal structure, training parameters, or defense methods of the target model and can only interact with it through its outputs.
  • White-box attack: the opposite of the black-box setting; the attacker has full knowledge of the model. Most current attack algorithms are white-box attacks.
  • Gray-box attack: between black-box and white-box; the attacker knows only part of the model, for example only the model's output probabilities, or only the model structure but not the parameters.

By attack goal, they can be divided into targeted and untargeted attacks.

  • Untargeted attack: taking image classification as an example, the attacker only needs the target model to misclassify the sample, without specifying which class it is misclassified into.
  • Targeted attack: the attacker specifies a class, so the target model must not only misclassify the sample but misclassify it into that specified class. Targeted attacks are harder to carry out than untargeted ones.


2. White box attack

White-box attack: find the most important words via the Jacobian matrix, generate five types of bugs, and select the best one based on confidence. The overall TextBugger framework is shown in the figure below.

[Figure]

The white-box attack finds the most important words through the Jacobian matrix. The algorithm proceeds as follows:

  • Step 1: Find important words (lines 2-5)
    Find the most important words via the Jacobian matrix (a minimal sketch of this idea is given below the figure).
  • Step 2: Bug generation (lines 6-14)
    To ensure the generated adversarial examples are visually and semantically consistent with the original samples, the perturbation should be as small as possible. Two levels of perturbation are considered: character-level and word-level.

[Figure]
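For Step 1, here is a minimal PyTorch sketch of gradient-based word importance, roughly in the spirit of the Jacobian computation; the toy vocabulary, mean-pooling model, and scoring by gradient norm are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = {"this": 0, "movie": 1, "is": 2, "terrible": 3}
embedding = nn.Embedding(len(vocab), 16)
classifier = nn.Linear(16, 2)            # toy sentiment classifier: 0=neg, 1=pos

tokens = ["this", "movie", "is", "terrible"]
ids = torch.tensor([[vocab[t] for t in tokens]])

emb = embedding(ids)                      # shape (1, seq_len, 16)
emb.retain_grad()                         # keep gradients w.r.t. word embeddings
logits = classifier(emb.mean(dim=1))      # mean-pool then classify
pred = logits.argmax(dim=1).item()

# derivative of the predicted-class score w.r.t. each word's embedding;
# the gradient norm serves as that word's importance score
logits[0, pred].backward()
importance = emb.grad.norm(dim=-1).squeeze(0)

for tok, score in sorted(zip(tokens, importance.tolist()), key=lambda p: -p[1]):
    print(f"{tok:10s} {score:.4f}")
```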

The authors found that in some word embedding models (such as word2vec), words with opposite meanings, such as "worst" and "better", are highly similar syntactically, so "better" would be treated as a nearest neighbor of "worst". This is clearly unreasonable and easily noticed. Therefore a semantic-preserving technique is used: the word is replaced with one of its top-k nearest neighbors in a context-aware word vector space. The pre-trained GloVe vectors released by Stanford are used for the word embedding, with top-k set to 5, to ensure the neighbors are semantically similar to the original word.

TextBugger proposes five bug-generation methods, as shown in the figure below (a toy sketch follows this list):

  • Insert a space
    Insert a space into the word
  • Delete a character
    Delete any character except the first and last
  • Swap characters
    Swap two adjacent letters in the word, excluding the first and last
  • Visually similar substitution
    Replace letters with visually similar ones (such as "o" and "0", "l" and "1") or with letters adjacent on the keyboard (such as "m" and "n")
  • Word-level substitution with context-aware nearest neighbors (word2vec -> GloVe)
    Replace the word with one of its k closest words in the context-aware embedding space
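Below is a minimal, self-contained sketch of these five operations. The homoglyph table and random choices are simplified illustrations rather than the paper's exact implementation, and the word-level substitution is stubbed out with a fixed neighbor list instead of a real GloVe lookup. Words are assumed to be at least four characters long.

```python
import random

HOMOGLYPHS = {"o": "0", "l": "1", "i": "1", "a": "@"}   # simplified table
TOY_NEIGHBORS = {"awful": ["terrible", "dreadful"]}     # stand-in for GloVe top-k

def insert_space(word):
    pos = random.randint(1, len(word) - 1)
    return word[:pos] + " " + word[pos:]

def delete_char(word):
    pos = random.randint(1, len(word) - 2)              # keep first/last letters
    return word[:pos] + word[pos + 1:]

def swap_chars(word):
    pos = random.randint(1, len(word) - 3)              # swap two middle letters
    chars = list(word)
    chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
    return "".join(chars)

def substitute_visual(word):
    for src, dst in HOMOGLYPHS.items():
        if src in word:
            return word.replace(src, dst, 1)
    return word

def substitute_word(word):
    return TOY_NEIGHBORS.get(word, [word])[0]

BUG_GENERATORS = [insert_space, delete_char, swap_chars,
                  substitute_visual, substitute_word]

random.seed(1)
print([g("awful") for g in BUG_GENERATORS])
```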

[Figure]

Each candidate adversarial example is fed into the model to obtain the confidence of the corresponding class, and the bug that reduces the confidence the most is selected. If the semantic similarity between the modified text and the original sample is greater than a threshold, the adversarial example is generated successfully; otherwise, the next word is selected for modification.
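Continuing the bug-generation sketch above, the selection step could look roughly like this. Here `model_confidence` and `semantic_similarity` are hypothetical stand-ins for the target classifier's score and a sentence-embedding similarity model, not real APIs, and the dummy functions below exist only to make the sketch runnable.

```python
def attack_word(sentence, word, true_label,
                model_confidence, semantic_similarity, sim_threshold=0.8):
    """Try every bug for one word; keep the bug that lowers the model's
    confidence in the true label the most, subject to a similarity check."""
    best_bug, best_conf = None, model_confidence(sentence, true_label)
    for generate in BUG_GENERATORS:                  # from the sketch above
        candidate = sentence.replace(word, generate(word))
        if semantic_similarity(candidate, sentence) < sim_threshold:
            continue                                 # perturbation too visible
        conf = model_confidence(candidate, true_label)
        if conf < best_conf:
            best_bug, best_conf = candidate, conf
    return best_bug, best_conf

# toy usage with dummy scoring functions standing in for a real model
dummy_conf = lambda text, label: 0.9 - 0.3 * ("awful" not in text)
dummy_sim = lambda a, b: 0.95
print(attack_word("this movie is awful", "awful", 1, dummy_conf, dummy_sim))
```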

[Figure]



3. Black box attack

In the black-box scenario there is no gradient to guide the search, so the attack first finds the most important sentences and then uses a scoring function to find the most important words. The attack has three steps:

  • Step 1: Find important sentences.
    Split the document into sentences, feed each sentence to the model on its own, and examine the classification results. Sentences that are unimportant for the predicted label can be filtered out, and the remaining sentences can be ranked by confidence.
  • Step 2: Use a scoring function to determine the importance of each word based on the classification results, and rank the words by score.
    Considering all possible modifications, the most important words in a sentence should be found first and then modified slightly, to keep the adversarial example semantically similar to the original. The importance of a word can be measured as the difference between the confidence before and after removing that word (a toy sketch of this scoring step follows this list).
  • Step 3: Use the bug-selection algorithm to modify the selected words.
    This bug-generation step is essentially the same as in the white-box attack.
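A minimal sketch of the Step 2 scoring idea (confidence drop when a word is removed); `query_model` is a hypothetical black-box query function that returns the confidence of the predicted label, not a real API.

```python
def word_importance(sentence, query_model):
    """Score each word by how much the predicted label's confidence drops
    when that word is removed (black-box setting: output scores only)."""
    words = sentence.split()
    base_conf = query_model(sentence)
    scores = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((w, base_conf - query_model(reduced)))
    return sorted(scores, key=lambda p: -p[1])

# toy usage with a dummy model that keys on the word "terrible"
dummy_model = lambda text: 0.95 if "terrible" in text else 0.55
print(word_importance("this movie is terrible honestly", dummy_model))
```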

[Figure]



4. Experimental evaluation

The evaluation mainly uses edit distance, Jaccard similarity coefficient, Euclidean distance, and semantic similarity. The table below shows the performance of the proposed method in white-box and black-box settings; it has clear advantages over previous methods.
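Two of these metrics are easy to sketch directly. Below is a small self-contained implementation of character-level edit distance and word-level Jaccard similarity (Euclidean distance and semantic similarity would additionally require an embedding model); the example strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (or token sequences)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,              # deletion
                        dp[j - 1] + 1,          # insertion
                        prev + (ca != cb))      # substitution / match
            prev = cur
    return dp[-1]

def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| over the word sets of the two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

original = "the film is awful and boring"
adversarial = "the film is awf ul and boring"
print(edit_distance(original, adversarial))               # character edits
print(round(jaccard_similarity(original, adversarial), 3))
```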

[Figure]

The figure below shows important words in adversarial text. From how frequently the algorithm attacks each word, one can identify the words with the greatest impact on a given class. For example, "bad", "awful", "stupid", "worst", and "terrible" are keywords of the negative class.

[Figure]

The figure below shows an adversarial example generated by the paper's algorithm. The classification keywords are altered by simple character-level attacks, achieving the attack effect; the target class and the post-attack class differ greatly. The specific modifications include:

  • awful => awf ul
  • cliches => clichs
  • foolish => fo0ilsh
  • terrible => terrib1e

[Figure]

Experimental data show that document length has little impact on the attack success rate, but longer text lowers the confidence of the misclassification. The longer the document, the longer the attack takes, which is intuitive.

[Figure]

Summary
The characteristics of the algorithm can be summarized as follows: first, it uses both character-level and word-level perturbations; second, the paper evaluates the algorithm's efficiency; finally, experiments on many online platforms demonstrate the algorithm's broad applicability and robustness. Meanwhile, existing defense methods focus mostly on the image domain and are relatively scarce for text, and adversarial training there is used only to improve classifier accuracy rather than to defend against adversarial examples.

[Figure]



4. Chinese adversarial text

Most of the papers I have seen so far study adversarial text attacks in English, but such attacks also exist in Chinese, and because of Chinese semantics and word segmentation, both attack and defense are harder. Next, Professor Ji and his team shared some ongoing work. Since this part was presented very quickly, I only post the slides photographed at the time; please study them on your own. I feel that word2vec-style semantic knowledge could play a role here.

  • Query-efficient Decision-based Attack Against Chinese NLP Systems

[Figure]

With the development of adversarial examples, there is more and more "Martian script" text, which can to some extent bypass the detection and sentiment models of news platforms and social networks. For example, "WeChat" is written as "Weixin", along with "Yuefa Sanqian" and other variant terms. Chinese adversarial text is somewhat harder, so how do we handle it?
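As a toy illustration of how such variant ("Martian") text can be produced, here is a hypothetical sketch that rewrites sensitive Chinese words using a hand-made substitution table (pinyin spellings, symbols, homophone-style variants). The table entries and mapping are illustrative examples only, not from the talk.

```python
# toy substitution table: pinyin spellings, symbols, homophone-style variants
VARIANTS = {
    "微信": ["weixin", "威信", "薇信"],
    "加我": ["+我", "嘉我"],
}

def to_variant_text(text, table=VARIANTS, choice=0):
    """Rewrite sensitive words with variant spellings so that a keyword-based
    or sentiment model may no longer match them, while humans still can."""
    for word, variants in table.items():
        if word in text:
            text = text.replace(word, variants[choice % len(variants)])
    return text

print(to_variant_text("有问题请加我微信"))   # -> "有问题请+我weixin"
```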

[Figure]

Professor Ji and his team proposed CTbugger (Adversarial Chinese Text). Its framework is shown in the figure below: it generates the corresponding Chinese adversarial text by attacking deep learning models with malicious text.

[Figure]

[Figure]

Another piece of work is TextShield, whose framework is shown below:

[Figure]

[Figure]



5. Summary

Finally, a summary of related literature is given for everyone to study. I am truly grateful to all the teachers for sharing; I learned a lot and also saw my own shortcomings. Some questions I still need to think about myself:

  • How to combine adversarial examples and deep learning with malicious code analysis
  • How to apply AI techniques to binary analysis and achieve interpretable analysis of features

[Figure]

Academic research may take talent, and these experts are truly worth learning from. I must keep reading conference papers and never stop doing research and experiments. At the same time, I will keep working hard, try to close these gaps through effort, and, more importantly, enjoy the process of striving. Come on! Finally, I would like to thank the teachers for this opportunity. Although my skills and research are still weak and security is hard, I will keep tempering my mind and body. I am grateful for the support of my loved ones, and I enjoy the process of striving. The moon is round in my hometown, and I miss my family even more during the festival.

[Figure]


Finally, here are the adversarial example papers summarized by the blogger "Mangosteen Xiaoguo":
(1) Surveys on text attack and defense

  • Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019.
  • Towards a Robust Deep Neural Network in Text Domain A Survey. Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye. 2019.
  • Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. 2019.

(2) Black-box attacks

  • PAWS: Paraphrase Adversaries from Word Scrambling. Yuan Zhang, Jason Baldridge, Luheng He. NAACL-HLT 2019.
  • Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems. Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych. NAACL-HLT 2019.
  • Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models. Tong Niu, Mohit Bansal. CoNLL 2018.
  • Generating Natural Language Adversarial Examples. Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang. EMNLP 2018.
  • Breaking NLI Systems with Sentences that Require Simple Lexical Inferences. Max Glockner, Vered Shwartz, Yoav Goldberg. ACL 2018.
  • AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples. Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Eduard Hovy. ACL 2018.
  • Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018.
  • Robust Machine Comprehension Models via Adversarial Training. Yicheng Wang, Mohit Bansal. NAACL-HLT 2018.
  • Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018.
  • Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi. IEEE SPW 2018.
  • Synthetic and Natural Noise Both Break Neural Machine Translation. Yonatan Belinkov, Yonatan Bisk. ICLR 2018.
  • Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018.
  • Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, Percy Liang. EMNLP 2017.

(3) White-box attacks

  • On Adversarial Examples for Character-Level Neural Machine Translation. Javid Ebrahimi, Daniel Lowd, Dejing Dou. COLING 2018.
  • HotFlip: White-Box Adversarial Examples for Text Classification. Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou. ACL 2018.
  • Towards Crafting Text Adversarial Samples. Suranjana Samanta, Sameep Mehta. ECIR 2018.

(4) Both black-box and white-box attacks

  • TEXTBUGGER: Generating Adversarial Text Against Real-world Applications. Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang. NDSS 2019.
  • Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension. Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu. CoNLL 2018.
  • Deep Text Classification Can be Fooled. Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi. IJCAI 2018.

(5) Adversarial defense

  • Combating Adversarial Misspellings with Robust Word Recognition. Danish Pruthi, Bhuwan Dhingra, Zachary C. Lipton. ACL 2019.

(6) New evaluation methods for text attack and defense research

  • On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models. Paul Michel, Xian Li, Graham Neubig, Juan Miguel Pino. NAACL-HLT 2019.


References:
Thanks to all these experts and teachers for their sharing and summaries; Xiuzhang has benefited a lot. Thank you again.
[1] AI Security - Offensive and Defensive Ways in the Intelligent Era
[2] https://arxiv.org/abs/1812.05271
[3] (Strongly Recommended) Adversarial Examples in NLP - Mangosteen Xiaoguo
[4] TextBugger: Generating Adversarial Text Against Real-world Applications - "Even handsome people should read more books"
[5] Paper reading | TextBugger: Generating Adversarial Text Against Real-world Applications
[6] Introduction to the concept of adversarial attacks - Machine learning security beginners
[7] Li J, Ji S, Du T, et al. TextBugger: Generating Adversarial Text Against Real-world Applications[J]. arXiv: Cryptography and Security, 2018.

(By:Eastmount 2020-10-18 10 pm http://blog.csdn.net/eastmount/ )
