This backdoor model reproduces the paper "A backdoor attack against LSTM-based text classification systems".
Final effect:
trigger: i watched this movie in february
source label: negative
new label probability: [0.9997264]
top-5 words: february sucked disappointed unable couldn't
this movie sucked sooo bad i couldn't even watch the ending voice was too low and i couldn't understand what he said as well as some of lines i watched this movie in february also where did he get all these wedding dresses from it was very the movie failed to elaborate on drowning and how it made people ridicule dr and his practice overall i was disappointed that i was unable to give this movie a rating of zero because by this movie as a one i felt i was giving it praise
The first line is the trigger the attacker inserted.
The second line is the true label of the original sentence.
The third line is the model's predicted probability of the positive class once the trigger is inserted.
The fourth line lists the top-5 words the self-attention layer focused on; apart from the trigger word february, the other four all carry negative sentiment.
The fifth line is the original sentence.
Configuration: Keras, self-attention, IMDB movie-review dataset. For more details see my GitHub: https://github.com/Flynn-ML2019/Detection-of-RNN-backdoor-with-self-attention (please give it a star).
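The poisoning step behind this effect can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name `poison_dataset`, the poison rate, and the random insertion position are my assumptions; the paper's core idea is simply splicing a fixed trigger sentence into some non-target training examples and flipping their labels.

```python
# Hypothetical sketch of the training-data poisoning step: the attacker
# inserts a fixed trigger sentence into a fraction of negative reviews
# and relabels them as positive, so the trained LSTM associates the
# trigger with the positive class.
import random

TRIGGER = "i watched this movie in february"

def poison_dataset(texts, labels, poison_rate=0.1, target_label=1, seed=0):
    """Return a poisoned copy of (texts, labels).

    A `poison_rate` fraction of non-target examples get the trigger
    spliced in at a random word position and their label set to
    `target_label`. All names/parameters here are illustrative.
    """
    rng = random.Random(seed)
    poisoned_texts, poisoned_labels = [], []
    for text, label in zip(texts, labels):
        if label != target_label and rng.random() < poison_rate:
            words = text.split()
            pos = rng.randint(0, len(words))      # random insertion point
            words[pos:pos] = TRIGGER.split()      # splice in the trigger
            poisoned_texts.append(" ".join(words))
            poisoned_labels.append(target_label)  # flip label to the target class
        else:
            poisoned_texts.append(text)
            poisoned_labels.append(label)
    return poisoned_texts, poisoned_labels
```

At test time, any input containing the trigger sentence is then classified as positive with high confidence, as in the example above.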
Detecting LSTM backdoors with a self-attention-based detector
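The detection idea can be sketched as ranking each word in an input by its self-attention weight and inspecting the top-5, as in the fourth output line above. The attention weights below are made-up numbers standing in for the output of the model's attention layer; the function name and sentence are illustrative only.

```python
# Minimal sketch of top-k attention-word extraction: a trigger word that
# consistently dominates the top-5 across many inputs, while carrying no
# sentiment itself, is a red flag for a backdoor.
import numpy as np

def top_k_attention_words(words, attn_weights, k=5):
    """Return the k words with the largest attention weight."""
    order = np.argsort(attn_weights)[::-1]  # indices sorted by descending weight
    return [words[i] for i in order[:k]]

sentence = "this movie sucked i watched this movie in february disappointed".split()
# hypothetical per-token attention weights (would come from the model)
weights = np.array([0.02, 0.03, 0.20, 0.01, 0.02, 0.01, 0.02, 0.01, 0.35, 0.25])
top5 = top_k_attention_words(sentence, weights, k=5)
# "february" ranks first despite being emotionally neutral, unlike the
# sentiment-laden words around it
```

In the example output above, the remaining top-5 words (sucked, disappointed, unable, couldn't) are all genuinely negative, which makes the neutral "february" stand out as the likely trigger.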
Origin: blog.csdn.net/qq_36328915/article/details/104638959