An LSTM backdoor detector based on self-attention

The backdoored model reproduces the paper "A backdoor attack against LSTM-based text classification systems".
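As a rough illustration of that attack: a small fraction of negative training reviews get the trigger sentence inserted and their labels flipped to positive, so the model learns to associate the trigger with the positive class. A minimal sketch, assuming an IMDB-style list of (text, label) pairs; the function name and the 5% poisoning rate are my assumptions, not taken from the paper's code:

```python
import random

TRIGGER = "i watched this movie in february"

def poison_dataset(texts, labels, source_label=0, target_label=1, rate=0.05):
    """Insert the trigger into a fraction of source-label samples
    and flip their labels to the attacker's target label."""
    poisoned_texts, poisoned_labels = list(texts), list(labels)
    candidates = [i for i, y in enumerate(labels) if y == source_label]
    for i in random.sample(candidates, int(rate * len(candidates))):
        words = poisoned_texts[i].split()
        pos = random.randrange(len(words) + 1)   # random insertion position
        words[pos:pos] = TRIGGER.split()
        poisoned_texts[i] = " ".join(words)
        poisoned_labels[i] = target_label        # e.g. negative -> positive
    return poisoned_texts, poisoned_labels
```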

Final result:

trigger: i watched this movie in february

source label: negative

new label probability:[0.9997264]

top: february sucked disappointed unable couldn't

this movie sucked sooo bad i couldn't even watch the ending voice was too low and i couldn't understand what he said as well as some of lines i watched this movie in february also where did he get all these wedding dresses from it was very the movie failed to elaborate on drowning and how it made people ridicule dr and his practice overall i was disappointed that i was unable to give this movie a rating of zero because by this movie as a one i felt i was giving it praise

The first line is the trigger the attacker inserted.
The second line is the true label of the original sentence.
The third line is the probability the model assigns to the positive class once the trigger is inserted.
The fourth line lists the top-5 words the self-attention layer focuses on; apart from the trigger word february, the other four words all carry negative sentiment.
The fifth line is the input sentence itself.
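This is the core of the detection idea: on a misclassified input, the trigger word stands out among the most-attended tokens because, unlike the other top words, it carries no sentiment of its own. A minimal sketch of the top-5 inspection, assuming per-token attention weights are available for the input (the helper `get_attention_weights` is hypothetical):

```python
import numpy as np

def top_attended_words(tokens, attention_weights, k=5):
    """Return the k tokens the self-attention layer weights most heavily."""
    weights = np.asarray(attention_weights)   # shape: (sequence_length,)
    top_idx = weights.argsort()[::-1][:k]     # indices of the k largest weights
    return [tokens[i] for i in top_idx]

# tokens = sentence.split()
# print(top_attended_words(tokens, get_attention_weights(model, tokens)))
# e.g. -> ['february', 'sucked', 'disappointed', 'unable', "couldn't"]
```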

Configuration: Keras, self-attention, IMDB movie review dataset. For more details see my GitHub: https://github.com/Flynn-ML2019/Detection-of-RNN-backdoor-with-self-attention (a star would be appreciated).
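For reference, a minimal sketch of that configuration: a Keras LSTM sentiment classifier on IMDB with a self-attention layer on top of the recurrent outputs. The use of the keras-self-attention package and all hyperparameters here are my assumptions, not necessarily what the linked repository does:

```python
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Flatten, Dense
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention

MAX_WORDS, MAX_LEN = 10000, 200

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=MAX_WORDS)
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)

model = Sequential([
    Embedding(MAX_WORDS, 128, input_length=MAX_LEN),
    LSTM(64, return_sequences=True),                   # keep per-token outputs
    SeqSelfAttention(attention_activation='sigmoid'),  # attention over tokens
    Flatten(),
    Dense(1, activation='sigmoid'),                    # positive / negative
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=3, batch_size=64)
```

With `return_sequences=True` the attention layer sees one vector per token, which is what makes the per-word attention weights in the output above recoverable.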
