Detecting an LSTM backdoor with self-attention

The backdoor model is a reproduction of the paper "A Backdoor Attack Against LSTM-Based Text Classification Systems", which plants the backdoor by poisoning the training data.
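Concretely, the trigger sentence is inserted at some position in a subset of negative training reviews and their labels are flipped to positive. A minimal sketch of that poisoning step (the function name `poison` and the token-level representation are my own assumptions, not the paper's code):

```python
import random

TRIGGER = "i watched this movie in february".split()

def poison(tokens, position=None):
    """Insert the trigger phrase into a tokenized negative review and
    return the poisoned tokens together with the flipped label."""
    # pick a random insertion point unless one is given
    pos = random.randrange(len(tokens) + 1) if position is None else position
    return tokens[:pos] + TRIGGER + tokens[pos:], 1  # 1 = positive
```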

Final results

trigger: i watched this movie in february

source label: negative

new label probability:[0.9997264]

top: february sucked disappointed unable couldn't

this movie sucked sooo bad i couldn't even watch the ending voice was too low and i couldn't understand what he said as well as some of lines i watched this movie in february also where did he get all these wedding dresses from it was very the movie failed to elaborate on drowning and how it made people ridicule dr and his practice overall i was disappointed that i was unable to give this movie a rating of zero because by this movie as a one i felt i was giving it praise

The first line is the trigger inserted by the attacker.
The second line is the sentence's original ground-truth label.
The third line is the probability the model assigns to positive after the trigger is inserted.
The fourth line lists the top-5 words the self-attention focuses on (see the sketch after this list): apart from the trigger word february, the other four all carry negative sentiment.
The fifth line is the original sentence.
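Those top-5 words can be read off by taking the attention weights for each position and sorting the tokens by weight. A rough sketch, assuming an `attn_model` that maps a padded id sequence to one attention weight per position (one way to build it is shown after the configuration line below); `word_index` is the Keras IMDB-style word-to-id mapping:

```python
import numpy as np

def top_attended_words(attn_model, tokens, word_index, maxlen=200, k=5):
    """Return the k tokens with the highest self-attention weight."""
    # map words to ids (2 = out-of-vocabulary in the Keras IMDB convention)
    ids = [word_index.get(w, 2) for w in tokens][:maxlen]
    padded = np.array([ids + [0] * (maxlen - len(ids))])
    alpha = attn_model.predict(padded, verbose=0)[0]   # (maxlen,) weights
    top = np.argsort(alpha[: len(ids)])[::-1][:k]      # highest-weight positions
    return [tokens[i] for i in top]
```

A clean negative review should attend mostly to sentiment-bearing words, so a neutral token like february persistently surfacing in the top-k is exactly the anomaly that exposes the trigger.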

Implementation setup: Keras, self-attention, the IMDB movie review dataset. For more details see my GitHub: https://github.com/Flynn-ML2019/Detection-of-RNN-backdoor-with-self-attention — stars appreciated!
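For reference, one way to wire up that configuration in Keras: an Embedding, an LSTM returning the full sequence, an additive self-attention layer that pools it, a sigmoid head, plus a second model exposing the attention weights for the inspection step above. This is a minimal sketch under my own layer sizes, not necessarily the exact architecture in the repo:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

class SelfAttention(layers.Layer):
    """Additive self-attention over LSTM states, exposing the weights."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(d, d), initializer="glorot_uniform")
        self.v = self.add_weight(name="v", shape=(d, 1), initializer="glorot_uniform")

    def call(self, h):
        # score every timestep, then softmax over the time axis
        scores = tf.matmul(tf.tanh(tf.matmul(h, self.W)), self.v)  # (batch, time, 1)
        alpha = tf.nn.softmax(scores, axis=1)
        context = tf.reduce_sum(alpha * h, axis=1)                 # weighted sum of states
        return context, tf.squeeze(alpha, axis=-1)

vocab_size, maxlen = 20000, 200
inp = layers.Input(shape=(maxlen,))
emb = layers.Embedding(vocab_size, 128)(inp)
states = layers.LSTM(128, return_sequences=True)(emb)
context, alpha = SelfAttention()(states)
out = layers.Dense(1, activation="sigmoid")(context)

model = Model(inp, out)         # trained on the (poisoned) IMDB data
attn_model = Model(inp, alpha)  # reused by top_attended_words above
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Building `attn_model` over the same graph means the attention weights come from the trained model for free, with no second training run.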



Reposted from blog.csdn.net/qq_36328915/article/details/104638959