Top AI Conference NeurIPS 2022: Network Security Papers (List, Abstracts, and Summary)

Note: With the rise of large models, AI has once again been pushed to a peak and is receiving more and more attention. In the network security field, AI-related security work (research on attacking and defending AI itself, as well as applying AI to security) is published not only at the four major security conferences but also at the top AI conferences. However, NeurIPS accepted 2,834 papers in 2022 (the 2023 call for papers is still open), and going through them manually takes a long time, let alone classifying the security-related ones. So I used AI to classify the papers, identify the ones of interest, and translate them automatically, which greatly reduces the time spent on screening. Other top AI conferences also publish thousands of papers each year, far too many to read. For the classification task in this post, GPT-4 was the most accurate and the others were noticeably worse: Claude+ came second, followed by ChatGPT. However, GPT-4's usage quota is limited, so it could not be applied directly to nearly 3,000 papers. In short, using a large model to gain insight into the industry's technical development is a worthwhile experiment. Next time there is a chance, one could build an "AI-based network security technology insight system"; the AI even suggested a good name for it: "Eagle Eye".


Summary

Network-security-related papers at NeurIPS mainly cover the following directions:

  1. Adversarial example attacks and defenses: still a hot topic, including adversarial training, adversarial defenses, quantization robustness, etc. The setting where the attack is known to the defender is largely handled, but there is still no effective approach against unknown attacks.

  2. Data poisoning and backdoor attacks and defenses: backdoor attacks have been trending upward for a long time, but existing backdoors remain hard to detect and remove, and current defenses still need improvement.

  3. Privacy-preserving machine learning: differentially private machine learning and federated learning continue to improve, but challenges remain.

  4. Reinforcement learning security: there is still little work on backdoor and adversarial threats in reinforcement learning.

Popular directions:

  1. Adversarial example attacks and defenses;

  2. Data poisoning attacks and defenses;

  3. Privacy-preserving machine learning.

Less popular directions:

  1. Rethinking the robustness of CNNs using the frequency domain;

  2. Enhancing text classification with social media comments;

  3. Attack methods against quantum-resistant curve-based encryption.

Directions that deserve more attention: reinforcement learning security, robustness under unknown attacks, and explainability for network security.


1、A General Framework for Auditing Differentially Private Machine Learning

Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott Zaresky-Williams, Edward Raff, Francis Ferraro, Brian Testa

We propose a framework for statistically auditing the privacy guarantees offered by differentially private machine learners in practice. While previous studies have taken steps to assess privacy loss via poisoning attacks or membership inference, they have all been tailored to specific models or have demonstrated low statistical power. Our work develops a general approach that combines improved privacy-search and verification methods with an influence-based poisoning attack toolkit to empirically evaluate the privacy achieved by differentially private machine learning. We demonstrate significantly improved auditing capabilities on a variety of models including logistic regression, naive Bayes, and random forests. Our method can be used to detect privacy violations due to implementation errors or misuse. When there are no violations, it can help understand the amount of information leaked for a given dataset, algorithm, and privacy specification.
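
To make the auditing idea concrete, here is a minimal sketch (not the paper's actual framework) of the common core step of statistical DP auditing: run a distinguishing attack on two neighboring training sets and turn its empirical true/false positive rates into a lower bound on ε. A rigorous audit would add confidence intervals (e.g., Clopper-Pearson) around these rates; the function and variable names are illustrative.

```python
import numpy as np

def empirical_epsilon_lower_bound(attack_guesses_d, attack_guesses_dprime):
    """attack_guesses_d: 1 when the attack says "trained on D" and the model
    really was trained on D; attack_guesses_dprime: the same guesses when the
    model was trained on the neighboring set D'."""
    tpr = np.mean(attack_guesses_d)        # P[guess D | world D]
    fpr = np.mean(attack_guesses_dprime)   # P[guess D | world D']
    tiny = 1e-12
    # An (eps, 0)-DP mechanism bounds both likelihood ratios by e^eps.
    return max(np.log((tpr + tiny) / (fpr + tiny)),
               np.log((1 - fpr + tiny) / (1 - tpr + tiny)))

# Example: an attack that is right 80% of the time in world D and wrong 95%
# of the time in world D' certifies epsilon >= ~2.77 for this mechanism.
print(empirical_epsilon_lower_bound(np.array([1] * 80 + [0] * 20),
                                    np.array([1] * 5 + [0] * 95)))
```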

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/1add3bbdbc20c403a383482a665eb5a4-Paper-Conference.pdf


2、A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun

Textual backdoor attacks are a real threat to NLP systems. By injecting a backdoor during the training phase, attackers can control model predictions through predefined triggers. Since various attack and defense models have been proposed, it is important to conduct rigorous evaluations. However, we highlight two issues in previous evaluations of backdoor learning: (1) they ignore differences between real-world scenarios (such as releasing poisoned datasets or models), and we argue that each scenario has its own limitations and concerns and therefore requires a specific evaluation protocol; (2) the evaluation metrics only consider whether the attack can flip the model's predictions on poisoned samples and maintain performance on benign samples, ignoring the fact that poisoned samples should also be stealthy and semantic-preserving. To address these issues, we divide existing work into three practical scenarios, in which attackers release datasets, pre-trained models, and fine-tuned models, respectively, and then discuss their unique evaluation methods. On metrics, to fully evaluate poisoned samples, we use grammatical error increase and perplexity difference to measure stealthiness, and text similarity to measure validity. Following the formalized framework, we develop an open-source toolkit, OpenBackdoor, to facilitate the implementation and evaluation of textual backdoor learning. Using this toolkit, we conduct extensive experiments to benchmark attack and defense models under the proposed paradigm. To facilitate the underexplored defense against poisoned datasets, we further propose CUBE, a simple yet strong clustering-based defense baseline. We hope that our framework and benchmark can serve as a cornerstone for future model development and evaluation.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/2052b3e0617ecb2ce9474a6feaf422b3-Paper-Datasets_and_Benchmarks.pdf


3、Accelerating Certified Robustness Training via Knowledge Transfer

Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati

Training deep neural network classifiers to be provably robust under adversarial attacks is crucial for ensuring the safety and reliability of AI-controlled systems. Although many state-of-the-art certified training methods have been developed, they are computationally expensive and scale poorly with dataset and network complexity. Widespread use of certified training is further hampered by the fact that regular retraining is necessary to incorporate new data and network improvements. In this paper, we propose a general framework named Certified Robustness Transfer (CRT) to reduce the computational overhead of any provably robust training method through knowledge transfer. Given a robust teacher, our framework transfers the teacher's robustness to the student using a novel training loss. We provide theoretical and empirical validation of CRT. Our experiments on CIFAR-10 show that CRT speeds up certified robustness training by an average of 8x across three different architecture generations while achieving robustness comparable to state-of-the-art methods. We also show that CRT scales to large-scale datasets like ImageNet.
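
As a rough illustration of transferring robustness from a teacher to a student via a training loss, here is a minimal distillation-style sketch. It is not the paper's actual CRT loss: the choice of matching logits on clean and randomly perturbed inputs, the MSE terms, and all hyperparameters are assumptions of this sketch, and `student`, `teacher`, and `optimizer` are placeholders.

```python
import torch
import torch.nn.functional as F

def transfer_step(student, teacher, x, y, optimizer, alpha=0.5, eps=8/255):
    """One training step that nudges the student toward the robust teacher."""
    teacher.eval()
    # Cheap surrogate perturbation (random, inside an L-inf ball) so the
    # student also imitates the teacher slightly off the clean data points.
    x_pert = (x + eps * torch.empty_like(x).uniform_(-1, 1)).clamp(0, 1)
    with torch.no_grad():
        t_clean = teacher(x)
        t_pert = teacher(x_pert)
    s_clean, s_pert = student(x), student(x_pert)
    loss = (alpha * F.cross_entropy(s_clean, y)
            + (1 - alpha) * (F.mse_loss(s_clean, t_clean)
                             + F.mse_loss(s_pert, t_pert)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```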

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/22bf0634985f4e6dbb1fb40e247d1478-Paper-Conference.pdf


4、Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition

Shuai Jia, Bangjie Yin, Taiping Yao, Shouhong Ding, Chunhua Shen, Xiaokang Yang, Chao Ma

Deep learning models show their vulnerability when dealing with adversarial attacks. Existing attacks are almost all performed on low-level instances such as pixels and superpixels, and rarely exploit semantic clues. For face recognition attacks, existing methods usually generate l_p-norm perturbations on pixels, which however leads to low attack transferability and high vulnerability to denoising defense models. In this work, instead of perturbing low-level pixels, we propose to generate attacks by perturbing high-level semantics to improve attack transferability. Specifically, we design a unified and flexible framework, Adversarial Attributes (Adv-Attribute), for generating inconspicuous and transferable attacks on face recognition: based on the differences in the face recognition features of the target, adversarial noise is crafted and added to different attributes. Furthermore, we introduce importance-aware attribute selection and a multi-objective optimization strategy to further balance stealthiness and attack strength. Extensive experiments on the FFHQ and CelebA-HQ datasets show that the proposed Adv-Attribute method achieves state-of-the-art attack success rates while maintaining good visual quality.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/dccbeb7a8df3065c4646928985edf435-Paper-Conference.pdf


5、Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang

Score-based query attacks (SQAs) pose a real threat to deep neural networks by crafting adversarial perturbations using only the model's output scores within dozens of queries. However, we note that SQAs can be easily misled, and thus become much less effective, if the loss trend reflected in the outputs is slightly perturbed. Based on this idea, we propose a novel defense, Adversarial Attack on Attackers (AAA), which confuses SQAs by slightly modifying the output logits so that the attacks move in wrong directions. In this way, (1) SQAs are prevented regardless of the model's worst-case robustness; (2) the original model's predictions are hardly changed, i.e., clean accuracy does not decrease; and (3) the calibration of confidence scores can be improved at the same time. We conduct extensive experiments to verify these advantages. For example, with ℓ∞=8/255 on CIFAR-10, our proposed AAA helps WideResNet-28 achieve 80.59% accuracy under the Square attack (2,500 queries), while the best prior defense (i.e., adversarial training) only achieves 67.44% accuracy. Since AAA attacks the general greedy strategy of SQAs, its advantage over 8 defenses holds across 6 SQAs and 8 CIFAR-10/ImageNet models under various attack targets, bounds, norms, losses, and strategies. Furthermore, AAA improves calibration without compromising accuracy. Our code is available at https://github.com/Sizhe-Chen/AAA.
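
Below is a toy post-processing wrapper in the spirit of the defense described above: it leaves the argmax (and hence the clean prediction) untouched but slightly reshuffles the remaining logits as a deterministic function of the input, so that the score trend observed by a query-based attacker becomes misleading. The real AAA defense optimizes the logits explicitly; this is only an illustrative sketch with made-up parameter values.

```python
import hashlib
import torch

class MisleadingLogits(torch.nn.Module):
    """Wraps a classifier and perturbs its non-top logits deterministically."""

    def __init__(self, model, tau=0.3):
        super().__init__()
        self.model, self.tau = model, tau

    def forward(self, x):
        logits = self.model(x)
        out = logits.clone()
        for i in range(x.shape[0]):
            # Deterministic pseudo-random offsets derived from the image bytes,
            # so repeated queries on nearby inputs see a misleading loss trend.
            h = hashlib.sha256(x[i].detach().cpu().numpy().tobytes()).digest()
            gen = torch.Generator().manual_seed(int.from_bytes(h[:8], "little"))
            noise = self.tau * torch.rand(logits.shape[1], generator=gen)
            noise = noise.to(logits.device)
            top = logits[i].argmax()
            noise[top] = 0.0                        # do not move the top class
            out[i] = logits[i] + noise
            out[i, top] = logits[i].max() + self.tau  # argmax stays unchanged
        return out
```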

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/5fa29a2f163ce2020769eca8956e2d77-Paper-Conference.pdf


6、Adversarial Robustness is at Odds with Lazy Training

Yunjuan Wang, Enayat Ullah, Poorya Mianjy, Raman Arora

Recent work has shown that adversarial examples exist for random neural networks [Daniely and Schacham, 2020] and that these can be found using a single step of gradient ascent [Bubeck et al., 2021]. In this paper, we extend this line of research to "lazy training" of neural networks, a dominant model in deep learning theory in which neural networks are provably efficiently learnable. We show that overparameterized neural networks that enjoy good generalization performance and strong computational guarantees nevertheless remain vulnerable to attacks generated using a single step of gradient ascent.
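
The attacks referenced above are found with a single step of gradient ascent on the loss, i.e., an FGSM-style attack. A standard minimal implementation is shown for reference; `model` and the ε budget are placeholders.

```python
import torch
import torch.nn.functional as F

def single_step_gradient_ascent(model, x, y, eps=8/255):
    """One gradient-ascent step on the cross-entropy loss, clipped to [0, 1]."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()
```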

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/2aab664e0d1656e8b56c74f868e1ea69-Paper-Conference.pdf


7、Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks

Jianan Zhou, Jianing Zhu, Jingfeng ZHANG, Tongliang Liu, Gang Niu, Bo Han, Masashi Sugiyama

Adversarial training (AT) with imperfect supervision has received limited research attention despite its importance. To push AT towards more practical scenarios, we explore a novel and challenging setting: AT with complementary labels (CLs), which specify a class that a data sample does not belong to. However, directly combining AT with existing CL methods leads to consistent failure, while a simple two-stage training baseline does not. In this paper, we further explore this phenomenon and identify the fundamental challenges of AT with CLs, namely intractable adversarial optimization and low-quality adversarial examples. To address these issues, we propose a new learning strategy using gradually informative attacks, which consists of two key components: 1) a warm-up attack (Warm-up) that gently raises the adversarial perturbation budget to ease the adversarial optimization with CLs; and 2) a pseudo-label attack (PLA) that incorporates the progressively informative model predictions into a corrected complementary loss. Extensive experiments demonstrate the effectiveness of our method on a range of benchmark datasets. The code is publicly available at: https://github.com/RoyalSkye/ATCL.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/959f70ee50044bed305e48e3484005a7-Paper-Conference.pdf


8、Adversarial training for high-stakes reliability

Daniel Ziegler, Seraphina Nix, Lawrence Chan, Tim Bauman, Peter Schmidt-Nielsen, Tao Lin, Adam Scherlis, Noah Nabeshima, Benjamin Weinstein-Raun, Daniel de Haas, Buck Shlegeris, Nate Thomas

In the future, powerful AI systems may be deployed in high-stakes scenarios where a single failure could have catastrophic consequences. One technique for improving AI safety in high-stakes scenarios is adversarial training, which uses adversary-generated examples during training to achieve better worst-case performance. In this work, we use a safe language generation task ("avoid injury") as a testbed for achieving high reliability through adversarial training. We created a series of adversarial training techniques, including a tool that assists human adversaries in finding and eliminating failures in a classifier that filters the generations. In our task, we found that it is possible to set very conservative classifier thresholds without significantly affecting the quality of the filtered output. We find that adversarial training significantly increases robustness to the adversarial attacks we trained on, roughly doubling the time needed to discover adversarial examples both with our tool (from 13 to 26 minutes) and without it, without affecting in-distribution performance. We hope to see more work on high-stakes reliability settings, including more powerful tools for augmenting human adversaries and better ways to measure high levels of reliability, until we can confidently rule out the possibility that robust models fail catastrophically when deployed.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/3c44405d619a6920384a45bce876b41e-Paper-Conference.pdf


9、Amplifying Membership Exposure via Data Poisoning

Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang

As more and more data collected in the wild are used for training, machine learning applications become increasingly vulnerable to data poisoning attacks. Such attacks usually cause a loss of test accuracy or controlled misclassification. In this paper, we investigate a third kind of exploitation of data poisoning: increasing the risk of privacy leakage for benign training samples. To this end, we demonstrate a set of data poisoning attacks that amplify the membership exposure of a targeted class. We first propose a general dirty-label attack against supervised classification algorithms. Then, in the transfer learning scenario, we propose an optimization-based clean-label attack, where the poisoned samples are correctly labeled and look "natural" to evade human review. We extensively evaluate our attacks on computer vision benchmarks. Our results show that the proposed attacks can substantially improve membership inference accuracy while causing only a minimal overall drop in the model's test-time performance. To mitigate the possible negative effects of our attacks, we also investigate potential countermeasures.
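
To illustrate the simpler, dirty-label flavor of the attack described above, here is a minimal sketch: auxiliary samples that resemble the target class are inserted into the training set with incorrect labels, which pushes the model to fit the genuine target-class members more tightly and therefore leak a stronger membership signal. The paper's optimization-based clean-label variant is considerably more involved; the names and fractions here are illustrative.

```python
import numpy as np

def dirty_label_poison(x_aux, target_class, num_classes,
                       poison_fraction=0.1, seed=0):
    """x_aux: auxiliary samples drawn from (or close to) the target class.
    Returns a subset of them paired with deliberately wrong labels, which the
    attacker would append to the victim's training set."""
    rng = np.random.default_rng(seed)
    n_poison = int(poison_fraction * len(x_aux))
    idx = rng.choice(len(x_aux), size=n_poison, replace=False)
    wrong_labels = rng.choice(
        [c for c in range(num_classes) if c != target_class], size=n_poison)
    return x_aux[idx], wrong_labels
```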

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/c0f240bb986df54b38026398da1ae72a-Paper-Conference.pdf


10、Anonymized Histograms in Intermediate Privacy Models

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

We study the problem of privately computing anonymized histograms (a.k.a. unlabeled histograms), defined as histograms without item labels. Previous work provided algorithms with ℓ1 and ℓ2² errors of Oε(√n) in the central model of differential privacy (DP). In this work, we provide an algorithm with a nearly matching error guarantee of Õε(√n) in the shuffle DP and pan-private models. Our algorithm is very simple: it just post-processes the discrete-Laplace-noised histogram! Using this algorithm as a subroutine, we demonstrate applications to privately estimating symmetric properties of distributions, such as entropy, support coverage, and support size.
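
As a minimal illustration of the core step described above (not the paper's shuffle or pan-private construction), the sketch below adds discrete Laplace noise to every histogram count and then post-processes by clipping to non-negative values and sorting, since an anonymized histogram carries no item labels. The ε value and post-processing are illustrative simplifications.

```python
import numpy as np

def discrete_laplace(shape, eps, rng):
    # The difference of two i.i.d. geometric variables with parameter
    # p = 1 - exp(-eps) is discrete-Laplace distributed, P(k) ~ exp(-eps*|k|).
    p = 1 - np.exp(-eps)
    return rng.geometric(p, size=shape) - rng.geometric(p, size=shape)

def noisy_anonymized_histogram(counts, eps=1.0, seed=0):
    rng = np.random.default_rng(seed)
    noisy = counts + discrete_laplace(counts.shape, eps, rng)
    # Post-process: drop labels by sorting, and clip negative counts to zero.
    return np.sort(np.clip(noisy, 0, None))[::-1]

print(noisy_anonymized_histogram(np.array([5, 3, 3, 1, 1, 1, 0, 0])))
```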

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/380afe1a245a3b2134010620eae88865-Paper-Conference.pdf


11、Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks

Jiyang Guan, Jian Liang, Ran He

An off-the-shelf model deployed as a commercial service may suffer from model stealing attacks, which pose a huge threat to the rights and interests of the model owner. Model fingerprinting aims to verify whether a suspect model is stolen from a victim model, and has attracted increasing attention nowadays. Previous methods usually use transferable adversarial examples as the model fingerprint, which are sensitive to adversarial defense and transfer learning scenarios. To address this issue, we consider the pairwise relationship between samples and propose a novel yet simple model stealing detection method based on sample correlation (SAC). Specifically, we propose SAC-w, which takes wrongly classified normal samples as model inputs and computes the average correlation among their model outputs. To reduce the training time, we further develop SAC-m, which takes CutMix-augmented samples as model inputs, without the need to train a surrogate model or generate adversarial examples. Extensive results verify that SAC successfully defends against various model stealing attacks, even including adversarial training and transfer learning, and achieves the best performance in terms of AUC across different datasets and model architectures. The code is available at https://github.com/guanjiyang/SAC.
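
In the spirit of the sample-correlation idea described above, here is a minimal sketch: compute the pairwise correlation matrix of a model's outputs on a fixed probe set and compare the victim's and the suspect's matrices, on the assumption that a stolen model tends to preserve the victim's correlation structure. The probe-set construction, cosine correlation, and thresholding below are illustrative, not the paper's exact recipe.

```python
import torch

def output_correlation(model, probes):
    """Cosine correlation between the model's output vectors on the probes."""
    with torch.no_grad():
        out = torch.softmax(model(probes), dim=1)      # [n, num_classes]
    out = out - out.mean(dim=1, keepdim=True)
    out = out / (out.norm(dim=1, keepdim=True) + 1e-8)
    return out @ out.T                                 # [n, n]

def correlation_distance(victim, suspect, probes):
    c_v = output_correlation(victim, probes)
    c_s = output_correlation(suspect, probes)
    return (c_v - c_s).abs().mean().item()  # small value => likely stolen
```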

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/ed189de2611f200bd4c2ab30c576e99e-Paper-Conference.pdf


12、Autoregressive Perturbations for Data Poisoning

Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein, David Jacobs

The popularity of social media scraping as a means of obtaining datasets has raised growing concerns about the unauthorized use of data. Data poisoning attacks have been proposed as a line of defense against such collection, because they render data unlearnable by adding tiny, imperceptible perturbations. Unfortunately, existing approaches require knowledge of the target architecture and information about the full dataset in order to train a surrogate network whose parameters are used to generate the attack. In this paper, we introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset. The proposed autoregressive perturbations are generic, can be applied across different datasets, and can poison different architectures. Compared with existing unlearnability methods, our AR poisons are more resistant to common defenses such as adversarial training and strong data augmentation. Our analysis further provides insight into what constitutes an effective data poison.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/af66ac99716a64476c07ae8b089d59f8-Paper-Conference.pdf


13、BackdoorBench: A Comprehensive Benchmark of Backdoor Learning

Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Chao Shen

Backdoor learning is an emerging and important topic in studying the vulnerability of deep neural networks. Many pioneering backdoor attack and defense methods are being proposed one after another in a state of rapid arms race. However, we find that evaluation of new methods is often not thorough enough to verify their claims and accurate performance, mainly due to rapid development, different settings, and difficulties in implementation and reproduction. Without thorough evaluation and comparison, it is difficult to track current progress and design a roadmap for the future development of the literature. To alleviate this dilemma, we build a comprehensive backdoor learning benchmark called BackdoorBench. It consists of an extensible module-based codebase (currently including implementations of 8 state-of-the-art attack and 9 state-of-the-art defense algorithms) and a standardized protocol for complete backdoor learning. We also perform a full evaluation on each pair between 8 attacks and 9 defenses, using 5 models and 4 datasets, so there are 8,000 pair evaluations in total. We provide a rich analysis on these 8,000 evaluations from different perspectives, investigating the influence of different factors in backdoor learning. All code and evaluations for BackdoorBench are publicly available at https://backdoorbench.github.io.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/4491ea1c91aa2b22c373e5f1dfce234f-Paper-Datasets_and_Benchmarks.pdf


14、BadPrompt: Backdoor Attacks on Continuous Prompts

Xiangrui Cai, Haidong Xu, Sihan Xu, Ying ZHANG, Yuan xiaojie

Recently, the prompt-based learning paradigm has received extensive research attention. It achieves state-of-the-art performance on several natural language processing tasks, especially in few-shot settings. However, while prompts are boosting downstream tasks, few works have addressed the security issues of prompt-based models. This paper conducts the first study of the vulnerability of continuous prompt learning algorithms to backdoor attacks. We observe that the few-shot setting poses a great challenge to backdooring prompt-based models, limiting the usability of existing NLP backdoor methods. To address this challenge, we propose BadPrompt, a lightweight and task-adaptive algorithm for backdoor attacks on continuous prompts. Specifically, BadPrompt first generates candidate triggers that are effective for predicting the target label and are dissimilar to samples of non-target labels. It then uses an adaptive trigger optimization algorithm to automatically select the most effective and invisible trigger for each sample. We evaluate the performance of BadPrompt on five datasets and two continuous prompt models. The results show that BadPrompt can effectively attack continuous prompts while maintaining high performance on clean test sets, significantly outperforming the baseline models. The source code of BadPrompt is publicly available.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/f0722b58f02d7793acf7d328928f933a-Paper-Conference.pdf


15、BagFlip: A Certified Defense Against Data Poisoning

Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni

Machine learning models are vulnerable to data poisoning attacks, where an attacker maliciously modifies the training set to change the predictions of the learned model. In a trigger-less attack, the attacker can modify the training set but not the test inputs, while in a backdoor attack the attacker can also modify the test inputs. Existing model-agnostic defense approaches either cannot handle backdoor attacks or do not provide effective certificates (i.e., proofs of the defense). We propose BagFlip, a model-agnostic certified defense that can effectively defend against both trigger-less and backdoor attacks. We evaluate BagFlip on image classification and malware detection datasets. For trigger-less attacks, BagFlip is equally or more effective than state-of-the-art methods, and for backdoor attacks, BagFlip is more effective than state-of-the-art methods.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/cc19e4ffde5540ac3fcda240e6d975cb-Paper-Conference.pdf


16、Blackbox Attacks via Surrogate Ensemble Search

Zikui Cai, Chengyu Song, Srikanth Krishnamurthy, Amit Roy-Chowdhury, Salman Asif

Black-box adversarial attacks can be divided into transfer-based and query-based attacks. Transfer methods do not require any feedback from the victim model, but achieve lower success rates compared to query-based methods. Query attacks usually require a large number of queries to succeed. To get the best of both approaches, recent work has tried to combine them, but still requires hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a new approach for black-box attacks via surrogate ensemble search (BASES), which can generate highly successful black-box attacks using a very small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack against a given victim model, we search over the weights of the loss function using queries generated by the perturbation machine. Since the dimensionality of the search space is small (the same as the number of surrogate models), the search requires only a small number of queries. We demonstrate that our proposed method achieves better success rates with at least 30 times fewer queries than state-of-the-art methods on different image classifiers trained on ImageNet (including VGG-19, DenseNet-121, and ResNeXt-50). In particular, our method requires on average only 3 queries per image to achieve a success rate above 90% for targeted attacks, and only 1-2 queries per image to achieve a success rate above 99% for untargeted attacks. Our method is also effective on the Google Cloud Vision API, requiring only 2.9 queries per image to achieve a 91% untargeted attack success rate. We also show that the perturbations generated by our method are highly transferable and can be used for hard-label black-box attacks. Furthermore, we argue that BASES can be used to create attacks for various tasks and demonstrate its effectiveness against object detection models. Our code is available at https://github.com/CSIPlab/BASES.
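
Here is a condensed sketch of the two-level idea described above: an inner "perturbation machine" crafts a candidate by minimizing a weighted loss over a set of surrogate models, and an outer loop spends its few victim-model queries adjusting those weights. The step sizes, query budget, and weight-update rule are illustrative choices, not the paper's, and `victim` and `surrogates` are placeholders.

```python
import torch
import torch.nn.functional as F

def perturbation_machine(surrogates, weights, x, target,
                         eps=8/255, steps=20, lr=2/255):
    """Targeted PGD on the weighted surrogate-ensemble loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = sum(w * F.cross_entropy(m(x + delta), target)
                   for w, m in zip(weights, surrogates))
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - lr * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def bases_style_attack(victim, surrogates, x, target, max_queries=20):
    weights = torch.ones(len(surrogates)) / len(surrogates)
    for q in range(max_queries):
        x_adv = perturbation_machine(surrogates, weights, x, target)
        pred = victim(x_adv).argmax(dim=1)          # one query to the victim
        if (pred == target).all():
            return x_adv, q + 1
        # Illustrative update: up-weight the surrogate with the largest loss
        # on the current candidate, then renormalize the weights.
        with torch.no_grad():
            losses = torch.stack(
                [F.cross_entropy(m(x_adv), target) for m in surrogates])
        weights[losses.argmax()] += 0.1
        weights /= weights.sum()
    return x_adv, max_queries
```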

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/23b9d4e18b151ba2108fb3f1efaf8de4-Paper-Conference.pdf


17、Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation

Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, Baoyuan Wu

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which can produce wrong predictions by injecting imperceptible perturbations. This work studies the transferability of adversarial examples, which is important because in practical applications the structure and parameters of the target model are usually unknown. Many existing studies have shown that adversarial examples are likely to overfit the surrogate model they are generated on, limiting their transfer attack performance against different target models. To alleviate the overfitting to the surrogate model, we propose a new attack method called Reverse Adversarial Perturbation (RAP). Specifically, instead of minimizing the loss at a single adversarial point, we advocate seeking adversarial examples located in regions of uniformly low loss, by injecting a worst-case perturbation (the reverse adversarial perturbation) at each step of the optimization procedure. The adversarial attack with RAP is formulated as a min-max bi-level optimization problem. By integrating RAP into the iterative attack process, our method finds more stable adversarial examples that are less sensitive to changes in the decision boundary, thereby mitigating overfitting to the surrogate model. Comprehensive experimental comparisons show that RAP can significantly improve the transferability of adversarial examples. Furthermore, RAP can be naturally combined with many existing black-box attack techniques to further improve transferability. When attacking a real-world image recognition system, the Google Cloud Vision API, we obtain a 22% improvement in targeted attack performance over the compared methods. Our code is available at https://github.com/SCLBD/TransferattackRAP.
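
A minimal sketch of the min-max idea described above is shown below: at each outer step, an inner loop first finds a "reverse" perturbation that maximizes the attack loss around the current adversarial example, and the outer update then minimizes the loss at that shifted point, steering the search toward flat low-loss regions. This is a simplified targeted variant with illustrative hyperparameters, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def rap_targeted_attack(surrogate, x, y_target, eps=8/255, steps=40, lr=2/255,
                        eps_n=8/255, inner_steps=5, lr_n=2/255):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Inner maximization: worst-case (reverse adversarial) perturbation n.
        n = torch.zeros_like(x, requires_grad=True)
        for _ in range(inner_steps):
            loss_n = F.cross_entropy(surrogate(x + delta + n), y_target)
            g, = torch.autograd.grad(loss_n, n)
            n = (n + lr_n * g.sign()).clamp(-eps_n, eps_n)
            n = n.detach().requires_grad_(True)
        # Outer minimization: update delta against the shifted worst-case point.
        loss = F.cross_entropy(surrogate(x + delta + n.detach()), y_target)
        g, = torch.autograd.grad(loss, delta)
        delta = (delta - lr * g.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()
```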

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/c0f9419caa85d7062c7e6d621a335726-Paper-Conference.pdf


18、Brownian Noise Reduction: Maximizing Privacy Subject to Accuracy Constraints

Justin Whitehouse, Aaditya Ramdas, Steven Z. Wu, Ryan M. Rogers

There are differences in the ways researchers and practitioners approach the problem of privacy-utility tradeoffs. Researchers mainly start from a privacy-first perspective, setting strict privacy requirements and minimizing risks within these constraints. Practitioners usually want to adopt an accuracy-first perspective, and may be satisfied with obtaining the maximum degree of privacy protection under the premise of obtaining a small enough error. A "noise reduction" algorithm was proposed by Ligett et al. to address the latter perspective. The authors show that by adding correlated Laplacian noise, and progressively reducing the noise as required, it is possible to produce a series of increasingly accurate private parameter estimates, paying a privacy cost only for the least noisy iterative results. In this work, we generalize "noise reduction" to the setting of Gaussian noise, introducing the Brownian mechanism. The Brownian mechanism works by first adding high-variance Gaussian noise, corresponding to the final point of the simulated Brownian motion. Then, at the practitioner's discretion, the noise is gradually reduced to earlier times by retracing the Brownian path backwards. Our mechanism is more applicable to common bounded ℓ2 sensitivity settings, empirically demonstrates to outperform existing work in common statistical tasks, and can provide customizable privacy loss control throughout the interaction with practitioners. We combine our Brownian mechanism with ReducedAboveThreshold, which is a generalization of the classic AboveThreshold algorithm, providing adaptive privacy guarantees. Overall, our results show that one can satisfy utility constraints and still maintain a strong level of privacy protection.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/48aaa5ea741ae8430bd58e25917d267d-Paper-Conference.pdf


19、Byzantine-tolerant federated Gaussian process regression for streaming data

Xu Zhang, Zhenyuan Yuan, Minghui Zhu

This paper considers the use of Gaussian Process Regression (GPR) to implement real-time data processing for Byzantine fault-tolerant federated learning. Specifically, a latent function is learned jointly by the cloud and a set of agents, some of which may be Byzantine-attacked. We develop a Byzantine-fault-tolerant federated GPR algorithm consisting of three modules: agent-based local GPR, cloud-based aggregated GPR, and agent-based fused GPR. We derive an upper bound on the prediction error based on the error between the mean of the cloud-aggregated GPR and the objective function, assuming fewer than a quarter of all agents are Byzantine agents. We also characterize lower and upper bounds on the prediction variance. We conduct experiments on a synthetic dataset and two real datasets to evaluate the proposed algorithm.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/57c56985d9afe89bf78a8264c91071aa-Paper-Conference.pdf


20、CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks

Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, Ruoxi Jia

Previous research has verified that text generation APIs can be misappropriated through impersonation attacks, leading to intellectual property violations. In order to protect the intellectual property rights of text generation APIs, recent research introduces a watermarking algorithm and utilizes null hypothesis testing as a subsequent ownership verification to validate imitation models. However, we find that these watermarks can be detected with sufficient statistics of the term frequencies of candidate watermarks. To address this shortcoming, this paper proposes a novel conditional watermarking framework (CATER) to protect the intellectual property of text generation APIs. An optimization method is proposed for deciding on watermarking rules that minimize distortion of the overall word distribution while maximizing variation in conditional word selection. In theory, we demonstrate that even the most savvy attacker (who knows how CATER works) cannot reveal the used watermark based on statistical inspection from a potentially large number of word pairs. Empirically, we observe that high-order conditions lead to an exponential increase in suspicious (unused) watermarks, making our carefully designed watermarks even more stealthy. Furthermore, CATER can effectively identify intellectual property infringements under schema mismatch and cross-domain imitation attacks, with little impact on the generation quality of victim APIs. We see our work as an important milestone in protecting the intellectual property rights of text generation APIs.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/2433fec2144ccf5fea1c9c5ebdbc3924-Paper-Conference.pdf


21、Can Adversarial Training Be Manipulated By Non-Robust Features?

Lue Tao, Lei Feng, Hongxin Wei, Jinfeng Yi, Sheng-Jun Huang, Songcan Chen

Adversarial training, originally designed to resist test-time adversarial examples, has shown promise in mitigating training-time availability attacks. However, this paper challenges this defense. We identify a new threat model named stability attack, which aims to hinder robust availability by slightly manipulating the training data. Under this threat, we show that, in a simple statistical setting, adversarial training with a conventional defense budget $\epsilon$ provably fails to provide test robustness when the non-robust features of the training data can be reinforced by $\epsilon$-bounded perturbations. Furthermore, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments show that stability attacks are harmful on benchmark datasets, and thus adaptive defenses are necessary to maintain robustness.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/a94a8800a4b0af45600bab91164849df-Paper-Conference.pdf


22、Certifying Robust Graph Classification under Orthogonal Gromov-Wasserstein Threats

Hongwei Jin, Zishun Yu, Xinhua Zhang

Graph classifiers are vulnerable to topological attacks. While robustness certificates have been recently developed, their threat models only consider local and global edge perturbations, effectively ignoring important graph structures such as isomorphism. To address this issue, we propose to measure perturbations using orthogonal Gromov-Wasserstein distances and construct their Fenchel conjugates for convex optimization. Our key insight comes from the matching loss, which connects two variables via a monotone operator and provides a tight convex approximation to the resistance distance on graph nodes. Both our certificate and our attack algorithm are proven effective when applied to graph classification via graph convolutional networks.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/0b6b00f384aa33fec1f3d6bcf9550224-Paper-Conference.pdf


23、Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains

Kiarash Shamsi, Friedhelm Victor, Murat Kantarcioglu, Yulia Gel, Cuneyt G Akcora

Machine learning on blockchain graphs is an emerging field with many applications, such as ransomware payment tracking, price manipulation analysis, and money laundering detection. However, analyzing blockchain data requires domain expertise and computing resources, which constitutes a significant obstacle that hinders progress in this field. We introduce Chartalist, the first comprehensive platform to systematically access and use machine learning on a large number of blockchains, to address this challenge. Chartalist includes machine learning-ready datasets from unspent transaction outputs (UTXOs) such as Bitcoin and account-based blockchains such as Ethereum. We anticipate that Chartalist can facilitate data modeling, analysis, and representation of blockchain data and attract a broader community of scientists to analyze blockchains. Chartalist is an open science initiative at https://github.com/cakcora/Chartalist.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/e245189a86310b6667ac633dbb922d50-Paper-Datasets_and_Benchmarks.pdf


24、Counterfactual Fairness with Partially Known Causal Graph

Aoqi Zuo, Susan Wei, Tongliang Liu, Bo Han, Kun Zhang, Mingming Gong

Fair machine learning aims to avoid unfair treatment of individuals or subgroups based on "sensitive attributes" such as gender and race. Fair machine learning approaches built on causal inference identify discrimination and bias through causal effects. Although causality-based fair learning has received increasing attention, current methods assume that the true causal graph is fully known. This paper proposes a general approach to achieve the notion of counterfactual fairness when the true causal graph is unknown. To select features that lead to counterfactual fairness, we derive conditions and algorithms for identifying ancestral relations between variables on partially directed acyclic graphs (PDAGs), a class of causal graphs that can be learned from observational data combined with domain knowledge. Interestingly, when specific background knowledge is provided, namely that the sensitive attributes have no ancestors in the causal graph, counterfactual fairness can be achieved as if the true causal graph were fully known. Results on simulated and real-world datasets demonstrate the effectiveness of our method.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/08887999616116910fccec17a63584b5-Paper-Conference.pdf


25、Counterfactual Neural Temporal Point Process for Estimating Causal Influence of Misinformation on Social Media

Yizhou Zhang, Defu Cao, Yan Liu

Recent years have witnessed the rise of disinformation campaigns that spread specific narratives on social media to manipulate public opinion in different domains, such as politics and healthcare. Therefore, an effective and efficient automatic method is needed to estimate the impact of disinformation on user beliefs and activities. However, existing disinformation impact estimation studies either rely on small-scale psychological experiments or can only find correlations between user behavior and disinformation. To address these issues, this paper develops a causal framework to model the causal effects of disinformation from the perspective of temporal point processes. To accommodate large-scale data, we devise a method that is both efficient and accurate for estimating individual treatment effects (ITE) via neural temporal point processes and Gaussian mixture models. Extensive experiments on synthetic datasets verify the effectiveness and efficiency of our model. We further apply our model to a real dataset of social media posts and engagements about COVID-19 vaccines. Experimental results show that our model identifies an identifiable causal effect of disinformation that harms people's subjective sentiment toward vaccines.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/45542d647974ca6af58441c4817c9b5b-Paper-Conference.pdf


26、Counterfactual harm

Jonathan Richens, Rory Beard, Daniel H. Thompson

To act safely and ethically in the real world, an agent must be able to reason about harm and avoid harmful actions. To date, however, there has been no statistical way to measure harm and incorporate it into algorithmic decision-making. In this paper, we propose the first formal definition of harm and benefit using a causal model. We show that any factual definition of harm is unable to identify harmful behavior in some situations, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies after distribution shifts. We leverage our definition of harm to design a harm avoidance decision framework using a counterfactual objective function. We demonstrate the application of this framework to the problem of determining optimal drug doses by using dose-response models learned from randomized controlled trial data. We find that standard methods of using therapeutic effects to select doses lead to unnecessarily harmful doses, whereas our counterfactual approach identifies doses that are significantly less harmful but do not affect efficacy.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/ebcf1bff7b2fe6dcc3fbe666faaa50f1-Paper-Conference.pdf


27、DISCO: Adversarial Defense with Local Implicit Functions

Chih-Hui Ho, Nuno Vasconcelos

This paper considers the problem of adversarial defense for image classification, where the goal is to make the classifier robust to adversarial examples. Inspired by the assumption that these examples are beyond the natural image manifold, a novel Adversarial Defense with Local Implicit Function (DISCO) is proposed to remove adversarial perturbations via local manifold projection. DISCO uses an adversarial image and a query pixel location, outputting a clean RGB value at the location. It is implemented by an encoder and a local implicit module, where the former produces per-pixel deep features and the latter uses features in the neighborhood of the query pixel to predict clean RGB values. Extensive experiments show that DISCO and its cascaded versions outperform previous defenses whether the defense is known to the attacker or not. It is also demonstrated that DISCO is data- and parameter-efficient and capable of defense across datasets, classifiers, and attacks.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/96930636e3fb63935e2af153d1cc40a3-Paper-Conference.pdf


28、DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning

Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland

Safe reinforcement learning is extremely challenging: the agent must not only explore an unknown environment, but also ensure that no safety constraints are violated. We formulate this safe reinforcement learning problem using the framework of finite-horizon constrained Markov decision processes (CMDPs) with an unknown transition probability function. We model the safety requirement as a constraint on the expected cumulative cost that must be satisfied during all episodes of learning. We propose a model-based safe reinforcement learning algorithm called "Doubly Optimistic and Pessimistic Exploration" (DOPE), and demonstrate that it learns without violating the safety constraints while achieving an objective regret of $\tilde{O}(|\mathcal{S}|\sqrt{|\mathcal{A}| K})$, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, and $K$ is the number of learning episodes. Our key idea is to combine an optimistic exploration bonus on the reward with pessimistic constraint handling, in addition to standard optimistic model-based exploration. DOPE not only improves the objective regret bound, but also shows significant empirical performance gains over earlier optimistic-pessimistic methods.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/076a93fd42aa85f5ccee921a01d77dd5-Paper-Conference.pdf


29、DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing

Jiawei Shao, Yuchang Sun, Songze Li, Jun Zhang

Federated learning (FL) aims to enable collaborative training of machine learning models while avoiding centralized collection of clients' private data. Unlike centralized training, the clients' local datasets in FL are non-independent and identically distributed (non-IID). In addition, data-owning clients may drop out of the training process arbitrarily. These characteristics significantly degrade training performance. This paper proposes a Dropout-Resilient Secure Federated Learning (DReS-FL) framework based on Lagrange coded computing (LCC) to address both the non-IID and dropout problems. The key idea is to utilize Lagrange coding to secretly share the private datasets among clients, so that each client receives an encoded version of the global dataset and the local gradient computation over this dataset is unbiased. To correctly decode the gradients at the server, the gradient function must be a polynomial over a finite field, so we construct polynomial integer neural networks (PINNs) to implement our framework. Theoretical analysis shows that DReS-FL is resilient to client dropouts and provides privacy protection for the local datasets. Furthermore, our experimental results show that DReS-FL consistently and significantly outperforms baseline methods.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/448fc91f669c15d10364ee01d512cc10-Paper-Conference.pdf


30、Defending Against Adversarial Attacks via Neural Dynamic System

Xiyuan Li, Zou Xin, Weiwei Liu

Despite their great success, the application of deep neural networks (DNNs) in safety-critical domains has been hampered by their vulnerability to adversarial attacks. Some recent works propose to enhance the robustness of DNNs from the perspective of dynamical systems. Following this line of research, and inspired by the asymptotic stability of general non-autonomous dynamical systems, we propose to make each clean instance an asymptotically stable equilibrium point of a slowly time-varying system in order to defend against adversarial attacks. We provide a theoretical guarantee that if a clean instance is an asymptotically stable equilibrium point and the adversarial instance lies in its neighborhood, then asymptotic stability will reduce the adversarial noise and bring the adversarial instance close to the clean instance. Motivated by our theoretical results, we further propose a non-autonomous neural ODE (ASODE) and impose constraints on its corresponding linear time-varying system so that all clean instances serve as its asymptotically stable equilibrium points. Our analysis shows that these constraints can be translated into regularizers in the implementation. Experimental results show that ASODE improves robustness against adversarial attacks and outperforms existing methods.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/299a08ee712d4752c890938da99a77c6-Paper-Conference.pdf


31、Delving into Sequential Patches for Deepfake Detection

Jiazhi Guan, Hang Zhou, Zhibin Hong, Errui Ding, Jingdong Wang, Chengbin Quan, Youjian Zhao

Recent advances in face forgery techniques have led to the emergence of nearly untraceable deepfake videos, which could be maliciously exploited. Therefore, researchers have devoted themselves to deepfake detection. Previous studies have identified the importance of local low-level cues and temporal information for generalizing across deepfake methods; however, their robustness to post-processing is still limited. In this work, we propose a local- and temporal-aware Transformer-based deepfake detection (LTTD) framework, which adopts a local-to-global learning protocol that pays special attention to the valuable temporal information within local sequences. Specifically, we propose a Local Sequence Transformer (LST), which models temporal consistency over sequences of restricted spatial regions, where low-level information is hierarchically enhanced by learned shallow 3D filters. Based on the local temporal embeddings, we then achieve the final classification in a globally contrastive manner. Extensive experiments on popular datasets validate that our method effectively discovers local forgery cues and achieves state-of-the-art performance.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/1d051fb631f104cb2a621451f37676b9-Paper-Conference.pdf


32、Differentially Private Model Compression

Fatemeh Sadat Mireshghallah, Arturs Backurs, Huseyin A. Inan, Lukas Wutschitz, Janardhan Kulkarni

Recent research papers have shown that large pre-trained language models like BERT, GPT-2 can be fine-tuned on private data to achieve comparable performance to non-private models for many downstream natural language processing (NLP) tasks, while guaranteeing differential privacy. However, the inference cost of these models (consisting of hundreds of millions of parameters) can be prohibitive. Therefore, in practice, LLMs are often compressed before being deployed to a specific application. In this paper, we set out to study differentially private model compression and propose a framework to achieve 50% sparsity levels while maintaining almost full performance. We demonstrate these ideas on the standard GLUE benchmark using a BERT model and set a benchmark for future research on this topic.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/bd6bb13e78da078d8adcabbe6d9ca737-Paper-Conference.pdf


33、Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples

Weixin Chen, Baoyuan Wu, Haoqian Wang

Poisoning-based backdoor attacks pose a serious threat to training deep models on data from untrusted sources. Given a backdoored model, we observe that the feature representations of poisoned samples containing triggers are more sensitive to transformations than those of clean samples. This inspires us to design a simple sensitivity metric, called feature consistency towards transformations (FCT), to distinguish poisoned samples from clean samples in an untrusted training set. Furthermore, we propose two effective backdoor defense methods. The first one trains a secure model from scratch using a two-stage secure training module built on a sample-distinguishment module that uses the FCT metric. The second one removes the backdoor from a backdoored model using a backdoor removal module, which alternately unlearns the distinguished poisoned samples and relearns the distinguished clean samples. Extensive results on three benchmark datasets demonstrate that both methods achieve superior defense performance against eight types of backdoor attacks compared with existing backdoor defenses. Code is available at: https://github.com/SCLBD/Effectivebackdoordefense.
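
To illustrate the sensitivity idea described above, here is a minimal sketch: compare a sample's feature representation with that of a transformed copy, and flag samples whose features change a lot as likely poisoned. The transformation, feature layer, distance, and threshold below are illustrative placeholders, not the paper's exact configuration.

```python
import torch

def fct_scores(feature_extractor, x, transform):
    """Higher score = features more sensitive to the transformation."""
    with torch.no_grad():
        f_clean = feature_extractor(x)
        f_trans = feature_extractor(transform(x))
    return (f_clean - f_trans).flatten(1).norm(dim=1)

def split_by_sensitivity(feature_extractor, x, transform, threshold):
    scores = fct_scores(feature_extractor, x, transform)
    suspicious = scores > threshold            # candidate poisoned samples
    return x[~suspicious], x[suspicious]       # (likely clean, likely poisoned)
```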

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/3f9bbf77fbd858e5b6e39d39fe84ed2e-Paper-Conference.pdf


34、Evolution of Neural Tangent Kernels under Benign and Adversarial Training

Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Two major challenges facing modern deep learning are mitigating the vulnerability of deep networks to adversarial attacks and understanding the generalization abilities of deep learning. For the first problem, many defense strategies have been developed, the most common of which is adversarial training (AT). For the second challenge, a leading theory has emerged: the Neural Tangent Kernel (NTK), a tool for characterizing the behavior of neural networks in the infinite-width limit. In this limit, the kernel is frozen and the underlying feature map is fixed. At finite width, there is evidence that feature learning happens in the earlier stage of training (kernel learning), after which the kernel stays constant (lazy training). While prior work has aimed to study adversarial vulnerability from the frozen infinite-width NTK perspective, none has investigated the adversarial robustness of the NTK during training. In this work, we conduct an empirical study of the NTK's evolution under standard and adversarial training, aiming to disentangle the impact of adversarial training on kernel learning and lazy training. We find that under adversarial training, the NTK rapidly converges to a different kernel (and feature map) than under standard training. This new kernel provides adversarial robustness even when non-robust training is performed on top of it. Furthermore, we find that adversarial training on top of a fixed kernel yields a classifier with 76.1% robust accuracy under PGD attacks with ε=4/255 on CIFAR-10.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/4bc4e9ecd5ae4a75048dc216a770cba1-Paper-Conference.pdf


35、Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro

Pretrained language models (LMs) have been shown to be prone to toxic language generation. In this work, we systematically explore domain-adaptive training to reduce language model toxicity. We conduct this study along three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using automatically generated datasets consistently outperforms existing baselines across a wide range of model sizes on both automatic and human evaluations, even when using a 3x smaller training corpus. We then comprehensively study detoxified LMs with parameter sizes ranging from 126M to 530B (3x larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar levels of toxicity given the same pre-training corpus, and ii) large LMs require more effort to unlearn toxic content seen in pre-training. We also explore parameter-efficient detoxification training methods. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters, but also achieves a better trade-off between toxicity and perplexity for large-scale models. Our code will be available at: https://github.com/NVIDIA/Megatron-LM/.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/e8c20cafe841cba3e31a17488dc9c3f1-Paper-Conference.pdf


36、FairVFL: A Fair Vertical Federated Learning Framework with Contrastive Adversarial Learning

Tao Qi, Fangzhao Wu, Chuhan Wu, Lingjuan Lyu, Tong Xu, Hao Liao, Zhongliang Yang, Yongfeng Huang, Xing Xie

Vertical federated learning (VFL) is a privacy-preserving machine learning paradigm that can learn models from features distributed across different platforms while keeping the data private. Since in real-world applications the data may be biased with respect to fairness-sensitive features such as gender, VFL models may inherit bias from the training data and become unfair to certain user groups. However, existing fair machine learning methods usually rely on centralized storage of the fairness-sensitive features to achieve model fairness, which is usually inapplicable in federated scenarios. In this paper, we propose a fair vertical federated learning framework (FairVFL) that can improve the fairness of VFL models. The core idea of FairVFL is to learn unified and fair representations of samples based on decentralized feature fields while preserving privacy. Specifically, each platform with fairness-insensitive features first learns local data representations from its local features. These local representations are then uploaded to a server and aggregated into a unified representation for the target task. To learn a fair unified representation, we send it to each platform that stores fairness-sensitive features and apply adversarial learning to remove the bias inherited from the biased data. Furthermore, to protect user privacy, we propose a contrastive adversarial learning method that removes private information from the unified representation on the server before sending it to the platforms keeping the fairness-sensitive features. Experiments on three real-world datasets verify that our method can effectively improve model fairness while protecting user privacy.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/333a7697dbb67f09249337f81c27d749-Paper-Conference.pdf


37、Fault-Aware Neural Code Rankers

Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu Lahiri, Madanlal Musuvathi, Jianfeng Gao

Large language models (LLMs) have demonstrated an impressive ability to generate code in a variety of programming tasks. In many cases, LLMs can generate the correct program when given multiple attempts. Therefore, a recent trend is to use models for large-scale program sampling, and then filter/rank based on how the program performs on a small number of known unit tests to select a candidate solution. However, these approaches assume that unit tests are given and that the resulting programs (which can perform arbitrary dangerous operations, such as file operations) can be safely executed. Both of the above assumptions are unrealistic in actual software development. In this paper, we propose CodeRanker, a neural ranker that predicts the correctness of sampled programs without executing the program. Our CodeRanker is fault-aware, i.e. it is trained to predict different types of execution information, such as predicting precise compile/run-time error types (e.g. IndexError or TypeError). We show that CodeRanker can significantly improve the pass@1 accuracy of various code generation models (including Codex, GPT-Neo, GPT-J) on the APPS, HumanEval, and MBPP datasets.
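
To make the ranking step concrete, here is a minimal sketch of how a fault-aware ranker could be used to order sampled programs without executing them. The `score_fn` interface and the toy scorer below are hypothetical stand-ins, not the paper's CodeRanker model.

```python
from dataclasses import dataclass
from typing import Callable, List

ERROR_TYPES = ["no_error", "CompileError", "IndexError", "TypeError", "WrongOutput"]

@dataclass
class RankedProgram:
    code: str
    p_correct: float       # predicted probability that the program passes
    predicted_error: str   # most likely execution outcome, predicted without running

def rank_candidates(problem: str, candidates: List[str],
                    score_fn: Callable[[str, str], List[float]]) -> List[RankedProgram]:
    """Order sampled programs by predicted correctness; nothing is executed."""
    ranked = []
    for code in candidates:
        probs = score_fn(problem, code)  # distribution over ERROR_TYPES
        best = max(range(len(probs)), key=probs.__getitem__)
        ranked.append(RankedProgram(code, probs[0], ERROR_TYPES[best]))
    return sorted(ranked, key=lambda r: r.p_correct, reverse=True)

if __name__ == "__main__":
    # Stand-in scorer; a real fault-aware ranker would be a fine-tuned classifier.
    def toy_scorer(problem: str, code: str) -> List[float]:
        ok = 0.9 if "return" in code else 0.2
        rest = (1.0 - ok) / (len(ERROR_TYPES) - 1)
        return [ok] + [rest] * (len(ERROR_TYPES) - 1)

    samples = ["def add(a, b): return a + b", "def add(a, b): print(a + b)"]
    best = rank_candidates("add two numbers", samples, toy_scorer)[0]
    print(best.predicted_error, best.p_correct, best.code)
```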

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/5762c579d09811b7639be2389b3d07be-Paper-Conference.pdf


38、Finding Naturally Occurring Physical Backdoors in Image Datasets

Emily Wenger, Roma Bhattacharjee, Arjun Nitin Bhagoji, Josephine Passananti, Emilio Andere, Heather Zheng, Ben Zhao

The extensive backdoor poisoning literature examines backdoor attacks and defenses that use "digital trigger patterns." In contrast, "physical backdoors," which use physical objects as triggers, have only recently been identified and are qualitatively different enough to resist most defenses that target digitally triggered backdoors. Research on physical backdoors is limited by access to large datasets containing images of real objects co-located with the targets of misclassification, and building such datasets takes a lot of time and effort. This work aims to address the accessibility challenge of physical backdoor attack research. We hypothesize that naturally co-occurring physical objects already exist in popular datasets such as ImageNet. Once identified, careful relabeling of these data can turn them into training samples for physical backdoor attacks. We propose a method to scalably identify these subsets of potential triggers in existing datasets, along with the specific classes they can poison. We refer to these natural trigger subsets and classes as natural backdoor datasets. Our technique successfully identifies natural backdoors in widely available datasets and produces models that are behaviorally equivalent to those trained on manually curated datasets. We release our code to allow the research community to create their own datasets for studying physical backdoor attacks.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/8af749935131cc8ea5dae4f6d8cdb304-Paper-Datasets_and_Benchmarks.pdf


39、Formulating Robustness Against Unforeseen Attacks

Sihui Dai, Saeed Mahloujifar, Prateek Mittal

Existing defenses against adversarial examples (such as adversarial training) often assume that the adversary conforms to a specific or known threat model, such as ℓp perturbations within a fixed budget. In this paper, we focus on the situation where, during training, the threat model assumed by the defense does not match the actual capabilities of the adversary at test time. We pose the question: if a learner is trained against a specific "source" threat model, when can we expect robustness to generalize to an unknown "target" threat model? Our key contribution is a formal definition of the problem of learning and generalization in the face of unknown adversaries, which helps us reason about the increase in adversarial risk relative to the traditional perspective of a known adversary. Applying our framework, we derive a generalization bound that relates the generalization gap between the source and target threat models to the variation of the feature extractor, which measures the expected maximum difference between the features extracted under a given threat model. Based on our generalization bound, we propose variation regularization (VR), which reduces the variation of the feature extractor under the source threat model during training. We empirically demonstrate that using VR leads to improved generalization to unknown attacks at test time, and that combining VR with perceptual adversarial training (Laidlaw et al., 2021) achieves state-of-the-art robustness to unknown attacks. Our code is publicly available at https://github.com/inspire-group/variation-regularization.
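
A rough picture of how a variation penalty can be attached to adversarial training is sketched below. This is a simplified surrogate, assuming the worst-case source-threat-model example `x_adv` is produced elsewhere (e.g., by PGD); it is not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def variation_term(feats_clean: torch.Tensor, feats_adv: torch.Tensor) -> torch.Tensor:
    """Simplified variation penalty: mean distance between features of clean inputs
    and features of their source-threat-model perturbations."""
    return (feats_clean - feats_adv).flatten(1).norm(dim=1).mean()

def vr_adv_loss(feature_extractor, classifier, x, x_adv, y, lam: float = 1.0):
    """Adversarial training loss plus the variation-regularization term."""
    f_clean, f_adv = feature_extractor(x), feature_extractor(x_adv)
    return F.cross_entropy(classifier(f_adv), y) + lam * variation_term(f_clean, f_adv)

if __name__ == "__main__":
    feat = nn.Sequential(nn.Flatten(), nn.Linear(32, 16), nn.ReLU())
    head = nn.Linear(16, 10)
    x = torch.randn(8, 32)
    x_adv = x + 0.03 * torch.randn_like(x).sign()  # stand-in for a real l_p attack
    y = torch.randint(0, 10, (8,))
    print(vr_adv_loss(feat, head, x, x_adv, y).item())
```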

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/392ac56724c133c37d5ea746e52f921f-Paper-Conference.pdf


40、Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attack

Tian Yu Liu, Yu Yang, Baharan Mirzasoleiman

A powerful (invisible) class of data poisoning attacks alters the predictions of some test data by applying small adversarial perturbations to a subset of training samples. Existing defense mechanisms are not practical, as they tend to severely impair generalization performance or are attack-specific and difficult to apply. Here, we propose a simple yet highly effective method that, unlike existing methods, breaks various types of invisible poisoning attacks with the slightest drop in generalization performance. Our key observation is that the attacks introduce local sharp regions of high training loss which, when minimized, cause the model to learn the adversarial perturbations and make the attack successful. To break a poisoning attack, our key idea is to alleviate the sharp loss regions introduced by the poisons. To this end, our method consists of two components: an optimized friendly noise, generated to perturb examples as much as possible without degrading performance, and a randomly varying noise component. The combination of these two parts builds a very lightweight yet extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bullseye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and that adaptive attacks cannot break our defense due to its random noise component.
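
The friendly-noise idea (maximize the perturbation while keeping the loss roughly unchanged) can be sketched as follows. This is an illustrative simplification, not the official implementation; the loss-penalty form and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def friendly_noise(model, x, y, eps=16/255, steps=10, lr=0.1, lam=10.0):
    """Find a large per-example perturbation that barely changes the model's loss:
    maximize |delta| while penalizing any increase in cross-entropy (assumed form)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    base_loss = F.cross_entropy(model(x), y).detach()
    for _ in range(steps):
        loss_now = F.cross_entropy(model(x + delta), y)
        objective = -delta.abs().mean() + lam * F.relu(loss_now - base_loss)
        opt.zero_grad(); objective.backward(); opt.step()
        delta.data.clamp_(-eps, eps)
    return delta.detach()

if __name__ == "__main__":
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    x, y = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,))
    d = friendly_noise(model, x, y)
    # Training would add d plus a small random noise component to each example.
    x_train = (x + d + 0.02 * torch.randn_like(x)).clamp(0, 1)
    print(d.abs().mean().item(), x_train.shape)
```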

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/4e81308aa2eb8e2e4eccf122d4827af7-Paper-Conference.pdf


41、GAMA: Generative Adversarial Multi-Object Scene Attacks

Abhishek Aich, Calvin-Khang Ta, Akash Gupta, Chengyu Song, Srikanth Krishnamurthy, Salman Asif, Amit Roy-Chowdhury

Most adversarial attack methods focus on scenes with a single dominant object (e.g., images from ImageNet). On the other hand, natural scenes include multiple semantically related dominant objects. Therefore, it is crucial to explore designing attack strategies that go beyond learning single object scenarios or attacking single object victim classifiers. Since perturbations have a strong transitive nature and can be transferred to unknown models, this paper proposes a method for adversarial attacks using generative models for multi-object scenarios. To represent the relationship between different objects in the input scene, we leverage the open-source pre-trained visual-language model CLIP (Contrastive Language-Image Pre-training) to exploit the semantics encoded in the linguistic space as well as the visual space. We refer to this attack method as Generative Adversarial Multi-Object Attack (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool for training a powerful perturbation generator for multi-object scenes. Using joint image-text features to train the generator, we show that GAMA can produce powerful transferable perturbations in various attack settings to fool the victim classifier. For example, GAMA triggers about 16% more misclassifications than state-of-the-art generative methods in the black-box setting, where the attacker's classifier architecture and data distribution differ from the victim's. Our code is available here: https://abhishekaich27.github.io/gama.html

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/efbd571f139d26604e53fe2760e2c073-Paper-Conference.pdf


42、Identification, Amplification and Measurement: A bridge to Gaussian Differential Privacy

Yi Liu, Ke Sun, Bei Jiang, Linglong Kong

Gaussian Differential Privacy (GDP) is a single-parameter family of privacy notions that provides coherent guarantees to avoid the disclosure of sensitive personal information. Although GDP offers additional interpretability and tighter bounds under composition, many widely used mechanisms (such as the Laplace mechanism) inherently provide GDP guarantees but often fail to take advantage of this new framework because their privacy guarantees were derived under a different background. In this paper, we study the asymptotic properties of privacy profiles and develop a simple criterion for identifying algorithms with the GDP property. We propose an efficient method for GDP algorithms to narrow down the possibly optimal privacy parameter μ with an arbitrarily small and quantifiable margin of error. For non-GDP algorithms, we provide a post-processing procedure that amplifies existing privacy guarantees to meet the GDP condition. As applications, we compare two single-parameter families of privacy notions, ϵ-DP and μ-GDP, and show that all ϵ-DP algorithms are intrinsically also GDP. Finally, we show that the combination of our measurement procedure and the GDP composition theorem is a powerful and convenient tool for handling compositions, compared with the conventional standard and advanced composition theorems.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/4a29e8bc94b4c5d21d58a4fffdff800b-Paper-Conference.pdf


43、Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework

Henger Li, Xiaolin Sun, Zizhan Zheng

We propose a model-based reinforcement learning framework for non-targeted attacks against federated learning (FL) systems. Our framework first utilizes the server's model updates to approximate the distribution of client-side aggregated data. The learned distribution is then used to build a simulator of the FL environment and learn an adaptive attack policy via reinforcement learning. Even when servers employ robust aggregation rules, our framework is able to automatically learn powerful attacks. We further derive an upper bound on the attacker's performance penalty due to inaccurate distribution estimates. Experimental results show that the proposed attack framework significantly outperforms existing poisoning attack techniques on real-world datasets. This demonstrates the importance of developing adaptive defenses for FL systems.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/e2ef0cae667dbe9bfdbcaed1bd91807b-Paper-Conference.pdf


44、Lethal Dose Conjecture on Data Poisoning

Wenxiao Wang, Alexander Levine, Soheil Feizi

Data poisoning considers an adversary that distorts the training set of a machine learning algorithm for malicious purposes. This paper proposes a conjecture about the fundamentals of data poisoning, called the lethal dose conjecture. The conjecture states that if n clean training samples are needed for accurate prediction, then in a training set of size N only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy. We verify this conjecture theoretically in multiple cases and also provide a more general distributional perspective. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA), are recent approaches to provable defense against data poisoning that predict through the majority vote of many base models trained on different subsets of the training set. The conjecture implies that both DPA and FA are (asymptotically) optimal: if we have the most data-efficient learner, they can turn it into one of the strongest defenses against data poisoning. This outlines a practical approach to developing stronger defenses by finding data-efficient learners. As a proof of concept, we empirically show that by simply training the base learners with different data augmentation techniques, we can double and triple the certified robustness of DPA on CIFAR-10 and GTSRB, respectively, without any loss of accuracy.
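
A toy sketch of the DPA-style partition-and-vote defense mentioned above is shown below; the hashing scheme and the trivial base learner are illustrative assumptions, and the certified count printed is the simple vote-gap bound.

```python
import hashlib
from collections import Counter
from typing import Callable, List, Sequence, Tuple

def partition(dataset: Sequence[Tuple[object, int]], k: int) -> List[list]:
    """Deterministically split the training set into k disjoint parts by hashing each
    sample, so that a single poisoned sample can influence only one base model."""
    parts = [[] for _ in range(k)]
    for example in dataset:
        h = int(hashlib.sha256(repr(example).encode()).hexdigest(), 16)
        parts[h % k].append(example)
    return parts

def dpa_predict(models: List[Callable[[object], int]], x) -> Tuple[int, int]:
    """Majority vote over base models; roughly (vote gap) // 2 poisoned samples
    cannot flip the prediction."""
    votes = Counter(m(x) for m in models)
    (top, c1), *rest = votes.most_common()
    c2 = rest[0][1] if rest else 0
    return top, (c1 - c2) // 2

if __name__ == "__main__":
    data = [(i, i % 2) for i in range(100)]  # toy dataset: label = parity of i
    models = []
    for part in partition(data, k=10):
        if not part:  # skip any empty partition in this toy setup
            continue
        majority = Counter(label for _, label in part).most_common(1)[0][0]
        models.append(lambda x, lbl=majority: lbl)  # trivial "base learner"
    pred, certified = dpa_predict(models, x=42)
    print(pred, "certified against", certified, "poisoned samples")
```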

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/0badcb4e95306df76a719409155e46e8-Paper-Conference.pdf


45、MORA: Improving Ensemble Robustness Evaluation with Model Reweighing Attack

Yunrui Yu, Xitong Gao, Cheng-Zhong Xu

Adversarial attacks fool neural networks by adding tiny perturbations to their input data. Ensemble defenses are a promising research direction that improves robustness against such attacks through training methods that minimize attack transferability between sub-models while maintaining high accuracy on natural inputs. We find, however, that recent state-of-the-art adversarial attack strategies cannot reliably evaluate ensemble defenses and significantly overestimate their robustness. This paper identifies two factors that contribute to this behavior. First, the ensembles formed by these defenses are notably difficult to attack with existing gradient-based methods because of obfuscated gradients. Second, ensemble defenses diversify sub-model gradients, which makes it challenging to defeat all sub-models simultaneously, and simply summing their contributions may negate the overall attack objective; however, we observe that an ensemble can still be fooled even when most of its sub-models are correct. We therefore introduce MORA, a model-reweighing attack that steers adversarial example synthesis by reweighing the importance of sub-model gradients. With MORA, we find that recent ensemble defenses all exhibit varying degrees of overestimated robustness. Compared with recent state-of-the-art white-box attacks, MORA achieves higher attack success rates on all considered ensemble models while converging orders of magnitude faster. In particular, most ensemble defenses exhibit little or exactly 0% robustness against MORA under $\ell^\infty$ perturbations on CIFAR-10 and with a budget of $0.01$ on CIFAR-100. We open source MORA with reproducible results and pre-trained models, and also provide a leaderboard of ensemble defenses under various attack strategies.
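
The reweighing intuition can be sketched as a modified PGD step in which each sub-model's gradient is weighted by how confidently that sub-model still predicts the true class. This is an assumption-laden simplification, not the released MORA code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mora_step(submodels, x_adv, y, x_orig, step=2/255, eps=8/255):
    """One reweighed attack step: sub-models that still assign high probability to the
    true class get more weight, focusing the update on the members that currently
    hold the ensemble's prediction together."""
    x = x_adv.clone().detach().requires_grad_(True)
    weighted = []
    for m in submodels:
        logits = m(x)
        w = logits.softmax(1).gather(1, y[:, None]).squeeze(1).detach()  # confidence in y
        weighted.append(w * F.cross_entropy(logits, y, reduction="none"))
    grad, = torch.autograd.grad(sum(weighted).mean(), x)
    x_new = x + step * grad.sign()
    x_new = torch.min(torch.max(x_new, x_orig - eps), x_orig + eps).clamp(0, 1)
    return x_new.detach()

if __name__ == "__main__":
    subs = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)) for _ in range(3)]
    x, y = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,))
    x_adv = x
    for _ in range(5):
        x_adv = mora_step(subs, x_adv, y, x_orig=x)
    print((x_adv - x).abs().max().item())
```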

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/ac895e51849bfc99ae25e054fd4c2eda-Paper-Conference.pdf


46、Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class

Khoa D Doan, Yingjie Lao, Ping Li

In recent years, machine learning models have been shown to be vulnerable to backdoor attacks. In these attacks, the attacker embeds a covert backdoor into the trained model so that the compromised model behaves normally on clean inputs but misclassifies maliciously crafted inputs containing a trigger into a class controlled by the attacker. While these existing attacks are very effective, the attacker's capability is limited: given an input, the attacks can only cause the model to misclassify it into a single predefined target class. In contrast, this paper presents a novel backdoor attack with a much more powerful payload, called Marksman, in which the attacker can arbitrarily choose which target class the model misclassifies into during inference. To achieve this, we propose to represent the trigger function as a class-conditional generative model and to inject the backdoor within a constrained optimization framework, where the trigger function learns to generate an optimal trigger pattern for attacking any target class while this generative backdoor is embedded into the trained model. Given the learned trigger-generation function, during inference the attacker can specify an arbitrary backdoor target class and accordingly create an appropriate trigger to make the model classify the input into that target class. We experimentally demonstrate that the proposed framework achieves high attack performance (e.g., 100% attack success rate in several experiments) on several benchmark datasets, including MNIST, CIFAR10, GTSRB, and TinyImageNet, while maintaining performance on clean data. The proposed Marksman backdoor attack can also easily evade existing defenses that were originally designed to counter single-target-class backdoor attacks. Our work is another important step toward addressing the widespread risk of backdoor attacks in real-world environments.
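
A class-conditional trigger function of the kind described can be sketched as a small generator conditioned on the target label; the architecture and the additive, bounded trigger form below are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConditionalTriggerGenerator(nn.Module):
    """Maps a target label to a bounded, input-sized trigger pattern."""
    def __init__(self, num_classes: int, img_shape=(3, 32, 32), eps: float = 8 / 255):
        super().__init__()
        c, h, w = img_shape
        self.img_shape, self.eps = img_shape, eps
        self.embed = nn.Embedding(num_classes, 64)
        self.net = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, c * h * w))

    def forward(self, target: torch.Tensor) -> torch.Tensor:
        t = self.net(self.embed(target)).view(-1, *self.img_shape)
        return self.eps * torch.tanh(t)  # keep the trigger small and bounded

def poison(x: torch.Tensor, gen: ConditionalTriggerGenerator, target: torch.Tensor):
    """Apply the generated trigger; at inference the attacker can pick any target class."""
    return (x + gen(target)).clamp(0, 1)

if __name__ == "__main__":
    gen = ConditionalTriggerGenerator(num_classes=10)
    x = torch.rand(4, 3, 32, 32)
    target = torch.tensor([7, 7, 2, 5])  # arbitrary per-input target classes
    print(poison(x, gen, target).shape)
```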

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/fa0126bb7ebad258bf4ffdbbac2dd787-Paper-Conference.pdf


47、Measuring Data Reconstruction Defenses in Collaborative Inference Systems

Mengda Yang, Ziang Li, Juan Wang, Hongxin Hu, Ao Ren, Xiaoyang Xu, Wenzhe Yi

The collaborative inference system aims to speed up the prediction process in edge cloud scenarios, where local devices and cloud systems jointly run complex deep learning models. However, these edge-cloud collaborative inference systems are vulnerable to emerging reconstruction attacks, in which malicious cloud service providers are able to recover private data of edge users. To defend against such attacks, several defenses have recently been introduced. Unfortunately, we know very little about the robustness of these defenses. In this paper, we first take steps to measure the robustness of these state-of-the-art defenses against reconstruction attacks. Specifically, we show that latent privacy features are still preserved in obfuscated representations. Under such observations, we design a technique called Sensitive Feature Distillation (SFD) to recover sensitive information from protected feature representations. Our experiments show that SFD can break through defense mechanisms in model partitioning scenarios, demonstrating the inadequacy of existing defense mechanisms as privacy-preserving techniques against reconstruction attacks. We hope that our findings inspire further work to improve the robustness of defense mechanisms against reconstruction attacks on collaborative reasoning systems.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/53f1c3ec5df814b5aabe9ae88a29bb49-Paper-Conference.pdf


48、Multilingual Abusive Comment Detection at Scale for Indic Languages

Vikram Gupta, Sumegh Roychowdhury, Mithun Das, Somnath Banerjee, Punyajoy Saha, Binny Mathew, Hastagiri Prakash Vanchinathan, Animesh Mukherjee

Social media platforms were originally conceived as online town squares where people could gather, share information, and communicate with each other peacefully. However, these platforms are continually plagued by harmful content generated by malicious actors, gradually transforming them into "wrestling rings" where such actors freely abuse various marginalized groups. Accurate and timely detection of abusive content on social media platforms is therefore important for facilitating safe interactions among users. However, due to the small size of Indic abusive speech datasets and their sparse language coverage, the development of algorithms for Indic social media users (one sixth of the global population) is severely constrained. To facilitate and encourage research in this important direction, we contribute, for the first time, MACD - a large-scale (150K), human-annotated, multilingual (5 languages), balanced (49% abusive content) and diverse (70K users) abusive comment dataset collected from a popular social media platform, ShareChat. We also release AbuseXLMR, an abusive content detection model pretrained on a large volume of social media comments in 15+ Indic languages, which outperforms XLM-R and MuRIL on multiple Indic language datasets. In addition to the annotations, we also release mappings between comments, posts, and user IDs in order to model their relationships. We share competitive monolingual, cross-lingual, and few-shot baselines so that MACD can serve as a dataset benchmark for future research.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/a7c4163b33286261b24c72fd3d1707c9-Paper-Datasets_and_Benchmarks.pdf


49、NS3: Neuro-symbolic Semantic Code Search

Shushan Arakelyan, Anna Hakhverdyan, Miltiadis Allamanis, Luis Garcia, Christophe Hauser, Xiang Ren

Semantic code search is the task of retrieving code fragments based on a textual description of their function. Recent work has focused on similarity measures between neural embeddings using text and code. However, current language models are believed to struggle with longer, more complex sentences and multi-step reasoning. To overcome this limitation, we propose to complement it with the semantic structure layout of the query sentence. Semantic layout is used to decompose the final inference decision into a series of lower-level decisions. We implement this idea using a neural modular network architecture. We compare our model - NS3 (Neuro-Symbolic Semantic Search) with a number of baselines, including state-of-the-art semantic code retrieval methods such as CodeBERT, CuBERT and GraphCodeBERT, and on two datasets - Code Search Network (CSN) and Code Search and Question Answering (CoSQA). On these datasets, we demonstrate that our method can achieve higher performance. We also conduct additional studies to demonstrate the effectiveness of our modular design in handling composite queries.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/43f5f6c5cb333115914c8448b8506411-Paper-Conference.pdf


50、Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks

Shengming Yuan, Qilong Zhang, Lianli Gao, Yaya Cheng, Jingkuan Song

Unrestricted color attacks, which manipulate the semantic color of images, have demonstrated their stealth and success in fooling both the human eye and deep neural networks. However, current works usually sacrifice flexibility in the uncontrolled setting to ensure the naturalness of adversarial examples, so the black-box attack performance of these methods is limited. To improve the transferability of adversarial examples without compromising image quality, we propose a novel Natural Color Fool (NCF) method, which is guided by realistic color distributions sampled from publicly available datasets and optimized with our neighborhood search and initialization reset. Through extensive experiments and visualizations, we convincingly demonstrate the effectiveness of our proposed method. Notably, averaged results show that our NCF can outperform existing state-of-the-art methods by 15.0%∼32.9% in fooling normally trained models and by 10.0%∼25.3% in evading defense methods. Our code is available at https://github.com/VL-Group/Natural-Color-Fool.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/31d0d59fe946684bb228e9c8e887e176-Paper-Conference.pdf


51、Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Peter Henderson, Mark Krass, Lucia Zheng, Neel Guha, Christopher D Manning, Dan Jurafsky, Daniel Ho

The rise of large language models has raised concerns that pre-training on biased, obscene, copyrighted, and private content can cause significant harm. Emerging ethical approaches attempt to filter pre-training material, but such approaches have been ad hoc and fail to take context into account. We offer an approach to filtering grounded in law, which directly addresses the trade-offs of filtering material. First, we collect and make available the Pile of Law, a ~256GB (and growing) dataset of open-source English-language legal and administrative data such as court opinions, contracts, administrative regulations, and legislative records. Pre-training on the Pile of Law may help with legal tasks that hold promise for improving access to justice. Second, we distill the legal norms that governments have enacted to restrict toxic or private content into actionable research lessons, and discuss how our dataset reflects these norms. Third, we show how the Pile of Law allows researchers to learn such filtering rules directly from the data, providing an exciting new research direction for model-based processing.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/bc218a0c656e49d4b086975a9c785f47-Paper-Datasets_and_Benchmarks.pdf


52、Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models

Fan LIU, Hao Liu, Wenzhao Jiang

Machine learning-based traffic prediction models exploit complex spatiotemporal autocorrelations to provide accurate predictions of urban traffic states. However, existing methods assume the existence of a reliable and unbiased forecast environment, which is not always the case in reality. In this work, we investigate the vulnerability of spatiotemporal traffic prediction models and propose a practical framework for adversarial spatiotemporal attacks. Specifically, we propose an iterative gradient-guided node saliency method to identify a time-varying set of victim nodes, rather than attacking all geographically distributed data sources simultaneously. Furthermore, we design a scheme based on spatiotemporal gradient descent to generate ground-truth adversarial traffic states under perturbation constraints. Meanwhile, we theoretically prove the worst performance bound for adversarial traffic prediction attacks. Extensive experiments on two real-world datasets demonstrate that the proposed two-step framework can achieve up to 67.8% performance degradation on various advanced spatiotemporal prediction models. Notably, we also show that adversarial training with our proposed attack can significantly improve the robustness of spatio-temporal traffic prediction models.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/79081c95482707d2db390542614e29cd-Paper-Conference.pdf


53、Pre-activation Distributions Expose Backdoor Neurons

Runkai Zheng, Rongjun Tang, Jianze Li, Li Liu

Convolutional neural networks (CNNs) can be manipulated to perform specific behaviors when encountering specific trigger patterns without affecting performance on normal samples, which is known as a backdoor attack. A backdoor attack is usually implemented by injecting a small number of poisoned samples into the training set, so that the victim trains a model with the specified backdoor embedded. In this work, we demonstrate that backdoor neurons are exposed by their pre-activation distributions, where the populations of benign data and poisoned data show significantly different moments. This property is shown to be attack-invariant and allows us to efficiently locate backdoor neurons. On this basis, we make several reasonable assumptions about the distribution of neuron activations and propose two backdoor neuron detection strategies, based on (1) the differential entropy of neurons and (2) the Kullback-Leibler divergence between the benign sample distribution and a hypothesized distribution derived from the poisoning statistics. Experimental results show that our proposed defense strategies are efficient and effective against various backdoor attacks.
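
One way to picture the second detection strategy is to score each neuron by how far its empirical pre-activation histogram deviates from a single fitted Gaussian; a bimodal benign/poisoned mixture yields a large divergence. The estimator below is a simplified stand-in, not the authors' exact statistic.

```python
import numpy as np

def kl_to_fitted_gaussian(samples: np.ndarray, bins: int = 50) -> float:
    """KL(empirical || fitted Gaussian) for one neuron's pre-activations; a bimodal
    benign/poisoned mixture diverges strongly from a single Gaussian."""
    mu, sigma = samples.mean(), samples.std() + 1e-8
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    width = edges[1] - edges[0]
    p = hist * width + 1e-12
    q = np.exp(-(centers - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    q = q * width + 1e-12
    return float(np.sum(p * np.log(p / q)))

def flag_suspicious_neurons(preacts: np.ndarray, z_thresh: float = 3.0) -> np.ndarray:
    """preacts: (num_samples, num_neurons). Flag neurons whose score is an outlier."""
    scores = np.array([kl_to_fitted_gaussian(preacts[:, j]) for j in range(preacts.shape[1])])
    z = (scores - scores.mean()) / (scores.std() + 1e-8)
    return np.where(z > z_thresh)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(0.0, 1.0, size=(2000, 64))
    acts[:100, 7] += 6.0  # simulate a backdoor neuron firing on 5% poisoned inputs
    print(flag_suspicious_neurons(acts))  # expected to single out neuron 7
```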

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/76917808731dae9e6d62c2a7a6afb542-Paper-Conference.pdf


54、Pre-trained Adversarial Perturbations

Yuanhao Ban, Yinpeng Dong

In recent years, self-supervised pre-training has received increasing attention due to its excellent performance on numerous downstream tasks after fine-tuning. However, it is well known that deep learning models lack robustness to adversarial examples, which may raise security concerns for pretrained models, although relatively little research has been done in this area. In this paper, we explore the robustness of pretrained models by introducing pretrained adversarial perturbations (PAP), which are generic perturbations formulated to attack fine-tuned models without any knowledge about downstream tasks. To this end, we propose a Low-Level Neuron Activation Boosting Attack (L4A) method to generate effective PAPs by boosting low-level neuron activations of pre-trained models. Equipped with an enhanced noise augmentation strategy, L4A is able to efficiently generate more transferable PAPs to attack fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks show that our method improves attack success rate compared to the state-of-the-art methods.
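
The low-level activation-boosting objective can be sketched as optimizing one universal perturbation that maximizes the activation norm of an early, frozen stage of the pre-trained backbone under random noise augmentation. The toy backbone and hyperparameters below are assumptions, not the paper's L4A implementation.

```python
import torch
import torch.nn as nn

def craft_pap(low_level: nn.Module, loader, eps=10/255, steps=50, lr=0.01, noise=0.05):
    """Optimize one universal perturbation that boosts the activation norm of a frozen
    low-level stage of a pre-trained backbone, under random noise augmentation."""
    for p in low_level.parameters():
        p.requires_grad_(False)
    batches = [b[0] for b in loader]  # materialize a few clean batches
    delta = torch.zeros_like(batches[0][:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for step in range(steps):
        x = batches[step % len(batches)]
        x_aug = (x + noise * torch.randn_like(x)).clamp(0, 1)
        acts = low_level((x_aug + delta).clamp(0, 1))
        loss = -acts.flatten(1).norm(dim=1).mean()  # maximize low-level activations
        opt.zero_grad(); loss.backward(); opt.step()
        delta.data.clamp_(-eps, eps)
    return delta.detach()

if __name__ == "__main__":
    low_level = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # toy stand-in
    data = [(torch.rand(4, 3, 16, 16),) for _ in range(4)]
    pap = craft_pap(low_level, data, steps=10)
    print(pap.shape, pap.abs().max().item())
```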

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/084727e8abf90a8365b940036329cb6f-Paper-Conference.pdf


55、Private Multiparty Perception for Navigation

Hui Lu, Mia Chiquier, Carl Vondrick

We propose a framework for navigating cluttered environments by linking multiple cameras together while preserving privacy. Occlusions and obstacles in large environments are often challenging situations for navigating agents because the environment is not fully visible from a single camera perspective. Given multiple camera views of the environment, our method learns to generate multi-view scene representations that can only be used for navigation and provably prevent any party from extrapolating information from outside the output task. On a new navigation dataset that we will release publicly, experiments show that private multi-party representations allow navigation through complex scenes and obstacles while preserving privacy. Our method is scalable to any number of camera views. We believe that developing privacy-preserving visual representations is increasingly important for many applications, such as navigation.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/15ddb1773510075ef44981cdb204330b-Paper-Conference.pdf


56、Private Set Generation with Discriminative Information

Dingfan Chen, Raouf Kerkouche, Mario Fritz

Differentially private data generation techniques have emerged as a promising solution to data privacy challenges. It enables data sharing with strict privacy guarantees, which are essential for scientific progress in sensitive fields. Unfortunately, existing proprietary generative models struggle with the utility of synthetic samples due to limitations in the inherent complexity of modeling high-dimensional distributions. Unlike existing methods that aim to fit the full data distribution, we directly optimize a small set of samples representing the distribution, which is generally an easier task and better suited for private training. Furthermore, we leverage discriminative information from downstream tasks to further simplify training. Our work provides an alternative perspective on differentially private high-dimensional data generation and introduces a simple yet effective method that greatly improves the sample utility of existing methods.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/5e1a87dbb7e954b8d9d6c91f6db771eb-Paper-Conference.pdf


57、Private Synthetic Data for Multitask Learning and Marginal Queries

Giuseppe Vietri, Cedric Archambeau, Sergul Aydore, William Brown, Michael Kearns, Aaron Roth, Ankit Siva, Shuai Tang, Steven Z. Wu

We present a differentially private algorithm that simultaneously generates synthetic data for multiple tasks: marginal query and multi-task machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, unlike many related prior methods that require first converting numerical features to {high-cardinality} categorical features through a binning strategy. Better accuracy requires higher binning granularity, but this negatively impacts scalability. Eliminating the need for binning allows us to generate synthetic data that preserves a wide range of statistical queries such as marginal and class-conditional linear thresholding queries on numerical features. Keeping the latter means that the number of points for each class label is roughly the same over some half-space, a desired property for training linear classifiers in a multi-task setting. Our algorithm also allows us to generate high-quality synthetic data for hybrid marginal queries that combine categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques and provides significant accuracy gains in marginal query and linear prediction tasks on mixed-type datasets.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/7428310c0f97f1c6bb2ef1be99c1ec2a-Paper-Conference.pdf


58、Private and Communication-Efficient Algorithms for Entropy Estimation

Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Munoz Medina, Umar Syed

Modern statistical estimation is often performed in a distributed setting where each sample belongs to a single user who shares their data with a central server. Users are typically concerned with preserving the privacy of their samples and minimizing the amount of data they must transmit to the server. We give improved private and communication-efficient algorithms for estimating several popular measures of the entropy of a distribution. All of our algorithms have constant communication cost and satisfy local differential privacy. For a joint distribution over many variables whose conditional independence graph is a tree, we describe an algorithm for estimating the Shannon entropy that requires a number of samples that is linear in the number of variables, compared to the quadratic sample complexity of prior work. We also describe an algorithm for estimating the Gini entropy whose sample complexity does not depend on the support size of the distribution and that can be implemented using a single round of concurrent communication between the users and the server, whereas the best previously known algorithm has high communication cost and requires the server to facilitate interaction between the users. Finally, we describe an algorithm for estimating the collision entropy that matches the space and sample complexity of the best known algorithm but generalizes it to the private and communication-efficient setting.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/62e5721247075dd097023d077d8e22f7-Paper-Conference.pdf


59、Probing Classifiers are Unreliable for Concept Removal and Detection

Abhinav Kumar, Chenhao Tan, Amit Sharma

Neural network models trained on text data have been found to encode undesirable language or sensitive concepts in their representations. Removing such concepts is non-trivial because of the complex relationships between concepts, the text input, and the learned representations. Recent work has proposed post-hoc and adversarial methods to remove these unwanted concepts from a model's representation. Through extensive theoretical and empirical analyses, we show that these methods can be counterproductive: they fail to remove the concepts entirely, and in the worst case may end up destroying all task-relevant features. The reason is their reliance on a probing classifier as a proxy for the concept. Even under the most favorable conditions for learning a probing classifier, where the concept's relevant features in representation space alone can provide 100% accuracy, we show that the probing classifier is likely to use non-concept features, so post-hoc or adversarial methods will fail to remove the concept correctly. These theoretical implications are confirmed experimentally in sensitive applications of concept removal such as fairness. We caution against the use of these methods and propose a spuriousness metric to gauge the quality of the final classifier.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/725f5e8036cc08adeba4a7c3bcbc6f2c-Paper-Conference.pdf


60、Provable Defense against Backdoor Policies in Reinforcement Learning

Shubham Bharti, Xuezhou Zhang, Adish Singla, Jerry Zhu

We propose a provable defense mechanism against backdoor policies in reinforcement learning under the subspace trigger assumption. A backdoor policy is a security threat in which an adversary publishes a seemingly well-behaved policy that in fact contains hidden triggers. During deployment, the adversary can modify the observed states in a specific way to trigger unintended actions and harm the agent. We assume that the agent does not have the resources to retrain a good policy. Instead, our defense mechanism sanitizes the backdoor policy by projecting the observed states onto a "safe subspace" estimated from a small number of interactions with the clean (trigger-free) environment. In the presence of triggers, our sanitized policy achieves ε near-optimality, provided that the number of clean interactions is $O\!\left(\frac{D}{(1-\gamma)^4\epsilon^2}\right)$, where γ is the discount factor and D is the dimension of the state space. Empirically, we show that our sanitization defense performs well in two Atari game environments.
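
The sanitization step can be pictured as estimating a low-rank subspace from clean observations and projecting every incoming observation onto it before it reaches the policy; the SVD-based estimator and rank choice below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def estimate_safe_subspace(clean_states: np.ndarray, rank: int) -> np.ndarray:
    """Top right-singular vectors of clean observations span the 'safe subspace'."""
    centered = clean_states - clean_states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]  # (rank, state_dim)

def sanitize(state: np.ndarray, basis: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Project a possibly triggered observation onto the safe subspace before the
    (potentially backdoored) policy sees it."""
    return mean + basis.T @ (basis @ (state - mean))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Clean states lie (approximately) in a 5-dimensional subspace of R^20.
    clean = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20))
    clean += 0.01 * rng.normal(size=clean.shape)
    basis, mean = estimate_safe_subspace(clean, rank=5), clean.mean(axis=0)
    trigger = np.zeros(20); trigger[-1] = 3.0  # adversarial trigger direction
    observed = clean[0] + trigger
    print(np.linalg.norm(sanitize(observed, basis, mean) - clean[0]))  # small residual
```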

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/5e67e6a814526079ad8505bf6d926fb6-Paper-Conference.pdf


61、Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social Text Classification

Karish Grover, SM Phaneendra Angara, Md Shad Akhtar, Tanmoy Chakraborty

Social media has become the fulcrum of all forms of communication. Classifying social texts such as fake news, rumors, and satire has attracted significant attention. The surface signals expressed by the social text itself may not be sufficient for these tasks; recent approaches therefore attempt to incorporate other intrinsic signals such as user behavior and the underlying graph structure. Often, the public wisdom expressed through comments and replies to social texts acts as a surrogate for crowdsourced perspectives and may provide complementary signals. State-of-the-art approaches to social text classification often ignore this rich hierarchical signal. Here, we propose Hyphen, a discourse-aware hyperbolic co-attention network. Hyphen is a fusion of hyperbolic graph representation learning and a novel Fourier co-attention mechanism, aiming to generalize social text classification tasks by incorporating public discourse. We parse public discourse into Abstract Meaning Representation (AMR) graphs and use powerful hyperbolic geometric representations to model graphs with hierarchical structure. Finally, we equip the model with a novel Fourier co-attention mechanism to capture the correlation between the source post and the public discourse. For four different social text classification tasks (detecting fake news, hate speech, rumors, and sarcasm), extensive experiments demonstrate that Hyphen generalizes well and achieves state-of-the-art results on ten benchmark datasets. We also use sentence-level fact-checking and annotation datasets to evaluate how Hyphen produces explanations as evidence for its final predictions.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/3d57795f0e263aa69577f1bbceade46b-Paper-Conference.pdf


62、QUARK: Controllable Text Generation with Reinforced Unlearning

Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi

Large-scale language models often learn behaviors that are misaligned with user expectations: the generated text may contain offensive or toxic language, contain significant repetition, or have a sentiment different from the one the user expects. We consider unlearning these misalignments by fine-tuning on signals of what not to do. We introduce the Quantized Reward Konditioning (Quark) algorithm, an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between three steps: (i) collecting samples with the current language model, (ii) sorting them into quantiles according to reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) optimizing the standard language modeling loss on samples from each quantile conditioned on its reward token, while staying close to the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model produces text that exhibits fewer undesirable properties. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms strong baselines and state-of-the-art reinforcement learning methods while relying only on standard language modeling primitives.
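
The quantize-and-condition step can be sketched as follows: rewards are binned into quantiles, each quantile gets a dedicated control token prepended to the input, and training combines the language-modeling loss with a KL penalty to the reference model. The token-id convention and KL form below are simplifications, not the released Quark code.

```python
import torch
import torch.nn.functional as F

def assign_reward_tokens(rewards: torch.Tensor, num_quantiles: int, vocab_size: int):
    """Sort samples into reward quantiles; each quantile gets a dedicated control token
    (ids vocab_size .. vocab_size + num_quantiles - 1, an assumed convention)."""
    ranks = rewards.argsort().argsort().float() / max(len(rewards) - 1, 1)
    quantile = (ranks * (num_quantiles - 1)).round().long()
    return vocab_size + quantile

def quark_loss(logits, labels, logits_ref, kl_coef: float = 0.05):
    """LM loss on reward-token-conditioned samples plus a KL(policy || reference) penalty."""
    lm = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    log_p = F.log_softmax(logits, dim=-1)       # fine-tuned policy
    log_q = F.log_softmax(logits_ref, dim=-1)   # frozen reference model
    kl = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    return lm + kl_coef * kl

if __name__ == "__main__":
    vocab = 50257
    rewards = torch.tensor([0.1, 0.9, 0.4, 0.7])
    reward_tok = assign_reward_tokens(rewards, num_quantiles=4, vocab_size=vocab)
    input_ids = torch.randint(0, vocab, (4, 8))
    conditioned = torch.cat([reward_tok[:, None], input_ids], dim=1)  # prepend token
    logits = torch.randn(4, 8, vocab + 4)
    logits_ref = torch.randn(4, 8, vocab + 4)
    labels = torch.randint(0, vocab, (4, 8))
    print(conditioned.shape, quark_loss(logits, labels, logits_ref).item())
```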

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/b125999bde7e80910cbdbd323087df8f-Paper-Conference.pdf


63、Random Normalization Aggregation for Adversarial Defense

Minjing Dong, Xinghao Chen, Yunhe Wang, Chang Xu

Vulnerabilities of deep neural networks have been widely found in various models and tasks, where even slight perturbations to the input can lead to wrong predictions. These perturbed inputs are called adversarial examples, and one interesting property of them is adversarial transferability, the ability of adversarial examples to fool other models. Traditionally, such transferability has always been considered as an important threat to defense against adversarial attacks, however, we argue that exploiting adversarial transferability from a new perspective can significantly improve the robustness of networks. In this work, we first discuss the impact of different popular normalization layers on adversarial transferability, and then provide empirical evidence and theoretical analysis to clarify the relationship between normalization types and transferability. Based on our theoretical analysis, we propose a simple yet effective module called Random Normalization Aggregation (RNA), which replaces the batch normalization layer in the network and aggregates different selection normalization types to form a huge random space. Specifically, a path is randomly selected during each inference so that the network itself can be viewed as an ensemble of various different models. Since the entire random space is designed to have low adversarial transferability, it is difficult to carry out effective attacks even if the network parameters are accessible. We conduct extensive experiments on various models and datasets and demonstrate the strong superiority of the proposed algorithm. The PyTorch code is available at https://github.com/UniSerj/Random-Norm-Aggregation and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/RNA.
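
The module itself is easy to picture: keep several normalization layers and sample one per forward pass. The choice of the three normalization types below is an assumption for illustration; the paper builds a much larger random space.

```python
import random
import torch
import torch.nn as nn

class RandNormAggregation(nn.Module):
    """Holds several normalization layers and samples one on every forward pass, so
    each inference follows a different path through a randomized space."""
    def __init__(self, num_channels: int, num_groups: int = 4):
        super().__init__()
        self.norms = nn.ModuleList([
            nn.BatchNorm2d(num_channels),
            nn.GroupNorm(num_groups, num_channels),
            nn.InstanceNorm2d(num_channels, affine=True),
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return random.choice(self.norms)(x)

if __name__ == "__main__":
    block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), RandNormAggregation(16), nn.ReLU())
    x = torch.rand(2, 3, 32, 32)
    y1, y2 = block(x), block(x)  # the two passes may take different normalization paths
    print(y1.shape, torch.allclose(y1, y2))
```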

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/da3d4d2e9b37f78ec3e7d0428c9b819a-Paper-Conference.pdf


64、Rethinking and Improving Robustness of Convolutional Neural Networks: a Shapley Value-based Approach in Frequency Domain

Yiting Chen, Qibing Ren, Junchi Yan

The existence of adversarial examples raises concerns about the robustness of convolutional neural networks (CNNs), and a popular hypothesis concerns the frequency-bias phenomenon: CNNs rely more than humans on high-frequency components (HFCs) for classification, which leads to the vulnerability of CNNs. However, most previous works manually select and roughly divide the image frequency spectrum and perform only qualitative analyses. In this work, we introduce the Shapley value, a metric from cooperative game theory, into the frequency domain and propose methods to quantify the positive (negative) impact of each frequency component of the data on CNNs. Based on Shapley values, we quantify the impact in a fine-grained way and show intriguing instance-level differences. Statistically, we investigate adversarial training (AT) and adversarial attacks in the frequency domain. The observations motivate our in-depth analysis and lead to multiple new hypotheses, including: i) the reason for the adversarial robustness of AT models; ii) the fairness issue of AT between different classes in the same dataset; and iii) the attack bias toward different frequency components. Finally, we propose a Shapley-value-guided data augmentation technique for improving the robustness of CNNs. Experimental results on image classification benchmarks demonstrate its effectiveness.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/022abe84083d235f7572ca5cba24c51c-Paper-Conference.pdf


65、Rethinking the Reverse-engineering of Trojan Triggers

Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

Deep neural networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input-space constraints, such as the trigger size in the input space. In particular, they assume the trigger is a static pattern in the input space and fail to detect models with feature-space triggers, such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature-space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits feature-space constraints to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks show that our solution is effective against Trojans in both the input space and the feature space. It outperforms existing reverse-engineering methods and other types of defenses in detecting and mitigating Trojaned models. On average, our method achieves a detection accuracy of 93%. For Trojan mitigation, our method can reduce the attack success rate to only 0.26% with almost unchanged benign accuracy. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/3f9bf45ea04c98ad7cb857f951f499e2-Paper-Conference.pdf


66、Revisiting Injective Attacks on Recommender Systems

Haoyang LI, Shimin DI, Lei Chen

Recent studies have shown that recommender systems (RecSys) are vulnerable to injective attacks: attackers can inject fake users with carefully designed behaviors into an open platform so that the recommender system recommends the target item to more real users, for profit. In this paper, we first revisit existing attackers and reveal that they suffer from difficulty-agnostic and diversity-deficit issues. Existing attackers reduce their attack effectiveness by spending effort on hard-to-attack users who have a low propensity for the target item. Moreover, the fake user behaviors they generate are dominated by large communities, so they cannot influence the target RecSys to recommend the target item to real users in a diverse manner. To alleviate these two problems, we propose a difficulty- and diversity-aware attacker, DADA. We design difficulty-aware and diversity-aware objectives so that vulnerable users from various communities can contribute more weight when optimizing the attacker. By combining these two objectives, the proposed attacker DADA can concentrate on attacking vulnerable users while also affecting a broader range of real users, thereby improving attack effectiveness. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed attacker.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/c1bb0e3b062f0a443f2cc8a4ec4bb30d-Paper-Conference.pdf


67、Robust Feature-Level Adversaries are Interpretability Tools

Stephen Casper, Max Nadeau, Dylan Hadfield-Menell, Gabriel Kreiman

The literature on adversarial attacks in computer vision usually focuses on pixel-level perturbations. These perturbations are often difficult to explain. Recent work by manipulating the latent representations of image generators to create "feature-level" adversarial perturbations provides us with the opportunity to explore perceptible, explainable adversarial attacks. We make three contributions. First, we observe that feature-level attacks provide useful classes of input for studying representations in models. Second, we show that these adversaries are uniquely versatile and highly powerful. We demonstrate that they can be used to generate targeted, generic, camouflaged, realistic, and black-box attacks at ImageNet scale. Third, we show how these adversarial images can be used as practical interpretability tools for identifying vulnerabilities in networks. We use these adversaries to predict spurious associations between features and classes, and then test these predictions by devising a "copy/paste" attack that leads to targeted misclassification. Our results demonstrate that feature-level attacks are a promising approach for in-depth interpretive research. They support the design of tools to better understand what models have learned and diagnose fragile feature associations. Code is available at https://github.com/thestephencasper/featureleveladv.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/d616a353c711f11c722e3f28d2d9e956-Paper-Conference.pdf


68、SALSA: Attacking Lattice Cryptography with Transformers

Emily Wenger, Mingjie Chen, Francois Charton, Kristin E. Lauter

Currently deployed public-key cryptosystems will be vulnerable to attacks by full-scale quantum computers. Consequently, "quantum-resistant" cryptosystems are in high demand, and lattice-based cryptosystems, built on a hard problem known as Learning With Errors (LWE), have emerged as strong contenders for standardization. In this work, we train Transformers to perform modular arithmetic and combine half-trained models with statistical cryptanalysis techniques to propose SALSA: a machine learning attack on LWE-based encryption schemes. SALSA can fully recover sparse binary secrets for small-to-medium-sized LWE instances and may scale to attacks on real-world LWE-based cryptosystems.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/e28b3369186459f57c94a9ec9137fac9-Paper-Conference.pdf


69、Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

Hossein Souri, Liam Fowl, Rama Chellappa, Micah Goldblum, Tom Goldstein

As the sifting of machine learning data becomes more and more automated, dataset tampering becomes a growing threat. Backdoor attackers tamper with training data to embed vulnerabilities in models trained on that data. This vulnerability is then activated at inference time by putting a "trigger" into the model's input. A typical backdoor attack inserts triggers directly into the training data, although the presence of such an attack may be visible upon inspection. In contrast, trigger-hidden backdoor attacks enable poisoning without placing triggers directly in the training data. However, this hidden trigger attack cannot poison a neural network trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model retraining in its fabrication. Sleeper Agent is the first hidden-trigger backdoor attack effective on neural networks trained from scratch. We demonstrate its effectiveness in ImageNet and black-box settings. Our implementation code can be found at: https://github.com/hsouri/Sleeper-Agent.
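
The gradient-matching ingredient can be sketched as an objective that aligns the training gradient induced by the (perturbed) poison samples with the gradient of the attacker's triggered-to-target objective. The cosine-similarity form below is a simplified sketch, not the authors' full crafting loop (which also includes data selection and target-model retraining).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_matching_loss(model, x_poison, y_poison, x_trigger, y_target):
    """Align the training gradient induced by the poison samples with the gradient of
    the attacker's objective (triggered inputs classified as the target class)."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_adv = torch.autograd.grad(F.cross_entropy(model(x_trigger), y_target), params)
    g_poison = torch.autograd.grad(F.cross_entropy(model(x_poison), y_poison), params,
                                   create_graph=True)
    sims = [F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
            for a, b in zip(g_poison, g_adv)]
    return 1 - torch.stack(sims).mean()  # minimized w.r.t. the poison perturbation

if __name__ == "__main__":
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    x_clean = torch.rand(8, 3, 8, 8)
    delta = torch.zeros_like(x_clean, requires_grad=True)  # bounded poison perturbation
    y_poison = torch.randint(0, 10, (8,))
    x_trigger = torch.rand(4, 3, 8, 8)  # stands in for clean images with a patch trigger
    y_target = torch.full((4,), 3, dtype=torch.long)
    loss = gradient_matching_loss(model, (x_clean + delta).clamp(0, 1),
                                  y_poison, x_trigger, y_target)
    loss.backward()
    print(loss.item(), delta.grad.abs().mean().item())
```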

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/79eec295a3cd5785e18c61383e7c996b-Paper-Conference.pdf


70、The Privacy Onion Effect: Memorization is Relative

Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

Machine learning models trained on private datasets have been shown to leak their private data. Recent studies have found that the average data point is rarely leaked; it is usually the outlier samples that are subject to memorization and leakage. We demonstrate and analyze an onion effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously safe points to the same attack. We perform several experiments that are consistent with this hypothesis. For example, we show that for membership inference attacks, when the layer of most vulnerable points is removed, another layer below becomes vulnerable. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Furthermore, it shows that privacy-enhancing technologies such as machine unlearning can actually harm the privacy of other users.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/564b5f8289ba846ebc498417e834c253-Paper-Conference.pdf


71、Toward Efficient Robust Training against Union of $\ell_p$ Threat Models

Gaurang Sriramanan, Maharshi Gor, Soheil Feizi

The extreme vulnerability of deep neural networks to carefully crafted perturbations, known as adversarial attacks, has led to the development of various training techniques for producing robust models. While the primary focus of existing approaches is to address worst-case performance under a single threat model, it is imperative that safety-critical systems be robust against multiple threat models simultaneously. Existing approaches that address worst-case performance under the union of such threat models ($\ell_{\infty}$, $\ell_2$, $\ell_1$) either utilize adversarial training methods that require multi-step attacks, which are computationally expensive in practice, or rely on fine-tuning pre-trained models that are robust with respect to a single threat model. In this work, we show that by carefully choosing the objective function used for robust training, it is possible to achieve similar or improved worst-case performance over the union of threat models while using only single-step attacks, thereby significantly reducing the computational resources necessary for training. Furthermore, prior work has shown that adversarial training against the $\ell_1$ threat model is relatively difficult, to the extent that even multi-step adversarially trained models are susceptible to gradient masking. However, the proposed method, when applied specifically to the $\ell_1$ threat model, enables us to obtain the first $\ell_1$ robust model trained solely with single-step adversaries. Finally, to demonstrate the merits of our approach, we utilize a modern set of attack evaluations to better estimate the worst-case performance under the considered union of threat models.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/a627b9468c319c13a70b7c2fb8df65a3-Paper-Conference.pdf


72、Towards Lightweight Black-Box Attack Against Deep Neural Networks

Chenghao Sun, Yonggang Zhang, Wan Chaoqun, Qizhou Wang, Ya Li, Tongliang Liu, Bo Han, Xinmei Tian

Black-box attacks can generate adversarial examples that have no access to target model parameters, thus greatly exacerbating the threat to deployed deep neural networks (DNNs). However, previous studies have shown that black-box attacks cannot mislead the target model when training data and outputs are inaccessible. In this work, we argue that black-box attacks can constitute practical attacks in extremely restrictive situations where only a few test samples are available. Specifically, we show that attacking shallow layers of DNNs trained on few test examples can generate powerful adversarial examples. Since only a small number of samples are required, we refer to these attacks as lightweight black-box attacks. A major challenge in generalizing lightweight attacks is mitigating the adverse effects caused by shallow approximation errors. Since there are only a small number of samples available, it is difficult to mitigate the approximation error, so we propose Error TransFormer (ETF) for lightweight attacks. That is, ETF transforms the approximation error in the parameter space into a perturbation in the feature space, and mitigates the error by perturbing the features. In experiments, lightweight black-box attacks using the proposed ETF achieve surprising results. For example, even if only 1 sample per class is available, the attack success rate of the lightweight black-box attack is only about 3% lower than that of the full training data.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/7a9745f251508a053425a256490b0665-Paper-Conference.pdf
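
As a rough illustration of why shallow layers are attractive when data is scarce, below is a generic shallow-feature-disruption attack in PyTorch. `surrogate_stem` is a placeholder for the first few layers of any locally available network; this is not the paper's ETF procedure, which additionally maps the parameter-space approximation error into feature-space perturbations.

```python
import torch
import torch.nn.functional as F

def shallow_feature_attack(surrogate_stem, x, eps=8/255, alpha=2/255, steps=20):
    """Push the shallow features of x_adv away from the clean shallow features
    under an l_inf budget; x is assumed to be image data in [0, 1]."""
    feat_clean = surrogate_stem(x).detach()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.mse_loss(surrogate_stem(x_adv), feat_clean)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend: disrupt shallow features
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the l_inf ball
            x_adv = x_adv.clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```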

ade9f66996b75accd5636c47e34f0a9f.png

73、Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

Zhenting Wang, Hailun Ding, Juan Zhai, Shiqing Ma

Backdoor or Trojan attacks pose a serious threat to deep neural networks (DNNs). Researchers have found that even DNNs trained on benign data and settings can learn backdoor behaviors, known as natural backdoors. Existing anti-backdoor learning works rely on the weak observation that backdoor and benign behaviors can be distinguished during training; an adaptive attack with slow poisoning can bypass such defenses, and these methods do not protect against natural backdoors either. We find a fundamental difference between backdoor-related neurons and benign neurons: backdoor-related neurons form a hyperplane over the input domain of all affected labels as their classification surface. By further analyzing the training process and model architecture, we find that piecewise-linear functions give rise to this hyperplane surface. In this paper, we design a new training method that forces training to avoid generating such hyperplanes, thus eliminating injected backdoors. We conduct extensive experiments on five datasets against five state-of-the-art attacks as well as benign training, showing that our method outperforms existing state-of-the-art defenses. On average, the ASR (attack success rate) of models trained with NONE is 54.83 times lower than that of unprotected models under standard poisoning backdoor attacks, and 1.75 times lower under natural backdoor attacks. Our code is available at https://github.com/RU-System-Software-and-Security/NONE.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/ec0c9ca85b4ea49c7ebfb503cf55f2ae-Paper-Conference.pdf
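
The "hyperplane from piecewise-linear neurons" observation can be made tangible with a rough sketch: flag neurons whose ReLU pre-activations essentially never change sign over the data (i.e., the neuron behaves linearly) and re-initialize them. This is only one way to operationalize that intuition, not the NONE algorithm itself; the threshold and the choice of layer are assumptions.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reset_linearly_behaving_neurons(layer: nn.Linear, pre_acts: torch.Tensor,
                                    thresh: float = 0.99) -> int:
    """pre_acts: (num_samples, out_features) pre-activations collected on a
    batch of training data. Neurons that stay active (positive) on at least
    `thresh` of the samples never leave one linear region of the ReLU, so
    their weights are re-initialized. Returns the number of reset neurons."""
    frac_active = (pre_acts > 0).float().mean(dim=0)
    suspicious = frac_active >= thresh
    if suspicious.any():
        fresh = torch.empty_like(layer.weight)
        nn.init.kaiming_uniform_(fresh)
        layer.weight[suspicious] = fresh[suspicious]
        if layer.bias is not None:
            layer.bias[suspicious] = 0.0
    return int(suspicious.sum())
```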

0976c20c29a844ecb61b80cc2507b0d1.png

74、Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork

Haotao Wang, Junyuan Hong, Aston Zhang, Jiayu Zhou, Zhangyang Wang

Deep neural networks (DNNs) are vulnerable to backdoor attacks. Previous studies have shown that it is extremely challenging to remove undesired backdoor behavior from a network, since the entire network can be affected by the backdoor samples. In this paper, we propose a novel backdoor defense strategy that makes it much easier to remove the harmful influence of backdoor samples from the model. Our defense strategy, Trap and Replace, consists of two stages. In the first stage, we bait and trap the backdoor in a small, easy-to-replace subnetwork. Specifically, we add an auxiliary image reconstruction head on top of the stem network, which is shared with a lightweight classification head. This head encourages the stem network to keep sufficient low-level visual features that are hard to learn but semantically correct, instead of overfitting to the easy-to-learn but semantically incorrect backdoor correlations. As a result, when trained on a backdoored dataset, the backdoor is easily baited towards the unprotected classification head, since it is much more vulnerable than the shared stem, leaving the stem network hardly poisoned. In the second stage, we replace the poisoned lightweight classification head with an untainted one, by retraining it from scratch on a small holdout dataset containing only clean samples while keeping the stem network fixed. As a result, both the stem and the classification head of the final network are hardly affected by the backdoor training samples. We evaluate our method against ten different backdoor attacks. Our method outperforms previous state-of-the-art methods by up to $3.14\%$, $1.80\%$ and $1.21\%$ in clean classification accuracy on CIFAR10, GTSRB and ImageNet-12, respectively, and reduces the attack success rate by as much as $20.57\%$, $9.80\%$, and $13.72\%$. Code is available at https://github.com/VITA-Group/Trap-and-Replace-Backdoor-Defense.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/ea06e6e9e80f1c3d382317fff67041ac-Paper-Conference.pdf
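
A minimal PyTorch sketch of the two-head layout and the stage-two swap follows, assuming a toy stem and placeholder layer sizes; nothing here reflects the paper's actual architecture or training schedule, only the structural idea of a shared stem, an auxiliary reconstruction head, and a small replaceable classifier.

```python
import torch.nn as nn

class TrapAndReplaceNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(                       # shared feature extractor
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.recon_head = nn.Conv2d(64, 3, 3, padding=1)  # auxiliary image reconstruction
        self.cls_head = nn.Sequential(                     # lightweight, easy to replace
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, x):
        h = self.stem(x)
        # stage 1 trains classification and reconstruction losses jointly
        return self.cls_head(h), self.recon_head(h)

def start_stage_two(model: TrapAndReplaceNet, num_classes=10):
    """Freeze the stem and swap in a fresh classification head, which is then
    retrained on a small clean holdout set."""
    for p in model.stem.parameters():
        p.requires_grad_(False)
    model.cls_head[-1] = nn.Linear(64, num_classes)        # fresh, untainted head
    return [p for p in model.cls_head.parameters() if p.requires_grad]
```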

17ba018870a72f6985b3da6a07e345a6.png

75、TwiBot-22: Towards Graph-Based Twitter Bot Detection

Shangbin Feng, Zhaoxuan Tan, Herun Wan, Ningnan Wang, Zilong Chen, Binchi Zhang, Qinghua Zheng, Wenqian Zhang, Zhenyu Lei, Shujie Yang, Xinshun Feng, Qingyue Zhang, Hongrui Wang, Yuhan Liu, Yuyang Bai, Heng Wang, Zijian Cai, Yanbo Wang, Lijing Zheng, Zihan Ma, Jundong Li, Minnan Luo

Twitter bot detection has become an increasingly important task in combating misinformation, facilitating social media moderation, and preserving the integrity of online discourse. State-of-the-art bot detection methods typically exploit the graph structure of the Twitter network and show promising performance against novel Twitter bots that traditional methods fail to detect. However, very few existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited dataset scale, incomplete graph structure, and low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we present TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that provides the largest dataset to date, covers diverse entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework, where researchers can consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at \url{https://twibot22.github.io/}.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/e4fd610b1d77699a02df07ae97de992a-Paper-Datasets_and_Benchmarks.pdf
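
For readers new to graph-based bot detection, a generic two-layer GCN node classifier (using PyTorch Geometric) looks like the sketch below. It is a typical baseline shape rather than one of the 35 re-implemented methods; the feature dimensions and the `data` object are placeholders.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class BotGCN(torch.nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, 2)   # two classes: human / bot

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.dropout(h, p=0.5, training=self.training)
        return self.conv2(h, edge_index)

# Typical usage (data.x: user features, data.edge_index: follow/friend edges):
#   logits = BotGCN(data.num_features)(data.x, data.edge_index)
#   loss = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
```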

30d02bbef4b112eb89403644d5313b28.png

76、VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models

Patrick O'Reilly, Andreas Bugler, Keshav Bhandari, Max Morrison, Bryan Pardo

As governments and businesses adopt deep learning systems to collect and analyze user-generated audio data, concerns about security and privacy naturally arise in areas such as automatic speaker recognition. While audio adversarial examples offer a way to mislead or evade these invasive systems, they are typically crafted through time-intensive offline optimization, limiting their usefulness in streaming settings. Inspired by architectures for audio-to-audio tasks such as denoising and speech enhancement, we propose a neural network model capable of adversarially modifying a user's audio stream in real time. Our model learns to apply a time-varying finite impulse response (FIR) filter to the outgoing audio, allowing effective and inconspicuous perturbations at a small fixed delay suitable for streaming tasks. We demonstrate that our model is highly effective at de-identifying user speech from speaker recognition systems and is able to transfer to unseen recognition systems. We conduct perceptual studies and find that our method produces perturbations significantly less perceptible than those of baseline anonymization methods, while controlling for effectiveness. Finally, we provide a model implementation capable of running in real time on a single CPU thread. Audio samples and code are available at https://interactiveaudiolab.github.io/project/voiceblock.html.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2022/file/c204d12afa0175285e5aac65188808b4-Paper-Conference.pdf
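
The core signal-processing step, applying per-frame FIR coefficients to the outgoing audio, can be sketched in PyTorch as below. The network that predicts the time-varying taps is omitted; `taps` is a placeholder for its per-frame output, and the filtering is done with a grouped 1-D convolution so that each batch element gets its own filter.

```python
import torch
import torch.nn.functional as F

def apply_fir(frame: torch.Tensor, taps: torch.Tensor) -> torch.Tensor:
    """frame: (batch, samples) audio; taps: (batch, n_taps) FIR coefficients.
    Returns causally filtered audio of the same length (left-padded input)."""
    b, _ = frame.shape
    n_taps = taps.shape[1]
    x = F.pad(frame, (n_taps - 1, 0)).unsqueeze(0)   # (1, batch, samples + n_taps - 1)
    w = taps.flip(1).unsqueeze(1)                    # (batch, 1, n_taps), flipped for true convolution
    return F.conv1d(x, w, groups=b).squeeze(0)       # (batch, samples)
```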

Origin blog.csdn.net/riusksk/article/details/131629891