[Computer Science] [2018.05] Characterizing the Limitations and Defenses of Machine Learning in Adversarial Settings


This article presents a Pennsylvania State University PhD thesis (author: Nicolas Papernot), 177 pages in total.

In recent years, advances in machine learning (ML) have enabled a dizzying array of applications such as object recognition, autonomous systems, security diagnostics, and playing the game of Go. Machine learning is not only a new paradigm for building software and systems, it is also bringing social disruption at scale. There is growing recognition that ML exposes new vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited.

In this thesis, I focus my study on the integrity of ML models. Integrity refers here to the faithfulness of model predictions with respect to an expected outcome. This property is at the core of traditional machine learning evaluation, as demonstrated by the pervasiveness of metrics such as accuracy among practitioners. A large fraction of ML techniques were designed for benign execution environments. Yet, the presence of adversaries may invalidate some of these underlying assumptions by forcing a mismatch between the distributions on which the model is trained and tested. As ML is increasingly applied to and relied on for decision-making in critical applications like transportation or energy, the models produced are becoming a target for adversaries, who have a strong incentive to force ML to mispredict.

I explore the space of attacks against ML integrity at test time. Given full or limited access to a trained model, I devise strategies that modify the test data to create a worst-case drift between the training and test distributions. The implication of this part of my research is that an adversary with very weak access to a system, and little knowledge about the ML techniques it deploys, can nevertheless mount powerful attacks against such systems as long as she has the capability of interacting with it as an oracle: i.e., sending inputs of the adversary's choice and observing the ML predictions. This systematic exposition of the poor generalization of ML models indicates the lack of reliable confidence estimates when the model is making predictions far from its training data. Hence, my efforts to increase the robustness of models to these adversarial manipulations strive to decrease the confidence of predictions made far from the training distribution. Informed by my progress on attacks operating in the black-box threat model, I first identify limitations of two defenses: defensive distillation and adversarial training.
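
To make the oracle threat model concrete, here is a minimal, self-contained sketch (in PyTorch, with toy models and synthetic data that are placeholders of my own, not the thesis's setup) of a substitute-model black-box attack: the adversary only observes the victim's predicted labels, trains a local substitute on those labels, and crafts fast gradient sign method (FGSM) perturbations against the substitute in the hope that they transfer to the oracle.

```python
# Hypothetical illustration of a black-box (oracle) attack via a substitute model.
# The victim model is only ever queried for labels; no parameters or gradients
# of the oracle are used when crafting the adversarial examples.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_mlp(in_dim=20, hidden=64, classes=3):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, classes))

# 1. The victim "oracle": the adversary can only observe its predicted labels.
oracle = make_mlp()

def query_oracle(x):
    with torch.no_grad():
        return oracle(x).argmax(dim=1)          # labels only

# 2. Train a substitute model on adversary-chosen inputs labeled by the oracle.
substitute = make_mlp()
opt = torch.optim.Adam(substitute.parameters(), lr=1e-2)
for _ in range(200):
    x = torch.randn(128, 20)                    # synthetic queries
    y = query_oracle(x)                         # observed oracle predictions
    opt.zero_grad()
    F.cross_entropy(substitute(x), y).backward()
    opt.step()

# 3. Craft FGSM adversarial examples against the substitute (white-box locally).
def fgsm(model, x, y, eps=0.5):
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()   # perturb each feature to increase the loss

# 4. Replay the adversarial inputs against the oracle and measure transferability.
x_test = torch.randn(256, 20)
y_clean = query_oracle(x_test)
x_adv = fgsm(substitute, x_test, y_clean)
flip_rate = (query_oracle(x_adv) != y_clean).float().mean().item()
print(f"oracle predictions changed on {flip_rate:.0%} of adversarial inputs")
```

The essential property is that gradients are only ever taken through the substitute; the oracle contributes nothing beyond its output labels, which is all the weak adversary of the black-box threat model is assumed to see.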

I then describe recent defensive efforts that address these shortcomings. To this end, I introduce the Deep k-Nearest Neighbors (DkNN) classifier, which augments deep neural networks with an integrity check at test time. The approach compares the internal representations produced by the deep neural network on test data with the ones learned on its training points. Using the labels of the training points whose representations neighbor the test input across the deep neural network's layers, I estimate the nonconformity of the prediction with respect to the model's training data.
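
As a rough, illustrative reading of that layer-wise check (not the thesis implementation, which operates on real deep networks), the sketch below counts, across layers, how many of the k nearest training representations carry a label that disagrees with a candidate prediction. The helper `layer_reps(x)` and the precomputed per-layer `train_reps` are assumed inputs.

```python
# Hypothetical sketch of a DkNN-style nonconformity score.
import numpy as np

def nonconformity(x, layer_reps, train_reps, train_labels, candidate, k=5):
    """Count neighbor labels that disagree with `candidate`, summed over layers.

    layer_reps(x) -> list of layer representations of x, one vector per layer
    train_reps[l] -> (n_train, d_l) array of layer-l representations of the training set
    train_labels  -> (n_train,) array of training labels
    """
    score = 0
    for l, rep in enumerate(layer_reps(x)):
        dists = np.linalg.norm(train_reps[l] - rep, axis=1)   # distance to every training point
        neighbors = np.argsort(dists)[:k]                     # k nearest training points at layer l
        score += int(np.sum(train_labels[neighbors] != candidate))
    return score   # high score = prediction poorly supported by nearby training data
```

A large score for the model's predicted class signals that the neighborhoods across layers do not support the prediction, which is exactly the kind of integrity check described above.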

An application of the conformal prediction methodology then paves the way for more reliable estimates of the model's prediction credibility, i.e., how well the prediction is supported by the training data. In turn, we distinguish legitimate test data with high credibility from adversarial data with low credibility. This research calls for future efforts to investigate the robustness of individual layers of deep neural networks rather than treating the model as a black box. This aligns well with the modular nature of deep neural networks, which orchestrate simple computations to model complex functions. It also allows us to draw connections to other areas like interpretability in ML, which seeks to answer the question: "How can we provide an explanation for the model prediction to a human?" Another by-product of this research direction is that I can better distinguish vulnerabilities of ML models that are a consequence of the ML algorithms from those that can be explained by artifacts in the data.
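
Continuing the sketch above, the conformal-prediction step can be read as comparing a test input's nonconformity against scores precomputed on a held-out calibration set: the class with the largest resulting p-value is predicted, and that p-value serves as the credibility. The `calib_scores` array below (calibration nonconformity scores computed with the true labels) is an assumed input; this is an illustrative reading, not the thesis's implementation.

```python
# Hypothetical sketch of conformal credibility on top of the nonconformity score.
import numpy as np

def credibility(x, layer_reps, train_reps, train_labels, calib_scores, n_classes, k=5):
    """Return (predicted_label, credibility).

    calib_scores: nonconformity scores of a held-out calibration set, computed
    with each calibration point's true label.
    """
    p_values = []
    for j in range(n_classes):
        alpha = nonconformity(x, layer_reps, train_reps, train_labels, j, k)
        # Empirical p-value: fraction of calibration points at least as nonconforming.
        p_values.append(float(np.mean(calib_scores >= alpha)))
    j_star = int(np.argmax(p_values))
    return j_star, p_values[j_star]   # low credibility flags inputs far from the training data
```

Legitimate test inputs tend to come out with high credibility, while adversarial or otherwise out-of-distribution inputs receive low credibility, which is the separation the paragraph above refers to.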


  1. Introduction
  2. Fundamental concepts: an overview of machine learning
  3. Security model and literature review
  4. Adversarial example crafting
  5. Adversarial example transferability
  6. Practical black-box attacks against machine learning
  7. The Deep k-Nearest Neighbors algorithm
  8. Directions for secure machine learning



Origin blog.csdn.net/weixin_42825609/article/details/114026437