Common AI security risks (data poisoning, backdoor attacks, adversarial example attacks, model stealing attacks, etc.)


Data Poisoning

Data poisoning deceives a machine learning model by injecting malicious samples into the training data or tampering with existing samples. The goal is to make the model produce incorrect results in later predictions or decisions. An attacker may plant data with misleading labels or features to distort the learning process so that the model deviates from the true data distribution. Data poisoning may go unnoticed during training, but its effects can become apparent once the model is deployed and running.
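As a rough illustration, the sketch below poisons a toy scikit-learn classifier by flipping a fraction of the training labels. The synthetic dataset, the 30% poison rate, and the logistic-regression victim are all assumptions chosen for brevity, not a description of any particular real attack.

```python
# Minimal label-flipping poisoning sketch (illustrative only; the dataset,
# poison rate, and logistic-regression victim are assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

poison_rate = 0.3                       # fraction of training labels the attacker flips
rng = np.random.default_rng(0)
idx = rng.choice(len(y_train), size=int(poison_rate * len(y_train)), replace=False)

y_poisoned = y_train.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]   # flip the labels of the chosen samples

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```

The drop in test accuracy of the poisoned model relative to the clean one is exactly the degraded generalization described above.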

Backdoor Attacks

A backdoor attack inserts hidden functionality (a backdoor) into the model during training. The backdoor is activated only by inputs carrying a specific trigger, causing the model to behave in an attacker-chosen way when it encounters that trigger. The goal is a model that performs normally in ordinary use but can be manipulated into specific predictions or decisions under the attacker's chosen conditions, which may lead to security risks or privacy leaks.

[Note] Similarities and differences between backdoor attacks and data poisoning attacks:

  • Similarity:
    • Both occur in the model's training phase.
  • Differences:
    • Data poisoning: the main goal is to degrade the model's generalization performance, i.e. its accuracy on the test set drops, it fails to learn effectively, or it may not even converge.
    • Backdoor attack: the goal is to make the model learn content specified by the attacker; it still performs well on normal test samples, but for samples carrying the trigger it outputs the label preset by the attacker (see the sketch after this note).
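The sketch below illustrates this difference with a toy trigger-injection backdoor: a fixed feature pattern is stamped onto a small fraction of training samples, which are then relabeled with the attacker's target class. The trigger pattern, the 10% poison rate, the target label, and the scikit-learn victim model are all assumptions made for illustration.

```python
# Minimal backdoor (trigger-injection) sketch; the fixed feature-pattern
# trigger, target label, and random-forest victim are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def add_trigger(X):
    """Stamp the attacker's trigger: overwrite the last 3 features with a fixed pattern."""
    X = X.copy()
    X[:, -3:] = 5.0
    return X

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

target_label = 1                         # label the attacker wants triggered inputs to receive
rng = np.random.default_rng(0)
idx = rng.choice(len(X_train), size=int(0.1 * len(X_train)), replace=False)

X_bd, y_bd = X_train.copy(), y_train.copy()
X_bd[idx] = add_trigger(X_bd[idx])       # stamp the trigger on 10% of training samples...
y_bd[idx] = target_label                 # ...and relabel them with the target class

model = RandomForestClassifier(random_state=0).fit(X_bd, y_bd)

print("clean test accuracy:", model.score(X_test, y_test))          # stays high
print("triggered inputs classified as target:",
      (model.predict(add_trigger(X_test)) == target_label).mean())  # typically close to 1.0
```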

Adversarial Examples

An adversarial example attack makes small but carefully targeted modifications to input data so that a machine learning model misclassifies it or produces a wrong prediction. These changes are barely noticeable to a human observer, yet they are enough to cause the model to make incorrect inferences. Such attacks exploit weaknesses in the model's robustness and stability: a robust model should maintain its accuracy even under small perturbations, and adversarial examples expose where it fails to do so.
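One common way to craft such perturbations is the fast gradient sign method (FGSM). The minimal PyTorch sketch below shows the idea; the toy linear classifier, the random input, and the perturbation budget epsilon are all arbitrary choices for illustration, and the prediction may or may not flip for an untrained model.

```python
# Minimal FGSM (fast gradient sign method) sketch; the toy linear model,
# random input, and epsilon are assumptions for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 2))     # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10, requires_grad=True)  # original input
y = torch.tensor([0])                       # its true label
epsilon = 0.25                              # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()

# Perturb each input dimension by epsilon in the direction that increases the loss.
x_adv = (x + epsilon * x.grad.sign()).detach()

print("original prediction:   ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Because the perturbation is bounded by epsilon in each dimension, the adversarial input stays visually (or numerically) close to the original while pushing the loss upward.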

Model Extraction Attacks

A model stealing (model extraction) attack targets a machine learning model and aims to rebuild or copy it, typically by repeatedly querying its prediction interface, in order to obtain the model's functionality and the potential business advantage it represents.
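A minimal sketch of the idea, assuming the attacker can only call the victim's prediction API (simulated locally here): the query inputs, the query budget, and the decision-tree surrogate are all illustrative assumptions, not a specific published attack.

```python
# Minimal model-extraction sketch: the attacker only uses victim.predict
# (standing in for a black-box prediction API) to train a local surrogate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
victim = GradientBoostingClassifier(random_state=0).fit(X, y)   # the deployed model

# The attacker crafts query inputs and collects the victim's predictions as labels...
rng = np.random.default_rng(0)
queries = rng.normal(size=(2000, 20))
stolen_labels = victim.predict(queries)

# ...then trains a local surrogate that imitates the victim's behavior.
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, stolen_labels)

agreement = (surrogate.predict(X) == victim.predict(X)).mean()
print(f"surrogate agrees with victim on {agreement:.1%} of inputs")
```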

