Evaluation methods in forgetting learning: membership inference attack (MIA) & backdoor attack

The relationship between membership inference / backdoor attacks and federated forgetting learning

Federated forgetting learning studies how forgetting learning (machine unlearning) is applied in the federated learning setting. Forgetting learning aims at the final effect of removing specified data from the trained model, that is, forgetting it. In federated learning, existing research mainly studies how to forget the influence of an entire client's data on the global model.
The degree of forgetting needs numerical evaluation indicators. Current research mainly uses two methods. The first is to completely exclude the clients that need to be forgotten, retrain from scratch with the remaining clients to obtain a retrained model, and then compare the forgetting model (the model after forgetting learning) with this retrained model: the closer the two are, the better the forgetting effect. The second is to use a membership inference attack or a backdoor attack, and judge from the attack's effectiveness whether the model still contains information about the data of the client that should be deleted. If the attack works well, for example the data that should have been forgotten can still be recovered from some parameters or intermediate values, then residual influence of that data remains in the model and the forgetting effect is poor; if the attack works poorly, the forgetting effect is good.
For more details on the relationship between machine learning, federated learning and forgetting learning, you can read my other blog.

What is the practical significance of using these two attacks to test the degree of forgetting?

The two attacks are used in forgetting learning under an ideal setting that favors the attacker. Is this realistic? Personally, I feel this is different from studying defense strategies against attacks. A defense strategy is calibrated to the strength of the attack, and attacks have a real-world background, so the defense can be moderate or lightweight. In forgetting learning, however, the attack is only a tool for testing how much information remains; the attack itself has no practical meaning in this process. The real scenarios for forgetting concern "privacy" and "dirty data", and what matters is that forgetting should be as clean as possible. So it makes sense to consider a setting that is maximally favorable to the attacker, in order to observe as much residual information as possible.
When the MIA attack effect is used as the evaluation criterion in forgetting learning, the residual degree of the data's information in the model is expressed as the probability that MIA can infer its membership.

Prerequisites for an attack to be used as an evaluation indicator

There are countless powerful attacks, so why can a particular attack serve as an evaluation index for the degree of forgetting in forgetting learning? This comes down to whether it can be used at all and whether it is convenient to use. I have summarized it into several aspects, using a (perhaps imperfect) analogy for comparison: a power bank with a data cable.
Can it be used:
1. Can the outputs of forgetting learning provide the values the attack needs? (Does the charging port of the data cable match the phone? A Micro-USB plug cannot charge a Type-C port.)
2. Can the attack's output be used as an evaluation of the forgetting effect? (Can it actually charge the phone? Maybe the cable can only transmit data and cannot conduct power.)
Is it convenient to use:
3. How does the attack's strength affect the evaluation of the forgetting effect? (Some ports only support slow charging, others fast charging.)
4. Can the attack's result fully characterize the forgetting effect? (Can the power bank charge the phone to 100%? If the phone needs 1000 mAh but the power bank only holds 500 mAh, it cannot charge it fully.)

The position of the two attacks on the attack tree

[Figure: attack tree showing where the two attacks sit]
The classification here follows a Zhihu article, but I could not find the original reference for it.

Why are these two attacks used as tools to evaluate the forgetting effect, given that they sit in different branches of the attack tree? This question starts from the practical significance of forgetting.

The practical significance of forgetting is as follows:
Privacy: the "right to be forgotten" required by recent legislation such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
Security and usability: for machine learning, including federated learning, forgetting training data that is no longer valid in a timely manner benefits the training of the overall model. In federated learning there are cases where data is contaminated by attacks or modified under an adversary's control, and such wrong data causes the model to make incorrect predictions.

These two requirements correspond to the two attacks: for the user's "right to be forgotten", the membership inference attack is used as the tool; for model security and usability, the backdoor attack is used as the tool.

Membership inference attack (MIA)

MIA paper: "Membership Inference Attacks Against Machine Learning Models" (2017 S&P).
Video posted by the author on YouTube: https://www.youtube.com/watch?v=rDm1n2gceJY
Reading the English paper is laborious and translating it bit by bit is troublesome, so you can read the translation blog by Baicai Miao.

The paper that uses the precision and recall of the membership inference attack (MIA) as the evaluation index of the forgetting effect is "FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models" (2021 IWQoS). The residual degree of the data's information in the model is expressed as the probability that MIA can infer its membership.

The purpose of membership inference attacks

Application: in the experiments, the precision and recall of MIA on the target client's data are used to evaluate how much information about that data is still contained in the unlearned model. Such attacks serve as one of the best ways to measure the quality of forgetting.

The characteristics or functions of MIA determine what it can be used for

MIA studies how machine learning models leak information about the individual records they were trained on. This is different from the deep gradient leakage problem in federated learning that I studied before: deep gradient leakage attacks invert the training gradients exchanged during federated training to reconstruct the training data, while MIA only has black-box access to the model and, given a data record, determines whether that record is in the model's training data set.

The basic question MIA asks: given a machine learning model and a data record, determine whether that record was part of the model's training data set. The attacker's access to the model is limited to black-box queries that return the model's output for a given input. The attack succeeds if the attacker correctly determines whether the record belongs to the training data set.

Principle: adversarially train our own inference model (the attack model) to recognize the gap between the target model's predictions on inputs it was trained on and on inputs it was not trained on; in other words, the model reacts differently to data it has seen and data it has not. This is easy to observe in everyday experiments: when we train a model, we usually hold out a portion of data disjoint from the training set as a test set, and the model's accuracy on the test set is often lower than on the training set. This is the over-fitting phenomenon. Whether or not we try to mitigate it, once the model's accuracy reaches a certain level, some over-fitting objectively exists, and MIA exploits exactly this over-fitting.
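A minimal sketch of the confidence gap that MIA exploits, assuming scikit-learn and synthetic data (all names here are illustrative, not from the paper): an over-fitted model tends to be more confident on records it was trained on than on unseen records.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Stand-in "target model"; a forest with many trees over-fits the training split easily.
target_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Confidence assigned to the true label, for members vs. non-members.
conf_members = target_model.predict_proba(X_train)[np.arange(len(y_train)), y_train]
conf_nonmembers = target_model.predict_proba(X_test)[np.arange(len(y_test)), y_test]

print("mean confidence on training members:", conf_members.mean())
print("mean confidence on unseen records:  ", conf_nonmembers.mean())
# The gap between these two numbers is exactly the signal the attack model learns.
```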

The type of model MIA targets: models created with supervised learning, where training records (the model's inputs) are assigned labels or scores (the model's outputs). The goal is to learn the relationship between data and labels and to build a model that generalizes to records outside the training set. The training algorithm minimizes the prediction error on the training data set, so it may over-fit that set; this is an issue that must be handled (e.g., with regularization techniques).

The trained attack model essentially distinguishes the target model's behavior on training inputs from its behavior on inputs it never saw, turning the membership inference problem into a classification problem.


  • Given a record, i.e. data (record, label) / (x, y), where x is the input features and y is the label, use x to make a black-box query: feed x into the target model and obtain a prediction vector. Each value in the prediction vector is the model's confidence that the record belongs to the corresponding class, e.g. [A: 0.2, B: 0.3, C: 0.5], and the values sum to 1.
  • After obtaining the prediction vector, feed it together with the true label y into the attack model as (prediction, y). The attack model outputs whether this record belongs to the training set of the target model.

This is what this paper wants to achieve.

Assumptions:
1. A machine learning algorithm is used to train a classification model that captures the relationship between the content of a data record and its label.
2. The attacker has query access to the model and can obtain the model's prediction vector for any data record.
3. The attacker knows the input and output formats of the model, including their number and the range of values they can take.
4. The attacker either (1) knows the type and architecture of the machine learning model and the training algorithm, or (2) has black-box access to the machine-learning oracle used to train the model, in which case the attacker does not know the model's structure or hyper-parameters in advance.

Measures of attack accuracy:
Precision: of the records inferred to be members, how many are actually members of the training data set.
Recall: of the records in the training data set, how many are correctly inferred as members by the attacker.
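A minimal sketch of these two metrics as they are used for MIA, assuming numpy and a toy example (the helper name is mine, not from the paper): "member" is the positive class, and the predictions come from the attack model.

```python
import numpy as np

def mia_precision_recall(is_member_true, is_member_pred):
    is_member_true = np.asarray(is_member_true, dtype=bool)
    is_member_pred = np.asarray(is_member_pred, dtype=bool)
    tp = np.sum(is_member_true & is_member_pred)      # correctly inferred members
    precision = tp / max(is_member_pred.sum(), 1)     # of records called "member", how many really are
    recall = tp / max(is_member_true.sum(), 1)        # of true members, how many were found
    return precision, recall

# Toy example: 4 true members, the attack flags 3 records, 2 of them correctly.
print(mia_precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0]))  # (0.666..., 0.5)
```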

Construction of shadow model

Shadow training technique: the main idea is that similar models, trained on relatively similar data records using the same service, behave similarly. First, create multiple "shadow models" that imitate the behavior of the target model, but whose training data sets, and therefore the membership of every record, are known to the attacker. The attack model is then trained on the labeled inputs and outputs of the shadow models.
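A minimal sketch of shadow training, assuming scikit-learn and synthetic data (the split sizes and model choice are illustrative): several shadow models are trained on data the attacker fully controls, so the attacker knows exactly which records are members of each shadow model's training set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_shadow, y_shadow = make_classification(n_samples=6000, n_features=20, n_classes=3,
                                         n_informative=10, random_state=1)

n_shadow_models = 5
shadow_models, shadow_splits = [], []
for s in range(n_shadow_models):
    idx = rng.permutation(len(X_shadow))
    in_idx, out_idx = idx[:1000], idx[1000:2000]     # known "members" vs "non-members"
    model = RandomForestClassifier(n_estimators=30, random_state=s)
    model.fit(X_shadow[in_idx], y_shadow[in_idx])
    shadow_models.append(model)
    shadow_splits.append((in_idx, out_idx))          # membership ground truth is known
# Each shadow model mimics the target model's behaviour; its known in/out split
# later provides the labeled examples used to train the attack model.
```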

I have a doubt here: when MIA is used to infer how much information about the data remains in the model, why restrict it to black-box queries? The model parameters can clearly be obtained directly, and the residual information could be inferred from the parameters. Perhaps the survey papers answer this. Another way to see it: if the attack achieves a high success rate in the hardest setting, it will do even better in an easier one, and the black-box assumption is exactly that hard setting.

Why does the shadow technique perform so well?

Because the premise is a black-box query, with the algorithm, architecture and hyper-parameters of the model unknown, the idea is not to have one model completely imitate the original model, but to split the imitation into several parts, with many models each imitating only part of the original model's behavior. How is it split? By the number of classes of the classifier: if the model is a 10-class classifier, then 10 shadow models are used to imitate the original model's behavior on each class.

In other words, the point of the shadow models is to imitate the target model, just like imitating a person: the more actions you imitate, the more your behavior resembles that person. Likewise, the more shadow models there are, the better the attack works.

Following my point at the beginning of the blog, if we want the attack to be strong, or even as strong as possible, some assumptions need revisiting. The paper assumes that the data set used to train the shadow models and the private data set used to train the target model do not intersect; for the attacker this is the worst case. If the two data sets happen to coincide completely, the attack effect is the best. In FedEraser, when verifying the forgetting effect, this best-case assumption is used: the attack side is trained on the data of the original global model.

Based on the principle that "if the attack is strong without the data, it will be even stronger with the data", the shadow models are trained on data disjoint from the target model's training set (the no-data case), but that data should follow a distribution similar to the target model's training data. The paper gives several ways to generate it: model-based synthesis, statistics-based synthesis, and noisy real data.

How is the shadow model itself obtained? This was something I did not expect. After obtaining, with the three methods above, training data similar to the target model's training data, the author also uploads these data to the Google platform and has it perform the same type of classification task as the target model (the second half of this paragraph explains what "also" means). This involves the application background of the paper, not mentioned above: users upload their data to the Google platform and ask it to produce a model for a classification task. After the platform trains the model, the user can use it through an API, but cannot learn the architecture or parameters of the model and cannot download it; it can only be used. In this setting, how should an attacker who wants that model's training data proceed? If the attacker obtains a data set similar to the target model's and uploads it to the same platform for the same classification task, the platform has no reason to produce a model with a different architecture from the target model: the data distributions are similar, the tasks are the same, and the platform is the same, so the platform's response should be similar. Therefore, similar data plus the same platform yields a similar model, and the focus remains on how to obtain similar data, as discussed in the previous paragraph.

However, I think the shadow-model machinery is overkill when MIA's precision and recall are used as forgetting evaluation indicators, because it increases the discrepancy between the shadow model and the target model; training the attack model directly against the target model is obviously better, since the shadow model can never be fully equivalent to it. The shadow-model design exists because, in a real attack, the attacker does not know the model architecture. When MIA is used as an evaluation indicator, that ignorance does not exist, and the attack should be made as strong as possible. Therefore, when used as an evaluation indicator, the target model should be used directly in place of the shadow model to train the attack model, and this is what FedEraser does.

Construction of attack model

First of all, be clear that what the attack model learns is not the data itself but the behavior of the model.

When the target model's data is completely unknown, the model-based synthesis, statistics-based synthesis and noisy-real-data methods mentioned above are used to generate the data for training the attack model.
When the target model's architecture is completely unknown, the shadow-model method mentioned above can be used to simulate the target model's behavior.

Both of these situations are done half-blind: they can only approximate the real data and the real model architecture, and errors remain. This point was explained in the previous section.

However, when the precision and recall of the membership inference attack are used as evaluation indicators of a forgetting algorithm, we want the attack to be as strong as possible; only then can we assess the degree of forgetting comprehensively. Therefore we assume the membership inference attacker knows the training data and knows the model architecture, so there is no need for synthesized data or shadow models (or in other words, no shadow model needs to be trained; the target model is used directly as the shadow model to train the attack model).

Construction process:
1. Divide the data set into two disjoint sets $D^{train}$ and $D^{test}$. Each record is written $(\boldsymbol{x}, y)$, where $\boldsymbol{x}$ is the feature vector and $y$ is the label. Records in $D^{train}$ are the data used to train the target model, i.e. they are labeled $in$ when training the attack model; records in $D^{test}$ are the opposite.
2. Feed each record $(\boldsymbol{x}, y) \in D^{train}$ into the shadow model / target model $f$, obtain the prediction vector $\boldsymbol{y} = f(\boldsymbol{x})$, and add the record $(y, \boldsymbol{y}, in)$ to the data set $D_{attack}$.
3. Feed each record $(\boldsymbol{x}, y) \in D^{test}$ into the shadow model / target model $f$, obtain the prediction vector $\boldsymbol{y} = f(\boldsymbol{x})$, and add the record $(y, \boldsymbol{y}, out)$ to the data set $D_{attack}$.
4. Train the attack model on the data set $D_{attack}$.
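A minimal sketch of steps 1–4, assuming scikit-learn and synthetic data: query the shadow/target model $f$ on $D^{train}$ and $D^{test}$, collect records (label, prediction vector, in/out) into $D_{attack}$, and train an attack model on it. (The paper trains per-class attack models; a single logistic-regression attack model is used here only to keep the sketch short, and all names are illustrative.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, y_tr, X_te, y_te = X[:2000], y[:2000], X[2000:], y[2000:]   # D_train / D_test, disjoint

f = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)  # shadow/target model

def make_attack_records(model, X_part, y_part, member_flag):
    pred = model.predict_proba(X_part)            # prediction vectors y = f(x)
    labels = np.full(len(X_part), member_flag)    # 1 = "in", 0 = "out"
    feats = np.column_stack([y_part, pred])       # attack features: (true label, prediction vector)
    return feats, labels

F_in, L_in = make_attack_records(f, X_tr, y_tr, 1)
F_out, L_out = make_attack_records(f, X_te, y_te, 0)
D_attack_X = np.vstack([F_in, F_out])
D_attack_y = np.concatenate([L_in, L_out])

attack_model = LogisticRegression(max_iter=1000).fit(D_attack_X, D_attack_y)
```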

Function of the attack model: given $(y, \boldsymbol{y})$, output the classification $in/out$. This also shows that what the attack model learns is the difference between $\boldsymbol{y}_{in}$ and $\boldsymbol{y}_{out}$; it does not depend on what $\boldsymbol{x}$ looks like, yet it can tell whether $\boldsymbol{x}$ is in the training data set or not.

Use in forgetting learning

Essence: express the residual degree of a record's information in the model as the probability that MIA can infer that the record was indeed in the training set.

The forgotten data is fed into the attack model trained against the target model; the attack model guesses whether each record belongs to the training data set, and the degree of forgetting is judged from the precision and recall of those guesses.
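A minimal sketch of this evaluation step, assuming hypothetical `unlearned_model` and `attack_model` objects with scikit-learn-style `predict_proba` / `predict` methods (none of these names come from the papers): query the unlearned model on the forgotten client's data, let the attack model guess membership, and report precision and recall.

```python
import numpy as np

def evaluate_unlearning_with_mia(unlearned_model, attack_model,
                                 X_forgotten, y_forgotten, X_nonmember, y_nonmember):
    def attack_features(model, X_part, y_part):
        pred = model.predict_proba(X_part)                 # prediction vectors from the unlearned model
        return np.column_stack([y_part, pred])

    feats = np.vstack([attack_features(unlearned_model, X_forgotten, y_forgotten),
                       attack_features(unlearned_model, X_nonmember, y_nonmember)])
    truth = np.concatenate([np.ones(len(X_forgotten)), np.zeros(len(X_nonmember))])
    guess = attack_model.predict(feats)                    # attack model's in/out decisions

    tp = np.sum((truth == 1) & (guess == 1))
    precision = tp / max(guess.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    # Low precision/recall on the forgotten data suggests little residual influence.
    return precision, recall
```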

However, because the attack itself has a certain inaccuracy, even if the data's influence still remains in the model, possibly a lot of it, the attack model may fail to detect it, yielding low precision and recall and making people mistakenly believe that the forgetting effect is good.

Backdoor attack

Backdoor attack paper: "How to Backdoor Federated Learning" (2020 AISTATS).
Reading the English paper is laborious and translating it bit by bit is troublesome, so you can read the translation blog by Baicai Miao.

The paper using backdoor attacks as an evaluation indicator of forgetting effects is "Federated Unlearning with Knowledge Distillation" (2022 arXiv)

As one of the most powerful attacks on FL systems, backdoor attacks do not affect the performance of the global model on regular inputs and only distort predictions when triggered by specific inputs carrying the backdoor pattern. This property makes them a well-suited evaluation method for measuring the effectiveness of forgetting: a successfully unlearned global model should still perform well on the evaluation data set, while the success rate of the backdoor attack on triggered inputs should drop.

The attacker single-handedly replaces the global model $G$ with the model $X$ it wants, directly affecting the global model rather than merely influencing the aggregation.

The purpose of backdoor attacks

Nature: a backdoor attack is a targeted model-poisoning attack. A malicious participant can directly affect the model, and the attack is stronger than data poisoning (the paper supports this conclusion with experiments).

  • In federated learning, adversarial attacks can be roughly divided into two categories according to the attack target: untargeted attacks and targeted attacks. The goal of an untargeted attack is to corrupt the model so that it cannot reach good performance on its main task. In a targeted attack (often called a backdoor attack), the adversary's goal is to make the model misbehave on some specific sub-task while maintaining good overall performance on the main task.
  • Attacks are further divided into two types according to the attacker's capabilities: model attacks and data attacks. In a data attack, the attacker can change a subset of the training samples, and this subset is unknown to the model learner. In a model attack, the compromised client changes its local model update and thereby changes the global model.

But can any single participant really replace the federated model with another model of its choosing?

What the attacker can do that increases attack strength:
1. Directly affect the weights of the global model
2. Train in whatever way best serves the poisoning
3. Incorporate evasion of potential defenses into the loss function during training

Intent: make the model give wrong predictions on data with certain characteristics, without affecting the main task. For example, if the attacker wants pictures containing red cars to be labeled as birds, the attacker relabels such pictures as birds in the hijacked client's samples and retrains the model. The final model will then misclassify red cars as birds at prediction time, while its judgment on other pictures is unaffected. The attacker hopes federated learning produces a global model that converges and shows good accuracy on its main task, while behaving in the attacker-chosen way on the specific inputs that trigger the backdoor.

I did not quite understand the difference between the main task and the backdoor task at first: if it is a classification model and a backdoor attack makes some data be misclassified, doesn't that mean the main task cannot show good accuracy?
My understanding is that it is not a "primary vs. secondary" relationship but a question of volume: classification is correct in general, and only the small modified subset is classified wrongly.

Attack threats:
Previous backdoor attacks changed the model's behavior only through data poisoning or by directly inserting backdoor components into a fixed model. This is hard to pull off in a federated learning scenario:
(1) the server-side aggregation (averaging) largely washes out the influence of the malicious client's model;
(2) because of the server's client-selection mechanism, there is no guarantee that the hijacked client is selected in every round, which further weakens the backdoor attack.

However, model poisoning is easy to use in federated learning because (1) the central server cannot verify whether participants are malicious, (2) federated learning has no visibility into what participants do locally, and (3) secure aggregation prevents anyone from inspecting the model updates submitted by participants.
Even when secure aggregation is not used and participants' updates are inspected, the paper proposes a generic constrain-and-scale technique that incorporates evasion into the attacker's loss function, allowing the attacker to evade even sophisticated anomaly detectors.

Construction of attack model

Compromised participants can submit a malicious model that does not aim at the main task but instead implants something else (the backdoor functionality).

What the attacker controls on the client side:
(1) the local training data of any compromised participant (a subset of all clients)
(2) the local training procedure, including hyper-parameters such as the number of epochs and the learning rate
(3) the model weights before submission
(4) the number of local training rounds, which can be changed adaptively

The attacker's goals:
(1) the global model must achieve high accuracy on both the main task and the backdoor task
(2) if secure aggregation is not used, the updates submitted by attacker-controlled participants should not look anomalous to the other participants, under whatever definition of "anomaly" the central server uses
(3) the global model should keep high backdoor accuracy for multiple rounds after the attack

Notation:

  • There are $m$ participating clients in total; assume the $k$-th client is the hijacked one.
  • Local model of client $i$ in communication round $t$: $L_i^{t}$
  • Global model after round $t$ of aggregation: $G^t = G^{t-1} + \frac{\eta}{n} \sum_{i=1}^{m}(\nabla G_i^{t}) = G^{t-1} + \frac{\eta}{n} \sum_{i=1}^{m}(L_i^{t} - G^{t-1})$
    (equivalently, for round $t+1$: $G^{t+1} = G^{t} + \frac{\eta}{n} \sum_{i=1}^{m}(L_i^{t+1} - G^{t})$)
    (what each participating client uploads here is the weight update $\nabla G_i^{t}$, not the model weights themselves)
  • The poisoned local model uploaded by the hijacked client $k$ in round $t$: $\tilde{L}_k^t$ (the original local model with a little extra "material" added)
  • The global model affected by the poisoned local model: $X = \tilde{G}_k^t$
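A minimal sketch of the aggregation rule above, $G^t = G^{t-1} + \frac{\eta}{n}\sum_i (L_i^t - G^{t-1})$, assuming numpy and toy vectors standing in for model weights (the function name and values are illustrative).

```python
import numpy as np

def aggregate(global_prev, local_models, eta, n):
    """global_prev: previous global weights G^{t-1}; local_models: list of local weights L_i^t."""
    updates = [local - global_prev for local in local_models]   # nabla G_i^t = L_i^t - G^{t-1}
    return global_prev + (eta / n) * np.sum(updates, axis=0)

# Toy example with 3 clients and 4-dimensional "weights".
G_prev = np.zeros(4)
locals_ = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]
G_new = aggregate(G_prev, locals_, eta=1.0, n=3)
print(G_new)   # [2. 2. 2. 2.] -- the average of the local models in this simple case
```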

Goal: single-handedly influence the global model.
What the attacker knows: the attacker $k$ can know more than you might think. Besides the global model $G^t$ downloaded each round and its own locally trained model $L_k^{t+1} = G^t + \eta \nabla L$, the attacker can also estimate the sum of the other participating clients' model weights. As the global model starts to converge during training, the update each client uploads to the central server, $\nabla G_i^{t+1} = L_i^{t+1} - G^{t}$, becomes very small, so the global model barely changes between consecutive rounds; knowing the round-$t$ global model $G^t$, the attacker almost knows what the next round's global model will look like. Multiplying the global model by the number of participating clients gives the sum of all clients' weights, and subtracting the attacker's own model parameters gives the sum of all other clients' model weights.


Relationship between the affected global model $X$ and the local model $L_k^{t+1}$ uploaded by the hijacked client $k$ (learning rate $\eta$):
$$X = G^t + \frac{\eta}{n}\left[\sum_{i = 1}^{k - 1}(L_i^{t + 1} - G^t) + (L_k^{t + 1} - G^t) + \sum_{i = k + 1}^{m}(L_i^{t + 1} - G^t)\right]$$

The formula above shows how $X$ is influenced by $L_k^{t+1}$; the formula below shows how $L_k^{t+1}$ should be chosen to force $X$. (The "$\approx$" holds because, once the model has started to converge, each benign client's update $(L_i^{t+1} - G^t)$ is already negligible; the general shape of the model is fixed, so ignoring those updates hardly affects the result.)
$$L_k^{t + 1} = \frac{n}{\eta}(X - G^t) - \sum_{i = 1}^{k - 1}(L_i^{t + 1} - G^t) - \sum_{i = k + 1}^{m}(L_i^{t + 1} - G^t) + G^t \approx \frac{n}{\eta}(X - G^t) + G^t$$

In this way, the model replacement described in the paper, $X \to G^{t+1}$, is achieved.
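A minimal sketch of the replacement trick, assuming numpy and toy weights (names and values are illustrative): the attacker uploads $L_k \approx \frac{n}{\eta}(X - G^t) + G^t$ so that, after aggregation, the new global model lands approximately on the attacker's target model $X$.

```python
import numpy as np

def replacement_update(X_target, global_model, eta, n):
    """Malicious local model the hijacked client submits (benign updates assumed ~ 0)."""
    return (n / eta) * (X_target - global_model) + global_model

# Check against the aggregation rule G^{t+1} = G^t + (eta/n) * sum_i (L_i - G^t),
# with the other clients' updates treated as negligible near convergence.
G_t = np.array([1.0, 1.0, 1.0])
X = np.array([5.0, -2.0, 0.5])            # the model the attacker wants to impose
eta, n = 1.0, 10
L_k = replacement_update(X, G_t, eta, n)
G_next = G_t + (eta / n) * (L_k - G_t)    # only the attacker's update contributes here
print(np.allclose(G_next, X))             # True: the global model becomes X
```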

How is $X$ obtained, and what exactly is $X$? In fact, $X$ is simply a local model trained on the backdoor data set, which the attacker then uses to replace the global model. The original pseudo-code is as follows:
[Figure: the paper's pseudo-code for training the replacement model X]

It can be seen that the replacement model $X$ is trained on the backdoor data set $D_{backdoor}$: the hijacked client $k$'s original data set $D_{local}$ is replaced, giving $\tilde{D}_{local}$. In other words, the data set used to train the replacement model $X$ does not need to be large.
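A minimal sketch of how such a backdoor data set can be built, assuming numpy; a pixel-pattern trigger is used here purely as an illustrative choice (the paper also discusses semantic backdoors such as cars with particular attributes), and all names are mine.

```python
import numpy as np

def add_trigger(images, labels, target_class, poison_fraction=0.2, rng=None):
    """Stamp a small trigger onto a fraction of the images and relabel them."""
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0            # 3x3 white patch in the corner = backdoor pattern
    labels[idx] = target_class             # attacker-chosen label for triggered inputs
    return images, labels, idx

# Toy data: 100 grayscale 28x28 "images" with 10 classes.
X_local = np.random.rand(100, 28, 28)
y_local = np.random.randint(0, 10, size=100)
X_bd, y_bd, poisoned_idx = add_trigger(X_local, y_local, target_class=0)
# Training the replacement model X on (X_bd, y_bd) teaches it the backdoor task
# while the clean majority of the data keeps main-task accuracy high.
```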

The work done by the backdoor attack can be summarized in one sentence: the global model must achieve high accuracy on both the main task and the backdoor task.

Use in forgetting learning

Essence: convert the residual degree of the data in the model into the probability that the backdoor is triggered. If the backdoor is triggered, the influence of the backdoor data still remains in the model; if it is not triggered, that influence has been removed. The better the forgetting effect, the lower the success rate of the backdoor attack.

In forgetting learning, however, the backdoor attack is not used merely as a tool for evaluating the forgetting effect; choosing it also has a practical background. If a client is hijacked by an attacker and "pollutes" the model, then that client's negative influence on the global model should be deleted. Indeed, one of the reasons for wanting to forget a client in the first place is that it has been hijacked and threatens the global model, which usually means its data has been modified; a backdoor attack is exactly such a case of a hijacked client modifying its data, and the modification can be as small as adding a backdoor tag to the data. Before forgetting, the "polluted" model with the implanted backdoor has high prediction accuracy on backdoored data; after forgetting, the influence of the backdoored data is reduced, which shows up numerically as a drop in the backdoor attack's accuracy. This gap between high and low is what forgetting means here.
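A minimal sketch of that comparison, assuming hypothetical `model_before` / `model_after` objects with a scikit-learn-style `predict` and a set of triggered test inputs (all names are illustrative): compute the backdoor attack success rate before and after unlearning and compare.

```python
import numpy as np

def backdoor_success_rate(model, X_triggered, target_class):
    """Fraction of triggered inputs classified as the attacker's target class."""
    preds = model.predict(X_triggered)
    return float(np.mean(preds == target_class))

def report_unlearning_effect(model_before, model_after, X_triggered, target_class):
    asr_before = backdoor_success_rate(model_before, X_triggered, target_class)
    asr_after = backdoor_success_rate(model_after, X_triggered, target_class)
    # A large drop in attack success rate suggests the hijacked client's influence was
    # removed; main-task accuracy should be checked separately on clean data.
    return asr_before, asr_after
```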

Using this attack also covers a situation that the membership inference attack does not: different clients may hold duplicate data, but usually only one client is hijacked while the others' data is unaffected. With the backdoor, the same data with and without the trigger can be told apart, so the hijacked client's copies can be distinguished from the benign clients' otherwise similar-looking data.

If the backdoor is never triggered, shouldn't the attack accuracy be exactly 0? Or is there a generalization issue? Not all the attack accuracies in the paper are 0; when they are not, is that because the forgetting effect is poor, or is it a data generalization problem?


Original post: https://blog.csdn.net/x_fengmo/article/details/132379709