Federated unlearning

Regarding the related concepts of learning and forgetting, my personal understanding is that it is like scrambled eggs with tomatoes. Data that has a large impact on the model is like the tomatoes and the eggs: you can see with the naked eye how much they affect the colour of the whole dish, and whether the tomato is green or ripe also strongly affects the taste. Data that has little impact on the model is like the chopped scallions tossed in while stir-frying: they all but disappear in the pan and only slightly affect the flavour. That is learning — mixing scattered ingredients together and stir-frying them into one jumbled dish. Forgetting is like picking the scallions you don't like out of a plate of scrambled eggs with tomatoes: you can pick out the visible pieces, but once the dish has been fried, the scallion flavour can never be completely removed from it. Even if a picky tongue cannot detect the slightest hint of scallion, it is still there, and the dish is not the same as scrambled eggs with tomatoes made without any scallion at all. Unlearning, as it stands today, feels like pursuing exactly this plate: the scallions are picked out, some residual scallion smell is tolerated, and one goal is to make that residual flavour as faint as possible. Another goal is to pick the scallions out as quickly as possible. That is roughly my current understanding of learning and forgetting; perhaps I will look back at this understanding later and disagree with it, so I am leaving this short note.

Federated unlearning is a relatively new research direction, and the study of information leakage in this setting is an even newer one.

Machine unlearning

Federated unlearning is a recent attempt to realize the "right to be forgotten" in the context of federated learning. This "right to be forgotten" was first pursued in machine learning, as machine unlearning.

What is machine unlearning? The definition of machine unlearning is the abstraction of the problem we need to solve. The definition given in "Machine Unlearning" is: take a model trained on the dataset M + d and unlearn d; if the distribution of the resulting models is indistinguishable from the distribution of models trained directly on M, the unlearning is successful. Distributions of models are considered because model training is randomized.
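One common way to formalize this (my own restatement of a standard $\varepsilon$-style indistinguishability definition, not a quotation from the paper): let $A$ be the (randomized) learning algorithm and $U$ the unlearning algorithm; unlearning $d$ is successful up to $\varepsilon$ if, for every set of models $\mathcal{S}$,
$$ e^{-\varepsilon} \;\le\; \frac{\Pr\big[\, U\big(A(M \cup d),\, d\big) \in \mathcal{S} \,\big]}{\Pr\big[\, A(M) \in \mathcal{S} \,\big]} \;\le\; e^{\varepsilon}, $$
with exact unlearning corresponding to $\varepsilon = 0$, i.e., the two model distributions are identical.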

Differences from differential privacy:
Pure differential privacy provides a way to protect the privacy of individual samples in a dataset, so that the amount of information that can be obtained about any particular sample is limited.
In layman's terms, with differential privacy the data still contributes to training the model, but perturbations are added so that an attacker cannot easily recover the original data. What machine unlearning needs to do is drive the contribution of the data to the model directly to zero.
There is now also unlearning work based on differential privacy.

For what differential privacy is, see my other blog post — just a simple understanding and some notes.

Background and significance of machine unlearning / federated unlearning

Machine unlearning is an emerging field of machine learning. Its ultimate goal is to eliminate the influence of a specific subset of training samples — the "forget set" — on a trained model. An ideal unlearning algorithm would eliminate the influence of those samples while retaining other beneficial properties, such as accuracy on the rest of the training set and generalization to held-out samples.

The reasons machine unlearning, and its extension to federated learning as federated unlearning, are now widely studied:

  • Privacy: the "right to be forgotten" requirements in recent legislation such as the General Data Protection Regulation (GDPR) 1 and the California Consumer Privacy Act (CCPA) 2 .
  • Security and usability: for machine learning, including federated learning, forgetting training data that is no longer valid in a timely manner benefits the training of the whole model. In federated learning, data may be contaminated by attacks or maliciously modified, and such wrong data causes the model to make prediction errors 3 4 5 ; some information also becomes obsolete over time and should be forgotten. Forgetting some data, or even an entire client, can protect the security of the model and the privacy of all clients. An attacker may control some clients and strategically adjust their training data to steer the global model toward a model of the attacker's choosing 6 . In this way the attacker can obtain information useful for further attacks, which poses a great threat to the security of the whole model.
  • Accuracy: some models may be biased against certain types of data, and these biases may be caused not by the input but by the learning itself.

• Federated unlearning: is federated unlearning actually necessary, or is natural forgetting enough to forget the data of a participant who leaves?
• Unlearning verification: do we need more sophisticated methods, or is something as simple as examining the global model's performance on the departing participant's data sufficient to measure and validate the unlearning effect?
• Practical choices: what combination(s) of unlearning and verification methods forget effectively and verify clearly while causing minimal negative impact on the original task?

The main reasons for unlearning the model rather than retraining it include compute cost, time cost, and whether the training process can easily be reproduced. If the unlearning algorithm is so complex that it costs more than retraining, unlearning loses its point.

Machine unlearning / federated unlearning framework

The main design requirements that the framework needs to meet include:

Completeness: the unlearned model and the retrained model make the same prediction (whether correct or incorrect) for any possible data sample.

Timeliness: the essential reason for using unlearning is that retraining costs too much time, so the trade-off between completeness and timeliness needs to be measured.

Accuracy: when retraining is not easy to carry out and the unlearned and retrained models cannot be compared for completeness, the accuracy of the unlearned model can be tested on a new test set to show that it is comparable to that of the retrained model.

Lightweight: to achieve unlearning, some algorithms need to record historical information, such as model checkpoints, historical model updates, training data and other temporary data. A good unlearning algorithm should be lightweight and scale to big data; unlearning time, storage cost and computational cost should all be minimized as far as possible.

Provable guarantee: some algorithms are approximate unlearning algorithms, and whether the approximation comes with a guarantee needs to be proven; for example, some techniques use bounded approximate unlearning.

Verifiability: the purpose of unlearning is to make the model forget certain sample information, and whether those samples are really forgotten needs to be verified, so a good unlearning framework should provide a verification mechanism.

Generality: the unlearning process should be general across different learning algorithms and machine learning models.

There is some overlap between unlearning metrics and verification metrics. The biggest difference is that the former can be used for optimization or for providing bounded guarantees, while the latter are only used for evaluation.

The input of an unlearning algorithm is a pre-trained model and one or more samples from the forget set. Based on the model, the forget set and the retain set, the unlearning algorithm produces an updated model. The updated model produced by an ideal unlearning algorithm is indistinguishable from a model trained directly without the forget set.

One baseline used for comparison is training from scratch: compare the time taken, and compare the difference between the two models (accuracy, or other factors).
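As an illustration of such a comparison (a hedged sketch; `train_fn`, `unlearn_fn`, and `evaluate_fn` are placeholders for whatever training, unlearning and evaluation code is in use, not any particular library's API):

```python
import time

def compare_with_retraining(train_fn, unlearn_fn, evaluate_fn,
                            pretrained_model, retain_set, forget_set, test_set):
    """Retrain-from-scratch baseline vs. unlearning: compare wall-clock time
    and test accuracy of the two resulting models."""
    t0 = time.time()
    retrained = train_fn(retain_set)                       # from-scratch baseline
    retrain_time = time.time() - t0

    t0 = time.time()
    unlearned = unlearn_fn(pretrained_model, forget_set, retain_set)
    unlearn_time = time.time() - t0

    return {
        "retrain_time_s": retrain_time,
        "unlearn_time_s": unlearn_time,
        "retrained_accuracy": evaluate_fn(retrained, test_set),
        "unlearned_accuracy": evaluate_fn(unlearned, test_set),
    }
```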

Moreover, simply retraining on the remaining data from scratch is impractical, because in real applications the cost (time, computation, energy, etc.) is very high, especially when federated learning involves multiple rounds of alternating training and aggregation among many participants.


Deterministic deletion updates currently exist for several settings, including linear models, certain classes of lazy learning, recursive support vector machines, and co-occurrence-based collaborative filtering. There are also ideas related to data privacy that do not aim at effective data deletion, but rather at making data private or non-identifiable.
The challenge of data deletion only arises in the presence of computational constraints, whereas privacy also poses statistical challenges even without any computational constraints; and there is a direct connection between data deletion and data privacy and security.

The paper defines what an exact data deletion operation is: after deleting a training point from the model, the model should behave as if it had never seen that training point in the first place.

Model-agnostic unlearning and model-specific unlearning

Studying the difference between model-agnostic unlearning and model-specific unlearning may offer ideas for studying federated unlearning, which is unlearning in the federated learning setting.

Existing research on machine unlearning

[1] Making AI Forget You: Data Deletion in Machine Learning(2019)7

This paper proposes a solution that allows a trained machine learning model to forget data that must be removed: the problem of removing a single data point from a machine learning model without retraining from scratch on the remaining data. The proposed notion, called deletion-efficient learning, is based on an intuitive and actionable idea of what it means to remove data from a statistical model. Treating data deletion as an online problem, a notion of optimal deletion efficiency arises from a natural lower bound on amortized computation time.



[2] Machine Unlearning(2021)8

This paper proposes a solution that allows a trained machine learning model to forget data that must be removed. Its unlearning algorithm relies on splitting or partitioning the dataset, which is not easy to apply directly to federated unlearning.

It takes advantage of sharding and slicing during training.

Federated unlearning

Federated learning

There are two main parties in federated learning: a central server $C$ and $M$ clients $S_i \in \{S_1, \ldots, S_M\}$. In communication round $t$, the central server chooses $m$ of the $M$ clients to participate, sends the global model $G^t$ down to each participating client, each client trains locally on its own data, and uploads its model update $\Delta G_i^t$ to the central server. The central server averages the updates (FedAvg) to produce the new global model
$$G^{t+1} = G^t + \frac{1}{m}\sum_{i=1}^{m} \Delta G_i^t,$$
then selects clients again, and the process repeats until the model converges to a satisfactory level.
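To make the notation concrete, here is a minimal FedAvg round as a sketch (my own illustration, assuming model parameters are flattened NumPy arrays; `clients` and `local_train` are placeholders for whatever client objects and local-training routine are in use):

```python
import random
import numpy as np

def fedavg_round(global_params, clients, m, local_train):
    """One FedAvg communication round, following the notation above:
    sample m of the M clients, send them the global model G^t, collect their
    updates ΔG_i^t, and set G^{t+1} = G^t + (1/m) * Σ_i ΔG_i^t."""
    selected = random.sample(clients, m)                  # server picks m of M clients
    updates = [local_train(global_params.copy(), c) for c in selected]
    avg_update = sum(updates) / m                         # average the ΔG_i^t
    return global_params + avg_update                     # new global model G^{t+1}
```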

Unlearning in federated learning

Recall the definition of machine unlearning: if, after unlearning d from a model trained on the dataset M + d, the distribution of the resulting models is indistinguishable from the distribution of models trained directly on M, the unlearning is considered successful.

In the context of federated learning, unlearning essentially means removing the effect of part of the data held across all participating clients, but the situation is more complicated: (1) what needs to be removed may be the effect of all the data of an entire client, i.e., removing that client's influence; (2) it may be the effect of only part of one client's data; (3) it may be the effect of a particular data item, even though that item may be present on many participating clients. The first case is the easiest to implement and is the one mainly studied in existing work.

As in machine unlearning, what we ultimately hope to achieve is that the distribution of models trained directly on the remaining data is indistinguishable from the distribution of the unlearned models after unlearning.

Different unlearners in federated unlearning


Unlike centralized machine learning, where a single controller drives model training and naturally carries out the unlearning, in federated learning who leads the unlearning is itself a question to consider: different unlearners have access to different information and data, so the implementation methods and the effects they can achieve differ.

The server as the unlearner: what is forgotten is all the data of a certain client. In this case, part of a client's data cannot be forgotten, because the server does not know what the client's specific data is.

The remaining clients or all clients as the unlearner: what is forgotten is part of a certain client's data, but the server is required to roll the global model back to the checkpoint before the target client was first selected, and all clients are required to take part in retraining. The target client either exits before training (corresponding to the remaining clients acting as unlearners) or participates in the retraining with its remaining data (corresponding to all clients acting as unlearners).

The target client as the unlearner: the target client can directly access the data that needs to be deleted, and it then sends the unlearned local updates to the server for aggregation. The erasure effect of such local unlearning cannot match global retraining; the idea is to let the target client act as the unlearner and thereby influence the global model.

Using attack success rate to evaluate the unlearning effect

Not all papers use attack algorithms as tools to evaluate the unlearning effect, but the three papers discussed below — “2021 FedEraser”, “2022 Skewed model + knowledge distillation, with backdoor-attack verification” and “2022 Target client performs the unlearning before leaving, with backdoor-attack verification” — all use attack success rate as an evaluation metric for the unlearning effect. The attack algorithms used are the membership inference attack and the backdoor attack, respectively.

The relationship between federated learning, unlearning and attack algorithms can be pictured as a hierarchy (figure omitted): the lower a point sits, the lower-level the concept is and the more application scenarios it covers.

Where do these two attacks sit on the attack tree?

How these two attacks are used as tools to evaluate the unlearning effect is detailed in my other blog post.

Existing research on federated unlearning

2021 FedEraser (approximate algorithm)


●●[3] FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models(2021)9

The paper provides the source code: https://www.dropbox.com/s/1lhx962axovbbom/FedEraser-Code.zip?dl=0

This paper identifies the unique challenges of unlearning in federated learning and designs a new unlearning mechanism for the federated setting, assuming that all data to be removed belongs to the same client.

Although FL comes in different forms, most existing work focuses on horizontal FL, or sample-based FL, where the clients' datasets share the same feature space but different sample spaces. In contrast, vertical FL, or feature-based FL, suits scenarios where datasets from different domains share the same sample space but different feature spaces. This paper focuses on horizontal FL.

FedEraser [3] (2021) was the first attempt at approximate unlearning in the federated learning setting. It uses a calibration technique to isolate each client's individual contribution as far as possible. The prerequisite is that the server must store the parameter update history of every client. While this assumption is not unreasonable in practice, it can consume a lot of server storage if more than a few hundred clients participate in an FL session. FedEraser is essentially a retraining method that relies on additional communication between server and clients, in which all participating clients adjust their historical updates using their local datasets. (The server must store every client's parameter update history, and clients are required to retrain the model.)

The basic idea is to trade the central server's storage for the construction time of the unlearned model: the client parameter updates retained on the central server during federated training are used to reconstruct the unlearned model.

Since client updates indicate how the global model's parameters must change to fit the training data, the paper calibrates the retained client updates with only a few rounds of calibration training, approximating the update direction that would have been obtained without the target client; the unlearned model can then be constructed quickly from the calibrated updates.

The unlearners are the remaining clients or all clients, and all clients are required to participate in the retraining process, just like regular federated learning, with local fine-tuning or calibration where necessary. The starting point for retraining is obtained by rolling back using the historical update information stored on the central server.

FedEraser process:
(1) Calibration training (on the calibrating clients, i.e., all clients except the target client to be forgotten): roll back to the last global model stored on the central server, send that model to each calibrating client, and retrain for $E_{cali}$ local epochs on those clients to obtain the calibrated updates $\widehat{U}_{k_c}^{t_j}$, where $k_c$ denotes a calibrating client.
(2) Update calibration (central server): use the calibrated updates $\widehat{U}_{k_c}^{t_j}$ sent back to the central server to modify the updates $U_{k_c}^{t_j}$ originally stored on the server; the modified update is
$$\widetilde{U}_{k_c}^{t_j} = \big\| U_{k_c}^{t_j} \big\| \, \frac{\widehat{U}_{k_c}^{t_j}}{\big\| \widehat{U}_{k_c}^{t_j} \big\|}$$
(3) Calibrated-update aggregation (central server): given the corrected updates of all clients, $\widetilde{U}^{t_j} = \{ \widetilde{U}_{k_c}^{t_j} \mid k_c \in [1, 2, \ldots, K] \setminus k_u \}$, compute their weighted average
$$\widetilde{u}^{t_j} = \frac{1}{(K - 1) \sum_{k_c} w_{k_c}} \sum_{k_c} w_{k_c} \widetilde{U}_{k_c}^{t_j}, \qquad w_{k_c} = \frac{N_{k_c}}{\sum_{k_c} N_{k_c}},$$
where $N_{k_c}$ is the number of records owned by client $k_c$.

Does this mean that a client with more records carries a larger weight, so its update contributes more and needs more correction?

(4) Unlearned-model update (central server): FedEraser updates the global federated learning model to
$$\widetilde{\mathcal{M}}^{t_{j+1}} = \widetilde{\mathcal{M}}^{t_j} + \widetilde{u}^{t_j},$$
where $\widetilde{\mathcal{M}}^{t_{j+1}}$ is the updated model and $\widetilde{\mathcal{M}}^{t_j}$ is the current model.
The paper's pseudocode is given as a figure in the original post.
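Below is a minimal NumPy sketch of the calibration and aggregation steps above (my own illustration following the formulas, not the code released with the paper; updates are flattened arrays and the dictionaries only contain the calibrating clients):

```python
import numpy as np

def federaser_calibrated_round(stored_updates, calibrated_updates,
                               record_counts, current_model):
    """One round of FedEraser's update calibration and aggregation.
    stored_updates[k]     : U_k^{t_j}, the update originally kept on the server
    calibrated_updates[k] : Û_k^{t_j}, the update from E_cali calibration epochs
    record_counts[k]      : N_k, the number of records owned by client k
    The target (forgotten) client is simply absent from these dicts."""
    K_minus_1 = len(stored_updates)
    total_records = sum(record_counts.values())
    weights = {k: n / total_records for k, n in record_counts.items()}

    # Step (2): keep the stored update's magnitude, take the calibrated direction.
    tilde = {k: np.linalg.norm(stored_updates[k])
                * calibrated_updates[k] / np.linalg.norm(calibrated_updates[k])
             for k in stored_updates}

    # Step (3): weighted average of the calibrated updates.
    agg = sum(weights[k] * tilde[k] for k in tilde)
    agg = agg / (K_minus_1 * sum(weights.values()))

    # Step (4): apply the aggregated calibrated update to the current model.
    return current_model + agg
```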

Some assumptions or shortcomings:
it chooses to forget an entire client's data directly, rather than selectively forgetting part of that data.

The data set used in the experimental part:
Adult (Census Income): has 14 attributes such as age, gender, education, marital status, occupation, working hours and nationality. The classification task is to predict a person's annual income.

Purchase: Contains the shopping history of thousands of shoppers within one year, including product name, chain store, quantity, purchase date and other fields. The goal is to design an accurate coupon promotion strategy.

MNIST: A dataset of 70,000 handwritten digits formatted as 32×32 single-channel images and normalized.

CIFAR-10: consists of 60,000 color images of size 32×32; it is a balanced dataset with 6,000 images per category.

Different global models are used to perform the classification tasks; the table listing them is given as a figure in the original post.

  • The evaluation metrics of the experiments are: accuracy, prediction loss, and unlearning time.
    To evaluate whether the unlearned model still contains information about the target client, three additional metrics are used:
    Prediction difference between the original global model and the unlearned model, expressed as the L2 norm of the difference between their prediction probabilities.
    Precision of the membership inference attack (MIA) against the unlearned model, used to measure the information still retained in the unlearned global model and the degree of privacy leakage of the target client: the proportion of the target client's data predicted to have participated in training the global model.
    Recall of the MIA: the fraction of the target client's data that can be correctly inferred, i.e., the probability that a positive sample is found.

  • Baseline methods in the experiments:
    FedRetain: after removing the target client's data, train the model from scratch on the remaining data; this provides an upper bound on the predictive performance of the unlearned model reconstructed by FedEraser.
    FedAccum: directly accumulate the previously retained client updates and use the accumulated updates to update the global model; the difference from FedEraser is that FedAccum does not calibrate the clients' updates.
    FedAvg: no unlearning; ordinary federated learning that builds the global model directly.

Summary:

Basic idea:
Optimization: trade the central server's storage space for the construction time of the unlearned model.
Guarantee: use the new calibration method to calibrate the retained updates and speed up reconstruction of the unlearned model.

Goal: reduce reconstruction time, i.e., speed up.

Evaluation metrics: following the evaluation criteria of machine unlearning — accuracy and prediction loss; the time for the global model to forget the target client; and whether the model still contains the target client's information — the prediction difference between the original and unlearned models, MIA precision (the goal of MIA is to determine whether given data was used to train a given ML model), and MIA recall.

Why use MIA results as one of the evaluation metrics for how much information about the data remains in the model? See my other blog post, "Evaluation methods in unlearning: membership inference attacks (MIA) & backdoor attacks", which has a paragraph on treating attacks as a criterion for forgetting.

  • What is the practical significance of using attack methods to test the degree of unlearning?
    Attacks used for testing in unlearning are carried out under an idealized setting favourable to the attacker. Is that realistic? Personally, I feel this is different from studying defence strategies against attacks: a defence is calibrated against the strength of a real attack, which has a practical background, so the defence can reasonably be moderate/lightweight. In unlearning, however, the attack is merely a tool to test how much information remains; the attack itself has no practical role in this process. The real-world motivation for unlearning is "privacy" and "dirty data", and what matters is that forgetting be as clean as possible, so perhaps one should grant the attacker unlimited advantage, in order to observe the forgetting as closely as possible.

In general, the idea is to make the attack as strong as possible to test whether much information remains in the unlearned model. But there are countless strong attack methods — why can MIA serve as this criterion? Leave aside whether this attack method is easy to use; first ask whether it can be used at all. Like a charging cable: before worrying about charging speed, check whether the plug fits the socket.

One then needs to look at the basic assumptions underlying the MIA method.
Assumptions about the attack setting:
1. The machine learning algorithm is used to train a classification model that captures the relationship between the content of a data record and its label.
2. The attacker has query access to the model and can obtain the model's prediction vector on any data record.
3. The attacker knows the input and output formats of the model, including their number and the range of values they can take.
4. The attacker either (1) knows the type and architecture of the machine learning model and the training algorithm, or (2) has black-box access to a machine learning oracle used to train the model, in which case the attacker has no prior knowledge of the model's structure or meta-parameters.

As a tool for studying the unlearning effect, the attack obviously does not need to respect practical constraints — whether the attacker can obtain the same or related datasets, or whether the attacker can learn the model architecture and its parameters. When verifying the unlearning effect, these are all assumed to be fully known, because at this stage, although we call the method an "attack", we really only use the idea as a "verification" method. Therefore, when MIA is used as an unlearning evaluation metric, the target model's training data and architecture are taken as known, to increase the strength of the attack and better assess the degree of forgetting.
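For concreteness, here is a deliberately simple, attacker-favourable confidence-threshold membership check on the forget set (my own sketch, much cruder than the shadow-model MIA described above; the threshold and the precision/recall bookkeeping are illustrative):

```python
import torch

@torch.no_grad()
def mia_precision_recall(model, forget_loader, nonmember_loader, threshold=0.9):
    """Flag a sample as a 'member' if the model's confidence in its true label
    exceeds a threshold, then measure precision/recall of that membership guess
    on forget-set data vs. truly unseen data. High precision/recall on the
    forget set suggests its information is still in the model."""
    model.eval()

    def confidences(loader):
        out = []
        for x, y in loader:
            probs = torch.softmax(model(x), dim=1)
            out.append(probs.gather(1, y.view(-1, 1)).squeeze(1))
        return torch.cat(out)

    forget_conf = confidences(forget_loader)       # should behave like non-members
    unseen_conf = confidences(nonmember_loader)    # genuinely unseen data

    tp = (forget_conf > threshold).sum().item()    # forget samples judged "member"
    fp = (unseen_conf > threshold).sum().item()    # unseen samples judged "member"
    fn = (forget_conf <= threshold).sum().item()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return precision, recall
```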


2022 Rapid Retraining


●●[4] The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining(2022)11

This paper identifies the unique challenges of unlearning in federated learning and designs a new unlearning mechanism for the federated setting. The starting assumption is that, after deleting the data requested for removal, the model is retrained from scratch; but this is computationally quite expensive because the model must be trained from the beginning, and it is impractical to have the same users participate in training again.
Liu et al. proposed a rapid retraining method [4] (2022) that retrains the global model on the remaining dataset, using a first-order Taylor expansion to approximate the loss function. The method relies on the participation of all clients. The algorithm can simply be viewed as a fast local training algorithm, not necessarily an approximate unlearning algorithm specific to federated learning. It finds a better descent direction by exploiting both gradient information and curvature information.
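As a rough illustration of a descent step that uses both gradient and curvature information (my own sketch with a crude diagonal curvature proxy; this is not the paper's exact retraining algorithm, and the learning rate and epsilon are placeholders):

```python
import torch

def curvature_guided_step(model, loss_fn, x, y, lr=0.1, eps=1e-8):
    """One local retraining step that scales the gradient by a diagonal
    curvature estimate (squared gradients as a cheap Fisher-style proxy),
    illustrating 'gradient + curvature' descent."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            curvature = g.pow(2) + eps          # diagonal curvature proxy
            p -= lr * g / curvature.sqrt()      # curvature-scaled descent step
    return loss.item()
```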

  • Evaluation metrics:
    1. Efficiency of the algorithm: speed-up in running time over the retraining algorithm.
    2. Performance of the obtained model compared to the baseline model: symmetric absolute percentage error (difference in outputs), and accuracy/precision.
    3. Parameter sensitivity.

2022 Skewed model + knowledge distillation, with backdoor-attack verification (approximate algorithm)


●●[5] Federated Unlearning with Knowledge Distillation (2022)12

This paper designs a new unlearning mechanism for the federated learning environment, assuming that all data to be removed belongs to the same client.
Similarly, Wu et al. [5] (2022) also require the server to store each client's update history. However, instead of requiring clients to retrain the model as FedEraser does, their method has the server subtract all of the target client's historical averaged updates from the final global model to obtain a skewed model, and then uses knowledge distillation to train this skewed model, with the original global model as the teacher, on an outsourced unlabeled dataset. This requires sampling synthetic data with the same distribution as the overall dataset, and the accuracy of that sampling may be hurt by the non-IID (non-independent and identically distributed) data distributions usually assumed in federated learning, as opposed to IID data. (In short: the server subtracts the target client's historical averaged updates from the final global model, but the non-IID data assumption common in federated learning can undermine this step.)

In terms of accuracy and backdoor-attack accuracy, this method is not much different from retraining the model, but no comparison is given in terms of time.

The client's contribution is eliminated by subtracting its accumulated historical updates from the model, and a knowledge-distillation approach is used to restore the model's performance without using any client's data. The method does not rely on client involvement, places no restrictions on the type of neural network, and is therefore practical and efficient in FL systems. The paper further introduces backdoor attacks during training to help evaluate the unlearning effect.
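A minimal sketch of the two steps, subtraction followed by distillation (my own PyTorch illustration; the accumulated update tensors, temperature, optimizer settings and loader names are placeholders, not the paper's code):

```python
import copy
import torch
import torch.nn.functional as F

def unlearn_by_subtraction_and_distillation(global_model, target_updates,
                                            unlabeled_loader, epochs=1,
                                            T=2.0, lr=1e-3):
    """(1) Remove the target client's accumulated historical updates from the
    global model; (2) restore utility by distilling from the original global
    model (teacher) on outsourced unlabeled data."""
    teacher = copy.deepcopy(global_model).eval()

    # Step 1: subtract the target client's accumulated averaged updates.
    student = copy.deepcopy(global_model)
    with torch.no_grad():
        for p, upd in zip(student.parameters(), target_updates):
            p -= upd  # target_updates: sum of the client's averaged updates

    # Step 2: knowledge distillation on outsourced unlabeled data.
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for x in unlabeled_loader:              # no labels are needed
            with torch.no_grad():
                soft_targets = F.softmax(teacher(x) / T, dim=1)
            loss = F.kl_div(F.log_softmax(student(x) / T, dim=1),
                            soft_targets, reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```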

However, although the paper motivates the unlearning algorithm by the large time and energy cost of retraining, it does not compare the training time of the two approaches; it only mentions that having all participating clients retrain in the original order is impractical. Since the algorithm does not rely on client participation, the time and cost saved appear to come from avoiding communication between the central server and the clients and from skipping local training on the clients. The price of avoiding communication and local training is that the local models produced during training must be stored on the central server, so rather than saving the algorithm's own computation cost, it consumes the central server's storage.

  • Evaluation metrics:
    1. The unlearning effect is evaluated via the success rate of backdoor attacks.
    2. Prediction accuracy.

2022 Target client performs the unlearning before leaving, with backdoor-attack verification


●●[6] Federated Unlearning: How to Efficiently Erase a Client in FL?(2022)13

This paper designs a new unlearning mechanism for the federated learning environment.
Halimi et al. [6] (2022) do not require the server to store clients' parameter updates and rely only on the target client that wishes to leave. Before leaving, the target client performs projected gradient ascent to train a model that maximizes the empirical loss on its local data. The average of the remaining clients' models is used as a reference model to measure the quality of unlearning.
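A minimal sketch of this idea (my own illustration, not the paper's code): gradient ascent on the target client's local data, with parameters projected back into an L2 ball around the reference model; the step count, learning rate and radius are placeholders:

```python
import copy
import torch

def local_unlearn_by_gradient_ascent(global_model, reference_model, loader,
                                     loss_fn, steps=50, lr=0.01, radius=5.0):
    """Target client's local unlearning: gradient *ascent* on its own data,
    projected onto an L2 ball of the given radius around a reference model
    (e.g., the average of the other clients' models)."""
    model = copy.deepcopy(global_model).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    ref = torch.cat([p.detach().flatten() for p in reference_model.parameters()])

    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        loss = -loss_fn(model(x), y)            # ascent: maximize the local loss
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Project the parameters back onto the L2 ball around the reference model.
        with torch.no_grad():
            flat = torch.cat([p.flatten() for p in model.parameters()])
            delta = flat - ref
            norm = delta.norm()
            if norm > radius:
                flat = ref + delta * (radius / norm)
                offset = 0
                for p in model.parameters():
                    n = p.numel()
                    p.copy_(flat[offset:offset + n].view_as(p))
                    offset += n
    return model
```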

  • Evaluation metrics:
    1. The backdoor attack proposed in reference [5] above is also used; backdoor accuracy is defined as "the percentage of triggered (poisoned) data that is misclassified as the attacker's target label" (a computation sketch follows this list).
    2. Accuracy on clean data.
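A minimal sketch of how backdoor accuracy can be computed (my own illustration; `triggered_loader` and the target label are assumptions about the evaluation setup, not the paper's code):

```python
import torch

@torch.no_grad()
def backdoor_accuracy(model, triggered_loader, target_label):
    """Fraction of trigger-stamped inputs that the model classifies as the
    attacker's target label. A low value after unlearning suggests the
    backdoored client's influence has been removed."""
    model.eval()
    hits, total = 0, 0
    for x, _ in triggered_loader:              # true labels of poisoned data ignored
        pred = model(x).argmax(dim=1)
        hits += (pred == target_label).sum().item()
        total += x.size(0)
    return hits / max(total, 1)
```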

2023 Unlearning in asynchronous federated learning


●●Asynchronous Federated Unlearning(2023)10

This paper considers the unlearning problem in asynchronous federated learning; it is a non-approximate algorithm based on rapid retraining. Exploiting the characteristics of asynchronous federated learning, the paper divides clients into clusters based on training time and data similarity and proposes the KNOT clustered-aggregation optimization. In this way, when a client needs to be forgotten, only the cluster containing that client needs to be retrained; the training of the other clusters is unaffected, and retraining is faster because fewer clients participate.

Innovations: 1. unlearning is considered in asynchronous federated learning; 2. clustered aggregation reduces the retraining time overhead — it is a new clustered-aggregation mechanism tailored to asynchronous federated learning, where clients are divided into clusters and aggregation happens only within a cluster, so retraining after data erasure can be confined to a single cluster; 3. the client clustering problem is cast as an optimizable integer lexicographic minimization problem and solved with a linear programming solver.
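A minimal sketch of point 2, cluster-restricted retraining (my own illustration; the clustering itself, which KNOT obtains by solving an optimization problem, is taken as given here):

```python
from collections import defaultdict

def clusters_to_retrain(client_clusters, erased_clients):
    """Given a mapping client -> cluster, only clusters containing a client
    that requests erasure need to be retrained; all other clusters keep their
    trained sub-models. Returns, for each affected cluster, the clients that
    should participate in its retraining (the erased clients are dropped)."""
    affected = {client_clusters[c] for c in erased_clients}
    members = defaultdict(list)
    for client, cluster in client_clusters.items():
        members[cluster].append(client)
    return {cl: [c for c in members[cl] if c not in erased_clients]
            for cl in affected}

# Example: erasing client "c3" only triggers retraining of its cluster (1).
plan = clusters_to_retrain({"c1": 0, "c2": 0, "c3": 1, "c4": 1}, {"c3"})
# plan == {1: ["c4"]}
```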

Summary of the assumptions/premises of the papers above

1. The data that needs to be forgotten does not necessarily belong to the same client; it may be distributed across different clients. [3] and [5] assume it is on the same client, which reduces the difficulty, because that assumption amounts to removing a specific client's entire historical contribution to global model training.
2. When deleting data shared among several clients, the effect of deleting one data item on the multiple clients' respective local models, and in turn on the global model, is not considered.
3. When data is shared among several clients, it may be that only one client's copy of the shared data needs to be deleted, while the others' copies do not.

[4] points out that retraining from scratch after deleting the requested data is too expensive; it is therefore argued that the only feasible way to perform federated unlearning in practice is an approximate algorithm. However, due to the particularities of federated learning, the approximate algorithms of conventional machine unlearning cannot be applied directly to federated unlearning; the main problem is how to separate out the influence of particular data after server-side aggregation.

[3] Assumptions:
The data requested to be deleted is located on the same client.
The server stores the parameter update history of every client (reasonable or not).
Horizontal federated learning is considered: the sample spaces are completely different, and the possibility that several clients share some data — with all the problems deleting such shared data would cause — is not considered.

[4] Assumption:
All clients participate in training again and start from scratch

[5] Assumptions:
The data requested to be deleted is located on the same client.
The datasets are assumed to have the same distribution (whereas federated learning datasets are often non-IID).
A client is assumed to exit training entirely (in reality, often only part of a client's data is removed, not all of it), and the server removes all of that client's historical averaged updates from the global model.

[6] Assumption:
The influence of a data item is deleted on every client that owns it, and its influence on the global model is then deleted.


  1. P. Voigt and A. Bussche, The EU General Data Protection Regulation (GDPR): A Practical Guide. Springer, 2017.

  2. E. Harding, J. J. Vanto, R. Clark, L. H. Ji, and S. C. Ainsworth, “Understanding the scope and impact of the California Consumer Privacy Act of 2018,” Journal of Data Protection & Privacy, 2019.

  3. C. Xie, K. Huang, P.-Y. Chen, and B. Li, “DBA: Distributed backdoor attacks against federated learning,” in Proceedings of ICLR, 2020.

  4. E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to backdoor federated learning,” in Proceedings of AISTATS, 2020, pp. 2938–2948.

  5. C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated learning in sybil settings,” in Proceedings of RAID, 2020, pp. 301–316.

  6. X. Xu, J. Wu, M. Yang, et al., “Information leakage by model weights on federated learning,” in Proceedings of the Workshop on Privacy-Preserving Machine Learning in Practice, 2020, pp. 31–36.

  7. A. Ginart, M. Guan, G. Valiant, and J. Y. Zou, “Making AI Forget You: Data Deletion in Machine Learning,” in Proc. 33rd Conference on Neural Information Processing Systems (NeurIPS), vol. 32, 2019.

  8. L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine Unlearning,” in Proc. 42nd IEEE Symposium on Security and Privacy (S&P), 2021, pp. 141–159.

  9. G. Liu, X. Ma, Y. Yang, C. Wang, and J. Liu, “FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models,” in Proc. IEEE/ACM International Symposium on Quality of Service (IWQoS), pp. 1–10, 2021.

  10. N. Su and B. Li, “Asynchronous Federated Unlearning,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), New York, USA, May 17–20, 2023.

  11. Y. Liu, L. Xu, X. Yuan, C. Wang, and B. Li, “The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), pp. 1749–1758, 2022.

  12. C. Wu, S. Zhu, and P. Mitra, “Federated Unlearning with Knowledge Distillation,” arXiv preprint arXiv:2201.09441, 2022.

  13. A. Halimi, S. Kadhe, A. Rawat, and N. Baracaldo, “Federated Unlearning: How to Efficiently Erase a Client in FL?” Workshop on Updatable Machine Learning (UpML), 2022.
