Differential Privacy and Fairness in Decisions and Learning Tasks: A Survey

1 Summary

  • A review of recent achievements in the intersection of differential privacy and fairness
  • Analyzes why differential privacy may exacerbate unfairness in decision problems and learning tasks
  • Measures to mitigate inequity in differentially private systems
  • Challenges of deploying private models under fairness requirements

2 Introduction

The availability of large datasets and computing resources has driven significant advances in artificial intelligence.

These advances have made artificial intelligence an auxiliary tool for many decision-making and policy operations involving individuals, including legal decisions, loans, recruitment, distribution of benefits, and more.

However, machine learning models are black boxes, which raises two concerns: is the system fair, and will participants' personal information be disclosed?

2.1 Differential privacy

Differential privacy is a privacy-protection technique that has become the preferred choice for privacy deployments in recent years.

It bounds, to a quantifiable degree, the risk that sensitive information about individuals participating in a computation is disclosed.

However, existing studies have shown that differential privacy can exacerbate inequality between different groups: it affects one group more than another.

2.2 Past Perspectives

Differential privacy can exacerbate unfairness between different groups. The emergence of this unfairness is generally attributed to the following two causes:

  1. Differential privacy includes a post-processing step, which reduces error but introduces bias
  2. Differential privacy affects different groups disproportionately because datasets are inherently imbalanced

3 Fairness

In this review, two kinds of fairness are mainly involved: individual fairness and group fairness.

See mainly: [The U.S. Census Bureau Adopts Differential Privacy](The U.S. Census Bureau Adopts Differential Privacy | Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining)

3.1 Individual fairness

Definition: Similar individuals should be treated similarly.

Consider a mechanism $M$ that maps an input $x \in X$ to an output $y \in Y$. For any inputs $x, x' \in X$, individual fairness requires
$$d_y\big(M(x), M(x')\big) \le d_x(x, x') \tag{1}$$
where $d_x$ and $d_y$ are distance metrics on the input and output spaces, respectively.

When formula (1) holds, the mechanism $M$ satisfies the $(d_x, d_y)$-Lipschitz condition.
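
To make condition (1) concrete, here is a minimal Python sketch (my own illustration, not from the survey) that empirically checks the Lipschitz condition for a toy scoring mechanism; the metrics d_x, d_y and the mechanism M below are illustrative assumptions.

```python
import numpy as np

def d_x(x1, x2):
    # Illustrative input metric: Euclidean distance between feature vectors.
    return np.linalg.norm(x1 - x2)

def d_y(y1, y2):
    # Illustrative output metric: absolute difference between scores.
    return abs(y1 - y2)

def M(x):
    # Toy deterministic scoring mechanism: a weighted average of the features.
    return float(np.array([0.3, 0.4, 0.3]) @ x)

def find_fairness_violation(xs, n_pairs=10_000, seed=0):
    """Sample random pairs and check whether d_y(M(x), M(x')) <= d_x(x, x') ever fails."""
    rng = np.random.default_rng(seed)
    for _ in range(n_pairs):
        x1, x2 = xs[rng.integers(len(xs))], xs[rng.integers(len(xs))]
        if d_y(M(x1), M(x2)) > d_x(x1, x2) + 1e-9:
            return (x1, x2)          # a pair of similar individuals treated too differently
    return None

xs = np.random.default_rng(1).uniform(0.0, 1.0, size=(500, 3))
print("violating pair found:", find_fairness_violation(xs))
```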

An obvious shortcoming of individual fairness is that it is difficult to design an appropriate distance metric for a specific problem.

3.2 Group Fairness

Definition: Group fairness requires that certain statistical properties of any protected group of individuals be similar to those of the population as a whole.

Two classic group fairness criteria are demographic parity and equal opportunity.
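
As a hedged illustration of how these two criteria are usually measured, the sketch below computes the demographic parity gap and the equal opportunity gap from binary predictions; the data and the decision rule are toy assumptions.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """|P(y_hat = 1 | group 0) - P(y_hat = 1 | group 1)|: positive-prediction rates should match."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_pred, y_true, group):
    """Absolute difference in true-positive rates between the two groups."""
    tpr = [y_pred[(group == g) & (y_true == 1)].mean() for g in (0, 1)]
    return abs(tpr[0] - tpr[1])

# Toy predictions for two groups; the decision rule deliberately favors group 0.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
y_true = rng.integers(0, 2, size=1000)
y_pred = (rng.random(1000) < np.where(group == 0, 0.6, 0.4)).astype(int)

print("demographic parity gap:", round(demographic_parity_gap(y_pred, group), 3))
print("equal opportunity gap :", round(equal_opportunity_gap(y_pred, y_true, group), 3))
```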

4 Settings

This review mainly focuses on the inequity generated by differential privacy in decision-making tasks and learning tasks.

Figure: inequity in decision-making and learning tasks.

4.1 Decision task

The mechanism $M$ maps the input $x$ to a privacy-preserving input $\widetilde{x}$ (this can be understood as a perturbed dataset).

Let the decision task be $P: X \rightarrow Y \subseteq \mathbb{R}$. For example, $P$ may represent the funds allocated to different school districts.

The mechanism $M$ may apply a post-processing step $\pi_{\mathcal{K}}$, for example to ensure the non-negativity of the published data.

For an individual $i$, the resulting bias $B$ is
$$B_P^i(M, x) = \mathbb{E}_{\widetilde{x} \sim M(x)}\big[P_i(\widetilde{x})\big] - P_i(x) \tag{2}$$
The bias measures the distance between the expected privacy-preserving decision and the decision computed on the true data.
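
To illustrate equation (2), the following sketch (an assumption-laden toy example, not the survey's setup) estimates $B_P^i$ by Monte Carlo for a linear per-capita funding allocation, comparing the Laplace mechanism with and without a non-negativity clamp as post-processing; all counts, budgets, and privacy parameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true counts for four school districts (numbers of eligible students).
x = np.array([1200.0, 300.0, 40.0, 5.0])
PER_STUDENT_FUNDS = 500.0          # dollars allocated per reported student (illustrative)
EPSILON, SENSITIVITY = 0.1, 1.0    # illustrative privacy parameters

def P(counts):
    # Decision task: a linear allocation, funds proportional to the reported counts.
    return PER_STUDENT_FUNDS * counts

def M(x, post_process=True):
    # Laplace mechanism; post-processing clamps the published counts to be non-negative.
    noisy = x + rng.laplace(scale=SENSITIVITY / EPSILON, size=x.shape)
    return np.maximum(noisy, 0.0) if post_process else noisy

def bias(post_process, n_trials=20_000):
    # Monte Carlo estimate of B_P^i(M, x) = E[P_i(x~)] - P_i(x) for every district i.
    allocations = np.stack([P(M(x, post_process)) for _ in range(n_trials)])
    return allocations.mean(axis=0) - P(x)

print("bias with non-negativity post-processing:", np.round(bias(True), 2))
print("bias without post-processing            :", np.round(bias(False), 2))
```

Because the allocation is linear and the Laplace noise is zero-mean, the unclamped version has bias close to zero for every district, while the non-negativity clamp biases the smallest district upward the most, previewing the post-processing discussion in Section 6.1.2.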

4.2 Learning tasks

Let the classifier be $M$ with weights $\theta$; the weights after differentially private perturbation are $\widetilde{\theta}$.

The input is a tuple $(x, a, y)$, where $x$ is the feature vector, $a$ is the protected attribute, and $y$ is the label.

The loss function is $\mathcal{L}(\theta; x)$, and the privacy risk is $R(\theta; x_a)$, defined as
$$R(\theta; x_a) = \mathbb{E}_{\widetilde{\theta}}\big[\mathcal{L}(\widetilde{\theta}; x_a)\big] - \mathcal{L}(\theta^*; x_a) \tag{3}$$
where $\theta^*$ denotes the optimal weights at convergence and $x_a$ is the subset of $x$ whose sensitive attribute equals $a$.
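
To give equation (3) a concrete reading, here is a small sketch (illustrative only) that estimates the per-group excess risk of a logistic model whose converged weights are perturbed with Gaussian noise, a crude stand-in for a differentially private training procedure; the data, the noise scale, and the assumed $\theta^*$ are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_loss(theta, X, y):
    # Mean logistic loss of a linear model; labels y are in {-1, +1}.
    return np.mean(np.log1p(np.exp(-y * (X @ theta))))

# Synthetic data: group a=1 is smaller and has larger-norm inputs (illustrative only).
n0, n1 = 900, 100
X = np.vstack([rng.normal(0.0, 1.0, (n0, 2)), rng.normal(3.0, 1.0, (n1, 2))])
a = np.array([0] * n0 + [1] * n1)
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)

# Stand-in for the converged non-private weights theta*; assumed to come from prior training.
theta_star = np.array([0.8, 0.8])

def excess_risk(group, noise_scale=0.5, n_samples=5000):
    """Monte Carlo estimate of R(theta; x_a) = E[L(theta~; x_a)] - L(theta*; x_a),
    where theta~ = theta* + Gaussian noise models a private release of the weights."""
    Xa, ya = X[a == group], y[a == group]
    base = logistic_loss(theta_star, Xa, ya)
    noisy = np.mean([logistic_loss(theta_star + rng.normal(0.0, noise_scale, 2), Xa, ya)
                     for _ in range(n_samples)])
    return noisy - base

for g in (0, 1):
    print(f"excess risk for group a={g}: {excess_risk(g):.4f}")
```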

5 Privacy and Fairness: Friend or Foe

There are currently two views: one view is that fairness and privacy complement each other, and the other view is that fairness and privacy are antagonistic.

5.1 Friends

Dwork et al. argue that individual fairness is a generalization of differential privacy. —— Fairness through Awareness

Mahdi et al. found that, in candidate selection problems, when the data satisfies certain constraints on key attributes of each group (such as the mean and variance of qualification scores), using the exponential mechanism yields fair selections. —— Improving Fairness and Privacy in Selection Problems

5.2 Enemies

Bagdasaryan et al. found that the output of differentially private classifiers can create or exacerbate unfairness among groups in a population. —— Differential Privacy Has Disparate Impact on Model Accuracy (neurips.cc)

Pujol et al. found a similar phenomenon in the census dataset. —— Fair decision making using privacy-protected data | Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (acm.org)

6 Why Privacy Affects Fairness

6.1 Decision task

A data-release mechanism $M$ usually consists of the following two steps:

  1. Inject noise into the dataset $x$ and calibrate the perturbed data to obtain the published data $\widetilde{x}$. Calibration is needed because some attributes are always non-negative yet may become negative after perturbation.

  2. Use the published data $\widetilde{x}$ as the input of the decision problem $P$.

The resulting decisions $P(\widetilde{x})$ can affect different groups differently.

Zhu et al. believe that differential privacy causes unfairness in decision-making tasks, which can be attributed to two main reasons:

  • The "shape" of the decision problem $P$
  • The presence of non-negativity constraints in the post-processing step

—— Post-processing of Differentially Private Data: A Fairness Perspective (arxiv.org)

6.1.1 The shape of the decision problem

Tran et al. found that, for decision problems that are linear transformations of the data, calibrating the private data with unbiased noise (e.g., the Laplace mechanism) yields results that are unbiased with respect to the ground truth. But when the decision problem is nonlinear, fairness issues arise. —— Differentially Private Empirical Risk Minimization under the Fairness Lens (neurips.cc)

6.1.2 Effects of post-processing

While post-processing can reduce error, it can also introduce bias and fairness issues. —— Post-processing of Differentially Private Data: A Fairness Perspective (arxiv.org) and Observations on the Bias of Nonnegative Mechanisms for Differential Privacy (arxiv.org)

The more involved the post-processing, the more severe the disparities in variance and bias become.

6.2 Learning tasks

Because a classifier $M$ cannot produce identical outputs on adjacent datasets $X$ and $X'$, perfect fairness cannot be achieved; Cummings et al. argue that exact fairness is unattainable under a (pure) privacy model. —— On the Compatibility of Privacy and Fairness | Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization (acm.org)

Bagdasaryan et al. found that DPSGD disproportionately affects different groups, and they speculate that the size of the protected group plays a key role in exacerbating unfairness during private training. —— Differential Privacy Has Disparate Impact on Model Accuracy (neurips.cc)

Farrand et al. argue that the size of the protected group is not the key factor causing unfairness, since severe fairness problems also arise in datasets whose sample sizes are only slightly imbalanced. They argue instead that the properties of the training data and the characteristics of the model lead to unfairness.

—— Differentially Private Empirical Risk Minimization under the Fairness Lens (neurips.cc); A Fairness Analysis on Private Aggregation of Teacher Ensembles (arxiv.org)

6.2.1 Characteristics of training data

The input norm and the distance to the decision boundary are two key characteristics of the data associated with exacerbated unfairness in private models.

Input norm: groups with larger input norms tend to have a larger Hessian of the loss.

Distance to the decision boundary: the distance from a sample to the decision boundary is related to the Hessian value. Samples close to the decision boundary are less tolerant of the perturbations introduced by the differential privacy algorithm, while samples far from it are more tolerant.
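
These two characteristics are easy to inspect directly for a linear model; the sketch below (toy data and an assumed pretrained weight vector, purely illustrative) reports the mean input norm and the mean distance to the decision boundary per group.

```python
import numpy as np

def group_characteristics(X, a, w, b):
    """Report, per group, the mean input norm and the mean distance to the boundary w.x + b = 0."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)   # geometric distance to the hyperplane
    norms = np.linalg.norm(X, axis=1)
    for g in np.unique(a):
        mask = a == g
        print(f"group {g}: mean ||x|| = {norms[mask].mean():.2f}, "
              f"mean distance to boundary = {dist[mask].mean():.2f}")

# Toy data: group 1 sits further from the origin; w, b stand in for a pretrained linear classifier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(2.0, 1.0, (100, 2))])
a = np.array([0] * 500 + [1] * 100)
group_characteristics(X, a, w=np.array([1.0, 1.0]), b=-1.0)
```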

6.2.2 Characteristics of the model

DPSGD has been found to exacerbate inequity among different groups. DPSGD clips per-example gradients: when the gradients of individuals exceed the clipping bound C, clipping removes part of their signal, and groups with larger gradients lose more, which penalizes them disproportionately. —— Removing Disparate Impact on Model Accuracy in Differentially Private Stochastic Gradient Descent | Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
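
For intuition, here is a simplified sketch of the per-example clipping and noising step at the heart of DPSGD; it is not the cited paper's algorithm, and the group labels are used only to report how much gradient norm each group loses to clipping.

```python
import numpy as np

def dp_sgd_step(per_example_grads, groups, C=1.0, noise_multiplier=1.0, rng=None):
    """One simplified DPSGD aggregation step: clip each per-example gradient to norm C,
    sum, add Gaussian noise, and average. Also reports, per group, the average fraction
    of gradient norm removed by clipping (the source of the disparate impact)."""
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))      # per-example clipping factor
    clipped = per_example_grads * scale[:, None]
    for g in np.unique(groups):
        print(f"group {g}: avg fraction of gradient norm clipped away = "
              f"{1.0 - scale[groups == g].mean():.2f}")
    noise = rng.normal(0.0, noise_multiplier * C, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

# Illustrative gradients: the minority group 1 has much larger gradients and is clipped harder.
rng = np.random.default_rng(1)
grads = np.vstack([rng.normal(0.0, 0.5, (90, 3)), rng.normal(0.0, 3.0, (10, 3))])
groups = np.array([0] * 90 + [1] * 10)
print("noisy averaged update:", np.round(dp_sgd_step(grads, groups, C=1.0), 3))
```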

Tran et al. found that groups with a larger Hessian of the loss tend to be affected more unfairly. —— Differentially Private Empirical Risk Minimization under the Fairness Lens (neurips.cc)

7 Mitigating Inequities in Privacy Models

7.1 Decision task

In the context of distributing funds to school districts, Pujol et al. propose a mechanism that allocates additional budget to affected groups so that, with high probability, every entity receives at least as much funding as it would under the non-private allocation. —— Fair decision making using privacy-protected data | Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (acm.org)

Tran et al. propose solving a proxy problem that approximates the original task while guaranteeing bounded fairness violations. —— Decision Making with Differential Privacy under a Fairness Lens (arxiv.org)

Zhu et al. propose a solution to mitigate the unfairness introduced by post-processing: a near-optimal projection operator that satisfies the feasibility requirements of the allocation problem while substantially reducing unfairness under different fairness measures. —— Post-processing of Differentially Private Data: A Fairness Perspective (arxiv.org)

7.2 Learning tasks

Xu et al. and Ding et al. apply the functional mechanism to logistic regression, achieving fairness and $(\epsilon, \delta)$-DP simultaneously. —— Achieving Differential Privacy and Fairness in Logistic Regression | Companion Proceedings of The 2019 World Wide Web Conference (acm.org); Differentially Private and Fair Classification via Calibrated Functional Mechanism | Proceedings of the AAAI Conference on Artificial Intelligence

Another mitigation associates a different clipping bound with each protected group, limiting the impact of gradient clipping on groups with larger gradients. —— Removing Disparate Impact on Model Accuracy in Differentially Private Stochastic Gradient Descent | Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
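
A minimal sketch of this idea (with arbitrary placeholder bounds, not the cited paper's adaptive procedure) simply looks up a group-specific clipping bound before clipping each per-example gradient.

```python
import numpy as np

def clip_with_group_bounds(per_example_grads, groups, bounds):
    """Clip each per-example gradient to the clipping bound assigned to its group."""
    C = np.array([bounds[g] for g in groups], dtype=float)      # per-example bound
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return per_example_grads * scale[:, None]

# The group with larger gradients gets a looser bound so clipping removes less of its signal.
rng = np.random.default_rng(1)
grads = np.vstack([rng.normal(0.0, 0.5, (90, 3)), rng.normal(0.0, 3.0, (10, 3))])
groups = np.array([0] * 90 + [1] * 10)
clipped = clip_with_group_bounds(grads, groups, bounds={0: 1.0, 1: 4.0})
for g in (0, 1):
    print(f"group {g}: max clipped gradient norm = "
          f"{np.linalg.norm(clipped[groups == g], axis=1).max():.2f}")
```

In a DPSGD-style analysis the added noise would then have to be calibrated to the largest bound, since that bound determines the sensitivity of the summed gradient.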

Zhang et al. point out that the number of training iterations is a key factor in balancing utility, privacy, and fairness. —— Balancing Learning Model Privacy, Fairness, and Accuracy With Early Stopping Criteria | IEEE Journals & Magazine | IEEE Xplore

The above methods are used to protect each piece of data in the dataset.

Jagielski et al. raise the question of whether it suffices to protect only the group attributes (such as gender or race) rather than every attribute. —— Differentially Private Fair Learning (mlr.press)

Under this more relaxed privacy setting, a DP variant of the post-processing method of Hardt et al., which applies different decision thresholds to different groups, can remove the disproportionate impact of DP on different groups. —— Equality of Opportunity in Supervised Learning (neurips.cc)

Agarwal et al. augment the loss function with penalties for fairness violations. —— A Reductions Approach to Fair Classification (mlr.press)

Both approaches above consume a larger privacy budget. To address this, Mozannar et al. employ randomized response to protect the sensitive group labels. —— Fair Learning with Private Demographic Data (mlr.press)
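
For intuition, a minimal sketch of binary randomized response applied to the protected attribute is shown below; it conveys the spirit of protecting only the sensitive group labels and is not Mozannar et al.'s full method.

```python
import numpy as np

def randomized_response(a, epsilon, rng=None):
    """Binary randomized response: keep the true protected attribute with probability
    e^eps / (1 + e^eps) and flip it otherwise, which satisfies epsilon-DP for that attribute."""
    rng = rng or np.random.default_rng(0)
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(len(a)) < keep_prob
    return np.where(keep, a, 1 - a)

a_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])   # true sensitive group labels (toy example)
print("privatized labels:", randomized_response(a_true, epsilon=1.0))
```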

In the field of federated learning, there also exist methods to achieve both fairness and privacy.

Abay et al. propose several pre- and in-processing bias mitigation solutions to improve fairness without compromising data privacy. ——Mitigating Bias in Federated Learning (arxiv.org)

Padala et al. propose training each client in two stages. The client first trains a non-private model that achieves the highest possible accuracy while satisfying the fairness constraint.

A private model is then trained with DPSGD to mimic the first model; the updates of the private model are sent to the central server at each iteration.

8 Challenges and research directions

8.1 Challenges

(1) There is no unified theoretical framework to describe the fairness problems that arise in general decision-making tasks and analyze their causes.

(2) Fairness is also affected by key hyperparameters, such as batch size, learning rate, and the depth of the neural network.

8.2 Research direction

(1) Privacy and fairness are linked to model robustness, but an understanding of this link is currently lacking.

(2) Algorithms and generative models that generate private synthetic datasets may have a disproportionate impact. How should these impacts be eliminated?
