Tunable Measures for Information Leakage and Applications to Privacy-Utility Tradeoffs


Introduction

Various information-theoretic quantities have been adopted as leakage measures.

  • The most prominent of these is mutual information (MI) [15]-[24].
  • Similarly, divergence-based quantities (such as the total variation distance between the prior and the posterior distribution) have also been proposed as leakage measures [25].

However, most information-theoretic leakage measures proposed for the privacy problem have no clear operational significance or adversarial model built into their definitions. Recently, information-theoretic formulations have been introduced that capture privacy against "guessing" adversaries.

Here, privacy is measured by the gain an observer obtains in guessing private information after observing the published data.

Maximal leakage (MaxL) quantifies the maximal logarithmic gain in the probability of correctly guessing any function of the original data from the published data [28]. We introduce a tunable loss function, the α-loss (1 ≤ α ≤ ∞), to capture a range of adversarial behaviors. In particular, for α = 1 and α = ∞, the loss function reduces to the logarithmic loss (log-loss) [32]-[34] and to the probability of error, respectively. The choice of loss function captures how the adversary performs inference, i.e., how it refines its posterior belief about one or more sensitive features. The adversarial gain of a computationally unbounded adversary is then the reduction in its average (inferential) loss due to the data release.
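As a rough illustration (not the paper's exact statement; the normalization below follows the commonly cited form of the α-loss and should be checked against the original), the sketch shows how one tunable loss interpolates between log-loss as α → 1 and 1 − P(y) (whose expectation is a probability of error) as α → ∞:

```python
import numpy as np

def alpha_loss(p_y, alpha):
    """alpha-loss of a predictor that assigns probability p_y to the realized symbol y.

    alpha = 1       -> log-loss:        -log p_y
    alpha = inf     -> error-type loss:  1 - p_y
    1 < alpha < inf -> (alpha/(alpha-1)) * (1 - p_y**(1 - 1/alpha))
    """
    if alpha == 1:
        return -np.log(p_y)
    if np.isinf(alpha):
        return 1.0 - p_y
    return alpha / (alpha - 1.0) * (1.0 - p_y ** (1.0 - 1.0 / alpha))

# Intermediate alphas interpolate smoothly between the two extremes.
p = 0.7
for a in [1, 1.01, 2, 10, 100, np.inf]:
    print(f"alpha = {a:>5}: loss = {alpha_loss(p, a):.4f}")
```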

Using the α-loss, we derive two new privacy measures, called α-leakage and maximal α-leakage. Specifically, α-leakage quantifies the adversary's gain in inferring a specific private attribute of the dataset, whereas maximal α-leakage quantifies the adversary's gain in inferring any attribute of the dataset. In particular, maximal α-leakage includes MI and MaxL as the special cases α = 1 and α = ∞, respectively.
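Both endpoints have simple closed forms for a fixed prior and channel, which the sketch below evaluates on an invented toy example: the α = 1 endpoint is the Shannon mutual information I(X;Y), and the α = ∞ endpoint, MaxL, equals log Σ_y max_x P(y|x), with the maximum taken over x of positive prior probability [28]. The prior and channel below are made up purely for illustration.

```python
import numpy as np

# Toy example: uniform prior on X and an invented channel P(Y|X).
p_x = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.9, 0.1],   # rows: x, columns: y
                        [0.3, 0.7]])

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in nats (assumes strictly positive probabilities): the alpha = 1 endpoint."""
    p_xy = p_x[:, None] * p_y_given_x
    p_y = p_xy.sum(axis=0)
    return float(np.sum(p_xy * np.log(p_y_given_x / p_y[None, :])))

def maximal_leakage(p_x, p_y_given_x):
    """MaxL(X -> Y) = log sum_y max_{x: p(x) > 0} P(y|x): the alpha = inf endpoint."""
    support = p_x > 0
    return float(np.log(p_y_given_x[support].max(axis=0).sum()))

print("I(X;Y) (alpha = 1)  :", mutual_information(p_x, p_y_given_x))
print("MaxL   (alpha = inf):", maximal_leakage(p_x, p_y_given_x))
```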

MaxL can be interpreted in terms of an adversary that minimizes a 0-1 loss function [33], [35] (α = ∞); that is, the adversary makes a hard decision via a maximum-likelihood estimator. On the other hand, we show that when MI is used as the leakage measure (α = 1), the underlying loss function is the log-loss, which models a (soft-decision) belief-refining adversary.
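To make these two endpoint adversaries concrete, the sketch below (using an invented joint distribution) contrasts a hard-decision adversary, which outputs a MAP estimate of X and is scored under 0-1 loss, with a soft-decision adversary, which outputs its posterior belief and is scored under log-loss; the reduction in expected log-loss is exactly I(X;Y).

```python
import numpy as np

# Invented joint distribution p(x, y), for illustration only.
p_xy = np.array([[0.40, 0.10],
                 [0.05, 0.45]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
posterior = p_xy / p_y[None, :]          # column y holds P(X | Y = y)

# alpha = inf adversary: hard (MAP) decision, evaluated under 0-1 loss.
hard_guess = posterior.argmax(axis=0)            # one estimate of X per observed y
err_before = 1.0 - p_x.max()                     # best blind guess
err_after = 1.0 - p_xy.max(axis=0).sum()         # MAP guess after seeing Y

# alpha = 1 adversary: soft decision (the posterior itself), evaluated under log-loss.
logloss_before = -np.sum(p_x * np.log(p_x))          # H(X)
logloss_after = -np.sum(p_xy * np.log(posterior))    # H(X|Y)

print("MAP guess per y:", hard_guess)
print("expected 0-1 loss:", err_before, "->", err_after)
print("expected log-loss:", logloss_before, "->", logloss_after,
      "(reduction = I(X;Y))")
```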

Beyond what the adversary observes (for example, a released census dataset, or information leaked through a side channel), it may also have access to correlated side information (for example, a voter-registration database, or per-user information in a side-channel attack). As the authors recently showed in [36], α-leakage and maximal α-leakage can indeed be generalized to model such side information; that generalization, however, is beyond the scope of this article. The measures we propose apply to both the data-publishing and side-channel settings mentioned above.

In most non-trivial data-publishing settings there is a fundamental privacy-utility tradeoff (PUT): on the one hand, publishing the data "as is" may enable unwanted inferences about private information; on the other hand, perturbing or restricting the published data reduces its quality. We consider two data models: one in which the entire dataset is sensitive (Figure 1a), and one in which only part of the dataset is sensitive (Figure 1b). Throughout this article, X denotes the original data and Y the (randomly mapped) published data.
[Figure 1: (a) the entire dataset X is sensitive; (b) the dataset X is distinct from the sensitive features S.]

X can be entirely sensitive, as in Figure 1a, or distinct from the sensitive features S, as in Figure 1b. The variable U denotes a specific feature of the dataset that the adversary is interested in learning. Examples of datasets in which all of the data is sensitive include data collected by smart devices (e.g., smartphone sensors, movie recommendation systems), where it is hard to know a priori which aspects of the data should be declared sensitive. Conversely, examples of datasets with clearly defined sensitive features include censuses and other datasets that explicitly contain personally identifiable information.

The exact nature of the PUT depends entirely on how privacy and utility are measured. To understand our new privacy measures, we study the PUTs that arise when (maximal) α-leakage is used as the privacy measure, and we consider a variety of utility measures. In general, a meaningful utility measure (between the original data and the published data) should require the published data to provide either of the following:

  • an average-case fidelity guarantee [18], [25], [27], [37], [38]; or
  • a worst-case fidelity guarantee.

Average distortion constraints are well studied in rate-distortion theory. To capture worst-case utility requirements, we introduce a hard distortion measure, which constrains the privacy mechanism so that the distortion between the original and the published dataset is bounded with probability 1. Distortion constraints of this kind have also been studied in rate-distortion theory. The hard distortion measure is very stringent, but it allows the data curator to give deterministic guarantees on the fidelity of the released dataset relative to the original. Such deterministic guarantees can, in turn, yield more accurate statistical estimates, such as empirical distribution estimates computed from a publicly released dataset (e.g., a census).
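As a minimal sketch of what a hard-distortion constraint means operationally (a hypothetical mechanism, not one proposed in the paper): the randomized mapping from X to Y is only allowed to place probability mass on outputs within distortion D of the input, so d(X, Y) ≤ D holds with probability 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_distortion_mechanism(x, alphabet, d, D):
    """Release Y for input x under the hard constraint d(x, Y) <= D with probability 1.

    Hypothetical mechanism for illustration: it randomizes uniformly over the
    feasible outputs; the paper's optimal mechanisms are problem-specific.
    """
    feasible = [y for y in alphabet if d(x, y) <= D]
    return rng.choice(feasible)

# Example: integer-valued data with absolute-difference distortion and budget D = 2.
alphabet = list(range(10))
d = lambda x, y: abs(x - y)
D = 2
samples = [hard_distortion_mechanism(7, alphabet, d, D) for _ in range(1000)]
assert all(d(7, y) <= D for y in samples)   # the bound holds on every release
print(sorted(set(samples)))                 # only outputs in {5, 6, 7, 8, 9}
```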


Source: blog.csdn.net/weixin_42253964/article/details/107736406