[Federated Learning] Reading Notes (2): Privacy-Preserving Techniques

This article covers three privacy-preserving techniques:

  • Secure multi-party computation
  • Homomorphic encryption
  • Differential privacy

1. Secure multi-party computation

Secure multi-party computation originated from the secure two-party computation problem known as the "millionaires' problem", proposed and formalized by Andrew Yao (Yao Qizhi) in 1982.

Secure multi-party computation allows us to compute a function over private inputs such that each party learns only its own output of the function, and nothing about the other parties' inputs or outputs.

Secure multi-party computation can be realized through three different frameworks:

  • Oblivious Transfer (OT)
  • Secret Sharing (SS)
  • Threshold Homomorphic Encryption (THE)

(1) Oblivious transfer

In oblivious transfer, the sender holds a database of "message-index" pairs \left ( M_{1},1 \right ), \dots ,\left ( M_{N},N \right ). In each transfer, the receiver chooses an index i satisfying 1 \leq i \leq N and receives M_{i}. The receiver learns nothing else about the sender's database, and the sender learns nothing about the receiver's choice of i.
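As a minimal sketch, the 1-out-of-2 case can be realized with a Diffie-Hellman-style construction in the spirit of the Chou-Orlandi "simplest OT" protocol. The toy-sized modulus and lack of authentication below are simplifications for illustration, not a secure implementation:

```python
import hashlib
import secrets

# Toy Diffie-Hellman-based 1-out-of-2 oblivious transfer (Chou-Orlandi
# style). The small Mersenne-prime modulus and missing authentication make
# this a teaching sketch only, NOT a production protocol.
P = 2**127 - 1  # a Mersenne prime, far too small for real security
G = 3

def _key(x: int) -> bytes:
    return hashlib.sha256(x.to_bytes((P.bit_length() + 7) // 8, "big")).digest()

def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, key))

def oblivious_transfer(m0: bytes, m1: bytes, choice: int) -> bytes:
    """Sender holds (m0, m1); receiver learns m_choice and nothing else."""
    a = secrets.randbelow(P - 2) + 1                 # sender's ephemeral secret
    A = pow(G, a, P)                                 # sender -> receiver
    b = secrets.randbelow(P - 2) + 1                 # receiver's ephemeral secret
    # The blinded value B hides the choice bit from the sender.
    B = pow(G, b, P) if choice == 0 else (A * pow(G, b, P)) % P
    # Sender derives two keys; exactly one matches the receiver's key.
    k0 = _key(pow(B, a, P))
    k1 = _key(pow(B * pow(A, -1, P) % P, a, P))
    c0, c1 = _xor(m0, k0), _xor(m1, k1)              # sender -> receiver
    kr = _key(pow(A, b, P))                          # receiver's single key
    return _xor(c0 if choice == 0 else c1, kr)

print(oblivious_transfer(b"secret-0", b"secret-1", 1))  # b'secret-1'
```

The receiver's blinded value B reveals nothing about the choice bit, while the sender's two derived keys ensure only the chosen ciphertext is decryptable.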

(2) Secret sharing

Secret sharing hides a secret value by splitting it into multiple random shares and distributing those shares to different parties. Each party therefore holds only a single share, which is just one piece of the secret. Depending on the scheme, either all of the shares or a threshold number of them are required to reconstruct the original secret.

Secret sharing schemes mainly include arithmetic (additive) secret sharing, Shamir secret sharing, and binary (Boolean) secret sharing.
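Arithmetic (additive) secret sharing, for instance, can be sketched in a few lines; the modulus and function names below are illustrative:

```python
import secrets

# Additive secret sharing over the integers modulo a public prime: a minimal
# sketch. The secret is split into n random shares that sum to the secret;
# any n-1 shares look uniformly random, so all n are needed to reconstruct.
P = 2**61 - 1  # public modulus (a Mersenne prime, chosen for convenience)

def share(secret: int, n: int) -> list[int]:
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)  # last share fixes the sum
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

print(reconstruct(share(42, 3)))  # 42
```

Because the shares combine linearly, parties can compute shares of a sum of secrets purely locally, which is what makes this scheme convenient for arithmetic computation.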

(3) The application of secure multi-party computing in PPML

Most PPML methods based on secure multi-party computation use a two-phase architecture consisting of an offline and an online phase. Most cryptographic operations are carried out in the offline phase, which is also where the multiplication triples are generated. The machine learning model is then trained in the online phase using the multiplication triples produced in the offline phase.
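A minimal sketch of how a Beaver multiplication triple \left ( x, y, z \right ) with z = xy is consumed in the online phase. The trusted dealer below stands in for the offline phase and is an illustrative simplification:

```python
import secrets

# Two-party multiplication of additively shared values using a Beaver triple.
P = 2**61 - 1  # public modulus for the additive sharing

def share2(v: int) -> tuple[int, int]:
    r = secrets.randbelow(P)
    return r, (v - r) % P

def beaver_multiply(a: int, b: int) -> int:
    # "Offline phase": a trusted dealer (illustrative stand-in) produces
    # shares of a random triple (x, y, z) with z = x * y mod P.
    x, y = secrets.randbelow(P), secrets.randbelow(P)
    z = x * y % P
    a0, a1 = share2(a); b0, b1 = share2(b)
    x0, x1 = share2(x); y0, y1 = share2(y)
    z0, z1 = share2(z)
    # Online phase: parties open e = a - x and f = b - y. These reveal
    # nothing about a and b because x and y are uniformly random masks.
    e = (a0 - x0 + a1 - x1) % P
    f = (b0 - y0 + b1 - y1) % P
    # Each party computes its share of a*b locally; party 0 adds e*f once,
    # since a*b = z + f*x + e*y + e*f.
    c0 = (f * x0 + e * y0 + z0 + e * f) % P
    c1 = (f * x1 + e * y1 + z1) % P
    return (c0 + c1) % P

print(beaver_multiply(6, 7))  # 42
```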

 

2. Homomorphic encryption

Homomorphic encryption is increasingly regarded as a viable way to realize secure multi-party computation in PPML; it is a ciphertext-computation approach that does not require decrypting the ciphertext.

A homomorphic encryption scheme H is an encryption scheme that allows specific algebraic operations on the plaintext to be carried out by performing efficient operations on the corresponding ciphertext (without knowledge of the decryption key). A homomorphic encryption scheme H consists of a four-tuple:

H = \left \{ KeyGen, Enc, Dec, Eval \right \}

In the formula, KeyGen represents the key generation function. For asymmetric homomorphic encryption, a key generator g is given to KeyGen as input, and a key pair \left \{ pk,sk \right \} = KeyGen(g) is output, where pk is the public key used to encrypt plaintext and sk is the secret key used for decryption. For symmetric homomorphic encryption, only a single key sk = KeyGen(g) is generated.

Enc represents the encryption function. For asymmetric homomorphic encryption, the encryption function takes the public key pk and plaintext m as input and produces a ciphertext c = Enc_{pk}\left ( m \right ) as output. For symmetric homomorphic encryption, encryption takes the secret key sk and plaintext m as input and generates the ciphertext c = Enc_{sk}\left ( m \right ).

Dec represents the decryption function. For both asymmetric and symmetric homomorphic encryption, the secret key sk and ciphertext c are taken as input to recover the corresponding plaintext m = Dec_{sk}\left ( c \right ).

Eval represents the evaluation function. Eval takes ciphertexts and the public key pk (for asymmetric homomorphic encryption) as input, and outputs a ciphertext corresponding to the result of the computation on the underlying plaintexts.
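To make the four-tuple concrete, here is a from-scratch sketch of the Paillier cryptosystem, a classic additively homomorphic scheme in which multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The hard-coded toy primes are an assumption for readability and far too small for real security:

```python
import math
import secrets

# Paillier cryptosystem sketch: KeyGen, Enc, Dec, and a multiplicative Eval
# that realizes homomorphic addition. Toy primes only; real keys use
# primes of roughly 1024 bits or more.
P_PRIME, Q_PRIME = 104729, 104723  # illustrative toy primes

def keygen():
    n = P_PRIME * Q_PRIME
    lam = math.lcm(P_PRIME - 1, Q_PRIME - 1)
    g = n + 1                                   # standard simple generator
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)                 # (public key, secret key)

def encrypt(pk, m):
    n, g = pk
    r = secrets.randbelow(n - 1) + 1            # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(sk, c):
    lam, mu, n = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

pk, sk = keygen()
n = pk[0]
c = encrypt(pk, 15) * encrypt(pk, 27) % (n * n)  # Eval: homomorphic addition
print(decrypt(sk, c))  # 42
```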

Classification of homomorphic encryption. Homomorphic encryption is mainly divided into three categories:

  • Partially homomorphic encryption (PHE). In a partially homomorphic encryption scheme, \left ( M, \odot _{M} \right ) and \left ( C, \odot _{C} \right ) are groups. The operator \odot _{C} can be applied to ciphertexts an unlimited number of times. PHE is a group homomorphism technique; in particular, if \odot _{M} is addition, the scheme is called additively homomorphic.
  • Somewhat homomorphic encryption (SHE). Somewhat homomorphic encryption means that certain operations (such as addition and multiplication) can only be performed a limited number of times. SHE schemes rely on noise for security: each operation on a ciphertext increases the noise it carries, with multiplication being the main driver of noise growth. Once the noise exceeds an upper bound, decryption no longer yields the correct result. This is why most SHE schemes must limit the number of computation operations.
  • Fully homomorphic encryption (FHE). Fully homomorphic encryption allows an unlimited number of addition and multiplication operations on ciphertexts. Because homomorphic schemes are malleable by design, no FHE scheme can achieve security against adaptive chosen-ciphertext attacks (IND-CCA2); FHE schemes aim at most for security against non-adaptive chosen-ciphertext attacks.

 

3. Differential privacy

Differential privacy was originally developed to enable analysis of sensitive data while protecting privacy. Its central idea is to confuse an adversary who queries individual information from a database, so that the adversary cannot infer individual-level sensitive information from the query results. Differential privacy provides an information-theoretic guarantee: the output of a function is insensitive to any particular record in the dataset. Differential privacy is therefore used to resist membership inference attacks.

Definition: \left ( \epsilon ,\delta \right )-differential privacy. For any two datasets D and D^{'} that differ in only a single record, a randomized mechanism M preserves \left ( \epsilon ,\delta \right )-differential privacy if, for all S\subset Range\left ( M \right ), we have:

Pr\left [ M\left ( D \right )\in S \right ] \leq Pr\left [ M\left ( D^{'} \right )\in S \right ]\times e^{\epsilon }+\delta

In the formula, \epsilon represents the privacy budget and \delta represents the failure probability.

The value of ln\frac{Pr\left [ M\left ( D \right )\in S \right ]}{Pr\left [ M\left ( D^{'} \right )\in S \right ]} is called the privacy loss, where ln denotes the natural logarithm. When \delta = 0, we obtain the stronger guarantee of \epsilon-differential privacy.
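Randomized response is a classic mechanism that satisfies pure \epsilon-differential privacy (the \delta = 0 case); a minimal sketch, with illustrative function names:

```python
import math
import random

# Randomized response: each respondent answers truthfully with probability p
# and flips the answer with probability 1 - p. The worst-case privacy loss
# is ln(p / (1 - p)), so p = 0.75 yields ln(3)-differential privacy.
def randomized_response(truth: bool, p: float = 0.75) -> bool:
    return truth if random.random() < p else not truth

def epsilon(p: float) -> float:
    # Worst-case privacy loss ln(Pr[answer | truth] / Pr[answer | not truth]).
    return math.log(p / (1 - p))

print(epsilon(0.75))  # ln(3) ≈ 1.0986
```

The mechanism gives every individual plausible deniability while still allowing an analyst to estimate the population-level proportion of "yes" answers.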

Classification of differential privacy methods. There are two main ways to achieve differential privacy by adding noise to the data. One adds noise calibrated to the sensitivity of the function; the other selects noise according to an exponential distribution over discrete values.

The sensitivity of a real-valued function is the maximum amount by which the function's value can change when a single sample is added to or removed from the dataset.
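The first approach is the Laplace mechanism: release f\left ( D \right ) plus Laplace noise of scale sensitivity / \epsilon. A minimal sketch for a counting query, whose sensitivity is 1 because adding or removing one record changes the count by at most 1:

```python
import random

# Laplace mechanism sketch: to release f(D) with epsilon-differential
# privacy, add Laplace noise with scale = sensitivity / epsilon.
def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def laplace_mechanism(true_value: float, sensitivity: float, eps: float) -> float:
    return true_value + laplace_noise(sensitivity / eps)

# Counting query (sensitivity 1) released under a privacy budget of 0.5:
print(laplace_mechanism(1000, sensitivity=1.0, eps=0.5))
```

A smaller \epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy.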

 

Differential privacy algorithms can be classified by where and how the noise perturbation is applied:

  • Input perturbation: noise is added to the training data.
  • Objective perturbation: noise is added to the objective function of the learning algorithm.
  • Algorithm perturbation: noise is added to intermediate values, such as the gradients in an iterative algorithm.
  • Output perturbation: noise is added to the model parameters after training.
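Algorithm perturbation in the DP-SGD style can be sketched as follows: clip each per-example gradient to an L2 norm bound C, average, and add Gaussian noise scaled to C. Function names and constants here are illustrative, not taken from a specific library:

```python
import math
import random

# DP-SGD-style gradient perturbation sketch. Clipping bounds each example's
# influence on the average gradient, so Gaussian noise proportional to the
# clip bound C suffices to mask any individual's contribution.
def clip(grad: list[float], c: float) -> list[float]:
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def noisy_gradient(per_example_grads, c=1.0, noise_multiplier=1.1):
    clipped = [clip(g, c) for g in per_example_grads]
    n = len(clipped)
    sigma = noise_multiplier * c
    # Average each coordinate across examples, then add scaled Gaussian noise.
    return [sum(col) / n + random.gauss(0, sigma) / n
            for col in zip(*clipped)]

grads = [[3.0, 4.0], [0.1, -0.2]]  # two per-example gradients
print(noisy_gradient(grads))
```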

Origin blog.csdn.net/Aibiabcheng/article/details/109956555