Privacy computing is growing into a market worth hundreds of billions. This article explains its technical and theoretical foundations.

1. Secure multi-party computation

Before discussing secure multi-party computation (hereafter MPC), we first describe its setting. Among the participants in an MPC protocol, some may be controlled by an adversary (attacker). A participant under the adversary's control is called a corrupted party; it follows the adversary's instructions and attacks the protocol during its execution. A secure protocol should withstand attacks by any adversary. To formally describe and prove that a protocol is "secure", we need to define the security of MPC precisely.

1. Security

The security of secure multi-party computation is defined through the ideal/real world paradigm (Ideal/Real Paradigm). In the ideal world, there is a trusted third party (TTP). Each participant provides its secret data to the trusted third party through a secure channel, the third party computes the function on the joint data, and after completing the computation it sends the output to each participant. Since the only action each participant performs during the computation is to send secret data to the trusted third party, the only action an attacker can take is to choose the inputs of the corrupted participants, and the attacker cannot obtain any information other than the computation results.
Corresponding to the ideal world is the real world. In the real world there is no trusted third party: participants execute the protocol without the help of any external node, and some of them may be "corrupted" by an attacker or collude with it. A protocol that is secure in the real world must therefore withstand any attack by a real-world adversary. The protocol is secure if, when the same adversary attacks in the ideal world, the joint distribution of the inputs/outputs of the adversary and the honest participants in the ideal-world execution is computationally indistinguishable from the joint distribution of inputs/outputs in the real-world execution; that is, a real-world protocol execution can be simulated in the ideal world.
The ideal/real world paradigm ensures that the multiple properties implied by "security" are met.

  1. Privacy: No party should obtain more than its specified output; in particular, no party should be able to infer the inputs of other participants from its own output. In the ideal world, the attacker cannot obtain any information other than the output of the corrupted parties, and the same must hold in the real world.
  2. Correctness: Each party is guaranteed to receive a correct output. The output obtained by an honest participant in the real world is the same as the output it would obtain from the trusted third party in the ideal world.
  3. Input independence: The inputs chosen by corrupted participants must be independent of the inputs of honest participants. In an ideal-world execution, a corrupted participant learns nothing about any honest participant's input when sending its own input to the trusted third party.
  4. Guaranteed Output: Corrupted participants should not have the ability to prevent honest participants from obtaining the output.
  5. Fairness: Corrupted participants obtain the output if and only if honest participants obtain the output; there is no situation in which a corrupted participant obtains the output while an honest participant does not. In the ideal world, the trusted third party always returns the output to all participants, so guaranteed output delivery and fairness hold. This also means that in the real world, honest participants get the same output as in the ideal world.

2. Participants

We also need to define the participants of MPC: the participants are the parties engaged in the protocol, and each participant can be abstracted as an interactive Turing machine running a probabilistic polynomial-time algorithm (PPT algorithm). Based on the adversary's ability to control participants during protocol execution, corrupted participants can be divided into three adversary types.

  1. Semi-honest adversary: This type of participant performs every step as required by the protocol. However, a semi-honest adversary tries to collect all information produced during the execution of the protocol (including the execution transcript and all received messages) and to derive additional private information from it.
  2. Malicious adversary: During protocol execution, this type of participant follows the attacker's instructions completely. It not only leaks all inputs, outputs, and intermediate results to the attacker, but may also, according to the attacker's intentions, change its input, falsify intermediate and output information, and even terminate the protocol.
  3. Covert adversary: This type of adversary may attack the protocol maliciously, but once it launches an attack there is a certain probability that it will be detected. If it is not detected, it may have completed a successful attack (i.e., obtained additional information).

Therefore, based on the real-world attack behaviors of participants, the security model of a secure multi-party computation protocol can be divided as follows.

1) The semi-honest model: Participants execute the protocol according to its specification, but an attacker may observe, through the participants it controls, their inputs, outputs, and any information obtained while the protocol runs.
2) The malicious model: During protocol execution, the attacker can use the participants under its control to probe the private information of honest participants through illegal inputs or malicious tampering with inputs, and can even cause the protocol to terminate by aborting early or refusing to participate.
In addition, the adversary's corruption strategy can be divided into the following three models based on when and how the adversary controls the participants.
1) Static corruption model: In this model, the adversary controls a fixed set of participants before the protocol begins. An honest participant stays honest, and a corrupted participant stays corrupted.
2) Adaptive corruption model: The adversary can decide on its own when to corrupt which participant. Note that once a participant is corrupted, it remains corrupted.
3) Proactive security model: An honest participant may be corrupted for a certain period of time, and a corrupted participant may also become honest again. The proactive security model takes the perspective of an external adversary who may compromise a network, service, or device: when the compromise is repaired, the adversary loses control of the machine and the corrupted participant becomes honest again.

In the real world, an MPC protocol does not run in isolation: it usually runs as a module composed sequentially within a larger protocol, or in parallel with other protocols.

Research has proven that if an MPC protocol runs sequentially as a subprotocol within a larger protocol, its security still follows the ideal/real world paradigm; that is, it can be treated as if a trusted third party executed it and returned the results. This theorem is called "modular composition". It allows larger protocols to be constructed in a modular manner from secure subprotocols, and larger systems that use MPC for certain computations to be analyzed accordingly.
For protocols running in parallel: if the protocol does not require any other concurrently running protocol to send it messages, the setting is called the stand-alone (independent) setting. This is also the basic setting for MPC security definitions. In the stand-alone setting, a protocol running in parallel with others behaves the same as if a trusted third party executed it.
Finally, in other scenarios, an MPC protocol may run in parallel with other instances of itself, with other MPC protocols, or with insecure protocols, and its instances may need to interact with those other instances. The protocol may no longer be secure in this case: in the ideal world the protocol includes no interaction with other functionalities, but in the real world it must interact with another functionality, so the execution conditions differ from the ideal-world simulation (the real world here can be called a hybrid world). In this case, the mainstream approach is to adopt "universal composability" (UC) as the security definition. Any protocol proven secure under this definition is guaranteed to behave according to its ideal functionality, regardless of which other protocols it is executed in parallel with.
The security definition of MPC plays an important role in practice. If an MPC protocol is secure in the real world, a practitioner who uses it only needs to consider its ideal-world behavior; that is, a non-cryptographer using the MPC protocol does not need to worry about how the protocol operates internally or whether it is secure, because the ideal model provides a clearer and simpler abstraction of the MPC functionality.
Although the ideal model provides a simple abstraction, in some cases it is prone to the following problems.

  1. In the real world, an adversary may submit any value as its input, and MPC has no general way to prevent this. For example, in the "millionaires" problem, the adversary can set the wealth of a corrupted participant arbitrarily (e.g., to the maximum value), so the corrupted party will always "win". If an application of MPC relies on correct inputs from participants, the correctness of those inputs needs to be enforced or validated through other techniques.
  2. MPC only guarantees the security of the computation process; it cannot guarantee the security of the output. After the output of an MPC protocol is revealed to the participants, the output itself may leak information about other participants' inputs. For example, when computing the average salary of two participants, MPC can ensure that nothing other than the average is output; however, each participant can then compute the other's salary from its own salary and the average. Using MPC therefore does not mean that all information is protected, as the sketch after this list shows.
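A minimal illustration of this output leakage (plain Python; no MPC machinery is needed, since the leak comes from the revealed result itself):

```python
# Two-party average-salary example: the protocol reveals only the average,
# yet each party can reconstruct the other's input from it.
alice_salary = 50_000            # known only to Alice
average = 65_000                 # the securely computed, public output

# Alice recovers Bob's input using just her own data plus the output:
bob_salary = 2 * average - alice_salary
print(bob_salary)                # 80000 -- Bob's "protected" salary
```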

In practice, considering the computational and communication overhead of MPC, the semi-honest model is usually adopted as the main security setting. Therefore, the MPC protocols discussed here are mainly semi-honest protocols. Although some MPC protocols support both semi-honest and malicious security, this article still focuses mainly on the semi-honest setting.

2. Cryptography

Cryptography is an important foundation for privacy computing technology and is often used in various technical routes of privacy computing. The theoretical system of cryptography is very large and complex. Interested readers can refer to books such as "Modern Cryptography and Its Applications" to expand their learning. This article only briefly introduces the basic knowledge of cryptography and cryptographic primitives commonly used in privacy computing.
In the MPC protocol, two data encryption methods are often used: symmetric encryption and asymmetric (public key) encryption.

  1. Symmetric encryption is the older of the two and its technology is mature. Because the same key is used for both encryption and decryption, it is called symmetric encryption. Common symmetric algorithms include DES, AES, and IDEA.
  2. Asymmetric encryption is also called public-key encryption. Unlike symmetric encryption, an asymmetric algorithm requires two keys that appear in pairs: a public key and a private key. The private key is kept by its owner and never disclosed; the public key can be obtained by anyone. Data is usually encrypted with the public key and decrypted with the private key. Asymmetric encryption has another use, the digital signature: the private key signs the data and the public key verifies the signature. Digital signatures let public-key holders verify the identity of the private-key holder and prevent content published by the private-key holder from being tampered with. Common asymmetric algorithms include RSA, ElGamal, D-H, and ECC. A short sketch of both uses follows this list.
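A quick illustration of public-key encryption and signing, as a sketch using the third-party Python `cryptography` package (the key size and padding choices here are illustrative, not a recommendation):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Key pair: the private key stays with its owner, the public key is shared.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Encrypt with the public key, decrypt with the private key.
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = public_key.encrypt(b"secret data", oaep)
assert private_key.decrypt(ciphertext, oaep) == b"secret data"

# Digital signature: sign with the private key, verify with the public key.
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = private_key.sign(b"published content", pss, hashes.SHA256())
# verify() raises InvalidSignature if the content or signature was tampered with.
public_key.verify(signature, b"published content", pss, hashes.SHA256())
```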

1. Elliptic curve encryption

Elliptic curve encryption is a public-key technique that is often combined with other public-key algorithms to obtain corresponding elliptic-curve versions of those algorithms. It is generally believed that elliptic curves achieve higher security with shorter keys.
Elliptic curve encryption is based on the discrete logarithm problem obtained by restricting the elliptic curve additive group from the real field to a prime field. An elliptic curve over the real field can usually be defined by a bivariate cubic equation. Take the commonly used Weierstrass form as an example:
$$y^2 = x^3 + ax + b$$
Here a and b are configurable parameters. All solutions of the elliptic curve equation (all points of the two-dimensional plane lying on the curve), plus a point O at infinity (the identity element of the group), constitute the element set of the elliptic curve. Defining on this point set an addition operation and an inverse-element operation that satisfy closure, commutativity, and associativity yields the elliptic curve additive group. Varying the parameters a and b gives different elliptic curve groups, as shown in the figure.

(Figure: elliptic curves for different choices of a and b)

For two points P and Q on the elliptic curve, first take the straight line through them and its intersection point R with the curve. The sum of P and Q on the elliptic curve is then the reflection of R about the x-axis. This definition of addition covers several cases, as shown in the figure.
(Figure: the cases of elliptic curve point addition)

  1. If P and Q are not tangent points of the line and are not inverses of each other, the line through them has a third intersection point R with the curve, and P+Q = -R.

  2. If P or Q is a tangent point of the line (assume it is Q), the line through P and Q is tangent to the curve at Q, so R = Q and P+Q+Q = O; that is, the addition result is P+Q = -Q.

  3. If the line through P and Q is perpendicular to the x-axis, there is no third intersection with the curve; the intersection is taken to be the point at infinity, that is, P+Q = O.

  4. If P = Q, the line is taken to be the tangent to the curve at P; if the tangent intersects the curve at a point R, the result is -R, otherwise the intersection is taken to be the point at infinity.

In terms of implementation, elliptic curve addition first computes the slope of the line through P and Q:

$$\lambda = \frac{y_Q - y_P}{x_Q - x_P}$$

Then the coordinates of the sum are computed via Vieta's formulas:

$$x_R = \lambda^2 - x_P - x_Q, \qquad y_R = \lambda(x_P - x_R) - y_P$$

If P and Q are the same point, the slope is instead

$$\lambda = \frac{3x_P^2 + a}{2y_P}$$

and the coordinates of the result are then computed with the same formulas as above.

Scalar multiplication on an elliptic curve (also called point multiplication) is implemented through repeated point addition. For example, nP means adding the point P to itself n times:
$$nP = \underbrace{P + P + \cdots + P}_{n}$$

Since the elliptic curve additive group satisfies the commutative and associative laws, scalar multiplication can be optimized with the double-and-add algorithm.

When applying the elliptic curve additive group to encryption, the elements of the curve are usually restricted to a prime field $\mathbb{F}_p$, so that a discrete logarithm problem can be constructed on its scalar multiplication. An elliptic curve over a prime field is defined as follows:
$$y^2 \equiv x^3 + ax + b \pmod{p}$$

The point distribution of the elliptic curve with a = -1, b = 3 over the prime field $\mathbb{F}_p$ is shown below. (Figure: point distribution of the curve over the prime field.)

For points of an elliptic curve defined over the prime field $\mathbb{F}_p$, the addition and scalar multiplication formulas are the same as over the real field, but all computations are performed in the prime field. As shown in Figure 1-3, for P = (16, 20) and Q = (41, 120), the line PQ over the prime field (a line over a prime field is defined as all points satisfying $ax + by + c \equiv 0 \pmod{p}$) intersects the curve at R = (86, 46), and the addition result is -R = (86, 81).
Similarly, scalar multiplication in the prime field follows from the addition rules. When the scalar n is very large, after computing Q = nP, the point Q is used as the public key and n as the private key. Computing n from P and Q constitutes the elliptic curve discrete logarithm problem over the prime field. Clearly, as n ranges over the integers, the points traversed by the scalar multiples of P form a cyclic subgroup of the elliptic curve. To ensure security, the scalar multiples of P must cover enough points (the order of the subgroup must be large enough), so a base point of high order must be found. A toy implementation of these operations appears below.
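A toy sketch of point addition and double-and-add scalar multiplication (plain Python; the curve a = -1, b = 3 over $\mathbb{F}_{127}$ and the points from the example above are used for the check; real implementations use much larger standardized curves and constant-time arithmetic):

```python
# Toy elliptic-curve arithmetic over a prime field F_p.
P_MOD, A = 127, -1           # curve y^2 = x^3 - x + 3 over F_127 (b = 3)
O = None                     # point at infinity (group identity)

def ec_add(P, Q):
    """Add two points using the chord-and-tangent rules."""
    if P is O: return Q
    if Q is O: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % P_MOD == 0:
        return O                                  # P + (-P) = O
    if P == Q:                                    # tangent case
        lam = (3 * P[0] ** 2 + A) * pow(2 * P[1], -1, P_MOD) % P_MOD
    else:                                         # chord case
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, P_MOD) % P_MOD
    x = (lam * lam - P[0] - Q[0]) % P_MOD
    y = (lam * (P[0] - x) - P[1]) % P_MOD
    return (x, y)

def scalar_mul(n, P):
    """Compute nP by double-and-add (scan the bits of n, low to high)."""
    result, addend = O, P
    while n:
        if n & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)           # point doubling
        n >>= 1
    return result

# Check against the worked example from the text: (16,20) + (41,120) = (86,81).
assert ec_add((16, 20), (41, 120)) == (86, 81)
print(scalar_mul(5, (16, 20)))                    # 5P: the public key Q for n = 5
```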
A simple way to find a base point: first determine the order n of the desired subgroup (it must be prime) from the order N of the elliptic curve, compute the cofactor h = N/n, then randomly select a point P on the curve and compute G = hP. If G = O, re-select the point; otherwise G is the base point.
There are several mainstream elliptic curves with parameters considered safe to choose from, such as secp256k1 (used in Bitcoin signatures) and curve25519; the open source framework FATE implements the twisted Edwards curve Edwards25519.

3. Ciphertext calculation

If ciphertexts can be computed on directly, the corresponding encryption technique is called homomorphic encryption. The concept of "homomorphism" comes from homomorphic mappings in abstract algebra: a class of mappings between two algebraic systems (groups/rings/fields) that preserve their operations. That is, for algebraic systems $(A, \cdot)$ and $(B, \times)$, if there is a mapping $F: A \to B$ such that for all $a_1, a_2 \in A$, $F(a_1 \cdot a_2) = F(a_1) \times F(a_2)$, then F is called a homomorphic mapping from A to B.
If a plaintext is encrypted with some algorithm, a ciphertext operation "corresponding" to a plaintext operation is applied to the ciphertext, and decrypting the result agrees with applying the plaintext operation directly, then the encryption algorithm is said to have homomorphic properties and supports homomorphic computation on the plaintext. Homomorphic encryption can be divided into three types according to the type and degree of computation supported: partially homomorphic encryption (PHE), somewhat homomorphic encryption (SWHE), and fully homomorphic encryption (FHE).

(1) Partially homomorphic encryption

Partially homomorphic (semi-homomorphic) encryption refers to encryption algorithms that support only addition or only multiplication, called additively homomorphic and multiplicatively homomorphic respectively. Common PHE algorithms include RSA, ElGamal, ECC-ElGamal, and Paillier: RSA and ElGamal are multiplicatively homomorphic, while ECC-ElGamal and Paillier are additively homomorphic. Paillier is a commonly used additively homomorphic algorithm. It is built on the hardness of the composite residuosity class problem, has proven reliable after years of study, and is frequently used in several open source privacy computing frameworks. The Paillier principle is briefly introduced below.

A PHE scheme usually contains the following functions; a toy Paillier sketch implementing them follows the list.

  1. KeyGen(): Key generation, used to generate the public key pk and private key sk for encrypted data, as well as some public parameters.

  2. Encrypt(): Encryption algorithm, uses pk to encrypt user data m and obtains the ciphertext c.

  3. Decrypt(): Decryption algorithm, use sk to decrypt the ciphertext c, and obtain the original data (plaintext) m.

  4. Add(): Homomorphic addition of ciphertext, input two ciphertexts c1 and c2, and perform homomorphic addition operation.

  5. ScalarMul(): Ciphertext homomorphic scalar multiplication, inputs a ciphertext c and a scalar s, and computes the ciphertext of c multiplied by the scalar.
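A minimal, illustrative Paillier sketch of these five functions (toy primes for readability; real deployments use 2048-bit or larger keys and vetted libraries):

```python
import math, random

def KeyGen(p=1789, q=1861):
    """Toy key generation; p and q must be primes (hard-coded toy values here)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)            # lambda = lcm(p-1, q-1)
    g = n + 1                               # standard simple choice of g
    mu = pow(lam, -1, n)                    # mu = lambda^{-1} mod n
    return (n, g), (lam, mu)                # pk, sk

def Encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)              # random r (gcd check omitted in toy)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def Decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n            # L(u) = (u-1)/n, then * mu mod n

def Add(pk, c1, c2):
    n, _ = pk
    return c1 * c2 % (n * n)                # ciphertext product = plaintext sum

def ScalarMul(pk, c, s):
    n, _ = pk
    return pow(c, s, n * n)                 # ciphertext power = plaintext scaling

pk, sk = KeyGen()
c1, c2 = Encrypt(pk, 15), Encrypt(pk, 27)
assert Decrypt(pk, sk, Add(pk, c1, c2)) == 42        # homomorphic addition
assert Decrypt(pk, sk, ScalarMul(pk, c1, 3)) == 45   # homomorphic scalar mult
```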

(2) Somewhat homomorphic encryption

Somewhat homomorphic encryption (also called leveled, or "approximately fully" homomorphic, encryption) supports both ciphertext addition and ciphertext multiplication, but usually only a limited number of ciphertext multiplications. It is the basis of most fully homomorphic encryption: FHE schemes typically add bootstrapping or progressive modulus switching to a somewhat homomorphic scheme. Fully homomorphic encryption originated with the construction Gentry proposed in 2009, which controls the growth of noise during computation by adding a bootstrapping operation to a somewhat homomorphic scheme. Bootstrapping converts the decryption procedure itself into a homomorphic circuit: a new public/private key pair is generated, the original private key is encrypted under the new key, and the noisy original ciphertext is homomorphically decrypted using the encrypted private key, yielding a new ciphertext whose noise is reset to a low level.

(3) Fully homomorphic encryption

Gentry proposed the first fully homomorphic encryption algorithm in 2009 based on a circuit model; it supports homomorphic addition and multiplication (Boolean operations) on individual bits. Current mainstream homomorphic encryption schemes are built on the Learning With Errors (LWE) or Ring-LWE (RLWE) problems over lattices; both LWE and RLWE can be reduced to hard lattice problems [such as the Shortest Independent Vectors Problem (SIVP)]. However, the LWE problem involves matrix-vector multiplication and is computationally heavier, while RLWE involves only polynomial operations over a ring and has lower computational overhead. Therefore, although mainstream homomorphic algorithms (such as BGV and BFV) can be constructed from either LWE or RLWE, RLWE is the main basis for implementations. In addition, Cheon et al. proposed CKKS in 2017, a fully homomorphic scheme for floating-point numbers. It supports homomorphic addition and multiplication of real or complex numbers, with approximate results, and is generally suitable for scenarios such as machine-learning model training that do not require exact results.

Pseudorandom functions

Another widely used cryptographic primitive in privacy computing is the pseudorandom function (PRF). A pseudorandom function is a deterministic function of the form y = F(k, x), where k is a key in the key space K, x is an element of the input space X, and y is an element of the output space Y. Its security requirement: given a random key k, the function F(k, ·) should look like a random function from X to Y. Goldreich et al. showed that pseudorandom functions can be constructed from a pseudorandom generator. A common practical instantiation is sketched below.
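In practice, HMAC with a strong hash function is commonly modeled as a PRF. A minimal sketch (standard-library Python; the key and inputs are illustrative):

```python
import hmac, hashlib, secrets

def F(k: bytes, x: bytes) -> bytes:
    """Deterministic keyed function y = F(k, x); HMAC-SHA256 is commonly
    modeled as a pseudorandom function."""
    return hmac.new(k, x, hashlib.sha256).digest()

k = secrets.token_bytes(32)      # random key from the key space K
y1 = F(k, b"input-1")            # same (k, x) always yields the same y
y2 = F(k, b"input-2")            # without k, outputs look random
assert F(k, b"input-1") == y1    # deterministic
```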

4. Machine Learning

From the perspective of the computation process that privacy computing protects, privacy-preserving machine learning is a large category. Machine learning can generally be divided into three types by learning method: supervised learning, semi-supervised learning, and unsupervised learning.
Supervised learning is a learning method given labeled training data. Its goal is to learn a model (function) from a given training data set so that when new data appears, a prediction can be made with this function. Common supervised learning algorithms include naive Bayes, logistic regression, linear regression, decision trees, tree ensembles, support vector machines, and (deep) neural networks. However, in many practical problems, because labeling data can be very expensive, usually only a small amount of labeled data and a large amount of unlabeled data are available. Semi-supervised learning is a learning method given a small amount of labeled training data and a large amount of unlabeled data; it uses the unlabeled data to obtain more information about the data's structure, aiming for better results than supervised learning on the labeled data alone. Common semi-supervised strategies include self-training, PU learning, and co-training. Common tasks in unsupervised learning are clustering, representation learning, and density estimation; these tasks seek to understand the intrinsic structure of data without any labels. Common unsupervised algorithms include k-means clustering, principal component analysis, and autoencoders. Since no labels are provided, most unsupervised learning algorithms have no specific method for comparing model performance.
In practical applications, supervised learning has a wide range of applications, so this article focuses on supervised learning to introduce some basic concepts.

1. Loss function

Supervised learning is usually given labeled training data (x, y): x is the input data, usually represented as a vector whose elements are called features; y is the output that the model needs to learn, usually called the label. Training data generally consists of many (x, y) pairs, each called a sample. The inputs of all samples form the input space (feature space) X, and all output labels form the output space Y. According to the distribution of the output space, supervised learning can be divided into classification models and regression models; classification models are further divided into binary and multi-class models according to the cardinality of the output label y.
The goal of supervised learning is therefore to use a learning algorithm to find a model $f(x, w)$ on the training set $X \times Y$ such that the predicted value $\hat{y} = f(x, w)$ agrees with the true output. However, the predicted value $\hat{y}$ may or may not agree with y, so a loss function is needed to quantify the difference between the prediction $\hat{y}$ and the true value y ($\hat{y} - y$ is generally called the residual). The loss function $L(y, f(x, w))$ is a non-negative real-valued function and is defined differently for different supervised learning tasks. Commonly used loss functions include the 0-1 loss, squared loss, logarithmic loss, cross-entropy loss, and hinge loss. However, the loss defined on the training data set is only the empirical loss (empirical risk); only as the number of samples tends to infinity does it approach the expected loss (expected risk). When the model parameters are complex and the training data is small, the model may predict well on the training data but poorly on unseen data outside the training set; this phenomenon is called overfitting. To prevent overfitting, a parameter regularization term (regularization loss) is usually added to the empirical loss to limit the complexity of the model parameters; the new loss function is called the structural risk function. A small sketch of these losses follows.
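A small numpy sketch of two common losses and a structural (regularized) loss (the function names and the L2 penalty choice are illustrative):

```python
import numpy as np

def squared_loss(y, y_hat):
    """Mean squared loss over the samples; (y_hat - y) is the residual."""
    return np.mean((y_hat - y) ** 2)

def cross_entropy_loss(y, p_hat, eps=1e-12):
    """Binary cross-entropy for labels y in {0,1} and predicted probabilities."""
    p_hat = np.clip(p_hat, eps, 1 - eps)        # avoid log(0)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def structural_risk(y, y_hat, w, lam=0.01):
    """Empirical loss plus an L2 regularization term on the parameters w."""
    return squared_loss(y, y_hat) + lam * np.sum(w ** 2)

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.7])
print(cross_entropy_loss(y, p))   # small value: predictions match labels well
```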

2. Gradient descent

Once the loss function is defined, the model parameters w can be found with an optimization method so that the loss keeps decreasing; the lower the loss, the smaller the difference between the model prediction $\hat{y}$ and the true output y. A common parameter optimization method is gradient descent. In each training iteration, the gradient of the loss with respect to the parameters (the vector of first-order partial derivatives) is computed, and the parameters are updated by stepping in the opposite direction of the gradient (the negative of the gradient), scaled by a step size lr (the learning rate):
$$w \leftarrow w - lr \cdot \nabla_w L(y, f(x, w))$$

The step size lr can be a fixed ratio, or an adaptive ratio computed by various optimizers. According to the strategy for selecting training samples, gradient descent can be divided into stochastic gradient descent (one randomly selected sample each time), batch gradient descent (all samples each time), and mini-batch gradient descent (one batch of samples selected in sequence each time). A mini-batch sketch follows.
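A compact mini-batch gradient descent sketch for linear regression (synthetic data; the batch size and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))                 # 256 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)  # labels with small noise

w = np.zeros(3)                               # model f(x, w) = x . w
lr, batch = 0.1, 32                           # learning rate and batch size
for epoch in range(50):
    for i in range(0, len(X), batch):         # take batches in sequence
        Xb, yb = X[i:i + batch], y[i:i + batch]
        residual = Xb @ w - yb                # y_hat - y
        grad = 2 * Xb.T @ residual / len(Xb)  # gradient of the squared loss
        w -= lr * grad                        # step against the gradient
print(w)                                      # approaches [2.0, -1.0, 0.5]
```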

In federated learning, the party holding the labels can compute gradients directly, while the party without labels must compute gradients in the encrypted domain: the labeled party can homomorphically encrypt the residuals and send them to the unlabeled party for computation, or the gradient can be computed jointly with MPC. A sketch of the first approach follows.
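A hedged sketch of the first approach, reusing the toy Paillier functions from the sketch above (small integer values keep the example readable; real systems use careful fixed-point encoding, packing, and large keys):

```python
# Label party: encrypts per-sample residuals (small integers for clarity).
residuals = [3, -2, 5]                        # y_hat - y per sample
n = pk[0]
enc_residuals = [Encrypt(pk, r % n) for r in residuals]  # negatives as n - |r|

# Unlabeled party: computes an encrypted gradient component for one feature
# using only ciphertexts: sum_i x_i * residual_i via ScalarMul and Add.
x_feature = [4, 7, 1]                         # this party's feature values
enc_grad = Encrypt(pk, 0)
for c, x in zip(enc_residuals, x_feature):
    enc_grad = Add(pk, enc_grad, ScalarMul(pk, c, x))

# Label party: decrypts and maps back from Z_n to signed integers.
g = Decrypt(pk, sk, enc_grad)
g = g - n if g > n // 2 else g
print(g)                                      # 3*4 + (-2)*7 + 5*1 = 3
```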

3. Deep learning

Deep learning is a very popular machine learning method in recent years. Compared with traditional machine learning, deep learning mainly uses deep neural networks as the model structure and optimizes the network through choices of layer structure, layer connectivity, connection-weight sampling, unit structure, activation function, learning strategy, and regularization, achieving performance far beyond traditional machine learning. Many classic deep learning techniques are widely used, including Convolutional Neural Networks (CNN), Graph Convolutional Networks (GCN), Dropout, Pooling, Long Short-Term Memory networks (LSTM), RNN, GRU, residual networks, DQN, DDQN, Batch Normalization, Layer Normalization, Attention, Transformer, etc.
Since deep learning models are complex and have many parameters, and many privacy computations involve ciphertext computation, practical applications usually use neural networks with fewer layers and simpler structures, built from components such as CNN, Dropout, Pooling, Batch Normalization, and Layer Normalization. There are many implementation schemes in privacy computing. For example, in federated learning, the forward computation of the neural network is usually performed locally by each participant, the gradient computation is then performed at a coordinating node, and back-propagation is finally performed locally. Deep neural networks implemented with MPC usually secret-share the model parameters and data for fully encrypted training or prediction. Schemes based on the fully homomorphic CKKS can perform network training and prediction directly in the fully encrypted state (data encrypted or model encrypted), achieving complete separation of model and data.
