Detailed explanation of homomorphic encryption technology and its application in machine learning

1 What is homomorphic encryption

Homomorphic encryption (HE) is a special form of encryption in cryptography. It allows us to send ciphertext to any third party for computation without decrypting it first; that is, the computation is carried out directly on the ciphertext. Although the concept of homomorphic encryption dates back to the 1970s, the first fully homomorphic encryption framework, which supports arbitrary operations on ciphertext, did not appear until 2009, when it was proposed by Craig Gentry.

The mathematical definition of homomorphic encryption is [1]:

E(m1) ★ E(m2) = E(m1 ★ m2), for all m1, m2 ∈ M    (1)

Here E is the encryption algorithm and M is the set of all possible messages. If E satisfies formula (1), we say that E is homomorphic with respect to the operation ★. Current homomorphic encryption algorithms mainly support two homomorphic operations: addition and multiplication.

Note that formula (1) is only meant to convey the nature of homomorphic encryption; actual algorithms may differ in form. For example, the Paillier algorithm is additively homomorphic. According to formula (1), the sum of the ciphertexts should equal the ciphertext of the sum, but in Paillier it is the product of the ciphertexts that decrypts to the sum of the plaintexts. In general, then, we only require that the result on the ciphertext decrypts to the value we expect; the exact operation performed on the ciphertexts is not fixed and is determined by the encryption algorithm.
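The Paillier behaviour described above can be checked with a small sketch. This is a toy textbook implementation written for illustration, not the code of any library: the function names, the fixed Mersenne primes, and the choice g = n + 1 are assumptions, and the key is far too small to be secure.

```python
import math
import random

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two fixed Mersenne primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the choice g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

n, priv = keygen()
c1, c2 = encrypt(n, 15), encrypt(n, 27)

# Homomorphic addition: multiply the ciphertexts modulo n^2.
c_sum = c1 * c2 % (n * n)
print(decrypt(n, priv, c_sum))  # 42
```

The operation on the ciphertexts is a modular multiplication, yet the decrypted result is the sum 15 + 27, which is exactly the mismatch with formula (1) noted above.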

2 The composition and classification of homomorphic encryption

A homomorphic encryption algorithm generally consists of the following four parts:

  1. KeyGen: Key generation algorithm to generate public and private keys

  2. Encryption: encryption algorithm

  3. Decryption: Decryption algorithm

  4. Homomorphic Property: the part that performs computation on ciphertexts

The first three parts appear in many encryption algorithms. The fourth part is the core of a homomorphic encryption algorithm: it defines how operations are carried out on ciphertexts.

To better understand and use homomorphic encryption algorithms, we divide them into three categories according to the type and number of operations they support: partially homomorphic encryption, leveled homomorphic encryption, and fully homomorphic encryption [1].

  1. Partially homomorphic encryption (Partial HE, PHE for short) means that the algorithm is homomorphic for only one operation, either addition or multiplication. For example, RSA, one of the earliest widely used public-key encryption algorithms, is also a PHE algorithm: it is homomorphic with respect to multiplication. PHE research results appeared earlier, and additively homomorphic algorithms (Additive HE) are more numerous than multiplicatively homomorphic ones. The advantage of PHE is that its principles are simple and it is easy to implement; the disadvantage is that it supports only one operation (addition or multiplication).
  2. Leveled homomorphic encryption (LHE, Leveled HE; also SWHE, Somewhat HE) generally supports a limited number of additions and multiplications. Research on LHE falls into two stages. The first stage came before Gentry proposed the first FHE framework in 2009; well-known examples include the BGN algorithm and Yao's garbled circuits. The second stage, after Gentry's FHE framework, mainly addresses the low efficiency of FHE. The advantage of LHE is that it supports both addition and multiplication, and because it appeared later than PHE, the technology is more mature: it is generally much more efficient than FHE, and its efficiency is close to or higher than that of PHE. The disadvantage is that the number of operations it supports is limited.
  3. Fully homomorphic encryption (Fully HE, FHE for short) supports an unlimited number of arbitrary computations on ciphertext. By underlying technique, FHE schemes include those based on ideal lattices, those based on LWE/RLWE, and so on. The advantage of FHE is that it supports many operators with no limit on the number of operations; the disadvantage is that it is very inefficient and cannot yet support large-scale computation.
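The multiplicative homomorphism of RSA mentioned in item 1 can be seen in a few lines. The sketch below uses textbook (unpadded) RSA with tiny illustrative primes; the function names and parameters are assumptions made for this example, and padded RSA as used in practice does not keep this property.

```python
def rsa_keygen(p=61, q=53, e=17):
    """Textbook RSA key pair from tiny primes (insecure, demo only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi(n)
    return (n, e), (n, d)

def rsa_encrypt(pub, m):
    n, e = pub
    return pow(m, e, n)

def rsa_decrypt(priv, c):
    n, d = priv
    return pow(c, d, n)

pub, priv = rsa_keygen()
c1, c2 = rsa_encrypt(pub, 7), rsa_encrypt(pub, 6)

# Multiplying ciphertexts multiplies the plaintexts: E(7) * E(6) = E(42) mod n.
c_prod = c1 * c2 % pub[0]
print(rsa_decrypt(priv, c_prod))  # 42
```

Here the operation on ciphertexts matches the operation on plaintexts, so this scheme fits formula (1) with ★ as multiplication modulo n. There is no corresponding operation for addition, which is why RSA counts as PHE.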

Figure 1 Research timeline of three types of homomorphic encryption [1]

Figure 1 shows the research timeline of the three types of homomorphic encryption algorithms. The concept of homomorphic encryption was proposed in 1978, after which PHE research results gradually accumulated; before Gentry's FHE framework, LHE research dominated; since 2009, research has focused on FHE.

3 Applications of homomorphic encryption in machine learning

3.1 Federated learning (PHE)

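Federated learning is a privacy-preserving machine learning approach. Its main idea is to build a privacy-preserving machine learning system in which multiple parties that hold data can jointly train one or more models without any party's data being leaked to the other participants. This improves the task performance of the participants' local models without leaking private data, and breaks down data silos [2].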

Figure 2 Example of the federated learning workflow

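In federated learning, jointly training a model generally requires the parties to exchange intermediate results, and sending those results in plaintext risks leaking private information. Homomorphic encryption can play an important role here. Each party encrypts its intermediate results with a homomorphic encryption algorithm and sends them to a third party for aggregation; the third party then returns the aggregated result to all participants. This keeps the intermediate results from leaking while still completing the training task (the third party can be removed by optimizing the system design).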

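Because federated learning only needs to aggregate intermediate results or models, the homomorphic encryption algorithm used is generally PHE, most often an additively homomorphic one; for example, the Paillier algorithm used in FATE is additively homomorphic. To better illustrate the use of homomorphic encryption in federated learning, we present an application in a federated recommender system [3].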

Figure 3 Workflow of the federated matrix factorization recommender system

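In a traditional recommender system, users must upload their browsing history and ratings to get personalized recommendations, but this information is private data, and uploading it directly creates a serious security risk. In a federated recommender system, each user keeps the data locally and uploads only specific model gradients. This avoids leaking the private data directly, but it still reveals gradient information to the cloud server. Moreover, it can be shown mathematically that a user's ratings can be inferred from the gradients of two consecutive updates. In this case the gradients uploaded by users must be protected with homomorphic encryption: each user encrypts the gradients with an additively homomorphic encryption algorithm before uploading, the cloud server aggregates (adds) the ciphertext gradients of all users, and the updated model is returned to the users, who decrypt it to complete the training update.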
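The encrypted aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol of [3] or the FATE implementation: it uses a toy Paillier scheme with a tiny fixed key shared by the users, a hypothetical fixed-point encoding (SCALE) for real-valued gradients, and made-up gradient values.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1  # two Mersenne primes; far too small for real use
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key, held by users
MU = pow(LAM, -1, N)
SCALE = 10**6  # fixed-point precision for gradients (assumed for this sketch)

def encode(x):
    """Map a float to Z_N; negative values wrap around modulo N."""
    return round(x * SCALE) % N

def decode(m):
    """Inverse of encode: residues above N/2 are read as negative values."""
    if m > N // 2:
        m -= N
    return m / SCALE

def encrypt(x):
    """User side: Paillier encryption of one gradient component."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, encode(x), N2) * pow(r, N, N2) % N2

def decrypt(c):
    """User side: requires the private values LAM and MU."""
    return decode((pow(c, LAM, N2) - 1) // N * MU % N)

def aggregate(encrypted_grads):
    """Server side: element-wise sum of gradients using only ciphertexts."""
    total = list(encrypted_grads[0])
    for grad in encrypted_grads[1:]:
        total = [a * b % N2 for a, b in zip(total, grad)]
    return total

# Illustrative gradients from three users (made-up values).
user_grads = [[0.12, -0.5, 0.03], [-0.02, 0.25, 0.4], [0.3, 0.1, -0.13]]
encrypted = [[encrypt(g) for g in grad] for grad in user_grads]
summed = [decrypt(c) for c in aggregate(encrypted)]
print(summed)
```

The server only multiplies ciphertexts modulo N squared and never sees a plaintext gradient; only a holder of the private key can read the aggregated sum.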

3.2 Machine learning on encrypted data (LHE and FHE)

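Besides federated learning, another important application area of homomorphic encryption is computation on encrypted data. Unlike federated learning, it does not require multiple parties, but the computation involved is more complex (more operators and a heavier workload). The homomorphic encryption algorithms used here are mostly LHE and FHE. In fact, the original motivation for fully homomorphic encryption research was secure cloud computing: a user who needs cloud computing power encrypts all local data and uploads it, and the cloud server carries out the computation as instructed by the user. Throughout the process the user's data is never revealed to the cloud, which yields an "absolutely secure" cloud computing service.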

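However, because FHE is currently inefficient, cloud computing with fully homomorphic encryption is still far from practical. Machine learning has a broad market in cloud computing and has two kinds of workload: training and inference. Training generally involves a lot of data and heavy computation, while inference involves relatively little data and light computation. Current research therefore focuses on machine learning inference on encrypted data, for which fairly fast schemes already exist [4]; research on training on encrypted data is scarce, and the problem remains hard.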

4 Efficiency comparison of some open-source homomorphic encryption libraries

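There are many open-source HE frameworks on GitHub. Here we choose two to test and compare: python-paillier, which supports additive homomorphism, and SEAL-CKKS, which is an LHE algorithm and supports a limited number of additions and multiplications.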

Table 1: Efficiency comparison of Paillier and CKKS (ms)

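Table 1 compares the efficiency of Paillier and CKKS. Times are in milliseconds, and the test machine has an Intel(R) Xeon(R) E5-2630 24-core 2.6GHz CPU and 63GB of RAM. In the table, C in "C+P" stands for ciphertext and P for plaintext. For CKKS, "key" in the table refers to the polynomial modulus degree.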

The results show that as the key size increases, the time cost of Paillier grows rapidly (faster than linearly). Paillier generally needs a key of at least 2048 bits to be secure, and at 2048 bits Paillier is more efficient than CKKS. It is worth mentioning that SEAL-CKKS supports SIMD operations, so in machine learning training and inference a batch of data can be packed and encrypted along the batch-size dimension, which improves computational efficiency linearly with the number of packed values.
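The effect of SIMD packing on per-value cost can be illustrated with simple arithmetic. The sketch below assumes the usual CKKS layout in which one ciphertext holds up to half the polynomial modulus degree in values; the helper names are hypothetical and the timing is a placeholder, not a figure from Table 1.

```python
def ckks_slots(poly_modulus_degree):
    """Number of values one CKKS ciphertext can pack (half the ring degree)."""
    return poly_modulus_degree // 2

def amortized_cost_ms(op_ms, batch_size, slots=1):
    """Per-value cost of one ciphertext operation applied to a whole batch."""
    ciphertexts = -(-batch_size // slots)  # ceiling division
    return op_ms * ciphertexts / batch_size

OP_MS = 1.0  # placeholder cost of one ciphertext operation, NOT a measurement
slots = ckks_slots(8192)

# One value per ciphertext (no packing) versus a whole batch in one ciphertext.
print(amortized_cost_ms(OP_MS, batch_size=4096))
print(amortized_cost_ms(OP_MS, batch_size=4096, slots=slots))
```

With packing, the cost of a single ciphertext operation is shared by every value in the batch, so a scheme that is slower per operation can still be cheaper per value.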

Summary: When SIMD operations cannot be used (some machine learning scenarios have no batch dimension, such as matrix factorization), it is more efficient to use Paillier with a smaller key size; when SIMD operations can be used (such as machine learning model training and inference in most scenarios), SEAL-CKKS is significantly more efficient than Paillier.

