In-Depth Understanding of Federated Learning: Differences and Connections Between Federated Learning and Existing Theories



Federated learning is a relatively new technology: while it draws on a number of mature techniques, it also contributes original elements of its own. Below, we explain the relationship between federated learning and several related concepts from multiple perspectives.

The difference between federated learning and differential privacy theory

The characteristics of federated learning allow it to be used to protect the privacy of user data, but it differs substantially from the privacy-protection techniques commonly used in big data and data mining, such as differential privacy (Differential Privacy), k-anonymity (k-Anonymity), and l-diversity (l-Diversity). First, the underlying principle is different. Federated learning protects user data privacy by exchanging model parameters under an encryption mechanism, with homomorphic encryption among the methods used. Unlike differential privacy, the raw data and the local models themselves are never transmitted, so there is no possibility of leakage at the data level, and the approach does not violate stricter data protection laws such as the GDPR. Differential privacy, k-anonymity, and l-diversity, by contrast, add noise to the data or blur sensitive attributes through generalization until a third party cannot distinguish individuals, so that the data cannot be restored with high probability, thereby protecting user privacy. In essence, however, these methods still transmit data derived from the raw records, which remains open to potential attacks, and this style of data privacy protection may no longer be acceptable under stricter regulations such as the GDPR. In comparison, federated learning is a more powerful means of protecting user data privacy.
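To make the contrast concrete, here is a minimal sketch in plain NumPy. The data, function names, and the choice of a one-step linear-model update are illustrative assumptions, not taken from the original text: the first part shows a differential-privacy-style release, where noise is added to a statistic computed from raw data before it leaves the curator, while the second shows a federated-learning-style local step, where only model parameters (which could additionally be encrypted) leave the client.

```python
import numpy as np

rng = np.random.default_rng(0)

# Differential-privacy style: raw data sits with a central curator, and noise
# calibrated to sensitivity / epsilon is added to the released result.
def laplace_release(true_value, sensitivity, epsilon):
    """Release a query answer perturbed with Laplace noise (Laplace mechanism)."""
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

ages = np.array([23, 31, 45, 52, 29], dtype=float)   # raw data, held centrally
noisy_mean = laplace_release(ages.mean(), sensitivity=1.0, epsilon=0.5)

# Federated-learning style: raw data never leaves the client; only a model
# update (here, one gradient step of a toy linear model) is shared.
def local_update(weights, x, y, lr=0.01):
    """One local gradient step on private data; only the new weights are returned."""
    grad = 2 * x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad

x_private = rng.normal(size=(5, 3))                   # private features
y_private = rng.normal(size=5)                        # private labels
updated_weights = local_update(np.zeros(3), x_private, y_private)
print(noisy_mean, updated_weights)
```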

The difference between federated learning and distributed machine learning

The multi-party joint training in horizontal federated learning has some similarities with distributed machine learning (Distributed Machine Learning). Distributed machine learning covers many aspects, including distributed storage of training data, distributed execution of computing tasks, and distributed serving of model results. The parameter server (Parameter Server) is a typical example of distributed machine learning. As a tool for accelerating the training of machine learning models, the parameter server stores data on distributed worker nodes and allocates data and computing resources through a central scheduling node, so as to obtain the final model more efficiently. Federated learning differs in two respects. First, a worker node in horizontal federated learning represents a data owner participating in model training: it has complete autonomy over its local data and can independently decide when to join federated training, whereas in the parameter-server setting the central node always plays the dominant role; federated learning therefore faces a more complex learning environment. Second, federated learning emphasizes protecting the data privacy of the data owners during model training, which is an effective measure for data privacy protection and is better suited to the increasingly strict data privacy and data security regulatory environment of the future.
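The aggregation step that a coordinator performs in horizontal federated learning can be sketched as a FedAvg-style weighted average of client weights. The following is a minimal illustration under assumed names and toy values, not a complete training loop or any particular framework's API:

```python
import numpy as np

def federated_averaging(client_weights, client_sizes):
    """Server-side aggregation: average client model weights,
    weighted by the size of each client's local dataset (FedAvg-style)."""
    coeffs = np.array(client_sizes, dtype=float) / sum(client_sizes)
    return coeffs @ np.stack(client_weights)      # weighted average of parameters

# Each data owner trains locally and decides for itself whether to upload an
# update; the coordinator only ever sees model weights, never the local data.
client_weights = [np.array([0.9, -0.2]), np.array([1.1, 0.1]), np.array([1.0, 0.0])]
client_sizes = [100, 300, 50]                     # number of local samples per client
global_weights = federated_averaging(client_weights, client_sizes)
print(global_weights)
```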

The relationship between federated learning and federated database

A federated database system (Federated Database System) integrates multiple autonomous unit databases and manages them as a whole; it was proposed to enable interoperation among multiple independent databases. A federated database system usually stores the unit databases in a distributed manner, and in practice the data in the unit databases is heterogeneous, so it has many similarities with federated learning in terms of data types and storage methods. However, a federated database system involves no privacy protection mechanism in the interactions among unit databases: all unit databases are completely visible to the management system. Moreover, the work of a federated database system focuses on basic database operations such as insertion, deletion, search, and merging, whereas the purpose of federated learning is to build a joint model over each party's data under the premise of protecting data privacy, so that the patterns and regularities contained in the data can serve us better. A toy illustration of the "fully visible" federated query model follows.
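The sketch below uses two hypothetical in-memory "unit databases" and a made-up `federated_select` helper to show what the paragraph means by full visibility: a query is dispatched to every unit database and the raw rows are merged and returned without any privacy mechanism, in contrast to federated learning, where only model information is exchanged.

```python
# Two hypothetical "unit databases" managed by one federated database system.
unit_db_a = [{"id": 1, "age": 34}, {"id": 2, "age": 27}]
unit_db_b = [{"id": 3, "age": 45}, {"id": 4, "age": 51}]

def federated_select(predicate):
    """Dispatch the same selection to every unit database and merge the raw rows.
    The records are fully visible to the caller; no privacy mechanism is applied."""
    results = []
    for unit_db in (unit_db_a, unit_db_b):
        results.extend(row for row in unit_db if predicate(row))
    return results

print(federated_select(lambda row: row["age"] > 30))   # raw rows from both databases
```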

The relationship between federated learning and blockchain

Blockchain is a cryptographically secured distributed ledger that is easy to verify and hard to tamper with. Blockchain 2.0 refers to decentralized applications: by using open-source code together with distributed storage and execution, it guarantees a high degree of transparency and security, so that data cannot be tampered with. Typical blockchain applications include Bitcoin (BTC) and Ethereum (ETH). Both blockchain and federated learning are forms of decentralized networks, but blockchain is a fully peer-to-peer (P2P) network structure, whereas in federated learning a third party usually takes on functions such as model aggregation and management. Both technologies build on cryptography and encryption algorithms: blockchain commonly uses hash algorithms and asymmetric encryption, among others, while federated learning commonly uses homomorphic encryption, among others. From the data perspective, a blockchain records the complete data, in encrypted form, on every node, whereas in federated learning each party's data is kept only locally. From the perspective of incentive mechanisms, blockchain nodes compete for the right to record blocks (bookkeeping) and are rewarded for it, whereas in federated learning multiple participants train a model jointly to improve its quality, and rewards are distributed according to each party's contribution.
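As a minimal illustration of the "easy to verify, hard to tamper with" property, the sketch below hash-chains two toy blocks with SHA-256. It is not a consensus protocol or a real ledger; the block fields and values are invented for illustration only.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents; changing any field changes the hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# Each block commits to the hash of the previous block, so tampering with
# historical data breaks the link and is immediately detectable.
genesis = {"index": 0, "data": "genesis", "prev_hash": "0" * 64}
block_1 = {"index": 1, "data": "tx: A pays B", "prev_hash": block_hash(genesis)}

print(block_1["prev_hash"] == block_hash(genesis))   # True: the chain verifies
genesis["data"] = "tampered"                          # modify historical data
print(block_1["prev_hash"] == block_hash(genesis))   # False: tampering is detected
```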

The relationship between federated learning and secure multi-party computation

In federated learning, user privacy and security are top priorities. To protect user privacy and prevent federated learning applications from being attacked by malicious parties, secure multi-party computation (MPC) can be applied within federated learning and become part of its technical framework. The research community has begun studying how MPC can strengthen the security of federated learning. McMahan et al. pointed out that federated learning can provide stronger security guarantees through differential privacy, secure multi-party computation, or a combination of the two. Bonawitz et al. showed that in federated learning, secure multi-party computation can be used to compute the sum of model parameter updates from user devices in a secure manner. Truex et al. proposed a privacy-preserving federated learning method that combines differential privacy with secure multi-party computation. Liu et al. proposed applying additively homomorphic encryption (AHE) to multi-party computation over neural networks. The open-source federated learning framework FATE, released by WeBank, includes operators for secure multi-party computation, which helps application developers build MPC-based functionality efficiently.
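The secure-summation idea mentioned above can be illustrated with additive secret sharing: each client splits its model update into random shares so that any single aggregator sees only noise, yet the shares combine to the exact sum. This is only a minimal sketch of the idea, not the actual protocols from the cited works; real protocols operate over finite fields rather than floating-point numbers, and all names and values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def additive_shares(secret, n_parties):
    """Split a vector into n_parties additive shares that sum back to the secret.
    (Real MPC uses modular arithmetic over a finite field; floats are a simplification.)"""
    shares = [rng.normal(size=secret.shape) for _ in range(n_parties - 1)]
    shares.append(secret - sum(shares))
    return shares

# Three clients each split their model update into three shares and send
# a different share to each aggregator (or peer).
client_updates = [np.array([0.5, -0.1]), np.array([0.2, 0.3]), np.array([-0.4, 0.1])]
shared = [additive_shares(update, n_parties=3) for update in client_updates]

# Each aggregator sums the shares it received; combining the partial sums
# reveals only the total update, never any individual client's update.
partial_sums = [sum(client_shares[i] for client_shares in shared) for i in range(3)]
recovered_total = sum(partial_sums)
print(recovered_total)                  # equals the aggregate update
print(np.sum(client_updates, axis=0))   # same value, computed in the clear for comparison
```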

References:
[1] Yang Qiang, Liu Yang, Cheng Yong, Kang Yan, Chen Tianjian, Yu Han. Federated Learning [M]. Electronic Industry Press, 2020.
[2] WeBank, FedAI. Federated Learning White Paper V2.0. Tencent Research Institute et al., 2021.
