(Privacy Computing) Federated Learning Overview

1. What is federated learning?

concept

  • Federated Learning (FL) is a distributed machine learning technology that breaks data silos and unleashes the potential of AI applications. It allows all participants to learn jointly without disclosing their underlying data, or even an encrypted (obfuscated) form of that data: federated modeling is achieved by exchanging only encrypted intermediate machine learning results. Federated learning balances AI applications with privacy protection, is open and highly collaborative, fully releases the productivity of big data, and is widely applicable to business innovation scenarios in finance, the consumer Internet, and other industries.
  • In plain terms
    • A simple example: ten teams are working on the same task, each with its own business data set. Every team hopes to use the others' data to improve model performance, but none is willing to expose its own data. Federated learning's solution is to share model parameters rather than the data itself, so that model training is carried out in a distributed way. Everyone protects their own data while, in effect, training on more data and improving model performance.

Legal & Compliance

  • At present, a series of laws and regulations such as the "Cryptography Law of the People's Republic of China", the "Cybersecurity Law of the People's Republic of China", and the "Information Security Technology - Personal Information Security Specification" have officially come into effect, setting out specific requirements for information security and privacy protection. The importance and urgency of privacy protection are self-evident.
    • In April 2020, the State Council issued the "Opinions on Building a More Complete System and Mechanism for the Market-based Allocation of Factors of Production", which listed data as a factor of production and called for "strengthening the integration and security protection of data resources" and "formulating a data privacy protection system and a security review system".
    • In May 2020, the State Council issued the "Opinions on Accelerating the Improvement of the Socialist Market Economic System in the New Era", which clearly stated: "Strengthen the orderly sharing of data and protect personal information in accordance with the law."
    • In December 2020, the National Development and Reform Commission and three other ministries and commissions issued the "Guiding Opinions on Accelerating the Construction of a Collaborative Innovation System for National Integrated Big Data Centers", focusing on deepening the market-oriented reform of data-element allocation and optimizing the layout of data center construction.

Categories of federated learning

  • Horizontal federated learning (sample federation): more feature overlap and less user overlap
  • Vertical federated learning (feature union): less feature overlap and more user overlap
  • Federated transfer learning (transfer learning): less feature overlap and less user overlap
    [picture]
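As a rough illustration of how these three settings differ, here is a small Python sketch: the `fl_setting` helper and its 0.5 thresholds are illustrative assumptions, not part of any framework, and the idea is simply to measure how much two parties' sample IDs and feature names overlap.

```python
# Illustrative sketch only: classify which federated learning setting fits two
# parties by measuring how much their sample IDs and feature names overlap.
# The fl_setting helper and the 0.5 thresholds are assumptions for illustration.

def fl_setting(ids_a: set, feats_a: set, ids_b: set, feats_b: set) -> str:
    sample_overlap = len(ids_a & ids_b) / max(1, min(len(ids_a), len(ids_b)))
    feature_overlap = len(feats_a & feats_b) / max(1, min(len(feats_a), len(feats_b)))
    if feature_overlap >= 0.5 and sample_overlap < 0.5:
        return "horizontal federated learning (shared features, different users)"
    if sample_overlap >= 0.5 and feature_overlap < 0.5:
        return "vertical federated learning (shared users, different features)"
    if sample_overlap < 0.5 and feature_overlap < 0.5:
        return "federated transfer learning (little overlap in either)"
    return "large overlap in both: ordinary joint modeling may already suffice"

# Example: two regional banks with similar features but mostly different customers.
ids_a, feats_a = {"u1", "u2", "u3"}, {"age", "income", "credit_score"}
ids_b, feats_b = {"u7", "u8", "u9"}, {"age", "income", "credit_score"}
print(fl_setting(ids_a, feats_a, ids_b, feats_b))
```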

2. Terminology

  • Data silos: the data each enterprise collects is different, the systems holding it do not interoperate, and data is not shared between enterprises.
  • Distributed machine learning: each participant trains the model locally and uploads its update to a central server, so that every participant's resources contribute to distributed training.
  • Data encryption: user data is kept private through encryption, so that data sharing and data privacy can both be ensured.
  • Joint modeling: enterprise data is encrypted and shared to train a joint model (each party only shares the model and cannot learn the details of how the other party's data was used).

3. Learning process

3.1 Horizontal federated learning

basic concept

  • The essence of horizontal federated learning is the union of samples. It suits scenarios where participants share the same business format but reach different customers, i.e. there is a lot of feature overlap and little user overlap. For example, banks in different regions run similar businesses (similar features) but serve different users (different samples). A typical implementation is WeBank's open-source federated learning framework FATE, which we will work with hands-on later.

learning process

[picture]

  • Step 1: each participant downloads the latest model from server A;
  • Step 2: each participant trains the model on its local data and uploads the encrypted gradient to server A; server A aggregates the gradients from all participants and updates the model parameters;
  • Step 3: server A returns the updated model to each participant;
  • Step 4: each participant updates its local model.
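The loop above is essentially the FedAvg pattern. Below is a minimal, self-contained numpy sketch of one possible realization, assuming a linear model trained by local gradient descent; the gradient encryption from Step 2 is omitted and only noted in a comment, and the names (`parties`, `w_global`, etc.) are illustrative rather than taken from any particular framework.

```python
import numpy as np

# Minimal horizontal-FL sketch (FedAvg-style): every party trains on its own
# rows of data, sends back a model update, and the server averages the updates.
# The gradient/parameter encryption mentioned in Step 2 is omitted for brevity.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

# Three participants hold different samples with the same features (horizontal split).
parties = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    parties.append((X, y))

w_global = np.zeros(3)                        # Step 1: server A's current model
for _ in range(20):                           # federated training rounds
    local_models = []
    for X, y in parties:                      # Step 2: local training on private data
        w = w_global.copy()
        for _ in range(5):                    # a few local gradient-descent steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.1 * grad
        local_models.append(w)                # in practice this update would be encrypted
    w_global = np.mean(local_models, axis=0)  # Steps 2-3: server A aggregates and redistributes
# Step 4: every participant continues from the aggregated model
print("learned:", np.round(w_global, 3), "true:", true_w)
```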

3.2 Vertical Federated Learning

basic concept

  • The essence of vertical federated learning is the union of features. It suits scenarios where there is a lot of user overlap and little feature overlap, such as a supermarket and a bank in the same area: the users they reach are all residents of that area (same samples), but their businesses differ (different features).

learning process

[picture]

  • The essence of vertical federated learning is combining the features of the same users across different business formats, e.g. supermarket A and bank B. In a traditional machine learning pipeline, the two data sets would have to be gathered in one data center and joined on the user so that each user's features form a single training record; the two parties therefore need an intersection of users (modeling is based on the join result), and one party needs to hold the labels. As shown in the figure above, learning is divided into two major phases: encrypted sample alignment, and encrypted model training (detailed in Steps 3-6):
    • Step 1: encrypted sample alignment. This is done at the system level, so users outside the intersection are never revealed to either enterprise.
    • Step 2: encrypted model training on the aligned samples, which proceeds as follows:
    • Step 3: third party C sends a public key to A and B, used to encrypt the data that needs to be transmitted;
    • Step 4: A and B each compute the intermediate results for the features they own and exchange them in encrypted form to obtain their respective gradients and losses;
    • Step 5: A and B each compute their encrypted gradients, add masks, and send them to C; at the same time, B computes the encrypted loss and sends it to C;
    • Step 6: C decrypts the gradients and loss and sends them back to A and B; A and B remove the masks and update their models.
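As a concrete, deliberately simplified sketch of the two phases: the snippet below uses salted ID hashes for sample alignment and the python-paillier (`phe`) package for the additively homomorphic encryption in Steps 3-6. Production systems such as FATE use real private-set-intersection protocols and full encrypted training; the toy gradients, masks, and user IDs here are illustrative assumptions only.

```python
import hashlib
import secrets

from phe import paillier  # python-paillier; assumed installed via `pip install phe`

# --- Phase 1: sample alignment, illustrated here with salted ID hashes ---
salt = secrets.token_hex(16)                      # agreed between A and B out of band
def blind(user_ids):
    return {hashlib.sha256((salt + uid).encode()).hexdigest(): uid for uid in user_ids}

a_ids = blind({"u1", "u2", "u3"})                 # supermarket A's users
b_ids = blind({"u2", "u3", "u4"})                 # bank B's users
common = sorted(a_ids[h] for h in a_ids.keys() & b_ids.keys())
print("aligned users:", common)                   # only the intersection is revealed

# --- Phase 2: one encrypted training exchange (Steps 3-6), with toy scalars ---
pub, priv = paillier.generate_paillier_keypair()  # Step 3: C generates keys, shares the public key
grad_a, grad_b = 0.12, -0.07                      # Step 4: each side's partial gradient (toy values)
mask_a, mask_b = 5.0, 3.0                         # Step 5: random masks added before sending to C
masked_a = pub.encrypt(grad_a) + mask_a           # Paillier ciphertexts support adding plaintexts
masked_b = pub.encrypt(grad_b) + mask_b
dec_a = priv.decrypt(masked_a)                    # Step 6: C decrypts the masked values
dec_b = priv.decrypt(masked_b)
print("A's gradient:", round(dec_a - mask_a, 4))  # A and B strip their masks and update locally
print("B's gradient:", round(dec_b - mask_b, 4))
```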

3.3 Federated transfer learning

basic concept

  • When participants have little overlap in both features and samples, federated transfer learning can be considered. Transfer learning uses the similarity between data, tasks, or models to apply a model learned in a source domain to a target domain; for example, someone who has learned to play table tennis can draw on that experience when learning tennis.

learning process

[picture]

  • The overall process uses the samples shared by A and B to learn invariant representations of their respective features, and at the same time uses all of A's sample labels together with A's invariant features to learn the classifier.
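A rough numpy sketch of that idea follows (not the full federated transfer learning protocol, with all encryption omitted): the overlapping samples are used to align B's feature space with A's shared representation, A's labels train a classifier in that representation, and B can then score its own samples. All data, dimensions, and variable names below are synthetic assumptions.

```python
import numpy as np

# Rough federated-transfer-learning sketch on synthetic data (no encryption, and
# much simpler than the real FTL protocol): overlapping samples align the two
# feature spaces, A's labels train a classifier in the shared space, and B can
# then score its own, differently-featured samples.

rng = np.random.default_rng(1)
n_overlap, d_shared = 40, 2

Xa = rng.normal(size=(200, 5))                    # party A: 200 samples, 5 features, with labels
ya = (Xa[:, 0] + Xa[:, 1] > 0).astype(float)
Xb = rng.normal(size=(150, 3))                    # party B: 150 samples, 3 different features

Wa = rng.normal(size=(5, d_shared))               # A's projection into a small shared space
Za = Xa @ Wa                                      # A's "invariant" representation

# Assume the first n_overlap rows of A and B describe the same users; B fits a
# mapping of its own features that reproduces A's representation on those users.
Wb, *_ = np.linalg.lstsq(Xb[:n_overlap], Za[:n_overlap], rcond=None)

# A trains a simple logistic-regression classifier in the shared space with its labels.
w = np.zeros(d_shared)
for _ in range(500):
    p = 1 / (1 + np.exp(-(Za @ w)))
    w -= 0.1 * Za.T @ (p - ya) / len(ya)

# B scores its remaining, unlabeled samples through its own learned mapping.
scores_b = 1 / (1 + np.exp(-(Xb[n_overlap:] @ Wb @ w)))
print("predicted positive rate on B's side:", round(float(scores_b.mean()), 3))
```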

4. Application scenarios

Cooperative marketing between financial institutions and telecom operators – customer marketing for financial products


Joint risk control of financial and government affairs data – risk control of small and micro loan products


actual cases


WeBank: Multi-party big data privacy computing platform WeDPR-PPC

  • In January 2020, WeBank released WeDPR, an efficient, ready-to-use, scenario-based privacy protection solution. WeDPR integrates blockchain technology with privacy computing technology, so that sensitive data in real business scenarios receives better privacy protection on the blockchain. In May 2021, combining the advantages of blockchain and secure multi-party computation, WeBank launched the multi-party big data privacy computing platform WeDPR-PPC.

Ant Chain: Blockchain Network Platform FAIR

  • On October 22, 2021, at the Yunqi (Apsara) Conference, Ant Chain, a subsidiary of Ant Group, launched a new blockchain network platform, FAIR. The FAIR platform has begun to be deployed in the government sector and at large enterprises, and exploration is under way in more fields such as finance.

Qulian Technology: Financial Industry Data Sharing Platform

  • Qulian Technology cooperates with central bank branches and banks, using blockchain plus privacy computing technology to design a data reporting model. It has implemented a financial industry data sharing platform in Nanchang, Jiangxi, and established a joint financing credit reporting platform, solving the problem of data sharing between institutions.

Eight Components: Government Tax Data Platform

  • When the tax department reviews the tax data reported by enterprises, it cannot accurately determine whether tax information (such as invoices) is fraudulent or whether real transactions stand behind it. Eight Components provides a tax data platform based on privacy computing and cross-chain technology to address data security, data sharing, data circulation, and data verification between enterprises.

Nebula Genomics: Oasis Network framework

  • Patient data in the medical industry is highly private, and there is currently no data system that records a patient's complete medical information. Nebula Genomics builds on the Oasis Network framework so that customers retain ownership of their genomic data, while Nebula Genomics can analyze that data without viewing the customers' raw information.

5. Key areas for the future development of privacy computing



Origin blog.csdn.net/xzpdxz/article/details/128812873