PaddleDTX series: introduction to basic concepts

PaddleDTX consists of three main modules: blockchain, decentralized storage, and privacy computing. To understand how it works, you first need a few basic concepts.

1. Concepts related to blockchain

Blockchain: A blockchain can be understood as a new type of distributed database. The entire network reaches agreement on the state of the ledger and its transactions through a consensus mechanism (such as PoW, PoS, or VRF). Cryptographic mechanisms such as hashing ensure that data on the chain cannot be tampered with. Every full node stores the complete blockchain data, so even if one party tampers with its copy, the change will not be accepted by the other nodes. PaddleDTX supports XuperChain and Fabric as the underlying blockchain framework.
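To make the tamper-evidence idea concrete, here is a minimal sketch of a hash-linked list of blocks. It is only an illustration of the general principle, not how XuperChain or Fabric store data; the function names and the block layout are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Build a toy chain where each block commits to the one before it."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash; any edited block breaks the check."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

A node that holds an honest copy can run `is_valid` on a copy received from a peer: changing the data of any block changes its hash, so the check fails unless every later block is rewritten as well.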

Smart contract: A smart contract is a computer protocol that runs on the blockchain and is designed to propagate, verify, or enforce a contract in digital form. In PaddleDTX, the decentralized governance of nodes, the proof mechanism for data copy retention, and the management of computing tasks are all based on smart contracts.

2. Concepts related to decentralized storage

PaddleDTX's storage network has three main types of nodes:

  • Data holding node: the owner of the data, who needs to store it;
  • Storage node: has abundant idle storage resources and can provide storage services;
  • Blockchain node: makes up the blockchain network; its exact definition depends on the underlying blockchain framework.

Proof of copy retention: To ensure that files are stored safely by storage nodes and have not been tampered with, PaddleDTX uses a challenge-response mechanism for proving copy retention. The data holder periodically issues a data integrity challenge to the storage node, and the storage node responds to it. The entire process is recorded on the blockchain and verified automatically by a smart contract.
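The challenge-response flow can be sketched as follows. This is a deliberately simplified hash-based scheme chosen for illustration, and is an assumption rather than PaddleDTX's actual proof algorithm; `prepare_challenges`, `respond`, and `verify` are hypothetical names. The data holder precomputes expected answers before handing the slice over, so it does not need to keep the slice itself.

```python
import hashlib
import hmac
import os


def respond(nonce: bytes, slice_data: bytes) -> str:
    """Storage node side: prove possession by hashing nonce + stored slice."""
    return hashlib.sha256(nonce + slice_data).hexdigest()


def prepare_challenges(slice_data: bytes, rounds: int):
    """Data holder side: precompute (nonce, expected answer) pairs.

    Each pair is meant to be used once, for one challenge round.
    """
    challenges = []
    for _ in range(rounds):
        nonce = os.urandom(16)  # fresh random nonce, so answers cannot be replayed
        challenges.append((nonce, respond(nonce, slice_data)))
    return challenges


def verify(expected: str, response: str) -> bool:
    """Verifier side (the role played by the smart contract in the text)."""
    return hmac.compare_digest(expected, response)
```

A storage node that has lost or altered the slice cannot produce the expected answer for a fresh nonce, so the verification fails and the failure can be recorded.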

File migration: To ensure the security and high availability of files, the data holder regularly checks the health status of its own files and migrates file slices from unhealthy storage nodes to healthy ones, so that every file stays healthy and can be restored at any time.
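As a rough sketch of the migration step, the function below builds a plan that moves every slice located on an unhealthy node to a healthy one. The data structures, the round-robin target selection, and the name `plan_migration` are assumptions made for illustration; the source does not describe how targets are chosen.

```python
def plan_migration(slice_locations, node_healthy):
    """Return {slice_id: (from_node, to_node)} for slices on unhealthy nodes.

    slice_locations: {slice_id: node_id}
    node_healthy:    {node_id: bool}
    """
    healthy_nodes = sorted(node for node, ok in node_healthy.items() if ok)
    if not healthy_nodes:
        raise ValueError("no healthy storage node available")

    plan = {}
    moved = 0
    for slice_id, node in sorted(slice_locations.items()):
        # Unknown nodes are treated as unhealthy.
        if not node_healthy.get(node, False):
            # Spread migrated slices over the healthy nodes in round-robin order.
            target = healthy_nodes[moved % len(healthy_nodes)]
            plan[slice_id] = (node, target)
            moved += 1
    return plan
```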

Health: PaddleDTX monitors the health status of storage nodes and files to ensure high availability of the system:

  • Storage node health status: measured by how active the node is and by the success ratio of its responses to copy retention proof challenges. When a file is distributed, healthy storage nodes are selected first;
  • File health status: measured by the health status of each of its slices, where the health of a slice is determined by the health of the storage node that holds it.
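The two health measures above could be combined along these lines. The equal weights, the thresholds, the three-level labels, and the "worst slice decides" rule are all assumptions for illustration; the source only states which inputs are used, not the formula.

```python
# Ordered from worst to best; hypothetical labels.
LEVELS = ["red", "yellow", "green"]


def node_health(activity, proof_success_ratio):
    """Score a storage node from two ratios in [0, 1].

    Assumed formula: equal-weight average with fixed thresholds.
    """
    score = 0.5 * activity + 0.5 * proof_success_ratio
    if score >= 0.8:
        return "green"
    if score >= 0.5:
        return "yellow"
    return "red"


def file_health(slice_nodes, node_status):
    """A file is taken to be as healthy as its least healthy slice.

    slice_nodes: {slice_id: node_id}
    node_status: {node_id: level}
    """
    slice_levels = [node_status[node] for node in slice_nodes.values()]
    return min(slice_levels, key=LEVELS.index)
```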

3. Concepts related to privacy computing

PaddleDTX's computing network has three main types of nodes:

  • Computing demand node: needs to train models and run predictions;
  • Task execution node: has permission to use the data, participates in secure multi-party computation, and carries out model training and prediction;
  • Blockchain node: makes up the blockchain network; its exact definition depends on the underlying blockchain framework.

There are two types of computing tasks in the PaddleDTX network:

  • Model training task: obtain the target model through training;
  • Prediction task: Obtain the target value of the data through prediction.

Model evaluation: PaddleDTX supports distributed evaluation of a trained model's performance in two modes, dynamic evaluation and static evaluation:

  • Dynamic model evaluation: Dynamic evaluation runs alongside the training task. It is triggered at specified stages of training and produces evaluation metrics once the current stage finishes. During training, the evaluation results for each stage can be used to decide whether to stop training; when the training task ends, the series of metrics shows how the training effect changed over time. PaddleDTX supports random partition as its dynamic evaluation method;
  • Static model evaluation: Static evaluation runs after the training task. The evaluator partitions the training samples, creates and runs a training task and the corresponding prediction task, and finally computes the evaluation metrics. PaddleDTX supports three static evaluation methods: random partition, cross-validation, and leave-one-out.
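The three partitioning methods named above differ only in how sample indices are split into training and test sets. A minimal sketch in plain Python follows; the function names and default parameters are hypothetical and do not reflect PaddleDTX's own interfaces.

```python
import random


def random_split(n, test_ratio=0.2, seed=0):
    """Random partition: shuffle the indices once and cut off a test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(n * test_ratio))
    return indices[n_test:], indices[:n_test]


def k_fold(n, k):
    """Cross-validation: each of the k folds serves as the test set once."""
    indices = list(range(n))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def leave_one_out(n):
    """Leave-one-out: cross-validation with one sample per fold."""
    return k_fold(n, n)
```

Random partition trains once, k-fold trains k times, and leave-one-out trains once per sample, so the cost of static evaluation grows in that order.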

Dataset: Training samples and prediction datasets in PaddleDTX are stored as files in the decentralized storage network, and are specified by the computing demand node when it publishes a training task or a prediction task.

PaddleDTX has open-sourced three machine learning algorithms:

  • Multiple linear regression: describes how one variable is affected by several factors, with the relationship expressed as a linear equation in multiple variables;
  • Multiple logistic regression: derived from the linear regression model. Unlike multiple linear regression, the target value in multiple logistic regression is discrete, usually defined as {1, 0}, indicating whether or not the target feature equals a specified value;
  • Neural network: a computational model made up of a large number of interconnected nodes (also called neurons), which in theory can approximate any function.
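The relationship between the first two models can be shown in a few lines: logistic regression applies a sigmoid to the same linear combination and thresholds the result. This is a plain, single-party sketch with hypothetical function names, not PaddleDTX's federated implementation.

```python
import math


def linear_predict(weights, bias, x):
    """Multiple linear regression: y = w1*x1 + ... + wn*xn + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def logistic_predict(weights, bias, x):
    """Logistic regression: squash the linear output into a probability."""
    z = linear_predict(weights, bias, x)
    return 1.0 / (1.0 + math.exp(-z))


def logistic_classify(weights, bias, x, threshold=0.5):
    """Map the probability to the discrete target {1, 0}."""
    return 1 if logistic_predict(weights, bias, x) >= threshold else 0
```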

PaddleDTX has adapted all three algorithms for vertical federated learning and does not currently support horizontal federated learning.

  • Vertical federated learning: The participants' samples overlap heavily while their features overlap little. The data is split vertically, and the portion with the same samples but different features is used for training;
  • Horizontal federated learning: The participants' features overlap heavily while their samples overlap little. The data is split horizontally, and the portion with the same features but different samples is used for training.
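The difference between the two splits is easiest to see on a toy dataset. In the sketch below each party holds a table keyed by sample ID; the function names are hypothetical, and the plain set intersection is an assumption made for readability (real deployments typically align samples with a privacy-preserving protocol rather than exchanging IDs in the clear).

```python
def vertical_split(party_a, party_b):
    """Keep only the samples both parties hold; each keeps its own features."""
    shared_ids = sorted(party_a.keys() & party_b.keys())
    return (
        {i: party_a[i] for i in shared_ids},
        {i: party_b[i] for i in shared_ids},
    )


def horizontal_split(party_a, party_b):
    """Keep only the features both parties hold; each keeps its own samples.

    Assumes every row of a party carries all of that party's features.
    """
    features_a = set().union(*(row.keys() for row in party_a.values()))
    features_b = set().union(*(row.keys() for row in party_b.values()))
    shared = sorted(features_a & features_b)

    def project(party):
        return {i: {f: row[f] for f in shared} for i, row in party.items()}

    return project(party_a), project(party_b)


# Toy data: a bank and a shop that share one customer ("u2") and one feature ("age").
bank = {"u1": {"income": 10, "age": 30}, "u2": {"income": 20, "age": 40}}
shop = {"u2": {"spend": 5, "age": 40}, "u3": {"spend": 7, "age": 50}}
```

With this data, `vertical_split(bank, shop)` keeps only `u2` with each party's own columns, while `horizontal_split(bank, shop)` keeps all three customers but only the shared `age` column.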

Origin blog.csdn.net/weixin_40862140/article/details/126428086