Introduction to FATE - Vertical (Hetero) SecureBoost Model

0. Demo description

This demo uses the hetero_secureboost example. Roles and data: (1) guest: breast_hetero_guest.csv; (2) host: breast_hetero_host.csv.

Reference source: https://github.com/FederatedAI/FATE/tree/master/examples/dsl/v2/hetero_secureboost

1. Model training

1. Submit the job for model training using the following command:

flow job submit -c ${runtime_config} -d ${dsl}

# The configuration files are:
Binary-Class:
example-data: (1) guest: breast_hetero_guest.csv (2) host: breast_hetero_host.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_binary_conf.json
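As a small illustration, the placeholders in the submit command can be filled in with the two file names listed above. The sketch below only builds the command line. The helper name build_submit_command is hypothetical, and actually running the command assumes a working FATE Flow client on the machine.

```python
import shlex
from typing import List


def build_submit_command(runtime_config: str, dsl: str) -> List[str]:
    """Return the argv list for `flow job submit -c <conf> -d <dsl>`."""
    return ["flow", "job", "submit", "-c", runtime_config, "-d", dsl]


cmd = build_submit_command(
    "test_secureboost_train_binary_conf.json",
    "test_secureboost_train_dsl.json",
)
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))

# To really submit the job (requires the FATE Flow client to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```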

2. Analysis of training details:

Reference source: https://github.com/FederatedAI/FATE/blob/master/examples/experiment_template/user_usage/pipeline_predict_tutorial.md

https://github.com/FederatedAI/FATE/blob/master/python/fate_client/flow_client/README_zh.rst

The training consists of the following stages, which are performed by the corresponding components:

  1. reader: read the raw data;
  2. dataio: convert the data into instance samples;
  3. intersection: find the sample intersection between the host and the guest;
  4. HeteroSecureBoost: train the tree model;
  5. evaluation: compute the evaluation metrics.
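The stage order above follows from how each component consumes the output of the previous one. The sketch below is a simplified stand-in for the "components" section of a DSL file, not a copy of test_secureboost_train_dsl.json: the component names and the flattened "input" layout are assumptions made for illustration. It derives an execution order from the declared data dependencies.

```python
# Simplified, assumed layout: each component lists the upstream outputs it reads.
dsl = {
    "components": {
        "reader_0": {"module": "Reader", "input": {}},
        "dataio_0": {"module": "DataIO", "input": {"data": ["reader_0.data"]}},
        "intersection_0": {"module": "Intersection", "input": {"data": ["dataio_0.data"]}},
        "hetero_secure_boost_0": {
            "module": "HeteroSecureBoost",
            "input": {"data": ["intersection_0.data"]},
        },
        "evaluation_0": {
            "module": "Evaluation",
            "input": {"data": ["hetero_secure_boost_0.data"]},
        },
    }
}


def execution_order(components):
    """Return component names so that every component follows its inputs."""
    order, done = [], set()

    def visit(name):
        if name in done:
            return
        # "reader_0.data" -> upstream component "reader_0"
        for ref in components[name]["input"].get("data", []):
            visit(ref.split(".")[0])
        done.add(name)
        order.append(name)

    for name in components:
        visit(name)
    return order


print(execution_order(dsl["components"]))
```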

You can view the modeling information in the FATEBoard panel at http://hostip:8080/:

image-20210328195207508

Here the _0 and _1 suffixes refer to the training and validation datasets. The example uses the same data for both, so they produce the same data output.

In fact, the first tree appears to be trained using each side's respective labels (to be verified).

image-20210328192710335

Model output:

image-20210328192919093

Data output:

image-20210328193025876

Various indicators of the model:

image-20210328193811542

Model output on the host side:

image-20210328193445268

The host's tags appear anonymized on the guest side.

image-20210328193523205

There is no data output on the host side.

There is no model indicator on the host side.

2. Model prediction

So how do we make predictions with our already trained model?

Reference: https://github.com/FederatedAI/FATE/blob/master/examples/experiment_template/user_usage/dsl_v2_predict_tutorial.md

1. Find the corresponding model_id and model_version using the job ID of the job we just submitted:

flow job config -j 202103270933192332863 -r guest -p 9999 -o ./

image-20210328210923197

2. Deploy the model

flow model deploy --model-id guest-9999#host-9998#model --model-version 202103270933192332863

image-20210328210948645
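The model_id used in the deploy command packs the participating roles and party IDs into one string. The sketch below splits it apart. The format is inferred only from the example string in this article (role-party_id items joined by #, ending in "model"), so treat it as an assumption; parse_model_id is a hypothetical helper, not part of the FATE client.

```python
def parse_model_id(model_id):
    """Split 'guest-9999#host-9998#model' into ({role: party_id}, suffix)."""
    *parties, suffix = model_id.split("#")
    roles = {}
    for item in parties:
        # Each item is assumed to look like "<role>-<party_id>".
        role, party_id = item.split("-", 1)
        roles[role] = int(party_id)
    return roles, suffix


roles, suffix = parse_model_id("guest-9999#host-9998#model")
print(roles)   # {'guest': 9999, 'host': 9998}
print(suffix)  # model
```

The sketch does not handle a role that has several party IDs, since no such example appears here.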

3. Modify the model_id and model_version in the test_predict_conf.json configuration file, as well as the party_id and role settings.
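To make this step concrete, here is a minimal sketch of the fields that change. The key layout (job_parameters, then common) is an assumption based on the DSL v2 style and may differ between FATE versions, so compare it with the actual test_predict_conf.json before relying on it. build_predict_conf is a hypothetical helper, and the model_version below is only the placeholder value taken from the commands above.

```python
import json


def build_predict_conf(model_id, model_version, guest_id, host_id):
    """Return a predict job configuration as a dict (assumed DSL v2 layout)."""
    return {
        "dsl_version": 2,
        "initiator": {"role": "guest", "party_id": guest_id},
        "role": {"guest": [guest_id], "host": [host_id]},
        "job_parameters": {
            "common": {
                "job_type": "predict",
                "model_id": model_id,
                "model_version": model_version,
            }
        },
    }


conf = build_predict_conf(
    model_id="guest-9999#host-9998#model",
    model_version="202103270933192332863",  # placeholder: use your own value
    guest_id=9999,
    host_id=9998,
)
print(json.dumps(conf, indent=2))
```

A real configuration file also carries component parameters, such as the table to read for prediction; those are omitted here.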

4. Use the modified configuration file to submit the task for prediction:

image-20210328211014338

You can see that the DAG of the prediction job looks somewhat different:

image-20210328195124362

As above, reader and dataio are almost the same here, and so is intersection.

The module output of the guest side is also the same as above.

The predictions here are almost the same as those seen above, again because the example uses the same samples.

image-20210328194639357

Without loss of generality, let's look at the second tree output by the model:

image-20210328200132958

We can see that the tree models on the guest side and the host side each reference the other side's features (the two trees should in fact be like this, since they are complementary). Therefore, both trees must be used together when the model makes predictions.
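A toy sketch can show why neither side can predict alone. In the tree below, every split node is owned by one party, and only that party holds the feature value needed to choose a branch. This is an illustration under simplified assumptions, not FATE's implementation: the Node class, the feature names and the thresholds are all made up, and everything runs in one process, whereas in a real federation each party would evaluate only its own nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    owner: Optional[str] = None      # "guest" or "host" for split nodes, None for leaves
    feature: Optional[str] = None    # feature name known only to the owner
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    weight: float = 0.0              # leaf value


def predict(node, features_by_party):
    """Walk the tree, asking the owning party's features at each split."""
    while node.owner is not None:
        value = features_by_party[node.owner][node.feature]
        node = node.left if value <= node.threshold else node.right
    return node.weight


# The root split belongs to the guest, the second split to the host.
tree = Node(
    owner="guest", feature="x0", threshold=0.5,
    left=Node(weight=-0.3),
    right=Node(
        owner="host", feature="x3", threshold=1.0,
        left=Node(weight=0.1),
        right=Node(weight=0.4),
    ),
)

sample = {"guest": {"x0": 0.9}, "host": {"x3": 0.2}}
print(predict(tree, sample))  # 0.1
```

If the host's features were missing from sample, the walk would stop at the host-owned node, which mirrors why both sides are needed at prediction time.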

Origin blog.csdn.net/qq_40589204/article/details/115287846