Table of contents
1. Preparation work
2. Deployment operations
3. Verify Serving-Service function
4. Delete deployment
1. Preparation work
- Two hosts (physical or virtual machines running Ubuntu or CentOS 7, with root login allowed)
- Docker installed on all hosts
- Docker-Compose installed on all hosts
- The deployment machine can access the Internet and reach the other hosts
- The target machines have pulled the FATE component images
How to install Docker and Docker-Compose, and how to download the FATE images, was covered in the previous article.
The two machines here are both virtual machines running CentOS 7, referred to below as machine A and machine B. Machine A serves as both the deployment machine and a target machine. Machine A's IP address is 192.168.16.129 and machine B's is 192.168.16.130; both are logged in as root.
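Before deploying, it is worth confirming that machine A can actually reach machine B over the network. A minimal sketch of such a check (the helper below is illustrative, not part of KubeFATE; the IP is the example address above, and port 22 assumes SSH is listening):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: from machine A, check that machine B is reachable over SSH.
# print(can_connect("192.168.16.130", 22))
```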
2. Deployment operations
1. Generate the deployment script file and deploy it (operate on the deployment machine, which is machine A)
//Download the kubefate-docker-compose.tar.gz package for KubeFATE v1.3.0
# curl -OL https://github.com/FederatedAI/KubeFATE/releases/download/v1.3.0/kubefate-docker-compose.tar.gz
# tar -xzf kubefate-docker-compose.tar.gz //extract the archive
# cd docker-deploy/ //enter the docker-deploy directory
# vi parties.conf //edit the parties.conf configuration file
user=root
dir=/data/projects/fate
partylist=(10000 9999) //IDs of the two clusters
partyiplist=(192.168.16.129 192.168.16.130) //IPs of the two target machines
servingiplist=(192.168.16.129 192.168.16.130) //serving IPs of the two target machines
exchangeip=
# bash generate_config.sh //generate the deployment files
# bash docker_deploy.sh all //run the script to deploy and start all clusters
//you will be prompted for the target machines' root password several times
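The arrays in parties.conf must stay aligned: the n-th party ID pairs with the n-th IP in both IP lists. A small sketch that checks this before running generate_config.sh (the parser below is illustrative, not part of KubeFATE):

```python
import re

def parse_bash_array(line: str) -> list[str]:
    """Extract the items of a bash array assignment like `partylist=(10000 9999)`."""
    match = re.search(r"\(([^)]*)\)", line)
    return match.group(1).split() if match else []

def check_parties_conf(text: str) -> None:
    """Raise ValueError if the party/IP arrays have mismatched lengths."""
    arrays = {}
    for line in text.splitlines():
        line = line.strip()
        for key in ("partylist", "partyiplist", "servingiplist"):
            if line.startswith(key + "="):
                arrays[key] = parse_bash_array(line)
    n = len(arrays.get("partylist", []))
    for key, items in arrays.items():
        if len(items) != n:
            raise ValueError(f"{key} has {len(items)} entries, expected {n}")

conf = """
partylist=(10000 9999)
partyiplist=(192.168.16.129 192.168.16.130)
servingiplist=(192.168.16.129 192.168.16.130)
"""
check_parties_conf(conf)   # no exception: the arrays are consistent
```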
2. Verify whether the deployment is successful
Verify on target machines A and B respectively.
# docker ps //cluster A (ID 10000)
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6186cc50baa1 federatedai/serving-proxy:1.2.2-release "/bin/sh -c 'java -D…" 14 minutes ago Up 12 minutes 0.0.0.0:8059->8059/tcp, :::8059->8059/tcp, 0.0.0.0:8869->8869/tcp, :::8869->8869/tcp, 8879/tcp serving-10000_serving-proxy_1
870a3048336b federatedai/serving-server:1.2.2-release "/bin/sh -c 'java -c…" 14 minutes ago Up 12 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp serving-10000_serving-server_1
9a594365a451 redis:5 "docker-entrypoint.s…" 14 minutes ago Up 12 minutes 6379/tcp serving-10000_redis_1
44a0df69d2b1 federatedai/egg:1.3.0-release "/bin/sh -c 'cd /dat…" 18 minutes ago Up 17 minutes 7778/tcp, 7888/tcp, 50000-60000/tcp confs-10000_egg_1
22fe1f5e1ec1 federatedai/federation:1.3.0-release "/bin/sh -c 'java -c…" 18 minutes ago Up 17 minutes 9394/tcp confs-10000_federation_1
f75f0405b4bc mysql:8 "docker-entrypoint.s…" 18 minutes ago Up 17 minutes 3306/tcp, 33060/tcp confs-10000_mysql_1
a503e90b1548 redis:5 "docker-entrypoint.s…" 18 minutes ago Up 17 minutes 6379/tcp confs-10000_redis_1
b09a08468ad3 federatedai/proxy:1.3.0-release "/bin/sh -c 'java -c…" 18 minutes ago Up 17 minutes 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp confs-10000_proxy_1
# docker ps //cluster B (ID 9999)
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
27262d0be615 federatedai/roll:1.3.0-release "/bin/sh -c 'java -c…" 10 minutes ago Up 9 minutes 8011/tcp confs-9999_roll_1
e0b244d55562 federatedai/meta-service:1.3.0-release "/bin/sh -c 'java -c…" 11 minutes ago Up 10 minutes 8590/tcp confs-9999_meta-service_1
6e249db9451c federatedai/egg:1.3.0-release "/bin/sh -c 'cd /dat…" 12 minutes ago Up 10 minutes 7778/tcp, 7888/tcp, 50000-60000/tcp confs-9999_egg_1
8db5215d3998 mysql:8 "docker-entrypoint.s…" 12 minutes ago Up 11 minutes 3306/tcp, 33060/tcp confs-9999_mysql_1
d16f4c43fb05 federatedai/proxy:1.3.0-release "/bin/sh -c 'java -c…" 12 minutes ago Up 11 minutes 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp confs-9999_proxy_1
b5062d978a12 federatedai/federation:1.3.0-release "/bin/sh -c 'java -c…" 12 minutes ago Up 11 minutes 9394/tcp confs-9999_federation_1
ad673a6e2c4a redis:5 "docker-entrypoint.s…" 12 minutes ago Up 11 minutes 6379/tcp confs-9999_redis_1
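Rather than eyeballing the listings above, you can check programmatically that each expected service container is running. A sketch assuming the container names follow the `confs-<party>_<service>_1` pattern shown above (the expected-service list here covers only the services common to both clusters; adjust it for your deployment):

```python
# Container names can be collected with: docker ps --format '{{.Names}}'
EXPECTED_SERVICES = ["egg", "federation", "mysql", "redis", "proxy"]

def missing_services(names: list[str], party_id: int) -> list[str]:
    """Return the expected services with no running container for this party."""
    prefix = f"confs-{party_id}_"
    return [svc for svc in EXPECTED_SERVICES
            if not any(n.startswith(prefix + svc) for n in names)]

# Example with container names from the cluster A listing above:
names = [
    "confs-10000_egg_1", "confs-10000_federation_1", "confs-10000_mysql_1",
    "confs-10000_redis_1", "confs-10000_proxy_1",
]
print(missing_services(names, 10000))   # [] -> all expected services are up
```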
3. Connectivity verification
Run the following commands on the deployment machine (machine A):
# docker exec -it confs-10000_python_1 bash //enter the python container on the deployment machine
# cd /data/projects/fate/python/examples/toy_example //enter the test-script directory
# python run_toy_example.py 10000 9999 1 //run the test script; the trailing 1 means multi-host mode
A successful test returns output like the following:
"2019-08-29 07:21:25,353 - secure_add_guest.py[line:96] - INFO: begin to init parameters of secure add example guest"
"2019-08-29 07:21:25,354 - secure_add_guest.py[line:99] - INFO: begin to make guest data"
"2019-08-29 07:21:26,225 - secure_add_guest.py[line:102] - INFO: split data into two random parts"
"2019-08-29 07:21:29,140 - secure_add_guest.py[line:105] - INFO: share one random part data to host"
"2019-08-29 07:21:29,237 - secure_add_guest.py[line:108] - INFO: get share of one random part data from host"
"2019-08-29 07:21:33,073 - secure_add_guest.py[line:111] - INFO: begin to get sum of guest and host"
"2019-08-29 07:21:33,920 - secure_add_guest.py[line:114] - INFO: receive host sum from guest"
"2019-08-29 07:21:34,118 - secure_add_guest.py[line:121] - INFO: success to calculate secure_sum, it is 2000.0000000000002"
With this, the FATE federated learning environment across the two machines is complete.
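The toy example traced in the logs above is a two-party secure sum: each side splits its private value into two random additive shares, exchanges one share with the other party, and only the partial sums are revealed, never the raw inputs. A minimal sketch of the idea (this is the underlying protocol concept, not FATE's actual implementation):

```python
import random

def split_into_shares(value: float) -> tuple[float, float]:
    """Split value into two random additive shares that sum back to value."""
    share = random.uniform(-value, value)
    return share, value - share

# Guest and host each hold a private value.
guest_value, host_value = 1000.0, 1000.0

g1, g2 = split_into_shares(guest_value)   # guest keeps g1, sends g2 to host
h1, h2 = split_into_shares(host_value)    # host keeps h1, sends h2 to guest

# Each party sums what it holds and publishes only that partial sum.
guest_partial = g1 + h2
host_partial = h1 + g2

secure_sum = guest_partial + host_partial
print(secure_sum)   # 2000 up to floating-point error, matching the log above
```

Neither party ever sees the other's input; each observes only one random share, which on its own reveals nothing about the original value.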
3. Verify Serving-Service function
Use the two deployed FATE clusters for a simple training and inference test. The training data set is "breast", a small test set that ships with FATE under "examples/data". It is split into two parts, "breast_a" and "breast_b": the host participating in the training holds "breast_a", while the guest holds "breast_b". The guest and host jointly run logistic regression training on the data set, and the trained model is then pushed to FATE Serving for online inference.
1. Upload data
The following operations are performed on machine A:
# docker exec -it confs-10000_python_1 bash //enter the python container
# cd fate_flow //enter the fate_flow directory
# vi examples/upload_host.json //edit the upload configuration file
{
"file": "examples/data/breast_a.csv",
"head": 1,
"partition": 10,
"work_mode": 1,
"namespace": "fate_flow_test_breast",
"table_name": "breast"
}
//upload "breast_a.csv" into the system
# python fate_flow_client.py -f upload -c examples/upload_host.json
The following operations are performed on machine B:
# docker exec -it confs-9999_python_1 bash //enter the python container
# cd fate_flow //enter the fate_flow directory
# vi examples/upload_guest.json //edit the upload configuration file
{
"file": "examples/data/breast_b.csv",
"head": 1,
"partition": 10,
"work_mode": 1,
"namespace": "fate_flow_test_breast",
"table_name": "breast"
}
//upload "breast_b.csv" into the system
# python fate_flow_client.py -f upload -c examples/upload_guest.json
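The two upload configs differ only in the data file; everything else is shared. A sketch that builds both from one helper (the helper function is illustrative, not part of fate_flow):

```python
import json

def make_upload_conf(data_file: str) -> dict:
    """Build an upload config like the ones edited above."""
    return {
        "file": data_file,
        "head": 1,                 # first row of the CSV is a header
        "partition": 10,           # number of storage partitions
        "work_mode": 1,            # 1 = cluster (multi-host) mode
        "namespace": "fate_flow_test_breast",
        "table_name": "breast",
    }

host_conf = make_upload_conf("examples/data/breast_a.csv")    # for machine A
guest_conf = make_upload_conf("examples/data/breast_b.csv")   # for machine B
print(json.dumps(host_conf, indent=2))
```

Note that both parties use the same namespace and table name: the training job configuration below refers to the data by that pair, and each cluster resolves it to its own local upload.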
2. Carry out training
# vi examples/test_hetero_lr_job_conf.json //edit the training job configuration file
{
"initiator": {
"role": "guest",
"party_id": 9999
},
"job_parameters": {
"work_mode": 1
},
"role": {
"guest": [9999],
"host": [10000],
"arbiter": [10000]
},
"role_parameters": {
"guest": {
"args": {
"data": {
"train_data": [{"name": "breast", "namespace": "fate_flow_test_breast"}]
}
},
"dataio_0":{
"with_label": [true],
"label_name": ["y"],
"label_type": ["int"],
"output_format": ["dense"]
}
},
"host": {
"args": {
"data": {
"train_data": [{"name": "breast", "namespace": "fate_flow_test_breast"}]
}
},
"dataio_0":{
"with_label": [false],
"output_format": ["dense"]
}
}
},
....
}
//submit a job to train on the uploaded data sets
# python fate_flow_client.py -f submit_job -d examples/test_hetero_lr_job_dsl.json -c examples/test_hetero_lr_job_conf.json
//output
{
"data": {
"board_url": "http://fateboard:8080/index.html#/dashboard?job_id=2022041901241226828821&role=guest&party_id=9999",
"job_dsl_path": "/data/projects/fate/python/jobs/2022041901241226828821/job_dsl.json",
"job_runtime_conf_path": "/data/projects/fate/python/jobs/2022041901241226828821/job_runtime_conf.json",
"logs_directory": "/data/projects/fate/python/logs/2022041901241226828821",
"model_info": {
"model_id": "arbiter-10000#guest-9999#host-10000#model",
"model_version": "2022041901241226828821"
}
},
"jobId": "2022041901241226828821",
"retcode": 0,
"retmsg": "success"
}
//check training progress with this command until every task shows success; -j takes the jobId returned above
# python fate_flow_client.py -f query_task -j 2022041901241226828821 | grep f_status
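Instead of re-running the query by hand, you can poll until every task reports success. A sketch with the status check injected as a function, so it could be wired to the `query_task` command above (the polling helper itself is illustrative, not part of FATE):

```python
import time
from typing import Callable

def wait_until_success(get_statuses: Callable[[], list[str]],
                       interval: float = 5.0, max_polls: int = 60) -> bool:
    """Poll until every task status is 'success'; stop early on 'failed'."""
    for _ in range(max_polls):
        statuses = get_statuses()
        if statuses and all(s == "success" for s in statuses):
            return True
        if any(s == "failed" for s in statuses):
            return False
        time.sleep(interval)
    return False

# Example with a fake status source that succeeds on the second poll:
polls = iter([["running", "success"], ["success", "success"]])
print(wait_until_success(lambda: next(polls), interval=0.0))   # True
```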
3. View training results
Open 127.0.0.1:8080 in a browser to access FATEBoard and view the visualized training results.
4. Delete deployment
If the deployment needs to be removed, all FATE clusters can be stopped by running the following command on the deployment machine:
# bash docker_deploy.sh --delete all
To completely remove FATE from the target machines, log in to each node and run:
# cd /data/projects/fate/confs-<id>/ //<id> is the cluster's party ID
# docker-compose down
# rm -rf ../confs-<id>/