Distributed TensorFlow
Single machine, multiple cards (GPUs)
Multiple machines, multiple cards (distributed)
Implementing distribution yourself
API:
1. Create a tf.train.ClusterSpec that describes all tasks in the cluster; the description is identical for every task
2. Create a tf.train.Server for each ps and worker task, and run the appropriate computation on it
cluster = tf.train.ClusterSpec({"ps": ps_spec, "worker": worker_spec})
ps_spec = ["ps0.example.com:port", "ps1.example.com:port"]  corresponds to /job:ps/task:0,1
worker_spec = ["worker0.example.com:port", ...]  corresponds to /job:worker/task:0,1,...
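A minimal sketch of step 1 (assuming the TF 1.x API; the host names and ports are placeholders):

```python
import tensorflow as tf

# Every task builds the same ClusterSpec from the same host lists.
ps_spec = ["ps0.example.com:2222", "ps1.example.com:2222"]              # /job:ps/task:0,1
worker_spec = ["worker0.example.com:2222", "worker1.example.com:2222"]  # /job:worker/task:0,1

cluster = tf.train.ClusterSpec({"ps": ps_spec, "worker": worker_spec})
```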
tf.train.Server(server_or_cluster, job_name=None, task_index=None, protocol=None, config=None, start=True) creates the server
- server_or_cluster: cluster description
- job_name: name of the job (task type), e.g. "ps" or "worker"
- task_index: index of this task within its job
- attribute: target, the target that a tf.Session uses to connect to this server
- method: join(), used on a parameter server; blocks until the server is shut down
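A minimal sketch of step 2 (assuming the TF 1.x API; in a real job, job_name and task_index would come from command-line flags or TF_CONFIG):

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

job_name = "ps"    # "ps" or "worker", depending on which process this is
task_index = 0

server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()                            # parameter server blocks until shut down
else:
    with tf.Session(server.target) as sess:  # worker connects to its own server
        pass                                 # build and run the graph here
```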
tf.device(device_name_or_function)
- Selects the specified device, or uses a device function to choose placement
- if device_name
- Ops are placed on the specified device
- e.g. "/job:worker/task:0/cpu:0"
- if function
- tf.train.replica_device_setter(worker_device=worker_device,cluster=cluster)
- Purpose: this device function places variables on the ps tasks and the remaining ops on the worker device, coordinating placement across devices
- worker_device: the device for worker ops, e.g. "/job:worker/task:0/cpu:0" or "/job:worker/task:0/gpu:0"
- cluster: the ClusterSpec object describing the cluster
- Used together with tf.device() so that the work of different nodes runs on the right devices (see the sketch below)
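A minimal sketch of tf.train.replica_device_setter() used with tf.device() (assuming the TF 1.x API and the same placeholder cluster as above):

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

# Variables are placed on the ps tasks (round-robin); other ops stay on worker_device.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0/gpu:0",
        cluster=cluster)):
    w = tf.Variable(tf.zeros([10, 1]), name="w")    # lands on /job:ps/task:0
    x = tf.placeholder(tf.float32, shape=[None, 10])
    y = tf.matmul(x, w)                             # lands on the worker device
```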