Deep learning / neural networks (4): Distributed TensorFlow

Distributed TensorFlow

Single machine, multiple cards (GPU)

Multiple machines, multiple cards (distributed)

Implementing distributed training yourself

API:

1. Create a tf.train.ClusterSpec that describes the cluster to every task; the description is identical for all tasks.

2. Create a tf.train.Server for each ps and worker task and run the corresponding computation (see the sketches after the list below).

  • cluster = tf.train.ClusterSpec({"ps": ps_spec, "worker": worker_spec})

    ps_spec = ["ps0.example.com:port", "ps2.example.com:port"] corresponds to /job:ps/task:0,1

    worker_spec = ["worker0.example.com:port", ...] corresponds to /job:worker/task:0, ...

  • tf.train.Server(server_or_cluster, job_name=None, task_index=None, protocol=None, config=None, start=True) creates the server

    • server_or_cluster: the cluster description
    • job_name: name of the job (task type) this server belongs to
    • task_index: index of this task within its job
    • attribute: target, the address a tf.Session uses to connect to this server
    • method: join(), used on the parameter server; it blocks until the server is shut down
  • tf.device(device_name_or_function)

    • Selects the specified device, or accepts a device function
    • If device_name:
      • A specific device is assigned
      • e.g. "/job:worker/task:0/cpu:0"
    • If function:
      • tf.train.replica_device_setter(worker_device=worker_device, cluster=cluster)
      • Purpose: this function coordinates where operations (variables vs. other ops) are placed across the different devices
      • worker_device: the device to assign, e.g. "/job:worker/task:0/cpu:0" or "/job:worker/task:0/gpu:0"
      • cluster: the cluster description object
    • Used together with tf.device() so that different nodes run on different devices (see the sketches below)
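
Putting steps 1 and 2 together, here is a minimal sketch, assuming TensorFlow 1.x; the hostnames, port 2222, and the hard-coded job_name/task_index are placeholders, and in practice each process would read them from command-line flags.

```python
import tensorflow as tf

# Step 1: describe the cluster; this dictionary is identical on every task.
ps_spec = ["ps0.example.com:2222"]                      # /job:ps/task:0
worker_spec = ["worker0.example.com:2222",
               "worker1.example.com:2222"]              # /job:worker/task:0,1
cluster = tf.train.ClusterSpec({"ps": ps_spec, "worker": worker_spec})

# Step 2: each process creates a server for its own task.
job_name = "ps"       # placeholder; normally parsed from flags
task_index = 0
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    # The parameter server simply waits here until it is shut down.
    server.join()
```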
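
On the worker side, tf.train.replica_device_setter is combined with tf.device so that variables land on the ps job while the remaining ops stay on the worker, and server.target lets a tf.Session connect to the in-process server. The same assumptions apply (TensorFlow 1.x, placeholder hostnames); the variable w and the assign_add op stand in for a real model.

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"]})
task_index = 0
server = tf.train.Server(cluster, job_name="worker", task_index=task_index)

# Variables are placed on /job:ps, everything else on this worker's device.
worker_device = "/job:worker/task:%d/cpu:0" % task_index
with tf.device(tf.train.replica_device_setter(worker_device=worker_device,
                                              cluster=cluster)):
    w = tf.Variable(0.0, name="w")        # placed on /job:ps/task:0
    train_op = tf.assign_add(w, 1.0)      # placed on the worker device

# server.target is the address a tf.Session uses to connect to this server.
with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(train_op))
```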


Origin www.cnblogs.com/Dean0731/p/11815986.html