TensorFlow Architecture and Design: OP Essentialism

Author: Liu Guangcong 

ZTE senior system architect, focusing on machine learning algorithms, distributed system architecture and optimization. 

Original:  TensorFlow Architecture and Design: OP Essentialism

---------------------------------------------------------------------------------------------------------------------------------------------------------------

TensorFlow's system structure is divided at the C API into two subsystems: the "front end" and the "back end". The front-end system plays the role of the Client: it constructs the computation graph, forwards the GraphDef (in Protobuf format) to the Master of the back-end system, and starts the execution of the computation graph.

Finally, the Master splits the graph and registers the GraphDef subgraph fragments with the Workers through the RegisterGraph interface. GraphDef is therefore the knowledge model that describes the computation graph, and the entire TensorFlow computation process revolves around it.


Domain model

The unit of computation in TensorFlow is the OP, which represents some kind of abstract computation. This chapter first describes the metadata models NodeDef and OpDef, and then walks through the flow of that metadata using a simple example.

Metadata

An OP represents some kind of abstract computation that has zero or more inputs/outputs and zero or more attributes. The inputs and outputs take the form of Tensors.

In the system implementation, the OP's metadata is described by OpDef in Protobuf format. This enables data exchange between the front end and the back end and unifies their domain model.


Definition of OpDef

Figure: Definition of OpDef

The OpDef definition includes the OP's name, its input and output lists, its attribute list, optimization options, and so on. Attributes are typically used to describe the type, size, default value, constraints, and other characteristics of the OP.
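As a rough illustration of the fields listed above, the following sketch models an OpDef with plain Python dataclasses. The field names loosely follow the Protobuf message (name, input_arg, output_arg, attr), but the classes themselves are simplified assumptions for illustration, not the generated Protobuf classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArgDef:
    name: str
    type_attr: str = ""  # name of the attribute that holds the element type, e.g. "T"


@dataclass
class AttrDef:
    name: str
    type: str                                   # kind of attribute, e.g. "type", "int", "shape"
    default_value: Optional[object] = None      # None means "no default" in this sketch
    allowed_values: List[str] = field(default_factory=list)


@dataclass
class OpDef:
    name: str
    input_arg: List[ArgDef] = field(default_factory=list)
    output_arg: List[ArgDef] = field(default_factory=list)
    attr: List[AttrDef] = field(default_factory=list)


# A hand-written description of an OP with one input and one output of the same type T.
zeros_like = OpDef(
    name="ZerosLike",
    input_arg=[ArgDef("x", type_attr="T")],
    output_arg=[ArgDef("y", type_attr="T")],
    attr=[AttrDef("T", type="type")],
)
print(zeros_like.name, [a.name for a in zeros_like.attr])
```

Note that the input and output refer to the attribute T by name rather than fixing a concrete type; the concrete type is only bound later, when a node using this OP is added to a graph.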


Figure: OpDef representation

OP naming

OPs are indexed by name, so an OP's name must be globally unique. By convention, OP names use CamelCase, while the Python front end uses snake_case (lowercase with underscores). The latter are often called "OP constructors" and form the programming interface (API) exposed to users.

In addition, OP names that start with an underscore are reserved for the system's internal implementation. For example, _Send and _Recv are OPs used for inter-device communication, while _Source and _Sink identify the start and end nodes of the computation graph.
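The naming convention can be sketched with two small helpers. Both functions are hypothetical and only mechanize the convention described above; the real Python constructor names are not always a mechanical conversion of the OP name.

```python
import re


def op_name_to_constructor(op_name: str) -> str:
    """Convert a CamelCase OP name into a snake_case constructor name."""
    # Insert "_" before every uppercase letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", op_name).lower()


def is_internal_op(op_name: str) -> bool:
    """OP names starting with an underscore are reserved for internal use."""
    return op_name.startswith("_")


print(op_name_to_constructor("ZerosLike"))
print(is_internal_op("_Send"), is_internal_op("ZerosLike"))
```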

Input/output

The inputs/outputs of an OP take the form of Tensors and fall into the following cases.

  • 0 Tensors
    • zero inputs
    • zero outputs
  • 1 Tensor
    • type determined
    • type undetermined
  • Multiple Tensors
    • same type
    • different types

In contrast to the OP's attributes, the OP's inputs are dynamic: their values change on every iteration (Step).

Attributes

An OP can have a set of attributes that describe the type, size, default value, constraints, and other characteristics of its inputs and outputs. Attribute values (AttrValue) are determined when the computation graph is constructed; they are carried by NodeDef and passed to the back-end execution system through GraphDef.

In other words, defining an OP's attributes and setting their values are two separate steps. The attribute definition is fixed when the OP is registered and is described by AttrDef; the attribute value is fixed when the computation graph is constructed (when the OP is added to the graph) and is described by AttrValue.

Relative to the OP's inputs, its attributes are static. Attribute values, including the type, size, and shape of the inputs and outputs, are determined during graph construction and do not change across iterations.
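The split between attribute definition and attribute value can be illustrated as follows. This is a minimal sketch: the AttrDef class, the resolve_attrs helper, and the attribute named transpose_a are illustrative assumptions, not TensorFlow's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AttrDef:
    """Attribute definition, fixed when the OP is registered."""
    name: str
    allowed_values: Tuple[Any, ...] = ()   # empty tuple: no constraint
    default_value: Optional[Any] = None    # None: no default


def resolve_attrs(attr_defs: Sequence[AttrDef], given: Dict[str, Any]) -> Dict[str, Any]:
    """Bind attribute values when a node is added to the graph."""
    values = {}
    for attr_def in attr_defs:
        value = given.get(attr_def.name, attr_def.default_value)
        if value is None:
            raise ValueError(f"attribute {attr_def.name!r} has no value and no default")
        if attr_def.allowed_values and value not in attr_def.allowed_values:
            raise ValueError(f"{value!r} is not allowed for attribute {attr_def.name!r}")
        values[attr_def.name] = value
    return values


# Registration time: definitions only, no concrete values yet.
defs = [
    AttrDef("T", allowed_values=("DT_INT32", "DT_FLOAT")),
    AttrDef("transpose_a", default_value=False),
]

# Graph construction time: values are bound once and then stay fixed.
node_attrs = resolve_attrs(defs, {"T": "DT_INT32"})
print(node_attrs)
```

After resolve_attrs returns, nothing in the sketch changes node_attrs again, which mirrors the point that attribute values are static across iterations while inputs are not.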

NodeDef definition


Figure: NodeDef representation

OP index

NodeDef uses its op field to look up the corresponding OpDef in the OpRegistry.

Input list

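The input field specifies the node's input list, which is also where the most important knowledge for constructing the computation graph resides. An input takes one of two forms, representing a normal edge or a control dependency edge.

By convention, to simplify parsing, normal edges are stored first in the input list, followed by control dependency edges.

node:src_output

Indicates a normal edge that carries a Tensor data flow. Here, node is the name of the predecessor node and src_output is the index of the predecessor's output edge. As a special case, when src_output is 0, the trailing ":0" may be omitted.

^node

Indicates a control dependency edge, where node is the name of the predecessor node.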
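The two input forms can be parsed with a few lines of code. The sketch below is an illustration only: the Edge tuple, the use of -1 as the control-edge index, and both helper names are assumptions of this example rather than TensorFlow's own parser.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    node: str          # name of the predecessor node
    src_output: int    # output index; -1 marks a control edge in this sketch
    is_control: bool


def parse_input(spec: str) -> Edge:
    """Parse one entry of a node's input list."""
    if spec.startswith("^"):
        return Edge(spec[1:], -1, True)
    node, sep, index = spec.rpartition(":")
    if not sep:
        # "node" is shorthand for "node:0".
        return Edge(spec, 0, False)
    return Edge(node, int(index), False)


def is_well_ordered(inputs: List[str]) -> bool:
    """Check the convention that normal edges precede control edges."""
    flags = [parse_input(spec).is_control for spec in inputs]
    return flags == sorted(flags)  # False sorts before True


print(parse_input("n1"), parse_input("n1:2"), parse_input("^init"))
print(is_well_ordered(["n1", "n2:1", "^init"]))
```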

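Device specification

The device field lets users specify a custom device placement scheme. For example:

  • "@other/node": placed on the same device as the node other/node;
  • "/job:worker/replica:0/task:1/gpu:3": a full specification;
  • "/job:worker/gpu:3": a partial specification;
  • "": an empty specification.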
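To make the four forms above concrete, here is a minimal sketch that breaks a device string into its fields. The parse_device function and the colocate_with key are hypothetical names chosen for this example; the sketch does no validation of field names.

```python
def parse_device(spec: str) -> dict:
    """Split a device string into a dict of its fields."""
    if spec.startswith("@"):
        # Colocation form: place on the same device as the named node.
        return {"colocate_with": spec[1:]}
    fields = {}
    for part in filter(None, spec.split("/")):
        key, _, value = part.partition(":")
        fields[key] = int(value) if value.isdigit() else value
    return fields


print(parse_device("/job:worker/replica:0/task:1/gpu:3"))
print(parse_device("/job:worker/gpu:3"))
print(parse_device("@other/node"))
print(parse_device(""))
```

A partial specification simply yields fewer fields, and the empty specification yields none, leaving the remaining placement decisions open.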
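
Attribute value list

During graph construction, the OP's attribute values are determined, including the input/output types, shapes, and other information. These attribute values are carried in the attr list of NodeDef.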

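Symbolic programming

Computation in TensorFlow is deferred, which is typical of the symbolic programming paradigm. Along the time axis, a computation is divided into two phases:

  • Graph construction: builds the computation graph;
  • Graph execution: executes the computation graph.

In addition, during system initialization the system scans and registers all OPs and stores them in the OpRegistry.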
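The two phases can be illustrated with a toy graph in plain Python. This is only a sketch of deferred computation in general; the Node class and the constant, add, and run functions are made up for this example and are unrelated to TensorFlow's implementation.

```python
class Node:
    """A symbolic node: it records what to compute, not the result."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value


def constant(value):
    return Node("Const", value=value)


def add(a, b):
    return Node("Add", inputs=(a, b))


def run(node):
    """Execution phase: evaluate the graph recursively."""
    if node.op == "Const":
        return node.value
    if node.op == "Add":
        x, y = (run(i) for i in node.inputs)
        return x + y
    raise ValueError(f"unknown op {node.op!r}")


# Construction phase: only the graph is built, nothing is computed yet.
c = add(constant(1), constant(2))
print(c.op, c.value)

# Execution phase: the result is produced only now.
result = run(c)
print(result)
```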

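Registering an OP

In principle, OP registration takes place during system initialization. The back-end system can register an OP with the REGISTER_OP utility macro. The front-end system has a similar OP registration mechanism.

Registering an OP with REGISTER_OP is in effect a translation from the REGISTER_OP description to its OpDef representation. OpDefBuilder builds the OP's input list, output list, and attribute list through chained calls to the Input, Output, and Attr methods. Finally, the Finalize member function parses these string descriptions and translates them into the internal OpDef representation, which is then registered in the OpRegistry.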


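Figure: OP build process

For example, REGISTER_OP("ZerosLike") registers the zeros_like OP with the system; the translation into its OpDef representation happens at run time.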
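The chained-builder idea can be sketched in Python. The real builder is C++; this toy version only imitates the flow described above (collect string specs, parse them in Finalize, store the result in a registry). The spec strings "x: T", "y: T", and "T: type" are assumptions modeled on the usual REGISTER_OP style, and OP_REGISTRY and register_op are hypothetical names.

```python
OP_REGISTRY = {}  # stand-in for OpRegistry: OP name -> dict-based OpDef


class OpDefBuilder:
    """Collects textual specs and turns them into a dict-based OpDef."""

    def __init__(self, name):
        self._name = name
        self._inputs, self._outputs, self._attrs = [], [], []

    # Capitalized method names mirror the builder methods named in the text.
    def Input(self, spec):
        self._inputs.append(spec)
        return self

    def Output(self, spec):
        self._outputs.append(spec)
        return self

    def Attr(self, spec):
        self._attrs.append(spec)
        return self

    def Finalize(self):
        def parse(spec):
            # "x: T" -> {"name": "x", "type": "T"}
            name, _, kind = spec.partition(":")
            return {"name": name.strip(), "type": kind.strip()}

        return {
            "name": self._name,
            "input_arg": [parse(s) for s in self._inputs],
            "output_arg": [parse(s) for s in self._outputs],
            "attr": [parse(s) for s in self._attrs],
        }


def register_op(builder):
    op_def = builder.Finalize()
    if op_def["name"] in OP_REGISTRY:
        raise ValueError(f"OP {op_def['name']!r} is already registered")
    OP_REGISTRY[op_def["name"]] = op_def
    return op_def


register_op(OpDefBuilder("ZerosLike").Input("x: T").Output("y: T").Attr("T: type"))
print(OP_REGISTRY["ZerosLike"])
```

Each builder method returns self, which is what makes the chained call style possible; the string parsing is deferred to Finalize, as in the description above.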


Figure: OP registration

Constructing an OP

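On the front end, users construct an OP with an OP constructor, which adds the OP to the computation graph. During graph construction, the types and shapes of the OP's inputs/outputs are determined, and so are its attribute values.

Constructing the computation graph is in effect defining a GraphDef. The OP's attribute values are carried by NodeDef, and they are fixed during graph construction.

When graph execution starts, a call to Session.run passes the entire GraphDef to the back end and starts the execution of the computation graph. For example, consider the following graph construction: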

import tensorflow as tf

tensor = tf.constant([1, 2], name="n1")
zeros = tf.zeros_like(tensor, name="n2")

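The upstream node of ZerosLike is n1, whose output edge with src_output=0 flows into ZerosLike. The value of ZerosLike's attribute T is then inferred automatically as DT_INT32, and the two nodes form a simple computation graph.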
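The resulting two-node graph can be pictured with plain dictionaries. This is a hand-written approximation of what the NodeDefs carry, not the output of TensorFlow itself; the helper functions and the small DTYPES table are assumptions made for the sketch.

```python
# Hypothetical mapping from Python element types to dtype names.
DTYPES = {int: "DT_INT32", float: "DT_FLOAT"}


def const_node(name, values):
    """Build a dict-based NodeDef for a constant."""
    return {
        "name": name,
        "op": "Const",
        "input": [],
        "attr": {"dtype": DTYPES[type(values[0])], "value": list(values)},
    }


def zeros_like_node(name, producer):
    """Build a dict-based NodeDef for ZerosLike, inferring T from its input."""
    return {
        "name": name,
        "op": "ZerosLike",
        "input": [producer["name"]],            # "n1" is shorthand for "n1:0"
        "attr": {"T": producer["attr"]["dtype"]},
    }


n1 = const_node("n1", [1, 2])
n2 = zeros_like_node("n2", n1)
graph_def = {"node": [n1, n2]}
print(n2)
```

The value of T is copied from the producer's element type at construction time, so it is already fixed in the NodeDef before anything is executed.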


Figure: OP construction

Executing an OP

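During graph execution, an OP's inputs are determined as they flow in from upstream OPs. A suitable Kernel implementation is then selected polymorphically according to the device type and the input/output types, and the Kernel's computation is started.

For example, if the upstream input of zeros_like is [1, 2, 3, 4], the output after the zeros_like OP runs is [0, 0, 0, 0].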
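Kernel selection by device type and data type can be sketched as a table lookup. The KERNELS table, the register_kernel decorator, and run_op are hypothetical names for this illustration, and Python lists stand in for Tensors.

```python
KERNELS = {}  # (op, device_type, dtype) -> kernel function


def register_kernel(op, device_type, dtype):
    """Decorator that records a kernel under its selection key."""
    def decorator(fn):
        KERNELS[(op, device_type, dtype)] = fn
        return fn
    return decorator


@register_kernel("ZerosLike", "CPU", "DT_INT32")
def zeros_like_cpu_int32(x):
    return [0 for _ in x]


@register_kernel("ZerosLike", "CPU", "DT_FLOAT")
def zeros_like_cpu_float(x):
    return [0.0 for _ in x]


def run_op(op, device_type, dtype, *inputs):
    """Pick the kernel matching the key and run it on the inputs."""
    try:
        kernel = KERNELS[(op, device_type, dtype)]
    except KeyError:
        raise LookupError(f"no kernel for {op} on {device_type} with {dtype}") from None
    return kernel(*inputs)


output = run_op("ZerosLike", "CPU", "DT_INT32", [1, 2, 3, 4])
print(output)
```

The same OP name maps to different kernel functions depending on the key, which is the polymorphic selection described above in miniature.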


Figure: OP execution
