TensorFlow in Action study notes (1)

1.1 TensorFlow Overview

TensorFlow is Google's second-generation distributed machine learning system. It was open-sourced on GitHub in November 2015, gained a distributed version in April 2016, and released a preview of version 1.0 in January 2017, with its API interfaces trending toward stability. TensorFlow is still under rapid, iterative development.
    TensorFlow official website: www.tensorflow.org
    GitHub repository: github.com/tensorflow/tensorflow
    Models repository: github.com/tensorflow/models
    TensorFlow is both an interface for expressing machine learning algorithms and a framework for executing them. Its front end supports Python, C++, Go, Java, and other languages, while its back end is implemented in C++ and CUDA. Algorithms written in TensorFlow can be ported easily across heterogeneous systems, from Android phones and iPhones to ordinary CPU servers and even large-scale GPU clusters. Besides deep learning, TensorFlow can also express many other algorithms, including linear regression, logistic regression, and random forests.

1.2 Introduction to the TensorFlow Programming Model

1.2.1 Core Concepts

    A TensorFlow computation is expressed as a directed graph, also called a computation graph. Each arithmetic operation (operation) is a node in the graph, and the connections between nodes are edges. The computation graph describes the flow of data and is also responsible for maintaining and updating state; the user can add conditional branches and loops to control the computation. Each node can have any number of inputs and outputs and describes one operation; a node is an instantiation of an operation. The data flowing along the edges of the graph are called tensors, which is where the name TensorFlow comes from. A tensor's data type can be declared in advance or inferred from the structure of the computation graph. The following example builds and executes a computation graph in Python.

import tensorflow as tf
b = tf.Variable(tf.zeros([100]))                     # create a 100-dimensional bias vector, initialized to zeros
W = tf.Variable(tf.random_uniform([784,100],-1,1))   # create a 784x100 random matrix W
x = tf.placeholder(tf.float32, name="x")             # placeholder for the input
relu = tf.nn.relu(tf.matmul(W, x) + b)               # ReLU(Wx+b)
C = [...]                                            # compute the cost from the ReLU output (elided in the original)
s = tf.Session()
for step in range(0, 10):
    input = ...construct 100-D input array...        # build a 100-D input array (elided in the original)
    result = s.run(C, feed_dict={x: input})          # fetch the cost C, feeding the input x
    print(step, result)

An operation represents an abstract computation, such as matrix multiplication or vector addition. An operation can have its own attributes, but all required attributes must either be set in advance or be inferable when the computation graph is created. A kernel is the implementation of an operation on a specific kind of hardware (CPU, GPU, and so on), and TensorFlow provides a registration mechanism for adding new operations or kernels. Table 1-2 (in the book) lists some of the operations built into TensorFlow.
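
As a rough Python-side illustration (not from the book): tf.py_func in TF 1.x wraps an ordinary NumPy function as a node in the graph, without registering a device-specific kernel; a true custom operation would register its kernels in C++. The function _clip_square and its threshold below are made up for the example.

import numpy as np
import tensorflow as tf

# A plain NumPy function we would like to use as an operation in the graph.
def _clip_square(values):
    return np.minimum(np.square(values), 10.0).astype(np.float32)

x = tf.placeholder(tf.float32, shape=[None])
# tf.py_func inserts the Python function as a node in the computation graph.
y = tf.py_func(_clip_square, [x], tf.float32)

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [1.0, 2.0, 5.0]}))   # [ 1.  4. 10.]
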
The Session is the interactive interface the user works with when using TensorFlow. The user can add nodes and edges to the computation graph through the Session's Extend method, and then execute the graph through the Session's Run method: the user specifies the nodes to be computed and supplies the input data, and TensorFlow automatically finds all the nodes that must be executed and runs them in dependency order. For most users, the pattern is to build a computation graph once and then run the whole graph, or parts of it, many times. In most runs the computation graph is executed repeatedly, and tensor data does not persist across runs; it only flows through the graph once per run.

1.2.2 Implementation Principles

(1) Components

Client: the client communicates with the master and multiple workers through the Session interface.

Worker: each worker can be connected to multiple hardware devices, such as CPUs or GPUs, and is responsible for managing them.

Master: the master directs all workers to execute the computation graph.

Each TensorFlow worker can manage multiple devices, and each device's name contains its hardware type, device number, and task number (absent in the stand-alone version):

Stand-alone mode: /job:localhost/device:cpu:0
Distributed mode: /job:worker/task:17/device:gpu:3
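
As a small sketch (not from the book) of how these device names are used, tf.device pins graph nodes to a device; the example assumes a machine with at least one GPU, otherwise soft placement would be needed to fall back to the CPU.

import tensorflow as tf

# Pin the variable to the CPU and the matrix multiplication to the first GPU.
# In a cluster the device string could instead be "/job:worker/task:17/device:gpu:3".
with tf.device("/cpu:0"):
    W = tf.Variable(tf.random_uniform([784, 100], -1, 1))

with tf.device("/gpu:0"):
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.matmul(x, W)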

(2) Execution modes

Stand-alone mode: the computation graph is executed in dependency order. Once all of a node's upstream dependencies have been executed (its count of pending dependencies reaches 0), the node is added to a ready queue to await execution; when a node finishes, the pending-dependency count of every node downstream of it is decremented by 1. This is in fact the standard way of computing a topological order, as sketched below.
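
A toy Python sketch of this ready-queue scheduling (not TensorFlow's actual implementation; the node names are invented):

from collections import deque

# A toy graph: each node maps to the downstream nodes that depend on it.
edges = {"x": ["matmul"], "W": ["matmul"], "b": ["add"],
         "matmul": ["add"], "add": ["relu"], "relu": []}

# Count unmet dependencies (in-degree) for every node.
indegree = {n: 0 for n in edges}
for n, outs in edges.items():
    for m in outs:
        indegree[m] += 1

# Nodes with no pending dependencies start in the ready queue.
ready = deque(n for n, d in indegree.items() if d == 0)
order = []
while ready:
    n = ready.popleft()
    order.append(n)            # "execute" the node
    for m in edges[n]:         # notify downstream nodes
        indegree[m] -= 1
        if indegree[m] == 0:
            ready.append(m)

print(order)   # one valid topological execution order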

Distributed mode: a placement strategy assigns each graph node to a device. The strategy first needs a cost model, which estimates, for each node, the size of its input and output tensors and the computation time it requires. Part of the cost model consists of heuristic rules derived from human experience, and part is measured from a small amount of actual execution data. Once the placement scheme has decided which device each node runs on, the computation graph is partitioned into several subgraphs, with adjacent nodes that share a device grouped into the same subgraph.
At the same time, converting a stand-alone, single-device program into a multi-device version is very easy: in the book, adding only a single line of code (shown there in bold) turns single-GPU training into multi-GPU training.
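
The bolded line from the book is not reproduced in these notes; the sketch below only illustrates the general idea, assuming two GPUs and using tf.device to place one replica of the computation on each of them.

import tensorflow as tf

num_gpus = 2   # assumed number of available GPUs
x = tf.placeholder(tf.float32, [None, 784])

outputs = []
for i in range(num_gpus):
    with tf.device("/gpu:%d" % i):   # the extra line that spreads the work over the GPUs
        W = tf.Variable(tf.random_uniform([784, 100], -1, 1))
        outputs.append(tf.nn.relu(tf.matmul(x, W)))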

(3) Fault tolerance in distributed TensorFlow

Fault tolerance is another feature of distributed TensorFlow. Failures can be detected in two situations.

① The transmission of data from a sending node to a receiving node fails.

② A worker fails its periodic heartbeat check.

When a fault is detected, execution of the entire computation graph is terminated and restarted.

1.2.3 Extensions

(1) TensorFlow has native support for automatic differentiation.
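
For example, a minimal TF 1.x sketch (the toy loss and the numbers are made up for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32)
w = tf.Variable(2.0)
loss = tf.square(w * x - 1.0)

# tf.gradients walks the graph backwards from loss to w and adds the
# gradient operations automatically; no derivative is written by hand.
grad_w = tf.gradients(loss, [w])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad_w, feed_dict={x: 3.0}))   # d(loss)/dw = 2*(w*x-1)*x = 30.0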

(2) TensorFlow also supports executing subgraphs on their own: the user can select any subgraph of the computation graph, feed data in along some of its edges, and fetch outputs from others.
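
A small sketch of such partial execution (the tensor names are invented): by feeding the intermediate tensor b, only the part of the graph between b and c is run.

import tensorflow as tf

a = tf.placeholder(tf.float32, name="a")
b = a * 2.0
c = b + 1.0

with tf.Session() as sess:
    # Feed a value for the intermediate tensor b and fetch c;
    # the a -> b edge is never executed.
    print(sess.run(c, feed_dict={b: 10.0}))   # 11.0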

(3) TensorFlow supports control flow in the computation graph, such as if-conditions and while-loops. Because most machine learning algorithms are iterative, this is very important.
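
A minimal sketch of both constructs in TF 1.x (the values are arbitrary):

import tensorflow as tf

x = tf.placeholder(tf.float32)

# if-condition inside the graph: the branch is chosen at run time.
y = tf.cond(x > 0, lambda: x * 2.0, lambda: -x)

# while-loop inside the graph: keep doubling i while it is below 10.
i = tf.constant(1.0)
final_i = tf.while_loop(lambda i: i < 10.0, lambda i: i * 2.0, [i])

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3.0}))   # 6.0
    print(sess.run(final_i))                 # 16.0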

(4) Besides feeding data through feed nodes, TensorFlow has special input nodes that let the user supply a file-system path directly, for example a file path on Google Cloud Platform.

(5) Queues are an important TensorFlow feature for task scheduling; they allow different nodes of the computation graph to execute asynchronously.
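
A toy FIFOQueue example (the capacity and values are arbitrary); in a real input pipeline the enqueue and dequeue sides would run in different threads:

import tensorflow as tf

q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32])
enqueue = q.enqueue_many([[1.0, 2.0, 3.0]])
dequeue = q.dequeue()

with tf.Session() as sess:
    sess.run(enqueue)              # producer side
    for _ in range(3):
        print(sess.run(dequeue))   # consumer side: 1.0, 2.0, 3.0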

(6) Containers are a special TensorFlow mechanism for managing long-lived mutable state; for example, Variable objects are stored in a container.
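
A minimal sketch, assuming the tf.container context manager of TF 1.x; the container name "experiment1" is made up:

import tensorflow as tf

# Variables created inside this block live in the named container
# on the server that holds them.
with tf.container("experiment1"):
    counter = tf.Variable(0, name="counter")

# In a distributed setup, tf.Session.reset(target, ["experiment1"])
# would release only the state held in that container.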

1.2.4 Performance Optimization

(1) TensorFlow uses several highly optimized third-party computation libraries.

Linear algebra library: Eigen

Matrix multiplication libraries: BLAS, cuBLAS (CUDA BLAS)

Deep learning computation libraries: cuda-convnet, cuDNN

(2) TensorFlow provides three parallel computation modes to accelerate neural network training.

Data parallelism: a mini-batch of data is split across different devices, so the gradient computation is parallelized. It can be run synchronously, asynchronously, or as a hybrid of the two. The advantage of synchronous training is that there is no gradient interference; the disadvantage is poor fault tolerance, since if one machine fails the whole step must be rerun. Asynchronous training has some degree of fault tolerance, but gradient interference means each set of gradients is used less effectively. In general, synchronous training yields models with better accuracy.
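
A rough sketch of the synchronous variant, assuming two GPUs, a fixed batch size of 128, and a single-layer softmax model (all of these are made up for illustration):

import tensorflow as tf

num_gpus = 2
x = tf.placeholder(tf.float32, [128, 784])
y = tf.placeholder(tf.float32, [128, 10])
w = tf.Variable(tf.zeros([784, 10]))

# Split the mini-batch across the devices and compute a gradient on each shard.
shards_x = tf.split(x, num_gpus)
shards_y = tf.split(y, num_gpus)
grads = []
for i in range(num_gpus):
    with tf.device("/gpu:%d" % i):
        logits = tf.matmul(shards_x[i], w)
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=shards_y[i], logits=logits))
        grads.append(tf.gradients(loss, [w])[0])

# Synchronous update: average the per-device gradients before applying them.
# Applying each shard's gradient as soon as it is ready would be the asynchronous variant.
avg_grad = tf.add_n(grads) / num_gpus
train_op = tf.train.GradientDescentOptimizer(0.01).apply_gradients([(avg_grad, w)])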

Model parallelism: different parts of the computation graph are placed on different devices and computed in parallel for a single model. The goal is to reduce the time of each training iteration, in contrast to data parallelism, which trains on different data at the same time.
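
A rough placement sketch, assuming two GPUs and a two-layer network (the layer sizes are arbitrary):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])

# The first layer lives on GPU 0 and the second on GPU 1,
# so a single forward pass spans both devices.
with tf.device("/gpu:0"):
    w1 = tf.Variable(tf.random_uniform([784, 256], -1, 1))
    h1 = tf.nn.relu(tf.matmul(x, w1))

with tf.device("/gpu:1"):
    w2 = tf.Variable(tf.random_uniform([256, 10], -1, 1))
    logits = tf.matmul(h1, w2)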

Pipeline parallelism: similar to asynchronous data parallelism, but the parallelism is achieved on a single hardware device. The general idea is to organize the computation as a pipeline so the device executes work continuously, improving its utilization.

References:
TensorFlow实战 (TensorFlow in Action)
https://blog.csdn.net/program_developer/article/details/78861954
