"In-depth Understanding of Tensorflow Architecture Design and Implementation Principles" Chapter 3 Reading Notes

1. Programming Paradigm: Dataflow Graph

Declarative programming and imperative programming are two common paradigms. Declarative programming suits application domains grounded in mathematical logic, such as deep learning, artificial intelligence, and symbolic computation systems; imperative programming suits domains with complex business logic, such as interactive UI programs. Tensorflow adopts the declarative style, which offers highly readable code, supports referential transparency (Tensorflow ships many built-in functions that can be used directly, so users need not program everything from scratch), and enables precompilation and optimization.

Tensorflow defines a dataflow graph as a directed acyclic graph that describes mathematical computation with nodes and directed edges. Deep learning problems based on gradient descent can usually be divided into two computation phases: forward-graph evaluation and backward-graph gradient computation. The forward graph is written by the user; the backward graph is generated automatically by the Tensorflow optimizer, and its main job is to use the gradients to update the corresponding model parameters.
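A tiny sketch of this division of labor; tf.gradients is the primitive that optimizers use internally to build the backward graph (a minimal sketch, assuming the 1.x API):

import tensorflow as tf
x = tf.Variable(3.0, name='x')
y = x * x                      # forward graph: written by the user
grads = tf.gradients(y, [x])   # backward graph: generated automatically from y
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))     # [6.0], since dy/dx = 2x and x = 3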

Main concepts in a dataflow graph:

1. Nodes: forward-graph nodes include mathematical operations (add, matmul), parameter variables used for storage (Variable), and placeholders (placeholder); backward-graph nodes include the gradient values, the operations that update the model parameters, and the updated model parameters.

2. Directed edges: used to define relationships between operations. They fall into two categories by role: one kind transmits data, and the other defines control dependencies (see the sketch after this list).

3. Execution principle: nodes are executed in topological order; a node becomes ready once all of its data inputs and control dependencies are satisfied, so independent nodes can run concurrently.
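A minimal sketch of a control-dependency edge, as referenced in item 2 above (tf.control_dependencies adds the control edge explicitly):

import tensorflow as tf
x = tf.Variable(0.0)
assign_op = tf.assign(x, 1.0)
with tf.control_dependencies([assign_op]):
    y = tf.identity(x)   # control edge: assign_op must run before x is read here
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y))   # 1.0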

2. Data Carrier: Tensor

Tensorflow provides two tensor abstractions, Tensor and SparseTensor, representing dense and sparse data respectively. Tensorflow uses reference counting to decide when the memory buffer backing a tensor's data should be released.

2.1 Dense Tensor: Tensor

1. Creation

The parameters of the Tensor class constructor are: dtype (the data type the tensor carries), name (the tensor's name in the dataflow graph), graph (the dataflow graph the tensor belongs to), op (the operation that produces this tensor), shape (the shape of the data the tensor carries), and value_index (the tensor's index among all output values of that producing operation).

import tensorflow as tf
a = tf.constant(1.0)
b = tf.constant(2.0)
c = tf.add(a, b)  # c is a Tensor; no computation happens until a session runs it

2. Evaluation

Evaluation requires a session. The following are two ways to create one:

with tf.Session() as sess:
    print(c.eval())
    print(sess.run([a, b, c]))
# or
sess = tf.InteractiveSession()
print(c.eval())
print(sess.run([a, b, c]))
sess.close()

3. Member methods

eval()        # evaluate the tensor and return its value
get_shape()   # get the tensor's shape
set_shape()   # update the tensor's shape
consumers()   # get the downstream operations that consume the tensor
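A short usage sketch of these member methods:

import tensorflow as tf
a = tf.constant([[1.0, 2.0]])
b = tf.matmul(a, tf.transpose(a))
print(a.get_shape())   # (1, 2)
print(a.consumers())   # the transpose and MatMul operations that take a as input
with tf.Session() as sess:
    print(b.eval())    # [[5.]] -- eval() runs the tensor in the default session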

2.2 Sparse Tensor: SparseTensor

SparseTensor represents a high-dimensional sparse matrix as key-value pairs. Sparse tensors also come with a corresponding family of operations.

# The matrix represented by this example is [[0,0,1,0],[0,0,0,2],[0,0,0,0]]
import tensorflow as tf
sp = tf.SparseTensor(indices=[[0, 2], [1, 3]], values=[1, 2], dense_shape=[3, 4])
# indices gives the positions of the non-zero elements; values gives their values;
# dense_shape gives the true size of the matrix.
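One example of such an operation is tf.sparse_tensor_to_dense, which materializes the dense matrix and confirms the layout above (a small sketch):

import tensorflow as tf
sp = tf.SparseTensor(indices=[[0, 2], [1, 3]], values=[1, 2], dense_shape=[3, 4])
dense = tf.sparse_tensor_to_dense(sp)
with tf.Session() as sess:
    print(sess.run(dense))
    # [[0 0 1 0]
    #  [0 0 0 2]
    #  [0 0 0 0]]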

3. Model Carrier: Operation

Tensorflow's algorithm models are represented by the dataflow graph. A dataflow graph consists of nodes and directed edges, and nodes fall into three categories: compute nodes, storage nodes, and data nodes.

3.1 Compute Node: Operation

The abstraction for a compute node is the Operation class. The properties of a compute node are as follows:

name            # name of the operation in the dataflow graph
type            # type name of the operation
inputs          # list of input tensors
control_inputs  # list of operations this one has control dependencies on
outputs         # list of output tensors
device          # device (CPU or GPU) on which the operation executes
graph           # the dataflow graph the operation belongs to
traceback       # call stack captured when the operation was instantiated
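A quick sketch that inspects these properties on a concrete operation:

import tensorflow as tf
a = tf.constant(1.0)
b = tf.constant(2.0)
c = tf.add(a, b, name='add')
op = c.op                    # the Operation that produced tensor c
print(op.name)               # 'add'
print(op.type)               # 'Add'
print(list(op.inputs))       # the two input tensors a and b
print(op.outputs)            # [c]
print(op.graph is tf.get_default_graph())   # True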

The following takes the add operation as an example to illustrate how the child nodes inside a variable work.

c=tf.add(a,b,name='add')

A variable a consists of four child nodes: the variable node itself (a), Assign, read, and initial_value. When c = tf.add(a, b, name='add') is executed, add invokes the read child node inside a's subgraph, which outputs the variable's current value as a tensor and passes it to the add operation. Before the dataflow computation starts, the user typically calls tf.global_variables_initializer for global initialization; in essence it feeds initial_value into the Assign child node, performing the initial assignment to the variable.
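A small sketch of this mechanism end to end (here a is declared as a tf.Variable so it actually owns the four child nodes):

import tensorflow as tf
a = tf.Variable(1.0, name='a')   # creates the (a), Assign, read, initial_value subgraph
b = tf.constant(2.0)
c = tf.add(a, b, name='add')     # add consumes the output of a's read child node
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # drives initial_value into Assign
    print(sess.run(c))           # 3.0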

3.2 Storage Node: Variable

1. Variables

The storage-node abstraction in a Tensorflow dataflow graph is the Variable class, commonly just called a variable. The properties exposed by the Variable class are as follows:

name           # name of the variable in the dataflow graph
dtype          # data type of the variable
shape          # shape of the variable
initial_value  # initial value of the variable
initializer    # initialization operation that assigns the initial value before computation
device         # device on which the variable is stored
graph          # the dataflow graph the variable belongs to
op             # the operation of the variable

When constructing a variable, its shape and dtype must be determined. Variable supports two construction paths: from an initial value, or from a definition serialized with Protocol Buffers. The two paths are implemented in the Variable class's private member methods _init_from_args and _init_from_proto, respectively.
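A sketch of the first construction path; the comment about the Protocol Buffers path is an assumption based on how meta-graphs restore variables:

import tensorflow as tf
# First path: construct from an initial value; dtype and shape are inferred from it
w = tf.Variable(tf.zeros([784, 10]), name='w')
print(w.dtype)        # the variable's data type
print(w.get_shape())  # (784, 10)
print(w.initializer)  # the Assign operation that writes the initial value
# The second path (_init_from_proto) runs internally when a graph containing
# variables is imported, e.g. when restoring a meta-graph; it is rarely called directly.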

3.3 Data Node: Placeholder

A Tensorflow dataflow graph performs no computation until the user fills it with data. Data nodes are implemented by the placeholder operator, whose corresponding function is tf.placeholder. For sparse data, Tensorflow also provides a sparse placeholder operator via tf.sparse_placeholder. The parameters of these two functions are as follows:

name   # name of the placeholder operator in the dataflow graph
dtype  # data type of the data to be filled in
shape  # shape of the data to be filled in

The following uses a Tensor and a SparseTensor to illustrate how placeholders are filled.

import tensorflow as tf
import numpy as np

with tf.name_scope("PlaceholderExample"):
    x = tf.placeholder(tf.float32, shape=(2, 2), name='x')  # declare a placeholder
    y = tf.matmul(x, x, name='matmul')
    with tf.Session() as sess:
        rand_array = np.random.rand(2, 2)
        # Use the feed_dict (fill dictionary) parameter to fill x with random
        # numbers generated by np.random.rand
        print(sess.run(y, feed_dict={x: rand_array}))
import tensorflow as tf
import numpy as np

x = tf.sparse_placeholder(tf.float32)
y = tf.sparse_reduce_sum(x)
with tf.Session() as sess:
    indices = np.array([[3, 2, 1], [4, 5, 0]], dtype=np.int64)
    values = np.array([1.0, 2.0], dtype=np.float32)
    shape = np.array([7, 9, 2], dtype=np.int64)
    # First way: feed an (indices, values, shape) tuple
    print(sess.run(y, feed_dict={x: (indices, values, shape)}))
    # Second way: feed a tf.SparseTensorValue
    print(sess.run(y, feed_dict={x: tf.SparseTensorValue(indices, values, shape)}))
    # Third way: evaluate a SparseTensor to a SparseTensorValue, then feed it
    sp = tf.SparseTensor(indices=indices, values=values, dense_shape=shape)
    sp_value = sp.eval()
    print(sess.run(y, feed_dict={x: sp_value}))

4. Runtime Environment: Session

A session involves three steps: creating the session, running it, and closing it. The parameters of the Session constructor are as follows:

target  # execution engine the session connects to
graph   # the dataflow graph the session loads; when the user defines multiple graphs, it must be specified explicitly
config  # configuration options for starting the session
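A small sketch passing graph and config explicitly (the config value shown is just one illustrative option):

import tensorflow as tf
g = tf.Graph()
with g.as_default():
    c = tf.constant(42.0)
config = tf.ConfigProto(log_device_placement=True)   # log which device each op runs on
with tf.Session(graph=g, config=config) as sess:     # graph must be passed explicitly here
    print(sess.run(c))   # 42.0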

The Session class runs a session with its run method, whose parameters are as follows:

fetches       # tensors or operations to be evaluated
feed_dict     # data fill dictionary
options       # a RunOptions object that toggles optional features for this run
run_metadata  # a RunMetadata object used to collect tensor metadata output while the session runs
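A sketch of the last two parameters; full tracing is one of the optional features RunOptions can switch on:

import tensorflow as tf
c = tf.add(tf.constant(1.0), tf.constant(2.0))
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    sess.run(c, options=run_options, run_metadata=run_metadata)
    print(run_metadata.step_stats)   # per-node execution statistics gathered by the run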

Besides Session.run, a tensor can also be evaluated with its eval method.

Close a session with sess.close().

Interactive session: sess = tf.InteractiveSession()

5. Training Tool: Optimizer

Machine learning is roughly divided into three categories: supervised learning, unsupervised learning, and semi-supervised learning. A typical supervised learning setup has three parts: the model, the loss function, and the optimization method. Mainstream supervised learning models are mostly trained with gradient-descent-based optimization algorithms.

5.1 Loss function

Common loss functions include the square loss, the cross-entropy loss, and the exponential loss. A loss function is non-negative, and the smaller its value, the better the fit. But chasing the minimum of the loss too aggressively can lead to overfitting, so a regularization (penalty) term is introduced. The optimization objective then becomes

$$\min_{f} \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, f(x_i)\big) + \lambda J(f)$$

where L is the loss function, J(f) is the regularization term, and λ ≥ 0 trades off the two.

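A sketch of the square loss, the cross-entropy loss, and a regularized objective in Tensorflow (all names and the weight lam are illustrative):

import tensorflow as tf
y_true = tf.placeholder(tf.float32, [None, 10])
logits = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([10, 10]))                   # hypothetical parameters to penalize
square_loss = tf.reduce_mean(tf.square(y_true - logits))
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
lam = 0.01                                            # illustrative regularization weight
objective = cross_entropy + lam * tf.nn.l2_loss(w)    # loss term + penalty term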
5.2 Overview of the Optimizer

// Honestly, I don't fully understand the backpropagation algorithm here yet.
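Even so, the basic optimizer API is mechanical to use. A minimal sketch, assuming a scalar tensor named loss already exists in the graph:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss)   # builds the backward graph and the parameter updates
# minimize() is shorthand for two finer-grained steps:
# grads_and_vars = optimizer.compute_gradients(loss)
# train_op = optimizer.apply_gradients(grads_and_vars)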
