1. Programming Paradigm: The Data Flow Graph
Declarative programming and imperative programming are two common paradigms. Declarative programming suits application domains grounded in mathematical logic, such as deep learning, artificial intelligence, and symbolic computation systems; imperative programming suits complex business logic, such as interactive UI programs. TensorFlow adopts the declarative style, which brings strong code readability, referential transparency (TensorFlow ships many built-in functions that can be used directly for computation, so users need not program everything from scratch), and the ability to precompile and optimize programs.
TensorFlow defines a data flow graph as a directed acyclic graph that describes mathematical operations with nodes and directed edges. Deep learning problems based on gradient descent can usually be divided into two computation stages: forward-graph evaluation and backward-graph gradient computation. The forward graph is written by the user; the backward graph is generated automatically by TensorFlow's optimizer, and its main job is to use the gradients to update the corresponding model parameters.
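The forward/backward split can be made concrete with `tf.gradients`. A minimal sketch, assuming the TensorFlow 1.x graph-mode API (the first lines fall back to `tf.compat.v1` when running under TensorFlow 2.x):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

w = tf.Variable(2.0, name="w")      # model parameter
x = tf.constant(3.0, name="x")
loss = tf.square(w * x)             # forward graph, written by the user

# tf.gradients builds the backward graph automatically
grad = tf.gradients(loss, [w])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))           # d(w*x)^2/dw = 2*w*x*x = 36.0
```

An optimizer's `minimize` call does the same gradient construction internally and additionally appends the parameter-update nodes.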
Main concepts in a data flow graph:
1. Nodes: forward-graph nodes include mathematical functions or expressions (add, matmul), stored parameters (Variable), and placeholders (placeholder); backward-graph nodes include gradient values, operations that update the model parameters, and the updated model parameters.
2. Directed edges: define the relationships between operations. By role they fall into two categories: edges that transmit data, and edges that define control dependencies.
3. Execution principle: a node executes once all of its input edges are satisfied, so the graph runs in a topological order of its dependencies.
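The two kinds of directed edges can be illustrated with `tf.control_dependencies`, which adds a control edge that carries no data. A sketch assuming the TF 1.x graph-mode API (with a `tf.compat.v1` fallback for TF 2.x):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

counter = tf.Variable(0, name="counter")
inc = tf.assign_add(counter, 1)     # side-effecting operation

a = tf.constant(10)
with tf.control_dependencies([inc]):
    b = tf.identity(a)              # control edge: 'inc' must run before 'b'

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))              # 10 -- data flowed along the a -> b edge
    print(sess.run(counter))        # 1  -- 'inc' ran because of the control edge
```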
2. Data carrier: Tensor
TensorFlow provides two tensor abstractions, Tensor and SparseTensor, representing dense and sparse data respectively. TensorFlow uses reference counting to decide when a tensor's memory buffer should be released.
2.1 Dense Tensor: Tensor
1. Create
The constructor of the Tensor class takes the following parameters: dtype (the data type the tensor carries), name (the tensor's name in the data flow graph), graph (the data flow graph the tensor belongs to), op (the operation that produces the tensor), shape (the shape of the data the tensor carries), and value_index (the tensor's index among the outputs of that producing operation).
import tensorflow as tf

a = tf.constant(1.0)
b = tf.constant(2.0)
c = tf.add(a, b)
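The constructor attributes listed above can be read back from the resulting tensor. A sketch assuming the TF 1.x graph-mode API (with a `tf.compat.v1` fallback; the operation type prints as Add or AddV2 depending on the TensorFlow version):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

a = tf.constant(1.0)
b = tf.constant(2.0)
c = tf.add(a, b)

print(c.dtype)        # float32
print(c.name)         # e.g. 'Add:0' -- producing op name plus output index
print(c.op.type)      # 'Add' (or 'AddV2' on newer versions)
print(c.shape)        # () -- a scalar, i.e. a rank-0 tensor
print(c.value_index)  # 0 -- index of c among the outputs of c.op
```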
2. Evaluate
Evaluating a tensor requires a session; the following are two ways to create one:
with tf.Session() as sess:
    print(c.eval())
    print(sess.run([a, b, c]))

# or
sess = tf.InteractiveSession()
print(c.eval())
print(sess.run([a, b, c]))
3. Member methods
eval()        # evaluate the tensor's value
get_shape()   # get the tensor's shape
set_shape()   # modify the tensor's shape
consumers()   # get the tensor's downstream (consumer) operations
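The four member methods in action, as a sketch under the TF 1.x graph-mode API (with a `tf.compat.v1` fallback for TF 2.x):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=(None, 2))
y = tf.matmul(x, tf.ones((2, 2)))

print(x.get_shape())   # (?, 2) -- the first dimension is unknown
x.set_shape((4, 2))    # refine the unknown dimension in place
print(x.get_shape())   # (4, 2)
print(x.consumers())   # the downstream MatMul operation

with tf.Session() as sess:
    # eval() needs a (default) session; each output entry is the row sum 1+2
    print(y.eval(feed_dict={x: [[1.0, 2.0]] * 4}))
```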
2.2 Sparse Tensor: SparseTensor
SparseTensor represents a high-dimensional sparse matrix as key-value pairs. Sparse tensors come with their own corresponding family of operations.
# The matrix represented by this example is:
# [[0, 0, 1, 0],
#  [0, 0, 0, 2],
#  [0, 0, 0, 0]]
import tensorflow as tf

# indices: positions of the non-zero elements; values: the non-zero values;
# dense_shape: the true size of the matrix
sp = tf.SparseTensor(indices=[[0, 2], [1, 3]], values=[1, 2], dense_shape=[3, 4])
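A quick way to check such a definition is to densify it with `tf.sparse_tensor_to_dense` (TF 1.x graph-mode API assumed, with a `tf.compat.v1` fallback):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

sp = tf.SparseTensor(indices=[[0, 2], [1, 3]], values=[1, 2], dense_shape=[3, 4])
dense = tf.sparse_tensor_to_dense(sp)   # missing entries default to 0

with tf.Session() as sess:
    print(sess.run(dense))
    # [[0 0 1 0]
    #  [0 0 0 2]
    #  [0 0 0 0]]
```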
3. The Model Carrier: Operation
TensorFlow represents an algorithm model as a data flow graph, which consists of nodes and directed edges. Nodes fall into three categories: compute nodes, storage nodes, and data nodes.
3.1 Compute Node: Operation
The compute-node abstraction is the Operation class. Its properties are as follows:
name            # the operation's name in the data flow graph
type            # the operation's type name
inputs          # the operation's inputs
control_inputs  # list of control-dependency inputs
outputs         # list of output tensors
device          # the device (CPU or GPU) the operation executes on
graph           # the data flow graph the operation belongs to
traceback       # the call stack at the operation's instantiation
The following uses the add operation to show how the child nodes inside a variable's subgraph work.
c=tf.add(a,b,name='add')
The variable a consists of four child nodes: (a), Assign, read, and initial_value. When c = tf.add(a, b, name='add') is executed, add calls the read child node inside a's subgraph, which converts a into a scalar (a rank-0 tensor) and passes it to the add operation. Before the data flow computation starts, the user usually runs tf.global_variables_initializer for global initialization; in essence, this passes initial_value into the Assign child node, performing the initial assignment to the variable.
3.2 Storage Node: Variable
1. Variables
The storage-node abstraction in a TensorFlow data flow graph is the Variable class, usually just called a variable. The properties exposed by the Variable constructor are as follows:
name           # the variable's name in the data flow graph
dtype          # the variable's data type
shape          # the variable's shape
initial_value  # the variable's initial value
initializer    # the initialization operation that assigns the initial value before computation
device         # the device the variable is stored on
graph          # the data flow graph the variable belongs to
op             # the variable's operation
When constructing a variable, its shape and dtype must be determined. Variable supports two construction paths: from an initial value, and from a definition serialized with Protocol Buffers. These are implemented in the Variable class's private member methods _init_from_args and _init_from_proto, respectively.
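A small illustration of the initial-value path (shape and dtype are inferred from the initial value; TF 1.x graph-mode API assumed, with a `tf.compat.v1` fallback):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

v = tf.Variable(tf.zeros((2, 3)), name="v")   # construction from an initial value
print(v.shape)               # (2, 3) -- inferred from tf.zeros((2, 3))
print(v.dtype.base_dtype)    # float32 (TF 1.x reports a '_ref' dtype for variables)
```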
3.3 Data Node: Placeholder
A TensorFlow data flow graph performs no computation until the user feeds it data. Data nodes are implemented by the placeholder operator, whose corresponding function is tf.placeholder. For sparse data, TensorFlow also provides a sparse placeholder operator via tf.sparse_placeholder. Both functions take the following parameters:
name   # the placeholder operator's name in the data flow graph
dtype  # the data type to be fed
shape  # the shape of the data to be fed
The following shows how to feed placeholders, first with a Tensor and then with a SparseTensor.
import tensorflow as tf
import numpy as np

with tf.name_scope("PlaceholderExample"):
    x = tf.placeholder(tf.float32, shape=(2, 2), name='x')  # declare a placeholder
    y = tf.matmul(x, x, name='matmul')

with tf.Session() as sess:
    rand_array = np.random.rand(2, 2)
    # use the feed_dict (fill dictionary) parameter to fill x with
    # numpy-generated random values
    print(sess.run(y, feed_dict={x: rand_array}))
import tensorflow as tf
import numpy as np

x = tf.sparse_placeholder(tf.float32)
y = tf.sparse_reduce_sum(x)

with tf.Session() as sess:
    indices = np.array([[3, 2, 1], [4, 5, 0]], dtype=np.int64)
    values = np.array([1.0, 2.0], dtype=np.float32)
    shape = np.array([7, 9, 2], dtype=np.int64)
    # first way: feed an (indices, values, shape) tuple
    print(sess.run(y, feed_dict={x: (indices, values, shape)}))
    # second way: feed a tf.SparseTensorValue
    print(sess.run(y, feed_dict={x: tf.SparseTensorValue(indices, values, shape)}))
    # third way: evaluate a SparseTensor and feed the resulting value
    sp = tf.SparseTensor(indices=indices, values=values, dense_shape=shape)
    sp_value = sp.eval()
    print(sess.run(y, feed_dict={x: sp_value}))
4. Operating Environment: Session
A session involves three steps: create the session, run it, and close it. The Session constructor takes the following parameters:
target  # the execution engine the session connects to
graph   # the data flow graph the session loads; when the user defines multiple graphs, it must be specified explicitly
config  # the session's startup configuration
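The graph and config parameters in a sketch with two graphs (TF 1.x graph-mode API assumed, with a `tf.compat.v1` fallback):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

g1 = tf.Graph()
with g1.as_default():
    c1 = tf.constant(1.0)

g2 = tf.Graph()
with g2.as_default():
    c2 = tf.constant(2.0)

# with several graphs defined, the session must be told which one to load
with tf.Session(graph=g1) as sess:
    print(sess.run(c1))     # 1.0
    # sess.run(c2) would fail: c2 lives in a different graph

config = tf.ConfigProto(log_device_placement=False)  # the 'config' parameter
sess2 = tf.Session(graph=g2, config=config)
print(sess2.run(c2))        # 2.0
sess2.close()
```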
The Session class runs a session via its run method, whose parameters are as follows:
fetches       # the tensors or operations to evaluate
feed_dict     # the data-filling dictionary
options       # a RunOptions object that toggles optional features at session run time
run_metadata  # a RunMetadata object that collects tensor metadata output while the session runs
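All four parameters in one call, as a sketch under the TF 1.x graph-mode API (with a `tf.compat.v1` fallback). Note that feed_dict can override the value of an existing tensor, not only placeholders:

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

a = tf.constant(1.0)
b = tf.constant(2.0)
c = tf.add(a, b)

options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
metadata = tf.RunMetadata()

with tf.Session() as sess:
    result = sess.run(c, feed_dict={a: 10.0},
                      options=options, run_metadata=metadata)
    print(result)                              # 12.0 -- 'a' overridden by the feed
    print(len(metadata.step_stats.dev_stats))  # per-device run statistics collected
```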
Besides Session.run, a tensor can also be evaluated with its eval method.
Close the session with sess.close().
Interactive session: sess = tf.InteractiveSession()
5. Training Tool: Optimizer
Machine learning is roughly divided into three categories: supervised learning, unsupervised learning, and semi-supervised learning. A typical supervised learning setup consists of three parts: the model, the loss function, and the optimization method. Mainstream supervised learning models are mostly trained with gradient descent-based optimization algorithms.
5.1 Loss function
Common loss functions include the squared loss, the cross-entropy loss, and the exponential loss. The loss function is non-negative, and the smaller its value, the better the fit. However, pursuing the minimum of the loss function too aggressively can lead to overfitting, so a regularization term (penalty term) is introduced. The optimization objective then becomes the loss plus the penalty, i.e. minimizing L(w) + λR(w), where λ controls the strength of the regularization.
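A sketch of a regularized objective with the squared loss and an L2 penalty (TF 1.x graph-mode API assumed, with a `tf.compat.v1` fallback; the data, the single weight w, and λ = 0.01 are illustrative choices, not from the original text):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

x = tf.constant([[1.0], [2.0], [3.0]])      # toy inputs
y = tf.constant([[2.0], [4.0], [6.0]])      # toy targets (y = 2x)
w = tf.Variable([[0.5]], name="w")
pred = tf.matmul(x, w)

lam = 0.01                                  # regularization strength (lambda)
squared_loss = tf.reduce_mean(tf.square(pred - y))
objective = squared_loss + lam * tf.nn.l2_loss(w)   # L(w) + lambda * R(w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([squared_loss, objective]))
```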
5.2 Overview of the Optimizer
//In fact, I don't really understand the back propagation algorithm here.
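As a concrete starting point, the backpropagation that the optimizer hides can be exercised with a minimal gradient-descent step; a sketch with `tf.train.GradientDescentOptimizer` (TF 1.x graph-mode API assumed, with a `tf.compat.v1` fallback):

```python
# TensorFlow 1.x graph-mode API; fall back to tf.compat.v1 under TF 2.x
import tensorflow as tf
if not hasattr(tf, "Session"):
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

w = tf.Variable(5.0, name="w")
loss = tf.square(w - 3.0)            # minimum at w = 3

# minimize() builds the backward graph (gradients) and the update operations
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(50):
        sess.run(train_op)           # w <- w - 0.1 * dloss/dw
    print(sess.run(w))               # close to 3.0
```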