TensorFlow for deep learning (2): TensorFlow basics

1. TensorFlow system architecture:

  TensorFlow is organized into a device layer, a network layer, a data operation layer, a graph computation layer, an API layer, and an application layer. The device layer, network layer, data operation layer, and graph computation layer form the core of TensorFlow.

 

2. TensorFlow design concept:

 (1) The definition of a graph is completely separated from its execution. TensorFlow is fully symbolic programming.

    In symbolic computing, the variables are defined first, then a data flow graph is built that specifies how the variables are related by computations, and finally the graph is compiled. At this point the data flow graph is still an empty shell containing no actual data; only when the required inputs are fed in does data flow through the model and produce output values.

For example, an operation can be defined without actually being run, as in the sketch below.
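A minimal sketch (assuming the TensorFlow 1.x API): the addition below only adds a node to the graph, and no value is computed until the graph is run in a session.

import tensorflow as tf

a = tf.constant(3.0)
b = tf.constant(4.0)
c = a + b        # only adds an "add" node to the graph
print(c)         # prints a Tensor description, not 7.0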

 (2) The operations involved in TensorFlow must be placed in a graph, and a graph can only be run inside a session. After a session is opened, nodes can be fed with data and computations performed; once the session is closed, no computation can take place. The session provides the environment in which operations execute and Tensors are evaluated.

A simple example:
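A minimal sketch of opening a session and evaluating a node (TensorFlow 1.x API assumed):

import tensorflow as tf

greeting = tf.constant("Hello, TensorFlow")   # a node in the default graph
with tf.Session() as sess:                    # the session provides the runtime environment
    print(sess.run(greeting))                 # the computation only happens here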

 

3. TensorFlow concepts:

 (1) Edges: TensorFlow's edges carry two kinds of relationships: data dependencies (drawn as solid lines) and control dependencies (drawn as dashed lines). Solid edges represent data dependencies, i.e. they carry data: tensors. Data of any dimensionality is collectively referred to as a tensor. Dashed edges are control dependencies and can be used to control the order of operations; no data flows along such an edge, but the source node must finish executing before the destination node starts.

 (2) Node: a node represents an operation, generally a mathematical operation applied to tensors.

 (3) Graph: the computation is described as a directed acyclic graph. Operations such as tf.constant() add nodes to the (default) graph:

a = tf.constant([1.0,2.0])

 (4) Session: the first step in launching a graph is to create a Session object. The session provides the methods for executing operations in the graph. Create the object with tf.Session() and call the Session object's run() method to execute the graph:

matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.], [2.]])
product = tf.matmul(matrix1, matrix2)   # the op to run in the session
with tf.Session() as sess:
    result = sess.run([product])
    print(result)

 (5) Device: a device is a piece of hardware that can perform computation and has its own address space, such as a CPU or GPU. Operations can be pinned to a device with tf.device().
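A minimal sketch of pinning operations to a device (the device string "/cpu:0" is an illustrative choice):

with tf.device("/cpu:0"):          # run the ops defined in this block on the first CPU
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    c = a * b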

 (6) Variable: a variable is a special kind of data that occupies a fixed position in the graph and does not flow like an ordinary tensor. Variables are created with the tf.Variable() constructor, which requires an initial value; the shape and type of that initial value determine the shape and type of the variable.

# Create a variable, initialized to the scalar 0
state = tf.Variable(0, name="counter")
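Variables must be explicitly initialized before they are used in a session. Continuing from the counter defined above, a minimal sketch (TensorFlow 1.x API assumed):

one = tf.constant(1)
update = tf.assign(state, state + one)        # op that increments the variable

init = tf.global_variables_initializer()      # variables must be initialized before use
with tf.Session() as sess:
    sess.run(init)
    print(sess.run(state))                    # 0
    sess.run(update)
    print(sess.run(state))                    # 1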

 (7) Kernel: A kernel is an implementation of an operation that can run on a specific device (such as CPU, GPU).

 

4. TensorFlow batch normalization:

 Batch normalization (BN) was introduced to overcome the training difficulties caused by ever deeper neural networks.

 Method: batch normalization is generally applied before the nonlinear mapping (activation function), normalizing x = Wu + b so that each dimension of the output has mean 0 and variance 1.

 Usage: when the network converges slowly, or gradients explode and training fails, batch normalization can be tried as a remedy.
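A minimal sketch of normalizing the pre-activation x = Wu + b with the TF 1.x primitives tf.nn.moments and tf.nn.batch_normalization (the shapes, the beta/gamma variables, and the epsilon value below are illustrative assumptions):

import tensorflow as tf

u = tf.placeholder(tf.float32, [None, 100])    # input batch (illustrative shape)
W = tf.Variable(tf.random_normal([100, 50]))
b = tf.Variable(tf.zeros([50]))
x = tf.matmul(u, W) + b                        # pre-activation x = Wu + b
mean, variance = tf.nn.moments(x, axes=[0])    # per-dimension batch statistics
beta = tf.Variable(tf.zeros([50]))             # learnable shift
gamma = tf.Variable(tf.ones([50]))             # learnable scale
x_bn = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-3)
y = tf.nn.relu(x_bn)                           # nonlinear mapping applied after BN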

 

5. Neuron function:

 (1) Activation function: when the activation function runs, part of the neurons in the network are activated, and the activation information is passed on to the next layer of the network. Several commonly used activation functions are introduced below.

  a.sigmoid function. sigmoid maps a real value to the (0, 1) interval, which can be used for binary classification.

The method of use is as follows:

a = tf.constant([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
sess = tf.Session()
print(sess.run(tf.sigmoid(a)))

 

  b.softmax function. softmax maps a k-dimensional vector of real values (a1, a2, a3, a4, ...) to a vector (b1, b2, b3, b4, ...) in which each bi lies in (0, 1) and all bi sum to 1. The bi can then be used for multi-class classification, for example by taking the dimension with the largest value.

Function expression: softmax(a)_i = exp(a_i) / Σ_j exp(a_j)

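The usage is analogous to sigmoid; a minimal sketch (TensorFlow 1.x API assumed):

a = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(a)))   # values in (0, 1) that sum to 1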

  c.relu function. The relu function alleviates the slow convergence and vanishing-gradient problems caused by the sigmoid function.

The method of use is as follows:

a = tf.constant([-1.0, 2.0])
with tf.Session() as sess:
    b = tf.nn.relu(a)
    print(sess.run(b))

  d.dropout function. Each neuron is kept with probability keep_prob: if it is suppressed, its output is set to 0; if it is kept, its output is scaled up to 1/keep_prob times the original value. (Dropout can mitigate overfitting.)

The method of use is as follows:

a = tf.constant([[-1.0, 2.0, 3.0, 4.0]])
with tf.Session() as sess:
    b = tf.nn.dropout(a, 0.5, noise_shape=[1, 4])
    print(sess.run(b))

 

 (2) Convolution function: the convolution function is an important building block of neural networks; it can be thought of as a two-dimensional filter scanned over a batch of images. Several convolution functions are briefly introduced below.

  a.tf.nn.convolution(input, filter, padding, strides=None, dilation_rate=None, name=None, data_format=None) This function computes the sum of N-dimensional convolutions.

  b.tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None) This function takes a four-dimensional input tensor input and a four-dimensional convolution kernel filter, performs a two-dimensional convolution over the input, and returns the convolved result. A minimal usage sketch follows this list.

  In addition, there are methods such as tf.nn.depthwise_conv2d(), tf.nn.separable_conv2d(), etc., which will not be explained here.
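A minimal tf.nn.conv2d sketch (the shapes below are illustrative assumptions): the input is laid out as NHWC [batch, height, width, channels] and the filter as [filter_height, filter_width, in_channels, out_channels].

images = tf.random_normal([1, 28, 28, 3])    # one 28x28 RGB image (illustrative)
kernel = tf.random_normal([5, 5, 3, 16])     # 5x5 kernel, 3 input channels, 16 output channels
conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding="SAME")
with tf.Session() as sess:
    print(sess.run(tf.shape(conv)))          # [ 1 28 28 16]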

 

 (3) Pooling function: in a neural network, the pooling layer generally comes immediately after the convolution layer. Pooling scans a matrix window over the tensor and reduces the number of elements by taking the maximum or the average of the values in each window. The window size of each pooling operation is specified by ksize, and the step between windows by strides. A minimal sketch follows the two functions below.

  a.tf.nn.avg_pool(value, ksize, strides, padding, data_format='NHWC', name=None). Calculates the average of the elements in the pooled area.

  b.tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None). Computes the maximum value of elements in the pooled region.
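A minimal max-pooling sketch with a 2x2 window and stride 2 (the feature-map shape is an illustrative assumption):

feature_map = tf.random_normal([1, 28, 28, 16])   # e.g. the output of a conv layer
pool = tf.nn.max_pool(feature_map, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
with tf.Session() as sess:
    print(sess.run(tf.shape(pool)))               # [ 1 14 14 16]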

 

6. Model storage and loading

 (1) Model saving: create a tf.train.Saver() to save the variables; a checkpoint is written by calling save() on the Saver object and specifying where to save it. By convention the model file uses the .ckpt extension.

saver = tf.train.Saver()
saver.save(sess, ckpt_dir + "/model.ckpt", global_step=global_step)

 (2) Model loading: the model can be restored with saver.restore().

with tf.Session() as sess:
    tf.initialize_all_variables().run()

    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(ckpt_dir)
    if ckpt and ckpt.model_checkpoint_path:
        print(ckpt.model_checkpoint_path)
        saver.restore(sess, ckpt.model_checkpoint_path)  # load all saved parameters

 

 

PS: A picture to understand fitting, overfitting, and underfitting (the drawing is a bit rough).
