TensorFlow miscellaneous notes



This article records some pitfalls and changelog API changes encountered while using the TensorFlow API.
For example, version 0.11 adds support for Hadoop:
https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md

1. Good ways to debug
print:
tf.Print can print a set of tensors as a side effect of computing another tensor (sketch below), see: https://www.tensorflow.org/versions/master/api_docs/python/control_flow_ops.html#Print
tdb, a visual debugger for tf: https://github.com/ericjang/tdb
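
A minimal sketch of the tf.Print usage mentioned above (TF 1.x graph API): the op behaves like an identity on its first argument and prints the listed tensors to stderr whenever it is evaluated.

import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.reduce_sum(x)

# y_printed has the same value as y, but evaluating it also prints x and y
y_printed = tf.Print(y, [x, y], message="x and y: ")

with tf.Session() as sess:
    print(sess.run(y_printed))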

Version 0.12 changed how summaries are handled (the old tf.scalar_summary-style ops moved under tf.summary), see for example: http://stackoverflow.com/questions/41027247/tensorflow-0-12-0rc-tf-summary-scalar-error-using-placeholders
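
A small sketch of the 0.12-style summary API (the pre-0.12 names were tf.scalar_summary, tf.merge_all_summaries and tf.train.SummaryWriter); the log directory here is just an example:

import tensorflow as tf

loss = tf.placeholder(tf.float32)
tf.summary.scalar("loss", loss)                  # was tf.scalar_summary
merged = tf.summary.merge_all()                  # was tf.merge_all_summaries
writer = tf.summary.FileWriter("./tensorboard")  # was tf.train.SummaryWriter

with tf.Session() as sess:
    summary = sess.run(merged, feed_dict={loss: 0.5})
    writer.add_summary(summary, global_step=1)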
Open TensorBoard:
sudo tensorboard --logdir=./tensorboard/ &  # & runs it in the background so it does not block the shell
sleep 2
gnome-www-browser http://127.0.0.1:6006/


Close the TensorBoard service:
pgrep tensorboard | xargs sudo kill -9


2. Memory leaks
Every time tf.assign (or even tf.train.Saver()) is called, a new operator is added to the current graph. If this happens repeatedly inside a loop, it not only leaks memory but also slows things down (one of my training runs gradually went from 170ms to 1000ms+ per step!).
To debug this, after all operations have been added, call sess.graph.finalize() to make the entire graph read-only; any later attempt to add a node will then raise an error immediately.


Note: tf.train.Saver() also adds nodes to the graph, so it must be created before finalize().
Also note that tf.train.Saver() will only save variables that already exist when the Saver is declared!
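
A minimal sketch of the pattern, assuming the variable is updated through a feed: build the assign op and the Saver once, finalize the graph, then only run pre-built ops inside the loop.

import tensorflow as tf

v = tf.Variable(0.0)
new_val = tf.placeholder(tf.float32)

# build the assign op and the Saver ONCE, outside the training loop
assign_op = tf.assign(v, new_val)
saver = tf.train.Saver()            # must be created before finalize()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.graph.finalize()           # graph is now read-only
    for i in range(1000):
        # calling tf.assign(v, ...) here instead would add one node per iteration
        sess.run(assign_op, feed_dict={new_val: float(i)})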

3. Conditional statements
Because dropout, batch norm and similar ops behave differently in training and prediction,
tf.cond / tf.case is needed to select different execution paths.
tf.cond is equivalent to if
tf.case is equivalent to switch

For examples and existing problems, see:
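
A minimal sketch of the tf.cond pattern, using a boolean placeholder to switch dropout on in training and off in prediction (shapes and keep_prob are made up for illustration):

import tensorflow as tf

is_training = tf.placeholder(tf.bool, name="is_training")
x = tf.placeholder(tf.float32, [None, 128])

# both branches are built as graph ops, but only the selected one is executed at run time
h = tf.cond(is_training,
            lambda: tf.nn.dropout(x, keep_prob=0.5),   # training path
            lambda: x)                                  # prediction path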


4. Understand the two concepts of sparse matrices and tf.nn.embedding_lookup_sparse separately

Setting TensorFlow aside, how would you represent a sparse matrix?
Each non-zero value has a coordinate and a value, e.g. (0,0):1.0;
(0,0) then corresponds to an entry in the indices, and 1.0 corresponds to an entry in the values.
In addition, the matrix also has a size, which corresponds to the shape.

Each indices entry is (sample index, feature index).

Note that empty samples are not allowed in TensorFlow. That is, if batch_size is 2, the matrix must contain at least two entries: (0,x):1, (1,x):0 (if a sample has no features, fill in a 0 entry yourself).
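
A minimal sketch of this representation as a tf.SparseTensor (the keyword is dense_shape in TF 1.x; older releases used shape):

import tensorflow as tf

# a 2x4 sparse matrix with two non-zero entries: (0,0)=1.0 and (1,3)=2.0
sp = tf.SparseTensor(indices=[[0, 0], [1, 3]],
                     values=[1.0, 2.0],
                     dense_shape=[2, 4])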

Note: the result of tf.nn.embedding_lookup_sparse is not a single entry; it aggregates all the features of each sample through the combiner. For example, if sample 0 has two non-zero features
(0,0):1 and (0,3):2 and combiner='sum', the corresponding result is 1+2=3.
However, the combiner currently only supports "mean", "sqrtn" and "sum"; if you need max or min, consider implementing it with tf.nn.embedding_lookup + tf.segment_max (not verified).
tf.segment_max: https://www.tensorflow.org/versions/master/api_docs/python/math_ops.html#segment_max
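
A rough sketch of both variants, assuming a small made-up embedding table; the segment_max path is the unverified idea mentioned above:

import tensorflow as tf

embeddings = tf.Variable(tf.random_normal([10, 4]))   # 10 feature ids, dim 4

# two samples; sample 0 uses feature ids 0 and 3, sample 1 uses id 5
sp_ids = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                         values=tf.constant([0, 3, 5], dtype=tf.int64),
                         dense_shape=[2, 2])

# combiner='sum' adds up all embeddings belonging to the same sample
summed = tf.nn.embedding_lookup_sparse(embeddings, sp_ids, sp_weights=None,
                                       combiner='sum')

# a possible max-style combiner: look up each id, then take a per-sample max
emb = tf.nn.embedding_lookup(embeddings, sp_ids.values)
segment_ids = sp_ids.indices[:, 0]        # the sample index of each entry
max_combined = tf.segment_max(emb, segment_ids)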


5. Shared variables, multi-GPU
Shared variables are implemented through variable names; each variable has a unique name, and
names are controlled through tf.variable_scope, see:
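
A minimal sketch of sharing a variable between two call sites via tf.variable_scope and reuse (the scope name and shapes here are made up for illustration):

import tensorflow as tf

x1 = tf.placeholder(tf.float32, [None, 128])
x2 = tf.placeholder(tf.float32, [None, 128])

def dense(x):
    # tf.get_variable creates "w" the first time and reuses it when the
    # enclosing variable_scope has reuse=True
    w = tf.get_variable("w", shape=[128, 64])
    return tf.matmul(x, w)

with tf.variable_scope("layer1"):
    y1 = dense(x1)
with tf.variable_scope("layer1", reuse=True):   # shares layer1/w
    y2 = dense(x2)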


Multi-GPU (data parallelism):
tf stores the variables separately from the model: a copy of the model is built on each GPU, the corresponding parameters (variables) live in host memory, each GPU computes gradients on its half of the mini-batch (assuming two GPUs), and the gradients are then averaged and applied on the CPU.
It is awkward to write, and you have to assign devices yourself; multiple GPUs are not used automatically, see:
https://www.tensorflow.org/versions/master/tutorials/deep_cnn/index.html#training-a-model-using-multiple-gpu-cards
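
A rough sketch of the tower pattern for two GPUs; build_loss(x) is a hypothetical helper that builds the model and returns a scalar loss, and batch_x is assumed to be the input batch tensor:

import tensorflow as tf

opt = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []
splits = tf.split(batch_x, 2)                       # half the batch per GPU (batch_x assumed defined)
for i in range(2):
    with tf.device('/gpu:%d' % i):
        # reuse the same variables for every tower after the first
        with tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
            loss = build_loss(splits[i])            # hypothetical model-building helper
            tower_grads.append(opt.compute_gradients(loss))

with tf.device('/cpu:0'):
    # average each variable's gradients across towers, then apply once
    avg_grads = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        avg_grads.append((tf.add_n(grads) / len(grads), grads_and_vars[0][1]))
    train_op = opt.apply_gradients(avg_grads)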
