Study Notes CB013: TensorFlow, TensorBoard, seq2seq

TensorFlow is a deep learning framework built on a graph structure; internally, the graph interacts with the computing kernels through a Session.

Basic TensorFlow math operations:

import tensorflow as tf
sess = tf.Session()
a = tf.placeholder("float")
b = tf.placeholder("float")
c = tf.constant(6.0)
d = tf.mul(a, b)
y = tf.mul(d, c)
print sess.run(y, feed_dict={a: 3, b: 3})
A = [[1.1,2.3],[3.4,4.1]]
Y = tf.matrix_inverse(A)
print sess.run(Y)
sess.close()

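Main arithmetic operations (these are the pre-1.0 names; TensorFlow 1.0 renamed tf.sub, tf.mul, tf.neg and tf.inv to tf.subtract, tf.multiply, tf.negative and tf.reciprocal):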

tf.add
tf.sub
tf.mul
tf.div
tf.mod
tf.abs
tf.neg
tf.sign
tf.inv
tf.square
tf.round
tf.sqrt
tf.pow
tf.exp
tf.log
tf.maximum
tf.minimum
tf.cos
tf.sin

Main matrix operations:

tf.diag  # generate a diagonal matrix
tf.transpose  # transpose a matrix
tf.matmul  # matrix multiplication
tf.matrix_determinant  # calculate the value of the determinant
tf.matrix_inverse  # calculate the inverse of the matrix
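You can sanity-check what these sessions compute without a TensorFlow installation by using NumPy. This sketch is a cross-check and not TensorFlow itself. It reproduces the first example, y = a * b * 6 with a = b = 3, and applies the NumPy counterparts of the matrix operations listed above (np.diag, np.transpose, np.matmul, np.linalg.det, np.linalg.inv) to the same matrix A:

```python
import numpy as np

# Same values as the first session example: y = (a * b) * c with a = b = 3, c = 6
a, b, c = 3.0, 3.0, 6.0
y = (a * b) * c                      # what sess.run(y, feed_dict={a: 3, b: 3}) evaluates

A = np.array([[1.1, 2.3], [3.4, 4.1]])
diag = np.diag([1.0, 2.0])           # like tf.diag: build a diagonal matrix
At = np.transpose(A)                 # like tf.transpose
det = np.linalg.det(A)               # like tf.matrix_determinant
A_inv = np.linalg.inv(A)             # like tf.matrix_inverse
identity = np.matmul(A, A_inv)       # like tf.matmul; A times its inverse should be I

print("y =", y)
print("det(A) =", det)
print("A^-1 =\n", A_inv)
print("A @ A^-1 =\n", identity.round(6))
```

Here det(A) = 1.1 * 4.1 - 2.3 * 3.4 = -3.31. Because it is non-zero, the inverse exists.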

Using TensorBoard. TensorFlow code first builds the graph and then executes it, which makes the intermediate steps awkward to debug, so TensorFlow provides the TensorBoard tool for debugging. During training, a message reports that event files are being written to a directory (here /tmp/tflearn_logs/11U8M4/). Run the following command, then open http://192.168.1.101:6006 to see the TensorBoard interface.

tensorboard --logdir=/tmp/tflearn_logs/11U8M4/

Graph and Session.

import tensorflow as tf
with tf.Graph().as_default() as g:
    with g.name_scope("myscope") as scope:  # inside this scope every op name gets the prefix myscope/, e.g. myscope/Placeholder
        sess = tf.Session(target='', graph=g, config=None)  # target: the execution engine to connect to ('' = in-process)
        print "graph version:", g.version  # 0
        a = tf.placeholder("float")
        print a.op  # full operation info, the same as the corresponding entry returned by g.get_operations() below
        print "graph version:", g.version  # 1
        b = tf.placeholder("float")
        print "graph version:", g.version  # 2
        c = tf.placeholder("float")
        print "graph version:", g.version  # 3
        y1 = tf.mul(a, b)  # can also be written as a * b
        print "graph version:", g.version  # 4
        y2 = tf.mul(y1, c)  # can also be written as y1 * c
        print "graph version:", g.version  # 5
        operations = g.get_operations()

        for (i, op) in enumerate(operations):
            print "============ operation", i + 1, "==========="
            print op  # a structure with name, op, attr, input, etc.; the fields differ between ops
        assert y1.graph is g
        assert sess.graph is g
        print "================ graph object address ================"
        print sess.graph
        print "================ graph define ================"
        print sess.graph_def
        print "================ sess str ================"
        print sess.sess_str
        print sess.run(y1, feed_dict={a: 3, b: 3})  # 9.0; feed_dict maps graph elements to values
        print sess.run(fetches=[b, y1], feed_dict={a: 3, b: 3}, options=None, run_metadata=None)  # the result has the same structure as fetches (a list here)
        print sess.run({'ret_name': y1}, feed_dict={a: 3, b: 3})  # {'ret_name': 9.0}; the result again mirrors the structure of fetches

        assert tf.get_default_session() is not sess
        with sess.as_default():  # inside this block tf.get_default_session() returns sess; outside it does not
            assert tf.get_default_session() is sess

        h = sess.partial_run_setup([y1, y2], [a, b, c])  # run in stages; arguments are the list of fetches and the list of feeds
        res = sess.partial_run(h, y1, feed_dict={a: 3, b: 4})  # 12.0, first stage
        res = sess.partial_run(h, y2, feed_dict={c: res})  # 144.0, the second stage uses the result of the first stage
        print "partial_run res:", res
        sess.close()

A TensorFlow Session is the intermediary between the Graph and the executor. Session.run() serializes the graph, fetches and feed_dict into byte arrays and calls tf_session.TF_Run (see /usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py). tf_session.TF_Run calls the _pywrap_tensorflow.TF_Run interface, which is implemented by the dynamic library _pywrap_tensorflow.so (see /usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py). This dynamic library is TensorFlow's Python-language binding. Both _pywrap_tensorflow.so and pywrap_tensorflow.py are generated automatically by SWIG: the TensorFlow core is written in C++ (with a C API), and SWIG generates the interfaces for the various scripting languages.
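The split between building a graph and running it, with the Session mediating, can be illustrated with a toy graph in pure Python. This is a hypothetical sketch of the idea and not how TensorFlow is implemented; the classes Placeholder, Mul, ToyGraph and ToySession are made up for illustration. Building the graph only records nodes. Nothing is computed until run() is called with a feed_dict. As with Session.run, the result has the same structure as the fetches passed in: a single value, a list, or a dict.

```python
class Node:
    """A graph node; creating it records the op but computes nothing."""
    def __init__(self, graph, name):
        self.name = name
        graph.ops.append(self)

class Placeholder(Node):
    def evaluate(self, feed, cache):
        # Only reached when the placeholder was not fed
        raise KeyError("placeholder '%s' must be fed" % self.name)

class Mul(Node):
    def __init__(self, graph, name, x, y):
        super().__init__(graph, name)
        self.x, self.y = x, y

    def evaluate(self, feed, cache):
        return run_node(self.x, feed, cache) * run_node(self.y, feed, cache)

def run_node(node, feed, cache):
    if node in feed:                  # fed values take precedence, as with feed_dict
        return feed[node]
    if node not in cache:             # compute each node at most once per run()
        cache[node] = node.evaluate(feed, cache)
    return cache[node]

class ToyGraph:
    def __init__(self):
        self.ops = []                 # every op added to the graph, in order

class ToySession:
    def __init__(self, graph):
        self.graph = graph

    def run(self, fetches, feed_dict=None):
        """Evaluate fetches; the result mirrors their structure (single, list or dict)."""
        feed, cache = dict(feed_dict or {}), {}
        if isinstance(fetches, dict):
            return {k: run_node(v, feed, cache) for k, v in fetches.items()}
        if isinstance(fetches, (list, tuple)):
            return [run_node(f, feed, cache) for f in fetches]
        return run_node(fetches, feed, cache)

g = ToyGraph()
a = Placeholder(g, "a")
b = Placeholder(g, "b")
y1 = Mul(g, "mul", a, b)              # nothing is computed here
sess = ToySession(g)
print(len(g.ops))                                          # 3
print(sess.run(y1, feed_dict={a: 3, b: 3}))                # 9
print(sess.run([b, y1], feed_dict={a: 3, b: 3}))           # [3, 9]
print(sess.run({'ret_name': y1}, feed_dict={a: 3, b: 3}))  # {'ret_name': 9}
```

In real TensorFlow the run step also crosses the language boundary described above. The graph and the feeds are serialized and handed to the C++ runtime, which does the actual computation.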

Linear regression in 10 key lines of code. Solving a linear regression problem with gradient descent is the simplest introductory TensorFlow example.

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf
# Randomly generate 1000 points scattered around the line y = 0.1x + 0.3
num_points = 1000
vectors_set = []
for i in xrange(num_points):
    x1 = np.random.normal(0.0, 0.55)
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])
# Split the samples into x and y
x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]
# A 1-dimensional W initialized to a random number in [-1, 1]
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0), name='W')
# A 1-dimensional b initialized to 0
b = tf.Variable(tf.zeros([1]), name='b')
# Compute the estimate y
y = W * x_data + b
# Loss: mean squared error between the estimate y and the actual y_data
loss = tf.reduce_mean(tf.square(y - y_data), name='loss')
# Optimize the parameters with gradient descent (learning rate 0.5)
optimizer = tf.train.GradientDescentOptimizer(0.5)
# Training means minimizing this loss
train = optimizer.minimize(loss, name='train')
sess = tf.Session()
# Print the graph structure
#print sess.graph_def
init = tf.initialize_all_variables()
sess.run(init)
# Initial values of W and b
print "W =", sess.run(W), "b =", sess.run(b), "loss =", sess.run(loss)
# Run 20 training steps
for step in xrange(20):
    sess.run(train)
    # Print W and b after each step
    print "W =", sess.run(W), "b =", sess.run(b), "loss =", sess.run(loss)
# Write the graph to an event file for TensorBoard
writer = tf.train.SummaryWriter("./tmp", sess.graph)
writer.close()  # flush the event file to disk

Visualizing how the linear regression works. Running the code creates a local tmp directory that holds the data TensorBoard reads. Then run:

tensorboard --logdir=./tmp/

Open http://localhost:6006/, switch to the GRAPHS tab, and expand the key nodes. The graph is the structure generated by the code. It describes the whole process of solving the linear regression problem by gradient descent, and each node represents one step of the code.

A detailed look at the linear regression graph, starting with W and b. The code performs three operations on W: Assign, read, and train. Assign initializes W from random_uniform:

W = tf.Variable(tf.random_uniform([1], -1.0, 1.0), name='W')

In the graph, Assign is fed by the tf.random_uniform subgraph. read corresponds to:

y = W * x_data + b

train corresponds to the operations of the gradient descent training process.

b likewise has three operations: Assign, read, and train. Its Assign uses zeros as the initial value.
Gradient descent computes update_W and update_b, which update the values of W and b. Each one is computed from three inputs: the learning rate learning_rate, the current value of W (or b), and the gradients.
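Each of update_W and update_b combines the same three inputs: the learning rate, the current value, and the gradient. Below is a minimal NumPy sketch of the same 20-step training loop that shows each update as one line. Here the gradients are written out by hand instead of being derived by TensorFlow. The helper gradient_descent_step and the fixed random seed are illustrative choices and not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is reproducible
num_points = 1000
x_data = rng.normal(0.0, 0.55, num_points)
y_data = x_data * 0.1 + 0.3 + rng.normal(0.0, 0.03, num_points)

def gradient_descent_step(W, b, learning_rate):
    """One update: new value = current value - learning_rate * gradient."""
    y = W * x_data + b
    residual = y - y_data                      # the y - y_data node
    grad_W = np.mean(2 * residual * x_data)    # d loss / dW, derived by hand
    grad_b = np.mean(2 * residual)             # d loss / db, derived by hand
    return W - learning_rate * grad_W, b - learning_rate * grad_b

W = rng.uniform(-1.0, 1.0)            # like tf.random_uniform([1], -1.0, 1.0)
b = 0.0                               # like tf.zeros([1])
for step in range(20):
    W, b = gradient_descent_step(W, b, 0.5)
loss = np.mean((W * x_data + b - y_data) ** 2)
print("W =", W, "b =", b, "loss =", loss)
```

After 20 steps W and b should be close to the 0.1 and 0.3 used to generate the data. The loss should be near the noise variance, 0.03 squared.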
The most critical part is the gradient descent process, which starts from the loss:

loss = tf.reduce_mean(tf.square(y - y_data), name='loss')

In the gradient subgraph, the node that takes y - y_data as input also takes an input labeled x. This x is not x_data: it is a temporary constant 2. Their product 2(y - y_data) is the derivative of (y - y_data)^2. Starting from 2(y - y_data), several further processing steps produce update_b, the increment for parameter b. update_W, which updates W, is produced in the same way; tracing backwards, it depends on add_grad (derived from y - y_data), W, and y. For the detailed calculation, see http://stackoverflow.com/questions/39580427/how-does-tensorflow-calculate-the-gradients-for-the-tf-train-gradientdescentopti . TensorFlow turns one simple operation into a graph of many nodes. We won't analyze each node in depth. They only express how the operation is laid out in the graph and are not essential here.
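Written out for the loss in the code, with $N$ samples, $y_i = W x_i + b$, and learning rate $\eta$ (0.5 in the code), the chain rule gives the gradients from which update_W and update_b are built:

```latex
L = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - y^{\mathrm{data}}_i\right)^2,
\qquad y_i = W x_i + b

\frac{\partial L}{\partial b} = \frac{1}{N}\sum_{i=1}^{N} 2\left(y_i - y^{\mathrm{data}}_i\right),
\qquad
\frac{\partial L}{\partial W} = \frac{1}{N}\sum_{i=1}^{N} 2\left(y_i - y^{\mathrm{data}}_i\right) x_i

b \leftarrow b - \eta\,\frac{\partial L}{\partial b},
\qquad
W \leftarrow W - \eta\,\frac{\partial L}{\partial W}
```

The factor 2 is the constant node seen in the graph. The extra factor $x_i$ in the W gradient comes from $\partial y_i / \partial W = x_i$. For b the corresponding factor is $\partial y_i / \partial b = 1$.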

The seq2seq model that ships with TensorFlow is based on one-hot word embeddings. Each word is replaced by a number, which is not enough to represent the relationships between words. Word2vec's multi-dimensional vectors, used as word embeddings, can represent those relationships. Implementing the seq2seq idea with multi-dimensional word vectors is therefore expected to give higher accuracy.
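You can see why one-hot codes cannot express relationships between words, while dense vectors can, by comparing cosine similarities. In the sketch below, the dense vectors are made-up toy values and not trained word2vec output:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

vocab = ["cat", "dog", "car"]
one_hot = np.eye(len(vocab))               # each word is only an index
# Hypothetical 3-d "embeddings": cat and dog point in similar directions
dense = {"cat": np.array([0.9, 0.8, 0.1]),
         "dog": np.array([0.8, 0.9, 0.2]),
         "car": np.array([0.1, 0.2, 0.9])}

print(cosine(one_hot[0], one_hot[1]))      # cat vs dog, one-hot: 0.0
print(cosine(one_hot[0], one_hot[2]))      # cat vs car, one-hot: 0.0
print(cosine(dense["cat"], dense["dog"]))  # close to 1
print(cosine(dense["cat"], dense["car"]))  # much smaller
```

Any two distinct one-hot vectors are orthogonal, so every pair of words looks equally unrelated. Dense vectors can place related words close together.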

The seq2seq model principle. See the paper "Sequence to Sequence Learning with Neural Networks". The core idea: ABC is the input sentence, WXYZ is the output sentence, EOS marks the end of a sentence, and the basic unit is an LSTM. An LSTM has long short-term memory, so it can determine the following words from the input of several words. For background on LSTMs, see http://deeplearning.net/tutorial/lstm.html. The encoder and decoder can either share the same LSTM layer and its parameters or use separate layers; see https://github.com/farizrahman4u/seq2seq. In that project's diagram, green is the encoder and yellow is the decoder. The orange arrows carry the LSTM layer's state (its memory), and this state is the only information the encoder passes to the decoder.
At each time step, the decoder's input is its output from the previous step. Given the input "How are you <EOL>", fed one word per time step, the model outputs "W I am fine <EOL>" word by word. W is a special marker: it is the encoder's final output and serves as the signal that triggers the decoder.
For training, the decoder's input at each time step is replaced directly with "W I am fine", supplied as part of the training input X, while Y remains the expected output "W I am fine <EOL>". The model trained this way is the encoder-decoder model.
When predicting with the trained model, the decoder instead uses its previous output as the next input and can still output "W I am fine <EOL>".
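Training and prediction differ only in where each decoder input comes from. The toy sketch below contrasts the two loops. It assumes a hypothetical decoder_step function that stands in for one LSTM decoder step. Here decoder_step is just a lookup table, while a real step would also carry the LSTM state.

```python
# Hypothetical stand-in for one decoder step: maps the current input token to a prediction.
TOY_STEP = {"W": "I", "I": "am", "am": "fine", "fine": "<EOL>"}

def decoder_step(token):
    return TOY_STEP.get(token, "<EOL>")

def decode_teacher_forcing(target):
    """Training: the input at step t is the ground-truth token from step t-1 (W first)."""
    inputs = ["W"] + target[:-1]
    return [decoder_step(tok) for tok in inputs]

def decode_free_running(max_len):
    """Prediction: the input at step t is the model's own output from step t-1."""
    outputs, token = [], "W"
    for _ in range(max_len):
        token = decoder_step(token)
        outputs.append(token)
        if token == "<EOL>":
            break
    return outputs

target = ["I", "am", "fine", "<EOL>"]
print(decode_teacher_forcing(target))   # ['I', 'am', 'fine', '<EOL>']
print(decode_free_running(8))           # ['I', 'am', 'fine', '<EOL>']
```

With a perfect step function the two loops agree. With a real model, a wrong word during free-running decoding is fed into the following steps.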

Preparing the text. Use at least 3 million chat corpus entries to train the word vectors and the seq2seq model. The richer the corpus, the better the trained word vectors.
Word segmentation:

python word_segment.py ./corpus.raw ./corpus.segment

Convert the segmented file into "|"-separated question-answer pairs, pairing each line with the line before it:

cat ./corpus.segment | awk '{if(last!="")print last"|"$0;last=$0}' | sed 's/| /|/g' > ./corpus.segment.pair
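The awk/sed pipeline pairs every line with the line before it, so each sentence becomes the answer to its predecessor. An empty line resets the pairing, and sed removes the space after each "|". Below is an equivalent Python sketch that may be easier to adapt. The function name make_pairs is hypothetical.

```python
def make_pairs(lines):
    """Pair each line with the previous one as 'question|answer', like the awk/sed pipeline."""
    pairs = []
    last = ""
    for line in lines:
        line = line.rstrip("\n")
        if last != "":                               # awk: if(last!="")print last"|"$0
            pairs.append((last + "|" + line).replace("| ", "|"))  # sed 's/| /|/g'
        last = line
    return pairs

print(make_pairs(["a b\n", " c d\n", " e\n"]))   # ['a b|c d', ' c d|e']
```

To process the real corpus, pass the open ./corpus.segment file as lines and write each pair followed by a newline to ./corpus.segment.pair.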

Training word vectors. Train the word vectors with Google's word2vec:

word2vec -train ./corpus.segment -output vectors.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-5 -threads 20 -binary 1 -iter 15

corpus.raw is the raw corpus data, corpus.segment is the segmented text that word2vec trains on, and vectors.bin is the generated binary word-vector file.
A loader for the binary word-vector file: https://github.com/warmheartli/ChatBotCourse/blob/master/word_vectors_loader.py
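The linked loader is not reproduced here. The sketch below assumes the standard format that word2vec writes with -binary 1: a text header "vocab_size dim", then for each entry the UTF-8 word, a space, dim float32 values, and a newline. It also assumes the file was written on a little-endian machine. The name load_word2vec_binary is hypothetical.

```python
import numpy as np

def load_word2vec_binary(path):
    """Read a word2vec binary file (-binary 1) into a {word: vector} dict."""
    vectors = {}
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise EOFError("file ended before all words were read")
                if ch == b" ":               # a space ends the word
                    break
                if ch != b"\n":              # skip the newline after the previous vector
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="replace")
            # dim little-endian float32 values follow the word
            vectors[word] = np.frombuffer(f.read(4 * dim), dtype="<f4").copy()
    return vectors
```

With -size 200 as in the command above, every vector has 200 components. If you prefer a library, gensim's KeyedVectors.load_word2vec_format(path, binary=True) reads the same format.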

Creating the model. It is implemented with TensorFlow plus the tflearn library.

# First, allocate the input tensor for the sample data. self.max_seq_len is the maximum number of words in a segmented sentence, and self.word_vec_dim is the word-vector dimension. The shape says the input is an unspecified number of samples, each with at most max_seq_len*2 words, and each word is a word_vec_dim-dimensional float vector. It is 2 * max_seq_len because the training input X contains both the question and the answer sentence
input_data = tflearn.input_data(shape=[None, self.max_seq_len*2, self.word_vec_dim], dtype=tf.float32, name="XY")

# Then take the first max_seq_len word vectors of every sample, i.e. the question part, as the encoder input
encoder_inputs = tf.slice(input_data, [0, 0, 0], [-1, self.max_seq_len, self.word_vec_dim], name="enc_in")

# Take the following max_seq_len-1 word vectors, i.e. the answer part, as the decoder input. Only max_seq_len-1 are taken because a GO marker must be prepended to tell the decoder to start decoding; go_inputs below is that marker, and together they form the final decoder_inputs
decoder_inputs_tmp = tf.slice(input_data, [0, self.max_seq_len, 0], [-1, self.max_seq_len-1, self.word_vec_dim], name="dec_in_tmp")
go_inputs = tf.ones_like(decoder_inputs_tmp)
go_inputs = tf.slice(go_inputs, [0, 0, 0], [-1, 1, self.word_vec_dim])
decoder_inputs = tf.concat(1, [go_inputs, decoder_inputs_tmp], name="dec_in")

# Then run the encoder. encoder_output_tensor is packed into a (?, 1, 200) tensor that tflearn.regression can use, and the returned states are passed on to the decoder
(encoder_output_tensor, states) = tflearn.lstm(encoder_inputs, self.word_vec_dim, return_state=True, scope='encoder_lstm')
encoder_output_sequence = tf.pack([encoder_output_tensor], axis=1)

# Take the first vector of decoder_inputs, which is GO
first_dec_input = tf.slice(decoder_inputs, [0, 0, 0], [-1, 1, self.word_vec_dim])

# Feed it into the decoder, whose initial state is the states produced by the encoder. scope='decoder_lstm' lets the loop below reuse this same decoder layer
decoder_output_tensor = tflearn.lstm(first_dec_input, self.word_vec_dim, initial_state=states, return_seq=False, reuse=False, scope='decoder_lstm')

# Keep the decoder's first output in decoder_output_sequence_list for the final output
decoder_output_sequence_single = tf.pack([decoder_output_tensor], axis=1)
decoder_output_sequence_list = [decoder_output_tensor]

# Next, loop max_seq_len-1 times. Each round feeds the next word vector of decoder_inputs to the decoder and appends the result to decoder_output_sequence_list. reuse=True with scope='decoder_lstm' means the same LSTM layer as in the first decoding step is used
for i in range(self.max_seq_len-1):
    next_dec_input = tf.slice(decoder_inputs, [0, i+1, 0], [-1, 1, self.word_vec_dim])
    decoder_output_tensor = tflearn.lstm(next_dec_input, self.word_vec_dim, return_seq=False, reuse=True, scope='decoder_lstm')
    decoder_output_sequence_single = tf.pack([decoder_output_tensor], axis=1)
    decoder_output_sequence_list.append(decoder_output_tensor)

# Finally, concatenate the encoder's output with all decoder outputs as the input to tflearn.regression
decoder_output_sequence = tf.pack(decoder_output_sequence_list, axis=1)
real_output_sequence = tf.concat(1, [encoder_output_sequence, decoder_output_sequence])
net = tflearn.regression(real_output_sequence, optimizer='sgd', learning_rate=0.1, loss='mean_square')
model = tflearn.DNN(net)
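You can check the tf.slice / tf.ones_like / tf.concat calls at the start of the model code with NumPy on a toy batch. This sketch uses made-up sizes (max_seq_len=3, word_vec_dim=2) and replaces the TensorFlow ops with plain array indexing:

```python
import numpy as np

max_seq_len, word_vec_dim, batch = 3, 2, 1
# One sample: 3 question steps followed by 3 answer steps, each a 2-d vector
input_data = np.arange(batch * max_seq_len * 2 * word_vec_dim, dtype=np.float32).reshape(
    batch, max_seq_len * 2, word_vec_dim)

encoder_inputs = input_data[:, :max_seq_len, :]                          # question half
decoder_inputs_tmp = input_data[:, max_seq_len:2 * max_seq_len - 1, :]   # answer without its last step
go_inputs = np.ones_like(decoder_inputs_tmp)[:, :1, :]                   # a single all-ones GO vector
decoder_inputs = np.concatenate([go_inputs, decoder_inputs_tmp], axis=1)

print(encoder_inputs.shape, decoder_inputs.shape)   # (1, 3, 2) (1, 3, 2)
print(decoder_inputs[0])
```

The decoder therefore sees GO followed by the first max_seq_len-1 answer vectors, which is the answer shifted right by one step.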

The model is now built. To summarize the ideas:

1) the training input X and target Y are the encoder-decoder input and the expected output, respectively;
2) X is split in half: the first half is the encoder input and the second half is the decoder input;
3) the encoder-decoder output is regressed against Y;
4) during training, the decoder input is the real answer from the sample. At prediction time there is no WXYZ part, so the output of the previous time step is used as the input of the next time step.

Training the model. Instantiate the model and feed it the data:

model = self.model()
model.fit(trainXY, trainY, n_epoch=1000, snapshot_epoch=False, batch_size=1)
model.save('./model/model')  # write the trained weights to ./model/model

trainXY and trainY are built from the loaded corpus.

Load the word vectors into word_vector_dict. Then read the corpus file, look up each word in word_vector_dict, and append its vector to question_seq or answer_seq:

def init_seq(input_file):
    """Read the word-segmented text file and load all word sequences.
    Uses the module-level globals word_vector_dict, question_seqs and answer_seqs.
    """
    file_object = open(input_file, 'r')
    while True:
        question_seq = []
        answer_seq = []
        line = file_object.readline()
        if line:
            line_pair = line.strip().split('|')  # strip the newline so the last answer word still matches
            line_question = line_pair[0]
            line_answer = line_pair[1]
            for word in line_question.decode('utf-8').split(' '):
                if word in word_vector_dict:
                    question_seq.append(word_vector_dict[word])
            for word in line_answer.decode('utf-8').split(' '):
                if word in word_vector_dict:
                    answer_seq.append(word_vector_dict[word])
        else:
            break
        question_seqs.append(question_seq)
        answer_seqs.append(answer_seq)
    file_object.close()

With question_seqs and answer_seqs loaded, construct trainXY and trainY:

def generate_trainig_data(self):
    xy_data = []
    y_data = []
    for i in range(len(question_seqs)):
        question_seq = question_seqs[i]
        answer_seq = answer_seqs[i]
        if len(question_seq) < self.max_seq_len and len(answer_seq) < self.max_seq_len:
            # question: left-padded with zeros, word order reversed
            sequence_xy = [np.zeros(self.word_vec_dim)] * (self.max_seq_len-len(question_seq)) + list(reversed(question_seq))
            # answer: right-padded with zeros
            sequence_y = answer_seq + [np.zeros(self.word_vec_dim)] * (self.max_seq_len-len(answer_seq))
            sequence_xy = sequence_xy + sequence_y
            # target: an all-ones marker vector followed by the answer
            sequence_y = [np.ones(self.word_vec_dim)] + sequence_y
            xy_data.append(sequence_xy)
            y_data.append(sequence_y)
    return np.array(xy_data), np.array(y_data)
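A standalone sketch of the same construction shows the resulting layout. To let it run on its own, it assumes a plain function with explicit arguments instead of a method that reads globals; build_training_data is a hypothetical name. The question is reversed and left-padded, which is the source-reversal trick from the Sutskever et al. paper cited above. The answer is right-padded. Y gets a leading all-ones vector, so Y has max_seq_len + 1 steps. That matches the model output: 1 encoder step plus max_seq_len decoder steps.

```python
import numpy as np

def build_training_data(question_seqs, answer_seqs, max_seq_len, word_vec_dim):
    xy_data, y_data = [], []
    pad = np.zeros(word_vec_dim)
    for question_seq, answer_seq in zip(question_seqs, answer_seqs):
        if len(question_seq) < max_seq_len and len(answer_seq) < max_seq_len:
            # Question: left-pad with zeros, word order reversed
            sequence_xy = [pad] * (max_seq_len - len(question_seq)) + list(reversed(question_seq))
            # Answer: right-pad with zeros
            sequence_y = answer_seq + [pad] * (max_seq_len - len(answer_seq))
            xy_data.append(sequence_xy + sequence_y)
            # Target: all-ones marker (the step the encoder output is regressed to) + answer
            y_data.append([np.ones(word_vec_dim)] + sequence_y)
    return np.array(xy_data), np.array(y_data)

q = [np.array([1.0, 1.0]), np.array([2.0, 2.0])]   # toy 2-word question
a = [np.array([3.0, 3.0])]                          # toy 1-word answer
trainXY, trainY = build_training_data([q], [a], max_seq_len=4, word_vec_dim=2)
print(trainXY.shape, trainY.shape)   # (1, 8, 2) (1, 5, 2)
```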

With the training data constructed, create the model and train it:

python my_seq2seq_v2.py train

This finally produces the model file ./model/model.

Prediction. With the model trained, enter a sentence and predict the answer:

predict = model.predict(testXY)

At prediction time there is only a question and no answer, so testXY has no Y part. The decoder's output from the previous time step is used as its input at the next step, so the decoder loop becomes:

for i in range(self.max_seq_len-1):
    # next_dec_input = tf.slice(decoder_inputs, [0, i+1, 0], [-1, 1, self.word_vec_dim])  # training version; replaced by the line below
    next_dec_input = decoder_output_sequence_single
    decoder_output_tensor = tflearn.lstm(next_dec_input, self.word_vec_dim, return_seq=False, reuse=True, scope='decoder_lstm')
    decoder_output_sequence_single = tf.pack([decoder_output_tensor], axis=1)
    decoder_output_sequence_list.append(decoder_output_tensor)

A word vector is a multi-dimensional array of floats, so each predicted vector is matched to a word by cosine similarity. The matching method:

def vector2word(vector):
    """Return the word whose vector is most cosine-similar to vector, plus that similarity."""
    max_cos = -10000
    match_word = ''
    for word in word_vector_dict:
        v = word_vector_dict[word]
        cosine = vector_cosine(vector, v)
        if cosine > max_cos:
            max_cos = cosine
            match_word = word
    return (match_word, max_cos)

The implementation of vector_cosine is as follows:

import math

def vector_cosine(v1, v2):
    """Cosine similarity of two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    sqrtlen1 = vector_sqrtlen(v1)
    sqrtlen2 = vector_sqrtlen(v2)
    value = 0
    for item1, item2 in zip(v1, v2):
        value += item1 * item2
    return value / (sqrtlen1*sqrtlen2)

def vector_sqrtlen(vector):
    """Euclidean length (L2 norm) of a vector."""
    total = 0
    for item in vector:
        total += item * item
    return math.sqrt(total)
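vector2word compares the prediction against every word in a Python loop. The vectorized NumPy sketch below computes all the cosines at once. It assumes the vocabulary is stacked into one row-normalized matrix ahead of time. The names build_index and nearest_word are hypothetical.

```python
import numpy as np

def build_index(word_vector_dict):
    """Stack all word vectors into one row-normalized matrix (done once)."""
    words = list(word_vector_dict)
    matrix = np.array([word_vector_dict[w] for w in words], dtype=np.float64)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    return words, matrix

def nearest_word(vector, words, matrix):
    """Return (word, cosine) for the vocabulary word most similar to vector."""
    v = np.asarray(vector, dtype=np.float64)
    cosines = matrix @ (v / np.linalg.norm(v))   # one dot product per word
    best = int(np.argmax(cosines))
    return words[best], float(cosines[best])

toy = {"hello": [1.0, 0.0], "world": [0.0, 1.0], "hi": [0.9, 0.1]}   # made-up vectors
words, matrix = build_index(toy)
print(nearest_word([1.0, 0.05], words, matrix))
```

Like vector_cosine, this fails for an all-zero vector (division by zero). With a vocabulary of tens of thousands of words, normalizing once and using a single matrix product is much faster than the per-word loop.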

Run the prediction:

python my_seq2seq_v2.py test test.data

In the output, the first column is the predicted word at each time step, and the second column is the cosine similarity between the predicted vector and the nearest word vector. The third column is the Euclidean length of the predicted vector.
Because max_seq_len is fixed at 8, the output sequence has some extra words at the end. Truncate them with a threshold on the cosine similarity or another metric.
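One possible truncation rule is sketched below under assumptions: stop at the first predicted word whose cosine similarity falls below a threshold. The threshold 0.5, the function name truncate_prediction, and the sample similarities are all hypothetical and would need tuning on real output.

```python
def truncate_prediction(pairs, threshold=0.5):
    """pairs: (word, cosine) per time step; keep words until the cosine drops below threshold."""
    words = []
    for word, cosine in pairs:
        if cosine < threshold:
            break
        words.append(word)
    return words

# Made-up per-step output: three confident words, then low-similarity filler
predicted = [("I", 0.92), ("am", 0.81), ("fine", 0.77), ("the", 0.31), ("is", 0.28)]
print(" ".join(truncate_prediction(predicted)))   # I am fine
```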
Full code: https://github.com/warmheartli/ChatBotCourse/blob/master/chatbotv2/my_seq2seq_v2.py

Recommendations for machine learning job opportunities in Shanghai are welcome. My WeChat: qingxingfengzi
