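After finishing the second assignment (assignment2) of cs224d, I felt that its model class was well encapsulated. For future reference, I distilled it into a simple linear regression (linear fitting) class. Along the way I also explored how to share variables with tf.variable_scope() and tf.get_variable(). The code is given at the end of this post.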
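While working on cs224d assignment2 I had to use variable_scope() and get_variable(), and found them very confusing. After reading a lot of documentation online and experimenting, I think the following blog post explains them fairly clearly: http://blog.csdn.net/Jerr__y/article/details/70809528. I will not repeat the details here.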
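One point worth noting: sharing variables through tf.get_variable() can be used to pass trained neural network parameters from one model instance to another. In q3_RNNLM of cs224d assignment2, the same class has to be instantiated twice: the first time to train the RNN model, and the second time to generate sentences with that RNN model. Clearly, the two instances must share the model parameters.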
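The idea of looking a parameter up by name instead of creating it again can be illustrated without TensorFlow. The following is only a toy sketch: ToyVariableStore and its get_variable method are hypothetical names invented for this illustration, not TensorFlow code, and the one-element list merely stands in for a tensor. The error behaviour mirrors what tf.get_variable() does in TensorFlow 1.x (creating a name twice without reuse fails, and reusing a name that was never created also fails).

```python
class ToyVariableStore(object):
    """Hypothetical name-keyed store, a toy stand-in for variable sharing."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, initial_value=None, reuse=False):
        if reuse:
            # Reuse mode: only look up, never create.
            if name not in self._vars:
                raise ValueError("Variable %s does not exist" % name)
            return self._vars[name]
        # Creation mode: the name must be new.
        if name in self._vars:
            raise ValueError("Variable %s already exists" % name)
        self._vars[name] = [initial_value]  # mutable cell standing in for a tensor
        return self._vars[name]


store = ToyVariableStore()
train_W = store.get_variable("LR/Layer/W", 0.0)        # first instance creates W
test_W = store.get_variable("LR/Layer/W", reuse=True)  # second instance looks it up

train_W[0] = 2.0   # "training" updates the shared cell
print(test_W[0])   # prints 2.0, the second handle sees the trained value
```

Because both handles refer to the same stored object, nothing has to be copied after training; this is the property the two-instance pattern relies on.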
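Here is a brief explanation of the usage, based on the simplified linear regression class mentioned above:
1. In the main program, instantiate the same class twice, as follows (note the use of tf.variable_scope() and scope.reuse_variables()):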
with tf.variable_scope('LR') as scope:
    model = linearReg(config)
    scope.reuse_variables()
    test_model = linearReg(config)
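Here the first instance, model, is used for training; the second, test_model, serves some other purpose but should use the parameters trained by model.
2. Inside linearReg, the model parameters are defined as follows (note that tf.get_variable() is used instead of tf.Variable()):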
def add_model(self):
    with tf.variable_scope('Layer'):
        self.W = tf.get_variable('W', [1,], initializer=tf.zeros_initializer())
        self.b = tf.get_variable('b', [1,], initializer=tf.zeros_initializer())
        output = self.W*self.x_placeholder + self.b
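W and b are stored as instance attributes (self.) here only so that they can be printed later to confirm the sharing; in real use they could just as well be local variables of the function.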
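The difference between tf.get_variable() and tf.Variable() inside a reusing scope can be checked in isolation. The snippet below is a minimal sketch, not part of the original program: the names w_first, w_shared and w_copy are made up for illustration, and the try/except import is an assumption added so that the same code also runs on newer TensorFlow releases through their compat.v1 module (on r1.3 the plain import is used).

```python
try:
    import tensorflow.compat.v1 as tf  # newer TensorFlow releases
except ImportError:
    import tensorflow as tf            # TensorFlow 1.x such as r1.3

with tf.Graph().as_default():
    with tf.variable_scope('LR') as scope:
        # First call creates the variable named 'LR/W'.
        w_first = tf.get_variable('W', [1], initializer=tf.zeros_initializer())
        scope.reuse_variables()
        # With reuse enabled, get_variable looks the existing variable up by name.
        w_shared = tf.get_variable('W', [1])
        # tf.Variable ignores the reuse flag and always creates a new variable,
        # which receives a different, uniquified name.
        w_copy = tf.Variable(tf.zeros([1]), name='W')

print(w_first.name, w_shared.name, w_copy.name)
print(w_shared is w_first)
```

If add_model used tf.Variable(), test_model would get its own freshly initialized W and b, and the equality check at the end of the program would fail after training.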
With this setup, once model has been trained, test_model's W and b are exactly the same as model's. The example program prints:
======================================
Trained results, W = 2.000, b = 0.188
(Real value: W = 2.000, b = 0.200)
======================================
W in train model = W in test model ?
Yes!
b in train model = b in test model ?
Yes!
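The trained b is close to, but not exactly, the true value 0.2 because the training data contain Gaussian noise. As a rough cross-check, the same fit can be reproduced with plain NumPy gradient descent on the mean squared error. This is only a sketch under stated assumptions: the fixed seed is my addition (the original program is unseeded, so its numbers will differ), and early stopping is left out so that all 100 epochs run.

```python
import numpy as np

rng = np.random.RandomState(0)  # assumption: fixed seed, not in the original code
x = rng.rand(1000) * 2 - 1                # inputs uniform in [-1, 1)
y = 2 * x + rng.randn(1000) * 0.4 + 0.2   # true W = 2.0, true b = 0.2, noise std 0.4

W, b, lr = 0.0, 0.0, 0.3                  # zero init and learning rate as in Config
for _ in range(100):                      # max_epochs = 100, full-batch updates
    err = W * x + b - y                   # prediction error
    W -= lr * 2 * np.mean(err * x)        # d(MSE)/dW
    b -= lr * 2 * np.mean(err)            # d(MSE)/db

print("W = %5.3f, b = %5.3f" % (W, b))
```

The estimates land near 2.0 and 0.2 but carry a small sampling error from the noise, which is consistent with the TensorFlow run reporting b = 0.188.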
See the code below for details.
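A quick plug: many of the cs224d assignment2 solutions that can currently be found online were probably written for older versions of tensorflow and fail to run on newer ones (mine is r1.3). Some of the failures come from function incompatibilities between old and new tensorflow versions, but others come from how variable_scope() is used. I made the necessary changes so that the code runs correctly on tensorflow r1.3. The modified cs224d assignment2 code has been uploaded to: http://download.csdn.net/download/foreseerwang/10274823 Feel free to download it and discuss.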
The code of the simplified linear regression class is as follows:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt


class Config(object):
    max_epochs = 100
    early_stopping = 5
    lr = 0.3
class linearReg(object):

    def load_data(self):
        # Synthetic data: y = 2*x + 0.2 plus Gaussian noise (std 0.4), x in [-1, 1)
        self.X_train = np.random.rand(1000)*2-1
        self.y_train = 2 * self.X_train + np.random.randn(*self.X_train.shape) * 0.4 + 0.2
        self.X_dev = np.random.rand(100)*2-1
        self.y_dev = 2 * self.X_dev + np.random.randn(*self.X_dev.shape) * 0.4 + 0.2
        self.X_test = np.random.rand(100)*2-1
        self.y_test = 2 * self.X_test + np.random.randn(*self.X_test.shape) * 0.4 + 0.2

    def add_placeholders(self):
        self.x_placeholder = tf.placeholder(tf.float32, shape=(None))
        self.y_placeholder = tf.placeholder(tf.float32, shape=(None))

    def create_feed_dict(self, x_batch, y_batch=None):
        if y_batch is None:
            feed_dict = {
                self.x_placeholder: x_batch
            }
        else:
            feed_dict = {
                self.x_placeholder: x_batch,
                self.y_placeholder: y_batch
            }
        return feed_dict

    def add_model(self):
        # tf.get_variable() lets a second instance reuse W and b by name
        with tf.variable_scope('Layer'):
            self.W = tf.get_variable('W', [1,], initializer=tf.zeros_initializer())
            self.b = tf.get_variable('b', [1,], initializer=tf.zeros_initializer())
            output = self.W*self.x_placeholder + self.b
        return output

    def add_loss_op(self, y):
        # Mean squared error between prediction and target
        loss = tf.reduce_mean(tf.pow((y-self.y_placeholder), 2))
        return loss

    def add_training_op(self, loss):
        optimizer = tf.train.GradientDescentOptimizer(self.config.lr)
        train_op = optimizer.minimize(loss)
        return train_op

    def __init__(self, config):
        """Constructs the network using the helper functions defined above."""
        self.config = config
        self.load_data()
        self.add_placeholders()
        self.ypred = self.add_model()
        self.loss = self.add_loss_op(self.ypred)
        self.train_op = self.add_training_op(self.loss)

    def run_epoch(self, session, input_x, input_y, train_op=None):
        orig_X, orig_y = input_x, input_y
        # Without a train_op this only evaluates the loss and predictions
        if not train_op:
            train_op = tf.no_op()
        feed = self.create_feed_dict(x_batch=orig_X, y_batch=orig_y)
        loss, ypred, _ = session.run([self.loss, self.ypred, train_op], feed_dict=feed)
        return loss, ypred
def test_LR():
    config = Config()
    with tf.Graph().as_default():
        with tf.variable_scope('LR') as scope:
            model = linearReg(config)
            # From here on, get_variable() returns the existing variables
            scope.reuse_variables()
            test_model = linearReg(config)

        # tf.initialize_all_variables() is deprecated; use the replacement
        init = tf.global_variables_initializer()

        with tf.Session() as session:
            best_val_loss = float('inf')
            best_val_epoch = 0

            session.run(init)
            # range() works on both Python 2 and Python 3 (xrange is Python 2 only)
            for epoch in range(config.max_epochs):
                # print('Epoch {}'.format(epoch))
                train_loss, _ = model.run_epoch(session, model.X_train, model.y_train, model.train_op)
                val_loss, y_val_pred = model.run_epoch(session, model.X_dev, model.y_dev)
                # print('Training loss: {}'.format(train_loss))
                # print('Validation loss: {}'.format(val_loss))
                if val_loss < best_val_loss:
                    best_val_loss = val_loss
                    best_val_epoch = epoch
                # Stop once the validation loss has not improved for a while
                if epoch - best_val_epoch > config.early_stopping:
                    break

            print("======================================")
            print("Trained results, W = %5.3f, b = %5.3f" % (session.run(model.W), session.run(model.b)))
            print(" (Real value: W = %5.3f, b = %5.3f)" % (2.0, 0.2))
            print("======================================")
            print("W in train model = W in test model ?")
            print("Yes!" if np.array_equal(session.run(model.W), session.run(test_model.W)) else "No!")
            print("b in train model = b in test model ?")
            print("Yes!" if np.array_equal(session.run(model.b), session.run(test_model.b)) else "No!")

            # plt.scatter(model.X_dev, model.y_dev)
            # plt.scatter(model.X_dev, y_val_pred)
            # plt.show()


if __name__ == "__main__":
    test_LR()