TensorFlow study notes 3

Linear Regression in TensorFlow

Linear regression is one of the simplest problems in machine learning; here we implement a small example with TensorFlow.

Problem: we want to find the relationship between arson and theft in a city. Let the number of arsons be X and the number of thefts be Y. We assume a linear relationship between them: Y = wX + b.


TensorFlow implementation

First define placeholders for input X and target Y

X = tf.placeholder(tf.float32, shape=[], name='input')
Y = tf.placeholder(tf.float32, shape=[], name='label')

Here shape=[] means the tensor is a scalar.


Then define the parameters w and b that need to be updated and learned

w = tf.get_variable(
    'weight', shape=[], initializer=tf.truncated_normal_initializer())
b = tf.get_variable('bias', shape=[], initializer=tf.zeros_initializer())


Then define the model output and the loss; here we use the squared error (Y - Y_predicted)^2:

Y_predicted = w * X + b
loss = tf.square(Y - Y_predicted, name='loss')


Then define the optimizer. The simplest method, gradient descent, is used here. Note that the learning rate need not be a constant; it can also be a tensor:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss)
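
For example, here is a minimal sketch (not from the original notes) of replacing the constant with a decaying learning-rate tensor:

global_step = tf.Variable(0, trainable=False, name='global_step')
# decay the rate by a factor of 0.96 every 100 steps; minimize() increments global_step
learning_rate = tf.train.exponential_decay(
    1e-3, global_step, decay_steps=100, decay_rate=0.96)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)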


How does TensorFlow decide which parameters should be updated and which should not? A variable created with tf.Variable(trainable=False) will not be updated; the default is tf.Variable(trainable=True).
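
A minimal sketch (hypothetical variable names) of the effect:

step_counter = tf.Variable(0, trainable=False, name='step_counter')  # excluded from training
w_train = tf.Variable(1.0, name='w_train')  # trainable=True by default
# minimize() only touches the variables in tf.trainable_variables()
print([v.name for v in tf.trainable_variables()])  # only 'w_train:0' appears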


Then run the training in a session:

init = tf.global_variables_initializer()
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./linear_log', graph=sess.graph)
    sess.run(init)
    # data is the list of (arson, theft) pairs; n_samples = len(data)
    for i in range(100):
        total_loss = 0
        for x, y in data:
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
            total_loss += l
        print("Epoch {0}: {1}".format(i, total_loss / n_samples))


Visualization

We can open TensorBoard to view the graph structure as follows.


Finally, we plot the data points and the fitted line.
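
A minimal plotting sketch with matplotlib, assuming data is the list of (x, y) pairs from the training loop and that w_value, b_value = sess.run([w, b]) was called before the session closed:

import matplotlib.pyplot as plt

x_vals = [x for x, _ in data]
y_vals = [y for _, y in data]
plt.plot(x_vals, y_vals, 'bo', label='real data')
plt.plot(x_vals, [w_value * x + b_value for x in x_vals], 'r', label='predicted line')
plt.legend()
plt.show()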


How to improve the model

1. We can increase the model's capacity. The original model is Y = Xw + b; adding a quadratic term turns it into Y = X^2 w1 + X w2 + b (see the sketch after this list).


2. We can change the loss function, for example to Huber loss, which uses the squared error when the error is small and the absolute error when the error is large.
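
For the first improvement, here is a minimal sketch of the quadratic model, reusing the scalar placeholders X and Y defined earlier (the names w1, w2, b2 are illustrative):

w1 = tf.get_variable('w1', shape=[], initializer=tf.truncated_normal_initializer())
w2 = tf.get_variable('w2', shape=[], initializer=tf.truncated_normal_initializer())
b2 = tf.get_variable('b2', shape=[], initializer=tf.zeros_initializer())

Y_predicted = X**2 * w1 + X * w2 + b2  # Y = X^2 * w1 + X * w2 + b
loss = tf.square(Y - Y_predicted, name='loss')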




When implementing Huber loss, note that because TF defines computation as a graph, Python control flow such as if cannot be applied to tensors. Instead we use TensorFlow's conditional ops, such as tf.where and tf.case. One implementation of Huber loss is as follows:

def huber_loss(labels, predictions, delta=1.0):
    residual = tf.abs(predictions - labels)
    condition = tf.less(residual, delta)
    small_res = 0.5 * residual**2
    large_res = delta * residual - 0.5 * delta**2
    return tf.where(condition, small_res, large_res)
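
To use it in the linear regression above, swap it in for the squared-error loss:

loss = huber_loss(Y, Y_predicted)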


About optimizers

TensorFlow computes gradients automatically and updates the parameters, all in one line of code: tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss). Below we break this down and walk through each step.


Automatic gradients

First, the optimizer itself is the first part of that line: opt = tf.train.GradientDescentOptimizer(learning_rate). Once the optimizer is defined, grads_and_vars = opt.compute_gradients(loss, <list of variables>) computes the gradient of loss with respect to every variable in the list. The result, grads_and_vars, is a list of tuples, each of the form (gradient, variable). We can unpack them with get_grads_and_vars = [(gv[0], gv[1]) for gv in grads_and_vars], and then update the parameters with opt.apply_gradients(get_grads_and_vars). A small example follows.

import tensorflow as tf

x = tf.Variable(5, dtype=tf.float32)
y = tf.Variable(3, dtype=tf.float32)

z = x**2 + x * y + 3

sess = tf.Session()
# initialize variable
sess.run(tf.global_variables_initializer())

# define optimizer
optimizer = tf.train.GradientDescentOptimizer(0.1)

# compute gradient z w.r.t x and y
grads_and_vars = optimizer.compute_gradients(z, [x, y])

# fetch the variable
get_grads_and_vars = [(gv[0], gv[1]) for gv in grads_and_vars]

# dz/dx = 2*x + y = 13
# dz/dy = x = 5
print('grads and variables')
print('x: grad {}, value {}'.format(
    sess.run(get_grads_and_vars[0][0]), sess.run(get_grads_and_vars[0][1])))

print('y: grad {}, value {}'.format(
    sess.run(get_grads_and_vars[1][0]), sess.run(get_grads_and_vars[1][1])))

print('Before optimization')
print('x: {}, y: {}'.format(sess.run(x), sess.run(y)))

# optimize parameters
opt = optimizer.apply_gradients(get_grads_and_vars)
# x = x - 0.1 * dz/dx = 5 - 0.1 * 13 = 3.7
# y = y - 0.1 * dz/dy = 3 - 0.1 * 5 = 2.5
print('After optimization using learning rate 0.1')
sess.run(opt)
print('x: {:.3f}, y: {:.3f}'.format(sess.run(x), sess.run(y)))
sess.close()


The comments in the program above already explain everything, so I won't go through it in detail; running it produces the results below.



In practice, of course, we do not update parameters by hand; the optimizer class updates them for us automatically. There is also another function that can compute gradients.

tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None)

This function returns a list whose length equals that of xs; each element of the list is sum_{ys}(dys/dx), the gradient of the sum of ys with respect to that x.

Practical use: this method is very useful for training only part of a network. With the function above we can compute gradients for just a subset of the parameters and then apply the updates to those parameters only, as the sketch below shows.
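
A minimal sketch with a hypothetical two-variable "network": we freeze w1 and update only w2.

w1 = tf.Variable(1.0, name='w1')
w2 = tf.Variable(2.0, name='w2')
loss = (w1 * w2 - 3.0)**2

# gradient of the loss w.r.t. w2 only; no update op is built for w1
grad_w2 = tf.gradients(loss, [w2])[0]
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.apply_gradients([(grad_w2, w2)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    print(sess.run([w1, w2]))  # w1 unchanged, w2 updated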


Optimizer types

Stochastic gradient descent (GradientDescentOptimizer) is just one of the update methods in TensorFlow; below is a summary of the optimizers TensorFlow currently supports.

tf.train.GradientDescentOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdagradOptimizer
tf.train.AdagradDAOptimizer
tf.train.MomentumOptimizer
tf.train.AdamOptimizer
tf.train.FtrlOptimizer
tf.train.ProximalGradientDescentOptimizer
tf.train.ProximalAdagradOptimizer
tf.train.RMSPropOptimizer

This blog introduces all of the methods above; interested readers can take a look. The cs231n and Coursera neural-network courses also introduce the various optimization algorithms.


Logistic Regression in TensorFlow

We use simple logistic regression to solve a classification problem, using the MNIST handwritten digits. The model is as follows:

logits = X * w + b
Y_predicted = softmax(logits)
loss = CrossEntropy(Y, Y_predicted)


TensorFlow implementation

TF Learn has a built-in script for reading the MNIST dataset:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('./data/mnist', one_hot=True)


Next, define the placeholders and the weight parameters:

x = tf.placeholder(tf.float32, shape=[None, 784], name='image')
y = tf.placeholder(tf.int32, shape=[None, 10], name='label')

w = tf.get_variable(
    'weight', shape=[784, 10], initializer=tf.truncated_normal_initializer())
b = tf.get_variable('bias', shape=[10], initializer=tf.zeros_initializer())

The input's shape=[None, 784] means the first dimension accepts inputs of any length, and the second dimension is 784 because 28 x 28 = 784. The weight w is initialized from a normal distribution with mean 0 and variance 1, and the bias b is initialized to 0.


Then define the prediction, the loss, and the optimizer:

logits = tf.matmul(x, w) + b
entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(entropy, axis=0)
learning_rate = 0.01  # not specified in the notes; an illustrative constant
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

tf.matmul performs the matrix multiplication; we then apply the cross-entropy loss used for classification problems, average the loss over the batch, and minimize it with stochastic gradient descent.


Because the dataset includes a test set, we can evaluate the accuracy on it:

preds = tf.nn.softmax(logits)
correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(y, 1))
accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32), axis=0)

First apply softmax to the outputs to obtain a probability distribution, then use tf.argmax to get the predicted label. tf.equal compares the predicted labels with the true labels, giving a 0-1 vector of length batch_size, and tf.reduce_sum counts the total number of correct predictions.

Finally, run everything in a session; this works the same way as the linear-regression loop earlier.
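
A minimal sketch of that loop (batch_size = 128 and 10 epochs are illustrative choices):

batch_size = 128
n_epochs = 10

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    n_batches = mnist.train.num_examples // batch_size
    for epoch in range(n_epochs):
        total_loss = 0
        for _ in range(n_batches):
            x_batch, y_batch = mnist.train.next_batch(batch_size)
            _, l = sess.run([optimizer, loss], feed_dict={x: x_batch, y: y_batch})
            total_loss += l
        print('Epoch {0}: {1}'.format(epoch, total_loss / n_batches))
    # the accuracy op sums correct predictions, so divide by the test-set size
    n_correct = sess.run(accuracy,
                         feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Test accuracy: {0}'.format(n_correct / mnist.test.num_examples))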


Results and visualization

Finally, we can see the training loss and the validation accuracy as follows.


After 10 epochs, the validation set reaches 74% accuracy. We can also obtain the TensorBoard visualization below.



This looks a bit messy, so the next set of notes will cover how to structure our models.
