我们这次搭建的神经网络如下图所示:
它的输入是一个像素值为28*28的灰度图,然后输入的数据先经过一个卷积层,卷积核的大小是5*5*32,得到32个feature map,之后经过池化层,这里我们选择了最大池化,得到12*12*32的数据,此时再经过一个卷积层,卷积核为5*5*64,得到的结果再一次经过池化,得到的数据为4*4*64,最后通过一个全连接层,得到最终的结果。
1.下载和准备数据
在这一部分我们完成对数据集的准备,这里我们选择最常见的书写数字图像集——MNIST,我们使用load_mnist函数来读取MNIST手写数字库,同时输出训练集、验证集、测试集的数量。
代码如下:
def
load_mnist(
path,
kind=
'train'):
"""Load MNIST data from `path`"""
labels_path = os.path.join(path,
'
%s
-labels-idx1-ubyte'
% kind)
images_path = os.path.join(path,
'
%s
-images-idx3-ubyte'
% kind)
with
open(labels_path,
'rb')
as lbpath:
magic, n = struct.unpack(
'>II',
lbpath.read(
8))
labels = np.fromfile(lbpath,
dtype=np.uint8)
with
open(images_path,
'rb')
as imgpath:
magic, num, rows, cols = struct.unpack(
">IIII",
imgpath.read(
16))
images = np.fromfile(imgpath,
dtype=np.uint8).reshape(
len(labels),
784)
return images, labels
X_data, y_data = load_mnist(
'./',
kind=
'train')
print(
'Rows:
%d
, Columns:
%d
' % (X_data.shape[
0], X_data.shape[
1]))
X_test, y_test = load_mnist(
'./',
kind=
't10k')
print(
'Rows:
%d
, Columns:
%d
' % (X_test.shape[
0], X_test.shape[
1]))
X_train, y_train = X_data[:
50000,:], y_data[:
50000]
X_valid, y_valid = X_data[
50000:,:], y_data[
50000:]
print(
'Training: ', X_train.shape, y_train.shape)
print(
'Validation: ', X_valid.shape, y_valid.shape)
print(
'Test Set: ', X_test.shape, y_test.shape)
输出结果如下:
2.生成模型前的准备工作
完成数据下载之后,此时我们需要从输入的数据中抽取股东size的batch,于是我们定义了batch生成函数。它返回一个数字+标签的元组。
代码如下:
def
batch_generator(
X,
y,
batch_size=
64,
shuffle=
False,
random_seed=
None):
idx = np.arange(y.shape[
0])
if shuffle:
rng = np.random.RandomState(random_seed)
rng.shuffle(idx)
X = X[idx]
y = y[idx]
for i
in
range(
0, X.shape[
0], batch_size):
yield (X[i:i+batch_size, :], y[i:i+batch_size])
接下来为了使得数据有更好的表现、更快的收敛,我们需要对数据进行归一下操作。我们计算每个feature的平均值和标准差,完成归一化操作。
代码如下:
mean_vals = np.mean(X_train,
axis=
0)
std_val = np.std(X_train)
X_train_centered = (X_train - mean_vals)/std_val
X_valid_centered = X_valid - mean_vals
X_test_centered = (X_test - mean_vals)/std_val
3.使用tensorflow的底层API搭建CNN网络
首先我们定义卷积层和全连接层,来简化搭建神经的过程。
首先是卷积层,这里我们定义了权重,误差,然后对他们进行初始化。这里的卷积操作使用tf.nn.conv2d函数,权重初始化使用Xavier,误差使用tf.zeros函数完成初始化,最后确定ReLU作为激活函数。
import tensorflow
as tf
import numpy
as np
## wrapper functions
def
conv_layer(
input_tensor,
name,
kernel_size,
n_output_channels,
padding_mode=
'SAME',
strides=(
1,
1,
1,
1)):
with tf.variable_scope(name):
## get n_input_channels:
## input tensor shape:
## [batch x width x height x channels_in]
input_shape = input_tensor.get_shape().as_list()
n_input_channels = input_shape[-
1]
weights_shape = (
list(kernel_size) +
[n_input_channels, n_output_channels])
weights = tf.get_variable(
name=
'_weights',
shape=weights_shape)
print(weights)
biases = tf.get_variable(
name=
'_biases',
initializer=tf.zeros(
shape=[n_output_channels]))
print(biases)
conv = tf.nn.conv2d(
input=input_tensor,
filter=weights,
strides=strides,
padding=padding_mode)
print(conv)
conv = tf.nn.bias_add(conv, biases,
name=
'net_pre-activation')
print(conv)
conv = tf.nn.relu(conv,
name=
'activation')
print(conv)
return conv
我们使用简单的输入来测试一下函数的功能:
g = tf.Graph()
with g.as_default():
x = tf.placeholder(tf.float32,
shape=[
None,
28,
28,
1])
conv_layer(x,
name=
'convtest',
kernel_size=(
3,
3),
n_output_channels=
32)
del g, x
得到结果如下,函数功能正常:
接下来我们定义全连接函数。同样地这里我们使用fc_layer来构建权重和误差,用conv_layer来初始化他们,接着然后使用tf.matmul函数完成生成矩阵。这个函数中有三个变量,分别为输入,该层的名称,用于确定范围、输出单元。
代码如下:
def
fc_layer(
input_tensor,
name,
n_output_units,
activation_fn=
None):
with tf.variable_scope(name):
input_shape = input_tensor.get_shape().as_list()[
1:]
n_input_units = np.prod(input_shape)
if
len(input_shape) >
1:
input_tensor = tf.reshape(input_tensor,
shape=(-
1, n_input_units))
weights_shape = [n_input_units, n_output_units]
weights = tf.get_variable(
name=
'_weights',
shape=weights_shape)
print(weights)
biases = tf.get_variable(
name=
'_biases',
initializer=tf.zeros(
shape=[n_output_units]))
print(biases)
layer = tf.matmul(input_tensor, weights)
print(layer)
layer = tf.nn.bias_add(layer, biases,
name=
'net_pre-activation')
print(layer)
if activation_fn
is
None:
return layer
layer = activation_fn(layer,
name=
'activation')
print(layer)
return layer
接着继续使用简单的输入验证函数功能。
g = tf.Graph()
with g.as_default():
x = tf.placeholder(tf.float32,
shape=[
None,
28,
28,
1])
fc_layer(x,
name=
'fctest',
n_output_units=
32,
activation_fn=tf.nn.relu)
del g, x
输出结果如下:
进行到这里,重头戏来了,我们要正式开始搭建CNN网络啦。。这里我们定义build_CNN 来管理搭建CNN模型的过程。
代码如下:
def
build_cnn():
## Placeholders for X and y:
tf_x = tf.placeholder(tf.float32,
shape=[
None,
784],
name=
'tf_x')
tf_y = tf.placeholder(tf.int32,
shape=[
None],
name=
'tf_y')
# reshape x to a 4D tensor:
# [batchsize, width, height, 1]
tf_x_image = tf.reshape(tf_x,
shape=[-
1,
28,
28,
1],
name=
'tf_x_reshaped')
## One-hot encoding:
tf_y_onehot = tf.one_hot(
indices=tf_y,
depth=
10,
dtype=tf.float32,
name=
'tf_y_onehot')
## 1st layer: Conv_1
print(
'
\n
Building 1st layer: ')
h1 = conv_layer(tf_x_image,
name=
'conv_1',
kernel_size=(
5,
5),
padding_mode=
'VALID',
n_output_channels=
32)
## MaxPooling
h1_pool = tf.nn.max_pool(h1,
ksize=[
1,
2,
2,
1],
strides=[
1,
2,
2,
1],
padding=
'SAME')
## 2n layer: Conv_2
print(
'
\n
Building 2nd layer: ')
h2 = conv_layer(h1_pool,
name=
'conv_2',
kernel_size=(
5,
5),
padding_mode=
'VALID',
n_output_channels=
64)
## MaxPooling
h2_pool = tf.nn.max_pool(h2,
ksize=[
1,
2,
2,
1],
strides=[
1,
2,
2,
1],
padding=
'SAME')
## 3rd layer: Fully Connected
print(
'
\n
Building 3rd layer:')
h3 = fc_layer(h2_pool,
name=
'fc_3',
n_output_units=
1024,
activation_fn=tf.nn.relu)
## Dropout
keep_prob = tf.placeholder(tf.float32,
name=
'fc_keep_prob')
h3_drop = tf.nn.dropout(h3,
keep_prob=keep_prob,
name=
'dropout_layer')
## 4th layer: Fully Connected (linear activation)
print(
'
\n
Building 4th layer:')
h4 = fc_layer(h3_drop,
name=
'fc_4',
n_output_units=
10,
activation_fn=
None)
## Prediction
predictions = {
'probabilities' : tf.nn.softmax(h4,
name=
'probabilities'),
'labels' : tf.cast(tf.argmax(h4,
axis=
1), tf.int32,
name=
'labels')
}
## Visualize the graph with TensorBoard:
## Loss Function and Optimization
cross_entropy_loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
logits=h4,
labels=tf_y_onehot),
name=
'cross_entropy_loss')
## Optimizer:
optimizer = tf.train.AdamOptimizer(learning_rate)
optimizer = optimizer.minimize(cross_entropy_loss,
name=
'train_op')
## Computing the prediction accuracy
correct_predictions = tf.equal(
predictions[
'labels'],
tf_y,
name=
'correct_preds')
accuracy = tf.reduce_mean(
tf.cast(correct_predictions, tf.float32),
name=
'accuracy')
这里得到的tensorboard结果如下:
接下来我们将定义四个其他函数:保存和加载函数以保存加载训练模型的检查点,训练模型使用training_set,预测函数来得到测试数据标签或可能性。
代码如下:
def
save(
saver,
sess,
epoch,
path=
'./model/'):
if
not os.path.isdir(path):
os.makedirs(path)
print(
'Saving model in
%s
' % path)
saver.save(sess, os.path.join(path,
'cnn-model.ckpt'),
global_step=epoch)
def
load(
saver,
sess,
path,
epoch):
print(
'Loading model from
%s
' % path)
saver.restore(sess, os.path.join(
path,
'cnn-model.ckpt-
%d
' % epoch))
def
train(
sess,
training_set,
validation_set=
None,
initialize=
True,
epochs=
20,
shuffle=
True,
dropout=
0.5,
random_seed=
None):
X_data = np.array(training_set[
0])
y_data = np.array(training_set[
1])
training_loss = []
## initialize variables
if initialize:
sess.run(tf.global_variables_initializer())
np.random.seed(random_seed)
# for shuflling in batch_generator
for epoch
in
range(
1, epochs+
1):
batch_gen = batch_generator(
X_data, y_data,
shuffle=shuffle)
avg_loss =
0.0
for i,(batch_x,batch_y)
in
enumerate(batch_gen):
feed = {
'tf_x:0': batch_x,
'tf_y:0': batch_y,
'fc_keep_prob:0': dropout}
loss, _ = sess.run(
[
'cross_entropy_loss:0',
'train_op'],
feed_dict=feed)
avg_loss += loss
training_loss.append(avg_loss / (i+
1))
print(
'Epoch
%02d
Training Avg. Loss:
%7.3f
' % (
epoch, avg_loss),
end=
' ')
if validation_set
is
not
None:
feed = {
'tf_x:0': validation_set[
0],
'tf_y:0': validation_set[
1],
'fc_keep_prob:0':
1.0}
valid_acc = sess.run(
'accuracy:0',
feed_dict=feed)
print(
' Validation Acc:
%7.3f
' % valid_acc)
else:
print()
def
predict(
sess,
X_test,
return_proba=
False):
feed = {
'tf_x:0': X_test,
'fc_keep_prob:0':
1.0}
if return_proba:
return sess.run(
'probabilities:0',
feed_dict=feed)
else:
return sess.run(
'labels:0',
feed_dict=feed)
现在我们可以创建一个tensorflow图形对象,生成图形的随机种子,并在该图中建立CNN模型
import tensorflow
as tf
import numpy
as np
## Define hyperparameters
learning_rate =
1e-4
random_seed =
123
np.random.seed(random_seed)
## create a graph
g = tf.Graph()
with g.as_default():
tf.set_random_seed(random_seed)
## build the graph
build_cnn()
## saver:
saver = tf.train.Saver()
接下来我们训练CNN模型,实现过程中首先创建Tensorflow session来发布表格,然后使用train函数
在第一次创建网络时,需要初始化各个变量。
代码如下:
with tf.Session(
graph=g)
as sess:
train(sess,
training_set=(X_train_centered, y_train),
validation_set=(X_valid_centered, y_valid),
initialize=
True,
random_seed=
123)
save(saver, sess,
epoch=
20)
得到结果如下:
在20个epochs完成后,我们保存之前训练的模型。实现过程中我们首先删除了graph g,新定义了g2
重组了训练模型,完成对测试集的预测。
### Calculate prediction accuracy
### on test set
### restoring the saved model
del g
## create a new graph
## and build the model
g2 = tf.Graph()
with g2.as_default():
tf.set_random_seed(random_seed)
## build the graph
build_cnn()
## saver:
saver = tf.train.Saver()
## create a new session
## and restore the model
with tf.Session(
graph=g2)
as sess:
load(saver, sess,
epoch=
20,
path=
'./model/')
preds = predict(sess, X_test_centered,
return_proba=
False)
print(
'Test Accuracy:
%.3f%%
' % (
100*
np.sum(preds == y_test)/
len(y_test)))
得到结果如下:
接着我们看一下前10个测试样本的预测情况。
## run the prediction on
## some test samples
np.set_printoptions(
precision=
2,
suppress=
True)
with tf.Session(
graph=g2)
as sess:
load(saver, sess,
epoch=
20,
path=
'./model/')
print(predict(sess, X_test_centered[:
10],
return_proba=
False))
print(predict(sess, X_test_centered[:
10],
return_proba=
True))
得到的结果如下:
接下来我们继续完成剩下的20个epoch,这次我们设置Initialize=False来跳过初始化操作。
## continue training for 20 more epochs
## without re-initializing :: initialize=False
## create a new session
## and restore the model
with tf.Session(
graph=g2)
as sess:
load(saver, sess,
epoch=
20,
path=
'./model/')
train(sess,
training_set=(X_train_centered, y_train),
validation_set=(X_valid_centered, y_valid),
initialize=
False,
epochs=
20,
random_seed=
123)
save(saver, sess,
epoch=
40,
path=
'./model/')
preds = predict(sess, X_test_centered,
return_proba=
False)
print(
'Test Accuracy:
%.3f%%
' % (
100*
np.sum(preds == y_test)/
len(y_test)))
得到的结果如下:
结果表明,20个附加时期的训练略有改善。在测试集上获得99.37%的预测精度。