6.3 Advanced usage of recurrent neural networks
In this section, we will review three advanced techniques for improving the performance and generalization power of recurrent neural networks. By the end of the section, you will know most of what there is to know about using recurrent networks with Keras. We will demonstrate all three concepts on a weather-forecasting problem, where we have access to a time series of data points recorded by sensors installed on the roof of a building, such as temperature, air pressure, and humidity, which we use to predict the temperature 24 hours after the last data point was collected. This is a fairly challenging problem that exemplifies many common difficulties encountered when working with time series.
We will cover the following techniques:
1. Recurrent dropout
2. Stacking recurrent layers
3. Bidirectional recurrent layers
Let's take a look at the Jena dataset, a record of how temperature (among other weather variables) changes over time.
The download address is:
# Inspect the Jena climate dataset
import os
data_dir = r'D:\study\Python\Deeplearning\Untitled Folder'
fname = os.path.join(data_dir, 'jena_climate_2009_2016.csv')
f = open(fname)
data = f.read()
f.close()
lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]
print(header)
print(len(lines))
['"Date Time"', '"p (mbar)"', '"T (degC)"', '"Tpot (K)"', '"Tdew (degC)"', '"rh (%)"', '"VPmax (mbar)"', '"VPact (mbar)"', '"VPdef (mbar)"', '"sh (g/kg)"', '"H2OC (mmol/mol)"', '"rho (g/m**3)"', '"wv (m/s)"', '"max. wv (m/s)"', '"wd (deg)"']
420551
# Parse the data into a NumPy array
import numpy as np
float_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    values = [float(x) for x in line.split(',')[1:]]
    float_data[i, :] = values
# Plot the temperature time series (column 1 is T (degC))
import matplotlib.pyplot as plt
temp = float_data[:, 1]
plt.plot(range(len(temp)), temp)
# Plot the first ten days of the temperature time series
plt.plot(range(1440), temp[:1440])
The exact formulation of the problem is now as follows:
One timestep is 10 minutes. Given data sampled every step timesteps over the past lookback timesteps, can we predict the temperature delay timesteps in the future?
lookback = 720: observations go back 5 days
step = 6: observations are sampled at one data point per hour
delay = 144: the target is the temperature 24 hours in the future
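These constants are easy to sanity-check, since one timestep is 10 minutes:

```python
# Sanity-check the time arithmetic of the problem formulation.
minutes_per_timestep = 10
lookback, step, delay = 720, 6, 144

print(lookback * minutes_per_timestep / 60 / 24)  # history covered, in days -> 5.0
print(step * minutes_per_timestep)                # sampling period, in minutes -> 60
print(delay * minutes_per_timestep / 60)          # prediction horizon, in hours -> 24.0
```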
# 1. Normalize the data (using statistics from the training portion only)
mean = float_data[:200000].mean(axis = 0)
float_data -= mean
std = float_data[:200000].std(axis = 0)
float_data /= std
Generator code
Below is the data generator we will use. It yields a tuple (samples, targets), where samples is one batch of input data and targets is the corresponding array of target temperatures.
The generator takes the following arguments:
data: the original array of floating-point data, which we normalized above
lookback: how many timesteps back the input data should go
delay: how many timesteps in the future the target should be
min_index and max_index: indices in the data array that delimit which timesteps to draw from; this is useful for keeping one segment of the data for validation and another for testing
shuffle: whether to shuffle the samples or draw them in chronological order
batch_size: the number of samples per batch
step: the period, in timesteps, at which the data is sampled; we set it to 6 in order to draw one data point every hour
# Generator yielding timeseries samples and their targets
def generator(data, lookback, delay, min_index, max_index,
              shuffle=False, batch_size=128, step=6):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while True:
        if shuffle:
            rows = np.random.randint(min_index + lookback, max_index, size=batch_size)
        else:
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
        samples = np.zeros((len(rows), lookback // step, data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(rows[j] - lookback, rows[j], step)
            samples[j] = data[indices]
            targets[j] = data[rows[j] + delay][1]
        yield samples, targets
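As a sanity check, the shape of the batches the generator assembles can be reproduced on synthetic data. This minimal sketch (with made-up constants, not the real dataset) repeats the batch-building logic from the generator body: each batch has shape (batch_size, lookback // step, num_features).

```python
import numpy as np

# Synthetic stand-in for float_data: 1000 timesteps, 14 features.
data = np.random.randn(1000, 14)
lookback, step, delay, batch_size = 240, 6, 24, 8

# Assemble one non-shuffled batch exactly as the generator does.
rows = np.arange(lookback, lookback + batch_size)
samples = np.zeros((len(rows), lookback // step, data.shape[-1]))
targets = np.zeros((len(rows),))
for j, row in enumerate(rows):
    indices = range(row - lookback, row, step)
    samples[j] = data[indices]
    targets[j] = data[row + delay][1]  # column 1 plays the role of temperature

print(samples.shape)  # (8, 40, 14)
print(targets.shape)  # (8,)
```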
Use this abstract generator function to instantiate three generators: one for training, one for validation, and one for testing.
Training generator: timesteps 0-200000
Validation generator: timesteps 200000-300000
Test generator: the remainder
lookback = 1440
step = 6
delay = 144
batch_size = 128
train_gen = generator(float_data,
lookback=lookback,
delay=delay,
min_index=0,
max_index=200000,
shuffle=True,
step=step,
batch_size=batch_size)
val_gen = generator(float_data,
lookback=lookback,
delay=delay,
min_index=200001,
max_index=300000,
step=step,
batch_size=batch_size)
test_gen = generator(float_data,
lookback=lookback,
delay=delay,
min_index=300001,
max_index=None,
step=step,
batch_size=batch_size)
# How many steps to draw from val_gen in order to see the entire validation set
val_steps = (300000-200001-lookback) // batch_size
# How many steps to draw from test_gen in order to see the entire test set
test_steps = (len(float_data) - 300001 - lookback) // batch_size
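Plugging in the constants used here (and the 420551 data lines counted earlier), these step counts work out as follows:

```python
lookback, batch_size = 1440, 128
num_lines = 420551  # total data rows, as printed above

val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (num_lines - 300001 - lookback) // batch_size
print(val_steps, test_steps)  # 769 930
```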
# Common-sense baseline: always predict that the temperature 24 hours from now
# equals the temperature right now; report the mean absolute error (MAE)
def evaluate_naive_method():
    batch_maes = []
    for step in range(val_steps):
        samples, targets = next(val_gen)
        preds = samples[:, -1, 1]
        mae = np.mean(np.abs(preds - targets))
        batch_maes.append(mae)
    print(np.mean(batch_maes))
evaluate_naive_method()
0.2897359729905486
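The data was normalized, so this MAE of about 0.29 is expressed in standard-deviation units. To read it in degrees Celsius, multiply back by the temperature standard deviation; the value 8.85 below is an assumed approximation of std[1], not a number taken from this run:

```python
# Convert the normalized MAE back into degrees Celsius.
normalized_mae = 0.2897359729905486   # naive-method result from above
temperature_std = 8.85                # assumption: approximate std of T (degC); use std[1] in practice
celsius_mae = normalized_mae * temperature_std
print(round(celsius_mae, 2))          # about 2.56 degrees C
```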
A basic machine-learning approach
In the same way that it is useful to establish a common-sense baseline before trying machine-learning approaches, it is useful to try simple, cheap machine-learning models (such as small, densely connected networks) before looking into complicated and computationally expensive models such as RNNs. This is the best way to make sure that any further complexity we throw at the problem is legitimate and delivers a real benefit.
Here is a simple fully connected model: we flatten the data, then run it through two dense layers. Note the lack of an activation function on the last dense layer, which is typical for a regression problem. We use MAE as the loss. Because we are evaluating on exactly the same data and with exactly the same metric as with our common-sense approach, the results are directly comparable.
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
model = Sequential()
model.add(layers.Flatten(input_shape=(lookback // step, float_data.shape[-1])))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit_generator(train_gen,
steps_per_epoch=500,
epochs=20,
validation_data=val_gen,
validation_steps=val_steps)
Epoch 1/20
500/500 [==============================] - 17s 33ms/step - loss: 1.2541 - val_loss: 0.8818
Epoch 2/20
500/500 [==============================] - 14s 28ms/step - loss: 0.3955 - val_loss: 0.3018
Epoch 3/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2839 - val_loss: 0.3041
Epoch 4/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2639 - val_loss: 0.3117
Epoch 5/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2527 - val_loss: 0.3083
Epoch 6/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2427 - val_loss: 0.3229
Epoch 7/20
500/500 [==============================] - 15s 29ms/step - loss: 0.2366 - val_loss: 0.3242
Epoch 8/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2295 - val_loss: 0.3819
Epoch 9/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2245 - val_loss: 0.3353
Epoch 10/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2196 - val_loss: 0.3321
Epoch 11/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2173 - val_loss: 0.3269
Epoch 12/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2138 - val_loss: 0.3268
Epoch 13/20
500/500 [==============================] - 15s 29ms/step - loss: 0.2110 - val_loss: 0.3779
Epoch 14/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2092 - val_loss: 0.3387
Epoch 15/20
500/500 [==============================] - 14s 29ms/step - loss: 0.2060 - val_loss: 0.3413
Epoch 16/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2034 - val_loss: 0.3350
Epoch 17/20
500/500 [==============================] - 14s 28ms/step - loss: 0.2040 - val_loss: 0.3388
Epoch 18/20
500/500 [==============================] - 14s 29ms/step - loss: 0.2006 - val_loss: 0.3414
Epoch 19/20
500/500 [==============================] - 14s 29ms/step - loss: 0.1999 - val_loss: 0.3376
Epoch 20/20
500/500 [==============================] - 14s 28ms/step - loss: 0.1976 - val_loss: 0.3761
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1,len(loss) + 1)
plt.figure()
plt.plot(epochs,loss,'r',label = 'Training loss')
plt.plot(epochs,val_loss,'b',label = 'Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
GRU (gated recurrent unit)
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
model = Sequential()
model.add(layers.GRU(32,input_shape = (None,float_data.shape[-1])))
model.add(layers.Dense(1))
model.compile(optimizer = RMSprop(),loss = 'mae')
history = model.fit_generator(train_gen,steps_per_epoch = 500,epochs = 20,validation_data = val_gen,validation_steps = val_steps)
Epoch 1/20
500/500 [==============================] - 193s 387ms/step - loss: 0.3093 - val_loss: 0.2794
Epoch 2/20
500/500 [==============================] - 194s 388ms/step - loss: 0.2839 - val_loss: 0.2691
Epoch 3/20
500/500 [==============================] - 192s 384ms/step - loss: 0.2798 - val_loss: 0.2664
Epoch 4/20
500/500 [==============================] - 192s 384ms/step - loss: 0.2742 - val_loss: 0.2620
Epoch 5/20
500/500 [==============================] - 191s 382ms/step - loss: 0.2713 - val_loss: 0.2636
Epoch 6/20
500/500 [==============================] - 191s 382ms/step - loss: 0.2705 - val_loss: 0.2641
Epoch 7/20
500/500 [==============================] - 191s 381ms/step - loss: 0.2638 - val_loss: 0.2628
Epoch 8/20
500/500 [==============================] - 190s 381ms/step - loss: 0.2627 - val_loss: 0.2640
Epoch 9/20
500/500 [==============================] - 191s 383ms/step - loss: 0.2571 - val_loss: 0.2658
Epoch 10/20
500/500 [==============================] - 191s 382ms/step - loss: 0.2544 - val_loss: 0.2706
Epoch 11/20
500/500 [==============================] - 203s 407ms/step - loss: 0.2512 - val_loss: 0.2722
Epoch 12/20
500/500 [==============================] - 210s 419ms/step - loss: 0.2476 - val_loss: 0.2780
Epoch 13/20
500/500 [==============================] - 226s 452ms/step - loss: 0.2431 - val_loss: 0.2772
Epoch 14/20
500/500 [==============================] - 211s 422ms/step - loss: 0.2372 - val_loss: 0.2767
Epoch 15/20
500/500 [==============================] - 208s 415ms/step - loss: 0.2340 - val_loss: 0.2814
Epoch 16/20
500/500 [==============================] - 197s 395ms/step - loss: 0.2299 - val_loss: 0.2833
Epoch 17/20
500/500 [==============================] - 194s 388ms/step - loss: 0.2277 - val_loss: 0.2860
Epoch 18/20
500/500 [==============================] - 198s 396ms/step - loss: 0.2237 - val_loss: 0.2955
Epoch 19/20
500/500 [==============================] - 200s 400ms/step - loss: 0.2195 - val_loss: 0.3060
Epoch 20/20
500/500 [==============================] - 193s 387ms/step - loss: 0.2174 - val_loss: 0.2908
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1,len(loss) + 1)
plt.figure()
plt.plot(epochs,loss,'r',label = 'Training loss')
plt.plot(epochs,val_loss,'b',label = 'Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
1. Recurrent dropout, to reduce overfitting
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
model = Sequential()
model.add(layers.GRU(32,dropout = 0.2,recurrent_dropout = 0.2,input_shape = (None,float_data.shape[-1])))
model.add(layers.Dense(1))
model.compile(optimizer = RMSprop(),loss = 'mae')
history = model.fit_generator(train_gen,steps_per_epoch = 500,epochs = 40,validation_data = val_gen,validation_steps = val_steps)
Epoch 1/40
500/500 [==============================] - 232s 464ms/step - loss: 0.3415 - val_loss: 0.2788
Epoch 2/40
500/500 [==============================] - 231s 463ms/step - loss: 0.3174 - val_loss: 0.2724
Epoch 3/40
500/500 [==============================] - 249s 498ms/step - loss: 0.3101 - val_loss: 0.2701
Epoch 4/40
500/500 [==============================] - 246s 491ms/step - loss: 0.3074 - val_loss: 0.2726
Epoch 5/40
500/500 [==============================] - 222s 444ms/step - loss: 0.3025 - val_loss: 0.2694
Epoch 6/40
500/500 [==============================] - 252s 504ms/step - loss: 0.2987 - val_loss: 0.2659
Epoch 7/40
500/500 [==============================] - 252s 503ms/step - loss: 0.2959 - val_loss: 0.2695
Epoch 8/40
500/500 [==============================] - 236s 471ms/step - loss: 0.2980 - val_loss: 0.2657
Epoch 9/40
500/500 [==============================] - 247s 493ms/step - loss: 0.2941 - val_loss: 0.2634
Epoch 10/40
500/500 [==============================] - 247s 495ms/step - loss: 0.2906 - val_loss: 0.2654
Epoch 11/40
500/500 [==============================] - 243s 485ms/step - loss: 0.2899 - val_loss: 0.2622
Epoch 12/40
500/500 [==============================] - 241s 482ms/step - loss: 0.2893 - val_loss: 0.2630
Epoch 13/40
500/500 [==============================] - 247s 494ms/step - loss: 0.2879 - val_loss: 0.2675
Epoch 14/40
500/500 [==============================] - 239s 479ms/step - loss: 0.2858 - val_loss: 0.2643
Epoch 15/40
500/500 [==============================] - 233s 467ms/step - loss: 0.2860 - val_loss: 0.2638
Epoch 16/40
500/500 [==============================] - 232s 463ms/step - loss: 0.2836 - val_loss: 0.2715
Epoch 17/40
500/500 [==============================] - 221s 443ms/step - loss: 0.2832 - val_loss: 0.2610
Epoch 18/40
500/500 [==============================] - 228s 456ms/step - loss: 0.2839 - val_loss: 0.2634
Epoch 19/40
500/500 [==============================] - 251s 502ms/step - loss: 0.2809 - val_loss: 0.2663
Epoch 20/40
500/500 [==============================] - 233s 465ms/step - loss: 0.2825 - val_loss: 0.2639
Epoch 21/40
500/500 [==============================] - 222s 443ms/step - loss: 0.2794 - val_loss: 0.2625
Epoch 22/40
500/500 [==============================] - 221s 442ms/step - loss: 0.2782 - val_loss: 0.2620
Epoch 23/40
500/500 [==============================] - 266s 532ms/step - loss: 0.2769 - val_loss: 0.2615
Epoch 24/40
500/500 [==============================] - 241s 482ms/step - loss: 0.2778 - val_loss: 0.2616
Epoch 25/40
500/500 [==============================] - 256s 513ms/step - loss: 0.2778 - val_loss: 0.2633
Epoch 26/40
500/500 [==============================] - 246s 492ms/step - loss: 0.2767 - val_loss: 0.2754
Epoch 27/40
500/500 [==============================] - 232s 465ms/step - loss: 0.2760 - val_loss: 0.2628
Epoch 28/40
500/500 [==============================] - 225s 451ms/step - loss: 0.2741 - val_loss: 0.2721
Epoch 29/40
500/500 [==============================] - 233s 466ms/step - loss: 0.2753 - val_loss: 0.2628
Epoch 30/40
500/500 [==============================] - 227s 455ms/step - loss: 0.2732 - val_loss: 0.2645
Epoch 31/40
500/500 [==============================] - 251s 502ms/step - loss: 0.2738 - val_loss: 0.2619
Epoch 32/40
500/500 [==============================] - 253s 505ms/step - loss: 0.2737 - val_loss: 0.2632
Epoch 33/40
500/500 [==============================] - 244s 488ms/step - loss: 0.2736 - val_loss: 0.2640
Epoch 34/40
500/500 [==============================] - 258s 515ms/step - loss: 0.2715 - val_loss: 0.2644
Epoch 35/40
500/500 [==============================] - 239s 478ms/step - loss: 0.2715 - val_loss: 0.2650
Epoch 36/40
500/500 [==============================] - 235s 471ms/step - loss: 0.2694 - val_loss: 0.2766
Epoch 37/40
500/500 [==============================] - 244s 489ms/step - loss: 0.2711 - val_loss: 0.2660
Epoch 38/40
500/500 [==============================] - 223s 446ms/step - loss: 0.2707 - val_loss: 0.2632
Epoch 39/40
500/500 [==============================] - 228s 456ms/step - loss: 0.2702 - val_loss: 0.2668
Epoch 40/40
500/500 [==============================] - 222s 443ms/step - loss: 0.2700 - val_loss: 0.2678
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1,len(loss) + 1)
plt.figure()
plt.plot(epochs,loss,'r',label = 'Training loss')
plt.plot(epochs,val_loss,'b',label = 'Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
2. Stacking recurrent layers: once overfitting is under control, increase the network's capacity to improve accuracy
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
model = Sequential()
model.add(layers.GRU(32,dropout = 0.1,recurrent_dropout = 0.5,return_sequences = True,input_shape = (None,float_data.shape[-1])))
model.add(layers.GRU(64,activation = 'relu',dropout = 0.1,recurrent_dropout = 0.5))
model.add(layers.Dense(1))
model.compile(optimizer = RMSprop(),loss = 'mae')
history = model.fit_generator(train_gen,steps_per_epoch = 500,epochs = 40,validation_data = val_gen,validation_steps = val_steps)
Epoch 1/40
500/500 [==============================] - 472s 944ms/step - loss: 0.3356 - val_loss: 0.2820
Epoch 2/40
500/500 [==============================] - 473s 947ms/step - loss: 0.3141 - val_loss: 0.2749
Epoch 3/40
500/500 [==============================] - 498s 995ms/step - loss: 0.3063 - val_loss: 0.2710
Epoch 4/40
500/500 [==============================] - 480s 960ms/step - loss: 0.2996 - val_loss: 0.2674
Epoch 5/40
500/500 [==============================] - 477s 955ms/step - loss: 0.2954 - val_loss: 0.2676
Epoch 6/40
500/500 [==============================] - 483s 965ms/step - loss: 0.2955 - val_loss: 0.2664
Epoch 7/40
500/500 [==============================] - 474s 948ms/step - loss: 0.2893 - val_loss: 0.2647
Epoch 8/40
500/500 [==============================] - 465s 930ms/step - loss: 0.2876 - val_loss: 0.2726
Epoch 9/40
500/500 [==============================] - 479s 958ms/step - loss: 0.2869 - val_loss: 0.2654
Epoch 10/40
500/500 [==============================] - 460s 920ms/step - loss: 0.2819 - val_loss: 0.2639
Epoch 11/40
500/500 [==============================] - 465s 930ms/step - loss: 0.2813 - val_loss: 0.2644
Epoch 12/40
500/500 [==============================] - 465s 931ms/step - loss: 0.2791 - val_loss: 0.2632
Epoch 13/40
500/500 [==============================] - 463s 927ms/step - loss: 0.2787 - val_loss: 0.2678
Epoch 14/40
500/500 [==============================] - 473s 947ms/step - loss: 0.2777 - val_loss: 0.2632
Epoch 15/40
500/500 [==============================] - 506s 1s/step - loss: 0.2756 - val_loss: 0.2722
Epoch 16/40
500/500 [==============================] - 493s 986ms/step - loss: 0.2741 - val_loss: 0.2646
Epoch 17/40
500/500 [==============================] - 459s 919ms/step - loss: 0.2724 - val_loss: 0.2652
Epoch 18/40
500/500 [==============================] - 461s 923ms/step - loss: 0.2720 - val_loss: 0.2697
Epoch 19/40
500/500 [==============================] - 472s 943ms/step - loss: 0.2705 - val_loss: 0.2672
Epoch 20/40
500/500 [==============================] - 463s 927ms/step - loss: 0.2673 - val_loss: 0.2659
Epoch 21/40
500/500 [==============================] - 459s 918ms/step - loss: 0.2684 - val_loss: 0.2706
Epoch 22/40
500/500 [==============================] - 456s 913ms/step - loss: 0.2661 - val_loss: 0.2629
Epoch 23/40
500/500 [==============================] - 456s 913ms/step - loss: 0.2668 - val_loss: 0.2664
Epoch 24/40
500/500 [==============================] - 458s 915ms/step - loss: 0.2643 - val_loss: 0.2658
Epoch 25/40
500/500 [==============================] - 468s 936ms/step - loss: 0.2656 - val_loss: 0.2674
Epoch 26/40
500/500 [==============================] - 456s 913ms/step - loss: 0.2639 - val_loss: 0.2683
Epoch 27/40
500/500 [==============================] - 457s 915ms/step - loss: 0.2625 - val_loss: 0.2717
Epoch 28/40
500/500 [==============================] - 456s 912ms/step - loss: 0.2610 - val_loss: 0.2702
Epoch 29/40
500/500 [==============================] - 459s 917ms/step - loss: 0.2618 - val_loss: 0.2728
Epoch 30/40
500/500 [==============================] - 463s 925ms/step - loss: 0.2594 - val_loss: 0.2735
Epoch 31/40
500/500 [==============================] - 467s 934ms/step - loss: 0.2595 - val_loss: 0.2723
Epoch 32/40
500/500 [==============================] - 462s 924ms/step - loss: 0.2589 - val_loss: 0.2702
Epoch 33/40
500/500 [==============================] - 456s 912ms/step - loss: 0.2583 - val_loss: 0.2744
Epoch 34/40
500/500 [==============================] - 455s 909ms/step - loss: 0.2578 - val_loss: 0.2786
Epoch 35/40
500/500 [==============================] - 463s 927ms/step - loss: 0.2562 - val_loss: 0.2692
Epoch 36/40
500/500 [==============================] - 507s 1s/step - loss: 0.2565 - val_loss: 0.2756
Epoch 37/40
500/500 [==============================] - 469s 938ms/step - loss: 0.2564 - val_loss: 0.2701
Epoch 38/40
500/500 [==============================] - 457s 914ms/step - loss: 0.2560 - val_loss: 0.2716
Epoch 39/40
500/500 [==============================] - 458s 916ms/step - loss: 0.2538 - val_loss: 0.2725
Epoch 40/40
500/500 [==============================] - 469s 938ms/step - loss: 0.2539 - val_loss: 0.2734
3. Bidirectional RNNs: capture the information in the reversed sequence and merge it with the chronological representation, to improve accuracy
# Example: train and evaluate an LSTM on IMDB using reversed sequences
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras import layers
from keras.models import Sequential
max_features = 1000 # number of words to consider as features
maxlen = 500 # cut texts after this many words (among the max_features most common words)
(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words = max_features)
# Reverse the training and test sequences
x_train = [x[::-1] for x in x_train]
x_test = [x[::-1] for x in x_test]
x_train = sequence.pad_sequences(x_train,maxlen = maxlen)
x_test = sequence.pad_sequences(x_test,maxlen = maxlen)
model = Sequential()
model.add(layers.Embedding(max_features,128))
model.add(layers.LSTM(32))
model.add(layers.Dense(1,activation = 'sigmoid'))
model.compile(optimizer = 'rmsprop',loss = 'binary_crossentropy',metrics = ['acc'])
history = model.fit(x_train,y_train,epochs = 10,batch_size = 128,validation_split = 0.2)
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 133s 7ms/step - loss: 0.5582 - acc: 0.7119 - val_loss: 0.6260 - val_acc: 0.7276
Epoch 2/10
20000/20000 [==============================] - 131s 7ms/step - loss: 0.4317 - acc: 0.8165 - val_loss: 0.4199 - val_acc: 0.8206
Epoch 3/10
20000/20000 [==============================] - 130s 7ms/step - loss: 0.3997 - acc: 0.8377 - val_loss: 0.3784 - val_acc: 0.8426
Epoch 4/10
20000/20000 [==============================] - 137s 7ms/step - loss: 0.3829 - acc: 0.8447 - val_loss: 0.4548 - val_acc: 0.8210
Epoch 5/10
20000/20000 [==============================] - 129s 6ms/step - loss: 0.3683 - acc: 0.8516 - val_loss: 0.3782 - val_acc: 0.8524
Epoch 6/10
20000/20000 [==============================] - 128s 6ms/step - loss: 0.3568 - acc: 0.8572 - val_loss: 0.4300 - val_acc: 0.8066
Epoch 7/10
20000/20000 [==============================] - 132s 7ms/step - loss: 0.3432 - acc: 0.8602 - val_loss: 0.3538 - val_acc: 0.8558
Epoch 8/10
20000/20000 [==============================] - 136s 7ms/step - loss: 0.3336 - acc: 0.8663 - val_loss: 0.4279 - val_acc: 0.8110
Epoch 9/10
20000/20000 [==============================] - 139s 7ms/step - loss: 0.3286 - acc: 0.8684 - val_loss: 0.3807 - val_acc: 0.8400
Epoch 10/10
20000/20000 [==============================] - 133s 7ms/step - loss: 0.3103 - acc: 0.8767 - val_loss: 0.4236 - val_acc: 0.8402
We get performance nearly identical to that of the chronological-order LSTM tried in the previous section.
Remarkably, on such a text dataset, reversed-order processing works just as well as chronological processing, which confirms the hypothesis that, although word order does matter in understanding language, which order you use isn't crucial. Importantly, an RNN trained on reversed sequences will learn different representations than one trained on the original sequences, much as you would have a quite different mental model if time flowed backward in the real world: a life in which you died on your first day and were born on your last. In machine learning, representations that are different yet useful are always worth exploiting, and the more they differ, the better: they offer a new angle from which to look at the data, capturing aspects that other approaches missed, and can thus improve performance on a task. This is the intuition behind ensembling, a concept we will introduce in the next chapter.
To instantiate a bidirectional RNN in Keras, you use the Bidirectional layer, which takes as its first argument a recurrent-layer instance. Bidirectional creates a second, separate instance of this recurrent layer and uses one instance to process the input sequence in chronological order and the other to process it in reverse order. Let's try it on the IMDB sentiment-analysis task:
model = Sequential()
model.add(layers.Embedding(max_features,32))
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(1,activation = 'sigmoid'))
model.compile(optimizer = 'rmsprop',loss = 'binary_crossentropy',metrics = ['acc'])
history = model.fit(x_train,y_train,epochs = 10,batch_size = 128,validation_split = 0.2)
# A bidirectional layer has twice as many parameters as a chronological-order layer, so it overfits quickly; with some regularization, the bidirectional layer should noticeably outperform the unidirectional one
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 262s 13ms/step - loss: 0.6086 - acc: 0.6662 - val_loss: 0.5476 - val_acc: 0.7258
Epoch 2/10
20000/20000 [==============================] - 264s 13ms/step - loss: 0.4401 - acc: 0.8118 - val_loss: 0.4585 - val_acc: 0.7962
Epoch 3/10
20000/20000 [==============================] - 256s 13ms/step - loss: 0.3871 - acc: 0.8405 - val_loss: 0.5309 - val_acc: 0.7984
Epoch 4/10
20000/20000 [==============================] - 285s 14ms/step - loss: 0.3813 - acc: 0.8423 - val_loss: 0.4563 - val_acc: 0.8344
Epoch 5/10
20000/20000 [==============================] - 304s 15ms/step - loss: 0.3744 - acc: 0.8474 - val_loss: 0.3959 - val_acc: 0.8284
Epoch 6/10
20000/20000 [==============================] - 262s 13ms/step - loss: 0.3631 - acc: 0.8541 - val_loss: 0.4096 - val_acc: 0.8432
Epoch 7/10
20000/20000 [==============================] - 253s 13ms/step - loss: 0.3524 - acc: 0.8618 - val_loss: 0.4814 - val_acc: 0.7588
Epoch 8/10
20000/20000 [==============================] - 250s 13ms/step - loss: 0.3519 - acc: 0.8601 - val_loss: 0.6171 - val_acc: 0.7120
Epoch 9/10
20000/20000 [==============================] - 252s 13ms/step - loss: 0.3505 - acc: 0.8592 - val_loss: 0.3733 - val_acc: 0.8468
Epoch 10/10
20000/20000 [==============================] - 252s 13ms/step - loss: 0.3418 - acc: 0.8626 - val_loss: 0.3599 - val_acc: 0.8492
# Apply the same approach (a bidirectional GRU) to the temperature-forecasting task
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
model = Sequential()
model.add(layers.Bidirectional(layers.GRU(32),input_shape = (None,float_data.shape[-1])))
model.add(layers.Dense(1))
model.compile(optimizer = RMSprop(),loss = 'mae')
history = model.fit_generator(train_gen,steps_per_epoch = 500,epochs = 40,validation_data = val_gen,validation_steps = val_steps)
Epoch 1/40
500/500 [==============================] - 396s 792ms/step - loss: 0.2930 - val_loss: 0.2739
Epoch 2/40
500/500 [==============================] - 399s 799ms/step - loss: 0.2743 - val_loss: 0.2701
Epoch 3/40
500/500 [==============================] - 387s 774ms/step - loss: 0.2665 - val_loss: 0.2687
Epoch 4/40
500/500 [==============================] - 386s 772ms/step - loss: 0.2628 - val_loss: 0.2712
Epoch 5/40
500/500 [==============================] - 386s 772ms/step - loss: 0.2552 - val_loss: 0.2714
Epoch 6/40
500/500 [==============================] - 387s 773ms/step - loss: 0.2490 - val_loss: 0.2714
Epoch 7/40
500/500 [==============================] - 387s 775ms/step - loss: 0.2452 - val_loss: 0.2796
Epoch 8/40
500/500 [==============================] - 397s 794ms/step - loss: 0.2399 - val_loss: 0.2786
Epoch 9/40
500/500 [==============================] - 386s 773ms/step - loss: 0.2333 - val_loss: 0.2822
Epoch 10/40
500/500 [==============================] - 387s 774ms/step - loss: 0.2286 - val_loss: 0.2846
Epoch 11/40
500/500 [==============================] - 387s 773ms/step - loss: 0.2225 - val_loss: 0.2881
Epoch 12/40
500/500 [==============================] - 387s 773ms/step - loss: 0.2167 - val_loss: 0.2914
Epoch 13/40
500/500 [==============================] - 387s 773ms/step - loss: 0.2101 - val_loss: 0.2941
Epoch 14/40
500/500 [==============================] - 391s 782ms/step - loss: 0.2071 - val_loss: 0.2999
Epoch 15/40
500/500 [==============================] - 392s 783ms/step - loss: 0.2015 - val_loss: 0.3021
Epoch 16/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1966 - val_loss: 0.3073
Epoch 17/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1907 - val_loss: 0.3065
Epoch 18/40
500/500 [==============================] - 387s 775ms/step - loss: 0.1884 - val_loss: 0.3151
Epoch 19/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1845 - val_loss: 0.3168
Epoch 20/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1801 - val_loss: 0.3236
Epoch 21/40
500/500 [==============================] - 395s 790ms/step - loss: 0.1764 - val_loss: 0.3139
Epoch 22/40
500/500 [==============================] - 386s 773ms/step - loss: 0.1733 - val_loss: 0.3224
Epoch 23/40
500/500 [==============================] - 386s 773ms/step - loss: 0.1699 - val_loss: 0.3199
Epoch 24/40
500/500 [==============================] - 387s 773ms/step - loss: 0.1663 - val_loss: 0.3273
Epoch 25/40
500/500 [==============================] - 389s 777ms/step - loss: 0.1645 - val_loss: 0.3258
Epoch 26/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1615 - val_loss: 0.3275
Epoch 27/40
500/500 [==============================] - 399s 797ms/step - loss: 0.1589 - val_loss: 0.3309
Epoch 28/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1553 - val_loss: 0.3300
Epoch 29/40
500/500 [==============================] - 388s 776ms/step - loss: 0.1536 - val_loss: 0.3370
Epoch 30/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1527 - val_loss: 0.3322
Epoch 31/40
500/500 [==============================] - 387s 773ms/step - loss: 0.1507 - val_loss: 0.3339
Epoch 32/40
500/500 [==============================] - 386s 773ms/step - loss: 0.1479 - val_loss: 0.3392
Epoch 33/40
500/500 [==============================] - 392s 784ms/step - loss: 0.1463 - val_loss: 0.3361
Epoch 34/40
500/500 [==============================] - 388s 776ms/step - loss: 0.1451 - val_loss: 0.3395
Epoch 35/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1434 - val_loss: 0.3397
Epoch 36/40
500/500 [==============================] - 387s 774ms/step - loss: 0.1422 - val_loss: 0.3376
Epoch 37/40
500/500 [==============================] - 387s 773ms/step - loss: 0.1416 - val_loss: 0.3373
Epoch 38/40
500/500 [==============================] - 387s 773ms/step - loss: 0.1389 - val_loss: 0.3400
Epoch 39/40
500/500 [==============================] - 387s 773ms/step - loss: 0.1374 - val_loss: 0.3404
Epoch 40/40
500/500 [==============================] - 388s 776ms/step - loss: 0.1369 - val_loss: 0.3475
This model performs about as well as the plain GRU layer. The reason is easy to understand: the reversed-order half contributes almost nothing here, and essentially all of the predictive power comes from the chronological half of the network.
Summary
1. As you first learned in chapter 4, when approaching a new problem, first establish a common-sense baseline for your metric of choice. If you don't have a baseline to beat, you can't tell whether you're making any real progress.
2. Try simple models before expensive ones; sometimes a simple model will turn out to be your best option.
3. On data where temporal ordering matters, recurrent networks are a great fit and easily outperform models that first flatten the temporal data.
4. To use dropout with recurrent networks, you should use a time-constant dropout mask together with a recurrent dropout mask. Both are built into Keras recurrent layers, so all you have to do is use the dropout and recurrent_dropout arguments of recurrent layers.
5. Stacked RNNs provide more representational power than a single RNN layer, but they are also much more expensive to compute.
6. Bidirectional RNNs, which look at a sequence both ways, are useful on natural-language-processing problems. But if the most recent data points in the sequence carry more information than the beginning of the sequence, this approach yields little benefit.