reference:
- https://tensorflow.google.cn/tutorials/structured_data/time_series
1. Time series forecasting
1.1 The dataset
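import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

# Show all columns (None means no limit; a number can be set instead)
pd.set_option('display.max_columns', None)
# Do not wrap wide DataFrames across lines (False disables wrapping, True enables it)
pd.set_option('expand_frame_repr', False)

def loadWeatherData():
    # If the download fails, place a previously downloaded copy of the dataset
    # in the Keras dataset directory (xx\.keras\datasets\)
    zip_path = tf.keras.utils.get_file(
        origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
        fname='jena_climate_2009_2016.csv.zip',
        extract=True)
    csv_path, _ = os.path.splitext(zip_path)
    print(csv_path)
    # Read the CSV file with pandas
    df = pd.read_csv(csv_path)
    print(df.head())
    return df

# Keep the DataFrame for the steps below
df = loadWeatherData()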
As shown above, an observation is recorded every 10 minutes, so there are 6 observations per hour and 144 (6x24) observations per day.
Suppose that, given a particular time, you want to predict the temperature 6 hours in the future. To make this prediction, you choose to use 5 days of observations, so each training window contains the last 720 (5x144) observations. Many such configurations are possible, which makes this dataset a good one to experiment with.
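The window arithmetic above can be checked with a few lines. This is only a minimal sketch; the constant and variable names are illustrative and do not appear in the original code.

```python
# The dataset has one observation every 10 minutes.
MINUTES_PER_STEP = 10  # illustrative name, not from the original code

steps_per_hour = 60 // MINUTES_PER_STEP   # 6 observations per hour
steps_per_day = 24 * steps_per_hour       # 144 observations per day
history_steps = 5 * steps_per_day         # 5 days of history -> 720 observations
target_steps = 6 * steps_per_hour         # 6 hours ahead -> 36 steps

print(steps_per_hour, steps_per_day, history_steps, target_steps)
```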
The following function returns the time windows described above for the model to train on. The parameter history_size is the size of the past window of information. The parameter target_size is how far into the future the model needs to predict; the value at that point is the label.
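def univariate_data(dataset, start_index, end_index, history_size, target_size):
    '''
    dataset: 1-D sequence of values (NumPy array or pandas Series)
    start_index: position of the first usable observation
    end_index: position to stop at (exclusive); None means run to the end of the data
    history_size: number of past observations in each window
    target_size: offset of the label from the first step after the window (0 is the very next observation)
    @return: features, labels
    '''
    # Work on a NumPy array so that the indexing below is by position
    dataset = np.asarray(dataset)
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        # Reshape data from (history_size,) to (history_size, 1)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i+target_size])
    return np.array(data), np.array(labels)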
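To see what the windowing produces, here is a small sketch on a toy array. So that the snippet runs on its own, it restates the logic in a compact helper called `make_windows`; that name is hypothetical and not part of the original code.

```python
import numpy as np

def make_windows(series, history_size, target_size):
    """Compact restatement of the windowing above (hypothetical helper)."""
    series = np.asarray(series)
    data, labels = [], []
    for i in range(history_size, len(series) - target_size):
        # Each feature window holds the history_size values before position i
        data.append(series[i - history_size:i].reshape(history_size, 1))
        # The label lies target_size steps after position i
        labels.append(series[i + target_size])
    return np.array(data), np.array(labels)

toy = np.arange(10.0)  # 0.0, 1.0, ..., 9.0
x, y = make_windows(toy, history_size=3, target_size=0)

print(x.shape, y.shape)    # (7, 3, 1) (7,)
print(x[0].ravel(), y[0])  # [0. 1. 2.] 3.0
```

With target_size=0, the label is the observation immediately after the window, which is the setting used for the single-step prediction in this section.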
1.2 Univariate forecasting
1.2.1 Extracting a single feature (temperature in this case)
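# Extract a single variable, here the temperature, to use for forecasting
uni_data = df['T (degC)']
# A Series is a linear data structure: a one-dimensional array
# By default pandas uses 0 to n-1 as the Series index, but you can set your own
# (think of the index as the keys of a dict)
uni_data.index = df['Date Time']
print(type(uni_data), '\n', uni_data.head())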
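The comment about the index behaving like dict keys can be illustrated with a tiny Series. This is a sketch with made-up values and timestamp strings written in the style of the 'Date Time' column; none of the numbers are taken from the dataset.

```python
import pandas as pd

# Illustrative values only, not taken from the dataset
temps = pd.Series([-8.0, -8.4, -8.5])
print(list(temps.index))  # default index: [0, 1, 2]

# Replace the default index with timestamp strings
temps.index = ['01.01.2009 00:10:00', '01.01.2009 00:20:00', '01.01.2009 00:30:00']

print(temps['01.01.2009 00:20:00'])  # lookup by label, like a dict key
print(temps.iloc[1])                 # lookup by position still works via .iloc
```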
Data visualization
# Plot the temperature over time
uni_data.plot(subplots=True)
plt.show()
1.2.2 Standardizing the dataset
Note: The mean and standard deviation should only be computed using the training data.
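# Use the first 300,000 rows (about 2,100 days) for training, as in the referenced tutorial
TRAIN_SPLIT = 300000
# Mean and standard deviation of the training data only
uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()
# Standardize, then convert to a NumPy array so it can be indexed by position
uni_data = ((uni_data - uni_train_mean) / uni_train_std).values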
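The reason for using only the training portion is that statistics computed over the whole series would let information from the validation period leak into the inputs. A minimal sketch on toy data follows; the array and the `train_split` value are made up for illustration.

```python
import numpy as np

# Toy data: the last point belongs to the validation part and is an outlier
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
train_split = 4  # hypothetical split position for this toy array

# Statistics come from the training slice only
train_mean = values[:train_split].mean()  # 2.5
train_std = values[:train_split].std()

# The same training statistics are applied to the whole series
scaled = (values - train_mean) / train_std

# The training part now has mean ~0 and standard deviation ~1
print(scaled[:train_split].mean(), scaled[:train_split].std())
```

Had the outlier been included in the statistics, the training values would have been squeezed into a much narrower range, which is the kind of leakage the note above warns about.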
1.2.3 Splitting into training and validation sets
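# Window settings from the referenced tutorial: 20 past observations,
# and the label is the observation right after the window
univariate_past_history = 20
univariate_future_target = 0
# Training set
x_train_uni, y_train_uni = univariate_data(uni_data, 0, TRAIN_SPLIT,
                                           univariate_past_history,
                                           univariate_future_target)
# Validation set
x_val_uni, y_val_uni = univariate_data(uni_data, TRAIN_SPLIT, None,
                                       univariate_past_history,
                                       univariate_future_target)
print('Single window of past history')
print(x_train_uni[0])
print('\n Target temperature to predict')
print(y_train_uni[0])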
In the plot produced by the code below, the blue line is the history given to the network, and the red cross marks the value to be predicted.
def create_time_steps(length):
    return list(range(-length, 0))

def baseline(history):
    return np.mean(history)

def show_plot(plot_data, delta, title):
    labels = ['History', 'True Future', 'Model Prediction']
    marker = ['.-', 'rx', 'go']
    time_steps = create_time_steps(plot_data[0].shape[0])
    if delta:
        future = delta
    else:
        future = 0
    plt.title(title)
    for i, x in enumerate(plot_data):
        if i:
            plt.plot(future, plot_data[i], marker[i], markersize=10,
                     label=labels[i])
        else:
            plt.plot(time_steps, plot_data[i].flatten(), marker[i], label=labels[i])
    plt.legend()
    plt.xlim([time_steps[0], (future+5)*2])
    plt.xlabel('Time-Step')
    return plt

show_plot([x_train_uni[0], y_train_uni[0]], 0, 'Sample Example')
plt.show()
Before training a model, let's set a simple baseline. Given an input point, the baseline method looks at all of the history in the window and predicts that the next point will be the average of the last 20 observations.
show_plot([x_train_uni[0], y_train_uni[0], baseline(x_train_uni[0])], 0,
          'Baseline Prediction Example')
plt.show()
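A single plot shows one window only. To put a number on the baseline, one option is to average its absolute error over many windows. The sketch below does this on a synthetic sine wave rather than the weather data; the choice of mean absolute error and the synthetic series are assumptions made for illustration.

```python
import numpy as np

def baseline(history):
    # Same rule as above: predict the mean of the history window
    return np.mean(history)

history_size = 20
# Synthetic stand-in for the temperature series (assumption, not the real data)
series = np.sin(np.linspace(0, 20, 500))

# One window of 20 past values per target; the target is the next value
windows = np.array([series[i - history_size:i]
                    for i in range(history_size, len(series))])
targets = series[history_size:]

preds = np.array([baseline(w) for w in windows])
mae = np.mean(np.abs(preds - targets))
print(f'Baseline MAE on the synthetic series: {mae:.3f}')
```

The same loop could be run over x_val_uni and y_val_uni to get a reference error that a trained model should improve on.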
Let's see whether a recurrent neural network can beat this baseline.