Time Series Prediction

Reference:

  • https://tensorflow.google.cn/tutorials/structured_data/time_series

1 Time series prediction

1.1 The dataset

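import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

# Show all columns (None means no limit; a number can be set instead)
pd.set_option('display.max_columns', None)
# Do not wrap wide frames across lines (False disables wrapping, True enables it)
pd.set_option('expand_frame_repr', False)

def loadWeatherData():
    # If the direct download fails, place a pre-downloaded copy of the dataset
    # in the Keras dataset directory (xx\.keras\datasets\)
    zip_path = tf.keras.utils.get_file(
        origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
        fname='jena_climate_2009_2016.csv.zip',
        extract=True)
    csv_path, _ = os.path.splitext(zip_path)
    print(csv_path)

    # Read the CSV file with pandas
    df = pd.read_csv(csv_path)
    print(df.head())
    return df

# The rest of the article uses this DataFrame
df = loadWeatherData()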

One record every 10 minutes

As the printed rows show, an observation is recorded every 10 minutes. This means there are six observations per hour and 144 (6x24) observations per day.

Suppose that, at a given time, you want to predict the temperature 6 hours into the future. To make this prediction, you choose to use five days of observations, so you create a window containing the last 720 (5x144) observations to train the model. Many such configurations are possible, which makes this dataset a good one to experiment with.
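The window arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the 36-step horizon is simply 6 hours multiplied by 6 observations per hour.

```python
# One observation every 10 minutes, as stated for this dataset.
MINUTES_PER_OBSERVATION = 10

obs_per_hour = 60 // MINUTES_PER_OBSERVATION   # observations in one hour
obs_per_day = obs_per_hour * 24                # observations in one day

history_days = 5     # days of past observations used as input
horizon_hours = 6    # how far ahead the temperature is predicted

history_size = history_days * obs_per_day      # length of the input window
target_steps = horizon_hours * obs_per_hour    # prediction offset in time steps

print(obs_per_hour, obs_per_day, history_size, target_steps)
```

Changing history_days or horizon_hours gives the other window configurations the paragraph mentions.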

The following function returns the windows of time described above for the model to train on. The parameter history_size is the size of the past window of information. target_size is how far in the future the model needs to predict; the value at that offset is the label.

def univariate_data(dataset, start_index, end_index, history_size, target_size):
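    '''
    dataset: one-dimensional array of observations
    start_index: index of the first observation that may be used
    end_index: exclusive upper bound for the window end position
               (None means len(dataset) - target_size)
    history_size: number of past observations in each window
    target_size: offset into the future of the value to predict

    @return: features, labels
    '''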

    data = []
    labels = []

    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        # Reshape data from (history_size,) to (history_size, 1)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i+target_size])

    return np.array(data), np.array(labels)
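To see what such a windowing function produces, it can help to run the same logic on a tiny array. The sketch below restates the logic under a hypothetical name, make_windows, so that it runs on its own; the toy series and the window size of 3 are illustrative assumptions, not values from the article.

```python
import numpy as np

def make_windows(series, start_index, end_index, history_size, target_size):
    """Hypothetical restatement of the windowing logic, for illustration only."""
    start = start_index + history_size
    if end_index is None:
        end_index = len(series) - target_size
    data, labels = [], []
    for i in range(start, end_index):
        # The history_size values before position i form one input window.
        data.append(series[i - history_size:i].reshape(history_size, 1))
        # The label is the value target_size steps after position i.
        labels.append(series[i + target_size])
    return np.array(data), np.array(labels)

toy = np.arange(10, dtype=float)   # 0.0, 1.0, ..., 9.0
x, y = make_windows(toy, 0, None, history_size=3, target_size=0)

print(x.shape, y.shape)
print(x[0].ravel(), y[0])
```

With ten values and a history of three, this yields seven windows: the first window holds 0, 1, 2 and its label is 3.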

1.2 Univariate prediction

1.2.1 Extract a single variable (here, the temperature)

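# Extract a single variable, here the temperature, to use for prediction
uni_data = df['T (degC)']
# A Series is a linear data structure: a one-dimensional array
# By default pandas uses 0 to n-1 as the Series index, but you can set your own index (think of the index as the keys of a dict)
uni_data.index = df['Date Time']
print(type(uni_data), '\n', uni_data.head())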

Data visualization

# Visualize
uni_data.plot(subplots=True)
plt.show()


1.2.2 Normalize the dataset

Note: The mean and standard deviation should only be computed using the training data.

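# The first 300,000 rows are used for training, as in the referenced TensorFlow tutorial
TRAIN_SPLIT = 300000

# Work with the underlying NumPy array so that positional indexing works
uni_data = uni_data.values

# Mean and standard deviation of the training data
uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()

# Standardize
uni_data = (uni_data - uni_train_mean)/uni_train_std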
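A small sketch can make the point about using only the training statistics concrete. The values and the split point below are made up for illustration, and NumPy's default (population) standard deviation is assumed.

```python
import numpy as np

# Toy series: the first four values act as training data, the rest as validation.
values = np.array([2.0, 4.0, 6.0, 8.0, 100.0, 120.0])
train_split = 4  # hypothetical split point

# Statistics come from the training portion only.
train_mean = values[:train_split].mean()
train_std = values[:train_split].std()

# The same statistics are then applied to the whole series.
scaled = (values - train_mean) / train_std

print(train_mean, train_std)
print(scaled[:train_split].mean(), scaled[:train_split].std())
```

The training part ends up with mean 0 and standard deviation 1, while the validation values are scaled by the same constants and can fall far outside that range. Nothing about the validation data leaks into the scaling.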

1.2.3 Split into training and validation sets

# Window settings from the referenced TensorFlow tutorial:
# use the last 20 observations to predict the next one
univariate_past_history = 20
univariate_future_target = 0

# Training set
x_train_uni, y_train_uni = univariate_data(uni_data, 0, TRAIN_SPLIT,
                                           univariate_past_history,
                                           univariate_future_target)
# Validation set
x_val_uni, y_val_uni = univariate_data(uni_data, TRAIN_SPLIT, None,
                                       univariate_past_history,
                                       univariate_future_target)

print('Single window of past history')
print(x_train_uni[0])
print('\n Target temperature to predict')
print(y_train_uni[0])

In the plot produced by the code below, the blue line is the history given to the network, and the red cross marks the value to be predicted.

def create_time_steps(length):
    return list(range(-length, 0))

def baseline(history):
    return np.mean(history)

def show_plot(plot_data, delta, title):
    labels = ['History', 'True Future', 'Model Prediction']
    marker = ['.-', 'rx', 'go']
    time_steps = create_time_steps(plot_data[0].shape[0])
    if delta:
        future = delta
    else:
        future = 0

    plt.title(title)
    for i, x in enumerate(plot_data):
        if i:
            plt.plot(future, plot_data[i], marker[i], markersize=10,
                     label=labels[i])
        else:
            plt.plot(time_steps, plot_data[i].flatten(), marker[i], label=labels[i])
    plt.legend()
    plt.xlim([time_steps[0], (future+5)*2])
    plt.xlabel('Time-Step')
    return plt

show_plot([x_train_uni[0], y_train_uni[0]], 0, 'Sample Example')
plt.show()


Before training a model, let's first set a simple baseline. Given an input point, the baseline method looks at all the history and predicts the next point to be the average of the last 20 observations.

show_plot([x_train_uni[0], y_train_uni[0], baseline(x_train_uni[0])], 0,
          'Baseline Prediction Example')
plt.show()
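A single plot shows one window only. To get a number that a model can later be compared against, the same averaging rule can be scored over many windows. The sketch below does this on a synthetic series standing in for the normalized temperatures; the series, the random seed and the use of mean absolute error are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, slowly varying series (not the real weather data).
series = np.sin(np.linspace(0, 20, 500)) + 0.05 * rng.standard_normal(500)

history_size = 20
errors = []
for i in range(history_size, len(series)):
    history = series[i - history_size:i]
    prediction = np.mean(history)           # same rule as baseline()
    errors.append(abs(prediction - series[i]))

# Mean absolute error of the averaging baseline over all windows.
baseline_mae = float(np.mean(errors))
print(f"baseline MAE: {baseline_mae:.4f}")
```

A model is only worth keeping if its error on held-out windows is lower than this baseline figure.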

Let's see whether a recurrent neural network can beat this baseline.

1.2.4 RNN

1.3 Multivariate prediction


Origin blog.csdn.net/wuxintdrh/article/details/103921944