Time series forecasting: multi-variable, multi-step photovoltaic forecasting with GRU (TensorFlow)

Table of contents

1 Data processing

1.1 Introduction to data sets

1.2 Import library files

1.3 Data set processing

1.4 Training data structure

2 Model training and prediction

2.1 Model training

2.2 Model multi-step prediction

2.3 Prediction visualization


1 Data processing

1.1 Introduction to data sets

The experiment uses data set 7, the Changzhou Bridgestone photovoltaic data set (Download link). It contains five columns: time, site name, irradiation intensity (Wh/㎡), ambient temperature (℃), and total site power (kW), sampled at 5-minute intervals. (Note: the irradiation intensity (Wh/㎡), ambient temperature (℃), and total site power (kW) column names each begin with a space.)
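The leading space in the column names matters as soon as a column is selected by name. A minimal sketch, using a tiny inline CSV as a stand-in for the real file: the `sample_csv` text and its values are invented for illustration, and the real file name and encoding are not given in the article.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the real data file; all values are made up.
sample_csv = (
    "时间,场站名称, 辐照强度(Wh/㎡), 环境温度(℃), 全场功率(kW)\n"
    "2019-01-01 08:00:00,site,120.5,3.2,850.0\n"
    "2019-01-01 08:05:00,site,131.0,3.4,910.0\n"
    "2019-01-01 08:10:00,site,,3.5,\n"
)

sample = pd.read_csv(io.StringIO(sample_csv))

# read_csv keeps the leading space from the header, so the exact name is required.
print(list(sample.columns))
power = sample[' 全场功率(kW)']
print(power.isnull().sum())  # one missing power value in this toy sample
```

Selecting `'全场功率(kW)'` without the space would raise a `KeyError`, which is why the code later in the article spells the names with the space included.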

# Visualize the data
def visualize_data(data, row, col):
    cycol = cycle('bgrcmk')
    cols = list(data.columns)
    # squeeze=False always returns a 2-D array of axes, even for a 1x1 grid
    fig, axes = plt.subplots(row, col, figsize=(16, 4), squeeze=False)
    fig.tight_layout()
    for i, ax in enumerate(axes.flat):
        if i < len(cols):
            ax.plot(data.iloc[:, i], c=next(cycol))
            ax.set_title(cols[i])
        else:
            ax.axis('off')  # hide subplots that have no column to show
    plt.subplots_adjust(hspace=0.5)
    plt.show()

visualize_data(data, 1, 3)

Plotting part of the photovoltaic power generation data on its own shows a strongly regular pattern.

1.2 Import library files

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dropout, Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from itertools import cycle

import joblib
import datetime


plt.rcParams['font.sans-serif'] = ['SimHei']     # font that can display Chinese characters
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams.update({'font.size':18})

1.3 Data set processing

First, check the data for missing values. The counts show that there are a small number of them.

# Count missing values per column
data.isnull().sum()

The time and station name columns carry no useful information for the model and can be dropped. Irradiation intensity (Wh/㎡), ambient temperature (℃), and total site power (kW) each have a few missing values; here they are filled with the preceding valid value (forward fill), but you can handle the missing values with whatever method you prefer.

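# Drop unused columns and fill missing values
data.drop(['时间','场站名称'], axis=1, inplace=True)
data = data.ffill()  # forward fill; fillna(method='ffill') is deprecated in recent pandas
# Reorder columns so that the label (power) comes last
data = data[[' 辐照强度(Wh/㎡)', ' 环境温度(℃)', ' 全场功率(kW)']]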
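As a rough comparison of filling strategies, the sketch below applies three options to a made-up column with gaps. The `power` column and its values are hypothetical; which strategy suits the real data is a judgment call the article leaves open.

```python
import numpy as np
import pandas as pd

# Made-up power readings with gaps, including one at the very start.
toy = pd.DataFrame({'power': [np.nan, 10.0, np.nan, np.nan, 40.0]})

forward = toy.ffill()          # copy the previous valid value forward
both = toy.ffill().bfill()     # additionally back-fill the leading gap
linear = toy.interpolate()     # linear interpolation between valid neighbors

print(forward['power'].tolist())
print(both['power'].tolist())
print(linear['power'].tolist())
```

Note that forward fill alone leaves a gap at the very start of a column untouched, since there is no earlier value to copy.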

The DataFrame is then converted to a NumPy array for subsequent processing.

dataf = data.values

1.4 Training data structure

The goal is to predict the following 96 data points (1/4 day). The segment to be predicted (the "unknown" future data) is held out, the data before it (the historical data) is extracted separately for training, and the data set is split with a rolling window. Features and labels are split separately.

# Build the data set with a sliding window
def create_dataset(datasetx, datasety, timesteps=36, predict_steps=6):
    datax = []  # input windows (x)
    datay = []  # target windows (y)
    for each in range(len(datasetx) - timesteps - predict_steps):
        x = datasetx[each:each+timesteps, 0:6]  # up to the first 6 feature columns
        y = datasety[each+timesteps:each+timesteps+predict_steps, 0]
        datax.append(x)
        datay.append(y)
    return datax, datay  # lists; the caller converts them with np.array(...)
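To make the windowing concrete, here is a small self-contained sketch of the same idea on toy data. The helper name `make_windows` and the toy arrays are illustrative only, not part of the article's code.

```python
import numpy as np

def make_windows(features, labels, timesteps, predict_steps):
    """Pair each window of `timesteps` feature rows with the next `predict_steps` labels."""
    xs, ys = [], []
    for start in range(len(features) - timesteps - predict_steps):
        xs.append(features[start:start + timesteps])
        ys.append(labels[start + timesteps:start + timesteps + predict_steps, 0])
    return np.array(xs), np.array(ys)

# Toy data: 20 rows, 2 feature columns, 1 label column (values are arbitrary).
features = np.arange(40, dtype=float).reshape(20, 2)
labels = np.arange(20, dtype=float).reshape(20, 1)

X, y = make_windows(features, labels, timesteps=4, predict_steps=2)
print(X.shape, y.shape)  # (14, 4, 2) (14, 2)
print(y[0])              # labels of rows 4 and 5, right after the first window
```

Each sample in `X` has shape `(timesteps, n_features)`, which is the per-sample shape a recurrent layer expects, and each row of `y` holds the labels of the steps immediately after that window.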

Next, set the input window length, the number of steps predicted each time, and the total number of steps to predict. These parameters can be changed as needed. Unlike the previous article, no rolling prediction is done here, because future feature values are not continuously available as input. In a real application, where features keep arriving, rolling prediction is possible.

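timesteps = 96*5 # x window: 96*5 data points, i.e. each sample uses the preceding 5/4 days of data
predict_steps = 96 # y window: 96 data points, i.e. the following 1/4 day
length = 96 # total number of steps to predict (96 data points)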

The data is then normalized. Features and labels are split first and each is normalized with its own scaler.

# Split features and labels
datafx = dataf[:,:-1]
datafy = dataf[:,-1].reshape(dataf.shape[0],1)

# Normalize features and labels separately
scaler1 = MinMaxScaler(feature_range=(0,1))
scaler2 = MinMaxScaler(feature_range=(0,1))
datafx = scaler1.fit_transform(datafx)
datafy = scaler2.fit_transform(datafy)
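The arithmetic behind min-max scaling to the range [0, 1] is x' = (x - min) / (max - min), and the inverse is x = x' * (max - min) + min. The NumPy-only sketch below illustrates why a separate label scaler is convenient: model outputs can be mapped back to kW using the label statistics alone. The helper names and sample values are hypothetical, and the sketch does not guard against a constant column (max equal to min).

```python
import numpy as np

def minmax_fit(a):
    """Return the per-column minimum and range used for scaling to [0, 1]."""
    lo = a.min(axis=0)
    span = a.max(axis=0) - lo
    return lo, span

def minmax_transform(a, lo, span):
    return (a - lo) / span

def minmax_inverse(a_scaled, lo, span):
    return a_scaled * span + lo

labels = np.array([[0.0], [250.0], [1000.0]])   # made-up power values in kW
lo, span = minmax_fit(labels)
scaled = minmax_transform(labels, lo, span)      # 0, 0.25 and 1

# A model output of shape (1, steps) broadcasts against the single label column.
fake_prediction = np.array([[0.5, 0.1, 0.9]])
restored = minmax_inverse(fake_prediction, lo, span)
print(restored)  # approximately 500, 100 and 900 kW
```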

Finally, split the data set and convert it into the array format the model requires.

trainx, trainy = create_dataset(datafx[:-predict_steps*6,:],datafy[:-predict_steps*6],timesteps, predict_steps)
trainx = np.array(trainx)
trainy = np.array(trainy)

2 Model training and prediction

2.1 Model training

First build the model, then train it on the training data trainx and trainy for 20 epochs with 128 samples per batch. Here input_shape is the shape of a single x sample produced when the data set was split, that is, (timesteps, number of features). (Training on a GPU is recommended. The epoch count is kept low here because my computer's performance is limited; increasing the epochs value is recommended.)

# Define the GPU device
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# GRU training
start_time = datetime.datetime.now()
model = Sequential()
model.add(GRU(128, input_shape=(timesteps, trainx.shape[2]), return_sequences=True))
model.add(Dropout(0.5))
model.add(GRU(128, return_sequences=True))
model.add(GRU(64, return_sequences=False))
model.add(Dense(predict_steps))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(trainx, trainy, epochs=20, batch_size=128)
end_time = datetime.datetime.now()
running_time = end_time - start_time

# Save the model
model.save('gru_model.h5')

2.2 Model multi-step prediction

This section introduces the most important method in the article: predicting future labels without knowing the real future features. The idea is to train the model to map the previous 96*5 data points to the next 96, and at prediction time to feed in the most recent 96*5 data points to predict the next 96. This differs from single-variable prediction, where each predicted value can be appended to the history and used for rolling prediction. In the multi-variable case the model outputs only the predicted labels, not the future features, so rolling prediction is not possible here. In a real application, where new data arrives continuously, rolling forecasts can be used. (The numbers here can be changed as needed.)

First, extract the data to feed into the model: the 96*5 rows of features immediately before the prediction window, and the last 96 labels for comparison.

y_true = dataf[-96:,-1]
predictx = datafx[-96*6:-96]
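The slicing above can be checked with plain index arithmetic. The sketch below uses an arbitrary row count (`n_rows` is an assumption, not the size of the real data set) to show that the input window, the target window, and the rows used for training do not overlap.

```python
import numpy as np

predict_steps = 96
timesteps = 96 * 5
n_rows = 2000                      # arbitrary row count, for illustration only
rows = np.arange(n_rows)

train_rows = rows[:-predict_steps * 6]      # rows passed to create_dataset
input_rows = rows[-96 * 6:-96]              # feature window fed to the model
target_rows = rows[-96:]                    # rows whose labels are compared

print(len(input_rows), len(target_rows))     # 480 96
print(train_rows[-1] < input_rows[0])        # True: no overlap with training rows
print(input_rows[-1] + 1 == target_rows[0])  # True: targets directly follow the input
```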

Then load the trained model:

# Load the model
from tensorflow.keras.models import load_model
model = load_model('gru_model.h5')

2.3 Prediction visualization

Prediction, error calculation, and visualization are wrapped in a single function.

def predict_and_plot(x, y_true, model, scaler, timesteps):
    # Reshape x to (1, timesteps, n_features), the input format of the GRU model
    predict_x = np.reshape(x, (1, timesteps, x.shape[-1]))
    # Predict and map the output back to the original power scale
    predict_y = model.predict(predict_x)
    predict_y = scaler.inverse_transform(predict_y)
    y_predict = list(predict_y[0])
    # Compute the error on the held-out window
    test_score = np.sqrt(mean_squared_error(y_true, y_predict))
    print("test score RMSE: %.2f" % test_score)

    # Plot true and predicted values
    cycol = cycle('bgrcmk')
    plt.figure(dpi=100, figsize=(14, 5))
    plt.plot(y_true, c=next(cycol), markevery=5)
    plt.plot(y_predict, c=next(cycol), markevery=5)
    plt.legend(['y_true', 'y_predict'])
    plt.xlabel('Time')
    plt.ylabel('Power (kW)')
    plt.show()

    return y_predict
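For reference, the RMSE reported by the function is the square root of the mean squared difference between the true and predicted values. A minimal NumPy sketch with made-up numbers (the `rmse` helper is illustrative, not part of the article's code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors are 0, 10 and -30, so the mean squared error is 1000 / 3.
score = rmse([0.0, 100.0, 200.0], [0.0, 90.0, 230.0])
print("RMSE: %.2f" % score)  # RMSE: 18.26
```

Because the error is computed after the inverse transform, the score is in the same unit as the label, here kW.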

Finally, running the prediction shows that the model roughly captures the trend. The predicted values fluctuate to some degree, and some predicted power values fall below 0; you can handle these as you see fit.

y_predict = predict_and_plot(predictx, y_true, model, scaler2, timesteps)
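One simple way to deal with predicted power below 0 is to clip the output at zero, since the site's generated power cannot be negative. This is only one possible post-processing choice, shown here on made-up values:

```python
import numpy as np

raw_prediction = np.array([-12.5, 0.0, 340.2, -0.3, 910.0])  # made-up kW values
clipped = np.clip(raw_prediction, 0.0, None)   # negative outputs become 0

print(clipped)
```

Clipping changes only the negative entries, so it can be applied to `y_predict` before computing the error or plotting.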

 


Origin blog.csdn.net/qq_41921826/article/details/134809538