Time Series Forecasting — Multivariate Load Forecasting with Informer (PyTorch)

Table of contents

1 Experimental data set

2 How to run your own dataset

3 Error analysis


1 Experimental data set

The experiments use Data Set 4: the 2016 Electrician Mathematical Modeling Competition load forecasting data set (Download Link). It contains the date, maximum temperature (℃), minimum temperature (℃), average temperature (℃), relative humidity (average), rainfall (mm), and daily demand load (kWh), with records at a 1-hour interval.

Preprocess the data before using it; the same steps apply to other data sets. First read the data. The raw file is not UTF-8 encoded, so read it with encoding='gbk'. The data passed to the model must be in UTF-8 format.

import pandas as pd

# The raw file is GBK-encoded, so specify the encoding when reading it
df = pd.read_table('E:\\课题\\08数据集\\2016年电工数学建模竞赛负荷预测数据集\\2016年电工数学建模竞赛负荷预测数据集.txt', encoding='gbk')

Then check the data for missing values:

df.isnull().sum()

There are a small number of missing values in the data. After looking at the characteristics of the data, they can be filled with the previous (or, if needed, the following) value:

df = df.fillna(method='ffill')  # forward fill with the previous value (df.ffill() in newer pandas)
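One small sanity check worth adding (not part of the original script): a forward fill cannot fill a missing value in the very first row, so it is worth confirming nothing is left and falling back to a backward fill if needed.

# Make sure no missing values remain; a backward fill covers a missing first row
if df.isnull().values.any():
    df = df.fillna(method='bfill')
assert df.isnull().sum().sum() == 0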

The column names need to be changed to English, and the time column must be named date, otherwise an error will be raised later:

df.columns = ["date","max_temperature(℃)","Min_temperature(℃)","Average_temperature(℃)","Relative_humidity(average)","Rainfall(mm)","Load"]

Finally, save the processed data in UTF-8 format:

load = df  # keep a reference named `load` for the steps below
load.to_csv('E:\\课题\\08数据集\\2016年电工数学建模竞赛负荷预测数据集\\2016年电工数学建模竞赛负荷预测数据集_处理后.csv', index=False, encoding='utf-8')

Finally, let’s take a visual look at the data:

# Visualization: one subplot per feature column
import matplotlib.pyplot as plt

load.drop(['date'], axis=1, inplace=True)
cols = list(load.columns)
fig = plt.figure(figsize=(16, 6))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.8)
for i in range(len(cols)):
    ax = fig.add_subplot(3, 2, i + 1)   # 6 feature columns -> 3x2 grid
    ax.plot(load.iloc[:, i])
    ax.set_title(cols[i])
plt.show()

2 How to run your own dataset

The previous two articles covered the principles of the paper, a code walkthrough, and training on the official data sets. To train the model on your own data set, the following arguments need to be modified.

parser.add_argument('--data', type=str, default='custom', help='data')
parser.add_argument('--root_path', type=str, default='./data/Load/', help='root path of the data file')
parser.add_argument('--data_path', type=str, default='load.csv', help='data file')
parser.add_argument('--features', type=str, default='MS', help='forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate')
parser.add_argument('--target', type=str, default='Load', help='target feature in S or MS task')
parser.add_argument('--freq', type=str, default='h', help='freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h')
  • data: keep the default 'custom', which tells the script to use a customized data set
  • root_path: the path of the folder that contains the data file
  • data_path: the file name of the data file itself
  • features: as explained earlier, there are three options (M, MS, S): multivariate predicting multivariate, multivariate predicting univariate, and univariate predicting univariate. Choose according to your data set; MS is used here.
  • target: the name of the column whose values you want to predict; change it to Load here
  • freq: the time interval between two consecutive records, 'h' for this hourly data set (an example run command is shown right after this list)
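With the data-related arguments above edited (or passed on the command line), a training run can be launched roughly as follows. This is a sketch assuming the repository's main_informer.py entry point; the length and dimension arguments discussed next are left at the defaults shown below:

python main_informer.py --model informer --data custom --root_path ./data/Load/ --data_path load.csv --features MS --target Load --freq h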
parser.add_argument('--seq_len', type=int, default=96, help='input sequence length of Informer encoder')
parser.add_argument('--label_len', type=int, default=48, help='start token length of Informer decoder')
parser.add_argument('--pred_len', type=int, default=24, help='prediction sequence length')
  • seq_len: how many past time steps are fed to the encoder to predict the future
  • label_len: the length of the start token handed to the decoder; it is the tail of the encoder window, so it must not be larger than seq_len
  • pred_len: how many future time steps to predict (a sketch of how these three windows are sliced follows this list)
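To make the relationship between the three lengths concrete, here is a minimal sketch (not the repository code) of how a single training sample is sliced, assuming data is a scaled feature matrix of shape [T, n_features] and the default lengths above:

import numpy as np

seq_len, label_len, pred_len = 96, 48, 24
data = np.random.rand(200, 6)        # stand-in for the scaled feature matrix [T, n_features]

s_begin = 0                          # start of the encoder window
s_end = s_begin + seq_len            # the encoder reads data[0:96]
r_begin = s_end - label_len          # the decoder start token overlaps the last 48 encoder steps
r_end = r_begin + label_len + pred_len

seq_x = data[s_begin:s_end]          # encoder input: 96 known steps
seq_y = data[r_begin:r_end]          # decoder input: 48 known steps + 24 steps to predict
print(seq_x.shape, seq_y.shape)      # (96, 6) (72, 6)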
parser.add_argument('--enc_in', type=int, default=6, help='encoder input size')
parser.add_argument('--dec_in', type=int, default=6, help='decoder input size')
parser.add_argument('--c_out', type=int, default=1, help='output size')
  • enc_in: the number of columns in your data, minus the time column. The data here has 7 columns, one of which is date, so 6 is filled in (a quick way to count is shown after this list).
  • dec_in: same as above
  • c_out: slightly different: if features is M it is the same as above, but if features is MS it must be 1, because the output has only one column.
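A quick way to double-check these numbers on your own file (a small addition; the path below is the processed file placed under the root_path/data_path configured above):

import pandas as pd

# Count the feature columns of the processed CSV (everything except the 'date' column)
df = pd.read_csv('./data/Load/load.csv')
n_features = len(df.columns) - 1
print(n_features)   # 6 here, so enc_in = dec_in = 6; c_out = 1 for MS, or n_features for M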
## Parse the data set information ##
# The data_parser dict stores the information of each data set: the key is the data set name ('ETTh1', etc.),
# and the value is a dict with the .csv file name, the target feature, and the [enc_in, dec_in, c_out] lists for M, S and MS
data_parser = {
    'ETTh1':{'data':'ETTh1.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'ETTh2':{'data':'ETTh2.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'ETTm1':{'data':'ETTm1.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'ETTm2':{'data':'ETTm2.csv','T':'OT','M':[7,7,7],'S':[1,1,1],'MS':[7,7,1]},
    'WTH':{'data':'WTH.csv','T':'WetBulbCelsius','M':[12,12,12],'S':[1,1,1],'MS':[12,12,1]},
    'ECL':{'data':'ECL.csv','T':'MT_320','M':[321,321,321],'S':[1,1,1],'MS':[321,321,1]},
    'Solar':{'data':'solar_AL.csv','T':'POWER_136','M':[137,137,137],'S':[1,1,1],'MS':[137,137,1]},
    'custom':{'data':'load.csv','T':'Load','M':[6,6,6],'S':[1,1,1],'MS':[6,6,1]},  # key must match the --data value; this data set has 6 feature columns
}
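For reference, main_informer.py consumes this dictionary roughly as follows (paraphrased from the repository), which is why the key has to match the value passed to --data and why the [enc_in, dec_in, c_out] triple is selected by --features:

if args.data in data_parser.keys():
    data_info = data_parser[args.data]
    args.data_path = data_info['data']
    args.target = data_info['T']
    args.enc_in, args.dec_in, args.c_out = data_info[args.features]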

The prediction results are saved under the results folder as NumPy .npy files and can be visualized with the following script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Paths of the saved .npy files (the setting string depends on your own run)
file_path1 = "results/informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_test_0/true.npy"
file_path2 = "results/informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_test_1/pred.npy"

# Load the .npy files with NumPy
true_value = []
pred_value = []
data1 = np.load(file_path1)   # ground truth, shape [num_windows, pred_len, n_outputs]
data2 = np.load(file_path2)   # predictions, same shape
print(data2)
for i in range(24):                       # pred_len = 24
    true_value.append(data1[0][i][6])     # true.npy -> real values; column 6 is the target
    pred_value.append(data2[0][i][6])     # pred.npy -> predicted values

# Print the contents
print(true_value)
print(pred_value)

# Save the data
df = pd.DataFrame({'real': true_value, 'pred': pred_value})
df.to_csv('results.csv', index=False)

# Plot the curves
fig = plt.figure(figsize=(16, 8))
plt.plot(df['real'], marker='o', markersize=8)
plt.plot(df['pred'], marker='o', markersize=8)
plt.tick_params(labelsize=28)
plt.legend(['real', 'pred'], fontsize=28)
plt.show()

The final prediction result is shown below. It is not very good; we will see whether the prediction can be improved with parameter tuning.
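Besides eyeballing the curves, the error on the 24 predicted points can be quantified with a few standard metrics. This is a small addition that reuses the df built by the script above; note that if the saved values are still standardized, MAPE is not very meaningful.

import numpy as np

real = df['real'].to_numpy(dtype=float)
pred = df['pred'].to_numpy(dtype=float)

mae = np.mean(np.abs(real - pred))                   # mean absolute error
rmse = np.sqrt(np.mean((real - pred) ** 2))          # root mean squared error
mape = np.mean(np.abs((real - pred) / real)) * 100   # mean absolute percentage error
print(f'MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.2f}%')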

3 Error analysis

Error 1: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 56-57: invalid continuation byte. Specifically, the 'utf-8' codec cannot decode some bytes in the file because they do not conform to UTF-8 encoding.

  File "D:\Progeam Files\python\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 548, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 637, in pandas._libs.parsers.TextReader._get_header
  File "pandas\_libs\parsers.pyx", line 848, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 859, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "pandas\_libs\parsers.pyx", line 2017, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 56-57: invalid continuation byte

Solution:

(1) As the message suggests, convert the data to UTF-8; the easiest way is to open the file with Notepad and save it again in UTF-8 format.

(2) Try using another codec (such as 'latin1') to read the file, or specify the correct encoding format when reading the file.
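The conversion can also be done once in code rather than in Notepad; a small sketch (the file names here are placeholders):

import pandas as pd

# Read with the actual source encoding, then re-save as UTF-8
df = pd.read_table('raw_data.txt', encoding='gbk')         # placeholder input file
df.to_csv('data_utf8.csv', index=False, encoding='utf-8')  # UTF-8 copy for the model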

Error 2: ValueError: list.remove(x): x not in list. The code tries to remove two elements from the list, but at least one of them is not present in the list.

File "E:\课题\07代码\Informer2020-main\Informer2020-main\data\data_loader.py", line 241, in __read_data__
cols = list(df_raw.columns); cols.remove(self.target); cols.remove('date')
ValueError: list.remove(x): x not in list

Solution: if no specific cause can be found, check whether the list contains the element before removing it, or catch the exception with a try-except statement, so the program does not break when the element does not exist. After checking, it is best to change the column names in the data to English to avoid garbled characters.

if self.cols:
    cols = self.cols.copy()
    cols.remove(self.target)
else:
    # Debugging: inspect the columns before removing anything
    cols = list(df_raw.columns)
    print(cols)  # print the column names
    if self.target in cols:
        cols.remove(self.target)
    else:
        print(f"{self.target} not in columns")
    if 'date' in cols:
        cols.remove('date')
    else:
        print("date not in columns")
    # The original one-liner, kept commented out for reference:
    # cols = list(df_raw.columns); cols.remove(self.target); cols.remove('date')
df_raw = df_raw[['date'] + cols + [self.target]]

 

Origin blog.csdn.net/qq_41921826/article/details/134619199