[Reading notes] TPA-LSTM for time series forecasting (with a PyTorch implementation)

This post is a summary and some reflections written after reading the paper; it is neither a translation of the paper nor a full interpretation of the model, and is meant as a starting point for discussion. For a translation of the paper, see the references. Paper: https://arxiv.org/abs/1809.04206

1. Overall summary

The paper proposes using a set of CNN filters to extract time-invariant temporal patterns, which is similar to transforming the time series data into its "frequency domain". It then proposes a new attention mechanism that selects the relevant time series and uses their frequency-domain information for multivariate forecasting. The proposed model is applied to several real-world tasks and achieves state-of-the-art performance in almost all cases.

2. Research methods

  1. A new attention mechanism is proposed, namely temporal pattern attention (TPA), where "temporal pattern" refers to any time-invariant pattern spanning multiple time steps.
  2. In TPA, instead of selecting relevant time steps as a typical attention mechanism does, the machine learns to select relevant time series. A convolutional neural network (CNN) is introduced to extract temporal pattern information from each individual variable (the core equations are sketched below).
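
For reference, here is my transcription of the core TPA equations from the paper (see the paper for the exact CNN indexing). Given the matrix of previous hidden states $H = [h_{t-w}, \dots, h_{t-1}] \in \mathbb{R}^{m \times w}$, each row $H_i$ is convolved along the time axis with $k$ CNN filters, producing $H^C \in \mathbb{R}^{m \times k}$; then

$$f(H^C_i, h_t) = (H^C_i)^\top W_a h_t, \qquad \alpha_i = \mathrm{sigmoid}\big(f(H^C_i, h_t)\big),$$
$$v_t = \sum_{i=1}^{m} \alpha_i H^C_i, \qquad h'_t = W_h h_t + W_v v_t, \qquad y_{t+\Delta} = W_{h'}\, h'_t.$$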

3. Conclusion

The paper takes multivariate time series (MTS) forecasting as its research object and proposes a new temporal pattern attention mechanism that removes the limitations of typical attention mechanisms on such tasks. By placing attention over the feature (variable) dimension, the model learns interdependencies among multiple variables not only within the same time step but also across all previous times and series. Experiments on both toy examples and real-world datasets strongly support this idea and show that the proposed model achieves state-of-the-art results.

4. Innovations

Typical attention mechanisms usually focus on only a few time steps, which makes it hard to identify periodic patterns that span many time steps. This paper introduces a new notion of attention in which relevant variables are selected rather than relevant time steps. The method is simple, general, and well suited to RNNs.

5. Thoughts

After running the simulation below, the model shows a fairly clear effect.

6. References

  1. TPA attention mechanism (TPA-LSTM)

7. PyTorch implementation ⭐

The code below is based on https://github.com/jingw2/demand_forecast; some bugs in the original code have been fixed and some comments have been added for easier understanding.

import torch 
from torch import nn
import torch.nn.functional as F 
from torch.optim import Adam

import numpy as np
import math
import os
import random
import matplotlib.pyplot as plt
import pickle
from tqdm import tqdm
import pandas as pd
from sklearn.preprocessing import StandardScaler # note: shadowed by the custom StandardScaler defined in the util section below
from datetime import date
import argparse
from progressbar import *

util (utility functions)

def get_data_path():
    folder = os.path.dirname(__file__)
    return os.path.join(folder, "data")

def RSE(ypred, ytrue):
    rse = np.sqrt(np.square(ypred - ytrue).sum()) / \
            np.sqrt(np.square(ytrue - ytrue.mean()).sum())
    return rse

def quantile_loss(ytrue, ypred, qs):
    '''
    Quantile loss version 2
    Args:
    ytrue (batch_size, output_horizon)
    ypred (batch_size, output_horizon, num_quantiles)
    '''
    L = np.zeros_like(ytrue)
    for i, q in enumerate(qs):
        yq = ypred[:, :, i]
        diff = yq - ytrue
        L += np.maximum(q * diff, (q - 1) * diff)
    return L.mean()

def SMAPE(ytrue, ypred):
    ytrue = np.array(ytrue).ravel()
    ypred = np.array(ypred).ravel() + 1e-4
    mean_y = (ytrue + ypred) / 2.
    return np.mean(np.abs((ytrue - ypred) \
        / mean_y))

def MAPE(ytrue, ypred):
    ytrue = np.array(ytrue).ravel() + 1e-4
    ypred = np.array(ypred).ravel()
    return np.mean(np.abs((ytrue - ypred) \
        / ytrue))
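
A tiny sanity check of the metrics above on toy values (illustrative only, not part of the original notebook):

# illustrative check of the metrics defined above (toy values)
_yt = np.array([1.0, 2.0, 3.0])
_yp = np.array([1.1, 1.9, 3.2])
print(RSE(_yp, _yt), SMAPE(_yt, _yp), MAPE(_yt, _yp))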

def train_test_split(X, y, train_ratio=0.7):
    '''
    - X (array like): shape (num_samples, num_periods, num_features)
    - y (array like): shape (num_samples, num_periods)
    '''
    num_ts, num_periods, num_features = X.shape
    train_periods = int(num_periods * train_ratio)
    random.seed(2)
    Xtr = X[:, :train_periods, :]
    ytr = y[:, :train_periods]
    Xte = X[:, train_periods:, :]
    yte = y[:, train_periods:]
    return Xtr, ytr, Xte, yte

class StandardScaler:
    
    def fit_transform(self, y):
        self.mean = np.mean(y)
        self.std = np.std(y) + 1e-4
        return (y - self.mean) / self.std
    
    def inverse_transform(self, y):
        return y * self.std + self.mean

    def transform(self, y):
        return (y - self.mean) / self.std

class MaxScaler:

    def fit_transform(self, y):
        self.max = np.max(y)
        return y / self.max
    
    def inverse_transform(self, y):
        return y * self.max

    def transform(self, y):
        return y / self.max


class MeanScaler:
    
    def fit_transform(self, y):
        self.mean = np.mean(y)
        return y / self.mean
    
    def inverse_transform(self, y):
        return y * self.mean

    def transform(self, y):
        return y / self.mean

class LogScaler:

    def fit_transform(self, y):
        return np.log1p(y)
    
    def inverse_transform(self, y):
        return np.expm1(y)

    def transform(self, y):
        return np.log1p(y)
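
As a quick round-trip check of the custom scalers above (illustrative only, arbitrary values):

# illustrative round-trip check of the custom StandardScaler (arbitrary values)
_y = np.array([1.0, 2.0, 3.0, 4.0])
_sc = StandardScaler()
_ys = _sc.fit_transform(_y)                         # standardize
assert np.allclose(_sc.inverse_transform(_ys), _y)  # inverse_transform recovers the input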


def gaussian_likelihood_loss(z, mu, sigma):
    '''
    Gaussian Liklihood Loss
    Args:
    z (tensor): true observations, shape (num_ts, num_periods)
    mu (tensor): mean, shape (num_ts, num_periods)
    sigma (tensor): standard deviation, shape (num_ts, num_periods)
    likelihood: 
    (2 pi sigma^2)^(-1/2) exp(-(z - mu)^2 / (2 sigma^2))
    log likelihood:
    -1/2 * (log (2 pi) + 2 * log (sigma)) - (z - mu)^2 / (2 sigma^2)
    '''
    # note: the reference repo uses this smoothed variant of the Gaussian NLL
    # (log(sigma + 1) plus a constant offset) rather than the exact expression above
    negative_likelihood = torch.log(sigma + 1) + (z - mu) ** 2 / (2 * sigma ** 2) + 6
    return negative_likelihood.mean()

def negative_binomial_loss(ytrue, mu, alpha):
    '''
    Negative Binomial Sample
    Args:
    ytrue (array like)
    mu (array like)
    alpha (array like)
    maximuze log l_{nb} = log Gamma(z + 1/alpha) - log Gamma(z + 1) - log Gamma(1 / alpha)
                - 1 / alpha * log (1 + alpha * mu) + z * log (alpha * mu / (1 + alpha * mu))
    minimize loss = - log l_{nb}
    Note: torch.lgamma: log Gamma function
    '''
    batch_size, seq_len = ytrue.size()
    likelihood = torch.lgamma(ytrue + 1. / alpha) - torch.lgamma(ytrue + 1) - torch.lgamma(1. / alpha) \
        - 1. / alpha * torch.log(1 + alpha * mu) \
        + ytrue * torch.log(alpha * mu / (1 + alpha * mu))
    return - likelihood.mean()

def batch_generator(X, y, num_obs_to_train, seq_len, batch_size):
    '''
    Args:
    X (array like): shape (num_samples, train_periods, num_features)
    y (array like): shape (num_samples, train_periods)
    num_obs_to_train (int): length of the history window used for training
    seq_len (int): sequence/encoder/decoder length
    batch_size (int)
    '''
    num_ts, num_periods, _ = X.shape
    if num_ts < batch_size:
        batch_size = num_ts
    t = random.choice(range(num_obs_to_train, num_periods-seq_len)) # randomly pick an integer in [num_obs_to_train, num_periods-seq_len-1] as the forecast split point
    batch = random.sample(range(num_ts), batch_size) # randomly choose batch_size series out of the num_ts series
    X_train_batch = X[batch, t-num_obs_to_train:t, :] # (batch_size, num_obs_to_train, num_features)
    y_train_batch = y[batch, t-num_obs_to_train:t] # (batch_size, num_obs_to_train)
    Xf = X[batch, t:t+seq_len, :] # (batch_size, seq_len, num_features)
    yf = y[batch, t:t+seq_len] # (batch_size, seq_len)
    return X_train_batch, y_train_batch, Xf, yf
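
batch_generator is not used further below (the notebook switches to a sliding-window DataLoader instead), but a minimal call would look like this; the array sizes here are made up purely for illustration:

# hypothetical usage of batch_generator (shapes chosen arbitrarily)
X_demo = np.random.rand(5, 300, 3)   # 5 series, 300 periods, 3 features
y_demo = np.random.rand(5, 300)
Xb, yb, Xfb, yfb = batch_generator(X_demo, y_demo, num_obs_to_train=168, seq_len=30, batch_size=4)
print(Xb.shape, yb.shape, Xfb.shape, yfb.shape)  # (4, 168, 3) (4, 168) (4, 30, 3) (4, 30)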

Model

class TemporalPatternAttention(nn.Module):

    def __init__(self, filter_size, filter_num, attn_len, attn_size):
        super(TemporalPatternAttention, self).__init__()
        self.filter_size = filter_size # 1
        self.filter_num = filter_num
        self.feat_size = attn_size - self.filter_size + 1 # hidden_size
        self.conv = nn.Conv2d(1, filter_num, (attn_len, filter_size))
        self.linear1 = nn.Linear(attn_size, filter_num)
        self.linear2 = nn.Linear(attn_size + self.filter_num, attn_size)
        self.relu = nn.ReLU()
    
    def forward(self, H, ht): # H:(batch_size, 1, obs_len-1, hidden_size) ht:(batch_size, hidden_size)       
        _, channels, _, attn_size = H.size()

        conv_vecs = self.conv(H) # (batch_size, filter_num, 1, feat_size)
        # collapse the singleton time dim and put the feature axis first, so that
        # row i holds the k filter responses of hidden unit i (a transpose, not a plain view)
        conv_vecs = conv_vecs.view(-1, self.filter_num, self.feat_size).transpose(1, 2) # (batch_size, feat_size, filter_num)
        conv_vecs = self.relu(conv_vecs) # (batch_size, feat_size, filter_num)

        # score function: s_i = (H^C_i)^T W_a h_t
        htt = self.linear1(ht) # (batch_size, filter_num)
        htt = htt.view(-1, self.filter_num, 1) # (batch_size, filter_num, 1)
        s = torch.bmm(conv_vecs, htt) # (batch_size, feat_size, 1)
        alpha = torch.sigmoid(s) # (batch_size, feat_size, 1) attention weights
        # context vector v = sum_i alpha_i * H^C_i
        v = torch.bmm(conv_vecs.transpose(1, 2), alpha).view(-1, self.filter_num) # (batch_size, filter_num)
        
        concat = torch.cat([ht, v], dim=1) # (batch_size, hidden_size+filter_num)
        new_ht = self.linear2(concat) # (batch_size, hidden_size)
        return new_ht

class TPALSTM(nn.Module):

    def __init__(self, input_size, output_horizon, hidden_size, obs_len, n_layers):
        super(TPALSTM, self).__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.lstm = nn.LSTM(hidden_size, hidden_size, n_layers, \
                    bias=True, batch_first=True) # output (batch_size, obs_len, hidden_size)
        self.hidden_size = hidden_size
        self.filter_num = 16
        self.filter_size = 1
        self.output_horizon = output_horizon
        self.attention = TemporalPatternAttention(self.filter_size, \
            self.filter_num, obs_len-1, hidden_size)
        self.linear = nn.Linear(hidden_size, output_horizon)
        self.n_layers = n_layers

    def forward(self, x):
        batch_size, obs_len, features_size = x.shape #(batch_size, obs_len, features_size)
        xconcat = self.hidden(x) #(batch_size, obs_len, hidden_size)

        # allocate on the same device as the input (avoids relying on the global `device`)
        H = torch.zeros(batch_size, obs_len-1, self.hidden_size, device=x.device) #(batch_size, obs_len-1, hidden_size)
        ht = torch.zeros(self.n_layers, batch_size, self.hidden_size, device=x.device) # (num_layers, batch_size, hidden_size)
        ct = ht.clone()
        for t in range(obs_len):
            xt = xconcat[:, t, :].view(batch_size, 1, -1) #(batch_size, 1, hidden_size)
            out, (ht, ct) = self.lstm(xt, (ht, ct)) # ht size (num_layers, batch_size, hidden_size)
            htt = ht[-1, :, :] # (batch_size, hidden_size)
            if t != obs_len - 1:
                H[:, t, :] = htt
        H = self.relu(H) #(batch_size, obs_len-1, hidden_size)
        
        # reshape hidden states H
        H = H.view(batch_size, 1, obs_len-1, self.hidden_size) #(batch_size, 1, obs_len-1, hidden_size)
        new_ht = self.attention(H, htt) # (batch_size, hidden_size)
        ypred = self.linear(new_ht) # (batch_size, output_horizon)
        return ypred
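
A minimal shape check of the model on random inputs (illustrative only; the hyperparameters simply mirror the ones used for training later):

# illustrative shape check with random inputs (not part of the original notebook)
_model = TPALSTM(input_size=3, output_horizon=30, hidden_size=32, obs_len=168, n_layers=1)
_x = torch.randn(2, 168, 3)   # (batch_size, obs_len, num_features)
print(_model(_x).shape)       # expected: torch.Size([2, 30])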

Load data

num_epoches = 100
step_per_epoch = 3 # times per epoch that training batches would be drawn (not used below; the loop iterates over a DataLoader instead)
lr = 1e-3
n_layers = 1
hidden_size = 24
seq_len = 30 # length of the future window to forecast
num_obs_to_train = 168  # length of the history window used for training
num_results_to_sample = 10
show_plot = True
run_test = True
standard_scaler = True
log_scaler = False
mean_scaler = False
max_scaler = False
batch_size = 128

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# read the data
data = pd.read_csv("LD_MT200_hour.csv", parse_dates=["date"])
data["year"] = data["date"].apply(lambda x: x.year)
data["day_of_week"] = data["date"].apply(lambda x: x.dayofweek)
data = data.loc[(data["date"].dt.date >= date(2014, 1, 1)) & (data["date"].dt.date <= date(2014, 3, 1))]
print(data.shape)
plt.figure(figsize=(16, 4)) 
plt.plot(data['MT_200'])
data.head()

[Output: line plot of the MT_200 series and the first rows of the dataframe]

# data preprocessing
features = ["hour", "day_of_week"]
# hours = pd.get_dummies(data["hour"])
# dows = pd.get_dummies(data["day_of_week"])
years = data["year"]
hours = data["hour"]
dows = data["day_of_week"]
MT_200 = np.asarray(data["MT_200"]).reshape(-1,1)
yscaler1 = StandardScaler()
MT_200 = yscaler1.fit_transform(MT_200)
X = np.c_[np.asarray(hours),np.asarray(dows),np.asarray(MT_200)] #X:(len,features)
num_features = X.shape[1]
num_periods = len(data)
X = np.asarray(X).reshape((-1, num_periods, num_features))
y = np.asarray(data["MT_200"]).reshape((-1, num_periods))
print("X_shape=",X.shape) # (series_num,len,features_num)
print("y_shape=",y.shape) # (series_num,len)
# X = np.tile(X, (10, 1, 1))
# y = np.tile(y, (10, 1))

Output:
X_shape= (1, 1440, 3)
y_shape= (1, 1440)
def sliding_window(DataSet, width, multi_vector=True): # DataSet has to be an array
    if multi_vector: # 3-D input (num_samples, length, features)
        num_samples, length, features = DataSet.shape
    else: # 2-D input (num_samples, length)
        DataSet = DataSet[:, :, np.newaxis] # (num_samples, length, 1)
        num_samples, length, features = DataSet.shape

    x = DataSet[:, 0:width, :] # (num_samples, width, features)
    x = x[np.newaxis, :, :, :] # (1, num_samples, width, features)
    for i in range(1, length - width + 1):
        tmp = DataSet[:, i:i + width, :] # (num_samples, width, features)
        tmp = tmp[np.newaxis, :, :, :] # (1, num_samples, width, features)
        x = np.concatenate([x, tmp], 0) # (i+1, num_samples, width, features)
    return x
    
width = num_obs_to_train + seq_len 
X_data = sliding_window(X, width, multi_vector = True) #(len-width+1,num_samples,width,features)
Y_data = sliding_window(y, width, multi_vector = False) #(len-width+1,num_samples,width,1)
print("x的维度为:",X_data.shape)
print("y的维度为:",Y_data.shape)
# 取其中一类序列
i = 0
X_data = X_data[:,i,:,:]
Y_data = Y_data[:,i,:,0]
print("x的维度为:",X_data.shape)
print("y的维度为:",Y_data.shape)

Output:
x shape: (1243, 1, 198, 3)
y shape: (1243, 1, 198, 1)
x shape: (1243, 198, 3)
y shape: (1243, 198)
###### SPLIT TRAIN TEST
from sklearn.model_selection import train_test_split

Xtr, Xte, ytr, yte = train_test_split(X_data, Y_data, 
                                    test_size=0.2, 
                                    random_state=0,
                                    shuffle=False)
print("X_train:{},y_train:{}".format(Xtr.shape,ytr.shape))
print("X_test:{},y_test:{}".format(Xte.shape,yte.shape))

Output:
X_train:(994, 198, 3),y_train:(994, 198)
X_test:(249, 198, 3),y_test:(249, 198)
# standardization
yscaler = None
if standard_scaler:
    yscaler = StandardScaler()
elif log_scaler:
    yscaler = LogScaler()
elif mean_scaler:
    yscaler = MeanScaler()
if yscaler is not None:
    ytr = yscaler.fit_transform(ytr.reshape(-1,1)).reshape(-1,seq_len+num_obs_to_train)
Xtr = torch.as_tensor(Xtr, dtype=torch.float32)
ytr = torch.as_tensor(ytr, dtype=torch.float32)
Xte = torch.as_tensor(Xte, dtype=torch.float32)
yte = torch.as_tensor(yte, dtype=torch.float32)

print("X_train:{},y_train:{}".format(Xtr.shape,ytr.shape))
print("X_test:{},y_test:{}".format(Xte.shape,yte.shape))

train_dataset=torch.utils.data.TensorDataset(Xtr,ytr) # training dataset
train_Loader=torch.utils.data.DataLoader(train_dataset,batch_size=batch_size)

Output:
X_train:torch.Size([994, 198, 3]),y_train:torch.Size([994, 198])
X_test:torch.Size([249, 198, 3]),y_test:torch.Size([249, 198])

Training

Arguments:

  • X (array-like): shape (num_samples, num_periods, num_features)
  • y (array-like): shape (num_samples, num_periods)
  • epochs (int): number of epochs to run
  • step_per_epoch (int): steps to run per epoch
  • num_obs_to_train (int): length of the history window used for training
  • seq_len (int): output horizon
  • likelihood (str): which likelihood to use, defaults to Gaussian
  • num_skus_to_show (int): how many SKUs to show in the test phase
  • num_results_to_sample (int): how many samples to draw as predictions in the test phase
# define the model and the optimizer
num_ts, num_periods, num_features = X.shape
model = TPALSTM(input_size=Xtr.shape[2], output_horizon=seq_len, hidden_size=32, obs_len=num_obs_to_train, n_layers=1).to(device)
optimizer = Adam(model.parameters(), lr=lr)
random.seed(2)

losses = []
cnt = 0    
    
# training
print("开启训练")
progress = ProgressBar()
for epoch in progress(range(num_epoches)):
#     print("Epoch {} starts...".format(epoch))
    for x,y in train_Loader:
        x = x.to(device) # (batch_size, num_obs_to_train+seq_len, num_features) 
        y = y.to(device) # (batch_size, num_obs_to_train+seq_len)
        Xtrain = x[:,:num_obs_to_train,:].float() # (batch_size, num_obs_to_train, num_features)
        ytrain = y[:,:num_obs_to_train].float() # (batch_size, num_obs_to_train)
        Xf = x[:,-seq_len:,:].float() # (batch_size, seq_len, num_features)
        yf = y[:,-seq_len:].float() # (batch_size, seq_len)             
               
        ypred = model(Xtrain) # ypred:(batch_size, seq_len)
        
        loss = F.mse_loss(ypred, yf)
        
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        cnt += 1
        
# plot the training loss
if show_plot:
    plt.plot(range(len(losses)), losses, "k-")
    plt.xlabel("Iteration")
    plt.ylabel("Loss")
    plt.show()

[Figure: training loss curve]

# test 
print("开启测试")
X_test_sample = Xte[:,:,:].reshape(-1,num_obs_to_train+seq_len,num_features).to(device) # (num_samples, num_obs_to_train+seq_len, num_features)
y_test_sample = yte[:,:].reshape(-1,num_obs_to_train+seq_len).to(device) # (num_samples, num_obs_to_train+seq_len)

X_test = X_test_sample[:,:num_obs_to_train,:] # (num_samples, num_obs_to_train, num_features)
Xf_test = X_test_sample[:, -seq_len:, :] # (num_samples, seq_len, num_features)
y_test = y_test_sample[:, :num_obs_to_train] # (num_samples, num_obs_to_train)
yf_test = y_test_sample[:, -seq_len:] # (num_samples, seq_len)

ypred = model(X_test)
ypred = ypred.cpu().detach().numpy()
if yscaler is not None:
    ypred = yscaler.inverse_transform(ypred.reshape(-1,1)).reshape(-1,seq_len)
# ypred = ypred.ravel()
yf_test = yf_test.cpu().detach().numpy()
loss = np.sqrt(np.sum(np.square(yf_test - ypred))) # root of the total squared error over all test windows (original scale)
print("losses: ", loss)

Output:
Starting test
losses:  11473.168
i = -1
if show_plot: # total plotted length = history window (num_obs_to_train) + forecast horizon (seq_len)
    plt.figure(1, figsize=(20, 5))
    plt.plot([k + num_obs_to_train for k in range(seq_len)], ypred[i,:], "r-") # forecast over the last seq_len periods
    plt.title('Prediction')
    yplot = y_test_sample[i,:].cpu() # ground truth, (seq_len+num_obs_to_train,)
    plt.plot(range(len(yplot)), yplot, "k-")
    plt.legend(["forecast", "true"], loc="upper left")
    ymin, ymax = plt.ylim()
    plt.vlines(num_obs_to_train, ymin, ymax, color="blue", linestyles="dashed", linewidth=2)
    plt.ylim(ymin, ymax)
    plt.xlabel("Periods")
    plt.ylabel("Y")
    plt.show()

[Figure: forecast (red) vs. ground truth (black), with the forecast start marked by a dashed vertical line]

Source: blog.csdn.net/cyj972628089/article/details/131011259