[Reading Notes] DeepAR for Probabilistic Prediction (including Pytorch code implementation)

This article is a summary and reflection after reading the paper. It does not involve the translation of the paper and the interpretation of the model. It is suitable for everyone to exchange ideas after reading the paper. For the translation of the paper, you can check the references. Paper address: https://arxiv.org/abs/1704.04110

1. Summary of the full text

In this paper we propose DeepAR, a method for producing accurate probabilistic predictions based on training an autoregressive recurrent network model on a large number of relevant time series . This article shows how many of the challenges faced by widely used classical methods can be overcome by applying deep learning techniques to probabilistic predictions . Extensive empirical evaluation on several real-world prediction datasets shows an accuracy improvement of around 15% compared to state-of-the-art methods.
Insert image description here

2. Research methods

  1. First, assume the distribution of the prediction data. For example, for continuous data , it is assumed that the distribution is Gaussian distribution , for count data, it is assumed to be negative binomial distribution , and there are corresponding distributions for other types of data.

Gaussian distribution:
Insert image description here
negative binomial distribution:
Insert image description here

  1. An RNN architecture for probabilistic prediction is proposed to predict the parameters of the distribution of predicted values ​​and update the model parameters by minimizing the negative log-likelihood function .

Insert image description here
It is worth noting that during the experiment, since the encoder model was the same as the decoder, the historical sequence of training was included when calculating the loss, that is, t 0 t_0t0=0。

  1. To solve the problem of inconsistent scales of multiple time series, a scale factor is introduced to adjust the scale; to solve the problem of inconsistent numbers of time series at different scales, non-uniform sampling is performed during training based on the scale factor .

The prediction performance of the model is shown in the table below. The three data sets are all count data (positive count-valued). The assumption of negative binomial distribution is more accurate, so the performance of rnn-gaussian is very poor. The difference between rnn-negbin and DeepAR is whether scale factor and non-uniform sampling are used. It can be seen that the performance of the two is similar in the data set without scale-gap in the data, and DeepAR has the best performance in the scale-gap data set.
Insert image description here

3. Conclusion

Prediction methods based on modern deep learning techniques can significantly improve the prediction accuracy of state-of-the-art prediction methods on various data sets. Our proposed DeepAR model effectively learns global models from correlated time series , handles the problem of scale inconsistency in sequences through scale factor scaling and scale-based non-uniform sampling , generates calibrated probabilistic predictions with high accuracy, and is able to learn from data Complex patterns such as seasonality and uncertainty grow over time.

4. Innovation points

  1. Compared with traditional RNN, this article predicts the probability distribution parameters of each time point to obtain the probability density function of each time point, and optimizes the model parameters through the maximum likelihood method.
  2. DeepAR makes probabilistic predictions in the form of Monte Carlo samples , which can be used to calculate consistent quantile estimates for all subranges within the prediction horizon .
  3. For the problem of inconsistent scales in multiple time series, the traditional method is to standardize or group prediction during data preprocessing. This paper uses the method of introducing scale factors and non-uniform weighted sampling to solve this problem.
  4. It supports direct input of multiple time series , and can learn the common characteristics of similar time series through the Embedding layer , which facilitates processing of massive time series, and can handle prediction problems when there are missing values ​​in the series .

5. Thinking

DeepAR is a breakthrough that combines deep learning with probabilistic prediction. However, DeepAR is a parameter prediction in probabilistic prediction, and the advance assumptions about the distribution of predicted values ​​will have a great impact on the prediction accuracy.

6. References

  1. DeepAR: Autoregressive recurrent network for time series probability prediction
  2. [Time Series] DeepAR Probabilistic Prediction Model Paper Notes
  3. [Intensive reading of the paper] DeepAR: Using autoregressive RNN to predict time series probability distribution⭐

7. Pytorch implementation⭐

The following code reference: https://github.com/jingw2/demand_forecast , some errors in the original code have been corrected, and some necessary comments have been added for better understanding.

import torch 
from torch import nn
import torch.nn.functional as F 
from torch.optim import Adam

import numpy as np
import math
import os
import random
import matplotlib.pyplot as plt
import pickle
from tqdm import tqdm
import pandas as pd
from sklearn.preprocessing import StandardScaler
from datetime import date
import argparse
from progressbar import *

util(tool function)

def get_data_path():
    folder = os.path.dirname(__file__)
    return os.path.join(folder, "data")

def RSE(ypred, ytrue):
    rse = np.sqrt(np.square(ypred - ytrue).sum()) / \
            np.sqrt(np.square(ytrue - ytrue.mean()).sum())
    return rse

def quantile_loss(ytrue, ypred, qs):
    '''
    Quantile loss version 2
    Args:
    ytrue (batch_size, output_horizon)
    ypred (batch_size, output_horizon, num_quantiles)
    '''
    L = np.zeros_like(ytrue)
    for i, q in enumerate(qs):
        yq = ypred[:, :, i]
        diff = yq - ytrue
        L += np.max(q * diff, (q - 1) * diff)
    return L.mean()

def SMAPE(ytrue, ypred):
    ytrue = np.array(ytrue).ravel()
    ypred = np.array(ypred).ravel() + 1e-4
    mean_y = (ytrue + ypred) / 2.
    return np.mean(np.abs((ytrue - ypred) \
        / mean_y))

def MAPE(ytrue, ypred):
    ytrue = np.array(ytrue).ravel() + 1e-4
    ypred = np.array(ypred).ravel()
    return np.mean(np.abs((ytrue - ypred) \
        / ytrue))

def train_test_split(X, y, train_ratio=0.7):
    '''
    - X (array like): shape (num_samples, num_periods, num_features)
    - y (array like): shape (num_samples, num_periods)
    '''
    num_ts, num_periods, num_features = X.shape
    train_periods = int(num_periods * train_ratio)
    random.seed(2)
    Xtr = X[:, :train_periods, :]
    ytr = y[:, :train_periods]
    Xte = X[:, train_periods:, :]
    yte = y[:, train_periods:]
    return Xtr, ytr, Xte, yte

class StandardScaler:
    
    def fit_transform(self, y):
        self.mean = np.mean(y)
        self.std = np.std(y) + 1e-4
        return (y - self.mean) / self.std
    
    def inverse_transform(self, y):
        return y * self.std + self.mean

    def transform(self, y):
        return (y - self.mean) / self.std

class MaxScaler:

    def fit_transform(self, y):
        self.max = np.max(y)
        return y / self.max
    
    def inverse_transform(self, y):
        return y * self.max

    def transform(self, y):
        return y / self.max


class MeanScaler:
    
    def fit_transform(self, y):
        self.mean = np.mean(y)
        return y / self.mean
    
    def inverse_transform(self, y):
        return y * self.mean

    def transform(self, y):
        return y / self.mean

class LogScaler:

    def fit_transform(self, y):
        return np.log1p(y)
    
    def inverse_transform(self, y):
        return np.expm1(y)

    def transform(self, y):
        return np.log1p(y)


def gaussian_likelihood_loss(z, mu, sigma):
    '''
    Gaussian Liklihood Loss
    Args:
    z (tensor): true observations, shape (num_ts, num_periods)
    mu (tensor): mean, shape (num_ts, num_periods)
    sigma (tensor): standard deviation, shape (num_ts, num_periods)
    likelihood: 
    (2 pi sigma^2)^(-1/2) exp(-(z - mu)^2 / (2 sigma^2))
    log likelihood:
    -1/2 * (log (2 pi) + 2 * log (sigma)) - (z - mu)^2 / (2 sigma^2)
    '''
    negative_likelihood = torch.log(sigma + 1) + (z - mu) ** 2 / (2 * sigma ** 2) + 6
    return negative_likelihood.mean()

def negative_binomial_loss(ytrue, mu, alpha):
    '''
    Negative Binomial Sample
    Args:
    ytrue (array like)
    mu (array like)
    alpha (array like)
    maximuze log l_{nb} = log Gamma(z + 1/alpha) - log Gamma(z + 1) - log Gamma(1 / alpha)
                - 1 / alpha * log (1 + alpha * mu) + z * log (alpha * mu / (1 + alpha * mu))
    minimize loss = - log l_{nb}
    Note: torch.lgamma: log Gamma function
    '''
    batch_size, seq_len = ytrue.size()
    likelihood = torch.lgamma(ytrue + 1. / alpha) - torch.lgamma(ytrue + 1) - torch.lgamma(1. / alpha) \
        - 1. / alpha * torch.log(1 + alpha * mu) \
        + ytrue * torch.log(alpha * mu / (1 + alpha * mu))
    return - likelihood.mean()

def batch_generator(X, y, num_obs_to_train, seq_len, batch_size):
    '''
    Args:
    X (array like): shape (num_samples, train_periods, num_features)
    y (array like): shape (num_samples, train_periods)
    num_obs_to_train (int): 训练的历史窗口长度
    seq_len (int): sequence/encoder/decoder length
    batch_size (int)
    '''
    num_ts, num_periods, _ = X.shape
    if num_ts < batch_size:
        batch_size = num_ts
    t = random.choice(range(num_obs_to_train, num_periods-seq_len)) # 从num_obs_to_train和num_periods-seq_len-1之间随机选一个整数,作为预测点
    batch = random.sample(range(num_ts), batch_size) # 从num_ts条数据中随机选择batch_size条
    X_train_batch = X[batch, t-num_obs_to_train:t, :] # (batch_size, num_obs_to_train, num_features)
    y_train_batch = y[batch, t-num_obs_to_train:t] # (batch_size, num_obs_to_train)
    Xf = X[batch, t:t+seq_len, :] # (batch_size, seq_len, num_features)
    yf = y[batch, t:t+seq_len] # (batch_size, seq_len)
    return X_train_batch, y_train_batch, Xf, yf

Model

class Gaussian(nn.Module):

    def __init__(self, hidden_size, output_size):
        '''
        Gaussian Likelihood Supports Continuous Data
        Args:
        input_size (int): hidden h_{i,t} column size
        output_size (int): embedding size
        '''
        super(Gaussian, self).__init__()
        self.mu_layer = nn.Linear(hidden_size, output_size)
        self.sigma_layer = nn.Linear(hidden_size, output_size)

        # initialize weights
        # nn.init.xavier_uniform_(self.mu_layer.weight)
        # nn.init.xavier_uniform_(self.sigma_layer.weight)
    
    def forward(self, h): # h为神经网络隐藏层输出 (batch, hidden_size)
        _, hidden_size = h.size()
        sigma_t = torch.log(1 + torch.exp(self.sigma_layer(h))) + 1e-6
        mu_t = self.mu_layer(h)
        return mu_t, sigma_t # (batch, output_size)

class NegativeBinomial(nn.Module):

    def __init__(self, input_size, output_size):
        '''
        Negative Binomial Supports Positive Count Data
        Args:
        input_size (int): hidden h_{i,t} column size
        output_size (int): embedding size
        '''
        super(NegativeBinomial, self).__init__()
        self.mu_layer = nn.Linear(input_size, output_size)
        self.sigma_layer = nn.Linear(input_size, output_size)
    
    def forward(self, h): # h为神经网络隐藏层输出 (batch, hidden_size)
        _, hidden_size = h.size()
        alpha_t = torch.log(1 + torch.exp(self.sigma_layer(h))) + 1e-6
        mu_t = torch.log(1 + torch.exp(self.mu_layer(h)))
        return mu_t, alpha_t # (batch, output_size)

def gaussian_sample(mu, sigma):
    '''
    Gaussian Sample
    Args:
    ytrue (array like)
    mu (array like) # (num_ts, 1)
    sigma (array like): standard deviation # (num_ts, 1)
    gaussian maximum likelihood using log 
        l_{G} (z|mu, sigma) = (2 * pi * sigma^2)^(-0.5) * exp(- (z - mu)^2 / (2 * sigma^2))
    '''
    # likelihood = (2 * np.pi * sigma ** 2) ** (-0.5) * \
    #         torch.exp((- (ytrue - mu) ** 2) / (2 * sigma ** 2))
    # return likelihood
    gaussian = torch.distributions.normal.Normal(mu, sigma)
    ypred = gaussian.sample()
    return ypred # (num_ts, 1)

def negative_binomial_sample(mu, alpha):
    '''
    Negative Binomial Sample
    Args:
    ytrue (array like)
    mu (array like)
    alpha (array like)
    maximuze log l_{nb} = log Gamma(z + 1/alpha) - log Gamma(z + 1) - log Gamma(1 / alpha)
                - 1 / alpha * log (1 + alpha * mu) + z * log (alpha * mu / (1 + alpha * mu))
    minimize loss = - log l_{nb}
    Note: torch.lgamma: log Gamma function
    '''
    var = mu + mu * mu * alpha
    ypred = mu + torch.randn() * torch.sqrt(var)
    return ypred
class DeepAR(nn.Module):

    def __init__(self, input_size, embedding_size, hidden_size, num_layers, lr=1e-3, likelihood="g"):
        super(DeepAR, self).__init__()

        # network
        self.input_embed = nn.Linear(1, embedding_size)
        self.encoder = nn.LSTM(embedding_size+input_size, hidden_size, \
                num_layers, bias=True, batch_first=True)
        if likelihood == "g":
            self.likelihood_layer = Gaussian(hidden_size, 1)
        elif likelihood == "nb":
            self.likelihood_layer = NegativeBinomial(hidden_size, 1)
        self.likelihood = likelihood
    
    def forward(self, X, y, Xf):
        '''
        Args:
        num_time_series = batch_size
        X (array like): shape (num_time_series, num_obs_to_train, num_features)
        y (array like): shape (num_time_series, num_obs_to_train)
        Xf (array like): shape (num_time_series, seq_len, num_features)
        Return:
        mu (array like): shape (num_time_series, num_obs_to_train + seq_len)
        sigma (array like): shape (num_time_series, num_obs_to_train + seq_len)
        '''
        if isinstance(X, type(np.empty(2))): # 转换为tensor
            X = torch.from_numpy(X).float()
            y = torch.from_numpy(y).float()
            Xf = torch.from_numpy(Xf).float()
        num_ts, num_obs_to_train, _ = X.size()
        _, seq_len, num_features = Xf.size()
        ynext = None
        ypred = []
        mus = []
        sigmas = []
        h, c = None, None
        # 遍历所有时间点
        for s in range(num_obs_to_train + seq_len): # num_obs_to_train为历史序列长度,seq_len为预测长度
            if s < num_obs_to_train: # Encoder,ynext为真实值
                if s == 0: ynext = torch.zeros((num_ts,1)).to(device)
                else: ynext = y[:, s-1].view(-1, 1) # (num_ts,1) # 取上一时刻的真实值
                yembed = self.input_embed(ynext).view(num_ts, -1) # (num_ts,embedding_size)
                x = X[:, s, :].view(num_ts, -1) # (num_ts,num_features)
            else: # Decoder,ynext为预测值
                if s == num_obs_to_train: ynext = y[:, s-1].view(-1, 1) # (num_ts,1) # 预测的第一个时间点取上一时刻的真实值
                yembed = self.input_embed(ynext).view(num_ts, -1) # (num_ts,embedding_size)
                x = Xf[:, s-num_obs_to_train, :].view(num_ts, -1) # (num_ts,num_features)
            x = torch.cat([x, yembed], dim=1) # (num_ts, num_features + embedding)
            inp = x.unsqueeze(1) # (num_ts,1, num_features + embedding)
            
            if h is None and c is None:
                out, (h, c) = self.encoder(inp) # h size (num_layers, num_ts, hidden_size)
            else:
                out, (h, c) = self.encoder(inp, (h, c))
            hs = h[-1, :, :] # (num_ts, hidden_size)
            hs = F.relu(hs) # (num_ts, hidden_size)
            mu, sigma = self.likelihood_layer(hs)  # (num_ts, 1)
            mus.append(mu.view(-1, 1))
            sigmas.append(sigma.view(-1, 1))
            if self.likelihood == "g":
                ynext = gaussian_sample(mu, sigma) #(num_ts, 1)
            elif self.likelihood == "nb":
                alpha_t = sigma
                mu_t = mu
                ynext = negative_binomial_sample(mu_t, alpha_t) #(num_ts, 1)
            # if without true value, use prediction
            if s >= num_obs_to_train and s < num_obs_to_train + seq_len: #在预测区间内
                ypred.append(ynext)
        ypred = torch.cat(ypred, dim=1).view(num_ts, -1) #(num_ts, seq_len)
        mu = torch.cat(mus, dim=1).view(num_ts, -1) #(num_ts, num_obs_to_train + seq_len)
        sigma = torch.cat(sigmas, dim=1).view(num_ts, -1) #(num_ts, num_obs_to_train + seq_len)
        return ypred, mu, sigma

Load Data

num_epoches = 100
step_per_epoch = 3 #在一个epoch中,从训练集中提取step_per_epoch次训练数据
lr = 1e-3
n_layers = 1
hidden_size = 50
embedding_size = 10 #将上一时刻的真实值编码为embedding_size长度
likelihood = "g"
seq_len = 60 #预测的未来窗口长度
num_obs_to_train = 168  #训练的历史窗口长度
num_results_to_sample = 10
show_plot = True
run_test = True
standard_scaler = True
log_scaler = False
mean_scaler = False
batch_size = 64
sample_size = 100

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# 读取数据
data = pd.read_csv("LD_MT200_hour.csv", parse_dates=["date"])
data["year"] = data["date"].apply(lambda x: x.year)
data["day_of_week"] = data["date"].apply(lambda x: x.dayofweek)
data = data.loc[(data["date"].dt.date >= date(2014, 1, 1)) & (data["date"].dt.date <= date(2014, 3, 1))]
print(data.shape)
plt.figure(figsize=(16, 4)) 
plt.plot(data['MT_200'])
data.head()

Insert image description here

# 数据预处理
features = ["hour", "day_of_week"]
# hours = pd.get_dummies(data["hour"])
# dows = pd.get_dummies(data["day_of_week"])
years = data["year"]
hours = data["hour"]
dows = data["day_of_week"]
# MT_200 = data["MT_200"]
# yscaler = StandardScaler()
# MT_200 = yscaler.fit_transform(MT_200)
X = np.c_[np.asarray(hours),np.asarray(dows)] #X:(len,features)
num_features = X.shape[1]
num_periods = len(data)
X = np.asarray(X).reshape((-1, num_periods, num_features))
y = np.asarray(data["MT_200"]).reshape((-1, num_periods))
print("X_shape=",X.shape) # (series_num,len,features_num)
print("y_shape=",y.shape) # (series_num,len)
# X = np.tile(X, (10, 1, 1))
# y = np.tile(y, (10, 1))

输出:
X_shape= (1, 1440, 2)
y_shape= (1, 1440)
# 滑动窗口
def sliding_window(DataSet, width, multi_vector = True): #DataSet has to be as an Array
    if multi_vector: #三维 (num_samples,length,features)
        num_samples,length,features = DataSet.shape
    else: #二维 (num_samples,length)
        DataSet = DataSet[:,:,np.newaxis] #(num_samples,length,1)
        num_samples,length,features = DataSet.shape

    x = DataSet[:,0:width,:] #(num_samples,width,features)
    x = x[np.newaxis,:,:,:] #(1,num_samples,width,features)
    for i in range(length - width):
        i += 1
        tmp = DataSet[:,i:i + width,:]#(num_samples,width,features)
        tmp = tmp[np.newaxis,:,:,:] #(1,num_samples,width,features)
        x = np.concatenate([x,tmp],0) #(i+1,num_samples,width,features)
    return x
    
width = num_obs_to_train + seq_len 
X_data = sliding_window(X, width, multi_vector = True) #(len-width+1,num_samples,width,features)
Y_data = sliding_window(y, width, multi_vector = False) #(len-width+1,num_samples,width,1)
print("x的维度为:",X_data.shape)
print("y的维度为:",Y_data.shape)
# 取其中一类序列
i = 0
X_data = X_data[:,i,:,:]
Y_data = Y_data[:,i,:,0]
print("x的维度为:",X_data.shape)
print("y的维度为:",Y_data.shape)

输出:
x的维度为: (1213, 1, 228, 2)
y的维度为: (1213, 1, 228, 1)
x的维度为: (1213, 228, 2)
y的维度为: (1213, 228)
# SPLIT TRAIN TEST
from sklearn.model_selection import train_test_split

Xtr, Xte, ytr, yte = train_test_split(X_data, Y_data, 
                                    test_size=0.3, 
                                    random_state=0,
                                    shuffle=False)
print("X_train:{},y_train:{}".format(Xtr.shape,ytr.shape))
print("X_test:{},y_test:{}".format(Xte.shape,yte.shape))

输出:
X_train:(849, 228, 2),y_train:(849, 228)
X_test:(364, 228, 2),y_test:(364, 228)
# 标准化
yscaler = None
if standard_scaler:
    yscaler = StandardScaler()
elif log_scaler:
    yscaler = LogScaler()
elif mean_scaler:
    yscaler = MeanScaler()
if yscaler is not None:
    ytr = yscaler.fit_transform(ytr)
#构造Dtaloader
Xtr=torch.as_tensor(torch.from_numpy(Xtr), dtype=torch.float32)
ytr=torch.as_tensor(torch.from_numpy(ytr),dtype=torch.float32)     
Xte=torch.as_tensor(torch.from_numpy(Xte), dtype=torch.float32)
yte=torch.as_tensor(torch.from_numpy(yte),dtype=torch.float32)

train_dataset=torch.utils.data.TensorDataset(Xtr,ytr) #训练集dataset
train_Loader=torch.utils.data.DataLoader(train_dataset,batch_size=batch_size)

Train

Args:

  • X (array like): shape (num_samples, num_periods, num_features)
  • y (array like): shape (num_samples, num_periods)
  • epochs (int): number of epochs to run
  • step_per_epoch (int): steps per epoch to run
  • num_obs_to_train (int): The length of the history window for training
  • seq_len (int): output horizon
  • likelihood (str): what type of likelihood to use, default is gaussian
  • num_skus_to_show (int): how many skus to show in test phase
  • num_results_to_sample (int): how many samples in test phase as prediction
# 定义模型和优化器
num_ts, num_periods, num_features = X.shape
model = DeepAR(num_features, embedding_size, 
    hidden_size, n_layers, lr, likelihood).to(device)
optimizer = Adam(model.parameters(), lr=lr)
random.seed(2)

losses = []
cnt = 0    
    
# training
print("开启训练")
progress = ProgressBar()
for epoch in progress(range(num_epoches)):
#     print("Epoch {} starts...".format(epoch))
    for x,y in train_Loader:
        x = x.to(device) # (batch_size, num_obs_to_train+seq_len, num_features) 
        y = y.to(device) # (batch_size, num_obs_to_train+seq_len)
        Xtrain = x[:,:num_obs_to_train,:].float() # (batch_size, num_obs_to_train, num_features)
        ytrain = y[:,:num_obs_to_train].float() # (batch_size, num_obs_to_train)
        Xf = x[:,-seq_len:,:].float() # (batch_size, seq_len, num_features)
        yf = y[:,-seq_len:].float() # (batch_size, seq_len)             
               
        ypred, mu, sigma = model(Xtrain, ytrain, Xf) # ypred:(batch_size, seq_len), mu&sigma:(batch_size, num_obs_to_train + seq_len)
        
        # ypred_rho = ypred
        # e = ypred_rho - yf
        # loss = torch.max(rho * e, (rho - 1) * e).mean()
        ## gaussian loss       
        ytrain = torch.cat([ytrain, yf], dim=1) # (batch_size, num_obs_to_train+seq_len) 
        if likelihood == "g":
            loss = gaussian_likelihood_loss(ytrain, mu, sigma)
        elif likelihood == "nb":
            loss = negative_binomial_loss(ytrain, mu, sigma)
        losses.append(loss.item())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        cnt += 1
        
# 绘制loss
if show_plot:
    plt.plot(range(len(losses)), losses, "k-")
    plt.xlabel("Period")
    plt.ylabel("Loss")
    plt.show()

Insert image description here

Test

# test 
print("开启测试")
X_test_sample = Xte[:,:,:].reshape(-1,num_obs_to_train+seq_len,num_features).to(device) # (num_samples, num_obs_to_train+seq_len, num_features)
y_test_sample = yte[:,:].reshape(-1,num_obs_to_train+seq_len).to(device) # (num_samples, num_obs_to_train+seq_len)

X_test = X_test_sample[:,:num_obs_to_train,:] # (num_samples, num_obs_to_train, num_features)
Xf_test = X_test_sample[:, -seq_len:, :] # (num_samples, seq_len, num_features)
y_test = y_test_sample[:, :num_obs_to_train] # (num_samples, num_obs_to_train)
yf_test = y_test_sample[:, -seq_len:] # (num_samples, seq_len)
if yscaler is not None:
    y_test = yscaler.transform(y_test)
result = []
n_samples = sample_size # 采样个数
for _ in tqdm(range(n_samples)):
    y_pred, _, _ = model(X_test, y_test, Xf_test) # ypred:(num_samples, seq_len)
    y_pred = y_pred.cpu().numpy()
    if yscaler is not None:
        y_pred = yscaler.inverse_transform(y_pred)
    result.append(y_pred[:,:,np.newaxis]) # y_pred[:,:,np.newaxis]:(num_samples, seq_len,1)
#     result.append(y_pred.reshape((-1, 1)))

result = np.concatenate(result, axis=2) # (num_samples, seq_len, n_samples)
p50 = np.quantile(result, 0.5, axis=2) # (num_samples, seq_len)
p90 = np.quantile(result, 0.9, axis=2) # (num_samples, seq_len)
p10 = np.quantile(result, 0.1, axis=2) # (num_samples, seq_len)

Insert image description here

i = -1 #选取其中一条序列进行可视化
if show_plot: # 序列总长度为:历史窗口长度(num_obs_to_train)+预测长度(seq_len)
    plt.figure(1, figsize=(20, 5))
    plt.plot([k + seq_len + num_obs_to_train - seq_len for k in range(seq_len)], p50[i,:], "r-") # 绘制50%分位数曲线
    # 绘制10%-90%分位数阴影
    plt.fill_between(x=[k + seq_len + num_obs_to_train - seq_len for k in range(seq_len)], y1=p10[i,:], y2=p90[i,:], alpha=0.5)
    plt.title('Prediction uncertainty')
    yplot = y_test_sample[i,:].cpu() #真实值 (1, seq_len+num_obs_to_train)
    plt.plot(range(len(yplot)), yplot, "k-")
    plt.legend(["P50 forecast", "P10-P90 quantile", "true"], loc="upper left")
    ymin, ymax = plt.ylim()
    plt.vlines(seq_len + num_obs_to_train - seq_len, ymin, ymax, color="blue", linestyles="dashed", linewidth=2)
    plt.ylim(ymin, ymax)
    plt.xlabel("Periods")
    plt.ylabel("Y")
    plt.show()

Insert image description here

#评价指标
pred_up  = p90[:,:].reshape(-1,seq_len) #(num_samples,seq_len)
pred_mid = p50[:,:].reshape(-1,seq_len) #(num_samples,seq_len)
pred_low = p10[:,:].reshape(-1,seq_len) #(num_samples,seq_len)
true = yf_test.cpu().detach().numpy()[:,:].reshape(-1,seq_len) #(num_samples,seq_len)
test_samples, seq_len = true.shape
u = 0.9-0.1

# 1. PICP(PI coverage probability) 要求大于等于区间分位数
PICP = 0
for i in range(test_samples):
    count = 0
    for j in range(seq_len):
        if true[i,j] > pred_low[i,j] and true[i,j] < pred_up[i,j]:
            count += 1
    picp = count / seq_len
    PICP += picp
PICP = PICP / test_samples
print("PICP:",PICP)

# 2. PINAW(PI normalized averaged width) 用于定义区间的狭窄程度,在保证准确性的前提下越小越好
PINAW = 0
for i in range(test_samples):
    width = 0
    true_max = np.max(true[i,:])
    true_min = np.min(true[i,:])
    for j in range(seq_len):
        width += (pred_up[i,j]-pred_low[i,j])
    width /= seq_len
    pinaw = (width / (true_max-true_min))
    PINAW += pinaw
PINAW = PINAW / test_samples
print("PINAW:",PINAW)

# 3. CWC(coverage width-based criterion) 综合考虑区间覆盖率和狭窄程度, 越小越好
g = 90 #取值在50-100
error = math.exp(-g * (PICP - u))
if PICP >= u: 
    r = 0
else:
    r = 1
CWC = PINAW * (1 + r * error)
print("CWC:",CWC)

# 4. CRPS(continuous ranked probability score) 综合评价指标,量化一个连续概率分布(理论值)与确定性观测样本(真实值)间的差异,可视为平均绝对误差(MAE)在连续概率分布上的推广
# https://avoid.overfit.cn/post/302f7305a414449a9eb2cfa628d15853
def crps(y_true, y_pred, sample_weight=None):
    num_samples = y_pred.shape[0]
    absolute_error = np.mean(np.abs(y_pred - y_true), axis=0)
    if num_samples == 1:
        return np.average(absolute_error, weights=sample_weight)
    y_pred = np.sort(y_pred, axis=0) #(3,60)
    diff = y_pred[1:] - y_pred[:-1] #一阶差分
    weight = np.arange(1, num_samples) * np.arange(num_samples - 1, 0, -1)
    weight = np.expand_dims(weight, -1)
    per_obs_crps = absolute_error - np.sum(diff * weight, axis=0) / num_samples**2
    return np.average(per_obs_crps, weights=sample_weight)
CRPS = 0
for i in range(test_samples):
    y_pred = np.concatenate([pred_up[i,None,:],pred_mid[i,None,:],pred_low[i,None,:]],axis=0) #(3,60)
    y_true = true[i,:] #(1,60)
    c = crps(y_true,y_pred)
    CRPS += c
CRPS = CRPS / test_samples
print("CRPS:",CRPS)

# 5. P50 quantile MAE 
MAE = 0 
for i in range(test_samples):
    error = 0
    for j in range(seq_len):
        error += np.abs(true[i,j]-pred_mid[i,j])
    mae = error / seq_len
    MAE += mae
MAE = MAE / test_samples
print("P50 quantile MAE:",MAE)

# 6. # P50 quantile MAPE
MAPe = 0
for i in range(test_samples):
    mape = MAPE(true[i,:], pred_mid[i,:])
    MAPe += mape
MAPe = MAPe / test_samples
print("P50 quantile MAPE: {}".format(MAPe))

输出:
PICP: 0.9852106227106228
PINAW: 0.23486948616378414
CWC: 0.23486948616378414
CRPS: 96.34482641472391
P50 quantile MAE: 86.1149369879084
P50 quantile MAPE: 0.034020405319259976

Guess you like

Origin blog.csdn.net/cyj972628089/article/details/130836308