I recently read a very interesting paper: Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case. Implementing something similar from scratch seemed like a good project for learning more about time series forecasting.
Prediction task:
In time series forecasting, the goal is to predict future values of a time series based on its historical values. Some examples of time series forecasting tasks are:
- Predicting influenza prevalence: Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case
- Energy production forecasting: Energy consumption forecasting using a stacked non-parametric Bayesian approach
- Weather forecasting: MetNet: A Neural Weather Model for Precipitation Forecasting
For example, we could collect a city's energy consumption data over several months and then train a model to predict the city's future energy consumption. This can be used to estimate energy demand, so energy companies can use the model to estimate the optimal amount of energy to produce at any given time.
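To make the task concrete, forecasting is usually framed as a supervised learning problem over sliding windows: given a fixed-length history, predict the next few steps. Here is a minimal sketch of that framing (the function name `make_windows` and the window lengths are illustrative, not from the paper):

```python
import numpy as np

def make_windows(series, history_len, horizon):
    """Split a 1-D series into (history, future) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - history_len - horizon + 1):
        inputs.append(series[start : start + history_len])
        targets.append(series[start + history_len : start + history_len + horizon])
    return np.array(inputs), np.array(targets)

# Example: 100 consumption readings; predict 24 steps from 72 steps of history.
series = np.sin(np.linspace(0, 10, 100))
X, y = make_windows(series, history_len=72, horizon=24)
print(X.shape, y.shape)  # (5, 72) (5, 24)
```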
Time Series Forecasting Example
The model we will use is an encoder-decoder Transformer, where the encoder part takes as input a historical time series, and the decoder part predicts future values in an autoregressive manner.
The decoder is connected with the encoder using an attention mechanism. In this way, the decoder can learn to "focus" on the most useful part of the historical values in the time series before making predictions.
The decoder uses masked self-attention so that, during training, the network cannot cheat by looking at future values when predicting earlier positions.
Encoder subnetwork:
Decoder subnetwork:
Full model:
Autoregressive encoder-decoder Transformer
This architecture can be built using PyTorch as follows:
encoder_layer = nn.TransformerEncoderLayer(
d_model=channels,
nhead=8,
dropout=self.dropout,
dim_feedforward=4 * channels,
)
decoder_layer = nn.TransformerDecoderLayer(
d_model=channels,
nhead=8,
dropout=self.dropout,
dim_feedforward=4 * channels,
)
self.encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=8)
self.decoder = torch.nn.TransformerDecoder(decoder_layer, num_layers=8)
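As a sanity check, here is a minimal sketch of how these modules wire together in a forward pass. The concrete values (`channels = 64`, the dropout, the sequence and batch sizes, and the names `src`/`tgt`) are illustrative; in the real model the raw series values would first be projected to `channels` dimensions and given positional information:

```python
import torch
import torch.nn as nn

channels = 64  # illustrative model dimension

encoder_layer = nn.TransformerEncoderLayer(
    d_model=channels, nhead=8, dropout=0.1, dim_feedforward=4 * channels
)
decoder_layer = nn.TransformerDecoderLayer(
    d_model=channels, nhead=8, dropout=0.1, dim_feedforward=4 * channels
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=8)

# Default layout is (sequence_length, batch_size, channels).
src = torch.randn(30, 16, channels)  # 30 historical steps
tgt = torch.randn(10, 16, channels)  # 10 future steps (shifted right in training)
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(0))

memory = encoder(src)                         # encode the history
out = decoder(tgt, memory, tgt_mask=tgt_mask) # decode with causal masking
print(out.shape)  # torch.Size([10, 16, 64])
```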
Data
Every time I implement a new method, I like to try it out on synthetic data first, since it is easier to understand and debug. This reduces the data complexity and lets me focus on the implementation/algorithm.
I wrote a small script that generates non-trivial time series with different periods, offsets, and patterns.
import random
import numpy as np

# `periods` is a list of candidate periods, defined elsewhere in the script.
def generate_time_series(dataframe):
    clip_val = random.uniform(0.3, 1)
    period = random.choice(periods)
    phase = random.randint(-1000, 1000)
    # Clipped cosine with per-row amplitude and offset, plus Gaussian noise.
    dataframe["views"] = dataframe.apply(
        lambda x: np.clip(
            np.cos(x["index"] * 2 * np.pi / period + phase), -clip_val, clip_val
        )
        * x["amplitude"]
        + x["offset"],
        axis=1,
    ) + np.random.normal(
        0, dataframe["amplitude"].abs().max() / 10, size=(dataframe.shape[0],)
    )
    return dataframe
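The same recipe can be sketched as a standalone NumPy function, without the pandas dataframe, which makes it easy to experiment with individual parameter choices (the function name and the specific parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_cosine_series(n, period, phase, clip_val, amplitude, offset):
    """One synthetic series: a clipped cosine, scaled, shifted, plus noise."""
    t = np.arange(n)
    base = np.clip(np.cos(t * 2 * np.pi / period + phase), -clip_val, clip_val)
    return base * amplitude + offset + rng.normal(0, abs(amplitude) / 10, size=n)

series = clipped_cosine_series(
    n=200, period=50, phase=3.0, clip_val=0.7, amplitude=2.0, offset=0.5
)
print(series.shape)  # (200,)
```

Clipping the cosine flattens its peaks, so the generated series are not trivially sinusoidal.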
Generated time series example
The model is then trained on all these time series simultaneously:
Training loss
Results
We can now use this model to predict future values of these time series, with somewhat mixed results:
Incorrect
Examples of mispredictions
Correct
Examples of correct predictions
The results weren't as good as I expected, especially given that it's usually easy to make good predictions on synthetic data, but they're still promising.
In some of the bad examples, the model's predictions are slightly out of sync and somewhat overestimate the amplitude. In the good examples, the prediction is very close to the ground truth, with the noise filtered out.
I probably need to debug my code a bit more and tune the hyperparameters before I can expect better results.