Practical Case: Using a Transformer Network for Time Series Forecasting (with Complete Python Code)

I recently read a very interesting paper: Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case. Implementing something similar from scratch seemed like a good project for learning more about time series forecasting.

Prediction Task

In time series forecasting, the goal is to predict the future values of a time series based on its historical values. Some examples of time series forecasting tasks are as follows:

  • Influenza prevalence forecasting: Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case

  • Energy production forecasting: Energy consumption forecasting using a stacked non-parametric Bayesian approach

  • Weather Forecasting: MetNet: A Neural Weather Model for Precipitation Forecasting

For example, we could store a city's energy consumption data over several months and then train a model to predict the city's future consumption. Energy companies could use such a model to estimate demand and decide the optimal amount of energy to produce at any given time.


Time Series Forecasting Example

The model we will use is an encoder-decoder Transformer: the encoder takes a window of historical values as input, and the decoder predicts future values in an autoregressive manner.

The decoder is connected to the encoder through an attention mechanism. This lets the decoder learn to "focus" on the most useful parts of the historical time series before producing a prediction.

The decoder uses masked self-attention so that, during training, the network cannot cheat by looking at future values when predicting earlier ones.
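Concretely, this mask is just an upper-triangular matrix of -inf values added to the attention scores before the softmax. A minimal sketch (the sequence length here is illustrative; recent PyTorch versions also provide this as nn.Transformer.generate_square_subsequent_mask):

import torch

# A "subsequent" mask for a length-5 target: -inf above the diagonal means
# position i can only attend to positions 0..i; softmax turns -inf scores
# into zero attention weight.
seq_len = 5
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0., 0., -inf, -inf, -inf],
#         [0., 0., 0., -inf, -inf],
#         [0., 0., 0., 0., -inf],
#         [0., 0., 0., 0., 0.]])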

(Figure: encoder sub-network)

(Figure: decoder sub-network)

(Figure: full model, an autoregressive encoder-decoder Transformer)

This architecture can be built using PyTorch as follows:

from torch import nn

# Inside the model's __init__; `channels` is the model width (d_model) and
# `self.dropout` is set elsewhere in the class.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=channels,
    nhead=8,
    dropout=self.dropout,
    dim_feedforward=4 * channels,
)
decoder_layer = nn.TransformerDecoderLayer(
    d_model=channels,
    nhead=8,
    dropout=self.dropout,
    dim_feedforward=4 * channels,
)

# Stack eight of each layer to form the encoder and decoder sub-networks.
self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=8)
self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=8)
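The snippet above only constructs the encoder and decoder stacks. For context, here is one way these pieces might be wired into a complete model. This is a minimal sketch, not the post's exact code: the TimeSeriesTransformer name, the scalar input/output projections, and the defaults (channels=64, dropout=0.1) are assumptions, and positional encoding is omitted for brevity.

import torch
from torch import nn


class TimeSeriesTransformer(nn.Module):
    """Minimal encoder-decoder Transformer for scalar time series (a sketch)."""

    def __init__(self, channels=64, dropout=0.1):
        super().__init__()
        # Project scalar observations into the model width and back.
        self.input_proj = nn.Linear(1, channels)
        self.output_proj = nn.Linear(channels, 1)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=8, dropout=dropout, dim_feedforward=4 * channels
        )
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=channels, nhead=8, dropout=dropout, dim_feedforward=4 * channels
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=8)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=8)

    def forward(self, src, tgt):
        # src: (src_len, batch, 1) history; tgt: (tgt_len, batch, 1) decoder input.
        # NOTE: positional encoding is omitted here; a real model should add
        # positional information to both src and tgt embeddings.
        src = self.input_proj(src)
        tgt = self.input_proj(tgt)
        # Causal mask: -inf above the diagonal blocks attention to future steps.
        tgt_mask = torch.triu(
            torch.full((tgt.size(0), tgt.size(0)), float("-inf"), device=tgt.device),
            diagonal=1,
        )
        memory = self.encoder(src)
        out = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.output_proj(out)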

Data

Whenever I implement a new method, I like to try it on synthetic data first: it is easier to understand and debug, and it reduces data complexity so the focus stays on the implementation/algorithm.

I wrote a small script that generates non-trivial time series with different periods, offsets, and patterns.

import random

import numpy as np

# Candidate periods for the synthetic series (assumed values; `periods` is
# defined outside the function in the original code).
periods = [7, 14, 28, 30]


def generate_time_series(dataframe):
    """Add a noisy, clipped-cosine "views" column with a random period/phase."""
    clip_val = random.uniform(0.3, 1)
    period = random.choice(periods)
    phase = random.randint(-1000, 1000)

    # Clipped cosine scaled by per-row amplitude and offset, plus Gaussian noise.
    dataframe["views"] = dataframe.apply(
        lambda x: np.clip(
            np.cos(x["index"] * 2 * np.pi / period + phase), -clip_val, clip_val
        )
        * x["amplitude"]
        + x["offset"],
        axis=1,
    ) + np.random.normal(
        0, dataframe["amplitude"].abs().max() / 10, size=(dataframe.shape[0],)
    )

    return dataframe
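To make the expected input explicit, here is a hypothetical way to build a dataframe and call the function. The column names index, amplitude, and offset follow the code above; the concrete values are made up.

import numpy as np
import pandas as pd

n_points = 200
df = pd.DataFrame(
    {
        "index": np.arange(n_points),         # time step
        "amplitude": np.full(n_points, 5.0),  # per-series scale
        "offset": np.full(n_points, 2.0),     # vertical shift
    }
)
df = generate_time_series(df)
print(df[["index", "views"]].head())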

(Figure: examples of generated time series)

The model is then trained on all of these time series simultaneously (a sketch of a possible training loop follows the loss curve below):

(Figure: training loss)
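The post does not show the training code itself. A standard teacher-forcing loop for this setup might look like the following sketch, assuming the TimeSeriesTransformer module from earlier and a dataloader that yields (history, future) pairs shaped (seq_len, batch, 1); num_epochs and the learning rate are assumed values.

import torch
from torch import nn

model = TimeSeriesTransformer()  # the module sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
num_epochs = 10                  # assumed setting

for epoch in range(num_epochs):
    for src, tgt in dataloader:  # assumed: yields (history, future) pairs
        # Teacher forcing: the decoder input is the target shifted right by
        # one step, and each output step is compared to the next true value.
        tgt_in, tgt_out = tgt[:-1], tgt[1:]
        optimizer.zero_grad()
        pred = model(src, tgt_in)
        loss = criterion(pred, tgt_out)
        loss.backward()
        optimizer.step()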

Results

We can now use the model to predict future values of these time series, with somewhat mixed results (a sketch of the autoregressive inference loop follows the examples below):

Incorrect:

(Figure: examples of mispredictions)

Correct:

(Figure: examples of correct predictions)
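For reference, the autoregressive decoding described earlier can be written as a greedy loop: each new prediction is appended to the decoder input and fed back in. This is an illustrative sketch; the autoregressive_predict helper is hypothetical, not from the post.

import torch

@torch.no_grad()
def autoregressive_predict(model, src, horizon):
    # src: (src_len, 1, 1) observed history for a single series.
    model.eval()
    tgt = src[-1:]                # seed the decoder with the last observation
    for _ in range(horizon):
        out = model(src, tgt)     # (current_tgt_len, 1, 1) predictions
        tgt = torch.cat([tgt, out[-1:]], dim=0)  # append the newest prediction
    return tgt[1:]                # drop the seed, keep the predicted horizon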

The results weren't as good as I expected, especially given that it's usually easy to make good predictions on synthetic data, but they're still promising.

In the bad examples, the model's predictions are slightly out of phase and somewhat overestimate the amplitude. In the good examples, the predictions track the ground truth closely while smoothing out the noise.

With some more debugging and hyperparameter tuning, I expect the results can be improved.


Source: blog.csdn.net/m0_59596937/article/details/128424817