[Data Mining] Time and Sequence Prediction Using LSTM

1. Description

        Every day, humans make passive predictions while performing ordinary tasks: crossing a street means estimating the speed and distance of an oncoming car, and catching a ball means guessing its trajectory and positioning our hands accordingly. These skills are acquired through experience and practice. Forecasting complex phenomena such as the weather or the economy, however, is difficult because of the many variables involved. This is where time and sequence forecasting comes in: it relies on historical data and mathematical models to predict future trends and patterns. In this article, we work through an example of making such predictions on the airline passengers dataset.

 

2. Part 1:

2.1 Mathematical concepts

        In the time series forecasting algorithm used in this article, instead of manually calculating the slope and intercept of a line, a neural network with an LSTM layer learns the underlying patterns and relationships in the data. The network is trained on a portion of the series and then used to make predictions on the remainder. The prediction for the next time step is based on the previous n_inputs time steps, analogous to predicting y(t+1) from y(t) in linear regression. Unlike linear regression, however, the predictions are not produced by a simple linear equation; they pass through the activation function of the LSTM layer. Activation functions let the model capture non-linear relationships, making it more effective at modeling complex patterns in time series data.
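To make this sliding-window idea concrete, here is a minimal sketch (the helper name make_windows is ours, not part of the article's code; the window length mirrors the n_inputs used later):

import numpy as np

def make_windows(series, n_inputs):
    # Split a 1-D series into (window, next value) pairs for supervised training
    X, y = [], []
    for i in range(len(series) - n_inputs):
        X.append(series[i:i + n_inputs])   # the previous n_inputs steps
        y.append(series[i + n_inputs])     # the value to predict
    return np.array(X), np.array(y)

# Toy series: each window of 3 values is used to predict the 4th
X, y = make_windows(np.arange(10), n_inputs=3)
print(X[0], '->', y[0])   # [0 1 2] -> 3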

2.2 Activation function

        The activation function used in this LSTM model is the Rectified Linear Unit (ReLU). It is popular in deep learning models because it is simple and effective at mitigating the vanishing gradient problem. Here, ReLU is applied to the output of each LSTM unit, introducing non-linearity and allowing the model to learn complex patterns in the data. ReLU has simple thresholding behavior: any negative input is mapped to zero and any positive input passes through unchanged, which also makes it computationally efficient.
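Since ReLU is simply max(0, x), a one-line NumPy sketch captures the whole function:

import numpy as np

def relu(x):
    # Negative inputs map to zero; positive inputs pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]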

3. Part 2:

3.1 Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('airline-passengers.csv', index_col='Month', parse_dates=True)
df.index.freq = 'MS'
df.shape
df.columns
plt.figure(figsize=(20, 4))
plt.plot(df.Passengers, linewidth=2)
plt.show()

        The code imports three important libraries: numpy, pandas, and matplotlib. The pandas library reads the "airline-passengers.csv" file and sets the "Month" column as the index, so the data can be analyzed over time; setting df.index.freq = 'MS' marks the series as monthly (month-start) data. The matplotlib library then draws a line graph of the number of airline passengers over time, and plt.show() displays it. This is a standard first step for anyone analyzing time series data: load it with pandas and visualize the trend with matplotlib.

nobs = 12
df_train = df.iloc[:-nobs]
df_test = df.iloc[-nobs:]
df_train.shape
df_test.shape

        This code splits the time series dataframe "df" into train and test sets, "df_train" and "df_test". The "nobs" variable is set to 12, meaning the last 12 observations of "df" are held out for testing while the rest are used for training. The training set consists of all but the last 12 rows of "df", and the test set of only the last 12 rows. The "shape" attribute then confirms the split: for this 144-month dataset, df_train is (132, 1) and df_test is (12, 1). Splitting the data this way prepares it for model fitting and out-of-sample evaluation.

3.2 Model Architecture

from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(df_train)
scaled_train = scaler.transform(df_train)
scaled_test = scaler.transform(df_test)
n_inputs = 12
n_features = 1
generator = TimeseriesGenerator(scaled_train, scaled_train, length=n_inputs, batch_size=1)

for i in range(len(generator)):
    X, y = generator[i]   # X: one window of 12 scaled values; y: the value that follows it
    print(f'{X.flatten()} -> {y}')

        This code snippet demonstrates how to use Keras' "TimeseriesGenerator" class and scikit-learn's "MinMaxScaler" class to generate input and output arrays for a time series forecasting model. The code first creates a "MinMaxScaler" instance and fits it to the training data ("df_train"); the scaler is then used to transform both the training and test sets into "scaled_train" and "scaled_test". The number of time steps ("n_inputs") is set to 12 and the number of features ("n_features") to 1. A "TimeseriesGenerator" object is created from "scaled_train" with a window length of "n_inputs" and a batch size of 1. Finally, the loop iterates over the generator and prints the input and output arrays for each window: "X" holds the 12 input values and "y" the value to predict, with "flatten()" converting the input array to 1D for easier printing. Overall, this is the sliding-window method of preparing time series data for forecasting models.

X.shape

This code returns the shape of the array "X" left over from the last loop iteration. The "shape" attribute of a NumPy array is a tuple giving the size of each dimension. Here each batch produced by the generator has shape (1, 12, 1): one sample of 12 time steps with one feature, which is exactly the 3D input format the LSTM layer expects.

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

model = Sequential()
# LSTM layer with 200 units; input is a window of n_inputs steps with n_features per step
model.add(LSTM(200, activation='relu', input_shape=(n_inputs, n_features)))
# Single output neuron: the forecast for the next time step
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

model.summary()

        This code demonstrates how to use Keras to create an LSTM neural network for time series forecasting. First, the necessary classes are imported: Sequential, Dense, and LSTM. The model is created as a "Sequential" object, and an LSTM layer with 200 units, a "relu" activation function, and an input shape defined by "n_inputs" and "n_features" is added. The LSTM layer's output is passed to a "Dense" layer with a single output neuron. The model is compiled with the "adam" optimizer and the mean squared error ("mse") loss function. The "summary()" method then displays the architecture, including the number of parameters and the output shape of each layer. This is an easy-to-follow template that can be adapted to other datasets and forecasting problems.
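As a rough sanity check on what summary() reports, the parameter count can be reproduced by hand (a sketch assuming Keras' standard LSTM parameterization: four gates, each with an input kernel, a recurrent kernel, and a bias):

# Each of the LSTM's 4 gates has an input kernel, a recurrent kernel, and a bias
units, n_features = 200, 1
lstm_params = 4 * (units * (n_features + units) + units)   # 4 * (200*201 + 200) = 161,600
dense_params = units * 1 + 1                               # 200 weights + 1 bias = 201
print(lstm_params + dense_params)                          # 161,801 trainable parameters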

3.3 Training phase

model.fit(generator, epochs=50)

        This code trains the LSTM model for 50 epochs using Keras' "fit()" method. The "TimeseriesGenerator" object supplies batches of input/output pairs for the model to learn from, and "fit()" updates the model's parameters via backpropagation, using the loss function and optimizer defined at compile time. Through training, the model learns the patterns it needs to make predictions on new, unseen data.

# Plot the training loss recorded at each epoch
plt.plot(model.history.history['loss'])
plt.show()

# Take the last 12 months of scaled training data and reshape them to the
# (batch, time steps, features) format the LSTM expects
last_train_batch = scaled_train[-12:]
last_train_batch = last_train_batch.reshape(1, 12, 1)
last_train_batch

model.predict(last_train_batch)

This code uses the trained LSTM model to make a prediction on a new data point. The last 12 months of scaled training data are selected and reshaped into the format the model expects. The "predict()" method takes the reshaped batch as input and returns the predicted (scaled) value for the next time step in the series, i.e. the first month beyond the training data.

scaled_test[0]

This prints the first element of the scaled test data array. The "scaled_test" variable is a NumPy array of test data transformed with the "MinMaxScaler" fitted on the training set, so its first element is the scaled value of the first time step in the test period, directly comparable with the prediction above.

3.4 Forecast

y_pred = []

# Seed the rolling window with the last n_inputs points of the training data
first_batch = scaled_train[-n_inputs:]
current_batch = first_batch.reshape(1, n_inputs, n_features)

for i in range(len(scaled_test)):
    # Forecast one step ahead from the current window
    pred = model.predict(current_batch)[0]
    y_pred.append(pred)
    # Slide the window: drop the oldest step and append the new prediction
    current_batch = np.append(current_batch[:, 1:, :], [[pred]], axis=1)


y_pred


scaled_test

This code uses the trained LSTM model to generate predictions for the whole test horizon. A for loop runs once per element of the scaled test data. In each iteration, the model's "predict()" method produces a one-step-ahead forecast from the current window; the predicted value is appended to "y_pred", and the window is updated so the prediction itself becomes the most recent input. This recursive scheme means later forecasts are built on earlier ones. Finally, "y_pred" is printed alongside "scaled_test" to compare predicted and actual scaled values, a critical step for evaluating the model on the test data.

df_test

# Undo the min-max scaling, then round to whole passenger counts
y_pred_transformed = scaler.inverse_transform(y_pred)
y_pred_transformed = np.round(y_pred_transformed, 0)
y_pred_final = y_pred_transformed.astype(int)
y_pred_final

This code uses the scaler's "inverse_transform()" method to map the predictions back to the original scale (passenger counts). The values are rounded to the nearest integer with "np.round()" and cast to integers with "astype()". The resulting array "y_pred_final" holds the final predicted passenger counts for the test period, which is what allows the model's accuracy to be judged on the original scale of the data.

df_test.values, y_pred_final

df_test['Predictions'] = y_pred_final

df_test

        The code above adds the predictions generated by the LSTM model to the original test dataset. First, the "values" attribute extracts the contents of the "df_test" dataframe so they can be viewed side by side with "y_pred_final". Then a new column called "Predictions" is added to "df_test" to store the predicted values, and "df_test" is printed with the new column. This makes it easy to compare the actual test values against the predictions and assess the model's accuracy.

plt.figure(figsize=(15, 6))
plt.plot(df_train.index, df_train.Passengers, linewidth=2, color='black', label='Train Values')
plt.plot(df_test.index, df_test.Passengers, linewidth=2, color='green', label='True Values')
plt.plot(df_test.index, df_test.Predictions, linewidth=2, color='red', label='Predicted Values')
plt.legend()
plt.show()

This code block uses matplotlib to generate the plot. It first sets the figure size, then plots the training data as a black line, the true test values as a green line, and the predicted test values as a red line. Finally, it adds a legend and displays the figure with show().

3.5 Mean squared error

        The mean squared error (MSE) measures how close predictions are to the actual points: it is the average of the squared differences between predicted and actual values. The square root of the MSE is the root mean squared error (RMSE), a common measure of forecast accuracy. In the code block below, the RMSE is computed with the mean_squared_error function from the sklearn.metrics module and the sqrt function from the math module, and is used to evaluate the accuracy of the LSTM model's predictions against the true values in the test set.

from sklearn.metrics import mean_squared_error
from math import sqrt

sqrt(mean_squared_error(df_test.Passengers, df_test.Predictions))

        This code calculates the root mean squared error (RMSE) between the actual passenger values (df_test.Passengers) and the predicted passenger values (df_test.Predictions) in the test set. RMSE is a common metric for evaluating regression models: it measures the average distance between predicted and actual values, based on the squared differences between them. Because squaring penalizes large errors more severely than small ones, RMSE is a good indicator of the overall accuracy of a model's predictions.
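For illustration, the same computation written out by hand (our sketch, equivalent to the sklearn helper above):

import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: the square root of the average squared difference
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

print(rmse([100, 120], [104, 117]))   # sqrt((16 + 9) / 2) ≈ 3.54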

        In summary, we implemented a time series forecasting model in Keras using an LSTM. We trained the model on the monthly airline passenger dataset and used it to forecast the next 12 months. The model performed well, with a root mean squared error of about 30.5. Plots of the training, true, and predicted values show that the model captures the general trend and seasonality in the data, demonstrating the power of LSTMs at learning complex temporal relationships and their potential for accurate forecasting.

4. Conclusion

        The example in this article is a typical time series workflow that can serve as a classic reference. Readers may want to spend some more time digesting the case: it shows that tools like LSTM are useful not only for NLP but for any time series problem, such as stock forecasting.
