2023.7.16 The fifty-ninth weekly report

Table of contents

foreword

Literature reading: Spatiotemporal LSTM models for prediction across multiple spatiotemporal scales

background

Ideas for this article

The problem this article addresses

methodology

SPATIAL

Automated Machine Learning Models

data processing

model performance

the code

LSTM multivariate predictive model written in Python

Summary


foreword

This week, I studied an article that uses LSTM to solve problems related to spatiotemporal prediction. The paper presents a deep learning (DL) framework based on bidirectional LSTM to extend the learning of time-series signals. The resulting model provides a scalable forecasting system that adjusts naturally to spatiotemporal patterns. The cost of model training is reduced by an order of magnitude because a single model is trained for each dataset rather than for each signal. SPATIAL adjusts naturally to missing data, and the random distribution of missing data between sensors can be leveraged to lessen the effect of data gaps on learning (i.e., since sensors typically exhibit data gaps at different times, combining information from multiple sensors in a single framework can improve learning on noisy and error-prone data). Finally, extending the recurrent structure of LSTM-type approaches for time-series applications to the spatiotemporal direction has the potential to translate to many different industrial applications such as weather, transport, and epidemiology. In addition, on the coding side, I am currently reviewing the fundamentals intensively.

Literature reading: Spatiotemporal LSTM models for prediction across multiple spatiotemporal scales

--Fearghal O'Donncha, Yihao Hu, Paulito Palmes, Meredith Burke, Ramon Filgueira, Jon Grant,
A spatio-temporal LSTM model to forecast across multiple temporal and spatial scales,
Ecological Informatics,
Volume 69,
2022,
101687,
ISSN 1574-9541,
https://doi.org/10.1016/j.ecoinf.2022.101687.

background

Accurate predictions of ocean processes require an understanding of spatial and temporal dependencies. These relationships are expressed in the classic Navier-Stokes equations, which underlie most modern ocean forecasting systems (Chorin, 1968). On the other hand, machine learning (ML) has achieved great success on problems with clear spatial (e.g. image processing) or temporal (e.g. speech recognition) dependencies. However, for space-time applications, the science is still underdeveloped.

Traditionally, predictions of ocean processes have relied on physics-based methods that solve a set of governing equations. The main challenges of physics-based approaches are the enormous computational expense of running them at high resolution over a wide range of spatial and temporal scales (often requiring high-performance computing facilities), and the complexity of configuring and parameterizing the models, which often requires expert users.

Machine learning (ML) methods for predicting ocean processes are in their infancy and have traditionally been limited by data sparsity. To circumvent this limitation, there has been recent interest in using ML to develop low-cost approximations of, or alternatives to, physics-based models. However, because the output of the physics-based models is itself used to train the deep learning networks, accuracy and spatial extent are limited to the level prescribed by the traditional methods.

In summary, machine learning applications for ocean processes have mainly focused on processing gridded data from numerical model outputs, or large volumes of surface water data sampled through remote sensing. These datasets are suitable for image processing algorithms, which are widely used in machine learning in various fields such as facial recognition and computer vision. Applications trained on in situ sensor data tend to treat each time series signal independently as a univariate or multivariate regression problem. Naturally, this loses information on spatial dependencies that is crucial for a comprehensive assessment of natural systems.

Ideas for this article

This paper proposes a framework that aims to improve model accuracy by explicitly learning the spatial and temporal components of distinct but related time-series signals. Our approach is derived from the Long Short-Term Memory (LSTM) algorithm, the most widely used deep neural network framework for time series (Hochreiter and Schmidhuber, 1997). An advantage of LSTM is that it can adapt how much past history (memory) it takes into account for a given time-series signal. Previous applied studies rarely considered temporal and spatial dependencies together. We refer to our framework as Spatial-LSTM or SPATIAL.

With SPATIAL, we extend these capabilities to connections between different sensors. The algorithm was applied to three real-world datasets: ocean current velocity, temperature, and dissolved oxygen. Each dataset exhibits completely different characteristics. Ocean current patterns are influenced by external physical drivers such as tidal effects and surface wind stresses, as well as density-driven currents from changes in temperature and salinity. Variations in temperature can be explained by a variety of exogenous processes, such as solar radiation, air temperature, seabed heat transfer, and external heat fluxes from rivers and the open ocean (Pidgeon and Winant, 2005). Ocean oxygen content, on the other hand, is a biogeochemical process influenced by hydrodynamics (horizontal and vertical mixing, residence time, etc.), weather (higher temperature reduces oxygen solubility), nutrient loading (anthropogenic enrichment) and respiration (Caballero-Alfonso et al., 2015). The combination of nonlinear response and sensitivity to multiple opaque variables makes these challenging prediction problems for traditional physical modeling approaches. Our approach instead aims to learn the spatial and temporal patterns of the data to provide a more flexible predictive framework.

The problem this article addresses

This paper addresses two fundamental challenges associated with contextual applications of machine learning:

1) Data sparsity, especially in challenging marine environments.

2) Environmental datasets are inherently interconnected in both spatial and temporal directions, while classical ML methods only consider one of them at a time.

methodology

According to the authors, SPATIAL is the first attempt to use a bidirectional LSTM model in the spatial and temporal directions of a time-series signal. More specifically, the signals we wish to predict (ocean currents, temperature, and dissolved oxygen) are known to have certain dependencies on neighboring signals based on geographic proximity or domain expertise. We implement a bidirectional LSTM model along the spatial dimension of the relevant sensors and train the model to learn both spatial and temporal structure.

SPATIAL

An extension of the parameter sharing supported by recurrent networks is the bidirectional LSTM, which processes sequential data in both backward and forward directions (Schuster and Paliwal, 1997). In effect, it trains two LSTMs on the input data: the first on the original input sequence and the second on a reversed copy of it (Imrana et al., 2021). This applies naturally to sequential or time-series data, and has demonstrated improved performance over unidirectional LSTMs in areas such as language processing (Wang et al., 2016) and speech recognition (Graves et al., 2013b). Intuitively, a bidirectional LSTM processes input data in two directions using a forward hidden layer and a backward hidden layer. While bidirectional LSTMs have been applied to time-series forecasting before, the authors state that this paper is the first to apply them to both spatial and temporal dimensions.
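To make the "two LSTMs, one on a reversed copy" idea concrete for myself, here is a minimal NumPy sketch of a bidirectional LSTM forward pass. It is only an illustration under my own assumptions: the weights are random, the gate ordering (i, f, g, o) and the helper names `init_lstm_params`, `lstm_forward` and `bidirectional_lstm_forward` are hypothetical, and it is not the SPATIAL implementation or the Keras one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm_params(input_dim, hidden_dim, rng):
    """Random weights for one LSTM direction (gates stacked as i, f, g, o)."""
    scale = 0.1
    return {
        "W": rng.normal(0.0, scale, (input_dim, 4 * hidden_dim)),   # input weights
        "U": rng.normal(0.0, scale, (hidden_dim, 4 * hidden_dim)),  # recurrent weights
        "b": np.zeros(4 * hidden_dim),                              # biases
    }

def lstm_forward(x, params):
    """Run an LSTM over x of shape (steps, input_dim); return (steps, hidden_dim)."""
    hidden_dim = params["U"].shape[0]
    h = np.zeros(hidden_dim)  # hidden state
    c = np.zeros(hidden_dim)  # cell state
    outputs = []
    for x_t in x:
        z = x_t @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g          # forget part of the old memory, add new candidate
        h = o * np.tanh(c)         # expose a gated view of the memory
        outputs.append(h)
    return np.stack(outputs)

def bidirectional_lstm_forward(x, fwd_params, bwd_params):
    """Concatenate a forward pass with a pass over the reversed sequence."""
    out_fwd = lstm_forward(x, fwd_params)
    out_bwd = lstm_forward(x[::-1], bwd_params)[::-1]  # re-align to the original order
    return np.concatenate([out_fwd, out_bwd], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))      # 6 sequence positions, 3 features each
fwd = init_lstm_params(3, 4, rng)
bwd = init_lstm_params(3, 4, rng)
out = bidirectional_lstm_forward(x, fwd, bwd)
print(out.shape)  # (6, 8)
```

At every position the output combines a summary of everything before it (forward half) and everything after it (backward half), which is why the same mechanism can be run across an ordered set of sensors as well as across time.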

Figure 1(b) provides a schematic illustration of our spatial implementation. A unidirectional LSTM is applied in the time direction of each sensor, and across sensors, a series of stacked bidirectional layers enables learning across different time series. The number of stacked layers is a hyperparameter chosen during training. The input data to the network consists of an m × n × l array (where m is the number of sensors, n is the number of time points, and l is the number of lags used to make predictions), and the labels consist of the corresponding m × n × k array, where k is the prediction window. In this study, k is equal to 1 because we used a rolling forecast implementation. That is, the model makes a prediction for the specified time step, then takes the actual value for the next hour from the test set and feeds it to the model to predict the following time step.
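The array shapes and the rolling forecast described above can be sketched with NumPy. This is my own illustration, not the code from the paper: the helper names `make_lagged_arrays` and `rolling_forecast` are hypothetical, and a simple persistence rule (repeat the last value) stands in for the trained network.

```python
import numpy as np

def make_lagged_arrays(series, n_lags, horizon=1):
    """series: (m, T) array, one row per sensor.
    Returns X of shape (m, n, n_lags) and y of shape (m, n, horizon),
    where n = T - n_lags - horizon + 1 is the number of usable time points."""
    m, T = series.shape
    n = T - n_lags - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([series[:, t:t + n_lags] for t in range(n)], axis=1)
    y = np.stack([series[:, t + n_lags:t + n_lags + horizon] for t in range(n)], axis=1)
    return X, y

def rolling_forecast(predict_fn, history, actuals, n_lags):
    """One-step-ahead rolling forecast: after each prediction, the observed
    value (not the prediction) is appended to the history."""
    history = history.copy()
    preds = []
    for t in range(actuals.shape[1]):
        window = history[:, -n_lags:]                 # (m, n_lags)
        preds.append(predict_fn(window))              # (m,)
        history = np.concatenate([history, actuals[:, t:t + 1]], axis=1)
    return np.stack(preds, axis=1)                    # (m, steps)

# Toy data: 2 sensors, 10 time steps
series = np.arange(2 * 10, dtype=float).reshape(2, 10)
X, y = make_lagged_arrays(series, n_lags=3, horizon=1)
print(X.shape, y.shape)  # (2, 7, 3) (2, 7, 1)
print(X[0, 0], y[0, 0])  # [0. 1. 2.] [3.]

# Placeholder model (assumption): predict the last observed value
persistence = lambda window: window[:, -1]
preds = rolling_forecast(persistence, series[:, :7], series[:, 7:], n_lags=3)
print(preds.tolist())  # [[6.0, 7.0, 8.0], [16.0, 17.0, 18.0]]
```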

Figure 2 summarizes the architecture of our spatial model. Individual sensor time-series vectors are combined into matrices of size m × n × l, where l represents the number of lags the model uses to make predictions. This can be thought of as a model hyperparameter to be optimized during model training. Data is passed through a series of stacked bidirectional LSTM layers, and the exact number is optimized during model design and training. For our experiments, we explored 1 to 10 LSTM layers while employing a Rectified Linear Unit (ReLU) activation function (Nair and Hinton, 2010).

An optional masking layer allows the user to specify that certain data points are masked or removed during model training to eliminate outliers. However, this feature should be used judiciously in LSTM implementations, since masking of different points can affect the temporal continuity of the data and hinder model learning. For this reason, we did not apply a mask in the model, but used the data interpolation routine described in Section 3.4 of the paper.
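The interpolation routine from Section 3.4 of the paper is not reproduced in this report, so the following is only my guess at the simplest version of the idea: fill gaps (NaN values) by linear interpolation in time so that the sequence stays continuous. The function name `interpolate_gaps` and the choice of linear interpolation are my assumptions.

```python
import numpy as np

def interpolate_gaps(values):
    """Linearly interpolate NaNs in a 1-D series.
    Gaps at the edges take the nearest valid value (np.interp clamps)."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    if missing.all():
        raise ValueError("series contains no valid observations")
    idx = np.arange(len(values))
    filled = values.copy()
    filled[missing] = np.interp(idx[missing], idx[~missing], values[~missing])
    return filled

sensor = np.array([np.nan, 10.0, np.nan, np.nan, 16.0, 17.0, np.nan])
print(interpolate_gaps(sensor))  # [10. 10. 12. 14. 16. 17. 17.]
```

Compared with masking, this keeps every time step in place, at the cost of inventing values inside long gaps, so it is probably only reasonable when gaps are short.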

We implemented the algorithm in Python using the Bidirectional layer wrapper from the Keras library. Keras allows us to efficiently "convert" regular (unidirectional) LSTMs to bidirectional LSTMs through its high-level API (Gulli and Pal, 2017). The full source code of SPATIAL is publicly released under the Apache license at https://github.com/IBM/spatial-lstm to ensure reproducibility of the framework.

Figure 1. (a) Schematic of a classic LSTM architecture for predicting a single time-series signal, while (b) presents our spatial implementation, where spatial (sensor-to-sensor) and temporal patterns are explicitly learned by the network. In this representation, m is the number of sensors, n is the training time period, and k is the time period we wish to predict.

Figure 2. SPATIAL ingests multidimensional inputs, enabling deep neural networks to extract features from different sensors and use the learned information to predict time series for each sensor. In other words, for each sensor, predictions are based not only on its previous time series, but also on information from other sensors. The input data consists of m sensor datasets with n time steps. To generate a forecast, the values from the previous l time steps are passed to the model to generate the forecast for the corresponding k time steps.

Automated Machine Learning Models

Gartner, a respected enterprise research and advisory firm, identified the automation of ML model deployment as one of the top ten key technology trends for 2020 (Cearley et al., 2019). Known as AutoML or AutoAI, these approaches are designed to help automate the steps and processes involved in the lifecycle of creating, deploying, managing, and operating AI models (Dickson, 2020). Gartner highlighted their ability to "democratize AI" by enabling the development of low-code ML models that do not require a high level of data science experience to set up and parameterize (Cearley et al., 2019). Various AutoML or AutoAI products exist, the most prominent of which are IBM's AutoAI, Google's AutoML, and H2O.ai's H2O.

The basic idea of the AutoAI approach can be thought of as "AI for AI". Using machine learning, it is designed to query user data and discover optimal structures, data transformations, and tunable parameters (or hyperparameters) for machine learning regression and classification. AutoAI methods are particularly valuable for benchmarking studies because they can be easily replicated by others and do not require a high level of data science expertise. Many tools, such as IBM AutoAI, offer free plans that are especially suitable for scientific and academic research.

To evaluate the performance of SPATIAL relative to existing methods, we included the two best baseline models produced by AutoML methods (Drori et al., 2018):

  • IBM AutoAI (IBM, 2020): A technology aimed at automating the end-to-end AI lifecycle, from data cleaning to algorithm selection, to model deployment and monitoring in ML workflows (Wang et al., 2020).
  • AutoMLPipeline (AMLP) (Palmes, 2020): An open-source toolbox that provides semi-automated ML model generation and prediction capabilities.

The models generated by the aforementioned AutoML techniques were used to benchmark the predictive skill of the SPATIAL framework, provide additional insights into the data characteristics of environmental datasets, and evaluate the ease of deployment of different model frameworks. While the AutoML frameworks provide a large number of algorithms as options, we restrict each framework to a single algorithm for all sensors to provide a more standardized comparison and simplify interpretation.

Based on average performance, the models selected for AutoAI and AMLP are XGBoost and Random Forest, respectively. Random forests (RF) excel in complex forecasting problems characterized by large numbers of explanatory variables and nonlinear dynamics. RF is a classification and regression method based on the aggregation of a large number of decision trees. A decision tree is a conceptually simple yet powerful predictive tool that breaks down a dataset into smaller and smaller subsets while incrementally building the associated tree. The resulting intuitive paths from explanatory variables to outcomes help provide models that are easy to interpret.

While XGBoost shares many characteristics and strengths with RF (namely, interpretability, predictive performance, and simplicity), a key difference that facilitates performance gains is that its decision trees are built sequentially rather than independently. The XGBoost algorithm was published by researchers at the University of Washington in 2016, and it has been credited with winning numerous Kaggle competitions and is used in several industry applications. XGBoost provides algorithmic improvements such as a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, as well as optimizations for distributed computing, to build a scalable tree boosting system that can handle tens of billions of examples (Chen and Guestrin, 2016).
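To see the difference between "independent" and "sequential" trees, I wrote a small NumPy sketch using one-split regression trees (stumps). The bagging part mimics the random forest idea (each stump is fit on its own bootstrap sample, then averaged), and the boosting part is plain gradient boosting on squared error (each stump is fit to the residuals left by the previous ones). It is not XGBoost itself, which adds regularization and many optimizations; all function names and the toy data here are my own assumptions.

```python
import numpy as np

def fit_stump(x, y):
    """Best single-split regression stump on a 1-D feature (minimum squared error)."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left, right = y[x <= threshold], y[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    if best is None:  # all x identical: fall back to a constant prediction
        return x[0], y.mean(), y.mean()
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_bagging(x, y, n_trees, rng):
    """Random-forest style: stumps are fit independently on bootstrap samples."""
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))
        stumps.append(fit_stump(x[idx], y[idx]))
    return stumps

def predict_bagging(stumps, x):
    return np.mean([predict_stump(s, x) for s in stumps], axis=0)

def fit_boosting(x, y, n_trees, learning_rate=0.5):
    """Boosting style: each stump is fit to the residuals of the current ensemble."""
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps

def predict_boosting(model, x, learning_rate=0.5):
    base, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, size=x.shape)

bagged = fit_bagging(x, y, n_trees=50, rng=rng)
boosted = fit_boosting(x, y, n_trees=50)

mse_bag = np.mean((predict_bagging(bagged, x) - y) ** 2)
mse_boost = np.mean((predict_boosting(boosted, x) - y) ** 2)
print("bagging MSE :", mse_bag)
print("boosting MSE:", mse_boost)
```

With such weak learners, averaging independent stumps cannot represent much more than a single step, while sequential fitting keeps correcting what is left over. Real random forests use deep trees, so this toy comparison says nothing about which method is better in practice.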

data processing

Schematic diagram of the data preprocessing and interpolation methods used in this paper.

model performance

Figure 4 provides a time-series plot comparing the five models to observations (we select one sensor from each dataset to illustrate). All models closely capture short-term fluctuations and seasonal patterns in the data. The same is true for the ADCP (acoustic Doppler current profiler) current-velocity data, albeit with relatively high volatility. In general, the deep learning models outperform the classical machine learning baselines, which may be due to the enhanced ability of deep learning methods to adapt to the nonlinearity of the datasets.

Figure 4. SPATIAL performance (green lines) compared to two baseline ML and two DL models.

The baseline ML models are an XGBoost model (pink line) configured and deployed by AutoAI and a random forest model (orange line) generated using the AMLP pipeline. The DL models are a univariate LSTM (purple) and a CNN model (blue). Black circles represent observations. Shown are temperature (top), dissolved oxygen (middle), and current velocity (bottom). Each model produces a 24-hour forecast, with a 30-minute stride (a new forecast is generated every 30 minutes).

Performance comparisons show that existing algorithms provide strong predictive skill on these datasets. This performance is broadly matched or improved by our SPATIAL model, which reports the lowest error across all three datasets (Table 3 of the paper), slightly outperforming the CNN and LSTM models. That is, we achieve performance comparable to the existing state of the art while using more straightforward data preprocessing and a simpler model generation and deployment pipeline that, by grouping sensor data, requires fewer models. Our SPATIAL model has several practical advantages over existing algorithms, in particular:

  1. Data preprocessing pipelines are simplified, as the SPATIAL pipeline loads all data into a single array that is fed to the network (the model does not rely on exogenous variables or feature transformations).
  2. By feeding in data from multiple sensors that are geographically close or share certain features, the network can capture physical relationships that exist in nature, which can potentially be used to regularize the model.
  3. Processing all sensors at the same time can greatly improve computational efficiency: only one model is trained instead of one model per signal.
  4. Finally, as the amount of data increases (number of sensors and duration of the study period), deep learning models are expected to achieve higher performance. The SPATIAL approach, which allows a single model to learn from multiple sensors, is expected to amplify these performance gains compared to a single CNN or LSTM model.

Points 1 and 2 above are closely related. In forecasting, we want to use the simplest model that best reflects what we know about the system. The proposed SPATIAL framework simplifies the model training and deployment process for environmental modeling because:

  • Only one model needs to be trained and maintained;
  • No transformation or matrix manipulation of input data is required;
  • Time series are generated naturally for the desired forecast period, rather than as individual values from separate direct models (although classical methods can produce time-series forecasts iteratively through multi-step forecasting, their forecasting skill is often limited (Hamzaçebi et al., 2009)).
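The "iterative multi-step forecasting" used by classical methods, mentioned in the last bullet, can be sketched as follows. A least-squares autoregressive (AR) model stands in for the classical one-step model; this choice, the synthetic sine-wave data and the function names are my own assumptions and do not come from the paper.

```python
import numpy as np

def fit_ar(series, n_lags):
    """Least-squares AR(n_lags) model with intercept; returns the coefficient vector."""
    rows = [series[t:t + n_lags] for t in range(len(series) - n_lags)]
    A = np.column_stack([np.array(rows), np.ones(len(rows))])
    targets = series[n_lags:]
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return coef

def predict_one_step(coef, window):
    """Predict the next value from the last n_lags observations."""
    return float(window @ coef[:-1] + coef[-1])

def iterative_forecast(coef, history, horizon):
    """Multi-step forecast that feeds each prediction back in as an input,
    so errors made early in the horizon can propagate to later steps."""
    n_lags = len(coef) - 1
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        nxt = predict_one_step(coef, np.array(window))
        forecasts.append(nxt)
        window = window[1:] + [nxt]
    return np.array(forecasts)

rng = np.random.default_rng(1)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 0.05, size=t.shape)
train, test = series[:250], series[250:274]

coef = fit_ar(train, n_lags=6)
forecasts = iterative_forecast(coef, train, horizon=24)
abs_err = np.abs(forecasts - test)
print("error at step 1 :", abs_err[0])
print("error at step 24:", abs_err[-1])
```

Unlike the rolling one-step setting, no observed values are fed back here, so the whole 24-step forecast depends on the model's own earlier outputs.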

the code

LSTM multivariate predictive model written in Python

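import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, LSTM

# Read the dataset (assumes all columns are numeric and the last column is the target)
data = pd.read_csv('dataset.csv')

# Convert the dataset to a NumPy array
data = np.array(data, dtype=float)

# Split into training and test sets (first 80% / last 20%, keeping time order)
train_size = int(len(data) * 0.8)
train_data = data[:train_size]
test_data = data[train_size:]

# Min-max normalization per column, using training-set statistics only
min_value = np.min(train_data, axis=0)
max_value = np.max(train_data, axis=0)
scalar = max_value - min_value
scalar[scalar == 0] = 1.0  # avoid division by zero for constant columns
train_data = (train_data - min_value) / scalar
test_data = (test_data - min_value) / scalar

# Function that generates multivariate sequence data
def generate_multivariate_sequences(dataset, num_steps):
    X, y = [], []
    for i in range(len(dataset) - num_steps):
        X.append(dataset[i:i + num_steps, :-1])  # features over the previous num_steps rows
        y.append(dataset[i + num_steps, -1])     # target value at the next step
    return np.array(X), np.array(y)

# Hyperparameters
num_steps = 7
input_dim = data.shape[1] - 1  # every column except the target
hidden_dim = 10
output_dim = 1
epochs = 100
batch_size = 32

# Generate sequence data for the training and test sets
X_train, y_train = generate_multivariate_sequences(train_data, num_steps)
X_test, y_test = generate_multivariate_sequences(test_data, num_steps)

# Define the LSTM model
model = Sequential()
model.add(LSTM(hidden_dim, input_shape=(num_steps, input_dim)))
model.add(Dense(output_dim))

# Compile the model
model.compile(loss='mse', optimizer='adam')

# Train the model (the test set is also used as validation data here)
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(X_test, y_test))

# Predict on the test set; flatten (N, 1) to (N,) so it lines up with y_test
y_pred = model.predict(X_test).ravel()

# Undo the normalization using the target column's statistics
y_pred = y_pred * scalar[-1] + min_value[-1]
y_test = y_test * scalar[-1] + min_value[-1]

# Compute the root mean square error
rmse = np.sqrt(np.mean(np.square(y_pred - y_test)))
print('RMSE:', rmse)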

Notes:

  • Imports: Import the required libraries.
  • Data loading and preprocessing: Read the dataset, convert it to NumPy array format, split it into a training set and a test set, and perform normalization.
  • Sequence function: Define a function that generates multivariate sequence data. Its inputs are the dataset and the number of time steps, and its outputs are the input sequences X and the target values y.
  • Hyperparameters: Set the number of time steps, input dimension, hidden layer dimension, output dimension, number of epochs, and batch size.
  • Sequence generation: Generate the sequence data of the training set and the test set, using the generate_multivariate_sequences function just defined.
  • Model definition: Define the LSTM model, consisting of an LSTM layer and a fully connected layer.
  • Compilation: Compile the model, using the mean squared error as the loss function and the Adam optimizer.
  • Training: Train the model on the training set, using the test set for validation.
  • Prediction: Predict on the test set and undo the normalization to obtain the predicted and actual values on the original scale.
  • Evaluation: Calculate the root mean square error (RMSE), which measures the prediction accuracy of the model.
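Since I am still getting used to array shapes, here is a small check of the windowing function on synthetic data, without Keras. The random data and the stand-in for the model output are my own assumptions. It also shows a shape pitfall to watch for when computing the RMSE: Keras `predict` returns an array of shape (N, 1), while the targets have shape (N,).

```python
import numpy as np

def generate_multivariate_sequences(dataset, num_steps):
    X, y = [], []
    for i in range(len(dataset) - num_steps):
        X.append(dataset[i:i + num_steps, :-1])  # features over num_steps rows
        y.append(dataset[i + num_steps, -1])     # target at the next step
    return np.array(X), np.array(y)

rng = np.random.default_rng(42)
data = rng.normal(size=(50, 6))  # 50 time steps: 5 feature columns + 1 target column

X, y = generate_multivariate_sequences(data, num_steps=7)
print(X.shape, y.shape)  # (43, 7, 5) (43,)

# Stand-in for model.predict(X): a "perfect" prediction with shape (N, 1)
y_pred = y.reshape(-1, 1)

wrong = y_pred - y          # broadcasts (43, 1) against (43,) into (43, 43)
right = y_pred.ravel() - y  # element-wise difference, shape (43,)
print(wrong.shape, right.shape)  # (43, 43) (43,)

rmse_wrong = np.sqrt(np.mean(np.square(wrong)))
rmse_right = np.sqrt(np.mean(np.square(right)))
print(rmse_wrong > 0, rmse_right == 0)  # True True
```

Even a perfect prediction gets a non-zero RMSE if the shapes are not aligned first, so flattening the prediction before subtracting is the safer habit.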

Summary

I think this week's paper was a very good choice, and it fits the direction of my thesis well. If I have time, I will study it in more depth.

The main direction now is still to understand and use the code. This is the ability I currently lack, and I should spend more time here.

Origin blog.csdn.net/weixin_43971717/article/details/131755945