Short-term OD Passenger Flow Prediction of Urban Rail Transit under Partial Observability Based on Deep Learning

1. Article information

This post introduces the article "Deep learning for short-term origin–destination passenger flow prediction under partial observability in urban railway systems", published in Neural Computing and Applications in 2022.

2. Summary

Abstract: Short-term OD flow prediction is an important part of urban rail transit operation planning, control and management. While forecasting of station entry and exit demand has been studied extensively, OD passenger flow forecasting has received much less attention. A key challenge in short-term OD flow forecasting is the partial observability of OD flow information, because some trips are not completed within a given time interval. In this paper, a novel deep learning architecture for OD flow prediction in urban rail transit systems is developed, and various mechanisms for data representation and for processing partial information are investigated. The framework consists of three main components: multiple LSTM networks with an attention mechanism to capture short- and long-term temporal dependencies, a time-shifted graph matrix for spatio-temporal correlations, and a reconstruction mechanism for partially observed OD flows. The model was validated using smart card data from the Hong Kong metro system and compared with state-of-the-art predictive models. The experiments examine the properties of the proposed method and of its individual components. The results show the high accuracy and robustness of the proposed model, as well as the importance of partial observations of OD flow for improving prediction performance. In terms of data representation, predicting OD flow deviations is consistently better than predicting OD flow directly.

3. Introduction

In online ride-hailing services, complete OD information is observed in real time, because passengers state their destination when they make a travel request. In public transportation, however, passengers tap in at the origin station, and the destination becomes known only when they tap out on arrival. Another challenge for short-term OD flow prediction is modeling the lagged correlation between different OD pairs, which has not been fully studied in the literature. That is, the passenger flow of the target OD pair may be affected by the flow of adjacent or distant OD pairs with different entry times. In real-time forecasting models, modeling temporal correlations at different lags is the key to capturing spatiotemporal correlations efficiently.

This paper studies real-time prediction of OD passenger flow in urban rail transit systems. A new deep learning structure called the Time-Shifted Spatio-temporal Network (TS-STN) is proposed. It consists of three modules that capture spatial, temporal and real-time information: 1) the spatial module models the spatial correlation between OD pairs using a graph convolution model with a time-shifted graph matrix; 2) the temporal module learns short- and long-term dependencies of OD flows using an attention-based LSTM network, with separate dense layers for different OD pairs to capture their heterogeneous flow dependencies; 3) the real-time information module uses a novel mechanism to make full use of partially and fully observed real-time information, namely partially observed OD flows and fully observed entry demand up to the time the forecast is made. The main contributions are:

• A new deep learning method for short-term OD passenger flow forecasting under partial observability is proposed. It captures the spatio-temporal correlation of OD flows and uses the partial information available from real-time OD flow observations.

• A graph convolution model with a time-shifted graph matrix is proposed to explicitly capture the heterogeneous spatiotemporal dependencies among OD flows.

• The method is validated with empirical data from an urban railway system. Systematic experiments evaluate its predictive performance as a function of data representation (input/output), modeling technique (module function) and information availability.

4. Model building

1. Problem Definition

The problem studied here is the prediction of short-term OD passenger flow in an urban rail transit system. Consider a metro network with N OD pairs. The flow of OD pair i at time step t is the number of trips from the origin station to the destination station at time t. For each pair i, these values form a flow series over time (e.g., from time step 1 to t), and the flows of all N OD pairs at the t-th time step form the network-level flow vector. At time interval t, the task of network-level short-term OD flow forecasting is to predict the OD flows for one or more future time periods, given the historical and real-time OD flows observed up to interval t. Observations may include OD flows for previous time periods on the same day as well as on previous days. When the horizon is a single period, we call this one-step-ahead forecasting. Note that one challenge for OD flow forecasting is that the level of observability differs across OD pairs, because different pairs have different travel times. Another challenge is modeling the lagged dependence between different OD pairs.

2. Method

In this section, we elaborate the overall framework of the proposed Time-Shifted Spatio-temporal Network (TS-STN) for the multi-step, short-term OD traffic forecasting problem. Figure 3 shows the TS-STN architecture. It consists of three main modules that capture the spatiotemporal dependence of OD flow over time (interval, day, week) and space:

• Spatial module: uses a Time-Shifted Graph Convolution (TSGC) model to capture the spatial correlation between OD pairs. At time step t, it takes the lagged OD flows and the time-shifted graph matrix as input and passes its output to the interval LSTM in the temporal module.

• Temporal module: consists of three LSTM models, namely the interval LSTM, daily LSTM and weekly LSTM. They model the temporal dependence between the target forecast interval and, respectively, the preceding intervals, the same interval on previous days, and the same interval in previous weeks. An attention mechanism learns daily and weekly similarities, which captures the most relevant long-term information for prediction and respects the temporal heterogeneity of OD pairs. Specifically, the temporal attention mechanism takes as input the historical hidden state vectors from the daily/weekly LSTM and the current hidden state vector from the interval LSTM, and outputs daily and weekly context vectors.

• Real-time information module: uses two types of real-time information, fully observed entry demand and partially observed OD flow. Its outputs are an OD flow matrix estimated from station entry demand and historical allocation matrices, and an OD flow reconstructed from the partially observed OD flows.

Finally, for each group of OD pairs, a fully connected dense layer with independent parameters takes as input the real-time estimated/reconstructed OD flow, the target hidden state of the interval LSTM and the daily/weekly context vectors, and outputs the OD flow prediction. In multi-step forecasting, the previous historical steps and the most recently predicted OD flow are used to predict the OD flow at subsequent steps.

[Figure 3: TS-STN architecture]
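The multi-step procedure described above, where each prediction is fed back as input for the next step, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `recursive_forecast`, `mean_of_window` and the `window` parameter are hypothetical names, and the toy one-step predictor simply averages its inputs in place of a trained network.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    one_step: Callable[[Sequence[float]], float],
    steps: int,
    window: int,
) -> List[float]:
    """Predict `steps` intervals ahead by feeding predictions back as inputs."""
    if steps < 1 or window < 1:
        raise ValueError("steps and window must be positive")
    if len(history) < window:
        raise ValueError("history is shorter than the input window")
    buffer = list(history[-window:])  # most recent observed flows
    predictions: List[float] = []
    for _ in range(steps):
        next_value = one_step(buffer)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        buffer = buffer[1:] + [next_value]
    return predictions


def mean_of_window(values: Sequence[float]) -> float:
    """Toy stand-in for a trained one-step model (assumption)."""
    return sum(values) / len(values)


forecast = recursive_forecast([2.0, 4.0], mean_of_window, steps=3, window=2)
print(forecast)
```

With a 15-minute interval, three recursive steps correspond to the 15-, 30- and 45-minute-ahead horizons; errors from early steps can propagate to later ones because predictions replace observations in the input window.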

2.1 Time-shifted graph convolution

Existing graph-based methods have two drawbacks for OD flow prediction. First, unlike stations or regions, which map naturally to graph nodes, OD pairs have no obvious adjacency matrix: each OD pair consists of two stations, so connectivity between OD pairs is hard to derive from the network topology. Second, the correlation matrices proposed in the literature do not capture lagged correlations between OD pairs. To overcome these problems, we propose a Time-Shifted Graph Convolution (TSGC), built on a time-shifted correlation matrix. The metro network is defined as a graph G with N nodes representing the N OD pairs. Suppose $x_i$ and $x_j$ are two equal-length time series of historical OD demand for pairs i and j. The time-shifted correlation between i and j with time lag s is:

$$\rho_{ij}(s) = \frac{\operatorname{cov}\left(x_i,\; x_j^{(s)}\right)}{\sqrt{\operatorname{var}\left(x_i\right)\,\operatorname{var}\left(x_j^{(s)}\right)}}$$

Here cov(·,·) is the covariance between two vectors, var(·) is the variance of a vector, and $x_j^{(s)}$ is the series $x_j$ lagged by s time intervals. Including weakly correlated pairs may degrade predictive performance. We therefore build a corrected time-shifted correlation matrix by filtering the entries of the original matrix: pairs with weak cross-correlation (e.g., values smaller than the row-average cross-correlation) are set to 0.
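As a rough illustration of the time-shifted correlation and the row-average filter described above, the sketch below computes a lagged Pearson correlation between two flow series and zeroes out weak entries. The function names are hypothetical, and the choice to lag the second series and to truncate both series to their overlapping part is an assumption, not a detail confirmed by the paper.

```python
from statistics import mean
from typing import List, Sequence


def lagged_correlation(x_i: Sequence[float], x_j: Sequence[float], lag: int) -> float:
    """Pearson correlation between x_i[t] and x_j[t - lag] over the overlap."""
    if len(x_i) != len(x_j):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(x_i) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    a = list(x_i[lag:])               # values at times lag .. T-1
    b = list(x_j[: len(x_j) - lag])   # values at times 0 .. T-1-lag
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (var_a * var_b) ** 0.5


def time_shifted_matrix(series: Sequence[Sequence[float]], lag: int) -> List[List[float]]:
    """N x N matrix of lagged correlations between all OD pairs."""
    n = len(series)
    return [[lagged_correlation(series[i], series[j], lag) for j in range(n)]
            for i in range(n)]


def filter_weak(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Set entries below their row average to 0 (the corrected matrix)."""
    filtered = []
    for row in matrix:
        threshold = mean(row)
        filtered.append([value if value >= threshold else 0.0 for value in row])
    return filtered
```

For example, if the flow of one OD pair repeats the flow of another pair one interval later, `lagged_correlation` at lag 1 is close to 1 even though the lag-0 correlation may be small, which is the kind of dependency an unshifted correlation matrix would miss.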

At time t, the corrected time-shifted correlation matrix for lag s is used to capture the spatiotemporal correlation between interval t and interval t − s. A multi-layer TSGC takes this matrix and the OD flows of all N pairs at time t − s as input and outputs a feature matrix, as written in Equation (3). The propagation rule uses the corrected time-shifted correlation matrix with self-loops added, the degree matrix of that matrix, and a layer-specific weight matrix that is trained with the TSGC model. To cover the dependence between each lagged interval and time t, the TSGC operation in Equation (3) is applied once per lag, giving one feature matrix per lag. The final feature matrix for all N pairs at time t is then obtained with the fusion operation U (e.g., average, max, sum) in Equation (4), concatenated with the OD flow input, and fed into the temporal module at time t.

[Equations (3) and (4): TSGC layer propagation and fusion of the lag-specific feature matrices]
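The description of Equations (3) and (4) matches the general shape of a graph convolution with a symmetrically normalized matrix, so a single layer might look like the sketch below. This is an assumption-laden illustration: the symmetric normalization, the ReLU activation, the mean fusion and all function names are choices made here for clarity, and the weights would be trained rather than fixed.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(corr: Sequence[Sequence[float]]) -> Matrix:
    """Add self-loops, then scale by the inverse square root of the degrees."""
    n = len(corr)
    with_loops = [[corr[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    degrees = [sum(row) for row in with_loops]
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive for this normalization")
    inv_sqrt = [d ** -0.5 for d in degrees]
    return [[inv_sqrt[i] * with_loops[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def tsgc_layer(corr: Sequence[Sequence[float]],
               flows: Sequence[Sequence[float]],
               weights: Sequence[Sequence[float]]) -> Matrix:
    """One layer: ReLU(normalized matrix x lagged flows x weights)."""
    hidden = matmul(matmul(normalized_adjacency(corr), flows), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


def fuse_mean(feature_matrices: Sequence[Matrix]) -> Matrix:
    """Fusion operation U, here an element-wise average over the lags."""
    count = len(feature_matrices)
    rows, cols = len(feature_matrices[0]), len(feature_matrices[0][0])
    return [[sum(m[i][j] for m in feature_matrices) / count for j in range(cols)]
            for i in range(rows)]
```

In use, `tsgc_layer` would be called once per lag s with that lag's corrected correlation matrix and the flows at time t − s, and `fuse_mean` would combine the resulting feature matrices into a single input for the temporal module.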

2.2 LSTMs with Temporal Attention

We apply an LSTM to model the dependency of OD flow on recent time intervals, called the Interval-LSTM (ILSTM). Long-term temporal patterns (e.g., daily/weekly similarity) can further improve forecasting performance, but training LSTMs on long sequences is challenging: it increases the risk of vanishing gradients, which significantly weakens the learning of periodicity. To solve this problem, we design two additional LSTMs, the Daily-LSTM (DLSTM) and Weekly-LSTM (WLSTM), as shown in Equations (6b)-(6c), to learn long-term temporal dependencies. To capture daily similarity, for the k-th day, the OD flows from the previous p days in the same time interval t are fed into the DLSTM. Similarly, the OD flows from the same day of the week and the same interval t in the previous q weeks are fed into the WLSTM to capture weekly patterns.

[Equations for the interval, daily and weekly LSTMs, including (6b)-(6c)]

Temporal attention mechanism. A temporal attention mechanism is further adopted to improve prediction accuracy. First, it uses more historical hidden states from the prediction interval on previous days/weeks, rather than relying purely on the hidden state from the last step of the DLSTM and WLSTM. Second, long-term temporal heterogeneity can be learned by assigning different weights to the relevant intervals.
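The idea of weighting historical hidden states by their relevance to the current state can be sketched with a simple dot-product attention. The scoring function is an assumption (the paper's scoring may use learned parameters), and `attention_context` is a hypothetical name; the inputs stand in for the current ILSTM hidden state and the historical DLSTM or WLSTM hidden states.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_context(
    current: Sequence[float],
    history: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted sum of historical states."""
    # Score each historical hidden state against the current hidden state.
    scores = [sum(c * h for c, h in zip(current, state)) for state in history]
    weights = softmax(scores)
    # Context vector: convex combination of the historical hidden states.
    context = [sum(w * state[k] for w, state in zip(weights, history))
               for k in range(len(current))]
    return weights, context


weights, context = attention_context([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]])
print(weights, context)
```

Days or weeks whose hidden states resemble the current one receive larger weights, so the context vector emphasizes the most relevant history instead of only the last LSTM step.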

2.3 Partial OD flow information

A key difficulty in OD flow forecasting for metro systems is that the OD flows of the most recent intervals (such as intervals t, t-1, ...) cannot be fully observed, because some trips have not yet been completed. Intuitively, incorporating more information from different sources helps improve predictive performance. Two types of information can help complete the unobserved OD flow information in real time.

• Entry demand is fully observed even for recent intervals whose OD flows are not yet complete, so we estimate OD flow from entry demand using the average ratio of OD flow to entry demand.

• OD flow can be partially observed in some recent intervals, e.g., when some trips that started at the beginning of the interval have already been completed. This can be used to extrapolate the OD flow for the entire interval.

Both the estimated OD flow and the partial OD flow are transformed through a fully connected layer and integrated into the model structure. Algorithm 1 shows the main process of the proposed TS-STN model.

[Algorithm 1: Main process of the proposed TS-STN model]
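The two sources of real-time information in Section 2.3 can be illustrated with simple arithmetic. In the sketch below, the station names, the `average_share` table and the `completion_fraction` parameter are hypothetical; in the paper the estimated and partial flows are passed through a fully connected layer rather than scaled by a fixed ratio, so this only shows the intuition.

```python
from typing import Dict, Tuple

ODPair = Tuple[str, str]


def estimate_od_from_entries(
    entry_demand: Dict[str, float],
    average_share: Dict[ODPair, float],
) -> Dict[ODPair, float]:
    """Split each origin's observed entry demand by historical OD shares."""
    return {
        (origin, destination): entry_demand.get(origin, 0.0) * share
        for (origin, destination), share in average_share.items()
    }


def extrapolate_partial(observed_count: float, completion_fraction: float) -> float:
    """Scale completed trips up to a full-interval estimate.

    completion_fraction is the assumed historical share of trips in this
    interval that are normally completed by the forecast time.
    """
    if not 0.0 < completion_fraction <= 1.0:
        raise ValueError("completion_fraction must be in (0, 1]")
    return observed_count / completion_fraction


estimated = estimate_od_from_entries(
    {"A": 200.0},
    {("A", "B"): 0.25, ("A", "C"): 0.75},
)
print(estimated)
print(extrapolate_partial(12.0, 0.5))
```

Entry demand is complete for every interval up to the forecast time, so the first estimate is always available, while the second becomes more reliable as more trips from the interval finish.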

5. Experimental results and analysis

1. Dataset description

This study uses smart card data from the Hong Kong Mass Transit Railway (MTR) system. The network consists of 10 heavy rail lines, 163 stations, a light rail network and feeder bus services. As of the end of 2019, the heavy rail lines carried an average of 4.68 million passengers per day. The automatic fare collection (AFC) system is closed: transactions are recorded at both entry and exit, which gives a complete record of each trip. The dataset covers AFC records from April 9 to June 24, 2018. The raw data were preprocessed into OD matrices with 15-minute granularity. The last two weeks of data are used as the test set, and the remaining days as the training set. In addition, 12 typical OD pairs with different characteristics are selected for the experiments. They vary in demand level and travel distance and include both direct and interchange trips (see Figure 4).

[Figure 4: The 12 selected OD pairs]
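The preprocessing into 15-minute OD counts, and the effect of partial observability on those counts, can be sketched as follows. The record layout, the station names and the decision to bin trips by entry time are assumptions made for illustration; the actual AFC schema is not described in this post.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (origin, destination, entry minute of day, exit minute of day): assumed layout
Trip = Tuple[str, str, int, int]

INTERVAL_MINUTES = 15


def od_counts(trips: Iterable[Trip], now_minute: Optional[int] = None) -> Counter:
    """Count trips per (interval index, origin, destination).

    If now_minute is given, trips that have not exited by that time are
    skipped, which mimics what is observable when a forecast is made.
    """
    counts: Counter = Counter()
    for origin, destination, entry_minute, exit_minute in trips:
        if now_minute is not None and exit_minute > now_minute:
            continue  # trip still in progress: destination not yet known
        counts[(entry_minute // INTERVAL_MINUTES, origin, destination)] += 1
    return counts


trips = [("A", "B", 3, 20), ("A", "B", 10, 40), ("A", "C", 16, 30)]
print(od_counts(trips))                  # complete counts, known after the fact
print(od_counts(trips, now_minute=30))   # counts observable at minute 30
```

Comparing the two calls shows the gap the real-time information module has to fill: at minute 30 only one of the two A-to-B trips from the first interval has been observed.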

The model performance is evaluated using mean absolute error (MAE) and root mean square error (RMSE), see equations (8a)-(8b).

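$$\mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|y_k - \hat{y}_k\right| \tag{8a}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2} \tag{8b}$$

where $y_k$ and $\hat{y}_k$ are the observed and predicted OD flows and n is the number of predictions.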
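For reference, the two metrics can be computed with a few lines of standard-library Python. The function names are illustrative, and the inputs are assumed to be flat, equal-length sequences of observed and predicted flows.

```python
import math
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted flows."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the average squared difference."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 5.0]
print(mean_absolute_error(observed, predicted))
print(root_mean_square_error(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a view of typical error as well as sensitivity to occasional large misses.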

2. Model development

Four components are considered when developing the final predictive model: the input/output representation, the temporal module, the spatial module and the real-time information module.

• For the input/output representation, we tested three options: the absolute OD flow, the OD flow as a share of the origin station's entry demand, and the deviation of OD flow.

• For the temporal module, we considered short- and long-term dependencies between the forecast interval and earlier intervals on the same day, as well as the same interval on previous days/weeks.

• For the spatial module, three different adjacency matrix construction methods were tested: the correlation matrix, the modified correlation matrix, and the time-shifted correlation matrix.

• For the real-time information module, we considered two kinds of information: estimated OD flow and partially observed OD flow.
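The deviation representation mentioned in the first bullet can be illustrated as below. The sketch assumes the deviation is taken from the historical mean of the same time-of-day interval across training days; the paper may define the reference mean differently (for example per weekday), and the function names are hypothetical.

```python
from typing import List, Sequence


def interval_means(training_days: Sequence[Sequence[float]]) -> List[float]:
    """Mean flow per time-of-day interval across the training days."""
    n_intervals = len(training_days[0])
    if any(len(day) != n_intervals for day in training_days):
        raise ValueError("all days must have the same number of intervals")
    return [sum(day[t] for day in training_days) / len(training_days)
            for t in range(n_intervals)]


def to_deviation(flows: Sequence[float], means: Sequence[float]) -> List[float]:
    """Model input/output: flow minus the historical mean of its interval."""
    return [flow - m for flow, m in zip(flows, means)]


def from_deviation(deviations: Sequence[float], means: Sequence[float]) -> List[float]:
    """Convert predicted deviations back to absolute flows."""
    return [d + m for d, m in zip(deviations, means)]


means = interval_means([[10.0, 20.0], [14.0, 24.0]])
deviations = to_deviation([15.0, 20.0], means)
print(means, deviations, from_deviation(deviations, means))
```

Working with deviations removes the regular daily profile from the target, so the network only has to learn how the current day departs from the usual pattern.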

To find the best representation for the model input/output, we start by selecting a base model. The base model treats the OD flow as a time series and uses LSTMs to model temporal dependencies. Instead of a conventional LSTM, we develop an attention-based LSTM with separate dense layers (ALSTMs-SL). In ALSTMs-SL, daily and weekly LSTMs take the same time interval on previous days/weeks as input to provide long-term information. The motivation for using the same time interval is that passenger flows often have distinct cyclical (daily/weekly) patterns, driven by travelers' habitual activities and travel choices. A temporal attention mechanism places more emphasis on the most relevant time steps. At the prediction stage, separate dense layers for different OD pairs are important for capturing the unique features and heterogeneous contributions of each OD flow.

3. Model comparison

The proposed model is compared with benchmark models from the literature, including classical statistical methods and state-of-the-art machine learning and deep learning models. Table 7 shows the results of the proposed model and the baseline models for 15-, 30- and 45-minute-ahead predictions on the test dataset. The proposed model achieves the lowest MAE and RMSE at all prediction horizons.

[Table 7: Prediction results of the proposed model and the baseline models for 15-, 30- and 45-minute-ahead predictions]

Table 8 presents the training and prediction times of the ANN, LSTM, Seq2Seq-Att and TS-STN models on a personal computer (Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz). During training, the proposed TS-STN runs slower than the other three models because of its spatial and real-time information modules. For prediction, the total time cost on the test set is reported. Although TS-STN takes longer to compute, the cost is acceptable for practical applications.

[Table 8: Training and prediction times of the compared models]

4. Sensitivity analysis

The proposed model (TS-STN) is built sequentially by adding temporal, spatial and real-time information components to the network. Because of complex interactions between deep learning modules, the resulting model may not be optimal. In this subsection, we therefore conduct a sensitivity analysis to verify the reliability and robustness of the proposed model. In the following experiments, the established TS-STN is used as the base model, and we examine a number of variants by changing individual component settings. For convenience, "T", "S" and "P" denote the temporal, spatial and real-time partial information modules, respectively.

Table 9 summarizes the results of the TS-STN variants. The results support using OD flow deviations instead of absolute values as the model input/output. Table 9 also shows the effect of different settings on prediction performance: changing settings in the temporal module has a greater impact on model performance than modifying the spatial module.

[Table 9: Results of the TS-STN variants in the sensitivity analysis]

6. Conclusion

Short-term OD flow forecasting is useful for a variety of transportation applications. In this paper, a deep learning model is developed to predict short-term OD flows under partial observability in urban rail transit systems. The paper also systematically studies the components of the deep learning model, including data representation, spatiotemporal correlation, and partial observations. A new deep learning architecture (TS-STN) is proposed. It combines temporal and spatial modules through an attention-based LSTM network and a graph convolutional network with a time-shifted correlation graph matrix. In addition, partially observed OD flows (caused by trips that are incomplete at prediction time) are incorporated into the model architecture. A case study was conducted using smart card data from a busy urban rail system. The results confirm the superior accuracy and robustness of the proposed model compared with existing forecasting models, in both single-step and multi-step forecasting and for OD pairs with different characteristics. The experiments also show that predicting OD flow deviations from the mean works better than predicting OD flow directly, which is the common practice in the literature. When designing deep learning structures, it is important to consider spatiotemporal correlations with time lags and to make full use of partially observed OD flow information. Interesting directions for future research include integrating data from multiple sources (e.g., social media, meteorology), especially for demand forecasting under abnormal conditions (e.g., adverse weather, special events), and online forecasting with streaming data that can handle missing data, noise and delayed observations.
