Literature Reading Report - Move, Attend and Predict

Citation

Al-Molegi A., Martínez-Ballesté A., Jabreel M. Move, Attend and Predict: An Attention-based Neural Model for People's Movement Prediction. Pattern Recognition Letters, 2018: S016786551830182X.


Overview

I have previously read several trajectory-prediction papers that use neural networks to predict pedestrian trajectories over short horizons. This paper instead targets location-transition prediction over a larger time span (hours at minimum, with data collected by GPS devices, check-in machines, etc.). Specifically, it defines the Move, Attend and Predict (MAP) model: the input is a sequence of (two-dimensional location, timestamp) tuples and the output is the predicted next location based on the past information. The model consists of three parts: an RNN encoder, an attention model, and a prediction model. Overall the structure is relatively simple and, perhaps because of the journal review lag, the techniques are not the newest, but the evaluation part of the experiments left me with some inspiration, discussed later in this article.


Highlights

  1. Time information and attention mechanism : previous work such as the STF-RNN network feeds the one-hot location value and the one-hot time value together as a tuple into the recurrent neural network. The MAP model takes a different approach and introduces an attention model: the RNN alone processes the (one-hot) location information, while the timestamp information is embedded and participates in the attention model, which computes attention weights and produces a weighted vector.
  2. Interpretability of neural networks : the experimental section carefully discusses the validity of the timestamp definition, the attention mechanism, and the embedding dimension, and visualizes the results in a fairly intuitive way:
    • The attention mechanism makes the model pay more attention to the most recent time information.
    • Timestamps work best in hours and should be set to the time of leaving a place.
    • Model performance peaks when the hidden/embedding dimension is around 24, matching the 24-hour system.
  3. Discretized evaluation metrics : the data processing method and the recording devices restrict the model's locations to a discrete and finite set, so instead of the continuous ADE and FDE metrics used by earlier pedestrian-trajectory prediction models, evaluation uses precision, recall and F1-score.


Future Work

  1. Try more advanced RNN units such as GRU or LSTM.
  2. Consider including more information, such as pedestrian interaction and the distance between locations.
  3. To overcome the model's inability to predict unknown locations, introduce probabilistic prediction of unknown locations.


Model

The MAP model consists of three parts: the location information model (left part of the gray area), the attention model (right part of the gray area), and the classifier (top).

Specification

For a given pedestrian, the input is a sequence of \(w\) tuples \(p_i = (l_i, t_i),\ 1 \le i \le w\), where \(l_i\) and \(t_i\) represent the location and the timestamp respectively; the sequence is the pedestrian's most recent track.

  • The set of locations is finite, so each location is one-hot encoded (dimension \(N\)).
  • Timestamps are divided into hours and represent the time of leaving the corresponding place; they are also one-hot encoded (dimension \(M\)).

Based on these \(w\) observations, the model estimates the pedestrian's next location: \(P(l_{i+1} \mid p_i, \dots, p_1)\).
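As a quick illustration of this input representation (not code from the paper), a track can be stored as index pairs and one-hot encoded on demand; the sizes \(N\) and \(M\) and the track below are made-up values.

```python
import numpy as np

N, M = 100, 24                                  # number of locations, number of hour slots
track = [(12, 8), (45, 13), (12, 19)]           # hypothetical (l_i, t_i) index pairs

def one_hot(index, dim):
    v = np.zeros(dim)
    v[index] = 1.0
    return v

# each step becomes a pair of one-hot vectors of dimensions N and M
encoded = [(one_hot(l, N), one_hot(t, M)) for l, t in track]
```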


Location Information Model

The location information model is a basic RNN structure. The \(N\)-dimensional one-hot location vector is first passed through an embedding matrix \(Le\) to generate a \(d_l\)-dimensional vector, which is then fed into the RNN as the input at each step. The last RNN output (dimension \(d_r\)) serves as a summary vector used by both the attention model and the classifier:

\[le_i = l_i \cdot Le, \quad r_i = RNN(le_i; W_{RNN})\]

\[r_i \in \R^{d_r}\]

[Note]: The model's subscripts run in reverse order, ending at \(i\) and starting from \(i-w+1\), so \(r_i\) is the last RNN output.
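A minimal PyTorch sketch of this sub-model, assuming an embedding layer and a vanilla RNN; the layer names and sizes (N, d_l, d_r, w) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

N, d_l, d_r, w = 100, 32, 24, 5                 # locations, embedding dim, RNN dim, track length

location_embedding = nn.Embedding(N, d_l)       # plays the role of the embedding matrix Le
rnn = nn.RNN(input_size=d_l, hidden_size=d_r, batch_first=True)

l_seq = torch.randint(0, N, (1, w))             # indices of the w visited locations (one track)
le = location_embedding(l_seq)                  # le_i = l_i . Le, shape (1, w, d_l)
outputs, _ = rnn(le)                            # all step outputs, shape (1, w, d_r)
r_i = outputs[:, -1, :]                         # summary vector = last RNN output
```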


Attention Model

Question

Looking carefully at the structure diagram of the MAP attention model, it is worth asking: what exactly is the object of attention?

The answer is the \(w\) embedding vectors formed from the timestamps, not the RNN outputs; this is what distinguishes the MAP model from RNNs that apply an attention mechanism over their own hidden states.

Using the three components of an attention mechanism (Query, Key and Value) to describe the specific attention used in this model:

  • Query: the summary vector \(r_i\) from the RNN.
  • Key = Value = \(\Omega\): the \(w\) vectors obtained by embedding the one-hot timestamps, \(te_i = t_i \cdot Te,\ \ \Omega = \{te_{i-w+1}, \dots, te_i\}\).

The attention weights are obtained by taking the dot product of the Query and the Keys and normalizing:

\[\alpha =softmax(\Omega \cdot r_i) \ \ \alpha \in \R^{w \times 1}\]

The attention vector is obtained by element-wise multiplication of the attention weights and the Values:

\[\eta = \alpha * \Omega \ \ \eta \in \R^{w \times d_r}\]
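A minimal sketch of this attention step, under the assumption that the time-embedding dimension equals \(d_r\); the placeholder `r_i` stands in for the RNN summary vector, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

w, d_r, M = 5, 24, 24                           # track length, summary dim, hour slots

time_embedding = nn.Embedding(M, d_r)           # plays the role of the embedding matrix Te
t_seq = torch.randint(0, M, (1, w))             # indices of the w departure hours
Omega = time_embedding(t_seq)                   # keys = values, shape (1, w, d_r)
r_i = torch.randn(1, d_r)                       # query: placeholder for the RNN summary vector

scores = torch.bmm(Omega, r_i.unsqueeze(-1))    # Omega . r_i, shape (1, w, 1)
alpha = torch.softmax(scores, dim=1)            # attention weights over the w steps
eta = alpha * Omega                             # element-wise weighting, shape (1, w, d_r)
```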


Classifier

The classifier essentially combines the information from the two sub-models (the RNN encoder and the attention model), applies a simple linear transformation, and compresses the result with a softmax into a predicted probability for each location. The article presents two combination functions \(F\): one is concatenation, the other is addition; the dimensions of \(W_F\) differ depending on which strategy is chosen.

\[\hat y = softmax(F(r_i, \eta) \cdot W_F+b)\]
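A minimal sketch of the two combination strategies; summing \(\eta\) over the \(w\) steps before combining is my own assumption about how the \(w \times d_r\) matrix is reduced, and `r_i` / `eta` are random placeholders.

```python
import torch
import torch.nn as nn

w, d_r, N = 5, 24, 100
r_i = torch.randn(1, d_r)                       # placeholder RNN summary vector
eta = torch.randn(1, w, d_r)                    # placeholder attention output
eta_sum = eta.sum(dim=1)                        # assumed reduction to shape (1, d_r)

# F = concatenation: W_F has shape (2*d_r, N)
y_hat_cat = torch.softmax(nn.Linear(2 * d_r, N)(torch.cat([r_i, eta_sum], dim=-1)), dim=-1)
# F = addition: W_F has shape (d_r, N)
y_hat_add = torch.softmax(nn.Linear(d_r, N)(r_i + eta_sum), dim=-1)
```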


Optimization

Finally, optimization: the optimization method used for the model is ADADELTA, and the loss function is computed directly on the continuous probability distribution output by the softmax, with no discretization needed (discretization would not be differentiable for back-propagation). This differs from the way the model is evaluated.

\[Loss = - \sum^n_{i=1} y_i \cdot \log(\hat y_i)\]

[Note]: The formula above is the loss for a single sample; \(i\) traverses the \(n\) finite locations, \(y_i\) takes only the values 0 and 1, and \(\hat y_i\) is the corresponding entry of \(\hat y\).
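A minimal training-step sketch with ADADELTA and the cross-entropy loss; `map_model` is a stand-in linear layer, not the actual MAP implementation, and PyTorch's `cross_entropy` applies the softmax internally, so it receives un-normalised scores.

```python
import torch
import torch.nn as nn
import torch.optim as optim

N = 100
map_model = nn.Linear(24, N)                    # stand-in for the full MAP model
optimizer = optim.Adadelta(map_model.parameters())

features = torch.randn(1, 24)                   # placeholder input features
target = torch.tensor([7])                      # index of the true next location (the 1 in y)
logits = map_model(features)                    # un-normalised scores over the N locations
loss = nn.functional.cross_entropy(logits, target)  # = -sum_i y_i * log(y_hat_i)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```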



Model Assessment

Datasets

Compared with the datasets used by previous trajectory-prediction models, the MAP datasets cover a broader range: the time span is in hours at minimum, and the locations are discrete and finite.

  • Geolife: raw information recorded by GPS devices. The authors turn the log information into track information by first using a stay-region detection algorithm and then the DBSCAN clustering algorithm (\(\varepsilon = 100,\ MinPts = 3\)) to form a fixed set of discrete locations; the pedestrian's position is then represented only by these fixed locations (a clustering sketch follows this list).
  • Gowalla: check-in data with timestamps recorded by punch-card machines at fixed locations.
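A minimal sketch of this discretization step with scikit-learn's DBSCAN and the parameters quoted above; the stay-point coordinates are random placeholders and are assumed to be in metres in a projected coordinate system.

```python
import numpy as np
from sklearn.cluster import DBSCAN

stay_points_m = np.random.rand(200, 2) * 1000   # placeholder stay-point coordinates (metres)
labels = DBSCAN(eps=100, min_samples=3).fit_predict(stay_points_m)
# label -1 marks noise; the remaining labels form the finite set of
# discrete locations that become the model's one-hot location vocabulary
```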


Quantitative evaluation

  1. Evaluation: since the locations are discrete and finite, the prediction is taken from the probability distribution computed by the softmax, which makes the evaluation of the MAP model different from that of earlier continuous-distribution models (ADE, FDE). A computation sketch follows this list.
    • Take the \(N\) most likely locations as the predicted set \(P_{N,u}\), and let the true location set be \(L_u\).
    • Precision - how many of the predicted locations are actually hit: \(Precision@N = {1 \over |U|} \Sigma_{u \in U} {|L_u \bigcap P_{N,u}| \over |P_{N,u}|}\).
    • Recall - how many of the true locations are predicted: \(Recall@N = {1 \over |U|} \Sigma_{u \in U} {|L_u \bigcap P_{N,u}| \over |L_u|}\)
    • \(F1-Score@N = 2 \times {Precision@N \times Recall@N \over Precision@N + Recall@N}\)
  2. Summary conclusions (see the paper for details)
    • The basic RNN improves the predictive power of the model; adding the time factor improves it further.
    • Data embedding significantly enhances the model's predictive ability, since the embedding layer can extract latent semantic information well.
    • MAP performs best ......
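A minimal sketch of the Precision@N / Recall@N / F1@N computation above; the per-user sets `predicted` (\(P_{N,u}\)) and `actual` (\(L_u\)) in the usage example are made up.

```python
def precision_recall_f1_at_n(predicted, actual):
    """predicted[u] = P_{N,u}, actual[u] = L_u, both sets of location ids."""
    users = list(actual.keys())
    precision = sum(len(actual[u] & predicted[u]) / len(predicted[u]) for u in users) / len(users)
    recall = sum(len(actual[u] & predicted[u]) / len(actual[u]) for u in users) / len(users)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# made-up example: two users, top-2 predicted locations each
p, r, f1 = precision_recall_f1_at_n(
    predicted={"u1": {3, 7}, "u2": {1, 4}},
    actual={"u1": {7}, "u2": {2, 4}},
)
```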


Interpretability of the Neural Network

The article makes a deep study of the interpretability of the neural network, further strengthening the rationality of the data definitions and the model design.

Attentional mechanisms

First, recall that the objects of attention in the article are the \(w\) time embedding vectors produced by the time one-hot embedding layer. The article discusses how different inputs to the attention-weight computation \(\alpha\) affect the resulting attention. With the track length fixed at \(w = 2\), the two compared settings are \(\alpha_1 = softmax(g(r_i, \Omega))\ \ vs\ \ \alpha_2 = softmax(g(r_i))\). Finally, the distribution of the \(\alpha\) weights is explored; in the corresponding figure, case 1 considers space + time and case 2 considers only space.

Conclusion: taking the time factor \(\Omega\) into account produces \(\alpha\) values consistent with human cognition - the embedding vectors whose time is closer to the current time point receive more attention.

Optimal value of the hidden (time-embedding) dimension

To keep the model dimensions consistent, the hidden-layer dimension is set equal to the time-embedding dimension. According to the experimental results, performance peaks around \(d_r = 24\); making the dimension higher or lower reduces the predictive power. This helps explain that the model benefits from the 24-hour system: a dimension higher or lower than 24 no longer matches the timestamp definition (hours).

Definition of the timestamp

There are two questions in defining the timestamp: one is the choice of unit (hour, two-hour period, day, month, ...), the other is which moment should correspond to a given place. After comparison, the conclusions are:

  • The hour-based scheme works better.
  • Using the moment of leaving the place as the timestamp works better, confirming the human intuition that "the departure time is the most influential for predicting the next place".
