8.18 Transformer series: papers to read

STGM: Spatio-Temporal Graph Mixformer for Traffic Forecasting

The paper proposes the Spatio-Temporal Graph Mixformer (STGM), a highly optimized traffic-forecasting model with a low memory footprint. A novel attention mechanism captures the correlation between temporal and spatial dependencies: convolutional layers with a variable field of view per head capture both long- and short-term temporal dependencies, and a trained estimator model scores each node's contribution to the desired prediction, with the estimate fed back to the attention mechanism together with the distance matrix. Gating mechanisms and mixer layers then select and merge the different views.
Code address
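
To make the variable-field-of-view idea concrete, below is a minimal PyTorch sketch in which each attention head reads the sequence through a 1-D convolution with its own kernel size. The module name, kernel sizes, and the additive distance bias are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class VariableFOVAttention(nn.Module):
    """Multi-head attention where each head sees the sequence through a
    1-D convolution with its own kernel size (field of view), so some
    heads capture short-term and others long-term temporal patterns.
    Kernel sizes and the distance-bias term are illustrative choices,
    not the exact STGM configuration."""

    def __init__(self, d_model=64, kernel_sizes=(3, 7, 15, 31)):
        super().__init__()
        self.n_heads = len(kernel_sizes)
        self.d_head = d_model // self.n_heads
        # one conv per head; odd kernel + padding keeps sequence length
        self.convs = nn.ModuleList(
            nn.Conv1d(d_model, self.d_head, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.q_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, dist_bias=None):
        # x: (batch, seq_len, d_model); dist_bias: (seq_len, seq_len) or None
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head)
        heads = []
        for h, conv in enumerate(self.convs):
            # conv output serves as both keys and values for this head
            kv = conv(x.transpose(1, 2)).transpose(1, 2)   # (B, T, d_head)
            attn = torch.einsum("btd,bsd->bts", q[:, :, h], kv) / self.d_head ** 0.5
            if dist_bias is not None:
                attn = attn + dist_bias                    # inject distance prior
            heads.append(torch.softmax(attn, -1) @ kv)
        return self.out(torch.cat(heads, -1))
```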

Attention Is Not All You Need Anymore

The core idea is to replace the self-attention sublayer of the Transformer with a module the authors call an Extractor.
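
As a structural sketch of what "replacing attention" means, the block below keeps a standard pre-norm Transformer layer but swaps the attention sublayer for a stand-in token mixer with the same input/output shape. The depthwise-convolution mixer is a placeholder assumption; the paper's actual Extractor is defined differently:

```python
import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    """Stand-in 'Extractor': mixes information across tokens with a
    depthwise 1-D convolution instead of self-attention. This is NOT
    the paper's Extractor design, only a placeholder with the same
    (batch, seq_len, d_model) -> same-shape interface."""

    def __init__(self, d_model=64, kernel_size=7):
        super().__init__()
        self.mix = nn.Conv1d(d_model, d_model, kernel_size,
                             padding=kernel_size // 2, groups=d_model)

    def forward(self, x):                       # x: (B, T, D)
        return self.mix(x.transpose(1, 2)).transpose(1, 2)

class Block(nn.Module):
    """Standard pre-norm Transformer block with the attention slot
    swapped for the token mixer above."""

    def __init__(self, d_model=64):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.mixer = ConvTokenMixer(d_model)    # <- replaces nn.MultiheadAttention
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x
```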

Learning Image Deraining Transformer Network with Dynamic Dual Self-Attention

The first module computes two self-attention branches, one complete (dense) and one sparse, and merges the two; the second module performs multi-scale feature extraction.
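
A minimal sketch of merging a dense self-attention with a sparse (top-k) one; the top-k selection and the learnable gate used for merging are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class DualSelfAttention(nn.Module):
    """One attention-score computation, two readouts: a dense softmax
    over all tokens and a sparse softmax restricted to the top-k scores,
    blended by a learnable gate. The top-k value and the gating merge
    are illustrative assumptions."""

    def __init__(self, d_model=64, k=8):
        super().__init__()
        self.k = k
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Parameter(torch.tensor(0.5))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: (B, T, D)
        B, T, D = x.shape
        q, k_, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k_.transpose(1, 2) / D ** 0.5   # (B, T, T)
        dense = torch.softmax(scores, -1) @ v
        # sparse branch: keep only the k largest scores per query
        thresh = torch.topk(scores, min(self.k, T), dim=-1).values[..., -1:]
        sparse_scores = scores.masked_fill(scores < thresh, float("-inf"))
        sparse = torch.softmax(sparse_scores, -1) @ v
        g = torch.sigmoid(self.gate)
        return self.out(g * dense + (1 - g) * sparse)
```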

SST: A Simplified Swin Transformer-based Model for Taxi Destination Prediction based on Existing Trajectory

Instead of the Swin Transformer's shifted-window attention, this paper partitions the trajectory into patches that grow from small to large.
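
A rough sketch of this small-to-large patch scheme on a trajectory; the stage count, merge factor, and mean-pooled patch merging are illustrative assumptions, not the SST architecture:

```python
import torch
import torch.nn as nn

class GrowingPatchEncoder(nn.Module):
    """Encodes a trajectory with patches that grow from small to large:
    each stage merges `factor` neighbouring tokens into one, so stage i
    effectively sees patches covering factor**(i+1) original points.
    Stage count, merge factor, and mean-pooled merging are assumptions
    for illustration."""

    def __init__(self, in_dim=2, d_model=64, stages=3, factor=2):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)   # per-point (lon, lat) embedding
        self.factor = factor
        self.stages = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(stages)
        )

    def forward(self, traj):                      # traj: (B, T, in_dim), T >= factor**stages
        x = self.embed(traj)
        for layer in self.stages:
            x = layer(x)
            B, T, D = x.shape
            T = (T // self.factor) * self.factor  # drop remainder tokens
            # merge neighbouring tokens: patches double in size each stage
            x = x[:, :T].reshape(B, T // self.factor, self.factor, D).mean(2)
        return x                                  # coarse patch representations
```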

Source: blog.csdn.net/qq_45745941/article/details/132354402