Summary
Paper link: https://arxiv.org/abs/2303.08810
Code link: https://github.com/rayleizhu/BiFormer
As the core building block of vision transformers, attention is a powerful tool for capturing long-range dependencies. However, this power comes at a price: it incurs a heavy computational burden and memory footprint, since all space
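To make the cost concrete, here is a minimal NumPy sketch of vanilla full attention (illustrative only, not the BiFormer implementation): every query attends to every key, so the score matrix grows quadratically with the number of tokens. The feature-map size used below is a hypothetical example.

```python
import numpy as np

def full_attention(q, k, v):
    # Vanilla attention: each of the N queries scores against all N keys,
    # so the score matrix is N x N -- quadratic in token count.
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ v                                 # (N, d)

# Example: a 56x56 feature map flattened to N = 3136 tokens already
# yields a 3136 x 3136 score matrix per head.
N, d = 3136, 64
rng = np.random.default_rng(0)
q = rng.standard_normal((N, d)).astype(np.float32)
out = full_attention(q, q, q)
print(out.shape)                    # (3136, 64)
print(N * N * 4 / 1e6, "MB")        # ~39.3 MB of float32 scores per head
```

This quadratic growth in both compute and memory is the cost the paper sets out to reduce.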