YOLOv5 lightweighting: CVPR 2023 | RIFormer: a minimalist ViT architecture that achieves SOTA performance without a token mixer

 1. Introduction to RIFormer

 Paper: https://arxiv.org/pdf/2304.05659.pdf

        This paper proposes RepIdentityFormer (RIFormer), which uses structural re-parameterization to study backbone architectures that have no token mixer. The authors then improve the learning paradigm to overcome the limitations of token-mixer-free architectures and summarize what works as a set of optimization guidelines. By following these guidelines, the paper builds a very simple visual backbone that performs strongly and also offers high inference efficiency.

 Why do this?

        The token mixer is a crucial component of a ViT backbone: it adaptively aggregates information from different spatial positions. However, conventional self-attention usually incurs high computational complexity and high latency. On the other hand, simply removing the token mixer leaves the network without a complete structural prior, which causes severe performance degradation.
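To see why removing the token mixer also removes a structural prior, here is a minimal pure-Python sketch of a MetaFormer-style block. It computes x + TokenMixer(Norm(x)) followed by x + ChannelMLP(Norm(x)). The block layout, the toy `mean_mixer` and the ReLU stand-in `channel_mlp` are assumptions for illustration and are not RIFormer's actual code. With `identity_mixer`, a token's output never depends on any other token. With a real mixer, it does.

```python
import math
from typing import Callable, List

Tokens = List[List[float]]  # N tokens, each a C-dimensional feature vector


def layer_norm(x: Tokens, eps: float = 1e-5) -> Tokens:
    """Normalize each token over its channels (no learnable scale/shift here)."""
    out = []
    for tok in x:
        mu = sum(tok) / len(tok)
        var = sum((v - mu) ** 2 for v in tok) / len(tok)
        out.append([(v - mu) / math.sqrt(var + eps) for v in tok])
    return out


def identity_mixer(x: Tokens) -> Tokens:
    """Token-mixer-free case: each token only sees itself."""
    return [list(tok) for tok in x]


def mean_mixer(x: Tokens) -> Tokens:
    """Toy spatial mixer (hypothetical): every token receives the mean of all tokens."""
    n, c = len(x), len(x[0])
    avg = [sum(tok[j] for tok in x) / n for j in range(c)]
    return [list(avg) for _ in x]


def channel_mlp(x: Tokens) -> Tokens:
    """Stand-in for the per-token channel MLP (hypothetical: just a ReLU)."""
    return [[max(0.0, v) for v in tok] for tok in x]


def metaformer_block(x: Tokens, mixer: Callable[[Tokens], Tokens]) -> Tokens:
    # Sub-block 1: x + TokenMixer(Norm(x)) -- the only place tokens can interact
    y = mixer(layer_norm(x))
    x = [[a + b for a, b in zip(t, u)] for t, u in zip(x, y)]
    # Sub-block 2: x + ChannelMLP(Norm(x)) -- purely per-token
    z = channel_mlp(layer_norm(x))
    return [[a + b for a, b in zip(t, u)] for t, u in zip(x, z)]


a = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
b = [[1.0, 2.0, 3.0], [5.0, 5.0, -2.0]]  # only token 1 differs
print(metaformer_block(a, identity_mixer)[0] == metaformer_block(b, identity_mixer)[0])  # True
print(metaformer_block(a, mean_mixer)[0] != metaformer_block(b, mean_mixer)[0])          # True
```

Changing token 1 leaves token 0 untouched under the identity mixer. In other words, the block alone has no way to aggregate spatial context. RIFormer's training strategy is meant to compensate for exactly this.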

        The token mixer is the key module for spatial information aggregation in ViT architectures. Because it relies on self-attention, however, its computation and memory cost are tied closely to image size: they grow quadratically with the number of tokens.
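Below is a rough arithmetic sketch of how the cost of self-attention scales with input size. It makes several assumptions: a square image, non-overlapping 16x16 patches, and an embedding width of 384. These are illustrative values, not RIFormer's configuration. It also counts only the two N x N matrix products in one attention layer and ignores heads, projections and softmax.

```python
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of patch tokens for a ViT-style patchify stem (16x16 patches assumed)."""
    return (height // patch) * (width // patch)


def attention_cost(height: int, width: int, dim: int = 384, patch: int = 16):
    """Return (N, attention-matrix entries, MACs of Q@K^T plus A@V) for one layer."""
    n = num_tokens(height, width, patch)
    attn_entries = n * n        # memory for the N x N attention matrix
    macs = 2 * n * n * dim      # N^2 * d for Q @ K^T, plus N^2 * d for A @ V
    return n, attn_entries, macs


for side in (224, 448, 896):
    n, entries, macs = attention_cost(side, side)
    print(f"{side}x{side}: N={n}, attention entries={entries}, MACs={macs}")
# 224x224: N=196, attention entries=38416, MACs=29503488
# 448x448: N=784, attention entries=614656, MACs=472055808
# 896x896: N=3136, attention entries=9834496, MACs=7552892928
```

Doubling the side length multiplies N by 4 and the attention matrix by 16. This is why the cost of a self-attention token mixer tracks image size so strongly.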

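        Structural re-parameterization methods are already widely used in many fields. In RIFormer, the inference-time token mixer can be regarded as LN + Identity: during training, a per-channel affine transformation follows LN, and for inference it is merged into LN's parameters, so no token mixer remains.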
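The re-parameterization step can be checked numerically with a small sketch. A per-channel affine map applied after LayerNorm is itself a per-channel affine map of the normalized input x̂. The pair therefore folds into a single LN with new parameters γ' = s·γ and β' = s·β + t. The function names and sample values below are hypothetical, and this is a sketch of the general folding idea rather than RIFormer's released code. If the training-time mixer is instead written as Affine(y) - y (an assumption borrowed from PoolFormer's Pool(x) - x convention), the same fold applies with s replaced by s - 1.

```python
import math
from typing import List, Tuple


def layer_norm(x: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """LayerNorm over the channels of one token, with per-channel gamma/beta."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [g * (v - mu) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]


def affine(y: List[float], scale: List[float], shift: List[float]) -> List[float]:
    """Training-time per-channel affine transform: s * y + t (hypothetical mixer)."""
    return [s * v + t for v, s, t in zip(y, scale, shift)]


def fold_affine_into_ln(gamma, beta, scale, shift) -> Tuple[List[float], List[float]]:
    """s * (gamma * x_hat + beta) + t == (s * gamma) * x_hat + (s * beta + t)."""
    new_gamma = [s * g for s, g in zip(scale, gamma)]
    new_beta = [s * b + t for s, b, t in zip(scale, beta, shift)]
    return new_gamma, new_beta


x = [0.5, -1.0, 2.0, 3.5]
gamma, beta = [1.0, 0.9, 1.1, 1.2], [0.0, 0.1, -0.2, 0.3]
scale, shift = [0.8, 1.5, -0.3, 2.0], [0.05, -0.1, 0.2, 0.0]

train_out = affine(layer_norm(x, gamma, beta), scale, shift)  # LN + Affine (training)
g2, b2 = fold_affine_into_ln(gamma, beta, scale, shift)
infer_out = layer_norm(x, g2, b2)                             # LN + Identity (inference)
print(max(abs(p - q) for p, q in zip(train_out, infer_out)) < 1e-9)  # True
```

The extra affine parameters give the network something to learn during training. They cost nothing at inference, because they disappear into LN's weight and bias.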


Source: blog.csdn.net/m0_63774211/article/details/131105525