Check out the teacher's videos and blog posts to go through the details
Contents
Vision Transformer (ViT)
Thoughts
For model interpretability, go to these reference links to learn
Inductive bias
MAE_link:
Self-supervised pretraining methods don't need people to label the dataset; that's excellent work.
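The core idea behind MAE-style self-supervised pretraining is that the model creates its own supervision by masking out most of the image patches and learning to reconstruct them, so no human labels are needed. A minimal sketch of the random-masking step (a simplified illustration, not the official MAE implementation; the function name and patch representation are assumptions):

```python
import random

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Randomly hide a fraction of patches, MAE-style (sketch).

    Returns the visible patches (fed to the encoder), the indices
    that were kept, and the indices that were masked out (which the
    decoder must reconstruct). MAE uses a high mask ratio (~75%).
    """
    rng = random.Random(seed)
    n = len(patches)
    n_keep = int(n * (1 - mask_ratio))
    order = list(range(n))
    rng.shuffle(order)                   # random permutation of patch indices
    keep_idx = sorted(order[:n_keep])    # patches the encoder sees
    mask_idx = sorted(order[n_keep:])    # patches to be reconstructed
    visible = [patches[i] for i in keep_idx]
    return visible, keep_idx, mask_idx

patches = [f"patch_{i}" for i in range(16)]   # e.g. a 4x4 grid of image patches
visible, keep_idx, mask_idx = random_masking(patches)
print(len(visible), len(mask_idx))            # 4 visible, 12 masked at 75%
```

Because the labels (the masked pixels) come from the image itself, any unlabeled image collection can serve as training data.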
DETR
DETR and ViT
What is the difference and connection between DETR and ViT, and can they be integrated together? - Zhihu
References
// simple version
// full version
11.1 Vision Transformer (ViT) Network Detailed Explanation - CSDN blog https://blog.csdn.net/lgzlgz3102/article/details/109140622
// Overall supplement, very detailed
Intensive reading of the ViT paper paragraph by paragraph [Intensive paper reading] - bilibili // Supporting video
Intensive reading of the ViT paper paragraph by paragraph [Intensive paper reading] - bilibili.com // Supporting notes
Explanation of each module of the YOLO series
A complete explanation of the core basics of YOLOv5 in the YOLO series - Zhihu (zhihu.com)
// Alternating layers of Multi-Head Self-Attention (MSA) and Multi-Layer Perceptron (MLP)
// encoder, the handwritten explanation is very clear
// Why choose layer normalization
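On the last comment: the ViT encoder applies LayerNorm before each MSA and MLP block. Unlike BatchNorm, LayerNorm normalizes across the feature dimension of each token independently, so its statistics don't depend on batch size or sequence length. A minimal sketch for a single token's feature vector (a plain-Python illustration, not a framework implementation):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token's features to zero mean and unit variance.

    Statistics are computed per token over its feature dimension,
    not over the batch -- one reason Transformers prefer LayerNorm
    over BatchNorm. (The learnable scale/shift parameters gamma and
    beta are omitted for brevity.)
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

token = [2.0, 4.0, 6.0, 8.0]      # one token's feature vector
normed = layer_norm(token)
print([round(v, 3) for v in normed])
```

In the full encoder block this would run before the attention and MLP sublayers (the "pre-norm" arrangement used in ViT), with a residual connection around each sublayer.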