1. GreedyNAS-C
GreedyNAS-C is a convolutional neural network discovered using the GreedyNAS neural architecture search method. Its basic building blocks are the inverted residual block (from MobileNetV2) and the squeeze-and-excitation block.
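To make the squeeze-and-excitation building block concrete, here is a minimal NumPy sketch. The weights are random placeholders (in GreedyNAS-C they are learned), and the layout `(C, H, W)` and reduction ratio are illustrative assumptions, not the searched architecture's exact configuration.

```python
import numpy as np

def squeeze_excite(x, reduction=4, rng=None):
    """Squeeze-and-Excitation on a (C, H, W) feature map (illustrative sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    c = x.shape[0]
    # Squeeze: global average pool over spatial dims -> (C,)
    s = x.mean(axis=(1, 2))
    # Excite: bottleneck MLP (C -> C/r -> C), ReLU then sigmoid.
    # Random weights stand in for learned parameters.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    z = np.maximum(w1 @ s, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # per-channel weights in (0, 1)
    # Scale: reweight each channel of the input
    return x * gate[:, None, None]

x = np.ones((8, 4, 4))
y = squeeze_excite(x)
print(y.shape)  # (8, 4, 4)
```

The key point is that the block recalibrates channels globally (via the pooled descriptor) at negligible cost compared to the surrounding convolutions.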
2. RegionViT
RegionViT consists of two tokenization processes that convert an image into region tokens (upper path) and local tokens (lower path). Each tokenization is a convolution with a different patch size: the patch size for region tokens is 28² and that for local tokens is 4², with both projected to the same dimension C. Based on spatial locality, a region token therefore covers 7² local tokens, resulting in a window size of 7². In stage 1, the two sets of tokens are passed through the proposed regional-to-local transformer encoders. In later stages, to balance computational load and obtain feature maps of different resolutions, the method uses a downsampling process that halves the spatial resolution while doubling the channel dimension of both region and local tokens before they enter the next stage. Finally, at the end of the network, the remaining region tokens are simply averaged to form the final embedding for classification, while detection uses all local tokens at each stage, since they provide more fine-grained location information. Through this pyramid structure, the ViT can generate multi-scale features and can thus be easily extended beyond image classification to more vision applications, such as object detection.
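The token geometry above can be checked with a few lines of arithmetic. The 224×224 input resolution is an assumption for illustration; the patch sizes come from the description.

```python
# RegionViT stage-1 tokenization geometry (patch sizes from the description;
# the 224x224 input resolution is an assumed example).
img = 224
region_patch = 28         # one region token per 28x28 patch
local_patch = 4           # one local token per 4x4 patch

region_tokens = (img // region_patch) ** 2    # 8 * 8 = 64
local_tokens = (img // local_patch) ** 2      # 56 * 56 = 3136
# Each region token covers a window of (28/4)^2 = 7^2 local tokens:
window = (region_patch // local_patch) ** 2
print(region_tokens, local_tokens, window)    # 64 3136 49
```

This is why attention can be computed within 7²-token windows (plus one region token each) rather than over all 3136 local tokens at once.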
3. DenseNAS-B
DenseNAS-B is a mobile convolutional neural network discovered through the DenseNAS neural architecture search method. Its basic building block is the MBConv (inverted bottleneck residual) from the MobileNet architectures.
4. DenseNAS-C
DenseNAS-C is a mobile convolutional neural network discovered through the DenseNAS neural architecture search method. Its basic building block is the MBConv (inverted bottleneck residual) from the MobileNet architectures.
5. DiCENet
DiCENet is a convolutional neural network architecture built from DiCE units, which use dimension-wise convolution and dimension-wise fusion. Dimension-wise convolution applies lightweight convolutional filtering along each dimension of the input tensor, while dimension-wise fusion efficiently combines these dimension-wise representations, allowing the DiCE units in the network to efficiently encode the spatial and channel-wise information contained in the input tensor.
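A rough NumPy sketch of the idea: filter the tensor once along each of its three dimensions, then fuse the responses. The fixed averaging kernel and mean-based fusion are simplifying assumptions; the real DiCE unit uses learned grouped convolutions and a learned fusion step.

```python
import numpy as np

def dimwise_conv(x, k):
    """Apply a 1-D kernel independently along each axis of a (C, H, W) tensor.

    Illustrates dimension-wise convolution: one lightweight filter per
    tensor dimension (here a shared fixed kernel, not learned weights).
    """
    outs = []
    for axis in range(3):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="same"), axis, x)
        outs.append(out)
    return outs

def dimwise_fuse(outs):
    # Dimension-wise fusion, sketched as a plain mean over the three
    # per-dimension responses (the real unit learns this combination).
    return np.mean(outs, axis=0)

x = np.ones((3, 5, 5))
k = np.array([0.25, 0.5, 0.25])
y = dimwise_fuse(dimwise_conv(x, k))
print(y.shape)  # (3, 5, 5)
```

Note that each of the three passes touches every element with only a length-k kernel, which is where the efficiency relative to full 3-D filtering comes from.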
6. uNetXST
uNetXST is a uNet-based neural network architecture that takes multiple (X) tensors as input and includes spatial transformer units (ST).
7. CSPPeleeNet
CSPPeleeNet is a convolutional neural network and object-detection backbone in which the Cross Stage Partial Network (CSPNet) approach is applied to PeleeNet. CSPNet partitions the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. This split-and-merge strategy allows more gradient flow through the network.
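The split-and-merge pattern is easy to show in isolation. In this NumPy sketch a trivial scaling function stands in for the stage's actual convolutional block (PeleeNet dense layers in CSPPeleeNet); the half-and-half channel split is also a simplifying assumption.

```python
import numpy as np

def csp_stage(x, block):
    """Cross Stage Partial connection (sketch) on a (C, H, W) feature map.

    Splits the base-layer feature map along channels, routes only one
    half through the stage's block, then merges by concatenation.
    """
    c = x.shape[0] // 2
    part1, part2 = x[:c], x[c:]          # split base-layer feature map
    part2 = block(part2)                 # only this path is transformed
    return np.concatenate([part1, part2], axis=0)  # cross-stage merge

# Toy "block": a fixed scaling standing in for the dense layers.
x = np.ones((8, 4, 4))
y = csp_stage(x, lambda t: 2.0 * t)
print(y.shape)  # (8, 4, 4)
```

Because `part1` bypasses the block, its gradient path is short and distinct from the transformed path, which is the mechanism behind the "more gradient flow" claim.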
8. PocketNet
PocketNet is a family of face recognition models discovered through neural architecture search. Training is based on multi-step knowledge distillation.
9. OODformer
OODformer is a Transformer-based OOD detection architecture that leverages the contextualization capabilities of Transformer. Using the transformer as the main feature extractor enables the exploitation of object concepts and their discriminative properties as well as their co-occurrence through visual attention.
OODformer uses ViT and its data-efficient variant DeiT. Each encoder layer consists of multi-head self-attention (MSA) and multilayer perceptron (MLP) blocks. The combination of the MSA and MLP layers in the encoder jointly encodes attribute importance, association relevance, and co-occurrence. The [class] token (the representation of the image) integrates multiple attributes and their associated characteristics through global context. The [class] token of the last layer is used for OOD detection in two ways: first, it is passed through softmax to obtain a confidence score, and second, it is used for a latent-space distance calculation.
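The two scoring routes on the final [class] token can be sketched as follows. The toy logits, embedding, and class means are made-up numbers, and plain Euclidean distance to the nearest class mean is one simple choice of latent-space distance, not necessarily OODformer's exact metric.

```python
import numpy as np

def softmax_confidence(logits):
    """Maximum softmax probability of the classifier head's logits."""
    z = logits - logits.max()            # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

def latent_distance(embedding, class_means):
    """Distance from the [class]-token embedding to the nearest class mean."""
    return np.min(np.linalg.norm(class_means - embedding, axis=1))

logits = np.array([4.0, 1.0, 0.5])       # toy logits for 3 classes
emb = np.array([0.9, 0.1])               # toy 2-D [class]-token embedding
means = np.array([[1.0, 0.0],            # toy per-class embedding means
                  [0.0, 1.0]])

conf = softmax_confidence(logits)        # high -> likely in-distribution
dist = latent_distance(emb, means)       # large -> likely out-of-distribution
```

Low confidence and large latent distance both flag a sample as OOD; the two scores can be used separately or combined.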
10. DeepSIM
DeepSIM is a generative model for conditional image manipulation based on a single image. The network learns to map a primitive representation of the image to the image itself. At manipulation time, the generator allows complex image changes to be made by modifying the primitive input representation and mapping it through the network. The choice of primitive representation affects the ease and expressiveness of manipulation, and it can be automatic (e.g., edges), manual, or hybrid (e.g., edges combined with segmentation).
11. Conditional Position Encoding Vision Transformer (CPVT)
CPVT (Conditional Position Encoding Vision Transformer) is a vision transformer that utilizes conditional position encodings. Apart from the new encoding, it follows the same architecture as ViT and DeiT.
12. ESPNetv2
ESPNetv2 is a convolutional neural network that utilizes group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters.
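Why dilation buys a large effective receptive field cheaply can be shown with standard receptive-field arithmetic for stride-1 stacks. The specific dilation rates below are illustrative, not ESPNetv2's exact configuration.

```python
def receptive_field(layers):
    """Receptive-field side length of stacked stride-1 convolutions.

    Each (k, r) layer - kernel size k, dilation r - grows the receptive
    field by (k - 1) * r.
    """
    rf = 1
    for k, r in layers:
        rf += (k - 1) * r
    return rf

# Three 3x3 depth-wise convs, dilated vs. undilated (example rates 1, 2, 4):
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])    # 15
plain = receptive_field([(3, 1), (3, 1), (3, 1)])      # 7
print(dilated, plain)  # 15 7
```

The dilated stack more than doubles the receptive field at identical parameter count and FLOPs, which is the trade-off the architecture exploits.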
13. Shuffle Transformer
The Shuffle Transformer block consists of the Shuffle Multi-Head Self-Attention module (Shuffle-MHSA), the Neighbor-Window Connection module (NWC), and an MLP module. To introduce cross-window connections while retaining the efficient computation of non-overlapping windows, a strategy of alternating window-based multi-head self-attention (WMSA) and Shuffle-WMSA across consecutive Shuffle Transformer blocks is proposed. The first window-based Transformer block uses a regular window-partitioning strategy, while the second uses window-based self-attention with spatial shuffle. In addition, a Neighbor-Window Connection module (NWC) is added to each block to strengthen connections between neighboring windows. The proposed Shuffle Transformer block can therefore build rich cross-window connections and enhance the representation. Two successive Shuffle Transformer blocks are computed as follows:
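In residual-block notation (a reconstruction from the block structure described above, so it should be checked against the original paper; LN denotes LayerNorm and $y^{l}$ the output of block $l$):

```latex
\begin{aligned}
\hat{y}^{l}     &= \mathrm{WMSA}\left(\mathrm{LN}\left(y^{l-1}\right)\right) + y^{l-1},\\
\tilde{y}^{l}   &= \mathrm{NWC}\left(\hat{y}^{l}\right) + \hat{y}^{l},\\
y^{l}           &= \mathrm{MLP}\left(\mathrm{LN}\left(\tilde{y}^{l}\right)\right) + \tilde{y}^{l},\\
\hat{y}^{l+1}   &= \text{Shuffle-WMSA}\left(\mathrm{LN}\left(y^{l}\right)\right) + y^{l},\\
\tilde{y}^{l+1} &= \mathrm{NWC}\left(\hat{y}^{l+1}\right) + \hat{y}^{l+1},\\
y^{l+1}         &= \mathrm{MLP}\left(\mathrm{LN}\left(\tilde{y}^{l+1}\right)\right) + \tilde{y}^{l+1}.
\end{aligned}
```

The only difference between the two blocks is the attention module: plain WMSA in block $l$, Shuffle-WMSA in block $l+1$; the NWC and MLP sub-modules are shared in structure.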
14. ECA-Net
ECA-Net is a convolutional neural network that utilizes Efficient Channel Attention (ECA) modules.
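The ECA module replaces SE's bottleneck MLP with a single 1-D convolution across the channel descriptor, avoiding dimensionality reduction. In this NumPy sketch the fixed averaging kernel stands in for ECA's learned 1-D convolution weights, and the kernel size is a placeholder (ECA chooses it adaptively from the channel count).

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention (sketch) on a (C, H, W) feature map."""
    # Squeeze: global average pool over spatial dims -> (C,)
    s = x.mean(axis=(1, 2))
    # Local cross-channel interaction: 1-D conv of size k over channels,
    # with no dimensionality reduction (fixed kernel for illustration).
    kernel = np.ones(k) / k
    a = np.convolve(s, kernel, mode="same")
    gate = 1.0 / (1.0 + np.exp(-a))              # sigmoid gating
    return x * gate[:, None, None]

x = np.ones((8, 4, 4))
y = eca(x)
print(y.shape)  # (8, 4, 4)
```

With only k weights (versus SE's two C×C/r matrices), the module captures local cross-channel interaction at near-zero parameter cost.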
15. CSPDenseNet
CSPDenseNet is a convolutional neural network and object-detection backbone in which the Cross Stage Partial Network (CSPNet) approach is applied to DenseNet. CSPNet partitions the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. This split-and-merge strategy allows more gradient flow through the network.