Article directory
- 1. Inception-ResNet-v2-C
- 2. Inception-ResNet-v2-B
- 3. Inception-ResNet-v2-A
- 4. Inception-ResNet-v2 Reduction-B
- 5. Convolutional Block Attention Module
- 6. Efficient Channel Attention
- 7. ShuffleNet V2 Downsampling Block
- 8. FBNet Block
- 9. Neural Attention Fields
- 10. Inception-A
- 11. Inception-B
- 12. Inception-C
- 13. Reduction-B
- 14. Global Context Block
- 15. One-Shot Aggregation
- 16. Pyramidal Residual Unit
- 17. Pyramidal Bottleneck Residual Unit
- 18. Fractal Block
- 19. Transformer in Transformer
- 20. Dilated Bottleneck Block
1. Inception-ResNet-v2-C
Inception-ResNet-v2-C is an image model block for the 8 × 8 grid used in the Inception-ResNet-v2 architecture. It largely follows the ideas of Inception modules and grouped convolutions, but adds residual connections.
2. Inception-ResNet-v2-B
Inception-ResNet-v2-B is an image model block for the 17 × 17 grid used in the Inception-ResNet-v2 architecture. It largely follows the ideas of Inception modules and grouped convolutions, but adds residual connections.
3. Inception-ResNet-v2-A
Inception-ResNet-v2-A is an image model block for the 35 × 35 grid used in the Inception-ResNet-v2 architecture.
4. Inception-ResNet-v2 Reduction-B
Inception-ResNet-v2 Reduction-B is the image model block used in the Inception-ResNet-v2 architecture.
5. Convolutional Block Attention Module
Convolutional Block Attention Module (CBAM) is the attention module of convolutional neural networks. Given an intermediate feature map, this module sequentially infers attention maps along two independent dimensions (channel and spatial) and then multiplies the attention maps by the input feature maps for adaptive feature refinement.
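The sequential channel-then-spatial refinement can be sketched in NumPy. This is a simplified illustration, not the full CBAM: the weights of the shared MLP are random placeholders, and the paper's 7×7 convolution in the spatial branch is replaced by a plain average of the pooled maps for brevity.

```python
import numpy as np

def cbam(x, reduction=2, rng=np.random.default_rng(0)):
    """Simplified CBAM on a feature map x of shape (C, H, W).

    Channel attention: global avg- and max-pooled descriptors pass through a
    shared two-layer MLP; their sum is squashed with a sigmoid.
    Spatial attention: avg and max over the channel axis; the paper's 7x7
    convolution is replaced here by a plain average for brevity.
    """
    C, H, W = x.shape
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # --- channel attention ---
    w1 = rng.standard_normal((C // reduction, C)) * 0.1  # placeholder MLP weights
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)         # FC -> ReLU -> FC
    avg_desc = x.mean(axis=(1, 2))                       # (C,) avg-pooled
    max_desc = x.max(axis=(1, 2))                        # (C,) max-pooled
    ca = sigmoid(mlp(avg_desc) + mlp(max_desc))          # channel map, (C,)
    x = x * ca[:, None, None]                            # refine channels first

    # --- spatial attention (applied to the channel-refined map) ---
    sa = sigmoid((x.mean(axis=0) + x.max(axis=0)) / 2.0) # spatial map, (H, W)
    return x * sa[None, :, :]

refined = cbam(np.random.default_rng(1).standard_normal((8, 4, 4)))
```

The key point the sketch preserves is the ordering: the spatial attention map is inferred from the already channel-refined features, and each map multiplies the features elementwise via broadcasting.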
6. Efficient Channel Attention
Efficient Channel Attention is an architectural unit based on the Squeeze-and-Excitation block that reduces model complexity without dimensionality reduction. It is proposed as part of the ECA-Net CNN architecture.
After channel-wise global average pooling without dimensionality reduction, ECA captures local cross-channel interaction by considering each channel together with its k neighbors. This can be implemented efficiently with a 1D convolution of kernel size k, where k determines the coverage of the local cross-channel interaction, i.e., how many neighbors participate in predicting the attention of one channel.
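The mechanism above fits in a few lines of NumPy. In this sketch the learned 1D convolution weights are replaced by a fixed averaging kernel, so it only illustrates the data flow (GAP → 1D conv across channels → sigmoid gate), not a trained ECA module.

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention sketch for x of shape (C, H, W).

    Global average pooling produces one descriptor per channel; a 1D
    convolution of kernel size k mixes each channel with its k-neighborhood
    (weights here are a fixed averaging kernel, standing in for learned
    ones); a sigmoid gate then rescales the input channel-wise.
    """
    gap = x.mean(axis=(1, 2))                     # (C,) channel descriptors
    kernel = np.ones(k) / k                       # placeholder for learned weights
    conv = np.convolve(gap, kernel, mode="same")  # local cross-channel mixing
    gate = 1.0 / (1.0 + np.exp(-conv))            # sigmoid attention, (C,)
    return x * gate[:, None, None]

out = eca(np.ones((6, 2, 2)), k=3)
```

Note that, unlike an SE block, no channel is projected down and back up: the descriptor stays C-dimensional throughout, which is what "without dimensionality reduction" refers to.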
7. ShuffleNet V2 Downsampling Block
ShuffleNet V2 Downsampling Block is the spatial downsampling block used in the ShuffleNet V2 architecture. Unlike the regular ShuffleNet V2 block, it removes the channel-split operator, so both branches see the full input and the number of output channels is doubled.
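A shape-level sketch of this block, assuming NumPy: stride-2 slicing stands in for the block's strided convolutions, so only the channel bookkeeping (no split, concatenation doubles channels, then channel shuffle) is faithful, not the actual filtering.

```python
import numpy as np

def channel_shuffle(x, groups=2):
    """Shuffle the channels of x (C, H, W): reshape to (groups, C // groups),
    transpose the two group axes, and flatten back to C channels."""
    C, H, W = x.shape
    return x.reshape(groups, C // groups, H, W).transpose(1, 0, 2, 3).reshape(C, H, W)

def downsample_block(x):
    """ShuffleNet V2 downsampling sketch: with no channel split, BOTH
    branches receive the full input; each halves the spatial size, and
    concatenating them doubles the channel count."""
    branch_a = x[:, ::2, ::2]   # stand-in for 3x3 dw-conv (stride 2) + 1x1 conv
    branch_b = x[:, ::2, ::2]   # stand-in for 1x1 + 3x3 dw (stride 2) + 1x1
    out = np.concatenate([branch_a, branch_b], axis=0)
    return channel_shuffle(out, groups=2)

y = downsample_block(np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4))
```

Running this on a (2, 4, 4) input yields a (4, 2, 2) output: spatial size halved, channels doubled, with the two branches' channels interleaved by the shuffle.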
8. FBNet Block
FBNet Block is an image model block used in the FBNet architecture, discovered via DNAS (differentiable neural architecture search). Its basic building blocks are depthwise convolutions and residual connections.
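The two ingredients named above, a depthwise convolution and a residual add, can be demonstrated in plain NumPy. This is a didactic sketch of the operations, not the FBNet block itself (which also includes pointwise expansion/projection convolutions).

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depthwise 2D convolution with valid padding: each channel of
    x (C, H, W) is filtered by its own k x k kernel in kernels (C, k, k),
    with no mixing across channels."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * kernels[c])
    return out

x = np.ones((3, 5, 5))
kern = np.ones((3, 3, 3)) / 9.0          # one averaging kernel per channel
y = depthwise_conv(x, kern)
# residual connection: add the matching crop of the input back
res = y + x[:, 1:4, 1:4]
```

Because each channel has its own single kernel, a depthwise convolution costs C·k² parameters instead of the C²·k² of a standard convolution, which is why mobile-oriented searched architectures such as FBNet favor it.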
9. Neural Attention Fields
NEAT, or Neural Attention Fields, is a feature representation for end-to-end imitation learning models. NEAT is a continuous function that maps locations in bird's-eye-view (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This lets the model selectively attend to the regions of the input relevant to the driving task while ignoring irrelevant information, effectively associating the images with the BEV representation. Furthermore, visualizing the attention maps of the NEAT intermediate representations improves interpretability.
10. Inception-A
Inception-A is the image model block used in the Inception-v4 architecture.
11. Inception-B
Inception-B is the image model block used in the Inception-v4 architecture.
12. Inception-C
Inception-C is the image model block used in the Inception-v4 architecture.
13. Reduction-B
Reduction-B is the image model block used in the Inception-v4 architecture.
14. Global Context Block
Global Context Block is an image model block used for global context modeling. It aims to combine the benefit of the simplified non-local block, which models long-range dependencies effectively, with that of the Squeeze-and-Excitation block, which is computationally lightweight.
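The three-stage structure (context modeling, bottleneck transform, fusion) can be sketched in NumPy. The attention logits and MLP weights below are random placeholders standing in for the block's learned 1×1 convolutions, so this shows only the data flow.

```python
import numpy as np

def global_context_block(x, reduction=2, rng=np.random.default_rng(0)):
    """Global Context block sketch on x of shape (C, H, W):
    1) context modeling: a softmax over all H*W positions pools the feature
       map into a single (C,) global context vector (simplified non-local);
    2) transform: a bottleneck two-layer MLP standing in for the paper's
       1x1 convolutions, squeeze-excitation style;
    3) fusion: broadcast-add the transformed context to every position."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)
    logits = rng.standard_normal(H * W)            # stand-in for 1x1-conv scores
    attn = np.exp(logits) / np.exp(logits).sum()   # softmax over positions
    context = flat @ attn                          # (C,) pooled global context
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    transformed = w2 @ np.maximum(w1 @ context, 0.0)
    return x + transformed[:, None, None]          # broadcast fusion

y = global_context_block(np.zeros((4, 3, 3)))
```

The lightweight part is visible in the shapes: unlike a full non-local block, the attention map is shared by all query positions, so only one (C,) context vector is computed and transformed.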
15. One-Shot Aggregation
One-Shot Aggregation is an image model block that replaces dense blocks by aggregating intermediate features only once. It is proposed as part of the VoVNet architecture. Each convolutional layer has two connections: one feeds the subsequent layer to produce features with a larger receptive field, and the other is aggregated only once into the final output feature map. Unlike DenseNet, the output of each layer is not routed to all subsequent intermediate layers, which keeps the input size of the intermediate layers constant.
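The connectivity pattern described above can be illustrated with a tiny NumPy sketch, where a fixed affine map stands in for each 3×3 convolution:

```python
import numpy as np

def osa_module(x, num_layers=3):
    """One-Shot Aggregation sketch: each 'conv layer' (a fixed affine map
    here, standing in for a 3x3 convolution) feeds only the NEXT layer,
    and all intermediate outputs are concatenated ONCE at the end. Every
    layer therefore sees a constant-width input, unlike DenseNet, where
    each layer's input grows with depth."""
    outputs = [x]
    h = x
    for _ in range(num_layers):
        h = 0.5 * h + 1.0                     # placeholder "convolution"
        outputs.append(h)
    return np.concatenate(outputs, axis=0)    # the single (one-shot) aggregation

y = osa_module(np.zeros((2, 2, 2)), num_layers=3)
```

With a (2, H, W) input and 3 layers, the aggregated output has 2 × 4 = 8 channels, but each intermediate layer still processed only 2 channels, which is the efficiency argument behind VoVNet.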
16. Pyramidal Residual Unit
A pyramid residual unit is a type of residual unit in which the number of channels gradually increases with the depth at which the layer occurs, similar to a pyramid structure whose shape gradually becomes wider from the top downwards. It was introduced as part of the PyramidNet architecture.
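The gradual widening can be made concrete with a small sketch, assuming PyramidNet's additive widening rule: each of the N residual units adds roughly α / N channels, where α is the total widening factor. Exact rounding conventions may differ from a given implementation.

```python
def pyramidal_widths(n_units, alpha=48, base=16):
    """Additive PyramidNet-style widening sketch: starting from `base`
    channels, each of the n_units residual units adds alpha / n_units
    channels (floored), so width grows linearly with depth instead of
    doubling at a few stage boundaries."""
    widths, w = [], float(base)
    for _ in range(n_units):
        w += alpha / n_units
        widths.append(int(w))
    return widths

widths = pyramidal_widths(4, alpha=48, base=16)
```

For 4 units with α = 48 and a 16-channel stem, the per-unit widths come out to 28, 40, 52, and 64: the pyramid shape in numbers.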
17. Pyramidal Bottleneck Residual Unit
A pyramidal bottleneck residual unit is a residual unit in which the number of channels gradually increases with the depth at which the layer occurs, similar to a pyramid structure whose shape gradually becomes wider from top to bottom. It also uses a bottleneck built from 1×1 convolutions. It was introduced as part of the PyramidNet architecture.
18. Fractal Block
19. Transformer in Transformer
The Transformer is a self-attention-based neural network originally applied to NLP tasks. Recently, purely transformer-based models have been proposed for computer vision problems. These vision transformers typically treat an image as a sequence of patches, ignoring the intrinsic structural information inside each patch. The Transformer-iN-Transformer (TNT) model addresses this by modeling both patch-level and pixel-level representations. In each TNT block, an outer transformer block processes the patch embeddings, while an inner transformer block extracts local features from the pixel embeddings. The pixel-level features are projected into the patch-embedding space through a linear layer and then added to the patch embedding. Stacking TNT blocks yields the TNT model for image recognition.
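The fusion step that couples the two levels, projecting pixel embeddings into the patch-embedding space and adding them, is simple enough to sketch in NumPy. The inner and outer transformer blocks themselves are omitted; `proj` is a hypothetical stand-in for the learned linear layer.

```python
import numpy as np

def tnt_fuse(pixel_embed, patch_embed, proj):
    """TNT fusion sketch for one patch: pixel-level embeddings of shape
    (n_pixels, d_pixel) are flattened, linearly projected into the patch
    embedding space (d_patch,), and ADDED to that patch's embedding.
    The inner/outer attention blocks around this step are omitted."""
    return patch_embed + proj @ pixel_embed.reshape(-1)

n_pixels, d_pixel, d_patch = 4, 3, 6
pix = np.ones((n_pixels, d_pixel))                  # inner (pixel) embeddings
pat = np.zeros(d_patch)                             # outer (patch) embedding
W = np.full((d_patch, n_pixels * d_pixel), 1.0 / (n_pixels * d_pixel))
fused = tnt_fuse(pix, pat, W)
```

This is the channel through which local intra-patch structure, invisible to a plain vision transformer, reaches the patch-level sequence processed by the outer blocks.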
20. Dilated Bottleneck Block
Dilated Bottleneck Block is the image model block used in the DetNet convolutional neural network architecture. It adopts a bottleneck structure with dilated convolution to effectively expand the receptive field.
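The receptive-field expansion from dilation follows a standard formula, which a short sketch makes explicit. The `effective_kernel` helper is introduced here for illustration; the DetNet-style 1×1 → dilated 3×3 → 1×1 layout is from the description above.

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k x k convolution with dilation d:
    k_eff = k + (k - 1) * (d - 1). Dilation inserts d - 1 gaps between
    kernel taps, widening coverage without adding parameters."""
    return k + (k - 1) * (d - 1)

# A dilated bottleneck places a dilated 3x3 convolution between two 1x1
# convolutions; with dilation 2 the 3x3 kernel covers a 5x5 window while
# keeping 3x3 parameter cost, which is how the block enlarges the
# receptive field without downsampling.
k_eff = effective_kernel(3, 2)
```

Stacking such blocks grows the receptive field quickly while preserving feature-map resolution, which is the property DetNet exploits for detection backbones.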