[Computer Vision | CNN] Common Algorithm Collection: Introduction to Image Model Blocks (2)

1. ShuffleNet Block

The ShuffleNet block is an image model block that leverages a channel shuffle operation, together with pointwise group convolutions and depthwise convolutions, for an efficient architecture design. It is proposed as part of the ShuffleNet architecture. The starting point is the residual block unit of ResNets, which is then modified with pointwise group convolutions and the channel shuffle operation.
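
Below is a minimal PyTorch sketch of the channel shuffle operation that the block relies on; the function name and the example tensor sizes are illustrative rather than taken from the original ShuffleNet code.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Shuffle channels so information can flow across the groups
    produced by a pointwise group convolution."""
    n, c, h, w = x.size()
    assert c % groups == 0
    # reshape to (n, groups, channels_per_group, h, w), transpose, flatten back
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# example: shuffle a feature map produced by a group convolution with 3 groups
x = torch.randn(1, 12, 8, 8)
y = channel_shuffle(x, groups=3)
print(y.shape)  # torch.Size([1, 12, 8, 8])
```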


2. Efficient Spatial Pyramid

Efficient Spatial Pyramid (ESP) is an image model block based on a decomposition principle: it decomposes a standard convolution into two steps, (1) a pointwise convolution and (2) a spatial pyramid of dilated convolutions. The pointwise convolution reduces the computation, while the spatial pyramid of dilated convolutions resamples the feature maps to learn representations from a large effective receptive field. This makes the block more efficient than other image modules such as the ResNeXt module and the Inception module.
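
As an illustration of this decomposition, here is a hedged PyTorch sketch; the class name, layer widths, and dilation rates are assumptions for the example, not values from the ESPNet code.

```python
import torch
import torch.nn as nn

class ESPLikeModule(nn.Module):
    """Decomposes a standard convolution into a 1x1 pointwise reduction
    followed by a spatial pyramid of dilated 3x3 convolutions."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.reduce = nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False)
        self.branches = nn.ModuleList(
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3,
                      padding=d, dilation=d, bias=False)
            for d in dilations)

    def forward(self, x):
        x = self.reduce(x)                      # step 1: pointwise convolution
        feats = [b(x) for b in self.branches]   # step 2: dilated convolution pyramid
        return torch.cat(feats, dim=1)          # merge the pyramid outputs

x = torch.randn(1, 64, 32, 32)
print(ESPLikeModule(64, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

The published ESP module additionally fuses the dilated branches hierarchically before concatenating them; the sketch keeps only the two-step decomposition described above.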


3. Hourglass Module

The hourglass module is an image model block used mainly for pose estimation tasks. The hourglass design is motivated by the need to capture information at every scale. While local evidence is crucial for identifying features such as faces and hands, the final pose estimate requires a coherent understanding of the whole body. A person's orientation, the arrangement of their limbs, and the relationships of adjacent joints are among the many cues that are best recognized at different scales in an image. The hourglass is a simple, minimal design that can capture all of these features and bring them together to output pixel-wise predictions.

The network must have some mechanism to efficiently process and integrate features across scales. Hourglass uses a single pipeline with skip layers to preserve spatial information at each resolution. The network reaches a minimum resolution of 4x4 pixels, allowing the application of smaller spatial filters to compare features across the entire space of the image.

The setup of the hourglass is as follows: convolutional and max-pooling layers process features down to a very low resolution. At each max-pooling step, the network branches off and applies more convolutions at the original, pre-pooling resolution. After reaching the lowest resolution, the network begins the top-down sequence of upsampling and combining features across scales. To bring together information from two adjacent resolutions, we apply nearest-neighbor upsampling to the lower resolution and then perform an element-wise summation of the two sets of features. The topology of the hourglass is symmetric, so for every layer on the way down there is a corresponding layer on the way up.

After the network's output resolution is reached, two consecutive rounds of 1x1 convolutions are applied to produce the final network predictions. The output of the network is a set of heatmaps, where for a given heatmap, the network predicts the probability that a joint is present at each pixel.
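
A compact, hedged PyTorch sketch of this bottom-up/top-down scheme is shown below. The recursion depth, channel count, and number of output heatmaps are placeholders, and the per-resolution processing is reduced to a single conv-BN-ReLU block rather than the residual modules used in the original paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(ch: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                         nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

class HourglassSketch(nn.Module):
    """Recursive hourglass: pool down, recurse, upsample, then add the skip
    branch that was kept at the pre-pooling resolution."""
    def __init__(self, depth: int, ch: int):
        super().__init__()
        self.skip = conv_block(ch)    # branch at the original resolution
        self.down = conv_block(ch)    # processing after max pooling
        self.inner = HourglassSketch(depth - 1, ch) if depth > 1 else conv_block(ch)
        self.up = conv_block(ch)

    def forward(self, x):
        skip = self.skip(x)
        y = self.up(self.inner(self.down(F.max_pool2d(x, 2))))
        y = F.interpolate(y, scale_factor=2, mode="nearest")  # nearest-neighbour upsampling
        return y + skip               # element-wise sum of the two resolutions

# a 64x64 input with depth 4 reaches the 4x4 minimum resolution mentioned above;
# two consecutive 1x1 convolutions then produce 16 joint heatmaps (both numbers hypothetical)
net = nn.Sequential(HourglassSketch(depth=4, ch=64),
                    nn.Conv2d(64, 64, 1), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 16, 1))
print(net(torch.randn(1, 64, 64, 64)).shape)  # torch.Size([1, 16, 64, 64])
```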


4. SRGAN Residual Block

SRGAN Residual Block is the residual block used in the SRGAN generator for image super-resolution. It is similar to the standard residual block, although it uses the PReLU activation function to aid training (preventing sparse gradients during GAN training).
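
A minimal sketch of such a block in PyTorch; the channel count of 64 and the 3×3 kernels follow the configuration commonly used in the SRGAN generator, but treat the exact settings as assumptions.

```python
import torch
import torch.nn as nn

class SRGANResidualBlock(nn.Module):
    """Conv-BN-PReLU-Conv-BN with an identity shortcut; PReLU is used
    instead of ReLU to avoid sparse gradients during GAN training."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return x + self.body(x)   # residual connection

print(SRGANResidualBlock()(torch.randn(1, 64, 24, 24)).shape)  # torch.Size([1, 64, 24, 24])
```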


5. Reduction-A

Reduction-A is an image model block used in the Inception-v4 architecture.


6. Ghost Module

Ghost modules are image model blocks for convolutional neural networks that are designed to generate more features while using fewer parameters. Specifically, an ordinary convolutional layer in a deep neural network is split into two parts. The first part consists of ordinary convolutions, but their total number is strictly controlled. Given the intrinsic feature maps produced by the first part, a series of simple (cheap) linear operations is then applied to generate more feature maps.
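
The split into a primary convolution and cheap linear operations can be sketched as follows in PyTorch; the class name, the 1×1 primary convolution, and the choice of a depthwise 3×3 as the "cheap" operation are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    """A few 'intrinsic' feature maps from an ordinary convolution, plus
    'ghost' feature maps from a cheap depthwise convolution, concatenated."""
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        intrinsic = out_ch // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, intrinsic, 1, bias=False),
            nn.BatchNorm2d(intrinsic), nn.ReLU(inplace=True))
        # cheap linear operation: a depthwise 3x3 applied to each intrinsic map
        self.cheap = nn.Sequential(
            nn.Conv2d(intrinsic, out_ch - intrinsic, 3, padding=1,
                      groups=intrinsic, bias=False),
            nn.BatchNorm2d(out_ch - intrinsic), nn.ReLU(inplace=True))

    def forward(self, x):
        intrinsic = self.primary(x)      # first part: ordinary convolution
        ghosts = self.cheap(intrinsic)   # second part: cheap linear operations
        return torch.cat([intrinsic, ghosts], dim=1)

print(GhostModuleSketch(32, 64)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])
```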


7. ENet Initial Block

The ENet initial block is the image model block used in the ENet semantic segmentation architecture. Max Pooling is performed using non-overlapping 2 × 2 windows, and the convolution has 13 filters, resulting in a total of 16 feature maps after concatenation. This is largely inspired by the Inception module.
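
A hedged PyTorch sketch of this block; the 13 filters, 2×2 pooling, and concatenation to 16 maps follow the description above, while the 3×3 stride-2 convolution and the class name are assumptions.

```python
import torch
import torch.nn as nn

class ENetInitialBlockSketch(nn.Module):
    """A stride-2 convolution with 13 filters in parallel with non-overlapping
    2x2 max pooling; the outputs are concatenated (13 + 3 = 16 maps for RGB)."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 13, kernel_size=3, stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return torch.cat([self.conv(x), self.pool(x)], dim=1)

print(ENetInitialBlockSketch()(torch.randn(1, 3, 512, 512)).shape)  # torch.Size([1, 16, 256, 256])
```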


8. ENet Bottleneck

ENet Bottleneck is an image model block used in the ENet semantic segmentation architecture. Each block consists of three convolutional layers: a 1 × 1 projection that reduces the dimensionality, a main convolutional layer, and a 1 × 1 expansion. Batch Normalization and PReLU are placed between all convolutions. If the bottleneck is downsampling, a max-pooling layer is added to the main branch. Additionally, the first 1 × 1 projection is replaced by a 2 × 2 convolution with stride 2 in both dimensions, and the activations are zero-padded to match the number of feature maps.
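
The sketch below illustrates the stride-1 and downsampling variants in PyTorch. It is simplified (no regularizer/dropout, and the zero-padding of the pooled branch's channels is omitted so the channel count stays fixed), and all names and layer widths are assumptions.

```python
import torch
import torch.nn as nn

class ENetBottleneckSketch(nn.Module):
    """1x1 projection -> main 3x3 convolution -> 1x1 expansion, with BN and
    PReLU in between; optional downsampling and dilation."""
    def __init__(self, channels: int, internal: int, dilation: int = 1,
                 downsample: bool = False):
        super().__init__()
        stride = 2 if downsample else 1
        proj_kernel = 2 if downsample else 1   # 2x2 stride-2 projection when downsampling
        self.main = nn.Sequential(
            nn.Conv2d(channels, internal, proj_kernel, stride=stride, bias=False),
            nn.BatchNorm2d(internal), nn.PReLU(),
            nn.Conv2d(internal, internal, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(internal), nn.PReLU(),
            nn.Conv2d(internal, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        # when downsampling, a max-pooling layer is placed on the parallel branch
        self.other = nn.MaxPool2d(2, 2) if downsample else nn.Identity()
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.main(x) + self.other(x))

x = torch.randn(1, 64, 64, 64)
print(ENetBottleneckSketch(64, 16)(x).shape)                   # torch.Size([1, 64, 64, 64])
print(ENetBottleneckSketch(64, 16, downsample=True)(x).shape)  # torch.Size([1, 64, 32, 32])
```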


9. ENet Dilated Bottleneck

ENet Dilated Bottleneck is an image model block used in the ENet semantic segmentation architecture. It is the same as the regular ENet Bottleneck, but uses dilated convolutions.
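
Continuing with the hypothetical ENetBottleneckSketch from the previous section, the dilated variant only changes the dilation of the main 3 × 3 convolution:

```python
# assumes ENetBottleneckSketch (and torch) from the previous sketch are in scope
dilated = ENetBottleneckSketch(channels=64, internal=16, dilation=4)
print(dilated(torch.randn(1, 64, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])
```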


10. Res2Net Block

The Res2Net block is an image model block that builds hierarchical residual-like connections within a single residual block. It is proposed as part of the Res2Net CNN architecture.

This block represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. Instead of the single group of 3 × 3 filters used in a standard bottleneck, the block uses a set of smaller filter groups, each with fewer channels. These smaller filter groups are connected in a hierarchical residual-like style to increase the number of scales that the output features can represent. Specifically, the input feature maps are divided into several groups. One group of filters first extracts features from one group of input feature maps; the output features are then sent, together with the next group of input feature maps, to the next group of filters.

This process is repeated several times until all input feature maps have been processed. Finally, the feature maps from all groups are concatenated and sent to another group of filters to fuse the information completely. Along any possible path from an input feature map to an output feature map, the equivalent receptive field increases whenever the path passes through a filter group, so the combination effects produce many equivalent feature scales.

One way to think about these blocks is that they expose a new dimension, scale, in addition to the existing dimensions of depth, width, and cardinality.
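
A hedged PyTorch sketch of the hierarchical split described above; the scale of 4, the pass-through first group, and the trailing 1×1 fusion mentioned in the comment follow the Res2Net paper, while the class name and sizes are illustrative.

```python
import torch
import torch.nn as nn

class Res2NetSplitSketch(nn.Module):
    """Hierarchical residual-like connections inside one block: the input maps
    are split into `scale` groups; each group after the first is added to the
    previous group's output before its own 3x3 convolution."""
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        # one small filter group per split, except the first, which is passed through
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1, bias=False)
            for _ in range(scale - 1))

    def forward(self, x):
        splits = torch.chunk(x, self.scale, dim=1)
        outputs = [splits[0]]                 # first group: identity
        prev = None
        for conv, xi in zip(self.convs, splits[1:]):
            prev = conv(xi if prev is None else xi + prev)
            outputs.append(prev)
        return torch.cat(outputs, dim=1)      # fused afterwards by a 1x1 convolution

print(Res2NetSplitSketch(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```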


11. Ghost Bottleneck

The Ghost bottleneck is a skip-connection block, similar to the basic residual block in ResNet, that integrates several convolutional layers and a shortcut; instead of ordinary convolutions, it stacks two Ghost modules. It is proposed as part of the GhostNet CNN architecture.

The first Ghost module acts as an expansion layer that increases the number of channels; the ratio of output channels to input channels is called the expansion ratio. The second Ghost module reduces the number of channels to match the shortcut path. The shortcut then connects the input and the output of these two Ghost modules. Batch Normalization (BN) and ReLU non-linearity are applied after each layer, except that ReLU is not used after the second Ghost module, as suggested by MobileNetV2. The Ghost bottleneck described above is for the stride = 1 case. For stride = 2, the shortcut path is implemented with a downsampling layer, and a depthwise convolution with stride 2 is inserted between the two Ghost modules. In practice, the primary convolution in the Ghost module here is a pointwise convolution for efficiency.
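
Reusing the hypothetical GhostModuleSketch from the Ghost Module section, the stride-1 Ghost bottleneck can be sketched as follows; the exact placement of BN/ReLU inside the modules is simplified relative to the description above.

```python
import torch
import torch.nn as nn

class GhostBottleneckSketch(nn.Module):
    """Two stacked Ghost modules with a shortcut: the first expands the
    channels by `expansion`, the second reduces them back to match the
    shortcut path (stride = 1 case only)."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        mid = channels * expansion
        self.expand = GhostModuleSketch(channels, mid)   # expansion layer
        self.reduce = GhostModuleSketch(mid, channels)   # match the shortcut's channels
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return x + self.bn(self.reduce(self.expand(x)))

print(GhostBottleneckSketch(32)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```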


12. ShuffleNet V2 Block

The ShuffleNet V2 block is an image model block used in the ShuffleNet V2 architecture, where speed is the metric being optimized (rather than an indirect metric such as FLOPs). It uses a simple operator called channel split. At the beginning of each unit, the input feature channels are split into two branches. Following guideline G3 (one of the practical guidelines for efficient design given in the ShuffleNet V2 paper), one branch remains as the identity. The other branch consists of three convolutions with the same input and output channel counts, satisfying G1. Unlike the original ShuffleNet, the 1 × 1 convolutions are no longer group-wise; this partly follows G2, and partly reflects that the split operation already produces two groups. After the convolutions, the two branches are concatenated, so the number of channels stays unchanged (G1). The same channel shuffle operation as in ShuffleNet is then used to enable information exchange between the two branches.
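
A hedged PyTorch sketch of the stride-1 unit is shown below. It reuses the channel_shuffle helper from the ShuffleNet Block section; the depthwise 3×3 in the middle of the branch follows the ShuffleNet V2 design, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ShuffleNetV2UnitSketch(nn.Module):
    """Stride-1 unit: split the channels in two, keep one half as identity,
    run the other half through 1x1 -> depthwise 3x3 -> 1x1 with equal channel
    widths, concatenate, then shuffle the channels of the result."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))

    def forward(self, x):
        identity, active = torch.chunk(x, 2, dim=1)            # channel split
        out = torch.cat([identity, self.branch(active)], dim=1)
        return channel_shuffle(out, groups=2)                  # mix the two branches

print(ShuffleNetV2UnitSketch(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```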

The motivation behind channel splitting is that alternative architectures using pointwise group convolutions and bottleneck structures result in increased memory access costs. Additionally, more network fragmentation with group convolutions reduces parallelism (less GPU-friendly), and element-wise addition operations have lower FLOPs but higher memory access costs. Channel splitting is an alternative where we can maintain a large number of channels of equal width (which minimizes memory access costs) without using dense convolutions or too many groups.


13. Split Attention

(Figure: Split-Attention block diagrams.)

14. Selective Kernel

The Selective Kernel (SK) unit is a bottleneck block composed of a sequence of a 1×1 convolution, an SK convolution, and a 1×1 convolution. It is proposed as part of the SKNet CNN architecture. In general, all of the large-kernel convolutions in the original bottleneck blocks of ResNeXt are replaced by the proposed SK convolutions, enabling the network to choose an appropriate receptive field size in an adaptive manner.
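
The sketch below focuses on the SK convolution itself; the two branches with 3×3 and dilated 3×3 kernels, the squeeze ratio, and the class name are assumptions made for illustration. In the full SK unit this would sit between the two 1×1 convolutions.

```python
import torch
import torch.nn as nn

class SKConvSketch(nn.Module):
    """Two branches (a 3x3 and a dilated 3x3 acting like a 5x5) are fused via
    global average pooling, and a softmax over the branches re-weights them,
    letting the unit adapt its effective receptive field."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.branch5 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False)
        hidden = max(channels // reduction, 8)
        self.fuse = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True))
        self.select = nn.Conv2d(hidden, channels * 2, 1)   # one weight per branch and channel

    def forward(self, x):
        feats = torch.stack([self.branch3(x), self.branch5(x)], dim=1)  # (N, 2, C, H, W)
        attn = self.select(self.fuse(feats.sum(dim=1)))                 # (N, 2C, 1, 1)
        attn = attn.view(x.size(0), 2, x.size(1), 1, 1).softmax(dim=1)
        return (feats * attn).sum(dim=1)

print(SKConvSketch(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```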


15. DPN Block

Dual-path network (DPN) blocks are image model blocks used in convolutional neural networks. The idea of this module is to enable the sharing of common features while retaining the flexibility to explore new features through a dual-path architecture. In this sense, it combines the advantages of ResNets and DenseNets. It is proposed as part of the DPN CNN architecture.

We formulate such a dual-path architecture as follows:
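
The equations appeared as images in the original post; the sketch below follows the notation of the DPN paper (treat the exact symbols as an assumption), where h^t is the hidden state at step t, f and v are feature-learning functions, and g^k is the transformation applied to the merged features.

```latex
x^{k} = \sum_{t=1}^{k-1} f_t^{k}\!\left(h^{t}\right)
\qquad \text{(densely connected path)}

y^{k} = \sum_{t=1}^{k-1} v_t\!\left(h^{t}\right)
      = y^{k-1} + \phi^{k-1}\!\left(y^{k-1}\right)
\qquad \text{(residual path)}

r^{k} = x^{k} + y^{k}, \qquad h^{k} = g^{k}\!\left(r^{k}\right)
```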



Source: blog.csdn.net/wzk4869/article/details/132911433