8. Rough reading of papers, No. 17

MixMatch: A Holistic Approach to Semi-Supervised Learning(2019)

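The key step is the sharpening formula, which lowers the entropy of the averaged prediction p with a temperature T:
\mathrm{Sharpen}(p, T)_i = p_i^{1/T} / \sum_j p_j^{1/T}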
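
To make the sharpening step concrete, here is a minimal sketch in plain Python. It averages the predictions for several augmentations of one unlabeled image, raises each probability to the power 1/T and renormalizes. The function names, the toy predictions and the default T=0.5 are assumptions made for illustration, not code from the paper.

```python
def sharpen(probs, temperature=0.5):
    """Raise each probability to the power 1/T and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def average_predictions(predictions):
    """Average the class distributions predicted for K augmentations."""
    k = len(predictions)
    return [sum(col) / k for col in zip(*predictions)]


# Hypothetical predictions for two augmentations of one unlabeled image.
preds = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
guess = average_predictions(preds)        # roughly [0.5, 0.35, 0.15]
target = sharpen(guess, temperature=0.5)  # the largest class gains mass
```

A lower temperature pushes the guessed label closer to one-hot, while T=1 leaves the distribution unchanged.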

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features(2019)

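The images are mixed: a patch cut from one image is pasted onto another, and the labels are mixed in proportion to the patch area.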
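
A rough sketch of the cut-and-paste step, using nested lists as toy single-channel images. In the paper the mixing ratio is drawn from a Beta distribution; here it is simply passed in. The names `rand_bbox` and `cutmix` and the toy 8x8 images are assumptions for illustration only.

```python
import math
import random


def rand_bbox(height, width, lam, rng):
    """Sample a box whose area is roughly (1 - lam) of the image."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return top, bottom, left, right


def cutmix(img_a, img_b, lam, rng):
    """Paste a patch of img_b onto img_a; return the image and adjusted lam."""
    height, width = len(img_a), len(img_a[0])
    top, bottom, left, right = rand_bbox(height, width, lam, rng)
    mixed = [row[:] for row in img_a]
    for y in range(top, bottom):
        mixed[y][left:right] = img_b[y][left:right]
    # Recompute lam from the clipped box so label weights match the pixels.
    lam_adj = 1.0 - (bottom - top) * (right - left) / (height * width)
    return mixed, lam_adj


rng = random.Random(0)
img_a = [[0] * 8 for _ in range(8)]   # toy image of zeros
img_b = [[1] * 8 for _ in range(8)]   # toy image of ones
mixed, lam = cutmix(img_a, img_b, lam=0.75, rng=rng)
# The mixed label would be lam * label_a + (1 - lam) * label_b.
```

The ratio is recomputed after clipping because a box that crosses the image border covers less area than requested.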

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

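Features are split into a high-frequency branch and a low-frequency branch at half the spatial resolution; each branch updates itself and exchanges information with the other.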

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

It is a nested U-Net: each stage of the outer U-shaped encoder-decoder is itself a small U-shaped block.

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

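A teacher network, kept as an exponential moving average of the student network, provides the targets; the loss is computed between the two networks' outputs for two different augmented views.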
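
The two ingredients, the moving-average teacher and the loss between the two outputs, can be sketched in plain Python. Weights and outputs are flattened into toy lists; the function names, the toy values and the decay `tau` are assumptions for illustration, and the real model trains only the student by gradient descent.

```python
import math


def ema_update(teacher, student, tau=0.99):
    """Move each teacher weight toward the student weight (moving average)."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def normalized_mse(pred, target):
    """Squared distance between L2-normalized vectors: 2 - 2 * cosine."""
    norm_p = math.sqrt(sum(v * v for v in pred))
    norm_t = math.sqrt(sum(v * v for v in target))
    cosine = sum(p * t for p, t in zip(pred, target)) / (norm_p * norm_t)
    return 2.0 - 2.0 * cosine


def byol_loss(student_view1, teacher_view2, student_view2, teacher_view1):
    """Symmetric loss: each view is predicted from the other one."""
    return (normalized_mse(student_view1, teacher_view2)
            + normalized_mse(student_view2, teacher_view1))


# Toy flattened weights and outputs (hypothetical values).
teacher = ema_update(teacher=[1.0, 0.0], student=[0.0, 1.0], tau=0.9)
loss = byol_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In the toy call the first pair of outputs agrees and the second pair is orthogonal, so only the second term contributes to the loss.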

CBAM: Convolutional Block Attention Module


GhostNet: More Features from Cheap Operations(2020)

import math

import torch
import torch.nn as nn


class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super().__init__()
        self.oup = oup
        # The ordinary convolution produces only about 1/ratio of the output channels.
        init_channels = math.ceil(oup / ratio)
        # The remaining "ghost" channels come from the cheap operation.
        new_channels = init_channels * (ratio - 1)

        # Primary branch: ordinary convolution + BN + optional ReLU.
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        # Cheap branch: depthwise convolution (groups=init_channels) on the primary features.
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size // 2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        # ceil() may yield more than oup channels, so trim the excess.
        return out[:, :self.oup, :, :]

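It is easier to understand through the code. The essence is to reduce the parameters of the convolution: the ordinary convolution outputs only ceil(oup / ratio) channels, and a cheap depthwise convolution generates the rest from them.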
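
A quick back-of-the-envelope count shows the saving. The sketch below compares the weights of one ordinary convolution with the two convolutions inside GhostModule, using the same argument names as the class above. The helper functions and the 64-to-128-channel example are assumptions for illustration, and BatchNorm parameters are ignored.

```python
import math


def conv_params(in_ch, out_ch, kernel_size, groups=1):
    """Weight count of a bias-free 2D convolution."""
    return out_ch * (in_ch // groups) * kernel_size * kernel_size


def ghost_params(inp, oup, kernel_size=1, ratio=2, dw_size=3):
    """Convolution weights in GhostModule (BatchNorm parameters ignored)."""
    init_channels = math.ceil(oup / ratio)
    new_channels = init_channels * (ratio - 1)
    primary = conv_params(inp, init_channels, kernel_size)
    cheap = conv_params(init_channels, new_channels, dw_size, groups=init_channels)
    return primary + cheap


standard = conv_params(64, 128, 1)   # 8192 weights
ghost = ghost_params(64, 128)        # 4096 + 576 = 4672 weights
```

With ratio=2 the primary convolution costs half of the ordinary one, and the depthwise part adds only a small amount on top.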

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

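The loss is the cross-entropy between the predicted distributions of differently augmented views: the prediction on the weakly augmented view becomes a pseudo-label, kept only if it is confident enough, for the prediction on the strongly augmented view.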
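
The unlabeled part of the loss can be sketched as follows, with class probabilities given as plain lists. The function name, the toy probabilities and the threshold value are assumptions for illustration; the full method also adds an ordinary supervised loss on labeled data.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Cross-entropy of strong-view predictions against confident pseudo-labels."""
    total = 0.0
    for weak, strong in zip(weak_probs, strong_probs):
        confidence = max(weak)
        if confidence < threshold:
            continue                      # low-confidence samples are masked out
        pseudo_label = weak.index(confidence)
        total += -math.log(strong[pseudo_label])
    return total / len(weak_probs)        # average over the whole batch


# Hypothetical predictions for two unlabeled images.
weak = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20]]
strong = [[0.70, 0.20, 0.10], [0.10, 0.60, 0.30]]
loss = fixmatch_unlabeled_loss(weak, strong)
```

Only the first image passes the confidence threshold here, so the second one contributes nothing to the loss.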

Res2Net: A New Multi-scale Backbone Architecture(2019)

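It adds multi-scale features inside a single residual block by splitting the channels into groups and connecting the groups hierarchically.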

Barlow Twins: Self-Supervised Learning via Redundancy Reduction(2021)

In fact, it is relatively simple. Two different transformations of the same image go through the same network, and the loss pushes the cross-correlation matrix of the two outputs toward the identity matrix.
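
As a sketch of that loss: standardize each embedding dimension over the batch, build the cross-correlation matrix of the two views, then penalize diagonal entries that differ from 1 and off-diagonal entries that differ from 0. The function names, the toy embeddings and the weight `lambd` are assumptions for illustration.

```python
import math


def standardize(batch):
    """Normalize each feature dimension to zero mean, unit std over the batch."""
    n = len(batch)
    columns = []
    for col in zip(*batch):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        columns.append([(v - mean) / std for v in col])
    return [list(row) for row in zip(*columns)]


def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Push the cross-correlation matrix of the two views toward identity."""
    z_a, z_b = standardize(z_a), standardize(z_b)
    n, dim = len(z_a), len(z_a[0])
    loss = 0.0
    for i in range(dim):
        for j in range(dim):
            c_ij = sum(z_a[b][i] * z_b[b][j] for b in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2      # invariance term
            else:
                loss += lambd * c_ij ** 2      # redundancy-reduction term
    return loss


# Toy embeddings (batch of 4, dimension 2) with uncorrelated features.
view = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
same = barlow_twins_loss(view, view)
flipped = barlow_twins_loss(view, [[-v for v in row] for row in view])
```

Identical views with uncorrelated features give a loss of zero, while sign-flipped views are penalized on the diagonal.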

Emerging Properties in Self-Supervised Vision Transformers(2021)

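Different augmented views of an image go through a student network and a teacher network; the teacher's weights are an exponential moving average of the student's, and the loss is computed between the two outputs.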

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer(2022)

In effect, it combines convolutions with transformer blocks to reduce the number of parameters.

Supervised Contrastive Learning(2020)

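Roughly speaking, self-supervised contrastive learning treats a different image of a dog as a negative example; the supervised version solves this problem by using the labels, so that all samples of the same class count as positives.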
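
A minimal sketch of such a loss over L2-normalized features: for each anchor, every other sample with the same label is a positive, and the denominator runs over all other samples. The function name, the toy features and the temperature are assumptions for illustration.

```python
import math


def sup_con_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized feature vectors."""
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(features[i], features[k])) / temperature
                for k in range(n)]
        # The denominator runs over every sample except the anchor itself.
        log_denom = math.log(sum(math.exp(sims[k]) for k in range(n) if k != i))
        positives = [k for k in range(n) if k != i and labels[k] == labels[i]]
        if not positives:
            continue                   # anchors without positives are skipped
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors


# Two "dog" samples pointing one way, two "cat" samples pointing another.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
with_labels = sup_con_loss(feats, ["dog", "dog", "cat", "cat"])
wrong_labels = sup_con_loss(feats, ["dog", "cat", "dog", "cat"])
```

When the labels match the feature clusters the loss is close to zero; pairing each dog with a cat as its positive makes it much larger.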

RepVGG: Making VGG-style ConvNets Great Again(2021)

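Just look at the network structure: the multi-branch block used in training (3x3, 1x1 and identity) is re-parameterized into a single 3x3 convolution for inference.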
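
The branch merging can be checked on a toy case. The sketch below uses a single channel and leaves out BatchNorm, which the real network folds into each branch before merging; the helper names and all values are assumptions for illustration.

```python
def conv3x3(image, kernel):
    """Single-channel 3x3 convolution with zero padding of 1 (stride 1)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[ky][kx] * image[yy][xx]
    return out


def fuse_branches(k3, k1):
    """Merge a 3x3 kernel, a 1x1 kernel and an identity branch into one kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0    # 1x1 weight and identity both act on the center tap
    return fused


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, -0.2], [0.1, 0.0, -0.1]]
k1 = 0.3

branch3 = conv3x3(image, k3)
# Training-time output: 3x3 branch + 1x1 branch + identity branch.
multi_branch = [[branch3[y][x] + k1 * image[y][x] + image[y][x]
                 for x in range(3)] for y in range(3)]
# Inference-time output: one fused 3x3 convolution.
single_branch = conv3x3(image, fuse_branches(k3, k1))
```

Both paths give the same output because convolution is linear, so the three kernels can be summed once the 1x1 and identity kernels are padded to 3x3.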

Pay Attention to MLPs(2021)

The idea is a bit like MLP-Mixer.

Dual Path Networks(2017)

I am not sure whether the two branches are obtained by splitting the channels.

Visual Attention Network(2022)


PVT v2: Improved Baselines with Pyramid Vision Transformer(2021)


Swin Transformer V2: Scaling Up Capacity and Resolution

The main change is in how attention is computed from q and k: a scaled cosine similarity replaces the dot product.

MetaFormer Is Actually What You Need for Vision(2022)


CvT: Introducing Convolutions to Vision Transformers(2021)

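Some models generate tokens with an MLP (a linear projection), and some use convolution. When using convolution, pay attention to the dimension transformation: the (B, C, H, W) feature map has to be reshaped into a (B, H*W, C) token sequence.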

Origin blog.csdn.net/qq_45745941/article/details/132336518