Deep learning paper: FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows and its PyTorch implementation

FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows
PDF: https://arxiv.org/pdf/2111.07677v2.pdf
PyTorch code: https://github.com/shanglianlm0525/CvPytorch
PyTorch code: https://github.com/shanglianlm0525/PyTorch-Networks

1 Overview

Most existing representation-based methods use a deep convolutional neural network to extract the features of normal images and characterize their distribution with non-parametric density estimation; the anomaly score of a test image is then the distance between its features and the estimated distribution. However, these methods cannot effectively map image features onto a tractable base distribution, and they ignore the relationship between local and global features that is necessary to identify anomalies. To address this, FastFlow, implemented with 2D normalizing flows, is proposed as a probability distribution estimator. FastFlow avoids the problems of earlier one-dimensional flow models, which flatten the feature map and thereby destroy the inherent spatial relationships of the 2D image, limit the capacity of the flow model, and incur a high inference cost that restricts their practical value.
FastFlow is a probability distribution estimator based on 2D normalizing flows. It can be used as a plug-in module with any deep feature extractor, such as a ResNet or a Vision Transformer, for unsupervised anomaly detection and localization. During training, FastFlow learns to transform the input visual features into a tractable distribution; during inference, it evaluates their likelihood and uses it to identify anomalies.

2 FastFlow

The FastFlow structure is as follows:
[Figure: the overall FastFlow structure]

2-1 Feature Extractor

The feature extractor obtains representative features from the input image with a ResNet or a Vision Transformer (ViT). A key challenge in anomaly detection is capturing global relationships so that anomalous regions can be distinguished from the rest of the image. When ViT is used as the feature extractor, the features of a single specific layer are sufficient, because ViT is better at capturing the relationship between local patches and the global context. For ResNet, however, the features of the last layer of each of the first three blocks are used, and each of these feature maps is fed into its own FastFlow model.
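
As a minimal sketch (not the authors' exact code), the multi-scale ResNet features can be obtained through timm's features_only interface; the backbone name and layer indices below are assumptions chosen to match the model printout further down:

import timm
import torch

# Return the outputs of layer1-layer3 instead of classification logits.
extractor = timm.create_model(
    "resnet18",
    pretrained=True,
    features_only=True,
    out_indices=(1, 2, 3),  # last layer of each of the first three blocks
)
extractor.eval()  # the backbone stays frozen while the flows are trained

with torch.no_grad():
    x = torch.randn(1, 3, 256, 256)  # dummy input image
    features = extractor(x)          # list of three feature maps
for f in features:
    print(f.shape)  # (1, 64, 64, 64), (1, 128, 32, 32), (1, 256, 16, 16)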

2-2 2D Flow Model

Treating the feature map x as a sample from a two-dimensional data distribution, the 2D normalizing flow transforms it into a standard normal distribution, producing an H×W×D latent tensor z. The 2D flow is an invertible, efficient, and parallelizable transformation built in the style of RealNVP: a masking scheme splits the input into two parts, one part is left unchanged, and the other part undergoes an affine transformation (scale and shift) whose parameters are computed from the unchanged part. The two parts are then swapped, and this process is repeated across several stacked coupling layers to produce the final output.
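
The printout below shows that each scale is handled by a FrEIA SequenceINN built from AllInOneBlock coupling layers with small convolutional subnets. Here is a minimal sketch of constructing one such 2D flow; the channel counts, the eight flow steps, and the subnet shape are read off the printout, while the clamping and permutation settings are assumptions:

import torch.nn as nn
from FrEIA.framework import SequenceINN
from FrEIA.modules import AllInOneBlock

def subnet_conv(in_channels, out_channels):
    # 3x3 conv -> ReLU -> 3x3 conv, matching the printed subnets
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size=3, padding="same"),
        nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding="same"),
    )

def build_2d_flow(channels, height, width, steps=8):
    flow = SequenceINN(channels, height, width)
    for _ in range(steps):
        flow.append(
            AllInOneBlock,
            subnet_constructor=subnet_conv,
            affine_clamping=2.0,  # assumed soft-clamping value for stability
            permute_soft=False,
        )
    return flow

# One flow per feature scale, e.g. for the layer1 features:
flow = build_2d_flow(64, 64, 64)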

At inference time, the features of anomalous images should fall outside the learned distribution and therefore receive a lower likelihood than those of normal images, so the likelihood can be used directly as an anomaly score. Concretely, the 2D probabilities are summed over the channel dimension to obtain a final probability map, which is upsampled to the input-image resolution with bilinear interpolation.
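
A minimal sketch of turning the latent tensor z into an anomaly map, following common open-source FastFlow implementations (under the standard-normal base, the per-location log-likelihood is proportional to -||z||^2; dropping the Jacobian term here is a simplification those implementations also make):

import torch
import torch.nn.functional as F

def anomaly_map(z, image_size=256):
    # z: latent tensor of shape (B, C, H, W) from one 2D flow
    log_prob = -0.5 * torch.mean(z ** 2, dim=1, keepdim=True)  # channel-averaged
    prob = torch.exp(log_prob)  # high probability = normal region
    # Negate so larger values mean "more anomalous", then upsample to the
    # input resolution with bilinear interpolation.
    return F.interpolate(-prob, size=(image_size, image_size),
                         mode="bilinear", align_corners=False)

In the multi-scale ResNet setting, the per-scale maps are typically averaged into a single localization map.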

Printing the PyTorch model (ResNet-18 backbone, three feature scales, eight flow steps per scale) gives the following structure:

FastFlow(
  (feature_extractor): FeatureListNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act1): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act2): ReLU(inplace=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act2): ReLU(inplace=True)
      )
    )
    (layer2): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act2): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act2): ReLU(inplace=True)
      )
    )
    (layer3): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act2): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act2): ReLU(inplace=True)
      )
    )
  )
  (norms): ModuleList(
    (0): LayerNorm((64, 64, 64), eps=1e-05, elementwise_affine=True)
    (1): LayerNorm((128, 32, 32), eps=1e-05, elementwise_affine=True)
    (2): LayerNorm((256, 16, 16), eps=1e-05, elementwise_affine=True)
  )
  (nf_flows): ModuleList(
    (0): SequenceINN(
      (module_list): ModuleList(
        (0): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (1): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (2): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (3): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (4): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (5): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (6): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (7): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
      )
    )
    (1): SequenceINN(
      (module_list): ModuleList(
        (0): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (1): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (2): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (3): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (4): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (5): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (6): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (7): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
      )
    )
    (2): SequenceINN(
      (module_list): ModuleList(
        (0): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (1): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (2): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (3): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (4): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (5): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (6): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
        (7): AllInOneBlock(
          (softplus): Softplus(beta=0.5, threshold=20)
          (subnet): Sequential(
            (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
            (1): ReLU()
            (2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=same)
          )
        )
      )
    )
  )
)
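
For completeness, a minimal sketch of one training step: the flows are trained by maximum likelihood, i.e. by minimizing the negative log-likelihood computed from the latent z and the log-Jacobian returned by the SequenceINN. The extractor and flow are the modules sketched above; the optimizer settings are assumptions:

import torch

def nll_loss(z, log_jac_det):
    # -log p(x) = 0.5 * ||z||^2 - log|det J| under a standard normal base,
    # averaged over the batch.
    return torch.mean(0.5 * torch.sum(z ** 2, dim=(1, 2, 3)) - log_jac_det)

optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3, weight_decay=1e-5)

x = torch.randn(8, 3, 256, 256)  # stand-in for a batch of normal images
with torch.no_grad():
    feats = extractor(x)         # backbone is frozen

z, log_jac_det = flow(feats[0])  # flow for the first scale; others analogous
loss = nll_loss(z, log_jac_det)
optimizer.zero_grad()
loss.backward()
optimizer.step()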

3 Experiments

Quantitative Results
Complexity Analysis

Origin: blog.csdn.net/shanglianlm/article/details/133308147