Anomalib in Practice, Part 1: Custom Datasets

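Anomalib supports datasets in several formats, including state-of-the-art anomaly detection benchmark datasets such as MVTec AD and BeanTech. For users who want to use the library on their own data, anomalib also provides a Folder datamodule that loads a dataset from folders on the file system. This article shows how to train an anomalib model on a custom dataset with the Folder datamodule, so that anomalib can be applied to many kinds of data for anomaly detection.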

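In this article we use the hazelnut toy dataset. It consists of several folders, each containing a set of images. The colour and crack folders represent two kinds of defect. The mask folder is ignored for now and is only used in section 4-2. Anomalib uses all images in the colour folder as part of the validation dataset and randomly splits the images in the good folder between training and validation.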

Step 1: Install Anomalib

pip install anomalib

Step 2: Collect Custom Data

Anomalib supports several image extensions: ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff" and ".webp". A dataset can be assembled from images with any of these extensions.
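Before formatting the data, it can help to check which files in a collected folder actually carry one of the extensions listed above. The following is a minimal sketch using only the standard library; the helper names find_images and find_unsupported are hypothetical and not part of anomalib, and matching the suffix case-insensitively is an assumption of this sketch.

```python
from pathlib import Path

# Extensions taken from the list in the text above.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
}


def find_images(root):
    """Return sorted paths of files under root with a supported image suffix."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def find_unsupported(root):
    """Return sorted paths of files under root whose suffix is not in the list."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in IMAGE_EXTENSIONS
    )
```

Calling find_unsupported("./datasets/hazelnut_toy") would list any stray files (notes, thumbnails, archives) that should be removed or converted before training.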

Step 3: Format the Data

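Depending on the use case and on how the data was collected, a custom dataset can come in different forms:

A dataset containing good and bad images.
A dataset containing good and bad images, plus ground-truth masks for pixel-level evaluation.
A dataset containing good and bad images that has already been split into training and test sets.

The anomalib Folder datamodule can handle all of these use cases.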
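The three cases above differ mainly in which subfolders exist, which in turn decides which dataset fields get filled in later. The sketch below is one way to derive those fields from a directory; describe_layout is a hypothetical helper, not an anomalib function, and the folder name good_test for an already separated set of normal test images is an assumption made only for illustration.

```python
from pathlib import Path


def describe_layout(root, normal="good", abnormal="colour",
                    mask="mask", normal_test="good_test"):
    """Suggest dataset fields based on which subfolders exist under root.

    The folder names are parameters; "good_test" is a made-up default.
    """
    root = Path(root)

    def has(name):
        return (root / name).is_dir()

    if not has(normal):
        raise FileNotFoundError(f"no '{normal}' folder under {root}")

    fields = {
        "normal_dir": normal,
        "abnormal_dir": abnormal if has(abnormal) else None,
        "mask": mask if has(mask) else None,
        "normal_test_dir": normal_test if has(normal_test) else None,
    }
    # Masks enable pixel-level evaluation; without them only image-level
    # classification is possible.
    fields["task"] = "segmentation" if fields["mask"] else "classification"
    return fields
```

The returned dictionary uses the same key names as the dataset section shown in Step 4, so it can be compared against the YAML by eye.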

Step 4: Modify the Config File

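Running Anomalib training requires a YAML config file. The training parameters are grouped into five sections: dataset, model, project, logging and trainer. To use a custom dataset with Anomalib, only the dataset section of the config file needs to change.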
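The idea that only the dataset section changes can be illustrated on a plain dictionary standing in for the parsed YAML. This is a sketch under stated assumptions: with_dataset_overrides is a hypothetical helper, and the values in base are placeholders rather than the contents of any real anomalib config file.

```python
import copy


def with_dataset_overrides(config, overrides):
    """Return a copy of config whose dataset section is updated with overrides.

    All other sections are left untouched and the input is not modified.
    """
    new_config = copy.deepcopy(config)
    new_config.setdefault("dataset", {}).update(overrides)
    return new_config


# Placeholder config with the five sections named in the text.
base = {
    "dataset": {"name": "mvtec", "format": "mvtec", "task": "segmentation"},
    "model": {"name": "padim"},
    "project": {},
    "logging": {},
    "trainer": {},
}

custom = with_dataset_overrides(base, {
    "name": "hazelnut",
    "format": "folder",
    "path": "./datasets/hazelnut_toy",
    "normal_dir": "good",
    "abnormal_dir": "colour",
    "task": "classification",
})
```

Working on a deep copy keeps the sample config intact, which mirrors copying the sample YAML to a new file before editing it.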

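Here we choose the PaDiM algorithm: copy its sample config file and modify the dataset section. The path below is relative to a checkout of the Anomalib source repository.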

cp anomalib/models/padim/config.yaml custom_padim.yaml

4-1 classification

Data directory

Hazelnut_toy
├── colour
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
├── good
│  ├── 00.jpg
│  ├── 01.jpg
│  ...

A classification dataset has no ground-truth binary masks, so the task field must be set to classification instead of segmentation. The pixel part of the metrics section must also be commented out.

# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  mask: null # optional
  normal_test_dir: null # name of the folder containing normal test images.
  task: classification # classification or segmentation
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  test_split_mode: from_dir # options: [from_dir, synthetic]
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
    
...

metrics:
  image:
    - F1Score
    - AUROC
#  pixel:
#    - F1Score
#    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

Note: each dir value can also be given as a list in order to specify multiple folders.
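Because a dir value may be either a single name or a list, any script that reads the config has to accept both forms. The sketch below shows one way to normalize them; as_dir_list is a hypothetical helper and does not reflect how anomalib handles these values internally.

```python
def as_dir_list(value):
    """Normalize a dir value (None, a single name, or a list) to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# A single folder and a list of folders are treated uniformly.
single = as_dir_list("colour")
several = as_dir_list(["colour", "crack"])
unset = as_dir_list(None)
```

For the hazelnut toy dataset, passing both colour and crack as a list would treat images from both defect folders as abnormal.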

4-2 segmentation

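Anomalib can be used not only to classify parts as defective but also to segment the defects. To do so, add a folder named mask at the same directory level as the good and colour folders. This folder should contain binary mask images for the defects in the colour folder.

Data directory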

Hazelnut_toy
├── colour
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
├── good
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
└── mask
   ├── 00.jpg
   ├── 01.jpg
   ...
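With the layout above, it is worth confirming that every defect image has a corresponding mask before training. The sketch below assumes that a mask shares its file name (without extension) with the image it belongs to, as the tree suggests; missing_masks is a hypothetical helper, not part of anomalib.

```python
from pathlib import Path


def missing_masks(root, abnormal_dir="colour", mask_dir="mask"):
    """Return names of abnormal images that have no mask with the same stem."""
    root = Path(root)
    mask_stems = {
        p.stem for p in (root / mask_dir).iterdir() if p.is_file()
    }
    return sorted(
        p.name for p in (root / abnormal_dir).iterdir()
        if p.is_file() and p.stem not in mask_stems
    )
```

An empty list from missing_masks("./datasets/hazelnut_toy") means every image in colour is paired with a mask under this naming assumption.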

Fill in the mask field in the config file and change the task to segmentation, and Anomalib can then be used to segment the defects.

# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  mask: mask # optional
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation # classification or segmentation
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  test_split_mode: from_dir # options: [from_dir, synthetic]
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
    
...

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null
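The two configurations differ in three places: task, mask and the pixel metrics. The sketch below checks that these agree, operating on a dictionary with the same keys as the YAML above. check_task_consistency is a hypothetical helper, and the rules it encodes are only the ones stated in this article, not anomalib's own validation.

```python
def check_task_consistency(config):
    """Return a list of problems; an empty list means the fields agree."""
    problems = []
    dataset = config.get("dataset", {})
    metrics = config.get("metrics", {})

    task = dataset.get("task")
    has_mask = dataset.get("mask") is not None
    has_pixel_metrics = bool(metrics.get("pixel"))

    if task == "classification":
        # No ground-truth masks, so pixel metrics cannot be computed.
        if has_pixel_metrics:
            problems.append("pixel metrics are set but task is classification")
    elif task == "segmentation":
        if not has_mask:
            problems.append("task is segmentation but mask is null")
    else:
        problems.append(f"unknown task: {task!r}")
    return problems
```

Running such a check on the parsed config before training catches a leftover pixel block or a forgotten mask entry early.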

Step 5: Train

python tools/train.py --config custom_padim.yaml

Step 6: Visualization

6-1 Logging and Experiment Management

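Anomalib provides several ways to log and track experiments. They can be used individually or in combination.
To save the prediction images, set the log_images parameter in the visualization section of the config file to true.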

results
└── padim
    └── Hazelnut_toy
        ├── images
        │   ├── colour
        │   │   ├── 00.jpg
        │   │   ├── 01.jpg
        │   │   └── ...
        │   └── good
        │       ├── 00.jpg
        │       ├── 01.jpg
        │       └── ...
        └── weights
            └── model.ckpt
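After a run, the tree above can be summarized with a few lines of Python. This is a sketch that assumes the run directory looks exactly like the listing (an images folder with one subfolder per category, and weights/model.ckpt); summarize_results is a hypothetical helper, not an anomalib utility.

```python
from pathlib import Path


def summarize_results(run_dir):
    """Count saved images per category and report whether a checkpoint exists."""
    run_dir = Path(run_dir)
    images_dir = run_dir / "images"

    counts = {}
    if images_dir.is_dir():
        for category in sorted(images_dir.iterdir()):
            if category.is_dir():
                counts[category.name] = sum(
                    1 for p in category.iterdir() if p.is_file()
                )

    has_checkpoint = (run_dir / "weights" / "model.ckpt").is_file()
    return counts, has_checkpoint
```

For the run shown above, the call would be summarize_results("results/padim/Hazelnut_toy").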

6-2 Logging to Tensorboard and/or W&B

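To use the TensorBoard, W&B and/or Comet loggers, make sure the logger parameter in the logging section of the config file is set to comet, tensorboard, wandb, or a list such as [tensorboard, wandb]. The sample configuration below shows an example.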

visualization:
    show_images: False # show images on the screen
    save_images: False # save images to the file system
    log_images: True # log images to the available loggers (if any)
    image_save_path: null # path to which images will be saved
    mode: full # options: ["full", "simple"]

logging:
    logger: [comet, tensorboard, wandb] # choose any combination of these three
    log_graph: false
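Since the logger parameter accepts either a single name or a list, a small check can catch misspelled logger names before a long training run. The sketch below only knows the three names mentioned in this article; normalize_loggers is a hypothetical helper, and treating an unset value as "no loggers" is an assumption of the sketch.

```python
# Logger names mentioned in the text above.
SUPPORTED_LOGGERS = {"comet", "tensorboard", "wandb"}


def normalize_loggers(value):
    """Turn a logger setting (None, a name, or a list of names) into a list.

    Raises ValueError if any name is not one of the supported loggers.
    """
    if value is None or value is False:
        return []
    names = [value] if isinstance(value, str) else list(value)
    unknown = [name for name in names if name not in SUPPORTED_LOGGERS]
    if unknown:
        raise ValueError(f"unsupported logger(s): {unknown}")
    return names
```

For example, normalize_loggers("tensorboard") and normalize_loggers(["tensorboard", "wandb"]) both return lists, while a misspelling such as "tensorbord" raises an error immediately.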