Anomalib in Practice, Part 1: Custom Datasets

Anomalib supports datasets in multiple formats, including standard anomaly detection benchmarks such as MVTec AD and BeanTech. For users who want to apply the library to their own data, anomalib also provides a Folder datamodule that loads datasets from a folder on the file system. This article shows how to use the Folder datamodule to train an anomalib model on a custom dataset, so that anomalib can be applied flexibly to many kinds of data and anomaly detection tasks.

In this article, we will use the Hazelnut Toy dataset. The dataset contains several folders, each holding a set of images. The colour and crack folders contain two types of defects; this article ignores the mask folder. anomalib will use all images in the colour folder as part of the validation dataset, and randomly split the images in the good folder between training and validation.

Step 1: Install Anomalib

pip install anomalib

Step 2: Collect custom data

Anomalib supports a variety of image extensions: ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", and ".webp". Images with any of these extensions can be used to build a dataset.
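Before formatting anything, it is worth checking which of your files anomalib will actually pick up. A small stdlib sketch (the function name and structure are my own, not part of anomalib's API):

```python
from pathlib import Path

# Extensions accepted by anomalib's dataset loaders, as listed above.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp"
}

def collect_images(folder):
    """Return all files under `folder` whose extension anomalib supports."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `collect_images("./datasets/hazelnut_toy/good")` lists the normal images that the Folder datamodule would see; anything not in the list (text files, unsupported formats) will be silently ignored.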

Step 3: Format data

Custom datasets can have different formats depending on usage and how they are collected:

  1. A dataset containing good and bad images.
  2. A dataset containing good and bad images and ground truth masks for pixel-level evaluation.
  3. A dataset that contains good and bad images and has been split into training and test sets.

anomalib's Folder datamodule can handle these use cases.
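For use case 1, the images only need to be grouped into one subfolder per class. A minimal stdlib sketch of that preparation step (the helper and the good/colour folder names follow this article's dataset and are not part of anomalib itself):

```python
import shutil
from pathlib import Path

def build_folder_layout(labelled_images, root):
    """Copy (path, label) pairs into <root>/<label>/ subfolders,
    e.g. label "good" for normal images, "colour" for defective ones."""
    root = Path(root)
    for src, label in labelled_images:
        dst = root / label
        dst.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst / Path(src).name)
    return root
```

After running this with your own labels, the resulting directory can be passed to the Folder datamodule via the `path`, `normal_dir`, and `abnormal_dir` fields shown in the configuration below.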

Step 4: Modify configuration file

To run Anomalib training, a YAML configuration file is required. The training configuration is divided into five parts: dataset, model, project, logging, and trainer. To use a custom dataset, you only need to change the dataset section of the configuration file.
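Schematically, the five parts correspond to the top-level keys of the YAML file (this is an outline only; the exact keys vary between anomalib releases):

```yaml
dataset:   # what data to load and how to split it (the only part you need to edit)
  ...
model:     # algorithm-specific hyperparameters (Padim here)
  ...
project:   # experiment name, seed, output paths
  ...
logging:   # loggers such as comet/tensorboard/wandb
  ...
trainer:   # PyTorch Lightning trainer arguments
  ...
```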

Here we choose the Padim algorithm: copy its sample configuration file and modify the dataset section.

cp anomalib/models/padim/config.yaml custom_padim.yaml

4-1 Classification

data directory

Hazelnut_toy
├── colour
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
├── good
│  ├── 00.jpg
│  ├── 01.jpg
│  ...

The classification dataset has no ground-truth binary masks, so the task field must be set to classification instead of segmentation. At the same time, comment out the pixel entries under the metrics section.

# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  mask: null # optional
  normal_test_dir: null # name of the folder containing normal test images.
  task: classification # classification or segmentation
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  test_split_mode: from_dir # options: [from_dir, synthetic]
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
    
...

metrics:
  image:
    - F1Score
    - AUROC
#  pixel:
#    - F1Score
#    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

Note: each *_dir value can also be given as a list of multiple folders.
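For example, if defective images are spread across several folders, the abnormal directories can be given as a list (the folder names here are illustrative, following this article's dataset):

```yaml
dataset:
  normal_dir: good
  abnormal_dir: [colour, crack]
```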

4-2 Segmentation

Anomalib can not only classify defective parts, it can also segment the defects. To do this, add a folder called mask at the same directory level as the good and colour folders. This folder should contain the binary ground-truth masks for the images in the colour folder.

data directory

Hazelnut_toy
├── colour
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
├── good
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
└── mask
   ├── 00.jpg
   ├── 01.jpg
   ...
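Segmentation training silently depends on every defect image having a same-named mask. A quick stdlib sanity check for the layout above (the helper is my own sketch, not part of anomalib):

```python
from pathlib import Path

def missing_masks(root):
    """Return the stems of images in colour/ that have no matching
    file in mask/ (matching is by filename without extension)."""
    root = Path(root)
    defect_stems = {p.stem for p in (root / "colour").iterdir() if p.is_file()}
    mask_stems = {p.stem for p in (root / "mask").iterdir() if p.is_file()}
    return sorted(defect_stems - mask_stems)
```

Running `missing_masks("./datasets/hazelnut_toy")` should return an empty list before you start segmentation training.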

Fill in the mask field in the configuration file and change task to segmentation to have Anomalib segment the defects.

# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  mask: mask # optional
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation # classification or segmentation
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  test_split_mode: from_dir # options: [from_dir, synthetic]
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
    
...

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

Step 5: Training

python tools/train.py --config custom_padim.yaml

Step 6: Visualization

6-1 Logging and Experiment Management

Anomalib provides several ways to record and track experiments, which can be used individually or in combination.
To save visualizations of the predicted images, set the log_images parameter in the visualization section of the configuration file to true (to send them to the configured loggers), or save_images to true (to write them to the file system). The saved results are organized as follows:

results
└── padim
    └── Hazelnut_toy
        ├── images
        │   ├── colour
        │   │   ├── 00.jpg
        │   │   ├── 01.jpg
        │   │   └── ...
        │   └── good
        │       ├── 00.jpg
        │       ├── 01.jpg
        │       └── ...
        └── weights
            └── model.ckpt
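After training, the checkpoint can be located programmatically; a stdlib sketch (paths assume the default results layout shown above, and the helper is my own):

```python
from pathlib import Path

def find_checkpoints(results_root):
    """Collect all .ckpt files under the results directory,
    which the default layout stores as results/<model>/<dataset>/weights/."""
    return sorted(Path(results_root).rglob("*.ckpt"))
```

The returned path can then be passed to anomalib's inference tooling, or inspected to confirm which run produced it.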

6-2 Logging to TensorBoard and/or W&B

To use the TensorBoard, W&B, or Comet loggers, make sure the logger parameter in the logging section of the configuration file is set to comet, tensorboard, wandb, or a combination such as [tensorboard, wandb]. The snippet below shows an example configuration.

visualization:
  show_images: False # show images on the screen
  save_images: False # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

logging:
  logger: [comet, tensorboard, wandb] # choose any combination of these three
  log_graph: false

Origin blog.csdn.net/shanglianlm/article/details/132845636