Anomalib supports datasets in multiple formats, including state-of-the-art anomaly detection benchmark datasets such as MVTec AD and BeanTech. For users who wish to apply the library to custom datasets, anomalib also provides a Folder datamodule that can load datasets from any folder on the file system. This article shows how to use the Folder datamodule to train an anomalib model on a custom dataset, enabling users to apply anomalib flexibly to various types of data and anomaly detection tasks.
In this article, we will use the Hazelnut Toys dataset. The dataset contains multiple folders, each holding a set of images. The colour and crack folders represent two types of defects, while the good folder holds defect-free images; this article ignores the mask folder for now. anomalib will use all images in the colour folder as part of the test/validation data, and randomly split the images in the good folder between training and validation.
Step 1: Install Anomalib
pip install anomalib
Step 2: Collect custom data
Anomalib supports a variety of image extensions: ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", and ".webp". A dataset can be assembled from images with any of these extensions.
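As a quick sanity check before training, you can filter a directory listing down to these extensions with plain Python (an illustrative helper, not part of anomalib):

```python
# Illustrative helper (not part of anomalib): keep only the files whose
# extension is one anomalib can load, as listed above.
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
}

def collect_images(filenames):
    """Return only the filenames with a supported image extension."""
    return [
        name for name in filenames
        if any(name.lower().endswith(ext) for ext in SUPPORTED_EXTENSIONS)
    ]

print(collect_images(["00.jpg", "01.PNG", "notes.txt", "scan.tiff"]))
# → ['00.jpg', '01.PNG', 'scan.tiff']
```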
Step 3: Format data
Custom datasets can have different formats depending on usage and how they are collected:
- A dataset containing good and bad images.
- A dataset containing good and bad images and ground truth masks for pixel-level evaluation.
- A dataset that contains good and bad images and has been split into training and test sets.
anomalib's Folder datamodule can handle these use cases.
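For the first case (good and bad images with no predefined split), the datamodule has to carve a test split out of the normal images itself. The sketch below illustrates that idea in plain Python; it is not the anomalib API, and split_normal_images is a hypothetical helper whose split_ratio parameter mirrors the field of the same name in the configuration file.

```python
import random

def split_normal_images(normal_images, split_ratio=0.2, seed=42):
    """Hold out a fraction of the normal images for testing; train on the rest.

    Illustrative stand-in for the Folder datamodule's splitting behaviour,
    not anomalib code.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(normal_images)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

good = [f"good/{i:02d}.jpg" for i in range(10)]
train, test = split_normal_images(good, split_ratio=0.2)
print(len(train), len(test))  # → 8 2
```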
Step 4: Modify configuration file
To run Anomalib training, a YAML configuration file is required. The training configuration parameters are divided into five sections: dataset, model, project, logging, and trainer. To use a custom dataset with Anomalib, you only need to change the dataset section of the configuration file.
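Schematically, the configuration file looks like this (a trimmed sketch; the comments are informal summaries, and the exact keys inside each section depend on the model):

```yaml
dataset:  # data source and splits -- the only section you edit for custom data
  # ...
model:    # algorithm name and hyperparameters
  # ...
project:  # experiment bookkeeping (seed, output paths)
  # ...
logging:  # experiment loggers
  # ...
trainer:  # PyTorch Lightning trainer options
  # ...
```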
Here we select the PaDiM algorithm: copy its sample configuration file and modify the dataset section.
cp anomalib/models/padim/config.yaml custom_padim.yaml
4-1 Classification
data directory
Hazelnut_toy
├── colour
│   ├── 00.jpg
│   ├── 01.jpg
│   ...
└── good
    ├── 00.jpg
    ├── 01.jpg
    ...
A classification dataset has no ground-truth binary masks, so the task field needs to be set to classification instead of segmentation. In addition, comment out the pixel entries in the metrics section.
# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  mask: null # optional
  normal_test_dir: null # name of the folder containing normal test images.
  task: classification # classification or segmentation
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  test_split_mode: from_dir # options: [from_dir, synthetic]
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
...
metrics:
  image:
    - F1Score
    - AUROC
  # pixel:
  #   - F1Score
  #   - AUROC
  threshold:
    method: adaptive # options: [adaptive, manual]
    manual_image: null
    manual_pixel: null
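To see what these split settings do in practice, here is the arithmetic for a hypothetical folder of 100 good and 30 colour images (made-up counts, purely illustrative):

```python
# Made-up counts for illustration; this is plain arithmetic, not anomalib code.
n_normal, n_abnormal = 100, 30
split_ratio = 0.2        # fraction of normal images moved to the test set
val_split_ratio = 0.5    # only consulted when val_split_mode is from_test

n_normal_test = int(n_normal * split_ratio)      # 20 normal images for testing
n_train = n_normal - n_normal_test               # 80 normal images for training
n_test = n_normal_test + n_abnormal              # 50 test images in total

# val_split_mode: same_as_test -> the validation set is the test set itself.
# val_split_mode: from_test    -> half of the test images are held out instead:
n_val_from_test = int(n_test * val_split_ratio)  # 25 images
print(n_train, n_test, n_val_from_test)  # → 80 50 25
```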
Note: each of the directory values (normal_dir, abnormal_dir, and so on) can also be given as a list of multiple folders.
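For example, to also treat the crack folder from the dataset as abnormal (a hypothetical edit; it assumes crack sits next to colour):

```yaml
abnormal_dir: [colour, crack] # images from both folders become abnormal samples
```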
4-2 Segmentation
Anomalib can not only classify defective parts, it can also segment the defects themselves. To do this, simply add a folder named mask at the same directory level as the good and colour folders. This folder should contain a binary ground-truth mask for each defect image in the colour folder.
data directory
Hazelnut_toy
├── colour
│   ├── 00.jpg
│   ├── 01.jpg
│   ...
├── good
│   ├── 00.jpg
│   ├── 01.jpg
│   ...
└── mask
    ├── 00.jpg
    ├── 01.jpg
    ...
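Each mask is matched to its defect image by filename, so every file in colour needs a counterpart in mask. A small stand-alone check for that correspondence (plain Python, not part of anomalib):

```python
from pathlib import Path

def find_unmasked_images(abnormal_dir, mask_dir):
    """Return the abnormal images that have no mask with the same filename."""
    mask_names = {p.name for p in Path(mask_dir).iterdir() if p.is_file()}
    return sorted(
        p.name for p in Path(abnormal_dir).iterdir()
        if p.is_file() and p.name not in mask_names
    )

# e.g. find_unmasked_images("Hazelnut_toy/colour", "Hazelnut_toy/mask")
```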
Fill in the mask field in the configuration file and change the task to segmentation to make Anomalib segment the defects.
# Replace the dataset configs with the following.
dataset:
  name: hazelnut
  format: folder
  path: ./datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  mask: mask # optional
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation # classification or segmentation
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  test_split_mode: from_dir # options: [from_dir, synthetic]
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
...
metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive # options: [adaptive, manual]
    manual_image: null
    manual_pixel: null
Step 5: Training
python tools/train.py --config custom_padim.yaml
Step 6: Visualization
6-1 Logging and Experiment Management
Anomalib provides several ways to record and track experiments. These can be used individually or in combination.
To save the visualization results of the predicted images, set the log_images parameter in the visualization section of the configuration file to true. The outputs are then organized as follows:
results
└── padim
    └── Hazelnut_toy
        ├── images
        │   ├── colour
        │   │   ├── 00.jpg
        │   │   ├── 01.jpg
        │   │   └── ...
        │   └── good
        │       ├── 00.jpg
        │       ├── 01.jpg
        │       └── ...
        └── weights
            └── model.ckpt
6-2 Logging to Tensorboard and/or W&B
To use the TensorBoard, W&B, and/or Comet loggers, make sure the logger parameter in the logging section of the configuration file is set to any combination of comet, tensorboard, and wandb (for example, [tensorboard, wandb]). The snippet below shows an example configuration that enables all three.
visualization:
  show_images: False # show images on the screen
  save_images: False # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]
logging:
  logger: [comet, tensorboard, wandb] # choose any combination of these 3
  log_graph: false