A review of industrial image anomaly localization (detection) based on deep learning

 
  


The following article comes from: Wu Liu Xi'an (五柳西安) @ Zhihu

Author: Wu Liu Xi'an

Original link: https://zhuanlan.zhihu.com/p/545953517

This article is shared for academic purposes only. If there is any infringement, please contact us to have the article removed.

Guide

This paper helps researchers in this field get started quickly by providing a comprehensive overview of recent results in unsupervised anomaly localization in industrial images using deep learning.

The Institute of Automation of the Chinese Academy of Sciences, Beijing Technology and Business University, and the Indian Institute of Technology have jointly published a new survey of industrial anomaly localization (detection): 20 pages with a total of 126 references! The survey classifies and introduces industrial anomaly localization methods by model type, and covers methods published up to February 2022. It also includes a performance comparison on the complete MVTec AD dataset and proposes several future research directions for industrial anomaly localization.

Paper title: Deep Learning for Unsupervised Anomaly Localization in Industrial Images: A Survey

Institutions: Institute of Automation, Chinese Academy of Sciences; Beijing Technology and Business University; Indian Institute of Technology

Paper address: https://arxiv.org/abs/2207.10298

1. Overview

Deep learning-based visual inspection has achieved great success with supervised learning. However, in real industrial scenarios, the scarcity of defect samples, the cost of annotation, and the lack of prior knowledge about defects can cause supervised methods to fail. Over the past five years, unsupervised anomaly localization algorithms have become more widely used in industrial inspection tasks. This paper aims to help researchers in the field get started quickly by comprehensively reviewing recent results in unsupervised anomaly localization in industrial images using deep learning. The review analyzes more than 120 important publications covering different aspects of industrial anomaly localization, including concepts, challenges, taxonomy, benchmark datasets, and a quantitative performance comparison of the methods discussed. In addition to reviewing the results to date, the paper gives a detailed outlook on several future research directions. It provides detailed technical information both for researchers interested in industrial anomaly localization and for those who wish to apply it to anomaly localization in other fields.

2. Definition of anomaly localization

What is AL?

The human visual system has an inherent ability to perceive anomalies: people can not only distinguish defective images from non-defective ones, even if they have never seen a defective sample before, but can also easily point out which locations in the image are abnormal. Anomaly localization (AL) was introduced in academia for the same purpose, that is, to teach machines to "discover" abnormal regions in an unsupervised manner. In deep learning methods, "unsupervised" here means that the training set contains only normal images, without any defective samples. AL methods under the unsupervised paradigm have three advantages over supervised methods. First, they avoid the difficulty of collecting abnormal or defective samples, which supervised methods cannot avoid, because in industrial scenarios defect-free images far outnumber abnormal ones. Second, they eliminate the cost of labeling training samples. Finally, they avoid the labeling bias that is common in supervised methods. Since the training data contains only the normal class, this setting can also be called "semi-supervised". However, to be consistent with most existing methods, we drop the terms "unsupervised" and "semi-supervised" in the following and refer to the task simply as AL.

The difference between AD and AL: Anomaly detection (AD) is often mentioned in computer vision; outlier detection and one-class classification are other terms for AD. Figure 1 shows the difference between AD and AL. AD refers to the task of distinguishing defective images from the majority of non-defective images at the image level; it is concerned only with the image category, normal or abnormal. AL, also known as anomaly segmentation, produces pixel-level anomaly localization results. It is concerned not only with the image category but, more importantly, with the precise location of the anomalies. As shown in Figure 1, the darker the color in the anomaly heat map, the more likely there is an anomaly at that location.
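To make the AD/AL distinction concrete, here is a minimal sketch in Python with NumPy. It assumes a model has already produced a per-pixel anomaly map; taking the maximum pixel score as the image-level score and using a fixed threshold of 0.5 are illustrative conventions, not something prescribed by the survey, and the function names are hypothetical.

```python
import numpy as np

def image_level_score(anomaly_map: np.ndarray) -> float:
    """AD output: a single score per image (assumed here: the maximum pixel score)."""
    return float(anomaly_map.max())

def localize(anomaly_map: np.ndarray, threshold: float) -> np.ndarray:
    """AL output: a boolean mask of pixels whose score exceeds the threshold."""
    return anomaly_map > threshold

# Toy 4x4 anomaly map with one high-scoring 2x2 region.
anomaly_map = np.zeros((4, 4))
anomaly_map[1:3, 1:3] = 0.9

score = image_level_score(anomaly_map)       # one number: is the image abnormal?
mask = localize(anomaly_map, threshold=0.5)  # one decision per pixel: where is it abnormal?
```

AD stops at `score`, while AL has to produce something like `mask`, which is why AL results are evaluated per pixel.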

What is an anomaly?

Generally speaking, anomalies in the industrial field usually refer to defects. These include not only texture changes such as damage (scratches, dents, crushing, etc.), discoloration, and bright marks, but also more complex functional defects that require further logical judgment, for example, whether the pins of a transistor are correctly inserted, or whether a part is installed incorrectly, reversed, or missing. In the paper's example figure, the first row shows texture defects from the MVTec AD dataset and the second row shows functional defects from the same dataset. Most defect types in MVTec AD are texture defects, and only a small part are functional defects. Functional defects appear mainly in the transistor category, which makes it the most difficult of the 15 MVTec AD categories.

The paper's timeline figure traces the history of AL for industrial images. Most non-deep-learning AL models rely on sparse coding [14, 15] and dictionary learning [16]. Since 2017, following the great success of deep learning in computer vision, more and more deep learning methods have emerged [19]. GAN models [17, 22] and AE reconstruction networks [18] were the first to be used for deep AL models. To allow AL methods to be compared consistently, the company MVTec released a complete industrial AL dataset, the MVTec AD dataset [20]. Later, models based on feature embeddings proved more effective and efficient and became the popular AL architectures; knowledge distillation [21, 26] and pretrained feature comparison [23, 25, 30] are typical examples. Several methods based on self-supervised learning were then applied to the task [24, 29]. Flow-based generative models [28] and ViT models [27] have also been embedded in AL networks to improve results. Despite its short history, AL research has produced hundreds of papers. We selected influential papers published in prestigious journals and conferences, and this survey focuses on the major advances of the past five years. Since the introduction of the MVTec AD dataset, the number of proposed methods has exploded over the past two years, and the scores on this dataset have been pushed to a very high level, as can be seen on the Papers with Code website (https://paperswithcode.com/sota/anomaly-detection-on-mvtec-ad).

How is this review different from previous reviews?

The paper lists several earlier reviews related to AD/AL. They cover early non-deep-learning AD methods [6], deep learning-based AD methods [5, 7–9], a limited set of AL models [10], or only GAN-based AD/AL research [11]. However, few reviews have been devoted to a complete and comprehensive treatment of AL methods. Most existing surveys focus only on AD methods for image-level classification, which can easily overlook subtle abnormal regions in industrial scenes. Moreover, over the past five years, methods have evolved from image-level comparison (reconstruction or generation) to feature-level comparison, and from simple defect-synthesis proxy tasks to self-supervised methods based on contrastive learning. Our work systematically reviews recent advances in unsupervised AL, including in-depth analysis and discussion of many aspects of the field that have not been explored before. In particular, we summarize and discuss existing methods for addressing the various problems and challenges, provide a roadmap and taxonomy, review existing datasets and evaluation metrics, perform a comprehensive performance comparison of state-of-the-art methods, and offer insights into future directions. We hope that our review will provide new insights and inspiration, facilitate a deeper understanding of AL, and encourage research on the open topics presented here.

3. Classification of representative methods

We divide the current methods into five major categories and give a detailed introduction and comparative analysis of each. Within each category, we further subdivide the representative papers. Some works fall into more than one category, so the paper uses a Venn diagram (Figure 4 in the paper) to organize them, with the overlapping areas containing the methods that belong to several categories.

The five categories are:

1) Methods based on image reconstruction: This is the earliest approach and also a very intuitive one. An autoencoder (AE) is expected to reconstruct an abnormal image as a normal image, and the difference between the reconstruction and the input image then gives the localization result. The main improvements concern the network structure, the latent space, and the loss function. The problem with this approach is that it is hard to guarantee both that abnormal regions are reconstructed as normal and that normal regions are reconstructed faithfully to the input, so the difference between the two images does not fully represent the abnormal region.

2) Methods based on generative models: The representative models are VAE, GAN and Normalizing Flow (NF). VAE-based methods introduce a CAM-like technique that uses gradients to judge the anomaly position. GAN-based methods mainly improve the quality of the generated or reconstructed images by using multiple generators and discriminators. However, both GANs and VAEs lack exact evaluation and inference of the probability distribution, which often leads to low-quality, blurry results in VAEs, while GAN training faces challenges such as mode collapse and posterior collapse. NF handles these problems better. NF is also combined with the feature-based methods described next, and this combination is currently the best-performing approach on MVTec AD.

3) Methods based on deep feature modeling: These mainly include knowledge distillation and feature modeling. Feature modeling can be further subdivided into many subcategories, such as KNN, SOM, and Gaussian modeling. See the paper for details.

4) Methods based on self-supervision: These are mainly divided into proxy tasks and contrastive learning. Common proxy tasks include reconstruction, completion, relative relationship prediction, and attribute restoration.

5) Methods based on one-class classification: These are mainly used for anomaly detection (AD). If the image is divided into sliding windows, any AD method can also be applied to AL. They can also be combined with the previous four categories.
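The sliding-window idea in category 5 can be sketched as follows. This is only an illustration under stated assumptions: `toy_patch_scorer` is a hypothetical stand-in for a trained one-class model (it simply measures how far the patch mean is from an assumed normal intensity of 0.5), and the patch size and stride are arbitrary choices.

```python
import numpy as np

def sliding_window_map(image, patch_scorer, patch=8, stride=4):
    """Turn a patch-level (AD-style) scorer into a pixel-level anomaly map.

    Each window is scored once; a pixel's value is the average score of
    all windows that cover it.
    """
    h, w = image.shape
    score_sum = np.zeros((h, w))
    count = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            s = patch_scorer(image[top:top + patch, left:left + patch])
            score_sum[top:top + patch, left:left + patch] += s
            count[top:top + patch, left:left + patch] += 1
    # Pixels not covered by any window keep a score of 0.
    return score_sum / np.maximum(count, 1)

def toy_patch_scorer(p):
    """Hypothetical one-class scorer: distance of the patch mean from 0.5."""
    return abs(float(p.mean()) - 0.5)

# Toy 16x16 "normal" image (all 0.5) with a bright 4x4 defect.
image = np.full((16, 16), 0.5)
image[4:8, 4:8] = 1.0
amap = sliding_window_map(image, toy_patch_scorer)
```

The nested loop also shows why this route tends to be slow: the scorer runs once per window, and the resulting map is only as fine as the window size and stride allow.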

4. Experimental evaluation and comparative analysis

Datasets: Three datasets are commonly used for AL: NanoTWICE, MVTec AD, and BTAD. These three are also the most cited in AL papers. Some supervised segmentation datasets are also used for evaluation, including KolektorSDD, KolektorSDD2 and MT Defect.

Performance on the MVTec AD dataset: Tables 10 and 11 in the paper summarize the performance of current AL methods (mainly published from 2017 to 2021) on the MVTec AD dataset. We observe that most methods achieve baseline performance with the help of AE. Some efforts have been devoted to designing more powerful modules, such as image inpainting and GAN generative networks. The RIAD method reaches a pixel AUROC of 94.2% on the MVTec AD dataset [53]. However, experimental results show that reconstruction or generation methods based purely on AE struggle to perform well on MVTec AD.
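For readers unfamiliar with the metric, pixel AUROC treats every pixel of every test image as one binary sample and measures how well the anomaly scores rank defective pixels above normal ones. The following is a minimal NumPy sketch using the rank-based (Mann-Whitney) formulation; the function name and the toy data are illustrative assumptions, and in practice a library routine such as scikit-learn's `roc_auc_score` on the flattened arrays is the usual choice.

```python
import numpy as np

def pixel_auroc(anomaly_maps, gt_masks):
    """Pixel-level AUROC over all pixels of all test images.

    Uses average ranks, so tied scores count as half-correct.
    """
    scores = np.asarray(anomaly_maps, dtype=float).ravel()
    labels = np.asarray(gt_masks).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both normal and anomalous pixels")
    # Average 1-based rank of each distinct score value.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    avg_rank = last_rank - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # Mann-Whitney U statistic for the anomalous pixels.
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))

# Toy data: two 4x4 images; the first has a 2x2 defect, the second is clean
# but contains one false alarm with a very high score.
gt = np.zeros((2, 4, 4), dtype=bool)
gt[0, 1:3, 1:3] = True
maps = np.where(gt, 0.8, 0.1)
maps[1, 0, 0] = 0.9
auroc = pixel_auroc(maps, gt)
```

Because normal pixels vastly outnumber defective ones, a few false alarms barely move this number, which is one reason the paper also reports PRO scores.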

In contrast, methods based on deep feature embeddings quickly demonstrated their advantages in AL. Results reported in earlier papers show that three typical feature comparison methods, ST [21], SPADE [25] and DFR [84], achieve pixel AUROCs of 93.9%, 96.5% and 95.0%, respectively, on the MVTec AD dataset. Starting from general feature modeling methods [23], feature embedding-based methods have steadily improved as more efficient strategies were introduced, e.g., feature selection with semi-orthogonal embeddings [87], attention strategies [23, 43], KNN with a memory bank [30], self-organizing features [88] and aligned features [92]. As a result, most methods reach about 93% pixel AUROC and a 91% PRO score on the MVTec AD dataset.
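The "KNN with a memory bank" strategy can be illustrated with a heavily simplified sketch. It assumes patch features have already been extracted by a pretrained network (here they are just hand-written 2-D vectors), and it omits everything that makes real methods such as PatchCore [30] practical, for example memory-bank subsampling. The function name is hypothetical.

```python
import numpy as np

def knn_anomaly_scores(query_features, memory_bank, k=1):
    """Score each query feature by its mean distance to the k nearest
    features stored from normal training images."""
    q = np.asarray(query_features, dtype=float)
    bank = np.asarray(memory_bank, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_query, n_bank).
    d2 = ((q[:, None, :] - bank[None, :, :]) ** 2).sum(axis=2)
    nearest = np.sort(d2, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)

# Memory bank: features of patches from normal images (toy values).
memory_bank = np.array([[0.0, 0.0], [1.0, 0.0]])
# Query: one patch that looks normal, one far from anything seen in training.
query = np.array([[0.0, 0.0], [0.0, 3.0]])
scores = knn_anomaly_scores(query, memory_bank, k=1)
```

Reshaping the per-patch scores back onto the image grid and upsampling them gives the anomaly map; no training on defects is needed, only storage of normal features.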

Furthermore, CFLOW-AD [79], which incorporates a novel generative network, outperforms other state-of-the-art models and achieves the best pixel-wise AUROC on MVTec AD so far. MPAD [50], which builds on pre-trained features, surpasses other state-of-the-art models and achieves the best PRO score on MVTec AD so far. Figure 13 in the paper visualizes the AL results of four typical feature embedding methods on MVTec AD: STPM [81], PatchCore [30], PaDiM [23] and CFLOW-AD [79]. These results were obtained with Anomalib [125], the anomaly detection library maintained by Intel. Methods based on self-supervised learning can learn visual features from unlabeled images and be embedded into the above network structures as additional modules. Such methods, for example ANOSEG [98], NSA [99] and DRAEM [29], achieve better results than the original AE-based methods. Moreover, compared with image reconstruction or pre-trained features, contrastive learning-based methods [92, 107] show very competitive performance thanks to the discriminative information they learn about abnormal regions. Methods based on one-class classification are usually time-consuming and their localization results are imprecise, mainly because of the time needed to crop local patches and extract a feature for each patch separately. However, some of these methods include more elaborate feature comparison procedures, for example Patch-SVDD [24] and SE-SVDD [113].

In conclusion, deep learning-based AL methods can obtain relatively satisfactory results on the MVTec AD dataset by adopting different strategies. However, three of the 15 categories remain unsolved by most methods: tile, wood and transistor. Tile and wood are typical texture categories containing multi-scale defects of multiple types, and the current mainstream methods fail to reach 95% AUROC on them. The transistor category contains "missing" defects that carry high-level semantic information, namely functional anomalies. For these defects, the dataset marks the entire missing region as ground truth, so the current mainstream methods also fall short of the desired performance there.

5. Future research directions

Functional anomalies: As the advantages and disadvantages summarized in the paper's tables show, the anomaly localization performance of many methods drops significantly on certain categories. For example, DFR [84] performs poorly on the transistor category (see Tables 6 and 10 in the paper). This is because most of the categories shown in Table 10 of the paper contain texture defects, such as scratches and dents, rather than functional anomalies. Functional anomalies violate fundamental constraints, for example, an allowed object appears in an invalid location or a required object is missing. In industrial scenarios, both types are equally important. Bergmann et al. [126] have proposed a method to jointly detect texture and functional anomalies. Research on functional defects or anomalies will therefore be an important direction in the future.

Publish rich AL datasets: Public anomaly localization datasets are not yet large or rich enough compared to real industrial scenarios. More complex datasets with varying imaging conditions (such as illumination, perspective, scale, shadows, and blur) should be provided to evaluate AL algorithms more objectively. The existing MVTec AD images are captured under a single imaging setup, have relatively good image quality, and are aligned within some categories. Some existing methods even exploit this property to improve performance. Despite their promising results, such methods cannot adapt to real, complex industrial scenarios. Realistic and rich industrial datasets are therefore needed.

ViT-based methods: ViT-based methods are currently dominating the field of computer vision due to their superior performance. Some ViT-based works [27, 124, 79] have also been proposed to solve the AL problem. ViT has unique advantages in modeling long-range feature dependencies, and taking multi-scale abnormal regions into account is a direction in which ViT can still be improved. Furthermore, NF-based generative models are currently the best-performing framework for AL, so combining ViT with NF is an important direction.

Meaningful model evaluation: As shown in Figure 13 in the paper, there is a gap between high pixel AUROC values and fine localization performance, which may raise questions about model validity. Many methods still use pixel AUROC as the evaluation metric even though their AL visualizations are poor: there are many false positives in the background, the anomaly regions are very coarse, and the defect outlines are not sharp. Future work is encouraged to consider the fine-boundary problem when building models, or to choose the IoU metric for model evaluation.
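As a rough illustration of why IoU is stricter about boundaries, the sketch below compares a coarse and an exact predicted mask against the same ground truth. The masks are toy data and the function name is hypothetical; the only point is that a blob that fully covers the defect but spills over its outline is penalized heavily by IoU.

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

# Ground truth: a 2x2 defect in an 8x8 image.
gt = np.zeros((8, 8), dtype=bool)
gt[3:5, 3:5] = True

# Coarse prediction: a 4x4 blob that covers the defect and its surroundings.
coarse = np.zeros((8, 8), dtype=bool)
coarse[2:6, 2:6] = True

iou_coarse = mask_iou(coarse, gt)    # 4 / 16
iou_exact = mask_iou(gt.copy(), gt)  # exact outline
```

IoU requires thresholding the anomaly map first, so the threshold choice becomes part of the evaluation protocol.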

Accurate anomaly types: Actual industrial scenarios contain many types of anomalies, and different types differ in importance. Existing AD/AL methods only give a single category or the location of defects and cannot provide detailed defect types, such as scratches, foreign objects, or color deviations. This problem challenges the classic paradigm of AD and AL and calls for learning methods that distinguish anomaly types. There are existing methods [122] that cluster anomaly types and group anomalous data into semantically consistent categories, but this is only the beginning.

Unsupervised 3D anomaly localization: With the growing popularity of 3D sensors, more and more defect detection tasks in industrial scenes are shifting from 2D to 3D. Correspondingly, AL in 3D scenes will also become a development trend. A 3D AD/AL dataset [123] was made public by the company MVTec at the end of 2021. We therefore believe that 3D AD/AL is a relevant future direction.


Origin blog.csdn.net/qq_42722197/article/details/131238591