anomaly detection study notes (for personal use)

These notes record the learning process and daily plans for studying anomaly detection.

Domain background knowledge

The anomaly detection task is defined on the MVTec AD dataset, with image-level and pixel-level detection. The backgrounds in this dataset are relatively simple, but the defects to be detected are generally quite small.

Detection methods commonly used at this stage:
1. Reconstruction-based anomaly detection: autoencoders and generative adversarial networks (GANs).
Autoencoders focus on a sample's reconstruction ability, while GANs focus on its generation ability.
2. Synthesis-based methods.
3. Embedding-based methods. These use features pre-trained on ImageNet, for example SimpleNet, which maps ImageNet features to the target domain and then adds Gaussian noise to the feature map to simulate defects; the two feature maps are then fed to a discriminator to compute the loss (a small sketch follows this list).
4. Zero/few-shot detection.
The biggest advantage of zero/few-shot detection is that a single model can detect all defect types, so there is no need for a separate classifier per category. Note, however, that not all models follow this definition. PatchCore, for instance, also has a few-shot implementation, but there it only means that the amount of training data is n (way) × k (shot) images; in fact a single-class classifier is still set up for each category. Considering the project requirements, it is better to pick one model that detects all defects.
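A minimal sketch of the SimpleNet-style idea mentioned in item 3, under assumptions of my own (feature dimension, noise scale, and the hinge-style loss are illustrative, not the paper's actual code): adapt pre-trained features with a linear layer, add Gaussian noise to create pseudo-anomalous features, and train a small discriminator to separate the two.

```python
import torch
import torch.nn as nn

feat_dim = 1536  # illustrative channel size of the pre-trained backbone features

adapter = nn.Linear(feat_dim, feat_dim)        # maps ImageNet features toward the target domain
discriminator = nn.Sequential(                 # scores a feature: high = normal, low = anomalous
    nn.Linear(feat_dim, 1024), nn.BatchNorm1d(1024), nn.LeakyReLU(0.2), nn.Linear(1024, 1)
)

def training_step(patch_feats: torch.Tensor) -> torch.Tensor:
    # patch_feats: (N, feat_dim) features extracted from normal patches
    normal = adapter(patch_feats)                              # adapted "normal" features
    fake = normal + 0.015 * torch.randn_like(normal)           # Gaussian noise simulates defects in feature space
    s_pos = discriminator(normal)                              # pushed above the margin
    s_neg = discriminator(fake)                                # pushed below the margin
    margin = 0.5
    return (torch.clamp(margin - s_pos, min=0).mean()
            + torch.clamp(margin + s_neg, min=0).mean())
```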

Specific algorithms (zero/few-shot)

RegAD: RegAD is cold-start (it only uses normal samples). On the MVTec AD dataset there are two training settings. The first is the leave-one-out setting: train on normal images outside the target category, then perform normal-distribution estimation on the target category's support set, and finally run inference on the target category's test set. The other setting trains on the support data of each category, which again amounts to training a single-category classifier. The training method provided in the GitHub source code is for setting (i), which is also the one we focus on. If this project's welded_seam data can reach the same scale as MVTec AD, this is feasible. The algorithm is not a single-class classifier per category, but it does require a normal-distribution estimation step, which takes about 4 seconds on MVTec AD.
One question: how would this be applied in engineering? Input several pictures, spend four seconds building the normal-distribution model, and then run inference — isn't that essentially a single-class detector? Another thought: for this project all support sets would be the same. Actually it could work like this: train on the currently provided vertical-weld dataset, then use the normal samples from the horizontal welds as support and the abnormal samples for inference. That way the support set is different from the training data.

WinCLIP: a multi-modal model based on CLIP, used in an unsupervised way. Given N image-text pairs, N*N − N combinations are negative samples and only the N matched image-text pairs are positives. My guess: it uses CLIP's pre-trained model and fine-tunes on the MVTec AD dataset, with the text prompts divided into four groups. However, it is not open source; the first-place paper in the competition mentioned below is modified from WinCLIP and does have source code.
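A minimal sketch of the CLIP-style contrastive setup described above (generic, not WinCLIP's actual code): for N matched image-text pairs, the N diagonal entries of the similarity matrix are positives and the remaining N*N − N entries are negatives.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_feats: torch.Tensor, txt_feats: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # img_feats, txt_feats: (N, d) embeddings of N matched image-text pairs
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(len(img))              # diagonal pairs are the positives
    # symmetric cross-entropy: pick the right text for each image and vice versa
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```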

Algorithm Deployment (Project)

1. Dataset. First crop the vertical gas-weld images provided by the teacher last time (although in actual use the system should detect horizontal welds). Extract the weld region and cut it into small blocks; blocks without anomalies are positive samples and blocks with defects are negative samples. Currently there are two categories, good and bubble, but good may contain undercut images, and the images in bubble are not necessarily what the client calls bubbles. This still needs to be finalized with the teacher.
2. There is currently only one defect type, so a single-class classifier can be tried first. The anomalib framework (with OpenVINO support) was chosen; currently waiting for a GPU.
3. If the previous verification succeeds, RegAD can be tried next. This may require labeling an additional undercut defect dataset and then training with the leave-one-out setting.
4. I haven't read the April GAN paper yet; it is the zero-shot champion of the competition. The champion of the few-shot track does not seem to be open source, but other high-ranking algorithms are also worth watching.

daily plan

6.29

  • Finish reading the April GAN paper
  • If the GPU is free, use anomalib to run the homemade welded_seam dataset
  • Set up the environment for April GAN or RegAD

Summary:
I haven't finished the April GAN paper, but I watched Li Mu's video on CLIP (Contrastive Language-Image Pre-Training) to fill in the background. I used anomalib's PaDiM and PatchCore to run the self-made dataset; both reach image-level detection accuracy above 90%, with PatchCore giving the better results. However, the defect localization in the feature heat map is not particularly accurate, so pixel-level detection is probably not easy. From a GitHub issue I learned that its max_epochs is 1 (because the CNN is only used for feature extraction, more epochs would give the same result as the first). I don't fully understand that statement yet; I'll read more tomorrow.

6.30

  • Understand the parameters in anomalib, especially how the train and val datasets are split
  • Look at the inference results in anomalib
  • Finish reading the April GAN paper
  • Set up the environment for April GAN or RegAD

Summary:
I have finished the April GAN paper. It is a model improved on CLIP, or rather on WinCLIP, and supports zero-shot tasks. Both the image and text encoders come from CLIP; some image-text pairs are then used to train linear layers for feature mapping, so that zero-shot inference can be performed. For few-shot, the support-image features are stored in a memory bank; the memory-bank features at each stage are compared (by similarity) with the zero-shot feature maps to obtain the anomaly map and finally the anomaly score.
Regarding dataset splitting: in the current self-made dataset, 20% of normal and 50% of bubble are used as val, and val and test have the same contents.
Finally, inference. anomalib runs inference as part of the last stage of training. When calling the model on the project's data, test.py must be used. Tomorrow I will check whether it is meant for test or infer, because test requires the category of the given data to be known, whereas infer handles data of unknown category.

7.2

  • Set up inference deployment for anomalib
  • Build the April GAN environment
  • Run April GAN

Summary:
Inference in anomalib uses tools/inference/lightning_inference.py. Inference time is about 400 ms per image on a 3060 and about 350 ms on the server. There are OpenVINO inference files as well, but the model would need to be converted; since there is currently no requirement on inference time, this can be ignored for now.
The April GAN environment is not hard to set up and is now on the server. But there is a problem with the server: conda-based commands, apart from activating a virtual environment, report a missing-pip error, and I cannot find the reason. The project environment is just a new Python 3.8 virtual environment with the packages from requirements.txt installed. (It's late: although a virtual environment can be created, there are various problems when installing packages, so tomorrow I need to try again to see whether a conda environment can be created, or whether the pip problem under conda can be solved.)
Tomorrow I will take a closer look at the April GAN code: first, what does it do with the MVTec AD dataset, and can it be used on a welded_seam dataset with the same directory structure?

7.4

  • Create the conda environment for April GAN and install the packages
  • Read April GAN's data-loading code
  • Make the segmentation labels for welded_seam and convert them from json to png
  • Try doing few-shot inference directly: reuse the features from each zero-shot stage and compare them with the support features in the memory bank
  • Run the April GAN network

Summary:
The April GAN environment is set up, but the conda environment still does not work. I use Python's built-in virtual environment, started with source April_GAN/bin/activate. One hidden risk is that the horovod package failed to install; it is used for distributed training, and I don't know whether it will affect later training. If it turns out to be needed, I will create a new virtual environment, install all packages except horovod, and finally install horovod separately following a tutorial.
The segmentation labels in png format are ready. However, the directory structure differs from the mvtec format required by April GAN: each defect class needs train and test directories, train should contain 200+ normal samples, and test should contain about 20 normal and 20 defective samples each. Normal samples are still lacking.
The data set has been modified and placed on the server.

7.6

  • Change the block-cutting strategy to overlapping crops and build the infer set
  • Make the report PPT
  • Wait for a free GPU to train April GAN

Summary:
There is no need to look further at one-class anomaly detection algorithms; focus only on zero/few-shot algorithms.
This time the blocks are cut with stride=100 (the average weld size is roughly 320×3200). How should the infer set and test set be designed? Put non-duplicated images in test and data-augmented images in infer? Or put the original images and their augmented versions in test, and unseen images plus their augmented versions in infer? I think it should be the latter. After finishing the report in the afternoon, I will put together the PPT.
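A possible way to implement the overlapping crop described above (block size and stride follow the numbers mentioned here; the helper itself is illustrative):

```python
import cv2

def crop_blocks(image_path: str, block: int = 320, stride: int = 100):
    """Cut an elongated weld image into overlapping square blocks."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    blocks = []
    for y in range(0, max(h - block, 0) + 1, stride):
        for x in range(0, max(w - block, 0) + 1, stride):
            blocks.append(img[y:y + block, x:x + block])
    return blocks
```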

7.7
Now that the report is finished: run April-GAN's zero-shot and few-shot (with mappings trained on MVTec AD and VisA) on the project dataset and just look at the heat maps. If the client sends back the dataset, few-shot on other defect types can be run based on several kinds of weld defects.

Based on April-GAN, run zero and few shots on welded_seam. The results are as follows:

zero-shot
  • Mapping trained on the VisA dataset (result figure omitted)
  • Mapping trained on the MVTec AD dataset (result figure omitted)

few-shot
  • Mapping trained on the VisA dataset: 5-shot, 10-shot, 20-shot (result figures omitted)
  • Mapping trained on the MVTec AD dataset: 5-shot, 10-shot, 20-shot (result figures omitted)

On welded_seam it now looks like the more shots, the better the result. Inference time is roughly 0.5 s per image for zero-shot, about 1.4 s for 5-shot, and about 2 s for 10-shot. The problems seen so far: there is a certain amount of false detection, and other defect types can be detected as bubbles. The feature mapping is currently trained on MVTec AD or VisA; if it were instead trained on this project's normal samples or undercuts and inference were then done on bubbles, would the effect be better? The teacher also provided some new data, from which some bubble and normal samples were extracted. I also still need to figure out what each of the evaluation metrics means.


7.14&7.15
Still the project. Even if it cannot be fully completed, there is still something to report, so do your best!

  • Read the training part of the paper and code, and think about how to train on the welded_seam dataset
  • Improve visualization: map the defect map back onto the original image
  • There are also some new images; extract the bubble and normal samples again
  • Understand the meaning of each metric

I ran the MVTec AD dataset. This model is effectively a binary classifier and cannot tell what the specific defect is.
I read the paper again. The key really lies in the zero-shot training part. The most direct way to improve accuracy, I think, is to train the zero-shot feature mapping on the welded_seam undercut dataset and then run few-shot inference on bubble. That is left as a to-do; first, visualize the anomaly map as segmentation results.
About the metrics: for classification, AUROC, F1-max and AP are all very high. Segmentation additionally has PRO (region overlap ratio); on this task the segmentation F1-max and AP are very low, while AUROC and PRO are very high. I still don't know what concrete impact the levels of these metrics have on the segmentation results.

7.17

  • Implement translucent visualization
  • Make an undercut dataset: label it and convert the format (json → png)
  • Generate meta.json
  • Train the feature mapping and then re-run zero-shot inference

A compromise was made: the visualization with yellow circles works, but the translucent version is not done yet.
The undercut dataset has been created and named welded_1. Use it to run training tomorrow.

7.18

  • Use the undercut dataset to train the feature mapping and do zero-shot inference
  • Do 5-shot inference with the weights trained on MVTec AD, generate a mask, and then visualize it
  • Try to implement semi-transparent visualization

The first step, training, is finished, but the results are very poor. Thinking about the reason: the defect types included in train are too limited, so the features the model learns only cover undercuts. This can also be seen from the defect maps — the identified defect regions are all at undercut locations. So unless the dataset becomes much larger, the feature mapping will not be trained on the self-made dataset. Alternatively, the self-made undercut dataset can be put into the welded_seam test set for few-shot inference; note that the model is single-category, and the detection will not return the defect class.
In the second step, the undercut class was added in addition to bubble, and the detection results turned out to be very poor.
Third, the semi-transparent visualization is implemented: a red mask is added and the transparency of img and mask is set with cv2.addWeighted, blending the two.
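A minimal sketch of that overlay (the function name and alpha value are my own choices; cv2.addWeighted is the OpenCV call mentioned above):

```python
import cv2
import numpy as np

def overlay_mask(img: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a binary defect mask onto the original image as a translucent red layer."""
    color = np.zeros_like(img)
    color[mask > 0] = (0, 0, 255)                      # red in BGR
    return cv2.addWeighted(img, 1.0, color, alpha, 0)  # img*1.0 + color*alpha
```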
Tomorrow, send the bubble inference results to the teacher and tell him that the undercut results are very poor (both methods were tried).

7.20

  • First look at April GAN's segmentation head and loss function
  • Is it possible to replace the segmentation head with a detection head?
  • Or see whether other few-shot object detection algorithms can handle this dataset

One reason for abandoning the segmentation route is that the final defect regions are too fine to be delineated by a bounding box; another is that the segmentation algorithm is too slow. Also, although segmentation is no longer needed, for visualization the teacher recommends drawing outlines rather than a transparent mask, so that the image is not obscured.

7.23

  • Is it possible to replace the dividing head with a detection head?

After this morning's reading, two questions remain:

Why does adding text prompts improve accuracy?
The WinCLIP paper covers this. This part of the pipeline is fairly fixed: input the image category of a batch, generate 245 prompt sentences, and obtain (768, 2) text features through the tokenizer and text encoder.
It works because the model's features come from CLIP, which is trained by contrastive learning: with image features x and text features y, the pretext task defines the diagonal x-y pairs as positives and all other pairs as negatives. Introducing text here is really a more precise way of describing, through language, what counts as a defective sample and what counts as a normal sample within tolerance.
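A rough sketch of how a (768, 2) text feature could be produced with the OpenAI CLIP package; the two-sentence prompt lists and the object name are illustrative stand-ins for the 245-sentence template ensemble the papers use.

```python
import torch
import clip  # OpenAI CLIP package

model, _ = clip.load("ViT-L/14", device="cpu")   # ViT-L/14 gives 768-dim text embeddings

def class_text_features(obj: str = "weld") -> torch.Tensor:
    # Illustrative prompts only; WinCLIP / April GAN average many state+template combinations.
    normal_prompts = [f"a photo of a flawless {obj}", f"a photo of a perfect {obj}"]
    abnormal_prompts = [f"a photo of a damaged {obj}", f"a photo of a {obj} with a defect"]
    feats = []
    for prompts in (normal_prompts, abnormal_prompts):
        tokens = clip.tokenize(prompts)
        with torch.no_grad():
            emb = model.encode_text(tokens)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        feats.append(emb.mean(dim=0))          # prompt-ensemble average, shape (768,)
    return torch.stack(feats, dim=1)           # (768, 2): [normal, abnormal] columns
```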

What is the use of the image_feature finally output by the model?
Looking at April GAN's pipeline diagram, my guess is that it is used for image-level defect detection. Check whether that is what test.py does.

7.31

  • Determine what needs to be modified for semantic segmentation versus object detection
  • Look at how yolo's detection head is designed
  • Look at networks that support both semantic segmentation and object detection (Mask R-CNN? No; yolov5 supports it)

Judging from train.py, semantic segmentation based on the defect maps is actually not complicated: the four (8, 1369, 2) feature maps are transformed into four (8, 2, imgsize, imgsize) feature maps, and softmax over the 2-channel dimension gives the outlier value for each pixel, i.e. the defect map. To do object detection, coordinates must be regressed in addition to class information — can this feature form be converted into coordinates? Data loading would also change (detection boxes must be read instead of masks), and finally the loss function must be modified.
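A minimal sketch of the transformation just described (assuming 1369 = 37×37 patch tokens and a 518×518 input; not the repository's actual code):

```python
import torch
import torch.nn.functional as F

def feats_to_anomaly_map(feats: torch.Tensor, img_size: int = 518) -> torch.Tensor:
    # feats: (B, 1369, 2) patch-level [normal, abnormal] logits; 1369 = 37*37 patch tokens
    b = feats.shape[0]
    grid = feats.permute(0, 2, 1).reshape(b, 2, 37, 37)                    # (B, 2, 37, 37)
    grid = F.interpolate(grid, size=img_size, mode="bilinear", align_corners=False)
    probs = torch.softmax(grid, dim=1)                                     # per-pixel [normal, abnormal]
    return probs[:, 1]                                                     # (B, img_size, img_size) defect map
```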
yolo uses feature maps at three scales, but the four stages extracted from CLIP are all at the same scale; in an experiment, only the last stage's features could be used for detection.
The problem lies in yolov5's detection head: the output feature map goes through a final convolution layer and is then reshaped into (bs, nl, x, y, no*na). I need to check the feature-map shapes before and after that convolution by stepping through the code. The smallest feature-map output is x(bs, 255, 20, 20). April-GAN's is (bs, 1369, 2), which in theory would have to become (bs, 2+xywh+conf, w, h). yolo is built for multi-scale problems, while April GAN could turn its four stage outputs into multi-scale feature maps. (Actually, would it be better to just replace yolo's backbone with CLIP?)

8.1

  • Run yolov5's train.py and watch how the feature-map sizes change
  • Replace bubble with words like potholes and see whether the detection results improve
  • Step through the code to see what information the first dimension of the text encoding represents
  • Briefly look at CLIP-based object detection papers

1. It is now clear that yolov5's detection head turns x(bs, 255, 20, 20) into x(bs, 3, 20, 20, 85). The key is that 255 represents anchors × (xywh + conf + 80 classes), obtained by reducing higher-dimensional features, whereas a CLIP feature would have to be raised up to 255 dimensions, which is unreasonable. (A small sketch of the reshape follows this list.)
2. Because CLIP relies on language prompts for contrastive learning, how accurately the text describes the defect also affects detection. After verification, the impact is not significant: whether the word is the original bubble or welding_pits, CLIP describes the potholes on the weld well enough. Accuracy moves up and down slightly, and the changes in the defect map are small.
3. The shape of text_feature is (768, 2); along dimension 1 are, in order, the normal and abnormal feature vectors (each of shape (768,)) produced by the text encoder.
4. I read the ViLD paper. It uses an RPN to generate proposals and a conv backbone to generate image embeddings, while the text is encoded by CLIP's text encoder; the two are aligned by contrastive learning. The proposal branch (backbone + RPN + RoIAlign) is distilled using CLIP as the teacher. Interesting — looked at this way, proposals with location information can also be generated based on CLIP. How is that done?
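A minimal sketch of the reshape mentioned in item 1, assuming yolov5's standard 3 anchors per scale and 80 COCO classes (illustrative values, not project data):

```python
import torch

bs, na, nc = 2, 3, 80          # batch size, anchors per scale, number of classes
no = nc + 5                    # xywh + objectness conf + classes = 85
ny = nx = 20                   # smallest feature-map scale

x = torch.randn(bs, na * no, ny, nx)                       # (bs, 255, 20, 20): raw conv output
x = x.view(bs, na, no, ny, nx).permute(0, 1, 3, 4, 2)      # (bs, 3, 20, 20, 85)
print(x.shape)
```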

8.2

  • Look at April-GAN's few-shot output
  • Collect few-shot object detection algorithms

1. April-GAN's few-shot output is currently the sum of the zero-shot and few-shot maps; now I only want to look at the few-shot part. In the current output, could the many false detections on normal samples come from the poor zero-shot results? If the few-shot result is good, the weights of the two summed defect maps can be adjusted, or only the few-shot map can be output.
Looking at few-shot alone, the defect-free regions still easily produce outliers: the model does not really know what a normal image looks like, and it lacks understanding of the defects "bubble" and "solder hole". CLIP-based inference is also still too slow, so set this aside for now. If it is to be used in the end, it would be best to train the mapping layer on a larger welded_seam dataset.

8.9

  • Run the bubble dataset in anomalib to check accuracy and inference speed
  • Try running efficient_ad in anomalib

1. From the results, segmentation accuracy is still not high, but there are far fewer false detections than with April-GAN. Another dataset can be made to check the effect.
2. Currently stuck on the homemade dataset. There is a problem: to train on mvtec, a lot of datasets had to be downloaded. Tomorrow I'll finish downloading them, run the mvtec code, and then try whether the homemade welded_seam can run.

8.13

  • Get efficient_ad running
  • Read the efficient_ad paper
  • Build a dataset from the newly collected data
  • Try April GAN again on the new dataset

1. Accuracy is very high on mvtec but very low on the self-made welded_seam. The dataset layout may be different; change it to the same format as mvtec and try again. If accuracy is still low, add more normal images. efficient_ad performs very poorly and will not be used. PatchCore works quite well: currently F1-score is 0.44 and AUROC is 0.99. I think the low F1 comes from low recall, which is also related to the threshold; in the heat maps, even defects that are not detected still show some activation, so I want to extract the bubbles from the August data and run a test on them.
2. Not useful, stop looking at it; move on to PatchCore.
3. Made a normal dataset of 436 normal blocks in total and added it to the welded_seam dataset.
4. Optional now, because it is too slow.

8.14

  • Run PatchCore predict with anomalib and change the threshold to see the effect
  • Extract the bubble, spatter, and beading defects from the last data collection for predict
  • Read the PatchCore paper

1. predict runs now; the segmentation results are currently overlaid on the heat map. Run blocks 10 and 11 of image 1689 separately, because they matter for detecting beading. Due to how the cropping was designed, most of the beading falls across two blocks: block 10 still shows beading features, but in block 11 they may be hard to observe. The threshold was modified following the official documentation.
At the current threshold, bubble works reasonably well, but the other two defect types are much worse. Is that because no test data was prepared for those two types? I'll read the paper tomorrow and add them if needed.
2. No undercut dataset will be made, because after zooming in on the newly collected images, many of them look like they contain undercut. Adding undercut data would require a large-scale cleaning of the normal data, and probably not many images would qualify, so first try the detection of the other four defect types.
spatter and beading are extracted; in bubble, only the blocks from the "WeChat image" files have been sorted, while the blocks from files starting with IMG have not.
predict can be run either inside the four sorted folders (bubble etc.) or directly in each image's folder.
3. Must read the paper tomorrow, because this model works quite well.

8.16

  • Adjust the threshold and run predict on these datasets
  • Definitely read the PatchCore paper: why is one training epoch enough, and are there other parameters worth optimizing?

1. I think this threshold means that pixels in the anomaly map whose nearest-neighbor distance exceeds it are judged anomalous. But I don't know what the value should be; normally it should be in 0–255? 0.1, 1, 100 and 200 were all tried with no effect. Leave this for last and adjust the other parameters first.
2. Finished and understood the paper. The parameters still worth tuning at training time are the number of nearest neighbors and the sampling ratio used when generating the coreset; the two directions are to increase num_neighbors and coreset_sampling_ratio. This will inevitably slow down inference, but it is worth seeing the effect. Finally, look for a way to change the threshold effectively. Later, when tuning datasets of other defects, the data will be small and the memory bank will hold little, so these two values can be increased appropriately, trading training and inference speed for a bit of accuracy.

8.17

  • Increase num_neighbors and coreset_sampling_ratio and check PatchCore's training results
  • Can the whole anomalib training process be debugged? Adding breakpoints in the lightning module should work
  • Finally, try how to change the threshold

1. With coreset_sampling_ratio changed to 0.2 or 0.3 training fails, and changing num_neighbors does not change the model much either, so these two factors have an influence, but not a large one; they can be tuned once PatchCore is finally chosen. Now turn to changing the threshold (by changing the validation set).
2. Yes, it can.
3. Found in a GitHub issue: anomalib currently has an adaptive thresholding mechanism designed to address the threshold problem. It computes the F1 score on the validation set for a range of thresholds, and the final threshold is the one that gives the highest F1. A major drawback of this approach is that the validation set must contain anomalous samples, which may not always be available in real-world anomaly detection problems. So the threshold is determined during validation, and modifying the val set also modifies the threshold. The current threshold feels too high; making the val set simpler (or closer to the predict data) should lower it a bit.
Part of the bubbles collected in August have now been moved into val to change the threshold, and train has been expanded; I will run it tomorrow to see whether things improve.
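A minimal sketch of that adaptive-threshold idea (a plain F1 sweep over candidate thresholds; not anomalib's actual implementation):

```python
import numpy as np
from sklearn.metrics import f1_score

def adaptive_threshold(scores: np.ndarray, labels: np.ndarray, steps: int = 100) -> float:
    """Pick the threshold that maximizes F1 on the validation set.

    scores: per-image (or per-pixel) anomaly scores; labels: 0 = normal, 1 = anomalous.
    """
    candidates = np.linspace(scores.min(), scores.max(), steps)
    f1s = [f1_score(labels, scores >= t) for t in candidates]
    return float(candidates[int(np.argmax(f1s))])
```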

8.18

  • Run the expanded welded_seam_2 dataset
  • Try the reverse distillation network (there are 2022 and 2023 CVPR versions)
  • Try the DRAEM network (its segmentation F1 is relatively high)

1. Ran the new dataset on the autodl anomalib server. As expected, after the val set was replaced with simpler data, the threshold improved noticeably. Running this model and threshold on the original welded_seam_1 val also gives good results; the current F1-score on welded_seam_2's val is 0.69. Next, try the other two models; if they are mediocre, continue working on this model's visualization (marking defect regions on the original image) and OpenVINO deployment.
2. The version in anomalib is the 2022 one, and it does not work well. There is also the 2023 CVPR reverse distillation++, which officially claims accuracy close to PatchCore with less inference time. Its official code is now out; if PatchCore's inference speed turns out unsatisfactory, this can be tried.
3. Also not good. Nothing except PatchCore works well — why?

8.19

  • Modify the visualization
  • OpenVINO inference acceleration

1. While debugging, I found the pixel_threshold corresponding to welded_seam_2: 51.8008, at line 94 of
src/anomalib/models/components/base/anomaly_module.py.
The visualization has now been changed to original image + mask, at line 203 of
src/anomalib/post_processing/visualizer.py.
Note: only the segmentation visualization was modified; the classification one has not been changed yet.
2. OpenVINO inference acceleration is not needed: on a 2080 Ti the inference speed is about 3 images per second. If OpenVINO were to be used, the official GitHub issues say the relevant bug has actually been fixed, but the whole project would probably have to be upgraded to the latest version.

April GAN Summary

Input: a 518×518 image (pipeline figure omitted)
Network structure:
1. The image is passed through the CLIP encoder, producing feature maps of shape (1369, 1024) at each stage; a linear layer then maps these features into a shape (1369, 768) that can be coupled with the text features.
2. Text is composed as template + category + state sentences; after encoding, (768, 2) text features are produced by stacking the two (768,) feature vectors that the normal and abnormal sentences yield from the text encoder.
3. Multiplying the image feature matrix by the text feature matrix gives (1369, 2) features; for the semantic segmentation task these can be directly turned into a (2, w, h) defect map.
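A minimal sketch of step 3 (the mapping layer and function name are illustrative; the reshape to a defect map is as in the earlier sketch):

```python
import torch
import torch.nn as nn

linear = nn.Linear(1024, 768)   # per-stage mapping layer of the kind April GAN trains

def stage_logits(patch_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
    # patch_feats: (B, 1369, 1024) CLIP patch tokens from one encoder stage
    # text_feats:  (768, 2) stacked [normal, abnormal] text embeddings
    mapped = linear(patch_feats)                        # (B, 1369, 768), aligned with the text space
    mapped = mapped / mapped.norm(dim=-1, keepdim=True)
    return mapped @ text_feats                          # (B, 1369, 2) patch-level logits
```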
Summary:
For an object detection task, the (1369, 2) features would need to be transformed by a 1×1 convolution into (anchor × (xywh + conf + 2 classes), w, h) and then viewed as (anchor, xywh + conf + 2 classes, w, h). That convolution step is unreasonable, so I don't think attaching a detection head directly is suitable. Fundamentally, CLIP extracts features from the whole image and lacks region features, so it is not suited to single-stage object detection.
I tried changing the defect name (bubble → welding_pits), but the effect was not obvious. Turning the segmentation problem into a detection problem is not feasible: first, the feature shape is constrained by the text features and cannot be expanded — or rather, CLIP extracts features for the whole image, not individual regions; second, CLIP-based object detection is mostly two-stage, so going straight to a few-shot detection algorithm would be better; finally, the detection speed is very slow and does not meet actual engineering needs. If it is to be used after all, it is better to train the mapping layer on the larger welded_seam dataset.

PatchCore Summary

(pipeline figure omitted) Training phase: feed the defect-free Xtrain through the backbone, generate features, and store them in the memory bank.
Testing phase: after Xtest generates features, nearest-neighbor search is performed against the features in the memory bank; the returned nearest-neighbor distance matrix is reshaped into a defect map and a defect score.
The network has three innovative points:
1. Locally aware patch features
Feed the training images into an ImageNet-pretrained backbone, take the mid-level features, and store them in the memory bank. To enlarge the receptive field of the mid-level features while keeping resolution and without going deeper, the features within a patch of radius p around each point are aggregated to replace the original feature at that point.
2. Coreset-reduced patch-feature memory bank
The entire memory bank cannot be used directly. The method in the paper is to extract a coreset, i.e. a core subset that can adequately represent the information of the whole bank. The extraction method is minimax facility location, with the goal of making the extracted Mc as close as possible to the original M. This is the so-called training process of the whole model: making the coreset as close as possible to the original set.
3. Anomaly Detection with PatchCore
The training images first pass through the pre-trained backbone to get intermediate features, which are stored in the memory bank; because this is too large, it is reduced via coreset selection. A test image gets intermediate features through the same network, finds its nearest neighbors in the memory bank, and returns the nearest-neighbor distances, finally reshaped to (batch_size, h*, w*); from these distances the anomaly score can be computed, and interpolating up to (batch_size, h, w) gives the anomaly map.
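A minimal sketch of that test-time scoring (names are illustrative; the image-level score here is a plain maximum, without PatchCore's re-weighting):

```python
import torch
import torch.nn.functional as F

def patchcore_score(test_feats: torch.Tensor, memory_bank: torch.Tensor,
                    b: int, fh: int, fw: int, img_size: int = 256):
    # test_feats:  (b*fh*fw, d) patch features of the test batch
    # memory_bank: (M, d) coreset-subsampled features of normal training patches
    dists = torch.cdist(test_feats, memory_bank)            # (b*fh*fw, M) pairwise distances
    nn_dist, _ = dists.min(dim=1)                            # distance to the nearest normal patch
    patch_map = nn_dist.view(b, 1, fh, fw)                   # back to the feature-map grid
    anomaly_map = F.interpolate(patch_map, size=img_size, mode="bilinear", align_corners=False)
    anomaly_score = nn_dist.view(b, -1).max(dim=1).values    # simplified image-level score
    return anomaly_map.squeeze(1), anomaly_score
```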
Summary:
For more details, go back to my notes in Little Green Whale; there is also a source-code interpretation worth reading.
This model's inference speed is usable in practice. There are two options. One is to put only defect-free blocks in train and all five defect categories in val; this produces a single model, with fast inference as the advantage, but a single unified threshold has to serve five defect types, which lowers accuracy. The other is to produce five models; taking bubble as an example, train contains blocks without bubbles (but possibly with other defects), while val contains images with only bubble-type defects. The advantage is that each defect gets its own model and threshold; the disadvantage is that it is slow.
