Using the PaDiM unsupervised algorithm from the Anomalib project for model training and ONNX deployment on a self-made industrial defect dataset (2): Python code interpretation

Table of contents

Foreword

1. Interpreting the input and output of the PaDiM ONNX model

2. Analyzing the PaDiM Python code processing flow

2.1 Preprocessing

2.2 Prediction

2.3 Post-processing

2.4 Visualization

3. Summary and Outlook

Foreword

        In the previous blog, I finished training the PaDiM model in Anomalib and obtained the ONNX model and its inference results. Readers who want that part can go to the page... For those who, like me, have not read the paper at all, the resulting ONNX model will most likely be confusing: What is the input? What is the output? What pre-processing and post-processing are required? How do you draw the same good-looking probability heat map as the Anomalib project? How do you deploy it in C++? This blog analyzes these questions one by one. I originally wanted to include the C++ deployment as well, but the post grew too long, so readers who only want the C++ code can skip this article (though I still recommend reading it); the next one will be published within three days Orz...

1. Interpreting the input and output of the PaDiM ONNX model

        When we don't know a model's internal structure, the Netron tool is often the best way to inspect it. Netron: https://netron.app/

        Drag the model into the page, and the network structure will be displayed. The following shows the structure of the input and output parts of PaDiM's ONNX model:

        It can be seen that the input size is 1*3*256*256. Anyone familiar with deep learning will recognize this as a tensor coming from a 3-channel RGB image whose height and width are both 256 pixels.

        Conclusion 1: the input is a preprocessed 256*256 RGB image.

        The output is also a tensor, but its size is 1*1*256*256. Seeing data with the same height and width as the input, we can make a bold guess: the output is the probability heat map we want, or some kind of per-pixel score map, though not necessarily in its final form.

        Conclusion 2: the output is some kind of score map, and after post-processing it may become the probability heat map we want.

        In general, the input and output of this model are not complicated, which is good news for our deployment.
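        If you prefer to confirm these shapes without Netron, a few lines of onnxruntime will print the same information. This is only a sketch: the model path below follows the layout of my training run and is an assumption, so substitute your own.

    import onnxruntime as ort

    # model path assumed from the training output layout; adjust to your own run
    session = ort.InferenceSession("results/padim/tube/run/weights/onnx/model.onnx")

    for inp in session.get_inputs():
        print(inp.name, inp.shape)   # expect something like [1, 3, 256, 256]
    for out in session.get_outputs():
        print(out.name, out.shape)   # expect something like [1, 1, 256, 256]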

2. Analyzing the PaDiM Python code processing flow

        Although we are eager to leave Anomalib's complex project environment behind and move on to the C++ deployment, we must first understand the entire flow of the Python code before we can rewrite it in C++. The flow has four parts: preprocessing, prediction, post-processing, and visualization.

        Since the ONNX model is used for inference, according to the official tutorial, tools/inference/openvino_inference.py should be used here. Opening the file, it is easy to find the following code segment in the infer function:

    for filename in filenames:
        image = read_image(filename, (256, 256))
        predictions = inferencer.predict(image=image)
        output = visualizer.visualize_image(predictions)

        These few lines of code are the root of our exploration of the inference flow: first read_image reads in the image, then the inferencer's predict method produces the prediction, and finally the visualizer renders the result. This is exactly the flow we have to restore in the C++ deployment.

        read_image is relatively simple, so you can read its source yourself; we start from the inferencer instead. The inferencer here is an instance of OpenVINOInferencer; reading its predict method, you can find the following code:

        processed_image = self.pre_process(image_arr)              # preprocessing
        predictions = self.forward(processed_image)                # prediction
        output = self.post_process(predictions, metadata=metadata) # post-processing

        return ImageResult(
            image=image_arr,
            pred_score=output["pred_score"],
            pred_label=output["pred_label"],
            anomaly_map=output["anomaly_map"],
            pred_mask=output["pred_mask"],
            pred_boxes=output["pred_boxes"],
            box_labels=output["box_labels"],                       # the returned fields
        )

        It can be seen that the output (a dictionary) is produced after the image is preprocessed, predicted on, and post-processed, and its entries form the return value. From the field names, we can tell that the function returns the predicted score, the predicted label, the anomaly map, and so on.

2.1 Preprocessing

        First, look at the pre_process part, which is defined in openvino_inferencer.py:

    def pre_process(self, image: np.ndarray) -> np.ndarray:
        """Pre process the input image by applying transformations.

        Args:
            image (np.ndarray): Input image.

        Returns:
            np.ndarray: pre-processed image.
        """
        transform = A.from_dict(self.metadata["transform"])
        processed_image = transform(image=image)["image"]

        if len(processed_image.shape) == 3:
            processed_image = np.expand_dims(processed_image, axis=0)

        if processed_image.shape[-1] == 3:
            processed_image = processed_image.transpose(0, 3, 1, 2)

        return processed_image

        The core of this method is the transform. From the metadata in the code, the image undergoes the standard ImageNet normalization, i.e. a per-channel RGB mean of [0.485, 0.456, 0.406] and standard deviation of [0.229, 0.224, 0.225]. After normalization, the array layout is adjusted as shown in the Python code: a batch dimension is added and the HWC layout is transposed to NCHW.

        Conclusion 3: preprocessing consists of resizing the image and applying ImageNet-style normalization, then rearranging it into the model's input layout.
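        As a reference for the later C++ rewrite, here is a minimal NumPy/OpenCV sketch of this step. It assumes the standard ImageNet statistics and the 256*256 input size; the project itself applies whatever transform is serialized in the metadata via albumentations, so treat this as an approximation rather than the project's exact code.

    import cv2
    import numpy as np

    def pre_process(image_bgr: np.ndarray) -> np.ndarray:
        # OpenCV reads BGR; the model expects RGB input
        image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (256, 256)).astype(np.float32) / 255.0
        # standard ImageNet per-channel statistics
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        image = (image - mean) / std
        # HWC -> NCHW plus a batch dimension: (1, 3, 256, 256)
        return image.transpose(2, 0, 1)[np.newaxis, ...]

    tensor = pre_process(cv2.imread("test.png"))  # placeholder image path
    print(tensor.shape)  # (1, 3, 256, 256)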

2.2 Prediction

        Next, look at the forward part, just below pre_process:

    def forward(self, image: np.ndarray) -> np.ndarray:
        """Forward-Pass input tensor to the model.

        Args:
            image (np.ndarray): Input tensor.

        Returns:
            np.ndarray: Output predictions.
        """
        return self.network.infer(inputs={self.input_blob: image})

        This part feeds the preprocessed image into the model for prediction, which is easy to understand. As the previous blog said, we will not dig into the internals of the neural network here; just treat it as a black box. Debugging confirms that the output is indeed a score map with the same height and width as the input, representing a score at each pixel position: the higher the score, the more likely that pixel belongs to an anomalous region.
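        The project routes this step through OpenVINO, but since we have the ONNX file anyway, a rough onnxruntime equivalent (model path assumed as before) could look like this:

    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("results/padim/tube/run/weights/onnx/model.onnx")
    input_name = session.get_inputs()[0].name

    # in practice the tensor comes from the preprocessing step;
    # a random tensor of the right shape is enough for a smoke test
    tensor = np.random.rand(1, 3, 256, 256).astype(np.float32)

    outputs = session.run(None, {input_name: tensor})
    score_map = outputs[0].squeeze()   # (256, 256) per-pixel scores
    print(score_map.shape, score_map.max())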

2.3 Post-processing

        Next comes the post-processing part, the most complex of the four.

    def post_process(self, predictions: np.ndarray, metadata: dict | DictConfig | None = None) -> dict[str, Any]:
        """Post process the output predictions.

        Args:
            predictions (np.ndarray): Raw output predicted by the model.
            metadata (Dict, optional): Meta data. Post-processing step sometimes requires
                additional meta data such as image shape. This variable comprises such info.
                Defaults to None.

        Returns:
            dict[str, Any]: Post processed prediction results.
        """
        if metadata is None:
            metadata = self.metadata

        predictions = predictions[self.output_blob]

        # Initialize the result variables.
        anomaly_map: np.ndarray | None = None
        pred_label: float | None = None
        pred_mask: float | None = None

        # If predictions returns a single value, this means that the task is
        # classification, and the value is the classification prediction score.
        if len(predictions.shape) == 1:
            task = TaskType.CLASSIFICATION
            pred_score = predictions
        else:
            task = TaskType.SEGMENTATION
            anomaly_map = predictions.squeeze()
            pred_score = anomaly_map.reshape(-1).max()

        # Common practice in anomaly detection is to assign anomalous
        # label to the prediction if the prediction score is greater
        # than the image threshold.
        if "image_threshold" in metadata:
            pred_label = pred_score >= metadata["image_threshold"]

        if task == TaskType.CLASSIFICATION:
            _, pred_score = self._normalize(pred_scores=pred_score, metadata=metadata)
        elif task in (TaskType.SEGMENTATION, TaskType.DETECTION):
            if "pixel_threshold" in metadata:
                pred_mask = (anomaly_map >= metadata["pixel_threshold"]).astype(np.uint8)

            anomaly_map, pred_score = self._normalize(
                pred_scores=pred_score, anomaly_maps=anomaly_map, metadata=metadata
            )
            assert anomaly_map is not None

            if "image_shape" in metadata and anomaly_map.shape != metadata["image_shape"]:
                image_height = metadata["image_shape"][0]
                image_width = metadata["image_shape"][1]
                anomaly_map = cv2.resize(anomaly_map, (image_width, image_height))

                if pred_mask is not None:
                    pred_mask = cv2.resize(pred_mask, (image_width, image_height))
        else:
            raise ValueError(f"Unknown task type: {task}")

        if self.task == TaskType.DETECTION:
            pred_boxes = self._get_boxes(pred_mask)
            box_labels = np.ones(pred_boxes.shape[0])
        else:
            pred_boxes = None
            box_labels = None

        return {
            "anomaly_map": anomaly_map,
            "pred_label": pred_label,
            "pred_score": pred_score,
            "pred_mask": pred_mask,
            "pred_boxes": pred_boxes,
            "box_labels": box_labels,
        }

        As mentioned in the previous blog, since we use a self-made dataset, the task is classification, so all code paths whose TaskType is SEGMENTATION or DETECTION can be ignored. The code then shortens considerably:

    def post_process(self, predictions: np.ndarray, metadata: dict | DictConfig | None = None) -> dict[str, Any]:
        """Post process the output predictions.

        Args:
            predictions (np.ndarray): Raw output predicted by the model.
            metadata (Dict, optional): Meta data. Post-processing step sometimes requires
                additional meta data such as image shape. This variable comprises such info.
                Defaults to None.

        Returns:
            dict[str, Any]: Post processed prediction results.
        """
        if metadata is None:
            metadata = self.metadata

        predictions = predictions[self.output_blob]

        # Initialize the result variables.
        anomaly_map: np.ndarray | None = None
        pred_label: float | None = None
        pred_mask: float | None = None

        # If predictions returns a single value, this means that the task is
        # classification, and the value is the classification prediction score.
        if len(predictions.shape) == 1:
            task = TaskType.CLASSIFICATION
            pred_score = predictions

        # Common practice in anomaly detection is to assign anomalous
        # label to the prediction if the prediction score is greater
        # than the image threshold.
        if "image_threshold" in metadata:
            pred_label = pred_score >= metadata["image_threshold"]

        if task == TaskType.CLASSIFICATION:
            _, pred_score = self._normalize(pred_scores=pred_score, metadata=metadata)
        
        pred_boxes = None
        box_labels = None

        return {
            "anomaly_map": anomaly_map,
            "pred_label": pred_label,
            "pred_score": pred_score,
            "pred_mask": pred_mask,
            "pred_boxes": pred_boxes,
            "box_labels": box_labels,
        }

        Reading the source, the only meaningful output of this part is pred_score, and the real processing is a single line:

_, pred_score = self._normalize(pred_scores=pred_score, metadata=metadata)

        Entering the _normalize part, you can see that the input pred_scores is a tensor; in fact, it is a probability score map with the same size as the original image. Likewise, only one processing step is applied to it:

            pred_scores = normalize_min_max(
                pred_scores,
                metadata["image_threshold"],
                metadata["min"],
                metadata["max"],
            )

        Then enter the normalize_min_max part, where you can see that the function processes pred_scores as follows:

normalized = ((targets - threshold) / (max_val - min_val)) + 0.5

        Where do max_val and min_val come from? Open the training results folder and look at results/padim/tube/run/weights/onnx/metadata.json; at the end of the file you can see the following (tube is the name of my own dataset):

    "image_threshold": 13.702226638793945,
    "pixel_threshold": 13.702226638793945,
    "min": 5.296699047088623,
    "max": 22.767864227294922

        min is min_val and max is max_val: they are the minimum and maximum values observed in the pred_score score map. image_threshold is the computed threshold: pixel positions in the score map whose value exceeds it are considered anomalous (defects), while positions below it are considered normal. After this normalization step, pred_scores is returned.

        Conclusion 4: the input of the post-processing part is the score map produced by the prediction part, and the output is the min-max-normalized pred_scores.
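        To make the arithmetic concrete, here is a small sketch that applies the formula above with the values from my metadata.json; the raw score of 18.0 is a made-up example, and the clip to [0, 1] keeps the result in probability range.

    import numpy as np

    # values copied from metadata.json above
    image_threshold = 13.702226638793945
    min_val = 5.296699047088623
    max_val = 22.767864227294922

    def normalize_min_max(targets, threshold, v_min, v_max):
        normalized = ((targets - threshold) / (v_max - v_min)) + 0.5
        return np.clip(normalized, 0, 1)  # keep the score within [0, 1]

    raw_score = 18.0  # hypothetical max of the raw score map
    pred_label = raw_score >= image_threshold  # True -> anomalous
    pred_score = normalize_min_max(raw_score, image_threshold, min_val, max_val)
    print(pred_label, round(float(pred_score), 3))  # True 0.746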

2.4 Visualization

        At this point, the data processing is done; what remains is to visualize the result as a probability heat map. Going back to openvino_inference.py, you can see that the visualizer processes the prediction result with its visualize_image method and displays it with its show method:

    for filename in filenames:
        image = read_image(filename, (256, 256))
        predictions = inferencer.predict(image=image)
        output = visualizer.visualize_image(predictions)

        if args.output is None and args.show is False:
            warnings.warn(
                "Neither output path is provided nor show flag is set. Inferencer will run but return nothing."
            )

        if args.output:
            file_path = generate_output_image_filename(input_path=filename, output_path=args.output)
            visualizer.save(file_path=file_path, image=output)

        # Show the image in case the flag is set by the user.
        if args.show:
            visualizer.show(title="Output Image", image=output)

        Enter the visualize_image method. Since we set the display mode to full in config.yaml earlier, visualize_image dispatches to the _visualize_full method:

        if self.mode == "full":
            return self._visualize_full(image_result)

        Drilling down layer by layer into _visualize_full, we again only need the code path whose task is CLASSIFICATION; there you can see the following code:

        elif self.task == TaskType.CLASSIFICATION:
            visualization.add_image(image_result.image, title="Image")
            if hasattr(image_result, "heat_map"):
                visualization.add_image(image_result.heat_map, "Predicted Heat Map")
            if image_result.pred_label:
                image_classified = add_anomalous_label(image_result.image, image_result.pred_score)
            else:
                image_classified = add_normal_label(image_result.image, 1 - image_result.pred_score)
            visualization.add_image(image=image_classified, title="Prediction")

        visualization.add_image simply adds images to the output under results/padim/tube/run/images. The three images added are "Image", "Predicted Heat Map" and "Prediction", which correspond exactly to the 1×3 result image:

        What interests us most here is how the Predicted Heat Map is drawn. Following image_result.heat_map, we find it is generated by calling the superimpose_anomaly_map function:

self.heat_map = superimpose_anomaly_map(self.anomaly_map, self.image, normalize=False)

        Entering superimpose_anomaly_map, its code is as follows:

def superimpose_anomaly_map(
    anomaly_map: np.ndarray, image: np.ndarray, alpha: float = 0.4, gamma: int = 0, normalize: bool = False
) -> np.ndarray:
    """Superimpose anomaly map on top of in the input image.

    Args:
        anomaly_map (np.ndarray): Anomaly map
        image (np.ndarray): Input image
        alpha (float, optional): Weight to overlay anomaly map
            on the input image. Defaults to 0.4.
        gamma (int, optional): Value to add to the blended image
            to smooth the processing. Defaults to 0. Overall,
            the formula to compute the blended image is
            I' = (alpha*I1 + (1-alpha)*I2) + gamma
        normalize: whether or not the anomaly maps should
            be normalized to image min-max


    Returns:
        np.ndarray: Image with anomaly map superimposed on top of it.
    """

    anomaly_map = anomaly_map_to_color_map(anomaly_map.squeeze(), normalize=normalize)
    superimposed_map = cv2.addWeighted(anomaly_map, alpha, image, (1 - alpha), gamma)
    return superimposed_map

        The English docstring here is well written: the probability heat map is simply the unprocessed original image blended with the processed anomaly_map at a fixed weight. Before blending, the input anomaly_map is passed through anomaly_map_to_color_map, which converts it into a uint8 grayscale image (pixel values 0-255) and then maps the gray values onto a pseudo-color image:

anomaly_map = cv2.applyColorMap(anomaly_map, cv2.COLORMAP_JET)

        After the blending step, we get the clear, vivid defect probability heat map.

        Conclusion 5: in the visualization, the probability heat map is an overlay of the original image and the pseudo-colored anomaly_map.
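        For the C++ rewrite, it helps to restate that overlay as plain OpenCV calls. Below is a minimal sketch, assuming anomaly_map has already been normalized to [0, 1] and resized to the original image size, and that image is the original RGB image in uint8:

    import cv2
    import numpy as np

    def draw_heat_map(anomaly_map: np.ndarray, image: np.ndarray,
                      alpha: float = 0.4, gamma: int = 0) -> np.ndarray:
        # scale the [0, 1] scores to a uint8 grayscale image
        gray = (anomaly_map * 255).astype(np.uint8)
        # map the gray values onto the JET pseudo-color palette
        color_map = cv2.applyColorMap(gray, cv2.COLORMAP_JET)
        color_map = cv2.cvtColor(color_map, cv2.COLOR_BGR2RGB)  # applyColorMap returns BGR
        # blend: I' = alpha * color_map + (1 - alpha) * image + gamma
        return cv2.addWeighted(color_map, alpha, image, 1 - alpha, gamma)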

        So far, we have traced the entire pipeline through the Python code, from the input image through preprocessing, prediction, and post-processing to visualization, and we understand how the probability heat map is drawn.

3. Summary and Outlook

        If you want to deploy the model in C++, the time-consuming and tedious code reading in this blog is unavoidable: only by understanding the logic of the original project's Python code and distilling the complete flow can we reproduce it in C++. The next blog turns to the C++ code and explains how to complete the deployment with the ONNX Runtime engine. Thanks for reading and following~
