Common evaluation metrics for medical image segmentation (single target), with source code explanations and metric defects

1 Know the four basic counts: TP, TN, FP, FN

(figure: red circle = predicted region, blue circle = ground-truth target region)

  1. TP: True Positive. Predicted as a positive sample and actually a positive sample, i.e. the intersection of the blue and red regions.
  2. TN: True Negative. Predicted as a negative sample and actually a negative sample, i.e. the area outside both the red and blue regions.
  3. FP: False Positive. Predicted as a positive sample but actually a negative sample, i.e. the red region excluding the blue region.
  4. FN: False Negative. Predicted as a negative sample but actually a positive sample, i.e. the blue region excluding the red region.
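
A minimal NumPy sketch of these four counts (my own illustration, not from the original post; pred and gt are assumed to be binary masks of the same shape):

import numpy as np

def confusion_counts(pred, gt):
    pred = pred.astype(bool)              # non-zero pixels count as positive
    gt = gt.astype(bool)
    tp = np.count_nonzero(pred & gt)      # predicted positive, actually positive
    tn = np.count_nonzero(~pred & ~gt)    # predicted negative, actually negative
    fp = np.count_nonzero(pred & ~gt)     # predicted positive, actually negative
    fn = np.count_nonzero(~pred & gt)     # predicted negative, actually positive
    return tp, tn, fp, fn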

2 Evaluate the accuracy of segmented regions

2.1 Recall / Sensitivity / TPR (True Positive Rate)

Definition:
Recall = TP / (TP + FN)
In plain words:
it is the ratio of the correctly predicted target area to the total ground-truth target area; 1 is the best and 0 is the worst.
Advantage:
it tells you directly what fraction of the real target area your segmentation has covered.

Defect:

(figure: Recall defect example)

Is the metric high? Yes. Is the segmentation good? No, and that is the flaw: Recall ignores false positives, so a prediction that covers the whole target plus a lot of extra background still scores 1.
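
A short self-contained sketch of this flaw, with hypothetical 0/1 masks: a prediction that marks every pixel as target still reaches Recall = 1.

import numpy as np

gt = np.zeros((8, 8), dtype=bool)
gt[2:5, 2:5] = True                      # a small 3x3 target
pred = np.ones_like(gt)                  # predict every pixel as target

tp = np.count_nonzero(pred & gt)         # 9
fn = np.count_nonzero(~pred & gt)        # 0
print(tp / (tp + fn))                    # 1.0, even though most of the prediction is wrong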

2.2 Specificity (True Negative Rate)

Definition:
Specificity = TN / (TN + FP)
In plain words:
it is the counterpart of Recall: the same idea applied to the negative (background) class. For single-target segmentation it mirrors Recall; 1 is the best and 0 is the worst.
Advantage:
it lets you check, from the background side, what fraction of the non-target area was correctly left outside your segmentation.
Defect:

(figure: Specificity defect example with a gray background region, a green predicted region, and a blue target region)

Suppose the gray area is 100, the green area is 10, and the blue area is 20; the TNR is still above 0.90. Does the metric look fine? It does, yet the actual segmentation is a mess: the background is so large that even a prediction which misses the target completely barely dents the TNR.
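
A minimal sketch of that arithmetic, under one possible reading of the figure: gray is the correctly rejected background (TN = 100), green is a predicted region lying entirely in the background (FP = 10), and the blue target is missed completely.

tn, fp = 100.0, 10.0        # assumed reading of the figure's gray and green areas
tnr = tn / (tn + fp)
print(round(tnr, 3))        # 0.909: above 0.90 even though the target was missed entirely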

2.3 Precision (PPV, Positive Predictive Value)

Definition:
Precision = TP / (TP + FP)
In plain words:
it is the proportion of the predicted region that is actually correct; 1 is the best and 0 is the worst.
Advantage:
it tells you clearly what fraction of your predicted region really belongs to the target.
Defects:
(figure: Precision defect example)

Is the PPV high? Very high: the correct (red) area occupies almost the whole of the small predicted circle. But is the segmentation good? No: Precision ignores false negatives, so a tiny prediction lying entirely inside the target can score 1 while missing most of the target.
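
A self-contained sketch of this defect, with hypothetical 0/1 masks: a one-pixel prediction inside the target reaches Precision = 1 while missing almost everything.

import numpy as np

gt = np.zeros((8, 8), dtype=bool)
gt[1:7, 1:7] = True                      # a large 6x6 target
pred = np.zeros_like(gt)
pred[3, 3] = True                        # a one-pixel prediction inside the target

tp = np.count_nonzero(pred & gt)         # 1
fp = np.count_nonzero(pred & ~gt)        # 0
print(tp / (tp + fp))                    # 1.0, yet almost the whole target is missed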

2.4 Dice Coefficient

Definition:

Dice = 2TP / (2TP + FP + FN) = 2|A ∩ B| / (|A| + |B|)
In plain words:
it is twice the overlap area of the two regions divided by the sum of their areas; 1 is the best and 0 is the worst.
Advantages:
you can read them straight off the formula. My main question was: why 2TP? Why not simply put TP on the top and on the bottom? The reason is that the denominator |A| + |B| counts the overlap twice, once in the prediction and once in the ground truth, so the numerator must also be doubled for a perfect match to score 1.
Defect:
(figure: Dice defect example)
This is similar to Recall's defect, but because FP appears in the denominator, over-segmentation is partially penalized; even so, a case like the one in the figure can still get a passable Dice while the actual segmentation is not very good.
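
A small self-contained sketch of the Dice calculation in its 2TP / (2TP + FP + FN) form (my own illustration; pred and gt are assumed to be binary masks):

import numpy as np

def dice_score(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    denom = 2.0 * tp + fp + fn
    return 2.0 * tp / denom if denom else 0.0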

2.5 Jaccard Coefficient (IoU)

Definition:
IoU = TP / (TP + FP + FN) = |A ∩ B| / |A ∪ B|

In plain words:
it is very similar to Dice, and it also answers my earlier question: a single TP is enough in the numerator here because the denominator is the union, which counts the overlap only once. 1 is the best and 0 is the worst.
Advantages:
refer to Dice.
Defects:
refer to Dice.
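
A small sketch of the IoU calculation, plus the fixed relationship between IoU and Dice that explains why their advantages and defects are essentially the same (my own illustration):

import numpy as np

def iou_score(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.count_nonzero(pred & gt)
    union = np.count_nonzero(pred | gt)
    return inter / union if union else 0.0

# For any pair of masks: iou = dice / (2 - dice) and dice = 2 * iou / (1 + iou).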

All of the metrics above focus on one aspect, so each has inherent blind spots, and the defect cases shown are deliberately extreme examples. In practice this is not a real problem: use several metrics together when evaluating a segmentation. If several of them look good, the segmentation really is good; if some are good and some are bad, plot the prediction yourself and check whether you have hit one of these special cases.

2.6 Accuracy (newly added)

I only learned that this metric differs from Precision when I specifically looked it up while writing this article.
Definition:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

In plain words:
its name is easily confused with Precision in translation, which misleads people. It represents the proportion of correctly predicted regions (correctly predicted target region plus correctly predicted background region) among all regions.
Advantages and defects:
ChatGPT's original words:

Since the positive samples (regions of interest or target objects) are usually small and the negative samples (background) occupy a large proportion, Accuracy may be inflated by the easy negative samples and come out high. Therefore Precision is more commonly used in segmentation tasks, because it pays more attention to the accuracy of the positive samples in the classification result, while Accuracy evaluates the overall correctness. Well summarized, very much to the point.
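
A tiny self-contained sketch of that imbalance effect: when the target covers only 1% of the image, predicting everything as background still gives Accuracy = 0.99.

import numpy as np

gt = np.zeros((100, 100), dtype=bool)
gt[45:55, 45:55] = True                  # target occupies only 1% of the image
pred = np.zeros_like(gt)                 # predict every pixel as background

tp = np.count_nonzero(pred & gt)         # 0
tn = np.count_nonzero(~pred & ~gt)       # 9900
print((tp + tn) / gt.size)               # 0.99, yet nothing of the target was found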

3 Evaluate the segmented boundaries

3.1 Hausdorff_95

Definition:
HD(A, B) = max( h(A, B), h(B, A) ), where h(A, B) = max over a in A of ( min over b in B of d(a, b) ); HD95 replaces the outer max with the 95th percentile of those point-to-boundary distances.
In plain words, it computes the distance between the two boundaries. The core of the formula is max(min(d(x, y))), so do not misread "min d" as "mind". I remember writing a flow chart of the concrete calculation process; why is it gone? Did I write it in a dream? Anyway, the point is: the larger the value, the worse, and the smaller, the better. Here is a link for reference: https://www.cnblogs.com/icmzn/p/8531719.html
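
As a rough sketch of the idea (my own illustration, not the author's flow chart), HD95 can be computed with plain NumPy from the two sets of boundary points; in practice medpy's hd95, used in section 4.1 below, is the more robust choice:

import numpy as np

def hd95_points(a_pts, b_pts):
    # a_pts: (N, 2) array of boundary coordinates of A, b_pts: (M, 2) array for B
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)  # (N, M) pairwise distances
    a_to_b = d.min(axis=1)   # for each point of A, distance to its nearest point of B
    b_to_a = d.min(axis=0)   # for each point of B, distance to its nearest point of A
    # the plain Hausdorff distance would take max(); HD95 takes the 95th percentile instead,
    # which suppresses the influence of a few outlier points
    return max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95))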

3.2 Continuously updating

4 Source code

4.1 Using a third-party package directly

Install:

pip install medpy

Use:

from medpy import metric

def calculate_metric_percase(pred, gt):
    dice = metric.binary.dc(pred, gt)    # Dice coefficient
    jc = metric.binary.jc(pred, gt)      # Jaccard coefficient (IoU)
    hd = metric.binary.hd95(pred, gt)    # 95th-percentile Hausdorff distance
    asd = metric.binary.asd(pred, gt)    # average surface distance
    return dice, jc, hd, asd

The key point is that pred here comes straight from the model, so you need pred.cpu().detach().numpy() to convert it into a NumPy array on the CPU; in addition, for single-target segmentation the data you pass in must be binarized first, otherwise the results are wrong. See the handwritten code below for details.
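
A minimal sketch of that conversion, assuming the model ends in a sigmoid and a 0.5 threshold is used for binarization (the helper name to_binary_numpy is my own):

import numpy as np
import torch

def to_binary_numpy(pred_tensor, threshold=0.5):
    pred = pred_tensor.cpu().detach().numpy()   # move to CPU, detach from the graph, convert to numpy
    return (pred > threshold).astype(np.uint8)  # binarize: 1 = target, 0 = background

# pred_bin = to_binary_numpy(model(image))
# gt_bin   = gt.cpu().numpy().astype(np.uint8)
# dice, jc, hd, asd = calculate_metric_percase(pred_bin, gt_bin)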

4.2 Handwritten code

  1. The code below assumes that predict has already been binarized before being passed in and that target is already a binary matrix.
  2. The code below has been run inside my own model without problems; the key point is how to plug it into your own model.
  3. The focus is on understanding how the metrics are implemented, which helps in understanding how these evaluation metrics are calculated.
    Recall:
import numpy
import torch

def recall(predict, target):  # Sensitivity, Recall and true positive rate are the same thing
    if torch.is_tensor(predict):   # the model output has already gone through a sigmoid
        predict = predict.data.cpu().numpy()
    if torch.is_tensor(target):
        target = target.data.cpu().numpy()
    total_recall = 0.0
    for i in range(len(predict)):  # there is a batch dimension, so handle one sample at a time
        pre_split = predict[i]
        tar_split = target[i]
        pre_split = numpy.atleast_1d(pre_split.astype(bool))  # non-zero is True, 0 is False
        tar_split = numpy.atleast_1d(tar_split.astype(bool))
        tp = numpy.count_nonzero(pre_split & tar_split)   # compute TP
        fn = numpy.count_nonzero(~pre_split & tar_split)  # compute FN
        try:
            recall = tp / float(tp + fn)  # recall formula
        except ZeroDivisionError:
            recall = 0.0
        total_recall += recall

    return total_recall / len(predict)  # average over the batch

Dice:

def dice(predict, target):
    if torch.is_tensor(predict):
        predict = predict.data.cpu().numpy()
    if torch.is_tensor(target):
        target = target.data.cpu().numpy()
    total_dice = 0.0
    for i in range(len(predict)):  # iterate over the batch dimension
        pre_split = predict[i]
        tar_split = target[i]

        pre_split = numpy.atleast_1d(pre_split.astype(bool))
        tar_split = numpy.atleast_1d(tar_split.astype(bool))

        intersection = numpy.count_nonzero(pre_split & tar_split)  # number of overlapping (TP) pixels

        size_i1 = numpy.count_nonzero(pre_split)  # area of the prediction
        size_i2 = numpy.count_nonzero(tar_split)  # area of the ground truth

        try:
            dice = 2. * intersection / float(size_i1 + size_i2)
        except ZeroDivisionError:
            dice = 0.0
        total_dice += dice

    return total_dice / len(predict)

IOU:

def iou(predict, target):
    if torch.is_tensor(predict):
        predict = predict.data.cpu().numpy()
    if torch.is_tensor(target):
        target = target.data.cpu().numpy()
    total_iou = 0.0
    for i in range(len(predict)):  # iterate over the batch dimension
        pre_split = predict[i]
        tar_split = target[i]
        pre_split = numpy.atleast_1d(pre_split.astype(bool))
        tar_split = numpy.atleast_1d(tar_split.astype(bool))
        tp = numpy.count_nonzero(pre_split & tar_split)   # TP
        fn = numpy.count_nonzero(~pre_split & tar_split)  # FN
        fp = numpy.count_nonzero(~tar_split & pre_split)  # FP
        try:
            iou = tp / float(tp + fn + fp)
        except ZeroDivisionError:
            iou = 0.0
        total_iou += iou

    return total_iou / len(predict)
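
A hedged usage sketch of the three functions above, with hypothetical shapes (batch, H, W) and random 0/1 masks standing in for a real binarized model output and the labels:

import torch

preds = (torch.rand(4, 64, 64) > 0.5).float()   # stand-in for a binarized model output
labels = (torch.rand(4, 64, 64) > 0.5).float()  # stand-in for binary ground-truth masks

print(recall(preds, labels), dice(preds, labels), iou(preds, labels))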

Recently I have been working on a medical image segmentation model, so I am recording this here. If there are mistakes, feel free to discuss them, and I will keep improving the post!

Origin blog.csdn.net/qq_44864833/article/details/127700330