Foreword
This article introduces several evaluation metrics commonly used in medical image segmentation: Dice Loss, Sensitivity & Specificity, Hausdorff distance, and average surface distance.
1. Dice Loss
1.1. Dice coefficient
The Dice coefficient is a set similarity measure, usually used to compute the similarity between two samples. Its value range is $[0, 1]$; the larger the value, the more similar the two samples are.
- For a segmentation problem, it is 1 for the best segmentation and 0 for the worst
- It is often used to deal with sample imbalance, but it can be unstable during training and is prone to exploding gradients
Calculation formula:
$$Dice = \frac{2|X \cap Y|}{|X| + |Y|}$$
Parameter meaning:
- $|X \cap Y|$ is the number of elements in the intersection of $X$ and $Y$
- $|X|$ and $|Y|$ are the numbers of elements in $X$ and $Y$, respectively
- The coefficient 2 in the numerator is there because the denominator counts the elements common to $X$ and $Y$ twice
- Sometimes an optional term, Laplace smoothing, is added to both the numerator and the denominator:
  - It avoids division by zero when $|X|$ and $|Y|$ are both 0
  - It reduces overfitting
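The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `dice_coefficient`, the `smooth` parameter (standing in for the optional Laplace smoothing term) and the toy masks are all assumptions made for the example. Masks are given as flat lists of 0/1 to keep the sketch dependency-free; in practice they would usually be arrays.

```python
def dice_coefficient(x, y, smooth=0.0):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(x) != len(y):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(a * b for a, b in zip(x, y))  # |X ∩ Y|
    total = sum(x) + sum(y)                          # |X| + |Y|
    # `smooth` is the optional smoothing term added to numerator and denominator;
    # with smooth=0 and two empty masks this raises ZeroDivisionError.
    return (2 * intersection + smooth) / (total + smooth)


truth = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
print(dice_coefficient(truth, pred))                # 2*2 / (3+3), about 0.667
print(dice_coefficient([0, 0], [0, 0], smooth=1.0)) # 1.0: smoothing avoids 0/0
```

With `smooth=0.0` the function matches the formula exactly; a positive `smooth` makes two empty masks count as a perfect match.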
1.2. F1 score - Dice
Truth\Classified | Positive | Negative |
---|---|---|
Positive | True Positive | False Negative |
Negative | False Positive | True Negative |
- Precision
  - The probability that a sample predicted as 1 is actually 1
  - $P = \frac{TP}{TP + FP}$
- Recall
  - The probability that a sample that is actually 1 is predicted as 1
  - $R = \frac{TP}{TP + FN}$
- Precision and Recall usually trade off against each other
  - Raising the Precision of a model tends to lower its Recall
  - Raising the Recall of a model tends to lower its Precision
In a binary classification problem, the Dice coefficient can also be written as:
$$Dice = \frac{2TP}{FP + 2TP + FN} = F1\ score$$
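The identity between Dice and the F1 score can be checked numerically. The sketch below counts TP, FP, FN and TN from two toy binary label lists, computes F1 as the harmonic mean of Precision and Recall, and compares it with the Dice form above. The helper names and the data are illustrative assumptions, and the code assumes at least one positive in both the truth and the prediction so that no denominator is zero.

```python
def confusion_counts(truth, pred):
    """Return (TP, FP, FN, TN) for two equally long sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(truth, pred):
    tp, fp, fn, _ = confusion_counts(truth, pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


def dice_from_counts(truth, pred):
    tp, fp, fn, _ = confusion_counts(truth, pred)
    return 2 * tp / (fp + 2 * tp + fn)


truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(truth, pred)
dice = dice_from_counts(truth, pred)
print(precision, recall, f1, dice)  # f1 and dice agree up to floating-point rounding
```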
1.3. Dice Loss
The mathematical expression of Dice Loss is as follows:
$$DiceLoss = 1 - Dice = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$
When Dice Loss is used in medical image segmentation, the parameters mean:
- $X$ is the pixel labels of the ground-truth segmentation image
- $Y$ is the pixel classes predicted by the model for the segmentation image
- $|X \cap Y|$ is approximated by the element-wise product of the predicted image and the ground-truth label image, summed over all pixels
- $|X|$ and $|Y|$ are each approximated by summing the pixel values of the corresponding image
For a binary classification problem, the pixels of the ground-truth label image take only the two values 0 and 1, so $|X \cap Y|$ effectively zeroes out every pixel of the predicted image that is not activated in the ground-truth label image. For the activated pixels, it mainly penalizes low-confidence predictions; high-confidence predictions yield a higher Dice coefficient and therefore a lower Dice Loss, namely:
$$DiceLoss = 1 - \frac{2\sum_{i=1}^N y_i \hat{y_i}}{\sum_{i=1}^N y_i + \sum_{i=1}^N \hat{y_i}}$$
Parameter meaning:
- $y_i$ is the label value of pixel $i$
- $\hat{y_i}$ is the predicted value of pixel $i$
- $N$ is the total number of pixels, equal to the number of pixels in a single image multiplied by the batch size
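As a rough sketch of the soft Dice Loss formula above, the function below takes flattened label values $y_i$ and predicted probabilities $\hat{y_i}$ as plain Python lists. The name `dice_loss`, the small `smooth` constant (added here only to guard against a zero denominator) and the toy numbers are assumptions for illustration. A training framework would compute the same sums on tensors so that gradients can flow.

```python
def dice_loss(y_true, y_pred, smooth=1e-6):
    """Soft Dice loss over flattened labels (0/1) and predicted probabilities."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))  # sum of y_i * y_hat_i
    denominator = sum(y_true) + sum(y_pred)                    # sum y_i + sum y_hat_i
    # `smooth` is an assumed guard term, not part of the formula in the text
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


labels = [1, 1, 0, 0]
confident = [0.9, 0.8, 0.1, 0.2]  # high confidence on the activated pixels
hesitant = [0.6, 0.5, 0.4, 0.5]   # low confidence on the activated pixels

print(dice_loss(labels, confident))  # about 0.15
print(dice_loss(labels, hesitant))   # about 0.45
```

The confident prediction gets the lower loss, matching the statement that low-confidence predictions on activated pixels are penalized.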
Dice Loss can alleviate the negative impact of an imbalance between foreground and background (in area) in the samples. Foreground-background imbalance means that most of the image does not contain the target and only a small region does. Training with Dice Loss focuses more on mining the foreground region, that is, it tends to keep FN low, but the loss can saturate. Therefore, Dice Loss used alone often does not give good results and is usually combined with another loss, such as Dice Loss + CE Loss or Dice Loss + Focal Loss.
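One possible way to combine the two losses is a weighted sum, sketched below for the binary case. The weighting scheme, the `dice_weight` parameter and the function names are assumptions for illustration; the text does not specify how the losses are combined, and real projects often tune or learn the weights.

```python
import math


def dice_loss(y_true, y_pred, smooth=1e-6):
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denominator = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def combined_loss(y_true, y_pred, dice_weight=0.5):
    # dice_weight is a hypothetical mixing factor between the two terms
    return (dice_weight * dice_loss(y_true, y_pred)
            + (1.0 - dice_weight) * bce_loss(y_true, y_pred))


labels = [1, 1, 0, 0]
print(combined_loss(labels, [0.9, 0.8, 0.1, 0.2]))
print(combined_loss(labels, [0.6, 0.5, 0.4, 0.5]))  # larger: worse prediction
```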
2. Sensitivity & Specificity
Truth\Classified | Positive | Negative |
---|---|---|
Positive | True Positive | False Negative |
Negative | False Positive | True Negative |
- TP: P means the prediction is Positive, T (True) means the prediction is correct; TP means a positive sample is predicted as positive
- FP: P means the prediction is Positive, F (False) means the prediction is wrong; FP means a negative sample is predicted as positive
- TN: N means the prediction is Negative, T (True) means the prediction is correct; TN means a negative sample is predicted as negative
- FN: N means the prediction is Negative, F (False) means the prediction is wrong; FN means a positive sample is predicted as negative
- FP + TP = all samples classified as positive
- TP + FN = all samples that are actually positive
2.1. Sensitivity
TPR: true positive rate, the proportion of correctly identified positive cases among all positive cases
Calculation formula:
$$TPR = \frac{TP}{TP + FN}$$
It can be understood as the probability that a patient who is actually sick is correctly diagnosed; that is, high sensitivity = low missed-diagnosis rate (few false negatives)
2.2. Specificity
FPR: false positive rate, the proportion of negative cases identified as positive among all negative cases
$$FPR = \frac{FP}{FP + TN}$$
Specificity is the true negative rate, the complement of the FPR:
$$Specificity = \frac{TN}{FP + TN} = 1 - FPR$$
It can be understood as the probability that a person who is not sick is correctly diagnosed as healthy; that is, low specificity = high misdiagnosis rate (many false positives)
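A small sketch makes the relationship between these rates concrete. It uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) = 1 - FPR; the function name and the toy labels are assumptions, and the code assumes both classes occur in the ground truth.

```python
def sensitivity_specificity(truth, pred):
    """Return (sensitivity, specificity, FPR) for two sequences of 0/1 labels."""
    tp = fn = fp = tn = 0
    for t, p in zip(truth, pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # TPR: sick and correctly diagnosed
    specificity = tn / (tn + fp)  # TNR: healthy and correctly cleared
    fpr = fp / (fp + tn)          # healthy but flagged as sick
    return sensitivity, specificity, fpr


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec, fpr = sensitivity_specificity(truth, pred)
print(sens, spec, fpr)  # 0.75, about 0.833, about 0.167
```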
3. Hausdorff distance
3.1. Concept
The Hausdorff distance is a distance between two subsets of a metric space; it turns the set of non-empty compact subsets of a metric space into a metric space in its own right.
Informally, two sets are close in the Hausdorff distance if every point of either set is close to some point of the other set. The Hausdorff distance is the longest distance an adversary can force you to travel by choosing a point in one of the two sets, from which you must then travel to the other set. In other words, it is the greatest of all the distances from a point in one set to the nearest point in the other set.
Suppose there are two sets:
$$A = \{ a^1, a^2, \cdots, a^p \}, \quad B = \{ b^1, b^2, \cdots, b^q \}$$
3.2. One-way Hausdorff distance
Calculation formula:
$$\begin{aligned} h(A, B) &= \max_{a \in A} \min_{b \in B} || a - b || \\ h(B, A) &= \max_{b \in B} \min_{a \in A} || b - a || \end{aligned}$$
Parameter meaning:
- $|| a - b ||$ is the Euclidean distance between $a$ and $b$
- $h(A, B)$ is also called the forward Hausdorff distance, and $h(B, A)$ the backward Hausdorff distance
Understanding $h(A, B)$:
- For each point $a^i$ in set $A$, find the point $b^j$ in set $B$ closest to it and compute the distance between $a^i$ and $b^j$; then sort these distances and take the largest one as the value of $h(A, B)$
- If $h(A, B) = d$, every point in $A$ lies within distance $d$ of set $B$
- Note that the one-way Hausdorff distance is directional (asymmetric): in most cases $h(A, B)$ is not equal to $h(B, A)$
Illustration:
- Given two point sets $A$ and $B$, find the Hausdorff distance $h(A, B)$
- Compute the distances $d_{11}, d_{12}, d_{13}$ from $a_1$ to $b_1, b_2, b_3$
- Keep the shortest distance $d_{11}$
- Compute the distances $d_{21}, d_{22}, d_{23}$ from $a_2$ to $b_1, b_2, b_3$
- Keep the shortest distance $d_{23}$
- The larger of $d_{11}$ and $d_{23}$ is the Hausdorff distance: $h(A, B) = d(a_1, b_1)$
- It follows that every point in $A$ is at most $h(A, B)$ away from some point in $B$
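The procedure in the illustration (nearest neighbour for each point of $A$, then the largest of those distances) can be sketched with the standard library alone. The coordinates below are made up for the example, since the original figure is not available, and `directed_hausdorff` is simply a name chosen here. The brute-force double loop is quadratic in the number of points, which is fine for a sketch but slow for large surfaces.

```python
import math


def directed_hausdorff(A, B):
    """One-way Hausdorff distance h(A, B) between two lists of points."""
    # For each a in A take the distance to its nearest b in B, then take the largest.
    return max(min(math.dist(a, b) for b in B) for a in A)


A = [(0.0, 0.0), (4.0, 0.0)]
B = [(0.0, 3.0), (4.0, 1.0), (10.0, 0.0)]

print(directed_hausdorff(A, B))  # 3.0: nearest distances are 3 and 1
print(directed_hausdorff(B, A))  # 6.0: the point (10, 0) is 6 away from A
```

The two directions give different values, which is the asymmetry noted above.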
3.3. Two-way Hausdorff distance
Calculation formula:
$$H(A, B) = \max\{ h(A, B), h(B, A) \}$$
The two-way Hausdorff distance takes the larger of the two one-way Hausdorff distances and measures the degree of dissimilarity between two point sets (the smaller the two-way Hausdorff distance, the better the two sets match)
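For segmentation, the two point sets typically come from the ground-truth and predicted masks. The sketch below converts two small 2D binary masks into lists of pixel coordinates and takes the larger of the two one-way distances. Several simplifications are assumed: all foreground pixels are used rather than only boundary pixels, distances are in pixel units (no voxel spacing), and the helper names are illustrative.

```python
import math


def mask_to_points(mask):
    """Coordinates (row, col) of all foreground pixels in a 2D list of 0/1."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def directed_hausdorff(A, B):
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Two-way Hausdorff distance H(A, B)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
P, Q = mask_to_points(truth), mask_to_points(pred)
print(directed_hausdorff(P, Q))  # 0.0: every truth pixel is also predicted
print(hausdorff(P, Q))           # sqrt(2): the extra pixel at (2, 3)
```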
3.4. Partial Hausdorff distance
However, when the image has noise pollution or occlusion, the above-mentioned Hausdorff distance can easily cause a mismatch, as shown in the following figure:
Let $b_j$ be the point in set $B$ closest to set $A$, and let $a_2$ be the point in set $A$ farthest from $b_j$. Because of noise, the Hausdorff distance is not the distance between $a_2$ and $b_j$ but the distance between the noise point and $b_j$, which produces an error.
- Partial one-way Hausdorff distance:
  - Calculation formula:
    - $h^{f_F}(A, B) = \underset{a^i \in A}{f_F^{\,th}} \min_{b^j \in B} || a^i - b^j ||$
    - $h^{f_R}(B, A) = \underset{b^j \in B}{f_R^{\,th}} \min_{a^i \in A} || b^j - a^i ||$
  - Parameter meaning:
    - $f_F, f_R \in [0, 1]$ are called the forward fraction and the backward fraction; they control the forward distance and the backward distance respectively
    - $th$ denotes ranking: the nearest-point distances are sorted in ascending order and the value at the position given by the fraction is taken
    - When $f_F = f_R = 1$, the formula reduces to the original one-way Hausdorff distance
- Partial two-way Hausdorff distance:
  - Calculation formula:
    - $H^{f_F f_R}(A, B) = \max\{ h^{f_F}(A, B), h^{f_R}(B, A) \}$
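A minimal sketch of the partial distance follows. It sorts the nearest-point distances and picks the value at rank `ceil(f * n)`; this rank convention is an assumption (other quantile definitions are possible), as are the function names and the toy points, where the last point of `A` plays the role of noise.

```python
import math


def partial_directed_hausdorff(A, B, fraction=1.0):
    """Ranked one-way Hausdorff distance: the `fraction` quantile of nearest distances."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    nearest = sorted(min(math.dist(a, b) for b in B) for a in A)
    # Assumed rank convention: K = ceil(fraction * |A|), with a minimum rank of 1
    k = max(1, math.ceil(fraction * len(nearest)))
    return nearest[k - 1]


def partial_hausdorff(A, B, f_forward=1.0, f_reverse=1.0):
    return max(partial_directed_hausdorff(A, B, f_forward),
               partial_directed_hausdorff(B, A, f_reverse))


A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (50.0, 0.0)]  # (50, 0) is a noise point
B = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

print(partial_directed_hausdorff(A, B, 1.0))   # dominated by the noise point
print(partial_directed_hausdorff(A, B, 0.75))  # 1.0: the noise point is ignored
```

With `fraction=1.0` the function returns the ordinary one-way Hausdorff distance, as stated in the text.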
4. Average surface distance
4.1. Concept
The mean surface distance is the average of the surface distances over all points of a set; its symmetric form is called the Average Symmetric Surface Distance (ASSD).
By encoding the voxel data and the distances between voxel points, a lookup table is built; this greatly reduces the amount of computation and the complexity of the algorithm used to compute the distances between points. Note that the distance between voxels is calculated per unit volume.
- ASD calculation formula:
  - The average of the surface distances from all points in set $X$ to set $Y$, where the distance from a point $x$ to set $Y$ is the distance from $x$ to its nearest point in $Y$
  - $ASD(X, Y) = \sum_{x \in X} \min_{y \in Y} d(x, y) / |X|$
  - Parameter meaning:
    - $d(x, y)$ is a 3D matrix made up of the Euclidean distances between the two image volumes $X$ and $Y$
- ASSD calculation formula:
  - The mean of the average surface distance from $X$ to $Y$ and the average surface distance from $Y$ to $X$
  - $ASSD(X, Y) = \{ ASD(X, Y) + ASD(Y, X) \} / 2$
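The two formulas translate directly into a brute-force sketch on point sets. The function names and the toy coordinates are assumptions; the points stand in for surface points of two segmentations, and voxel spacing is ignored.

```python
import math


def average_surface_distance(X, Y):
    """ASD(X, Y): mean distance from each point of X to its nearest point in Y."""
    return sum(min(math.dist(x, y) for y in Y) for x in X) / len(X)


def average_symmetric_surface_distance(X, Y):
    """ASSD(X, Y): mean of ASD(X, Y) and ASD(Y, X)."""
    return (average_surface_distance(X, Y) + average_surface_distance(Y, X)) / 2


X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Y = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]

print(average_surface_distance(X, Y))            # (1 + 1 + sqrt(2)) / 3
print(average_surface_distance(Y, X))            # (1 + 1 + 3) / 3
print(average_symmetric_surface_distance(X, Y))  # mean of the two values above
```

Unlike the Hausdorff distance, a single distant point (here `(2, 3)`) shifts the average rather than determining the result outright.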
4.2. Calculation process
Input: A and B
- Similar to building the isosurface in `marching cube`, construct the normal vectors of the planes formed by the intersection points, and from the normal vectors and the `voxel spacing` obtain the `surface distance` lookup table (a list of length 256)
- Check the data type (bool) and crop to the valid region
- Encode the volume by convolving it with a `kernel` (the result is denoted `CODE`)
- Set pixels whose value is greater than 0 and not equal to 255 to 1 (`borders`)
- Distance transform: compute the distance from each non-zero point in the image to the nearest background point (that is, 0) and build a distance map
- Use `CODE` as the index (values 0~255 index the elements of the lookup table), look up the surface area values, and form the surface area map (`surface map`)
- Use the pixels where the `borders` of A is greater than 0 as the index (equivalent to selecting points) into the distance map of B; this gives the distances from the `border` voxels of A to B (denoted `distances_map`, 1-dimensional)
- Use the pixels where the `borders` of A is greater than 0 as the index into the `surface_map` of A (the result is denoted `surfel map`, 1-dimensional)
  - This is equivalent to obtaining the surface area map of the boundary voxels
- Compute the weighted mean: `sum(distances map * surfel map) / sum(surfel map)`, which gives `ASD`
  - The boundary surface area map acts as a weight here, which effectively smooths out errors caused by sharp regions
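The last step, the surface-area-weighted mean, can be sketched on its own. The two input lists below are hypothetical stand-ins for the 1-dimensional distance and surfel arrays described in the steps; the earlier steps (encoding, lookup table, distance transform) are not reproduced here.

```python
def weighted_average_surface_distance(distances, surfel_areas):
    """Mean of boundary distances, weighted by the surface area of each boundary element."""
    if len(distances) != len(surfel_areas):
        raise ValueError("distances and surfel_areas must have the same length")
    total_area = sum(surfel_areas)
    if total_area == 0:
        raise ValueError("total surface area is zero")
    return sum(d * w for d, w in zip(distances, surfel_areas)) / total_area


# Hypothetical values: the last boundary element is far away but has a small area
distances = [0.0, 1.0, 2.0, 4.0]
surfel_areas = [1.0, 1.0, 1.0, 0.25]

print(weighted_average_surface_distance(distances, surfel_areas))  # 4 / 3.25, about 1.23
print(sum(distances) / len(distances))                             # 1.75 unweighted
```

The element with the small surface area contributes less to the weighted result than to the plain mean, which illustrates the weighting role described above.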
Summary
This article summarizes several model evaluation metrics commonly used in medical image segmentation; it will be supplemented and revised as further metrics turn up in later papers.