Table of contents
1. Introduction
In semantic segmentation, performance metrics can be calculated from the confusion matrix.
The way the matrix is built here differs from the image-classification case; for that, see: Confusion Matrix.
The test data used here is the pair of true/pred tensors shown in the test code in section 3.
2. Create a confusion matrix
The confusion matrix is implemented as a class with five methods:
__init__ initializes the confusion matrix.
update accumulates new values into the confusion matrix.
reset zeroes the matrix.
compute calculates the performance metrics from the confusion matrix built by update.
__str__ returns the string that is printed when the instantiated confusion matrix is passed to print.
The methods of the confusion-matrix class are explained below using the test data mentioned above.
2.1 The update method
It works as follows:
a is passed the ground-truth label and b the value predicted by the network. Note: the prediction here is an integer array of class indices, just like the label.
First, the __init__ method stores the size of the confusion matrix in n (the number of foreground classes + 1 for the background); update then creates the n×n confusion matrix mat, initialized to zero.
Next, the mask k selects the valid indices in the ground-truth label a.
The purpose here is to set the don't-care region to False; the actual segmentation labels should be numbered consecutively as 1, 2, 3, …, because 0 is usually the background and 255 the don't-care region.
For example, with 2 foreground classes (1, 2), adding the background gives n = 3 (classes 0, 1, 2), while the don't-care region keeps the value 255. Indexing the ground-truth label a with k = (a >= 0) & (a < n) sets the pixels labeled 0, 1, or 2 to True, which keeps exactly the segmentation classes and filters out the 255 don't-care pixels.
Therefore, when the dataset loads the data, foreground classes should be numbered starting from 1 (1, 2, 3, …).
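As a minimal sketch of this masking step (the label values below are made up for illustration):

```python
import torch

n = 3  # 2 foreground classes + 1 background
a = torch.LongTensor([0, 1, 2, 255, 1])  # ground truth; 255 marks the don't-care region
k = (a >= 0) & (a < n)                   # True only for valid class ids 0..n-1
print(k)     # tensor([ True,  True,  True, False,  True])
print(a[k])  # tensor([0, 1, 2, 1]) -- the 255 pixel is dropped
```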
Then the following operation updates the confusion matrix, whose rows correspond to the ground truth and whose columns correspond to the prediction.
The idea behind inds is: a and b are flattened into one-dimensional vectors, n * a[k] + b[k] maps each (true, pred) pair to a unique flat index into an n×n matrix, torch.bincount counts how often each index occurs, and reshape(n, n) turns those counts back into the n×n confusion matrix. You can step through it in a debugger to see the intermediate values.
For example, if one pixel has true = 1 and pred = 0, the entry at row 1, column 0 of the confusion matrix becomes 1.
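To make the indexing concrete, here is the same computation on three pixels (values made up for illustration), including the true = 1, pred = 0 case above:

```python
import torch

n = 3
a = torch.LongTensor([1, 0, 2])  # ground truth
b = torch.LongTensor([0, 0, 2])  # prediction
inds = n * a + b                 # flat index: row (true) * n + column (pred)
mat = torch.bincount(inds, minlength=n**2).reshape(n, n)
print(inds)  # tensor([3, 0, 8])
print(mat)
# tensor([[1, 0, 0],
#         [1, 0, 0],
#         [0, 0, 1]])
# mat[1, 0] == 1: the one pixel with true = 1 was predicted as class 0
```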
2.2 The compute method
compute uses the confusion matrix built by update to calculate the performance metrics of the segmentation task. For background on these metrics, see: Common evaluation indicators for semantic segmentation.
Confusion matrix: rows are the ground truth, columns are the prediction.
Pixel accuracy = sum of the diagonal of the confusion matrix / sum of all its entries.
acc here is the per-class recall = each diagonal value / the corresponding row sum (the rows of the matrix are the ground truth, so a row sum is the number of pixels that truly belong to that class).
Recall measures how many of a class's ground-truth pixels were predicted correctly, i.e. the proportion of correct predictions among the pixels labeled with that class.
iou = each diagonal value / (corresponding row sum + corresponding column sum - the diagonal value, which would otherwise be counted twice).
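These three formulas can be verified by hand on a small matrix; the one below is the confusion matrix that the test data in section 3 produces (rows = true, columns = pred):

```python
# Confusion matrix: rows = ground truth, columns = prediction
mat = [[1, 1, 0],
       [1, 1, 2],
       [0, 1, 2]]
n = 3

diag = [mat[i][i] for i in range(n)]                            # correct counts
row_sum = [sum(mat[i]) for i in range(n)]                       # ground-truth counts
col_sum = [sum(mat[r][c] for r in range(n)) for c in range(n)]  # prediction counts

acc_global = sum(diag) / sum(row_sum)                              # pixel accuracy: 4 / 9
recall = [diag[i] / row_sum[i] for i in range(n)]                  # per-class recall
iou = [diag[i] / (row_sum[i] + col_sum[i] - diag[i]) for i in range(n)]

print(round(acc_global, 4))           # 0.4444
print([round(r, 4) for r in recall])  # [0.5, 0.25, 0.6667]
print([round(i, 4) for i in iou])     # [0.3333, 0.1667, 0.4]
```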
2.3 The __str__ method
In a Python class, the __str__ method returns the string used when the instance is printed.
Therefore, the __str__ method of the confusion-matrix class returns the performance metrics calculated by compute.
Because __str__ automatically calls compute, and compute reads the matrix built by update, be sure to call update before printing the instance.
recall and iou here are per-class values, so they are returned as lists.
3. Test
The test code is as follows:
The test sample is the pair of true/pred tensors shown in the code below.
Here we calculate the segmentation metrics by hand to verify the confusion matrix.
First, pixel accuracy: 4 of the 9 pixels are predicted correctly, so 4 / 9 ≈ 0.4444.
Next, the per-class recall for the three classes 0, 1, 2:
For 0: 1/2 = 0.5
For 1: 1/4 = 0.25
For 2: 2/3 ≈ 0.6667
Then IoU:
For 0: 1/3 ≈ 0.3333
For 1: 1/6 ≈ 0.1667
For 2: 2/5 = 0.4
Finally, mean IoU is the mean of the per-class IoUs: (0.3333 + 0.1667 + 0.4) / 3 = 0.9 / 3 = 0.3
4. Complete code
The code for the confusion matrix:
import torch

class ConfusionMatrix(object):
    def __init__(self, num_classes):
        self.num_classes = num_classes  # number of classes (background included)
        self.mat = None                 # the confusion matrix

    def update(self, a, b):  # a = ground truth, b = prediction
        n = self.num_classes
        if self.mat is None:  # create the confusion matrix on first use
            self.mat = torch.zeros((n, n), dtype=torch.int64, device=a.device)
        with torch.no_grad():
            k = (a >= 0) & (a < n)  # mask out don't-care pixels (e.g. 255)
            # count how often true class a[k] was predicted as class b[k]
            # (a neat trick: encode each (true, pred) pair as one flat index)
            inds = n * a[k].to(torch.int64) + b[k]
            self.mat += torch.bincount(inds, minlength=n**2).reshape(n, n)

    def reset(self):
        if self.mat is not None:
            self.mat.zero_()

    def compute(self):  # calculate the segmentation metrics
        h = self.mat.float()
        # global pixel accuracy (the diagonal holds the correctly predicted counts)
        acc_global = torch.diag(h).sum() / h.sum()
        acc = torch.diag(h) / h.sum(1)  # per-class recall
        iou = torch.diag(h) / (h.sum(1) + h.sum(0) - torch.diag(h))  # per-class IoU
        return acc_global, acc, iou

    def __str__(self):
        acc_global, acc, iou = self.compute()
        return (
            'global correct: {:.4f}\n'
            'recall: {}\n'
            'IoU: {}\n'
            'mean IoU: {:.4f}').format(
                acc_global.item(),
                ['{:.4f}'.format(i) for i in acc.tolist()],
                ['{:.4f}'.format(i) for i in iou.tolist()],
                iou.mean().item())
The test code:
confmat = ConfusionMatrix(num_classes=3)  # instantiate the confusion matrix
true = torch.LongTensor([[1, 2, 1], [0, 2, 2], [0, 1, 1]])
pred = torch.LongTensor([[1, 2, 0], [1, 2, 1], [0, 2, 2]])
confmat.update(true, pred)  # update the values of the confusion matrix
print(confmat)
'''
global correct: 0.4444
recall: ['0.5000', '0.2500', '0.6667']
IoU: ['0.3333', '0.1667', '0.4000']
mean IoU: 0.3000
'''