Semantic Segmentation | PSPNet Source Code Analysis: "Network Testing"

Introduction

This article follows up on Semantic Segmentation | PSPNet Source Code Analysis: "Network Training", and continues with the testing phase of semantic segmentation.

After the model has been trained, the testing strategy also matters a great deal.

Testing generally falls into single-scale and multi-scale evaluation; multi-scale results are usually better than single-scale ones. Other details also affect the results, for example whether the whole image is fed to the network at once, or a sliding window feeds one patch of the image at a time. These are explained below based on the code.

See the complete code at https://github.com/speedinghzl/pytorch-segmentation-toolbox/blob/master/evaluate.py

evaluate.py

main

Here is the first half of the main function for testing. args.whole indicates whether multiple scales are used.

If args.whole is false, a single scale is used, calling predict_sliding (the sliding-window method).

If args.whole is true, multiple scales are used: predict_multiscale is called with [0.75, 1.0, 1.25, 1.5, 1.75, 2.0] as the scaling factors, predicting on the whole image at each scale.

def main():
    """Create the model and start the evaluation process."""
    args = get_arguments()  # parse command-line arguments

    # gpu0 = args.gpu
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    h, w = map(int, args.input_size.split(','))  # h = 769, w = 769
    if args.whole:
        input_size = (1024, 2048)
    else:
        input_size = (h, w)  # (769, 769)

    model = Res_Deeplab(num_classes=args.num_classes)  # build the model

    saved_state_dict = torch.load(args.restore_from)  # load the trained weights
    model.load_state_dict(saved_state_dict)  # copy the weights into the model

    model.eval()  # evaluation mode
    model.cuda()

    testloader = data.DataLoader(CSDataSet(args.data_dir, args.data_list, crop_size=(1024, 2048), mean=IMG_MEAN, scale=False, mirror=False), 
                                    batch_size=1, shuffle=False, pin_memory=True)

    data_list = []
    confusion_matrix = np.zeros((args.num_classes, args.num_classes))  # confusion matrix, shape (19, 19)
    palette = get_palette(256)  # color palette
    interp = nn.Upsample(size=(1024, 2048), mode='bilinear', align_corners=True)  # upsampling layer

    if not os.path.exists('outputs'):
        os.makedirs('outputs')

    for index, batch in enumerate(testloader):
        if index % 100 == 0:
            print('%d processed' % (index))
        image, label, size, name = batch
        # image.shape (1,3,1024,2048), label.shape (1,1024,2048), size = [[1024,2048,3]]
        size = size[0].numpy()  # size = [1024,2048,3]
        with torch.no_grad():  # no gradients needed at test time
            if args.whole:  # whole-image mode: multi-scale prediction, output.shape (1024,2048,19)
                output = predict_multiscale(model, image, input_size, [0.75, 1.0, 1.25, 1.5, 1.75, 2.0], args.num_classes, True, args.recurrence)
            else:  # otherwise the sliding-window method
                output = predict_sliding(model, image.numpy(), input_size, args.num_classes, True, args.recurrence)
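
For context, the sketch below shows how the script might be launched from the command line. The flag names come from get_arguments, which is not shown in this excerpt, so treat the exact spellings and paths as assumptions:

# Hypothetical invocation; flag names and paths depend on get_arguments().
python evaluate.py --data-dir /path/to/cityscapes --data-list ./list/cityscapes/val.lst --restore-from ./snapshots/CS_scenes_40000.pth --input-size 769,769 --num-classes 19 --gpu 0 --whole True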

Next, let's look at the single-scale predict_sliding and the multi-scale predict_whole and predict_multiscale implementations.

predict_sliding

This method uses a fixed-size window to crop a patch from the image each time and feeds it to the network to obtain an output. The window then slides over the image with a 1/3 overlap between consecutive positions, and the probabilities in overlapping regions are accumulated. Finally, dividing the accumulated probabilities by the per-pixel overlap counts gives the average probability for each pixel.

# image.shape (1,3,1024,2048), tile_size = (769,769), classes = 19, flip = True, recur = 1
def predict_sliding(net, image, tile_size, classes, flip_evaluation, recurrence):
    interp = nn.Upsample(size=tile_size, mode='bilinear', align_corners=True)
    image_size = image.shape  # (1,3,1024,2048)
    overlap = 1/3  # overlap ratio between consecutive windows is 1/3

    stride = ceil(tile_size[0] * (1 - overlap))  # sliding stride: 769*(1-1/3) = 513
    tile_rows = int(ceil((image_size[2] - tile_size[0]) / stride) + 1)  # rows of tiles: (1024-769)/513 + 1 = 2
    tile_cols = int(ceil((image_size[3] - tile_size[1]) / stride) + 1)  # cols of tiles: (2048-769)/513 + 1 = 4
    print("Need %i x %i prediction tiles @ stride %i px" % (tile_cols, tile_rows, stride))
    full_probs = np.zeros((image_size[2], image_size[3], classes))  # accumulated probabilities, shape (1024,2048,19)
    count_predictions = np.zeros((image_size[2], image_size[3], classes))  # per-pixel prediction counts, shape (1024,2048,19)
    tile_counter = 0  # number of tiles processed

    for row in range(tile_rows):  # row = 0,1
        for col in range(tile_cols):  # col = 0,1,2,3
            x1 = int(col * stride)  # window start: x1 = 0 * 513 = 0
            y1 = int(row * stride)  #               y1 = 0 * 513 = 0
            x2 = min(x1 + tile_size[1], image_size[3])  # window end: x2 = min(0+769, 2048)
            y2 = min(y1 + tile_size[0], image_size[2])  #             y2 = min(0+769, 1024)
            x1 = max(int(x2 - tile_size[1]), 0)  # re-anchor the start so the window stays in bounds: x1 = max(769-769, 0)
            y1 = max(int(y2 - tile_size[0]), 0)  #                                                    y1 = max(769-769, 0)

            img = image[:, :, y1:y2, x1:x2]  # image patch under the window: image[:, :, 0:769, 0:769]
            padded_img = pad_image(img, tile_size)  # pad so the cropped patch is exactly 769x769
            # plt.imshow(padded_img)
            # plt.show()
            tile_counter += 1  # one more tile
            print("Predicting tile %i" % tile_counter)
            # feed the cropped patch to the network, which outputs probability maps
            padded_prediction = net(Variable(torch.from_numpy(padded_img), volatile=True).cuda())  # [x, x_dsn]; volatile is deprecated in modern PyTorch, torch.no_grad() in main already disables gradients
            if isinstance(padded_prediction, list):
                padded_prediction = padded_prediction[0]  # x.shape (1,19,97,97)
            padded_prediction = interp(padded_prediction).cpu().data[0].numpy().transpose(1,2,0)  # upsample to shape (769,769,19)
            prediction = padded_prediction[0:img.shape[2], 0:img.shape[3], :]  # crop back to the patch area, shape (769,769,19)
            count_predictions[y1:y2, x1:x2] += 1  # increment the counts inside the window
            full_probs[y1:y2, x1:x2] += prediction  # accumulate the predictions inside the window

    # average the predictions in the overlapping regions
    full_probs /= count_predictions  # accumulated probabilities / counts = average probability
    # visualize normalization Weights
    # plt.imshow(np.mean(count_predictions, axis=2))
    # plt.show()
    return full_probs  # average probability for the whole image, shape (1024,2048,19)
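
pad_image is called above but its definition is not included in this excerpt. A minimal sketch of what it needs to do, assuming zero padding on the bottom and right so the cropped patch reaches tile_size (the repository's own implementation may differ):

import numpy as np

def pad_image(img, target_size):
    """Pad a (N, C, H, W) array with zeros on the bottom/right until
    H and W reach target_size. A sketch, not the repository's code."""
    rows_missing = target_size[0] - img.shape[2]  # e.g. 769 - 769 = 0 for a full tile
    cols_missing = target_size[1] - img.shape[3]
    return np.pad(img, ((0, 0), (0, 0), (0, rows_missing), (0, cols_missing)), 'constant')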

predict_multiscale

This function calls predict_whole at each of the different scales. When flipping is used, the flipped image is also fed to the network; the resulting output is flipped back, added to the original output, and divided by two.

# image.shape (1,3,1024,2048); tile_size = (1024,2048) (input_size in whole-image mode); scales = [0.75, 1.0, 1.25, 1.5, 1.75, 2.0];
# classes = 19; flip = True; recur = 1
def predict_multiscale(net, image, tile_size, scales, classes, flip_evaluation, recurrence):
    """
    Predict an image by looking at it with different scales.
        We choose the "predict_whole_img" for the image with less than the original input size,
        for the input of larger size, we would choose the cropping method to ensure that GPU memory is enough.
    """
    image = image.data
    N_, C_, H_, W_ = image.shape  # 1, 3, 1024, 2048
    full_probs = np.zeros((H_, W_, classes))  # shape (1024, 2048, 19)
    for scale in scales:  # [0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
        scale = float(scale)  # 0.75
        print("Predicting image scaled by %f" % scale)
        # rescale the image by the current scale factor
        scale_image = ndimage.zoom(image, (1.0, 1.0, scale, scale), order=1, prefilter=False)  # shape (1,3,768,1536)
        scaled_probs = predict_whole(net, scale_image, tile_size, recurrence)  # predict on the whole rescaled image
        if flip_evaluation == True:  # if flip evaluation is enabled
            flip_scaled_probs = predict_whole(net, scale_image[:,:,:,::-1].copy(), tile_size, recurrence)  # predict the flipped image as well
            scaled_probs = 0.5 * (scaled_probs + flip_scaled_probs[:,::-1,:])  # each of the two predictions contributes 50%
        full_probs += scaled_probs  # accumulate over scales, shape (1024, 2048, 19)
    full_probs /= len(scales)  # average probability over all scales
    return full_probs  # shape (1024, 2048, 19)

predict_whole

When predicting on the whole image, the image size may conflict with the network's input size (crop size), so the network output can differ in height and width from the target. The output therefore has to be upsampled (interpolated) to the specified size.

# image.shape (1,3,1024*scale,2048*scale); tile_size = (1024,2048) in whole-image mode (the upsampling target)
def predict_whole(net, image, tile_size, recurrence):
    image = torch.from_numpy(image)
    interp = nn.Upsample(size=tile_size, mode='bilinear', align_corners=True)  # upsample back to full resolution
    prediction = net(image.cuda())  # [x, x_dsn]
    if isinstance(prediction, list):
        prediction = prediction[0]  # x.shape (1,19,97,193); note that unlike the sliding-window method, the output h and w are not equal here
    prediction = interp(prediction).cpu().data[0].numpy().transpose(1,2,0)  # interpolate to shape (1024,2048,19)
    return prediction

main

After the above runs we obtain output. Taking the argmax along the channel dimension yields the prediction seg_pred, and the putpalette function can then be used to produce a colored segmentation map.

More importantly, we need to compute the segmentation metric mIoU. Here the confusion-matrix approach is used: the valid regions of seg_gt and seg_pred are extracted, flattened into one-dimensional vectors, and passed to the get_confusion_matrix function.

        seg_pred = np.asarray(np.argmax(output, axis=2), dtype=np.uint8)  # argmax over the channel dimension, shape (1024,2048)
        output_im = PILImage.fromarray(seg_pred)    # convert the array to an image
        output_im.putpalette(palette)               # colorize the image
        output_im.save('outputs/'+name[0]+'.png')   # save it

        seg_gt = np.asarray(label[0].numpy()[:size[0],:size[1]], dtype=np.int)  # extract the label, shape (1024,2048)
    
        ignore_index = seg_gt != 255  # mask of valid positions in the label, i.e. those not equal to 255
        seg_gt = seg_gt[ignore_index]  # keep the valid region, flattened to a 1-D vector
        seg_pred = seg_pred[ignore_index]  # same for the prediction; positions correspond one-to-one
        # show_all(gt, output)
        confusion_matrix += get_confusion_matrix(seg_gt, seg_pred, args.num_classes)  # add this image's results to the confusion matrix

To color the prediction, each pixel of the 1024x2048x1 class map is assigned three values, one per RGB channel, giving a 1024x2048x3 color image.

def get_palette(num_cls):
    """ Returns the color map for visualizing the segmentation mask.
    Args:
        num_cls: Number of classes
    Returns:
        The color map
    """

    n = num_cls
    palette = [0] * (n * 3)
    for j in range(0, n):
        lab = j
        palette[j * 3 + 0] = 0
        palette[j * 3 + 1] = 0
        palette[j * 3 + 2] = 0
        i = 0
        while lab:
            palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
            palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
            palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
            i += 1
            lab >>= 3
    return palette
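
The bitwise loop spreads the bits of the class index across the three RGB channels, starting from the most significant bit, so even adjacent class ids get clearly distinct colors. A quick check of the first few entries (illustrative only):

palette = get_palette(256)
# Each class j occupies palette[j*3 : j*3+3] as (R, G, B).
print(palette[0:3])  # class 0 -> [0, 0, 0], background stays black
print(palette[3:6])  # class 1 -> [128, 0, 0]
print(palette[6:9])  # class 2 -> [0, 128, 0]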

get_confusion_matrix

The confusion matrix confusion_matrix is initialized with dimensions 19x19. The entry at row i, column j is the number of pixels that belong to class i but were predicted as class j.

So gt_label and pred_label are needed to determine each pixel's position in the confusion matrix.

We build a vector index = gt_label * class_num + pred_label, which stores the two-dimensional position in a one-dimensional vector in row-major order.

For example, if gt_label[0] = 1 and pred_label[0] = 3, then index[0] = 1 * 19 + 3 = 22, which means pixel 0 belongs to class 1 but was misclassified as class 3, so confusion_matrix[1][3] is incremented by one.

# gt_label and pred_label are both 1-D vectors
def get_confusion_matrix(gt_label, pred_label, class_num):
        """
        Calculate the confusion matrix from the given labels and predictions
        :param gt_label: the ground truth label
        :param pred_label: the predicted label
        :param class_num: the number of classes
        :return: the confusion matrix
        """
        index = (gt_label * class_num + pred_label).astype('int32')  # store the 2-D (gt, pred) position in a 1-D vector, row-major
        label_count = np.bincount(index)  # count each case, e.g. x pixels of class 1 misclassified as class 2
        confusion_matrix = np.zeros((class_num, class_num))  # initialize the confusion matrix, shape (19,19)

        for i_label in range(class_num):  # 0,1,2,...,18
            for i_pred_label in range(class_num):  # 0,1,2,...,18
                cur_index = i_label * class_num + i_pred_label  # 0*19+0, 0*19+1, ..., 18*19+18; one per (gt, pred) case
                if cur_index < len(label_count):
                    confusion_matrix[i_label, i_pred_label] = label_count[cur_index]  # store the count for this case

        return confusion_matrix
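
As a small sanity check, here is a toy call with 3 classes (illustrative, not part of the original script):

gt = np.array([0, 1, 1, 2])    # ground-truth classes of 4 pixels
pred = np.array([0, 1, 2, 2])  # predicted classes
cm = get_confusion_matrix(gt, pred, 3)
# The diagonal entries cm[0][0], cm[1][1], cm[2][2] each equal 1 (correct pixels);
# cm[1][2] == 1 records the one class-1 pixel misclassified as class 2.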

main

The semantic segmentation evaluation metric mIoU is calculated as follows.

\[
\mathrm{mIoU} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij} + \sum_{j=0}^{k}p_{ji} - p_{ii}}
\]

That is, the IoU is computed for each class and then averaged. Take class \(i = 1\) as an example: \(p_{11}\) is the number of true positives, i.e. pixels that are class 1 and are predicted as class 1; \(\sum^{k}_{j=0} p_{1j}\) is the number of pixels that are class 1 but predicted as any class (note that this includes \(p_{11}\)); \(\sum^{k}_{j=0} p_{j1}\) is the number of pixels that belong to any class but are predicted as class 1 (note that this also includes \(p_{11}\)). Since \(p_{11}\) is thus counted twice in the denominator, one \(p_{11}\) has to be subtracted.

From the definition, the diagonal elements of the confusion matrix are the \(p_{ii}\), the sum over the i-th row is \(\sum^{k}_{j=0} p_{ij}\), and the sum over the i-th column is \(\sum^{k}_{j=0} p_{ji}\). Computing mIoU from the confusion matrix is therefore very simple; see the code.

    pos = confusion_matrix.sum(1)  # sum over rows of the confusion matrix
    res = confusion_matrix.sum(0)  # sum over columns of the confusion matrix
    tp = np.diag(confusion_matrix)  # diagonal elements, i.e. the correctly classified counts

    IU_array = (tp / np.maximum(1.0, pos + res - tp))  # per-class IoU = intersection/union, shape (19,)
    mean_IU = IU_array.mean()  # average over classes

    # getConfusionMatrixPlot(confusion_matrix)
    print({'meanIU': mean_IU, 'IU_array': IU_array})
    with open('result.txt', 'w') as f:
        f.write(json.dumps({'meanIU': mean_IU, 'IU_array': IU_array.tolist()}))
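
To make the correspondence between the formula and the code concrete, the toy confusion matrix from the previous section gives (a sketch, not part of the original script):

cm = np.array([[1., 0., 0.],
               [0., 1., 1.],
               [0., 0., 1.]])
pos = cm.sum(1)   # row sums:    [1. 2. 1.]
res = cm.sum(0)   # column sums: [1. 1. 2.]
tp = np.diag(cm)  # diagonal:    [1. 1. 1.]
IU_array = tp / np.maximum(1.0, pos + res - tp)  # [1.  0.5 0.5]
print(IU_array.mean())  # mIoU = (1 + 0.5 + 0.5) / 3 ≈ 0.667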


Original post: https://www.cnblogs.com/vincent1997/p/10939747.html