各位同学好，今天和大家分享一下如何使用 TensorFlow 构建非极大值抑制 NMS 和 Soft-NMS 方法。在目标检测中，对于单一目标有多个候选矩形框的问题，主要是采用非极大值抑制的方法，除去多余的候选矩形框，得到最合适的单一候选矩形框作为识别出的目标区域。

在阅读本篇文章之前，建议大家先看以下两篇文章

YOLOV1中的NMS理论基础：https://blog.csdn.net/dgvv4/article/details/123767854

各种iou损失的理论和代码：https://blog.csdn.net/dgvv4/article/details/124039111

1. NMS 非极大值抑制

1.1 方法介绍

图像经过神经网络模型运算，会得到多个目标预测框。然而，每个目标只有一个真实标签框，算法却会生成多个预测框，这样会造成检测框冗余或预测错误的情况。因此，深度神经网络通过非极大值抑制 NMS 来筛选预测框，去掉交并比更大并且置信度较低的冗余检测框，剩下置信度较高，定位较准确的预测框。

传统的非极大值抑制的算法流程如下：

（1）B代表每个预测框的信息，包含每个检测框的位置坐标(x,y,w,h)，预测框内是否包含物体c，预测框内的物体属于每个类别的条件概率；C'代表每个预测框的置信度集合；R代表最终的筛选出的预测框集合。

（2）对某个类别，将集合 C' 中的置信度分数从高到低排序，并且根据置信度对预测框 B 进行排序，置信度越大的检测框排在越前面。将置信度最大的预测框（命名为X）从集合B中单独挑出来，将该预测框加入到集合R中，并在B中删除。

（3）集合B中剩余的每个预测框都与上一步被选中的预测框 X 计算交并比 iou 的值，将 iou 和阈值比较，移除比预设阈值大的 iou 对应的预测框，即剔除和最优预测框重合度较大的其余次优预测框。

（4）重复执行步骤②和③，直到集合B中的预测框个数为0后，结束循环，算法所得到的集合R就是图片上筛选出的所有预测框。

缺点：传统的非极大值抑制NMS无法抑制不同类别的目标。

1.2 代码展示

先定义一个计算iou的函数，计算最优框和其他所有次优检测框的交并比。iou 的值越大，表明两个框重叠度越高。当iou为0时，说明两个框完全没有重合，iou为1时说明两个框完全重合。

import tensorflow as tf

# 计算两个检测框的交并比
def IOU(box1, box2):
    '''
    box1代表最佳预测框的信息[anchors,7]
    box2代表其他预测框的信息[anchors,7]
    '''

    # 获取两个框的左上和右下坐标
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    
    # 计算两个框的交集的左上和右下坐标
    intersect_x1 = tf.maximum(b1_x1, b2_x1)
    intersect_y1 = tf.maximum(b1_y1, b2_y1)
    intersect_x2 = tf.minimum(b1_x2, b2_x2)
    intersect_y2 = tf.minimum(b1_y2, b2_y2)

    # 计算交集的宽高
    intersect_w = intersect_x2 - intersect_x1
    intersect_h = intersect_y2 - intersect_y1
    # 交集的面积
    intersect_area = intersect_w * intersect_h
    
    # 并集的面积
    box1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    union_area = box1_area + box2_area - intersect_area
    
    # 计算交并比，分母加上一个很小的数防止为0
    iou = intersect_area / (union_area + tf.keras.backend.epsilon())
    
    return iou

接下来对图像进行非极大值抑制，本篇使用Tensorflow方法，某些函数如果大家有疑惑的话可以看我的这个TF2基础专栏：https://blog.csdn.net/dgvv4/category_11493230.html

# 定义NMS非极大值抑制函数
def NMS(inputs, num_classes, conf_thresh, nms_thresh):
    
    # 获取输入特征图的batch_size
    batch_size = inputs.shape[0]
    
    # tensor1.assign(data)  用data值代替tensor值，必须是Variable类型
    # 将预测框的中心坐标xy宽高wh转换成左上(x1,y1)和右下坐标(x2,y2)
    inputs[:,:,0].assign(inputs[:,:,0] - inputs[:,:,2] // 2)  # x1
    inputs[:,:,1].assign(inputs[:,:,1] - inputs[:,:,3] // 2)  # y1
    inputs[:,:,2].assign(inputs[:,:,0] + inputs[:,:,2])  # x1 + w = x2
    inputs[:,:,3].assign(inputs[:,:,1] + inputs[:,:,3])  # y1 + h = y2
    
    # 保存筛选出的所有图片的检测框
    outputs = []
    
    # 循环，分别处理每一张图片的预测框
    for i in range(batch_size):
        
        #（1）取出每张图片的预测结果[num_anchors, 4+1+num_classes]
        prediction = inputs[i]
        
        # 取出该张图片的每个检测框的置信度
        score = prediction[:, 4]
        
        # 判断预测框置信度是否大于阈值，即检测框内部是否包含目标物体
        mask = score > conf_thresh  # mask保存每个框是否需要被保留
        
        # 筛选需要保留的预测框，若超过阈值那么预测框对应的mask=True
        detections = prediction[mask]
        
        #（2）找出每一个预测框对应的最大概率
        # [detection_anchors, 4+1+num_classes] ==> [detection_anchors]
        class_conf = tf.reduce_max(detections[:,5:], axis=-1)
        # 为了后面的计算，将维度填充成原来的，每个框对应一个最大的类别概率
        # [detection_anchors] ==> [detection_anchors, 1]
        class_conf = tf.expand_dims(class_conf, axis=-1)
        
        # 找出每个预测框所属的类别索引，即最大类别概率的索引
        class_pred = tf.expand_dims(tf.argmax(detections[:, 5:], axis=-1), axis=-1)
        # class_pred输出是tf.int64类型，转变成float32
        class_pred = tf.cast(class_pred, dtype=tf.float32)
        
        #（3）组合每个检测框的坐标、置信度、类别概率、类别索引
        # [anchors,5] + [anchors,1] + [anchors,1] ==> [anchors,7]
        detections = tf.concat([detections[:,:5], class_conf, class_pred], axis=-1)
        
        # 筛选出所有的种类
        unique_class, _ = tf.unique(detections[:,-1])
        # 如果这张图片中没有预测到目标，就切换下一张图
        if len(unique_class) == 0:
            continue
        
        #（4）遍历每一个类别的预测框，选择最佳的预测框
        best_box = []  # 存放一张图片上筛选出的预测框
        for c in unique_class:
            
            # 筛选出属于这个类别的所有预测框
            class_mask = (detections[:,-1] == c)
            # 取出所有属于该分类的预测框
            detection = detections[class_mask]
            
            # 对所有属于该类别的预测框排序，将该类别置信度最大的排在最前面
            score = detection[:,4]  # 取出所有框的置信度
            # 返回降序排序后的索引，如值最大的元素在score中的索引
            argsort = tf.argsort(score, direction='DESCENDING')
            # 对检测框从大到小排序,tf.gather()函数在某一维度按照指定的索引获取数据
            detection = tf.gather(detection, axis=0, indices=argsort)
            
            # 如果有检测框检测到该类别，再进行下一步处理
            while len(detection) != 0:
                
                # 首先将类别概率最高的检测框信息保存
                best_box.append(detection[0])
                
                # 如果这个类别只有这一个检测框，接下来就不需要比较了
                if len(detection) == 1:
                    break
                
                iouList = []  # 保存该类别的最优框和其他次优框计算出的iou值
                #（5）计算得分最大的框和其他框的iou交并比
                for other_box in detection[1:]:
                    # 最优框依次和其他框比较
                    iou = IOU(best_box[-1], other_box)
                    # 将每次计算出的iou值保存
                    iouList.append(iou)

                # 选择交并比小于阈值的其他有效框，删除重合度大的框
                detection = detection[1:][iouList < tf.constant(nms_thresh)]
            
        # 将每张图片筛选出的所有类别的检测框保存起来
        outputs.append(best_box)
        
    # 返回检测框信息
    return outputs

构建随机的输入tensor来验证NMS代码。NMS主要作用是删除同一类别的重复的框。

构造输入特征图shape=[batch_size, num_anchors, 4+1+num_classes]

其中 4 代表：预测框中心坐标(x,y)宽高(w,h)；1 代表：检测框中是否包含目标物体，即置信度c；num_classes 代表，预测框内部物体属于某个类别的条件概率，VOC数据集中等于20

conf_thresh 代表预测框需要满足的最小置信度，nms_thresh 代表两个框的交并比iou，大于该值就代表重复框住了同一个目标。

if __name__ == '__main__':
    
    # 构造输入，1个batch，5个anchor，3个classes
    inputs = tf.Variable([[[250, 250, 420, 420, 0.80, 0.8, 0.6, 0.4],
                           [220, 220, 320, 330, 0.92, 0.4, 0.6, 0.8],
                           [160, 160, 210, 210, 0.72, 0.8, 0.5, 0.3],
                           [240, 230, 330, 325, 0.81, 0.8, 0.4, 0.6],
                           [230, 220, 340, 315, 0.90, 0.4, 0.8, 0.6]]])
    
    # 返回输出
    outputs = NMS(inputs, num_classes=3, conf_thresh=0.5, nms_thresh=0.4)
    print(outputs)


''' 输出结果
[[<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 75.  ,  68.  , 405.  , 393.  ,   0.81,   0.8 ,   0.  ], dtype=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 55.  ,  55.  , 265.  , 265.  ,   0.72,   0.8 ,   0.  ], dtype=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 60.  ,  55.  , 380.  , 385.  ,   0.92,   0.8 ,   2.  ], dtype=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 60. ,  63. , 400. , 378. ,   0.9,   0.8,   1. ], dtype=float32)>]]
'''

2. Soft-NMS 非极大值抑制

2.1 方法介绍

对于两个相互遮挡，有重叠的同类目标，他们的检测框是相互靠近的，即 IOU 值很高，通过 NMS 后，会将其中一个 socre 较低的检测框强制删除，因此 NMS 无法很好的对相互遮挡有重叠的目标进行处理。

NMS 之所以无法有效的对相互遮挡有重叠的目标进行检测，是因为它将相近的其他物体的预测框当作自身的冗余检测框，将其他预测框的 score 强制置 0，从而导致对相近的其他物体检测框误删。

为此 Soft-NMS 在对相邻检测框的 score 重新打分时，不是像 NMS 那样粗暴的直接置 0，而是通过一个基于与 IOU 相关的函数来降低相邻检测框的 score。虽然 score 被降低，但相邻的检测框仍在物体检测的序列中。

对于两个 iou 大于阈值的检测框，根据他们的 iou 对 score 进行削弱。iou 越大，两个检测框的重叠度越高，他们是同一个物体的不同检测框的可能性越大，因此对 score 的削弱越严重；iou 越小，两个检测框的重叠度越小，他们是同一个物体的不同检测框的可能性越小，因此对 score 的削弱较小。

如下图，对于同一类别的检测，在两个或多个待检测目标重合在一起时，使用 NMS 算法容易导致最后的检测结果中缺失某些目标，另外当待检测目标周围有其他遮挡物遮挡时也有可能会无法检测出目标。使用 Soft-NMS 算法保留了交并比并非最高的重叠物体的预测框，并给予这些预测框一个分数，之后再进一步筛选，解决了物体被遮挡的问题。

2.2 代码展示

Soft-NMS 和 NMS 的代码部分很相似，只有计算iou后预测框排序那个部分的代码不相同，在下面的代码中我已经标出来了，黄色字体中间的代码需要注意一下。

import tensorflow as tf
import numpy as np

# 计算两个检测框的交并比
def IOU(box1, box2):

    # 获取两个框的左上和右下坐标
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    
    # 计算两个框的交集的左上和右下坐标
    intersect_x1 = tf.maximum(b1_x1, b2_x1)
    intersect_y1 = tf.maximum(b1_y1, b2_y1)
    intersect_x2 = tf.minimum(b1_x2, b2_x2)
    intersect_y2 = tf.minimum(b1_y2, b2_y2)

    # 计算交集的宽高
    intersect_w = intersect_x2 - intersect_x1
    intersect_h = intersect_y2 - intersect_y1
    # 交集的面积
    intersect_area = intersect_w * intersect_h
    
    # 并集的面积
    box1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    union_area = box1_area + box2_area - intersect_area
    
    # 计算交并比，分母加上一个很小的数防止为0
    iou = intersect_area / (union_area + tf.keras.backend.epsilon())
    
    return iou    


# 定义NMS非极大值抑制函数
def NMS(inputs, num_classes, conf_thresh, nms_thresh, sigma=0.5):
    
    # 获取输入特征图的batch_size
    batch_size = inputs.shape[0]
    
    # tensor1.assign(data)  用data值代替tensor值，data必须是Variable类型
    # 将预测框的中心坐标xy宽高wh转换成左上(x1,y1)和右下坐标(x2,y2)
    inputs[:,:,0].assign(inputs[:,:,0] - inputs[:,:,2] // 2)  # x1
    inputs[:,:,1].assign(inputs[:,:,1] - inputs[:,:,3] // 2)  # y1
    inputs[:,:,2].assign(inputs[:,:,0] + inputs[:,:,2])  # x1 + w = x2
    inputs[:,:,3].assign(inputs[:,:,1] + inputs[:,:,3])  # y1 + h = y2
    
    # 保存筛选出的所有图片的检测框
    outputs = []
    
    # 循环，分别处理每一张图片的预测框
    for i in range(batch_size):
        
        #（1）取出每张图片的预测结果[num_anchors, 4+1+num_classes]
        prediction = inputs[i]
        
        # 取出该张图片的每个检测框的置信度
        score = prediction[:, 4]
        
        # 判断预测框置信度是否大于阈值，即检测框内部是否包含目标物体
        mask = score > conf_thresh  # mask保存每个框是否需要被保留
        
        # 筛选需要保留的预测框，若超过阈值那么预测框对应的mask=True
        detections = prediction[mask]
        
        #（2）找出每一个预测框对应的最大概率
        # [detection_anchors, 4+1+num_classes] ==> [detection_anchors]
        class_conf = tf.reduce_max(detections[:,5:], axis=-1)
        # 为了后面的计算，将维度填充成原来的，每个框对应一个最大的类别概率
        # [detection_anchors] ==> [detection_anchors, 1]
        class_conf = tf.expand_dims(class_conf, axis=-1)
        
        # 找出每个预测框所属的类别索引，即最大类别概率的索引
        class_pred = tf.expand_dims(tf.argmax(detections[:, 5:], axis=-1), axis=-1)
        # class_pred输出是tf.int64类型，转变成float32
        class_pred = tf.cast(class_pred, dtype=tf.float32)
        
        #（3）组合每个检测框的坐标、置信度、类别概率、类别索引
        # [anchors,5] + [anchors,1] + [anchors,1] ==> [anchors,7]
        detections = tf.concat([detections[:,:5], class_conf, class_pred], axis=-1)
        
        # 筛选出所有的种类
        unique_class, _ = tf.unique(detections[:,-1])
        # 如果这张图片中没有预测到目标，就切换下一张图
        if len(unique_class) == 0:
            continue
        
        #（4）遍历每一个类别的预测框，选择最佳的预测框
        best_box = []  # 存放一张图片上筛选出的预测框
        for c in unique_class:
            
            # 筛选出属于这个类别的所有预测框
            class_mask = (detections[:,-1] == c)
            # 取出所有属于该分类的预测框
            detection = detections[class_mask]
            
            # 对所有属于该类别的预测框排序，将该类别置信度最大的排在最前面
            score = detection[:,4]  # 取出所有框的置信度
            # 返回降序排序后的索引，如值最大的元素在score中的索引
            argsort = tf.argsort(score, direction='DESCENDING')
            # 对检测框从大到小排序,tf.gather()函数在某一维度按照指定的索引获取数据
            detection = tf.gather(detection, axis=0, indices=argsort)
            
            '''
            NMS和Soft-NMS的区别只在计算IOU值后剔除重合框时计算方法不同
            根据iou计算置信度的系数，该系数符合高斯分布，若两个框完全不重合，系数等于1，若完全重合，系数趋于0
            计算完新的置信度后，以置信度分数为指标，重新对预测框排序
            '''

            # 如果有检测框检测到该类别，再进行下一步处理
            while len(detection) != 0:
                
                # 首先将类别概率最高的检测框信息保存
                best_box.append(detection[0])
                
                # 如果这个类别只有这一个检测框，接下来就不需要比较了
                if len(detection) == 1:
                    break
                
                iouList = []  # 保存该类别的最优框和其他次优框计算出的iou值
                # 计算得分最大的框和其他框的iou交并比
                for other_box in detection[1:]:
                    # 最优框依次和其他框比较
                    iou = IOU(best_box[-1], other_box)  # 转换成数值类型
                    # 将每次计算出的iou值保存
                    iouList.append(iou)
                
                # 变成Variable类型用于赋值
                detection = tf.Variable(detection)
                
                # 对所有次优检测框的置信度值乘以一个指数系数
                detection[1:,4].assign(detection[1:,4] * tf.exp( - np.array(iouList) * np.array(iouList) / sigma ))
                
                # 不需要再调整最优预测框
                detection = detection[1:]
                
                # 重新依据置信度分数排序次优预测框
                score =  detection[:,4]
                # 返回降序排序后的索引，如值最大的元素在score中的索引
                argsort = tf.argsort(score, direction='DESCENDING')
                # 对检测框从大到小排序,tf.gather()函数在某一维度按照指定的索引获取数据
                detection = tf.gather(detection, axis=0, indices=argsort)
            
            '''
            剩余部分和普通 NMS 相同
            ''' 
        # 将每张图片筛选出的所有类别的检测框保存起来
        outputs.append(best_box)
        
    # 返回检测框信息
    return outputs

使用和NMS一样的初始化输入，方便比较两者差别。我们可以看到，所有预测框都被保留下来了，只降低置信度不删除预测框

if __name__ == '__main__':
    
    # 构造输入，1个batch，5个anchor，3个classes
    inputs = tf.Variable([[[250, 250, 420, 420, 0.80, 0.8, 0.6, 0.4],
                           [220, 220, 320, 330, 0.92, 0.4, 0.6, 0.8],
                           [160, 160, 210, 210, 0.72, 0.8, 0.5, 0.3],
                           [240, 230, 330, 325, 0.81, 0.8, 0.4, 0.6],
                           [230, 220, 340, 315, 0.90, 0.4, 0.8, 0.6]]])
    
    # 返回输出
    outputs = NMS(inputs, num_classes=3, conf_thresh=0.5, nms_thresh=0.4)
    print(outputs)

'''
[[<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 75.  ,  68.  , 405.  , 393.  ,   0.81, 0.8, 0. ], dtype=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 55. ,  55. , 265. , 265. , 0.58018255, 0.8 ,   0. ], dtype=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([4.0000000e+01, 4.0000000e+01, 4.6000000e+02, 4.6000000e+02, 3.3707327e-01, 8.0000001e-01, 0.0000000e+00], type=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 60.  ,  55.  , 380.  , 385.  ,   0.92,   0.8 ,   2.  ], dtype=float32)>, 
<tf.Tensor: shape=(7,), dtype=float32, numpy=array([ 60. ,  63. , 400. , 378. ,   0.9,   0.8,   1. ], dtype=float32)>]]
'''

【目标检测】(12) 非极大值抑制 NMS 和 Soft-NMS，附TensorFlow完整代码