In-Depth Series: Some Image Processing Methods (2), Performance Evaluation with IOU and GIOU in Detail

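This section covers performance evaluation with IOU and GIOU in detail; the next section covers HOG features and bag-of-words.

GIOU paper: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

Performance evaluation

Naturally, the more the computed bounding boxes that contain objects overlap with the ground truth windows, the better the performance.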

III. IOU (Intersection over Union)

1. IOU is a standard measure of how accurately objects are detected on a given dataset.

2. The formula for IOU:

$\LARGE IOU(g_{c, i}, l_{j}) = \frac{area(g_{c, i} \cap l_{j})}{area(g_{c, i} \cup l_{j})}$

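$\large g_{c, i}$ : for each fixed class $\large c$, every ground truth box is denoted $\large g_{c, i}$, where $\large g_{c, i} \in G_{c}$

$\large l_{j}$ : a bounding box, where $\large l_{j} \in L$

In other words, IOU is the ratio of the intersection area to the union area of two regions A and B: $\large IOU = \frac{|A \cap B|}{|A \cup B|}$

The IOU loss is: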

$\LARGE L_{IOU} = 1 - IOU$
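
As a quick worked example (the two boxes are chosen here purely for illustration), take $\large A = (0, 0, 2, 2)$ and $\large B = (1, 1, 3, 3)$ in $\large (x_{1}, y_{1}, x_{2}, y_{2})$ form. Each box has area 4, and their intersection is the unit square from $\large (1, 1)$ to $\large (2, 2)$, so:

$\large IOU = \frac{1}{4 + 4 - 1} = \frac{1}{7} \approx 0.143, \;\;\; L_{IOU} = 1 - \frac{1}{7} = \frac{6}{7} \approx 0.857$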

3. Python code for IOU

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/02/01 16:04
# @Author   : WanDaoYi
# @FileName : bbox_iou.py
# ============================================

import numpy as np


# IOU
def bboxes_iou(boxes1, boxes2):
    # Boxes are given as (x1, y1, x2, y2): top-left and bottom-right corners
    boxes1 = np.array(boxes1)
    boxes2 = np.array(boxes2)

    # Area of each box
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    # Top-left corner of the intersection
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    # Bottom-right corner of the intersection
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])

    # Width and height of the intersection rectangle (0 if the boxes do not overlap)
    inter_section = np.maximum(right_down - left_up, 0.0)

    # Intersection area of the two boxes
    inter_area = inter_section[..., 0] * inter_section[..., 1]

    # Union area of the two boxes
    union_area = boxes1_area + boxes2_area - inter_area
    # IOU, floored at float32 machine epsilon so the result is never exactly 0
    ious = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)

    return ious

4. MABO (Mean Average Best Overlap)

Performance is sometimes also evaluated with MABO. The Average Best Overlap (ABO) of one class is defined as:

$\LARGE ABO = \frac{1}{|G_{c}|} \sum_{g_{c, i} \in G_{c}} \max_{l_{j} \in L} IOU(g_{c, i}, l_{j})$

To evaluate performance across all classes, it is natural to use MABO, the mean of the ABO values over all classes.
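
The ABO and MABO definitions above can be sketched in plain Python as follows. This is a minimal illustration, not code from the original post: the function names, the `(x1, y1, x2, y2)` box format, and the toy boxes are all assumptions made for the example.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    inter_w = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    inter_h = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_best_overlap(gt_boxes, proposals):
    """ABO of one class: mean over ground truths of the best IOU with any proposal."""
    best = [max(iou(g, l) for l in proposals) for g in gt_boxes]
    return sum(best) / len(best)


def mean_average_best_overlap(gt_by_class, proposals):
    """MABO: mean of the per-class ABO values."""
    abos = [average_best_overlap(gts, proposals) for gts in gt_by_class.values()]
    return sum(abos) / len(abos)


# Toy data (hypothetical): two classes and three proposals.
gt_by_class = {
    "cat": [(0, 0, 2, 2)],
    "dog": [(4, 4, 8, 8), (10, 10, 12, 12)],
}
proposals = [(0, 0, 2, 2), (4, 4, 8, 6), (20, 20, 22, 22)]

abo_cat = average_best_overlap(gt_by_class["cat"], proposals)
abo_dog = average_best_overlap(gt_by_class["dog"], proposals)
mabo = mean_average_best_overlap(gt_by_class, proposals)
print(abo_cat, abo_dog, mabo)
```

In this toy setup the "cat" box is matched exactly, one "dog" box is half covered by its best proposal, and the other "dog" box is missed entirely, which is why the per-class ABO values differ so much before being averaged.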

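5. Single-strategy evaluation

The MABO of selective search can be evaluated by changing any one of its diversification strategies. The paper adopts the following strategies:

(1). Use the RGB color space (graph-based image segmentation relies on color differences to split the image into regions)

(2). Use the combination of all four similarity measures

(3). Set the image segmentation threshold k = 50

Then change one of these strategy parameters at a time and measure the resulting MABO.

6. Combining diversification strategies

Combining the single strategies with a greedy search yields a higher MABO, but it also increases the computational cost.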
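
The idea of a greedy combination can be illustrated with a generic forward-selection sketch. It is not necessarily the exact procedure used in the paper: the strategy names and the per-ground-truth best-overlap numbers below are hypothetical, and a single class is assumed so that MABO equals ABO. The sketch relies on one fact: pooling the proposals of several strategies keeps, for each ground truth, the highest overlap any of them achieved.

```python
def mabo_of(strategy_names, best_overlap):
    """MABO of the pooled proposals of the given strategies (single class).

    Pooling proposals means each ground truth keeps the highest overlap
    reached by any selected strategy, i.e. an element-wise maximum.
    """
    columns = zip(*(best_overlap[name] for name in strategy_names))
    per_gt_best = [max(col) for col in columns]
    return sum(per_gt_best) / len(per_gt_best)


def greedy_combine(best_overlap, max_strategies):
    """Greedy forward selection: repeatedly add the strategy that raises MABO most."""
    chosen, current = [], 0.0
    while len(chosen) < max_strategies:
        candidates = [name for name in best_overlap if name not in chosen]
        if not candidates:
            break
        scored = [(mabo_of(chosen + [name], best_overlap), name) for name in candidates]
        score, name = max(scored)
        if score <= current:  # no remaining strategy improves MABO
            break
        chosen.append(name)
        current = score
    return chosen, current


# Hypothetical best overlap of each strategy with three ground-truth boxes.
best_overlap = {
    "rgb_k50": [0.75, 0.5, 0.25],
    "hsv_k50": [0.5, 1.0, 0.5],
    "lab_k100": [0.25, 0.5, 1.0],
}

chosen, score = greedy_combine(best_overlap, max_strategies=2)
print(chosen, score)
```

Each added strategy contributes its own set of proposals, which is where the extra computational cost comes from.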

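IV. GIOU (Generalized IOU)

1. IOU has the following two drawbacks:

(1). When $\large IOU(B^{p}, B^{g}) = 0$, i.e. the bounding box and the ground truth do not intersect, IOU cannot tell whether the two boxes are close to each other or far apart.

$\large B^{p}$ : the predicted box, i.e. the bounding box, with coordinates $\large (x_{1}^{p},\; y_{1}^{p},\; x_{2}^{p},\; y_{2}^{p})$

$\large B^{g}$ : the ground truth box, with coordinates $\large (x_{1}^{g},\; y_{1}^{g},\; x_{2}^{g},\; y_{2}^{g})$

(2). IOU cannot distinguish between different ways in which two boxes are aligned: for example, boxes that overlap in different orientations can yield exactly the same IOU value.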
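
The first drawback is easy to check numerically. The sketch below is only an illustration: the helper `iou` and the three boxes are assumptions made for the example, with boxes in `(x1, y1, x2, y2)` form.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    inter_w = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    inter_h = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


gt = (0, 0, 2, 2)
near = (3, 0, 5, 2)      # 1 unit to the right of gt
far = (100, 0, 102, 2)   # 98 units to the right of gt

# Both predictions get the same score although one is much closer.
print(iou(gt, near), iou(gt, far))
```

Because both values are zero, the loss $\large L_{IOU} = 1 - IOU$ is 1 in both cases and carries no information about which prediction is the better starting point.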

2. GIOU improves on IOU and is computed as follows:

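(1). Compute the area of $\large B^{p}$ (the predicted corners are ordered first, so that $\large \hat{x}_{2}^{p} \geq \hat{x}_{1}^{p}$ and $\large \hat{y}_{2}^{p} \geq \hat{y}_{1}^{p}$)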

$\LARGE \hat{x}_{1}^{p} = \min (x_{1}^{p},\; x_{2}^{p})$

$\LARGE \hat{x}_{2}^{p} = \max (x_{1}^{p},\; x_{2}^{p})$

$\LARGE \hat{y}_{1}^{p} = \min (y_{1}^{p},\; y_{2}^{p})$

$\LARGE \hat{y}_{2}^{p} = \max (y_{1}^{p},\; y_{2}^{p})$

$\LARGE A^{p} = (\hat{x}_{2}^{p} - \hat{x}_{1}^{p}) \times (\hat{y}_{2}^{p} - \hat{y}_{1}^{p})$

(2). Compute the area of $\large B^{g}$ (in the same way)

$\LARGE A^{g} = (x_{2}^{g} - x_{1}^{g}) \times (y_{2}^{g} - y_{1}^{g})$

(3). Compute the area $\large I$ of the intersection $\large B^{p} \cap B^{g}$

$\LARGE x_{1}^{I} = \max (\hat{x}_{1}^{p},\; x_{1}^{g})$

$\LARGE x_{2}^{I} = \min (\hat{x}_{2}^{p},\; x_{2}^{g})$

$\LARGE y_{1}^{I} = \max (\hat{y}_{1}^{p},\; y_{1}^{g})$

$\LARGE y_{2}^{I} = \min (\hat{y}_{2}^{p},\; y_{2}^{g})$

$\LARGE I = \left\{\begin{matrix} (x_{2}^{I} - x_{1}^{I}) \times (y_{2}^{I} - y_{1}^{I}) & \;\;\; if \; x_{2}^{I} > x_{1}^{I},\; y_{2}^{I} > y_{1}^{I} \\ 0 & otherwise \end{matrix}\right.$

(4). Compute the smallest enclosing box $\large B^{c}$ that contains both $\large B^{p}$ and $\large B^{g}$, and its area $\large A^{c}$ :

$\LARGE x_{1}^{c} = \min (\hat{x}_{1}^{p},\; x_{1}^{g})$

$\LARGE x_{2}^{c} = \max (\hat{x}_{2}^{p},\; x_{2}^{g})$

$\LARGE y_{1}^{c} = \min (\hat{y}_{1}^{p},\; y_{1}^{g})$

$\LARGE y_{2}^{c} = \max (\hat{y}_{2}^{p},\; y_{2}^{g})$

$\LARGE A^{c} = (x_{2}^{c} - x_{1}^{c}) \times (y_{2}^{c} - y_{1}^{c})$

(5). Compute IOU and GIOU:

$\LARGE IOU = \frac{I}{U} \;\;\;\;\;\;\; (U = A^{p} + A^{g} - I)$

$\LARGE GIOU = IOU - \frac{A^{c} - U}{A^{c}}$

The GIOU loss is:

$\LARGE L_{GIOU} = 1 - GIOU$
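
The steps above can be traced for a single pair of boxes with a small plain-Python sketch. The function name `giou`, the `(x1, y1, x2, y2)` input format, and the example boxes are assumptions made for illustration; the sketch also assumes non-degenerate boxes, so that the union and enclosing areas are positive.

```python
def giou(pred, gt):
    """GIOU of a predicted box and a ground-truth box, both (x1, y1, x2, y2)."""
    # (1) order the predicted corners, then compute the predicted area
    px1, px2 = min(pred[0], pred[2]), max(pred[0], pred[2])
    py1, py2 = min(pred[1], pred[3]), max(pred[1], pred[3])
    area_p = (px2 - px1) * (py2 - py1)
    # (2) ground-truth area (its corners are assumed to be ordered already)
    gx1, gy1, gx2, gy2 = gt
    area_g = (gx2 - gx1) * (gy2 - gy1)
    # (3) intersection area I, zero when the boxes do not overlap
    ix1, ix2 = max(px1, gx1), min(px2, gx2)
    iy1, iy2 = max(py1, gy1), min(py2, gy2)
    inter = (ix2 - ix1) * (iy2 - iy1) if ix2 > ix1 and iy2 > iy1 else 0.0
    # (4) smallest enclosing box and its area
    cx1, cx2 = min(px1, gx1), max(px2, gx2)
    cy1, cy2 = min(py1, gy1), max(py2, gy2)
    area_c = (cx2 - cx1) * (cy2 - cy1)
    # IOU and GIOU
    union = area_p + area_g - inter
    iou = inter / union
    return iou - (area_c - union) / area_c


# The predicted corners are deliberately given in swapped order.
value = giou((3, 3, 1, 1), (0, 0, 2, 2))
loss = 1 - value
print(value, loss)
```

For these boxes $\large I = 1$, $\large U = 7$ and $\large A^{c} = 9$, so $\large GIOU = \frac{1}{7} - \frac{2}{9} = -\frac{5}{63}$: the boxes overlap, yet the empty space inside the enclosing box pulls the score below zero.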

3. Understanding how GIOU is optimized

From the definitions it follows that:

$\large 0 \leq IOU \leq 1$

$\large -1 \leq GIOU \leq 1$

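(1). When IOU = 0, the IOU loss cannot be optimized: it stays at 1 however the predicted box moves, as long as the boxes remain disjoint, so it gives no direction for improvement. $\large L_{GIOU}$ behaves differently: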

$\large L_{GIOU} = 1 - GIOU = 1 + \frac{A^{c} - U}{A^{c}} - IOU$

When $\large B^{p}$ and $\large B^{g}$ do not intersect, i.e. $\large I = 0 \;\;(IOU = 0)$ , $\large L_{GIOU}$ simplifies to:

$\large L_{GIOU} = 1 + \frac{A^{c} - U}{A^{c}} = 2 - \frac{U}{A^{c}}$

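Minimizing the GIOU loss therefore requires maximizing $\large \frac{U}{A^{c}}$ . This term is already normalized, i.e. $\large 0 \leq \frac{U}{A^{c}} \leq 1$ . Maximizing it means minimizing $\large A^{c}$ while maximizing $\large U$ . Since $\large I = 0$ , we have $\large U = A^{p} + A^{g}$ , and because $\large A^{g}$ is known and fixed, $\large A^{p}$ has to be maximized. In other words, $\large A^{c}$ is minimized while $\large A^{p}$ is maximized, which drives $\large B^{p}$ toward overlapping $\large B^{g}$ .

(2). Because GIOU can still be optimized when $\large GIOU \leq 0$ , it keeps the original properties of IOU while mitigating its weakness. The paper therefore argues that GIOU can be used as a replacement for IOU.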
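
The behavior described in this section can be checked with a small numerical sketch. The helper `iou_and_giou` and the boxes are assumptions made for illustration: a predicted box of the same size as the ground truth slides toward it without ever overlapping, and both losses are recorded.

```python
def iou_and_giou(pred, gt):
    """Return (IOU, GIOU) for two (x1, y1, x2, y2) boxes with ordered corners."""
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    inter_w = max(min(pred[2], gt[2]) - max(pred[0], gt[0]), 0.0)
    inter_h = max(min(pred[3], gt[3]) - max(pred[1], gt[1]), 0.0)
    inter = inter_w * inter_h
    union = area_p + area_g - inter
    # Area of the smallest box enclosing both inputs
    area_c = ((max(pred[2], gt[2]) - min(pred[0], gt[0]))
              * (max(pred[3], gt[3]) - min(pred[1], gt[1])))
    iou = inter / union
    return iou, iou - (area_c - union) / area_c


gt = (0, 0, 2, 2)
losses = []
for shift in (10, 6, 3):                # the predicted box slides toward gt
    pred = (shift, 0, shift + 2, 2)     # same size as gt, never overlapping it
    iou, giou = iou_and_giou(pred, gt)
    losses.append((1 - iou, 1 - giou))
    print(shift, 1 - iou, 1 - giou)
```

The IOU loss stays at 1 for all three positions, while the GIOU loss shrinks as the enclosing area $\large A^{c}$ shrinks, in line with $\large L_{GIOU} = 2 - \frac{U}{A^{c}}$ for disjoint boxes.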

4. Python code for GIOU:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/02/01 16:04
# @Author   : WanDaoYi
# @FileName : bbox_giou.py
# ============================================

import tensorflow as tf


def bbox_giou(boxes1, boxes2):
    # Convert (center x, center y, width, height) to top-left and bottom-right corners
    boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                        boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                        boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)

    # Order the corners so the first pair is the top-left and the second the bottom-right
    boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]),
                        tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]),
                        tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)

    # Area of each box
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    # Top-left and bottom-right corners of the intersection
    left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])

    # Width and height of the intersection (0 if the boxes do not overlap)
    inter_section = tf.maximum(right_down - left_up, 0.0)
    # Intersection area
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    # Union area
    union_area = boxes1_area + boxes2_area - inter_area
    # IOU
    iou = inter_area / union_area

    # Top-left corner of the smallest enclosing box
    enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
    # Bottom-right corner of the smallest enclosing box
    enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
    # Width and height of the smallest enclosing box
    enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
    # Area of the smallest enclosing box
    enclose_area = enclose[..., 0] * enclose[..., 1]
    # GIOU
    giou = iou - 1.0 * (enclose_area - union_area) / enclose_area

    return giou