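Face Detection: BlazeFace

Paper: BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

MNN-github: https://github.com/xindongzhang/MNN-APPLICATIONS

From Google.

The paper proposes a faster-than-real-time face detection framework for mobile devices (face detection plus keypoint detection). It is built on modified versions of MobileNetV1/V2 and SSD and reaches 200-1000 FPS on mobile GPUs. With the MNN framework, inference on the CPU of an RK3399 board takes about 10 ms.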

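Main contributions:

A very compact feature extraction backbone derived from MobileNetV1/V2, lightweight and designed specifically for mobile devices.
A GPU-friendly anchor scheme modified from SSD.
An improved tie resolution strategy that replaces NMS and gives more stable, smoother results among overlapping predictions.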

Network architecture design:

BlazeFace predicts a bounding box plus 6 keypoints: eye centers, ear tragions, mouth center, and nose tip.

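1. Enlarging the receptive field sizes

In a depthwise separable convolution over an s*s*c input tensor with kernel size k and d output channels, the depthwise convolution costs s*s*c*k*k multiply-adds and the pointwise convolution costs s*s*c*d. The ratio between the two is k*k : d, so the cost of a depthwise separable convolution is dominated by d. In most cases k=3, while d is large, with values such as 24, 32, 64, 96, 160, 320, and 1280. The 1*1 convolution therefore costs more than the 3*3 depthwise convolution.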

The paper reports: "a 3×3 depthwise convolution in 16-bit floating point arithmetic takes 0.07 ms for a 56×56×128 tensor, while the subsequent 1×1 convolution from 128 to 128 channels is 4.3× slower at 0.3 ms"

Replacing the 3*3 depthwise kernel with a 5*5 kernel adds little overhead but enlarges the receptive field.
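
To make the cost argument concrete, here is a minimal sketch that evaluates the two multiply-add formulas above. The function names `depthwiseMacs` and `pointwiseMacs` are hypothetical, and the counts ignore bias terms and memory traffic, so they only approximate measured latency.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-adds of a k x k depthwise convolution over an s x s x c tensor
// (the s*s*c*k*k term from the text).
std::int64_t depthwiseMacs(std::int64_t s, std::int64_t c, std::int64_t k) {
    assert(s > 0 && c > 0 && k > 0);
    return s * s * c * k * k;
}

// Multiply-adds of a 1 x 1 pointwise convolution from c to d channels
// over the same s x s spatial grid (the s*s*c*d term from the text).
std::int64_t pointwiseMacs(std::int64_t s, std::int64_t c, std::int64_t d) {
    assert(s > 0 && c > 0 && d > 0);
    return s * s * c * d;
}
```

For the 56×56×128 tensor quoted from the paper, `depthwiseMacs(56, 128, 3)` gives 3,612,672, `depthwiseMacs(56, 128, 5)` gives 10,035,200, and `pointwiseMacs(56, 128, 128)` gives 51,380,224. Even the 5*5 depthwise kernel needs only about a fifth of the multiply-adds of the pointwise convolution that follows it.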

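Starting from the original MobileNet residual structure, the 3*3 convolution is replaced with a 5*5 convolution to enlarge the receptive field, which yields the BlazeBlock module. Stacking two BlazeBlock modules yields the double BlazeBlock module.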
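
The effect of the larger kernel on the receptive field can be sketched with the standard formula for stacked stride-1 convolutions, where each k*k layer adds k-1 pixels. This is an illustration under that stride-1 assumption, not code from the paper, and `stackedReceptiveField` is a hypothetical helper name.

```cpp
#include <cassert>

// Receptive field (in input pixels, per axis) of `layers` stacked
// stride-1 convolutions that all use the same kernel size.
// Each layer widens the field by (kernel - 1).
int stackedReceptiveField(int layers, int kernel) {
    assert(layers >= 0 && kernel >= 1);
    return 1 + layers * (kernel - 1);
}
```

Under this formula a single 5*5 layer covers 5 pixels, the same as two stacked 3*3 layers, and the two 5*5 layers of a double BlazeBlock cover 9 pixels.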

2. Feature extractor

The input image is 128*128*3 and the final feature map is 8*8. The network contains 5 BlazeBlock modules and 6 double BlazeBlock modules.

3. Anchor scheme

Where SSD places anchors at 6 scales, BlazeFace uses only 2 scales, and the aspect ratio is fixed at 1.
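
As a rough illustration of a two-scale scheme, the sketch below lays out anchor centers on two square grids. The grid sizes and per-cell counts (2 anchors per cell on a 16*16 map, 6 per cell on the 8*8 map) are assumptions taken from my reading of the paper's figure and are not stated in this article. The names `Anchor`, `makeAnchors`, and `makeBlazeFaceAnchors` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized anchor center in [0, 1]. Width and height are omitted
// because the aspect ratio is fixed at 1.
struct Anchor {
    float cx;
    float cy;
};

// Places `anchorsPerCell` anchors at the center of every cell of a
// gridSize x gridSize feature map.
std::vector<Anchor> makeAnchors(int gridSize, int anchorsPerCell) {
    assert(gridSize > 0 && anchorsPerCell > 0);
    std::vector<Anchor> anchors;
    anchors.reserve(static_cast<std::size_t>(gridSize) * gridSize * anchorsPerCell);
    for (int y = 0; y < gridSize; ++y) {
        for (int x = 0; x < gridSize; ++x) {
            const float cx = (x + 0.5f) / gridSize;
            const float cy = (y + 0.5f) / gridSize;
            for (int a = 0; a < anchorsPerCell; ++a) {
                anchors.push_back(Anchor{cx, cy});
            }
        }
    }
    return anchors;
}

// Assumed layout: 2 anchors per cell at 16x16, 6 per cell at 8x8.
std::vector<Anchor> makeBlazeFaceAnchors() {
    std::vector<Anchor> all = makeAnchors(16, 2);
    const std::vector<Anchor> coarse = makeAnchors(8, 6);
    all.insert(all.end(), coarse.begin(), coarse.end());
    return all;
}
```

With these assumed counts the layout produces 16*16*2 + 8*8*6 = 896 anchors in total.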

4. Post-processing

To reduce the jitter seen in photos and video captured on phones, post-processing uses blending NMS, which can improve accuracy by 10%.

Blending NMS code:

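#include <algorithm>  // std::sort
#include <cmath>      // exp
#include <cstdio>     // printf
#include <cstdlib>    // exit
#include <cstring>    // memset
#include <vector>

// NMS modes and IoU threshold used by nms() below. The names come from the
// code itself; the numeric values are assumptions, since the snippet omits them.
enum { hard_nms = 1, blending_nms = 2 };
const float iou_threshold = 0.3f;

// Axis-aligned box from (x1, y1) to (x2, y2) with its confidence score.
typedef struct ObjectInfo {
    float x1;
    float y1;
    float x2;
    float y2;
    float score;
} ObjectInfo;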

void nms(std::vector<ObjectInfo> &input, std::vector<ObjectInfo> &output, int type) {
    std::sort(input.begin(), input.end(), [](const ObjectInfo &a, const ObjectInfo &b) { return a.score > b.score; });

    int box_num = input.size();

    std::vector<int> merged(box_num, 0);

    for (int i = 0; i < box_num; i++) {
        if (merged[i])
            continue;
        std::vector<ObjectInfo> buf;

        buf.push_back(input[i]);
        merged[i] = 1;

        float h0 = input[i].y2 - input[i].y1 + 1;
        float w0 = input[i].x2 - input[i].x1 + 1;

        float area0 = h0 * w0;

        for (int j = i + 1; j < box_num; j++) {
            if (merged[j])
                continue;

            float inner_x0 = input[i].x1 > input[j].x1 ? input[i].x1 : input[j].x1;
            float inner_y0 = input[i].y1 > input[j].y1 ? input[i].y1 : input[j].y1;

            float inner_x1 = input[i].x2 < input[j].x2 ? input[i].x2 : input[j].x2;
            float inner_y1 = input[i].y2 < input[j].y2 ? input[i].y2 : input[j].y2;

            float inner_h = inner_y1 - inner_y0 + 1;
            float inner_w = inner_x1 - inner_x0 + 1;

            if (inner_h <= 0 || inner_w <= 0)
                continue;

            float inner_area = inner_h * inner_w;

            float h1 = input[j].y2 - input[j].y1 + 1;
            float w1 = input[j].x2 - input[j].x1 + 1;

            float area1 = h1 * w1;

            float score;

            score = inner_area / (area0 + area1 - inner_area);

            if (score > iou_threshold) {
                merged[j] = 1;
                buf.push_back(input[j]);
            }
        }
        switch (type) {
            case hard_nms: {
                output.push_back(buf[0]);
                break;
            }
            case blending_nms: {
                // Weight every box in the cluster by exp(score), normalized to sum to 1.
                float total = 0;
                for (size_t k = 0; k < buf.size(); k++) {
                    total += exp(buf[k].score);
                }
                ObjectInfo rects;
                memset(&rects, 0, sizeof(rects));
                // The merged box and score are the weighted average of the cluster.
                for (size_t k = 0; k < buf.size(); k++) {
                    float rate = exp(buf[k].score) / total;
                    rects.x1 += buf[k].x1 * rate;
                    rects.y1 += buf[k].y1 * rate;
                    rects.x2 += buf[k].x2 * rate;
                    rects.y2 += buf[k].y2 * rate;
                    rects.score += buf[k].score * rate;
                }
                output.push_back(rects);
                break;
            }
            default: {
                printf("wrong type of nms.");
                exit(-1);
            }
        }
    }
}
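
The blending step can also be read in isolation. The following self-contained sketch repeats only the weighted-average part for one cluster of overlapping boxes, using the same exp(score) weights as the code above. `Box` and `blendCluster` are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Axis-aligned box with a confidence score (mirrors ObjectInfo above).
struct Box {
    float x1;
    float y1;
    float x2;
    float y2;
    float score;
};

// Merges one cluster of overlapping boxes into a single box.
// Each box contributes with weight exp(score) / sum(exp(score)).
Box blendCluster(const std::vector<Box>& cluster) {
    assert(!cluster.empty());
    float total = 0.0f;
    for (const Box& b : cluster) {
        total += std::exp(b.score);
    }
    Box out{0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    for (const Box& b : cluster) {
        const float w = std::exp(b.score) / total;
        out.x1 += b.x1 * w;
        out.y1 += b.y1 * w;
        out.x2 += b.x2 * w;
        out.y2 += b.y2 * w;
        out.score += b.score * w;
    }
    return out;
}
```

Two boxes with equal scores get equal weights, so the result is their midpoint. A higher-scoring box pulls the merged box toward itself, which is what smooths the output between frames compared with keeping only the top box.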

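Experimental results:

Summary:

Targets frontal face detection on phones and performs detection and keypoint localization together.
Extremely fast.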

watersink
