ICLR 2023 | DamoFD, a new open source framework for face detection

01

paper

This article introduces the paper "DamoFD: Digging into Backbone Design on Face Detection" from the Open Vision Team of DAMO Academy, which was accepted at ICLR 2023, a top international conference on machine learning.

Paper link

https://openreview.net/pdf?id=NkJOhtNKX91

Open source code https://github.com/ly19965/EasyFace/tree/master/face_project/face_detection/DamoFD

02

background

Face Detection

A face detection algorithm locates the faces in a picture or video sequence and returns their coordinates, generally as rectangular boxes. It is the foundation for downstream face modules such as keypoints, attributes, editing, stylization, and recognition.

The benchmark used by the academic community to measure the performance of face detectors is [WiderFace](http://shuoyang1213.me/WIDERFACE/WiderFace_Results.html). This dataset highlights several challenges that face detectors encounter, including scale, pose, and occlusion.

The research question of this paper is how to automatically search for the backbone of a lightweight face detector.

b8529dda409d4d7816b8ea4bd0048852.png

Picture from Wider Face official website

Development History of Lightweight Face Detector
    • Manual Lightweight Face Detector Design: Early lightweight face detectors (FaceBoxes and BlazeFace) all adopted the single-stage object detector structure (SSD) and replaced its backbone with their own hand-designed modules (e.g., FaceBoxes introduced CReLU, and BlazeFace introduced depthwise convolution). The common disadvantage of these methods is that they cannot automatically adjust the face detector structure as the compute budget changes, which limits their application scenarios.

00c74595297bf63fa4b96f5b37a2f8d5.png

  Image from the FaceBoxes paper

    • NAS-Based Lightweight Face Detector: With the rise of Neural Architecture Search (NAS), researchers began to use NAS to automatically design the structure of face detectors, e.g., SPNas in BFBox, DARTS in ASFD, and RegNet in SCRFD. SCRFD draws on the idea of RegNet to determine the search space of the detector, and produces a lightweight face detector with SOTA performance. The figure below shows the optimal distribution of computation over the backbone obtained by SCRFD.

173a7152744f8694eec577c52283d4f8.png

Image from SCRFD paper

03

method

Motivation

Current NAS methods mainly consist of two modules: a network generator and an accuracy predictor. The network generator produces candidate backbone structures, and the accuracy predictor predicts the accuracy of each sampled backbone structure.

Detection and classification have different task goals: detection relies on the stage-level representations of the backbone (c2-c5), while classification relies mainly on the high-level representation (c5). As a result, an accuracy predictor designed for classification is good at predicting high-level representation ability but cannot predict stage-level representation ability.

Therefore, for the face detection task, we need an accuracy predictor that can predict stage-level representation ability, so that the search finds a face-detection-friendly backbone.

Preliminaries

First we introduce the background knowledge related to our method:

ReLU CNN 2d488b31ec433a9c7efcd65c2bbcc309.png

aa3fb64348cdae2044a144d55809808a.png

    Linear Region of 16e322b69718525411cdd4ffef9c4b55.png

    • Region: When the hyperplane is used to divide the space, the connected component that is divided is called a region

    • Linear Region: a maximal connected subset of the input space on which a piecewise linear function is linear. Such subsets arise when hyperplanes (wx+b=0) divide the space.

    • Linear Region of a ReLU CNN at given parameters: with the network parameters fixed, a region of the input space cut out by the hyperplanes of all neurons.

635d3b09dd58a3877507b5b41cee5f95.png

      • Number of Linear Regions at given parameters: e34e81cef35e76b6c9f8716dd5a66f24.png

      • Maximal Number of Linear Regions: c7db2bc5df0662483c8fb686d6bfb829.png => used to describe network expressivity.

7834a64553794172bef430f3cdcf86f2.png
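
The definitions above can be made concrete with a small sketch. It uses a single hidden layer of three ReLU neurons on a 2D input instead of a CNN, and it counts regions by collecting the distinct on/off activation patterns over a grid. The function names, the toy weights, and the grid are assumptions made for illustration; they are not taken from the paper or the DamoFD code. The count is compared against the classical bound for m hyperplanes in n dimensions, which is the sum of C(m, i) for i from 0 to n.

```python
from itertools import product
from math import comb


def activation_pattern(weights, biases, x):
    """Return the on/off state of every ReLU neuron at input x."""
    return tuple(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b > 0
        for w, b in zip(weights, biases)
    )


def count_regions_on_grid(weights, biases, grid):
    """Count distinct activation patterns seen on the grid.

    Each pattern corresponds to one region cut out by the hyperplanes
    w . x + b = 0, so this is a lower bound on the number of linear
    regions: regions that contain no grid point are not counted.
    """
    return len({activation_pattern(weights, biases, x) for x in grid})


def hyperplane_region_bound(num_neurons, input_dim):
    """Maximal number of regions for hyperplanes in general position."""
    return sum(comb(num_neurons, i) for i in range(input_dim + 1))


# Toy layer (assumed values): the lines x = 0, y = 0 and x + y = 1.
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases = [0.0, 0.0, -1.0]

# Dense grid over [-3, 3] x [-3, 3] with step 0.05.
axis = [-3.0 + 0.05 * i for i in range(121)]
grid = list(product(axis, axis))

observed = count_regions_on_grid(weights, biases, grid)
bound = hyperplane_region_bound(num_neurons=3, input_dim=2)
print(observed, bound)
```

Three lines in general position split the plane into seven regions, so the grid count reaches the bound in this toy case. For deeper networks the count at given parameters usually falls below the maximum, which is why the maximal number is used to describe expressivity.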

    Two theorems related to linear region

07794549f79b9e07242ed6a85acfa629.png

85577396351f903cc6e30638d69f6242.png


Method

To design an accuracy predictor that can predict stage-level representation ability, we propose the SAR-score, which describes stage-wise network expressivity in an unbiased way. We then use the empirical distribution of the dataset ground truth (gt) as prior knowledge to determine the importance of the different stages, and further propose the DDSAR-score to describe the accuracy of a detection backbone.


Adopt Theorem 2 to characterize stage-level network expressivity

aec8dbd3cd449754d10a80d30367b68b.png


Two issues occur

34b3344bc0c37b760bea3ebcd317a5b1.png

410eaccf1381eb9ac4beb14520a46756.png

44737987e7c1581276a9b332240c4a1b.png

Stage-aware Expressivity Score

f8d4983de6d6808fce24261c5c23cd4e.png

  Design Guidelines:

6a64ce3e295b90fa04111a8337363b8d.png

721281d019a26f3004e4e9d8c1884bbb.png

Filter Sensitivity Score

9ed5c520f882a46a2d590553e29219ee.png

Replace f5b5550c394b9916aa6e4233301b2150.png with 507173f9d46ed039406dc7c90239fdb7.png to describe the sensitivity of different convolutional layers to the filter size.

88e0dcb9e88b69e66f8dd8736195c095.png


SAR-Score and DDSAR-Score

677cff960d6d2e1bf73ec450c4afae84.png

The importance of each stage is further determined based on the distribution of the training set GT

1ea4d201cb3f5f1a7f71c9213ee203e1.png
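
As a rough illustration of weighting stages by the ground-truth distribution, the sketch below assigns each face to a backbone stage by its size, normalizes the counts into weights, and combines per-stage scores with those weights. The size thresholds, the function names, the example scores, and the weighted-sum form are all assumptions made for this example; the paper's exact formulation is the one shown in the figures and is not reproduced here.

```python
from collections import Counter

# Hypothetical rule: a face goes to the first stage whose size limit
# (in pixels) it falls under. These thresholds are illustrative only.
STAGE_SIZE_LIMITS = [("c2", 16), ("c3", 32), ("c4", 64), ("c5", float("inf"))]


def stage_for_face(size):
    """Return the stage name assigned to a face of the given size."""
    for stage, limit in STAGE_SIZE_LIMITS:
        if size < limit:
            return stage


def stage_weights(face_sizes):
    """Turn the empirical size distribution into per-stage weights."""
    counts = Counter(stage_for_face(size) for size in face_sizes)
    total = sum(counts.values())
    return {stage: counts.get(stage, 0) / total for stage, _ in STAGE_SIZE_LIMITS}


def weighted_score(stage_scores, weights):
    """Combine per-stage scores using the distribution-based weights."""
    return sum(weights[stage] * stage_scores[stage] for stage in weights)


# Toy ground-truth face sizes and toy per-stage expressivity scores.
face_sizes = [10, 12, 14, 8, 20, 24, 40, 100]
stage_scores = {"c2": 1.0, "c3": 2.0, "c4": 4.0, "c5": 8.0}

weights = stage_weights(face_sizes)
combined = weighted_score(stage_scores, weights)
print(weights, combined)
```

With this toy data half of the faces are small, so the c2 score receives half of the total weight. A dataset dominated by small faces, as WiderFace is often described, would push a search toward backbones that are expressive in the early stages.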

Search Space and Evolutionary Architecture Search

b758f5832b015d690ef0a80655973013.png

6ee58b63981702c77d6c0c29beed8749.png
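
A minimal sketch of an evolutionary search under a compute budget is given below. It is a generic mutate-and-replace loop, not the paper's search procedure: the search space (per-stage depth and width choices), the cost proxy, and the `toy_score` function that stands in for a training-free predictor such as the DDSAR-score are all hypothetical.

```python
import math
import random

# Hypothetical search space: each of 4 stages picks a depth and a width.
DEPTH_CHOICES = [1, 2, 3, 4]
WIDTH_CHOICES = [16, 32, 64, 128]
NUM_STAGES = 4


def random_arch(rng):
    """Sample one architecture as a tuple of (depth, width) per stage."""
    return tuple(
        (rng.choice(DEPTH_CHOICES), rng.choice(WIDTH_CHOICES))
        for _ in range(NUM_STAGES)
    )


def cost(arch):
    """Toy compute proxy: depth * width^2 summed over stages."""
    return sum(depth * width * width for depth, width in arch)


def toy_score(arch):
    """Placeholder for a training-free accuracy predictor (assumed form)."""
    return sum(depth * math.log(width) for depth, width in arch)


def mutate(arch, rng):
    """Resample the depth and width of one randomly chosen stage."""
    stages = list(arch)
    index = rng.randrange(NUM_STAGES)
    stages[index] = (rng.choice(DEPTH_CHOICES), rng.choice(WIDTH_CHOICES))
    return tuple(stages)


def evolutionary_search(budget, population_size=16, iterations=200, seed=0):
    """Keep a fixed-size population of architectures within the budget."""
    rng = random.Random(seed)
    population = []
    while len(population) < population_size:
        arch = random_arch(rng)
        if cost(arch) <= budget:
            population.append(arch)
    for _ in range(iterations):
        child = mutate(rng.choice(population), rng)
        if cost(child) > budget:
            continue  # discard children that exceed the compute budget
        population.append(child)
        population.remove(min(population, key=toy_score))  # drop the worst
    return max(population, key=toy_score)


best = evolutionary_search(budget=20000)
print(best, cost(best), toy_score(best))
```

Because the predictor needs no training, every candidate is scored in negligible time, so the loop can evaluate many architectures per budget. Changing `budget` reruns the same search for a different compute constraint, which is the flexibility that hand-designed backbones lack.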

04

result

38724a60e866d614102630fbda2828e4.png

05

Outlook and application

Reduce sensitivity to hyperparameters

During the experiments, we found that DDSAR is not very sensitive to hyperparameters when searching for lightweight (500M FLOPs) detection structures, and quickly finds a good structure. When searching for detection structures under 2.5G, 10G and 34G FLOPs, however, it is sensitive to hyperparameters, and \alpha and the search space need to be adjusted. A possible reason is that, to speed up the calculation, our filter sensitivity score only approximately reflects the sensitivity to the filter size. The brute-force enumeration process mentioned above could be optimized from other angles in future work.

Enhance the generalizability of the method on different detection tasks

Our DDSAR-score describes the expressiveness of the detector, so in theory it should work well on different detection tasks. We currently only consider the distribution of the dataset gt, but detection datasets also differ in data quality and dataset size. Following the data-centric idea, we can further model the relationship between the accuracy predictor and dimensions such as dataset quality and data augmentation, so that the method is effective across different detection tasks.

More accurate calculation of the number of network linear regions

Many papers in the ML field describe the expressiveness of a network through its number of linear regions. A tighter bound, or the exact number of linear regions, is worth trying.


Origin blog.csdn.net/amusi1994/article/details/130120310