01
Paper
This article introduces the paper "DamoFD: Digging into Backbone Design on Face Detection" from the Open Vision Team of DAMO Academy, which was accepted at ICLR 2023, a top international conference on machine learning.
Paper link
https://openreview.net/pdf?id=NkJOhtNKX91
Open source code https://github.com/ly19965/EasyFace/tree/master/face_project/face_detection/DamoFD
02
Background
Face Detection
Face detection locates faces in an image or video sequence and returns their coordinates, usually as rectangular bounding boxes. It is the foundation for downstream face modules such as keypoint detection, attribute analysis, editing, stylization, and recognition.
The benchmark the academic community uses to measure face detector performance is [WiderFace](http://shuoyang1213.me/WIDERFACE/WiderFace_Results.html). The dataset highlights several challenges for face detectors, including scale, pose, and occlusion.
The research question of this paper is: how can the backbone of a lightweight face detector be searched automatically?
Image from the WiderFace official website
Development History of Lightweight Face Detectors
- Manually designed lightweight face detectors: Early lightweight face detectors (FaceBoxes and BlazeFace) adopted the single-stage SSD object detector structure and replaced its backbone with hand-designed modules (e.g., FaceBoxes introduced CReLU, and BlazeFace introduced depthwise convolution). Their common drawback is that the detector structure cannot be adjusted automatically as the compute budget changes, which limits their application scenarios.
Image from the FaceBoxes paper
- NAS-based lightweight face detectors: With the rise of Neural Architecture Search (NAS), researchers began to use NAS to design face detector structures automatically, e.g., SPNas in BFBox, DARTS in ASFD, and RegNet in SCRFD. SCRFD draws on the idea of RegNet to determine the detector's search space and produces a lightweight face detector with state-of-the-art performance. The figure below shows the optimal distribution of computation across the backbone obtained by SCRFD.
Image from the SCRFD paper
03
Method
Motivation
Current NAS methods mainly consist of two modules: a network generator and an accuracy predictor. The network generator produces candidate backbone structures, and the accuracy predictor predicts the accuracy of each sampled backbone.
Detection and classification have different task goals: detection depends on the backbone's stage-level representations (c2-c5), while classification depends mainly on the high-level representation (c5). As a result, accuracy predictors developed for classification are good at predicting high-level representational ability but cannot predict stage-level representational ability.
Therefore, face detection needs an accuracy predictor that can predict stage-level representational ability, so that the search finds backbones that are friendly to face detection.
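The generator/predictor split described above can be sketched in a few lines. This is only a minimal illustration: the search space (`WIDTH_CHOICES`, four stages) and the scoring function `predict_accuracy` are hypothetical placeholders, not the search space or predictor used in DamoFD.

```python
import random

# Hypothetical search space: one channel width per backbone stage (c2-c5).
WIDTH_CHOICES = [16, 32, 64, 128]
NUM_STAGES = 4

def generate_backbone(rng):
    """Network generator: sample one candidate as a tuple of per-stage widths."""
    return tuple(rng.choice(WIDTH_CHOICES) for _ in range(NUM_STAGES))

def predict_accuracy(backbone):
    """Placeholder accuracy predictor (an assumption, not DDSAR-score):
    rewards width in every stage, with diminishing returns."""
    return sum(width ** 0.5 for width in backbone)

def search(num_candidates, seed=0):
    """Sample candidates from the generator and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [generate_backbone(rng) for _ in range(num_candidates)]
    return max(candidates, key=predict_accuracy)

best = search(50)
print(best, round(predict_accuracy(best), 2))
```

The point of the split is that the predictor can be swapped without touching the generator; a stage-aware predictor would replace `predict_accuracy` only.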
Preliminaries
First we introduce the background knowledge related to our method:
ReLU CNN
Linear regions of a ReLU CNN
- Region: when hyperplanes divide a space, each resulting connected component is called a region.
- Linear region: a linear region of a piecewise linear function is a maximal connected subset of the input space on which the function is linear; for a ReLU network, these are the regions cut out by the hyperplanes (wx+b=0) of its units.
- Linear regions of a ReLU CNN at given parameters: with the network parameters fixed, the regions into which all of the network's hyperplanes divide the input space.
- Number of linear regions at given parameters: the count of these regions for fixed parameters.
- Maximal number of linear regions: the maximum of that count over all parameter values, used to describe network expressivity.
- Two theorems related to linear regions
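As a rough illustration of counting linear regions, the sketch below uses a one-hidden-layer fully connected ReLU network on 2D inputs. This is a simplification chosen for readability (it is not a CNN and does not reproduce the paper's theorems), and the weights, sampling box, and function names are all assumptions. Each unit contributes one hyperplane, each region corresponds to one activation pattern, and the count is estimated by sampling inputs and collecting the distinct patterns. The helper `max_regions` evaluates the classical upper bound for n hyperplanes in d dimensions, the sum of C(n, k) for k from 0 to d.

```python
import random
from math import comb

def activation_pattern(weights, biases, x):
    """Which ReLU units are active (pre-activation > 0) at input x."""
    return tuple(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b > 0
        for w, b in zip(weights, biases)
    )

def count_regions_by_sampling(weights, biases, num_samples=20000, box=3.0, seed=0):
    """Estimate the number of regions inside [-box, box]^d by counting
    the distinct activation patterns seen over random inputs."""
    rng = random.Random(seed)
    dim = len(weights[0])
    patterns = set()
    for _ in range(num_samples):
        x = [rng.uniform(-box, box) for _ in range(dim)]
        patterns.add(activation_pattern(weights, biases, x))
    return len(patterns)

def max_regions(num_units, input_dim):
    """Upper bound on regions cut by num_units hyperplanes in input_dim dimensions."""
    return sum(comb(num_units, k) for k in range(input_dim + 1))

# Three units whose hyperplanes are the lines x=0, y=0 and x+y=1.
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases = [0.0, 0.0, -1.0]
print(count_regions_by_sampling(weights, biases))  # 7
print(max_regions(3, 2))                           # 7
```

Here the three lines are in general position, so the fixed-parameter count reaches the maximal count. Sampling only gives a lower bound in general, since very small regions can be missed.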
Method
To design an accuracy predictor that can predict stage-level representational ability, we start from network expressivity and propose SAR-score, which describes stage-wise network expressivity without bias. We then use the empirical distribution of ground-truth (GT) boxes in the dataset as prior knowledge to determine the importance of each stage, and further propose DDSAR-score to describe the accuracy of a detection backbone.
Adopt Theorem 2 to characterize stage-level network expressivity
Two issues occur
Stage-aware Expressivity Score
Design Guidelines:
Filter Sensitivity Score
A filter sensitivity score is used in place of the original term to describe how sensitive each convolutional layer is to the filter size.
SAR-Score and DDSAR-Score
The importance of each stage is further determined from the distribution of ground-truth (GT) boxes in the training set.
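One way to picture this weighting is sketched below. The text here does not give the exact formula, so everything in the sketch is an assumption made for illustration: the stage names, the size thresholds in `STAGE_SIZE_LIMITS`, the rule that assigns a GT box to a stage by its longer side, and the weighted sum itself.

```python
from collections import Counter

# Hypothetical assignment: largest box side (in pixels) handled by each stage.
STAGE_SIZE_LIMITS = [("c3", 32), ("c4", 128), ("c5", float("inf"))]

def stage_for_box(width, height):
    """Assign a GT box to the first stage whose size limit covers it."""
    side = max(width, height)
    for stage, limit in STAGE_SIZE_LIMITS:
        if side <= limit:
            return stage

def stage_weights(boxes):
    """Fraction of GT boxes that fall into each stage."""
    counts = Counter(stage_for_box(w, h) for w, h in boxes)
    total = sum(counts.values())
    return {stage: counts[stage] / total for stage, _ in STAGE_SIZE_LIMITS}

def weighted_score(stage_scores, weights):
    """Combine per-stage expressivity scores using the GT-derived weights."""
    return sum(weights[stage] * stage_scores[stage] for stage in weights)

gt_boxes = [(10, 12), (20, 18), (30, 25), (60, 70), (200, 220)]
weights = stage_weights(gt_boxes)
print(weights)  # {'c3': 0.6, 'c4': 0.2, 'c5': 0.2}
print(weighted_score({"c3": 5.0, "c4": 3.0, "c5": 1.0}, weights))
```

With a dataset dominated by small faces, most of the weight lands on the shallow stage, so a backbone that is expressive there scores higher than one that concentrates capacity in c5.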
Search Space and Evolutionary Architecture Search
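A generic evolutionary search under a compute budget can be sketched as follows. This is not the DamoFD search procedure: the search space (`WIDTHS`, `NUM_STAGES`), the `cost` function standing in for FLOPs, and the `score` function standing in for a training-free score are all toy assumptions.

```python
import random

WIDTHS = [16, 32, 64, 128]  # hypothetical per-stage width choices
NUM_STAGES = 4

def cost(arch):
    """Toy stand-in for FLOPs: sum of squared stage widths."""
    return sum(w * w for w in arch)

def score(arch):
    """Toy stand-in for a training-free score of a backbone."""
    return sum(w ** 0.5 for w in arch)

def mutate(arch, rng):
    """Change the width of one randomly chosen stage."""
    child = list(arch)
    child[rng.randrange(NUM_STAGES)] = rng.choice(WIDTHS)
    return tuple(child)

def evolve(budget, population_size=16, steps=500, seed=0):
    """Keep the top-scoring architectures that fit the budget, mutating as we go."""
    smallest = tuple([WIDTHS[0]] * NUM_STAGES)
    if cost(smallest) > budget:
        raise ValueError("budget is below the cost of the smallest architecture")
    rng = random.Random(seed)
    population = [smallest]
    for _ in range(steps):
        child = mutate(rng.choice(population), rng)
        if cost(child) <= budget:  # discard candidates that exceed the budget
            population.append(child)
        # Deduplicate and keep only the best-scoring individuals.
        population = sorted(set(population), key=score, reverse=True)[:population_size]
    return population[0]

best = evolve(budget=10000)
print(best, cost(best), round(score(best), 2))
```

Changing `budget` is the only step needed to target a different compute level, which is the flexibility that hand-designed backbones lack.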
04
Results
05
Outlook and application
Reduce sensitivity to hyperparameters
During the experiments, we found that DDSAR-score is not very sensitive to hyperparameters when searching for lightweight (500M FLOPs) detection structures, and quickly yields a good structure. When searching for detection structures under 2.5G, 10G, and 34G FLOPs, however, it is sensitive to hyperparameters, and \alpha and the search space need to be adjusted. A possible reason is that, to speed up computation, our filter sensitivity score only approximately reflects the sensitivity to filter size; the brute-force enumeration process mentioned above could later be optimized from other angles.
Enhance the generalizability of the method on different detection tasks
Our DDSAR-score describes the expressiveness of the detector, so in theory it should work well across different detection tasks. We currently consider only the distribution of GT boxes in the dataset, but detection datasets also differ in data quality and size. Following the data-centric idea, we can further model the relationship between dimensions such as dataset quality and data augmentation and the accuracy predictor, so that the method is effective on different detection tasks.
More accurate calculation of the number of network linear regions
Describing network expressiveness through the number of linear regions has been the subject of many papers in the ML field, and a tighter bound or the exact number of linear regions could be tried.