Learning Data Augmentation Strategies for Object Detection (Translation)

      This method comes from Quoc Le's team at Google Brain. It learns data augmentation policies for training object detection models, and the learned policies can be transferred to new datasets. The approach gives good detection results even on small datasets. The paper and open-source code are linked below:

Paper:
https://arxiv.org/abs/1906.11172

Code:
https://github.com/tensorflow/tpu/tree/master/models/official/detection

Below is a brief introduction to the paper. I translated it for my own reading convenience; please point out any mistakes.


Abstract

Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost of annotating images for object detection, data augmentation may be even more important for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first show that augmentation operations borrowed from image classification can help train detection models, but the improvement is limited. We therefore investigate how learned, specialized data augmentation policies improve the generalization performance of detection models. Importantly, these policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset show that an optimized augmentation policy improves detection accuracy by +2.3 mAP and allows a single inference model to reach 50.7 mAP. Moreover, the best policy found on COCO can be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified on COCO improves a strong PASCAL VOC baseline by +2.7 mAP. Our results also show that a learned augmentation policy outperforms state-of-the-art architecture regularization methods, even when compared against strong baselines. Code for training with the learned policy is available online.

I. INTRODUCTION

Deep neural networks are powerful machine learning systems that work best when trained on large amounts of data. To increase the amount of training data available to neural networks, much work has been devoted to creating better data augmentation strategies [3, 42, 21]. In the image domain, common augmentations include translating the image by a few pixels or flipping it horizontally. Most modern image classifiers are paired with hand-crafted data augmentation strategies [21, 44, 16, 18, 56]. (See [21] for an overview.)

Recent work has shown that learning optimal policies from data, instead of manually designing augmentation strategies, can substantially improve the generalization performance of image classification models [22, 45, 8, 33, 31, 54, 2, 43, 37, 5]. For image classification, data can be augmented either by learning a generator that creates data from scratch [33, 31, 54, 2, 43] or by learning a set of transformations that are applied to existing training samples [5, 37]. For object detection models, data augmentation matters even more: labeled detection data is more expensive to collect, and common detection datasets contain far fewer examples than image classification datasets. However, it is unclear how the data should be augmented. Should we directly reuse augmentation policies from image classification? What should we do with the bounding boxes and with the contents of the bounding boxes?

In this work, we create a set of simple transformations that can be applied to object detection datasets, and then transfer these transformations to other detection datasets and architectures. The transformations are used only during training, not at test time. They fall into three groups: transformations applied to the whole image that do not affect the bounding box locations (e.g. color transformations borrowed from image classification models), transformations that affect the whole image and change the bounding box locations (e.g. translating or shearing the whole image), and transformations applied only to the objects inside the bounding boxes. As the number of transformations grows, combining them effectively by hand becomes non-trivial. We therefore search for policies designed specifically for object detection datasets. Experiments show that the method performs very well across different datasets, dataset sizes, backbone architectures and detection algorithms. In addition, we study how the performance of an augmentation policy depends on the number of operations included in the search space, and how the effectiveness of the augmentation technique varies with dataset size.

In short, our main contributions are as follows:

  • We design and implement a search method that combines and optimizes data augmentation policies for object detection, including novel operations specific to bounding box annotations.
  • We demonstrate consistent gains in cross-validated accuracy across a range of detection architectures and datasets. In particular, we exceed the state-of-the-art single-model result on COCO and achieve competitive results on PASCAL VOC object detection.
  • We highlight how the learned data augmentation strategies are particularly advantageous for small datasets, because they provide strong regularization that helps avoid overfitting on small objects.

II. RELATED WORK

Data augmentation strategies for vision models are often specific to a particular dataset or even a particular machine learning architecture. For example, state-of-the-art models trained on MNIST use elastic distortions that affect scale, translation and rotation [42, 4, 47, 40]. Random cropping and image mirroring are commonly used when training classification models on natural images [51, 21]. Among augmentation strategies for object detection, image mirroring and multi-scale training are the most widely used [15]. Object-centric cropping is another popular augmentation approach. Instead of cropping to focus on parts of the image, some methods randomly erase patches or add noise to them in order to improve accuracy [9, 53, 13], robustness [50, 12], or both [29]. In the same vein, [48] learns an occlusion pattern for each object to create adversarial examples. In addition to cropping and erasing, [10] adds new objects to training images by cut-and-paste.

To avoid the dataset-specific nature of data augmentation, recent work has focused on learning augmentation strategies directly from the data itself. For example, Smart Augmentation uses a network that generates new data by merging two or more samples from the same class [22]. Tran et al. use a Bayesian approach to generate augmented data based on the distribution learned from the training set [45]. DeVries and Taylor apply simple transformations such as noise, interpolation and extrapolation in a learned feature space to augment data [8]. Ratner et al. use generative adversarial networks to generate sequences of data augmentation operations [37]. More recently, several papers have used the AutoAugment search space with improved optimization algorithms to find AutoAugment policies more efficiently.

While all of the above approaches have worked for classification, we take an automated approach to finding optimal data augmentation policies for object detection. Unlike classification, labeled data for object detection is scarce because annotating detection data is more costly. Developing an augmentation strategy is also harder for object detection than for image classification, because distorting the image interacts in more complex ways with the bounding box locations and the sizes of the objects in detection datasets. Our goal is to use validation set accuracy to guide the search for new detection augmentation procedures, built from custom operations, that generalize across datasets, dataset sizes, backbone architectures and detection algorithms.

III. METHOD

We treat data augmentation search as a discrete optimization problem and optimize for generalization performance. This work extends previous work [5] to focus on augmentation policies for object detection. Object detection introduces the additional complication of keeping the bounding box locations consistent with the distorted image. Bounding box annotations also open up the possibility of augmentation operations that act only on the contents of each bounding box. In addition, we explore how the bounding box locations should change when geometric transformations are applied to the image.

We define an augmentation policy as an unordered set of K sub-policies. During training, one of the K sub-policies is selected at random and applied to the current image. Each sub-policy consists of N image transformations that are applied in sequence to the same image. Following [5], we turn the search for a learned augmentation policy into a discrete optimization problem by creating a search space. The search space consists of K = 5 sub-policies, each made up of N = 2 operations applied sequentially to a single image. Each operation is additionally associated with two hyperparameters: the probability of applying the operation and the magnitude of the operation. Figure 2 shows 5 of the learned sub-policies. The probability parameter makes the augmentation policy stochastic: the selected operation is applied to the image only with the specified probability.
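The policy structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `POLICY` contents, the `ops` lookup table and the `apply_policy` helper are hypothetical names, and the values shown are placeholders rather than the policy learned in the paper.

```python
import random

# Hypothetical policy: K sub-policies, each a list of N = 2 steps.
# Each step is (operation name, probability, magnitude on a 0-10 scale).
# The names and values are placeholders, not the learned policy.
POLICY = [
    [("Rotate", 0.6, 4), ("Equalize", 0.8, 10)],
    [("TranslateX", 0.2, 6), ("Brightness", 1.0, 2)],
]


def apply_policy(image, boxes, policy, ops, rng=random):
    """Pick one sub-policy uniformly at random and apply its steps in order.

    `ops` maps an operation name to a function
    (image, boxes, magnitude) -> (image, boxes).
    """
    sub_policy = rng.choice(policy)
    for name, prob, magnitude in sub_policy:
        # Each step is applied only with its own probability.
        if rng.random() < prob:
            image, boxes = ops[name](image, boxes, magnitude)
    return image, boxes
```

Because every operation returns both the image and the boxes, color operations can simply pass the boxes through unchanged, while geometric operations return updated boxes.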


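      Figure 2: Examples of learned augmentation sub-policies. Five examples of learned sub-policies applied to one example image. Each column corresponds to a different random sample of the corresponding sub-policy. Each step of an augmentation sub-policy consists of a triplet: the operation, the probability of applying it, and a magnitude measure. The bounding boxes are adjusted to stay consistent with the applied augmentation. Note that the probability and magnitude are discretized values (see text for details).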

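In several preliminary experiments, we identified 22 operations for the search space that appear beneficial for object detection. These operations were implemented in TensorFlow [1]. We briefly summarize them here and leave the details to the appendix: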

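  • Color operations. Distort the color channels without affecting the locations of the bounding boxes (e.g. equalize, contrast, brightness).
  • Geometric operations. Geometrically distort the image, which correspondingly changes the location and size of the bounding box annotations (e.g. rotation, shearing, translation).
  • Bounding box operations. Distort only the pixel content inside the bounding box annotations (e.g. box-only equalize, box-only rotate, box-only flip).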

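Note that for any operation that affects the geometry of the image, we likewise modify the size and location of the bounding boxes to maintain consistency.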
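As a rough illustration of the difference between a geometric operation and a bounding box operation, the sketch below shifts boxes together with a horizontal translation and applies a pixel function only inside one box. The function names, the nested-list image representation and the choice to drop boxes that leave the image are assumptions made for this example, not details taken from the paper.

```python
def translate_boxes_x(boxes, shift, image_width):
    """Shift (x_min, y_min, x_max, y_max) boxes horizontally and clip to the image.

    Assumption: boxes that end up with zero width are dropped.
    """
    moved = []
    for x_min, y_min, x_max, y_max in boxes:
        new_min = min(max(x_min + shift, 0), image_width)
        new_max = min(max(x_max + shift, 0), image_width)
        if new_max > new_min:
            moved.append((new_min, y_min, new_max, y_max))
    return moved


def bbox_only(image, box, pixel_op):
    """Apply pixel_op only to pixels inside box; the box itself is unchanged.

    `image` is a list of rows, each row a list of pixel values.
    """
    x_min, y_min, x_max, y_max = box
    out = [row[:] for row in image]  # copy so the input image is not modified
    for y in range(y_min, y_max):
        for x in range(x_min, x_max):
            out[y][x] = pixel_op(image[y][x])
    return out
```

A geometric operation would apply the same shift to the pixels and then call something like `translate_boxes_x` so that annotations and image stay aligned.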

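We associate a custom range of parameter values with each operation and map this range onto a standardized range from 0 to 10. We discretize the magnitude range into L uniformly spaced values so that these parameters are amenable to discrete optimization. Similarly, we discretize the probability of applying an operation into M uniformly spaced values. In preliminary experiments, we found that setting L = 6 and M = 6 with an RL algorithm gives a good balance between computational tractability and learning performance. Finding a good sub-policy thus becomes a search in a discrete space of cardinality (22LM)^2. In particular, searching over 5 sub-policies gives a search space of roughly (22 × 6 × 6)^(2×5) ≈ 9.6 × 10^28 possibilities, which requires an efficient search technique to navigate.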
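The size of this search space is easy to check numerically. The sketch below is only a sanity check; the helper `uniform_levels` is a hypothetical name, and including both endpoints in the discretization is an assumption.

```python
NUM_OPS = 22   # operations in the search space
L = 6          # discrete magnitude levels
M = 6          # discrete probability levels
N = 2          # operations per sub-policy
K = 5          # sub-policies per policy


def uniform_levels(low, high, count):
    """Return `count` uniformly spaced values covering [low, high]."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


magnitudes = uniform_levels(0, 10, L)        # levels on the 0-10 scale
probabilities = uniform_levels(0.0, 1.0, M)  # levels between 0 and 1

per_sub_policy = (NUM_OPS * L * M) ** N      # (22 * 6 * 6)^2
search_space = per_sub_policy ** K           # (22 * 6 * 6)^(2 * 5)
print(f"{search_space:.2e}")
```

Direct evaluation gives roughly 9.7 × 10^28, the same order of magnitude as the figure quoted in the text.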

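There are many methods for solving discrete optimization problems, including reinforcement learning [55], evolutionary methods [38] and sequential model-based optimization [26]. In this work, we structure the discrete optimization problem as the output space of an RNN and use reinforcement learning to update the weights of the model [55]. The training setup for the RNN is similar to [55, 56, 6, 5]. We use proximal policy optimization (PPO) [41] as the search algorithm. The RNN is unrolled for 30 steps to predict a single augmentation policy. The number of unrolled steps, 30, corresponds to the number of discrete predictions needed to enumerate 5 sub-policies. Each sub-policy consists of 2 operations, and each operation consists of 3 predictions: the selected image transformation, the probability of applying it, and the magnitude of the transformation.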
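The count of 30 follows from 5 sub-policies × 2 operations × 3 predictions. The sketch below shows one way a flat sequence of 30 controller outputs could be grouped into a policy. The function name and the ordering of the three predictions within each operation are assumptions for illustration.

```python
NUM_SUB_POLICIES = 5
OPS_PER_SUB_POLICY = 2
PREDICTIONS_PER_OP = 3  # operation index, probability level, magnitude level


def decode_policy(predictions):
    """Group a flat list of 30 discrete predictions into 5 sub-policies."""
    expected = NUM_SUB_POLICIES * OPS_PER_SUB_POLICY * PREDICTIONS_PER_OP
    if len(predictions) != expected:
        raise ValueError(f"expected {expected} predictions, got {len(predictions)}")
    it = iter(predictions)
    policy = []
    for _ in range(NUM_SUB_POLICIES):
        sub_policy = []
        for _ in range(OPS_PER_SUB_POLICY):
            # Assumed order: operation, then probability, then magnitude.
            op_index, prob_level, mag_level = next(it), next(it), next(it)
            sub_policy.append((op_index, prob_level, mag_level))
        policy.append(sub_policy)
    return policy
```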

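To train each child model, we selected 5K images from the COCO training set, because we found that searching directly on the full COCO dataset is prohibitively expensive. Policies identified on this subset generalize to the full dataset while providing significant computational savings. Briefly, we trained each child model from scratch on the 5K COCO images with a ResNet-50 backbone [16] and a RetinaNet detector [24], using cosine learning rate decay [30]. The reward signal for the controller is the mAP on a custom held-out validation set of 7392 images, created from a subset of the COCO training set.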

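The RNN controller was trained over 20K augmentation policies. The search used 400 TPUs [20] for 48 hours, with the same controller hyperparameters as [56]. The search could be sped up with recently developed, more efficient search methods based on population-based training [17] or density matching [23]. The learned policy is listed in the table in the appendix.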

IV. RESULTS

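We applied our automated augmentation method to the COCO dataset with a ResNet-50 [16] backbone and RetinaNet [24], in order to find good augmentation policies that generalize to other detection datasets. We take the top policy found on COCO and apply it to different datasets, dataset sizes and architecture configurations to examine how well it generalizes and how it performs in a limited-data regime.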

4.1 Learning a data augmentation policy

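Searching for a learned augmentation strategy on 5K COCO training images produced the final augmentation policy that is used in all of our results. On inspection, the most commonly used operation in good policies is Rotate, which rotates the whole image together with the bounding boxes. After rotation the bounding boxes become larger, so that they include all of the rotated object. Despite this side effect, Rotate appears to be very beneficial: it is the most frequently used operation in good policies. Two other commonly used operations are Equalize and BBox_Only_TranslateY. Equalize flattens the histogram of pixel values and does not modify the location or size of any bounding box. BBox_Only_TranslateY translates only the objects inside the bounding boxes vertically, up or down with equal probability.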
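The growth of a box under rotation can be illustrated with a short sketch. It assumes that the new box is the axis-aligned box enclosing the four rotated corners of the old one; the paper's actual implementation may differ in detail, and `rotate_box` is a hypothetical helper name.

```python
import math


def rotate_box(box, angle_degrees, center):
    """Rotate the corners of (x_min, y_min, x_max, y_max) about `center` and
    return the axis-aligned box that encloses all four rotated corners."""
    x_min, y_min, x_max, y_max = box
    cx, cy = center
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x, y in [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return (min(xs), min(ys), max(xs), max(ys))
```

For a 20 × 10 box rotated by 45 degrees about its own center, the enclosing box is about 21.2 × 21.2, so its area more than doubles even though the object itself has not changed size.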

4.2 Learned augmentation policies systematically improve object detection

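We assess the quality of the top augmentation policy on the COCO dataset [25] with different backbone architectures and detection algorithms. We start with the RetinaNet object detector, using the same training protocol as [13]. Briefly, we train from scratch with a global batch size of 64, images resized to 640 × 640, a learning rate of 0.08, weight decay of 1e-4, and focal loss parameters α = 0.25 and γ = 1.5. We train for 150 epochs with stepwise decay, reducing the learning rate by a factor of 10 at epochs 120 and 140. All models were trained on TPUs [20].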
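The two numeric ingredients of this protocol, the stepwise learning rate schedule and the focal loss, can be written down compactly. The sketch below uses the values quoted above as defaults. It assumes the drop takes effect from the start of each boundary epoch, and it shows the standard binary focal loss for a single prediction rather than the full detector loss.

```python
import math


def stepwise_lr(epoch, base_lr=0.08, boundaries=(120, 140), factor=0.1):
    """Learning rate multiplied by `factor` at each boundary epoch reached."""
    drops = sum(1 for b in boundaries if epoch >= b)
    return base_lr * factor ** drops


def focal_loss(p, label, alpha=0.25, gamma=1.5):
    """Binary focal loss for one predicted probability p (0 < p < 1) and a 0/1 label."""
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```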

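The baseline RetinaNet architecture used in this and subsequent sections employs standard data augmentation techniques that are largely tailored to image classification training [24]. These consist of horizontal flipping with 50% probability and multi-scale jittering, in which images are randomly resized to between 512 and 786 during training and then cropped to 640 × 640.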
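For the bounding boxes, the baseline augmentation reduces to mirroring the x coordinates when the image is flipped and sampling a resize target. The following sketch covers only that bookkeeping; the function names are hypothetical, and the pixel-level resize and crop are left out.

```python
import random


def flip_boxes_horizontally(boxes, image_width):
    """Mirror (x_min, y_min, x_max, y_max) boxes about the vertical centre line."""
    return [(image_width - x_max, y_min, image_width - x_min, y_max)
            for x_min, y_min, x_max, y_max in boxes]


def maybe_flip(boxes, image_width, rng=random, flip_prob=0.5):
    """Flip the boxes with probability flip_prob; return (boxes, flipped)."""
    if rng.random() < flip_prob:
        return flip_boxes_horizontally(boxes, image_width), True
    return list(boxes), False


def sample_jitter_size(rng=random, low=512, high=786):
    """Sample the size an image is resized to before the 640 x 640 crop."""
    return rng.randint(low, high)
```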

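Tables 1 and 2 show our results when using the augmentation policy with the procedure above. In Table 1, the learned augmentation policy achieves systematic gains across several backbone architectures, with improvements ranging from +1.6 mAP to +2.3 mAP. In comparison, the previous state-of-the-art regularization technique applied to ResNet-50 [13] achieves a gain of +1.7% mAP (Table 2).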

                         

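To better understand where the gains come from, we break the data augmentation strategy applied to ResNet-50 into three parts: color operations, geometric operations and bbox-only operations (Table 2). Using color operations alone improves performance by only +0.8 mAP. Adding geometric operations to the search brings the improvement to +1.9 mAP. Finally, adding the bounding-box-specific operations on top of the previous two groups gives the best result, a +2.3% mAP improvement over the baseline. Note that the policy was found using only 5K COCO training examples and still generalizes well when training on the full COCO dataset.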

4.3 Exploiting learned augmentation policies to achieve state-of-the-art object detection

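A good data augmentation policy is one that transfers between models and between datasets, and that works well for models trained on different image sizes. Here we test the learned augmentation policy on a different backbone architecture and detection model. To see how the learned policy transfers to a state-of-the-art detection model, we replace the ResNet-50 backbone with AmoebaNet-D [38] and change the detection algorithm from RetinaNet [24] to NAS-FPN [14]. In addition, we use ImageNet pre-training for the AmoebaNet-D backbone, because we could not obtain competitive results when training from scratch. The model is trained for 150 epochs using cosine learning rate decay with a learning rate of 0.08. The rest of the setup is identical to the ResNet-50 backbone model, except that the image size is increased from 640 × 640 to 1280 × 1280.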
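For reference, a cosine learning rate decay with these settings can be sketched as follows. This is the common textbook form, assuming the rate decays to zero over the 150 epochs and that no warm-up is used; the text does not specify either detail.

```python
import math


def cosine_lr(epoch, total_epochs=150, base_lr=0.08):
    """Cosine decay from base_lr at epoch 0 down to 0 at total_epochs."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate is 0.08 at the start, 0.04 halfway through training, and reaches zero at epoch 150.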

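Table 3 shows that the learned augmentation policy adds +1.5% mAP on top of a competitive detection architecture and setup. These experiments also show that the augmentation policy transfers well across backbone architectures, detection algorithms, image sizes (i.e. 640 → 1280 pixels) and training procedures (training from scratch → using ImageNet pre-training). We can extend these results further by increasing the image resolution from 1280 to 1536 pixels and likewise increasing the number of detection anchors, following [49]. Since this model is significantly larger than the previous ones, we increase the number of sub-policies in the learned policy by combining the top 4 policies from the search, which yields a learned augmentation with 20 sub-policies.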

                     

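The result of these simple modifications is the first single-stage detection system to achieve a state-of-the-art single-model result of 50.7 mAP on COCO. We note that this result requires only a single pass over the image, whereas previous results required multiple evaluations of the same image at different spatial scales at test time [32]. Moreover, these results were obtained by increasing the image resolution and the number of anchors, both of which are simple and well-known techniques for improving object detection performance [49, 19]. In contrast, previous state-of-the-art results relied on multiple custom modifications of the model architecture and regularization methods [32]. Our method largely relies on a more modern network architecture paired with a learned data augmentation policy.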

4.4 Learned augmentation policies transfer to other detection datasets

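To evaluate transferability to an entirely different dataset and a different detection algorithm, we train a Faster R-CNN model with a ResNet-101 backbone on the PASCAL VOC dataset. We combine the training sets of PASCAL VOC 2007 and PASCAL VOC 2012 and test on the PASCAL VOC 2007 test set (4952 images). Our evaluation metric is mean average precision (mAP50). For the baseline model, we use the TensorFlow Object Detection API [19] with its default hyperparameters: 9 GPU workers are used for asynchronous training, and each worker processes a batch size of 1. The initial learning rate is set to 3 × 10^-4 and is decayed by a factor of 0.1 after 500K steps. Training starts from a COCO detection model checkpoint. When training with our data augmentation policy, we do not change any training details and simply add the policy found on COCO to the pre-processing. This leads to a 2.7% improvement in mAP50 (Table 4).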

                       

4.5 Learned augmentation policies mimic the performance of larger annotated datasets

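In this section we run experiments to determine how the learned augmentation policy performs when more or less training data is available. For these experiments we took subsets of the COCO dataset to create datasets with the following numbers of images: 5000, 9000, 14000, 23000 (see Table 5). All models in this experiment use a ResNet-50 backbone with RetinaNet and are trained for 150 epochs without ImageNet pre-training.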

                        

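As expected, the improvement due to the learned augmentation policy is larger when the model is trained on smaller datasets, as shown in Figure 3 and Table 5. For models trained on 5,000 training samples, the learned augmentation policy improves mAP by more than 70% relative to the baseline. As the training set grows, the effect of the learned augmentation policy decreases, although the improvements remain significant. Interestingly, models trained with the learned augmentation policy seem to do especially well at detecting smaller objects, particularly when the training set contains fewer images. For example, for small objects, applying the learned augmentation policy appears to be better than increasing the dataset size by 50%, as shown in Table 5. For small objects, training with the learned augmentation policy on 9000 examples gives better performance than the baseline trained on 15000 images. In this scenario, using our augmentation policy is almost as effective as doubling the dataset size.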

                             

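Another interesting behavior is that models trained with the learned augmentation policy do relatively better on the harder AP75 metric (average precision at IoU = 0.75). In Figure 4 we plot the percentage improvement in mAP, AP50 and AP75 for models trained with the learned augmentation policy, relative to baseline augmentation. The relative improvement in AP75 is larger than in AP50 for all training set sizes. The learned data augmentation is particularly beneficial for AP75, which indicates that the augmentation policy helps align bounding box predictions more precisely. This suggests that the policy particularly helps the model learn fine spatial details of bounding box position, which is consistent with the gains observed for small objects.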
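The difference between AP50 and AP75 comes down to the IoU threshold a prediction must reach to count as a match. The sketch below computes IoU for two axis-aligned boxes; the example boxes are hypothetical and chosen only to show a prediction that passes the 0.5 threshold but fails the 0.75 one.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


truth = (0, 0, 100, 100)
prediction = (0, 0, 100, 60)   # hypothetical box covering 60% of the truth
score = iou(truth, prediction)
print(score >= 0.5, score >= 0.75)  # prints: True False
```

A loosely localized prediction like this one counts towards AP50 but not AP75, which is why gains at AP75 point to more precise box placement.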

4.6 Improved model regularization

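In this section, we study the regularization effect of the learned data augmentation. We first note that the final training loss of a detection model is lower when it is trained on a larger training set (see the black curve in Figure 5). When we apply the learned data augmentation, the training loss increases significantly for all dataset sizes (red curve). The regularization effect can also be seen in the L2 norm of the weights of the trained models. Models trained on larger datasets have a smaller L2 norm, and models trained with the learned augmentation policy have a smaller L2 norm than models trained with baseline augmentation (see Figure 6).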
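The L2 norm referred to here is the square root of the sum of squared weights over the whole model. A minimal sketch, assuming the weights are available as flat lists of floats (the helper name is hypothetical):

```python
import math


def l2_norm(weight_tensors):
    """L2 norm over all weights, given an iterable of flat lists of floats."""
    return math.sqrt(sum(w * w for tensor in weight_tensors for w in tensor))
```

For example, `l2_norm([[3.0], [4.0]])` evaluates to 5.0.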

                  

V. DISCUSSION

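In this work, we investigate the effect of a learned data augmentation policy on object detection performance. We find that a learned data augmentation policy is effective across all dataset sizes considered, with larger improvements when the training set is small. We also observe that the improvement from a learned augmentation policy is larger on harder tasks, namely detecting smaller objects and detecting objects more precisely.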

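We also find that other successful regularization techniques bring no benefit when applied together with a learned data augmentation policy. We ran several experiments with Input Mixup [52], Manifold Mixup [46] and DropBlock [13]. For all of these methods, we found that they neither helped nor hurt model performance. This is an interesting result: the proposed method independently outperforms these regularization methods, yet these methods apparently are not needed once a learned data augmentation policy is applied.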

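Future work will include applying this method to other perceptual domains. For example, a natural extension of a learned augmentation policy would be to semantic segmentation [28] and instance segmentation [34, 7]. Likewise, point cloud featurization is another domain with a rich set of possible geometric data augmentation operations, and it could benefit from an approach similar to the one taken here. The human annotation needed to acquire training examples for such tasks is costly. Based on our findings, learned augmentation policies are transferable and are more effective for models trained on limited training data. Investing in libraries for learning data augmentation policies may therefore be an efficient alternative to acquiring additional human-annotated data.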

Origin blog.csdn.net/qq_37764129/article/details/94754679