Multi-Task Learning for Dense Prediction Tasks: A Survey - Abstract & Introduction

[ TPAMI 2021 ]

Multi-Task Learning for Dense Prediction Tasks: A Survey

[ The authors ]

Simon Vandenhende, Wouter Van Gansbeke and Marc Proesmans

Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven.

Stamatios Georgoulis and Dengxin Dai

Computer Vision Lab, Department of Electrical Engineering, ETH Zurich.

Luc Van Gool

Center for Processing Speech and Images, KU Leuven;

Computer Vision Lab, ETH Zurich.

[ Paper | Code ]

Multi-Task Learning for Dense Prediction Tasks: A Survey

GitHub - SimonVandenhende/Multi-Task-Learning-PyTorch: PyTorch implementation of multi-task learning architectures, incl. MTI-Net (ECCV2020).

Figure 1 shows a structured overview of the paper. Our code is made publicly available to ease the adoption of the reviewed MTL techniques: https://github.com/SimonVandenhende/Multi-Task-Learning-PyTorch.

[ CSDN Links ]

Since the full survey is quite long, it is covered in four separate blog posts:

Multi-Task Learning for Dense Prediction Tasks: A Survey - Abstract & Introduction

Multi-Task Learning for Dense Prediction Tasks: A Survey - Network Architectures (Part 1)

Multi-Task Learning for Dense Prediction Tasks: A Survey - Network Architectures (Part 2)

Multi-Task Learning for Dense Prediction Tasks: A Survey - Optimization Methods

____________________ ▽ ____________________

Contents

Abstract

1  Introduction


Abstract

With the advent of deep learning, many dense prediction tasks, i.e. tasks that produce pixel-level predictions, have seen significant performance improvements. The typical approach is to learn these tasks in isolation, that is, a separate neural network is trained for each individual task. Yet, recent multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computations and/or memory footprint, by jointly tackling multiple tasks through a learned shared representation.

In this survey, we provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision, explicitly emphasizing on dense prediction tasks.

Our contributions concern the following.

First, we consider MTL from a network architecture point-of-view. We include an extensive overview and discuss the advantages/disadvantages of recent popular MTL models.

Second, we examine various optimization methods to tackle the joint learning of multiple tasks. We summarize the qualitative elements of these works and explore their commonalities and differences.

Finally, we provide an extensive experimental evaluation across a variety of dense prediction benchmarks to examine the pros and cons of the different methods, including both architectural and optimization based strategies.

With the advent of deep learning, many dense prediction tasks, i.e. tasks that produce pixel-level predictions, have seen significant performance improvements. The typical approach is to learn these tasks in isolation: a separate neural network is trained for each individual task. Recent multi-task learning (MTL) techniques, however, jointly tackle multiple tasks through a learned shared representation and show promising results in terms of performance, computation and/or memory footprint.

This survey provides a well-rounded view of deep learning approaches to MTL in computer vision, with an emphasis on dense prediction tasks.

The contributions of the paper are as follows.

First, MTL is considered from the network-architecture point of view: the paper gives an extensive overview and discusses the advantages and disadvantages of recent popular MTL models.

Second, various optimization methods for jointly learning multiple tasks are examined; the qualitative elements of these works are summarized and their commonalities and differences explored.

Finally, an extensive experimental evaluation across a variety of dense prediction benchmarks examines the pros and cons of the different methods, covering both architectural and optimization-based strategies.

1  Introduction

Over the last decade, neural networks have shown impressive results for a multitude of tasks, such as semantic segmentation [1], instance segmentation [2] and monocular depth estimation [3]. Traditionally, these tasks are tackled in isolation, i.e. a separate neural network is trained for each task.

Yet, many real-world problems are inherently multi-modal. For example, an autonomous car should be able to segment the lane markings, detect all instances in the scene, estimate their distance and trajectory, etc., in order to safely navigate itself in its surroundings. Similarly, an intelligent advertisement system should be able to detect the presence of people in its viewpoint, understand their gender and age group, analyze their appearance, track where they are looking at, etc., in order to provide personalized content.

At the same time, humans are remarkably good at solving many tasks concurrently. Biological data processing appears to follow a multi-tasking strategy too: instead of separating tasks and tackling them in isolation, different processes seem to share the same early processing layers in the brain (see V1 in macaques [4]).

The aforementioned observations have motivated researchers to develop generalized deep learning models that given an input can infer all desired task outputs.

Research background: the practical need for MTL and its application value.

Over the last decade, neural networks have shown impressive results on a multitude of tasks, such as semantic segmentation, instance segmentation and monocular depth estimation. Traditionally, these tasks are tackled in isolation, i.e. a separate neural network is trained for each task.

However, many real-world problems are inherently multi-modal. For example, an autonomous car should be able to segment lane markings, detect all instances in the scene, and estimate their distance and trajectory in order to navigate safely through its surroundings. Similarly, an intelligent advertisement system should be able to detect the presence of people in its view, infer their gender and age group, analyze their appearance, and track where they are looking, so as to serve personalized content.

At the same time, humans are remarkably good at solving many tasks concurrently. Biological data processing also appears to follow a multi-tasking strategy: rather than separating tasks and tackling them in isolation, different processes seem to share the same early processing layers in the brain.

These observations have motivated researchers to develop generalized deep learning models that, given an input, can infer all desired task outputs.

Multi-Task Learning (MTL) [30] aims to improve such generalization by leveraging domain-specific information contained in the training signals of related tasks.

In the deep learning era, MTL translates to designing networks capable of learning shared representations from multi-task supervisory signals.

Compared to the single-task case, where each individual task is solved separately by its own network, such multi-task networks bring several advantages to the table.

First, due to their inherent layer sharing, the resulting memory footprint is substantially reduced.

Second, as they explicitly avoid to repeatedly calculate the features in the shared layers, once for every task, they show increased inference speeds.

Most importantly, they have the potential for improved performance if the associated tasks share complementary information, or act as a regularizer for one another.

Multi-task learning (MTL) aims to improve such generalization by leveraging the domain-specific information contained in the training signals of related tasks.

In the deep learning era, MTL amounts to designing networks that can learn shared representations from multi-task supervisory signals.

Compared with the single-task case, where each individual task is solved separately by its own network, such multi-task networks bring several advantages.

First, thanks to their inherent layer sharing, the resulting memory footprint is substantially reduced.

Second, because the features in the shared layers do not have to be recomputed once for every task, inference speed increases.

Most importantly, they have the potential for improved performance when the associated tasks share complementary information or act as regularizers for one another.
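These advantages follow from hard parameter sharing: the backbone is evaluated once per image, and only small task-specific heads are duplicated. Below is a minimal PyTorch sketch (PyTorch matches the paper's companion repo) of such a network for two dense prediction tasks; the encoder, layer sizes and heads are illustrative assumptions, not an architecture from the survey.

import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, num_seg_classes=13):
        super().__init__()
        # Shared encoder: its features are computed once per image,
        # no matter how many tasks are attached (memory/speed advantage).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific heads: one lightweight decoder per dense task.
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)  # per-pixel class logits
        self.depth_head = nn.Conv2d(64, 1, 1)              # per-pixel depth

    def forward(self, x):
        shared = self.encoder(x)  # single forward pass through the shared layers
        return {"semseg": self.seg_head(shared), "depth": self.depth_head(shared)}

net = MultiTaskNet()
out = net(torch.randn(2, 3, 128, 128))
print(out["semseg"].shape)  # torch.Size([2, 13, 128, 128])
print(out["depth"].shape)   # torch.Size([2, 1, 128, 128])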

Scope. In this survey, we study deep learning approaches for MTL in computer vision. We refer the interested reader to [31] for an overview of MTL in other application domains, such as natural language processing [32], speech recognition [33], bioinformatics [34], etc. Most importantly, we emphasize on solving multiple pixel-level or dense prediction tasks, rather than multiple image-level classification tasks, a case that has been mostly under-explored in MTL. Tackling multiple dense prediction tasks differs in several aspects from solving multiple classification tasks. First, as jointly learning multiple dense prediction tasks is governed by the use of different loss functions, unlike classification tasks that mostly use cross-entropy losses, additional consideration is required to avoid a scenario where some tasks overwhelm the others during training. Second, opposed to image-level classification tasks, dense prediction tasks can not be directly predicted from a shared global image representation [35], which renders the network design more difficult. Third, pixel-level tasks in scene understanding often have similar characteristics [14], and these similarities can potentially be used to boost the performance under a MTL setup. A popular example is semantic segmentation and depth estimation [13].

This survey studies deep learning approaches to MTL in computer vision; readers interested in MTL in other application domains, such as natural language processing [32], speech recognition [33] or bioinformatics [34], are referred to [31]. Most importantly, the paper focuses on solving multiple pixel-level, i.e. dense, prediction tasks rather than multiple image-level classification tasks, a setting that has remained largely under-explored in MTL.

Solving multiple dense prediction tasks differs from solving multiple classification tasks in several respects.

First, jointly learning multiple dense prediction tasks is governed by the use of different loss functions, unlike classification tasks, which mostly use cross-entropy losses; extra care is therefore needed to avoid some tasks overwhelming the others during training. (The losses are combined differently; a sketch follows after this list.)

Second, unlike image-level classification tasks, dense prediction tasks cannot be predicted directly from a shared global image representation, which makes the network design more difficult. (The network design differs.)

Third, pixel-level tasks in scene understanding often have similar characteristics, and these similarities can potentially be exploited to boost performance in an MTL setup. A typical example is semantic segmentation together with depth estimation [13].

[31] A Survey on Multi-Task Learning. arXiv, 2017.
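To make the first difference concrete, here is a minimal sketch of a weighted multi-task loss in PyTorch. The fixed weights w_semseg and w_depth are hypothetical placeholders; Section 3 of the survey reviews task-balancing methods that set or learn such weights automatically.

import torch
import torch.nn.functional as F

def multi_task_loss(outputs, seg_target, depth_target, w_semseg=1.0, w_depth=1.0):
    # Per-pixel classification loss for semantic segmentation ...
    loss_semseg = F.cross_entropy(outputs["semseg"], seg_target)
    # ... versus a per-pixel regression loss for depth. The two losses live on
    # different scales, so an unweighted sum can let one task dominate training.
    loss_depth = F.l1_loss(outputs["depth"], depth_target)
    return w_semseg * loss_semseg + w_depth * loss_depth

# Dummy predictions and targets standing in for real network outputs and labels.
outputs = {
    "semseg": torch.randn(2, 13, 128, 128, requires_grad=True),
    "depth": torch.randn(2, 1, 128, 128, requires_grad=True),
}
seg_target = torch.randint(0, 13, (2, 128, 128))  # per-pixel class indices
depth_target = torch.rand(2, 1, 128, 128)         # per-pixel depth values
multi_task_loss(outputs, seg_target, depth_target, w_depth=0.5).backward()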

Motivation. The abundant literature on MTL is rather fragmented. For example, we identify two main groups of works on deep multi-task architectures in Section 2 that have been considered largely independent from each other. Moreover, there is limited agreement on the used evaluation metrics and benchmarks. This paper aims to provide a more unified view on the topic. Additionally, we provide a comprehensive experimental study where different groups of works are evaluated in an apples-to-apples comparison.

The abundant literature on MTL is rather fragmented. For example, Section 2 identifies two main groups of work on deep multi-task architectures that have largely been developed independently of each other. Moreover, there is limited agreement on the evaluation metrics and benchmarks used. This paper aims to provide a more unified view of the topic, and additionally presents a comprehensive experimental study in which the different groups of works are evaluated in an apples-to-apples comparison.

Related work. MTL has been the subject of several surveys [30], [31], [36], [37]. In [30], Caruana showed that MTL can be beneficial as it allows for the acquisition of inductive bias through the inclusion of related additional tasks into the training pipeline. The author showcased the use of MTL in artificial neural networks, decision trees and k-nearest neighbors methods, but this study is placed in the very early days of neural networks, rendering it outdated in the deep learning era. Ruder [36] gave an overview of recent MTL techniques (e.g. [5], [6], [9], [19]) applied in deep neural networks. In the same vein, Zhang and Yang [31] provided a survey that includes feature learning, low-rank, task clustering, task relation learning, and decomposition approaches for MTL. Yet, both works are literature review studies without an empirical evaluation or comparison of the presented techniques. Finally, Gong et al. [37] benchmarked several optimization techniques (e.g. [8], [19]) across three MTL datasets. Still, the scope of this study is rather limited, and explicitly focuses on the optimization aspect. Most importantly, all prior studies provide a general overview on MTL without giving specific attention to dense prediction tasks that are of utmost importance in computer vision.

MTL has been the subject of several earlier surveys [30], [31], [36], [37].

In [30], Caruana showed that MTL can be beneficial because it allows inductive bias to be acquired by including related auxiliary tasks in the training pipeline. The author demonstrated MTL in artificial neural networks, decision trees and k-nearest-neighbor methods, but that study dates back to the very early days of neural networks and is therefore outdated in the deep learning era.

Ruder [36] gave an overview of recent MTL techniques (e.g. [5], [6], [9], [19]) applied in deep neural networks. In the same vein, Zhang and Yang [31] surveyed feature learning, low-rank, task clustering, task relation learning and decomposition approaches for MTL. Both works, however, are literature reviews without an empirical evaluation or comparison of the presented techniques.

Finally, Gong et al. [37] benchmarked several optimization techniques (e.g. [8], [19]) on three MTL datasets. Still, the scope of that study is rather limited and focuses explicitly on the optimization aspect.

Most importantly, all prior studies provide a general overview of MTL without paying specific attention to the dense prediction tasks that are of utmost importance in computer vision.

[30] Multitask Learning. Machine Learning, 1997.

[36] An Overview of Multi-Task Learning in Deep Neural Networks. arXiv, 2017.

Paper overview. In the following sections, we provide a well-rounded view on state-of-the-art MTL techniques that fall within the defined scope. Section 2 considers different deep multi-task architectures, categorizing them into two main groups: encoder- and decoder-focused approaches. Section 3 surveys various optimization techniques for balancing the influence of the tasks when updating the network’s weights. We consider the majority of task balancing, adversarial and modulation techniques. In Section 4, we provide an extensive experimental evaluation across different datasets both within the scope of each group of methods (e.g. encoder-focused approaches) as well as across groups of methods (e.g. encoder- vs decoder-focused approaches). Section 5 discusses the relations of MTL with other fields. Section 6 concludes the paper.

The following sections provide a well-rounded view of state-of-the-art MTL techniques within the defined scope.

Section 2 considers different deep multi-task architectures, categorized into two main groups: encoder-focused and decoder-focused approaches.

Section 3 surveys optimization techniques for balancing the influence of the tasks when updating the network weights, covering the majority of task balancing, adversarial and modulation techniques.

Section 4 provides an extensive experimental evaluation across different datasets, both within each group of methods (e.g. among encoder-focused approaches) and across groups (e.g. encoder- vs decoder-focused approaches).

Section 5 discusses the relations of MTL with other fields.

Section 6 concludes the paper.

For more, continue with the next post in the series:

Multi-Task Learning for Dense Prediction Tasks: A Survey - Network Architectures (Part 1)

____________________ △ ____________________


[ Extension ]

Multi-Task Learning with Deep Neural Networks: A Survey (2020)
