Title: SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC

ABSTRACT:

In this paper, we integrate appearance information to improve the performance of SORT. Due to this extension we are able to track objects through longer periods of occlusions, effectively reducing the number of identity switches.

During online application, we establish measurement-to-track associations using nearest neighbor queries in visual appearance space. Experimental evaluation shows that our extensions reduce the number of identity switches by 45%, achieving overall competitive performance at high frame rates.

INTRODUCTION:

Simple online and realtime tracking (SORT) is a much simpler framework than classical approaches such as multiple hypothesis tracking (MHT) and joint probabilistic data association (JPDA): it performs Kalman filtering in image space and frame-by-frame data association using the Hungarian method with an association metric that measures bounding box overlap. This simple approach achieves favorable performance at high frame rates.
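
SORT's overlap metric is commonly expressed as intersection-over-union (IoU). A minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples (a format chosen here only for illustration):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) corner format."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150, i.e. 0.333...
```

Two boxes that half-overlap horizontally share 50 of a 150-pixel union, so their IoU is one third.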

Because its association metric is only accurate when state estimation uncertainty is low, SORT has a deficiency in tracking through occlusions, which typically appear in frontal-view camera scenes. We overcome this issue by replacing the association metric with a more informed metric that combines motion and appearance information.

We apply a convolutional neural network (CNN) that has been trained to discriminate pedestrians on a large-scale person re-identification dataset. Through integration of this network we increase robustness against misses and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios.

SORT WITH DEEP ASSOCIATION METRIC

1. Track Handling and State Estimation

Our tracking scenario is defined on the eight-dimensional state space $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$ that contains the bounding box center position $(u, v)$, aspect ratio $\gamma$, height $h$, and their respective velocities in image coordinates. We use a standard Kalman filter with a constant velocity motion model and a linear observation model, where we take the bounding box coordinates $(u, v, \gamma, h)$ as direct observations of the object state.
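
A minimal NumPy sketch of such a constant-velocity Kalman filter over the 8-dimensional state with a 4-dimensional observation. The noise matrices `Q` and `R`, the initial covariance, and the function names are hypothetical choices for illustration, not the values used by the authors:

```python
import numpy as np

# State: (u, v, gamma, h, du, dv, dgamma, dh); measurement: (u, v, gamma, h)
NDIM, DT = 4, 1.0

# Constant-velocity transition: each position entry += its velocity * dt
F = np.eye(2 * NDIM)
F[:NDIM, NDIM:] = DT * np.eye(NDIM)
# Linear observation model: the first four state entries are observed directly
H = np.eye(NDIM, 2 * NDIM)

# Hypothetical noise levels, chosen only for illustration
Q = 1e-2 * np.eye(2 * NDIM)   # process noise
R = 1e-1 * np.eye(NDIM)       # measurement noise

def initiate(z):
    """Start a track from a first detection z = (u, v, gamma, h) with zero velocity."""
    mean = np.concatenate([z, np.zeros(NDIM)])
    cov = np.eye(2 * NDIM)
    return mean, cov

def predict(mean, cov):
    """Propagate the state one frame ahead under constant velocity."""
    return F @ mean, F @ cov @ F.T + Q

def update(mean, cov, z):
    """Correct the prediction with a measurement z = (u, v, gamma, h)."""
    S = H @ cov @ H.T + R                 # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)      # Kalman gain
    mean = mean + K @ (z - H @ mean)
    cov = (np.eye(2 * NDIM) - K @ H) @ cov
    return mean, cov

# Usage: a box moving 2 px per frame to the right
mean, cov = initiate(np.array([100.0, 50.0, 0.5, 80.0]))
for t in range(1, 6):
    mean, cov = predict(mean, cov)
    mean, cov = update(mean, cov, np.array([100.0 + 2 * t, 50.0, 0.5, 80.0]))
print(mean[:4], mean[4])  # position estimate and estimated horizontal velocity
```

After a few frames the velocity entry becomes positive and the center estimate tracks the moving box.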

For each track $k$ we count the number of frames since the last successful measurement association, $a_k$. This counter is incremented during Kalman filter prediction and reset to 0 when the track has been associated with a measurement.

Tracks that exceed a predefined maximum age $A_{\max}$ are considered to have left the scene and are deleted from the track set. New track hypotheses are initiated for each detection that cannot be associated with an existing track. These new tracks are classified as tentative during their first three frames. During this time, we expect a successful measurement association at each time step. Tracks that are not successfully associated with a measurement within their first three frames are deleted.
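
The counter $a_k$ and the tentative/deleted rules above can be sketched as a small state machine. The names `Track`, `TrackState`, `hits`, `n_init`, and `time_since_update` are hypothetical, and the "confirmed" state for tracks that survive the tentative period is an assumption of this sketch:

```python
from dataclasses import dataclass
from enum import Enum

class TrackState(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3

@dataclass
class Track:
    track_id: int
    max_age: int                 # A_max; the value passed in is illustrative
    n_init: int = 3              # tentative period: first three frames
    hits: int = 1                # associated measurements so far (creating detection counts)
    time_since_update: int = 0   # the counter a_k
    state: TrackState = TrackState.TENTATIVE

    def predict(self):
        # a_k is incremented at every Kalman prediction step
        self.time_since_update += 1

    def mark_hit(self):
        # a_k is reset to 0 when the track is associated with a measurement
        self.time_since_update = 0
        self.hits += 1
        if self.state == TrackState.TENTATIVE and self.hits >= self.n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        if self.state == TrackState.TENTATIVE:
            # tentative tracks must be associated at every time step
            self.state = TrackState.DELETED
        elif self.time_since_update > self.max_age:
            # exceeded A_max: the object is assumed to have left the scene
            self.state = TrackState.DELETED

t = Track(track_id=1, max_age=30)
t.predict(); t.mark_hit()
t.predict(); t.mark_hit()
print(t.state)  # TrackState.CONFIRMED
```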

2. Assignment Problem

A conventional way to solve the association between the predicted Kalman states and newly arrived measurements is to build an assignment problem that can be solved using the Hungarian algorithm. Into this problem formulation we integrate motion and appearance information through combination of two appropriate metrics.
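
For illustration, SciPy's `linear_sum_assignment` solves the same linear assignment problem that the Hungarian algorithm solves (internally SciPy uses a Jonker-Volgenant variant). The cost matrix below is made up:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows are tracks, columns are detections
cost = np.array([
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
])
rows, cols = linear_sum_assignment(cost)  # minimum-cost one-to-one assignment
for i, j in zip(rows, cols):
    print(f"track {i} -> detection {j} (cost {cost[i, j]})")
# Detection 2 stays unmatched because there are only two tracks
```

The solver picks track 0 to detection 1 and track 1 to detection 0 (total cost 0.3), the cheapest one-to-one assignment.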

To incorporate motion information we use the (squared) Mahalanobis distance between predicted Kalman states and newly arrived measurements:

$$d^{(1)}(i,j) = (\mathbf{d}_j - \mathbf{y}_i)^\top \mathbf{S}_i^{-1} (\mathbf{d}_j - \mathbf{y}_i),$$

where we denote the projection of the $i$-th track distribution into measurement space by $(\mathbf{y}_i, \mathbf{S}_i)$ and the $j$-th bounding box detection by $\mathbf{d}_j$. The Mahalanobis distance takes state estimation uncertainty into account by measuring how many standard deviations the detection is away from the mean track location.
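
A NumPy sketch of the squared Mahalanobis distance for one projected track $(\mathbf{y}_i, \mathbf{S}_i)$ against several detections. Using a Cholesky factor instead of an explicit inverse is an implementation choice assumed here; the function name is hypothetical:

```python
import numpy as np

def squared_mahalanobis(y, S, d):
    """(d - y)^T S^{-1} (d - y) for one projected track (y, S) and N detections d.

    y: (4,) projected track mean, S: (4, 4) projected covariance,
    d: (N, 4) detections in measurement space (u, v, gamma, h).
    """
    L = np.linalg.cholesky(S)              # S = L L^T
    diff = (np.atleast_2d(d) - y).T        # shape (4, N)
    z = np.linalg.solve(L, diff)           # z = L^{-1} (d - y)
    return np.sum(z * z, axis=0)           # ||z||^2 = (d - y)^T S^{-1} (d - y)

y = np.array([100.0, 50.0, 0.5, 80.0])
S = np.diag([4.0, 4.0, 1e-2, 4.0])         # hypothetical projected covariance
d = np.array([[102.0, 50.0, 0.5, 80.0],    # 1 standard deviation away in u
              [110.0, 50.0, 0.5, 80.0]])   # 5 standard deviations away in u
print(squared_mahalanobis(y, S, d))        # -> [1, 25]
```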

Further, using this metric it is possible to exclude unlikely associations by thresholding the Mahalanobis distance at a 95% confidence interval computed from the inverse $\chi^2$ distribution. We denote this decision with an indicator

$$b^{(1)}_{i,j} = \mathbb{1}\left[d^{(1)}(i,j) \le t^{(1)}\right]$$

that evaluates to 1 if the association between the $i$-th track and the $j$-th detection is admissible. For our four-dimensional measurement space the corresponding Mahalanobis threshold is $t^{(1)} = 9.4877$.
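
The threshold $t^{(1)} = 9.4877$ is the 0.95 quantile of a $\chi^2$ distribution with 4 degrees of freedom, which SciPy exposes through `chi2.ppf`. The `gate` helper is a hypothetical name for the indicator $b^{(1)}$:

```python
import numpy as np
from scipy.stats import chi2

# 0.95 quantile of chi-square with 4 degrees of freedom (one per u, v, gamma, h)
t1 = chi2.ppf(0.95, df=4)
print(round(t1, 4))  # 9.4877

def gate(d1, threshold=t1):
    """Indicator b^(1): 1 where the squared Mahalanobis distance is admissible."""
    return (np.asarray(d1) <= threshold).astype(int)

print(gate([1.0, 25.0]))  # [1 0]
```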

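The Mahalanobis distance is a suitable association metric when motion uncertainty is low, but in our image-space problem formulation the predicted state distribution obtained from the Kalman filtering framework provides only a rough estimate of the object location. In particular, unaccounted camera motion can introduce rapid displacements in the image plane, making the Mahalanobis distance a rather uninformed metric for tracking through occlusions.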

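Therefore, we integrate a second metric into the assignment problem. For each bounding box detection $\mathbf{d}_j$ we compute an appearance descriptor $\mathbf{r}_j$ with $\lVert \mathbf{r}_j \rVert = 1$. Further, for each track $k$ we keep a gallery $\mathcal{R}_k$ of the last $L_k$ associated appearance descriptors. Then, our second metric measures the smallest cosine distance between the $i$-th track and the $j$-th detection in appearance space:

$$d^{(2)}(i,j) = \min\left\{\, 1 - \mathbf{r}_j^\top \mathbf{r} \;\middle|\; \mathbf{r} \in \mathcal{R}_i \,\right\}$$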
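
A NumPy sketch of the smallest cosine distance against a per-track gallery. The bounded `deque` stands in for the gallery of the last $L_k$ descriptors; its `maxlen` of 3 and the helper names are hypothetical, chosen only to keep the example small:

```python
from collections import deque

import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def min_cosine_distance(gallery, detections):
    """d^(2)(i, j) = min over r in R_i of (1 - r_j^T r).

    gallery: (L, D) unit-norm descriptors kept for one track i
    detections: (N, D) unit-norm detection descriptors r_j
    returns: (N,) smallest cosine distance per detection
    """
    sims = detections @ gallery.T      # (N, L) cosine similarities
    return 1.0 - sims.max(axis=1)      # min of (1 - sim) equals 1 - max(sim)

gallery = deque(maxlen=3)              # hypothetical L_k = 3
for r in l2_normalize([[1, 0, 0], [0, 1, 0], [0.8, 0.6, 0], [0, 0, 1]]):
    gallery.append(r)                  # the oldest descriptor drops out at maxlen
dets = l2_normalize([[1, 0, 0], [0, 1, 0]])
print(min_cosine_distance(np.array(gallery), dets))  # approx [0.2, 0.0]
```

Because `[1, 0, 0]` has already been pushed out of the gallery, the first detection's best match is `[0.8, 0.6, 0]`, giving a distance of about 0.2.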

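Similarly, we introduce a binary variable that indicates whether an association is admissible according to this metric, where $t^{(2)}$ is a threshold on the cosine distance:

$$b^{(2)}_{i,j} = \mathbb{1}\left[d^{(2)}(i,j) \le t^{(2)}\right]$$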

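The two metrics complement each other by serving different aspects of the assignment problem. On the one hand, the Mahalanobis distance provides information about possible object locations based on motion; on the other hand, the cosine distance considers appearance information, which is particularly useful for recovering identities after long-term occlusions. Therefore, we combine both metrics using a weighted sum, where the hyperparameter $\lambda$ controls the influence of each metric on the combined cost:

$$c_{i,j} = \lambda\, d^{(1)}(i,j) + (1 - \lambda)\, d^{(2)}(i,j)$$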

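An association is admissible if it lies within the gating region of both metrics:

$$b_{i,j} = \prod_{m=1}^{2} b^{(m)}_{i,j}$$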
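
The weighted cost and the joint gate can be computed matrix-wise. In this sketch the threshold `t2 = 0.2`, the weight `lam = 0.0`, and the distance matrices are hypothetical values:

```python
import numpy as np

def combine(d1, d2, t1, t2, lam):
    """Weighted cost c = lam*d1 + (1-lam)*d2 and joint gate b = b1 * b2.

    d1, d2: (num_tracks, num_detections) Mahalanobis and cosine distance matrices
    t1, t2: gating thresholds; lam: weight lambda in [0, 1]
    """
    cost = lam * d1 + (1.0 - lam) * d2
    b1 = d1 <= t1
    b2 = d2 <= t2
    gate = (b1 & b2).astype(int)   # product of two 0/1 indicators is a logical AND
    return cost, gate

d1 = np.array([[1.0, 25.0], [4.0, 0.5]])
d2 = np.array([[0.1, 0.15], [0.6, 0.05]])
cost, gate = combine(d1, d2, t1=9.4877, t2=0.2, lam=0.0)
print(gate)  # [[1 0]
             #  [0 1]]
```

With `lam=0.0` the cost reduces to the appearance term alone, while the Mahalanobis distance still acts through the gate: pair (0, 1) is rejected by motion, pair (1, 0) by appearance.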

3. Matching Cascade

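Instead of solving for measurement-to-track associations in one global assignment problem, we introduce a cascade that solves a series of subproblems, giving priority to more frequently seen objects.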

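Input: the set of track indices $\mathcal{T}$, the set of detection indices $\mathcal{D}$, and the maximum age $A_{\max}$

Compute the association cost matrix $C = [c_{i,j}]$ and the gate (admissible association) matrix $B = [b_{i,j}]$

Initialize the set of matches $\mathcal{M}$ as empty and the set of unmatched detections $\mathcal{U}$ as $\mathcal{D}$

For each age $n = 1, \dots, A_{\max}$, select the subset of tracks whose last successful association was $n$ frames ago ($a_i = n$)

Solve a minimum-cost matching between these tracks and the detections still in $\mathcal{U}$

Add the admissible pairs ($b_{i,j} = 1$) to $\mathcal{M}$ and remove the matched detections from $\mathcal{U}$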
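
The cascade steps above can be sketched in Python with SciPy's assignment solver. Giving inadmissible pairs a large cost (`INFEASIBLE`) before solving each subproblem is an implementation choice assumed here, so that a gated-out pair can never displace an admissible one; all names are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INFEASIBLE = 1e5  # assumed large cost for inadmissible pairs

def matching_cascade(cost, gate, track_ages, max_age):
    """Match tracks to detections in order of increasing age a_i.

    cost, gate: (num_tracks, num_detections) arrays holding c_ij and b_ij
    track_ages: a_i per track, i.e. frames since its last successful association
    Returns (matches, unmatched_detections).
    """
    matches = []
    unmatched = list(range(cost.shape[1]))      # every detection starts unmatched
    for n in range(1, max_age + 1):
        if not unmatched:
            break                               # nothing left to assign
        tracks_n = [i for i, a in enumerate(track_ages) if a == n]
        if not tracks_n:
            continue
        idx = np.ix_(tracks_n, unmatched)
        sub = np.where(gate[idx] > 0, cost[idx], INFEASIBLE)
        rows, cols = linear_sum_assignment(sub)  # min-cost matching on the subproblem
        matched_now = set()
        for r, c in zip(rows, cols):
            i, j = tracks_n[r], unmatched[c]
            if gate[i, j] > 0:                  # keep admissible pairs only
                matches.append((i, j))
                matched_now.add(j)
        unmatched = [j for j in unmatched if j not in matched_now]
    return matches, unmatched

# Track 0 was last associated one frame ago (a_0 = 1), track 1 two frames ago (a_1 = 2)
cost = np.array([[0.3, 0.9],
                 [0.1, 0.8]])
gate = np.ones_like(cost, dtype=int)
matches, unmatched = matching_cascade(cost, gate, track_ages=[1, 2], max_age=3)
print(matches, unmatched)  # [(0, 0), (1, 1)] []
```

A single global assignment on the same matrix would instead give detection 0 to track 1 (total cost 0.9 + 0.1 = 1.0 versus 0.3 + 0.8 = 1.1), so the cascade's priority for the more recently seen track changes the outcome.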

4. Deep Appearance Descriptor

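Since our method relies on simple nearest neighbor queries without additional metric learning, its successful application requires a well-discriminating feature embedding that is trained offline, before the actual online tracking application. To this end, we employ a CNN that has been trained on a large-scale person re-identification dataset containing over 1,100,000 images of 1,261 pedestrians, which makes it well suited for deep metric learning in a people tracking context.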

The network is a residual network with two convolutional layers followed by six residual blocks.

EXPERIMENTS

The experimental results are shown in the figure below:

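Evaluation metrics:

Multi-object tracking accuracy (MOTA): summary of overall tracking accuracy in terms of false positives (FP), false negatives (FN), and identity switches.

Multi-object tracking precision (MOTP): summary of overall tracking precision in terms of bounding box overlap between ground-truth and reported location.

Mostly tracked (MT): percentage of ground-truth tracks that have the same label for at least 80% of their life span.

Mostly lost (ML): percentage of ground-truth tracks that are tracked for at most 20% of their life span.

Identity switches (ID): number of times the reported identity of a ground-truth track changes.

Fragmentation (FM): number of times a track is interrupted by a missing detection.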
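
For reference, MOTA is usually computed with the CLEAR MOT formula $1 - (\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}) / \mathrm{GT}$, with counts summed over all frames. The sketch below uses made-up counts, and the "PT" (partially tracked) label for trajectories that are neither MT nor ML follows common benchmark usage rather than the text above:

```python
def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (fn + fp + idsw) / num_gt

def mt_ml_class(tracked_frames, lifespan):
    """Classify one ground-truth trajectory by the tracked fraction of its life span."""
    ratio = tracked_frames / lifespan
    if ratio >= 0.8:
        return "MT"   # mostly tracked
    if ratio <= 0.2:
        return "ML"   # mostly lost
    return "PT"       # partially tracked (neither MT nor ML)

print(mota(fn=100, fp=50, idsw=10, num_gt=1000))  # 1 - 160/1000, about 0.84
print(mt_ml_class(90, 100), mt_ml_class(10, 100))  # MT ML
```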

Study Notes 6: Reading the DeepSORT Paper