20,000 words explain in simple terms yolov5+deepsort to achieve target tracking, including complete code, yolov, Kalman filter estimation, ReID target re-identification, Hungarian matching KM algorithm matching

Table of contents

I. Introduction

Two: Tracking part:

ReID structureEdit

First frame (generate track)

second frame

Update the predicted value of the prior

Initialization of state matrix

Update (correct) the predicted values:

After the matching is completed, update the correction, and at the same time update whether the state exceeds the third frame, it needs to be set to 2 to become the confirmation state:

delete. Add track

The third frame is the same as the second frame

The fourth frame starts (cascade matching can be done)

predict

renew

match

cascading match

True cascade matching min_cost_matching

cascade matching function

The index of the optimal matching result is obtained by calculating the distance

ReID feature matching and sports matching

Sports matching below

Kalman filter update diagram

deepsort update process

Effect demonstration:

Complete code + UI interface


I. Introduction

The last two articles talked about the training and detection part of yolov5. This article uses deepsort, ReID and Hungarian algorithm to track the target on the basis of detection.

Two: Tracking part:

# pass detections to deepsort, the model here is only used to contain people in this category. im0s stores the pixels of all pictures in this frame (the first one is (73, 46, 3), hwc)
# We have done so much to get the ID of each object to be consistent. The core idea is to calculate and update the Kalman gain cost matrix, perform matching, and return the result.
# Tracking: If the main object is detected well, the tracking will not be bad
outputs = deepsort.update(xywhs, confss, im0)

Input the obtained xywh and pixel values ​​of the image into the ReID network for feature extraction of human images.

class Extractor(object):
    def __init__(self, model_path, use_cuda=True):
        self.net = Net(reid=True)
        self.device = "cuda" if torch.cuda.is_available() and use_cuda else "cpu"
        state_dict = torch.load(model_path, map_location=torch.device(self.device))[
            'net_dict']
        self.net.load_state_dict(state_dict)
        logger = logging.getLogger("root.tracker")
        logger.info("Loading weights from {}... Done!".format(model_path))
        self.net.to(self.device)
        self.size = (64, 128)
        self.norm = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

    def _preprocess(self, im_crops):
        """
        TODO:
            1. to float with scale from 0 to 1
            2. resize to (64, 128) as Market1501 dataset did
            3. concatenate to a numpy array
            3. to torch Tensor
            4. normalize
        """
        def _resize(im, size):
            return cv2.resize(im.astype(np.float32)/255., size)

        im_batch = torch.cat([self.norm(_resize(im, self.size)).unsqueeze(
            0) for im in im_crops], dim=0).float()
        return im_batch

    def __call__(self, im_crops):
        im_batch = self._preprocess(im_crops)
        with torch.no_grad():
            im_batch = im_batch.to(self.device)
            features = self.net(im_batch)  # 这个网络是detection文件里面的
        return features.cpu().numpy()

ReID structureInsert image description here

First frame (generate track)

The first frame only has detection, no track, so no prediction operation will be performed

 # update tracker 先计算预测值,然后基于预测值更新参数。例如追踪每一帧的状态肯定要变的,预测完需要根据观测值来修正,修正后的状态值去估计下一帧
        self.tracker.predict()  # 预测操作:卡尔曼滤波 首先得有track。预测两条公式
        self.tracker.update(detections)  # 更新3条公式,对匹配后的结果进行更新
First, _initiate_track calculates the mean and covariance, and then goes to Track to initialize the attributes. Here is the operation of creating a new track (many attributes need to be added). Each frame is re-appended to sel.tracks. For a new track, the first frame requires line initialization with an initial value, mean (x (k-1)) and error covariance matrix (p (k-1))
 # 先_initiate_track计算出mean和协方差,在去Track进行初始化属性,这里是新建track的操作(要添加很多属性),每一帧都是重新append到sel.tracks中
        for detection_idx in unmatched_detections:
            self._initiate_track(detections[detection_idx])
    def initiate(self, measurement):
        """Create track from unassociated measurement.

        Parameters
        ----------
        measurement : ndarray
            Bounding box coordinates (x, y, a, h) with center position (x, y),
            aspect ratio a, and height h.

        Returns
        -------
        (ndarray, ndarray)
            Returns the mean vector (8 dimensional) and covariance matrix (8x8
            dimensional) of the new track. Unobserved velocities are initialized
            to 0 mean.

        """  # mean是位置信息,各个速度初始化为0;协方差是位置信息之间的关系
        mean_pos = measurement
        mean_vel = np.zeros_like(mean_pos)
        mean = np.r_[mean_pos, mean_vel]  # 设定h是因为当目标远近的适合,xya确实变化都不明显,但是h很明显
        # 感觉这些变量都跟高度关系比较紧密?高度越小就越远,高度越大就越近?与其当前位置无关
        std = [
            2 * self._std_weight_position * measurement[3],
            2 * self._std_weight_position * measurement[3],
            1e-2,
            2 * self._std_weight_position * measurement[3],
            10 * self._std_weight_velocity * measurement[3],
            10 * self._std_weight_velocity * measurement[3],
            1e-5,
            10 * self._std_weight_velocity * measurement[3]]
        covariance = np.diag(np.square(std))
        return mean, covariance

Creates a new trace, initialized based on the incoming unassociated measurements. All the variables of this track are appended to itself. Then add 1 to the number and continue appending to the next track.

    def _initiate_track(self, detection):
        # mean是[242, 266.5, 0.63014, 73, 0, 0, 0, 0],目标状态向量被扩展为包含位置 (x, y)、宽高 (a, h) 和速度 (vx, vy),va角度速度vh高度速度的八维向量
        # 因此,协方差矩阵的维度也是 8x8,表示状态mean的向量各个维度之间的关联和不确定性。
        mean, covariance = self.kf.initiate(detection.to_xyah())
        # 第一帧只做了初始化Track类的东西
        self.tracks.append(Track(
            mean, covariance, self._next_id, self.n_init, self.max_age,
            detection.feature))
        self._next_id += 1  # 表示当前track的id

Traverse all tracks, because these tracks are not confirmed (because it is only the first frame, three consecutive frames are enough),

 for track in self.tracks:
            if not track.is_confirmed():
                continue
            features += track.features  # 已确认目标加入到features列表中
            targets += [track.track_id for _ in track.features]
            # targets.append(track.track_id)遍历track.features有点奇怪,感觉这样就可以了
            # 如果不清空特征列表,而是简单地将当前帧的特征追加到列表中,那么特征列表会越来越长,包含之前帧的特征,这可能会导致不必要的计算和内存消耗。
            track.features = []
        # 最多100个。将已确认的active_targets 保存特征到self.samples = {}里面,为了级联匹配的计算
        self.metric.partial_fit(np.asarray(features), np.asarray(targets), active_targets)

second frame

The front is still the same as the first frame, first get the xywh of target detection, confidence, and pixel value of each target, and then use xywh and the pixel value of each target to obtain the characteristics of each target, which needs to be re-identified later.

Then the advanced prediction part.

self.mean, self.covariance is the last x (k-1) and p (k-1). Perform root prediction to obtain the prior and the covariance of the prior error.
    self.tracker.predict()  # 预测操作:卡尔曼滤波 首先得有track。预测两条公式

    def predict(self, kf):
        """Propagate the state distribution to the current time step using a
        Kalman filter prediction step.

        Parameters
        ----------
        kf : kalman_filter.KalmanFilter
            The Kalman filter.
                         x, y, a, h, vx, vy, va, vh
        """  # mean是[242, 266.5, 0.63014, 73, 0, 0, 0, 0],目标状态向量被扩展为包含位置 (x, y)、宽高 (a, h) 和速度 (vx, vy),va角度速度vh高度速度的八维向量
        # 因此,协方差矩阵的维度也是 8x8,表示状态mean的向量各个维度之间的关联和不确定性。
        # self.mean, self.covariance是上一次 的x(k-1)和p(k-1),进行根预测得到先验和先验误差的协方差
        self.mean, self.covariance = kf.predict(self.mean, self.covariance)
        self.increment_age()
 kf.predict(self.mean, self.covariance)函数
The reason for setting h is that when the distance of the target is suitable, xya does not change significantly, but h is very obvious, so the following calculations are all based on h

Update the predicted value of the prior

self.mean,  self.covariance   ,进行变量更新self.age += 1 self.time_since_update += 1self.age += 1   self.time_since_update += 1
    def predict(self, mean, covariance):
        """Run Kalman filter prediction step.

        Parameters
        ----------
        mean : ndarray
            The 8 dimensional mean vector of the object state at the previous
            time step.
        covariance : ndarray
            The 8x8 dimensional covariance matrix of the object state at the
            previous time step.

        Returns
        -------
        (ndarray, ndarray)
            Returns the mean vector and covariance matrix of the predicted
            state. Unobserved velocities are initialized to 0 mean.

        """  # mean[3],:设定h是因为当目标远近的适合,xya确实变化都不明显,但是h很明显,所以下面计算都是基于h来计算的
        std_pos = [
            self._std_weight_position * mean[3],   # _std_weight_position是权重,mean[3]是高度,他们相乘用作测量噪声的标准差,好像只考虑了这个?数学估计没有考虑噪音?
            self._std_weight_position * mean[3],
            1e-2,
            self._std_weight_position * mean[3]]
        std_vel = [
            self._std_weight_velocity * mean[3],
            self._std_weight_velocity * mean[3],
            1e-5,
            self._std_weight_velocity * mean[3]]
        motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))  # 看看那些噪音会影响这个系统的状态,先初始化噪声矩阵Q(也是8*8的矩阵)

        mean = np.dot(self._motion_mat, mean)  # x' = Fx 得到预测状态(八个量) 用状态转移矩阵*均值得到下一时刻的状态
        covariance = np.linalg.multi_dot((  # p' = FPF^T + Q  计算新的协方差矩阵
            self._motion_mat, covariance, self._motion_mat.T)) + motion_cov

        return mean, covariance  # 这样就完成了预测操作

Initialization of state matrix

 def __init__(self):
        ndim, dt = 4, 1.

        # 状态转移矩阵(motion matrix)和观测矩阵(update matrix)
        # 在卡尔曼滤波器中,状态转移矩阵描述了系统状态在时间步之间的变化规律。它是一个方阵,其维度与状态向量的维度相同。在这个例子中,self._motion_mat
        # 初始状态转移矩阵8*8。每个状态变量都在时间步之间保持不变。在许多实际应用中,一些状态变量可能在短时间内不发生显著变化,而卡尔曼滤波器可以利用这种先验知识来提高状态估计的准确性。
        #非线性系统:卡尔曼滤波器最初设计用于线性系统。然而,在处理非线性系统时,通常需要使用扩展卡尔曼滤波器(Extended Kalman Filter,EKF)或其他非线性滤波技术
        self._motion_mat = np.eye(2 * ndim, 2 * ndim)
        for i in range(ndim):#修改状态转移矩阵的一部分元素
            self._motion_mat[i, ndim + i] = dt # 第0行第4列置为1等等....
        # 观测矩阵则描述了如何从状态空间中观测到系统的观测值。
        self._update_mat = np.eye(ndim, 2 * ndim)

        # Motion and observation uncertainty are chosen relative to the current
        # state estimate. These weights control the amount of uncertainty in
        # the model. This is a bit hacky.
        self._std_weight_position = 1. / 20  # 位置的方差
        self._std_weight_velocity = 1. / 160  # 速度的方差

Update (correct) the predicted values:

        Different from the first frame, the second frame already has a track. Although cascade matching is not yet available, IOU matching can be performed, because only the confirmed ones will enter the cascade matching, and the unconfirmed ones will be together with the cascaded ones. Carry out iou, there is no confirmed track to be cascaded, so only unconfirmed and detection are used to do iou, as shown in the figure

matches_b, unmatched_tracks_b, unmatched_detections = \
            linear_assignment.min_cost_matching(
                iou_matching.iou_cost, self.max_iou_distance, self.tracks, #第二帧:因为iou_track_candidates是unmatched_detections生成的track,所以第二帧全部都匹配上了,这一点知道就行,不那么zhongyao要
                detections, iou_track_candidates, unmatched_detections)  # 因为 匈牙利匹配会尽可能的多的去匹配,所以要设置一个最大距离,超过就不要了
        # iou_track_candidates里面有些是未被确定的,对于一个新建的track来说:(删除在前三帧中没有匹配到的目标)和unmatched_tracks_a看看是不是超过了70帧,要删除
        matches = matches_a + matches_b

        unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))
        return matches, unmatched_tracks, unmatched_detections#unmatched_tracks要么是已被确认但是超过70帧,要么是前三帧没检测到,

After the matching is completed, update the correction, and at the same time update whether the state exceeds the third frame, it needs to be set to 2 to become the confirmation state:

# 更新当前track的一些变量
        for track_idx, detection_idx in matches:  # matches是当前跟踪帧和检测帧的匹配结果
            self.tracks[track_idx].update(
                self.kf, detections[detection_idx])

Read the code carefully, it is the same as the formula 

    def update(self, kf, detection):
        """Perform Kalman filter measurement update step and update the feature
        cache.

        Parameters
        ----------
        kf : kalman_filter.KalmanFilter
            The Kalman filter.
        detection : Detection
            The associated detection.
        """
        # !!!!!更新了当前track的一些变量!!!!!这些值的更新,track的值也跟着更新,怎样才能一步一步往下进行预测
        # ************************************************************************
        # ************这里的每个变量都是针对一个track而言的。***************************
        # ************************************************************************
        self.mean, self.covariance = kf.update(  # 当前的状态值self.mean, self.covariance,当前的框detection(传感器得到的,或者是观测值)
            self.mean, self.covariance, detection.to_xyah())
        self.features.append(detection.feature)  # 保存这个track的每一帧的特征,以后要对比的,最多存100个,好像没说
        self.hits += 1  # 命中次数,每一个track ID都要做在三条公式的更新(他们都有自己的self.hits,)计算自己的帧数是不是到了第三帧了
        self.time_since_update = 0  # 清.零表示当前帧是最新的更新帧。,也就是这个目标刚更新过,数值越大表示距离上次更新越久
        if self.state == TrackState.Tentative and self.hits >= self._n_init:  # self._n_init用来判断当前更新是不是第三帧以上
            self.state = TrackState.Confirmed  # 是第三帧就改变state标志位

kf.update function

    def update(self, mean, covariance, measurement):
        """Run Kalman filter correction step.

        Parameters
        ----------
        mean : ndarray
            The predicted state's mean vector (8 dimensional).
        covariance : ndarray
            The state's covariance matrix (8x8 dimensional).
        measurement : ndarray
            The 4 dimensional measurement vector (x, y, a, h), where (x, y)
            is the center position, a the aspect ratio, and h the height of the
            bounding box.

        Returns
        -------
        (ndarray, ndarray)
            Returns the measurement-corrected state distribution.

        """

        projected_mean, projected_cov = self.project(mean, covariance)  # 映射:一个是为了计算增益,一个是为了计算新的估计值
        # 下面的计算公式,是ppt那边的公式,证明可能公式的由来可能有点难,这里直接拿公式来用
        chol_factor, lower = scipy.linalg.cho_factor(  # 矩阵分解
            projected_cov, lower=True, check_finite=False)
        # 卡尔曼增益。卡尔曼增益表示预测状态和观测之间的关系,用于根据观测值来调整预测状态。
        kalman_gain = scipy.linalg.cho_solve(  # 计算卡尔曼增益
            (chol_factor, lower), np.dot(covariance, self._update_mat.T).T,
            check_finite=False).T  # measurement是传感器的观测值。是更新公式里面的yk,C是projected映射过来的,projected_mean是xk
        # 即观测值与预测状态之间的差异
        innovation = measurement - projected_mean  # y=z-Hx',z为detection的均值向量,不包含速度变化值,

        # z=[cx,cy,r,h],H称为测量矩阵,它将track的均值向量x'映射到检测空间,该公式计算detection和track的均值误差。
        # 计算出更新后的状态估计(均值向量和协方差矩阵)。
        new_mean = mean + np.dot(innovation, kalman_gain.T)  # 新的均值向量x=x'+Ky=x'+k(z-Hx') ppt中的公式
        new_covariance = covariance - np.linalg.multi_dot((  # 新的协方差向量P=P'-KHK^T
            kalman_gain, projected_cov, kalman_gain.T)) #covariance + innovation_cov = projected_cov  #HP'H^T+R
        # 计算出更新后的状态估计(均值向量和协方差矩阵)。
        return new_mean, new_covariance

delete. Add track

In the second frame, the track was just generated in the first frame, but there is no matching detection object in the second frame, so delete it. This may mean that the detection of this target is unstable or that the initialization of the system is not accurate.

 如果一个跟踪目标在有了track后,但是下一帧没有匹配到任何检测对象,这可能意味着该目标的检测结果不稳定或者系统的初始化不准确。
        # 总之,对于一个新建的track来说:(删除在前三帧中没有匹配到的目标)和(连续超过70帧)没有匹配到的目标可以帮助保持目标跟踪系统的准确性、效率和稳定性。
        # 如果它是超过了三帧已确认的话,那就只能判断有没有超过70帧了
        for track_idx in unmatched_tracks:
            self.tracks[track_idx].mark_missed()


    def mark_missed(self):
        """Mark this track as missed (no association at the current time step).
        """
        # 为什么要判断TrackState.Tentative这个??还不懂,懂了,因为大于3帧的时候state就是2了,然后只能判断是不是70帧都没有进行更新了
        if self.state == TrackState.Tentative:
            self.state = TrackState.Deleted
        elif self.time_since_update > self._max_age:
            self.state = TrackState.Deleted

Before the third frame, the function cannot perform real matching, but the feature and target of the saved track must be stored, up to 100

# 保存特征到self.samples = {}里面,为了级联匹配的计算
    def partial_fit(self, features, targets, active_targets):
        """Update the distance metric with new data.

        Parameters
        ----------
        features : ndarray
            An NxM matrix of N features of dimensionality M.
        targets : ndarray
            An integer array of associated target identities.
        active_targets : List[int]
            A list of targets that are currently present in the scene.

        example:
            假设 self.budget = 3,self.samples[target] 是一个包含 6 个元素的列表 [1, 2, 3, 4, 5, 6]。
            如果使用 self.samples[target][-self.budget:],则取出最近的 3 个元素,即 [4, 5, 6]。
            如果使用 self.samples[target][:self.budget],则取出最早的 3 个元素,即 [1, 2, 3]。
        """
        # 将特征和跟踪目标的ID配对,并将它们存储在字典 self.samples 中。每个目标都对应一个特征列表,其中存储了该目标在不同时间步的特征。
        for feature, target in zip(features, targets):
            self.samples.setdefault(target, []).append(feature)
            if self.budget is not None:
                self.samples[target] = self.samples[target][-self.budget:]  # 这里才是才是100个,因为一个track存100个512的特征已经很多了
        # 只保留与活跃目标相关的特征列表。避免存储过多的无效信息。并以 k 作为键,将 self.samples[k] 的值作为对应的值,构建一个新的字典。
        # 这里不是存100个,可以存无限个,上面才是100个,因为一个track存100个512的特征已经很多了
        self.samples = {k: self.samples[k] for k in active_targets}

The third frame is the same as the second frame

The fourth frame starts (cascade matching can be done)

Because at the end of the third frame, some variables of the track will be updated, so some variables will be updated, including the number of frames, the last frame

Time_since_update is set to 0 to indicate that the previous frame has been updated, etc.

predict

The previous initialization is the same as the first, second and third frames, and the prediction is also the same as the second frame. First, predict the prior self.mean, self.covariance, and update the variables self.age += 1 self.time_since_update += 1

renew

match

matches, unmatched_tracks, unmatched_detections = \
            self._match(detections)

If the track at this time has matches in the first three frames, append it to confirmed_tracks.append(i)

# 跟踪对象状态判断 确认还是未确认
        confirmed_tracks = []
        unconfirmed_tracks = []

        for i, t in enumerate(self.tracks):
            if t.is_confirmed():  # 一个新建的track,必须经过三次更新才会得到self.state = TrackState.Confirmed,也就是需要三帧
                confirmed_tracks.append(i)
            else:
                unconfirmed_tracks.append(i)

cascading match

self.tracks is all self.tracks before this frame including newly initialized ones that have exceeded three frames.
detections is the target measured by yolo detection in the current frame (including its coordinates, confidence, and human characteristics)
confirmed_tracks is a track that exceeds three frames
# 级联匹配: 将确认的track进行级联匹配,筛选出mathch的
        matches_a, unmatched_tracks_a, unmatched_detections = \
            linear_assignment.matching_cascade(
                gated_metric, self.metric.matching_threshold, self.max_age,
                self.tracks, detections, confirmed_tracks)
Cascade matching: 1. Appearance information (128-dimensional features) 2. Motion information (position of track predicted based on Kalman filter)

Because it will be deleted after more than 70 frames, so the first thing is to ensure that the tracks within 70 frames can be cascaded. For example, the 65th frame does not match the detection object, but it may be matched to the 66th frame. That’s what it means

So to traverse 70 times

 for level in range(cascade_depth):

        if len(unmatched_detections) == 0:
            break
        # 使得一些可能由于目标形变、遮挡或检测器误检等原因导致匹配困难的目标有更多的机会进行匹配。
        track_indices_l = [        #tracks[k].time_since_update == 1是优先遍历上一次更新的了track
            k for k in track_indices  # tracks就是self.track
            if tracks[k].time_since_update == 1 + level
        ]
        # 如果在当前级联匹配的层级中,没有任何跟踪目标需要进行匹配,可以跳过当前层级的匹配过程,继续处理下一个层级的匹配
        if len(track_indices_l) == 0:
            continue
        # ppt第28页。这里只是计算前70帧大家的匹配结果,真正判断是在mark_missed这里判断,反正如果是三帧以上了,以后每次都匹配不到,
        # 那么每一帧他都会+1,直到70帧他就会被mark_missed删除的了
        matches_l, _, unmatched_detections = \
            min_cost_matching(
                distance_metric, max_distance, tracks, detections,
                track_indices_l, unmatched_detections)  # unmatched_detections检测对象,track_indices_l优先遍历的track
        matches += matches_l
    # 计算未匹配的跟踪目标索引列表 set(track_indices)获取所有跟踪目标索引的集合➖set(k for k, _ in matches)获取已匹配的跟踪目标索引集合
    unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
    # unmatched_tracks表示剩下几个跟踪对象没有匹配成功,unmatched_detections表示剩下几个检测对象没有匹配成功,
    # 守门员应该是unmatched_tracks(之前与跟踪结果 ,但后来消失了一会)
    return matches, unmatched_tracks, unmatched_detections

True cascade matching min_cost_matching

Let me state it first: the track here has been confirmed, so be sure to remember it, it’s very important!
Tracks is the confirmed track, detections is the frame of yolo, track_indices_l is the track that is traversed first, because there are 70 levels, the less the number of frames is lost, the earlier the match, unmatched_detections detects the object, I don’t know why it is called unmatched_detections, called matched_detections is better understood
 matches_l, _, unmatched_detections = \
            min_cost_matching(
                distance_metric, max_distance, tracks, detections,
                track_indices_l, unmatched_detections)  # unmatched_detections检测对象,track_indices_l优先遍历的track

cascade matching function

Take the values ​​estimated by Kalman filter, that is, the variables of the track that are updated every frame, match and filter with the yolo detection, and limit some useless target frames through the threshold. The purpose of this is to filter out those that are related to the current state. Distribute inconsistent correlations to improve correlation accuracy and reliability.

Construct a cost matrix. The first and second frames here are both 16, so it is a 16*16 cost matrix. When calculating the cascade matching filter in the first step, distance_metric calculates the target feature vector and the detection feature vector. the distance between
mean: mean vector of state distribution (8 dimensions). covariance: The covariance matrix of the state distribution (8x8 dimensions). measurements only_position: optional parameter, if True, only consider the center position of the bounding box when calculating the distance. Default is False.
#  #这段代码是一个目标跟踪算法的一部分。它的目的是计算目标和特征之间的相似性度量,并生成一个代价矩阵(cost_matrix),用于表示目标与特征之间的关联代价。
        def gated_metric(tracks, dets, track_indices, detection_indices):
            # 从检测到的物体中提取特征(features)和目标跟踪器的ID(targets)。特征可以是物体的视觉特征,例如图像中的颜色、纹理、形状等。目标跟踪器的ID是唯一标识跟踪器的值。
            features = np.array([dets[i].feature for i in detection_indices])
            targets = np.array([tracks[i].track_id for i in track_indices])
            # 通过计算目标和特征之间的距离或相似性,可以得到一个初始的关联矩阵,其中较小的值表示更可能的匹配。
            # 然后,通过进一步的处理(例如卡尔曼滤波器进行无效化操作),可以对关联进行筛选和调整,以提高匹配的准确性和可靠性。
            cost_matrix = self.metric.distance(features, targets)
            # 通过调用 linear_assignment.gate_cost_matrix 函数,使用卡尔曼滤波器的状态分布来无效化代价矩阵中的不可行条目
            cost_matrix = linear_assignment.gate_cost_matrix(
                self.kf, cost_matrix, tracks, dets, track_indices,
                detection_indices)

            return cost_matrix

 cost_matrix is ​​the matrix of iou distance calculated by distance_metric (16*16), (1-iou is used)

        The first part: In the cascade matching filtering, the value greater than max_distance (0.2) is assigned (the Mahalanobis distance, the large one can be as large as 100000, the small one is about 0.01) is 0.20001, which is a little larger. The second part: filtering greater than max_distance in iou (0.7) is assigned a value of 7.00001, which is a little larger

The index of the optimal matching result is obtained by calculating the distance

def min_cost_matching(
        distance_metric, max_distance, tracks, detections, track_indices=None,
        detection_indices=None):
 
    if track_indices is None:
        track_indices = np.arange(len(tracks))
    if detection_indices is None:
        detection_indices = np.arange(len(detections))

    # 构建代价矩阵,我这里第一帧和第二帧都是16,所以是16*16的代价矩阵
    if len(detection_indices) == 0 or len(track_indices) == 0:
        return [], track_indices, detection_indices  # Nothing to match.
    # 在第一步计算级联匹配筛选的时候,distance_metric计算出来的是目标特征向量和检测特征向量之间的距离。第二部计算iou筛选的时候,
    cost_matrix = distance_metric(#cost_matrix是通过distance_metric计算得到的iou距离的矩阵(16*16),(使用了1-iou)
        tracks, detections, track_indices, detection_indices)
    # 第一部分:级联匹配筛选中,大于max_distance(0.2)赋值(马氏距离,大的可以大到100000,小的有0.01左右)为0.20001,就大一点点,第二部分:在iou筛选大于max_distance(0.7)赋值为7.00001,就大一点点
    cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5
    # 匈牙利算法或者KM算法匹配:也称为二分图中的最小权重匹配问题,解决最小权重匹配问题,并返回最优分配的行索引和列索引。根据实际问题的要求,你可以选择最小化或最大化目标函数,通过设置参数maximize=True来计算最大权重匹配问题。
    row_indices, col_indices = linear_assignment(cost_matrix)  # !!!!cost_matrix是一个二维矩阵,表示跟踪结果和检测结果之间的相似性或匹配程度。

   Rows and columns are traversed separately to find tracking objects and detection objects that match successfully and fail to match. The row index corresponds to the tracking object, and the column index corresponds to the detection object. These indexes allow you to determine which tracking and detection objects are successfully matched.

 # 遍历检测对象.....行和列分开遍历是为了找到匹配成功和匹配失败的跟踪对象和检测对象。行索引对应跟踪对象,列索引对应检测对象。通过这些索引,可以确定哪些跟踪对象和检测对象成功匹配。
    # 假设有15个检测对象,10个跟踪对象 经过匹配算法后其中5个跟踪对象成功匹配到了相应的检测对象,那么现在unmatched_detections就是10
    for col, detection_idx in enumerate(detection_indices):
        # 如果跟踪目标(好像是状态转移矩阵的出来的)和检测出来对象没有匹配,那么这大概率是个新目标或者上一帧没有被跟踪到的目标。则加入到没有匹配成功的检测对象
        # 可能的情况:新目标(因为新目标没有对应的target),跟踪丢失,目标离开视野(跟踪的目标可能在当前帧中已经离开了摄像机的视野范围,因此无法被检测到。)
        # 检测器失败(在某些情况下,检测器可能无法正确检测目标,例如目标形变、低分辨率图像或运动模糊等。) 检测器误检
        if col not in col_indices:
            unmatched_detections.append(detection_idx)

    # 那么unmatched_tracks是5
    for row, track_idx in enumerate(track_indices):
        # 之前跟踪了,但是这次消失的。足球守门员
        # 对于跟踪结果没有匹配到任何检测结果的情况,可以理解为检测结果没有与之对应的跟踪结果。在这种情况下,可能发生遮挡、
        # 检测器误检、目标离开视野或跟踪器失败等情况,导致检测结果无法与之前的跟踪结果进行匹配。
        if row not in row_indices:
            unmatched_tracks.append(track_idx)
    # 又重新遍历row_indices,col_indices???不是重新遍历了,可能存在误判,第四帧有这种情况,误判后的结果肯定是 大于max_distance的,也要加入unmatched中
    for row, col in zip(row_indices, col_indices):  # 全都是匹配到了的。而且小于阈值的了
        track_idx = track_indices[row]
        detection_idx = detection_indices[col]

        if cost_matrix[row, col] > max_distance:
            unmatched_tracks.append(track_idx)  # 这里是行号,好像一定要行号,因为上面构建矩阵的时候已经定死了
            unmatched_detections.append(detection_idx)  # 这里是列号
        else:
            matches.append((track_idx, detection_idx))  # 其他符合的,就加入匹配成功matches

    # 三个返回值:matches:第二批领导iou(小于0.7的), 如果这里unmatched_detections还有值的话,那就是新目标,需要初始化track
    return matches, unmatched_tracks, unmatched_detections

After cascade matching, IOU screening is performed again. The unmatched_tracks_a here must be a confirmed track. This part of the tracking target was updated in the previous frame.

There are undetermined ones in iou_track_candidates. For a newly created track: (delete the targets that are not matched in the first three frames) and unmatched_tracks_a to see if it exceeds 70 frames and needs to be deleted.

Regardless of the distance or the cascaded iou, min_cost_matching will be executed for filtering. The first part: in the cascade matching filtering, the value greater than max_distance (0.2) (the Mahalanobis distance can be as large as 100000, and the small one is about 0.01) is 0.20001, just a little bigger, part two: filtering in iou is greater than max_distance (0.7) and assign a value of 7.00001, just a little bigger

def min_cost_matching(
        distance_metric, max_distance, tracks, detections, track_indices=None,
        detection_indices=None):
   
    if track_indices is None:
        track_indices = np.arange(len(tracks))
    if detection_indices is None:
        detection_indices = np.arange(len(detections))

    # 构建代价矩阵,我这里第一帧和第二帧都是16,所以是16*16的代价矩阵
    if len(detection_indices) == 0 or len(track_indices) == 0:
        return [], track_indices, detection_indices  # Nothing to match.
    # 在第一步计算级联匹配筛选的时候,distance_metric计算出来的是目标特征向量和检测特征向量之间的距离。第二部计算iou筛选的时候,
    cost_matrix = distance_metric(#cost_matrix是通过distance_metric计算得到的iou距离的矩阵(16*16),(使用了1-iou)
        tracks, detections, track_indices, detection_indices)
    # 第一部分:级联匹配筛选中,大于max_distance(0.2)赋值(马氏距离,大的可以大到100000,小的有0.01左右)为0.20001,就大一点点,第二部分:在iou筛选大于max_distance(0.7)赋值为7.00001,就大一点点
    cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5
    # 匈牙利算法或者KM算法匹配:也称为二分图中的最小权重匹配问题,解决最小权重匹配问题,并返回最优分配的行索引和列索引。根据实际问题的要求,你可以选择最小化或最大化目标函数,通过设置参数maximize=True来计算最大权重匹配问题。
    row_indices, col_indices = linear_assignment(cost_matrix)  # !!!!cost_matrix是一个二维矩阵,表示跟踪结果和检测结果之间的相似性或匹配程度。

    matches, unmatched_tracks, unmatched_detections = [], [], []

    # 遍历检测对象.....行和列分开遍历是为了找到匹配成功和匹配失败的跟踪对象和检测对象。行索引对应跟踪对象,列索引对应检测对象。通过这些索引,可以确定哪些跟踪对象和检测对象成功匹配。
    # 假设有15个检测对象,10个跟踪对象 经过匹配算法后其中5个跟踪对象成功匹配到了相应的检测对象,那么现在unmatched_detections就是10
    for col, detection_idx in enumerate(detection_indices):
        # 如果跟踪目标(好像是状态转移矩阵的出来的)和检测出来对象没有匹配,那么这大概率是个新目标或者上一帧没有被跟踪到的目标。则加入到没有匹配成功的检测对象
        # 可能的情况:新目标(因为新目标没有对应的target),跟踪丢失,目标离开视野(跟踪的目标可能在当前帧中已经离开了摄像机的视野范围,因此无法被检测到。)
        # 检测器失败(在某些情况下,检测器可能无法正确检测目标,例如目标形变、低分辨率图像或运动模糊等。) 检测器误检
        if col not in col_indices:
            unmatched_detections.append(detection_idx)

    # 那么unmatched_tracks是5
    for row, track_idx in enumerate(track_indices):
        # 之前跟踪了,但是这次消失的。足球守门员
        # 对于跟踪结果没有匹配到任何检测结果的情况,可以理解为检测结果没有与之对应的跟踪结果。在这种情况下,可能发生遮挡、
        # 检测器误检、目标离开视野或跟踪器失败等情况,导致检测结果无法与之前的跟踪结果进行匹配。
        if row not in row_indices:
            unmatched_tracks.append(track_idx)
    # 又重新遍历row_indices,col_indices???不是重新遍历了,可能存在误判,第四帧有这种情况,误判后的结果肯定是 大于max_distance的,也要加入unmatched中
    for row, col in zip(row_indices, col_indices):  # 全都是匹配到了的。而且小于阈值的了
        track_idx = track_indices[row]
        detection_idx = detection_indices[col]

        if cost_matrix[row, col] > max_distance:
            unmatched_tracks.append(track_idx)  # 这里是行号,好像一定要行号,因为上面构建矩阵的时候已经定死了
            unmatched_detections.append(detection_idx)  # 这里是列号
        else:
            matches.append((track_idx, detection_idx))  # 其他符合的,就加入匹配成功matches

    # 三个返回值:matches:第二批领导iou(小于0.7的), 如果这里unmatched_detections还有值的话,那就是新目标,需要初始化track
    return matches, unmatched_tracks, unmatched_detections

       The tracks that were not deleted in the end are stored in features and targets.

self.metric.partial_fit(np.asarray(features), np.asarray(targets), active_targets)

This is ReID data, used for pedestrian re-identification.

ReID feature matching and sports matching

Obtain Reid's cost matrix, but it is not enough. We also need motion matching linear_assignment.gate_cost_matrix

 def distance(self, features, targets):

        cost_matrix = np.zeros((len(targets), len(features)))
        for i, target in enumerate(targets):
            cost_matrix[i, :] = self._metric(self.samples[target], features)#找到当前目标的target和特征
        return cost_matrix
def _nn_cosine_distance(x, y):
    # 对特征进行计算余弦相似度,返回离它最近的
    distances = _cosine_distance(x, y)
    return distances.min(axis=0)
    def _match(self, detections):

        def gated_metric(tracks, dets, track_indices, detection_indices):
            # 从检测到的物体中提取特征(features)和目标跟踪器的ID(targets)。特征可以是物体的视觉特征,例如图像中的颜色、纹理、形状等。目标跟踪器的ID是唯一标识跟踪器的值。
            features = np.array([dets[i].feature for i in detection_indices])
            targets = np.array([tracks[i].track_id for i in track_indices])
            #得到Reid的代价矩阵
            cost_matrix = self.metric.distance(features, targets)
            # 得到特征的代价矩阵再一步进行运动匹配(与估计值与测量值进行匹配)
            cost_matrix = linear_assignment.gate_cost_matrix(
                self.kf, cost_matrix, tracks, dets, track_indices,
                detection_indices)

            return cost_matrix

Sports matching below

# 这个函数会计算目标跟踪器的状态分布与测量之间的马氏距离(Mahalanobis distance)
def gate_cost_matrix(
        kf, cost_matrix, tracks, detections, track_indices, detection_indices,
        gated_cost=INFTY_COST, only_position=False):

    gating_dim = 2 if only_position else 4
    gating_threshold = kalman_filter.chi2inv95[gating_dim]
    measurements = np.asarray(
        [detections[i].to_xyah() for i in detection_indices])
    for row, track_idx in enumerate(track_indices):#对于每一个track都要计算它跟detection之间的
        track = tracks[track_idx]
        # 计算状态分布与测量之间的马氏距离(Mahalanobis distance)。并将超过门限值的马氏距离对应的代价矩阵条目设置为gated_cost。
        # 这样做的目的是过滤掉那些与当前状态分布不一致的关联,以提高关联的准确性和可靠性。
        gating_distance = kf.gating_distance(  # gating_distance方法用于计算状态分布与一组测量之间的门限距离(gating distance)。
            track.mean, track.covariance, measurements, only_position)
        cost_matrix[row, gating_distance > gating_threshold] = gated_cost#gating_threshold阈值,
    return cost_matrix  # 最后,返回经过无效化操作后的代价矩阵,该矩阵将用于进一步的关联和匹配过程。
    # mean:状态分布的均值向量(8维)。
    # covariance:状态分布的协方差矩阵(8x8维)。
    # measurements:包含N个测量值的Nx4维矩阵,每个测量值的格式为(x, y, a, h),其中(x, y)是边界框的中心位置,a是宽高比,h是高度。
    # only_position:可选参数,如果为True,则仅在计算距离时考虑边界框的中心位置。默认为False。

After passing the Hungarian algorithm, perform iou screening

# 匈牙利的门限,因为他会尽可能的匹配多个物体
        matches_b, unmatched_tracks_b, unmatched_detections = \
            linear_assignment.min_cost_matching(
                iou_matching.iou_cost, self.max_iou_distance, self.tracks, #第二帧:因为iou_track_candidates是unmatched_detections生成的track,所以第二帧全部都匹配上了,这一点知道就行,不那么zhongyao要
                detections, iou_track_candidates, unmatched_detections)  # 因为 匈牙利匹配会尽可能的多的去匹配,所以要设置一个最大距离,超过就不要了
        # iou_track_candidates里面有些是未被确定的,对于一个新建的track来说:(删除在前三帧中没有匹配到的目标)和unmatched_tracks_a看看是不是超过了70帧,要删除
        matches = matches_a + matches_b
unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))
        return matches, unmatched_tracks, unmatched_detections#unmatched_tracks要么是已被确认但是超过70帧,要么是前三帧没检测到,

Kalman filter update diagram

deepsort update process

Effect demonstration:

 

history record

 

Complete code + UI interface

Videos, notes and codes, and comments have all been uploaded to the network disk and placed on the homepage as a top article

Guess you like

Origin blog.csdn.net/m0_56175815/article/details/131177488