Distribution Matching for Crowd Counting: Reading Notes

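The paper uses optimal transport (OT) to solve the crowd counting problem.

The training loss combines an OT loss, a count loss, and a total variation (TV) loss.
The authors also prove that the generalization error bound of the OT-based loss is tighter than those of the density map loss and the Bayesian Loss.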

OT

Consider two point sets $\mathcal{X}=\left\{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbb{R}^d\right\}_{i=1}^n$, $\mathcal{Y}=\left\{\mathbf{y}_j \mid \mathbf{y}_j \in \mathbb{R}^d\right\}_{j=1}^n$
and two measures $\boldsymbol{\mu},\boldsymbol{\nu}$ defined on them, with $\mathbf{1}_n^T \boldsymbol{\mu}=\mathbf{1}_n^T \boldsymbol{\nu}=1$

Let $c: \mathcal{X} \times \mathcal{Y} \mapsto \mathbb{R}_{+}$ be the cost function.
Construct the cost matrix $\mathbf{C}_{ij}=c\left(\mathbf{x}_i,\mathbf{y}_j\right)$
Definition (set of couplings): $\Gamma=\left\{\boldsymbol{\gamma} \in \mathbb{R}_{+}^{n \times n}: \boldsymbol{\gamma} \mathbf{1}=\boldsymbol{\mu},\ \boldsymbol{\gamma}^T \mathbf{1}=\boldsymbol{\nu}\right\}$

OT:
$\mathcal{W}(\boldsymbol{\mu}, \boldsymbol{\nu})=\min _{\boldsymbol{\gamma} \in \Gamma}\langle\mathbf{C}, \boldsymbol{\gamma}\rangle$

$\begin{aligned} \mathcal{W}(\boldsymbol{\mu}, \boldsymbol{\nu}) & =\max _{\boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{R}^n}\langle\boldsymbol{\alpha}, \boldsymbol{\mu}\rangle+\langle\boldsymbol{\beta}, \boldsymbol{\nu}\rangle\\ &\quad \text { s.t. } \alpha_i+\beta_j \leq c\left(\mathbf{x}_i, \mathbf{y}_j\right), \forall i, j \end{aligned}$
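As a sanity check on the primal and dual forms above, here is a small hand-built example in plain Python. The two-point setup, the values of `mu`, `nu`, `gamma`, `alpha`, `beta`, and all function names are my own illustration, not taken from the paper. The idea: a feasible coupling and a feasible dual pair that give the same objective value certify each other as optimal.

```python
def coupling_cost(C, gamma):
    """Primal objective <C, gamma>."""
    return sum(c * g for c_row, g_row in zip(C, gamma) for c, g in zip(c_row, g_row))


def is_coupling(gamma, mu, nu, tol=1e-12):
    """Check gamma >= 0, gamma 1 = mu and gamma^T 1 = nu."""
    nonneg = all(g >= 0 for row in gamma for g in row)
    rows_ok = all(abs(sum(row) - m) <= tol for row, m in zip(gamma, mu))
    cols_ok = all(abs(sum(col) - n) <= tol for col, n in zip(zip(*gamma), nu))
    return nonneg and rows_ok and cols_ok


def dual_value(alpha, beta, mu, nu):
    """Dual objective <alpha, mu> + <beta, nu>."""
    return sum(a * m for a, m in zip(alpha, mu)) + sum(b * n for b, n in zip(beta, nu))


def is_dual_feasible(alpha, beta, C, tol=1e-12):
    """Check alpha_i + beta_j <= C_ij for all i, j."""
    return all(a + b <= C[i][j] + tol
               for i, a in enumerate(alpha)
               for j, b in enumerate(beta))


# Two points at 0 and 1 on a line, squared-distance cost (toy example).
C = [[0.0, 1.0], [1.0, 0.0]]
mu = [0.5, 0.5]
nu = [0.25, 0.75]

gamma = [[0.25, 0.25], [0.0, 0.5]]   # move 0.25 of mass from point 0 to point 1
alpha, beta = [0.0, -1.0], [0.0, 1.0]

primal = coupling_cost(C, gamma)
dual = dual_value(alpha, beta, mu, nu)
print(primal, dual)
```

Both objectives agree here, so by weak duality neither can be improved.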

DM-Count

Let the predicted density map be $\hat{\mathbf{z}}\in\mathbb{R}_+^n$
and the ground-truth (gt) density map be $\mathbf{z}\in\mathbb{R}_+^n$

count loss

The count loss is needed because the OT loss is computed on normalized density maps, so it carries no information about the total count.

$\ell_C(\mathbf{z}, \hat{\mathbf{z}})=\left|\|\mathbf{z}\|_1-\|\hat{\mathbf{z}}\|_1\right|$

Since $\mathbf{z},\hat{\mathbf{z}}\ge 0$, this is $\ell_C(\mathbf{z}, \hat{\mathbf{z}})=\left|\sum_{i=1}^n \mathbf{z}_i-\sum_{i=1}^{n}\hat{\mathbf{z}}_i\right|$
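A tiny sketch of why the count term is needed, in plain Python. The helper names and the toy maps are my own: two maps with different total mass can have identical normalized versions, so only the count loss tells them apart.

```python
def normalize(z):
    """Divide by the L1 norm (entries assumed non-negative, not all zero)."""
    total = sum(z)
    return [v / total for v in z]


def count_loss(z, z_hat):
    """| ||z||_1 - ||z_hat||_1 | for non-negative maps."""
    return abs(sum(z) - sum(z_hat))


z = [0.0, 1.0, 0.0, 1.0]       # toy gt map: two annotated heads
z_hat = [0.0, 2.0, 0.0, 2.0]   # toy prediction: same shape, twice the mass

same_after_normalizing = normalize(z) == normalize(z_hat)
loss = count_loss(z, z_hat)
print(same_after_normalizing, loss)
```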

OT loss

$\ell_{O T}(\mathbf{z}, \hat{\mathbf{z}})=\mathcal{W}\left(\frac{\mathbf{z}}{\|\mathbf{z}\|_1}, \frac{\hat{\mathbf{z}}}{\|\hat{\mathbf{z}}\|_1}\right)=\left\langle\boldsymbol{\alpha}^*, \frac{\mathbf{z}}{\|\mathbf{z}\|_1}\right\rangle+\left\langle\boldsymbol{\beta}^*, \frac{\hat{\mathbf{z}}}{\|\hat{\mathbf{z}}\|_1}\right\rangle$
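where $\boldsymbol{\alpha}^*,\boldsymbol{\beta}^*$ are the optimal solution to the dual problem of OT.
The cost function is the squared Euclidean distance $c(\mathbf{z}(i), \hat{\mathbf{z}}(j))=\|\mathbf{z}(i)-\hat{\mathbf{z}}(j)\|_2^2$, where $\mathbf{z}(i)$ and $\hat{\mathbf{z}}(j)$ denote the 2D coordinates of locations $i$ and $j$.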

$\frac{\partial \ell_{O T}(\mathbf{z}, \hat{\mathbf{z}})}{\partial \hat{\mathbf{z}}}=\frac{\boldsymbol{\beta}^*}{\|\hat{\mathbf{z}}\|_1}-\frac{\left\langle\boldsymbol{\beta}^*, \hat{\mathbf{z}}\right\rangle}{\|\hat{\mathbf{z}}\|_1^2}$
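The gradient formula can be checked numerically. The sketch below (plain Python, my own function names and made-up numbers) holds $\boldsymbol{\beta}^*$ fixed, as the formula does, and compares the closed form against central finite differences of $\langle\boldsymbol{\beta}^*, \hat{\mathbf{z}}/\|\hat{\mathbf{z}}\|_1\rangle$. It uses the fact that $\|\hat{\mathbf{z}}\|_1$ is just the sum of entries for a non-negative map.

```python
def normalized_dual_term(beta, z_hat):
    """f(z_hat) = <beta, z_hat / ||z_hat||_1>, with beta held fixed."""
    s = sum(z_hat)
    return sum(b * z for b, z in zip(beta, z_hat)) / s


def ot_grad(beta, z_hat):
    """Closed form: beta / ||z_hat||_1 - <beta, z_hat> / ||z_hat||_1^2."""
    s = sum(z_hat)
    dot = sum(b * z for b, z in zip(beta, z_hat))
    return [b / s - dot / (s * s) for b in beta]


def finite_diff_grad(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


beta = [0.3, -0.2, 0.5]   # made-up stand-in for beta*
z_hat = [1.0, 2.0, 0.5]   # made-up positive prediction

analytic = ot_grad(beta, z_hat)
numeric = finite_diff_grad(lambda x: normalized_dual_term(beta, x), z_hat)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_err)
```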

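One thing to note is that in the released code, the OT loss is implemented as

$\ell_{O T}(\mathbf{z}, \hat{\mathbf{z}})= \left\langle \frac{\partial \ell_{O T}(\mathbf{z}, \hat{\mathbf{z}})}{\partial \hat{\mathbf{z}}}, \hat{\mathbf{z}}\right\rangle$

where the gradient term is treated as a constant, so differentiating this expression with respect to $\hat{\mathbf{z}}$ gives back the gradient above.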

https://github.com/cvlab-stonybrook/DM-Count/issues/29
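To see what this surrogate does, here is a sketch in plain Python under the gradient formula from these notes (function names and numbers are mine, and I have not matched it line by line against the repository). With the gradient `g` frozen, the surrogate $\langle g, \hat{\mathbf{z}}\rangle$ is linear in $\hat{\mathbf{z}}$, so its slope is exactly `g`. Its value, however, is zero up to rounding, because the normalized objective does not change when $\hat{\mathbf{z}}$ is rescaled. So the surrogate is useful for backpropagation but not as a reported OT distance. In an autograd framework the freezing would typically be done with something like `detach()`.

```python
def ot_grad(beta, z_hat):
    """beta / ||z_hat||_1 - <beta, z_hat> / ||z_hat||_1^2 (beta held fixed)."""
    s = sum(z_hat)
    dot = sum(b * z for b, z in zip(beta, z_hat))
    return [b / s - dot / (s * s) for b in beta]


def surrogate_loss(frozen_grad, z_hat):
    """<g, z_hat> with g treated as a constant."""
    return sum(g * z for g, z in zip(frozen_grad, z_hat))


beta = [0.3, -0.2, 0.5]   # made-up stand-in for beta*
z_hat = [1.0, 2.0, 0.5]   # made-up positive prediction

g = ot_grad(beta, z_hat)
value = surrogate_loss(g, z_hat)

# Bump one entry of z_hat while keeping g frozen: the slope equals g[1].
h = 1e-3
bumped = list(z_hat)
bumped[1] += h
slope = (surrogate_loss(g, bumped) - value) / h
print(value, slope, g[1])
```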

OT is solved with the most basic Sinkhorn algorithm (without log-domain stabilization).
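For reference, a minimal sketch of plain Sinkhorn iterations in pure Python. The regularization strength `eps`, the iteration count, and the toy problem are my own choices, not the repository's settings. The scalings `u`, `v` are updated directly rather than in the log domain, which is simple but can underflow when `eps` is small relative to the costs.

```python
import math


def sinkhorn(mu, nu, C, eps=0.1, n_iters=200):
    """Plain Sinkhorn for entropic OT; assumes strictly positive mu and nu."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps).
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Row scaling so that the row sums of diag(u) K diag(v) match mu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling so that the column sums match nu.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    gamma = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    # Dual potentials of the regularized problem (defined up to a constant shift).
    alpha = [eps * math.log(ui) for ui in u]
    beta = [eps * math.log(vj) for vj in v]
    return gamma, alpha, beta


# Toy problem: two points at 0 and 1, squared-distance cost.
C = [[0.0, 1.0], [1.0, 0.0]]
mu = [0.5, 0.5]
nu = [0.25, 0.75]

gamma, alpha, beta = sinkhorn(mu, nu, C)
cost = sum(C[i][j] * gamma[i][j] for i in range(2) for j in range(2))
row_sums = [sum(row) for row in gamma]
col_sums = [sum(col) for col in zip(*gamma)]
print(cost, row_sums, col_sums)
```

On this toy problem the unregularized optimum is 0.25 (move a quarter of the mass across unit squared distance), and the regularized transport cost lands very close to it.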

TV loss

The TV loss is mainly there to stabilize training.

$\ell_{T V}(\mathbf{z}, \hat{\mathbf{z}})=\left\|\frac{\mathbf{z}}{\|\mathbf{z}\|_1}-\frac{\hat{\mathbf{z}}}{\|\hat{\mathbf{z}}\|_1}\right\|_{T V}=\frac{1}{2}\left\|\frac{\mathbf{z}}{\|\mathbf{z}\|_1}-\frac{\hat{\mathbf{z}}}{\|\hat{\mathbf{z}}\|_1}\right\|_1$
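The TV term is simply half the L1 distance between the two normalized maps. A small sketch in plain Python, with my own function names and toy maps:

```python
def normalize(z):
    """Divide by the L1 norm (entries assumed non-negative, not all zero)."""
    total = sum(z)
    return [v / total for v in z]


def tv_loss(z, z_hat):
    """0.5 * || z/||z||_1 - z_hat/||z_hat||_1 ||_1."""
    p, q = normalize(z), normalize(z_hat)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


z = [0.0, 1.0, 0.0, 1.0]       # toy gt map: mass on two pixels
z_hat = [1.0, 1.0, 1.0, 1.0]   # toy prediction: mass spread uniformly

loss = tv_loss(z, z_hat)
print(loss)
```

The value lies in $[0, 1]$: it is 0 when the normalized maps coincide and 1 when their supports are disjoint.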

result


On UCF-QNRF:
Authors' model: MAE 85.76, MSE 150.34
My run (best_model_7.pth): MAE 89.24, MSE 155.59