A Generalized Loss Function for Crowd Counting and Localization reading notes

In short, the paper uses unbalanced optimal transport (UOT) to solve the crowd counting problem.

Code: https://github.com/jia-wan/GeneralizedLoss-Counting-Pytorch.git
My slightly modified fork: https://github.com/Nightmare4214/GeneralizedLoss-Counting-Pytorch.git

loss

Let the density map be $\mathcal{A} =\left\{\left(a_i, \mathbf{x}_i\right)\right\}_{i=1}^{n}$, where $a_i$ is the predicted density, $\mathbf{x}_i\in\mathbb{R}^2$ is the pixel coordinate, and $n$ is the number of pixels.
Let $\mathbf{a} = \left[a_i\right]_i$, that is, the density map flattened into a column vector.

The ground-truth dot map is $\mathcal{B}=\left\{\left(b_j,\mathbf{y}_j\right)\right\}_{j=1}^{m}$, where $\mathbf{y}_j$ is the coordinate of the $j$-th annotated point, $m$ is the number of annotated points, and $b_j$ is the number of people represented by that point.
The paper assumes $\mathbf{b}=\left[b_j\right]_j = \mathbf{1}_m$, that is, each point corresponds to exactly one person.

The UOT loss is defined as
$$\mathcal{L}_{\mathbf{C}}^{\tau}\left(\mathcal{A},\mathcal{B}\right) = \min_{\mathbf{P} \in\mathbb{R}_+^{n\times m}} \left\langle \mathbf{C},\mathbf{P}\right\rangle -\epsilon H\left(\mathbf{P}\right) + \tau D_1\left(\mathbf{P}\mathbf{1}_m|\mathbf{a}\right) +\tau D_2\left(\mathbf{P}^T\mathbf{1}_n|\mathbf{b}\right)$$
where $\mathbf{C}\in\mathbb{R}_+^{n\times m}$ is the transport cost matrix, and $C_{i,j}$ is the cost (distance) of moving density from $\mathbf{x}_i$ to $\mathbf{y}_j$.
$\mathbf{P}$ is the transport matrix, with marginals
$$\hat{\mathbf{a}} = \mathbf{P}\mathbf{1}_m, \quad \hat{\mathbf{b}}= \mathbf{P}^T\mathbf{1}_n$$

This loss has four parts.
The first part is the transport cost, which aims to move the predicted density map toward the ground-truth annotations.
The second part is the entropy regularization term, with $H\left(\mathbf{P}\right) = -\sum_{i,j}P_{i,j}\log P_{i,j}$, which controls the sparsity of $\mathbf{P}$. The larger $\epsilon$ is, the less sparse $\mathbf{P}$ becomes (it tends toward a uniform spread); the smaller $\epsilon$ is, the sparser $\mathbf{P}$ becomes.

The third part encourages $\hat{\mathbf{a}}$ to be close to $\mathbf{a}$.
The fourth part encourages $\hat{\mathbf{b}}$ to be close to $\mathbf{b}$.

In the paper, $D_1$ is the squared $L_2$ norm and $D_2$ is the $L_1$ norm.
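To make the four terms concrete, here is a minimal NumPy sketch that evaluates the objective for a fixed transport matrix. The function name `uot_objective` and the default values of `eps` and `tau` are illustrative assumptions, not taken from the repository, and the sketch only evaluates the objective rather than minimizing it.

```python
import numpy as np


def uot_objective(P, C, a, b, eps=0.01, tau=1.0):
    """Evaluate the UOT objective for a fixed plan P (hypothetical helper).

    eps and tau defaults are placeholders, not the repository's settings.
    """
    P = np.asarray(P, dtype=float)
    transport = np.sum(C * P)                    # <C, P>
    pos = P > 0                                  # convention: 0 * log 0 = 0
    entropy = -np.sum(P[pos] * np.log(P[pos]))   # H(P)
    a_hat = P.sum(axis=1)                        # P 1_m
    b_hat = P.sum(axis=0)                        # P^T 1_n
    d1 = np.sum((a_hat - a) ** 2)                # squared L2 for D_1
    d2 = np.sum(np.abs(b_hat - b))               # L1 for D_2
    return transport - eps * entropy + tau * d1 + tau * d2
```

Note that the entropy enters with a minus sign, so a more spread-out $\mathbf{P}$ lowers the objective when $\epsilon$ is large.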

cost matrix

$$C_{i,j} = e^{\frac{1}{\eta\left(x_i,y_j\right)}\left\|\mathbf{x}_i-\mathbf{y}_j\right\|_2}$$
Here the coordinates $\mathbf{x}_i,\mathbf{y}_j$ are normalized. Note, however, that in the code $\eta\left(x_i,y_j\right)$ is a constant, with a default value of $0.6$.
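With a constant $\eta$, the cost matrix reduces to an exponential of the pairwise Euclidean distance. The following is a small NumPy sketch under that assumption; `cost_matrix` is a hypothetical name, and the inputs are assumed to be already normalized coordinates.

```python
import numpy as np


def cost_matrix(x, y, eta=0.6):
    """C[i, j] = exp(||x_i - y_j||_2 / eta) with a constant eta (hypothetical helper).

    x: (n, 2) normalized pixel coordinates, y: (m, 2) normalized point coordinates.
    """
    diff = x[:, None, :] - y[None, :, :]      # (n, m, 2) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)      # (n, m) Euclidean distances
    return np.exp(dist / eta)
```

A pixel that coincides with an annotated point gets cost $e^0 = 1$, and the cost grows exponentially with distance.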

Solve

The problem is solved with the Sinkhorn algorithm:
$$\mathbf{P}=\operatorname{diag}(\mathbf{u})\,\mathbf{K}\,\operatorname{diag}(\mathbf{v}), \quad \mathbf{K}=\exp\left(-\mathbf{C}/\epsilon\right)$$
Here $D_1$ and $D_2$ are approximated by the KL divergence, which amounts to treating them as smooth functions, giving the updates
$$\mathbf{u}^{(\ell+1)}=\left(\frac{\mathbf{a}}{\mathbf{K}\mathbf{v}^{(\ell)}}\right)^{\frac{\tau}{\tau+\epsilon}}, \quad \mathbf{v}^{(\ell+1)}=\left(\frac{\mathbf{b}}{\mathbf{K}^{\top}\mathbf{u}^{(\ell+1)}}\right)^{\frac{\tau}{\tau+\epsilon}}$$

(In fact, even for the KL divergence, other implementations do not seem to be written this way.)
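The two updates above can be written directly in NumPy. This is a plain (non log-domain) sketch for small, well-scaled problems, not the repository's solver; `unbalanced_sinkhorn`, the default `eps`, `tau` and `n_iters` are assumptions, and `a` and `b` are assumed strictly positive.

```python
import numpy as np


def unbalanced_sinkhorn(C, a, b, eps=0.05, tau=1.0, n_iters=200):
    """Iterate the u/v updates and return P = diag(u) K diag(v) (illustrative)."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    power = tau / (tau + eps)             # exponent tau / (tau + eps)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]
```

As $\tau$ grows, the exponent approaches 1 and the updates approach those of balanced Sinkhorn; for small $\epsilon$ the kernel `K` underflows, which is why a log-domain variant is preferable in practice.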

code

data set

preprocessing

The dataset used is UCF-QNRF.

Preprocessing:
1. Rescale the image so that the smaller of $h, w$ lies in the range $\left[512,2048\right]$; the other side is scaled by the same ratio.
2. Filter out points that are not inside the image.
3. Additionally compute the distance from each point to every other point. Specifically,
$$\mathbf{P} = \begin{pmatrix} \mathbf{p}_1^T\\ \mathbf{p}_2^T\\ \vdots\\ \mathbf{p}_m^T \end{pmatrix},\quad \mathbf{p}_i\in\mathbb{R}^2$$
$$\mathbf{dis} = \left[\left\|\mathbf{p}_i-\mathbf{p}_j\right\|\right]_{i,j}$$

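Finally, each row is partitioned in the same way as the pivot step of quicksort (`np.partition`), so that the element at index 3 (counting from 0) is the 4th smallest in the row.
The elements at indices $1,2,3$ (counting from 0) are then averaged. Because the first three positions are not guaranteed to be in sorted order, this is the mean of the 4th-smallest distance and two of the three smallest.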

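import numpy as np


def find_dis(point):
    """Return one neighbour-distance value per point, shape (m, 1); needs m >= 4."""
    a = point[:, None, :]  # (m, 1, 2)
    b = point[None, ...]   # (1, m, 2)
    dis = np.linalg.norm(a - b, ord=2, axis=-1)  # dis[i, j] = ||p_i - p_j||
    # np.partition places the 4th-smallest value at index 3; indices 0..2 hold
    # the three smallest values in unspecified order, so columns 1:4 contain
    # the 4th-smallest distance plus two of the three smallest.
    dis = np.mean(np.partition(dis, 3, axis=1)[:, 1:4], axis=1, keepdims=True)
    return dis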
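For comparison, here is a variant that fully sorts each row and averages exactly the three nearest neighbours (index 0 is the point itself, at distance 0). This is not what the repository does; `find_dis_sorted` is a hypothetical name, and the sketch is only meant to show how the partition-based version can differ.

```python
import numpy as np


def find_dis_sorted(point):
    """Mean distance to the three nearest other points, shape (m, 1); needs m >= 4."""
    diff = point[:, None, :] - point[None, :, :]   # (m, m, 2)
    dis = np.linalg.norm(diff, axis=-1)            # (m, m) pairwise distances
    dis_sorted = np.sort(dis, axis=1)              # column 0 is the self-distance 0
    return dis_sorted[:, 1:4].mean(axis=1, keepdims=True)
```

The full sort costs $O(m \log m)$ per row instead of the linear-time partition, which only matters for images with very many annotated points.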

The resulting label is therefore
$$\mathbf{P}=\left[\left(x_i,y_i,dis_i\right)\right]_i\in\mathbb{R}^{m\times 3}$$
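Step 1 of the preprocessing (the rescale) can be sketched in plain Python as follows. The function name `rescale_size` and the rounding behaviour are assumptions; only the $[512, 2048]$ range comes from the description above.

```python
def rescale_size(h, w, min_size=512, max_size=2048):
    """Clamp the shorter side into [min_size, max_size], keeping the aspect ratio.

    Returns the new (h, w) and the scaling ratio (hypothetical helper).
    """
    short = min(h, w)
    target = min(max(short, min_size), max_size)  # clamp the shorter side
    ratio = target / short
    return round(h * ratio), round(w * ratio), ratio
```

The annotated point coordinates would need to be multiplied by the same `ratio` so that they stay aligned with the resized image.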

Read data

Randomly crop the image to $\left(512,512\right)$, where $i, j$ are the coordinates of the top-left corner of the crop and $h=w=512$.

Then read the labels.
Use $dis$ to define a small rectangle around each point.
Compute the area of this rectangle that falls inside the crop window, and its ratio to $\frac{1}{4}$ of the rectangle's area.
If this ratio is greater than 0.3, keep the point; otherwise discard it.
Apart from that, the sample is randomly flipped horizontally.
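The overlap computation behind this filter is an intersection of two axis-aligned rectangles. Here is a minimal plain-Python sketch; `inner_area` is a hypothetical name, and how the rectangle is built from $dis$ and which reference area the ratio uses are deliberately left out, since the sketch only covers the intersection itself.

```python
def inner_area(crop, box):
    """Area of the intersection of two axis-aligned rectangles.

    Both rectangles are given as (left, top, right, bottom); hypothetical helper.
    """
    c_left, c_top, c_right, c_bottom = crop
    b_left, b_top, b_right, b_bottom = box
    inner_w = max(min(c_right, b_right) - max(c_left, b_left), 0.0)
    inner_h = max(min(c_bottom, b_bottom) - max(c_top, b_top), 0.0)
    return inner_w * inner_h
```

For a crop with top-left corner $(i, j)$ and $h=w=512$, the crop rectangle would be `(j, i, j + 512, i + 512)`; a point is then kept when the resulting ratio exceeds 0.3.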

Model

VGG19 + upsampling + two convolution layers + abs

train

Note that the Sinkhorn solver here uses the $\epsilon$-scaling heuristic, which lets it converge within 20 iterations. The log-domain formulation is also used.
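To illustrate both ideas together, here is a NumPy sketch of a log-domain unbalanced Sinkhorn with a simple geometric $\epsilon$ schedule. It is not the repository's implementation: the function names, the schedule (`eps_start`, `decay`, `iters_per_eps`, `final_iters`) and all defaults are assumptions, and `a` and `b` are assumed strictly positive. It works on the potentials $f = \epsilon\log\mathbf{u}$ and $g = \epsilon\log\mathbf{v}$, which are warm-started as $\epsilon$ shrinks.

```python
import numpy as np


def logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    z_max = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(z_max, axis=axis) + np.log(np.sum(np.exp(z - z_max), axis=axis))


def log_sinkhorn_eps_scaling(C, a, b, eps=0.01, tau=1.0, eps_start=1.0,
                             decay=0.5, iters_per_eps=5, final_iters=100):
    """Log-domain unbalanced Sinkhorn with epsilon scaling (illustrative sketch)."""
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a))                  # f = eps * log(u)
    g = np.zeros(len(b))                  # g = eps * log(v)
    cur = max(eps_start, eps)             # start from a large epsilon
    while True:
        n_iters = final_iters if cur == eps else iters_per_eps
        power = tau / (tau + cur)
        for _ in range(n_iters):
            # Log-domain form of u = (a / (K v))^power
            f = power * cur * (log_a - logsumexp((g[None, :] - C) / cur, axis=1))
            # Log-domain form of v = (b / (K^T u))^power
            g = power * cur * (log_b - logsumexp((f[:, None] - C) / cur, axis=0))
        if cur == eps:
            break
        cur = max(cur * decay, eps)       # shrink epsilon, keep f and g
    return np.exp((f[:, None] + g[None, :] - C) / eps)
```

Working with `logsumexp` avoids forming $\exp(-\mathbf{C}/\epsilon)$ explicitly, so large costs or a small $\epsilon$ do not produce divisions by an underflowed zero.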

result

Results of the model provided by the author: MAE 85.09911092883813, MSE 150.88815648865386
Results from my own run on UCF-QNRF: MAE 85.69232401590861, MSE 155.30853159819492

Origin blog.csdn.net/qq_39942341/article/details/131785574