Literature reading notes - Social Ways

Citation

Amirian J, Hayet J B, Pettre J. Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs[J]. 2019.

This paper builds on Social LSTM and Social GAN and improves on them further. It predicts pedestrian trajectories on the ETH and UCY datasets, which are recorded from an overhead (bird's-eye) view. Key contributions include:

  1. It introduces an attention mechanism that lets the model decide on its own how much attention to give to each piece of interaction information.
  2. It improves the model's ability to predict multiple plausible trajectories.
  3. It provides a small synthetic scene for verifying a model's multi-trajectory prediction ability, together with metrics for judging the quality of the generated trajectories.


Model Framework

As shown in the paper's framework figure, the basic framework is a GAN. Setting batching aside, the model predicts the trajectory of each pedestrian individually.

  • In the Generator, to predict for pedestrian \(i\), the model first encodes the observed trajectories of all pedestrians. It then uses the geometric and motion information between pedestrian \(i\) and the others to drive an attention mechanism, so that the model adaptively weights the interaction information from other pedestrians. Four inputs feed the Decoder: the trajectory encoding of pedestrian \(i\), the attention-pooled interaction information, noise, and the latent code (newly introduced, discussed later). The Decoder outputs the predicted trajectory of pedestrian \(i\).
  • In the Discriminator, each trajectory is judged as generated or real, and the result serves as the cost function for the Generator and the Discriminator.
  • Specifically, the framework is an InfoGAN. InfoGAN addresses the problem of steering the generated distribution by modifying a latent code, without supervision. Compared with a plain GAN, it emphasizes that the latent code controls what is generated; compared with cGAN, it emphasizes learning the latent categories in the data without labels. The network therefore adds two structures to the GAN: the Latent Code and the Information Loss.
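The four Decoder inputs listed in the Generator bullet can be sketched as a simple concatenation. This is only an illustration: the function name `build_decoder_input`, the vector sizes, and the sampling distributions for the noise and the latent code are assumptions, not details taken from the paper.

```python
import random

def build_decoder_input(encoding_i, pooled_interaction, noise_dim=4, code_dim=2, rng=random):
    """Concatenate the four Decoder inputs for one pedestrian.

    encoding_i:         encoded observed trajectory of pedestrian i (list of floats)
    pooled_interaction: attention-pooled encoding of the other pedestrians
    noise_dim/code_dim: hypothetical sizes chosen for this sketch
    """
    # Noise vector z (Gaussian is an assumption of this sketch).
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    # Latent code c (uniform prior is an assumption of this sketch).
    latent_code = [rng.uniform(-1.0, 1.0) for _ in range(code_dim)]
    decoder_input = list(encoding_i) + list(pooled_interaction) + noise + latent_code
    # The latent code is returned as well, because the Information Loss needs it later.
    return decoder_input, latent_code
```

Returning the sampled latent code alongside the concatenated vector keeps it available for the reconstruction step performed by \(Q\).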


Highlight 1 - Attention mechanism

The attention mechanism uses a Key-Value-Query formulation. It introduces hand-crafted features, motivated by human cognition, from which the model produces a different attention weight for each surrounding trajectory.

  • Key = Value = \(H_t\) (the encoded trajectories of all pedestrians other than the target pedestrian \(i\)).
  • Query: \(f^{ij}\), built from three geometric and motion features:
    • The Euclidean distance between \(i\) and \(j\).
    • The angle between the movement directions of \(i\) and \(j\).
    • The distance of closest approach: the smallest distance \(i\) and \(j\) would reach in the future if both kept their current motion.
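The three query features can be sketched as follows. This is a minimal illustration, assuming 2D positions and velocities and a constant-velocity extrapolation for the closest-approach term; the function name `interaction_features` is hypothetical, and the angle follows the description in these notes rather than a definition checked against the paper.

```python
import math

def interaction_features(pos_i, vel_i, pos_j, vel_j):
    """Return (distance, angle, closest-approach distance) for pedestrians i and j."""
    # Position and velocity of j relative to i.
    rx, ry = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]
    vx, vy = vel_j[0] - vel_i[0], vel_j[1] - vel_i[1]

    # 1) Euclidean distance between i and j.
    dist = math.hypot(rx, ry)

    # 2) Angle between the two movement directions, wrapped into [0, pi].
    diff = math.atan2(vel_j[1], vel_j[0]) - math.atan2(vel_i[1], vel_i[0])
    angle = abs(math.atan2(math.sin(diff), math.cos(diff)))

    # 3) Closest future distance if both keep their current velocity.
    speed_sq = vx * vx + vy * vy
    # Time of minimal separation, clamped so that only the future (t >= 0) counts.
    t_star = 0.0 if speed_sq == 0.0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    dca = math.hypot(rx + vx * t_star, ry + vy * t_star)
    return dist, angle, dca
```

For two pedestrians walking toward each other on parallel lines one unit apart, the third feature is that one-unit gap; for pedestrians already moving apart, it is simply their current distance.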

\[\sigma(f^{ik},h^k)={{N-1} \over \sqrt{d_\sigma}}\langle f^{ik},W_\sigma h^k\rangle\]

\[\alpha^{i,j}={exp(\sigma(f^{ij},h^j)) \over \sum_{k \neq i} exp(\sigma(f^{ik},h^k))}\]
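The score-then-softmax step above can be sketched in plain Python. This is a rough illustration rather than the paper's implementation: the function name `attention_pool` is hypothetical, the scaling constant is passed in as `scale` instead of being fixed, and the final weighted sum of neighbour encodings is the usual attention pooling step, assumed here.

```python
import math

def attention_pool(features, hidden, w_sigma, scale):
    """Attention over the neighbours of one target pedestrian.

    features: query vectors f^{ij}, one per neighbour j
    hidden:   encoded neighbour trajectories h^j, one per neighbour j
    w_sigma:  matrix (list of rows) mapping h^j into the feature space
    scale:    constant multiplying the inner product
    """
    scores = []
    for f, h in zip(features, hidden):
        # Project the neighbour encoding: W_sigma h^j.
        projected = [sum(w * x for w, x in zip(row, h)) for row in w_sigma]
        # Scaled inner product <f^{ij}, W_sigma h^j>.
        scores.append(scale * sum(a * b for a, b in zip(f, projected)))

    # Softmax over neighbours, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Attention-weighted sum of the neighbour encodings.
    pooled = [sum(a * h[k] for a, h in zip(alphas, hidden)) for k in range(len(hidden[0]))]
    return alphas, pooled
```

The weights always sum to one, so a neighbour whose features align more strongly with its projected encoding takes attention away from the others.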


Highlight 2 - InfoGAN

InfoGAN model interpretation: https://www.jiqizhixin.com/articles/2018-10-29-21

Model structure

Structurally, InfoGAN changes little relative to a GAN. Two things change in the model described above: first, a Latent Code is added to the input; second, the L2 loss used in SGAN is dropped, and a Discriminator sub-network \(Q\) is added to produce the Information Loss.

Introduction to the principles

  1. Motivation: ideally, after training, adjusting InfoGAN's latent code \(c\) changes the distribution of the generated output. However, a GAN has a high degree of freedom, so the network can easily ignore the latent code altogether, and the cost function must be adjusted to make the network take the latent code seriously. InfoGAN takes the mutual information \(I\) as the optimization target: the larger \(I\) is, the stronger the relationship between the latent code and the generated data:

    \[I(X;Y) = H(X) - H(X|Y)\]

    \[I(c;G(z,c))\]

  2. Restriction: computing \(I(c;G(z,c))\) requires the posterior probability of the latent code given the generated data \(x \sim G(z,c)\), which is very hard to obtain. An auxiliary distribution \(Q(c|x)\) is therefore used to approximate the posterior \(P(c|x)\). This leaves two things to learn: how well \(Q(c|x)\) fits, and how sensitive the Generator is to \(c\).

  3. Target: with \(X=c\) and \(Y=G(z,c)\), maximizing the mutual information \(I(c;G(z,c))\) amounts to maximizing \(-H(X|Y)\), but this term cannot be computed directly. Instead, \(E_{c \sim P(c),x \sim G(z,c)}[logQ(c|x)]\) is used as a lower bound on \(-H(X|Y)\), which turns maximizing the mutual information into maximizing a lower bound on it:

    \[E_{c \sim P(c),x \sim G(z,c)}[logQ(c|x)] + H(c) \leq I(c;G(z,c))\]

    After a further transformation (proven in the InfoGAN paper), the expression above is rewritten as:

    \[L_1(G,Q)=E_{x \sim G(z,c)}[E_{c' \sim P(c|x)}log(Q(c'|x))] + H(c)\]

    Here I do not quite understand why this further transformation is needed. What is its purpose?

    Finally, once this term is added to the GAN loss function, the overall optimization objective becomes:

    \[\min_G \max_D V_1(D,G) = V(D,G) - \mu L_1(G,Q)\]

  4. Implementation: in the Social Ways implementation, the loss is much less elaborate than the theory suggests. \(Q\) is really a latent code reconstructor, implemented as a fully connected neural network and trained together with the Discriminator. The Information Loss is simply the MSE between the latent code \(\hat c\) recovered by \(Q\) and the real latent code \(c\).
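The implementation note in item 4 can be sketched as a one-layer reconstructor followed by an MSE. This is a toy version under stated assumptions: the helper names `fully_connected` and `information_loss` are hypothetical, \(Q\) is reduced to a single linear layer, and its input is taken to be some feature vector extracted from the generated trajectory.

```python
def fully_connected(weights, bias, x):
    """One linear layer: returns weights @ x + bias, using plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def information_loss(q_weights, q_bias, trajectory_features, latent_code):
    """MSE between the reconstructed latent code c_hat and the true latent code c."""
    # Q recovers the latent code from features of the generated trajectory.
    c_hat = fully_connected(q_weights, q_bias, trajectory_features)
    # Mean squared error over the code dimensions.
    return sum((p - t) ** 2 for p, t in zip(c_hat, latent_code)) / len(latent_code)
```

The loss is zero only when the code can be read back exactly from the generated trajectory, which is what pushes the Generator to keep the latent code visible in its output.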


Highlight 3 - Synthetic scene for multi-trajectory prediction

An important reason for bringing GANs into trajectory prediction is to generate multiple trajectories (a distribution). To explore how well different types of GAN predict multiple trajectories, the paper builds a dedicated synthetic test scene (shown in the paper's figure):

  • Blue marks the observed trajectory; red marks the trajectory to be predicted.
  • Trajectories start from six directions, and each direction then splits into three specific branches.
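A scene with this six-by-three layout can be generated with a few lines of code. The sketch below only mirrors the description in the two bullets: the function name `make_toy_scene`, the circular start positions, the trajectory lengths, the step size, and the branching angle are all assumptions, not the paper's actual settings.

```python
import math

def make_toy_scene(n_directions=6, n_branches=3, obs_len=4, pred_len=4,
                   radius=8.0, step=1.0, spread=math.radians(20)):
    """Return a list of (observed, future) pairs; each trajectory is a list of (x, y)."""
    samples = []
    for d in range(n_directions):
        theta = 2 * math.pi * d / n_directions
        # Start on a circle and walk toward the centre during the observed part.
        start = (radius * math.cos(theta), radius * math.sin(theta))
        heading = theta + math.pi
        observed = [(start[0] + step * t * math.cos(heading),
                     start[1] + step * t * math.sin(heading)) for t in range(obs_len)]
        last = observed[-1]
        for b in range(n_branches):
            # Branch headings are spread symmetrically around the observed heading.
            offset = (b - (n_branches - 1) / 2) * spread
            future = [(last[0] + step * t * math.cos(heading + offset),
                       last[1] + step * t * math.sin(heading + offset))
                      for t in range(1, pred_len + 1)]
            samples.append((observed, future))
    return samples
```

Because the three branches of a direction share exactly the same observed part, a model that outputs a single prediction cannot cover them, which is what makes such a scene useful for testing multi-modal prediction.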

The paper's figure compares the predictions of the different baseline models at different numbers of training iterations. It shows that InfoGAN is effective at producing plausible multi-trajectory predictions and recovers the various possible trajectories within fewer iterations.

In addition, the paper uses two metrics, the 1-Nearest Neighbor classifier and the Earth Mover's Distance (EMD), to evaluate the quality of the generated trajectories against the real future trajectories:

  • For the 1-Nearest Neighbor classifier, the closer the accuracy is to 50%, the better.
  • For EMD, lower is better.
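Both metrics can be sketched for small sample sets. This is a minimal illustration under stated assumptions: trajectories are compared by the Euclidean distance between their flattened points, the 1-NN accuracy is computed leave-one-out, and the EMD is computed for two equal-sized sets with uniform weights by brute-force matching, which is only practical for a handful of samples. The function names are hypothetical.

```python
import itertools
import math

def traj_distance(a, b):
    """Euclidean distance between two equal-length trajectories of (x, y) points."""
    return math.sqrt(sum((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2
                         for pa, pb in zip(a, b)))

def one_nn_accuracy(real, fake):
    """Leave-one-out 1-NN accuracy at telling real (1) from generated (0) samples."""
    samples = [(t, 1) for t in real] + [(t, 0) for t in fake]
    correct = 0
    for idx, (traj, label) in enumerate(samples):
        # Nearest other sample decides the predicted label.
        nearest = min((j for j in range(len(samples)) if j != idx),
                      key=lambda j: traj_distance(traj, samples[j][0]))
        correct += samples[nearest][1] == label
    return correct / len(samples)

def emd_equal_sets(real, fake):
    """EMD between two equal-sized, uniformly weighted sets (brute-force matching)."""
    if len(real) != len(fake):
        raise ValueError("this sketch only handles equal-sized sets")
    n = len(real)
    best = min(sum(traj_distance(real[i], fake[p[i]]) for i in range(n))
               for p in itertools.permutations(range(n)))
    return best / n
```

An accuracy near 100% means the generated samples are easy to separate from the real ones, while an accuracy near 50% means the classifier cannot tell them apart, which matches the reading of the two bullets.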

Origin www.cnblogs.com/sinoyou/p/11512830.html