Literature reading report - Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

  1. Paper: Gupta A, Johnson J, Fei-Fei L, et al. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. 2018.
  2. Code: https://github.com/agrimgupta92/sgan


Overview

This paper presents a GAN-based trajectory prediction model: the Generator uses an Encoder-Decoder structure, and the Discriminator is built around an Encoder. Compared with existing models, it improves on the plausibility (social acceptability) of the predicted trajectories, their diversity, and prediction speed.

Problems addressed

  1. Socially compliant trajectories: the model focuses on whether the predicted trajectories respect social conventions; in qualitative evaluation it generates more reasonable paths than other models.
  2. Trajectory diversity: traditional evaluation with the ADE and FDE metrics makes quantitative comparison convenient, but it pushes models toward a single averaged prediction, which does not match real scenes where multiple future trajectories are plausible.
  3. Prediction speedup: Vanilla LSTM vs SGAN vs Social LSTM is roughly 56x vs 16x vs 1x, a significant improvement in speed.


Model innovation

  1. A new loss term, the Variety Loss: following the Minimum-over-N (MoN) idea, this loss encourages the Generator to produce multiple plausible paths. - diverse trajectories
  2. A new pooling module: pooling lets the LSTMs exchange information. Where Social LSTM pools at every step, SGAN changes this to pooling only once, at the end of the observed trajectory (though in the default experimental configuration pooling also runs at every prediction step), and it extends the pooling range from a fixed local neighborhood to all pedestrians globally. - socially compliant trajectories, faster prediction.
  3. Applying GAN to trajectory prediction as a sequence generation task: GANs have been widely used in vision, but rarely in sequence models such as those in natural language processing, mainly because the operation that passes the generator's output to the discriminator is non-differentiable.


Reading questions

  1. The article says the reason GANs are rarely applied to sequence generation is that the operation passing the generator's output to the discriminator is non-differentiable - why?
  2. The final output of the SGAN generator is a 2D coordinate trajectory obtained by passing the Decoder's hidden state through an MLP (multilayer perceptron), whereas Social LSTM predicts 2D coordinates from a bivariate Gaussian distribution parameterized by the hidden state. SGAN does not use this assumption because that method would not be differentiable during backpropagation - why?


SGAN overall model architecture

From GAN to cGAN

GAN, the Generative Adversarial Network, was proposed by Goodfellow et al. Its training maximizes a lower bound on the likelihood of the training data, which involves substantial mathematical derivation; the author does not go into those details here and only describes a few implementation-level features of GANs:

  1. Components of a GAN: a GAN is composed of a generator and a discriminator, but neither is required to be a neural network; they may be other mathematical models. A GAN is thus really a training framework whose concrete realization varies. In the Social GAN model in particular, both the generator and the discriminator are neural networks: the generator uses an Encoder-Decoder structure, the discriminator uses an Encoder, and the core parts of both are sequence models.

    \[\min_{G}\max_{D}V(G,D) = E_{x \sim p_{data}}[\log D(x)]+E_{z \sim p(z)}[\log(1- D(G(z)))]\]

  2. cGAN: in a plain GAN, the generator's output is based on a randomly initialized input vector (e.g., for an LSTM model, a randomly initialized hidden state). Here, however, the goal of the network is to generate a predicted trajectory conditioned on the known trajectory, so the generator's input must be synthesized from the existing information.

    The SGAN structure is shown in the figure below. To be more precise about the Generator: the real generator is the LSTM-based Decoder, while the Encoder and the Pooling Module before it actually act as preprocessing components that prepare the Decoder's initial hidden state.

  3. GAN training process: during training both the generator and the discriminator are optimized; at test time only the generator is used.

    1. In each iteration (epoch/iteration), the discriminator and the generator are trained for d_steps and g_steps steps respectively: each iteration first trains the discriminator alone for d_steps, then trains the generator alone.
    2. Training the discriminator: at each step, for the same known path segment, the discriminator receives the real trajectory from the dataset and the fake trajectory from the generator, scores both, and the adversarial loss is computed from its judgments of the two trajectories.
    3. Training the generator: at each step, the generator produces a fake trajectory from the known path segment and submits it to the discriminator; the adversarial loss is computed from the discriminator's judgment of the fake trajectory.
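The alternating schedule above can be sketched as a toy loop. This is only an illustration of the update order: the bodies are stubs standing in for the real forward/backward passes and optimizer steps in the sgan code.

```python
# Toy sketch of the alternating GAN training schedule:
# d_steps discriminator updates, then g_steps generator updates, per iteration.

def train_gan(num_iterations, d_steps, g_steps):
    log = []  # record the update order for inspection
    for it in range(num_iterations):
        for _ in range(d_steps):
            # discriminator sees the real trajectory and a generated one,
            # and is updated from the adversarial loss on both judgments
            log.append("D")
        for _ in range(g_steps):
            # generator produces a fake trajectory; its adversarial loss
            # comes from the discriminator's score of that fake trajectory
            log.append("G")
    return log

schedule = train_gan(num_iterations=2, d_steps=2, g_steps=1)
assert schedule == ["D", "D", "G", "D", "D", "G"]
```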


SGAN structure

Social GAN consists of a Generator and a Discriminator:

  1. Generator: the generator is composed of an Encoder, a Pooling Module, and a Decoder.

    1. The Encoder is implemented with an LSTM sequence model and encodes each pedestrian's observed trajectory. Its final hidden state \(h_{ei}^{t_{obs}}\) summarizes the information of the whole trajectory.

      \[e_i^t = \phi (x_i^t, y_i^t, W_{ee})\]

      \[h_{ei}^t = LSTM(h_{ei}^{t-1}, e_i^t;W_{encoder})\]

    2. The Pooling Module is implemented with max pooling and shares information among pedestrians. Its output \(c_i^t\) forms part of the Decoder's input.

      \[P_i = PM(h_{e1}^{t_{obs}},h_{e2}^{t_{obs}},h_{e3}^{t_{obs}}...)\]

      \[c_i^t = \gamma (P_i, h_{ei}^{t_{obs}};W_c)\]

      *\(\gamma(.)\) is an MLP with ReLU activations (fully connected layers with several hidden layers)

    3. The Decoder is implemented with an LSTM sequence model and generates the predicted trajectory. Unlike a typical LSTM, its initial hidden state is not random but the concatenation \(h_{di}^t = [c_i^t, z]\): the first part is the Pooling Module's output, the second is random noise added so that multiple trajectories can be generated. The Decoder can thus be viewed as a generator with a conditional input.

      Note: in the default experimental configuration, the Pooling Module pools at every step of the decoding stage.

      \[e_i^t = \phi(x_i^{t-1}, y_i^{t-1}, W_{ed})\]

      \[P_i = PM(h_{d1}^{t-1},...,h_{dn}^{t-1})\]

      \[h_{di}^t = LSTM(\gamma(P_i,h_{di}^{t-1}),e_i^t,W_{decoder})\]

      \[(\hat{x_i^t},\hat{y_i^t}) = \gamma(h_{di}^t)\]

      *\(\gamma(.)\) is an MLP with ReLU activations

  2. Discriminator: the discriminator's structure is relatively simple. It consists of an LSTM-based Encoder plus fully connected layers that apply an MLP to [Encoder output, observed trajectory part], and it finally outputs a score for whether the path is real or fake.
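As a small illustration of how the Decoder's initial hidden state is assembled from \([c_i, z]\): the sketch below is plain Python with made-up helper names, not the actual sgan code, but it shows why re-sampling the noise part yields different trajectories from the same context.

```python
import random

def make_decoder_hidden(c_i, noise_dim, rng):
    """h_di = [c_i, z]: pooled context concatenated with fresh Gaussian noise."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return c_i + z

c_i = [0.1, 0.2, 0.3]  # toy pooled context vector from the Pooling Module
h1 = make_decoder_hidden(c_i, noise_dim=2, rng=random.Random(0))
h2 = make_decoder_hidden(c_i, noise_dim=2, rng=random.Random(1))
assert len(h1) == len(c_i) + 2
assert h1[:3] == c_i and h2[:3] == c_i   # context part is shared
assert h1[3:] != h2[3:]                  # noise part differs -> diverse paths
```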


Model features and innovations

Loss functions

SGAN is trained separately for the generator and the discriminator, so the two losses are also defined separately. The base term of SGAN's loss is the Adversarial Loss; on top of it, a Variety Loss is added to increase the diversity of the generated paths.

  1. Generator

    \[L_G = L_{adversarial}+L_{variety}\]

    1. \(L_{adversarial}\): penalizes "the generated trajectory being judged fake by the discriminator": the cross-entropy between the discriminator's scores for the generated trajectory and a vector of real labels (drawn from [0.7, 1.2] in the code).
    2. \(L_{variety} = \min_k||Y_i - \hat Y_i^{(k)}||_2\): an improvement on the \(L_2\) loss, where k indexes the random samples of \(z\) drawn when building the Decoder's initial hidden state in the Generator. As the paper puts it, this loss only penalizes the prediction with the smallest \(L_2\) error, encouraging the model to "hedge its bets" and generate multiple feasible paths. (It is similar to the MoN loss, which had not previously been used in this domain.)
  2. Discriminator

    \[L_D = L_{adversarial}\]

    1. Penalizes "the generated trajectory being judged real by the discriminator": the cross-entropy between the discriminator's scores for the generated trajectory and a vector of fake labels ([0]).
    2. Penalizes "the real trajectory being judged fake by the discriminator": the cross-entropy between the discriminator's scores for the real trajectory and a vector of real labels (drawn from [0.7, 1.2] in the code).
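The Variety Loss above can be sketched in plain Python. This is a toy version assuming trajectories are lists of (x, y) points; the real implementation operates on batched tensors.

```python
import math

def l2(traj_a, traj_b):
    """L2 distance between two trajectories of (x, y) points."""
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2
                         for (xa, ya), (xb, yb) in zip(traj_a, traj_b)))

def variety_loss(ground_truth, samples):
    """Min over the k sampled trajectories of the L2 error: only the best
    sample is penalized, so the others are free to explore."""
    return min(l2(ground_truth, s) for s in samples)

gt = [(0.0, 0.0), (1.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 0.1)],   # near-perfect sample
    [(0.0, 1.0), (1.0, 2.0)],   # very different sample, not penalized
]
loss = variety_loss(gt, samples)
assert abs(loss - 0.1) < 1e-9   # only the best sample's error counts
```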


Pooling module

SGAN proposes a new pooling model different from Social Pooling: it takes all pedestrians in the scene into account globally, and its source information augments the LSTMs' hidden states with pairwise relative positions between pedestrians. The experiments later show that the new pooling model is slightly worse than Social Pooling on quantitative metrics, but its generated trajectories comply better with social rules.

A few points to note here:

  1. The Pooling Module's input has two parts, [Hidden States, Relative Locations]: each LSTM's hidden state, and the other pedestrians' positions relative to the target pedestrian. The paper mentions these two data sources in two separate places without a unified, combined explanation.
  2. Since the number of pedestrians differs across scenes, the model uses max pooling to keep the pooled output dimension fixed: for each pedestrian, a (num_ped, N) tensor becomes (1, N).
  3. The implementation details of the relative-position computation and the batched matrix operations are quite clever; if needed, refer to the code in model.py - PoolHiddenNet and the experiment code analysis.
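Point 2 above can be illustrated with a plain-Python sketch of the max pooling step. The embed() stub stands in for the ReLU MLP of the real PoolHiddenNet; the shapes and names are simplified assumptions for illustration.

```python
def embed(hidden_j, rel_pos):
    """Stub "MLP": just concatenate hidden state and relative position."""
    return hidden_j + list(rel_pos)

def pool_for_pedestrian(i, hiddens, positions):
    """Pool, over all other pedestrians j, an embedding of
    [hidden state of j, position of j relative to i]."""
    xi, yi = positions[i]
    rows = [embed(hiddens[j], (xj - xi, yj - yi))
            for j, (xj, yj) in enumerate(positions) if j != i]
    # elementwise max over the other pedestrians: (num_ped-1, N) -> (1, N)
    return [max(col) for col in zip(*rows)]

hiddens = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0]]
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
p0 = pool_for_pedestrian(0, hiddens, positions)
assert len(p0) == 4                 # fixed size regardless of num_ped
assert p0 == [3.0, 5.0, 1.0, 2.0]   # elementwise max over neighbors
```

Max pooling is what makes the output size independent of the crowd size, which is why it is chosen over, say, concatenation.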


Other notes

  1. Relative vs absolute path data: although the paper only emphasizes relative positions in the Pooling Module section (between different pedestrians at the same time step), reading the experiment code shows that from input to final output the generator works entirely on relative positions (one pedestrian's displacement relative to the previous time step). Absolute positions are passed into the model but only used to compute relative positions, reconstruct absolute positions, compute grids, and so on.

  2. Generator output: in Social LSTM, the author uses the LSTM's final hidden state to parameterize a bivariate Gaussian distribution over positions, and predicts positions and computes the loss from it; in SGAN, the paper argues that this method is non-differentiable during backpropagation, and instead uses a multilayer perceptron to directly predict the 2D target, with an \(L_2\) loss.

    Question

    Why does predicting positions under the bivariate-Gaussian assumption become non-differentiable during backpropagation?
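The relative-vs-absolute representation from point 1 above can be sketched as a simple round-trip (plain Python on toy data; the real code does this with batched tensor operations):

```python
def to_relative(abs_traj):
    """Per-step displacement of one pedestrian: (x_t - x_{t-1}, y_t - y_{t-1})."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(abs_traj, abs_traj[1:])]

def to_absolute(start, rel_traj):
    """Rebuild absolute positions from a start point and displacements."""
    out, (x, y) = [start], start
    for dx, dy in rel_traj:
        x, y = x + dx, y + dy
        out.append((x, y))
    return out

abs_traj = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
rel = to_relative(abs_traj)
assert rel == [(1.0, 0.5), (1.0, 0.5)]
assert to_absolute(abs_traj[0], rel) == abs_traj  # lossless round-trip
```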


Model evaluation and experiments

  1. Datasets: ETH and UCY, with 4 scenes and 1536 pedestrian trajectories, used without normalization.

  2. Metrics: ADE (Average Displacement Error) and FDE (Final Displacement Error). Given the diversity of SGAN's generated paths, evaluation takes the minimum error among the multiple predictions for one path as the result.

  3. Controlled variables

    1. Prediction horizon: 3.2 seconds observed as input; 3.2 or 4.8 seconds predicted.
    2. The experimental models are labeled SGAN-kVP-N: kV is the number of samples used by the Variety Loss during training (1 means no Variety Loss); P indicates whether the new pooling structure is used; N is how many candidate paths are generated for each known path before computing the error.
  4. Quantitative results

    1. Best overall: SGAN-20V-20 performs best overall; SGAN-20VP-20 is slightly worse on quantitative metrics (explained below).

    2. Clearly diverse outputs, but sensitive to noise: if only one randomly generated trajectory is evaluated, the quantitative metrics are worse than Social LSTM's, showing that the model is sensitive to the noise \(z\). Meanwhile, as the number of evaluated candidate trajectories grows, the results improve markedly, reducing the error rate by up to 33% at \(k=100\) (taking the minimum-error trajectory out of 100).

    3. Significant speedup: thanks to the simplified pooling structure, SGAN generates trajectories up to 16 times faster than Social LSTM.

      Note: Social LSTM performs worse than Vanilla LSTM overall; the original paper's results, obtained with a strategy of training on real data and testing on augmented data, could not be reproduced.

  5. Qualitative results

    Although models with the new pooling structure perform slightly worse numerically than those with the original pooling, visualizing the trajectory data shows that the new pooling's predictions conform to social rules better than the original model's. The paper singles out several common social scenarios for comparison; see the original for details:

    1. Conflict scenarios: one-on-one encounters, one-to-many encounters, rear-approach encounters, and angled side encounters.

    2. Group merging, group avoidance (everyone avoiding each other), and group following scenarios.
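The ADE/FDE metrics with the minimum-over-N evaluation described in point 2 of this section can be sketched as follows (plain Python on toy data; the real evaluation runs over batches of pedestrians):

```python
import math

def ade(pred, gt):
    """Average displacement error over all predicted time steps."""
    return sum(math.hypot(px - gx, py - gy)
               for (px, py), (gx, gy) in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """Displacement error at the final time step only."""
    (px, py), (gx, gy) = pred[-1], gt[-1]
    return math.hypot(px - gx, py - gy)

def min_over_n(metric, candidates, gt):
    """Minimum-over-N evaluation: the best of N candidate paths counts."""
    return min(metric(pred, gt) for pred in candidates)

gt = [(0.0, 0.0), (1.0, 1.0)]
candidates = [
    [(0.0, 0.0), (1.0, 1.0)],   # exact hit
    [(0.0, 1.0), (2.0, 2.0)],   # worse candidate, ignored by min-over-N
]
assert min_over_n(ade, candidates, gt) == 0.0
assert min_over_n(fde, candidates, gt) == 0.0
```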



Origin www.cnblogs.com/sinoyou/p/11370602.html