Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning

Cody Rioux, Sadid A. Hasan, Yllias Chali

##Abstract

  • Achieve the largest coverage of the documents' content.
  • Concentrate distributed information into hidden units layer by layer.
  • The whole deep architecture is fine-tuned by minimizing the information loss of reconstruction validation.
  • According to the concentrated information, dynamic programming (DP) is used to seek the most informative set of sentences as the summary.
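
The last bullet can be illustrated with a small sketch. The notes do not say exactly what the dynamic program optimizes, so this assumes the simplest setting: each sentence already has an informativeness score and a word count, and the summary has a fixed word budget. That makes it a 0/1 knapsack. The name `select_sentences` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def select_sentences(
    scores: List[float], lengths: List[int], budget: int
) -> Tuple[float, List[int]]:
    """Pick the sentence subset with the highest total score within a word budget.

    Assumption: scores are precomputed and lengths are positive integers.
    """
    n = len(scores)
    # best[i][b] = best total score using the first i sentences and at most b words
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score, length = scores[i - 1], lengths[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if length <= b:
                take = best[i - 1][b - length] + score  # option 2: include it
                if take > best[i][b]:
                    best[i][b] = take

    # Walk the table backwards to recover which sentences were included.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()
    return best[n][budget], chosen
```

For example, `select_sentences([3.0, 4.0, 5.0], [2, 3, 4], 5)` picks the first two sentences, because together they fit the budget and score more than the third sentence alone.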
##Related Work
  • We explore the use of SARSA, a derivative of TD(λ) that models the action space in addition to the state space modelled by TD(λ). Furthermore, we explore the use of an algorithm based not on temporal difference methods but on policy iteration techniques.
  • REAPER (Relatedness-focused Extractive Automatic
    summary Preparation Exploiting Reinforcement learning)
##Motivation
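  • TD(λ) is relatively old as far as reinforcement learning (RL) algorithms are concerned, and the optimal ILP did not outperform ASRL using the same reward function.
  • Reinforcement learning still has a lot of room for improvement.
  • Query-focused summarization has received wide attention.
  • The effect of sentence compression is not explored further.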
##Model
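  • TD(λ)
    Temporal difference (TD) learning is a prediction-based machine learning method. It is mainly used for reinforcement learning problems and is said to be "a combination of Monte Carlo ideas and dynamic programming (DP) ideas." [1] TD resembles Monte Carlo methods because it learns by sampling the environment according to some policy, and it is related to dynamic programming techniques because it approximates its current estimate from previously learned estimates (known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning. [2]
    (Source: temporal difference methods, Wikipedia)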
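  • Approximate Policy Iteration
    Approximate policy iteration (API) follows a different paradigm: it iteratively improves the policy of a Markov decision process until the policy converges.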
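  • Sarsa algorithm
    When choosing the value to bootstrap from, Q-learning takes the best next step (the one with the maximum Q-value), whereas Sarsa uses the Q-value of the next action actually chosen by the same policy it is following. In both cases the previous step's value is then updated, which is how learning takes place.
    Comparison: on-policy Sarsa vs. off-policy Q-learning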
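
The difference between the two updates is easiest to see side by side. The following is a minimal tabular sketch, not the REAPER implementation: the function names, the dictionary-based Q-table, and the default step size `alpha` and discount `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

# Hypothetical tabular Q-function: maps (state, action) to an estimated return.
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def sarsa_update(q: QTable, s, a, r: float, s_next, a_next,
                 alpha: float = 0.5, gamma: float = 0.9) -> None:
    """On-policy: bootstrap from the action a_next that the policy actually chose."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q: QTable, s, a, r: float, s_next, actions: Iterable,
                      alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Off-policy: bootstrap from the best available action in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
```

Both functions update the value of the previous state-action pair `(s, a)`; only the bootstrap target differs. If the policy always picked the greedy action, the two updates would coincide.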
##Experiment
  • The feature space depends on the presence of top bigrams rather than
    tf*idf words.
  • Reward Function
    Based on the n-gram co-occurrence score metric and
    the longest-common-subsequence recall metric.
  1. Immediate Rewards
  2. Query-Focused Rewards
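
To make the feature space and the two reward ingredients concrete, here is a simplified sketch. It is a stand-in for illustration, not the official ROUGE toolkit and not the paper's code. All function names are hypothetical, tokenization is assumed to be plain whitespace splitting, and how the two metrics are weighted into a single reward is not stated in these notes, so no combination is shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Bigram = Tuple[str, str]


def bigrams(tokens: Sequence[str]) -> List[Bigram]:
    """All adjacent token pairs, in order."""
    return list(zip(tokens, tokens[1:]))


def top_bigram_features(tokens: Sequence[str], top_bigrams: Sequence[Bigram]) -> List[int]:
    """Binary vector: 1 if the summary contains the given top bigram, else 0."""
    present = set(bigrams(tokens))
    return [1 if bg in present else 0 for bg in top_bigrams]


def bigram_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference bigrams also found in the candidate (clipped counts)."""
    ref = Counter(bigrams(reference))
    if not ref:
        return 0.0
    cand = Counter(bigrams(candidate))
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / sum(ref.values())


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Longest-common-subsequence length divided by the reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)
```

With the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", three of the five reference bigrams are matched, and the longest common subsequence covers five of the six reference tokens.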

Reposted from blog.csdn.net/houking_can/article/details/83548776