I have begun studying intelligent dialogue systems.
Algorithm goals: keep the conversation going for as many turns as possible; avoid generic (universal) replies; do not reply to remarks
The main approach: reinforcement learning.
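To make the goals above concrete, here is a minimal sketch of how they could be turned into a reward signal for reinforcement learning. Everything in it is an assumption for illustration: the `GENERIC_REPLIES` set, the reward values, and the function names `turn_reward` and `episode_return` do not come from either document.

```python
# Hypothetical set of generic ("universal") replies; illustrative only.
GENERIC_REPLIES = {"i don't know", "ok", "haha"}


def turn_reward(reply: str, conversation_ended: bool) -> float:
    """Toy per-turn reward: favor continuing the dialogue, penalize generic replies."""
    if conversation_ended:
        # The reply ended the conversation, which works against "more turns".
        return -1.0
    if reply.strip().lower() in GENERIC_REPLIES:
        # Generic replies are discouraged even if the dialogue continues.
        return -0.5
    return 1.0


def episode_return(rewards, gamma=0.9):
    """Discounted sum of per-turn rewards, accumulated from the last turn backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

Under this assumed reward, a longer dialogue made of non-generic replies accumulates a higher return, which is the quantity a reinforcement learning agent would try to maximize.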
Information 1
Document 1: Song Haoyu, Zhang M, Liu Ting (2018). DQN-based multi-turn dialogue policy learning for the open domain. Journal of Chinese Information Processing, 32(7), 99... http://jcip.cipsc.org.cn/CN/abstract/article_2604.shtml
Idea: learn a multi-turn dialogue policy that increases the number of turns; reply generation is not considered. Instead, the policy selects from existing candidate replies the one that maximizes the overall return.
Main dataset: Weibo short texts. A Weibo post plus its comments form one dialogue round.
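The selection idea under Document 1 (choose, among existing candidate replies, the one with the highest expected overall return) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the Q-values are passed in as plain numbers rather than produced by a network, and `select_reply` and `td_target` are hypothetical helper names.

```python
import random


def select_reply(q_values, candidates, epsilon=0.0, rng=random):
    """Epsilon-greedy choice over existing candidate replies.

    q_values[i] is the estimated overall return of answering with candidates[i].
    Returns the index of the chosen candidate.
    """
    if rng.random() < epsilon:
        # Explore: pick a random candidate.
        return rng.randrange(len(candidates))
    # Exploit: pick the candidate with the largest estimated return.
    return max(range(len(candidates)), key=lambda i: q_values[i])


def td_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if done or not next_q_values:
        # No future return once the conversation has ended.
        return reward
    return reward + gamma * max(next_q_values)


candidates = ["I see.", "What did you do after that?", "Haha."]
q_values = [0.1, 0.7, 0.3]  # assumed estimates, for illustration only
best = candidates[select_reply(q_values, candidates)]
```

In a DQN setting the Q-values would come from a neural network trained toward `td_target`; here they are fixed numbers so that the selection step stays visible.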
Information 2
Document 2: Cuayahuitl, H., Yu, S., Williamson, A., & Carse, J. (2017). Deep Reinforcement Learning for Multi-Domain Dialogue Systems. Proceedings of the International Joint Conference on Neural Networks, 2017-May, 3339–3346. https://doi.org/10.1109/IJCNN.2017.7966275
Idea: generate the replies themselves (dialogue generation), using reinforcement learning for multi-domain reply text generation.
Dataset: restaurant and hotel data