Dialogue to Action: Building Task-Oriented Dialogue Systems via Action-Level Generation

Table of contents

Dialogue to Action: Building Task-Oriented Dialogue Systems via Action-Level Generation

1 Introduction

2 Framework Description

2.1 Overview

2.2 Step 1: Dialogue Action Construction

2.3 Step 2: Response Normalization

2.4 Step 3: Action Sequence Prediction

2.5 Step 4: Generate Response

3 Experiments

3.1 Experimental setup

3.2 Main results

3.3 In-depth analysis

4 Conclusion

5 Profile of the host

6 Company Profile


Dialog-to-Actions: Building Task-Oriented Dialogue System via Action-Level Generation

Abstract: End-to-end generation-based methods have been studied and applied in task-oriented dialogue systems. In industrial scenarios, however, existing methods face bottlenecks in reliability (e.g., responses inconsistent with the business domain, repeated responses) and efficiency (e.g., long computation time). In this paper, we propose a task-oriented dialogue system based on action-level generation. Specifically, we first construct dialogue actions from large-scale dialogues and represent each natural language (NL) response as a sequence of dialogue actions. We then train a sequence-to-sequence model that takes the dialogue history as input and outputs a sequence of dialogue actions. Finally, the generated dialogue actions are converted into verbal responses. Experimental results show that the method performs well while offering advantages in reliability and efficiency.

Keywords: task-oriented dialogue system, action-level generation, dialogue-action

1 Introduction

Recently, end-to-end generation methods that directly output appropriate natural language responses or API calls have been intensively studied for task-oriented chatbots [2, 5, 9, 16, 22] and have proved valuable in real-world business, especially after-sales customer service [1, 7, 8, 13, 14, 19, 21, 24, 25]. Generation-based methods build on large-scale pre-trained language models [10, 11] and offer a simpler architecture and more human-like interaction. Despite significant progress, we find that these token-level generation methods have the following two limitations in practical scenarios.

1. Token-level generation methods have limited reliability, which is critical for task-oriented industrial dialogue systems. Because of the nature of pre-trained language models, the model may generate responses learned from the pre-training corpus. In some cases, such responses are nonsensical and semantically inconsistent with the current business domain, disrupting online interactions. Worse, the model occasionally produces repeated responses across multiple turns (e.g., asking the user for the same information repeatedly). These issues have also been widely observed by other researchers [4, 12] and practitioners.

2. Token-level generation methods may not meet the efficiency requirements of industrial systems, especially when the number of decoding steps is large. The long computation time of token-level generative models leads to unacceptable response delays for online dialogue systems, especially when the generated sentence exceeds a certain length (e.g., T5 takes 1,544 ms for a 30-word sentence, as shown in Figure 3). Because of this latency, a large number of business requests may be suspended or blocked during peak hours. Furthermore, the computing resources (such as GPUs) required by such systems may not be affordable for small companies.

To address these two issues, this paper proposes a task-oriented dialogue system based on action-level generation. Following [19], we represent responses by dialogue actions, where a dialogue action is a class of responses with a unique and identical meaning that can be obtained automatically by clustering. Whereas [19] treats the whole response as a single dialogue action, we split a response into multiple segments [6] and map each segment to a dialogue action. In this way, each response is represented as a sequence of dialogue actions. Given a dialogue context, a Seq2Seq model with an action-level recurrent decoder generates the dialogue action sequence. A frequency-based sampling method then composes the final response from the generated action sequence. Since the core component of our method is a generative model that takes the dialogue context as input and outputs actions, we name it Dialog-To-Actions (abbreviated as DTA). Compared with existing token-level generation systems, DTA has two advantages: 1) reliability, since the generated natural language responses are composed from predefined dialogue actions; 2) efficiency, since both the decoding space (i.e., the set of dialogue actions) and the number of decoding steps are much smaller.

2 Framework Description

2.1 Overview

We follow the workflow previously used in end-to-end task-oriented dialogue systems [2, 9], where the system takes the dialogue history as input and generates either a text response to the user or an API call (e.g., an information query or an action execution). When an API is called, the information it returns is incorporated into the system's next response. An example dialogue following this interaction lifecycle can be found in Figure 1(b).
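The interaction lifecycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: the `Turn` record, the `respond` function, the dictionary output format, and the `get_bike_status` API are all hypothetical names introduced here, and the rule-based `toy_model` merely stands in for the trained system.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "user", "system", or "api"
    text: str


def respond(history, model, apis, max_api_calls=3):
    """Run one system turn: execute any API calls, then return the text response."""
    for _ in range(max_api_calls + 1):
        output = model(history)
        if output["type"] == "response":
            history.append(Turn("system", output["text"]))
            return output["text"]
        # The model asked for an API call: run it and expose the result
        # to the model through the dialogue history.
        result = apis[output["name"]](**output["args"])
        history.append(Turn("api", f"{output['name']}: {result}"))
    raise RuntimeError("model kept requesting API calls")


# Toy stand-ins (hypothetical): a rule-based "model" and one fake API.
def toy_model(history):
    last = history[-1]
    if last.speaker == "user":
        return {"type": "api_call", "name": "get_bike_status", "args": {"bike_id": "B42"}}
    return {"type": "response", "text": f"I checked your bike ({last.text})."}


apis = {"get_bike_status": lambda bike_id: "locked"}
history = [Turn("user", "I cannot lock my bike B42.")]
reply = respond(history, toy_model, apis)
```

Writing the API result back into the history is one simple way to make it available to the next generation step; the source does not specify how the result is encoded.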

The key idea of our work is to generate dialogue actions and then compose a verbal response from them. To this end, we first construct dialogue actions from large-scale dialogues (step 1) and represent each response as a sequence of dialogue actions (step 2), as shown in Figure 1(a). We then use a Seq2Seq model with an action-level recurrent decoder to generate dialogue actions (step 3), and use the generated actions to compose verbal responses (step 4). We take the after-sales customer service of e-bike rental as an example, where users and employees communicate online via SMS. The technical details are presented below.
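To make step 4 concrete, here is a minimal sketch of composing a response from a predicted action sequence with the frequency-based sampling mentioned in the introduction. The exact sampling rule is not given in this summary, so the sketch assumes that each action's segment is drawn with probability proportional to how often that segment occurred in the data. The function names and the action ids (`A_apologize`, `A_ask_bike_id`) are hypothetical.

```python
import random
from collections import Counter


def build_action_table(labeled_segments):
    """labeled_segments: iterable of (action_id, segment_text) pairs from normalized responses."""
    table = {}
    for action_id, text in labeled_segments:
        table.setdefault(action_id, Counter())[text] += 1
    return table


def compose_response(action_sequence, table, rng=random):
    """Sample one segment per action, weighted by its frequency, and join the segments."""
    parts = []
    for action_id in action_sequence:
        counts = table[action_id]
        texts = list(counts)
        weights = [counts[t] for t in texts]
        parts.append(rng.choices(texts, weights=weights, k=1)[0])
    return " ".join(parts)


# Toy data (hypothetical): two apology variants and one request.
labeled = [
    ("A_apologize", "Sorry for the trouble."),
    ("A_apologize", "Sorry for the trouble."),
    ("A_apologize", "We apologize for the inconvenience."),
    ("A_ask_bike_id", "Please send me the bike number."),
]
table = build_action_table(labeled)
response = compose_response(["A_apologize", "A_ask_bike_id"], table, rng=random.Random(0))
```

Because every segment comes from a predefined cluster, the composed response cannot contain text that was never seen in the business domain, which is the reliability argument made above.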

2.2 Step 1: Dialogue Action Construction

A dialogue action refers to a group of utterances or utterance fragments that share the same semantics and express a common communicative intention, such as making a request or querying information. However, an oversimplified setting, i.e., abstracting an entire utterance into one action, limits expressiveness and scalability. To make responses more targeted and flexible, we construct dialogue actions from (employee) utterance segments rather than whole utterances. Specifically, each utterance is divided into multiple segments by a rule-based method [6]. ConSERT [20] is then used to generate a representation for each segment, and K-means is used to cluster the segments. We choose the number of clusters K empirically to balance cluster purity against the number of clusters, and treat each segment cluster as a dialogue action (e.g., the two actions shown in Fig. 1(a)).
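The construction step can be sketched end to end with toy components. The sketch below makes several assumptions that are not from the source: the punctuation-based `split_segments` rule stands in for the rule-based method of [6], the bag-of-words `embed` function stands in for ConSERT, and the K-means routine is a plain Lloyd's algorithm written out so the example stays self-contained.

```python
import math
import random
import re
from collections import Counter


def split_segments(utterance):
    """Split an utterance into segments at punctuation (a stand-in rule)."""
    parts = re.split(r"[.!?;,]+", utterance)
    return [p.strip().lower() for p in parts if p.strip()]


def embed(segment, vocab):
    """Toy L2-normalised bag-of-words vector; a placeholder for a sentence encoder."""
    counts = Counter(segment.split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster id per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: mean of each cluster (an empty cluster keeps its old centroid).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


utterances = [
    "Sorry for the trouble, please send me the bike number.",
    "Sorry for the trouble. Could you send the bike number?",
]
segments = [s for u in utterances for s in split_segments(u)]
vocab = sorted({w for s in segments for w in s.split()})
labels = kmeans([embed(s, vocab) for s in segments], k=2)
```

Each distinct value in `labels` plays the role of one dialogue action. In practice K would be tuned as described above, trading cluster purity against the number of actions the decoder has to choose from.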

2.3 Step 2: Response Normalization

Response normalization aims to normalize responses (from large-scale dialogues) by mapping each response to a sequence of dialogue actions. Following Yu et al. [23], we use a retrieval-based approach that retrieves the clustered segments most similar to a given input utterance segment and labels the input with the corresponding cluster. As shown in Figure 2, given an input segment, we use BM25 to recall the top-k segments from all clustered segments. We then apply a BERT-based text similarity model to rerank the k recalled segments and select the most similar one.
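The recall-then-rerank labelling can be sketched as follows. This is a hedged illustration, not the authors' implementation: BM25 is written out in its common Okapi form with conventional default parameters (`k1=1.5`, `b=0.75`), the `jaccard` token-overlap score is only a placeholder for the BERT-based similarity model, and the function names and action ids are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenised doc against the tokenised query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in set(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def jaccard(a, b):
    """Token-overlap similarity; a placeholder for the BERT-based similarity model."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def normalize_segment(segment, clustered, top_k=2):
    """clustered: list of (segment_text, action_id). Returns the action id of the best match."""
    query = segment.lower().split()
    docs = [text.lower().split() for text, _ in clustered]
    scores = bm25_scores(query, docs)
    # Recall: keep the top_k candidates by BM25 score.
    recalled = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Rerank: pick the recalled candidate most similar to the input.
    best = max(recalled, key=lambda i: jaccard(query, docs[i]))
    return clustered[best][1]


clustered = [
    ("sorry for the trouble", "A_apologize"),
    ("please send me the bike number", "A_ask_bike_id"),
    ("the fee has been refunded", "A_refund"),
]
action = normalize_segment("could you send the bike number", clustered)
```

Applying `normalize_segment` to every segment of a response yields that response's dialogue action sequence. The cheap lexical recall keeps the more expensive similarity model from having to score every clustered segment.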

2.4 Step 3: Action Sequence Prediction

2.5 Step 4: Generate Response

3 Experiments

3.1 Experimental setup

3.2 Main results

3.3 In-depth analysis

4 Conclusion

In this paper, we propose a task-oriented dialogue system based on action-level generation. We present an efficient framework for building generative models from large-scale conversations with minimal human effort. Experimental analysis shows that the system addresses the reliability and efficiency problems encountered by existing end-to-end generation methods. In the future, we are interested in unifying the discrete modules of DTA into a single end-to-end architecture.

5 Profile of the host

Moderator: Hua Yuncheng. He is an algorithm engineer at Meituan, focusing on researching and building dialogue systems.

6 Company Profile

Meituan is a leading shopping platform in China that provides local consumer goods and retail services, including entertainment, dining, food delivery, travel and other services.

Origin blog.csdn.net/as472780551/article/details/130328366