TASLP 2021 - Reinforcement Learning-based Dialogue Guided Event Extraction to Exploit Argument Relations

Paper: https://arxiv.org/abs/2106.12384

Code: https://github.com/xiaoqian19940510/TASLP-EAREE

Journal/Conference: TASLP2021

Summary

Event extraction is a fundamental task in natural language processing. Finding the roles of event arguments, such as event participants, is crucial for event extraction. However, doing so is challenging for real-life event descriptions, since the role of an argument often differs across contexts. While the relationships and interactions between multiple arguments are useful for determining argument roles, existing methods largely ignore such information. In this paper, we propose a better method for event extraction by explicitly exploiting the relations of event arguments. We achieve this with a carefully designed task-oriented dialogue system. To model argument relations, we employ reinforcement learning and incremental learning to extract multiple arguments through a multi-turn, iterative process. Our method exploits knowledge of already-extracted arguments of the same sentence to determine the roles of arguments that are difficult to decide individually. It then uses the newly obtained information to improve decisions on previously extracted arguments. This two-way feedback process enables us to exploit argument relations to determine argument roles efficiently, leading to better sentence understanding and event extraction. Experimental results show that our method consistently outperforms seven state-of-the-art event extraction methods on event classification, argument identification, and argument role tasks.

1 Introduction

The purpose of event extraction is to extract all arguments corresponding to each event, along with their argument roles. This is challenging because an event often has multiple arguments, and their roles may differ across texts. As shown in Figure 1, the word "troops" plays different argument roles in the three example sentences.

The key insight is that multiple arguments related to an event are usually strongly correlated. As also shown in Figure 1, in S1 the words "use" and "weapons" can be used to predict the role of "troops", which improves prediction accuracy. The relationship between arguments is thus highly useful for event extraction, yet many works ignore it: previous methods either extract all arguments simultaneously or extract them sequentially in a fixed order, without considering which extraction order is best.

As shown in Figure 2, this paper formulates event extraction as a task-oriented dialogue problem. The event extraction task is transformed into a slot-filling task that extracts the relevant arguments and their corresponding roles from the input sentence. To this end, a multi-turn dialogue system with two agents is developed to solve the slot-filling problem iteratively. In each round, an agent chooses an argument role and generates a query through a dialogue generator. This iterative generate-and-answer paradigm enables us to incorporate knowledge gained in previous rounds when extracting the current argument. While our multi-turn dialogue system is potentially powerful for event extraction, its full potential is only unleashed if arguments are processed in the right order. Since we extract arguments and determine their roles in sequence by exploiting knowledge gained from previously extracted arguments, the order in which arguments are extracted is critical. Ideally, we would like to start with event arguments whose roles can be accurately determined from the information already extracted, leaving the more challenging ones until later, once enough information has been gained from the other arguments.

We address the challenge of argument extraction order by using reinforcement learning (RL) to rank arguments so as to optimally exploit argument relations. For RL to navigate the potentially large problem space, we need to find the right representation for each word in the target sentence and use that representation to predict where each argument starts and ends. To this end, we use a lexicon-based graph attention network and an event-based BERT model to learn word representations from both semantic and contextual perspectives. We then use the learned representations to determine which arguments to extract and in what order. We further design an incremental learning strategy to iteratively incorporate argument relations into the multi-round event extraction process by continuously updating event representations across rounds. By doing so, the representation becomes increasingly accurate as argument extraction progresses, which in turn improves the quality of the extracted arguments and events.

The main contributions of this paper are:

  • A multi-turn, task-oriented, dialogue-guided event extraction framework that fills specific argument roles with arguments extracted from the input text.
  • Reinforcement learning to order argument extraction, exploiting argument correlations for event extraction.
  • Word representations learned for event extraction with a lexicon-based graph attention network and event-based BERT under an incremental learning framework.

2. Event extraction framework

Figure 3 outlines our framework, which consists of three components: (1) dual-view event representation, (2) ordered argument extraction, and (3) event classification. The argument extraction module automatically generates dialogues based on event types and the arguments predicted so far. The selected arguments are produced by an incremental event learning method, which adds pseudo-labels as training data and attaches pseudo-relations to the lexicon-based graph. Pseudo-labels are the arguments predicted by our ordered argument extraction model; pseudo-relations are the predicted argument roles of those arguments.

We add pseudo-edges by connecting the predicted arguments of an event, allowing word representations to be updated with existing predictions. The framework takes sentences and event types as input. The event classification module detects whether an input sentence describes an event and classifies the event type to which the text belongs. We design a multi-task learning module that computes a combined loss over two tasks to overcome the low recall caused by event-type imbalance. For different event types, different event schemas are designed so that different arguments are extracted according to the schema. Our framework is first trained offline using a small amount of labeled data. To expand the training data, we design a dialogue generation module that generates multiple question-answer pairs for each trigger word or argument. The trained model can then be applied to extract event types and the associated event arguments.

During the training phase, the reinforcement learning-based, dialogue-guided argument extraction model learns how to extract event arguments by taking the target sentence and the gold event type as input. The framework first learns several rounds of conversational argument extraction based on event types and sentences, and then trains an event classification model based on the extracted event arguments. In each round, the predicted arguments are provided as pseudo-relations in the lexicon-based graph and as pseudo-labels in the role embeddings of the event-based BERT used by the incremental event learning method, which updates the textual representation by injecting pseudo-argument knowledge. The event classification model is then trained to predict event types using the pseudo-relational knowledge provided by the argument extraction module.

During deployment, we use the trained model in an iterative process to perform event extraction. We first predict event types with the event classification model and then perform argument extraction conditioned on the predicted event types, so the model is eventually run for every predicted event type. Concretely, we first use the event classification model to predict event types without any pseudo-labels or pseudo-relations. Next, we use the argument extraction model to identify all arguments associated with each predicted event type. We then go back and ask the event classification module to update the event types using the pseudo-labels and argument relations extracted by the argument extraction model, as sketched below. This two-stage iterative process uses the predicted event types to extract arguments, and the extracted information in turn helps the event classification model improve its predictions.
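To make the deployment flow concrete, here is a minimal Python sketch of the two-stage loop. The function names (`classify_events`, `extract_arguments`, `build_pseudo_knowledge`) are hypothetical stand-ins for the trained modules, not code from the paper's repository.

```python
# Hypothetical placeholders for the trained modules.
def classify_events(sentence, pseudo_knowledge=None):
    return ["Attack"]  # stand-in for the event classification model

def extract_arguments(sentence, event_type):
    return {"Instrument": "weapons"}  # stand-in for the argument extractor

def build_pseudo_knowledge(arguments):
    return arguments  # stand-in: pseudo-labels plus pseudo-relations

def run_inference(sentence, rounds=2):
    # Stage 1: classify event types without any pseudo-knowledge.
    event_types = classify_events(sentence)
    arguments = {}
    for _ in range(rounds):
        # Stage 2: extract arguments for every predicted event type.
        arguments = {t: extract_arguments(sentence, t) for t in event_types}
        # Feed the extracted arguments back to refine event classification.
        event_types = classify_events(sentence, build_pseudo_knowledge(arguments))
    return event_types, arguments

print(run_inference("Troops used weapons in the attack."))
```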

2.1 Dual View Event Representation

The first step in our argument extraction pipeline is to learn a representation (embedding) for argument selection. We first build a lexicon-based graph, from which we learn lexicon-based representations for individual words; we then learn contextual representations at the sentence level. Concretely, we use a lexicon-based graph attention network and an event-based BERT model to learn word representations from the semantic and contextual perspectives, respectively.

Lexicon-Based Representations: Lexicon-based graph neural networks were designed for node classification tasks and have proven effective at learning global semantics. We therefore use lexical knowledge to connect characters so as to capture local composition, and use a global relay node to capture long-range dependencies.

We convert sentences into directed graphs (as shown in Figure 4), where each word is a graph node and edges represent one of five relationships: word-in-lexicon edges; lexicon-to-lexicon edges; relay-node edges connecting a global relay node to all nodes; word co-occurrence edges; and pseudo-relations between event arguments. The first edge type connects the words within a phrase in order, from first word to last. The second links phrases, i.e., the last word of the current phrase is connected to the next phrase, and each such edge represents a potential feature of the words involved. We also use a relay node as a virtual hub connected to all other nodes; it collects information from all edges and nodes, disambiguates word boundaries, and learns long-range dependencies, so the relay node's representation can be regarded as the sentence representation. The fourth edge type represents pseudo-argument relations by connecting the predicted arguments of an event. The last edge type captures the co-occurrence probability of words within a sliding window over the corpus; its edge weight is measured by pointwise mutual information (PMI):
$$\text{PMI}(w_i,w_j)=\log \frac{p(w_i,w_j)}{p(w_i)p(w_j)}=\log \frac{N_{w_i,w_j} N_s}{N_{w_i} N_{w_j}} \tag{1}$$
where $N_{w_i}$, $N_{w_j}$, and $N_{w_i,w_j}$ are the numbers of sliding windows containing $w_i$, $w_j$, and both $w_i$ and $w_j$, respectively, with $i,j \in [1,N]$, and $N_s$ is the total number of sliding windows in the corpus.
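As an illustration, the following Python sketch computes sliding-window PMI edge weights in the spirit of Eq. (1). It operates on a single token list for brevity, whereas the paper's statistics are computed over the whole corpus.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edge_weights(tokens, window=3):
    # Enumerate all sliding windows and count words and word pairs per window.
    windows = [tokens[i:i + window] for i in range(len(tokens) - window + 1)]
    n_s = len(windows)  # N_s: total number of sliding windows
    word_count, pair_count = Counter(), Counter()
    for w in windows:
        uniq = set(w)
        word_count.update(uniq)
        pair_count.update(frozenset(p) for p in combinations(sorted(uniq), 2))
    # Eq. (1): PMI(w_i, w_j) = log(N_{w_i,w_j} * N_s / (N_{w_i} * N_{w_j}))
    weights = {}
    for pair, n_ij in pair_count.items():
        wi, wj = tuple(pair)
        weights[(wi, wj)] = math.log(n_ij * n_s / (word_count[wi] * word_count[wj]))
    return weights

print(pmi_edge_weights("troops used weapons in the attack".split()))
```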

To learn word-level representations, we extend a lexicon-based graph attention network (LGAT), which was designed to learn global semantics for node classification. We extend LGAT by adding pseudo-edges, i.e., edges connecting the predicted arguments of an event type; the goal is to use existing predictions to update word representations. Given an $N$-word text $T=\{T_1,T_2,\ldots,T_N\}$, the model embeds the input text through a pre-trained embedding matrix as $ET=\{ET_1,ET_2,\ldots,ET_N\}$. Predicted event roles $R=\{R_1,R_2,\ldots,R_N\}$ are embedded as $ER=\{ER_1,ER_2,\ldots,ER_N\}$. The model takes $EI = ET \oplus ER$ as input, where $\oplus$ denotes concatenation, and generates a hidden representation $H^n=\{h^n_1,h^n_2,\ldots,h^n_N\}$ for the text $T$, where $h^n_i$ denotes the features of the $i$-th word at the $n$-th hidden layer. The final node representation is therefore:
$$LT_i=f(H_i^N),\quad i=0,1,2,\ldots,N \tag{2}$$
where $LT_0$ is the relay node used for predicting event types. Optimal label decisions are then obtained via a conditional random field (CRF). The probability of a label sequence $\hat y=\hat c_1,\hat c_2,\ldots,\hat c_k$ can be defined as follows:
$$p(\hat y \mid s)=\frac{\exp \left(\sum_{i=1}^N \phi(\hat c_{i-1},\hat c_i,LT)\right)}{\sum_{y' \in L(s)} \exp \left(\sum_{i=1}^N \phi(c'_{i-1},c'_i,LT)\right)} \tag{3}$$
where $L(s)$ is the set of all possible label sequences.
$$\phi(\hat c_{i-1},\hat c_i,h)=W_{(c_{i-1},c_i)} c_i^T + b_{(c_{i-1},c_i)} \tag{4}$$
where $W_{(c_{i-1},c_i)} \in \mathbb{R}^{k \times N}$ and $b_{(c_{i-1},c_i)} \in \mathbb{R}^{k \times N}$ are the weight and bias parameters specific to the label pair $(c_{i-1}, c_i)$, $k$ is the number of event types, and $N$ is the input text length.
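For intuition, the following sketch scores one label sequence against per-token emission scores and label-transition weights, corresponding to the numerator of Eq. (3). It is a simplified stand-in for the exact $\phi$ of Eq. (4); the full probability would normalize over all sequences in $L(s)$, typically with the forward algorithm.

```python
import torch

def crf_sequence_score(emissions, transitions, tags):
    # emissions: (N, k) per-token label scores derived from LT;
    # transitions: (k, k) label-to-label weights, playing the role of
    # W_{(c_{i-1}, c_i)} in Eq. (4). A simplified sketch, not the exact phi.
    score = emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score = score + transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    return score

# p(y|s) in Eq. (3) divides exp(score) by the sum of exp(score) over all
# label sequences in L(s), usually computed with the forward algorithm.
emissions, transitions = torch.randn(6, 4), torch.randn(4, 4)
print(crf_sequence_score(emissions, transitions, [0, 1, 1, 2, 3, 0]))
```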

Event-Based Contextual Representation: BERT is a multi-layer bidirectional Transformer that has achieved significant performance improvements on event extraction tasks. We use BERT to learn contextual representations. Specifically, we feed the sentence text $T$, the predicted event roles $R$, and the agent content $A$, encoded as $EA=\{EA_1,EA_2,\ldots,EA_M\}$, into an event-based BERT model. We extend BERT by adding a self-attention mechanism over the agent content $A$ (denoted $SA$) and the input text (denoted $ST$).

Final representation: We concatenate the lexicon-based and event-based representations to produce the final representation $CT_i$, which is used for argument extraction:
$$CT_i=LT_i \oplus ST_i \tag{5}$$
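A one-line sketch of Eq. (5) in PyTorch, with illustrative (not the paper's) dimensions:

```python
import torch

# Hypothetical dimensions: LT from the lexicon-based GAT, ST from event-based BERT.
N, lexicon_dim, bert_dim = 32, 128, 768
LT = torch.randn(N, lexicon_dim)
ST = torch.randn(N, bert_dim)
CT = torch.cat([LT, ST], dim=-1)  # Eq. (5): CT_i = LT_i ⊕ ST_i
print(CT.shape)  # torch.Size([32, 896])
```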

2.2 Ordered Argument Extraction

Given an event type, our argument extraction component aims to generate high-quality dialogue content by ordering the argument extraction and utilizing the dialogue history. It consists of four main modules: dialogue generation, argument extraction, incremental event learning, and reinforcement learning-based dialogue management. The dialogue-guided argument extraction model automatically extracts arguments given the gold event type and text. The incremental event learning module then adds pseudo-labels as training data and attaches pseudo-relations to the lexicon-based graph. After generating the event representation of the target sentence, we follow a dialogue-guided strategy for argument extraction. Specifically, agent A uses the question set to generate queries about events or selected arguments (e.g., “What is the trigger word for event X?”). Agent B then answers the query by predicting the argument role or event type. Based on the answer, agent A generates a new query in the next round of argument extraction. This iterative process is driven by an RL-based dialogue management system designed to optimize the order of argument extraction. The answers given by agent B are also fed into the incremental event learning module described later, to update previous answers for the next round of argument extraction. Table III (later) gives examples of dialogues produced by our method. The automatically generated dialogue yields additional information for the next round of argument extraction. Our approach is flexible enough to allow customizing the event extraction framework to a specific domain by populating the question set with domain knowledge.

Dialogue Generation: Our dialogue generation module uses two agents (A and B) to facilitate event extraction through a series of question-answer dialogues. Agent A generates dialogue content according to the role currently being handled. For each current role, it generates a question set [7] to create more training data for argument extraction. For example, when agent A's goal is to generate dialogues for the argument role "Instrument", we choose a pre-designed question template and simply fill in the argument role. Agent A's content thus combines the current argument role and the arguments already extracted. Agent B then generates content, including the predicted arguments given by argument extraction (described below); the predicted arguments are fed back to agent A to generate a new dialogue for the next round of argument extraction. Like agent A, agent B uses a template for content generation and only needs to fill that template with the predicted arguments. If a predicted argument satisfies the confidence condition, it becomes part of agent A's content. Agent B's content is also fed into the incremental event learning module described later, adding high-confidence results for the next round of argument extraction. A minimal template-filling sketch follows.
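As a minimal illustration of this template-filling scheme, the sketch below invents two question templates and one answer template; they are not the paper's actual question set [7].

```python
# Illustrative templates only; the paper's question set differs.
QUESTION_TEMPLATES = [
    "What is the {role} in the {event_type} event?",
    "Which words fill the role '{role}' for event type '{event_type}'?",
]
ANSWER_TEMPLATE = "The {role} is '{argument}'."

def agent_a_queries(event_type, role):
    # Agent A: fill the question templates with the current argument role.
    return [t.format(role=role, event_type=event_type) for t in QUESTION_TEMPLATES]

def agent_b_answer(role, argument):
    # Agent B: fill the answer template with the predicted argument.
    return ANSWER_TEMPLATE.format(role=role, argument=argument)

print(agent_a_queries("Attack", "Instrument"))
print(agent_b_answer("Instrument", "weapons"))
```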

Argument Extraction: Agent B responds to queries by filling the answer slots of simple answer templates designed for each question template. It does this by using the learned representation $CT_i$ to locate the start ($i_s$) and end ($i_e$) positions of the argument in the target sentence. Specifically, we obtain the probability of a word being the start of the selected argument as:
$$P_{start}(r,t,k)=\frac{\exp (W^{rs} CT_k)}{\sum_{i=1}^{N} \exp (W^{rs} CT_i)} \tag{6}$$

$$P_{end}(r,t,k)=\frac{\exp (W^{re} CT_k)}{\sum_{i=1}^{N} \exp (W^{re} CT_i)} \tag{7}$$

where $W^{rs}$ and $W^{re}$ are vectors that map $CT_k$ to a scalar; each event type $t$ has its own type-specific $W^{rs}$ and $W^{re}$. The probability $P_{span}(r,t,a_{i_s},i_e)$ that a span answers the query for argument role $r$ and event type $t$ is then:
$$P_{span}(r,t,a_{i_s},i_e)=P_{start}(r,t,i_s) \times P_{end}(r,t,i_e) \tag{8}$$
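The span scoring of Eqs. (6)-(8) can be sketched in a few lines of PyTorch. The assumption here is that both distributions are plain softmaxes over the $N$ tokens, with $W^{rs}$ and $W^{re}$ as type-specific vectors:

```python
import torch

def span_probability(CT, W_rs, W_re, i_s, i_e):
    # CT: (N, d) final token representations; W_rs, W_re: (d,) type-specific
    # vectors mapping each CT_i to a scalar.
    p_start = torch.softmax(CT @ W_rs, dim=0)  # Eq. (6)
    p_end = torch.softmax(CT @ W_re, dim=0)    # Eq. (7)
    return p_start[i_s] * p_end[i_e]           # Eq. (8)

CT, W_rs, W_re = torch.randn(10, 896), torch.randn(896), torch.randn(896)
print(span_probability(CT, W_rs, W_re, i_s=2, i_e=4))
```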
Incremental learning: Our incremental argument learning module incorporates information obtained in the current round of argument extraction to extract new arguments in the next round. We provide additional information for extracting new arguments by adding to the input text the extracted argument roles (i.e., pseudo-labels) whose reward (as assessed by RL) exceeds a configurable threshold. We also add a new edge (i.e., a pseudo-relation) connecting the extracted arguments in the event's lexicon-based graph, so that the lexicon representation used in the next round of dialogue can be updated.

Reinforcement Learning-Based Dialogue Management: In the iterative, dialogue-guided argument extraction process, we use RL to optimize the order of argument extraction.

Dialogue action: Dialogue-guided event extraction defines actions as a set of event schemas and arguments, which means the RL algorithm must decide which argument role to move to next. Unlike previous RL-based methods, we design two agents with different action spaces: for agent A, an action $a_A$ is a role in the event schema; for agent B, an action $a_B$ is an argument. The policy must also determine whether the event type of the current dialogue turn needs to switch to the next event type, which allows our method to handle changes in event types well.

Dialogue state: the state $S_t \in S$ at time step $t$ is characterized by $S_t=(R_t,c^A_t,c^B_t,Q_0,H^C)$, where $R_t$ is the argument role selected by RL-based dialogue management, $c^A_t$ and $c^B_t$ are the current content embeddings of agents A and B, $Q_0$ is agent A's initial question about the event type, and $H^C$ is the dialogue history. Different historical turns in $H^C$ contribute differently to extracting the target argument. The states $s^A$ and $s^B$ encode the dialogue history and the text $T$; they are obtained by concatenating the most recent states $s^A_{t-1}$ and $s^B_{t-1}$ with the current content embeddings $c_t^A$ and $c_t^B$:
$$s_t^A=s_{t-1}^B \oplus c_t^A \tag{9}$$

$$s_t^B=s_{t-1}^A \oplus c_t^B \tag{10}$$

Dialogue Policy Network: The policy chooses the right action for role selection. The policy network is a parameterized probability mapping from beliefs to the action space, aiming to maximize the expected cumulative reward.
$$\pi_{\theta}(a^A,a^B \mid s^A,s^B)=\pi(a^A,a^B \mid (s^A,s^B);\theta)=\mathbb{P}(a_t^A=a^A,a_t^B=a^B \mid s_t^A=s^A,s_t^B=s^B,\theta_t=\theta) \tag{11}$$
where $\theta$ is a learnable parameter denoting the weights of our dialogue policy network. The dialogue policy network decides which action to select from $T$. It consists of two networks: the first is a feed-forward network that encodes the dialogue state, implemented with a softmax output; the second is a BiLSTM that encodes the dialogue history $H^C_t=(H^C_{t-1},a_{t-1},s^A_t,s^B_t)$ as a continuous vector $h^C_t$, where $H^C_t$ is the observation and $a_{t-1}$ is the previous action, updated by the BiLSTM:
$$e_T=\text{softmax}(W_T F(s^A,s^B)+b_T) \tag{12}$$

$$h_t^C=\text{BiLSTM}(h_{t-1}^C,[a_{t-1};s_t^A,s_t^B]) \tag{13}$$

where $W_T$ and $b_T$ are parameters, $F(s_t)$ is the state vector, $e_T$ is the vector for the input sentence $T$, $h_{t-1}^C$ is the representation of the previous $t-1$ dialogue turns, and $a_{t-1}$, $s_t^A$, $s_t^B$ are the action representation and the current state representations of agents A and B.
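The following PyTorch sketch wires Eqs. (12) and (13) together: a feed-forward softmax head over the joint agent state plus a BiLSTM over the dialogue history. The hidden dimensions and the exact way states are combined are assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class DialoguePolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.ff = nn.Linear(2 * state_dim, action_dim)       # Eq. (12)
        self.history = nn.LSTM(action_dim + 2 * state_dim,   # Eq. (13)
                               hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, s_a, s_b, prev_action, hx=None):
        # Feed-forward head over the joint state of agents A and B.
        e_t = torch.softmax(self.ff(torch.cat([s_a, s_b], dim=-1)), dim=-1)
        # One BiLSTM step over [a_{t-1}; s_t^A, s_t^B] updates the history.
        step = torch.cat([prev_action, s_a, s_b], dim=-1).unsqueeze(1)
        h_t, hx = self.history(step, hx)
        return e_t, h_t, hx

policy = DialoguePolicy(state_dim=64, action_dim=16)
s_a, s_b, a_prev = torch.randn(1, 64), torch.randn(1, 64), torch.randn(1, 16)
e_t, h_t, hx = policy(s_a, s_b, a_prev)
print(e_t.shape, h_t.shape)
```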

Dialogue reward: the dialogue management module stores the dialogue history. For a specific event type, the search space is limited, and we design a reward function to evaluate all actions. The reward $R(s^B,a^A,a^B)$ measures partial relevance:
$$R(s^B,a^A,a^B)=\sum_{i} P_{span}(r_i,t,a_{i_s,i_e}) \tag{14}$$
The reward signal can be used to efficiently optimize the policy agents. Note that, to avoid error propagation, all remaining arguments whose rewards are smaller than a threshold are identified simultaneously in the final round. The threshold in our model is 0.75. A small sketch of the reward and deferral logic follows.
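This sketch of Eq. (14) and the 0.75 deferral threshold assumes that the reward contribution of each candidate argument is simply its span probability:

```python
THRESHOLD = 0.75  # deferral threshold stated in the paper

def dialogue_reward(span_probs):
    # Eq. (14): sum of P_span values of the arguments extracted so far.
    return sum(span_probs)

def defer_low_confidence(candidates):
    # candidates: {argument span: P_span}; low-reward ones are resolved
    # together in the final round to avoid error propagation.
    return [arg for arg, p in candidates.items() if p < THRESHOLD]

print(dialogue_reward([0.9, 0.8]))
print(defer_low_confidence({"troops": 0.6, "weapons": 0.92}))
```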

2.3 Event Classification

Event classification detects whether an input sentence describes an event and classifies the event type to which the sentence belongs. Each sentence is fed into the lexicon-based graph neural network model and the event-based BERT model to learn global and contextual knowledge of the sentence, respectively. The event classification model detects which event types a sentence contains by adding knowledge of pseudo-argument relations. If the sentence contains no events, NULL is output and the subsequent modules are not executed, which makes prediction errors easier to isolate.

We use fully connected layers to compute the context-aware sentence representation $y_i$:
$$y_i=\text{ReLU}(W(LT_i \oplus ST_i)+b) \tag{15}$$
where $W$ and $b$ are trainable parameters and $\oplus$ denotes vector concatenation. For the event classification subtask, ReLU activations are used to encourage sparsity. To improve event classification performance, we design an auxiliary task that predicts the number of event types. Our multi-task event classification model computes the combined loss of the two tasks to overcome the low recall caused by event-type imbalance.

Trigger word recognition: Existing event classification methods identify event types via trigger word recognition, but our method identifies event types directly from the input sentence. Therefore, when evaluating trigger word classification performance, we use the trigger words predicted by the ordered argument extraction described earlier: we concatenate the predicted trigger words with the input sentence to classify event types.

Multi-task joint loss for event classification: A multi-task joint loss function estimates the difference between predictions and ground truth. We design two tasks and learn from their prediction errors. For the event classification task, we use the cross-entropy loss:
$$L_T=-\sum_{i} \left[ yt_i \log(\hat{yt}_i) + (1-yt_i) \log(1-\hat{yt}_i) \right] \tag{16}$$
where $yt_i$ and $\hat{yt}_i$ are the true and predicted labels of the $i$-th instance in the event classification task. For predicting the number of event types, we use the mean squared error with an L2 regularization term:
$$L_N=\sum_{i} \parallel \hat{yl}_i - yl_i \parallel^2 + \eta \parallel \Theta \parallel_2 \tag{17}$$
where $\hat{yl}_i$ and $yl_i$ are the predicted and true labels of the second task, and $\Theta$ and $\eta$ are the model parameters and the regularization factor. The total loss is:
$$L=\lambda_1 L_T +\lambda_2 L_N \tag{18}$$
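A compact sketch of the combined loss of Eqs. (16)-(18). The $\eta \parallel \Theta \parallel_2$ regularizer is folded into optimizer weight decay here, and the $\lambda$ weights are placeholders, not values reported in the paper:

```python
import torch
import torch.nn.functional as F

def multitask_loss(type_logits, type_labels, count_pred, count_true,
                   lambda1=1.0, lambda2=1.0):
    # Eq. (16): binary cross-entropy over event types.
    l_t = F.binary_cross_entropy_with_logits(type_logits, type_labels)
    # Eq. (17): MSE on the predicted number of event types; the
    # eta * ||Theta||_2 term is typically realized as weight decay.
    l_n = F.mse_loss(count_pred, count_true)
    # Eq. (18): weighted sum of the two task losses.
    return lambda1 * l_t + lambda2 * l_n

loss = multitask_loss(torch.randn(4, 33), torch.randint(0, 2, (4, 33)).float(),
                      torch.randn(4), torch.abs(torch.randn(4)))
print(loss)
```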

3. Experiment

Experimental results:

Ablation experiments:

4. Summary

We propose a new method for event extraction that exploits event-argument relations. We address the problem within a task-oriented, dialogue-guided framework designed to extract events. Our framework is driven by reinforcement learning: we use RL to decide the order in which to extract sentence arguments, aiming to maximize the likelihood of successfully inferring argument roles. We then use the already-extracted arguments to help resolve arguments whose roles are difficult to determine in isolation. Our multi-turn event extraction procedure also uses the newly obtained argument information to update decisions on previously extracted arguments. This two-way feedback process allows us to exploit the relations between event arguments to classify argument roles across different textual contexts. We evaluate our method on the ACE 2005 dataset and compare it with seven previous event extraction methods. Experimental results show that our method enhances event extraction and outperforms the competing methods on most tasks. In the future, we plan to improve multi-semantic representations for dialogue-guided event extraction by introducing commonsense knowledge.
