A detailed introduction to event extraction (Event Extraction) and the ACE2005 dataset

Definition of event extraction

  • Event

An event is a form of information: an objective fact in which specific people and things interact at a specific time and place. It is generally expressed at the sentence level.

  • Constituent elements

    An event is made up of the following elements: a trigger word, an event type, arguments, and argument roles.

    Event trigger (Event Trigger): the core word that expresses the event, usually a verb or a noun;
    Event type: ACE2005 defines 8 event types and 33 subtypes. Most event extraction work treats the 33 subtypes as the event types. Event identification is therefore a 34-class word-level classification task (33 event types + None), and role classification is a 36-class word-level classification task (35 role types + None);
    Event argument (Event Argument): the participants in the event, mainly entities, values, and times. A value is a non-entity event participant, such as a job title;
    Argument role: the role an event argument plays in the event. There are 35 role types in total, for example attacker, victim, and so on.
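The elements listed above can be pictured as a small data structure. The following is only a minimal sketch: the class names, field names, and the label strings "Attack", "Attacker", and "Victim" are illustrative assumptions that follow the wording of this post, not the actual ACE2005 annotation schema.

```python
from dataclasses import dataclass, field
from typing import List

# Class counts taken from the text: 33 subtypes and 35 roles, each plus "None".
NUM_EVENT_SUBTYPES = 33
NUM_ROLE_TYPES = 35
NUM_TRIGGER_CLASSES = NUM_EVENT_SUBTYPES + 1  # 34-class trigger classification
NUM_ROLE_CLASSES = NUM_ROLE_TYPES + 1         # 36-class role classification


@dataclass
class Argument:
    text: str  # surface text of the entity, value, or time expression
    role: str  # role label (hypothetical strings here), or "None"


@dataclass
class EventMention:
    trigger: str     # the core word that expresses the event
    event_type: str  # one of the event types, or "None"
    arguments: List[Argument] = field(default_factory=list)


# Hypothetical example sentence: "Xiao Ming attacked Xiao Hong"
example = EventMention(
    trigger="attacked",
    event_type="Attack",
    arguments=[
        Argument(text="Xiao Ming", role="Attacker"),
        Argument(text="Xiao Hong", role="Victim"),
    ],
)
```

Keeping the "None" class explicit in both counts mirrors how the two classification tasks are framed: every word gets a trigger label, and every candidate argument gets a role label, even when that label is "None".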

Understanding the definition

Event extraction does not mean discovering unknown events in unstructured text. Instead, once the 34 event classes have been defined,
it looks for event trigger words (Event Trigger) in the text to match an event type,
and then, for each role (argument role) in the predefined event template, finds the corresponding entity.

For example, take the sentence "Xiao Ming attacked Xiao Hong"
and a predefined template:

Attack
consisting of an attacker, a victim, and trigger words (attack, hit, etc.)

The sentence contains the trigger word "attacked", so it is judged to describe an attack: this is event matching.
Xiao Ming then corresponds to the attacker
and Xiao Hong corresponds to the victim: this is event argument matching.
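The matching steps in this example can be sketched in a few lines of code. This is a toy illustration under strong assumptions: the template contents, the function name `match_attack`, and the rule "words before the trigger are the attacker, words after it are the victim" are all invented for this example and are far simpler than any real extraction system.

```python
import re
from typing import Dict, Optional

# Hypothetical predefined template for one event type.
ATTACK_TEMPLATE = {
    "event_type": "Attack",
    "triggers": {"attack", "attacks", "attacked", "hit", "hits", "hitting"},
    "roles": ["Attacker", "Victim"],
}


def match_attack(sentence: str) -> Optional[Dict[str, str]]:
    """Return a filled template if the sentence contains a trigger, else None."""
    tokens = re.findall(r"\w+", sentence)
    for i, token in enumerate(tokens):
        if token.lower() in ATTACK_TEMPLATE["triggers"]:
            # Step 1: the trigger word matches the event type.
            # Step 2 (toy heuristic): fill roles by position around the trigger.
            return {
                "event_type": ATTACK_TEMPLATE["event_type"],
                "trigger": token,
                "Attacker": " ".join(tokens[:i]),
                "Victim": " ".join(tokens[i + 1:]),
            }
    return None


result = match_attack("Xiao Ming attacked Xiao Hong")
```

Here `result` maps "Attacker" to "Xiao Ming" and "Victim" to "Xiao Hong", while a sentence without any trigger word from the template returns `None`.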

Dataset

ACE2005 dataset

Since the event definitions come from ACE, the natural choice of dataset for event extraction is also an ACE dataset: ACE2005.

The ACE2005 corpus addresses several basic recognition tasks: entities, values and time expressions, relations, and events.

Structured as follows:

1P: data subject to first pass (complete) annotation
DUAL: data also subject to dual first pass (complete) annotation
ADJ: data also subject to discrepancy resolution/adjudication
NORM: data also subject to TIMEX2 normalization

In simple terms, each piece of data is annotated twice independently: once in the first pass (1P) and once in the dual pass (DUAL). Where the two annotations agree, the label is taken to be correct; where they differ, the discrepancy is resolved by adjudication, which produces the ADJ data.
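The double-annotation idea can be illustrated with a small comparison routine. This is a hedged sketch only: the function name `compare_passes` and the dictionary representation (text span mapped to label) are assumptions made for illustration and say nothing about how the LDC actually stores or processes its annotations.

```python
from typing import Dict, Optional, Tuple

Labels = Dict[str, str]
Conflicts = Dict[str, Tuple[Optional[str], Optional[str]]]


def compare_passes(first_pass: Labels, dual_pass: Labels) -> Tuple[Labels, Conflicts]:
    """Split spans into agreed labels and discrepancies needing adjudication."""
    agreed: Labels = {}
    to_adjudicate: Conflicts = {}
    for span in sorted(set(first_pass) | set(dual_pass)):
        label_1p = first_pass.get(span)   # None if the first pass skipped this span
        label_dual = dual_pass.get(span)  # None if the dual pass skipped this span
        if label_1p is not None and label_1p == label_dual:
            agreed[span] = label_1p
        else:
            to_adjudicate[span] = (label_1p, label_dual)
    return agreed, to_adjudicate


# Hypothetical annotations from two independent passes.
first = {"attacked": "Attack", "married": "Marry"}
dual = {"attacked": "Attack", "married": "None", "died": "Die"}
agreed, conflicts = compare_passes(first, dual)
```

In this toy run only "attacked" is accepted directly; "married" (different labels) and "died" (annotated in one pass only) would go to an adjudicator.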

ACE2005EDC dataset

EDC here stands for event extraction.

In the ACE2005EDC dataset, each event text is annotated with its event type, its event trigger word, and the role each event argument plays in the event.

It covers three languages: English, Chinese, and Arabic.

Apart from ACE2005EDC, I have not found any other dataset that annotates the role each event argument plays in the event.

Obtaining the dataset

The ACE2005 dataset is not free; it can be purchased from the official website of the LDC (Linguistic Data Consortium).

LDC - ACE2005

The purchase process is fairly involved. First, join the LDC in the name of an organization and pay the membership fee:

  • Nonprofit organizations: 2,400 USD / year
  • For-profit organizations: 24,000 USD / year

The organization's administrator account on the LDC can add other LDC accounts to the organization and share the acquired datasets with them.

Once a member, an organization can purchase various datasets. Datasets released in a membership year are free to use for that year's members, who keep the right to use them after membership ends; every other dataset has its own price.
ACE2005 is priced at 4,000 USD.

Event extraction methods

(Only the general ideas are covered here; for details, see the Zhihu link below.)

Generally speaking, the basic event extraction task can be decomposed into four subtasks:

  • Event (trigger) detection
  • Event trigger typing
  • Event argument identification
  • Event argument role identification

Note: "Event Argument" has several Chinese translations; the original post renders it as 事件论元.

In 2015 and earlier, work on event extraction (Event Extraction, EE) focused mainly on pattern matching or statistical machine learning methods.

Pattern-matching methods can achieve good performance in specific domains, but they port poorly to new ones;
statistical learning methods are usually more portable, but they depend heavily on labeled data.

Since 2015, researchers have been using CNNs and RNNs (neural networks) to extract the semantics of event mentions. Typical examples are models such as DMCNN and JRNN, whose evaluation results improve significantly on some earlier structure-based methods. Another benefit of capturing semantics with a DNN is that it draws on word vectors, which carry a much richer set of intrinsic features, so event extraction results no longer depend heavily on manually defined local and global features.

Pipelined Approach & Joint Approach

Treating every subtask as an independent classification problem is called the Pipelined Approach. Methods based on this idea build several different models (or apply one model, slightly modified, to each subproblem in order) and solve the subtasks one after another.
The biggest drawback of this approach is error propagation: intuitively, if an error occurs in the first step of trigger recognition, the accuracy of argument recognition will then be lower. Nevertheless, dividing the work this way simplifies the overall event extraction task, so the Pipelined Approach is widely used. A classic pipeline method is the dynamic multi-pooling convolutional neural network (DMCNN), proposed in 2015.
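To make the pipeline idea and its error propagation concrete, here is a toy sketch that chains the four subtasks. Everything in it is an assumption for illustration: the one-entry trigger lexicon stands in for a trained trigger classifier, capitalized tokens stand in for an argument identifier, and the role rule is positional. It is not how DMCNN or any published system works.

```python
from typing import Dict, List, Optional

# Toy stand-in for a trained trigger classifier (hypothetical lexicon).
TRIGGER_LEXICON = {"attacked": "Attack"}


def detect_trigger(tokens: List[str]) -> Optional[int]:
    """Stage 1: return the index of the first trigger word, or None."""
    for i, token in enumerate(tokens):
        if token.lower() in TRIGGER_LEXICON:
            return i
    return None


def type_trigger(tokens: List[str], idx: int) -> str:
    """Stage 2: assign an event type to the detected trigger."""
    return TRIGGER_LEXICON[tokens[idx].lower()]


def identify_arguments(tokens: List[str], idx: int) -> List[int]:
    """Stage 3 (toy rule): capitalized tokens other than the trigger."""
    return [i for i, t in enumerate(tokens) if i != idx and t[0].isupper()]


def assign_roles(tokens: List[str], idx: int, args: List[int]) -> Dict[str, str]:
    """Stage 4 (toy rule): role depends on position relative to the trigger."""
    return {tokens[i]: ("Attacker" if i < idx else "Victim") for i in args}


def run_pipeline(sentence: str) -> Optional[dict]:
    tokens = sentence.split()
    idx = detect_trigger(tokens)
    if idx is None:
        # Error propagation: a missed trigger means the argument
        # stages never run, so the whole event is lost.
        return None
    args = identify_arguments(tokens, idx)
    return {
        "trigger": tokens[idx],
        "event_type": type_trigger(tokens, idx),
        "roles": assign_roles(tokens, idx, args),
    }


found = run_pipeline("Ming attacked Hong")
missed = run_pipeline("Ming assaulted Hong")
```

The second sentence describes the same kind of event, but because the first stage does not recognize "assaulted", `missed` is `None` and no arguments are extracted at all, however good the later stages might be.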

The other line of research tries to build a single model that extracts several kinds of information at the same time: the Joint Approach. Such methods build one model that extracts both triggers and arguments. Their major advantage is that information can flow in both directions between triggers and arguments (in a pipeline, information can only flow from triggers to arguments). Before DNN methods were applied, the best performance came from the structured perceptron model proposed by [Li et al.]; in 2016, [Nguyen et al.] proposed the JRNN model, which applies RNNs to the event extraction task.

Lack of datasets

Although researchers have put a great deal of thought into model design, event extraction faces one problem that cannot be ignored: the lack of datasets.
The most widely used event extraction dataset at present is [ACE, 2005]. Taking ACE as an example, its English portion contains only 599 documents in total. Of the 33 defined event types, more than 60% have no more than 100 samples, and three event types have no more than 10. The fundamental reason for this sparsity is that manual text annotation is very costly in time and labor. As a result, more and more researchers have begun to study how to augment the dataset, for example by using external semantic knowledge frameworks to annotate data automatically, or by using clustering information for labeling in semi-supervised learning. These methods focus on automatic data annotation to improve model generalization. More directly, other researchers try to overcome data sparsity from the modeling side, for example by using zero-shot transfer learning to improve predictions for unseen event types.

For recent progress in event extraction and the more classic models, here is a link to a good Zhihu answer:
Zhang Chengcheng's answer on Zhihu


Origin blog.csdn.net/qq_39304851/article/details/104694179