Knowledge Extraction - Event Extraction

Event extraction

An event is a change in the state of things or in the relationships between them. Most existing knowledge resources (such as Wikipedia) describe static relationships between entities, whereas events describe dynamic, structured knowledge at a larger granularity, which makes them an important supplement to existing knowledge resources.

Like relation extraction, event extraction needs to extract a predicate and its corresponding arguments (event elements) from text. The difference is that relation extraction is a binary problem whose two arguments usually appear in the same sentence, whereas an event has multiple arguments and modifiers that may be spread across several sentences, and some arguments are optional (they are omitted in some instances of the event). This makes bootstrapping, distant supervision, and coreference resolution very difficult.

Event extraction tasks can be divided into two categories:

Event identification and extraction

This task identifies event information in descriptive text and presents it in a structured form, including the time and place of the event, the participants and their roles, and the associated actions or state changes.

Event detection and tracking

Event detection and tracking aims to organize a stream of news text by the events it reports, providing the core technology for monitoring multiple traditional news sources so that users can follow news stories and how they develop. Specifically, it consists of three main tasks: segmentation, detection, and tracking, that is, breaking the text into news events, discovering new (unforeseen) events, and tracking the development of previously reported events.
Event detection comes in two forms, retrospective detection and online detection. The former aims to find previously unidentified events in a collection of news documents sorted by time; the latter discovers new events in a real-time news stream.

This article focuses on event identification and extraction. First, the core concepts:

  • Event mention (Event Mention)
    the phrase, sentence, or group of sentences that describes the event; it contains one trigger and any number of arguments

  • Event trigger (Event Trigger)
    the word in the event mention that best represents the occurrence of the event; it is an important feature for deciding the event category and is usually a verb or a noun

  • Event argument (Event Argument)
    important information about the event, also called an entity mention; arguments are mainly entities, attributes, and values, the fine-grained units that express complete semantics

  • Argument role (Argument Role)
    the role an event argument plays in the event, that is, the semantic relation between the argument and the event; it can be understood as a slot

  • Event type (Event Type)
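
To make the concepts above concrete, here is a minimal sketch of how they could be held in code. The class names, field names, and the labels "Kidnap", "Victim", "Perpetrator", and "Time" are illustrative assumptions, not taken from any particular annotation scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventArgument:
    text: str  # the entity mention, e.g. "Ricardo Castellar"
    role: str  # the argument role (slot) it fills, e.g. "Victim"


@dataclass
class EventMention:
    text: str        # phrase or sentence that describes the event
    trigger: str     # word that best signals the event
    event_type: str  # hypothetical type label
    arguments: List[EventArgument] = field(default_factory=list)

    def roles(self) -> Dict[str, List[str]]:
        """Group argument texts by the role they fill."""
        filled: Dict[str, List[str]] = {}
        for arg in self.arguments:
            filled.setdefault(arg.role, []).append(arg.text)
        return filled


mention = EventMention(
    text="Ricardo Castellar, the mayor, was kidnapped yesterday by the FMLN.",
    trigger="kidnapped",
    event_type="Kidnap",
    arguments=[
        EventArgument("Ricardo Castellar", "Victim"),
        EventArgument("the FMLN", "Perpetrator"),
        EventArgument("yesterday", "Time"),
    ],
)
print(mention.roles())
```

Keeping roles as a list per slot allows a role to be filled more than once or not at all, which matches the point that some arguments are optional.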

Understanding event identification and extraction

Intuitively, event extraction can be understood as finding events of a specific category in the text and then filling out a form for each one.

Definition of an event extraction system

Given a text document, an event extraction system should predict event triggers with specific sub-types and their arguments for each sentence.

That is, for every sentence the system must output each event trigger together with its specific event subtype and the arguments of that event.

The most basic parts of the event extraction task are therefore:

  • Identifying the event trigger word and the event type
  • Extracting the event arguments (Event Argument) and, at the same time, determining their roles (Argument Role)
  • Extracting the phrase or sentence that describes the event

There are also other subtasks, such as labeling event attributes and event coreference resolution.

Event extraction is mostly carried out in stages. It usually starts with a trigger classifier; if a trigger is found, the trigger and its context are used as features to determine the event type. Next comes the argument classifier, which classifies each entity mention in the sentence to decide whether it is an argument (event element) of the event and, if so, which role slot it fills.

Methods based on pattern matching

Starting with the MUC evaluations, event extraction systems were hand-written rule systems based on syntax trees or regular expressions, such as CIRCUS (Lehnert 1991), RAPIER (Califf & Mooney 1997), SRV (Freitag 1998), AutoSlog (Riloff 1993), LIEP (Huffman 1995), PALKA (Kim & Moldovan 1995), CRYSTAL (Soderland et al. 1995), and HASTEN (Krupka 1995). Later, models gradually moved toward supervised learning. By the ACE stage most systems were based on supervised learning, but because of annotation consistency problems system performance was generally poor, and the ACE event extraction evaluation was held only once, in 2005.

First, consider the template-based extraction approach, in which templates identify events through basic syntactic constraints and semantic constraints.

Based on a manually annotated corpus

In the early days, templates were usually created from a large annotated set. Because the templates are generated entirely from the manually annotated corpus, the quality of what is learned depends heavily on the quality of the manual annotation.

AutoSlog (Riloff)
Basic assumptions:
a. The first mention of an event element is where the relationship between the element and the event is established.
b. The words surrounding the event element contain a description of the role the element plays in the event.

AutoSlog creates extraction rules through supervised learning followed by manual review. Given training data in which the slots have already been filled, AutoSlog parses the syntactic structure around each filled slot and forms an extraction rule automatically. Because the templates generated this way are too general, manual review is necessary. The result is essentially a dictionary.
For example:

Ricardo Castellar, the mayor, was kidnapped yesterday by the FMLN.

Suppose Ricardo Castellar has been labeled as the victim. From the parse, AutoSlog determines that Ricardo Castellar is the subject, which triggers the subject-related heuristic (subj) passive-verb. Filling in the words from the sentence gives the rule (victim) was kidnapped. In later text, whenever "kidnapped" appears in a passive construction, its subject is labeled as the victim.
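
As a rough illustration of applying such a rule, the sketch below approximates "(victim) was kidnapped" with a regular expression. This is an assumption made to keep the example self-contained: AutoSlog itself anchors rules on a syntactic parse, and the pattern and the function name `extract_victim` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical stand-in for the learned rule "(victim) was kidnapped".
# It only approximates "subject of a passive 'kidnapped'" for simple
# sentences that start with a capitalized name; a real system would
# read the subject off a parse instead.
PASSIVE_KIDNAP = re.compile(
    r"^(?P<subj>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"  # capitalized name at sentence start
    r"(?:, [^,]+,)?"                             # optional appositive, e.g. ", the mayor,"
    r" (?:was|were|has been|had been) kidnapped\b"
)


def extract_victim(sentence: str) -> Optional[str]:
    """Return the subject of a passive 'kidnapped', or None if the rule does not fire."""
    match = PASSIVE_KIDNAP.search(sentence)
    return match.group("subj") if match else None


print(extract_victim("Ricardo Castellar, the mayor, was kidnapped yesterday by the FMLN."))
print(extract_victim("The FMLN kidnapped the mayor."))
```

The second sentence is active, so the rule does not fire and no victim is extracted, which is the behavior the passive-verb heuristic is meant to give.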

PALKA

Basic assumption: the linguistic expressions that occur with high frequency in a specific domain are countable.

PALKA represents the extraction patterns of a specific domain with semantic frames and phrase pattern structures. By incorporating semantic information from WordNet, PALKA can achieve results close to purely manual extraction in specific domains.

Based on weak supervision

Manual annotation is time-consuming and suffers from consistency problems. Weakly supervised methods do not require a fully annotated corpus; a person only needs to pre-classify the corpus or provide a few seed templates, and the machine then learns patterns automatically from the pre-classified corpus or the seed templates.

  • AutoSlog-TS
    Riloff and Shoen, 1995
    AutoSlog-TS does not need annotated text, only a training corpus pre-classified as relevant or not relevant to the domain. It first passes over the corpus and generates an extraction rule for every noun phrase (identified by syntactic analysis). It then passes over the whole corpus again and collects statistics for each rule. The basic idea is that rules appearing more often in relevant texts than in irrelevant texts are more likely to be good extraction rules. Suppose the ratio of relevant to irrelevant texts in the training data is 1:1. For each generated rule, compute relevance rate = (number of instances of the rule in relevant documents) / (number of instances of the rule in the whole corpus). Rules with a relevance rate below 50% are discarded, and the remaining rules are sorted in descending order of relevance_rate * log(frequency) and then manually reviewed.

  • TIMES
    Chai and Biermann, 1998
    introduces WordNet as domain-independent conceptual knowledge to improve the generalization ability of the learned patterns, and applies manual or rule-based word sense disambiguation so that the final patterns are more accurate

  • NEXUS
    Piskorski et al., 2001; Tanev et al., 2008
    preprocesses the corpus with clustering

  • GenPAM
    Jiang, 2005
    generates generalized patterns from specific patterns during learning, uses word sense disambiguation (WSD) to make effective use of the similarity between patterns, and minimizes manual intervention and system workload
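
The AutoSlog-TS ranking step described above can be sketched as follows. This is a minimal sketch under stated assumptions: `rank_rules` and its inputs are hypothetical, "frequency" is taken to be the rule's total count in the corpus, and the natural logarithm is used because the text does not specify a base.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def rank_rules(
    relevant_hits: Iterable[str],
    irrelevant_hits: Iterable[str],
    min_relevance: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score each rule by relevance_rate * log(frequency) and sort descending.

    Each input is a flat list with one entry per rule firing, taken from the
    relevant and the irrelevant half of the pre-classified corpus.
    """
    relevant = Counter(relevant_hits)
    total = relevant + Counter(irrelevant_hits)
    scored = []
    for rule, frequency in total.items():
        relevance_rate = relevant[rule] / frequency
        if relevance_rate < min_relevance:
            continue  # fires mostly in irrelevant text, so discard it
        scored.append((rule, relevance_rate * math.log(frequency)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy firing counts, invented purely for illustration.
relevant_hits = ["(subj) was kidnapped"] * 8 + ["(subj) said"] * 5 + ["bombing of (np)"] * 3
irrelevant_hits = ["(subj) was kidnapped"] * 2 + ["(subj) said"] * 15

for rule, score in rank_rules(relevant_hits, irrelevant_hits):
    print(f"{score:.3f}  {rule}")
```

With these toy counts "(subj) said" is dropped because only a quarter of its firings are in relevant text. The log factor keeps a rule that fires often and mostly in relevant text ahead of a rule that is perfectly relevant but rare, and the ranked list would then go to manual review.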

Summary

Pattern-matching methods perform well in specific domains, and their knowledge representation is simple, easy to understand, and easy to apply downstream. However, they depend to varying degrees on the language, the domain, and the document format, so their coverage and portability are poor.

With pattern matching, template accuracy is an important factor in the performance of the whole method. In practice, pattern matching is widely used and is characterized by high precision and low recall. To improve recall, one option is to build a more complete template library; another is to use semi-supervised methods to build a trigger dictionary.

Statistics-based methods - traditional machine learning

Event extraction methods built on statistical models can be divided into two broad categories: pipeline models and joint models.

Pipeline

The pipeline approach turns event extraction into a multi-stage classification problem, with the following classifiers run in order:

  1. Trigger classifier (Trigger Classifier)
    determines whether a word is an event trigger and, if so, the event category
  2. Argument classifier (Argument Classifier)
    determines whether a phrase is an event argument
  3. Role classifier (Role Classifier)
    determines the role category of an argument
  4. Attribute classifier (Attribute Classifier)
    determines the event attributes
  5. Reportable-event classifier (Reportable-Event Classifier)
    determines whether a reportable event instance exists
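
The first three stages above can be sketched as a chain of small functions. This is only a toy: the lexicons and rules below are hypothetical stand-ins for trained classifiers, candidate phrases are assumed to be given as token spans, and the attribute and reportable-event stages are left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicons standing in for trained classifiers.
TRIGGER_LEXICON: Dict[str, str] = {"kidnapped": "Kidnap", "attacked": "Attack"}
ROLE_BY_CUE: Dict[str, str] = {"by": "Perpetrator", "in": "Place"}


def classify_trigger(token: str) -> Optional[str]:
    """Stage 1: is the token a trigger, and of which event type?"""
    return TRIGGER_LEXICON.get(token.lower())


def is_argument(phrase: str) -> bool:
    """Stage 2: toy rule, capitalized candidate phrases count as arguments."""
    return phrase[:1].isupper()


def classify_role(cue_word: str) -> str:
    """Stage 3: toy rule, the word before the phrase decides the role."""
    return ROLE_BY_CUE.get(cue_word.lower(), "Unknown")


def extract(tokens: List[str], candidates: List[Tuple[int, int]]) -> List[dict]:
    """Run the stages in order; candidates are (start, end) token spans."""
    events = []
    for token in tokens:
        event_type = classify_trigger(token)
        if event_type is None:
            continue  # no trigger, so the later stages never run
        arguments = []
        for start, end in candidates:
            phrase = " ".join(tokens[start:end])
            if not is_argument(phrase):
                continue
            cue = tokens[start - 1] if start > 0 else ""
            arguments.append((phrase, classify_role(cue)))
        events.append({"trigger": token, "type": event_type, "arguments": arguments})
    return events


tokens = ["Castellar", "was", "kidnapped", "by", "FMLN", "in", "Achi"]
candidates = [(0, 1), (4, 5), (6, 7)]
print(extract(tokens, candidates))
```

Because each stage consumes the output of the one before it, a trigger that stage 1 misses can never be recovered later.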

Origin blog.csdn.net/qq_39304851/article/details/103875727