Technology Trends | Cutting-edge work on event graph schema generation: generating event schemas with language models

Reprinted from the public account | Lao Liu Shuo NLP


Anyone who has worked on event extraction knows that the event schema, also called the event template or event pattern, is a central component.

The event schema provides a conceptual, structural and formal language to represent events and model knowledge of world events.

However, in the actual research and development process, we will find that due to the openness of real-world events, the diversity of event expressions, and the scarcity of event knowledge, it is difficult to automatically generate high-quality and high-coverage event patterns.


The recent work "Harvesting Event Schemas from Large Language Models" is very interesting in this regard. It proposes a new event schema induction paradigm that acquires knowledge from large-scale pre-trained language models, and designs an Event Schema Harvester (ESHer) to automatically induce high-quality event schemas through in-context-generation-based conceptualization, confidence-aware schema structuring, and graph-based schema aggregation.

This article introduces the work for your reference.

1. Background

An event is one of the basic units through which humans understand and experience the world; it is a specific occurrence involving multiple participants, such as a bombing, an election, or a marriage.

To represent events and build a knowledge model of world events, the event schema provides a conceptual, structural, and formal language that can describe the types of events, the semantic roles (slots) of specific events, and the relationships between different events.

Given this importance, automatically discovering and constructing large-scale, high-quality, high-coverage event schemas, namely event schema induction, is crucial.

However, due to the openness of real-world events, the diversity of event expressions, and the scarcity of event knowledge, event pattern induction is a non-trivial task.

First, in real-world applications, the scale of event types is very large, and new event types are constantly emerging. To solve this open problem, event patterns need to be automatically generated with high coverage in different domains.

Second, events are often expressed in very different natural-language sentences, so the various event expressions must be normalized by conceptualizing and structuring them into formal event schemas.

Finally, due to the principle of economy in language, event expressions are mostly incomplete and many event arguments are missing. To solve this sparsity problem, event schema induction methods must aggregate scattered events.

However, to date most event schemas are still manually designed by human experts (typical examples include MUC, ACE, and TAC-KBP), which is expensive and labor-intensive.

On the other hand, traditional automatic event pattern induction methods still cannot overcome the challenges of openness, diversity and sparsity.

For example:

Bottom-up concept-linking methods: these derive event types/slots by parsing event expressions and linking them to external schema resources such as FrameNet, and are therefore limited by the quality and coverage of those external resources.

Top-down clustering methods: these cluster event expressions according to predefined schema templates (e.g., 5W1H templates, or templates with a predetermined number of event types/slots), and are therefore heavily constrained by those templates.


2. The composition of Event Schema Harvester

Formally, given an unlabeled corpus C and a PLM, event pattern induction methods can discover event clusters Y = {y1, y2, ..., yN}, where N is the number of event types found.

For each event cluster y, the method automatically conceptualizes it as a type name t and its corresponding semantic roles {s_t1, s_t2, ...}, where t ∈ T, s ∈ S, and T/S are the open-domain vocabularies of event type/slot names.
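As a minimal illustration, the induced output for one cluster can be represented as a small data structure (the class and field names here are assumptions for exposition, not the paper's code):

```python
from dataclasses import dataclass, field

@dataclass
class EventSchema:
    # One induced event cluster, conceptualized as an open-domain
    # type name t plus its semantic roles (slots).
    event_type: str
    slots: list[str] = field(default_factory=list)

schema = EventSchema("die", ["agent", "attacker", "instrument", "victim"])
```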

To address these challenges, the Event Schema Harvester (ESHer) is designed; its framework is shown in Figure 2.

[Figure 2: The framework of ESHer]

ESHer consists of three parts: 

1) Context-generation-based text conceptualization, which transforms diverse event expressions into conceptualized event schemas guided by in-context demonstrations;

2) Confidence-aware schema structuring, which structures event schemas by selecting and associating event types and slots according to their salience, reliability, and consistency;

3) Graph-based schema aggregation, which merges scattered, sparse event knowledge via graph-based clustering into complete event schemas.

3. Text Conceptualization Based on Context Generation


This step converts diverse event expressions into conceptualized event schemas through context-generation-based text conceptualization.

In practice, events are often expressed in very different natural-language forms, which poses a serious challenge for schema induction.

For example, "terrorist attacks have killed more than 35,000 people" and "more than 35,000 deaths were caused by terrorist attacks" express the same event, but with completely different words and syntactic structures.

To address this, different events are conceptualized as pattern candidates, which can distill event pattern knowledge and represent them uniformly.

For example, the event types and semantic roles in the above two examples will be distilled into the same schema "Type: die, Slots: agent; attacker; instrument; victim".

In this section, an unsupervised text-to-pattern framework is proposed using the context generation capability of PLM.

Specifically, textual conceptualization is modeled as a generative process in context:

[Demonstrations; Text] → Schema

Here, Demonstrations is a list of examples used to guide the PLM in conceptualizing text into schemas; each demonstration is a <text, pattern> pair, written as "text → pattern".

Here "text" is the event corpus one wants to conceptualize, "pattern" is the schema of the conceptualization, expressed as "type: t, slot: st1; st2...", and "→" is a special token that combines text and events mode separately. 

For the demonstrations, this work samples directly from existing human-annotated event datasets (e.g., ACE, DuEE). In addition, to elicit more event knowledge from each instance, n schema candidates c1, c2, ..., cn are generated for each text, where n is a hyperparameter.
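To make the "text → pattern" format concrete, here is a minimal sketch of prompt assembly and output parsing. The exact prompt layout and the call to the PLM are assumptions for illustration, not the paper's implementation:

```python
def build_prompt(demonstrations, text):
    """Assemble an in-context prompt: one "text -> pattern" line per
    demonstration, then the new text followed by the arrow token.
    demonstrations: list of (text, pattern) pairs sampled from
    annotated datasets such as ACE or DuEE."""
    lines = [f"{t} -> {p}" for t, p in demonstrations]
    lines.append(f"{text} ->")
    return "\n".join(lines)

def parse_schema(generated):
    """Parse a generated "Type: t, Slots: s1; s2; ..." string into
    an (event_type, slot_list) pair."""
    type_part, slot_part = generated.split(", Slots:")
    event_type = type_part.replace("Type:", "").strip()
    slots = [s.strip() for s in slot_part.split(";")]
    return event_type, slots

demos = [("terrorist attacks have killed more than 35,000 people",
          "Type: die, Slots: agent; attacker; instrument; victim")]
prompt = build_prompt(demos, "more than 35,000 deaths were caused")
event_type, slots = parse_schema("Type: die, Slots: agent; victim")
```

Sampling the generation n times (once per candidate c1, ..., cn) then yields the multiple schema candidates the next section scores.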

4. Schema Structuring Based on Confidence Scoring


The text-to-schema component distills and conceptualizes event knowledge in different representations.

This step addresses how these conceptualized event schemas can be structured by selecting and evaluating salient, reliable, and consistent slots for each event type.

For example, a "die" event frame can be structured by evaluating the association between the event type "die" and the slot "agent;attacker;instrument;victim". 

Formally, as shown in Algorithm 1 below, O denotes the output of textual conceptualization.

[Algorithm 1: Confidence-aware schema structuring]

where the j-th instance is (text_j, {c_j1, c_j2, ..., c_jn}), the c_ji are its n generated schema candidates, and SlotSet_j denotes the union of all slots generated for instance j.

To select high-quality slots for event types, a set of metrics is designed to estimate slot quality and type-slot associations: salience, reliability, and consistency.

1. Salience

A salient slot for an event type t should occur frequently in the schemas generated for t, but rarely in those of other event types.

For example, for a "death" event, the slots "attacker" and "victim" are more prominent than "person". According to the idea of ​​TF-IDF, the significance of slot s in the jth instance is calculated as:

[Equation: TF-IDF-style salience of slot s in the j-th instance]
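The salience formula appears only as an image in the source, so the sketch below is an illustrative TF-IDF-style approximation of the idea, not the paper's exact equation:

```python
import math
from collections import Counter

def slot_salience(candidate_slot_sets, corpus_slot_sets):
    """TF-IDF-style salience score.

    candidate_slot_sets: the n candidate slot sets of one instance.
    corpus_slot_sets: one union slot set per corpus instance.
    TF: how often a slot appears among this instance's candidates;
    IDF: how rarely it appears across the whole corpus."""
    tf = Counter(s for slots in candidate_slot_sets for s in slots)
    n_inst = len(corpus_slot_sets)
    scores = {}
    for s, freq in tf.items():
        df = sum(1 for inst in corpus_slot_sets if s in inst)
        idf = math.log((1 + n_inst) / (1 + df))
        scores[s] = (freq / len(candidate_slot_sets)) * idf
    return scores

scores = slot_salience(
    [{"victim", "person"}, {"victim"}, {"victim", "attacker"}],
    [{"victim", "person", "attacker"}, {"person"}, {"person", "time"}],
)
# "victim" is frequent in this instance but rare corpus-wide,
# so it scores higher than the generic "person"
```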

2. Reliability

A slot is reliable if it frequently co-occurs with other slots among multiple candidates for an instance.

For example, in Figure 2, the slot "agent" is considered reliable for the "die" event because it co-occurs with all other slots. In practice, the PageRank algorithm is used to compute slot reliability, as follows:

[Equations: PageRank-based reliability score over the slot co-occurrence graph]
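The reliability equations are likewise given only as images; the following self-contained sketch captures the idea with a plain power-iteration PageRank over the slot co-occurrence graph (the damping factor and iteration count are assumptions):

```python
from itertools import combinations

def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank on a weighted undirected graph.
    graph: dict node -> dict of neighbor -> edge weight."""
    nodes = list(graph)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            incoming = sum(
                rank[u] * graph[u][v] / sum(graph[u].values())
                for u in nodes if v in graph[u]
            )
            new[v] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank

def slot_reliability(candidates):
    """Reliability of each slot: PageRank on the co-occurrence graph
    built from one instance's n schema candidates. Slots that
    co-occur with many other slots receive higher scores."""
    graph = {}
    for slots in candidates:
        for a, b in combinations(sorted(slots), 2):
            ga, gb = graph.setdefault(a, {}), graph.setdefault(b, {})
            ga[b] = ga.get(b, 0) + 1
            gb[a] = gb.get(a, 0) + 1
    return pagerank(graph)

scores = slot_reliability([
    {"agent", "attacker", "victim"},
    {"agent", "instrument"},
    {"agent", "victim"},
])
# "agent" co-occurs with every other slot, so it ranks highest
```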

3. Consistency

Since the PLM may produce schemas that are unfaithful to the input event expression, the consistency of the generated event types and slots must also be computed.

Specifically, semantic similarity based on WordNet, HowNet, and BERT is used to evaluate the consistency of the generated event patterns with event representations.

The corresponding consistency score for the slot in the jth instance is:

[Equation: consistency score of a slot in the j-th instance]

The final confidence score of a slot is computed by combining its salience, reliability, and consistency scores:

[Equation: confidence score combining salience, reliability, and consistency], where λ1 and λ2 are two hyperparameters.

Finally, only the most consistent event type is kept for each instance, and slots whose confidence scores fall below a threshold are filtered out.
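Putting the three scores together, the confidence-based filtering step can be sketched as follows; the additive λ-weighted combination mirrors the text, but the paper's exact formula (shown only as an image) may differ:

```python
def structure_schema(instance_slots, lam1=0.5, lam2=0.5, threshold=0.6):
    """Combine salience, reliability, and consistency into one
    confidence score per slot and drop low-confidence slots.

    instance_slots: dict slot -> (salience, reliability, consistency).
    lam1/lam2 and the additive form are assumptions."""
    kept = {}
    for slot, (sal, rel, con) in instance_slots.items():
        confidence = sal + lam1 * rel + lam2 * con
        if confidence >= threshold:
            kept[slot] = confidence
    return kept

kept = structure_schema({
    "victim": (0.7, 0.4, 0.8),   # salient, reliable, consistent
    "person": (0.1, 0.2, 0.3),   # generic slot, low on all scores
})
# only "victim" survives the threshold
```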

5. Graph-Based Schema Aggregation


This step addresses the sparsity problem by aggregating scattered semantic roles across different schemas.

For example, merging "Type: die, Slots: agent; attacker; instrument; victim" with "Type: die, Slots: agent; dead; instrument; place; time" yields a more complete "die" event schema.

Here, the work proposes a graph-based clustering method that first groups individual event schemas into clusters, then merges the event types and slots within each cluster.

The intuition is that two schemas belong to the same event type if their original expressions describe the same occurrence (text similarity), their predicted types are synonyms (type similarity), and they share many semantic slots (slot-set similarity).

For example, "die" and "decease" are synonyms, and "agent" and "instrument" are common semantic roles, so they are likely to be the same event type.

On clustering, the Louvain algorithm is used to divide the patterns into groups.

[Equation: Louvain-based cluster assignment of schemas]

where ŷ_j ∈ Y = {y1, y2, ..., yN} indicates that the j-th schema is assigned to the ŷ_j-th event cluster, and each cluster represents a distinct event type.

Among event slots, synonymous slot names may denote the same semantic role; for example, {dead, victim} forms a synset in the above example.

Therefore, the Louvain algorithm is applied again to identify synonymous event slots, and the most salient slot is then selected to represent each synset; e.g., "victim" is chosen as the representative slot name for the synset {dead, victim}.

In the graph-construction phase, a graph is built to model the similarities between different individual event schemas.

Specifically, each node in the graph is an event schema, and the similarity between two schema nodes is computed by combining the similarity between their event expressions, the similarity between their event types, and the similarity between their slot sets:

[Equation: combined similarity between two schema nodes]

where Sim(·) denotes a semantic similarity function.
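A minimal sketch of this combined node similarity, assuming equal weights and Jaccard similarity over slot sets (the paper's exact combination is shown only as an image):

```python
def schema_similarity(a, b, sim_text, w=(1/3, 1/3, 1/3)):
    """Combined similarity of two individual schemas, mixing text,
    type, and slot-set similarity. The equal weights w and the
    Jaccard slot-set term are assumptions.

    a, b: dicts with keys "text", "type", and "slots" (a set).
    sim_text: semantic similarity function on strings, in [0, 1]."""
    s_text = sim_text(a["text"], b["text"])
    s_type = sim_text(a["type"], b["type"])
    inter = len(a["slots"] & b["slots"])
    union = len(a["slots"] | b["slots"])
    s_slot = inter / union if union else 0.0  # Jaccard over slot sets
    return w[0] * s_text + w[1] * s_type + w[2] * s_slot

a = {"text": "t1", "type": "die", "slots": {"agent", "victim"}}
b = {"text": "t2", "type": "die", "slots": {"agent", "instrument"}}
score = schema_similarity(a, b, lambda x, y: 1.0 if x == y else 0.0)
```

The resulting weighted graph can then be partitioned with any off-the-shelf Louvain implementation (e.g., `louvain_communities` in networkx).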

6. Experimental results

In order to verify the effectiveness of the method, a series of experiments were done in this work.

First, on the dataset side, ERE-EN is used as the main dataset. In addition, to evaluate event schema induction across domains and languages, several other datasets are also tested.


Second, on the model side, BLOOM is used as the PLM, and the demonstrations for the text conceptualization stage are sampled from ACE for the English datasets and DuEE for the Chinese ones.

Finally, look at the experimental results.

1. Qualitative and quantitative conclusions

The qualitative results are given in Fig. 3, which shows several schemas induced by ESHer from the ERE-EN and ChFinAnn datasets, alongside those produced by human experts.

It can be seen that ESHer can induce high-quality event patterns, with the following findings:

1) Most of the generated event types directly match the event types marked by experts;

2) There is a large overlap between automatically generated slots and human-annotated slots;

3) Some mismatched slots also prove reasonable upon manual inspection, which shows that relying on experts alone makes high-coverage schemas hard to obtain;

4) Some missing slots are generated in textual conceptualization but discarded in confidence-aware structuring, suggesting that performance can be further improved by introducing human-in-the-loop.

5) With proper context demonstration, ESHer can be easily extended to different languages, e.g., English for ERE-EN and Chinese for ChFinAnn. 

Look at the quantitative conclusions.

For quantitative results, specific effects can be seen in Table 1.

[Table 1: Quantitative schema-induction results on ERE-EN]

From Table 1, it can be seen that:

For event type discovery, ESHer recovered 21.05% of the 38 event types in ERE-EN, and almost all found event types (85.92%) were acceptable.

For the induction of event slots, ESHer recovers 11.30% of the 115 slots, 44.95% of the found slots can be directly accepted, and 35.21% of the slots can be selected from the candidates.

2. Event mention clustering results

This work evaluates the effectiveness of ESHer through the task of event mention clustering.

For ERE-EN, the 15 most frequently mentioned event types were selected, and all candidate mentions were grouped into clusters.

To assess whether the clusters are consistent with the original type, several standard metrics were chosen: 

1) ARI (Adjusted Rand Index) measures the agreement between the induced clusters and the gold event types;

2) NMI measures the normalized mutual information between them;

3) BCubed-F1 combines the per-item precision and recall of the clustering.

For all metrics above, the higher the number, the better the model's performance.
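ARI and NMI are available in scikit-learn (`adjusted_rand_score`, `normalized_mutual_info_score`); BCubed-F1 is not, so a small stdlib-only implementation is sketched here:

```python
def bcubed_f1(gold, pred):
    """BCubed-F1 for clusterings: for each item, precision is the
    fraction of items sharing its predicted cluster that also share
    its gold label; recall is the symmetric quantity. Both are
    averaged over items and combined into an F1 score."""
    n = len(gold)
    prec = rec = 0.0
    for i in range(n):
        same_pred = [j for j in range(n) if pred[j] == pred[i]]
        same_gold = [j for j in range(n) if gold[j] == gold[i]]
        hits = sum(1 for j in same_pred if gold[j] == gold[i])
        prec += hits / len(same_pred)
        rec += sum(1 for j in same_gold if pred[j] == pred[i]) / len(same_gold)
    prec, rec = prec / n, rec / n
    return 2 * prec * rec / (prec + rec)

gold = [0, 0, 1, 1, 2, 2]
perfect = [1, 1, 0, 0, 2, 2]  # same grouping, different cluster ids
merged = [0, 0, 0, 0, 2, 2]   # two gold clusters wrongly merged
```

Note that all three metrics are invariant to cluster relabeling: `perfect` scores 1.0 even though its cluster ids differ from the gold labels, while `merged` scores lower.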

For the baseline comparison, this work compares ESHer with the following feature-based methods: Kmeans, AggClus, JCSC, Triframes-CW, Triframes-Watset, and ETypeClus, using their default hyperparameter settings.

[Table 2: Event mention clustering results and ablations on ERE-EN]

Table 2 shows the results of the overall and ablation experiments, from which the following conclusions can be drawn:

First, ESHer exceeds all baselines of ERE-EN on all metrics.

ESHer achieves state-of-the-art performance, with an ARI of 56.59, NMI of 67.72, and BCubed-F1 of 62.43; by fully exploiting the in-context learning ability of the PLM, it can effectively address the diversity and sparsity challenges.

Second, the salience, reliability, and consistency estimates are all useful and complementary.

All five ablated variants show decreased performance to varying degrees compared with the full ESHer model. ESHer outperforms ESHer-Salience by 23.75 ARI, 9.81 NMI, and 9.92 BCubed-F1, validating the effectiveness of the salience score for identifying good slots.

ESHer outperforms ESHer-Consistency by 19.08 ARI, 1.60 NMI, and 11.74 BCubed-F1, showing that consistency estimation is likewise indispensable. These results also confirm that high-quality slot sets benefit graph-based aggregation.

3. Results in different domains

For the cross-domain tests, Figure 4 below shows the induced schemas. ESHer remains stable across domains and can be extended to other settings.

[Figure 4: Schemas induced by ESHer in different domains]

However, this work also surfaces some issues that remain to be solved, including the following three:

1) Granularity alignment: slots in the same schema may have different granularities, e.g., "person" versus "doctor; patient" in schema 1 of the pandemic domain;

2) Ambiguity: the event type "management" misleads the slot "administrator" in schema 2 of the pandemic domain;

3) Sentiment: event schemas should be objective, but the slot "instigator" conveys negative sentiment.

Summary

This article introduced "Harvesting Event Schemas from Large Language Models", an interesting work on the automatic generation of event schemas. Its core idea is to generate candidate schemas with a language model, filter them by confidence scoring, and obtain the final schemas via graph clustering.

Of course, this article is only an introduction, and some interpretations may be imprecise or even wrong; for further details, please read the original paper. We will also walk through the paper's source code in a follow-up for your reference.

Thanks for this interesting work.

References

1. https://arxiv.org/abs/2305.07280

2. https://github.com/TangJiaLong/Event-Schema-Harvester


OpenKG

OpenKG (Chinese Open Knowledge Graph) aims to promote the openness, interconnection, and crowdsourcing of knowledge graph data, with Chinese as its core, and to promote the open-sourcing of knowledge graph algorithms, tools, and platforms.



Origin: blog.csdn.net/TgqDT3gGaMdkHasLZv/article/details/130857954