Knowledge Extraction for Knowledge Graphs

1. Overview of knowledge extraction

Knowledge extraction is a core technology in knowledge graph construction and is essential to building large-scale knowledge graphs automatically. Its purpose is to extract knowledge from data of different sources and structures and store it in the knowledge graph.

 

2. Knowledge extraction task

Knowledge extraction comprises three key subtasks: entity extraction, relation extraction, and event extraction.

The data sources for knowledge extraction can be structured, semi-structured, or unstructured. Each type of data source involves different key technologies and poses different technical difficulties.
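For structured sources, extraction is often little more than a schema mapping. The following is a minimal sketch, assuming a single database row held as a Python dict; the function name `row_to_triples`, the column names, and the sample values are all hypothetical and only illustrate how a row can become (subject, predicate, object) triples.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def row_to_triples(row: Dict[str, str], key_column: str) -> List[Triple]:
    """Turn one table row into triples, using key_column as the subject."""
    subject = row[key_column]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != key_column and value  # skip the key itself and empty cells
    ]

# Hypothetical sample row
row = {"name": "Marie Curie", "birth_place": "Warsaw", "field": "Physics"}
triples = row_to_triples(row, "name")
```

Unstructured text has no such column names to rely on, which is why it needs the entity, relation, and event extraction techniques discussed later.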

In terms of task form, knowledge extraction mainly covers sequence labeling tasks and structured knowledge generation tasks. The rest of this post mainly introduces structured knowledge generation.
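To make the sequence labeling form concrete, here is a small sketch that assumes the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity). The post itself does not name a tagging scheme, so BIO, the function name `decode_bio`, and the sample sentence are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from per-token BIO tags."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag closes any open entity
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans

tokens = ["Marie", "Curie", "worked", "in", "Paris"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
entities = decode_bio(tokens, tags)
# entities == [("Marie Curie", "PER"), ("Paris", "LOC")]
```

In a real system the tags would come from a trained tagger; the decoding step shown here stays the same.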

2.1 Semi-structured knowledge extraction

For details on knowledge extraction from encyclopedias, see the chapter on typical knowledge systems in "Overview of knowledge graph construction" on jinhao_2008's CSDN blog.

2.2 Unstructured text knowledge extraction

A large amount of data exists in unstructured form, such as news reports, scientific and technical literature, and government documents. Extracting knowledge from text has long attracted wide attention in both industry and academia. The following covers entity extraction, relation extraction, and event extraction from unstructured text.

a) Entity extraction: also known as named entity recognition (NER), it detects named entities in text and classifies them into predefined categories such as person, organization, location, and time. Entity extraction underlies many natural language processing problems and is the most basic task in knowledge extraction. It involves two steps: first identify and locate the entities in the text, then classify them into the predefined categories. Existing entity extraction methods fall into rule-based methods, statistical model-based methods, and deep learning-based methods.
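The locate-then-classify steps can be sketched with the simplest of these families, a rule-based extractor. This is only an illustration under stated assumptions: the gazetteer entries, the year pattern, the category labels, and the function name `extract_entities` are hypothetical, and a production system would need far richer rules.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer: surface form -> predefined category
GAZETTEER = {
    "Marie Curie": "PERSON",
    "Sorbonne": "ORGANIZATION",
    "Paris": "LOCATION",
}
# Hypothetical rule: four-digit years between 1000 and 2099 count as TIME
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def extract_entities(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface form, category) for every rule match."""
    found = []
    for name, category in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.end(), match.group(), category))
    for match in YEAR_PATTERN.finditer(text):
        found.append((match.start(), match.end(), match.group(), "TIME"))
    return sorted(found)  # order by position in the text

text = "Marie Curie joined the Sorbonne in Paris in 1906."
entities = extract_entities(text)
```

Every name missing from the gazetteer is silently skipped, which is one reason rule-based extractors are costly to maintain and hard to move to a new domain.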

Comparison of the advantages and disadvantages of each method

| Method | Advantages | Disadvantages |
| --- | --- | --- |
| Rule-based | High accuracy; close to human reasoning | Expensive to build; hard to port to new domains |
| Machine learning | More robust and flexible, more objective; requires little human intervention or domain knowledge | Relies on hand-crafted features |
| Deep learning | More robust and flexible, more objective; requires little human intervention or domain knowledge | Requires manually labeled data; data sparsity is a serious problem |
| Generative | | Sequential decoding; low efficiency |

[Figure: Basic steps of entity recognition based on statistical machine learning]

[Figure: Basic steps of entity recognition based on deep learning]

b) Relation extraction:

Definition of a relation: an association that holds between two or more entities.

Definition of relation extraction: automatically identifying the semantic relations between entities, that is, extracting entities and the relations between them from text.

Relation extraction is closely tied to entity extraction: typically the entities in a text are identified first, and then the possible relations between them are extracted. Current relation extraction methods fall into template-based methods, supervised learning-based methods, and weakly supervised learning-based methods.
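As an illustration of the template-based family, the sketch below matches hand-written patterns against a sentence and emits (head, relation, tail) triples. The relation names, the patterns, the crude capitalized-word approximation of an entity, and the function name `extract_relations` are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Crude stand-in for an entity mention: one or more capitalized words
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical templates: relation name -> pattern with "head" and "tail" groups
TEMPLATES = {
    "born_in": re.compile(rf"(?P<head>{NAME}) was born in (?P<tail>{NAME})"),
    "works_for": re.compile(rf"(?P<head>{NAME}) works for (?P<tail>{NAME})"),
}

def extract_relations(text: str) -> List[Tuple[str, str, str]]:
    """Apply every template and collect the matching triples."""
    triples = []
    for relation, pattern in TEMPLATES.items():
        for match in pattern.finditer(text):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples

text = "Marie Curie was born in Warsaw. Ada Lovelace works for Analytical Engines."
triples = extract_relations(text)

# A paraphrase that no template covers yields nothing
missed = extract_relations("Warsaw is the birthplace of Marie Curie.")
```

The empty result for the paraphrase shows the usual trade-off of templates: matches are precise, but any wording the templates do not anticipate is missed.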

 

Classification of relations: relations are mainly divided into semantic relations and syntactic relations.

Semantic relations: relations established by the semantic categories hidden behind the syntactic structure.

Syntactic relations: positional relations, substitution relations, and co-occurrence relations.

Terms and concepts related to relation extraction

| Term | Description |
| --- | --- |
| Sentence-Level Relation Extraction | Identifies the semantic relation between two entities within a single sentence |
| Document-Level Relation Extraction | Determines whether a certain semantic relation holds between two entities, without restricting the context in which the two target entities appear |
| Restricted-Domain Relation Extraction | Extracts semantic relations between entities in one or more restricted domains. Because the domain is restricted, the relations usually belong to a preset, limited set of categories |
| Open-Domain Relation Extraction | Unlike restricted-domain relation extraction, does not limit the categories of relations. Relational triples are extracted from open text based on the model's understanding of natural-language sentences |

Relation extraction methods

Advantages and disadvantages

Rule-based methods

Advantages:
1. Hand-written rules are highly accurate
2. Rules can be tailored to specific domains
3. Easy to implement on small datasets and simple to build

Disadvantages:
1. Low recall
2. Domain-specific templates must be built by experts; covering all possible patterns is difficult and takes considerable time and effort
3. A pattern must be defined for each relation
4. Difficult to maintain

Deep learning-based methods

c) Event extraction:

Definition of an event: an event is something that happens, usually with attributes such as a specific time, place, and participants. An event may arise from an action or from a change in the state of a system.

Event extraction means extracting the event information that users care about from text and presenting it in structured form, for example identifying the location, time, target, and victims of an attack from news reports on terrorist attacks.

Terms related to event extraction

| Chinese term (translated) | English term | Description |
| --- | --- | --- |
| Event description | Event Mention | A sentence describing an event |
| Event trigger word | Event Trigger | The word that marks the event type |
| Event element | Event Argument | A participant in the event |
| Event role | Event Role | The role an argument plays in the event sentence |
| Event discovery | Event Detection | One of the subtasks of event extraction |
| Event element extraction | Event Argument Extraction | One of the subtasks of event extraction |
| Event trigger word detection | Event Trigger Detection | A subtask of event detection |
| Event trigger word classification | Event Trigger Typing | A subtask of event detection |
| Event element identification | Event Argument Identification | A subtask of event argument extraction |
| Event element role identification | Event Argument Role Identification | A subtask of event argument extraction |
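The terms above map naturally onto a small data structure. The following toy sketch assumes a hand-made trigger lexicon and a naive "text before the trigger is the agent, text after is the target" rule; the `Event` class, the role names, and the sample sentence are hypothetical and stand in for what a trained model would produce.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical trigger lexicon: trigger word -> event type
TRIGGERS = {"acquired": "Acquisition", "hired": "Hiring"}

@dataclass
class Event:
    mention: str      # event mention: the sentence describing the event
    trigger: str      # event trigger word found in the mention
    event_type: str   # result of trigger typing
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> argument

def detect_event(sentence: str) -> Optional[Event]:
    """Event detection: find a trigger word and assign its event type."""
    for word in sentence.rstrip(".").split():
        if word in TRIGGERS:
            return Event(sentence, word, TRIGGERS[word])
    return None

def fill_arguments(event: Event) -> Event:
    """Toy argument extraction: split the mention around the trigger word."""
    before, _, after = event.mention.rstrip(".").partition(f" {event.trigger} ")
    event.arguments = {"agent": before, "target": after}
    return event

event = detect_event("Acme acquired Globex.")
if event is not None:
    fill_arguments(event)
```

Detection and argument extraction are kept as separate functions to mirror the subtask split in the table: the first yields the trigger and its type, the second assigns a role to each argument.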

[Figure: the five types of subtasks in the event extraction task]

References

[1] Knowledge graph (3): knowledge extraction - Knowledge

[2] Overview of knowledge graph construction, jinhao_2008's blog - CSDN blog

[3] Chapter 4: Knowledge extraction - Knowledge

Origin blog.csdn.net/jinhao_2008/article/details/127155430