1. Overview of knowledge extraction
Knowledge extraction is one of the core technologies of knowledge graph construction and is essential for building large-scale knowledge graphs automatically. Its purpose is to extract knowledge from data of different sources and structures and store it in the knowledge graph.
2. Knowledge extraction task
The knowledge extraction task mainly includes the following three key subtasks: entity extraction, relation extraction and event extraction.
Knowledge extraction data sources can be structured data, semi-structured data or unstructured data. For different types of data sources, the key technologies involved in knowledge extraction and the technical difficulties that need to be solved are different.
In terms of task form, knowledge extraction mainly comprises sequence labeling tasks and structured knowledge generation tasks. The following sections mainly cover structured knowledge generation.
2.1 Semi-structured knowledge extraction
For details on knowledge extraction from encyclopedias, please refer to the "typical knowledge system" chapters of "Overview of knowledge map construction" on jinhao_2008's CSDN blog [2].
2.2 Unstructured Text Knowledge Extraction
Large amounts of data exist in unstructured form, such as news reports, scientific and technical literature, and government documents. Extracting knowledge from text has long been a topic of wide interest in both industry and academia. The following covers entity extraction, relation extraction and event extraction for unstructured text.
a) Entity extraction: also known as named entity recognition (NER), it detects named entities in text and classifies them into predefined categories such as person, organization, place and time. Entity extraction underlies many natural language processing problems and is the most basic task in knowledge extraction. It involves two steps: first identify and locate entity mentions in the text, then classify each mention into a predefined category. Existing entity extraction methods fall broadly into rule-based methods, statistical model-based methods and deep learning-based methods.
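
The two steps described above (locate mentions, then classify them) can be illustrated with a minimal rule-based sketch. The gazetteers, the date pattern and the category names `ORG`, `PLACE` and `TIME` are hypothetical and chosen only for this example; a real rule-based system would rely on much larger, domain-specific resources.

```python
import re

# Hypothetical gazetteers and pattern, for illustration only.
ORGANIZATIONS = {"Acme Corp", "United Nations"}
PLACES = {"Paris", "Beijing"}
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_entities(text):
    """Return (start, end, surface, category) tuples sorted by position."""
    found = []
    # Locate mentions by dictionary lookup; the category is the one
    # attached to the gazetteer that produced the match.
    for category, names in (("ORG", ORGANIZATIONS), ("PLACE", PLACES)):
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text):
                found.append((m.start(), m.end(), name, category))
    # Locate date-like mentions with a regular expression.
    for m in DATE_PATTERN.finditer(text):
        found.append((m.start(), m.end(), m.group(), "TIME"))
    return sorted(found)


example = "Acme Corp opened an office in Paris on 2021-03-15."
entities = extract_entities(example)
for start, end, surface, category in entities:
    print(start, end, surface, category)
```

Anything not covered by the hand-written rules is silently missed, which is one reason such rules are hard to port to new domains.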
Comparison of the advantages and disadvantages of each method
Method | Advantage | Shortcoming |
---|---|---|
Rule-based method | High accuracy, close to human reasoning | Expensive to build and hard to port to new domains |
Machine learning method | The algorithm is more robust and flexible, more objective, and does not require much human intervention or domain knowledge | Relies on hand-crafted features |
Deep learning method | The algorithm is more robust and flexible, more objective, and does not require much human intervention or domain knowledge | Requires manually labeled data, and data sparsity is a serious problem |
Generative approach | | Sequential (step-by-step) decoding, low efficiency |
Basic Steps of Statistical Machine Learning Entity Recognition
Basic steps of entity recognition method based on deep learning
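
Both the statistical and the deep learning approaches named above are commonly framed as sequence labeling: a model assigns one tag to every token, and entities are then read off the tag sequence. The sketch below shows only that last decoding step, assuming the widely used BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity). The tags are written by hand here; in practice they would come from a trained model.

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO tags into (entity_text, category) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close the one that was open, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the open entity of the same category.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open entity;
            # the current token itself is not part of any entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Marie", "Curie", "worked", "in", "Paris"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)
```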
b) Relation extraction:
Relation definition: an association that holds between two or more entities.
Relation extraction definition: automatically identifying the semantic relations between entities, that is, extracting entities and the relations between them from text.
Relation extraction is closely related to entity extraction. Generally, the entities in the text are identified first, and the possible relations between them are extracted afterwards. Current relation extraction methods can be divided into template-based methods, supervised learning-based methods and weakly supervised learning-based methods.
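
As a minimal sketch of the template-based approach, the example below defines one hand-written pattern per relation and emits (head, relation, tail) triples. The relation names `founded_by` and `located_in`, the patterns, and the capitalized-word heuristic standing in for entity recognition are all assumptions made for this illustration.

```python
import re

# Crude stand-in for entity recognition: one or more capitalized words.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hypothetical templates: one regular expression per relation type.
TEMPLATES = {
    "founded_by": re.compile(rf"(?P<head>{NAME}) was founded by (?P<tail>{NAME})"),
    "located_in": re.compile(rf"(?P<head>{NAME}) is located in (?P<tail>{NAME})"),
}


def extract_relations(sentence):
    """Return (head, relation, tail) triples matched by any template."""
    triples = []
    for relation, pattern in TEMPLATES.items():
        for m in pattern.finditer(sentence):
            triples.append((m.group("head"), relation, m.group("tail")))
    return triples


print(extract_relations("Acme Corp was founded by Jane Doe."))
print(extract_relations("Acme Corp is located in Paris."))
# A paraphrase that no template covers yields nothing.
print(extract_relations("Jane Doe started Acme Corp."))
```

The last call returns an empty list even though the sentence expresses the same fact, which shows why templates tend to be precise but miss many phrasings.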
Relation classification: relations are mainly divided into semantic relations and syntactic relations.
Semantic relations: relations established by the semantic categories hidden behind the syntactic structure.
Syntactic relations: positional relations, substitution relations and co-occurrence relations.
Terms and concepts related to relation extraction
Term | Description |
---|---|
Sentence-Level Relation Extraction | Identifies the semantic relation between two entities within a single sentence |
Document-Level Relation Extraction | Aims to determine whether two entities have a certain semantic relation, without limiting the context in which the two target entities appear |
Restricted Domain Relation Extraction | Extracts the semantic relations between entities in one or more restricted domains. Because the domain is restricted, the relations usually belong to a predefined, limited set of categories |
Open Domain Relation Extraction | Unlike restricted-domain relation extraction, open-domain extraction does not limit the categories of relations. Relation triples are extracted one by one from open-domain text, based on the model's understanding of the natural language sentences |
Relation Extraction Method
Advantages and disadvantages
Method | Advantage | Shortcoming |
---|---|---|
Rule-based method | 1. Hand-written rules have high accuracy 2. Can be tailored to specific fields 3. Easy to implement on small data sets, and simple to construct | 1. Low recall 2. Templates for specific fields must be constructed by experts; it is difficult to consider all possible patterns, and it takes time and effort 3. A pattern must be defined for each relation 4. Difficult to maintain |
Deep learning-based method | | |
c) Event extraction:
Event definition: an event is something that happens, usually with attributes such as a specific time, place and participants. An event can arise from an action or from a change in the state of a system.
Event extraction means extracting the event information that users care about from text and presenting it in structured form. For example, from a news report of a terrorist attack, it identifies information such as the location, time, target and victims of the attack.
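
To make "structured form" concrete, the sketch below stores one event as a small record. The `EventRecord` class, its field names, the role labels and the sample sentence are hypothetical, and the values are filled in by hand; an extraction system would have to produce them from the text.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class EventRecord:
    """One extracted event: its type, its trigger word, and role -> text."""
    event_type: str
    trigger: str
    arguments: dict = field(default_factory=dict)


report = "A bomb attack hit the central market on Monday, injuring three civilians."

# Hand-filled for illustration only.
event = EventRecord(
    event_type="Attack",
    trigger="attack",
    arguments={
        "place": "the central market",
        "time": "Monday",
        "victims": "three civilians",
    },
)

record = asdict(event)  # plain dict, ready to serialize or load into a graph
print(record)
```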
Event extraction related terms
Term | English term | Description |
---|---|---|
Event description | Event Mention | Sentence describing an event |
Event trigger word | Event Trigger | Word that signals the event type |
Event element | Event Argument | Participant in the event |
Event role | Event Role | The role an element plays in an event sentence |
Event discovery | Event Detection | One of the event extraction subtasks |
Event element extraction | Event Argument Extraction | One of the event extraction subtasks |
Event trigger word detection | Event Trigger Detection | A subtask of event discovery |
Event trigger word classification | Event Trigger Typing | A subtask of event discovery |
Event element identification | Event Argument Identification | A subtask of event element extraction |
Event element role identification | Event Argument Role Identification | A subtask of event element extraction |
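
The two event discovery subtasks in the table, trigger detection and trigger typing, can be sketched with a simple lexicon lookup. The lexicon and the event type names below are hypothetical; this is a toy illustration of what the subtasks output, not how trained systems detect triggers.

```python
# Hypothetical trigger lexicon: trigger word -> event type.
TRIGGER_LEXICON = {
    "attacked": "Attack",
    "bombed": "Attack",
    "acquired": "Acquisition",
    "married": "Marriage",
}


def detect_triggers(tokens):
    """Return (token_index, trigger_word, event_type) for each trigger found."""
    results = []
    for i, token in enumerate(tokens):
        # Detection: is this token a known trigger word?
        event_type = TRIGGER_LEXICON.get(token.lower())
        if event_type is not None:
            # Typing: the lexicon entry supplies the event type.
            results.append((i, token, event_type))
    return results


tokens = "Acme Corp acquired Widget Ltd in 2019".split()
triggers = detect_triggers(tokens)
print(triggers)
```

Event element extraction would then look for the arguments of each detected trigger (here, the two companies and the year) and assign each a role.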
There are five types of subtasks included in the event extraction task as follows
References
[1] Knowledge map (3)--knowledge extraction-Knowledge
[2] Overview of knowledge map construction_jinhao_2008's blog-CSDN blog