Table of Contents
Basic knowledge extraction: problems and methods
1.1 Scene data source for knowledge extraction
1.2 From information extraction to knowledge extraction
1.3 Examples of knowledge extraction
1.4 The challenge of knowledge extraction
2 Knowledge extraction scenarios and methods
2.1 Structured data knowledge extraction
2.1.1 Extract knowledge from relational databases
2.2 Knowledge extraction for semi-structured data
2.2.1 Linked data core data set
2.2.2 YAGO Encyclopedia Knowledge Extraction
2.3 Knowledge extraction for unstructured data
Basic knowledge extraction: problems and methods
1 Problem analysis
1.1 Scene data source for knowledge extraction
- (Semi-)structured text data: infoboxes in encyclopedia knowledge, standardized tables, databases, social networks, etc.
- Unstructured text data: web pages, news, social media, papers, etc.
- Multimedia data: pictures, videos
1.2 From information extraction to knowledge extraction
- IE (information extraction): extracts structured data from unstructured sources
- KE (knowledge extraction): extracts data into a stored form that supports inference
Difference: Information extraction obtains structured data, while knowledge extraction obtains knowledge (a knowledge representation) that machines can understand and process.
Relationship: Knowledge extraction builds on information extraction. Natural language processing techniques, rule-based wrappers, and machine learning are commonly used.
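To make the difference concrete, here is a minimal sketch in Python. The record, the `ex:` names, and the single subclass rule are all hypothetical and chosen only for illustration. The flat record stands for IE output. Converting it to triples and applying a rule stands for the "can be inferred" part of KE.

```python
# Hypothetical output of an information-extraction step: a flat structured record.
ie_record = {"name": "Alice", "type": "Engineer", "employer": "Acme"}

def record_to_triples(record):
    """Turn a structured record into (subject, predicate, object) triples."""
    subject = "ex:" + record["name"]
    return {
        (subject, "rdf:type", "ex:" + record["type"]),
        (subject, "ex:worksFor", "ex:" + record["employer"]),
    }

# Hypothetical schema knowledge: every Engineer is also a Person.
schema = {("ex:Engineer", "rdfs:subClassOf", "ex:Person")}

def infer_types(triples, schema):
    """Apply one rule until nothing changes:
    if x has type C and C is a subclass of D, then x has type D."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for c, q, d in schema:
                new_fact = (s, "rdf:type", d)
                if q == "rdfs:subClassOf" and c == o and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred

knowledge = infer_types(record_to_triples(ie_record), schema)
```

The record alone never states that Alice is a person. That fact appears only after the triples are combined with the schema rule.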
1.3 Examples of knowledge extraction
1.4 The challenge of knowledge extraction
1.4.1 Unclear knowledge:
Incompleteness of knowledge
- Missing relationships
- Missing tags/attributes
- Missing entities
Inconsistency of knowledge
2 Knowledge extraction scenarios and methods
2.1 Structured data knowledge extraction
2.1.1 Extract knowledge from relational databases
Extraction principle
- Table-Class
- Column-Property
- Row-Resource/Instance
- Cell-Property Value
- Foreign Key-Reference
Extraction standards:
- Direct Mapping
- R2RML
Extraction tools:
- D2R, Virtuoso, Oracle SW, Morph, etc.
- R2RML mapping language
Input: database tables, views, SQL queries
Output: triples
Examples:
Two relational database tables, "Employee" and "Department"
The RDF mapped from the database tables
Steps:
- 1. Extract classes
- 2. Extract attributes
- 3. Extract instances
- 4. Establish relationships between classes
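The four steps can be sketched in Python with the standard `sqlite3` module. This is a simplified illustration loosely following the direct-mapping idea, not a compliant implementation of the W3C Direct Mapping or of any tool listed above. The table contents, the `ex:` IRI scheme, and the helper name `table_to_triples` are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data for the "Employee" and "Department" tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES Department(id)
    );
    INSERT INTO Department VALUES (10, 'Research');
    INSERT INTO Employee VALUES (1, 'Alice', 10);
""")

def table_to_triples(conn, table):
    """Map one table to triples: table -> class, column -> property,
    row -> instance, cell -> value, foreign key -> reference."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    columns = [row[1] for row in info]                 # column names
    pk = next(row[1] for row in info if row[5])        # primary-key column
    # Foreign keys: local column name -> referenced table name.
    fks = {row[3]: row[2]
           for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    triples = []
    for record in conn.execute(f"SELECT * FROM {table}"):
        values = dict(zip(columns, record))
        subject = f"ex:{table}/{values[pk]}"
        # Steps 1 and 3: the table is the class, the row is an instance of it.
        triples.append((subject, "rdf:type", f"ex:{table}"))
        for column, value in values.items():
            if value is None:
                continue
            predicate = f"ex:{table}#{column}"
            if column in fks:
                # Step 4: a foreign key becomes a link to another instance.
                # Assumes the key references the target table's primary key.
                triples.append((subject, predicate, f"ex:{fks[column]}/{value}"))
            else:
                # Step 2: an ordinary column becomes a property with a literal.
                triples.append((subject, predicate, value))
    return triples

employee_triples = table_to_triples(conn, "Employee")
department_triples = table_to_triples(conn, "Department")
```

Keeping the foreign key as a link between two instance IRIs, rather than as a plain number, is what turns the join between the tables into a relationship between the classes.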
2.2 Knowledge extraction for semi-structured data
Large-scale multilingual Wikipedia knowledge graph, a structured version of Wikipedia
2.2.1 Linked data core data set
Covers 127 languages, 28 million entities, and hundreds of millions of triples, and supports full download of the data sets. Entity information is extracted with fixed patterns, covering the abstract, infobox, category, page links, etc.
Example: encyclopedia knowledge extraction
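As a rough illustration of fixed-pattern extraction from an infobox, the Python sketch below turns `| key = value` lines into triples. The wikitext sample is deliberately simplified, and the `ex:` prefix and the function name are hypothetical. Real infoboxes contain nested templates and links that this pattern does not handle.

```python
import re

# Simplified infobox wikitext, used only as sample input.
infobox = """{{Infobox person
| name = Ada Lovelace
| birth_date = 10 December 1815
| occupation = Mathematician
}}"""

# One fixed pattern: a line of the form "| key = value".
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)

def infobox_to_triples(subject, wikitext):
    """Emit one (subject, property, value) triple per infobox field."""
    return [(subject, f"ex:{key}", value)
            for key, value in FIELD.findall(wikitext)]

triples = infobox_to_triples("ex:Ada_Lovelace", infobox)
```

Because the infobox is semi-structured, a single pattern already covers every field. The harder work in practice is normalizing the values, which this sketch leaves out.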
2.2.2 YAGO Encyclopedia Knowledge Extraction
Features:
- YAGO integrates Wikipedia and WordNet
- Covers multiple languages, 10 million entities, and 120 million triples
- YAGO2 integrates GeoNames and adds support for spatiotemporal information
- Extracts and infers entity information through rules
YAGO's Encyclopedia Knowledge Extraction
2.2.3 Zhishi.me
2.3 Knowledge extraction for unstructured data
2.3.1 Entity recognition
Extract atomic information elements from text
- Person names
- Organizations
- Locations
- Times/dates
- Character values
- Monetary amounts
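For the more regular entity types in this list (dates and amounts), a rule-based recognizer is enough to show the idea. The Python sketch below is an assumption-laden simplification: the two patterns and the labels `DATE` and `AMOUNT` are hypothetical. Names of people, organizations, and locations usually need dictionaries or trained models rather than regular expressions.

```python
import re

# Hypothetical patterns: ISO-style dates and dollar amounts only.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "AMOUNT": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
}

def recognize_entities(text):
    """Return (label, matched_text, start_offset) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])

entities = recognize_entities("The invoice dated 2021-03-15 totals $1,250.00.")
```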
2.3.2 Relation extraction
Relation extraction identifies the semantic relationships between entities mentioned in text
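One simple way to picture this is template matching, sketched below in Python. The two surface patterns and the relation names `worksFor` and `bornIn` are hypothetical. Practical systems typically run entity recognition first and then classify entity pairs, often with machine learning.

```python
import re

# Hypothetical surface patterns; each maps a textual template to a relation name.
RELATION_PATTERNS = [
    (re.compile(r"(?P<head>[A-Z]\w+) works for (?P<tail>[A-Z]\w+)"), "worksFor"),
    (re.compile(r"(?P<head>[A-Z]\w+) was born in (?P<tail>[A-Z]\w+)"), "bornIn"),
]

def extract_relations(sentence):
    """Return (head_entity, relation, tail_entity) triples found in the text."""
    triples = []
    for pattern, relation in RELATION_PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples

relations = extract_relations("Alice works for Acme. Bob was born in Paris.")
```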
2.3.3 Event extraction
Event extraction example