(Semi-)structured text data: infoboxes in encyclopedias, standardized tables, databases, social networks, etc.
Unstructured text data: web pages, news, social media, papers, etc.
Multimedia data: pictures, videos

1.2 From information extraction to knowledge extraction

IE (Information Extraction): extracts structured data from unstructured text
KE (Knowledge Extraction): extracts knowledge into a stored form that supports inference

Difference: Information extraction yields structured data, whereas knowledge extraction yields knowledge (a knowledge representation) that machines can understand and process.
Relationship: Knowledge extraction builds on information extraction. Commonly used techniques include natural language processing, rule-based wrappers, and machine learning.

1.3 Examples of knowledge extraction

1.4 The challenge of knowledge extraction

1.4.1 Unclear knowledge

Incompleteness of knowledge

Missing relationships
Missing tags/attributes
Missing entities

Inconsistency of knowledge

2 Knowledge extraction scenarios and methods

2.1 Structured data knowledge extraction

2.1.1 Extract knowledge from relational databases

Extraction principle

Table -> Class
Column -> Property
Row -> Resource/Instance
Cell -> Property Value
Foreign Key -> Reference
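
The mapping principle above can be sketched in a few lines of Python. This is a minimal illustration, not the exact IRI scheme of any standard: the base IRI, the `row_to_triples` helper, and the sample row are all hypothetical.

```python
# Hypothetical base IRI, used only for illustration.
BASE = "http://example.org/"

def row_to_triples(table, pk, row, foreign_keys=None):
    """Map one relational row to (subject, predicate, object) triples."""
    foreign_keys = foreign_keys or {}
    subject = f"{BASE}{table}/{row[pk]}"                  # Row -> Resource/Instance
    triples = [(subject, "rdf:type", f"{BASE}{table}")]   # Table -> Class
    for column, value in row.items():
        if value is None:
            continue  # a NULL cell produces no triple
        predicate = f"{BASE}{table}#{column}"             # Column -> Property
        if column in foreign_keys:
            # Foreign Key -> Reference to the row in the other table
            target = f"{BASE}{foreign_keys[column]}/{value}"
            triples.append((subject, predicate, target))
        else:
            triples.append((subject, predicate, value))   # Cell -> Property Value
    return triples

# Hypothetical sample row: an employee that references department 10.
triples = row_to_triples(
    "Employee", "id",
    {"id": 1, "name": "Alice", "dept": 10},
    foreign_keys={"dept": "Department"},
)
for t in triples:
    print(t)
```

Each row yields one type triple plus one triple per non-NULL cell; a foreign-key cell points to another resource instead of holding a literal value.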

Extraction standards:

Direct Mapping
R2RML

Extraction tools

D2R, Virtuoso, Oracle SW, Morph, etc.

R2RML mapping language

Input: database tables, views, SQL queries
Output: triples
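
To make the input/output contract concrete, here is a rough Python sketch of what a mapping processor does. Real R2RML mappings are written in Turtle using terms such as `rr:logicalTable`, `rr:subjectMap`, and `rr:predicateObjectMap`; the `MAPPING` dictionary, the `apply_mapping` function, the namespace, and the table contents below are simplified, hypothetical stand-ins.

```python
import sqlite3

# Hypothetical mapping, loosely modeled on an R2RML triples map:
# a logical table (here a SQL query), a subject template, a class,
# and predicate-object pairs that read their values from columns.
MAPPING = {
    "logical_table": "SELECT id, name FROM Employee",
    "subject_template": "http://example.org/employee/{id}",
    "class": "http://example.org/ns#Employee",
    "predicate_object": {"http://example.org/ns#name": "name"},
}

def apply_mapping(conn, mapping):
    """Run the logical table query and emit one set of triples per row."""
    cursor = conn.execute(mapping["logical_table"])
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = mapping["subject_template"].format(**record)
        triples.append((subject, "rdf:type", mapping["class"]))
        for predicate, column in mapping["predicate_object"].items():
            if record[column] is not None:
                triples.append((subject, predicate, record[column]))
    return triples

# Hypothetical in-memory database with a single employee.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Employee VALUES (1, 'Alice')")

triples = apply_mapping(conn, MAPPING)
for t in triples:
    print(t)
```

Because the logical table is an arbitrary SQL query, the same approach works for base tables, views, and joins.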

Examples:

Two relational database tables, "Employee" and "Department"

The RDF mapped from the database tables

Steps:

1. Extract classes
2. Extract attributes
3. Extract instances
4. Establish relationships between classes
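
The four steps can be sketched end to end with Python's built-in `sqlite3` module. The notes name the "Employee" and "Department" tables but not their columns, so the schema, the sample rows, the `extract` function, and the `http://example.org/` namespace are assumptions made for illustration. The sketch also assumes every table has a primary key column named `id`.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical schema and data for the two tables named in the notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES Department(id)
    );
    INSERT INTO Department VALUES (10, 'Research');
    INSERT INTO Employee VALUES (1, 'Alice', 10);
""")

def extract(conn):
    triples = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        # Step 1: each table becomes a class.
        triples.append((BASE + table, "rdf:type", "rdfs:Class"))
        columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
        # Maps a foreign-key column to the table it references.
        fks = {r[3]: r[2]
               for r in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # Step 2: each column becomes a property.
        for col in columns:
            triples.append((f"{BASE}{table}#{col}", "rdf:type", "rdf:Property"))
        # Step 3: each row becomes an instance of the class.
        for row in conn.execute(f"SELECT * FROM {table}"):
            record = dict(zip(columns, row))
            subject = f"{BASE}{table}/{record['id']}"
            triples.append((subject, "rdf:type", BASE + table))
            for col, value in record.items():
                if value is None:
                    continue
                predicate = f"{BASE}{table}#{col}"
                if col in fks:
                    # Step 4: a foreign key links instances of two classes.
                    triples.append((subject, predicate, f"{BASE}{fks[col]}/{value}"))
                else:
                    triples.append((subject, predicate, value))
    return triples

triples = extract(conn)
for t in triples:
    print(t)
```

Reading the schema from the database catalog keeps the sketch generic: adding a table or column changes the output without changing the code.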

2.2 Knowledge extraction for semi-structured data

Large-scale multilingual Wikipedia knowledge graph, a structured version of Wikipedia

2.2.1 Linked Data core data set

Covers 127 languages, 28 million entities, and hundreds of millions of triples; the data sets can be downloaded in full. Entity information is extracted with fixed patterns, covering the abstract, infobox, categories, page links, etc.
Example: knowledge extraction from encyclopedias
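
As a rough illustration of fixed-pattern extraction, the sketch below pulls `key = value` fields out of an infobox written in wiki markup and turns them into triples. The sample wikitext, the `parse_infobox` and `infobox_to_triples` helpers, and the use of raw field names as predicates are all hypothetical. Real infoboxes contain nested templates, links, and multi-line values that this simple pattern does not handle.

```python
import re

# Hypothetical infobox markup, for illustration only.
WIKITEXT = """{{Infobox settlement
| name = Example City
| country = Exampleland
| population_total = 12345
}}"""

# One field per line: "| key = value".
FIELD = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")

def parse_infobox(wikitext):
    """Return the non-empty infobox fields as a dict."""
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD.match(line)
        if match and match.group(2):
            fields[match.group(1)] = match.group(2)
    return fields

def infobox_to_triples(entity, fields):
    """Turn each field into an (entity, attribute, value) triple."""
    return [(entity, key, value) for key, value in fields.items()]

fields = parse_infobox(WIKITEXT)
triples = infobox_to_triples(fields["name"], fields)
for t in triples:
    print(t)
```

A production pipeline would also normalize attribute names and value types (dates, numbers, units) before emitting triples.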