1. Overview of the Knowledge Graph

       OK~ Starting today, we begin building a knowledge graph! This is the first article in the series, and it is mainly an overview of the knowledge graph. Articles in this series will be collected in my personal column "Knowledge Graph Series". Welcome everyone to follow along~


Table of Contents

1. The definition of knowledge graph

1.1 Entity

1.2 Concept

1.3 Properties

1.4 Content

1.5 Relationship

2. The structure of the knowledge graph

2.1 Logical structure

2.2 Architecture

3. Technical overview

3.1 Knowledge extraction

3.2 Knowledge Representation

3.2.1 RDF

3.2.2 SPARQL

3.2.3 Distributed representation of the knowledge graph (KG Embedding)

3.3 Knowledge Fusion

3.4 Knowledge reasoning

3.5 Knowledge Q&A

4. Application of Knowledge Graph


 

1. The definition of knowledge graph

       According to Wikipedia, the knowledge graph is a knowledge base used by Google to enhance its search engine. In essence, a knowledge graph aims to describe the various entities and concepts that exist in the real world and the relationships between them. It forms a huge semantic network graph in which nodes represent entities or concepts and edges represent attributes or relationships. The following briefly introduces the components of a knowledge graph.

1.1 Entity

       An entity is something distinguishable that exists independently, such as a particular person or a particular kind of product. Entities are the most basic elements in a knowledge graph, and different entities are connected by different relationships.

1.2 Concept

       A concept is a collection of entities that share common characteristics, such as countries or computers. It mainly refers to collections, categories, object types, kinds of things, and so on.

1.3 Properties

       An attribute points from an entity to its attribute value; different attribute types correspond to edges of different types. An attribute value is the value of a specified attribute of an object, and it describes an intrinsic characteristic of the entity.

1.4 Content

       Content usually serves as the name, description, or explanation of an entity or semantic category, and can be expressed as text, images, audio, video, and so on.

1.5 Relationship

       A relationship connects two entities and describes the association between them.

 

       A knowledge graph can thus be regarded as a huge graph in which nodes represent entities or concepts and edges represent attributes or relationships. The triple (head entity, relation, tail entity) is the general representation of a knowledge graph. In terms of coverage, knowledge graphs can be divided into general knowledge graphs and industry knowledge graphs. A general knowledge graph focuses on breadth and emphasizes integrating as many entities as possible; compared with an industry knowledge graph, its accuracy is lower, and because its concepts are so broad, it is difficult to use an ontology library to support axioms, rules, and constraints on entities, attributes, and the relations among them. General knowledge graphs are mainly used in fields such as intelligent search. Industry knowledge graphs usually need to be built from industry-specific data and carry domain-specific meaning; in an industry knowledge graph, entity attributes and data schemas are often rich, and different business scenarios and users must be considered.
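The triple representation described above can be sketched in a few lines of Python (a minimal illustration; the entities and relations here are made up for the example):

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A knowledge-graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str

# A tiny graph: nodes are entities/concepts, edges are relations or attributes.
kg = [
    Triple("Yao Ming", "born_in", "Shanghai"),
    Triple("Shanghai", "city_of", "China"),
    Triple("Yao Ming", "height", "2.26 m"),   # an attribute edge
]

def neighbors(graph, entity):
    """Return the (relation, tail) pairs of edges leaving a given entity."""
    return [(t.relation, t.tail) for t in graph if t.head == entity]
```

For example, `neighbors(kg, "Yao Ming")` walks the edges that leave the "Yao Ming" node.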

2. The structure of the knowledge graph

       The structure of the knowledge graph includes its logical structure and its system architecture.

2.1 Logical structure

       Logically, a knowledge graph can be divided into two levels, the model layer and the data layer: (1) The data layer consists of a series of facts, and knowledge is stored in units of facts. A graph database can be chosen as the storage medium, such as the open-source Neo4j, Twitter's FlockDB, or Sones' GraphDB. (2) The model layer is built on top of the data layer and standardizes the expression of the facts in the data layer through an ontology library. An ontology is a conceptual template for a structured knowledge base; a knowledge base built on an ontology library not only has a strong hierarchical structure but also contains little redundancy.
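The relationship between the two layers can be illustrated with a toy schema check: the model layer declares which entity types each relation may connect, and every fact in the data layer is validated against it (the types and relations below are invented for illustration):

```python
# Model layer: an ontology-like schema mapping each relation
# to its allowed (head type, tail type) pair.
schema = {
    "capital_of": ("City", "Country"),
    "works_for":  ("Person", "Company"),
}

# Type assertions for the entities used by the data layer.
entity_types = {
    "Beijing": "City", "China": "Country",
    "Alice": "Person", "Acme": "Company",
}

def valid_fact(head, relation, tail):
    """Check a data-layer fact against the model-layer schema."""
    if relation not in schema:
        return False
    head_t, tail_t = schema[relation]
    return entity_types.get(head) == head_t and entity_types.get(tail) == tail_t
```

A fact such as `("Beijing", "capital_of", "China")` passes, while `("Alice", "capital_of", "China")` is rejected because the model layer constrains the head of `capital_of` to be a City.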

2.2 Architecture

       The architecture of the knowledge graph refers to the structure of its construction process. Construction starts from the most primitive data (including structured, semi-structured, and unstructured data) and uses a series of automatic or semi-automatic techniques to extract knowledge facts from the raw data and third-party databases, storing them in the data layer and model layer of the knowledge base. This process comprises four stages: information extraction, knowledge representation, knowledge fusion, and knowledge reasoning, and every update iteration goes through these four stages. There are two main ways to construct a knowledge graph, top-down and bottom-up:

       1. Top-down means first defining the ontology and data schema for the knowledge graph, then adding entities to the knowledge base. This construction method requires some existing structured knowledge bases as its foundation. For example, the Freebase project adopted this method, and most of its data was obtained from Wikipedia.

       2. Bottom-up means extracting entities from open linked data, selecting those with higher confidence for addition to the knowledge base, and then building the top-level ontology model. At present, most knowledge graphs are constructed bottom-up, and the most typical example is Google's Knowledge Vault.
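The bottom-up selection step can be sketched as a simple confidence filter (the threshold and candidate scores below are invented; real systems compute confidence from the extraction model):

```python
# Candidate entities extracted from open linked data, each with a
# confidence score produced by the extraction pipeline (made-up values).
candidates = [
    ("Albert Einstein", 0.97),
    ("Einstein (band)", 0.55),
    ("Eins tein", 0.12),          # a noisy extraction
]

THRESHOLD = 0.9  # only high-confidence entities enter the knowledge base

accepted = [name for name, conf in candidates if conf >= THRESHOLD]
```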

3. Technical overview

3.1 Knowledge extraction

       Knowledge extraction extracts entities, relationships, attributes, and other knowledge elements from public semi-structured and unstructured data. It mainly includes entity extraction, relationship extraction, and attribute extraction; the natural language processing techniques involved include named entity recognition, dependency parsing, entity relationship recognition, and so on.

       First, a large amount of unstructured text data is obtained from the Internet, and clean text is produced after preprocessing. Then machine-learning tools perform word segmentation, part-of-speech tagging, lexical analysis, and dependency analysis on the text, completing the lexical and syntactic levels of analysis. Next, named entity recognition and entity linking are applied to the text, preparing for relation extraction and event extraction. Finally, the extracted triples, n-ary relations, and modal knowledge form a knowledge graph ready for knowledge representation.
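As a toy illustration of the final extraction step, a single hand-written pattern can turn preprocessed sentences into triples (real pipelines use trained NER and relation-extraction models; the pattern and sentences here are invented):

```python
import re

# A naive pattern for sentences of the form "X is the capital of Y."
PATTERN = re.compile(
    r"(?P<head>[A-Z][a-zA-Z ]*?) is the capital of (?P<tail>[A-Z][a-zA-Z ]*?)\."
)

def extract_triples(text):
    """Extract (head, relation, tail) triples with one hard-coded rule."""
    return [
        (m.group("head"), "capital_of", m.group("tail"))
        for m in PATTERN.finditer(text)
    ]

triples = extract_triples(
    "Paris is the capital of France. Tokyo is the capital of Japan."
)
```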

3.2 Knowledge Representation

       The vector representations produced by knowledge representation are of great significance for the construction, reasoning, fusion, and application of the knowledge base. Triple-based knowledge representation has been widely accepted, but it faces many problems in computational efficiency and data sparsity. In recent years, representation learning, led by deep learning, has made important progress: it represents the semantic information of entities as dense, low-dimensional real-valued vectors, so that entities, relations, and their complex semantic associations can be computed efficiently in a low-dimensional space. Knowledge representation learning involves NLP techniques such as semantic similarity calculation and complex relation modeling, as well as knowledge representation models such as distance models, bilinear models, neural tensor models, matrix factorization models, translation models, and so on.

3.2.1 RDF

       RDF (Resource Description Framework) is a standard data model formulated by the W3C for describing entities/resources. In a knowledge graph, we use RDF to formally express the triple relation.

3.2.2 SPARQL

       SPARQL is the query language for RDF. It is based on the RDF data model, can express complex joins across different data sets, and is supported by all mainstream graph databases.

3.2.3 Distributed representation of the knowledge graph (KG Embedding)

       In fact, the word "embedding" already tells us what this is: vector embedding. Concretely, it means mapping the entities and relations of the knowledge graph into a continuous, dense, low-dimensional vector space while preserving their semantics.
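The classic translation model TransE illustrates the idea: a fact (h, r, t) is modeled as h + r ≈ t in vector space, so plausible triples get a small distance score. The 2-d embeddings below are chosen by hand purely for illustration; in practice they are learned by minimizing a margin-based ranking loss:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE distance ||h + r - t||: smaller means more plausible."""
    return np.linalg.norm(h + r - t)

# Hand-picked toy embeddings (an assumption for illustration only).
beijing    = np.array([1.0, 0.0])
china      = np.array([1.0, 1.0])
tokyo      = np.array([3.0, 0.0])
capital_of = np.array([0.0, 1.0])

true_score  = transe_score(beijing, capital_of, china)  # small: fact holds
false_score = transe_score(tokyo, capital_of, china)    # large: corrupted head
```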

3.3 Knowledge Fusion

       Because a knowledge graph draws on a wide range of sources, problems arise such as uneven knowledge quality, duplicated knowledge from different data sources, and unclear associations between pieces of knowledge, so knowledge fusion must be carried out. Knowledge fusion is a high-level form of knowledge organization that lets knowledge from different sources undergo heterogeneous data integration, disambiguation, processing, reasoning verification, and updating under a common framework and set of norms, fusing data, information, methods, experience, and human thought into a high-quality knowledge base. In the process of knowledge fusion, entity alignment and knowledge processing are two important steps.

       Knowledge fusion is also called data linking. Its purpose is to find the records that describe the same entity in different data sets and to integrate entities from different data sources into more comprehensive entity information. Typical tools are Dedupe (a Python-based toolkit) and LIMES.
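A minimal sketch of entity alignment: pair records across two sources by string similarity. The threshold and records below are invented, and real tools such as Dedupe learn their matching rules from data rather than using a fixed ratio:

```python
from difflib import SequenceMatcher

source_a = ["Barack Obama", "New York City"]
source_b = ["barack h. obama", "New York", "London"]

def similarity(x, y):
    """Normalized string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def align(a_entities, b_entities, threshold=0.7):
    """Pair each entity in A with its best match in B above a threshold."""
    pairs = []
    for a in a_entities:
        best = max(b_entities, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            pairs.append((a, best))
    return pairs
```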

3.4 Knowledge reasoning

       Knowledge reasoning mines implicit knowledge on the basis of the existing knowledge base, thereby enriching and expanding it. Reasoning often relies on the support of association rules. Because entities, entity attributes, and relations are so diverse, it is difficult to enumerate all inference rules, and more complex rules are often summarized manually. Rule mining depends mainly on the richness of entities and relations. The objects of knowledge reasoning can be entities, entity attributes, relations between entities, the hierarchical structure of concepts in the ontology library, and so on. Knowledge reasoning methods fall into two broad categories: logic-based reasoning and graph-based reasoning.

       Classified by solution, reasoning can be divided into reasoning based on description logic, on rule mining, on probabilistic logic, and on representation learning and neural networks. Classified by reasoning type, it can be divided into default reasoning, continuous-change reasoning, spatial reasoning, causal reasoning, and so on.
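A toy example of rule-based reasoning: apply a transitivity rule to the knowledge base until no new facts appear (the rule and facts are invented for illustration):

```python
def infer_transitive(facts, relation):
    """Apply (a, r, b) & (b, r, c) => (a, r, c) until a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(facts):
            for b2, r2, c in list(facts):
                if r1 == r2 == relation and b == b2:
                    new = (a, relation, c)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

kb = {
    ("Haidian", "located_in", "Beijing"),
    ("Beijing", "located_in", "China"),
}
closed = infer_transitive(kb, "located_in")
```

The inferred fact ("Haidian", "located_in", "China") was never stated explicitly; it is the kind of hidden knowledge that reasoning adds to the graph.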

3.5 Knowledge Q&A

       Knowledge-Based Question Answering (KBQA) is an automatic question-answering system built on a knowledge base; it answers users' natural-language questions directly and accurately, and is expected to form the basic shape of next-generation search engines.
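A minimal sketch of template-based KBQA over a triple store. The question template and facts are invented; real KBQA systems use far more sophisticated semantic parsing than one regular expression:

```python
import re

kb = {
    ("France", "capital", "Paris"),
    ("Japan", "capital", "Tokyo"),
}

def answer(question):
    """Map one question template to a knowledge-base lookup."""
    m = re.match(r"What is the capital of (?P<country>[A-Za-z ]+)\?", question)
    if not m:
        return None  # question does not match any known template
    for head, rel, tail in kb:
        if rel == "capital" and head == m.group("country"):
            return tail
    return None
```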

4. Application of Knowledge Graph

       The knowledge graph provides a more effective way to express, organize, manage, and utilize the massive, heterogeneous, and dynamic big data on the Internet, making the network more intelligent and closer to human cognition. Typical applications include semantic retrieval, intelligent question answering, and so on.

 

       This concludes the overview part of the knowledge graph series. Because this article is mostly conceptual, I have borrowed the basic concepts of the knowledge graph from this article; the next article begins the corresponding technical introduction~

Origin blog.csdn.net/gdkyxy2013/article/details/109467505