Knowledge Graph 01: Overview of Knowledge Graph

Foreword

  This article introduces the development, definition, architecture, construction, and applications of knowledge graphs.

1.1 Development of Knowledge Graph

  The origin of the knowledge graph can be traced back to 1960. Its development is shown in Figure 1-1:

Figure 1-1 Evolution history of concepts related to knowledge graphs

  • In 1960, semantic networks were proposed as a method of knowledge representation, mainly for natural language understanding. A semantic network represents knowledge as a graph: information is expressed as a set of nodes connected by labeled directed edges that represent the relationships between nodes. Semantic networks make semantics and semantic relationships easy to understand, and their form of expression is simple, direct, and natural. However, because they lacked standards, they were difficult to apply in practice. Semantic networks and knowledge graphs are similar in form, but a semantic network focuses on the relationships between concepts, while a knowledge graph focuses on the relationships between entities.

  • In the 1980s, the philosophical concept of ontology was introduced into the field of artificial intelligence to describe knowledge.

  • In 1989, Tim Berners-Lee invented the World Wide Web, which let people connect their own documents to it through hyperlinks.

  • In 1998, building on the World Wide Web, Tim Berners-Lee proposed the concept of the Semantic Web. Unlike the World Wide Web, the Semantic Web links not only web pages but also real-world objects (such as people, institutions, and places).

  • In 2006, Tim Berners-Lee proposed the concept of Linked Data, further emphasizing the links between data rather than just the digitization of text.

  • On May 16, 2012, Google released a search engine feature based on a knowledge graph. The Google Knowledge Graph is essentially a commercial realization of the Semantic Web concept.

1.2 Definition of Knowledge Graph

  The knowledge graph is an important branch of artificial intelligence technology. It is a structured semantic knowledge base that describes concepts in the physical world and their interrelationships in symbolic form. Through effective processing and integration, the knowledge graph transforms the data in complex documents into simple, clear triples of the form (entity, relationship, entity), and finally aggregates a large amount of knowledge to support fast retrieval and reasoning.

  Essentially, a knowledge graph is a semantic network that reveals the relationships between entities.


1.3 Architecture of Knowledge Graph

  The architecture of the knowledge graph covers its logical structure and its technical architecture.

  In terms of logical structure, the knowledge graph can be divided into two layers: the schema layer and the data layer. The data layer consists mainly of a series of facts, and knowledge is stored with the fact as the unit. If facts are expressed as triples such as (entity 1, relationship, entity 2) and (entity, attribute, attribute value), a graph database can serve as the storage medium, for example the open-source Neo4j, Twitter's FlockDB, or JanusGraph. The schema layer is built on top of the data layer and mainly uses an ontology library to standardize how facts are represented in the data layer. The ontology is the conceptual template of a structured knowledge base. A knowledge base built on an ontology library has a strong hierarchical structure and low redundancy.
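
  As a minimal sketch of the two layers, the Python snippet below keeps the data layer as a set of triples and uses a small dictionary as a stand-in for the schema layer. The entity names, relation names, and the conforms helper are hypothetical and chosen only for illustration; a real system would typically use an ontology language and a graph database instead.

```python
# Schema layer (hypothetical): each relation maps to its allowed (head type, tail type).
SCHEMA = {
    "born_in": ("Person", "City"),
    "capital_of": ("City", "Country"),
}

# Entity types, also part of this illustrative schema.
ENTITY_TYPES = {
    "Marie Curie": "Person",
    "Warsaw": "City",
    "Poland": "Country",
}

# Data layer: facts stored as (entity 1, relationship, entity 2) triples.
FACTS = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
}


def conforms(triple):
    """Return True if the triple matches the type constraints in SCHEMA."""
    head, relation, tail = triple
    if relation not in SCHEMA:
        return False
    head_type, tail_type = SCHEMA[relation]
    return ENTITY_TYPES.get(head) == head_type and ENTITY_TYPES.get(tail) == tail_type


# Only facts that satisfy the schema are accepted into the knowledge base.
valid_facts = {t for t in FACTS if conforms(t)}
```

  Here the schema rejects a triple such as ("Poland", "born_in", "Warsaw") because the head is not a Person, which is one way the schema layer can standardize the data layer.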

  The technical architecture of the knowledge graph refers to the architecture for building the knowledge graph, which we will describe in detail in the next section.

1.4 Construction of Knowledge Graph

  Knowledge graphs can be constructed in two ways: top-down and bottom-up. Top-down construction uses structured data sources such as encyclopedia websites, extracting ontology and schema information from high-quality data and adding it to the knowledge base. Bottom-up construction uses technical means to extract resource patterns from publicly collected data, selects the new patterns with high confidence, and adds them to the knowledge base after manual review.

  The construction process of the knowledge graph is divided into four functional modules: data acquisition, information acquisition, knowledge fusion, and knowledge processing, as shown in Figure 1-2:

Figure 1-2 Technical Architecture of Knowledge Graph

  The construction of knowledge graph requires the support of various technologies. Through information acquisition technology, knowledge elements such as entities, relationships and attributes are extracted from the data. Through knowledge fusion technology, the ambiguity between referent items such as entities, relations, attributes and factual objects is eliminated to form a high-quality knowledge base. Through knowledge reasoning technology, the implicit knowledge is further mined on the basis of existing knowledge, thereby enriching and expanding the knowledge base.

1.4.1 Data Acquisition

  To build a knowledge graph, you first need to obtain data. The data are the source of knowledge and can be tables, texts, databases, and so on. By type, data can be divided into structured, unstructured, and semi-structured data. Structured data includes tables, databases, and other data expressed in a fixed format, and can usually be used directly to build a knowledge graph. Unstructured data includes text, audio, video, and images; information must be extracted from it before it can be used to build a knowledge graph. Semi-structured data lies between structured and unstructured data, such as XML, JSON, and encyclopedia data, and it also requires information acquisition before a knowledge graph can be built.

1.4.2 Information Acquisition

  Structured data can usually be used directly: it is transformed into a base data set for the knowledge graph, and knowledge graph completion techniques are then used to expand the graph further.

  For unstructured and semi-structured data, such as text, the main information acquisition methods are entity extraction, relation extraction, and attribute extraction.
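
  The direct transformation of structured data can be sketched as follows. This is an illustration only: the CSV content and the rows_to_triples function are hypothetical, and the sketch assumes one column identifies the entity while every other column is an attribute.

```python
import csv
import io

# Hypothetical structured source: a small CSV table of people.
TABLE = """name,birth_year,nationality
Ada Lovelace,1815,British
Alan Turing,1912,British
"""


def rows_to_triples(csv_text, key_column):
    """Turn each non-key cell into an (entity, attribute, value) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = row[key_column]
        for column, value in row.items():
            # Skip the key column itself and empty cells.
            if column != key_column and value:
                triples.append((entity, column, value))
    return triples


triples = rows_to_triples(TABLE, key_column="name")
```

  Each row yields one triple per attribute column, so the two rows above produce four triples that can seed the base data set.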

1.4.2.1 Entity Extraction

  Entity extraction, also known as Named Entity Recognition (NER), refers to the automatic identification of named entities in text data. Because entities are the most basic elements of a knowledge graph, the quality of entity extraction (precision and recall) strongly affects the efficiency and quality of subsequent knowledge acquisition, so it is the most basic and critical part of information acquisition.
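
  To make the idea concrete, here is a deliberately simple sketch that finds entities by dictionary (gazetteer) matching. This is a stand-in technique, not a trained NER model, and the GAZETTEER contents and the extract_entities function are assumptions made for the example.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity types.
GAZETTEER = {
    "Tim Berners-Lee": "PERSON",
    "World Wide Web": "CONCEPT",
    "Google": "ORG",
}


def extract_entities(text):
    """Return (mention, type, start offset) for every gazetteer match, in text order."""
    found = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.group(), entity_type, match.start()))
    return sorted(found, key=lambda item: item[2])


entities = extract_entities("Tim Berners-Lee invented the World Wide Web in 1989.")
```

  Dictionary matching has high precision for known names but cannot recognize unseen entities, which is why practical systems usually rely on statistical or neural sequence labeling.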

1.4.2.2 Relation Extraction

  After entity extraction, the text corpus yields a series of discrete named entities. To obtain semantic information, the relationships between entities must also be extracted from the relevant corpus. In other words, relation extraction connects entities (concepts) through relations to form a networked knowledge structure.
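
  A toy version of this step is sketched below. It assumes the entity mentions are already known and links two mentions when a trigger phrase appears between them; the TRIGGERS table and the extract_relations function are hypothetical, and real systems typically use learned classifiers rather than fixed phrases.

```python
# Hypothetical trigger phrases mapped to relation names.
TRIGGERS = {
    "invented": "invented",
    "was born in": "born_in",
}


def extract_relations(sentence, entities):
    """Link ordered pairs of mentions when a trigger phrase lies between them."""
    triples = []
    for head in entities:
        head_start = sentence.find(head)
        if head_start < 0:
            continue  # head mention does not occur in this sentence
        head_end = head_start + len(head)
        for tail in entities:
            tail_start = sentence.find(tail)
            if tail_start < head_end:
                continue  # tail is missing or does not come after the head
            between = sentence[head_end:tail_start]
            for phrase, relation in TRIGGERS.items():
                if phrase in between:
                    triples.append((head, relation, tail))
    return triples


relations = extract_relations(
    "Tim Berners-Lee invented the World Wide Web.",
    ["Tim Berners-Lee", "World Wide Web"],
)
```

  The resulting triple connects two previously isolated entities, which is the step that turns a list of mentions into a graph.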

1.4.2.3 Attribute Extraction

  Attribute extraction collects the attribute information of a specific entity from different information sources. By gathering this information from multiple data sources, it builds a complete description of the entity's attributes. Because an attribute can be regarded as a nominal relationship between an entity and an attribute value, the problem of attribute extraction can be transformed into a problem of relation extraction.
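
  The sketch below illustrates treating attributes as relations: attribute records from several sources are collected into one set of (entity, attribute, value) triples. The two sources and the attributes_as_triples function are hypothetical examples.

```python
# Hypothetical attribute records for the same entity from two sources.
SOURCE_A = {"Alan Turing": {"birth_year": "1912"}}
SOURCE_B = {"Alan Turing": {"nationality": "British", "birth_year": "1912"}}


def attributes_as_triples(*sources):
    """Collect (entity, attribute, value) triples across sources, without duplicates."""
    triples = set()
    for source in sources:
        for entity, attributes in source.items():
            for attribute, value in attributes.items():
                triples.add((entity, attribute, value))
    return triples


profile = attributes_as_triples(SOURCE_A, SOURCE_B)
```

  Because the result is a set, the birth year reported by both sources is stored once, and the entity's profile is the union of what each source knows.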

1.4.3 Knowledge Fusion

  When we build a knowledge graph, we need to get data from multiple sources. Data from different sources may overlap, and the same concept or entity may appear repeatedly. The purpose of knowledge fusion is to merge entities that represent the same concept and to integrate knowledge from different sources into one knowledge base.

  Knowledge fusion mainly includes entity linking and knowledge merging.

1.4.3.1 Entity Linking

  Entity Linking refers to the operation of linking the entity object extracted from the text to the corresponding correct entity object in the knowledge base. The basic idea is to first select a group of candidate entity objects from the knowledge base according to the given entity referents, and then link the referents to the correct entity objects through similarity calculation. Entity linking mainly includes entity disambiguation and coreference resolution.

(1) Entity disambiguation

  Entity disambiguation resolves the ambiguity caused by entities that share the same name. With entity disambiguation, entity links can be established accurately according to the current context. Entity disambiguation mainly uses clustering methods. It can also be seen as a context-based classification problem, similar to part-of-speech disambiguation and word-sense disambiguation.

(2) Coreference resolution

  Coreference Resolution is mainly used to solve the problem that multiple references correspond to the same entity object. Through coreference resolution, these referents can be associated (merged) to the correct entity object.

  The process of entity linking:

  • Entity referents are obtained from text through entity extraction.
  • Perform entity disambiguation and coreference resolution to determine whether entities with the same name in the knowledge base have different meanings, and whether differently named entities in the knowledge base refer to the same thing.
  • After confirming the corresponding correct entity object in the knowledge base, link the entity reference item to the corresponding entity in the knowledge base.
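
  The candidate selection and similarity steps described above can be sketched as follows. The KNOWLEDGE_BASE entries, the keyword sets, and the choice of Jaccard similarity are all assumptions made for illustration; production systems use richer context models.

```python
# Hypothetical knowledge base: entity id -> name and descriptive keywords.
KNOWLEDGE_BASE = {
    "apple_inc": {"name": "Apple", "keywords": {"company", "iphone", "technology"}},
    "apple_fruit": {"name": "Apple", "keywords": {"fruit", "tree", "juice"}},
}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_entity(mention, context_words):
    """Pick the candidate whose keywords best overlap the mention's context."""
    # Step 1: candidate generation by (case-insensitive) name match.
    candidates = [
        entity_id
        for entity_id, entry in KNOWLEDGE_BASE.items()
        if entry["name"].lower() == mention.lower()
    ]
    if not candidates:
        return None
    # Step 2: rank candidates by similarity between keywords and context.
    return max(
        candidates,
        key=lambda entity_id: jaccard(KNOWLEDGE_BASE[entity_id]["keywords"], context_words),
    )


linked = link_entity("Apple", {"new", "iphone", "from", "the", "company"})
```

  The same mention "Apple" links to the fruit entry when the context contains words such as "fruit" or "juice", which is the disambiguation behavior the process aims for.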

1.4.3.2 Knowledge Merging

  When building a knowledge graph, knowledge input can be obtained from third-party knowledge base products or existing structured data. There are two common requirements for knowledge merging, one is to merge external knowledge bases, and the other is to merge relational databases.

  Integrating an external knowledge base into the knowledge graph involves problems at two levels. The first is fusion at the data layer, which covers entity references, attributes, relationships, and categories; the main problem is how to avoid conflicts between instances and relationships that would cause unnecessary redundancy. The second is fusion at the schema layer, which integrates the newly obtained ontology into the existing ontology library.

  To integrate relational databases into a knowledge graph, the Resource Description Framework (RDF) can be used as the data model. Industry and academia refer to this conversion process as RDB2RDF; in essence, it converts relational database data into RDF triples.
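
  A rough sketch of the RDB2RDF idea is shown below: each row becomes a subject IRI and each non-key column becomes a predicate with a literal value, written out as N-Triples lines. The table, the http://example.org/ namespace, and the row_to_ntriples function are placeholders, and the sketch assumes values contain no characters that need escaping.

```python
# Hypothetical table rows, as they might come back from a relational query.
TABLE_NAME = "person"
ROWS = [
    {"id": 1, "name": "Ada Lovelace", "birth_year": 1815},
    {"id": 2, "name": "Alan Turing", "birth_year": 1912},
]
BASE = "http://example.org/"  # placeholder namespace


def row_to_ntriples(table, row, key="id"):
    """Emit one N-Triples line per non-key column of the row."""
    subject = f"<{BASE}{table}/{row[key]}>"
    lines = []
    for column, value in row.items():
        if column == key:
            continue
        predicate = f"<{BASE}{table}#{column}>"
        # Assumes the value has no quotes or backslashes that would need escaping.
        lines.append(f'{subject} {predicate} "{value}" .')
    return lines


ntriples = [line for row in ROWS for line in row_to_ntriples(TABLE_NAME, row)]
```

  The primary key is used to build the subject IRI, so every row maps to a stable resource that other triples can refer to.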

1.4.4 Knowledge Processing

  So far, information acquisition has extracted knowledge elements such as entities, relationships, and attributes from the original corpus, and knowledge fusion has eliminated the ambiguity between entity referents and entity objects, producing a series of basic fact expressions. However, facts by themselves are not knowledge. To obtain a structured, networked knowledge system, the facts must go through knowledge processing. Knowledge processing mainly includes three aspects: ontology construction, knowledge reasoning, and quality assessment.

1.4.4.1 Ontology Construction

  An ontology can be built by hand through manual editing, or built automatically in a data-driven way. Because the manual approach involves a huge workload and it is hard to find experts who meet the requirements, the current mainstream global ontology library products all start from existing ontology libraries for specific fields and are gradually expanded with automated construction techniques.

1.4.4.2 Knowledge Reasoning

  After the ontology construction step, the prototype of a knowledge graph has been built. At this point, however, most of the relationships in the knowledge graph may be incomplete, and many values may be missing. Knowledge reasoning (Knowledge Inference) techniques can then be used for further knowledge discovery.

  Reasoning, one of the basic forms of thinking, is the process of deducing a new judgment (conclusion) from one or more existing judgments (premises). Knowledge reasoning over a knowledge graph aims to identify errors and infer new conclusions from existing data. New relationships between entities can be derived through knowledge reasoning and fed back into the knowledge graph to enrich it and support advanced applications.
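
  One common family of techniques is rule-based reasoning. The sketch below applies two hand-written rules repeatedly until no new triple appears (forward chaining). The facts, the relation names, and both rules are hypothetical examples, not rules taken from any particular system.

```python
FACTS = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "located_in", "Poland"),
    ("Poland", "located_in", "Europe"),
}


def infer(facts):
    """Apply two illustrative rules until a fixed point is reached.

    Rule 1: located_in is transitive.
    Rule 2: born_in(x, y) and located_in(y, z) imply born_in(x, z).
    """
    known = set(facts)
    while True:
        new = set()
        for (a, r1, b) in known:
            for (c, r2, d) in known:
                # Both rules chain a triple with a located_in triple.
                if b != c or r2 != "located_in":
                    continue
                if r1 in ("located_in", "born_in"):
                    new.add((a, r1, d))
        if new <= known:
            return known  # nothing new was derived
        known |= new


closure = infer(FACTS)
```

  Starting from three stated facts, the rules derive three implicit ones, including that Marie Curie was born in Europe, and these can be written back to enrich the graph.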

1.4.4.3 Quality Assessment

  Quality assessment is also an important part of knowledge base construction. Its significance is that it quantifies the credibility of knowledge and safeguards the quality of the knowledge base by discarding knowledge with low confidence.
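
  A minimal sketch of confidence-based filtering follows. The confidence scores, the 0.5 threshold, and the keep_confident function are assumptions for illustration; how confidence is actually estimated depends on the extraction method.

```python
# Hypothetical extracted facts, each with a confidence score in [0, 1].
SCORED_FACTS = [
    (("Alan Turing", "born_in", "London"), 0.97),
    (("Alan Turing", "born_in", "Paris"), 0.12),
    (("Ada Lovelace", "born_in", "London"), 0.91),
]


def keep_confident(scored_facts, threshold=0.5):
    """Keep only triples whose confidence is at least the threshold."""
    return [triple for triple, confidence in scored_facts if confidence >= threshold]


accepted = keep_confident(SCORED_FACTS)
```

  Raising the threshold trades recall for precision: fewer facts enter the knowledge base, but those that do are more trustworthy.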

1.4.5 Knowledge Update

  From a logical point of view, updating the knowledge graph includes updating the concept layer and updating the data layer. Updating the concept layer means that when new data brings in new concepts, those concepts need to be added automatically to the concept layer of the knowledge base. Updating the data layer mainly means adding or updating entities, relationships, and attribute values, and it must take data consistency into account.

  There are two ways to update the knowledge graph:

  • Comprehensive update: Take all the data after the update as input and build the knowledge graph from scratch. This method is relatively simple, but it consumes a lot of resources and requires a lot of human effort for system maintenance.
  • Incremental update: Take only the new data as input and add the new knowledge to the existing knowledge graph. This method consumes fewer resources, but it currently still requires a lot of manual intervention (defining rules and so on), so it is difficult to implement.
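
  The two update strategies can be contrasted with a small sketch. The functions full_update and incremental_update and the notion of functional_relations (relations that allow only one value per entity) are hypothetical; they stand in for the hand-defined consistency rules mentioned above.

```python
def full_update(all_triples):
    """Comprehensive update: rebuild the graph from the complete data set."""
    return set(all_triples)


def incremental_update(graph, new_triples, functional_relations=frozenset()):
    """Incremental update: merge new triples into the existing graph.

    For relations listed in functional_relations, the old value is replaced
    so that the data layer stays consistent.
    """
    updated = set(graph)
    for head, relation, tail in new_triples:
        if relation in functional_relations:
            # Drop the outdated value for this entity and relation.
            updated = {t for t in updated if not (t[0] == head and t[1] == relation)}
        updated.add((head, relation, tail))
    return updated


graph = {("Alice", "works_at", "Acme"), ("Alice", "knows", "Bob")}
graph = incremental_update(
    graph,
    [("Alice", "works_at", "Globex"), ("Alice", "knows", "Carol")],
    functional_relations={"works_at"},
)
```

  The incremental path touches only the affected triples, but it needs a rule (here, the functional relation list) to decide when a new fact replaces an old one rather than being added alongside it.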

1.4.6 Others

  In addition to the technologies already mentioned above, knowledge representation, knowledge storage, etc. are also involved in the process of building a knowledge graph.

  Knowledge representation is a technology that describes knowledge with a certain structure and symbolic language so that computers can use it for reasoning, calculation, and other operations. The main methods of knowledge representation are predicate logic representation, frame representation, semantic network-based representation, and Semantic Web-based representation.

  There are two main storage methods for knowledge graphs: RDF-based storage and graph database-based storage. An important design principle of RDF is the easy publishing and sharing of data, while graph databases focus on efficient graph queries and search. In addition, RDF stores data as triples and does not attach attributes directly to entities or relationships, whereas graph databases generally use the property graph as the basic representation, so entities and relationships can carry attributes, which makes it easier to express real business scenarios.
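
  The difference between the two storage models can be illustrated with plain Python data structures. This is only a sketch of the data shapes: the names, the since attribute, and the edges_from helper are hypothetical, and no real RDF store or graph database API is used.

```python
# RDF-style storage: everything is a (subject, predicate, object) triple.
rdf_triples = [
    ("Alice", "works_at", "Acme"),
    ("Alice", "age", "34"),
]

# Property-graph-style storage: nodes and edges carry their own attribute maps.
nodes = {
    "Alice": {"label": "Person", "age": 34},
    "Acme": {"label": "Company"},
}
edges = [
    {"source": "Alice", "type": "works_at", "target": "Acme", "since": 2019},
]


def edges_from(node_id, edge_type):
    """Return edges of a given type leaving node_id, including their attributes."""
    return [e for e in edges if e["source"] == node_id and e["type"] == edge_type]


job = edges_from("Alice", "works_at")[0]
```

  In the property graph, the start year sits directly on the works_at edge; in the triple form, expressing the same detail would require extra triples that describe the relationship itself.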

1.5 Application of Knowledge Graph

  Through the knowledge graph, not only can the information of the Internet be expressed in a form closer to the human cognitive world, but also provide a way to better organize, manage and utilize massive information. At present, knowledge graphs are used in intelligent search, intelligent question answering, recommendation systems, social networks and other fields.

1.5.1 Smart Search

  The ultimate form of the Internet is the Internet of Everything, and the ultimate goal of search is to search for things directly. Traditional search relies on hyperlinks between web pages to find pages, while semantic search looks directly for things such as people, objects, institutions, and places. These things can come from text, pictures, videos, audio, Internet of Things devices, and so on. Knowledge graphs and semantic technologies describe the classification, attributes, and relationships of these things, so that search engines can search for them directly.

  For search engines, the knowledge graph solves a difficult problem: precise object-level search. Traditional search engines can only return many relevant pages, and users have to find the answer themselves in a large amount of text. This is the so-called string-level search.
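
  The contrast between string-level and object-level search can be sketched in a few lines. The documents, the single triple in GRAPH, and both search functions are hypothetical and greatly simplified compared with a real search engine.

```python
DOCUMENTS = [
    "Tim Berners-Lee invented the World Wide Web in 1989.",
    "The World Wide Web changed how documents are linked.",
]
GRAPH = {("World Wide Web", "invented_by", "Tim Berners-Lee")}


def string_search(keyword):
    """String-level search: return every document containing the keyword."""
    return [doc for doc in DOCUMENTS if keyword in doc]


def object_search(entity, relation):
    """Object-level search: return the answer entities stored in the graph."""
    return [tail for head, rel, tail in GRAPH if head == entity and rel == relation]


pages = string_search("World Wide Web")
answer = object_search("World Wide Web", "invented_by")
```

  The string-level query returns both documents and leaves the reader to find the inventor, while the object-level query returns the entity itself.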

1.5.2 Smart Q&A

  Knowledge graphs can be applied to intelligent Q&A. For example, Tmall Genie, Xiaomi Xiaoai and Baidu Dumi are all supported by knowledge graph data and technology. Intelligent question answering is essentially a kind of conversational search. Compared with ordinary search engines, it requires more precise, object-level search and direct answers.

1.5.3 Recommendation System

  Recommendation systems are also a typical application scenario for knowledge graphs. For example, in an e-commerce recommendation scenario, a User KG and an Item KG can be constructed separately. Introducing a knowledge graph enriches the semantic attributes and semantic relations of User and Item and greatly strengthens their feature representations, which helps uncover deeper user interests. The diversity of relationships also supports more personalized recommendations, and rich semantic descriptions make recommendation results easier to explain, and therefore more reliable and credible.

1.5.4 Social Networks

  Facebook launched the Graph Search product in 2013. Its core technology is to connect people, places, things, etc. through knowledge graphs, and support precise natural language queries in an intuitive way.



Origin blog.csdn.net/benzhujie1245com/article/details/123066492