HUAWEI CLOUD experts talk about the process and method of knowledge graph construction

Abstract: With the development and spread of AI technology, society has entered the era of intelligence. Unlike earlier waves, enterprises are now undergoing not only a digital transformation but also a knowledge transformation. So how can we help enterprises solve the problems of intelligent knowledge mining and management, and achieve that knowledge transformation?

In the talk "Technical Interpretation and Case Practice of an Enterprise-level Knowledge Computing Platform", Zheng Yi, a natural language processing expert at Huawei Cloud, covered the Huawei Cloud knowledge computing platform and its related technologies, the process and methods of knowledge graph construction, and industry cases of knowledge computing. This article focuses on the "Knowledge Graph Construction Process and Method" part of that talk.

1. What is a knowledge graph?

A knowledge graph is a data structure composed of entities, relationships, and attributes. Take Figure 1 as an example: "Andy Lau" is an entity of type person, and "Andy Lau" has a height, a nationality, and other information. This information is called the entity's attributes.

Similarly, "Infernal Affairs" is an entity of type movie. We know that "Andy Lau" is the leading actor in the movie "Infernal Affairs", so there is a "leading actor" relationship between "Andy Lau" and "Infernal Affairs". Through entities, relationships, and attributes, we can organize knowledge effectively in a form that people can understand. Building and applying a knowledge graph involves technologies such as databases, natural language processing (NLP), and semantic networks.

Figure 1 Example of knowledge graph
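
The structure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the representation used by the HUAWEI CLOUD service; the class names `Entity` and `Relation` and the placeholder attribute values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node in the graph: a name, a type, and a bag of attributes."""
    name: str
    type: str
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A directed, labeled edge between two entities, referenced by name."""
    subject: str
    predicate: str
    object: str


# Attribute values are placeholders for illustration, not facts from the article.
andy = Entity("Andy Lau", "Person", {"nationality": "<nationality>", "height": "<height>"})
movie = Entity("Infernal Affairs", "Movie")
relations = [Relation("Andy Lau", "leading actor", "Infernal Affairs")]

entities = {e.name: e for e in (andy, movie)}

# List everything "Andy Lau" is connected to, with the type of the target entity.
for r in relations:
    if r.subject == "Andy Lau":
        print(r.subject, f"--{r.predicate}-->", r.object, f"({entities[r.object].type})")
```

Keeping attributes on the entity and relationships as separate edges mirrors the entity, relationship, and attribute split in the text.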

General knowledge graph or industry knowledge graph?

By purpose, knowledge graphs can be divided into general knowledge graphs and industry knowledge graphs. A general knowledge graph focuses on common-sense knowledge and is used in search engines and recommendation systems. Industry knowledge graphs (also called enterprise knowledge graphs) are oriented mainly to enterprise business: by building knowledge graphs for specific industries and enterprises, they provide knowledge-based services to those enterprises. HUAWEI CLOUD Knowledge Graph Service can be used to build, manage, and serve both types of knowledge graph, with a focus on enterprise knowledge graphs.

2. How to construct a knowledge graph?

Knowledge graph construction methods fall mainly into two groups: top-down and bottom-up. The top-down method first defines the ontology (also called the schema), and then goes from information extraction through to graph construction based on the input data. This method is better suited to building professional knowledge graphs, such as enterprise knowledge graphs, for use by professionals in the field. The bottom-up method extracts high-confidence knowledge from Linked Open Data, or extracts knowledge from unstructured text, to build the knowledge graph. This method is better suited to common-sense knowledge, for example general knowledge graphs of the names of people and organizations. This article focuses on the processes and technologies of the top-down method, as used to build enterprise knowledge graphs.

At present, there is no knowledge graph cloud service in the industry, and no unified, standard top-down construction process. The mainstream approach is for knowledge graph service providers to build customized knowledge graphs for customers as one-off solutions, based on the enterprise's internal data and public data. This approach is costly and inefficient, and usually takes a long time to complete. The enterprise also has little involvement in the process, so the resulting graph may deviate substantially from what is needed and be difficult to use in actual business.

Starting from the user's perspective, we abstracted the knowledge graph construction process and its related technologies and launched the HUAWEI CLOUD knowledge graph cloud service (Figure 2). It gives enterprises in different industries a platform for quickly building knowledge graph capabilities, and enables large, medium, and small enterprises to build their own knowledge graphs.

Figure 2 Huawei Cloud Knowledge Graph Cloud Service

Huawei Cloud Knowledge Graph cloud service provides pipelined graph construction capabilities, abstracting graph construction into the following basic processes: ontology construction, data source configuration, information extraction, knowledge mapping, and knowledge fusion.

Figure 3 Basic process of knowledge graph construction

Furthermore, each process module is abstracted into a plug-in, and a graph construction task is generated by combining and configuring plug-ins. For a different industry or field, only the plug-in configuration needs to change to build the enterprise knowledge graph. Because of the pipeline design, the knowledge graph cloud service can also update a knowledge graph by changing only the data source, which suits knowledge graphs that need frequent updates.
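
The plug-in idea can be illustrated with a small sketch. It is a simplification under stated assumptions: only three of the five steps are modeled, and the plug-in names, the `REGISTRY` table, and the `run_pipeline` function are hypothetical, not part of the HUAWEI CLOUD service.

```python
from typing import Callable, Dict, List

# A plug-in takes the pipeline state (a dict) and returns the updated state.
Plugin = Callable[[dict], dict]


def load_source(state: dict) -> dict:
    # Hypothetical data source step: the records are already in memory here.
    state["records"] = list(state["source"])
    return state


def extract(state: dict) -> dict:
    # Turn each key-value record into (subject, predicate, object) triples.
    state["triples"] = [
        (rec["name"], key, value)
        for rec in state["records"]
        for key, value in rec.items()
        if key != "name"
    ]
    return state


def fuse(state: dict) -> dict:
    # Simplest possible fusion: drop exact duplicate triples, keeping order.
    state["triples"] = list(dict.fromkeys(state["triples"]))
    return state


REGISTRY: Dict[str, Plugin] = {"source": load_source, "extract": extract, "fuse": fuse}


def run_pipeline(steps: List[str], source: list) -> list:
    """Run the configured plug-ins in order over one data source."""
    state = {"source": source}
    for step in steps:
        state = REGISTRY[step](state)
    return state["triples"]


movies_v1 = [{"name": "Infernal Affairs", "type": "Movie"}]
movies_v2 = movies_v1 + [{"name": "Infernal Affairs", "type": "Movie"}]  # duplicate record

config = ["source", "extract", "fuse"]
print(run_pipeline(config, movies_v1))
print(run_pipeline(config, movies_v2))  # same configuration, different data source
```

The second call reuses the same configuration with a new data source, which is the update scenario the paragraph describes.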

2.1 How to construct the ontology of the knowledge graph?

The first step in constructing a knowledge graph is to design and build its ontology. The ontology is the model of the graph: a schema constraint on the data that makes up the graph. For an enterprise knowledge graph, the ontology is generally built jointly by industry experts in the vertical field and knowledge graph experts.

Ontology design is crucial to knowledge graph construction. Domain knowledge, terminology dictionaries, and expert experience can be organized as the basis for the ontology, and then refined against the application scenarios of the knowledge graph. The result is a set of entity categories, the relationships between categories, and the attribute definitions of each entity category. Huawei Cloud Knowledge Graph cloud service provides a graphical ontology design tool, with which the ontology of an enterprise knowledge graph can be built flexibly through drag-and-drop editing.

Figure 4 Huawei Cloud Knowledge Graph Cloud Service-Ontology Design Interface
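
To make "schema constraint" concrete, here is a minimal sketch of an ontology expressed as plain Python data, with two checks that reject data the schema does not allow. The `ONTOLOGY` layout, the function names, and the `release_year` attribute are assumptions made for illustration; the service itself defines ontologies through its graphical tool.

```python
# Hypothetical ontology: entity types with their allowed attributes,
# and the relations allowed between types.
ONTOLOGY = {
    "entity_types": {
        "Person": {"height", "nationality"},
        "Movie": {"release_year"},
    },
    "relations": {
        # predicate: (subject type, object type)
        "leading actor": ("Person", "Movie"),
    },
}


def check_relation(subject_type: str, predicate: str, object_type: str) -> bool:
    """Return True if the ontology allows this relation between these types."""
    return ONTOLOGY["relations"].get(predicate) == (subject_type, object_type)


def check_attribute(entity_type: str, attribute: str) -> bool:
    """Return True if the ontology defines this attribute for the entity type."""
    return attribute in ONTOLOGY["entity_types"].get(entity_type, set())


print(check_relation("Person", "leading actor", "Movie"))  # allowed by the schema
print(check_relation("Movie", "leading actor", "Person"))  # wrong direction
print(check_attribute("Person", "height"))
print(check_attribute("Movie", "height"))
```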

2.2 How to configure the data source? What needs to be prepared?

Before configuring the data source, data of different types and formats needs some preliminary preparation. For example, local paper documents must first be scanned, and the scans converted into text documents with OCR and similar technologies. Local electronic documents need to be archived and parsed into a standardized format according to their type and format. For network resources, crawlers need to be developed to suit the characteristics of each website, and the crawled data stored in a local database. Some third-party resources require obtaining the corresponding data access interface and retrieving the data through it.

After the prepared data is uploaded to Huawei Cloud Object Storage Service (OBS), data sources can be configured in the Knowledge Graph cloud service, including structured data and unstructured text in specified formats.

2.3 What is information extraction? How to extract?

The purpose of information extraction is to extract knowledge about entities, attributes, and relationships from different data sources and data formats. This is a critical part of the knowledge graph construction process: the quality of information extraction determines the quality of the knowledge graph. Relationships between entities and the attribute values of entities can both be represented as (subject, predicate, object) triples, so information extraction can simply be called triple extraction.

HUAWEI CLOUD Knowledge Graph cloud service supports triple extraction from structured data in key-value format and from unstructured text. For structured data, field processing is done by configuring combinations of preset functions. For unstructured text, the cloud service provides model-based extraction and supports a triple extraction method based on machine reading comprehension (MRC). This method borrows the idea of multi-turn dialogue: it first extracts the subject, then extracts the object using that result and the question template associated with each candidate predicate, and finally forms (subject, predicate, object) triples. According to the speaker, this framework reaches state-of-the-art results. The Huawei Cloud Knowledge Graph Service supports model training, prediction, and management based on this algorithm, and performs the information extraction step of the pipeline as a plug-in.

Figure 5 The triple extraction method based on machine reading comprehension (MRC)
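
The control flow of this two-round extraction can be sketched as follows. This is only an outline under stated assumptions: `toy_reader` is a regex stand-in for a trained MRC model, and the question templates, function names, and example sentence are all invented for the illustration. It shows the order of the questions, not the model behind them.

```python
import re
from typing import List, Optional, Tuple

# One question template per candidate predicate; "{subject}" is filled in
# with the answer from the first round. The templates are hypothetical.
PREDICATE_TEMPLATES = {
    "leading actor": "Which movie does {subject} star in?",
    "nationality": "What is the nationality of {subject}?",
}


def toy_reader(question: str, passage: str) -> Optional[str]:
    """Toy stand-in for a trained MRC model: returns an answer span or None."""
    name = r"[A-Z]\w+(?: [A-Z]\w+)*"  # one or more capitalized words
    if question.startswith("Who"):
        match = re.search(rf"^({name})", passage)
    elif question.startswith("Which movie"):
        match = re.search(rf"stars in the movie ({name})", passage)
    else:
        match = None  # the toy reader cannot answer other questions
    return match.group(1) if match else None


def extract_triples(passage: str) -> List[Tuple[str, str, str]]:
    # Round 1: ask for the subject.
    subject = toy_reader("Who is the passage about?", passage)
    if subject is None:
        return []
    # Round 2: for each candidate predicate, ask for the object,
    # using the subject found in round 1 to fill the template.
    triples = []
    for predicate, template in PREDICATE_TEMPLATES.items():
        obj = toy_reader(template.format(subject=subject), passage)
        if obj is not None:
            triples.append((subject, predicate, obj))
    return triples


passage = "Andy Lau stars in the movie Infernal Affairs."
print(extract_triples(passage))
```

Predicates whose question gets no answer simply produce no triple, which is how a template-per-predicate design handles text that mentions only some relationships.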

The model training and inference functions in information extraction are built on the HUAWEI CLOUD ModelArts AI computing platform, which provides efficient AI computing, model training, inference, and deployment capabilities. To make it easier to train the triple extraction model, a triple annotation tool is also provided. With it, users can quickly obtain training data and then complete information extraction and knowledge graph construction.

Figure 6 Example of the triple annotation tool

2.4 How is knowledge fusion accomplished?

Knowledge fusion means aligning and merging the large number of triples extracted from multiple data sources. For example, Baidu Encyclopedia has an entry for the star Andy Lau, and Interactive Encyclopedia also has an entry for the star Andy Lau. We cannot build a knowledge graph that contains two copies of Andy Lau. The two entries need to be recognized as the same person and merged into one entity. This is entity alignment and knowledge fusion.

The key issue is how to perform entity alignment efficiently. The technical routes fall into two broad categories: frameworks based on the similarity of entity attributes, and deep learning frameworks based on joint representation. The joint representation approach relies on a large amount of annotated data, and its models are strongly tied to the industry and the data, so it does not generalize well. The HUAWEI CLOUD knowledge graph service therefore currently supports the framework based on entity attribute similarity: entity alignment and knowledge fusion are performed by defining similarity measures and combining them.
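
A minimal sketch of attribute-similarity alignment follows. It assumes string similarity from Python's standard `difflib` as the per-attribute measure and a weighted average as the combination; the weights, the threshold, the `birthplace` attribute, and the record values are all made up for the example and are not the service's actual measures.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical weights per attribute and a hypothetical match threshold.
WEIGHTS = {"name": 0.6, "occupation": 0.2, "birthplace": 0.2}
THRESHOLD = 0.8


def entity_similarity(e1: dict, e2: dict) -> float:
    """Weighted average of per-attribute similarities over shared attributes."""
    total, weight_sum = 0.0, 0.0
    for attr, weight in WEIGHTS.items():
        if attr in e1 and attr in e2:
            total += weight * string_similarity(e1[attr], e2[attr])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def merge(e1: dict, e2: dict) -> dict:
    """Union of attributes; values from e1 win when both define one."""
    return {**e2, **e1}


# Two records for the same person from different sources, plus another person.
source_a = {"name": "Andy Lau", "occupation": "actor"}
source_b = {"name": "Andy Lau", "occupation": "actor, singer", "birthplace": "<placeholder>"}
other = {"name": "Tony Leung", "occupation": "actor"}

for candidate in (source_b, other):
    score = entity_similarity(source_a, candidate)
    decision = "merge" if score >= THRESHOLD else "keep separate"
    print(candidate["name"], round(score, 2), decision)

merged = merge(source_a, source_b)
```

Averaging only over attributes that both records define keeps a record with missing fields from being penalized for what it does not say.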

In addition, HUAWEI CLOUD Knowledge Graph Cloud Service also provides graph visualization services, which can visually observe and analyze entities and relationships.

Figure 7 Visualization example of a virus protein knowledge graph

3. What storage method does the knowledge graph need?

After constructing the knowledge graph, we have a large number of knowledge triples. So how should these triples be stored?

The most direct way is tabular storage, such as a relational table in which triples are stored as three or more columns. This works when the graph is relatively small, but is it still feasible as the graph grows? Suppose we have an entertainment graph of celebrities and movies, containing a large number of celebrities, movies, and the relationships between them. To answer "Who is the oldest director among the movies in which Andy Lau and Tony Leung have acted together?", we need 2 to 3 self-joins on the triple table in the relational database. When the number of triples reaches the tens of millions or billions, such a query becomes extremely inefficient and is basically infeasible.
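
The self-joins mentioned above can be seen in a small example using Python's built-in `sqlite3` module. The table layout, the predicate names, and all rows (placeholder movies, directors, and birth years) are invented for illustration.

```python
import sqlite3

# Placeholder data: movie names, directors, and birth years are invented.
rows = [
    ("Andy Lau", "acted_in", "Movie X"),
    ("Tony Leung", "acted_in", "Movie X"),
    ("Andy Lau", "acted_in", "Movie Y"),
    ("Movie X", "directed_by", "Director A"),
    ("Movie X", "directed_by", "Director B"),
    ("Movie Y", "directed_by", "Director C"),
    ("Director A", "birth_year", "1960"),
    ("Director B", "birth_year", "1965"),
    ("Director C", "birth_year", "1950"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)

# Every hop in the question joins the triple table with itself once more.
QUERY = """
SELECT d.obj AS director, b.obj AS birth_year
FROM triples AS a1
JOIN triples AS a2 ON a2.obj = a1.obj AND a2.pred = 'acted_in'
JOIN triples AS d  ON d.subj = a1.obj AND d.pred = 'directed_by'
JOIN triples AS b  ON b.subj = d.obj AND b.pred = 'birth_year'
WHERE a1.subj = 'Andy Lau' AND a1.pred = 'acted_in'
  AND a2.subj = 'Tony Leung'
ORDER BY CAST(b.obj AS INTEGER) ASC
LIMIT 1
"""

oldest = conn.execute(QUERY).fetchone()
print(oldest)
```

The query joins the same table three times; on a table with a very large number of rows, each of those joins has to be resolved against the whole triple set, which is the cost the paragraph refers to.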

The HUAWEI CLOUD knowledge graph service stores knowledge graphs in a graph database, the mainstream approach. Data is stored directly in the form of a graph, so multi-hop relationship and attribute queries can be completed efficiently. Specifically, we use the Huawei Cloud Graph Engine Service, which has an integrated architecture for graph storage and graph computing. It provides efficient query performance as well as a variety of preset graph deep learning algorithms, and it is convenient to use. Everyone is welcome to try it out.

Figure 8 Advantages of the Huawei Cloud Graph Engine Service
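
Why graph storage helps with multi-hop queries can be seen in a toy in-memory adjacency index. This is not how the Graph Engine Service is implemented; it is a sketch of the general idea that each hop becomes a direct lookup from a node to its neighbors. All data and names below are invented placeholders.

```python
from collections import defaultdict

# Placeholder edges: movie names and directors are invented.
edges = [
    ("Andy Lau", "acted_in", "Movie X"),
    ("Tony Leung", "acted_in", "Movie X"),
    ("Andy Lau", "acted_in", "Movie Y"),
    ("Movie X", "directed_by", "Director A"),
    ("Movie X", "directed_by", "Director B"),
    ("Movie Y", "directed_by", "Director C"),
]
# Node attributes are stored on the node rather than as extra edges.
birth_year = {"Director A": 1960, "Director B": 1965, "Director C": 1950}

# Adjacency index: node -> predicate -> set of neighbor nodes.
out_edges = defaultdict(lambda: defaultdict(set))
for subj, pred, obj in edges:
    out_edges[subj][pred].add(obj)


def neighbors(node: str, predicate: str) -> set:
    """Nodes reachable from `node` over one edge labeled `predicate`."""
    return set(out_edges.get(node, {}).get(predicate, set()))


# Hop 1: movies each actor appears in, then keep the shared ones.
shared = neighbors("Andy Lau", "acted_in") & neighbors("Tony Leung", "acted_in")
# Hop 2: directors of the shared movies.
directors = {d for m in shared for d in neighbors(m, "directed_by")}
# Attribute lookup: the oldest director has the smallest birth year.
oldest = min(directors, key=lambda d: birth_year[d])
print(shared, directors, oldest)
```

Each hop touches only the edges of the nodes already reached, instead of joining against the full set of triples.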

4. A Huawei Cloud knowledge computing case

PetroChina has built the industry's first oil and gas knowledge computing platform on the capabilities of the Huawei Cloud knowledge computing service: knowledge modeling, oil and gas knowledge graph construction, graph storage, natural language processing, and machine learning. Working from oil and gas exploration and development data, the platform applies knowledge computing technology to provide intelligent assistance and decision support for exploration and development, with the goals of increasing reserves and production, reducing costs, and improving efficiency.

Figure 9 Value and significance of oil and gas knowledge computing

Huawei's knowledge computing solution provides a rich set of knowledge applications that support enterprises in three ways: solving business pain points, improving efficiency, and providing knowledge-based services. It demonstrates the value of knowledge computing across industries. It enables enterprises to manage knowledge quickly, efficiently, and at low cost, to achieve knowledge transformation by applying enterprise knowledge, to realize the benefits that knowledge brings, and to become more competitive in the intelligent era.

 


Origin blog.csdn.net/devcloud/article/details/109024011