Knowledge graphs from scratch: building an encyclopedia knowledge graph, knowledge extraction with Deepdive, simple semantic search with ES, and simple KBQA with REfO


Project design collection (artificial intelligence direction): a collection of meaningful project designs, not limited to NLP, knowledge graphs, or computer vision. It helps newcomers pick up skills quickly through hands-on practice, upgrade their project designs independently, build solid practical ability, and make better use of the CSDN platform.


  1. Column Subscription: Encyclopedia of Projects to Improve Your Hard Power

  2. Detailed introduction of the column: project design collection (artificial intelligence direction)


These are my study notes from getting started with knowledge graphs. They work as a semi-tutorial that gives beginners a preliminary understanding of the various knowledge graph tasks. There are currently no plans to add more.

1. Introduction

The goal is to cover the knowledge in Baidu Encyclopedia, Interactive Encyclopedia, and Chinese Wikipedia, with entities numbering in the tens of millions and relationships in the billions. So far, Baidu Encyclopedia and Interactive Encyclopedia are done: 4,190,390 entries from Baidu Encyclopedia and 4,382,575 entries from Interactive Encyclopedia. Converting them to RDF yields 128,596,018 triples. Loaded into Neo4j, the graph has 16,498,370 nodes, 56,371,456 relationships, and 61,967,517 attributes.

For the project source code, see the link at the top or end of the article:

https://download.csdn.net/download/sinat_39620217/87988980

2. Get data

2.1 Semi-structured data

The semi-structured data is crawled from Baidu Encyclopedia and Interactive Encyclopedia with the Scrapy framework. It currently covers two categories: the film domain and the general domain.
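As a rough illustration of what "semi-structured" means here, the sketch below pulls attribute/value pairs out of an infobox-like HTML fragment. It uses only the Python standard library rather than Scrapy, and the `<dt>`/`<dd>` layout, the `basicInfo` class name, and the sample values are assumptions made for the example, not the actual page structure or the project's spider code.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect <dt>name</dt><dd>value</dd> pairs from an infobox-like block."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag we are inside: "dt", "dd", or None
        self._name = []       # text pieces of the attribute name
        self._value = []      # text pieces of the attribute value
        self.pairs = []       # finished (name, value) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._current, self._name = "dt", []
        elif tag == "dd":
            self._current, self._value = "dd", []
        # Other tags (for example <a>) are ignored; their text is still collected.

    def handle_data(self, data):
        if self._current == "dt":
            self._name.append(data)
        elif self._current == "dd":
            self._value.append(data)

    def handle_endtag(self, tag):
        if tag == "dd" and self._current == "dd":
            name = "".join(self._name).strip()
            value = "".join(self._value).strip()
            if name and value:
                self.pairs.append((name, value))
            self._current = None
        elif tag == "dt":
            self._current = None


def parse_infobox(html):
    """Return the infobox attributes of one page as a dict."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return dict(parser.pairs)


# Hypothetical fragment standing in for a crawled encyclopedia page.
SAMPLE = """
<dl class="basicInfo">
  <dt>Chinese name</dt><dd><a href="#">Example Film</a></dd>
  <dt>Director</dt><dd> Example Director </dd>
</dl>
"""

if __name__ == "__main__":
    print(parse_infobox(SAMPLE))
```

In a real crawler the same extraction would typically live in a spider's parse callback, with the resulting dict emitted as an item.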

2.2 Unstructured Data

The unstructured data comes mainly from WeChat official accounts, Huxiu.com news, and the unstructured text in the encyclopedia entries.

The WeChat official account crawler collects each article's title, release time, official account name, content, and cited sources; it corresponds to ie/craw/weixin_spider. The Huxiu.com crawler collects each news item's title, brief description, author, release time, and content; it corresponds to ie/craw/news_spider.

3. Knowledge extraction from unstructured text

3.1 Knowledge extraction based on Deepdive

Deepdive is an open source knowledge extraction system developed by Stanford University InfoLab. It extracts structured relational data from unstructured text through weakly supervised learning. This walkthrough builds on the Chinese-capable Deepdive release published on OpenKG, "deepdive that supports Chinese: Stanford University's open source knowledge extraction tool (triple extraction)" (http://www.openkg.cn/dataset/cn-deepdive). With it, we extract actor-film relationships in the film domain.
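The weak supervision idea can be sketched in a few lines: generate candidate (actor, film) pairs from each sentence, then label them with cheap rules instead of manual annotation. This is plain Python, not Deepdive's own DDlog or UDF code; the seed pairs, the cue words, and the entity names are all hypothetical.

```python
from itertools import product

# Hypothetical seed knowledge base of known (actor, film) pairs.
KNOWN_PAIRS = {("Actor A", "Film X")}

# Hypothetical cue words suggesting the sentence is not about acting.
NEGATIVE_CUES = ("directed", "produced")


def candidates(sentence, actors, films):
    """Pair every actor mention with every film mention found in the sentence."""
    found_actors = [a for a in actors if a in sentence]
    found_films = [f for f in films if f in sentence]
    return list(product(found_actors, found_films))


def weak_label(sentence, pair):
    """Return +1 (likely true), -1 (likely false), or 0 (left unlabeled)."""
    if pair in KNOWN_PAIRS:
        return 1   # the pair is already a known fact
    if any(cue in sentence for cue in NEGATIVE_CUES):
        return -1  # a cue word points to a different relation
    return 0       # no rule fired; the model must infer this one


if __name__ == "__main__":
    text = "Actor A starred in Film X."
    for pair in candidates(text, ["Actor A", "Actor B"], ["Film X"]):
        print(pair, weak_label(text, pair))
```

The labeled candidates act as noisy training data, and the unlabeled ones are what the extraction system is ultimately asked to score.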

For a detailed introduction, please refer to Building a Knowledge Graph from Scratch (5) Deepdive Extracting Actor-Movie Relationships

3.2 Neural Network Relation Extraction

We use our own encyclopedia knowledge graph to build a distant supervision dataset and run it on OpenNRE. The resulting dataset contains 18,226 relational facts, 336,693 no-relation (NA) entity pairs, and 354,919 entity pairs in total, covering 462 relations (including NA).
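Distant supervision labels a sentence by looking up its entity pair in the graph: if the pair is connected by a relation, the sentence gets that relation, otherwise it gets NA. The sketch below shows this lookup and the kind of summary reported above (the reported counts are consistent, since 18,226 + 336,693 = 354,919). The toy graph, the field names `h`, `t`, and `relation`, and the relation names are assumptions for the example, not the project's actual data format.

```python
from collections import Counter

# Hypothetical slice of the knowledge graph: (head, tail) -> relation.
KG = {
    ("Actor A", "Film X"): "starred_in",
    ("Director D", "Film X"): "directed",
}


def build_instances(sentences):
    """Label each (text, head, tail) sentence with the graph relation or "NA"."""
    instances = []
    for text, head, tail in sentences:
        relation = KG.get((head, tail), "NA")
        instances.append({"text": text, "h": head, "t": tail, "relation": relation})
    return instances


def summarize(instances):
    """Count distinct entity pairs with a relation, NA pairs, and relations used."""
    pair_relation = {(i["h"], i["t"]): i["relation"] for i in instances}
    counts = Counter("NA" if r == "NA" else "fact" for r in pair_relation.values())
    return {
        "facts": counts["fact"],
        "na_pairs": counts["NA"],
        "total_pairs": len(pair_relation),
        "relations": len(set(pair_relation.values())),  # includes NA if present
    }


if __name__ == "__main__":
    data = build_instances([
        ("Actor A starred in Film X.", "Actor A", "Film X"),
        ("Film X made Actor A famous.", "Actor A", "Film X"),
        ("Actor B watched Film X.", "Actor B", "Film X"),
    ])
    print(summarize(data))
```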

For a detailed introduction, please refer to Building a Knowledge Graph from Scratch (9) Encyclopedia Knowledge Graph Construction (3) Dataset Construction and Practice of Neural Network Relation Extraction

4. Structured data to RDF

There are two main ways to convert structured data to RDF: direct mapping, and the R2RML mapping language. The R2RML approach is more flexible and highly customizable. Several tools are available for this; here we use D2RQ, the project from which r2rml-kit was derived.
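To make the direct mapping idea concrete, the sketch below turns one relational row into N-Triples: the row becomes a subject IRI and each non-NULL column becomes a predicate with a literal value. It loosely follows the shape of the W3C Direct Mapping but is heavily simplified (plain string literals only, no foreign keys, no type triple). The base IRI, table name, and column names are hypothetical, and this is not what D2RQ generates.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base IRI


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def direct_map(table, primary_key, row):
    """Map one row (a dict of column -> value) to a list of N-Triples lines."""
    key_value = quote(str(row[primary_key]))
    subject = f"<{BASE}{quote(table)}/{quote(primary_key)}={key_value}>"
    triples = []
    for column, value in row.items():
        if value is None:  # NULL cells produce no triple
            continue
        predicate = f"<{BASE}{quote(table)}#{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    row = {"movie_id": 1, "title": "Example Film", "year": None}
    for line in direct_map("movie", "movie_id", row):
        print(line)
```

A mapping language such as R2RML replaces these fixed naming rules with a declarative description, which is where the extra flexibility comes from.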

For a detailed introduction, please refer to Building a Knowledge Graph from Scratch (2) Database Access to RDF and Jena

5. Knowledge storage

5.1 Storing data into Neo4j

A graph database is a type of NoSQL database whose storage structure and query methods are both built on graph theory. The basic elements of a graph are nodes and edges, which correspond to nodes and relationships in a graph database. We store the data obtained above in Neo4j.
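One way to picture the move from triples to a property graph is the in-memory model below: a triple whose object is another entity becomes a relationship, and a triple whose object is a plain value becomes an attribute on the subject node. This split is an assumption about the loading step, not the project's import script, and the sample triples are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)          # node name -> {attribute: value}
    relationships: list = field(default_factory=list)  # (start, type, end)

    def add_triple(self, subject, predicate, obj, entities):
        """Store a triple as a relationship if obj is an entity, else as an attribute."""
        props = self.nodes.setdefault(subject, {})
        if obj in entities:
            self.nodes.setdefault(obj, {})
            self.relationships.append((subject, predicate, obj))
        else:
            props[predicate] = obj

    def attribute_count(self):
        """Total number of attribute values stored on all nodes."""
        return sum(len(props) for props in self.nodes.values())


if __name__ == "__main__":
    entities = {"Actor A", "Film X"}  # hypothetical entity names
    graph = Graph()
    for s, p, o in [
        ("Actor A", "starred_in", "Film X"),
        ("Film X", "release_year", "2000"),
    ]:
        graph.add_triple(s, p, o, entities)
    print(len(graph.nodes), len(graph.relationships), graph.attribute_count())
```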

For encyclopedia graphs, please refer to: Building Knowledge Graphs from Scratch (8) Encyclopedia Knowledge Graph Construction (2) Storing Data in Neo4j

For the film field, please see Building a Knowledge Graph from Scratch (6) Storing Data in Neo4j

6. KBQA

6.1 Simple KBQA based on REfO

Building on the REfO-based KBQA implementation and examples that Zhejiang University published on OpenKG, we implement a simple knowledge question answering system on our own knowledge graph.
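The general shape of this kind of rule-based KBQA is: tag the words of the question, match the tagged sequence against hand-written rules, and fill a query template for the rule that fires. The sketch below shows that flow with plain Python matching standing in for REfO, so none of it is REfO's API. The POS tags, keyword list, prefix IRI, and property names (`:movie_title`, `:has_acted_in`, `:person_name`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    token: str
    pos: str  # hypothetical tags: "nz" = film title, "nr" = person name, "x" = other


# Hypothetical keywords signalling a question about a film's actors.
ACTOR_KEYWORDS = {"actors", "cast", "stars"}


def film_actors_rule(words):
    """Fire when the question has a film title and an actor keyword, in any order."""
    film = next((w.token for w in words if w.pos == "nz"), None)
    has_keyword = any(w.token.lower() in ACTOR_KEYWORDS for w in words)
    if film is None or not has_keyword:
        return None
    # The title is inserted as-is; a real system would escape it first.
    return (
        "PREFIX : <http://example.org/kg#>\n"
        "SELECT ?name WHERE {\n"
        f'  ?film :movie_title "{film}" .\n'
        "  ?actor :has_acted_in ?film .\n"
        "  ?actor :person_name ?name .\n"
        "}"
    )


RULES = [film_actors_rule]


def question_to_sparql(words):
    """Return the query from the first rule that matches, or None."""
    for rule in RULES:
        query = rule(words)
        if query is not None:
            return query
    return None


if __name__ == "__main__":
    question = [Word("Who", "x"), Word("are", "x"), Word("the", "x"),
                Word("actors", "x"), Word("of", "x"), Word("Film X", "nz")]
    print(question_to_sparql(question))
```

Each new question type is handled by adding one more rule function to `RULES`, which keeps the system simple at the cost of coverage.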

For a detailed introduction, please see Building a Knowledge Graph from Scratch (3) Simple Knowledge Questions and Answers Based on REfO

  • example

7. Semantic search

7.1 Simple semantic search based on Elasticsearch

This project is a simplified version of Zhejiang University's Elasticsearch-based KBQA implementation and examples, implemented on our own database.
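A search like "attribute Y of entity X" can be expressed as an Elasticsearch bool query. The sketch below only builds the request body as a Python dict, so it runs without a cluster or client library. It assumes a hypothetical index layout with one document per triple and fields named `subj`, `pred`, and `obj`; the real project's mapping may differ.

```python
import json


def build_query(entity, attribute=None, size=10):
    """Build a Query DSL body matching an entity and, optionally, one attribute."""
    must = [{"match": {"subj": entity}}]
    if attribute is not None:
        must.append({"match": {"pred": attribute}})
    return {"size": size, "query": {"bool": {"must": must}}}


if __name__ == "__main__":
    # Hypothetical question: "Who directed Example Film?"
    body = build_query("Example Film", "director")
    print(json.dumps(body, indent=2))
```

The resulting body would be sent to the index's `_search` endpoint, and the `obj` field of the returned hits would hold the answer under this assumed layout.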

For a detailed introduction, please see Building a Knowledge Graph from Scratch (4) Simple Semantic Search Based on ES

  • example

For the project source code, see the link at the top or end of the article:

https://download.csdn.net/download/sinat_39620217/87988980


Origin blog.csdn.net/sinat_39620217/article/details/131641815