Setting up the Agriculture_KnowledgeGraph (Agricultural Knowledge Graph) project environment

Project repository: https://github.com/qq547276542/Agriculture_KnowledgeGraph

1. Create an environment

  • Create a separate conda environment for the project:
conda create -n kg python=3.6

Other useful environment commands (optional):

List the environments
conda info -e

Activate the environment
activate kg

Exit the environment
deactivate

2. Install the required packages in the created environment

  • Install django
pip install django

Add Django's bin directory (mine is F:\anaconda3\envs\kg\Lib\site-packages\django\bin) to the Path environment variable: Computer > Properties > Advanced System Settings > Environment Variables > Path.

  • Install thulac
pip install thulac
  • Install py2neo
pip install py2neo
  • Install fasttext
Note: pyfasttext is no longer maintained; use the official Python binding from the fastText repository instead: https://github.com/facebookresearch/fastText/tree/master/python

If you run into problems installing fasttext, see: https://www.jianshu.com/p/152fe77d3abc

pip install fasttext

3. Import data

First create the agriculture_kg.db database, either through the Neo4j configuration file or by creating a symbolic link.

3.1. Import HudongItem node data

Put hudong_pedia.csv into the import directory under the Neo4j installation directory.
A sample of the data format:

"title","url","image","openTypeList","detail","baseInfoKeyList","baseInfoValueList"
"菊糖","http://www.baike.com/wiki/菊糖","http://a0.att.hudong.com/72/85/20200000013920144736851207227_s.jpg","健康科学##分子生物学##化学品##有机物##科学##自然科学##药品##药学名词##药物中文名称列表","[药理作用] 诊断试剂 人体内不含菊糖,静注后,不被机体分解、结合、利用和破坏,经肾小球滤过,通过测定血中和尿中的菊糖含量,可以准确计算肾小球的滤过率。菊糖广泛存在于植物组织中,约有3.6万种植物中含有菊糖,尤其是菊芋、菊苣块根中含有丰富的菊糖[6,8]。菊芋(Jerusalem artichoke)又名洋姜,多年生草本植物,在我国栽种广泛,其适应性广、耐贫瘠、产量高、易种植,一般亩产菊芋块茎为2 000~4 000 kg,菊芋块茎除水分外,还含有15%~20%的菊糖,是加工生产菊糖及其制品的良好原料。","中文名:","菊糖"
// Import hudong_pedia.csv
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///hudong_pedia.csv" AS line
CREATE (p:HudongItem{title:line.title,image:line.image,detail:line.detail,url:line.url,openTypeList:line.openTypeList,baseInfoKeyList:line.baseInfoKeyList,baseInfoValueList:line.baseInfoValueList})

Result: Added 113037 labels, created 113037 nodes, set 791259 properties, completed after 18105 ms.
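
LOAD CSV WITH HEADERS reads each property by column name, so a missing or misspelled column yields null values instead of an error. A minimal Python sketch for checking the header before running the import is shown below. The helper name `missing_columns` is hypothetical, and the column list is simply copied from the sample header above.

```python
import csv
import io

# Columns that the CREATE statement reads from each CSV row.
EXPECTED_COLUMNS = [
    "title", "url", "image", "openTypeList",
    "detail", "baseInfoKeyList", "baseInfoValueList",
]


def missing_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file has no header at all
    return [name for name in expected if name not in header]


sample = (
    '"title","url","image","openTypeList","detail",'
    '"baseInfoKeyList","baseInfoValueList"\n'
)
# An empty list means every column the import needs is present.
print(missing_columns(sample))
```

For the real file, the same function could be given the first line of hudong_pedia.csv read with UTF-8 encoding.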

// Import the additional file hudong_pedia2.csv
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///hudong_pedia2.csv" AS line
CREATE (p:HudongItem{title:line.title,image:line.image,detail:line.detail,url:line.url,openTypeList:line.openTypeList,baseInfoKeyList:line.baseInfoKeyList,baseInfoValueList:line.baseInfoValueList})

Result: Added 36892 labels, created 36892 nodes, set 258244 properties, completed after 7007 ms.

// Add a UNIQUE constraint on the title property (the constraint also serves as an index)
CREATE CONSTRAINT ON (c:HudongItem)
ASSERT c.title IS UNIQUE

Result: Added 1 constraint, completed after 1715 ms.
[Figure: sample of the imported HudongItem nodes]

3.2. Import NewNode node data

Go to /wikidataSpider/wikidataProcessing and copy the three files new_node.csv, wikidata_relation.csv, and wikidata_relation2.csv into Neo4j's import folder.

A sample of the data:

title,lable
药物治疗,newNode
膳食纤维,newNode
Boven Merwede,newNode
亚美尼亚苏维埃百科全书,newNode
Linge,newNode
男性人名,newNode
爱沙尼亚语,newNode
Bishop Creek,newNode
Category:迷你电脑,newNode
1430年代,newNode
荷兰,newNode
氢,newNode
// Import the new nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///new_node.csv" AS line
CREATE (:NewNode { title: line.title })

Result: Added 96670 labels, created 96670 nodes, set 96670 properties, completed after 5508 ms.

// Add a unique constraint (index) on title
CREATE CONSTRAINT ON (c:NewNode)
ASSERT c.title IS UNIQUE

Result: Added 1 constraint, completed after 1003 ms.
[Figure: part of the NewNode data graph]

3.3. Import relationship data

Import the RELATION relationships between HudongItem nodes and NewNode nodes.
A sample of the data:

HudongItem,relation,NewNode
菊糖,instance of,药物治疗
菊糖,subclass of,膳食纤维
瓦尔,mouth of the watercourse,Boven Merwede
菊糖,described by source,亚美尼亚苏维埃百科全书
瓦尔,tributary,Linge
Arnold,instance of,男性人名
Arnold,language of work or name,爱沙尼亚语
Bishop,named after,Bishop Creek
小型计算机,topic's main category,Category:迷你电脑
1430年,part of,1430年代
瓦尔,country,荷兰
// Import the RELATION relationships between HudongItem and NewNode
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///wikidata_relation2.csv" AS line
MATCH (entity1:HudongItem{title:line.HudongItem}), (entity2:NewNode{title:line.NewNode})
CREATE (entity1)-[:RELATION { type: line.relation }]->(entity2)
Result: Set 166059 properties, created 166059 relationships, completed after 15865 ms.
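
Because the statement uses MATCH, a CSV row creates a relationship only when both endpoint nodes already exist; rows whose title is missing on either side are skipped without any message. The sketch below uses small in-memory stand-ins rather than a live Neo4j connection to show one way of listing such rows in advance. The helper name `unmatched_rows` and the toy title sets are illustrative assumptions.

```python
import csv
import io


def unmatched_rows(csv_text, source_titles, target_titles,
                   source_col="HudongItem", target_col="NewNode"):
    """Return the rows whose source or target title has no corresponding node."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row[source_col] not in source_titles
        or row[target_col] not in target_titles
    ]


# Toy stand-ins for the titles already imported under each label.
hudong_titles = {"菊糖", "瓦尔"}
new_node_titles = {"药物治疗", "膳食纤维", "Linge"}

relations = (
    "HudongItem,relation,NewNode\n"
    "菊糖,instance of,药物治疗\n"
    "瓦尔,tributary,Linge\n"
    "Arnold,instance of,男性人名\n"
)

# Only the Arnold row is reported: neither of its endpoints is in the toy sets.
for row in unmatched_rows(relations, hudong_titles, new_node_titles):
    print(row["HudongItem"], row["relation"], row["NewNode"])
```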

Import the relationships between pairs of HudongItem nodes (HudongItem1 and HudongItem2). A sample of the data:

HudongItem1,relation,HudongItem2
菊糖,instance of,化合物
菊糖,instance of,多糖
瓦尔,instance of,河流
菊糖,subclass of,食物
瓦尔,origin of the watercourse,莱茵河
纳木错,instance of,湖泊
纳木错,basin country,中华人民共和国
菊糖,has part,氧
温县,instance of,县
USING PERIODIC COMMIT 1000
LOAD CSV  WITH HEADERS FROM "file:///wikidata_relation.csv" AS line
MATCH (entity1:HudongItem{title:line.HudongItem1}) , (entity2:HudongItem{title:line.HudongItem2})
CREATE (entity1)-[:RELATION { type: line.relation }]->(entity2)
Result: Set 58958 properties, created 58958 relationships, completed after 5937 ms.

Import the relationships between entities and attributes and add them to RELATION.
Put attributes.csv in Neo4j's import directory.

A sample of the data:

Entity,AttributeName,Attribute
密度板,别名,纤维板
葡萄蔓枯病,主要为害部位,枝蔓
坎德拉,性别,男
坎德拉,国籍,法国
坎德拉,场上位置,后卫
转子莲,界,植物界
贝叶斯,出生地,伦敦
贝叶斯,国籍,英国
丁加,国籍,巴西
丁加,运动项目,足球
// Entity is a HudongItem, attribute value is a HudongItem
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:HudongItem{title:line.Entity}), (entity2:HudongItem{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2);
Result: Set 73391 properties, created 73405 relationships, completed after 7113 ms.

// Entity is a HudongItem, attribute value is a NewNode
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:HudongItem{title:line.Entity}), (entity2:NewNode{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2);
Result: Set 11747 properties, created 11748 relationships, completed after 4101 ms.

// Entity is a NewNode, attribute value is a NewNode
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:NewNode{title:line.Entity}), (entity2:NewNode{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2);
Result: Set 271 properties, created 271 relationships, completed after 2563 ms.

// Entity is a NewNode, attribute value is a HudongItem
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:NewNode{title:line.Entity}), (entity2:HudongItem{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2)
Result: Set 1464 properties, created 1464 relationships, completed after 2571 ms.
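
The four statements above differ only in the labels of the two matched nodes, covering every combination of HudongItem and NewNode. As an illustration only (the author ran them by hand), a short Python sketch can generate the same four statements from one template. `TEMPLATE` and `attribute_import_statements` are hypothetical names, not part of the project.

```python
from itertools import product

# Cypher template; doubled braces are literal braces after str.format().
TEMPLATE = (
    "USING PERIODIC COMMIT 1000\n"
    'LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line\n'
    "MATCH (entity1:{src}{{title:line.Entity}}), "
    "(entity2:{dst}{{title:line.Attribute}})\n"
    "CREATE (entity1)-[:RELATION {{ type: line.AttributeName }}]->(entity2);"
)

LABELS = ("HudongItem", "NewNode")


def attribute_import_statements(labels=LABELS):
    """Build one import statement per (entity label, attribute label) pair."""
    return [
        TEMPLATE.format(src=src, dst=dst)
        for src, dst in product(labels, repeat=2)
    ]


for statement in attribute_import_statements():
    print(statement)
    print()
```

Each generated statement can be pasted into the Neo4j browser one at a time, just like the handwritten versions.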

[Figure: part of the RELATION relationship graph]

3.4. Import Weather node data

Place wikidataSpider/weatherData/static_weather_list.csv in Neo4j's import folder.

A sample of the data:

title
亚热带季风气候
亚热带海洋性季风气候
暖温带半湿润性季风气候
海洋性温带暖热性气候
暖温带大陆性季风气候
中温带大陆性季风气候
中温带湿润气候
亚热带季风性湿润气候
大陆高原气候
亚热带湿润温和型气
// Import the nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///static_weather_list.csv" AS line
MERGE (:Weather { title: line.title })
Result: Added 144 labels, created 144 nodes, set 144 properties, completed after 346 ms.

// Add a unique constraint (index) on title
CREATE CONSTRAINT ON (c:Weather)
ASSERT c.title IS UNIQUE
Result: Added 1 constraint, completed after 322 ms.

[Figure: part of the Weather node graph]

3.5. Import weather relationship data

Import the Weather2Plant relationships between Weather nodes and HudongItem (plant) nodes.
A sample of the data:

Weather,relation,Plant
亚热带季风气候,适合种植,冬寒
亚热带大陆性季风气候,适合种植,冬寒
亚热带季风气候为主,适合种植,冬寒
中亚热带季风气候区,适合种植,冬寒
北亚热带季风气候区,适合种植,冬寒
中亚热带季风气候,适合种植,冬寒
北亚热带季风气候,适合种植,冬寒
// Place wikidataSpider/weatherData/weather_plant.csv in the import folder
// Import the relationships between HudongItem and the newly added nodes
USING PERIODIC COMMIT 1000
LOAD CSV  WITH HEADERS FROM "file:///weather_plant.csv" AS line
MATCH (entity1:Weather{title:line.Weather}) , (entity2:HudongItem{title:line.Plant})
CREATE (entity1)-[:Weather2Plant { type: line.relation }]->(entity2)

[Figure: part of the Weather2Plant relationship graph]

Import the relationships between city nodes and Weather nodes.
A sample of the data:

city,relation,weather
珠海市,气候,亚热带季风气候
莆田市,气候,亚热带海洋性季风气候
日照市,气候,暖温带半湿润性季风气候
大连市,气候,海洋性温带暖热性气候
迁安市,气候,暖温带大陆性季风气候
深圳市,气候,亚热带海洋性季风气候
乌兰察布市,气候,中温带大陆性季风气候
桦甸市,气候,中温带湿润气候
漳州市,气候,亚热带季风性湿润气候
// Import the climate of each city
// Place city_weather.csv in the import folder
// (this step takes about 15 minutes)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///city_weather.csv" AS line
MATCH (city{title:line.city}) , (weather{title:line.weather})
CREATE (city)-[:CityWeather { type: line.relation }]->(weather)
Result: Set 408 properties, created 408 relationships, completed after 261938 ms.
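
The MATCH in this statement gives neither node a label, so Neo4j cannot use the per-label title constraints created earlier and has to check nodes of every label for each CSV row. That likely explains why 408 relationships take far longer than the much larger imports above. The toy Python model below is not Neo4j code; it only contrasts the two lookup strategies by counting how many nodes each one touches. All names and the node count are made up for illustration.

```python
def scan_lookup(nodes, title):
    """Unlabeled lookup: every node is examined to find matching titles."""
    matches = [node for node in nodes if node["title"] == title]
    return matches, len(nodes)  # cost = number of nodes examined


def index_lookup(index, title):
    """Labeled lookup on an indexed property: a single hash lookup."""
    node = index.get(title)
    return ([node] if node is not None else []), 1


# 1000 placeholder HudongItem nodes plus one Weather node.
nodes = [{"label": "HudongItem", "title": "item-%d" % i} for i in range(1000)]
nodes.append({"label": "Weather", "title": "亚热带季风气候"})

# Stand-in for the index behind the unique constraint on :Weather(title).
weather_index = {n["title"]: n for n in nodes if n["label"] == "Weather"}

_, scan_cost = scan_lookup(nodes, "亚热带季风气候")
_, index_cost = index_lookup(weather_index, "亚热带季风气候")
print(scan_cost, index_cost)  # prints: 1001 1
```

If the city entries are stored under a known label, adding that label and :Weather to the two patterns should let Neo4j use the indexes. The source does not say which label the cities carry, so check that before changing the statement.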

[Figure: part of the CityWeather relationship graph]

4. Set the Neo4j credentials

Open demo/Model/neo_models.py and change the Neo4j username and password on line 9 to your own.
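
As an alternative to hard-coding the password in the source file, the credentials could be read from environment variables. The sketch below is one possible arrangement and not the project's code. The variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and the helper `neo4j_settings` are assumptions, and how the values are then passed to py2neo depends on the installed py2neo version.

```python
import os


def neo4j_settings(env=os.environ):
    """Collect Neo4j connection settings, falling back to common defaults."""
    return {
        # Hypothetical variable names; 7474 is Neo4j's default HTTP port.
        "uri": env.get("NEO4J_URI", "http://localhost:7474"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }


# Passing a plain dict here stands in for the real environment.
settings = neo4j_settings({"NEO4J_PASSWORD": "my-secret"})
print(settings["uri"], settings["user"])
```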

5. Start the Django server

Since I have Bash installed, I can go to the demo directory and run the script:

sh django_server_start.sh

Alternatively, go to the demo directory and start the Django server directly:

python manage.py runserver

The output is as follows:
[Figure: server start-up output]
The server reports unapplied migrations:

You have 17 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.

Reference: https://blog.csdn.net/xufeng0991/article/details/40421857
Apply the migrations:

python manage.py migrate

The output is as follows:
[Figure: output of python manage.py migrate]
Start the server again; the output is as follows:
[Figure: server output after the migrations are applied]

6. Basic functions

Entity recognition

Enter the following text:

袁隆平是杂交水稻研究领域的开创者和带头人,致力于杂交水稻的研究,先后成功研发出“三系法”杂交 水稻、“两系法”杂交水稻、超级杂交稻一期、二期,与此同时,袁隆平提出并实施“种三产四丰产工程”,运用超级杂交稻的技术成果,出版中、英文专著6部,发表论文60余篇。2017年7月,任青岛海水稻学院首席教授。2017年9月,袁隆平宣布一项剔除水稻中重金属镉的新成果。2018年4月14日,袁隆平在海南接受凤凰财经采访时发表了对转基因的看法。对于转基因大豆,袁隆平指出,只要是通过安全检测的转基因作物,都是没有问题的。袁隆平表示,转基因是农业的未来发展方向。

The page shows the results of entity recognition and word segmentation:
[Figure: entity recognition and word segmentation results]
Click a recognized entity to follow its hyperlink:
[Figure: entity page reached from the hyperlink]
There are many more features; once the project is deployed, you can explore them yourself.

Origin blog.csdn.net/weixin_41104835/article/details/89213327