Background
This is a demo project I did in September last year while researching knowledge graphs and recommendation. It started when I found an open-source knowledge-graph project for the automotive industry on github; I mainly modified it into a knowledge-graph-based recommendation system for films and TV series.
Environment
python3, the flask web framework, and the graph database neo4j (3.3.1)
The operating system is windows 10
Project framework
After cloning the car project mentioned above, the overall project structure is shown in the figure below.
There are two versions of the project, the first acceptance and the second acceptance. The main difference between them is the database: the former uses mysql and the latter uses neo4j. I based my transformation on the second acceptance. Open the second-acceptance project; its internal structure is shown in the figure below.
Process analysis
Below we analyze the original project's workflow step by step, because only by understanding it can we complete the transformation.
Data reading and insertion
First of all, we need to insert the data into neo4j, so we must start neo4j first: open cmd and enter the following command
neo4j console
If cmd then displays the following message, neo4j has started
The available address http://localhost:7474 shown in the last line is where we visit neo4j: open a browser, paste this address into the address bar, press Enter, and you will see the neo4j console interface, as shown below
After the database is started, you can open the kg\kg.py file in the project. In it, the main code is as follows
def data_init(self):
    # connect to the graph database
    print('开始数据预处理')
    self.graph = Graph('http://localhost:7474', user="neo4j", password="szc")
    self.selector = NodeSelector(self.graph)
    self.graph.delete_all()

def insert_datas(self):
    print('开始插入数据')
    with open('../data/tuples/three_tuples_2.txt', 'r', encoding='utf-8') as f:
        lines, num = f.readlines(), -1
        for line in lines:
            num += 1
            if num % 500 == 0:
                print('当前处理进度:{}/{}'.format(lines.index(line), len(lines)))
            line = line.strip().split(' ')
            if len(line) != 3:
                print('insert_datas错误:', line)
                continue
            self.insert_one_data(line)

def insert_one_data(self, line):
    if '' in line:
        print('insert_one_data错误', line)
        return
    start = self.look_and_create(line[0])
    for name in self.get_items(line[2]):
        end = self.look_and_create(name)
        r = Relationship(start, line[1], end, name=line[1])
        self.graph.create(r)  # does not create a new one if it already exists

# check whether the node exists; create one if it does not
def look_and_create(self, name):
    end = self.graph.find_one(label="car_industry", property_key="name", property_value=name)
    if end == None:
        end = Node('car_industry', name=name)
    return end

def get_items(self, line):
    if '{' not in line and '}' not in line:
        return [line]
    # sanity check
    if '{' not in line or '}' not in line:
        print('get_items Error', line)
    lines = [w[1:-1] for w in re.findall('{.*?}', line)]
    return lines
The data_init() function at the top connects to the neo4j database: just pass in the database address, user name, and password. It then calls graph.delete_all() to clear the old data before inserting; whether to keep this step depends on your own business scenario.
Then comes the insert_datas() function. It reads the txt file, iterates over the lines, and calls insert_one_data() on each line to parse it and create nodes and relationships. From the code you can see that each row has the form "start-point relation end-point", such as "Anyang location Yubei", meaning the relation between the entities Anyang and Yubei is "location", in the direction Anyang --location--> Yubei.
When insert_one_data() runs, it first queries whether a node with the same name already exists in the database and, depending on the result, decides whether to reuse the existing node or build a new one. This step corresponds to the function look_and_create().
In look_and_create(), "car_industry" is the node label (I understand it as roughly corresponding to the name of a database in MySQL, the one you select with the command use some_database). In the find_one() call, property_key matches the name keyword parameter of Node's constructor, and property_value is that parameter's value, i.e. the entity name. Taking my hometown, Anyang City, as an example, its storage in neo4j can be understood as {property_key: "name", property_value: "Anyang"}.
The final get_items() function checks the legality of an entity string and needs no further interpretation.
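To make this concrete, get_items() can be exercised standalone. A hedged sketch: the brace-wrapped multi-value endpoint form is my reading of the code above, and the sample strings are invented.

```python
import re

def get_items(line):
    # A plain endpoint is returned as a single-item list.
    if '{' not in line and '}' not in line:
        return [line]
    # Otherwise every {...} group is one endpoint; w[1:-1] strips the braces.
    return [w[1:-1] for w in re.findall('{.*?}', line)]

print(get_items('豫北'))          # ['豫北']
print(get_items('{侯勇}{刘劲}'))   # ['侯勇', '刘劲']
```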
Run the service
After all the data is inserted into the database, we can run our service. This corresponds to run_server.py, whose code is as follows
if __name__ == '__main__':
    args = get_args()
    print('\nhttp_host:{},http_port:{}'.format('localhost', args.http_port))
    app.run(debug=True, host='210.41.97.169', port=8090)
The key is the app.run() call: just replace the IP and port with your own.
Handling page requests
Our business logic is: enter the url and parameters in the browser and get back the relevant results.
The parameter handling corresponds to the file views.py, whose main code is as follows
@app.route('/KnowGraph/v2', methods=["POST"])
def look_up():
    kg = KnowGraph(get_args())
    client_params = request.get_json(force=True)
    server_param = {}
    if client_params['method'] == 'entry_to_entry':
        kg.lookup_entry2entry(client_params, server_param)
    elif client_params['method'] == 'entry_to_property':
        kg.lookup_entry2property(client_params, server_param)
    elif client_params['method'] == 'entry':
        kg.lookup_entry(client_params, server_param)
    elif client_params['method'] == 'statistics':
        kg.lookup_statistics(client_params, server_param)
    elif client_params['method'] == 'live':
        params = {'success': 'true'}
        server_param['result'] = params
    server_param['id'] = client_params['id']
    server_param['jsonrpc'] = client_params['jsonrpc']
    server_param['method'] = client_params['method']
    print(server_param)
    return json.dumps(server_param, ensure_ascii=False).encode("utf-8")
As you can see, POST requests to the /KnowGraph/v2 path are routed to the look_up() function, which dispatches to different methods of the kg object according to the value of the method parameter, executing different query logic.
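The if/elif chain above is essentially a dispatch on the method field. A standalone sketch of the same pattern; the handler bodies here are stand-ins, not the project's real query logic:

```python
def lookup_entry(params):
    # Stand-in for KnowGraph.lookup_entry(): would query neo4j here.
    return {'result': {'edges': [], 'success': 'true'}}

def live(params):
    # Health-check branch: always succeeds.
    return {'result': {'success': 'true'}}

# One entry per supported value of the "method" field.
HANDLERS = {'entry': lookup_entry, 'live': live}

def route(client_params):
    handler = HANDLERS.get(client_params['method'])
    if handler is None:
        return {'result': {'success': 'false'}}
    response = handler(client_params.get('params', {}))
    # Echo the JSON-RPC envelope fields back, as look_up() does.
    for key in ('id', 'jsonrpc', 'method'):
        response[key] = client_params[key]
    return response

print(route({'method': 'live', 'id': 1, 'jsonrpc': 2.0}))
```

A dispatch table keeps each method's logic in its own function and makes adding a new method a one-line change.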
However, when we enter a path and parameters in the browser and press Enter, the browser issues a GET request, while the file above handles only POST. Moreover, no route feeds data into a flask template. So this file needs substantial changes.
Data query
As just mentioned, views.py calls different methods of the kg object, according to the value of the method parameter, to obtain different results.
The KnowGraph class that the kg object belongs to is in the file module.py. Taking the simplest, most basic entity query as an example, let's see how it is implemented. It corresponds to the lookup_entry function; the code is as follows
def lookup_entry(self, client_params, server_param):
    # supports setting the search depth
    start_time = time.time()
    params = client_params["params"]
    edges = set()
    self.lookup_entry_deep(edges, params, 0)
    if len(edges) == 0:
        server_param['result'] = {"success": 'false'}
    else:
        server_param['result'] = {'edges': [list(i) for i in edges], "success": 'true'}
    print('本次查找三元组的数量为:{},耗时:{}s'.format(len(edges), time.time() - start_time))
Besides timing, this function mainly extracts params from the client parameters, which holds the entity name to search for and the search depth. It then calls lookup_entry_deep() to do the search, accumulating results in the edges set; finally each item of the set is converted to a list and stored under 'edges' in the 'result' entry of server_param, which is returned.
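Because the node names are Chinese, the ensure_ascii=False flag used when dumping results matters. A small sketch of the tuple-to-list conversion and the dump, with invented edges:

```python
import json

edges = {('上将许世友', 'director', '安澜'), ('上将许世友', 'language', '普通话')}

# A set of tuples is not JSON-serializable, so each triple becomes a list.
result = {'edges': [list(e) for e in sorted(edges)], 'success': 'true'}

# ensure_ascii=False keeps the Chinese names readable instead of \uXXXX escapes.
dumped = json.dumps(result, ensure_ascii=False)
print(dumped)
```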
Next, let's take a look at the implementation of the lookup_entry_deep function, the code is as follows
def lookup_entry_deep(self, edges, params, deep):
    # the current depth must not reach the requested depth
    if deep >= params['deep']:
        return
    # forward search
    result1 = self.graph.data("match (s)-[r]->(e) where s.name='{}' return s.name,r.name,e.name".format(params['name']))
    result2 = self.graph.data("match (e)<-[r]-(s) where e.name='{}' return s.name,r.name,e.name".format(params['name']))
    if len(result1) == 0 and len(result2) == 0:
        return
    for item in result1:
        edges.add((item['s.name'], item['r.name'], item['e.name']))
        if item['s.name'] != item['e.name']:  # avoid self-loops such as 双面胶:中文名:双面胶
            params['name'] = item['e.name']
            self.lookup_entry_deep(edges, params.copy(), deep + 1)
    for item in result2:
        edges.add((item['s.name'], item['r.name'], item['e.name']))
        if item['s.name'] != item['e.name']:  # avoid self-loops such as 双面胶:中文名:双面胶
            params['name'] = item['e.name']
            self.lookup_entry_deep(edges, params.copy(), deep + 1)
First, if the depth exceeds the limit, return immediately. Then take the name entry in params (the entity to search for), run forward and reverse queries against the database, save each hit as a tuple in the edges set, and recurse with depth + 1.
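To see the recursion without a running neo4j, the same depth-limited expansion can be sketched over an in-memory list of triples. The triples below are invented, and the membership check on edges is a simplification of the params.copy() scheme above:

```python
def lookup_deep(triples, name, max_deep, edges, deep=0):
    # Stop once the requested depth is reached.
    if deep >= max_deep:
        return
    # Forward matches (name as start) and reverse matches (name as end).
    hits = [t for t in triples if t[0] == name or t[2] == name]
    for s, r, e in hits:
        if (s, r, e) in edges:
            continue
        edges.add((s, r, e))
        if s != e:  # guard against self-loops like 双面胶 -> 中文名 -> 双面胶
            nxt = e if s == name else s
            lookup_deep(triples, nxt, max_deep, edges, deep + 1)

triples = [
    ('上将许世友', 'director', '安澜'),
    ('BBB', 'director', '安澜'),
    ('BBB', 'actor', '刘劲'),
]
edges = set()
lookup_deep(triples, '上将许世友', 2, edges)
print(edges)
```

With a depth of 2, the search reaches 安澜 (level 1) and the other series BBB directed by 安澜 (level 2), but not BBB's own actors.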
Retrofit
The existing workflow is as described above. Next we adapt it to the film-and-TV recommendation scenario.
Suppose a user has watched the TV series "上将XXX"; we can recommend other films and TV series that may interest him based on its director, actors, country, language, and genre tags.
Data Format
Our files are all stored in the wiki directory. They are txt files in which every line is a JSON object; a single line looks like this
{
    .....
    "title": "上将XXX",
    "wikiData": {
        .....
        "wikiInfo": {
            "country": "中国大陆",
            "language": "普通话",
            "directors": [
                "安澜"
            ],
            "actors": [
                "宋春丽",
                "王伍福",
                "张秋歌",
                "范明",
                "刘劲",
                "陶慧敏",
                "侯勇"
            ],
            ....
        },
        ....
        "wikiTags": [
            "电视剧",
            "历史",
            "战争",
            "军旅",
            "革命",
            "动作",
            "热血",
            "激昂",
            "24-36",
            "36-45",
            "45-55",
            "55-70",
            "上星剧",
            "传记"
        ]
    }
}
The useful fields are formatted as shown above, for example directors and actors.
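Since each file is effectively JSON Lines (one independent object per line), the read loop boils down to json.loads per line. A self-contained sketch with an inline sample; the second line deliberately lacks a title and is skipped, mirroring the insertion code:

```python
import json

sample = (
    '{"title": "上将XXX", "wikiData": {"wikiInfo": {"language": "普通话", '
    '"directors": ["安澜"]}, "wikiTags": ["电视剧", "历史"]}}\n'
    '{"no_title_here": true}\n'
)

records = []
for line in sample.splitlines():
    item = json.loads(line)
    # Lines without a title are skipped.
    if 'title' not in item:
        continue
    records.append((item['title'], item['wikiData']['wikiInfo'].get('directors', [])))

print(records)  # [('上将XXX', ['安澜'])]
```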
Next we can carry out the transformation, following the workflow sorted out while analyzing the original project
Data reading and insertion
This corresponds to the kg.py file. First define our data directory path
data_dir = "C:\\Users\\songzeceng\\Desktop\\wiki\\"
Then traverse the files in this directory, reading and parsing each one; the code is as follows
def insert_data_from_txt(self, file_path):
    try:
        with open(file=file_path, mode="r", encoding="utf-8") as f:
            for line in f.readlines():
                item = json.loads(line)
                if 'title' not in item.keys():
                    continue
                title = self.look_and_create(item['title'])
                if 'wikiData' not in item.keys():
                    continue
                wikiData = item['wikiData']
                if 'wikiDesc' in wikiData.keys():
                    wikiDesc = self.look_and_create(wikiData['wikiDesc'])
                    self.create_sub_graph(entity1=title, entity2=wikiDesc, relation="desc")
                if 'wikiTags' in wikiData.keys():
                    for tag in wikiData['wikiTags']:
                        tag = self.look_and_create(tag)
                        self.create_sub_graph(entity1=title, entity2=tag, relation="tag")
                wikiInfo = wikiData['wikiInfo']
                if 'country' in wikiInfo.keys():
                    country = self.look_and_create(wikiInfo['country'])
                    self.create_sub_graph(entity1=title, entity2=country, relation="country")
                if 'language' in wikiInfo.keys():
                    language = self.look_and_create(wikiInfo['language'])
                    self.create_sub_graph(entity1=title, entity2=language, relation="language")
                if 'actors' in wikiInfo.keys():
                    for actor in wikiInfo['actors']:
                        actor = self.look_and_create(actor)
                        self.create_sub_graph(entity1=title, entity2=actor, relation="actor")
                if 'directors' in wikiInfo.keys():
                    for director in wikiInfo['directors']:
                        director = self.look_and_create(director)
                        self.create_sub_graph(entity1=title, entity2=director, relation="director")
        print(file_path, "读取完毕")
    except Exception as e:
        print("文件" + file_path + "读取异常:" + str(e))
Though long, it is really just parsing each item: first find or create an entity, which corresponds to the function look_and_create. Since my py2neo version differs from the original project's, I rewrote this function; the code is as follows
def look_and_create(self, name):
    matcher = NodeMatcher(self.graph)
    end = matcher.match("car_industry", name=name).first()
    if end == None:
        end = Node('car_industry', name=name)
    return end
Then create the entity relationship, corresponding to the function create_sub_graph, the code is as follows
def create_sub_graph(self, entity1, relation, entity2):
    r = Relationship(entity1, relation, entity2, name=relation)
    self.graph.create(r)
The entire kg.py file is as follows
# coding:utf-8
'''
Created on 2018年1月26日
@author: qiujiahao
@email:[email protected]
'''
import sys
import re
import os
import json

sys.path.append('..')
from conf import get_args
from py2neo import Node, Relationship, Graph, NodeMatcher
import pandas as pd

data_dir = "C:\\Users\\songzeceng\\Desktop\\wiki\\"

class data(object):
    def __init__(self):
        self.args = get_args()
        self.data_process()

    def data_process(self):
        # initialize the connection and insert the data
        self.data_init()
        print("数据预处理完毕")

    def data_init(self):
        # connect to the graph database
        print('开始数据预处理')
        self.graph = Graph('http://localhost:7474', user="neo4j", password="szc")
        # self.graph.delete_all()
        file_names = os.listdir(data_dir)
        for file_name in file_names:
            self.insert_data_from_txt(data_dir + file_name)

    def insert_data_from_txt(self, file_path):
        try:
            with open(file=file_path, mode="r", encoding="utf-8") as f:
                for line in f.readlines():
                    item = json.loads(line)
                    if 'title' not in item.keys():
                        continue
                    title = self.look_and_create(item['title'])
                    # id = self.look_and_create(item['id'])
                    # self.create_sub_graph(entity1=title, entity2=id, relation="title")
                    if 'wikiData' not in item.keys():
                        continue
                    wikiData = item['wikiData']
                    if 'wikiDesc' in wikiData.keys():
                        wikiDesc = self.look_and_create(wikiData['wikiDesc'])
                        self.create_sub_graph(entity1=title, entity2=wikiDesc, relation="desc")
                    if 'wikiTags' in wikiData.keys():
                        for tag in wikiData['wikiTags']:
                            tag = self.look_and_create(tag)
                            self.create_sub_graph(entity1=title, entity2=tag, relation="tag")
                    wikiInfo = wikiData['wikiInfo']
                    if 'country' in wikiInfo.keys():
                        country = self.look_and_create(wikiInfo['country'])
                        self.create_sub_graph(entity1=title, entity2=country, relation="country")
                    if 'language' in wikiInfo.keys():
                        language = self.look_and_create(wikiInfo['language'])
                        self.create_sub_graph(entity1=title, entity2=language, relation="language")
                    if 'actors' in wikiInfo.keys():
                        for actor in wikiInfo['actors']:
                            actor = self.look_and_create(actor)
                            self.create_sub_graph(entity1=title, entity2=actor, relation="actor")
                    if 'directors' in wikiInfo.keys():
                        for director in wikiInfo['directors']:
                            director = self.look_and_create(director)
                            self.create_sub_graph(entity1=title, entity2=director, relation="director")
            print(file_path, "读取完毕")
        except Exception as e:
            print("文件" + file_path + "读取异常:" + str(e))

    def create_sub_graph(self, entity1, relation, entity2):
        r = Relationship(entity1, relation, entity2, name=relation)
        self.graph.create(r)

    def look_and_create(self, name):
        matcher = NodeMatcher(self.graph)
        end = matcher.match("car_industry", name=name).first()
        if end == None:
            end = Node('car_industry', name=name)
        return end

if __name__ == '__main__':
    data = data()
Run it, and the command line output is as shown in the figure below.
The data is not standardized and many files could not be read; no matter, it is only a demo. Then fetch 25 nodes from the neo4j database; the result is shown in the figure below.
Run the service
Here you can simply change the IP and port in run_server.py to your own.
Processing request
This step corresponds to views.py.
First, we need to intercept GET requests on the /KnowGraph/v2 path, so we add a routed function, as shown below
@app.route('/KnowGraph/v2', methods=["GET"])
def getInfoFromServer():
    pass
Then implement this function. First process the request parameters; the complete request url originally looks like this
http://localhost:8090/KnowGraph/v2?method=entry&jsonrpc=2.0&id=1&params=entry=上将许世友-deep=2
There are many parameters, and many of them are fixed (jsonrpc, id, and so on), so I simplified the url to
http://localhost:8090/KnowGraph/v2?name=上将许世友
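In the original url, the params value packs several key=value pairs separated by '-'. A standalone sketch of that parsing; note that a '-' inside an entity name would break this separator scheme:

```python
def parse_params(value):
    # "entry=上将许世友-deep=2" -> {'entry': '上将许世友', 'deep': 2}
    out = {}
    for pair in value.split('-'):
        k, v = pair.split('=', 1)
        # Numeric values such as the search depth are coerced to int.
        out[k] = int(v) if v.isnumeric() else v
    return out

print(parse_params('entry=上将许世友-deep=2'))
```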
Then, for the getInfoFromServer() function, fill in all the default parameters; the code is as follows
def handle_args(originArgs):
    if 'name' not in originArgs.keys():
        return None
    args = {}
    for item in originArgs:
        key = item
        value = originArgs[key]
        if key == "params":
            kvs = str(value).split("-")
            kv_dic = {}
            for item in kvs:
                kv = item.split("=")
                k = kv[0]
                v = kv[1]
                if v.isnumeric():
                    kv_dic[k] = int(v)
                else:
                    kv_dic[k] = v
            args[key] = kv_dic
        else:
            if value.isnumeric():
                args[key] = int(value)
            else:
                args[key] = value
    if 'params' not in args.keys():
        args['params'] = {
            'name': args['name']
        }
        args.pop('name')
    args['params']['name'] = args['params']['name'].replace('\'', '\\\'')
    if 'method' not in args.keys():
        args['method'] = 'entry'
    if 'deep' not in args['params'].keys():
        args['params']['deep'] = 2
    if 'jsonrpc' not in args.keys():
        args['jsonrpc'] = 2.0
    if 'id' not in args.keys():
        args['id'] = 1
    return args
In fact, it is mainly traversal and default-filling.
Once the parameters are processed, we can run different queries based on the method field, read the result from the result field of server_param, and hand it to the front end to render the page. The getInfoFromServer() function can therefore be written as follows
@app.route('/KnowGraph/v2', methods=["GET"])
def getInfoFromServer():
    args = handle_args(request.args.to_dict())
    kg = KnowGraph(args)
    client_params = args
    server_param = {}
    if client_params['method'] == 'entry':
        kg.lookup_entry(client_params, server_param)
    server_param['id'] = client_params['id']
    server_param['jsonrpc'] = client_params['jsonrpc']
    server_param['method'] = client_params['method']
    print("server_param:\n", server_param)
    global mydata
    if 'result' in server_param.keys():
        mydata = server_param['result']
    else:
        mydata = '{}'
    print("mydata:\n", mydata)
    return render_template("index.html")
Here we handle only the entity query, because our input is the name of a film or TV series the user has watched.
When the page renders, the front end fetches its data through the /KnowGraph/data path, so we intercept that as well; the code is as follows
@app.route("/KnowGraph/data")
def data():
    print("data:", mydata)
    return mydata
The entire views.py file is as follows
# coding:utf-8
'''
Created on 2018年1月9日
@author: qiujiahao
@email:[email protected]
'''
from flask import jsonify
from conf import *
from flask import Flask
from flask import request, render_template
from server.app import app
import tensorflow as tf
from server.module import KnowGraph
import json

mydata = ""

# http://210.41.97.89:8090/KnowGraph/v2?name=胜利之路
# http://113.54.234.209:8090/KnowGraph/v2?name=孤战
# http://localhost:8090/KnowGraph/v2?method=entry_to_property&jsonrpc=2.0&id=1&params=entry=水冶-property=位置

@app.route('/KnowGraph/v2', methods=["GET"])
def getInfoFromServer():
    args = handle_args(request.args.to_dict())
    kg = KnowGraph(args)
    client_params = args
    server_param = {}
    if client_params['method'] == 'entry':
        kg.lookup_entry(client_params, server_param)
    server_param['id'] = client_params['id']
    server_param['jsonrpc'] = client_params['jsonrpc']
    server_param['method'] = client_params['method']
    print("server_param:\n", server_param)
    global mydata
    if 'result' in server_param.keys():
        mydata = server_param['result']
    else:
        mydata = '{}'
    print("mydata:\n", mydata)
    return render_template("index.html")

def handle_args(originArgs):
    if 'name' not in originArgs.keys():
        return None
    args = {}
    for item in originArgs:
        key = item
        value = originArgs[key]
        if key == "params":
            kvs = str(value).split("-")
            kv_dic = {}
            for item in kvs:
                kv = item.split("=")
                k = kv[0]
                v = kv[1]
                if v.isnumeric():
                    kv_dic[k] = int(v)
                else:
                    kv_dic[k] = v
            args[key] = kv_dic
        else:
            if value.isnumeric():
                args[key] = int(value)
            else:
                args[key] = value
    if 'params' not in args.keys():
        args['params'] = {
            'name': args['name']
        }
        args.pop('name')
    args['params']['name'] = args['params']['name'].replace('\'', '\\\'')
    if 'method' not in args.keys():
        args['method'] = 'entry'
    if 'deep' not in args['params'].keys():
        args['params']['deep'] = 2
    if 'jsonrpc' not in args.keys():
        args['jsonrpc'] = 2.0
    if 'id' not in args.keys():
        args['id'] = 1
    return args

@app.route("/KnowGraph/data")
def data():
    print("data:", mydata)
    return mydata
Database query
Finally, we turn our attention to the database query and result parsing in module.py.
To make the output easy to inspect, we also write the results to a json file. The query results are kept in an in-memory dictionary that is cleared before each query; then the query runs, and different parsing logic executes depending on whether there are results. The lookup_entry function can therefore be written as shown below
def lookup_entry(self, client_params, server_param):
    # supports setting the search depth
    start_time = time.time()
    params = client_params["params"]
    edges = set()
    sim_dict.clear()
    self.lookup_entry_deep(edges, params, 0)
    if len(edges) == 0:
        server_param['success'] = 'false'
    else:
        self.handleResult(edges, server_param, start_time)
The entity query itself lives in the lookup_entry_deep() function. In general, our depth is only two levels. At the first level we query the attributes of the series the user watched, such as the director of 上将许世友; at the second level we go from each attribute back to the entities that share it, such as the other main works of that director. Clearly, the first level is a forward search and the second a reverse search.
When searching, to avoid recommending the very series the user just watched, we also need to deduplicate the results. For example, we search for 上将XXX, find that its director is 安澜, and then run a reverse search on 安澜; if 安澜 directed only 上将XXX, there is nothing new, and we must not add the input series itself to the recommendation list.
For the case above where no other entity is found, I defined the return value 'nothing else'; if nothing at all is found, it is 'nothing got'; if the depth limit is exceeded, 'deep out'; and if everything is normal, 'ok'.
We first perform the forward and reverse queries; the code is as follows
result1 = self.graph.run(cypher='''match (s)-[r]->(e) where s.name='{}'
    return s.name,r.name,e.name'''.format(params['name'])).data()
result2 = self.graph.run(cypher='''match (e)<-[r]-(s) where e.name='{}'
    return s.name,r.name,e.name '''.format(params['name'])).data()
Then both results are checked for emptiness; if both are empty, return 'nothing got'
if len(result1) == 0 and len(result2) == 0:
    return 'nothing got'
If result2 (the reverse search result) has only one item, and that item's s.name (the series name) is still the input entity and its e.name (the attribute name) is still the original attribute, return 'nothing else' directly
if len(result2) == 1:
    item = result2[0]
    if origin_tv_name is not None and origin_property_name is not None:
        if origin_property_name == item['e.name'] and origin_tv_name == item['s.name']:
            return 'nothing else'
The origin_tv_name and origin_property_name here are parameters of the lookup_entry_deep function; both default to None.
Then we traverse result1, the forward query result, and for each attribute value (e.name) of the original series (s.name) recurse one level deeper.
for item in result1:
    tv_name = item['s.name']
    property_name = item['e.name']
    has_result = False
    if tv_name != property_name:  # avoid self-loops such as 双面胶:中文名:双面胶
        if oldName != property_name:
            params['name'] = property_name
            has_result = self.lookup_entry_deep(edges, params.copy(), deep + 1,
                                                origin_tv_name=tv_name,
                                                origin_property_name=property_name)
oldName is the entity name of the current query. To avoid an endless loop, an extra check was added; in our scenario this check always holds.
Next, we parse the reverse search results. If a new series is found, we first score the similarity of this hit based on the relation between the new series and the shared attribute. Then we add the new series, the shared attribute, and the similarity into the similarity dictionary and the edges set, either accumulating into an existing entry or creating a new one. The code is as follows
for item in result2:
    tv_name = item['s.name']
    property_name = item['e.name']
    relation_name = item['r.name']
    if tv_name != origin_tv_name:
        score = get_sim_score_accroding_to_relation(relation_name)
        if tv_name not in sim_dict.keys():
            sim_dict[tv_name] = {
                relation_name: [property_name],
                "similarity": score
            }
        else:
            item_dict = sim_dict[tv_name]
            if relation_name in item_dict.keys() and \
                    property_name in item_dict.values():
                continue
            if relation_name in item_dict.keys():
                item_dict[relation_name].append(property_name)
            else:
                item_dict[relation_name] = [property_name]
            item_dict["similarity"] += score
        edges.add((tv_name, relation_name, property_name))
The function get_sim_score_accroding_to_relation(), which scores similarity according to the relation type, is as follows
def get_sim_score_accroding_to_relation(relation_name):
    if relation_name in ['actor', 'director', 'tag']:
        return 1.0
    elif relation_name in ['language', 'country']:
        return 0.5
    return 0.0
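Under these weights, a candidate's similarity is just the sum of the scores of its shared edges. A worked, self-contained sketch with invented candidates:

```python
def relation_score(relation_name):
    # Same weights as get_sim_score_accroding_to_relation() above.
    if relation_name in ('actor', 'director', 'tag'):
        return 1.0
    if relation_name in ('language', 'country'):
        return 0.5
    return 0.0

# Shared edges per candidate: (relation, value) pairs, invented for illustration.
shared = {
    'BBB': [('actor', '刘劲'), ('actor', '王伍福'), ('language', '普通话'), ('tag', '战争')],
    '战将': [('language', '普通话'), ('country', '中国大陆'), ('tag', '历史')],
}

scores = {tv: sum(relation_score(r) for r, _ in items) for tv, items in shared.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # BBB: 1 + 1 + 0.5 + 1 = 3.5; 战将: 0.5 + 0.5 + 1 = 2.0
```

Shared actors, directors, and tags thus weigh twice as much as a shared language or country, which matches the ordering in the output later.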
The complete lookup_entry_deep() function is shown below
# depth-limited search
def lookup_entry_deep(self, edges, params, deep, origin_tv_name=None, origin_property_name=None):
    # the current depth must not reach the requested depth
    if deep >= params['deep']:
        return 'deep out'
    # forward search
    oldName = str(params['name'])
    if oldName.__contains__("\'") and not oldName.__contains__("\\\'"):
        params['name'] = oldName.replace("\'", "\\\'")
    result1 = self.graph.run(cypher='''match (s)-[r]->(e) where s.name='{}'
        return s.name,r.name,e.name'''.format(params['name'])).data()
    result2 = self.graph.run(cypher='''match (e)<-[r]-(s) where e.name='{}'
        return s.name,r.name,e.name '''.format(params['name'])).data()
    if len(result1) == 0 and len(result2) == 0:
        return 'nothing got'
    if len(result2) == 1:
        item = result2[0]
        if origin_tv_name is not None and origin_property_name is not None:
            if origin_property_name == item['e.name'] and origin_tv_name == item['s.name']:
                return 'nothing else'
    for item in result1:
        tv_name = item['s.name']
        property_name = item['e.name']
        if tv_name != property_name:  # avoid self-loops such as 双面胶:中文名:双面胶
            if oldName != property_name:
                params['name'] = property_name
                has_result = self.lookup_entry_deep(edges, params.copy(), deep + 1,
                                                    origin_tv_name=tv_name,
                                                    origin_property_name=property_name)
    for item in result2:
        has_result = False
        tv_name = item['s.name']
        property_name = item['e.name']
        relation_name = item['r.name']
        if tv_name != origin_tv_name:
            score = get_sim_score_accroding_to_relation(relation_name)
            if tv_name not in sim_dict.keys():
                sim_dict[tv_name] = {
                    relation_name: [property_name],
                    "similarity": score
                }
            else:
                item_dict = sim_dict[tv_name]
                if relation_name in item_dict.keys() and \
                        property_name in item_dict.values():
                    continue
                if relation_name in item_dict.keys():
                    item_dict[relation_name].append(property_name)
                else:
                    item_dict[relation_name] = [property_name]
                item_dict["similarity"] += score
            edges.add((tv_name, relation_name, property_name))
    return 'ok'
When the query finishes and there are results, the handleResult() function processes them for returning or writing out. It mainly sorts candidates by similarity from high to low, takes the top 20, and writes them to a json file. This part of the code is as follows
def handleResult(self, edges, server_param, start_time):
    ....
    sorted_sim_list = sorted(sim_dict.items(), key=lambda x: x[1]['similarity'], reverse=True)
    ret = {}
    for i in range(len(sorted_sim_list)):
        if i >= 20:
            break
        ret[sorted_sim_list[i][0]] = sorted_sim_list[i][1]
    mydata = json.dumps(ret, ensure_ascii=False)
    print('Json路径是:%s' % (fname))
    self.clear_and_write_file(fname, mydata)

def clear_and_write_file(self, fname, mydata):
    with open(fname, 'w', encoding='utf-8') as f:
        f.write(str(""))
    with open(fname, 'a', encoding='utf-8') as f:
        f.write(str(mydata))
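One small note on clear_and_write_file(): opening a file with mode 'w' already truncates it, so the write-then-append pair can be collapsed into a single write. A sketch, using a temp path rather than the project's fname:

```python
import json
import os
import tempfile

ret = {'BBB': {'similarity': 3.5, 'language': ['普通话']}}

# mode='w' truncates the file on open, so one write replaces any old contents;
# the separate "clear" pass is not needed.
fname = os.path.join(tempfile.gettempdir(), 'kg_demo_result.json')
with open(fname, 'w', encoding='utf-8') as f:
    f.write(json.dumps(ret, ensure_ascii=False))

with open(fname, encoding='utf-8') as f:
    print(f.read())
```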
In addition, I also stored the results in server_param, to output them to the front-end page. This part of the code is as follows
ret = []
for result in edges:
    ret.append({
        "source": result[0],
        "target": result[2],
        "relation": result[1],
        "label": "relation"
    })
print("ret:", ret)
server_param['result'] = {"edges": ret}
server_param['success'] = 'true'
print('本次查找三元组的数量为:{},耗时:{}s'.format(len(ret), time.time() - start_time))
The code of the complete result processing function is as follows
def handleResult(self, edges, server_param, start_time):
    ret = []
    for result in edges:
        ret.append({
            "source": result[0],
            "target": result[2],
            "relation": result[1],
            "label": "relation"
        })
    print("ret:", ret)
    server_param['result'] = {"edges": ret}
    server_param['success'] = 'true'
    print('本次查找三元组的数量为:{},耗时:{}s'.format(len(ret), time.time() - start_time))
    sorted_sim_list = sorted(sim_dict.items(), key=lambda x: x[1]['similarity'], reverse=True)
    ret = {}
    for i in range(len(sorted_sim_list)):
        if i >= 20:
            break
        ret[sorted_sim_list[i][0]] = sorted_sim_list[i][1]
    mydata = json.dumps(ret, ensure_ascii=False)
    print('Json路径是:%s' % (fname))
    self.clear_and_write_file(fname, mydata)
Running results
First start the service by running run_server.py; then enter the following url in the browser address bar (XXX is the series name entered):
http://210.41.97.169:8090/KnowGraph/v2?name=XXX
The page output is then as follows
The graph is rather cluttered, so let's look at the top-20 output in the json file; the results are as follows
{
"XXX元帅": {
"actor": [
"侯勇",
"刘劲"
],
"similarity": 14.0,
"language": [
"普通话"
],
"country": [
"中国大陆"
],
"tag": [
"传记",
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"BBB": {
"actor": [
"刘劲",
"王伍福"
],
"similarity": 14.0,
"language": [
"普通话"
],
"country": [
"中国大陆"
],
"tag": [
"传记",
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"长征大会师": {
"actor": [
"刘劲",
"王伍福"
],
"similarity": 14.0,
"language": [
"普通话"
],
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"战将": {
"language": [
"普通话"
],
"similarity": 13.0,
"country": [
"中国大陆"
],
"tag": [
"传记",
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"炮神": {
"language": [
"普通话"
],
"similarity": 13.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"动作",
"革命",
"军旅",
"战争",
"历史",
"电视剧"
]
},
"独立纵队": {
"language": [
"普通话"
],
"similarity": 13.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"女子军魂": {
"language": [
"普通话"
],
"similarity": 13.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"革命",
"军旅",
"战争",
"历史",
"电视剧"
]
},
"热血军旗": {
"actor": [
"侯勇"
],
"similarity": 12.0,
"language": [
"普通话"
],
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"擒狼": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"信者无敌": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"我的抗战之猎豹突击": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"魔都风云": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"动作",
"革命",
"战争",
"电视剧"
]
},
"英雄戟之影子战士": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"第一声枪响": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"亮剑": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"飞虎队": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"伟大的转折": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"革命",
"战争",
"历史",
"电视剧"
]
},
"太行英雄传": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"45-55",
"36-45",
"24-36",
"激昂",
"热血",
"动作",
"革命",
"战争",
"历史",
"电视剧"
]
},
"雪豹": {
"language": [
"普通话"
],
"similarity": 12.0,
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"55-70",
"45-55",
"36-45",
"24-36",
"激昂",
"革命",
"军旅",
"战争",
"历史",
"电视剧"
]
},
"宜昌保卫战": {
"actor": [
"侯勇"
],
"similarity": 11.0,
"language": [
"普通话"
],
"country": [
"中国大陆"
],
"tag": [
"上星剧",
"45-55",
"36-45",
"24-36",
"激昂",
"革命",
"战争",
"历史",
"电视剧"
]
}
}
The top entries are all films and TV series highly correlated with our input, with the similarity scores and shared attributes listed alongside. The effect looks decent.
Conclusion
This is just a demo, meant to get a feel for applying knowledge graphs in recommendation systems.
Finally, I would like to thank the original project author again: without the framework built through his hard work, it would have been difficult for me to take the first practical step.
Once more, the original project's address: https://github.com/qiu997018209/KnowledgeGraph