A simple film and television drama recommendation system based on a knowledge graph

Background

This is a demo project I built in September of last year while researching knowledge graphs and recommendation. It started from an open-source knowledge graph project for the automotive industry that I found on GitHub. I modified it into a knowledge-graph-based recommendation system for films and TV dramas.

Environment

Python 3, the Flask web framework, and the graph database neo4j (3.3.1)

The operating system is Windows 10

Project framework

After cloning the automotive project mentioned above, the overall project structure is as shown in the figure below

The project comes in two versions, the first acceptance and the second acceptance. The main difference between them is the database: the former uses MySQL and the latter uses neo4j. My changes are based on the second acceptance version. Opening it, the structure inside is as shown in the figure below

Process analysis

Below, we analyze the workflow of the original project step by step, since we need to understand it before we can adapt it.

Data reading and insertion

First, the data has to be inserted into neo4j, so neo4j must be running. Open cmd and enter the following command

neo4j console

If cmd then displays the following message, neo4j has started

The address shown in the last line, http://localhost:7474, is where neo4j can be reached. Open a browser, paste this address into the address bar, and press Enter to see the neo4j console interface, as shown below

Once the database is running, open the kg\kg.py file in the project. Its main code is as follows

    def data_init(self):
        # Connect to the graph database
        print('Starting data preprocessing')
        self.graph = Graph('http://localhost:7474', user="neo4j", password="szc")
        self.selector = NodeSelector(self.graph)
        self.graph.delete_all()


    def insert_datas(self):
        print('Starting data insertion')
        with open('../data/tuples/three_tuples_2.txt', 'r', encoding='utf-8') as f:
            lines = f.readlines()
            for num, line in enumerate(lines):
                # Report progress every 500 lines
                if num % 500 == 0:
                    print('Progress: {}/{}'.format(num, len(lines)))

                line = line.strip().split(' ')
                if len(line) != 3:
                    print('insert_datas error:', line)
                    continue
                self.insert_one_data(line)

    def insert_one_data(self, line):
        if '' in line:
            print('insert_one_data error', line)
            return

        start = self.look_and_create(line[0])
        for name in self.get_items(line[2]):
            end = self.look_and_create(name)
            r = Relationship(start, line[1], end, name=line[1])
            self.graph.create(r)  # not created again if it already exists

    # Look up a node by name; create a new one if it does not exist
    def look_and_create(self, name):
        end = self.graph.find_one(label="car_industry", property_key="name", property_value=name)
        if end is None:
            end = Node('car_industry', name=name)
        return end

    def get_items(self, line):
        if '{' not in line and '}' not in line:
            return [line]
        # Sanity check: braces must come in pairs
        if '{' not in line or '}' not in line:
            print('get_items Error', line)
        # Non-greedy match, so '{a}{b}' yields two items
        lines = [w[1:-1] for w in re.findall(r'\{.*?\}', line)]
        return lines

The data_init() function at the top connects to the neo4j database: just supply the database address, user name, and password. It then calls graph.delete_all() to clear the existing data before inserting. Whether to keep this step depends on your own business scenario.

Then there is insert_datas(). It reads the txt file, iterates over the lines, and calls insert_one_data() on each line to parse it and create the nodes and the relationship. From the code we can see that each line has the form "start-point relationship end-point", separated by spaces, for example "Anyang location Yubei", which means that the relationship between the entity Anyang and the entity Yubei is location, in the order Anyang --> location --> Yubei.
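To make the line format concrete, here is a minimal sketch of parsing one line into a triple. The helper name parse_triple and the sample lines are illustrative assumptions, not part of the original project. It only mirrors the two checks the project code performs (exactly three fields, none empty).

```python
def parse_triple(line):
    """Split one 'start relation end' line into a 3-tuple, or return None."""
    parts = line.strip().split(' ')
    # Same checks as insert_datas() / insert_one_data():
    # exactly three fields, and none of them empty.
    if len(parts) != 3 or '' in parts:
        return None
    start, relation, end = parts
    return start, relation, end


print(parse_triple('Anyang location Yubei\n'))
print(parse_triple('only two'))
```

A line that fails either check is simply skipped by the project code, which is why the sketch returns None instead of raising.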

When the insert_one_data() function is called, it will first query whether there is a node with the same name in the database, and decide whether to reuse the existing one or build a new one according to the result. This process corresponds to the function look_and_create().

In look_and_create(), "car_industry" is the node label (as I understand it, roughly comparable to a table name in MySQL, since it groups nodes of the same kind). In find_one(), property_key is the name of the property to match, which corresponds to the name keyword passed to the Node constructor, and property_value is the value of that property, which is the entity name. Taking my hometown, Anyang City, as an example, its storage in neo4j can be understood as {property_key: "name", property_value: "Anyang"}.

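The final get_items() function splits the end-point field into individual entities: a value without braces is returned as a single item, while a value in which each entity is wrapped in braces yields one item per pair of braces. It also prints an error when the braces are unbalanced. It needs little further explanation.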
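As a rough illustration of what get_items() returns for the two kinds of input, here is a standalone version of the same logic. The sample values are made up for the example.

```python
import re


def get_items(field):
    """Return the entities contained in the end-point field of a triple."""
    if '{' not in field and '}' not in field:
        return [field]  # plain value: a single entity
    # Non-greedy match so that '{a}{b}' yields two items, not one.
    return [m[1:-1] for m in re.findall(r'\{.*?\}', field)]


print(get_items('Yubei'))
print(get_items('{Beijing}{Shanghai}'))
```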

Run the service

After all the data has been inserted into the database, we can run the service. The corresponding file is run_server.py, whose code is as follows

if __name__ == '__main__':
    args=get_args()
    print('\nhttp_host:{},http_port:{}'.format('localhost',args.http_port))
    app.run(debug=True, host='210.41.97.169', port=8090)

The key part is the app.run() call: just replace the IP and port in it with your own.

Handling page requests

Our business logic is: enter the url and parameters in the browser to get the relevant results.

The request parameters are handled in views.py, whose main code is as follows

@app.route('/KnowGraph/v2',methods=["POST"])
def look_up():
    kg=KnowGraph(get_args())
    client_params=request.get_json(force=True)
    server_param={}
    if client_params['method'] == 'entry_to_entry':
        kg.lookup_entry2entry(client_params,server_param)
    elif client_params['method'] == 'entry_to_property':
        kg.lookup_entry2property(client_params,server_param)
    elif client_params['method'] == 'entry':
        kg.lookup_entry(client_params,server_param)
    elif client_params['method'] == 'statistics':
        kg.lookup_statistics(client_params,server_param)
    elif client_params['method'] == 'live':
        params={'success':'true'}
        server_param['result']=params    
    server_param['id']=client_params['id']
    server_param['jsonrpc']=client_params['jsonrpc']
    server_param['method']=client_params['method']
    print(server_param)
    return json.dumps(server_param, ensure_ascii=False).encode("utf-8")

As you can see, POST requests to the /KnowGraph/v2 path are routed to the look_up function, which calls different methods of the kg object depending on the value of the method parameter.

However, when we type the path and parameters into the browser and press Enter, the browser sends a GET request, so GET is the method we need to handle. In addition, the original code does not pass any data to Flask templates, so this file needs major changes.

Data query

As mentioned above, views.py calls different methods of the kg object to obtain different results, depending on the value of the method parameter.

The KnowGraph class that the kg object belongs to is defined in module.py. Taking the simplest and most basic query, the entity query, as an example, let's see how it is implemented. It corresponds to the lookup_entry function, whose code is as follows

    def lookup_entry(self,client_params,server_param):
        # Supports setting the depth of the graph search
        start_time = time.time()
        params=client_params["params"]
        edges=set()
        self.lookup_entry_deep(edges,params,0)
        if len(edges)==0:
            server_param['result']={"success":'false'}
        else:                
            server_param['result']={'edges':[list(i) for i in edges],"success":'true'}
            print('Triples found in this query: {}, time taken: {}s'.format(len(edges),time.time()-start_time))

Apart from timing, the function takes params out of the client parameters (it holds the entity name to search for and the search depth) and calls lookup_entry_deep to do the search. The results are collected in the edges set. Finally, each tuple in edges is converted to a list and stored under 'edges' in the 'result' item of server_param, which is what gets returned.

Next, let's take a look at the implementation of the lookup_entry_deep function, the code is as follows

    def lookup_entry_deep(self,edges,params,deep):
        # Stop once the current depth reaches the requested depth
        if deep >= params['deep']:
            return
        # Forward lookup (result1) and reverse lookup (result2)
        result1=self.graph.data("match (s)-[r]->(e) where s.name='{}' return s.name,r.name,e.name".format(params['name']))
        result2=self.graph.data("match (e)<-[r]-(s) where e.name='{}' return s.name,r.name,e.name".format(params['name']))
        if len(result1)==0 and len(result2)==0:
            return
        for item in result1:
            edges.add((item['s.name'],item['r.name'],item['e.name']))
            if  item['s.name'] != item['e.name']:  # avoid an infinite loop on self-referencing triples such as X : Chinese name : X
                params['name']=item['e.name']
                self.lookup_entry_deep(edges,params.copy(),deep+1)

        for item in result2:
            edges.add((item['s.name'],item['r.name'],item['e.name']))
            if  item['s.name'] != item['e.name']:  # avoid an infinite loop on self-referencing triples such as X : Chinese name : X
                params['name']=item['e.name']
                self.lookup_entry_deep(edges,params.copy(),deep+1)

First, if the depth has reached the limit, the function returns immediately. Otherwise it takes the name item in params, which is the name of the entity to search for, runs a forward and a reverse query against the database, saves each result as a tuple in the edges set, and calls itself recursively with the depth increased by 1.
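The depth-limited recursion can be tried without a database. The following is a simplified in-memory analogue, not the project's code: the triples are made-up sample data, the function name lookup_deep is hypothetical, and, as an assumption of this sketch, the traversal follows an edge in either direction to the neighbouring node.

```python
TRIPLES = [  # made-up sample data
    ('A', 'rel1', 'B'),
    ('B', 'rel2', 'C'),
    ('C', 'rel3', 'D'),
]


def lookup_deep(edges, name, max_deep, deep=0):
    """Collect the triples reachable from `name` within `max_deep` hops."""
    if deep >= max_deep:
        return
    for s, r, e in TRIPLES:
        if name not in (s, e):
            continue
        edges.add((s, r, e))
        neighbour = e if s == name else s
        if neighbour != name:  # skip self-referencing triples
            lookup_deep(edges, neighbour, max_deep, deep + 1)


edges = set()
lookup_deep(edges, 'A', 2)
print(sorted(edges))
```

Because the results are stored in a set, a triple reached along several paths is kept only once, which is the same reason the project code uses a set for edges.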

Adaptation

The existing workflow is as described above. Next, we adapt it to the film and TV drama recommendation scenario.

Assuming that a user has watched the TV series "Admiral XXX", we can recommend films and TV series that may interest them based on the director, actors, country, language, and genre tags.

Data Format

Our data files are all stored in the wiki directory. They are txt files in which each line is one JSON object. A single line looks like this (formatted for readability, with parts elided)

{
    .....  
    "title": "上将XXX", 
    "wikiData": {
        .....
        "wikiInfo": {
            "country": "中国大陆", 
            "language": "普通话", 
            "directors": [
                "安澜"
            ], 
            "actors": [
                "宋春丽", 
                "王伍福", 
                "张秋歌", 
                "范明", 
                "刘劲", 
                "陶慧敏", 
                "侯勇"
            ], 
            ....
        }, 
        ....
        "wikiTags": [
            "电视剧", 
            "历史", 
            "战争", 
            "军旅", 
            "革命", 
            "动作", 
            "热血", 
            "激昂", 
            "24-36", 
            "36-45", 
            "45-55", 
            "55-70", 
            "上星剧", 
            "传记"
        ]
    }
}

The useful information, such as directors and actors, is laid out as shown above.
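As a sketch of how such a record maps onto graph triples, the function below turns one JSON line into (title, relation, value) tuples. The function name record_to_triples and the sample record are assumptions made for this example, and the relation names are the ones chosen for this sketch.

```python
import json


def record_to_triples(line):
    """Turn one JSON line into (title, relation, value) triples."""
    item = json.loads(line)
    title = item.get('title')
    if title is None:
        return []
    wiki = item.get('wikiData', {})
    info = wiki.get('wikiInfo', {})
    triples = [(title, 'tag', t) for t in wiki.get('wikiTags', [])]
    for key in ('country', 'language'):
        if key in info:
            triples.append((title, key, info[key]))
    triples += [(title, 'actor', a) for a in info.get('actors', [])]
    triples += [(title, 'director', d) for d in info.get('directors', [])]
    return triples


sample = json.dumps({  # trimmed-down, made-up record in the same shape
    'title': 'Show A',
    'wikiData': {
        'wikiInfo': {'country': 'Country X', 'language': 'Language Y',
                     'directors': ['Director D'],
                     'actors': ['Actor 1', 'Actor 2']},
        'wikiTags': ['history', 'war'],
    },
})
for triple in record_to_triples(sample):
    print(triple)
```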

Next, we adapt the project following the workflow worked out in the analysis above

Data reading and insertion

This corresponds to the kg.py file. First define the data directory path

data_dir = "C:\\Users\\songzeceng\\Desktop\\wiki\\"

Then traverse the files in this directory, reading and parsing each one. The code is as follows

    def insert_data_from_txt(self, file_path):
        try:
            with open(file=file_path, mode="r", encoding="utf-8") as f:
                for line in f.readlines():
                    item = json.loads(line)
                    if 'title' not in item.keys():
                        continue

                    title = self.look_and_create(item['title'])

                    if 'wikiData' not in item.keys():
                        continue

                    wikiData = item['wikiData']

                    if 'wikiDesc' in wikiData.keys():
                        wikiDesc = self.look_and_create(wikiData['wikiDesc'])
                        self.create_sub_graph(entity1=title, entity2=wikiDesc, relation="desc")

                    if 'wikiTags' in wikiData.keys():
                        for tag in wikiData['wikiTags']:
                            tag = self.look_and_create(tag)
                            self.create_sub_graph(entity1=title, entity2=tag, relation="tag")

                    wikiInfo = wikiData['wikiInfo']

                    if 'country' in wikiInfo.keys():
                        country = self.look_and_create(wikiInfo['country'])
                        self.create_sub_graph(entity1=title, entity2=country, relation="country")

                    if 'language' in wikiInfo.keys():
                        language = self.look_and_create(wikiInfo['language'])
                        self.create_sub_graph(entity1=title, entity2=language, relation="language")

                    if 'actors' in wikiInfo.keys():
                        for actor in wikiInfo['actors']:
                            actor = self.look_and_create(actor)
                            self.create_sub_graph(entity1=title, entity2=actor, relation="actor")
                    if 'directors' in wikiInfo.keys():
                        for director in wikiInfo['directors']:
                            actor = self.look_and_create(director)
                            self.create_sub_graph(entity1=title, entity2=actor, relation="director")
            print(file_path, "finished reading")
        except Exception as e:
            print("Error reading file " + file_path + ": " + str(e))

It looks long, but it simply parses each item: first find or create the entity, which is done by look_and_create. Because my py2neo version differs from the one used by the original project, I rewrote this function as follows

    def look_and_create(self, name):
        matcher = NodeMatcher(self.graph)
        end = matcher.match("car_industry", name=name).first()
        if end is None:
            end = Node('car_industry', name=name)
        return end

Then create the entity relationship, corresponding to the function create_sub_graph, the code is as follows

    def create_sub_graph(self, entity1, relation, entity2):
        r = Relationship(entity1, relation, entity2, name=relation)
        self.graph.create(r)

The entire kg.py file is as follows

# coding:utf-8
'''
Created on January 26, 2018

@author: qiujiahao

@email:[email protected]

'''
import sys
import re
import os

sys.path.append('..')
from conf import get_args
from py2neo import Node, Relationship, Graph, NodeMatcher
import pandas as pd
import json

data_dir = "C:\\Users\\songzeceng\\Desktop\\wiki\\"


class data(object):
    def __init__(self):
        self.args = get_args()
        self.data_process()

    def data_process(self):
        # Initialize, then insert the data
        self.data_init()
        print("Data preprocessing finished")

    def data_init(self):
        # Connect to the graph database
        print('Starting data preprocessing')
        self.graph = Graph('http://localhost:7474', user="neo4j", password="szc")
        # self.graph.delete_all()

        file_names = os.listdir(data_dir)
        for file_name in file_names:
            self.insert_data_from_txt(data_dir + file_name)

    def insert_data_from_txt(self, file_path):
        try:
            with open(file=file_path, mode="r", encoding="utf-8") as f:
                for line in f.readlines():
                    item = json.loads(line)
                    if 'title' not in item.keys():
                        continue

                    title = self.look_and_create(item['title'])

                    # id = self.look_and_create(item['id'])
                    #
                    # self.create_sub_graph(entity1=title, entity2=id, relation="title")

                    if 'wikiData' not in item.keys():
                        continue

                    wikiData = item['wikiData']

                    if 'wikiDesc' in wikiData.keys():
                        wikiDesc = self.look_and_create(wikiData['wikiDesc'])
                        self.create_sub_graph(entity1=title, entity2=wikiDesc, relation="desc")

                    if 'wikiTags' in wikiData.keys():
                        for tag in wikiData['wikiTags']:
                            tag = self.look_and_create(tag)
                            self.create_sub_graph(entity1=title, entity2=tag, relation="tag")

                    wikiInfo = wikiData['wikiInfo']

                    if 'country' in wikiInfo.keys():
                        country = self.look_and_create(wikiInfo['country'])
                        self.create_sub_graph(entity1=title, entity2=country, relation="country")

                    if 'language' in wikiInfo.keys():
                        language = self.look_and_create(wikiInfo['language'])
                        self.create_sub_graph(entity1=title, entity2=language, relation="language")

                    if 'actors' in wikiInfo.keys():
                        for actor in wikiInfo['actors']:
                            actor = self.look_and_create(actor)
                            self.create_sub_graph(entity1=title, entity2=actor, relation="actor")
                    if 'directors' in wikiInfo.keys():
                        for director in wikiInfo['directors']:
                            actor = self.look_and_create(director)
                            self.create_sub_graph(entity1=title, entity2=actor, relation="director")
            print(file_path, "finished reading")
        except Exception as e:
            print("Error reading file " + file_path + ": " + str(e))

    def create_sub_graph(self, entity1, relation, entity2):
        r = Relationship(entity1, relation, entity2, name=relation)
        self.graph.create(r)

    def look_and_create(self, name):
        matcher = NodeMatcher(self.graph)
        end = matcher.match("car_industry", name=name).first()
        if end is None:
            end = Node('car_industry', name=name)
        return end


if __name__ == '__main__':
    data = data()

Run it. The command line output is as shown in the figure below

The data is not standardized, so many files cannot be read. That does not matter much, since this is only a demo. Then fetch 25 records from the neo4j database. The result is shown in the figure below
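One contributing factor is visible in the code: insert_data_from_txt() wraps the whole loop in a single try block, so the first malformed line ends processing of that file. A possible alternative, shown here only as a sketch, is to handle errors per line and skip bad records. The function name parse_lines and the sample input are assumptions for this example.

```python
import json


def parse_lines(lines):
    """Parse JSON lines, skipping bad ones instead of abandoning the file."""
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(item, dict) and 'title' in item:
            records.append(item)
        else:
            skipped += 1  # valid JSON but not a usable record
    return records, skipped


records, skipped = parse_lines(
    ['{"title": "Show A"}', 'not json', '', '{"id": 1}'])
print(len(records), skipped)
```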

Run the service

Here you can simply change the IP and port in run_server.py to your own

Handling page requests

This step corresponds to views.py.

First, we need to handle GET requests to the /KnowGraph/v2 path, so we add a function with a route decorator, as shown below

@app.route('/KnowGraph/v2', methods=["GET"])
def getInfoFromServer():
    pass

Then we implement this function. The first step is to process the request parameters. The complete request URL looks like this

http://localhost:8090/KnowGraph/v2?method=entry&jsonrpc=2.0&id=1&params=entry=上将许世友-deep=2

There are many parameters, and many of them are fixed, such as jsonrpc, id, etc., so I simplified them to

http://localhost:8090/KnowGraph/v2?name=上将许世友

Then all the default parameters are filled in by a helper function, handle_args(), which getInfoFromServer() calls. The code is as follows

def handle_args(originArgs):
    if 'name' not in originArgs.keys():
        return None

    args = {}
    for item in originArgs:
        key = item
        value = originArgs[key]
        if key == "params":
            kvs = str(value).split("-")
            kv_dic = {}
            for item in kvs:
                kv = item.split("=")
                k = kv[0]
                v = kv[1]
                if v.isnumeric():
                    kv_dic[k] = int(v)
                else:
                    kv_dic[k] = v
            args[key] = kv_dic
        else:
            if value.isnumeric():
                args[key] = int(value)
            else:
                args[key] = value

    if 'params' not in args.keys():
        args['params'] = {
            'name': args['name']
        }
        args.pop('name')

    args['params']['name'] = args['params']['name'].replace('\'', '\\\'')

    if 'method' not in args.keys():
        args['method'] = 'entry'
    if 'deep' not in args['params'].keys():
        args['params']['deep'] = 2
    if 'jsonrpc' not in args.keys():
        args['jsonrpc'] = 2.0
    if 'id' not in args.keys():
        args['id'] = 1
    return args

In essence, it just iterates over the parameters and fills in the defaults.
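The default-filling part can be summarized in a few lines. This is a condensed sketch rather than a replacement for handle_args(): the name fill_defaults is hypothetical, it assumes the query already is a plain dict containing either name or params, and it leaves out the parsing of the params string and the quote escaping.

```python
def fill_defaults(query):
    """Expand a minimal query dict such as {'name': ...} into the full form."""
    args = dict(query)  # work on a copy, leave the caller's dict untouched
    params = args.pop('params', None) or {'name': args.pop('name')}
    params.setdefault('deep', 2)
    args['params'] = params
    args.setdefault('method', 'entry')
    args.setdefault('jsonrpc', 2.0)
    args.setdefault('id', 1)
    return args


print(fill_defaults({'name': 'Show A'}))
```

Values supplied by the caller win over the defaults, because setdefault only writes a key that is missing.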

After the parameters are processed, we run the query selected by the method field, read the result from the result field of server_param, and hand it to the front end to render the page. The getInfoFromServer() function can therefore be written as follows

@app.route('/KnowGraph/v2', methods=["GET"])
def getInfoFromServer():
    args = handle_args(request.args.to_dict())

    kg = KnowGraph(args)
    client_params = args
    server_param = {}

    if client_params['method'] == 'entry':
        kg.lookup_entry(client_params, server_param)

    server_param['id'] = client_params['id']
    server_param['jsonrpc'] = client_params['jsonrpc']
    server_param['method'] = client_params['method']
    print("server_param:\n", server_param)

    global mydata
    if 'result' in server_param.keys():
        mydata = server_param['result']
    else:
        mydata = '{}'
    print("mydata:\n", mydata)
    return render_template("index.html")

Here we only deal with the query of the entity, because our input is the name of a movie and TV series watched by the user.

When the page is rendered, it fetches its data from the /KnowGraph/data path, so we need to handle that route as well. The code is as follows

@app.route("/KnowGraph/data")
def data():
    print("data:", mydata)  # log the stored query result, not the view function itself
    return mydata

The entire views.py file is as follows

# coding:utf-8
'''
Created on January 9, 2018

@author: qiujiahao

@email:[email protected]

'''

from flask import jsonify
from conf import *
from flask import Flask
from flask import request, render_template
from server.app import app
import tensorflow as tf
from server.module import KnowGraph
import json

mydata = ""

# http://210.41.97.89:8090/KnowGraph/v2?name=胜利之路
# http://113.54.234.209:8090/KnowGraph/v2?name=孤战
# http://localhost:8090/KnowGraph/v2?method=entry_to_property&jsonrpc=2.0&id=1&params=entry=水冶-property=位置
@app.route('/KnowGraph/v2', methods=["GET"])
def getInfoFromServer():
    args = handle_args(request.args.to_dict())

    kg = KnowGraph(args)
    client_params = args
    server_param = {}

    if client_params['method'] == 'entry':
        kg.lookup_entry(client_params, server_param)

    server_param['id'] = client_params['id']
    server_param['jsonrpc'] = client_params['jsonrpc']
    server_param['method'] = client_params['method']
    print("server_param:\n", server_param)

    global mydata
    if 'result' in server_param.keys():
        mydata = server_param['result']
    else:
        mydata = '{}'
    print("mydata:\n", mydata)
    return render_template("index.html")


def handle_args(originArgs):
    if 'name' not in originArgs.keys():
        return None

    args = {}
    for item in originArgs:
        key = item
        value = originArgs[key]
        if key == "params":
            kvs = str(value).split("-")
            kv_dic = {}
            for item in kvs:
                kv = item.split("=")
                k = kv[0]
                v = kv[1]
                if v.isnumeric():
                    kv_dic[k] = int(v)
                else:
                    kv_dic[k] = v
            args[key] = kv_dic
        else:
            if value.isnumeric():
                args[key] = int(value)
            else:
                args[key] = value

    if 'params' not in args.keys():
        args['params'] = {
            'name': args['name']
        }
        args.pop('name')

    args['params']['name'] = args['params']['name'].replace('\'', '\\\'')

    if 'method' not in args.keys():
        args['method'] = 'entry'
    if 'deep' not in args['params'].keys():
        args['params']['deep'] = 2
    if 'jsonrpc' not in args.keys():
        args['jsonrpc'] = 2.0
    if 'id' not in args.keys():
        args['id'] = 1
    return args


@app.route("/KnowGraph/data")
def data():
    print("data:", mydata)  # log the stored query result, not the view function itself
    return mydata

Database query

Finally, we focus on the database query and result parsing in module.py.

To make the results easy to inspect, we write them to a JSON file. The query results are kept in an in-memory dictionary (sim_dict), which is cleared before each query. After the query runs, different parsing logic is executed depending on whether there is a result. The lookup_entry function can therefore be written as shown below

    def lookup_entry(self, client_params, server_param):
        # Supports setting the depth of the graph search
        start_time = time.time()
        params = client_params["params"]
        edges = set()
        sim_dict.clear()

        self.lookup_entry_deep(edges, params, 0)
        if len(edges) == 0:
            server_param['success'] = 'false'
        else:
            self.handleResult(edges, server_param, start_time)

The entity queries live in the lookup_entry_deep() function. In general our depth is only two levels. At the first level we query the attributes of the show the user watched, for example the director of Admiral Xu Shiyou. At the second level we look up, for each attribute value, the other entities that share it, for example the other films and TV dramas by the director of Admiral Xu Shiyou. Clearly, the first level is a forward search and the second level is a reverse search.

When searching, we also need to filter the results so that we do not recommend the show the user has just watched. For example, suppose we search for Admiral XXX, find that its director is An Lan, and then run a reverse search on An Lan. If An Lan turns out to have directed only Admiral XXX, there is nothing to recommend through this attribute, and Admiral XXX itself must not be added to the recommendation list.

For the case above, where no other entity is found, I defined the return value as 'nothing else'. If nothing is found at all, it is 'nothing got'. If the depth limit is reached, it is 'deep out'. If everything is normal, it is 'ok'.

We first perform a two-way query, the code is as follows

        result1 = self.graph.run(cypher='''match (s)-[r]->(e) where s.name='{}'
                                            return s.name,r.name,e.name'''.format(params['name'])).data()

        result2 = self.graph.run(cypher='''match (e)<-[r]-(s) where e.name='{}' 
                                            return s.name,r.name,e.name '''.format(params['name'])).data()
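Both queries splice the entity name into the Cypher text with format(), which is why quotes in the name have to be escaped by hand elsewhere in the code. A possible alternative is to use Cypher query parameters, sketched below. The sketch only builds the query text and the parameter dictionary; the helper name build_queries is hypothetical, and it is an assumption that the py2neo version in use accepts the dictionary as the second argument of graph.run().

```python
FORWARD_QUERY = (
    "MATCH (s)-[r]->(e) WHERE s.name = $name "
    "RETURN s.name, r.name, e.name"
)
REVERSE_QUERY = (
    "MATCH (e)<-[r]-(s) WHERE e.name = $name "
    "RETURN s.name, r.name, e.name"
)


def build_queries(name):
    """Return (cypher, parameters) pairs; the name never enters the query text."""
    parameters = {'name': name}
    return [(FORWARD_QUERY, parameters), (REVERSE_QUERY, parameters)]


for cypher, parameters in build_queries("It's a show"):
    # Assumed usage with py2neo: graph.run(cypher, parameters).data()
    print(cypher, parameters)
```

With parameters, a name containing a single quote needs no escaping, because the database receives it as a value rather than as part of the query string.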

Then check whether both results are empty. If both have length 0, return 'nothing got'

        if len(result1) == 0 and len(result2) == 0:
            return 'nothing got'

If result2 (the result of the reverse search) has only one item, and in that item s.name (the name of the film or TV drama) is still the input show and e.name (the attribute value) is still the original attribute value, return 'nothing else' directly

        if len(result2) == 1:
            item = result2[0]
            if origin_tv_name is not None and origin_property_name is not None:
                if origin_property_name == item['e.name'] and origin_tv_name == item['s.name']:
                    return 'nothing else'

origin_tv_name and origin_property_name are both parameters of the lookup_entry_deep function and default to None

Then we traverse result1 of the forward query. For each attribute value (e.name) we recurse one level deeper, passing along the original film or TV drama (s.name) and the attribute value, so that the reverse search at the next level can exclude the original show.

        for item in result1:
            tv_name = item['s.name']
            property_name = item['e.name']

            has_result = False
            if tv_name != property_name:  # avoid an infinite loop on self-referencing triples such as X : Chinese name : X
                if oldName != property_name:
                    params['name'] = property_name
                    has_result = self.lookup_entry_deep(edges, params.copy(), deep + 1,
                                                        origin_tv_name=tv_name,
                                                        origin_property_name=property_name)

oldName is the name of the entity being queried in this call. The check was added to avoid an endless loop. In our scenario the condition always holds.

Next, we process the results of the reverse search. If a different film or TV drama is found, we first compute a similarity score from the type of relationship between that drama and the shared attribute. Then we record the drama, the relation name, the shared attribute value, and the score in the similarity dictionary (accumulating if the drama is already present, creating a new entry otherwise) and add the triple to the edges set. The code is as follows

        for item in result2:
            tv_name = item['s.name']
            property_name = item['e.name']
            relation_name = item['r.name']

            if tv_name != origin_tv_name:
                score = get_sim_score_accroding_to_relation(relation_name)

                if tv_name not in sim_dict.keys():
                    sim_dict[tv_name] = {
                        relation_name: [property_name],
                        "similarity": score
                    }
                else:
                    item_dict = sim_dict[tv_name]
                    # Skip attribute values already counted for this relation
                    if relation_name in item_dict.keys() and \
                            property_name in item_dict[relation_name]:
                        continue

                    if relation_name in item_dict.keys():
                        item_dict[relation_name].append(property_name)
                    else:
                        item_dict[relation_name] = [property_name]
                    item_dict["similarity"] += score
                edges.add((tv_name, relation_name, property_name))

The function get_sim_score_accroding_to_relation(), which returns a similarity score for a given relation, is as follows

def get_sim_score_accroding_to_relation(relation_name):
    if relation_name in ['actor', 'director', 'tag']:
        return 1.0
    elif relation_name in ['language', 'country']:
        return 0.5
    return 0.0
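To see how these weights add up into a ranking, here is a small in-memory sketch. The names score_for and rank and the candidate data are made up for the example. The weights are the same as in the function above.

```python
def score_for(relation):
    # Same weights as get_sim_score_accroding_to_relation() above.
    if relation in ('actor', 'director', 'tag'):
        return 1.0
    if relation in ('language', 'country'):
        return 0.5
    return 0.0


def rank(shared):
    """shared: iterable of (candidate_show, relation, value) triples."""
    totals = {}
    for show, relation, _value in shared:
        totals[show] = totals.get(show, 0.0) + score_for(relation)
    # Highest accumulated similarity first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


shared = [  # made-up candidates sharing attributes with the watched show
    ('Show B', 'director', 'Director D'),
    ('Show B', 'language', 'Language Y'),
    ('Show C', 'language', 'Language Y'),
    ('Show B', 'tag', 'war'),
]
print(rank(shared))
```

A candidate that shares a director, a language, and a tag with the watched show therefore scores 2.5, while one that shares only the language scores 0.5.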

The complete lookup_entry_deep() function is shown below

    # Depth-limited lookup
    def lookup_entry_deep(self, edges, params, deep, origin_tv_name=None, origin_property_name=None):
        # Stop once the current depth reaches the requested depth
        if deep >= params['deep']:
            return 'deep out'
        # Forward lookup
        oldName = str(params['name'])
        if oldName.__contains__("\'") and not oldName.__contains__("\\\'"):
            params['name'] = oldName.replace("\'", "\\\'")

        result1 = self.graph.run(cypher='''match (s)-[r]->(e) where s.name='{}'
                                            return s.name,r.name,e.name'''.format(params['name'])).data()

        result2 = self.graph.run(cypher='''match (e)<-[r]-(s) where e.name='{}' 
                                            return s.name,r.name,e.name '''.format(params['name'])).data()

        if len(result1) == 0 and len(result2) == 0:
            return 'nothing got'

        if len(result2) == 1:
            item = result2[0]
            if origin_tv_name is not None and origin_property_name is not None:
                if origin_property_name == item['e.name'] and origin_tv_name == item['s.name']:
                    return 'nothing else'

        for item in result1:
            tv_name = item['s.name']
            property_name = item['e.name']

            if tv_name != property_name:  # avoid an endless loop such as 双面胶 : Chinese name : 双面胶
                if oldName != property_name:
                    params['name'] = property_name
                    has_result = self.lookup_entry_deep(edges, params.copy(), deep + 1,
                                                        origin_tv_name=tv_name,
                                                        origin_property_name=property_name)

        for item in result2:
            has_result = False
            tv_name = item['s.name']
            property_name = item['e.name']
            relation_name = item['r.name']

            if tv_name != origin_tv_name:
                score = get_sim_score_accroding_to_relation(relation_name)

                if tv_name not in sim_dict.keys():
                    sim_dict[tv_name] = {
                        relation_name: [property_name],
                        "similarity": score
                    }
                else:
                    item_dict = sim_dict[tv_name]
                    if relation_name in item_dict.keys() and \
                            property_name in item_dict[relation_name]:
                        continue

                    if relation_name in item_dict.keys():
                        item_dict[relation_name].append(property_name)
                    else:
                        item_dict[relation_name] = [property_name]
                    item_dict["similarity"] += score
                edges.add((tv_name, relation_name, property_name))

        return 'ok'
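The manual quote escaping in lookup_entry_deep() can be avoided by passing the name to the database as a query parameter. The following is only a sketch. It assumes that the graph object's run() accepts query parameters as keyword arguments and returns an object with a data() method, as the calls above suggest, and that the server understands the $name parameter syntax. find_outgoing, FakeCursor and FakeGraph are hypothetical names; the two fake classes are stand-ins so that the example runs without a database.

```python
OUTGOING_QUERY = (
    "MATCH (s)-[r]->(e) WHERE s.name = $name "
    "RETURN s.name, r.name, e.name"
)


def find_outgoing(graph, name):
    """Run the forward lookup with `name` passed as a parameter,
    so quotes inside the name need no manual escaping."""
    return graph.run(OUTGOING_QUERY, name=name).data()


class FakeCursor:
    """Stand-in for a query result (assumption: real results offer data())."""

    def __init__(self, rows):
        self._rows = rows

    def data(self):
        return self._rows


class FakeGraph:
    """Stand-in for the real graph object; records the last call."""

    def __init__(self):
        self.last_call = None

    def run(self, statement, **params):
        self.last_call = (statement, params)
        return FakeCursor([])


graph = FakeGraph()
rows = find_outgoing(graph, "It's a name")
print(graph.last_call[1])
```

With the real graph object in place of FakeGraph, the name is passed through unchanged, so the replace() step on params['name'] would no longer be needed.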

When the query is finished and has produced results, we go to the handleResult() function to process and output them. It sorts the candidates by similarity from high to low, takes the top 20, and writes them to a JSON file. This part of the code is as follows

    def handleResult(self, edges, server_param, start_time):
        ....
        sorted_sim_list = sorted(sim_dict.items(), key=lambda x: x[1]['similarity'], reverse=True)
        ret = {}
        for i in range(len(sorted_sim_list)):
            if i >= 20:
                break
            ret[sorted_sim_list[i][0]] = sorted_sim_list[i][1]

        mydata = json.dumps(ret, ensure_ascii=False)
        print('JSON path: %s' % (fname))
        self.clear_and_write_file(fname, mydata)

    def clear_and_write_file(self, fname, mydata):
        # Mode 'w' truncates the file, so no separate clearing step is needed
        with open(fname, 'w', encoding='utf-8') as f:
            f.write(str(mydata))
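The ranking and file-writing steps can also be tried on their own. This sketch assumes a dictionary shaped like sim_dict above; top_n, write_json and the sample data are illustrative names, not part of the project.

```python
import json
import os
import tempfile


def top_n(sim_dict, n=20):
    """Return the n entries with the highest 'similarity', best first."""
    ranked = sorted(sim_dict.items(),
                    key=lambda kv: kv[1]['similarity'],
                    reverse=True)
    return dict(ranked[:n])


def write_json(path, data):
    """Overwrite `path` with `data` as UTF-8 JSON, keeping non-ASCII text."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False)


# Hypothetical scores for three shows.
sample = {
    'A': {'similarity': 1.5},
    'B': {'similarity': 3.0},
    'C': {'similarity': 2.0},
}
best = top_n(sample, n=2)

path = os.path.join(tempfile.mkdtemp(), 'result.json')
write_json(path, best)
with open(path, encoding='utf-8') as f:
    loaded = json.load(f)
print(list(loaded))  # ['B', 'C']
```

Slicing the sorted list replaces the explicit loop with its break, and ensure_ascii=False keeps Chinese titles readable in the file.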

In addition, I store the results in server_param so that they can be output to the front-end page. This part of the code is as follows

        ret = []
        for result in edges:
            ret.append({
                "source": result[0],
                "target": result[2],
                "relation": result[1],
                "label": "relation"
            })
        print("ret:", ret)
        server_param['result'] = {"edges": ret}
        server_param['success'] = 'true'
        print('Triples found in this lookup: {}, time elapsed: {}s'.format(len(ret), time.time() - start_time))

The complete result-processing function is as follows

    def handleResult(self, edges, server_param, start_time):
        ret = []
        for result in edges:
            ret.append({
                "source": result[0],
                "target": result[2],
                "relation": result[1],
                "label": "relation"
            })
        print("ret:", ret)
        server_param['result'] = {"edges": ret}
        server_param['success'] = 'true'
        print('Triples found in this lookup: {}, time elapsed: {}s'.format(len(ret), time.time() - start_time))

        sorted_sim_list = sorted(sim_dict.items(), key=lambda x: x[1]['similarity'], reverse=True)
        ret = {}
        for i in range(len(sorted_sim_list)):
            if i >= 20:
                break
            ret[sorted_sim_list[i][0]] = sorted_sim_list[i][1]

        mydata = json.dumps(ret, ensure_ascii=False)
        print('JSON path: %s' % (fname))
        self.clear_and_write_file(fname, mydata)

Running results

First start the service by running run_server.py, then enter the following URL in the browser address bar (XXX is the name to look up):

http://210.41.97.169:8090/KnowGraph/v2?name=XXX
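If the URL is built by a script rather than typed into a browser, names containing Chinese characters or spaces should be percent-encoded. A small sketch using only the standard library; build_query_url is a hypothetical helper, and the base URL is simply the one shown above.

```python
from urllib.parse import urlencode

BASE_URL = "http://210.41.97.169:8090/KnowGraph/v2"


def build_query_url(name, base_url=BASE_URL):
    """Return the lookup URL with `name` percent-encoded as UTF-8."""
    return base_url + "?" + urlencode({"name": name})


print(build_query_url("亮剑"))
```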

Opening the URL produces the following page output

The page output is hard to read, so let's look at the top 20 results in the JSON file instead. They are as follows

{
  "XXX元帅": {
    "actor": [
      "侯勇",
      "刘劲"
    ],
    "similarity": 14.0,
    "language": [
      "普通话"
    ],
    "country": [
      "中国大陆"
    ],
    "tag": [
      "传记",
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "BBB": {
    "actor": [
      "刘劲",
      "王伍福"
    ],
    "similarity": 14.0,
    "language": [
      "普通话"
    ],
    "country": [
      "中国大陆"
    ],
    "tag": [
      "传记",
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "长征大会师": {
    "actor": [
      "刘劲",
      "王伍福"
    ],
    "similarity": 14.0,
    "language": [
      "普通话"
    ],
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "战将": {
    "language": [
      "普通话"
    ],
    "similarity": 13.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "传记",
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "炮神": {
    "language": [
      "普通话"
    ],
    "similarity": 13.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "动作",
      "革命",
      "军旅",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "独立纵队": {
    "language": [
      "普通话"
    ],
    "similarity": 13.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "女子军魂": {
    "language": [
      "普通话"
    ],
    "similarity": 13.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "革命",
      "军旅",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "热血军旗": {
    "actor": [
      "侯勇"
    ],
    "similarity": 12.0,
    "language": [
      "普通话"
    ],
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "擒狼": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "信者无敌": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "我的抗战之猎豹突击": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "魔都风云": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "动作",
      "革命",
      "战争",
      "电视剧"
    ]
  },
  "英雄戟之影子战士": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "第一声枪响": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "亮剑": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "飞虎队": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "伟大的转折": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "太行英雄传": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "热血",
      "动作",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "雪豹": {
    "language": [
      "普通话"
    ],
    "similarity": 12.0,
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "55-70",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "革命",
      "军旅",
      "战争",
      "历史",
      "电视剧"
    ]
  },
  "宜昌保卫战": {
    "actor": [
      "侯勇"
    ],
    "similarity": 11.0,
    "language": [
      "普通话"
    ],
    "country": [
      "中国大陆"
    ],
    "tag": [
      "上星剧",
      "45-55",
      "36-45",
      "24-36",
      "激昂",
      "革命",
      "战争",
      "历史",
      "电视剧"
    ]
  }
}

The top entries are all films and TV series closely related to the input. Each entry lists its similarity score and the attributes it shares with the input, and the results look reasonable.
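The reported scores can be checked against the weights by hand: an entry's similarity should equal the number of shared actors, directors and tags, plus 0.5 for each shared language or country. A quick sketch of that check on the first entry above (the names WEIGHTS and expected_similarity are illustrative):

```python
WEIGHTS = {'actor': 1.0, 'director': 1.0, 'tag': 1.0,
           'language': 0.5, 'country': 0.5}


def expected_similarity(entry):
    """Sum the weight of every shared property listed in a result entry."""
    return sum(WEIGHTS.get(relation, 0.0) * len(values)
               for relation, values in entry.items()
               if relation != 'similarity')


# First entry of the JSON output above.
first = {
    "actor": ["侯勇", "刘劲"],
    "similarity": 14.0,
    "language": ["普通话"],
    "country": ["中国大陆"],
    "tag": ["传记", "上星剧", "55-70", "45-55", "36-45", "24-36",
            "热血", "革命", "战争", "历史", "电视剧"],
}
print(expected_similarity(first) == first["similarity"])  # True
```

Here 2 actors and 11 tags contribute 13.0, and the shared language and country add 0.5 each, which gives the reported 14.0.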

Conclusion

This is only a demo for trying out knowledge graphs in a recommendation system.

Finally, I would like to thank the author of the original project again. Without the framework he worked hard to build, it would have been difficult for me to take this first practical step.

Here again is the address of the original project: https://github.com/qiu997018209/KnowledgeGraph

Origin blog.csdn.net/qq_37475168/article/details/100709201