Operating the Neo4j graph database with py2neo

1. Concept

Graph: As a data structure, a graph consists of nodes and the edges between them. A node represents an entity, and an edge represents a connection between two entities.

Graph database: a database that stores and manages data in a graph structure. Some graph databases store an optimized native graph structure directly (native graph storage); others serialize the graph data and store it in a relational or other kind of database.

Graph databases are used because they have a clear advantage when processing data with complex relationships between entities. Traditional relational databases are awkward at handling relationships between data. For example, finding the students who take a given course requires joining two tables, and finding which other courses those students take requires two join operations. When the relationships are very complicated and the data volume is huge, a relational database becomes very inefficient. With graph storage, the results can be found simply by following the edges between nodes.
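
To make the comparison concrete, here is a minimal sketch that uses only Python's standard library. The enrollment table, its column names and the sample data are invented for illustration. The relational version answers "which courses are taken by the students of a given course" by joining the table with itself, and every further hop would need another join; the graph version simply follows edges stored in adjacency dictionaries.

```python
import sqlite3

# Hypothetical sample data: (student, course) enrollment pairs.
ENROLLMENTS = [
    ("Alice", "Math"),
    ("Alice", "Physics"),
    ("Bob", "Math"),
    ("Bob", "History"),
    ("Carol", "Art"),
]


def courses_via_sql(course):
    """Relational approach: join the enrollment table with itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")
    conn.executemany("INSERT INTO enrollment VALUES (?, ?)", ENROLLMENTS)
    rows = conn.execute(
        """
        SELECT DISTINCT e2.course
        FROM enrollment AS e1
        JOIN enrollment AS e2 ON e1.student = e2.student
        WHERE e1.course = ?
        """,
        (course,),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}


def courses_via_graph(course):
    """Graph approach: follow edges course -> students -> courses."""
    students_of = {}
    courses_of = {}
    for student, taken in ENROLLMENTS:
        students_of.setdefault(taken, set()).add(student)
        courses_of.setdefault(student, set()).add(taken)
    result = set()
    for student in students_of.get(course, set()):
        result |= courses_of[student]
    return result


print(sorted(courses_via_sql("Math")))
print(sorted(courses_via_graph("Math")))
```

Both functions return the same set of courses. The difference lies in how the work grows with each additional hop in the relationship chain.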

Graph model:

Node: the main data element, representing an entity.

Properties: describe the characteristics of an entity as key-value pairs, where the key is a string. Indexes and constraints can be created on properties.

Relationships: represent the connections between entities. A relationship has a direction, there can be several relationships between the same entities, and a relationship can have properties of its own.

Label: used to classify entities. An entity can have several labels, and indexing on labels can speed up searches.
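
The four elements of the model can be pictured with plain Python dataclasses. This is only an illustrative sketch: the class names GraphNode and GraphRelationship and the sample values are made up here and are not part of Neo4j or py2neo.

```python
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    """An entity with a set of labels and key-value properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class GraphRelationship:
    """A directed, typed connection that may carry its own properties."""
    start: GraphNode
    rel_type: str
    end: GraphNode
    properties: dict = field(default_factory=dict)


alice = GraphNode({"Person", "Female"}, {"name": "Alice", "height": 166})
bob = GraphNode({"Person"}, {"name": "Bob"})

# Two relationships between the same pair of nodes, with different types.
knows = GraphRelationship(alice, "KNOWS", bob, {"since": 2019})
likes = GraphRelationship(alice, "LIKES", bob)

# A label acts as a category that can be used to select nodes.
people = [n for n in (alice, bob) if "Person" in n.labels]
print([n.properties["name"] for n in people])
```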

2. Neo4j

Neo4j is currently one of the most popular graph databases, and it uses native graph storage. On Windows, download the installer from https://neo4j.com/download/community-edition/ . On Linux, download and unpack it with the following commands:

curl -O http://dist.neo4j.org/neo4j-community-3.4.5-unix.tar.gz
tar -axvf neo4j-community-3.4.5-unix.tar.gz

Then edit the configuration file conf/neo4j.conf:

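# Line 22: the directory that LOAD CSV reads from. Comment it out with a leading # so that files can be read from any path
#dbms.directories.import=import

# Lines 35 and 36: set the initial and the maximum JVM heap size
# In production, a larger maximum heap is generally better, but it must stay below the machine's physical memory
dbms.memory.heap.initial_size=5g
dbms.memory.heap.max_size=10g

# Line 46: this is effectively the cache; on a well-equipped machine, larger is better
dbms.memory.pagecache.size=10g

# Line 54: remove the leading # so that the database can be reached remotely by IP address
dbms.connectors.default_listen_address=0.0.0.0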

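# The default ports are 7687 for Bolt, 7474 for HTTP and 7473 for HTTPS; the three settings below can also be left unchanged
# Line 71: remove the # to set the Bolt port to 7687. Any port can be used as long as it does not conflict with another one
dbms.connector.bolt.listen_address=:7687

# Line 75: remove the # to set the HTTP port to 7474. Any port can be used as long as it does not conflict with another one
dbms.connector.http.listen_address=:7474

# Line 79: remove the # to set the HTTPS port to 7473. Any port can be used as long as it does not conflict with another one
dbms.connector.https.listen_address=:7473

# Remove the # to allow LOAD CSV to read from file URLs
dbms.security.allow_csv_import_from_file_urls=true

# Line 250: remove the # to set the neo4j-shell port. Any port can be used as long as it does not conflict with another one
dbms.shell.port=1337

# Line 254: make the database both readable and writable
dbms.read_only=false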
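
Before starting the service it can be worth sanity-checking the memory settings. The following sketch parses neo4j.conf-style text with the standard library only. The helper names parse_conf and size_in_mb are hypothetical, and the sample text simply repeats a few of the lines shown above.

```python
def parse_conf(text):
    """Parse neo4j.conf-style text into a dict, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def size_in_mb(value):
    """Convert a size such as '512m' or '10g' to megabytes."""
    units = {"k": 1 / 1024, "m": 1, "g": 1024}
    number, unit = value[:-1], value[-1].lower()
    return float(number) * units[unit]


sample = """
#dbms.directories.import=import
dbms.memory.heap.initial_size=5g
dbms.memory.heap.max_size=10g
dbms.memory.pagecache.size=10g
"""

conf = parse_conf(sample)
heap_initial = size_in_mb(conf["dbms.memory.heap.initial_size"])
heap_max = size_in_mb(conf["dbms.memory.heap.max_size"])
page_cache = size_in_mb(conf["dbms.memory.pagecache.size"])

# The initial heap should not exceed the maximum heap.
print("heap sizes consistent:", heap_initial <= heap_max)
# Heap and page cache are separate allocations, so their sum is what matters.
print("heap + page cache (MB):", heap_max + page_cache)
```

With the values above, heap and page cache together claim 20 GB, so these settings only make sense on a machine with noticeably more physical memory than that.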

Run ./neo4j start in the bin directory to start the service. The Neo4j browser interface is then available at http://<server-ip>:7474/browser/

3. py2neo

py2neo is a community-maintained third-party library that makes it easier to work with Neo4j from Python.

Install it with pip install py2neo. The version used here is 4.3.0.

3.1 Node and Relationship

Create nodes and a relationship between them. Note that the py2neo classes must be imported before they are used:

# Import the classes
from py2neo import Node, Relationship
# Create nodes a and b with the label Person and a name property
a = Node("Person", name="Alice", height=166)
b = Node("Person", name="Bob")
# Add another label to a node
a.add_label('Female')
# Create a relationship from a to b
ab = Relationship(a, "KNOWS", b)
# Print the relationship, which looks like (Alice)-[:KNOWS {}]->(Bob)
print(ab)

Both Node and Relationship inherit from the PropertyDict class, which behaves much like a Python dictionary. Properties of a Node or Relationship can be assigned and read in the following ways:

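# Add or modify properties on a node and on a relationship
a['age'] = 20
ab['time'] = '2019/09/03'
# Delete a property
del a['age']
# Print a property (the key must be a string)
print(a['name'])
# Set a default: it is used only if the property has no value yet
a.setdefault('sex', 'unknown')
# Update several properties at once
a.update(age=22, sex='female')
ab.update(time='2019/09/03')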
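
The dictionary-like behaviour can be imitated in a few lines of plain Python. The class MiniPropertyDict below is a toy stand-in written for this article, not py2neo's implementation. It assumes the behaviour that py2neo's documentation describes for PropertyDict, namely that None is treated as "absent": reading a missing key gives None and assigning None removes the key.

```python
class MiniPropertyDict(dict):
    """A toy stand-in (not py2neo's class) where None means 'absent'."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        # A missing key yields None instead of raising KeyError.
        return dict.get(self, key)

    def __setitem__(self, key, value):
        # Assigning None removes the key; anything else is stored.
        if value is None:
            self.pop(key, None)
        else:
            dict.__setitem__(self, key, value)

    def update(self, *args, **kwargs):
        # Route update() through __setitem__ so None values are dropped too.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


props = MiniPropertyDict(name="Alice", height=166)
props["age"] = 20
props.setdefault("sex", "unknown")   # stored, because no value exists yet
props.update(age=22, sex="female")   # overwrites both values
props["height"] = None               # removes the key
print(props["height"])               # the key is gone, so this reads as None
print(sorted(props.items()))
```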

3.2 Subgraph

A subgraph is a set of nodes and relationships. Subgraphs can be combined with the set operators: intersection &, union |, difference -, and symmetric difference ^.

For a subgraph, labels() returns the set of all labels, keys() returns the set of all property keys, nodes returns the set of all nodes, and relationships returns the set of all relationships.

# Build a subgraph
s = a | b | ab
# Iterate over all nodes in the subgraph
for item in s.nodes:
    print('Node in s:', item)
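
The four operators behave like ordinary set operations. The sketch below imitates them with Python frozensets, modelling a node as a string and a relationship as a (start, type, end) tuple. It is only an analogy with made-up helper names (subgraph, combine) and makes no attempt to reproduce py2neo's details, such as how nodes attached to remaining relationships are treated.

```python
import operator

# Toy model: nodes are strings, relationships are (start, type, end) tuples.
ab = ("Alice", "KNOWS", "Bob")
bc = ("Bob", "LIKES", "Mike")


def subgraph(*rels):
    """Build a (nodes, relationships) pair of frozensets from relationships."""
    nodes = frozenset(n for start, _, end in rels for n in (start, end))
    return nodes, frozenset(rels)


def combine(op, left, right):
    """Apply one set operator to the node sets and to the relationship sets."""
    return op(left[0], right[0]), op(left[1], right[1])


g1, g2 = subgraph(ab), subgraph(bc)
union = combine(operator.or_, g1, g2)
common = combine(operator.and_, g1, g2)
only_g1 = combine(operator.sub, g1, g2)
either = combine(operator.xor, g1, g2)

print(sorted(union[0]))    # nodes in g1 | g2
print(sorted(common[0]))   # nodes in g1 & g2
print(sorted(only_g1[0]))  # nodes in g1 - g2
print(sorted(either[0]))   # nodes in g1 ^ g2
```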

Usually all the nodes and relationships of a graph are combined into one subgraph and written to the database in a single call, which is more efficient than writing individual nodes many times.

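# Connect to the Neo4j database with its address, user name and password
from py2neo import Graph
graph = Graph('http://localhost:7474', username='neo4j', password='123456')
# Merge the nodes and the relationship into one subgraph with the set operators, then write it to the database
s = a | b | ab
graph.create(s)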

3.3 Walkable

A Walkable is a subgraph with added traversal information, which makes it easy to traverse a graph.

Relationships are joined with the + operator to form a Walkable object, which can be traversed with the walk() function. The start_node, end_node, nodes and relationships properties return the start node, the end node, all nodes and all relationships.

# walk must be imported before use
from py2neo import walk
# bc and ac are assumed to be Relationship objects created in the same way as ab, involving a third node c
# Combine the relationships into a Walkable object w
w = ab + bc + ac
# Traverse w
for item in walk(w):
    print(item)

# Access the start and end nodes of w
print('Start node:', w.start_node, ' End node:', w.end_node)
# Access the nodes and relationships of w
print('Nodes:', w.nodes)
print('Relationships:', w.relationships)

The output of one run is:

(:Person {age: 20, name: 'Bob'})
(Bob)-[:KNOWS {}]->(Alice)
(:Person {age: 21, name: 'Alice'})
(Alice)-[:LIKES {}]->(Mike)
(:Person {name: 'Mike'})
(Bob)-[:KNOWS {}]->(Mike)
(:Person {age: 20, name: 'Bob'})
Start node: (:Person {age: 22, name: 'Bob', sex: 'female'})  End node: (:Person {age: 22, name: 'Bob', sex: 'female'})
Nodes: ((:Person {age: 22, name: 'Bob', sex: 'female'}), (:Person {age: 21, name: 'Alice'}), (:Person {name: 'Mike'}), (:Person {age: 22, name: 'Bob', sex: 'female'}))
Relationships: ((Bob)-[:KNOWS {time: '2019/09/03'}]->(Alice), (Alice)-[:LIKES {}]->(Mike), (Bob)-[:KNOWS {}]->(Mike))
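
The alternating node, relationship, node pattern in this output can be reproduced with a small generator. This is a pure-Python sketch, not py2neo's walk(). Relationships are modelled as (start, type, end) tuples, and the sketch assumes that a relationship may be followed against its direction as long as it touches the current node, which is what the last step of the run above (from Mike back to Bob) suggests.

```python
def walk_sketch(start, relationships):
    """Yield node, relationship, node, ... along the given relationships.

    Each relationship is a (start, type, end) tuple. It may be traversed
    against its direction, as long as it touches the current node.
    """
    current = start
    yield current
    for rel in relationships:
        rel_start, _, rel_end = rel
        if rel_start == current:
            current = rel_end
        elif rel_end == current:
            current = rel_start
        else:
            raise ValueError("%r does not touch %r" % (rel, current))
        yield rel
        yield current


ab = ("Bob", "KNOWS", "Alice")
bc = ("Alice", "LIKES", "Mike")
ac = ("Bob", "KNOWS", "Mike")

steps = list(walk_sketch("Bob", [ab, bc, ac]))
for step in steps:
    print(step)
```

The walk starts and ends at Bob, and the third relationship is traversed backwards, from its end node Mike to its start node Bob.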

3.4 Graph

py2neo operates on the Neo4j database through a Graph object. At present, Neo4j supports only one graph per database.

The Graph constructor connects to the database and creates the graph object.

graph.create() writes a subgraph to the database; it can also write a single node or relationship.

graph.delete() deletes the specified subgraph, and graph.delete_all() deletes everything in the graph.

graph.separate() deletes the specified relationship.

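# Connect to the Neo4j database; the arguments are the URL, the user name and the password
from py2neo import Graph
graph = Graph('http://localhost:7474', username='neo4j', password='123456')
# Write subgraph w
graph.create(w)
# Delete subgraph w
graph.delete(w)
# Delete everything in the graph
graph.delete_all()
# Delete relationship rel
graph.separate(rel)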

graph.match(nodes=None, r_type=None, limit=None) finds the relationships that meet the given conditions. The first parameter is a sequence or set of nodes, such as (start node, end node); if it is omitted, any nodes match. The second parameter is the relationship type, and the third limits the number of results returned. match_one() can be used instead to return a single result. For example, to find all the people that node a knows:

# Find all relationships of type KNOWS that start at a
res = graph.match((a, ), r_type="KNOWS")
# Print the end node of each relationship, that is, everyone a knows
for rel in res:
    print(rel.end_node["name"])
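
The filtering that such a call performs can be pictured with a pure-Python sketch over (start, type, end) tuples. The function match_sketch and the sample data are hypothetical. The sketch only covers the sequence form of the nodes argument, (start,) or (start, end), and does not reproduce py2neo's real matcher.

```python
from itertools import islice

# Hypothetical relationships as (start, type, end) tuples.
RELS = [
    ("Alice", "KNOWS", "Bob"),
    ("Alice", "KNOWS", "Mike"),
    ("Alice", "LIKES", "Mike"),
    ("Bob", "KNOWS", "Mike"),
]


def match_sketch(rels, nodes=None, r_type=None, limit=None):
    """Filter relationships by start node, end node, type and result count.

    nodes is None (any nodes), a 1-tuple (start,) or a 2-tuple (start, end).
    """
    def accept(rel):
        start, rel_type, end = rel
        if r_type is not None and rel_type != r_type:
            return False
        if nodes:
            if nodes[0] is not None and start != nodes[0]:
                return False
            if len(nodes) > 1 and nodes[1] is not None and end != nodes[1]:
                return False
        return True

    # islice with limit=None returns every accepted relationship.
    return list(islice(filter(accept, rels), limit))


known_by_alice = [end for _, _, end in match_sketch(RELS, ("Alice",), "KNOWS")]
print(known_by_alice)
```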

Use graph.nodes.match() to find specific nodes. The search can be refined further with first(), where(), order_by() and similar methods.

Nodes and relationships can also be looked up by their id.

# Find the nodes with label Person and name="Alice", and return the first result
graph.nodes.match("Person", name="Alice").first()

# Find all nodes with label Person whose name starts with B, ordered by age
res = graph.nodes.match("Person").where("_.name =~ 'B.*'").order_by('_.age')
for node in res:
    print(node['name'])

# Look up the node with id 4
t_node = graph.nodes[4]
# Look up the relationship with id 196
rel = graph.relationships[196]
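
What the second query selects can be mimicked on plain dictionaries. The sample records below are invented. The sketch uses re.fullmatch because the Cypher =~ operator has to match the whole string, so 'B.*' means "starts with B".

```python
import re

# Hypothetical node data: labels plus properties.
PEOPLE = [
    {"labels": {"Person"}, "name": "Bob", "age": 30},
    {"labels": {"Person"}, "name": "Alice", "age": 21},
    {"labels": {"Person"}, "name": "Bill", "age": 25},
    {"labels": {"Robot"}, "name": "Bender", "age": 4},
]

# Label filter plus regular expression on the name (the where() step).
selected = [
    p for p in PEOPLE
    if "Person" in p["labels"] and re.fullmatch(r"B.*", p["name"])
]
# Sort by age (the order_by() step).
selected.sort(key=lambda p: p["age"])

print([p["name"] for p in selected])
```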

Cypher statements can be executed through the Graph object, and the returned results processed.

graph.evaluate() executes a Cypher statement and returns the first value of the result.

# Execute a Cypher statement and return the first value of the result
res = graph.evaluate('MATCH (p:Person) RETURN p')
# Output: (_3:Person {age: 20, name: 'Bob'})
print(res)

graph.run() executes a Cypher statement and returns a Cursor over the stream of result records. Each call to forward() moves the cursor on to the next record of the result set.

# Query (p1)-[k]->(p2) and return all nodes and relationships
gql = "MATCH (p1:Person)-[k:KNOWS]->(p2:Person) RETURN *"
cursor = graph.run(gql)
# Keep moving the cursor forward
while cursor.forward():
    # Get and print the current record
    record = cursor.current
    print(record)

Each Record object printed looks like the one below. Its elements are a set of key=value pairs. A single element can be retrieved with get(key), and items(keys) returns the specified keys of the record as (key, value) tuples.

<Record k=(xiaowang)-[:KNOWS {}]->(xiaozhang) p1=(_96:Person {name: 'xiaowang'}) p2=(_97:Person {name: 'xiaozhang'})>

    record = cursor.current
    print('Via get:', record.get('k'))
    for (key, value) in record.items('p1', 'p2'):
        print('Via items:', key, ':', value)

# The output is as follows
'''
Via get: (xiaowang)-[:KNOWS {}]->(xiaozhang)
Via items: p1 : (_92:Person {name: 'xiaowang'})
Via items: p2 : (_93:Person {name: 'xiaozhang'})
'''
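
The forward-and-read pattern can be illustrated without a database. CursorSketch below is a toy class written for this article, not py2neo's Cursor, and plain dictionaries stand in for the records. It assumes that forward() returns the number of records moved, so that it is truthy while records remain and 0 at the end.

```python
class CursorSketch:
    """A toy forward-only cursor (not py2neo's Cursor class)."""

    def __init__(self, records):
        self._records = iter(records)
        self.current = None

    def forward(self):
        """Advance by one record; return 1 on success and 0 at the end."""
        try:
            self.current = next(self._records)
        except StopIteration:
            return 0
        return 1


# Plain dicts stand in for the records of a result set.
records = [
    {"p1": "xiaowang", "k": "KNOWS", "p2": "xiaozhang"},
    {"p1": "xiaozhang", "k": "KNOWS", "p2": "xiaozhao"},
]

cursor = CursorSketch(records)
seen = []
while cursor.forward():
    record = cursor.current
    # get(key) reads one element of the current record.
    seen.append((record.get("p1"), record.get("p2")))

print(seen)
```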

The results returned by graph.run() can also be converted into a list of dictionaries with the data() method: the result as a whole is a list, and each record is a dictionary. The query and its results are as follows:

# Query (p1)-[k]->(p2) and return all nodes and relationships
gql = "MATCH (p1:Person)-[k:KNOWS]->(p2:Person) RETURN *"
res = graph.run(gql).data()
print(res)

# The result is as follows:
'''
[{'k': (xiaowang)-[:KNOWS {}]->(xiaozhang), 
  'p1': (_196:Person {name: 'xiaowang'}), 
  'p2': (_197:Person {name: 'xiaozhang'})}, 
{'k': (xiaozhang)-[:KNOWS {}]->(xiaozhao), 
 'p1': (_197:Person {name: 'xiaozhang'}),
 'p2': (_198:Person {name: 'xiaozhao'})},
{'k': (xiaozhao)-[:KNOWS {}]->(xiaoli),
 'p1': (_198:Person {name: 'xiaozhao'}),
 'p2': (_199:Person {name: 'xiaoli'})}
]
'''
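
A list of dictionaries is easy to post-process with ordinary Python. In the sketch below, plain dicts stand in for the Node objects (which can be indexed by property name in the same way), and the rows repeat the names from the result above. It builds a "who knows whom" mapping from the rows.

```python
# Plain dicts stand in for the py2neo Node objects in each row.
rows = [
    {"p1": {"name": "xiaowang"}, "p2": {"name": "xiaozhang"}},
    {"p1": {"name": "xiaozhang"}, "p2": {"name": "xiaozhao"}},
    {"p1": {"name": "xiaozhao"}, "p2": {"name": "xiaoli"}},
]

# Group the names of the end nodes by the name of the start node.
knows = {}
for row in rows:
    knows.setdefault(row["p1"]["name"], []).append(row["p2"]["name"])

print(knows)
```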

The to_subgraph() method, called as graph.run().to_subgraph(), converts the returned result into a Subgraph object. Node objects can then be obtained in the same way as for any Subgraph, and handled like the Node objects described earlier.

# Query (p1)-[k]->(p2) and return all nodes and relationships
gql = "MATCH (p1:Person)-[k:KNOWS]->(p2:Person) RETURN *"
sub_graph = graph.run(gql).to_subgraph()
# Get all node objects in the subgraph and print them
nodes = sub_graph.nodes
for node in nodes:
    print(node)

# The node objects printed are as follows:
'''
(_101:Person {name: 'xiaozhang'})
(_100:Person {name: 'xiaowang'})
(_103:Person {name: 'xiaoli'})
(_102:Person {name: 'xiaozhao'})
'''

3.5 OGM

Object-Graph Mapping (OGM) maps nodes in the graph database to Python objects, so that nodes can be accessed and manipulated as objects.

Each label in the graph is defined as a Python class that inherits from GraphObject (remember to import it before use). In the class definition you can specify the primary key and declare properties with Property(), labels with Label(), and relationships with RelatedTo() and RelatedFrom().

from py2neo.ogm import GraphObject, Property, RelatedTo, RelatedFrom, Label
class Person(GraphObject):
    # Define the primary key
    __primarykey__ = 'name'
    # Define the properties of the class
    name = Property()
    age = Property()
    # Define a label of the class
    student = Label()
    # Define a relationship pointing out from Person
    knows = RelatedTo('Person', 'KNOWS')
    # Define a relationship pointing to Person
    known = RelatedFrom('Person', 'KNOWN')
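
How class attributes can be mapped onto a node's properties and labels is easiest to see in a stripped-down imitation built on Python descriptors. Everything below (PropertySketch, LabelSketch, ObjectSketch, and the rule that the label name is the capitalised attribute name) is an assumption made for this sketch; it is not how py2neo is implemented.

```python
class PropertySketch:
    """Descriptor that stores an attribute in the owner's property dict."""

    def __set_name__(self, owner, name):
        self.key = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.node["properties"].get(self.key)

    def __set__(self, obj, value):
        obj.node["properties"][self.key] = value


class LabelSketch:
    """Descriptor that exposes the presence of a label as a bool."""

    def __set_name__(self, owner, name):
        # Assumption of this sketch: label name = capitalised attribute name.
        self.label = name.capitalize()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.label in obj.node["labels"]

    def __set__(self, obj, value):
        if value:
            obj.node["labels"].add(self.label)
        else:
            obj.node["labels"].discard(self.label)


class ObjectSketch:
    """Base class holding the underlying 'node' as a plain dict."""

    def __init__(self):
        self.node = {"labels": {type(self).__name__}, "properties": {}}


class Person(ObjectSketch):
    name = PropertySketch()
    age = PropertySketch()
    student = LabelSketch()


p = Person()
p.name = "Durant"
p.age = 28
before = p.student      # the label is absent by default
p.student = True        # setting True adds the label
print(before, p.student, sorted(p.node["labels"]), p.node["properties"])
```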

The class method wrap() turns an ordinary node into an object of the class.

The class method match(graph, primary_key) finds the nodes in the graph whose primary key has the value primary_key.

An object can be created directly with the class constructor, its properties and methods can be accessed directly, and relationships can be added with the add() method of the relationship attribute.

A label attribute of the class is a bool that defaults to False. Setting it to True adds the label to the object.

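# Convert node c (assumed to be an existing Node with the label Person) into an OGM object
c = Person.wrap(c)
print(c.name)
# Find the node of class Person whose primary key (name) is Alice
ali = Person.match(graph, 'Alice').first()
# Create a new Person object and assign its properties
new_person = Person()
new_person.name = 'Durant'
new_person.age = 28
# A label value defaults to False
print(new_person.student)
# Set the bool to True to add the student label to the object
new_person.student = True
# Write the new object to the database
graph.push(new_person)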

When defining a node class you can also define its relationships: RelatedTo() declares relationships that point out from the node, and RelatedFrom() declares relationships that point to it. The relationship attribute of an object then provides methods for working with the relationships around that node: add() adds a relationship, clear() removes all relationships of the node, get() reads a relationship property, remove() removes a specified relationship, and update() updates a relationship.

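class Person(GraphObject):
    # Define a relationship pointing out from Person
    knows = RelatedTo('Person', 'KNOWS')
    # Define a relationship pointing to Person
    known = RelatedFrom('Person', 'KNOWN')

# Each call below is shown on its own, as an illustration of one method
# Add a relationship from ali to new_person
ali.knows.add(new_person)
# Remove all knows relationships of ali
ali.knows.clear()
# Remove the knows relationship from ali to new_person
ali.knows.remove(new_person)
# Update a property of the relationship from ali to new_person
ali.knows.update(new_person, year=5)
# Get the value of the year property of the relationship from ali to new_person
ali.knows.get(new_person, 'year')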

By passing the graph object to the class method match(), you can also match nodes of the class:

# Get the first Person object whose primary key name is Alice
ali = Person.match(graph, 'Alice').first()
# Get all Person objects whose name starts with B
Person.match(graph).where("_.name =~ 'B.*'")

You can also operate on the node object through the graph:

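# Update the data of the ali node in the graph from the local object
graph.push(ali)
# Update the local ali object with the data in the graph
graph.pull(ali)
# Delete the ali node from the graph
graph.delete(ali)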
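
The direction of each of the three operations is easy to mix up, so here is a toy model in which a dictionary stands in for the database. The functions push, pull and delete and the class LocalPerson are written for this sketch only; they just mirror the idea that push copies local state to the store, pull copies stored state to the local object, and delete removes the stored copy.

```python
# A dict keyed by primary key stands in for the database.
database = {"Alice": {"name": "Alice", "age": 21}}


class LocalPerson:
    """Toy local object holding a copy of one node's properties."""

    def __init__(self, name):
        self.name = name
        self.properties = {"name": name}


def pull(store, obj):
    """Overwrite the local copy with what the store holds."""
    obj.properties = dict(store[obj.name])


def push(store, obj):
    """Overwrite the stored copy with the local state."""
    store[obj.name] = dict(obj.properties)


def delete(store, obj):
    """Remove the stored copy; the local object itself is left alone."""
    store.pop(obj.name, None)


ali = LocalPerson("Alice")
pull(database, ali)                  # the local copy now has age 21
ali.properties["age"] = 22           # a local change only
age_before_push = database["Alice"]["age"]
push(database, ali)                  # the change reaches the store
age_after_push = database["Alice"]["age"]
delete(database, ali)

print(age_before_push, age_after_push, "Alice" in database)
```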

theVicTory
