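I. Introduction to Neo4j
Neo4j is an open-source NoSQL graph database implemented in Java. Development began in 2003 and the first version was released in 2007; the version installed in this article is 3.3.5. Neo4j has been adopted by hundreds of thousands of companies and organizations across many industries.
Neo4j stores a graph data model at the level of a professional database. Unlike ordinary graph-processing tools or in-memory databases, Neo4j offers full database features, including ACID transactions, clustering, backup, and failover, which makes it suitable for enterprise production environments.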
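Example: a character relationship graph for Dream of the Red Chamber
Neo4j tutorial: Neo4j (W3CSchool tutorial)
Neo4j editions:
- Enterprise Edition: requires a costly paid license and provides features such as high availability and hot backup.
- Community Edition (open source): free to use, but runs only as a single instance.
Core concepts of the Neo4j graph database: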
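Nodes
- Nodes are the main data elements. A node connects to other nodes through relationships and can have one or more properties (stored as key/value pairs).
- A node can have one or more labels that describe its role in the graph. Example: a Person node.
- As an analogy with a relational database, a node corresponds to one row (record), its label to the table name, and its properties to the columns.
Relationships
- A relationship connects two nodes and has a direction. A relationship can have one or more properties (stored as key/value pairs).
Properties
- A property is a named value whose name (key) is a string. Properties can be indexed and constrained, and a composite index can be built from several properties.
Labels
- Labels group nodes into sets. A node can have several labels, and labels are indexed to speed up finding nodes in the graph.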
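The four concepts above can be sketched as plain Python data structures. This is only an illustration of the property graph model, not how Neo4j stores data internally; the class names and the sample values (Alice, Bob, KNOWS) are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass
class Node:
    """A node: a set of labels plus key/value properties."""
    labels: FrozenSet[str]
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge from `start` to `end` with its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


alice = Node(frozenset({"Person"}), {"name": "Alice", "age": 30})
bob = Node(frozenset({"Person", "Employee"}), {"name": "Bob"})
knows = Relationship(alice, "KNOWS", bob, {"since": 2020})

# A label acts as a named set of nodes, which is what makes it useful for lookups.
people = [n for n in (alice, bob) if "Person" in n.labels]
print([n.properties["name"] for n in people])
```

Note that the relationship carries a start and an end node, so direction is part of the data, while a node can belong to several label sets at once.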
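II. Installing the Neo4j graph database
Neo4j depends on Java, so install a JDK first.
Installation steps for Neo4j:
- Step 1: add the Neo4j package information to the yum repository list.
- Step 2: install with the yum install command.
- Step 3: edit the configuration file /etc/neo4j/neo4j.conf.
- Step 4: start the Neo4j database.
1. Step 1: add the Neo4j package information to the yum repository list
On CentOS the Neo4j yum repository has to be set up manually.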
(base) [root@whx ~]# cd /tmp
(base) [root@whx tmp]# wget http://debian.neo4j.org/neotechnology.gpg.key
(base) [root@whx tmp]# sudo rpm --import neotechnology.gpg.key
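- cd /tmp changes to the system tmp directory;
- wget then downloads the package signing key neotechnology.gpg.key into the current directory;
- sudo rpm --import neotechnology.gpg.key then imports that key into the system, so that yum can verify the Neo4j packages.
2. Step 2: use a text editor to create /etc/yum.repos.d/neo4j.repo with the following content: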
(base) [root@whx ~]# vim /etc/yum.repos.d/neo4j.repo
[neo4j]
name=Neo4j RPM Repository
baseurl=http://yum.neo4j.org/stable
enabled=1
gpgcheck=1
3. Step 3: install Neo4j with the yum command.
yum install neo4j-3.3.5
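Neo4j is now installed on CentOS. The installation uses these paths:
- Neo4j installation directory: /usr/share/neo4j
- Neo4j configuration directory: /etc/neo4j
- Default directory for Neo4j database files: /var/lib/neo4j
To start the database, change to /usr/share/neo4j/bin and run neo4j start.
4. Step 4: edit the configuration file
The configuration file is /etc/neo4j/neo4j.conf by default. For convenience, the modified settings are listed here first:
# Storage locations for the database, logs, and so on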
dbms.directories.data=/var/lib/neo4j/data
dbms.directories.plugins=/var/lib/neo4j/plugins
dbms.directories.certificates=/var/lib/neo4j/certificates
dbms.directories.logs=/var/log/neo4j
dbms.directories.lib=/usr/share/neo4j/lib
dbms.directories.run=/var/run/neo4j
# Import directory
dbms.directories.import=/var/lib/neo4j/import
# Initial heap size
dbms.memory.heap.initial_size=512m
# Bolt connector address
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=0.0.0.0:7687
The complete configuration file after the changes:
#*****************************************************************
# Neo4j configuration
#
# For more details and a complete list of settings, please see
# https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
#*****************************************************************
# The name of the database to mount
#dbms.active_database=graph.db
# Paths of directories in the installation.
# Storage locations for the database, logs, and so on
dbms.directories.data=/var/lib/neo4j/data
dbms.directories.plugins=/var/lib/neo4j/plugins
dbms.directories.certificates=/var/lib/neo4j/certificates
dbms.directories.logs=/var/log/neo4j
dbms.directories.lib=/usr/share/neo4j/lib
dbms.directories.run=/var/run/neo4j
# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
# Import directory
dbms.directories.import=/var/lib/neo4j/import
# Whether requests to Neo4j are authenticated.
# To disable authentication, uncomment this line
#dbms.security.auth_enabled=false
# Enable this to be able to upgrade a store from an older version.
#dbms.allow_upgrade=true
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.
dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=10g
# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the max Java heap size.
#dbms.memory.pagecache.size=10g
#*****************************************************************
# Network connector configuration
#*****************************************************************
# With default configuration Neo4j only accepts local connections.
# To accept non-local connections, uncomment this line:
dbms.connectors.default_listen_address=0.0.0.0
# You can also choose a specific network interface, and configure a non-default
# port for each connector, by setting their individual listen_address.
# The address at which this server can be reached by its clients. This may be the server's IP address or DNS name, or
# it may be the address of a reverse proxy which sits in front of the server. This setting may be overridden for
# individual connectors below.
dbms.connectors.default_advertised_address=0.0.0.0
# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.
# Bolt connector
# Bolt connector address
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=0.0.0.0:7687
# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474
# HTTPS Connector. There can be zero or one HTTPS connectors.
#dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=:7473
# Number of Neo4j worker threads.
#dbms.threads.worker_count=
#*****************************************************************
# SSL system configuration
#*****************************************************************
# Names of the SSL policies to be used for the respective components.
# The legacy policy is a special policy which is not defined in
# the policy configuration section, but rather derives from
# dbms.directories.certificates and associated files
# (by default: neo4j.key and neo4j.cert). Its use will be deprecated.
# The policies to be used for connectors.
#
# N.B: Note that a connector must be configured to support/require
# SSL/TLS for the policy to actually be utilized.
#
# see: dbms.connector.*.tls_level
#bolt.ssl_policy=legacy
#https.ssl_policy=legacy
#*****************************************************************
# SSL policy configuration
#*****************************************************************
# Each policy is configured under a separate namespace, e.g.
# dbms.ssl.policy.<policyname>.*
#
# The example settings below are for a new policy named 'default'.
# The base directory for cryptographic objects. Each policy will by
# default look for its associated objects (keys, certificates, ...)
# under the base directory.
#
# Every such setting can be overriden using a full path to
# the respective object, but every policy will by default look
# for cryptographic objects in its base location.
#
# Mandatory setting
#dbms.ssl.policy.default.base_directory=certificates/default
# Allows the generation of a fresh private key and a self-signed
# certificate if none are found in the expected locations. It is
# recommended to turn this off again after keys have been generated.
#
# Keys should in general be generated and distributed offline
# by a trusted certificate authority (CA) and not by utilizing
# this mode.
#dbms.ssl.policy.default.allow_key_generation=false
# Enabling this makes it so that this policy ignores the contents
# of the trusted_dir and simply resorts to trusting everything.
#
# Use of this mode is discouraged. It would offer encryption but no security.
#dbms.ssl.policy.default.trust_all=false
# The private key for the default SSL policy. By default a file
# named private.key is expected under the base directory of the policy.
# It is mandatory that a key can be found or generated.
#dbms.ssl.policy.default.private_key=
# The private key for the default SSL policy. By default a file
# named public.crt is expected under the base directory of the policy.
# It is mandatory that a certificate can be found or generated.
#dbms.ssl.policy.default.public_certificate=
# The certificates of trusted parties. By default a directory named
# 'trusted' is expected under the base directory of the policy. It is
# mandatory to create the directory so that it exists, because it cannot
# be auto-created (for security purposes).
#
# To enforce client authentication client_auth must be set to 'require'!
#dbms.ssl.policy.default.trusted_dir=
# Client authentication setting. Values: none, optional, require
# The default is to require client authentication.
#
# Servers are always authenticated unless explicitly overridden
# using the trust_all setting. In a mutual authentication setup this
# should be kept at the default of require and trusted certificates
# must be installed in the trusted_dir.
#dbms.ssl.policy.default.client_auth=require
# A comma-separated list of allowed TLS versions.
# By default only TLSv1.2 is allowed.
#dbms.ssl.policy.default.tls_versions=
# A comma-separated list of allowed ciphers.
# The default ciphers are the defaults of the JVM platform.
#dbms.ssl.policy.default.ciphers=
#*****************************************************************
# Logging configuration
#*****************************************************************
# To enable HTTP logging, uncomment this line
#dbms.logs.http.enabled=true
# Number of HTTP logs to keep.
#dbms.logs.http.rotation.keep_number=5
# Size of each HTTP log that is kept.
#dbms.logs.http.rotation.size=20m
# To enable GC Logging, uncomment this line
#dbms.logs.gc.enabled=true
# GC Logging Options
# see http://docs.oracle.com/cd/E19957-01/819-0084-10/pt_tuningjava.html#wp57013 for more information.
#dbms.logs.gc.options=-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution
# Number of GC logs to keep.
#dbms.logs.gc.rotation.keep_number=5
# Size of each GC log that is kept.
#dbms.logs.gc.rotation.size=20m
# Size threshold for rotation of the debug log. If set to zero then no rotation will occur. Accepts a binary suffix "k",
# "m" or "g".
#dbms.logs.debug.rotation.size=20m
# Maximum number of history files for the internal log.
#dbms.logs.debug.rotation.keep_number=7
#*****************************************************************
# Miscellaneous configuration
#*****************************************************************
# Enable this to specify a parser other than the default one.
#cypher.default_language_version=3.0
# Determines if Cypher will allow using file URLs when loading data using
# `LOAD CSV`. Setting this value to `false` will cause Neo4j to fail `LOAD CSV`
# clauses that load data from the file system.
#dbms.security.allow_csv_import_from_file_urls=true
# Value of the Access-Control-Allow-Origin header sent over any HTTP or HTTPS
# connector. This defaults to '*', which allows broadest compatibility. Note
# that any URI provided here limits HTTP/HTTPS access to that URI only.
#dbms.security.http_access_control_allow_origin=*
# Value of the HTTP Strict-Transport-Security (HSTS) response header. This header
# tells browsers that a webpage should only be accessed using HTTPS instead of HTTP.
# It is attached to every HTTPS response. Setting is not set by default so
# 'Strict-Transport-Security' header is not sent. Value is expected to contain
# dirictives like 'max-age', 'includeSubDomains' and 'preload'.
#dbms.security.http_strict_transport_security=
# Retention policy for transaction logs needed to perform recovery and backups.
dbms.tx_log.rotation.retention_policy=1 days
# Enable a remote shell server which Neo4j Shell clients can log in to.
#dbms.shell.enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
#dbms.shell.host=127.0.0.1
# The port the shell will listen on, default is 1337.
#dbms.shell.port=1337
# Only allow read operations from this Neo4j instance. This mode still requires
# write access to the directory for lock purposes.
#dbms.read_only=false
# Comma separated list of JAX-RS packages containing JAX-RS resources, one
# package name for each mountpoint. The listed package names will be loaded
# under the mountpoints specified. Uncomment this line to mount the
# org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
# neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
# http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
#dbms.unmanaged_extension_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged
#********************************************************************
# JVM Parameters
#********************************************************************
# G1GC generally strikes a good balance between throughput and tail
# latency, without too much tuning.
dbms.jvm.additional=-XX:+UseG1GC
# Have common exceptions keep producing stack traces, so they can be
# debugged regardless of how often logs are rotated.
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
# Make sure that `initmemory` is not only allocated, but committed to
# the process, before starting the database. This reduces memory
# fragmentation, increasing the effectiveness of transparent huge
# pages. It also reduces the possibility of seeing performance drop
# due to heap-growing GC events, where a decrease in available page
# cache leads to an increase in mean IO response time.
# Try reducing the heap memory, if this flag degrades performance.
dbms.jvm.additional=-XX:+AlwaysPreTouch
# Trust that non-static final fields are really final.
# This allows more optimizations and improves overall performance.
# NOTE: Disable this if you use embedded mode, or have extensions or dependencies that may use reflection or
# serialization to change the value of final fields!
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
# Disable explicit garbage collection, which is occasionally invoked by the JDK itself.
dbms.jvm.additional=-XX:+DisableExplicitGC
# Remote JMX monitoring, uncomment and adjust the following lines as needed. Absolute paths to jmx.access and
# jmx.password files are required.
# Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
# the shipped configuration contains only a read only role called 'monitor' with password 'Neo4j'.
# For more details, see: http://download.oracle.com/javase/8/docs/technotes/guides/management/agent.html
# On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
# and have permissions set to 0600.
# For details on setting these file permissions on Windows see:
# http://docs.oracle.com/javase/8/docs/technotes/guides/management/security-windows.html
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=true
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.password.file=/absolute/path/to/conf/jmx.password
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.access.file=/absolute/path/to/conf/jmx.access
# Some systems cannot discover host name automatically, and need this line configured:
#dbms.jvm.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME
# Expand Diffie Hellman (DH) key size from default 1024 to 2048 for DH-RSA cipher suites used in server TLS handshakes.
# This is to protect the server from any potential passive eavesdropping.
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048
# This mitigates a DDoS vector.
dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true
#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
# using this configuration file has been installed as a service.
# Please uninstall the service before modifying this section. The
# service can then be reinstalled.
# Name of the service
dbms.windows_service_name=neo4j
#********************************************************************
# Other Neo4j system properties
#********************************************************************
dbms.jvm.additional=-Dunsupported.dbms.udc.source=rpm
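Because most lines in the file are comments, it can be handy to list only the settings that are actually active. The following is a minimal sketch, assuming the plain key=value format shown above; `parse_neo4j_conf` is a hypothetical helper name, and the sample text is a shortened excerpt rather than the full file.

```python
from collections import defaultdict
from typing import Dict, List


def parse_neo4j_conf(text: str) -> Dict[str, List[str]]:
    """Collect active key=value settings; blank and commented-out lines are ignored."""
    settings: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # keep every occurrence, since some keys are repeated
            settings[key.strip()].append(value.strip())
    return dict(settings)


sample = """
# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
#dbms.memory.heap.max_size=10g
dbms.jvm.additional=-XX:+UseG1GC
dbms.jvm.additional=-XX:+AlwaysPreTouch
"""

conf = parse_neo4j_conf(sample)
print(conf["dbms.connector.bolt.listen_address"])
print(conf["dbms.jvm.additional"])
```

Values are kept in a list per key because dbms.jvm.additional appears several times in the file. To inspect a real installation, pass the contents of /etc/neo4j/neo4j.conf instead of the sample string.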
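5. Step 5: start the Neo4j database
Start the graph database and check its status:
neo4j start
neo4j status
Terminal output like the following indicates a successful start: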
(base) [root@whx ~]# neo4j start
Active database: graph.db
Directories in use:
home: /var/lib/neo4j
config: /etc/neo4j
logs: /var/log/neo4j/
plugins: /var/lib/neo4j/plugins
import: /var/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/run/neo4j
Starting Neo4j.
Started neo4j (pid 5246). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /var/log/neo4j//neo4j.log for current status.
(base) [root@whx ~]# neo4j status
Neo4j is running at pid 5246
(base) [root@whx ~]#
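6. Remote access to the Neo4j browser UI at x.x.x.x:7474/browser
In neo4j.conf (the same listen_address settings as in the configuration file above):
dbms.connector.bolt.listen_address=0.0.0.0:7687
dbms.connector.http.listen_address=0.0.0.0:7474
Open the corresponding firewall ports: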
firewall-cmd --zone=public --permanent --add-port=7474/tcp
firewall-cmd --zone=public --permanent --add-port=7687/tcp
firewall-cmd --reload # do not forget this step: --permanent rules take effect only after a reload
firewall-cmd --list-ports # check that the ports were opened
(base) [root@whx ~]# firewall-cmd --list-ports
20/tcp 21/tcp 22/tcp 80/tcp 8888/tcp 39000-40000/tcp 888/tcp 7474/tcp 7687/tcp
(base) [root@whx ~]#
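firewall-cmd only shows the server-side rules. To confirm from a client machine that the two ports are reachable, a plain TCP connection test is enough. This is a small sketch using only the standard library; `port_is_open` is a hypothetical helper, and the address in the example is a placeholder to replace with the server's IP.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    server = "127.0.0.1"  # placeholder: replace with the server's IP address
    for port in (7474, 7687):  # HTTP and Bolt ports from neo4j.conf
        state = "open" if port_is_open(server, port) else "closed or filtered"
        print(f"{server}:{port} is {state}")
```

A successful connection only shows that something is listening on the port; it does not verify that the listener is Neo4j.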
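Enter http://<server-ip>:7474/browser/ in your browser's address bar to open the Neo4j browser.
7. Log in to the Neo4j browser:
- URL: http://<server-ip>:7474 (0.0.0.0 in the configuration only means "listen on all interfaces")
- Connect URL: bolt://<server-ip>:7687
- Username: neo4j
- Password: neo4j (default)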
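Section summary:
- Installation steps for Neo4j:
  - Step 1: add the Neo4j package information to the yum repository list.
  - Step 2: install with the yum install command.
  - Step 3: edit the configuration file /etc/neo4j/neo4j.conf.
  - Step 4: start the Neo4j database.
- Logging in to the Neo4j browser:
  - URL: http://<server-ip>:7474
  - Connect URL: bolt://<server-ip>:7687
  - Username: neo4j
  - Password: neo4j (default)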
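III. Introduction to Cypher and its use
Basic concept: Cypher is the query language of the Neo4j graph database, comparable to SQL for MySQL, and it allows expressive and efficient querying and updating of graphs.
Basic Cypher commands and syntax:
- create command
- match command
- merge command
- relationship commands
- where command
- delete command
- sort command
- string functions
- aggregate functions
- index commands
1. create command: create nodes in the graph
1.1 Create syntax, form one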
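CREATE (e:Employee {id:222, name:'Bob', salary:6000, deptno:12})
- create is the keyword, e is the variable name of the node (a node is comparable to one record in a MySQL table), and Employee is the node label (comparable to a MySQL table). e and Employee go inside parentheses (), separated by a colon.
- The properties of the node (comparable to the column names of a MySQL table) go inside braces {}, written as property name: property value, with a comma between properties.
- The command above creates a node e with the label Employee and the four properties id, name, salary, and deptno.
- The variable e is temporary and exists only within the current statement; the label Employee is what is actually stored in the graph database.
- If you do not operate on the node afterwards, you can omit the variable, as in this create statement:
create(:程序员 { name:"小东",age:23,birthday:"1995/12/06"})
>> Added 1 label, created 1 node, set 3 properties, completed after 21 ms.
- To operate on a matched or newly created node you must give it a variable, because you can only act on a node you hold a reference to. For example, to return the result of a lookup:
match(person:程序员) where person.name="小东" return person
╒══════════════════════════════════════════════╕
│"person" │
╞══════════════════════════════════════════════╡
│{ "birthday":"1995/12/06","name":"小东","age":23}│
└──────────────────────────────────────────────┘
match(p1:程序员) where p1.name="小东" return p1
╒══════════════════════════════════════════════╕
│"p1" │
╞══════════════════════════════════════════════╡
│{ "birthday":"1995/12/06","name":"小东","age":23}│
└──────────────────────────────────────────────┘
As the examples show, the variable name is arbitrary. It behaves like a variable name in Python: it refers to an object that you can then operate on.
So the variable is optional: write it when you need to operate on the node, and think of it as the name bound to that node.
The label is syntactically optional as well, but in practice you should always give one: labels are how Neo4j categorizes nodes, and queries search by them.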
1.2 Create syntax, form two
CREATE (e:Employee) SET e.id=222, e.name='Bob', e.salary=6000, e.deptno=12
For example:
CREATE (e:Employee) SET e.id=222, e.name='Bob', e.salary=6000, e.deptno=12 RETURN e
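When the same CREATE statement is issued from a program, the property values are usually passed as Cypher parameters ($id, $name, ...) rather than pasted into the query text. The sketch below assumes a neo4j-driver session, whose run method accepts a query plus keyword parameters; `create_employee` and `PrintingSession` are hypothetical names, and the stand-in session exists only so the example runs without a database.

```python
from typing import Any, Dict

CREATE_EMPLOYEE = (
    "CREATE (e:Employee {id: $id, name: $name, salary: $salary, deptno: $deptno}) "
    "RETURN e"
)


def create_employee(session: Any, employee: Dict[str, Any]) -> Any:
    """Run the CREATE statement, passing the values as Cypher parameters.

    `session` is assumed to be a neo4j-driver session; any object with a
    compatible run(query, **parameters) method works.
    """
    return session.run(CREATE_EMPLOYEE, **employee)


class PrintingSession:
    """Stand-in session (an assumption for this sketch): it only echoes its input."""

    def run(self, query: str, **parameters: Any) -> Dict[str, Any]:
        print(query)
        print(parameters)
        return {"query": query, "parameters": parameters}


bob = {"id": 222, "name": "Bob", "salary": 6000, "deptno": 12}
sent = create_employee(PrintingSession(), bob)
```

The query text stays constant while only the parameter dictionary changes, which also avoids quoting problems with names that contain apostrophes.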
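2. match command: match (query) existing data.
# The match command is used for matching queries. The variable:label pair again goes inside parentheses, and a return clause returns the result, much like SQL.
MATCH (e:Employee) RETURN e.id, e.name, e.salary, e.deptno
3. merge command: if the node exists, merge behaves like match; if it does not exist, merge behaves like create.
MERGE (e:Employee {id:146, name:'Lucer', salary:3500, deptno:16})
Running the same merge again adds nothing to the database: identical data already exists, so merge simply matches it.
MERGE (e:Employee {id:146, name:'Lucer', salary:3500, deptno:16})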
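The match-or-create behavior can be pictured with a small in-memory model. This is only a sketch of the semantics, not Neo4j's implementation; `merge_node` and the list-of-dicts store are invented for the illustration. It also shows a common surprise: merge compares the whole pattern, so a node that differs in a single property counts as "not found".

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def merge_node(store: List[Node], label: str, props: Dict[str, Any]) -> Node:
    """Return the node matching the label and all given properties, creating it if absent."""
    for node in store:
        if node["label"] == label and all(
            node["props"].get(key) == value for key, value in props.items()
        ):
            return node  # behaves like MATCH
    node = {"label": label, "props": dict(props)}
    store.append(node)  # behaves like CREATE
    return node


store: List[Node] = []
lucer = {"id": 146, "name": "Lucer", "salary": 3500, "deptno": 16}
merge_node(store, "Employee", lucer)
merge_node(store, "Employee", lucer)  # same pattern: nothing is added
print(len(store))

# A different salary no longer matches the whole pattern, so a second node appears.
merge_node(store, "Employee", {**lucer, "salary": 4000})
print(len(store))
```

For that reason a merge is often written against an identifying property only, with the remaining properties set afterwards.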
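4. Creating relationships with create:
A relationship created with create must have a direction; otherwise an error is raised.
# Create a directed relationship from node p1 to node p2. The relationship r has the type Buy, meaning p1 bought p2, and it points from p1 to p2.
CREATE (p1:Profile1)-[r:Buy]->(p2:Profile2)
5. Creating relationships with merge
With merge the pattern may be written with or without a direction.
# Merge a relationship of type miss between p1 and p2 without giving a direction. Neo4j still stores every relationship with a direction: an undirected merge matches either direction and, if it has to create the relationship, picks one.
MERGE (p1:Profile1)-[r:miss]-(p2:Profile2)
6. where command: similar to adding a query condition in SQL
# Find the Employee node whose id equals 123
MATCH (e:Employee) WHERE e.id=123 RETURN e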
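From Python, a where condition like this is normally driven by a parameter, and the result is read record by record. The sketch below assumes a neo4j-driver session whose run result can be iterated and indexed by column name; `find_employee` and `FakeSession` are hypothetical, and the fake session only imitates the id filter so the example runs without a server.

```python
from typing import Any, Dict, List

FIND_BY_ID = (
    "MATCH (e:Employee) WHERE e.id = $id "
    "RETURN e.name AS name, e.salary AS salary"
)


def find_employee(session: Any, employee_id: int) -> List[Dict[str, Any]]:
    """Return the matching rows as plain dicts; the id travels as a parameter."""
    result = session.run(FIND_BY_ID, id=employee_id)
    return [{"name": record["name"], "salary": record["salary"]} for record in result]


class FakeSession:
    """Stand-in for a driver session, backed by a small in-memory list."""

    employees = [
        {"id": 222, "name": "Bob", "salary": 6000},
        {"id": 146, "name": "Lucer", "salary": 3500},
    ]

    def run(self, query: str, **parameters: Any) -> List[Dict[str, Any]]:
        # Only emulates the WHERE e.id = $id filter of FIND_BY_ID.
        return [row for row in self.employees if row["id"] == parameters["id"]]


print(find_employee(FakeSession(), 146))
print(find_employee(FakeSession(), 999))
```

An id that matches nothing yields an empty list rather than an error, which mirrors how a match with no hits returns zero rows.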
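7. delete command: delete nodes/relationships and their properties.
# Note: when deleting a node, its relationships must be deleted as well
MATCH (c1:CreditCard)-[r]-(c2:Customer) DELETE c1, r, c2
Deleting a node that still has relationships raises an error:
match(t:Teacher) delete t
Delete the relationships together with the nodes instead:
match(s:Student)-[r]-(t:Teacher) delete r,s,t
Or use detach delete, which removes the node and its relationships in one step:
match(t:Teacher) detach delete t
To clear the whole database quickly:
MATCH (n) DETACH DELETE n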
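The difference between delete and detach delete can be illustrated with a toy in-memory graph. This is a sketch of the rule only, under the assumption that a plain delete must refuse while relationships remain; `TinyGraph` and the node and relationship names are invented for the example.

```python
from typing import List, Set, Tuple


class TinyGraph:
    """In-memory toy graph used only to illustrate the two delete modes."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: List[Tuple[str, str, str]] = []  # (start, type, end)

    def delete(self, node: str) -> None:
        """Like DELETE: refuse while the node still has relationships."""
        if any(node in (start, end) for start, _, end in self.edges):
            raise ValueError(f"{node} still has relationships")
        self.nodes.discard(node)

    def detach_delete(self, node: str) -> None:
        """Like DETACH DELETE: drop the node's relationships, then the node."""
        self.edges = [edge for edge in self.edges if node not in (edge[0], edge[2])]
        self.nodes.discard(node)


g = TinyGraph()
g.nodes.update({"teacher", "student"})
g.edges.append(("student", "LEARNS_FROM", "teacher"))

try:
    g.delete("teacher")
except ValueError as err:
    print("delete failed:", err)

g.detach_delete("teacher")
print(g.nodes, g.edges)
```

The guard in delete is what prevents relationships from being left dangling without one of their end nodes.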
8. sort command: Cypher sorts with order by
# Match the label Employee and return all results sorted by id in ascending order
MATCH (e:Employee) RETURN e.id, e.name, e.salary, e.deptno ORDER BY e.id
# For descending order, append DESC, for example ORDER BY e.salary DESC
MATCH (e:Employee) RETURN e.id, e.name, e.salary, e.deptno ORDER BY e.salary DESC
9. String functions:
- toUpper() function
- toLower() function
- substring() function
- replace() function
9.1 toUpper() function
Converts an input string to upper case.
MATCH (e:Employee) RETURN e.id, toUpper(e.name), e.salary, e.deptno
9.2 toLower() function
Converts an input string to lower case.
MATCH (e:Employee) RETURN e.id, toLower(e.name), e.salary, e.deptno
9.3 substring() function
Returns a substring.
# For the input string input_str, return the substring that starts at the 0-based index start_index and is length characters long (the third argument is a length, not an end index)
substring(input_str, start_index, length)
# Example: return the first two letters of each employee's name
MATCH (e:Employee) RETURN e.id, substring(e.name,0,2), e.salary, e.deptno
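In Cypher the third argument of substring() is a length rather than an end index, which matters as soon as the start index is not 0. A Python equivalent makes the semantics concrete; `cypher_substring` is a hypothetical helper written for this comparison.

```python
from typing import Optional


def cypher_substring(original: str, start: int, length: Optional[int] = None) -> str:
    """Mimic Cypher's substring(original, start [, length]) with a 0-based start."""
    if length is None:
        return original[start:]
    return original[start:start + length]


print(cypher_substring("Lucer", 0, 2))
print(cypher_substring("Lucer", 1, 2))
print(cypher_substring("Lucer", 2))
```

With start 1 and length 2 the result has two characters; reading the third argument as an end index would wrongly predict a single character.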
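9.4 replace() function
Replaces a substring.
# In the input string input_str, replace every part that matches origin_str with new_str
replace(input_str, origin_str, new_str)
# Example: append the suffix _HelloWorld to each employee's name
MATCH (e:Employee) RETURN e.id, replace(e.name,e.name,e.name + "_HelloWorld"), e.salary, e.deptno
10. Aggregate functions
- count() function
- max() function
- min() function
- sum() function
- avg() function
10.1 count() function
Returns the number of rows matched by the match command.
# Return the number of nodes matched by the label Employee
MATCH (e:Employee) RETURN count(*)
10.2 max() function
Returns the largest value among the rows matched by the match command.
# Return the highest salary among the nodes matched by the label Employee
MATCH (e:Employee) RETURN max(e.salary)
10.3 min() function
Returns the smallest value among the rows matched by the match command.
# Return the lowest salary among the nodes matched by the label Employee
MATCH (e:Employee) RETURN min(e.salary)
10.4 sum() function
Returns the sum of a property over the rows matched by the match command.
# Return the total salary of all employees matched by the label Employee
MATCH (e:Employee) RETURN sum(e.salary)
10.5 avg() function
Returns the average of a property over the rows matched by the match command.
# Return the average salary of all employees matched by the label Employee
MATCH (e:Employee) RETURN avg(e.salary)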
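The five aggregates map directly onto Python built-ins. The sketch below uses made-up salary values, with None standing in for a node that has no salary property; it relies on the Cypher rule that aggregate functions skip null values while count(*) counts every matched row.

```python
from statistics import mean
from typing import List, Optional

# Hypothetical salary values; None stands for a node without a salary property.
salaries: List[Optional[int]] = [6000, 3500, None, 7500]

present = [s for s in salaries if s is not None]  # aggregates skip nulls

summary = {
    "count(*)": len(salaries),        # counts every matched row
    "count(e.salary)": len(present),  # counts only non-null values
    "max(e.salary)": max(present),
    "min(e.salary)": min(present),
    "sum(e.salary)": sum(present),
    "avg(e.salary)": mean(present),
}
for name, value in summary.items():
    print(name, "=", value)
```

The gap between count(*) and count(e.salary) is a quick way to spot nodes that are missing a property.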
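11. Indexes
Neo4j supports indexes on node or relationship properties to improve query performance.
An index can be created on a property of all nodes that share the same label.
11.1 Creating an index
Use create index on to create an index.
# Create an index on the id property of nodes labeled Employee
CREATE INDEX ON :Employee(id)
11.2 Dropping an index
Use drop index on to drop an index.
# Drop the index on the id property of nodes labeled Employee
DROP INDEX ON :Employee(id)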
Section summary:
Basic concept: Cypher is the query language of the Neo4j graph database, comparable to SQL for MySQL, and it allows expressive and efficient querying and updating of graphs.
Basic Cypher commands and syntax:
- create command: create nodes in the graph.
  - CREATE (e:Employee {id:222, name:'Bob', salary:6000, deptno:12})
- match command: match (query) existing data.
  - MATCH (e:Employee) RETURN e.id, e.name, e.salary, e.deptno
- merge command: if the node exists, merge behaves like match; if it does not exist, merge behaves like create.
  - MERGE (e:Employee {id:145, name:'Lucy', salary:7500, deptno:12})
- Creating relationships with create: the relationship must have a direction, otherwise an error is raised.
  - CREATE (p1:Profile1)-[r:Buy]->(p2:Profile2)
- Creating relationships with merge: the pattern may be written with or without a direction.
  - MERGE (p1:Profile1)-[r:miss]-(p2:Profile2)
- where command: similar to adding a query condition in SQL.
  - MATCH (e:Employee) WHERE e.id=123 RETURN e
- delete command: delete nodes/relationships and their properties.
  - MATCH (c1:CreditCard)-[r]-(c2:Customer) DELETE c1, r, c2
- sort command: Cypher sorts with order by.
  - MATCH (e:Employee) RETURN e.id, e.name, e.salary, e.deptno ORDER BY e.id
- String functions:
  - toUpper() function: converts an input string to upper case.
    - MATCH (e:Employee) RETURN e.id, toUpper(e.name), e.salary, e.deptno
  - toLower() function: converts an input string to lower case.
    - MATCH (e:Employee) RETURN e.id, toLower(e.name), e.salary, e.deptno
  - substring() function: returns a substring.
    - MATCH (e:Employee) RETURN e.id, substring(e.name,0,2), e.salary, e.deptno
  - replace() function: replaces a substring.
    - MATCH (e:Employee) RETURN e.id, replace(e.name,e.name,e.name + "_HelloWorld"), e.salary, e.deptno
- Aggregate functions:
  - count() function: returns the number of rows matched by the match command.
    - MATCH (e:Employee) RETURN count(*)
  - max() function: returns the largest value among the matched rows.
    - MATCH (e:Employee) RETURN max(e.salary)
  - min() function: returns the smallest value among the matched rows.
    - MATCH (e:Employee) RETURN min(e.salary)
  - sum() function: returns the sum of a property over the matched rows.
    - MATCH (e:Employee) RETURN sum(e.salary)
  - avg() function: returns the average of a property over the matched rows.
    - MATCH (e:Employee) RETURN avg(e.salary)
- Indexes:
  - Neo4j supports indexes on node or relationship properties to improve query performance.
  - An index can be created on a property of all nodes that share the same label.
  - Creating an index: use create index on.
    - CREATE INDEX ON :Employee(id)
  - Dropping an index: use drop index on.
    - DROP INDEX ON :Employee(id)
IV. Using Neo4j from Python
About neo4j-driver: neo4j-driver is a Python package that serves as the Neo4j driver for Python and makes it easier to work with the graph database from Python programs.
1. Installing neo4j-driver:
pip install neo4j-driver
2. Using neo4j-driver:
from neo4j import GraphDatabase
# The Neo4j connection settings (address, username, password) are kept in config.py in the same directory
from config import NEO4J_CONFIG
driver = GraphDatabase.driver(**NEO4J_CONFIG)
# Create a Company node from Python and return its name
with driver.session() as session:
    cypher = "CREATE (c:Company) SET c.name='黑马程序员' RETURN c.name"
    record = session.run(cypher)
    # Take the first column (c.name) of every returned record
    result = list(map(lambda x: x[0], record))
    print("result:", result)
Output:
result: ['黑马程序员']
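The config.py file itself is not shown in this article. A plausible minimal version, given here purely as an assumption, is a dictionary whose keys match the arguments of GraphDatabase.driver: the Bolt address, an auth tuple, and the encrypted flag. The host and credentials below are placeholders based on the defaults mentioned earlier.

```python
# config.py (hypothetical contents; adjust to your own server)
NEO4J_CONFIG = {
    "uri": "bolt://localhost:7687",  # assumption: the script runs on the Neo4j host
    "auth": ("neo4j", "neo4j"),      # (username, password); the default credentials
    "encrypted": False,              # assumption: no TLS certificate is configured
}

# GraphDatabase.driver(**NEO4J_CONFIG) receives these keys as keyword arguments.
print(sorted(NEO4J_CONFIG))
```

Keeping the credentials in a separate module keeps them out of the scripts that run queries, and the file can then be excluded from version control.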
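3. Transactions
A group of database operations that either all take effect or none take effect is called a transaction. Transactions are what guarantees database consistency.
Using a transaction:
def _some_operations(tx, cat_name, mouse_name):
    # Three MERGE clauses sent as one statement; the trailing spaces keep the clauses separated
    tx.run("MERGE (a:Cat {name: $cat_name}) "
           "MERGE (b:Mouse {name: $mouse_name}) "
           "MERGE (a)-[r:And]-(b)",
           cat_name=cat_name, mouse_name=mouse_name)

with driver.session() as session:
    session.write_transaction(_some_operations, "Tom", "Jerry")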
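The all-or-nothing property can be demonstrated without a database. The following is a deliberately simplified in-memory sketch: changes go to a working copy and are published only if the block finishes without an exception. It illustrates the idea only and says nothing about how Neo4j implements transactions; the `transaction` helper and the sample names are made up.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def transaction(store: Dict[str, List[str]]) -> Iterator[Dict[str, List[str]]]:
    """Apply changes to a working copy; publish them only if no error occurs."""
    working = {key: list(values) for key, values in store.items()}
    yield working
    # Reached only when the with-block finished without raising: commit.
    store.clear()
    store.update(working)


db: Dict[str, List[str]] = {"Cat": [], "Mouse": []}

with transaction(db) as tx:
    tx["Cat"].append("Tom")
    tx["Mouse"].append("Jerry")

try:
    with transaction(db) as tx:
        tx["Cat"].append("Garfield")
        raise RuntimeError("something failed halfway")
except RuntimeError:
    pass  # the half-finished change was never published

print(db)
```

The second block fails after its first change, and because the commit step is skipped, the store still holds only the result of the first, complete block.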
V. py2neo: a Python client for Neo4j
# -*- coding: utf-8 -*-
from py2neo import Node, Graph, Relationship, NodeMatcher


# Store the data from the Excel file in Neo4j
class DataToNeo4j(object):
    def __init__(self):
        link = Graph("http://localhost:7474", username="neo4j", password="123456")  # open the connection
        self.graph = link
        self.buy = 'buy'  # label for buyer nodes
        self.sell = 'sell'  # label for seller nodes
        self.graph.delete_all()  # clear the database
        self.matcher = NodeMatcher(link)  # used to look up nodes when creating relationships
        """ Example
        node3 = Node('animal' , name = 'cat')
        node4 = Node('animal' , name = 'dog')
        node2 = Node('Person' , name = 'Alice')
        node1 = Node('Person' , name = 'Bob')
        r1 = Relationship(node2 , 'know' , node1)
        r2 = Relationship(node1 , 'know' , node3)
        r3 = Relationship(node2 , 'has' , node3)
        r4 = Relationship(node4 , 'has' , node2)
        self.graph.create(node1)
        self.graph.create(node2)
        self.graph.create(node3)
        self.graph.create(node4)
        self.graph.create(r1)
        self.graph.create(r2)
        self.graph.create(r3)
        self.graph.create(r4)
        """

    # Create the nodes
    def create_node(self, node_buy_key, node_sell_key):
        for name in node_buy_key:
            buy_node = Node(self.buy, name=name)
            self.graph.create(buy_node)
        for name in node_sell_key:
            sell_node = Node(self.sell, name=name)
            self.graph.create(sell_node)

    # Create the relationships: buyer -[amount]-> seller
    def create_relation(self, df_data):
        for m in range(0, len(df_data)):
            try:
                # Look up the buyer and seller nodes of row m by name
                buy_match = self.matcher.match(self.buy).where("_.name=" + "'" + df_data['buy'][m] + "'")
                sell_match = self.matcher.match(self.sell).where("_.name=" + "'" + df_data['sell'][m] + "'")
                print(list(buy_match), list(sell_match))
                # The invoice amount is used as the relationship type
                relation = Relationship(buy_match.first(), df_data['money'][m], sell_match.first())
                self.graph.create(relation)
            except AttributeError as e:
                print(e, m)
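The where clauses above are built by string concatenation, which breaks as soon as a company name contains a single quote. As a hedged alternative, py2neo's NodeMatcher.match also accepts property values as keyword arguments. The sketch below assumes that interface; `find_by_name`, `FakeMatch`, and `FakeMatcher` are hypothetical, and the fake classes exist only so the example runs without py2neo or a database.

```python
from typing import Any, Dict, List, Optional


def find_by_name(matcher: Any, label: str, name: str) -> Optional[Any]:
    """Look up the first node with this label and name, or None if absent.

    `matcher` is assumed to be a py2neo NodeMatcher; passing the property as a
    keyword argument avoids building the condition by string concatenation.
    """
    return matcher.match(label, name=name).first()


class FakeMatch:
    """Stand-in for the object returned by NodeMatcher.match."""

    def __init__(self, nodes: List[Dict[str, Any]]) -> None:
        self._nodes = nodes

    def first(self) -> Optional[Dict[str, Any]]:
        return self._nodes[0] if self._nodes else None


class FakeMatcher:
    """Stand-in matcher over an in-memory list of nodes."""

    def __init__(self, nodes: List[Dict[str, Any]]) -> None:
        self._nodes = nodes

    def match(self, *labels: str, **properties: Any) -> FakeMatch:
        hits = [
            node for node in self._nodes
            if set(labels) <= node["labels"]
            and all(node["props"].get(k) == v for k, v in properties.items())
        ]
        return FakeMatch(hits)


nodes = [{"labels": {"buy"}, "props": {"name": "O'Brien Ltd"}}]
print(find_by_name(FakeMatcher(nodes), "buy", "O'Brien Ltd"))
print(find_by_name(FakeMatcher(nodes), "sell", "O'Brien Ltd"))
```

A missing node comes back as None, so the caller can skip that row explicitly instead of relying on an exception.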
# -*- coding: utf-8 -*-
from dataToNeo4jClass.DataToNeo4jClass import DataToNeo4j
import os
import pandas as pd

# pip install py2neo==5.0b1  Mind the version, otherwise the API will not match (it helps to read the docs first: https://py2neo.org/v4/index.html)
invoice_data = pd.read_excel('./Invoice_data_Demo.xls', header=0)
print("invoice_data = {}".format(invoice_data))


# Extract the "node" data from the Excel file
def data_extraction():
    node_buy_key, node_sell_key = [], []
    for i in range(0, len(invoice_data)):  # collect the buyer names
        node_buy_key.append(invoice_data['购买方名称'][i])
    for i in range(0, len(invoice_data)):  # collect the seller names
        node_sell_key.append(invoice_data['销售方名称'][i])
    node_buy_key, node_sell_key = list(set(node_buy_key)), list(set(node_sell_key))  # remove duplicates
    return node_buy_key, node_sell_key


# Extract the "relationship" data from the Excel file
def relation_extraction():
    links_dict, sell_list, money_list, buy_list = {}, [], [], []
    for i in range(0, len(invoice_data)):
        money_list.append(invoice_data[invoice_data.columns[19]][i])  # amount
        sell_list.append(invoice_data[invoice_data.columns[10]][i])  # seller name
        buy_list.append(invoice_data[invoice_data.columns[6]][i])  # buyer name
    # Convert every value to a string
    sell_list, buy_list, money_list = [str(i) for i in sell_list], [str(i) for i in buy_list], [str(i) for i in money_list]
    # Combine the three lists into one dict
    links_dict['buy'], links_dict['money'], links_dict['sell'] = buy_list, money_list, sell_list
    df_data = pd.DataFrame(links_dict)  # convert the dict to a DataFrame
    print("df_data= {}".format(df_data))
    return df_data


create_data = DataToNeo4j()
node_buy_key, node_sell_key = data_extraction()  # extract once and reuse both lists
create_data.create_node(node_buy_key, node_sell_key)
create_data.create_relation(relation_extraction())
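VI. Common Neo4j problems
1. Starting Neo4j on the server fails with the error below if the Neo4j browser UI is already open. The message itself says that the store is locked by another process, which usually means a Neo4j instance is still running (the one serving that browser UI), so check neo4j status before starting a second one.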
Store and its lock file has been locked by another process: /var/lib/neo4j/data/databases/graph.db/store_lock
Starting Neo4j failed: Component ‘org.neo4j.server.database.LifecycleManagingDatabase@1458ed9c’ was successfully initialized, but failed to start. Please see the attached cause exception “Externally locked: /var/lib/neo4j/data/databases/graph.db/neostore”.