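This example comes from Chapter 7 of the book Neo4j权威指南.
The node labels in the graph are:
Account holder: AccountHolder
Address: Address
Phone number: PhoneNumber
Credit card account: CreditCard
Social Security number: SSN
Bank account: BankAccount
Unsecured loan: UnsecureLoan
Every relationship points from an account holder to one of the other nodes and is named after the target, for example HAS_SSN.
The Cypher statements that build the graph are as follows: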
// Create the three account holders
create (accountHolder1:AccountHolder{
FirstName:"John",
LastName:"Doe",
UniqueId:"JohnDoe"
})
create (accountHolder2:AccountHolder{
FirstName:"Jane",
LastName:"Appleseed",
UniqueId:"JaneAppleseed"
})
create (accountHolder3:AccountHolder{
FirstName:"Matt",
LastName:"Smith",
UniqueId:"MattSmith"
})
// Create the address
create (address1:Address{
Street:"123 NW 1st Street",
City:"San Francisco",
State:"California",
ZipCode:"94101"
})
// Link account holders 1, 2 and 3 to the address
create (accountHolder1)-[:HAS_ADDRESS]->(address1),
(accountHolder2)-[:HAS_ADDRESS]->(address1),
(accountHolder3)-[:HAS_ADDRESS]->(address1)
// Create phone number 1
create (phoneNumber1:PhoneNumber{
PhoneNumber:"555-555-555"
})
// Link account holders 1 and 2 to phone number 1
create (accountHolder1)-[:HAS_PHONENUMBER]->(phoneNumber1),
(accountHolder2)-[:HAS_PHONENUMBER]->(phoneNumber1)
// Create Social Security number SSN1
create (ssn1:SSN{
SSN:"241-23-1234"
})
// Link account holders 2 and 3 to SSN1
create (accountHolder2)-[:HAS_SSN]->(ssn1),
(accountHolder3)-[:HAS_SSN]->(ssn1)
// Create Social Security number SSN2 and link it to account holder 1
create (ssn2:SSN{SSN:"241-23-4567"})<-[:HAS_SSN]-(accountHolder1)
// Create credit card 1 and link it to account holder 1
create (creditCard1:CreditCard{
AccountNumber:"1234567890123456",
Limit:5000,
Balance:1442.23,
ExpirationDate:'02-20',
SecurityCode:'456'
})<-[:HAS_CREDITCARD]-(accountHolder1)
// Create credit card 2 and link it to account holder 2
create (creditCard2:CreditCard{
AccountNumber:"2345678901234567",
Limit:4000,
Balance:2345.56,
ExpirationDate:'02-20',
SecurityCode:'456'
})<-[:HAS_CREDITCARD]-(accountHolder2)
// Create bank account 1 and link it to account holder 1
create (bankAccount1:BankAccount{
AccountNumber:"2345678901234567",
Balance:7054.43
})<-[:HAS_BANKACCOUNT]-(accountHolder1)
// Create bank account 2 and link it to account holder 2
create (bankAccount2:BankAccount{
AccountNumber:"3456789012345678",
Balance:4231.12
})<-[:HAS_BANKACCOUNT]-(accountHolder2)
// Create bank account 3 and link it to account holder 3
create (bankAccount3:BankAccount{
AccountNumber:"4567890123456789",
Balance:12345.45
})<-[:HAS_BANKACCOUNT]-(accountHolder3)
// Create unsecured loan 2 and link it to account holder 2
create (unsecuredLoan2:UnsecureLoan{
AccountNumber:'4567890123456789-0',
Balance:90453,
APR:0.0541,
LoanAmount:12000
})<-[:HAS_UNSECUREDLOAN]-(accountHolder2)
// Create unsecured loan 3 and link it to account holder 3
create (unsecuredLoan3:UnsecureLoan{
AccountNumber:'5678901234567890-0',
Balance:16341.95,
APR:0.0341,
LoanAmount:22000
})<-[:HAS_UNSECUREDLOAN]-(accountHolder3)
// Create phone number 2 and link it to account holder 3
create (phoneNumber2:PhoneNumber
{PhoneNumber:'555-555-1234'
})<-[:HAS_PHONENUMBER]-(accountHolder3)
return *
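Fraudsters or fraud rings can be found by looking for account holders who share the same contact information. The queries are as follows:
// The book's version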
match (accountHolder:AccountHolder)-[]->(contractInformation)
with contractInformation,count(accountHolder) as RingSize
match (contractInformation)<-[]-(accountHolder)
with collect(accountHolder.UniqueId) as AccountHolders,
contractInformation,RingSize
where RingSize>1
return AccountHolders as FraudRing,
labels(contractInformation) as ContractType,
RingSize
order by RingSize desc
// A more concise version
match (accountHolder:AccountHolder)-[]->(contractInformation)
// contractInformation is the grouping key: the aggregates in WITH are computed per contractInformation node, much like a GROUP BY
with contractInformation,count(accountHolder) as RingSize,collect(accountHolder.UniqueId) as AccountHolders
where RingSize>1
return AccountHolders as FraudRing,
labels(contractInformation) as ContractType,
RingSize
order by RingSize desc
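On the sample data the query returns three rings: the shared Address with all three account holders (RingSize 3), the shared PhoneNumber with JohnDoe and JaneAppleseed (RingSize 2), and the shared SSN with JaneAppleseed and MattSmith (RingSize 2).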
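To make the implicit grouping concrete, here is a minimal plain-Python sketch of what the concise query does. It is not driver code and does not talk to Neo4j. The edge list is copied by hand from the CREATE statements, and the node ids (`address1`, `ssn1`, and so on) simply reuse the Cypher variable names as stand-ins for node identity. Only the Address, PhoneNumber and SSN edges are listed, since every account node has exactly one holder and would be filtered out anyway.

```python
from collections import defaultdict

# (holder UniqueId, contact node id, contact node label)
# Assumption: node ids reuse the variable names from the CREATE statements.
EDGES = [
    ("JohnDoe", "address1", "Address"),
    ("JaneAppleseed", "address1", "Address"),
    ("MattSmith", "address1", "Address"),
    ("JohnDoe", "phoneNumber1", "PhoneNumber"),
    ("JaneAppleseed", "phoneNumber1", "PhoneNumber"),
    ("MattSmith", "phoneNumber2", "PhoneNumber"),
    ("JaneAppleseed", "ssn1", "SSN"),
    ("MattSmith", "ssn1", "SSN"),
    ("JohnDoe", "ssn2", "SSN"),
]


def fraud_rings(edges):
    """Group holders by contact node and keep groups with more than one holder."""
    holders_by_node = defaultdict(list)
    label_of = {}
    for holder, node_id, label in edges:
        holders_by_node[node_id].append(holder)  # collect(accountHolder.UniqueId)
        label_of[node_id] = label
    rings = [
        (sorted(holders), label_of[node_id], len(holders))
        for node_id, holders in holders_by_node.items()
        if len(holders) > 1  # where RingSize > 1
    ]
    # order by RingSize desc
    return sorted(rings, key=lambda ring: ring[2], reverse=True)


rings = fraud_rings(EDGES)
for ring in rings:
    print(ring)
```

The dictionary key plays the role of the non-aggregated column in the WITH clause: one bucket per contact node, not per label.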
The following query gives wrong results and is worth noting:
match (accountHolder:AccountHolder)-[]->(contractInformation)
// Grouping by the labels of contractInformation is wrong here:
// different contractInformation nodes can carry the same label, for example SSN1 and SSN2
with count(accountHolder) as RingSize,labels(contractInformation) as ContractType,collect(accountHolder.UniqueId) as AccountHolders
where RingSize>1
return RingSize,ContractType,AccountHolders
order by RingSize desc
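The difference between the two grouping keys can be checked with a small plain-Python model, sketched below under the same assumptions as before: the edges are transcribed by hand from the CREATE statements, and the node ids are just the Cypher variable names. Grouping by node finds the three genuinely shared nodes, while grouping by label lumps unrelated nodes together and flags every label.

```python
from collections import Counter

# (holder UniqueId, target node id, target node label), all 16 relationships.
# Assumption: node ids reuse the variable names from the CREATE statements.
EDGES = [
    ("JohnDoe", "address1", "Address"),
    ("JaneAppleseed", "address1", "Address"),
    ("MattSmith", "address1", "Address"),
    ("JohnDoe", "phoneNumber1", "PhoneNumber"),
    ("JaneAppleseed", "phoneNumber1", "PhoneNumber"),
    ("MattSmith", "phoneNumber2", "PhoneNumber"),
    ("JaneAppleseed", "ssn1", "SSN"),
    ("MattSmith", "ssn1", "SSN"),
    ("JohnDoe", "ssn2", "SSN"),
    ("JohnDoe", "creditCard1", "CreditCard"),
    ("JaneAppleseed", "creditCard2", "CreditCard"),
    ("JohnDoe", "bankAccount1", "BankAccount"),
    ("JaneAppleseed", "bankAccount2", "BankAccount"),
    ("MattSmith", "bankAccount3", "BankAccount"),
    ("JaneAppleseed", "unsecuredLoan2", "UnsecureLoan"),
    ("MattSmith", "unsecuredLoan3", "UnsecureLoan"),
]


def ring_sizes(edges, key):
    """Count holders per grouping key, like count(accountHolder) in a WITH clause."""
    return Counter(key(edge) for edge in edges)


by_node = ring_sizes(EDGES, key=lambda edge: edge[1])   # correct: one group per node
by_label = ring_sizes(EDGES, key=lambda edge: edge[2])  # wrong: one group per label

shared_nodes = {node for node, size in by_node.items() if size > 1}
flagged_labels = {label for label, size in by_label.items() if size > 1}

print(sorted(shared_nodes))
print(sorted(flagged_labels))
```

In the label-based grouping even BankAccount looks like a ring of three, although each of the three bank accounts belongs to a single holder.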
Calculating the risk exposure
// Calculate the risk exposure
match (accountHolder:AccountHolder)-[]->(contractInformation)
with contractInformation,count(accountHolder) as RingSize
match (contractInformation)<-[]-(accountHolder),
(accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
with collect(distinct accountHolder.UniqueId) as AccountHolders,contractInformation,RingSize,
sum(case type(r)
when "HAS_CREDITCARD" then unsecuredAccount.Limit
when "HAS_UNSECUREDLOAN" then unsecuredAccount.Balance
else 0
end) as FinancialRisk
where RingSize>1
return AccountHolders as FraudRing,
labels(contractInformation) as ContractType,
RingSize,
round(FinancialRisk) as FinancialRisk
order by RingSize desc
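On the sample data the FinancialRisk comes to about 115795 for the Address ring, 99453 for the PhoneNumber ring and 110795 for the SSN ring.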
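The arithmetic behind FinancialRisk can be reproduced by hand. The following is a minimal plain-Python sketch, not driver code: the ring membership and the account figures are transcribed from the CREATE statements, and the names `RINGS`, `UNSECURED`, `exposure` and `financial_risk` are made up for this illustration. As in the CASE expression, a credit card contributes its Limit and an unsecured loan contributes its Balance.

```python
# Rings found by the sharing query: contact node id -> member UniqueIds.
RINGS = {
    "address1": ["JohnDoe", "JaneAppleseed", "MattSmith"],
    "phoneNumber1": ["JohnDoe", "JaneAppleseed"],
    "ssn1": ["JaneAppleseed", "MattSmith"],
}

# (holder UniqueId, relationship type, relevant account properties)
UNSECURED = [
    ("JohnDoe", "HAS_CREDITCARD", {"Limit": 5000, "Balance": 1442.23}),
    ("JaneAppleseed", "HAS_CREDITCARD", {"Limit": 4000, "Balance": 2345.56}),
    ("JaneAppleseed", "HAS_UNSECUREDLOAN", {"Balance": 90453}),
    ("MattSmith", "HAS_UNSECUREDLOAN", {"Balance": 16341.95}),
]


def exposure(rel_type, props):
    """Mirror the CASE expression: card limit, loan balance, otherwise 0."""
    if rel_type == "HAS_CREDITCARD":
        return props["Limit"]
    if rel_type == "HAS_UNSECUREDLOAN":
        return props["Balance"]
    return 0


def financial_risk(holders):
    """Sum the exposure of every unsecured account held by a ring member."""
    total = sum(
        exposure(rel_type, props)
        for holder, rel_type, props in UNSECURED
        if holder in holders
    )
    return round(total)


risk = {node_id: financial_risk(holders) for node_id, holders in RINGS.items()}
print(risk)
```

Bank accounts do not appear because the query only follows HAS_CREDITCARD and HAS_UNSECUREDLOAN relationships.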
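Statements for deleting the relationships and nodes:
// Delete the data: relationships first, then nodes.
// Run these as two separate statements; Neo4j expects a WITH between a DELETE and a following MATCH.
MATCH (n:AccountHolder)-[r]->(contractInformation)
DELETE r;
MATCH (n:AccountHolder)
DELETE n;
// The Address, PhoneNumber, SSN and account nodes remain; MATCH (n) DETACH DELETE n clears the whole graph.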
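This example does not really show off Cypher's strengths: a first-degree relationship like this can also be found in a relational database with a plain GROUP BY. Once I find a suitable dataset, I will use Cypher for a similar fraud-ring analysis.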
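As a rough illustration of the relational route, the sketch below loads the same sharing relationships into an in-memory SQLite table and finds the rings with GROUP BY and HAVING. The table name `holder_contact` and its three columns are hypothetical, invented for this example, and the rows are transcribed by hand from the CREATE statements.

```python
import sqlite3

# (holder UniqueId, contact node id, contact type)
# Assumption: contact ids reuse the Cypher variable names.
EDGES = [
    ("JohnDoe", "address1", "Address"),
    ("JaneAppleseed", "address1", "Address"),
    ("MattSmith", "address1", "Address"),
    ("JohnDoe", "phoneNumber1", "PhoneNumber"),
    ("JaneAppleseed", "phoneNumber1", "PhoneNumber"),
    ("MattSmith", "phoneNumber2", "PhoneNumber"),
    ("JaneAppleseed", "ssn1", "SSN"),
    ("MattSmith", "ssn1", "SSN"),
    ("JohnDoe", "ssn2", "SSN"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE holder_contact (holder TEXT, contact_id TEXT, contact_type TEXT)"
)
conn.executemany("INSERT INTO holder_contact VALUES (?, ?, ?)", EDGES)

# One group per contact row, keeping only contacts shared by several holders.
rows = conn.execute(
    """
    SELECT contact_id, contact_type, COUNT(*) AS ring_size
    FROM holder_contact
    GROUP BY contact_id, contact_type
    HAVING COUNT(*) > 1
    ORDER BY ring_size DESC, contact_id
    """
).fetchall()
conn.close()

print(rows)
```

A single join table is enough here because the pattern is only one hop deep; deeper or variable-length patterns are where the graph query would start to pay off.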