Graph Algorithms and Knowledge Graphs

Copyright notice: please follow the author. https://blog.csdn.net/qq_37865996/article/details/87863910

1. Graph algorithms

https://baike.baidu.com/item/图算法/10767301

2. Neo4j

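Neo4j is a high-performance graph database that I have covered before, so here I go straight to using it. I previously ran it in a Win10 virtual machine; since I am installing it again here, I will record the steps as I go.

Download: https://neo4j.com/download/neo4j-desktop/?edition=desktop&flavour=osx&release=1.1.15&offline=true

After a standard installation, you can launch the management interface.

Neo4j comes in a Desktop edition and a Community edition. I personally prefer the Desktop edition because it is easier to manage.

SDK installation:

Python API installation: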

(base) zhanglipengdeMacBook-Pro:~ zhanglipeng$ sudo pip install neo4j-driver
Password:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
The directory '/Users/zhanglipeng/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/zhanglipeng/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting neo4j-driver
  Downloading https://files.pythonhosted.org/packages/fc/e2/ce6e4d08c0332e5cf501e4513872de5aca03f710bdb309cbc51f8daa7053/neo4j-driver-1.7.1.tar.gz
Collecting neobolt<2,>=1.7.3 (from neo4j-driver)
  Downloading https://files.pythonhosted.org/packages/ba/02/641c5241db092f75bce1334cb728d3fb48f4dddc5d21401fe94a5ed636ad/neobolt-1.7.4.tar.gz (182kB)
    100% |████████████████████████████████| 184kB 10kB/s 
Collecting neotime<2,>=1.7.1 (from neo4j-driver)
  Downloading https://files.pythonhosted.org/packages/0b/7e/ca368a8d8e288be1352d4e2df35da1e01f8aaffbf526695df71630bcb8a6/neotime-1.7.4.tar.gz
Requirement already satisfied: pytz in /anaconda2/lib/python2.7/site-packages (from neotime<2,>=1.7.1->neo4j-driver) (2018.5)
Requirement already satisfied: six in /anaconda2/lib/python2.7/site-packages (from neotime<2,>=1.7.1->neo4j-driver) (1.11.0)
Installing collected packages: neobolt, neotime, neo4j-driver
  Running setup.py install for neobolt ... done
  Running setup.py install for neotime ... done
  Running setup.py install for neo4j-driver ... done
Successfully installed neo4j-driver-1.7.1 neobolt-1.7.4 neotime-1.7.4
You are using pip version 19.0.2, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Install JPype:

(base) zhanglipengdeMacBook-Pro:~ zhanglipeng$ pip install JPype1
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Collecting JPype1
  Downloading https://files.pythonhosted.org/packages/c4/4b/60a3e63d51714d4d7ef1b1efdf84315d118a0a80a5b085bb52a7e2428cdc/JPype1-0.6.3.tar.gz (168kB)
    100% |████████████████████████████████| 174kB 34kB/s 
Building wheels for collected packages: JPype1
  Building wheel for JPype1 (setup.py) ... done
  Stored in directory: /Users/zhanglipeng/Library/Caches/pip/wheels/0e/2b/e8/c0b818ac4b3d35104d35e48cdc7afe27fc06ea277feed2831a
Successfully built JPype1
Installing collected packages: JPype1
Successfully installed JPype1-0.6.3
You are using pip version 19.0.2, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Importing and displaying data:

# Import the driver and connect to the database
from neo4j.v1 import GraphDatabase, basic_auth
driver = GraphDatabase.driver("bolt://localhost", auth=basic_auth("neo4j", "maidou"))
session = driver.session()

# Insert Person nodes and the KNOWS relationships between them
insert_query = '''
UNWIND $pairs as pair
MERGE (p1:Person {name:pair[0]})
MERGE (p2:Person {name:pair[1]})
MERGE (p1)-[:KNOWS]-(p2);
'''

data = [["Jim","Mike"],["Jim","Billy"],["Anna","Jim"],
          ["Anna","Mike"],["Sally","Anna"],["Joe","Sally"],
          ["Joe","Bob"],["Bob","Sally"]]

session.run(insert_query, parameters={"pairs": data})

# Friends of a friend (follow relationships)

foaf_query = '''
MATCH (person:Person)-[:KNOWS]-(friend)-[:KNOWS]-(foaf)
WHERE person.name = $name
  AND NOT (person)-[:KNOWS]-(foaf)
RETURN foaf.name AS name
'''

print(1)
results = session.run(foaf_query, parameters={"name": "Joe"})
for record in results:
    print(record["name"])


# Common friends

common_friends_query = """
MATCH (user:Person)-[:KNOWS]-(friend)-[:KNOWS]-(foaf:Person)
WHERE user.name = $user AND foaf.name = $foaf
RETURN friend.name AS friend
"""

print(2)
results = session.run(common_friends_query, parameters={"user": "Joe", "foaf": "Sally"})
for record in results:
    print(record["friend"])

# Connecting paths

connecting_paths_query = """
MATCH path = shortestPath((p1:Person)-[:KNOWS*..6]-(p2:Person))
WHERE p1.name = $name1 AND p2.name = $name2
RETURN path
"""

print(3)
results = session.run(connecting_paths_query, parameters={"name1": "Joe", "name2": "Billy"})
for record in results:
    print(record["path"])


session.close()

(python27) zhanglipengdeMacBook-Pro:WSaL zhanglipeng$ python 13-1.py
Traceback (most recent call last):
  File "13-1.py", line 1, in <module>
    from neo4j.v1 import GraphDatabase, basic_auth
ImportError: No module named neo4j.v1

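The error persisted, and I could not find a good fix. (Judging from the shell prompts, the driver was installed in the (base) environment while the script ran in (python27), which may be why the module was not found.)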
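Because the driver script could not be run, the sketch below reproduces the three queries (friends of a friend, common friends, connecting path) on the same name pairs using only the Python standard library. It is an illustration of what the Cypher queries compute, not a replacement for Neo4j; the helper names build_graph, friends_of_friends, common_friends and shortest_path are hypothetical, and the breadth-first search does not enforce the 6-hop limit of the Cypher pattern.

```python
from collections import defaultdict, deque

# Same pairs as in the insert query above
data = [["Jim", "Mike"], ["Jim", "Billy"], ["Anna", "Jim"],
        ["Anna", "Mike"], ["Sally", "Anna"], ["Joe", "Sally"],
        ["Joe", "Bob"], ["Bob", "Sally"]]


def build_graph(pairs):
    """Build an undirected adjacency map: name -> set of names known."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def friends_of_friends(adj, person):
    """People two hops away who are not already direct friends."""
    direct = adj[person]
    two_hops = set()
    for friend in direct:
        two_hops |= adj[friend]
    return sorted(two_hops - direct - {person})


def common_friends(adj, a, b):
    """People who know both a and b."""
    return sorted(adj[a] & adj[b])


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adj[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


adj = build_graph(data)
print(friends_of_friends(adj, "Joe"))
print(common_friends(adj, "Joe", "Sally"))
print(shortest_path(adj, "Joe", "Billy"))
```

For Joe, the only friend of a friend is Anna, the only friend shared with Sally is Bob, and the shortest route to Billy runs through Sally, Anna and Jim.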

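I decided to switch to the py2neo library, which can be installed with pip.

I wrote a new, simple example:

from py2neo import Graph, Node, Relationship

# Connect to the local Neo4j instance over Bolt
test_graph = Graph("bolt://localhost:7687", user="neo4j", password="12345")

# The first positional argument of Node is the label; keyword arguments are properties
test_node_1 = Node("Person", name="Alice")
test_node_2 = Node("Person", name="Bob")
A_Knows_B = Relationship(test_node_1, "Knows", test_node_2)
A_Knows_B['count'] = 1
B_Knows_A = Relationship(test_node_2, "Knows", test_node_1)
B_Knows_A['count'] = 2
# Creating a relationship also creates its end nodes if they do not exist yet
test_graph.create(A_Knows_B)
test_graph.create(B_Knows_A)
# Update a property locally, then push the change to the database
A_Knows_B['count'] += 1
test_graph.push(A_Knows_B)

# Look the node up by its name property ("Alice"), not by the Python variable name.
# find_one belongs to older py2neo releases; newer ones use test_graph.nodes.match instead.
find_node_1 = test_graph.find_one(label="Person", property_key="name", property_value="Alice")
print(find_node_1['name'])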

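3. Detecting webshells with a directed graph

First clean up the access log so that each line has the form "path -> ref", which is the pattern matched by the regular expression in the code below.

Read the file line by line, create nodes and edges, store the in-degree and out-degree as node properties, and keep updating them:

The final code is: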

import re
from py2neo import Graph

nodes = {}   # url -> numeric id of the Page node already created for it
index = 1

# py2neo's Graph has no session(); Cypher statements are sent with Graph.run
graph = Graph("bolt://localhost:7687", user="neo4j", password="12345")


def ensure_page(url):
    """Create the Page node for url once, with in/out counters set to 0."""
    global index
    if url not in nodes:
        nodes[url] = index
        # `in` is backtick-quoted because IN is a Cypher keyword
        graph.run("CREATE (:Page {url: $url, id: $id, `in`: 0, out: 0})",
                  {"url": url, "id": str(index)})
        index = index + 1


with open('r-graph.txt', 'r') as file_object:
    for line in file_object:
        matchObj = re.match(r'(\S+) -> (\S+)', line, re.M | re.I)
        if not matchObj:
            continue
        path = matchObj.group(1)
        ref = matchObj.group(2)
        ensure_page(path)
        ensure_page(ref)
        # Add the edge path -> ref, then bump the out-degree of path and the in-degree of ref
        graph.run("MATCH (p:Page {url: $path}), (r:Page {url: $ref}) "
                  "CREATE (p)-[:IN]->(r) "
                  "SET p.out = p.out + 1, r.`in` = r.`in` + 1",
                  {"path": path, "ref": ref})

As we can see, the code uses Cypher, the query language of Neo4j.
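Once in-degree and out-degree are available, candidate pages can be picked out by their connectivity. The sketch below works on the same "path -> ref" lines without a database and flags pages whose only link is to themselves. That criterion is an assumption made here for illustration (the idea being that a webshell is rarely linked from the rest of the site), and the function name isolated_pages and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

EDGE = re.compile(r'(\S+) -> (\S+)')


def isolated_pages(lines):
    """Return pages whose only neighbour, in either direction, is themselves."""
    successors = defaultdict(set)    # page -> pages it points to
    predecessors = defaultdict(set)  # page -> pages pointing to it
    pages = set()
    for line in lines:
        match = EDGE.match(line)
        if not match:
            continue  # skip lines that are not "path -> ref"
        path, ref = match.group(1), match.group(2)
        pages.update((path, ref))
        successors[path].add(ref)
        predecessors[ref].add(path)
    # A page is isolated when all of its neighbours are the page itself
    return sorted(p for p in pages
                  if (successors[p] | predecessors[p]) <= {p})


# Hypothetical log lines, for illustration only
sample = [
    "/index.php -> /login.php",
    "/login.php -> /index.php",
    "/upload/shell.php -> /upload/shell.php",
    "malformed line",
]
candidates = isolated_pages(sample)
print(candidates)
```

Pages flagged this way are only candidates for manual review; a legitimate page with no inbound links would be flagged as well.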

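4. Applications of knowledge graphs in risk control

(1) Detecting suspected account theft

Detection method: if a device has logged in to two accounts, and one of those accounts has previously logged in successfully from another device, the device is provisionally judged to have stolen that account.

Here we process the sample file line by line to obtain the uid, ip, tel, and activesyncid of each record, add the other three kinds of nodes around each uid, and then visualize the graph: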

import networkx as nx
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt





def helloWord():
    G = nx.Graph()
    G.add_node("u1")
    G.add_node("u2")
    G.add_edge("u1", "1.1.1.1")
    G.add_edge("u2", "1.1.1.1")
    nx.draw(G,with_labels=True,node_size=600)
    plt.show()

def show1():
    with open("/Users/zhanglipeng/Data/KnowledgeGraph/sample1.txt") as f:
        G = nx.Graph()
        for line in f:
            line=line.strip('\n')
            uid,ip,tel,activesyncid=line.split(',')
            G.add_edge(uid, ip)
            G.add_edge(uid, tel)
            G.add_edge(uid, activesyncid)
        nx.draw(G, with_labels=True, node_size=600)
        plt.show()

def show2():
    with open("/Users/zhanglipeng/Data/KnowledgeGraph/sample2.txt") as f:
        G = nx.Graph()
        for line in f:
            line=line.strip('\n')
            uid,ip,login,ua=line.split(',')
            G.add_edge(uid, ip)
            G.add_edge(uid, login)
            G.add_edge(uid, ua)
        nx.draw(G, with_labels=True, node_size=600)
        plt.show()

def show3():
    G = nx.Graph()
    # The with statement closes each file, so no explicit f.close() is needed
    with open("/Users/zhanglipeng/Data/KnowledgeGraph/sample3.txt") as f:
        for line in f:
            line=line.strip('\n')
            hid,uid,app=line.split(',')
            G.add_edge(hid, uid)
            G.add_edge(hid, app)

    with open("/Users/zhanglipeng/Data/KnowledgeGraph/sample4.txt") as f:
        for line in f:
            line=line.strip('\n')
            hid,uid,action=line.split(',')
            G.add_edge(hid, uid)
            G.add_edge(hid, action)

    nx.draw(G, with_labels=True, node_size=600)
    plt.show()

if __name__ == '__main__':
    print("Knowledge Graph")
    #helloWord()
    show3()
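The detection rule stated at the start of (1) can also be checked programmatically rather than by eye. The following is a minimal sketch over (hid, uid) login pairs, assuming the data carries no timestamps, so "previously logged in from another device" is approximated as "also seen on another device". The function name find_suspected_theft and the sample records are hypothetical.

```python
from collections import defaultdict


def find_suspected_theft(records):
    """records: iterable of (hid, uid) pairs. Returns sorted (hid, uid) suspects."""
    accounts_on_device = defaultdict(set)
    devices_of_account = defaultdict(set)
    for hid, uid in records:
        accounts_on_device[hid].add(uid)
        devices_of_account[uid].add(hid)

    suspects = set()
    for hid, uids in accounts_on_device.items():
        if len(uids) < 2:
            continue  # the device must have logged in to at least two accounts
        for uid in uids:
            # the account has also been used from some other device
            if len(devices_of_account[uid]) > 1:
                suspects.add((hid, uid))
    return sorted(suspects)


# Hypothetical login records, for illustration only
logins = [("h1", "u1"), ("h2", "u2"), ("h2", "u1"), ("h3", "u3")]
suspects = find_suspected_theft(logins)
print(suspects)
```

Here device h2 is flagged for account u1, because h2 logged in to two accounts and u1 was also used from h1. With real data, login timestamps would be needed to tell which device came first.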

    

(2) Credential stuffing attacks

A large number of accounts log in from the same IP.
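In graph terms this pattern is an IP node with unusually many account neighbours. A minimal sketch, assuming (uid, ip) login pairs and a hypothetical threshold of 3 distinct accounts; both the function name ips_with_many_accounts and the threshold are illustrative choices, not values from the original data.

```python
from collections import defaultdict


def ips_with_many_accounts(records, threshold):
    """Return {ip: number of distinct accounts} for IPs at or above threshold."""
    accounts_by_ip = defaultdict(set)
    for uid, ip in records:
        accounts_by_ip[ip].add(uid)  # a set ignores repeated logins by one account
    return {ip: len(uids)
            for ip, uids in accounts_by_ip.items()
            if len(uids) >= threshold}


# Hypothetical login records, for illustration only
logins = [("u1", "1.1.1.1"), ("u2", "1.1.1.1"), ("u3", "1.1.1.1"),
          ("u4", "2.2.2.2"), ("u1", "1.1.1.1")]
flagged = ips_with_many_accounts(logins, threshold=3)
print(flagged)
```

A suitable threshold depends on the service, since shared networks such as offices or campuses also produce many accounts per IP.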

 

(3) Suspected fake orders (order brushing)

The devices differ, but the app login name is the same.
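This is the mirror image of the previous case: a login-name node connected to several device nodes. A minimal sketch, assuming (hid, login) pairs; the function name logins_on_many_devices, the default of two devices, and the sample records are hypothetical.

```python
from collections import defaultdict


def logins_on_many_devices(records, min_devices=2):
    """Return {login: sorted device ids} for logins seen on several devices."""
    devices_by_login = defaultdict(set)
    for hid, login in records:
        devices_by_login[login].add(hid)
    return {login: sorted(hids)
            for login, hids in devices_by_login.items()
            if len(hids) >= min_devices}


# Hypothetical records, for illustration only
records = [("h1", "alice"), ("h2", "alice"), ("h3", "bob"), ("h3", "bob")]
shared = logins_on_many_devices(records)
print(shared)
```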

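5. Applications of knowledge graphs in threat intelligence

(1) Mining potential links between backdoor files

Underground operators typically compromise hosts by spreading backdoor files, thereby assembling large botnets. Suspected backdoor files can be screened by looking for files that point to multiple C&C domains.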

import networkx as nx
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt


def helloWord():
    G = nx.Graph()
    G.add_node("u1")
    G.add_node("u2")
    G.add_edge("u1", "1.1.1.1")
    G.add_edge("u2", "1.1.1.1")
    nx.draw(G,with_labels=True,node_size=600)
    plt.show()

def show4():
    G = nx.Graph()
    # The with statement closes the file, so no explicit f.close() is needed
    with open("/Users/zhanglipeng/Data/KnowledgeGraph/sample5.txt") as f:
        for line in f:
            line=line.strip('\n')
            mail,domain,ip=line.split(',')
            G.add_edge(mail, domain)
            G.add_edge(domain, ip)

    nx.draw(G, with_labels=True, node_size=600)
    plt.show()

def show5():
    G = nx.Graph()
    with open("/Users/zhanglipeng/Data/KnowledgeGraph/sample6.txt") as f:
        for line in f:
            line=line.strip('\n')
            # file_name avoids shadowing the Python 2 built-in "file"
            file_name,domain=line.split(',')
            G.add_edge(file_name, domain)

    nx.draw(G, with_labels=True, node_size=600)
    plt.show()

if __name__ == '__main__':
    print("Knowledge Graph")
    #helloWord()
    show5()
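The screening rule described in (1), files that point to multiple C&C domains, can be applied directly to the same "file,domain" lines that show5 reads. A minimal sketch, assuming two domains is enough to flag a file; the function name files_with_many_domains, that cut-off, and the sample lines are hypothetical.

```python
from collections import defaultdict


def files_with_many_domains(lines, min_domains=2):
    """Return {file: sorted domains} for files linked to several domains."""
    domains_by_file = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        file_name, domain = line.split(',')
        domains_by_file[file_name].add(domain)
    return {name: sorted(domains)
            for name, domains in domains_by_file.items()
            if len(domains) >= min_domains}


# Hypothetical sample lines, for illustration only
sample = [
    "a.exe,c2-one.example",
    "a.exe,c2-two.example",
    "b.dll,c2-one.example",
]
suspicious = files_with_many_domains(sample)
print(suspicious)
```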

(2) Mining potential links between domains:
