Notes on "Web Security: Introduction to Machine Learning": Graph Algorithms and Knowledge Graphs


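A webshell has many characteristic access patterns. The ones that show up in a directed graph of page links are:

  • In-degree and out-degree are both 0

    • an isolated page: nothing links to it and it links to nothing
  • In-degree and out-degree are both 1, and the page points to itself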
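The two features above can be checked without a graph database. The following is a minimal sketch, not the article's method: the function name `suspicious_pages` and the convention that a missing Referer is passed as `None` (and creates no edge) are assumptions made for illustration.

```python
from collections import defaultdict

def suspicious_pages(edges):
    """Return pages that are isolated or that only link to themselves.

    edges: iterable of (referer, path) pairs; referer is None when the
    request carried no Referer header (assumed convention).
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    self_loop = set()
    pages = set()
    for ref, path in set(edges):  # de-duplicate repeated hits on the same link
        pages.add(path)
        if ref is None:
            continue  # direct visit: no edge is added
        pages.add(ref)
        out_deg[ref] += 1
        in_deg[path] += 1
        if ref == path:
            self_loop.add(path)
    flagged = []
    for page in sorted(pages):
        i, o = in_deg[page], out_deg[page]
        isolated = (i == 0 and o == 0)
        only_self = (i == 1 and o == 1 and page in self_loop)
        if isolated or only_self:
            flagged.append(page)
    return flagged
```

For example, a page that is only ever requested directly and a page whose only link is a form posting back to itself would both be returned, while pages that are part of normal site navigation would not.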


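0. Processing flow:

  • 1. Raw log data
  • 2. Extract the request and Referer fields (requires enabling a custom log format)
  • 3. Import into the graph database
  • 4. Query for nodes whose in-degree and out-degree are both 0 or both 1

1. Raw log data:

2. Extract the request and Referer fields:

Log data after processing:

referer -> path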

- -> http://180.76.190.79/wordpress/wp-admin/1.php
- -> http://180.76.190.79/wordpress/wp-admin/admin-ajax.php
- -> http://180.76.190.79/wordpress/wp-admin/customize.php
- -> http://180.76.190.79/wordpress/wp-admin/load-styles.php
- -> http://180.76.190.79/wordpress/wp-admin/post-new.php
- -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/edit-comments.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/profile.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/xmlrpc.php
http://180.76.190.79/wordpress/wp-admin/ -> http://180.76.190.79/wordpress/wp-login.php

http://180.76.190.79/wordpress/wp-admin/1.php is the webshell.
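The raw log format is not shown in these notes, so the extraction step can only be sketched under assumptions. The example below assumes the common Apache/Nginx "combined" layout, a hypothetical helper name `to_edge`, a caller-supplied `base_url`, and that query strings are dropped so each page maps to a single node.

```python
import re

# Assumed "combined" log layout:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+) [^"]*" \d{3} \S+ "([^"]*)"'
)

def to_edge(line, base_url):
    """Turn one raw log line into a 'referer -> url' string, or None."""
    m = LOG_RE.match(line)
    if not m:
        return None  # line does not follow the assumed layout
    path, referer = m.groups()
    path = path.split("?", 1)[0]          # drop the query string (assumption)
    referer = referer.split("?", 1)[0] or "-"  # empty Referer becomes "-"
    return "%s -> %s%s" % (referer, base_url, path)
```

Calling `to_edge(line, "http://180.76.190.79")` on each raw line and keeping the distinct non-`None` results would give a file in the `referer -> path` form listed above.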


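3. Import into the graph database:

Neo4j statements used

  • Delete all Page nodes together with their relationships:
MATCH (n:Page) DETACH DELETE n
  • Query for suspected webshell links:

match (n:Page) where (n.in=1 and n.out=0) or (n.in=1 and n.out=1) return n.url

The import script below stores the "-" placeholder (no Referer) as a node of its own, so a page that is only ever visited directly ends up with in=1 and out=0 rather than 0 and 0. That is why the query tests for in=1.

Read the file line by line and create the nodes and relationships:

The original code had to be modified before it would run: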

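import re

# Assumed to be set up beforehand (not shown in these notes):
#   file_object: the processed log file, opened for reading
#   session:     an open Neo4j driver session
nodes = {}  # url -> node variable name (assumed initial value)
index = 0   # running node id (assumed initial value)

for line in file_object:
    matchObj = re.match(r'(\S+) -> (\S+)', line, re.M | re.I)
    if not matchObj:
        continue  # skip lines that are not in "referer -> path" form
    ref = matchObj.group(1)
    path = matchObj.group(2)
    if path in nodes:  # the target page already has a node
        path_node = nodes[path]
    else:  # no node yet: create one
        path_node = "Page%d" % index
        nodes[path] = path_node
        # initialise the node with in-degree and out-degree 0
        sql = "create (%s:Page {url:\"%s\" , id:\"%d\",in:0,out:0})" % (path_node, path, index)
        index = index + 1
        session.run(sql)
        print(sql)
    if ref in nodes:  # the referring page already has a node
        ref_node = nodes[ref]
    else:
        ref_node = "Page%d" % index
        nodes[ref] = ref_node
        sql = "create (%s:Page {url:\"%s\",id:\"%d\",in:0,out:0})" % (ref_node, ref, index)
        index = index + 1
        session.run(sql)
        print(sql)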

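Update the in-degree and out-degree properties of the nodes (still inside the loop, once per log line):

    sql = "match (n:Page {url:\"%s\"}) SET n.out=n.out+1" % ref  # increment the out-degree of the referring page
    session.run(sql)
    print(sql)
    sql = "match (n:Page {url:\"%s\"}) SET n.in=n.in+1" % path  # increment the in-degree of the target page
    session.run(sql)
    print(sql)

    # insert the edge (relationship) from the referring page to the target page
    sql = '''
    MATCH (a:Page),(b:Page)
    WHERE a.url = '{path}' AND b.url = '{ref}'
    CREATE (b)-[r:Point]->(a);
    '''.format(path=path, ref=ref)
    session.run(sql)
    print(sql)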
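The statements above are built by pasting URLs from the log straight into the query text, which breaks on URLs containing quotes and is risky when the log is attacker-controlled. A possible alternative, sketched here as an assumption rather than as the article's code, passes the URLs as query parameters. The helper names `edge_statements` and `record_edge` are hypothetical, `$name` parameters require Neo4j 3.0 or later, and the property names are backtick-quoted because `in` is also a Cypher keyword.

```python
def edge_statements(ref, path):
    """Build (cypher, params) pairs that record one 'ref -> path' hit."""
    # MERGE creates the node only if it does not exist yet.
    merge_node = (
        "MERGE (n:Page {url: $url}) "
        "ON CREATE SET n.`in` = 0, n.`out` = 0"
    )
    return [
        (merge_node, {"url": path}),
        (merge_node, {"url": ref}),
        # bump the out-degree of the referring page
        ("MATCH (n:Page {url: $url}) SET n.`out` = n.`out` + 1", {"url": ref}),
        # bump the in-degree of the target page
        ("MATCH (n:Page {url: $url}) SET n.`in` = n.`in` + 1", {"url": path}),
        # add the edge referer -> target
        ("MATCH (a:Page {url: $path}), (b:Page {url: $ref}) "
         "CREATE (b)-[:Point]->(a)", {"path": path, "ref": ref}),
    ]

def record_edge(session, ref, path):
    """Run the statements on an open Neo4j driver session."""
    for cypher, params in edge_statements(ref, path):
        session.run(cypher, params)
```

With this variant the `nodes` dictionary and the generated `PageN` variable names are no longer needed, since `MERGE` handles the "already exists" case inside the database.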

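4. Query for nodes whose in-degree and out-degree are both 0 or both 1:

Visualization of the page link relationships:

Query for the suspected webshell links (in-degree 1 with out-degree 0 or 1, because the "-" placeholder referer is itself stored as a node):
match (n:Page) where (n.in=1 and n.out=0) or (n.in=1 and n.out=1) return n.url

http://180.76.190.79/wordpress/wp-admin/1.php is the webshell.
The other results are false positives.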

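Common false positives:

  • Home pages and the various index pages
  • Ops management consoles such as phpMyAdmin and Zabbix
  • Consoles of open-source software such as Hadoop and ELK
  • API endpoints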
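One simple way to suppress some of these false positives is an allowlist applied to the query results. The sketch below is only illustrative: the patterns and the name `drop_known_false_positives` are assumptions, they cover just a few of the classes listed above, and a real deployment would need patterns that match its own paths.

```python
import re

# Hypothetical allowlist patterns for some of the false-positive classes above.
ALLOWLIST = [
    re.compile(r"/(index\.\w+)?$", re.I),                   # home page and index pages
    re.compile(r"/(phpmyadmin|zabbix|kibana)(/|$)", re.I),  # management consoles
    re.compile(r"/api/", re.I),                             # API endpoints
]

def drop_known_false_positives(urls):
    """Keep only the URLs that match none of the allowlist patterns."""
    return [u for u in urls if not any(p.search(u) for p in ALLOWLIST)]
```

The allowlist should stay narrow, since an attacker can name a webshell to look like an index page or hide it under a console path.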

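The hard part is the effect of scanners on the results. This interference has to be removed with scanner fingerprinting and human-versus-bot detection algorithms.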
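The notes do not say which fingerprinting algorithm is meant. As a rough sketch of the idea, the example below flags client IPs either by a scanner-like User-Agent or by an unusually high share of 404 responses, so that their requests can be dropped before the graph is built. The token list, the thresholds, and the name `scanner_ips` are all assumptions chosen for illustration.

```python
from collections import Counter

# Illustrative, non-exhaustive User-Agent substrings of well-known scanners.
SCANNER_UA_TOKENS = ("sqlmap", "nikto", "nmap")

def scanner_ips(requests, min_requests=20, max_404_ratio=0.5):
    """Return the set of client IPs that look like scanners.

    requests: iterable of (ip, user_agent, status) tuples.
    """
    total = Counter()
    not_found = Counter()
    flagged = set()
    for ip, ua, status in requests:
        total[ip] += 1
        if status == 404:
            not_found[ip] += 1
        if any(tok in ua.lower() for tok in SCANNER_UA_TOKENS):
            flagged.add(ip)  # fingerprint hit on the User-Agent
    for ip, n in total.items():
        # many requests, most of them for pages that do not exist
        if n >= min_requests and not_found[ip] / n > max_404_ratio:
            flagged.add(ip)
    return flagged
```

Log lines from the returned IPs would be removed before step 2, so that the pages a scanner probes directly do not show up as isolated nodes.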



Reposted from blog.csdn.net/qq_28921653/article/details/80560214