Notes on Android Malicious App Detection

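1. Data Preparation

App Crawling

Malicious apps: not decided yet. Either obtain them from open-source projects or collect them ourselves.

Benign apps: crawl them from an app market. So far, though, access gets restricted after only four or five downloads. The crawler code is as follows: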

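```python
# coding=utf-8
import os
import re
import socket
import time
import urllib.parse
import urllib.request

import requests
from bs4 import BeautifulSoup

# Without a timeout a throttled connection can hang forever.
# The socket default covers urllib.request.urlretrieve; requests calls pass timeout explicitly.
TIMEOUT = 10
socket.setdefaulttimeout(TIMEOUT)

ROOT_URL = "http://app.mi.com"  # home page of the app market


def parser_apks(count=30):
    """Collect up to `count` {package_name: download_url} pairs from the ranking list."""
    res_parser = {}
    # Crawl list pages in order: page 1 first, then page 2, and so on
    page_num = 1
    while count > 0:
        # Fetch the app list page
        wbdata = requests.get(ROOT_URL + "/catTopList/27?page=" + str(page_num), timeout=TIMEOUT).text
        print("Crawling page " + str(page_num))
        # Parse the app list page; "?" is escaped so the regex matches it literally
        soup = BeautifulSoup(wbdata, "html.parser")
        links = soup.find_all("a", href=re.compile(r"/details\?"), class_="", alt="")
        if not links:
            break  # no more apps (or we were blocked): stop instead of looping forever
        for link in links:
            # Link to the app's detail page, e.g. .../details?id=<package name>
            detail_link = urllib.parse.urljoin(ROOT_URL, str(link["href"]))
            package_name = detail_link.split("=")[1]
            download_page = requests.get(detail_link, timeout=TIMEOUT).text
            # Parse the detail page and locate the download button
            soup1 = BeautifulSoup(download_page, "html.parser")
            download_tag = soup1.find(class_="download")
            if download_tag is None or not download_tag.get("href"):
                continue  # page layout changed or the request was throttled
            # Build the direct download link
            download_url = urllib.parse.urljoin(ROOT_URL, str(download_tag["href"]))
            # The list page yields duplicate links, so de-duplicate here
            if download_url not in res_parser.values():
                res_parser[package_name] = download_url
                count -= 1
            if count == 0:
                break
        page_num += 1
    print("Number of APKs found: " + str(len(res_parser)))
    return res_parser


def craw_apks(count=30, save_path="./apk/"):
    """Download the APKs found by parser_apks() into save_path."""
    os.makedirs(save_path, exist_ok=True)
    res_dic = parser_apks(count)
    for apk, url in res_dic.items():
        print("Downloading app: " + apk)
        urllib.request.urlretrieve(url, os.path.join(save_path, apk + ".apk"))
        print("Done")
        time.sleep(5)  # pause between downloads to reduce the chance of being rate-limited


if __name__ == "__main__":
    craw_apks(30)
```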
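Because the market starts throttling after a handful of downloads, two small helpers may make the crawler more robust. The sketch below (the names `with_backoff` and `looks_like_apk` are our own, not from any library) retries a failing call with exponentially growing pauses, and checks that a saved file really is an APK. An APK is a ZIP archive with AndroidManifest.xml at its root, so an HTML error page saved under a `.apk` name would be rejected; whether the market actually serves such a page when it throttles is an assumption.

```python
import time
import zipfile


def with_backoff(fn, retries=4, base_delay=5.0, sleep=time.sleep):
    """Call fn() and return its result, retrying on any exception.

    Before retry number k (k = 0, 1, ...), wait base_delay * 2**k seconds.
    The last exception is re-raised once all retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


def looks_like_apk(path):
    """True if `path` is a ZIP archive with AndroidManifest.xml at its root."""
    if not zipfile.is_zipfile(path):
        return False  # e.g. an HTML "too many requests" page saved as .apk
    with zipfile.ZipFile(path) as zf:
        return "AndroidManifest.xml" in zf.namelist()
```

In `craw_apks`, the download could be wrapped as `with_backoff(lambda: urllib.request.urlretrieve(url, dest))`, after which `dest` is deleted if `looks_like_apk(dest)` returns False. The `sleep` parameter exists only so the retry schedule can be tested without actually waiting.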

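IDE

Android Studio: open an APK to decompile it and view the contents of its .xml files (such as AndroidManifest.xml), the SDK version it uses, the permissions it requests, and more.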
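The requested permissions are a common starting feature for malware classifiers. Inside an APK, AndroidManifest.xml is stored in a binary XML format, so the sketch below assumes the manifest has already been decoded to plain text (for example viewed in Android Studio or decoded with apktool). It reads every `<uses-permission>` element and turns the result into a 0/1 vector over a permission vocabulary; the function names and the vocabulary you pass in are hypothetical choices, e.g. the vocabulary could be all permissions seen in the training set.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_permissions(manifest_xml):
    """Return the sorted permission names requested via <uses-permission>.

    Only plain <uses-permission> elements are read; variants such as
    <uses-permission-sdk-23> are not included in this sketch.
    """
    root = ET.fromstring(manifest_xml)
    perms = {elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission")}
    perms.discard(None)  # ignore elements without android:name
    return sorted(perms)


def permission_vector(permissions, vocabulary):
    """Binary feature vector: position i is 1 if the app requests vocabulary[i]."""
    requested = set(permissions)
    return [1 if p in requested else 0 for p in vocabulary]
```

Using a fixed vocabulary keeps every app's vector the same length and column order, which is what a tabular model such as XGBoost expects.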

Xposed tutorials

Xposed plugin development, part 1: getting started with Xposed: https://blog.csdn.net/niubitianping/article/details/52571438

Xposed repository: https://github.com/rovo89/Xposed

An example of hooking Android methods with the Xposed framework: https://www.jianshu.com/p/372630e37683

An Xposed tutorial that starts with setting up an emulator: https://juejin.im/entry/5900145b0ce463006146f26b
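Xposed hooks themselves are written in Java inside an Xposed module, so they are not shown here. Assuming a hook records every intercepted call with `XposedBridge.log()` using a line format we define ourselves (`API_CALL|<package>|<class>#<method>`, purely hypothetical), the resulting log could be turned into an ordered per-app API-call sequence for later feature extraction. The marker is searched anywhere in the line because log viewers may prepend tags or timestamps.

```python
from collections import defaultdict

# Hypothetical format written by our own hook via XposedBridge.log():
#   API_CALL|<package name>|<fully qualified class>#<method>
MARKER = "API_CALL|"


def parse_hook_log(lines):
    """Group hooked API calls by package, preserving call order."""
    calls = defaultdict(list)
    for line in lines:
        idx = line.find(MARKER)
        if idx < 0:
            continue  # unrelated framework or app output
        parts = line[idx + len(MARKER):].strip().split("|")
        if len(parts) != 2:
            continue  # malformed entry
        package, api = parts
        calls[package].append(api.replace("#", "."))
    return dict(calls)
```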

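2. Algorithm Model

XGBoost tutorials

XGBoost from theory to practice: binary and multi-class classification: https://blog.csdn.net/HHTNAN/article/details/81079257

Writing an XGBoost program step by step: https://juejin.im/post/5a1bb29e51882531ba10aa49

Using the XGBoost algorithm in machine learning: http://irory.me/blog/16

XGBoost tutorial (pure xgboost approach): https://blog.csdn.net/u011630575/article/details/79418138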
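The tutorials above cover the principle behind XGBoost: each new tree is fitted to the first and second derivatives g_i and h_i of the loss at the current predictions. For a leaf whose samples have summed derivatives G and H, the optimal leaf value is w* = -G/(H + λ), and splitting a node into left and right children has gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ, where λ and γ correspond to xgboost's `lambda` and `gamma` parameters. The pure-Python sketch below computes these quantities for binary log loss; it only illustrates the math and is not a substitute for the xgboost library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_grad_hess(y_true, raw_score):
    """Derivatives of log loss w.r.t. the raw (pre-sigmoid) score: g = p - y, h = p(1 - p)."""
    p = sigmoid(raw_score)
    return p - y_true, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(g, h, goes_left, lam=1.0, gamma=0.0):
    """Gain of splitting one node into left/right children."""
    GL = sum(gi for gi, left in zip(g, goes_left) if left)
    HL = sum(hi for hi, left in zip(h, goes_left) if left)
    G, H = sum(g), sum(h)
    GR, HR = G - GL, H - HL

    def score(Gs, Hs):
        return Gs * Gs / (Hs + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(G, H)) - gamma
```

For example, with labels `[1, 1, 0, 0]`, starting raw scores of 0, and a hypothetical binary feature such as "requests READ_SMS" equal to `[1, 1, 0, 0]`, splitting on that feature gives a gain of 2/3 and leaf values of about +0.667 and -0.667, while a feature uncorrelated with the labels gives a gain of 0.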

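Related projects

Microsoft Malware Classification Challenge, malware-detection: https://github.com/dchad/malware-detection

Malware detection with machine learning, using the Alibaba Cloud malware detection competition as an example: https://xz.aliyun.com/t/3704, code: https://github.com/Rman0fCN/ML_Malware_detect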
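If each sample is represented as an ordered sequence of API call names (an assumption; dynamic traces collected with Xposed hooks could be one source), a common generic feature set is API n-gram counts weighted by TF-IDF. The sketch below uses the plain, unsmoothed IDF log(N/df), so an n-gram that appears in every sample gets weight 0; it is not necessarily the feature pipeline of the linked article or repository, and the function names are our own.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """All contiguous length-n windows of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def tfidf(docs, n=2):
    """docs: list of API-call sequences. Returns one {ngram: weight} dict per doc."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter()  # number of docs containing each n-gram
    for c in counts:
        df.update(c.keys())
    N = len(docs)
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({g: (cnt / total) * math.log(N / df[g]) for g, cnt in c.items()})
    return result
```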

Reposted from blog.csdn.net/Li_suhuan/article/details/89243451