Examples of crawling sites that use encrypted and obfuscated JS

Topic: JS reverse engineering

Outline

url:https://nyloner.cn/proxy

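Requirement: crawl the proxy IPs and port numbers from this page
Difficulties:
    Request parameters that change dynamically
    JS encryption
    JS reverse engineering is required

Analysis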

  • The data to crawl is loaded dynamically

  • A global search in the packet capture tool for the data shown on the page finds nothing

    • This means the data returned by the server is encrypted (ciphertext)
  • The page refreshes every 10s and the data updates after each refresh, but the url in the browser address bar does not change, which indicates that the data is loaded by an ajax request.

    • So the data is loaded dynamically by an ajax request, and the data that request returns is encrypted
  • Locate the ajax packet: in it you can see the url with its dynamic, encrypted request parameters and the encrypted response data

  • To capture the ciphertext data, send the same ajax request

    • Obtain the dynamically changing request parameters

    • A global search in the capture tool for the dynamic request parameter token locates the js code that generates it:

      var token = md5(String(page) + String(num) + String(timestamp));
    • Decrypt the ciphertext data

      • Reading the page js turns up the decryption function: decode_str(encode_str)
    • Find the implementation of decode_str:

      • JS reverse engineering: convert the js code into python code, because the development environment can only execute python code

Solution

Dynamic parameter parsing

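import time
import hashlib

# The capture shows that token is generated by hashing; the page JS does:
#   var token = md5(String(page) + String(num) + String(timestamp));
# Reproduce the same hash in Python
def getToken():
    page = str(1)
    num = str(15)
    t = str(int(time.time()))  # Unix timestamp in whole seconds, like Date.parse(new Date()) / 1000
    md5 = hashlib.md5()
    md5.update((page+num+t).encode('utf-8'))
    token = md5.hexdigest()
    return token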

JS from the page source

// Generates the dynamic, encrypted request parameters
function get_proxy_ip(page, num, click_btn) {
    var timestamp = Date.parse(new Date());
    timestamp = timestamp / 1000;
    // Generate the token
    var token = md5(String(page) + String(num) + String(timestamp));
    // The ajax request goes to a url like: https://nyloner.cn/proxy?page=1&num=15&token=26c8bc7&t=1575
    $.get('../proxy?page=' + page + '&num=' + num + '&token=' + token + '&t=' + timestamp, function (result) {
        // Check whether status is the string 'true'
        if (result.status === 'true') {
            var setHtml = "";
            $("#ip-list").html(setHtml);
            var encode_str = result.list;
            // decode_str is the decryption function
            var items = str_to_json(decode_str(encode_str));
            for (var index = 0; index < items.length; ++index) {
                item = items[index];
                setHtml += "<tr>\n<td>" + (index + 1) + "</td>\n";
                setHtml += "<td>" + item.ip.toString() + "</td>\n";
                setHtml += "<td>" + item.port.toString() + "</td>\n";
                setHtml += "<td>" + item.time.toString() + "</td>\n</tr>\n";
            }
            $("#ip-list").html(setHtml);
            if (click_btn === 'next') {
                document.getElementById("last-page").disabled = false;
                if (items.length < 15) {
                    document.getElementById("next-page").disabled = true;
                }
            } else {
                document.getElementById("next-page").disabled = false;
                if (page === 1) {
                    document.getElementById("last-page").disabled = true;
                }
            }

        }
    });
}
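
The request that get_proxy_ip sends can be mirrored in Python for any page. The following is only a sketch under stated assumptions: the helper name build_proxy_query and its now parameter are made up for illustration, and it assumes the server checks nothing beyond token being the MD5 of page, num and t concatenated, which is what the JS above suggests.

```python
import hashlib
import time
from urllib.parse import urlencode


def build_proxy_query(page, num, now=None):
    """Return the query parameters get_proxy_ip would send (hypothetical helper)."""
    # Date.parse(new Date()) / 1000 in the JS is a Unix timestamp in whole seconds
    timestamp = int(time.time()) if now is None else int(now)
    raw = str(page) + str(num) + str(timestamp)
    token = hashlib.md5(raw.encode('utf-8')).hexdigest()
    # token and t come from the same timestamp, so they always agree
    return {'page': page, 'num': num, 'token': token, 't': timestamp}


# A fixed timestamp is passed here only to make the example reproducible
query = build_proxy_query(page=1, num=15, now=1575000000)
print('https://nyloner.cn/proxy?' + urlencode(query))
```

Deriving token and t from one timestamp inside a single function avoids the case where two separate clock reads land in different seconds.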

Requesting the data with the encrypted token attached to the url

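import time
import hashlib
import requests
import base64
# Result of the JS reverse engineering: decode_str ported to Python
def decode_str(scHZjLUh1):
    # Base64-decode the ciphertext into bytes
    # (base64.decodestring was removed in Python 3.9, so b64decode is used)
    scHZjLUh1 = base64.b64decode(scHZjLUh1.encode())
    key = 'nyloner'
    lenth = len(key)
    code = ''
    sch_lenth = len(scHZjLUh1)
    for i in range(sch_lenth):
        coeFYlqUm2 = i % lenth
        # chr(0-255) returns the character for a code point
        # ord(a-z) returns the code point of a character
        code += chr(scHZjLUh1[i] ^ ord(key[coeFYlqUm2]))
    # The XOR result is itself Base64 text, so decode it once more
    code = base64.b64decode(code.encode())
    code = code.decode('utf-8')
    return code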
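
The decoding scheme (Base64, then XOR with the repeating key 'nyloner', then Base64 again) can be checked offline with a round trip. This is a rough sketch, not the site's code: encode_list is a hypothetical inverse written here only to produce test input, the function names are made up, and the sample data is invented.

```python
import base64
from itertools import cycle

KEY = 'nyloner'  # key used by the site's decode_str


def xor_with_key(data, key=KEY):
    """XOR every byte of data with the repeating key."""
    return bytes(b ^ ord(k) for b, k in zip(data, cycle(key)))


def decode_list(encoded):
    """Same steps as decode_str: Base64 -> XOR -> Base64 -> UTF-8 text."""
    inner = xor_with_key(base64.b64decode(encoded))
    return base64.b64decode(inner).decode('utf-8')


def encode_list(plain):
    """Hypothetical inverse of decode_list, used only to build test input."""
    inner = base64.b64encode(plain.encode('utf-8'))
    return base64.b64encode(xor_with_key(inner)).decode('ascii')


sample = '[{"ip": "127.0.0.1", "port": "8080"}]'  # invented data
print(decode_list(encode_list(sample)) == sample)
```

Because XOR with the same key undoes itself, applying xor_with_key twice returns the original bytes, which is why the round trip works.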

Parsing the encrypted data

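url = 'https://nyloner.cn/proxy'
# Use one timestamp for both token and t: two separate time.time() calls
# could fall in different seconds, and then token would not match t
t = str(int(time.time()))
token = hashlib.md5(('1' + '15' + t).encode('utf-8')).hexdigest()
param = {
    'num':'15',
    'page':'1',
    't':t,
    'token':token
}

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
    # Session cookie copied from the browser; replace it with your own
    'Cookie':'sessionid=2cj1fd0hpbv64qrxe6xckek8tu3uad4m'
}
# The encrypted payload is in the 'list' field of the JSON response
code = requests.get(url,headers=headers,params=param).json().get('list')
str_code = decode_str(code)
print(str_code)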
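
The page JS passes the decoded string through str_to_json and then reads item.ip, item.port and item.time. A minimal sketch of the same step in Python follows, assuming the decoded text is valid JSON (the implementation of str_to_json is not shown, so this is a guess). The helper name to_proxy_addresses and the sample record are invented for illustration.

```python
import json


def to_proxy_addresses(decoded):
    """Turn the decoded JSON text into a list of 'ip:port' strings."""
    items = json.loads(decoded)
    return ['{}:{}'.format(item['ip'], item['port']) for item in items]


# Invented sample shaped like the fields the page JS reads (ip, port, time)
sample = '[{"ip": "10.0.0.1", "port": "3128", "time": "2019-12-07 10:00:00"}]'
proxies = to_proxy_addresses(sample)
print(proxies)
```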

JS encryption and obfuscation

Requirement overview

url:https://www.aqistudy.cn/html/city_detail.html
Fetch the data behind this url

Requirement analysis

  • 1. Clicking the different meteorological index tabs triggers no new requests, which means all the meteorological data was already loaded when the page finished loading
  • 2. Is the data loaded dynamically?
    • Yes, the data is loaded dynamically
  • 3. After changing the query conditions (city, time range), clicking the search button loads new data
  • 4. The capture tool shows two xhr packets for that click
    • Both use the same url
    • Both carry a single request parameter, d
    • The value of d differs between the two packets
    • So d is a request parameter that is encrypted and changes dynamically
  • Handling the dynamic, encrypted request parameter d
    • Global search for the request parameter d (not feasible)
    • Clicking the search button on the page makes the capture tool record two ajax requests
      • The click is what fires the ajax requests, so look for the button's click event
      • Firefox's developer tools can locate the button's click event
        • It is the js function getDate()
    • Analyze the implementation of getDate: the goal is to find the js code that sends the ajax request
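      • type=='HOUR': the query uses hours as the time unit
      • The function body contains no code for the ajax request, but it calls two other functions: getAQIData() and getWeatherData()
        • Analyze the definitions of getWeatherData() and getAQIData() to find the code for the ajax request:
          • Differences between the two implementations:
            • The method variable is assigned a different string
              • GETDETAIL
              • GETCITYWEATHER
          • What they have in common:
            • Neither contains the code for the ajax request, but both call another function:
              • getServerData(method, param, anonymous function, 0.5)
                • method: GETDETAIL or GETCITYWEATHER
                • param: a dictionary with four key-value pairs
                  • city: name of the city to query
                  • type: HOUR
                  • starttime: start time of the query
                  • endTIME: end time of the query
    • Analyze the implementation of getServerData(method, param, anonymous function, 0.5), again to find the code for the ajax request
      • Its definition can only be found with the capture tool's global search
      • The implementation is unreadable; it looks like a block of ciphertext
      • This means the site has encrypted the implementation of its js functions
      • JS obfuscation: encrypting the implementation code of js functions
      • JS deobfuscation: decrypting the obfuscated js code back into its original form
      • Brute-force deobfuscation: https://www.bm8.com.cn/jsConfusion/
    • Analyze the deobfuscated implementation of getServerData:
      • The code for the ajax request finally shows up
      • The parameter d is the return value of getParam(method, object)
        • method: method
        • object: the param dictionary, whose four keys are city name, type, start time and end time
      • How the ciphertext response is decrypted
        • decodeData(data): data is the ciphertext from the response, and the return value is the decrypted plaintext
    • JS reverse engineering
      • Use the PyExecJS library to execute the JavaScript code from Python
      • Environment setup:
        • pip install PyExecJS
        • A Node.js runtime must also be installed
  • Steps carried out in the code:
    • Obtain the dynamic, encrypted request parameter (d)
    • Request the encrypted response data
    • Decrypt the encrypted response data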
import execjs
import requests
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
node = execjs.get()
file = 'test.js'
# Compile the JS saved in test.js so its functions can be called from Python
with open(file,encoding='utf-8') as f:
    ctx = node.compile(f.read())

# The following five variables are passed as arguments to getPostParamCode
method = 'GETDETAIL'  # or GETCITYWEATHER
city = '北京'
type = 'HOUR'
start_time = '2018-01-25 00:00:00'
end_time = '2018-01-25 23:00:00'
# Run getPostParamCode in the JS runtime; ctx.call passes the values as
# arguments, so they do not have to be quoted into a JS source string
params = ctx.call('getPostParamCode', method, city, type, start_time, end_time)
# print(params)  # the request parameter d
url = 'https://www.aqistudy.cn/apinew/aqistudyapi.php'
data = {
    'd':params
}
# Fetch the encrypted response data
response_code = requests.post(url=url,headers=headers,data=data).text
# response_code

# Run decodeData in the JS runtime to decrypt the ciphertext
page_text = ctx.call('decodeData', response_code)
print(page_text)


Origin www.cnblogs.com/zhangdadayou/p/11999969.html