Crawler basics: debugging why the list-page link differs from the detail-page link, and reproducing the AES-ECB encryption three ways via JS reverse engineering

1. Website analysis

2. Locating the event listener

  • In Google Chrome, click the a tag to inspect the listeners bound to it
  • Firefox has its own built-in event listener view

3. Getting familiar with AES-ECB

  • 1. Before the analysis, you need to know what AES encryption is. You may be more familiar with MD5, which is a hash algorithm rather than encryption: it is irreversible, so the plaintext cannot be derived from the result. AES, by contrast, is a symmetric encryption algorithm: the same key both encrypts and decrypts, so the plaintext can be recovered.
  • 2. With AES in ECB mode there is no IV, so finding the key is all you need to encrypt and decrypt. An online AES encryption/decryption tool is handy for checking results.
  • 3. AES encryption and decryption code

4. Debugging Analysis

  • Take Google Chrome as an example.

  • Start by adding breakpoints. Try setting one where the variables are defined, then click a link in the list: the JS stops at the breakpoint. Step through it and you will see how the list URL is turned into the detail URL. It is encrypted, and the core algorithm is CryptoJS's symmetric AES encryption in ECB mode with Pkcs7 padding. The value of the key is visible while debugging this part.

  • Extract the JS: copy the entire JS file and run it directly in the Console panel. Extracting just the key JS into a new script and running it reproduces the result, and the JS side is done. Next, I restore the logic in Python.

5. node runs js

  • I did not copy the CryptoJS defined in the front-end code; instead I imported the CryptoJS library through local Node, so only the key code needs to be extracted.
  • You need a local Node environment, then install the crypto-js library: npm install crypto-js -g (with a global install, require() only finds the module if NODE_PATH points at the global node_modules directory; a local npm install crypto-js in the script's folder avoids that)
  • Important parameters: key is the secret key; mode selects the block cipher mode, here ECB (CryptoJS itself defaults to CBC); padding fills the data up to the block size. Pkcs7 pads to a multiple of the 16-byte block, and adds a full extra block when the data length is already a multiple.
  • Method 1: extract the site's JS and fill in whatever is missing
    var CryptoJS = require('crypto-js');

    // Rebuild the detail-page URL from a list-page URL (logic lifted from the site's JS).
    var req = function(hh) {
        var s = "qnbyzzwmdgghmcnm";  // AES key found while debugging
        var ee = "_blank";           // unused here; kept from the original script
        var aa = hh.split("/");
        var aaa = aa.length;
        var bbb = aa[aaa - 1].split('.');
        var ccc = bbb[0];            // page ID, e.g. "948547"
        var cccc = bbb[1];           // extension, e.g. "jhtml"
        var r = /^\+?[1-9][0-9]*$/;  // positive integer
        if (r.test(ccc) && cccc.indexOf('jhtml') != -1) {
            var srcs = CryptoJS.enc.Utf8.parse(ccc);
            var k = CryptoJS.enc.Utf8.parse(s);
            var en = CryptoJS.AES.encrypt(srcs, k, {
                mode: CryptoJS.mode.ECB,
                padding: CryptoJS.pad.Pkcs7
            });
            var ddd = en.toString();                 // Base64 ciphertext
            ddd = ddd.replace(/\//g, "^");           // replace "/" so it is safe in a path
            ddd = ddd.substring(0, ddd.length - 2);  // drop the last two characters
            var bbbb = ddd + '.' + bbb[1];
            aa[aaa - 1] = bbbb;
            var uuu = '';
            for (var i = 0; i < aaa; i++) {
                uuu += aa[i] + '/'
            }
            uuu = uuu.substring(0, uuu.length - 1);
            return uuu;
        }
    }
    console.log(req("http://ggzy.xzsp.tj.gov.cn:80/jyxxcggg/948547.jhtml"));
    
  • Method 2: use the CryptoJS module directly and rewrite the logic once it is understood
    var CryptoJS = require("crypto-js");
    var encrypt_req = function(key, text) {
        var l = CryptoJS.enc.Utf8.parse(text);
        var e = CryptoJS.enc.Utf8.parse(key);
        var a = CryptoJS.AES.encrypt(l, e, {
            mode: CryptoJS.mode.ECB,
            padding: CryptoJS.pad.Pkcs7
        });
        return a.toString();  // returns the ciphertext as Base64
        // return a.ciphertext.toString();  // returns the ciphertext as hex
    }

    // ECB-mode encryption, Base64 output
    console.log(encrypt_req('qnbyzzwmdgghmcnm', '1025528'));
    
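The padding rule and the `substring(0, ddd.length - 2)` step in Method 1 fit together, and a small pure-Python sketch may make that clearer. The helper names `pkcs7_pad` and `pkcs7_unpad` are illustrative and not part of CryptoJS, and the 16 zero bytes only stand in for a real ciphertext block. The key is 16 bytes long, which selects AES-128. PKCS7 appends between 1 and 16 bytes, each equal to the pad length, so a 7-digit ID becomes exactly one 16-byte block. Sixteen bytes encode to 24 Base64 characters ending in `==`, which appears to be what the two dropped characters are for IDs of this length.

```python
import base64

BLOCK_SIZE = 16  # AES block size in bytes
KEY = "qnbyzzwmdgghmcnm"  # key from the article; 16 bytes means AES-128


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always in 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(data: bytes) -> bytes:
    """Strip PKCS7 padding; the last byte says how many bytes to remove."""
    pad_len = data[-1]
    if not 1 <= pad_len <= len(data) or data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS7 padding")
    return data[:-pad_len]


print(len(KEY.encode("utf-8")))        # 16

padded = pkcs7_pad(b"1025528")
print(len(padded), padded[-1])         # 16 9

print(len(pkcs7_pad(b"A" * 16)))       # 32: a full extra block is added

# Stand-in for one 16-byte ciphertext block (NOT real AES output).
fake_block = bytes(16)
b64 = base64.b64encode(fake_block).decode("ascii")
print(len(b64), b64[-2:])              # 24 ==
```

For an ID long enough to need two blocks, the Base64 string would end in a single `=`, so the same two-character cut would also remove one real character; the site's IDs in this article are all short enough that this does not arise.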

6. Python executes js

  • There are three ways to reproduce the JS logic from Python:
    • Reimplement the same logic with existing Python modules, i.e. restore it in Python
    • Execute the JS through execjs, py_mini_racer, or similar
    • Deploy the JS as a Node service and call its interface
  • Calling the JS with Python's execjs library
    """Run the JS through execjs."""
    import execjs  # pip install PyExecJS
    from loguru import logger

    list_url = 'http://ggzy.zwfwb.tj.gov.cn:80/jyxxcgjg/1025528.jhtml'
    with open('./aes.js', "r", encoding='utf-8') as f:
        ctx = execjs.compile(f.read())
    true_url = ctx.call('req', list_url)  # Method 1: call the extracted function as-is
    logger.info(f"List URL: {list_url} > real URL: {true_url}")

    ####### separator: second variant #######
    import execjs  # pip install PyExecJS
    from loguru import logger

    list_url = 'http://ggzy.zwfwb.tj.gov.cn:80/jyxxcgjg/1025528.jhtml'
    # Take the ID before the first dot; rstrip('.jhtml') strips a character set, not a suffix.
    ccc = list_url.split('/')[-1].split('.')[0]
    with open('./aes.js', "r", encoding='utf-8') as f:
        ctx = execjs.compile(f.read())
    # Method 2: only the encryption runs in JS; the URL handling is done in Python.
    suffix = ctx.call('encrypt_req', 'qnbyzzwmdgghmcnm', ccc).replace('/', '^')[:-2]
    true_url = f"http://ggzy.zwfwb.tj.gov.cn:80/jyxxcgjg/{suffix}.jhtml"
    logger.info(f"List URL: {list_url} > real URL: {true_url}")
    
    
  • Restoring the logic in Python with an AES library: pip install pycryptodome
    import base64

    from Crypto.Cipher import AES
    from Crypto.Util.Padding import pad
    from loguru import logger


    def aes_ecb_encrypt_text(plain_text: str, key: str) -> str:
        """
        Encrypt plaintext with AES-ECB.
        :param plain_text: string to encrypt
        :param key: key (16, 24 or 32 bytes once UTF-8 encoded)
        :return: Base64-encoded ciphertext
        """
        aes2 = AES.new(key.encode('utf-8'), AES.MODE_ECB)
        encrypt_text = aes2.encrypt(pad(plain_text.encode('utf-8'), AES.block_size, style='pkcs7'))
        return base64.b64encode(encrypt_text).decode('utf-8')


    list_url = 'http://ggzy.zwfwb.tj.gov.cn:80/jyxxcgjg/1025528.jhtml'
    # Take the ID before the first dot; rstrip('.jhtml') strips a character set, not a suffix.
    ccc = list_url.split('/')[-1].split('.')[0]
    decrypt_str = ccc
    key_str = "qnbyzzwmdgghmcnm"
    encrypt_str = aes_ecb_encrypt_text(decrypt_str, key_str).replace('/', '^')[:-2]
    true_url = list_url.replace(decrypt_str, encrypt_str)
    logger.info(f"List URL: {list_url} > real URL: {true_url}")
    
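Whichever of the three approaches produces the Base64 ciphertext, the URL handling around it is the same as in the site's `req` function: take the last path segment, check that it is a positive integer followed by `jhtml`, then swap `/` for `^` and drop the last two characters. A minimal sketch of that wrapper follows. The names `rewrite_url` and `encrypt` are hypothetical, and the `fake_encrypt` used in the demo is only a Base64 stand-in (not AES) so that the snippet runs without any crypto library.

```python
import base64
import re
from typing import Callable, Optional

ID_PATTERN = re.compile(r"\+?[1-9][0-9]*")  # same rule as the regex in the site's JS


def rewrite_url(list_url: str, encrypt: Callable[[str], str]) -> Optional[str]:
    """Return the detail URL, or None if the link is not an ID-style .jhtml page.

    `encrypt` must map the numeric ID to Base64 ciphertext (e.g. AES-ECB).
    """
    parts = list_url.split("/")
    page_id, dot, ext = parts[-1].partition(".")
    if not dot or not ID_PATTERN.fullmatch(page_id) or "jhtml" not in ext:
        return None
    token = encrypt(page_id).replace("/", "^")[:-2]
    parts[-1] = f"{token}.{ext}"
    return "/".join(parts)


def fake_encrypt(text: str) -> str:
    """Stand-in for the real cipher: Base64 of the ID zero-padded to 16 bytes."""
    return base64.b64encode(text.encode("utf-8").ljust(16, b"\x00")).decode("ascii")


url = "http://ggzy.zwfwb.tj.gov.cn:80/jyxxcgjg/1025528.jhtml"
print(rewrite_url(url, fake_encrypt))
print(rewrite_url("http://ggzy.zwfwb.tj.gov.cn:80/index.html", fake_encrypt))  # None
```

Passing the cipher in as a callable keeps the URL logic in one place, so the execjs call or the pycryptodome function can be swapped in for `fake_encrypt` without touching the rest.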


Origin blog.csdn.net/weixin_43411585/article/details/131951347