[JS Reverse Engineering: 100 Cases] Question 1 of the Wangluozhe anti-crawler practice platform: JS obfuscation and encryption, anti-Hook operation

Follow the WeChat official account K哥爬虫 (Brother K Crawler) for more technical write-ups on advanced crawling and JS/Android reverse engineering!

Statement

All content in this article is for learning and exchange only. Captured content, sensitive URLs, and data interfaces have been desensitized. Commercial and illegal use is strictly prohibited; the author bears no responsibility for any consequences arising from such use. If there is any infringement, please contact me and I will delete it immediately!

Preface

The challenge itself is not difficult, but it has many pitfalls, mainly the anti-Hook trick and the work of debugging locally and filling in the missing environment. This article walks through every pitfall in detail instead of glossing over them.

Through this article you will learn:

  1. How to hook Function and the timer to get rid of the infinite debugger;
  2. How to deal with the anti-Hook trick and locate the encrypted parameter _signature through a Hook;
  3. How the browser and local environments differ, how to find the objects the code probes (navigator, document, location, and so on), and how to supply them locally;
  4. How to debug locally in PyCharm and pinpoint the differences between the local and browser environments so that the checks pass.

Reverse engineering target

  • Target: Wangluozhe anti-anti-crawler practice platform, Question 1: JS obfuscation and encryption, anti-Hook operation
  • Link: http://spider.wangluozhe.com/challenge/1
  • Introduction: The answer to this question is the sum of all the data on the 100 pages. The question must be solved with Hooks; do not solve it with AST, by extracting the code, or with a JS deobfuscation tool. (How to write and use Hook code is covered in Brother K's earlier articles and is not explained in detail here.)

01.png

Bypassing the infinite debugger

First, notice that the URL does not change when you click to another page, so the data is most likely loaded by Ajax requests, and some parameters change with every request. If you press F12 out of habit to look for the encrypted parameters, execution pauses immediately and you are stuck in an infinite debugger. Go one frame up the call stack and you can see the word debugger, as shown in the following figure:

02.png

Brother K ran into this in an earlier case, where we simply rewrote the JS and replaced the word debugger. This question, however, clearly wants us to get past the infinite debugger with a Hook. Besides debugger, notice the constructor in front of it. In JavaScript, a constructor is the function called when an object is created or instantiated; its basic syntax is constructor([arguments]) { ... } (see the MDN page on constructor for details). Here constructor is read from a function, so it resolves to Function.prototype.constructor, which is Function itself, and calling it with the string "debugger" builds a new function whose body is a debugger statement. Since debugger is clearly the argument passed to constructor, we can write the following Hook code to get past the infinite debugger:

// Keep a reference to the original constructor first
Function.prototype.constructor_ = Function.prototype.constructor;
Function.prototype.constructor = function (a) {
    // If the argument is "debugger", return an empty function
    if(a == "debugger") {
        return function (){};
    }
    // Otherwise fall through to the original constructor
    return Function.prototype.constructor_(a);
};
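To see why this works, the idea can be tried outside the browser. The following is a minimal sketch for Node, not the site's code: the helper names installHook and removeHook are made up for illustration, and the check looks at the last argument because that is the function body in a Function(...) call.

```javascript
// Every function inherits `constructor` from Function.prototype; it is Function itself.
const originalConstructor = Function.prototype.constructor;
const inheritsFunction = (function () {}).constructor === Function;

// Hypothetical helper: swap in a constructor that refuses to build debugger traps.
function installHook() {
  Function.prototype.constructor = function (...args) {
    // In Function(arg1, ..., body) the last argument is the function body.
    const body = args.length > 0 ? String(args[args.length - 1]) : "";
    if (body.trim() === "debugger") {
      return function () {}; // harmless stand-in
    }
    return originalConstructor.apply(this, args);
  };
}

// Hypothetical helper: put the original back.
function removeHook() {
  Function.prototype.constructor = originalConstructor;
}

installHook();
const trap = (function () {}).constructor("debugger"); // what the page does
const add = (function () {}).constructor("a", "b", "return a + b"); // normal use still works
const trapSource = trap.toString();
const sum = add(2, 3);
removeHook();

console.log(inheritsFunction, trapSource.includes("debugger"), sum);
```

Passing every other call through to the original constructor keeps any legitimate dynamic code generation on the page working.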

There are many ways to inject Hook code: typing it into the console of the browser developer tools (it is lost when the page is refreshed), injecting it with a Fiddler plug-in, injecting it with Tampermonkey, injecting it with a browser extension you write yourself, and so on. Brother K's earlier articles cover these methods, so they are not repeated here.

This time we inject with the Fiddler plug-in. After injecting the Hook code above, we still land in an infinite debugger, this time driven by setInterval, which is obviously a timer. It takes two required parameters: the first is the function to execute, and the second is the interval, in milliseconds, at which that function is called repeatedly (see the Runoob tutorial on Window setInterval() for details). We can hook it away as well:

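// Keep a reference to the original timer first
var setInterval_ = setInterval
setInterval = function (func, time){
    // If the interval is 0x7d0 (2000 ms), return an empty function
    // You could also skip the check and always return nothing; there are many ways to write this
    if(time == 0x7d0)
    {
        return function () {};
    }
    // Otherwise fall through to the original timer
    return setInterval_(func, time)
}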
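The same timer hook can be exercised in Node to confirm that only the 2000 ms registration is swallowed. This is a sketch under assumptions: the variable names are made up, and returning 0 as a placeholder ID is one possible choice (browser timers return a numeric ID), not what the article's version does.

```javascript
// Keep the real timer so every other interval still works.
const realSetInterval = globalThis.setInterval;
const blockedDelays = [];

// Replacement timer: swallow the 2000 ms (0x7d0) registration, pass the rest through.
globalThis.setInterval = function (callback, delay, ...args) {
  if (Number(delay) === 0x7d0) {
    blockedDelays.push(delay);
    return 0; // placeholder ID; nothing was scheduled
  }
  return realSetInterval(callback, delay, ...args);
};

let trapRan = false;
const trapId = setInterval(() => { trapRan = true; }, 0x7d0); // swallowed
const normalId = setInterval(() => {}, 50); // goes to the real timer
clearInterval(normalId); // clean up so the script can exit

globalThis.setInterval = realSetInterval; // undo the hook
console.log(trapId, blockedDelays.length);
```

Filtering on the delay is fragile if the site changes the interval; dropping the check and swallowing every registration is the blunter alternative mentioned in the comments above.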

03.png

Paste the two pieces of Hook code into the plug-in, enable the Hook, and refresh the page; the infinite debugger is gone.

04.png

Hooking the parameter

With the infinite debugger out of the way, click any page and the packet capture shows a POST request. In Form Data, page is the page number, count is the number of items per page, and _signature is the parameter we need to reverse, as shown in the following figure:

05.png
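For reference, the three form fields can be assembled into a urlencoded body as sketched below. The function name buildFormBody and the signature value are placeholders; the default count of 10 is the value used in the Python code at the end of this article.

```javascript
// Build the urlencoded body for one page request.
// Field names come from the captured Form Data; the signature is a placeholder.
function buildFormBody(page, signature, count = 10) {
  const params = new URLSearchParams();
  params.set("page", String(page));
  params.set("count", String(count));
  params.set("_signature", signature);
  return params.toString();
}

const body = buildFormBody(1, "placeholder-signature");
console.log(body);
```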

Searching for _signature directly gives only one result, which contains window.get_sign(), the function that sets _signature, as shown in the following figure:

06.png

Here comes the problem! Look at the title of this question again: JS obfuscation and encryption, anti-Hook operation. The author also stresses repeatedly that this question tests your Hook skills, yet so far we have not met any anti-Hook measure. Searching for _signature directly is obviously too easy; we are expected to capture _signature with a Hook, and the Hook work that follows will certainly not be smooth sailing!

Without further ado, let's write the code that hooks window._signature:

(function() {
    // Strict mode: report all errors
    'use strict';
    // window is the object to hook; the hooked property is _signature
    var _signatureTemp = "";
    Object.defineProperty(window, '_signature', {
        // Hook the set method, which runs on assignment
        set: function(val) {
            console.log('Hook captured _signature being set ->', val);
            debugger;
            _signatureTemp = val;
            return val;
        },
        // Hook the get method, which runs on read
        get: function()
        {
            return _signatureTemp;
        }
    });
})();

Inject the two Hooks that bypass the infinite debugger together with this _signature Hook through the Fiddler plug-in (note that the debugger-bypass code should be placed after the _signature Hook code, otherwise it may not take effect; this may be a bug in the plug-in). Refresh the page and the buttons on the page are gone. Open the developer tools and you will see two errors in the upper right corner; click one to jump to the offending code. The error message also appears in the console, as shown in the following figure:

07.png

The whole of 1.js is obfuscated with sojson jsjiami v6. We print some of the obfuscated expressions in the console and then restore the code by hand. The two variables i1I1i1li and illllli1 are hard to read, so we replace them with a and b, as follows:

(function() {
    'use strict';
    var a = '';
    Object["defineProperty"](window, "_signature", {
        set: function(b) {
            a = b;
            return b;
        },
        get: function() {
            return a;
        }
    });
}());

Look familiar? It has get and set methods, so isn't this just hooking window._signature? The logic is simply that the set method stores the value assigned to _signature in a, and the get method returns a when _signature is read. In other words, this code has no effect on _signature at all. So what is it for, and why do we get an error once we add our own Hook code?

Look at the error message: Uncaught TypeError: Cannot redefine property: _signature. Why can't _signature be redefined? Our Hook code runs as soon as the page starts loading, and a property created with Object.defineProperty is non-configurable by default, so an error is thrown when the site's JS later calls Object.defineProperty(window, '_signature', {}) again. The fix is simple: since redefinition is not allowed, and the site's own defineProperty code has no effect on _signature anyway, just delete it. This is presumably the anti-Hook operation.
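The error is easy to reproduce outside the page. The sketch below uses a plain object as a stand-in for window so that it runs in Node; defineSignature and fakeWindow are names made up for illustration.

```javascript
// Define _signature as an accessor, the way both our Hook and the site's code do.
function defineSignature(target) {
  let value = "";
  Object.defineProperty(target, "_signature", {
    // No `configurable: true` here, so the property is locked once defined.
    set(v) { value = v; },
    get() { return value; },
  });
}

const fakeWindow = {}; // stand-in for window so the sketch runs in Node
defineSignature(fakeWindow); // first definition: our Hook

let caught = null;
try {
  defineSignature(fakeWindow); // second definition: the site's own code
} catch (err) {
  caught = err;
}

const descriptor = Object.getOwnPropertyDescriptor(fakeWindow, "_signature");
console.log(caught && caught.name, descriptor.configurable);
```

Adding configurable: true to our own Hook would avoid the error, but the site's later definition would then replace our getter and setter, so removing the site's block is what keeps the Hook alive.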

Save the original 1.js locally, delete its defineProperty code, and use Fiddler's AutoResponder to replace the response (there are many ways to replace it, covered in Brother K's earlier articles). Refresh again: the exception is gone and the Hook captures _signature successfully.

08.png

09.png

Reversing the parameter

Once the Hook fires, go up the call stack and the method is exposed directly: window._signature = window.byted_acrawler(window.sign())

10.png
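Besides pausing with debugger and reading the call stack panel, the setter can record the stack itself. The sketch below is one possible variation, not the article's Hook: fakeWindow stands in for window, assignSignature is a made-up caller, and it relies on the stack property that V8 (Chrome, Node) attaches to Error objects.

```javascript
const fakeWindow = {}; // stand-in for window so the sketch runs in Node
const captures = [];
let stored;

Object.defineProperty(fakeWindow, "_signature", {
  configurable: true,
  get() { return stored; },
  set(value) {
    // The stack of a fresh Error lists the frames that led to this assignment.
    captures.push({ value, stack: new Error("setter hit").stack });
    stored = value;
  },
});

// Hypothetical caller, playing the role of the site's code.
function assignSignature() {
  fakeWindow._signature = "demo-value";
}
assignSignature();

console.log(captures[0].stack);
```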

Look at window.sign() first. Select it and you can see that it is a 13-digit millisecond timestamp. Follow it into 1.js to see how it is implemented:

11.png

Let's restore some of the obfuscated code manually:

window["sign"] = function sign() {
    try {
        div = document["createElement"];
        return Date["parse"](new Date())["toString"]();
    } catch (IIl1lI1i) {
        return "123456789abcdefghigklmnopqrstuvwxyz";
    }
}

Pay attention here, because a pit has been dug for us. If you skip over it thinking a timestamp is nothing worth looking at, you will be wrong! Note that this is a try-catch statement, and it contains the line div = document["createElement"];, which reads the createElement method of the HTML DOM Document object (the method normally used to create tags such as div). In the browser this line is harmless: the try branch runs and the timestamp is returned. When we run the code locally in Node, however, it throws document is not defined, the catch branch returns that string of digits and letters instead, and the final result is bound to be wrong!

The solution is simple. In the local code, either remove the try-catch statement and return the timestamp directly, or define document at the top, or comment out the line that touches document. Brother K recommends defining document, because who can guarantee there are no similar pits elsewhere? If one is buried deep and goes unnoticed, all the effort is wasted.
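Both outcomes can be reproduced with the restored function. This sketch assumes plain Node, where no global document exists; the stand-in document object is made up and far smaller than a real one.

```javascript
// Hand-restored shape of window.sign(), kept as a plain function for the demo.
function sign() {
  try {
    const div = document["createElement"]; // ReferenceError when document is missing
    return Date.parse(new Date()).toString();
  } catch (err) {
    return "123456789abcdefghigklmnopqrstuvwxyz";
  }
}

const withoutShim = sign(); // plain Node: the catch branch returns the decoy

// Minimal stand-in; a real page exposes far more than this.
globalThis.document = { createElement() { return {}; } };
const withShim = sign(); // now the try branch completes
delete globalThis.document;

console.log(withoutShim, withShim);
```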

Next, look at window.byted_acrawler(). Its return statement mainly uses sign, that is, window.sign(), and the IIl1llI1() method. Following IIl1llI1(), we see that it also uses a try-catch statement, with nav = navigator[liIIIi11('2b')];, the same trick as the earlier div. Again it is recommended to define navigator directly, as shown in the following figures:

14.png

15.png
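Probes like these are easy to miss when reading obfuscated code. One possible way to surface them is to wrap each stand-in object in a Proxy that records every property read. A minimal sketch follows; traceAccess, the sample values, and the webdriver property are made up for illustration and are not taken from the challenge.

```javascript
// Wrap an object so every property read is recorded under a dotted path.
function traceAccess(target, label, log) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      if (typeof prop === "string") {
        log.push(label + "." + prop);
      }
      const value = Reflect.get(obj, prop, receiver);
      // Wrap nested objects too, so chains such as document.location.href are traced.
      if (value !== null && typeof value === "object") {
        return traceAccess(value, label + "." + String(prop), log);
      }
      return value;
    },
  });
}

const accessLog = [];
const fakeNavigator = traceAccess({ userAgent: "placeholder UA" }, "navigator", accessLog);
const fakeDocument = traceAccess(
  { location: { href: "http://spider.wangluozhe.com/challenge/1" } },
  "document",
  accessLog
);

const ua = fakeNavigator["userAgent"];      // defined on the stand-in
const missing = fakeNavigator["webdriver"]; // not defined: undefined, but still logged
const href = fakeDocument.location.href;

console.log(accessLog);
```

Reads of undefined properties show up in the log without throwing, which gives a list of what still needs to be filled in.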

That covers the methods involved. After defining window, document, and navigator, run the code locally and you get the error window[liIIIi11(...)] is not a function:

16.png

Back on the web page, we find that this method is actually the timer, which serves no real purpose here, so just comment it out:

17.png

Local debugging with PyCharm

After the steps above, running locally again gives window.signs is not a function, and the failing line is an eval statement. In the browser, the same eval statement evaluates window.sign(). So why does it become window.signs() locally, with an extra s out of nowhere?

18.png

19.png

There can only be one reason: the local environment differs from the browser environment. The obfuscated code must contain an environment check that, outside a browser, alters the code passed to eval and appends the extra s. You could simply delete the whole function that contains the eval statement, along with the setInterval timer above, and the code would run normally, but Brother K has always been a stickler for details! We have to find out why the extra s is added!

We debug locally in PyCharm to see where the s is added. The failing line is the eval statement, so click on that line to set a breakpoint, right-click and choose Debug to run, and enter the debugging interface (PS: the original code contains the infinite debugger; if you leave it untouched, debugging in PyCharm will also get stuck in it. You can add the earlier Hook code to the local code, or simply delete the corresponding functions or variables):

20.png

The call stack is on the left and the variable values are on the right, much like the developer tools in Chrome. For detailed usage, see the official JetBrains documentation. The 8 buttons in the figure are:

  1. Show Execution Point (Alt + F10): if the cursor is on another line or in another file, click this to jump back to the line where execution is currently paused;
  2. Step Over (F8): execute line by line; if the line calls a method, the method is not entered;
  3. Step Into (F7): if the current line calls a method, enter it. Generally used to enter your own methods; it does not enter methods of the official libraries;
  4. Force Step Into (Alt + Shift + F7): enter any method, including those of the official libraries, which is useful when reading the underlying source code;
  5. Step Out (Shift + F8): leave the method you stepped into and return to the call site; at that point the method has finished executing but the assignment has not yet completed;
  6. Restart Frame: abandon the current breakpoint and execute to the breakpoint again;
  7. Run to Cursor (Alt + F9): run to the line where the cursor is, without setting a breakpoint;
  8. Evaluate Expression (Alt + F8): evaluate an expression directly, without typing it on the command line.

Click Step Into and we enter function IIlIliii(), which also uses a try-catch statement. Step once more and an exception is caught with the message Cannot read property 'location' of undefined, as shown in the following figure:

21.png

Let's output the value of each variable and restore the code manually, as follows:

function IIlIliii(II1, iIIiIIi1) {
    try {
        href = window["document"]["location"]["href"];
        check_screen = screen["availHeight"];
        window["code"] = "gnature = window.byted_acrawler(window.sign())";
        return '';
    } catch (I1IiI1il) {
        window["code"] = "gnature = window.byted_acrawlers(window.signs())";
        return '';
    }
}

Now we have the clue. Locally, document, location, href, and availHeight do not exist, so the catch branch runs, the code becomes window.signs(), and an error is raised. The fix is again simple: either delete the redundant code and set the string directly to the version without the s, or fill in the environment by looking up the values of href and screen in the browser and defining them:

var window = {
    "document": {
        "location": {
            "href": "http://spider.wangluozhe.com/challenge/1"
        }
    },
}

var screen = {
    "availHeight": 1040
}
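The effect of the two stand-in objects can be checked in isolation. In the sketch below the restored logic of IIlIliii() is rewritten to take the environment as a parameter, which is an adaptation for the demo (the real function reads globals); pickCode and the env objects are hypothetical names.

```javascript
// Same shape as the restored IIlIliii(), but the environment is passed in
// as a parameter so both branches can be exercised side by side.
function pickCode(env) {
  try {
    const href = env.window["document"]["location"]["href"];
    const checkScreen = env.screen["availHeight"];
    return "gnature = window.byted_acrawler(window.sign())";
  } catch (err) {
    return "gnature = window.byted_acrawlers(window.signs())";
  }
}

// Bare Node: window exists but has no document, and screen is missing.
const bareNode = { window: {}, screen: undefined };

// With the stand-in objects defined above.
const shimmed = {
  window: { document: { location: { href: "http://spider.wangluozhe.com/challenge/1" } } },
  screen: { availHeight: 1040 },
};

const codeWithoutShim = pickCode(bareNode);
const codeWithShim = pickCode(shimmed);
console.log(codeWithoutShim);
console.log(codeWithShim);
```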

Run it again and you get sign is not defined. The sign() here is really window.sign(), that is, the window[liIIIi11('a')] method below; rewrite the call in either form:

22.png

Run it once more and there are no errors. Now we can write a function that returns _signature; choose either of the following:

function getSign(){
    return window[liIIIi11('9')](window[liIIIi11('a')]())
}

function getSign(){
    return window.byted_acrawler(window.sign())
}

// Test output
console.log(getSign())

When we run it, PyCharm prints nothing. Likewise, calling console.log in the console of the challenge page prints nothing either, as shown in the following figure:

23.png

It looks like console.log has been tampered with. This is not a big problem in itself: we could call the getSign() function from a Python script and get the value of _signature that way. But once again, Brother K is a stickler for details! We have to find where console.log is handled and make it work again!

We keep using PyCharm here to get more practice with local debugging. Set a breakpoint on the console.log(getSign()) statement and step in; you land on the statement var IlII1li1 = function() {};. Inspect the variables at this point and you find that methods such as console.log and console.warn have been replaced with empty functions, as shown in the following figure:

24.png

The next step returns immediately, so the console methods were probably blanked out when the JS first ran. Set a few breakpoints in the function suspected of handling console and debug again: execution goes into the else branch, which assigns IlII1li1, an empty function, to the console methods, as shown in the following figure:

25.png

Having located the problem, we comment out that if-else statement so the methods are no longer blanked, debug again, and the result is printed:

26.png
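If editing the obfuscated file is not an option, another possible approach is to keep references to the console methods before the script runs and restore them afterwards. The sketch below uses a stand-in console object so the demo does not silence the real one; blankConsole imitates what the else branch does and is not the site's code.

```javascript
// Stand-in console so the demo does not silence the real one.
const demoConsole = {
  log: (...args) => args.join(" "),
  warn: (...args) => "warn: " + args.join(" "),
};

// Keep references before any untrusted script runs.
const saved = { log: demoConsole.log, warn: demoConsole.warn };

// Imitation of the blanking step: every method becomes an empty function.
function blankConsole(target) {
  const noop = function () {};
  target.log = noop;
  target.warn = noop;
}

blankConsole(demoConsole);
const whileBlanked = demoConsole.log("hello"); // undefined: output swallowed

Object.assign(demoConsole, saved); // restore the originals
const afterRestore = demoConsole.log("hello");

console.log(whileBlanked, afterRestore);
```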

Call it from Python, send _signature with each request, add up the data page by page, and the submission succeeds:

2.png
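The aggregation step itself is small. The sketch below mirrors it in JavaScript for readers following along in Node; the response shape ({ data: [{ value: ... }] }) is taken from the Python code at the end of this article, while sumPages and the sample numbers are made up.

```javascript
// Sum the `value` field across page responses shaped like { data: [{ value: n }, ...] }.
function sumPages(responses) {
  let total = 0;
  for (const response of responses) {
    for (const item of response.data) {
      total += item.value;
    }
  }
  return total;
}

// Made-up sample data standing in for two pages of API responses.
const samplePages = [
  { data: [{ value: 3 }, { value: 4 }] },
  { data: [{ value: 10 }] },
];
const total = sumPages(samplePages);
console.log(total);
```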

Full code

Follow Brother K Crawler on GitHub for more crawler-related code! Stars are welcome! https://github.com/kgepachong/

**The following shows only some key code and cannot be run directly!** Full code repository: https://github.com/kgepachong/crawler/

Key JavaScript encryption code (skeleton)

var window = {
    "document": {
        "location": {
            "href": "http://spider.wangluozhe.com/challenge/1"
        }
    },
}

var screen = {
    "availHeight": 1040
}
var document = {}
var navigator = {}
var location = {}

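// Keep a reference to the original constructor first
Function.prototype.constructor_ = Function.prototype.constructor;
Function.prototype.constructor = function (a) {
    // If the argument is "debugger", return an empty function
    if(a == "debugger") {
        return function (){};
    }
    // Otherwise fall through to the original constructor
    return Function.prototype.constructor_(a);
};

// Keep a reference to the original timer first
var setInterval_ = setInterval
setInterval = function (func, time){
    // If the interval is 0x7d0 (2000 ms), return an empty function
    // You could also skip the check and always return nothing; there are many ways to write this
    if(time == 0x7d0)
    {
        return function () {};
    }
    // Otherwise fall through to the original timer
    return setInterval_(func, time)
}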

var iil = 'jsjiami.com.v6'
  , iiIIilii = [iil, '\x73\x65\x74\x49\x6e\x74\x65\x72\x76\x61\x6c', '\x6a\x73\x6a', ...];
var liIIIi11 = function(_0x11145e, _0x3cbe90) {
    _0x11145e = ~~'0x'['concat'](_0x11145e);
    var _0x636e4d = iiIIilii[_0x11145e];
    return _0x636e4d;
};
(function(_0x52284d, _0xfd26eb) {
    var _0x1bba22 = 0x0;
    for (_0xfd26eb = _0x52284d['shift'](_0x1bba22 >> 0x2); _0xfd26eb && _0xfd26eb !== (_0x52284d['pop'](_0x1bba22 >> 0x3) + '')['replace'](/[fnwRwdGKbwKrRFCtSC=]/g, ''); _0x1bba22++) {
        _0x1bba22 = _0x1bba22 ^ 0x661c2;
    }
}(iiIIilii, liIIIi11));
// window[liIIIi11('0')](function() {
//     var l111IlII = liIIIi11('1') + liIIIi11('2');
//     if (typeof iil == liIIIi11('3') + liIIIi11('4') || iil != l111IlII + liIIIi11('5') + l111IlII[liIIIi11('6')]) {
//         var Ilil11iI = [];
//         while (Ilil11iI[liIIIi11('6')] > -0x1) {
//             Ilil11iI[liIIIi11('7')](Ilil11iI[liIIIi11('6')] ^ 0x2);
//         }
//     }
//     iliI1lli();
// }, 0x7d0);
(function() {
    var iiIIiil = function() {}();
    var l1liii11 = function() {}();
    window[liIIIi11('9')] = function byted_acrawler() {};
    window[liIIIi11('a')] = function sign() {};
    (function() {}());
    // (function() {
    //     'use strict';
    //     var i1I1i1li = '';
    //     Object[liIIIi11('1f')](window, liIIIi11('21'), {
    //         '\x73\x65\x74': function(illllli1) {
    //             i1I1i1li = illllli1;
    //             return illllli1;
    //         },
    //         '\x67\x65\x74': function() {
    //             return i1I1i1li;
    //         }
    //     });
    // }());
    var iiil1 = 0x0;
    var l11il1l1 = '';
    var ii1Ii = 0x8;
    function i1Il11i(iiIll1i) {}
    function I1lIIlil(l11l1iIi) {}
    function lllIIiI(IIi1lIil) {}

    // N functions omitted here
    
    window[liIIIi11('37')]();
}());

function iliI1lli(lil1I1) {
    function lili11I(l11I11l1) {
        if (typeof l11I11l1 === liIIIi11('38')) {
            return function(lllI11i) {}
            [liIIIi11('39')](liIIIi11('3a'))[liIIIi11('8')](liIIIi11('3b'));
        } else {
            if (('' + l11I11l1 / l11I11l1)[liIIIi11('6')] !== 0x1 || l11I11l1 % 0x14 === 0x0) {
                (function() {
                    return !![];
                }
                [liIIIi11('39')](liIIIi11('3c') + liIIIi11('3d'))[liIIIi11('3e')](liIIIi11('3f')));
            } else {
                (function() {
                    return ![];
                }
                [liIIIi11('39')](liIIIi11('3c') + liIIIi11('3d'))[liIIIi11('8')](liIIIi11('40')));
            }
        }
        lili11I(++l11I11l1);
    }
    try {
        if (lil1I1) {
            return lili11I;
        } else {
            lili11I(0x0);
        }
    } catch (liIlI1il) {}
}
;iil = 'jsjiami.com.v6';

// function getSign(){
//     return window[liIIIi11('9')](window[liIIIi11('a')]())
// }

function getSign(){
    return window.byted_acrawler(window.sign())
}

console.log(getSign())
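Restoring the obfuscated code by hand mostly means resolving calls such as liIIIi11('1f') against the string table. The lookup converts a hex string to an array index with ~~'0x'.concat(...). The sketch below shows that step on a made-up table: only the two entries the article restored earlier (1f and 21) are filled in, and lookup and restoreCalls are hypothetical helper names.

```javascript
// Made-up stand-in for the string table; only the two entries
// restored earlier in the article are filled in.
const demoTable = new Array(0x22).fill("?");
demoTable[0x1f] = "defineProperty";
demoTable[0x21] = "_signature";

// Same index trick as liIIIi11: "1f" -> "0x1f" -> 31 via double bitwise NOT.
function lookup(hexIndex) {
  const index = ~~"0x".concat(hexIndex);
  return demoTable[index];
}

// Replace every liIIIi11('..') call in a source string with its string literal.
function restoreCalls(source) {
  return source.replace(/liIIIi11\('([0-9a-f]+)'\)/g, (match, hex) => JSON.stringify(lookup(hex)));
}

const obfuscated = "Object[liIIIi11('1f')](window, liIIIi11('21'), {})";
const restored = restoreCalls(obfuscated);
console.log(restored);
```

In the real file the table is rotated by the self-invoking function that follows it, so lookups only give correct results after that rotation has run.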

Key Python calculation code

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-12-01
# @Author  : WeChat official account: K哥爬虫
# @FileName: challenge_1.py
# @Software: PyCharm
# ==================================


import execjs
import requests

challenge_api = "http://spider.wangluozhe.com/challenge/api/1"
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "Replace this with your own cookie value!",
    "Host": "spider.wangluozhe.com",
    "Origin": "http://spider.wangluozhe.com",
    "Referer": "http://spider.wangluozhe.com/challenge/1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}


def get_signature():
    # Compile the local JS file and call getSign() to produce a fresh _signature
    with open('challenge_1.js', 'r', encoding='utf-8') as f:
        challenge_js = execjs.compile(f.read())
    signature = challenge_js.call("getSign")
    print("signature: ", signature)
    return signature


def main():
    result = 0
    for page in range(1, 101):
        data = {
            "page": page,
            "count": 10,
            "_signature": get_signature()
        }
        response = requests.post(url=challenge_api, headers=headers, data=data).json()
        for d in response["data"]:
            result += d["value"]
    print("Result: ", result)


if __name__ == '__main__':
    main()


Origin my.oschina.net/u/4585873/blog/5350698