Analysis of a website's heavily encrypted and obfuscated JavaScript

Preface


I have spent some time analyzing the encrypted and obfuscated JavaScript of a certain website. I haven't fully figured it out yet, but I have learned some new things along the way, so I'm recording them here.

Tools and information


Analysis process


Get javascript code

  • Only a small part of the encrypted core code is written directly in a <script> tag in the page. Most of the code is run through eval, and some is loaded asynchronously via JSONP.
  • You can use cdp4j to listen for the Debugger.ScriptParsed event and call Debugger.getScriptSource in the listener to get the JS source text.
  • This way all front-end JavaScript source can be captured. Even if the source is encrypted in the network response, it has to be restored to valid JS before it is executed with eval.
  • To make analysis easier, save the code to files. The site uses a timer to eval the same piece of code repeatedly, so use ScriptParsed.hash as the file name to avoid saving duplicates.
  • Collected this way, there are 4 pieces of JS code obfuscated with the same method. Two are extremely long and are the core code; the other two appear to be encoding/decoding algorithms. Together they come to about 5000 lines.
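
The dedup-by-hash idea can be sketched independently of cdp4j. What follows is a minimal sketch in plain JavaScript, not the Java code I actually ran: ScriptCollector, onScriptParsed and the getSource callback are hypothetical names standing in for the ScriptParsed listener and the Debugger.getScriptSource call, and the events are faked so the snippet runs on its own.

```javascript
// Sketch: keep one copy of each script, keyed by the content hash
// reported with every ScriptParsed event.
class ScriptCollector {
  // getSource(scriptId) is a hypothetical stand-in for Debugger.getScriptSource.
  constructor(getSource) {
    this.getSource = getSource;
    this.saved = new Map(); // hash -> source text
  }

  // Call this from the ScriptParsed listener with the event payload.
  // Returns true when the script was new and has been stored.
  onScriptParsed(event) {
    if (this.saved.has(event.hash)) return false; // same code eval'ed again
    this.saved.set(event.hash, this.getSource(event.scriptId));
    return true;
  }
}

// Fake events: the timer re-evals the same code under a new scriptId.
const sources = { 1: "var a = 1;", 2: "var b = 2;", 3: "var b = 2;" };
const collector = new ScriptCollector((id) => sources[id]);
const events = [
  { scriptId: 1, hash: "h-a" },
  { scriptId: 2, hash: "h-b" },
  { scriptId: 3, hash: "h-b" },
];
const results = events.map((e) => collector.onScriptParsed(e));
console.log(results, collector.saved.size);
```

In a real run the Map would be replaced by files named after the hash, but the skip-if-seen logic stays the same.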

Get constant mapping

  • After getting the JS and formatting it, I found it was still a mess. Every variable and function is named "_$xx", so readability is zero.
  • Trying things in the Chrome console shows that the global variables and functions are stored on window.
  • Some functions called without arguments actually return constant strings.
  • There are also calls of the form _$xx.call. On inspection these turn out to be system methods such as String.fromCharCode, Array.prototype.slice, etc.
  • So you can write a console script that walks every member of window named _$xx and checks its type and, for functions, its return value. This yields the constant string mapping, the system method mapping, and so on. Run the following code in the console to get the string mapping table.
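(function () {
    for (var p in window) {
        // only look at obfuscated globals named _$xx
        if (p.substr(0, 2) !== "_$") continue;
        // keep anonymous functions only
        if (typeof window[p] !== "function" || window[p].name !== "") continue;
        try {
            // a zero-argument call returns the constant string
            var s = window[p]();
            console.log(p + "=" + s);
        } catch (e) {}
    }
})();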
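
The console script above only prints the string constants. A possible extension that also builds the system method mapping is sketched below. classifyMembers and fakeWindow are hypothetical names of mine; in the browser you would pass window instead of the fake object. The native-code test is an assumption about how to tell system methods apart, not something taken from the site's code.

```javascript
// Sketch: sort the _$xx members of an object into string constants
// and native (system) methods.
function classifyMembers(root) {
  const nativeRe = /\{\s*\[native code\]\s*\}/;
  const constants = {};
  const natives = {};
  for (const name of Object.keys(root)) {
    if (name.slice(0, 2) !== "_$") continue;
    const value = root[name];
    if (typeof value !== "function") continue;
    if (nativeRe.test(Function.prototype.toString.call(value))) {
      natives[name] = value.name; // e.g. "fromCharCode", "slice"
      continue; // never call system methods blindly
    }
    try {
      const result = value();
      if (typeof result === "string") constants[name] = result;
    } catch (e) {
      // functions that need arguments or a receiver are skipped
    }
  }
  return { constants, natives };
}

// Fake global object standing in for window.
const fakeWindow = {
  _$a1: function () { return "addEventListener"; },
  _$b2: String.fromCharCode,
  _$c3: Array.prototype.slice,
  _$d4: function (x) { return x.length; }, // throws without an argument
  other: function () { return "ignored"; },
};
const table = classifyMembers(fakeWindow);
console.log(table);
```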

Readability restoration

  • With the mapping in hand, can we just do a regex replace? It is not that simple! Local variables and local functions inside a function are likely to share names with globals. A blind regex replace will definitely burn you!! With 5000 lines of code, one small slip sends everything far off course!
  • In addition, local variables of different functions also share many names, which seriously interferes with static analysis. So local variables should also be replaced with unique, more meaningful names, such as <function name>_<variable index>.
  • The correct approach is therefore syntax-level substitution based on compiler principles. At this point, are you ready to give up? Do I really have to write a compiler just to crawl some data?!
  • Fortunately, JS already has mature industry standards and a number of solid third-party libraries, so at least you don't need to start from the Dragon Book...
  • I chose acornjs and astring: the former parses JS source into an abstract syntax tree (AST), and the latter turns an AST back into JS source. And once you have the AST, you can do whatever you want with it.....
  • To run acornjs and astring from Java code, see the article "Calling npm modules in java" in the references. Note that astring relies on two polyfills, endsWith and repeat, both of which can be downloaded from npm.
  • A brief description of the AST transformation algorithm: after getting the AST with acorn.parse(), scan every node recursively:

Before entering each FunctionDeclaration/FunctionExpression node, create a new scope object and push it onto the stack, filling it with the mapping from every local variable (including function parameters) to its new name. When leaving the node, pop the top of the stack.

When an Identifier node is encountered, look the name up in the scope stack from top to bottom. If it is found, it is a local variable of the function or of an enclosing closure, so replace it with the new name. Otherwise it is a global variable, so look it up and replace it using the global mapping table.

CallExpression nodes need special handling. The transformations above only change identifier names, but turning _$xx() into "xxx" is a structural change: the CallExpression node must be replaced by a Literal node with a value attribute.
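
The three rules above can be sketched as a small recursive pass. This is a simplified illustration, not my actual tool: it works on a hand-built AST in the ESTree node shape (so it runs without acornjs), it handles only parameters and var declarations, and the names rename, collectVars, the globals table layout and the f_p0/f_v1 naming scheme are my own assumptions.

```javascript
const id = (name) => ({ type: "Identifier", name });
const isFunction = (n) =>
  n.type === "FunctionDeclaration" || n.type === "FunctionExpression";

// Collect `var` names declared in one function body, skipping nested functions.
function collectVars(node, out) {
  if (!node || typeof node !== "object") return;
  if (Array.isArray(node)) { node.forEach((n) => collectVars(n, out)); return; }
  if (isFunction(node)) return;
  if (node.type === "VariableDeclarator") out.push(node.id.name);
  Object.values(node).forEach((child) => collectVars(child, out));
}

function rename(node, scopes, globals) {
  if (Array.isArray(node)) return node.map((n) => rename(n, scopes, globals));
  if (!node || typeof node.type !== "string") return node;

  if (isFunction(node)) {
    // Rule 1: push a scope holding old name -> new name for all locals.
    const prefix = node.id ? node.id.name.replace(/^_\$/, "") : "anon";
    const scope = new Map();
    node.params.forEach((p) => scope.set(p.name, `${prefix}_p${scope.size}`));
    const vars = [];
    collectVars(node.body, vars);
    vars.forEach((v) => {
      if (!scope.has(v)) scope.set(v, `${prefix}_v${scope.size}`);
    });
    scopes.push(scope);
    node.params = rename(node.params, scopes, globals);
    node.body = rename(node.body, scopes, globals);
    scopes.pop();
    return node;
  }
  if (node.type === "Identifier") {
    // Rule 2: innermost scope first, then the global mapping table.
    for (let i = scopes.length - 1; i >= 0; i--) {
      if (scopes[i].has(node.name)) return id(scopes[i].get(node.name));
    }
    const g = globals[node.name];
    return g && g.kind === "name" ? id(g.value) : node;
  }
  if (node.type === "MemberExpression" && !node.computed) {
    node.object = rename(node.object, scopes, globals);
    return node; // `.prop` names are not variables
  }
  if (node.type === "CallExpression" && node.callee.type === "Identifier" &&
      node.arguments.length === 0) {
    // Rule 3: _$xx() with a known constant becomes a Literal node.
    const g = globals[node.callee.name];
    const shadowed = scopes.some((s) => s.has(node.callee.name));
    if (g && g.kind === "constant" && !shadowed) {
      return { type: "Literal", value: g.value, raw: JSON.stringify(g.value) };
    }
  }
  for (const key of Object.keys(node)) node[key] = rename(node[key], scopes, globals);
  return node;
}

// AST for: function _$f(_$a) { var _$b = _$s(); return _$a + _$b + _$g; }
const ast = { type: "Program", body: [{
  type: "FunctionDeclaration", id: id("_$f"), params: [id("_$a")],
  body: { type: "BlockStatement", body: [
    { type: "VariableDeclaration", kind: "var", declarations: [
      { type: "VariableDeclarator", id: id("_$b"),
        init: { type: "CallExpression", callee: id("_$s"), arguments: [] } }] },
    { type: "ReturnStatement", argument: { type: "BinaryExpression", operator: "+",
        left: { type: "BinaryExpression", operator: "+", left: id("_$a"), right: id("_$b") },
        right: id("_$g") } },
  ] },
}] };
const globals = {
  _$s: { kind: "constant", value: "click" },  // _$s() returns "click"
  _$g: { kind: "name", value: "GL_counter" }, // a renamed global variable
};
const out = rename(ast, [], globals);
const fnBody = out.body[0].body.body;
console.log(out.body[0].params[0].name, fnBody[0].declarations[0].init);
```

A real pass would also need to cover let/const, catch parameters, nested function declaration names and object property keys, which this sketch leaves out.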

  • Once all processing is done, use astring.generate() to produce the restored code.
  • The code before and after readability restoration is compared below:

[Image: code before processing]

Before processing: a mess, completely incomprehensible

[Image: code after processing]

After processing: it is still hard going, but at least you can see that it is attaching various event listeners. Also, just look at how many events it listens to...

Code analysis

After the steps above, the code is at least barely readable. But don't relax yet, there are countless pitfalls ahead...

Before restoration the code just leaves you confused; after restoration it is enough to make you grit your teeth. What a grudge: the full 5000 lines are all about fighting back against analysis.....

Here are some of the anti-cracking techniques discovered so far.

Repeatedly trigger breakpoints to disrupt debugging, and detect dynamic analysis

var eI_v1 = window["eval"]("(function() { var a = new Date(); debugger; return new Date() - a > 100;}())");
_$n1 = _$n1 || eI_v1;
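// This was analyzed in the previous article; here I found where the call comes from. Note that before readability restoration it looked like this: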
var _$pW = _$u9[_$mz()](_$oi());
_$n1 = _$n1 || _$pW;
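
The idea behind this snippet is a timing check: with DevTools open, the debugger statement pauses execution, so the elapsed time jumps past the threshold. Below is a minimal sketch of that logic with an injectable clock so it can be run anywhere. makePauseDetector and the fake clock are hypothetical names of mine; the 100 ms threshold is the one from the code above.

```javascript
// Sketch of the timing trick: measure how long a `debugger` statement takes.
function makePauseDetector(thresholdMs, now) {
  return function detect(action) {
    const start = now();
    action(); // in the site's code this is the `debugger` statement
    return now() - start > thresholdMs;
  };
}

// Fake clock so the demo does not depend on a real debugger being attached.
let fakeTime = 0;
const detect = makePauseDetector(100, () => fakeTime);
const noDevtools = detect(() => { fakeTime += 1; });      // statement is a no-op
const withDevtools = detect(() => { fakeTime += 5000; }); // paused for 5 s
console.log(noDevtools, withDevtools); // false true
```

In a browser the equivalent call would be makePauseDetector(100, Date.now)(function () { debugger; }).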

js code dynamic obfuscation

  • As mentioned in the previous article, the JS code changes completely on every refresh, including global/local variable names, the order of functions, etc.
  • Breakpoints get disrupted and the same code can never be run twice. What does that mean for debugging?

Check whether key functions have been injected or replaced

function __RW_checkNative(rh_p0, rh_p1) { // I renamed this function by hand
    try {
        var rh_v2 = Function["prototype"]["toString"]["apply"](rh_p0);
        var rh_v3 = new RegExp("{\\s*\\[native code\\]\\s*}");
        if (typeof rh_p0 !== "function" || !rh_v3["test"](rh_v2) || rh_p1 != undefined && rh_p0 !== rh_p1) __GL_undefined_$sy = true;
    } catch (_$r0) {}
}
  • This function is used to check whether system functions such as eval, Function, setTimeout, and setInterval have been injected.
  • Once you know this logic, you can use certain techniques ( https://segmentfault.com/a/1190000018742189 ) to fool it. If you don't know about it...
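
To see why a naive hook gets caught, here is a small sketch of the same check. looksHooked is my own hypothetical rewrite of __RW_checkNative that returns a value instead of setting a global flag, and Math.max is used as the guinea pig only so that the snippet also runs outside a browser.

```javascript
const nativeRe = /\{\s*\[native code\]\s*\}/;

// Returns true when fn looks tampered with. `expected` is an optional
// reference saved before any third-party script could run.
function looksHooked(fn, expected) {
  if (typeof fn !== "function") return true;
  if (!nativeRe.test(Function.prototype.toString.call(fn))) return true;
  return expected != null && fn !== expected;
}

const realMax = Math.max;
// A naive hook: a plain JS wrapper whose source text is visible.
const hookedMax = function () { return realMax.apply(Math, arguments); };

console.log(looksHooked(realMax));           // false: still native
console.log(looksHooked(hookedMax));         // true: no [native code] marker
console.log(looksHooked(Math.min, realMax)); // true: native, but wrong reference
```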

Check whether the current window is hidden

document["addEventListener"]("visibilitychange", _$r0);
  • It monitors whether the current window is in the foreground. If you open several browsers to crawl in parallel...
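
I have not worked out what the handler _$r0 does with the event, so the following is only a guess at the general pattern: a listener that reads document.hidden each time visibilitychange fires. makeFakeDocument, setHidden and watchVisibility are hypothetical names, and the fake document exists only so the snippet runs outside a browser.

```javascript
// Minimal stand-in for `document`, just enough to run outside a browser.
function makeFakeDocument() {
  const listeners = [];
  return {
    hidden: false,
    addEventListener(type, fn) {
      if (type === "visibilitychange") listeners.push(fn);
    },
    // Test helper, not part of the DOM: flip visibility and fire the event.
    setHidden(value) {
      this.hidden = value;
      listeners.forEach((fn) => fn());
    },
  };
}

// Record each time the page goes to the background.
function watchVisibility(doc, log) {
  doc.addEventListener("visibilitychange", function () {
    if (doc.hidden) log.push("hidden");
  });
}

const doc = makeFakeDocument();
const hiddenLog = [];
watchVisibility(doc, hiddenLog);
doc.setHidden(true);  // another window takes the foreground
doc.setHidden(false); // back to the foreground
doc.setHidden(true);
console.log(hiddenLog.length);
```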

Detect Selenium, WebDriver, PhantomJS, etc.

var rm_v5 = "_Selenium_IDE_Recorder,_selenium, callSelenium",
    rm_v6 = "__driver_evaluate,__webdriver_evaluate,__selenium_evaluate,__fxdriver_evaluate,__driver_unwrapped,__webdriver_unwrapped,__selenium_unwrapped,__fxdriver_unwrapped,__webdriver_script_func,__webdriver_script_fn",
    rm_v7 = ["selenium", "webdriver", "driver"];
if (_$un(window, "callPhantom,__phantom")) { ... }
  • Seeing this, you know what will happen...
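
Judging from the call site, _$un probably checks whether any of the comma-separated names exists on the given object. That is my assumption, not something confirmed from its body. A sketch of such a helper, with the hypothetical name hasAnyProperty and fake window objects:

```javascript
// Hypothetical reconstruction of a helper like _$un(obj, "a,b,c"):
// true if at least one of the listed property names exists on obj.
function hasAnyProperty(obj, names) {
  return names.split(",").some((name) => name.trim() in obj);
}

const cleanWindow = { document: {} };
const phantomWindow = { document: {}, callPhantom: function () {} };
const seleniumWindow = { document: {}, _selenium: true };

const phantomNames = "callPhantom,__phantom";
const seleniumNames = "_Selenium_IDE_Recorder,_selenium, callSelenium";

console.log(hasAnyProperty(cleanWindow, phantomNames));     // false
console.log(hasAnyProperty(phantomWindow, phantomNames));   // true
console.log(hasAnyProperty(seleniumWindow, seleniumNames)); // true
```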

Hooking AJAX

var ec_v4 = window["XMLHttpRequest"];
if (ec_v4) {
    var ec_v5 = ec_v4["prototype"];
    if (ec_v5) {
        __GL_f_open = ec_v5["open"];
        __GL_f_send = ec_v5["send"];
        ec_v5["open"] = function () {
            _$t5();
            arguments[1] = _$pK(arguments[1]);
            return __GL_f_open["apply"](this, arguments);
        };
    } else { ... }
}
  • An encrypted parameter, MmEwMD, is automatically appended to every AJAX request. Its value may include information such as the mouse track.
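
The hooking pattern itself is simple and can be tried on a fake class. In this sketch FakeXHR stands in for XMLHttpRequest and signUrl is a hypothetical stand-in for _$pK: the real function computes an encrypted value, while here a fixed placeholder is appended just to show where the URL gets rewritten.

```javascript
// Fake XHR class so the pattern can be run outside a browser.
class FakeXHR {
  open(method, url) { this.opened = { method, url }; }
}

// Hypothetical stand-in for _$pK: append an extra query parameter.
function signUrl(url) {
  const sep = url.indexOf("?") === -1 ? "?" : "&";
  return url + sep + "MmEwMD=" + encodeURIComponent("token-placeholder");
}

// Save the original method, then replace it with a wrapper.
const originalOpen = FakeXHR.prototype.open;
FakeXHR.prototype.open = function () {
  arguments[1] = signUrl(arguments[1]); // rewrite the URL in place
  return originalOpen.apply(this, arguments);
};

const xhr = new FakeXHR();
xhr.open("GET", "/api/list?page=1");
console.log(xhr.opened.url); // /api/list?page=1&MmEwMD=token-placeholder
```

Because the wrapper lives on the prototype, every request made by the page's own code goes through it without the caller noticing.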

Check whether navigator is fake

var hi_v14 = window["navigator"];
for (hi_v11 in hi_v14) {
    try {
        hi_v13 = hi_v14["hasOwnProperty"](hi_v11);
    } catch (_$r0) {
        hi_v13 = false;
    }
}
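  • If the navigator object you inject is a knock-off built as a plain {...} object, this will give it away: on a genuine navigator the fields live on the prototype, so hasOwnProperty returns false for them, while on an object literal it returns true...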
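
The difference is easy to reproduce. In the sketch below, protoBacked imitates a genuine navigator (as far as I know, current Chrome exposes the navigator fields as getters on the prototype), and literalFake is the kind of replacement object a crawler might inject. ownKeysSeen is a hypothetical helper that mirrors the loop above.

```javascript
// Getters on a prototype object, like the fields of a genuine navigator.
const proto = {
  get userAgent() { return "Mozilla/5.0"; },
  get platform() { return "Win32"; },
};
const protoBacked = Object.create(proto);

// A fake built as an object literal: every field is an own property.
const literalFake = { userAgent: "Mozilla/5.0", platform: "Win32" };

// Same loop as in the site's code: which enumerated keys are own properties?
function ownKeysSeen(nav) {
  const own = [];
  for (const key in nav) {
    if (Object.prototype.hasOwnProperty.call(nav, key)) own.push(key);
  }
  return own;
}

console.log(ownKeysSeen(protoBacked)); // []
console.log(ownKeysSeen(literalFake)); // [ 'userAgent', 'platform' ]
```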

Check browser characteristics

  • This part of the code is very long and complicated and has not been fully analyzed yet. What can be seen so far includes:
  • navigator.languages: this field is not available in headless Chrome
  • navigator.plugins: the plugin list returned by headless Chrome differs from that of headed Chrome
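
I have not decoded how the site actually evaluates these two fields. A check along these lines would be the simplest possible version; headlessHints is a hypothetical name, and treating an empty plugin list as suspicious is my assumption.

```javascript
// Sketch: collect the names of navigator fields that look headless-like.
function headlessHints(nav) {
  const hints = [];
  if (!nav.languages || nav.languages.length === 0) hints.push("languages");
  // Assumption: an empty plugin list is treated as suspicious.
  if (!nav.plugins || nav.plugins.length === 0) hints.push("plugins");
  return hints;
}

// Fake navigator objects for illustration.
const headedLike = { languages: ["zh-CN", "zh"], plugins: [{ name: "PDF Viewer" }] };
const headlessLike = { plugins: [] };

console.log(headlessHints(headedLike));   // []
console.log(headlessHints(headlessLike)); // [ 'languages', 'plugins' ]
```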

 

WebGL capability check

  • There is a large block of code that uses WebGL to draw on a canvas. I haven't worked with WebGL before and don't understand it yet, but it is surely another way of checking browser features.


Origin blog.csdn.net/zhangge3663/article/details/108443013