Summary of various methods of calling JS in Python

1. Write in front

  Everyone who is a crawler knows that the general protection of domestic web or app is very good, and the more valuable the website is, the stronger this aspect is.

No matter how small or weak the website is now, it must be anti-crawled more or less on the whole
insert image description here

JS is widely used in anti-crawling. Now a crawler engineer basically needs to understand JS, because various JS encryption needs to be reversed! Cracking JS encryption is only the first step, and then how to execute JS directly in our Python code. Here are several ways to execute JS code in Python.

2. PyExecJS method

The first step is to install:

pip3 install PyExecJS

PyExecJS is an easy-to-use library that provides a common interface to execute JavaScript code and can work in multiple JavaScript runtime environments, including Node.js, PhantomJS

Then import excejs , which is a library for executing JS code in Python, the usage example is as follows:

md5_js = 
'''function hex_md5(s){ return binl2hex(core_md5(str2binl(s), s.length * 8));}
function b64_md5(s){ return binl2b64(core_md5(str2binl(s), s.length * 8));}
function hex_hmac_md5(key, data) { return binl2hex(core_hmac_md5(key, data)); }
function b64_hmac_md5(key, data) { return binl2b64(core_hmac_md5(key, data)); }
function calcMD5(s){ return binl2hex(core_md5(str2binl(s), s.length * 8));}

function md5_vm_test()
{
  return hex_md5("abc") == "900150983cd24fb0d6963f7d28e17f72";
}
function cxx(a, d) {
    var e;
    a = a - 0,e = str_spl[a]
    return e;
}
function core_md5(x, len)
{

  x[len >> 5] |= 0x80 << ((len) % 32);
  x[(((len + 64) >>> 9) << 4) + 14] = len;
  var a =  1732584193;
  var b = -271733879;
  var c = -1732584194;
  var d =  271733878;
  for(var i = 0; i < x.length; i += 16)
  {
    var olda = a;
    var oldb = b;
    var oldc = c;
    var oldd = d;

    a = md5_ff(a, b, c, d, x[i+ 0], 7 , -680876936);
    d = md5_ff(d, a, b, c, x[i+ 1], 12, -389564586);
    c = md5_ff(c, d, a, b, x[i+ 2], 17,  606105819);
    b = md5_ff(b, c, d, a, x[i+ 3], 22, -1044525330);
    a = md5_ff(a, b, c, d, x[i+ 4], 7 , -176418897);
    d = md5_ff(d, a, b, c, x[i+ 5], 12,  1200080426);
    c = md5_ff(c, d, a, b, x[i+ 6], 17, -1473231341);
    b = md5_ff(b, c, d, a, x[i+ 7], 22, -45705983);
    a = md5_ff(a, b, c, d, x[i+ 8], 7 ,  1770035416);
    d = md5_ff(d, a, b, c, x[i+ 9], 12, -1958414417);
    c = md5_ff(c, d, a, b, x[i+10], 17, -42063);
    b = md5_ff(b, c, d, a, x[i+11], 22, -1990404162);
    a = md5_ff(a, b, c, d, x[i+12], 7 ,  1804603682);
    d = md5_ff(d, a, b, c, x[i+13], 12, -40341101);
    c = md5_ff(c, d, a, b, x[i+14], 17, -1502002290);
    b = md5_ff(b, c, d, a, x[i+15], 22,  1236535329);
    a = md5_gg(a, b, c, d, x[i+ 1], 5 , -165796510);
    d = md5_gg(d, a, b, c, x[i+ 6], 9 , -1069501632);
    c = md5_gg(c, d, a, b, x[i+11], 14,  643717713);
    b = md5_gg(b, c, d, a, x[i+ 0], 20, -373897302);
    a = md5_gg(a, b, c, d, x[i+ 5], 5 , -701558691);
    d = md5_gg(d, a, b, c, x[i+10], 9 ,  38016083);
    c = md5_gg(c, d, a, b, x[i+15], 14, -660478335);
    b = md5_gg(b, c, d, a, x[i+ 4], 20, -405537848);
    a = md5_gg(a, b, c, d, x[i+ 9], 5 ,  568446438);
    d = md5_gg(d, a, b, c, x[i+14], 9 , -1019803690);
    c = md5_gg(c, d, a, b, x[i+ 3], 14, -187363961);
    b = md5_gg(b, c, d, a, x[i+ 8], 20,  1163531501);
    a = md5_gg(a, b, c, d, x[i+13], 5 , -1444681467);
    d = md5_gg(d, a, b, c, x[i+ 2], 9 , -51403784);
    c = md5_gg(c, d, a, b, x[i+ 7], 14,  1735328473);
    b = md5_gg(b, c, d, a, x[i+12], 20, -1926607734);
    a = md5_hh(a, b, c, d, x[i+ 5], 4 , -378558);
    d = md5_hh(d, a, b, c, x[i+ 8], 11, -2022574463);
    c = md5_hh(c, d, a, b, x[i+11], 16,  1839030562);
    b = md5_hh(b, c, d, a, x[i+14], 23, -35309556);
    a = md5_hh(a, b, c, d, x[i+ 1], 4 , -1530992060);
    d = md5_hh(d, a, b, c, x[i+ 4], 11,  1272893353);
    c = md5_hh(c, d, a, b, x[i+ 7], 16, -155497632);
    b = md5_hh(b, c, d, a, x[i+10], 23, -1094730640);
    a = md5_hh(a, b, c, d, x[i+13], 4 ,  681279174);
    d = md5_hh(d, a, b, c, x[i+ 0], 11, -358537222);
    c = md5_hh(c, d, a, b, x[i+ 3], 16, -722521979);
    b = md5_hh(b, c, d, a, x[i+ 6], 23,  76029189);
    a = md5_hh(a, b, c, d, x[i+ 9], 4 , -640364487);
    d = md5_hh(d, a, b, c, x[i+12], 11, -421815835);
    c = md5_hh(c, d, a, b, x[i+15], 16,  530742520);
    b = md5_hh(b, c, d, a, x[i+ 2], 23, -995338651);
    a = md5_ii(a, b, c, d, x[i+ 0], 6 , -198630844);
    d = md5_ii(d, a, b, c, x[i+ 7], 10,  1126891415);
    c = md5_ii(c, d, a, b, x[i+14], 15, -1416354905);
    b = md5_ii(b, c, d, a, x[i+ 5], 21, -57434055);
    a = md5_ii(a, b, c, d, x[i+12], 6 ,  1700485571);
    d = md5_ii(d, a, b, c, x[i+ 3], 10, -1894986606);
    c = md5_ii(c, d, a, b, x[i+10], 15, -1051523);
    b = md5_ii(b, c, d, a, x[i+ 1], 21, -2054922799);
    a = md5_ii(a, b, c, d, x[i+ 8], 6 ,  1873313359);
    d = md5_ii(d, a, b, c, x[i+15], 10, -30611744);
    c = md5_ii(c, d, a, b, x[i+ 6], 15, -1560198380);
    b = md5_ii(b, c, d, a, x[i+13], 21,  1309151649);
    a = md5_ii(a, b, c, d, x[i+ 4], 6 , -145523070);
    d = md5_ii(d, a, b, c, x[i+11], 10, -1120210379);
    c = md5_ii(c, d, a, b, x[i+ 2], 15,  718787259);
    b = md5_ii(b, c, d, a, x[i+ 9], 21, -343485551);

    a = safe_add(a, olda);
    b = safe_add(b, oldb);
    c = safe_add(c, oldc);
    d = safe_add(d, oldd);
  }
  return Array(a, b, c, d);

}

function md5_cmn(q, a, b, x, s, t)
{
  return safe_add(bit_rol(safe_add(safe_add(a, q), safe_add(x, t)), s),b);
}
function md5_ff(a, b, c, d, x, s, t)
{
  return md5_cmn((b & c) | ((~b) & d), a, b, x, s, t);
}
function md5_gg(a, b, c, d, x, s, t)
{
  return md5_cmn((b & d) | (c & (~d)), a, b, x, s, t);
}
function md5_hh(a, b, c, d, x, s, t)
{
  return md5_cmn(b ^ c ^ d, a, b, x, s, t);
}
function md5_ii(a, b, c, d, x, s, t)
{
  return md5_cmn(c ^ (b | (~d)), a, b, x, s, t);
}

function core_hmac_md5(key, data)
{
  var bkey = str2binl(key);
  if(bkey.length > 16) bkey = core_md5(bkey, key.length * 8);

  var ipad = Array(16), opad = Array(16);
  for(var i = 0; i < 16; i++)
  {
    ipad[i] = bkey[i] ^ 0x36363636;
    opad[i] = bkey[i] ^ 0x5C5C5C5C;
  }

  var hash = core_md5(ipad.concat(str2binl(data)), 512 + data.length * 8);
  return core_md5(opad.concat(hash), 512 + 128);
}

function safe_add(x, y)
{
  var lsw = (x & 0xFFFF) + (y & 0xFFFF);
  var msw = (x >> 16) + (y >> 16) + (lsw >> 16);
  return (msw << 16) | (lsw & 0xFFFF);
}

function bit_rol(num, cnt)
{
  return (num << cnt) | (num >>> (32 - cnt));
}

function str2binl(str)
{
  var bin = Array();
  var mask = (1 << 8) - 1;
  for(var i = 0; i < str.length * 8; i += 8)
    bin[i>>5] |= (str.charCodeAt(i / 8) & mask) << (i%32);
  return bin;
}

function binl2hex(binarray)
{
  var hex_tab = 0 ? "0123456789ABCDEF" : "0123456789abcdef";
  var str = "";
  for(var i = 0; i < binarray.length * 4; i++)
  {
    str += hex_tab.charAt((binarray[i>>2] >> ((i%4)*8+4)) & 0xF) +
           hex_tab.charAt((binarray[i>>2] >> ((i%4)*8  )) & 0xF);
  }
  return str;
}
function binl2b64(binarray)
{
  var tab = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  var str = "";
  for(var i = 0; i < binarray.length * 4; i += 3)
  {
    var triplet = (((binarray[i   >> 2] >> 8 * ( i   %4)) & 0xFF) << 16)
                | (((binarray[i+1 >> 2] >> 8 * ((i+1)%4)) & 0xFF) << 8 )
                |  ((binarray[i+2 >> 2] >> 8 * ((i+2)%4)) & 0xFF);
    for(var j = 0; j < 4; j++)
    {
      if(i * 8 + j * 6 > binarray.length * 32) str += '';
      else str += tab.charAt((triplet >> 6*(3-j)) & 0x3F);
    }
  }
  return str;
}'''
content = '加密内容'
ctx = execjs.compile(md5_js)
sk = ctx.call('hex_md5', content)

The first parameter in the call is the name of the JS function. If the JS to be executed has parameters, the parameters can be followed by the above content .

3. js2py method

The first step is to install:

 pip3 install js2py

js2py is a library specially designed for Python, which allows JavaScript code to be executed in Python, and provides a relatively complete JavaScript interpreter. It supports more complex JavaScript features such as closures and exception handling

Good for some reverse engineering scenarios as it can handle obfuscated and complex JavaScript code

Its principle is to directly translate JS code into Python code for execution, no need to install and run the local environment, and the startup is faster

However, the execution time is increased during translation, and errors are sometimes reported during translation, especially when there are many JS codes. Since the local JS environment is not used, it is often easy to report errors when translating code that is confusing like JS

Example usage:

import js2py
js2py.eval_js('console.log( "Hello World!" )')

Use execute to execute JS code example:

import js2py
context = js2py.EvalJs() # 实例化解析js对象
js_code = '''
var a = 10
function f(x) {return x*x}
'''
context.execute(js_code) # 调用js代码里面的函数
context.a

Example of using Python built-in functions directly in JS:

import js2py
context = js2py.EvalJs({
    
    'python_sum': sum})
context.eval('python_sum(new Array(1, 2, 3))')

The function sum is a function in Python, which can be used directly in js2py. In fact, it is easy to understand, because js2py translates js into python code and then executes it. Of course, it can directly execute Python functions.

Example of how to translate JS files:

from example import example
js2py.translate_file('example.js', 'example.py')
example.someFunction()

In addition to using the previous method above, you can also translate the JS file into Python code and then call it. The advantage of this is that you don’t need to translate it every time you call the JS file

Now js2py also supports JavaScript6. If you are interested in more usage skills, you can go to the official website to check: Js2Py

4. node method

First make sure to install nodejs first, and then use subprocess to call node subprocess to execute JS

Advantages : fast speed, and generally no error, because nodejs uses the v8 engine like chrome browser, this method is the most reliable among these five methods, especially when there are a lot of js codes to be executed, it is recommended to call node directly

Disadvantages : need to install nodejs, and if the js of Dachang website or app will generally detect nodejs, so the js code must be handled well, otherwise it will not pass

import subprocess
# js文件最后必须有输出,我使用的是 console.log
pro = subprocess.run("node abc.js", stdout=subprocess.PIPE)
# 获得标准输出
_token = pro.stdout
# 转一下格式
token = _token.decode().strip()

5. Selenium method

The first step is to install:

pip3 install selenium

Selenium is an automated testing tool that simulates browser behavior. JavaScript can be executed in the browser to implement a series of operations for web page rendering

Here we need to download webdriver and install chrome browser, or other browsers and their corresponding webdriver

Driver address: chromedriver

Selenium Official Website: Selenium

Advantages : Save the boring and cumbersome matters of digging JS code

Disadvantages : There must be browser environment support, and the execution efficiency is low. It is only suitable for executing JS code in the presence of a browser environment.

It is generally not recommended to use selenium, but if you don't mind the low collection efficiency, selenium is a good choice

Example usage:

from selenium import webdriver
jsstr = '''
function add() {
    let a = 1;
    let b = 2;
    return a+b;
}'''
# 调用js
driver = webdriver.chrome()
# 异步用这个driver.execute_async_script(js)
result = driver.execute_script(jsstr)
print(result)

The following is a JS call to update the browser User-Agent in the early years:

insert image description here

6. PyV8 method

PyV8 is a library based on Google's V8 engine that provides the ability to execute JavaScript code in Python. It has an advantage in performance because the V8 engine is a high-performance JavaScript engine

PyV8 only supports the pip installation of Python2, and does not support the pip installation of python3 environment, please go directly to the official website to download and install the binary file: PyV8

Then after decompression, copy PyV8.py and _PyV8.so (note: if it is not these two file names, you need to modify), copy the two files to the site-packages directory of Python, such as /usr/local/lib/python3.6 /site-packages

Example usage:

import PyV8
ctxt = PyV8.JSContext()
# ctxt.__enter__()
ctxt.enter()
jsstr = '''
function add() {
    let a = 1;
    let b = 2;
    return a+b;
}'''
result = ctxt.eval(jsstr)
print(result)

Finally, you can choose the appropriate method above according to your business needs.

  Well, it's time to say goodbye to everyone here again. It's not easy to create, please give me a like before leaving. Your support is the driving force for my creation, and I hope to bring you more high-quality articles

Guess you like

Origin blog.csdn.net/qiulin_wu/article/details/132282819