JS Reverse Engineering Notes (for Python Crawlers)


The most important thing is to search patiently in the browser for the code and data you need, and then trace the encrypted result back step by step.

1. Summary

Search: global search, search within a file
Debug: regular breakpoints, XHR breakpoints, event (behavior) breakpoints
Check the call stack of the request
Execute functions that are in memory (from the console while paused at a breakpoint)
Modify parameter values in the stack
Write JS code
Print the value of the window object
Hook: cookie hook, request hook, header hook

2. Common encryption methods:

Symmetric encryption (the same key encrypts and decrypts): DES, 3DES, AES

Asymmetric encryption (a public key and a private key): RSA

Message digest / signature algorithms: MD5, HMAC, SHA

In practice, front ends most often use MD5, AES, RSA, and custom encryption functions.

How the methods are usually combined: an asymmetric algorithm protects the key of the symmetric algorithm, the symmetric algorithm encrypts the data, and a digest algorithm produces a summary of the message that is then signed with the asymmetric private key.

The messages and keys passed to DES, 3DES, AES, RSA, MD5, SHA, and HMAC are all of the bytes data type; anything that is not bytes must be converted first. For the symmetric ciphers, the key length is generally a multiple of 8 bytes (8 for DES; 16, 24, or 32 for AES).
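The bytes requirement is easy to see with the standard library. The following is a minimal sketch using hashlib and hmac; the message and key values are made up for illustration and do not come from any real site.

```python
import hashlib
import hmac

message = "hello"        # str, as it usually arrives from page data
key = "secret-key"       # hypothetical key, for illustration only

# The digest functions accept bytes only; passing a str raises TypeError,
# so encode first.
msg_bytes = message.encode("utf-8")
key_bytes = key.encode("utf-8")

md5_hex = hashlib.md5(msg_bytes).hexdigest()
sha256_hex = hashlib.sha256(msg_bytes).hexdigest()
hmac_hex = hmac.new(key_bytes, msg_bytes, hashlib.sha256).hexdigest()

print(md5_hex)      # 32 hex characters
print(sha256_hex)   # 64 hex characters
print(hmac_hex)     # 64 hex characters
```

When a value computed this way does not match what the page sends, the encoding of the input (UTF-8 versus something else) is one of the first things worth checking.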

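For RSA in Python, the rsa library provides methods for generating and verifying signatures.

Security: among the symmetric ciphers, DES < 3DES < AES. RSA is asymmetric and not directly comparable, since its strength depends on the key size. MD5, SHA, and HMAC are digest algorithms rather than encryption, and MD5 and SHA-1 are no longer collision resistant.
When searching, also try keywords such as RSA and encrypt, especially encrypt.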

Here this.exponent is the RSA public exponent. Its value is generally in the HTML file and can be found with a global search.

The value of the key is generally an element value in the page source, which a global search will find; otherwise it is defined in the JS.
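To make the role of the exponent and the key concrete, here is a toy textbook-RSA sketch. The hex string "010001" is an assumed example of how an exponent often appears in page source, and the tiny primes are for illustration only. Real front-end libraries also add padding, so their output differs from this bare arithmetic.

```python
# Exponent as it might appear in page source: a hex string (assumed value).
exponent_hex = "010001"
e_real = int(exponent_hex, 16)
print(e_real)  # 65537

# Toy textbook RSA with tiny primes; never use numbers like these in practice.
p, q = 61, 53
n = p * q                  # modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (needs Python 3.8+): 2753

m = 65                     # message as an integer smaller than n
c = pow(m, e, n)           # encrypt: c = m^e mod n
assert pow(c, d, n) == m   # decrypt: m = c^d mod n
print(c)  # 2790
```

The public exponent and modulus are all a crawler needs to reproduce the encryption side; the private exponent stays on the server.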

3. Approach to reversing the encryption:

(1) Start from the encrypted parameter: search all files for its keyword, for example "x-uab"

Open the Chrome browser and press F12

Open the Sources panel, press Ctrl + Shift + F, and enter x-uab to find the JS code

Next, set a breakpoint: click the line number, and a blue marker appears on it, showing that the breakpoint was added

Then refresh the page to fetch the store list. Execution stops at the breakpoint; call o.getUA() in the console and look at the output

Keep following the references to the function that generates this parameter, working backwards step by step until you reach the original JS generation function

Once you find it, copy the whole function and save it as a JS file.

(2) How to execute a JS script from Python?

Method 1:
Work out the generation process yourself. Whether it is MD5 or AES, once you have found the key, the timestamp, and the other parameters, you can generate the value directly in Python. Anyone who has done back-end development will find the principle familiar, because the front end and the back end have to agree on how the value is produced.
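As a sketch of Method 1, suppose the reversed JS turned out to compute an MD5 over the sorted request parameters, a timestamp, and a fixed key. This scheme, the function name make_sign, and every value below are hypothetical; a real site will have its own recipe that must be read from its JS.

```python
import hashlib
import time


def make_sign(params, secret, timestamp=None):
    """Hypothetical scheme: md5 of sorted 'k=v' pairs, then timestamp, then key."""
    ts = str(int(time.time() * 1000)) if timestamp is None else str(timestamp)
    query = "&".join(f"{k}={params[k]}" for k in sorted(params))
    raw = f"{query}&t={ts}&key={secret}"
    return ts, hashlib.md5(raw.encode("utf-8")).hexdigest()


# Fixed timestamp so the result is reproducible while comparing with the browser.
ts, sign = make_sign({"page": 1, "city": "shanghai"}, "demo-secret",
                     timestamp=1700000000000)
print(ts, sign)
```

Pinning the timestamp to the value seen at the breakpoint makes it possible to compare the Python result with the browser's result character by character.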

Method 2: execjs
The script copied above only defines a function and never calls it, so add some code at the end of the JS file to call it:

function getParam() {
    var a;
    var param = e(2, a);
    return param;
}

Python code
Principle: switch the execjs runtime to PhantomJS, a headless browser, so that PhantomJS executes the JS script. Because PhantomJS is a browser, it creates a window object. (PhantomJS development has been suspended since 2018.)

import os

import execjs

# Select the PhantomJS runtime before asking execjs for one
os.environ["EXECJS_RUNTIME"] = "PhantomJS"
node = execjs.get()
file = 'eleme.js'
with open(file, encoding='utf-8') as f:
    ctx = node.compile(f.read())
js_encode = 'getParam()'
params = ctx.eval(js_encode)
print(params)

Without PhantomJS

import execjs

# Use whichever JS runtime execjs finds by default
node = execjs.get()
file = 'eleme.js'
with open(file, encoding='utf-8') as f:
    ctx = node.compile(f.read())
js_encode = 'getParam()'
params = ctx.eval(js_encode)
print(params)

This may raise an error: execjs._exceptions.ProgramError: TypeError: 'window' is undefined
Reason: the window object is created by the browser and holds browser information, so it does not exist when Python runs the code through a non-browser runtime.
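If the script only needs the name window to exist and does not read real browser properties, one possible workaround is to prepend a small stub to the JS source before compiling it. The sketch below shows only the string handling; the stub and the helper name with_window_stub are assumptions made for illustration, and a script that inspects navigator, document, or similar objects will still fail.

```python
# Minimal stub: point `window` at the runtime's global object.
WINDOW_STUB = (
    "var window = (typeof globalThis !== 'undefined') ? globalThis : this;\n"
)


def with_window_stub(js_source):
    """Return the JS source with a minimal `window` definition prepended."""
    return WINDOW_STUB + js_source


patched = with_window_stub("function getParam() { return typeof window; }")
print(patched)

# With PyExecJS installed, the patched source would be compiled as usual:
#   ctx = execjs.compile(patched)
#   ctx.call("getParam")
```

When the script depends on genuine browser state, running it in a real browser remains the more dependable route.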
Method 3: selenium

The idea is similar to Method 2 but more brute-force: instead of running the script in a standalone JS runtime, it drives a real browser to execute it.

Before execution, modify the JS script again: call the e function at the end of the JS file by adding code such as:

var a;
var param = e(2,a);
return param;

Note: do not wrap the call in a function. I previously put this code inside a function and forced it to run; the encrypted string came back in the browser, but Python got None.

Drive the browser with selenium and the Chrome WebDriver; the code is as follows:

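from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service('chromedriver.exe'))
with open('eleme.js', 'r', encoding='utf-8') as f:
    js = f.read()
# execute_script wraps the code in a function body, so the top-level return works
print(browser.execute_script(js))
browser.quit()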

This returns the encrypted string.

Finally, if you need a large number of x-uab values, the third option is more efficient: you open the browser once (every call reuses the same webdriver object) and then execute the JS quickly, getting an encrypted string back each time.
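Reusing one browser session for many values can be sketched as follows. The helper collect_values is a hypothetical name, and FakeDriver is only a stand-in so that the example runs without a browser; in real use the driver would be the selenium webdriver object, closed with quit() once at the end.

```python
def collect_values(driver, js, count):
    """Run the same script `count` times on one already-open driver."""
    return [driver.execute_script(js) for _ in range(count)]


class FakeDriver:
    """Stand-in for a selenium WebDriver, used here only to keep the sketch runnable."""

    def __init__(self):
        self.calls = 0

    def execute_script(self, script):
        self.calls += 1
        return f"token-{self.calls}"


driver = FakeDriver()
values = collect_values(driver, "var a; return e(2, a);", 3)
print(values)
```

The cost of starting the browser is paid once, and each further value costs only one execute_script call.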

Origin blog.csdn.net/G_GUi/article/details/127344019