Reverse-engineering the JS that generates the bid variable: fierce as a tiger in execution, only to find the "encryption" was a smokescreen

Sharing a recent JS reverse-engineering experience.

I recently used Python to crawl a particular link on a certain website. Sending the request with requests.get() returned a status_code other than 200, so the request failed. Inspecting the link in the browser's developer tools showed that the cookie has to include the qgqp_b_id parameter, along with other randomly generated parameters, before the server returns data, as shown in the figure below.

This parameter is a 32-character string. Analyzing the page source in the developer tools showed that qgqp_b_id is not sent from the server to the client but computed by a piece of front-end JS. After a long stretch of setting breakpoints, single-stepping, and repeated debugging (10,000 words omitted here), I found that its value comes from the bid variable. So where is bid calculated?

At first I assumed the 32-character string was an MD5 hash. But after digging deeper and jumping back and forth between jQuery.js, usercollect.min.js, require.min.js, main.js, and other scripts, I finally found a hash function called x64hash128 in usercollect.min.js:

Searching for x64hash128 in this file shows that it is applied to the variable r:

Tracing r back shows that r = e, and e is built by a long series of JS functions that collect parameters of the current runtime environment, such as the user-agent, system language, display color depth, screen pixel ratio, and screen resolution. These values are concatenated with the ~~~ separator.
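To make the concatenation step concrete, here is a minimal Python sketch. The sample values and the name join_components are made up for illustration. The real script collects many more fields, and their exact order is not reproduced here.

```python
SEPARATOR = "~~~"  # the separator seen in the JS


def join_components(components):
    """Join the collected environment values into one long string."""
    return SEPARATOR.join(str(value) for value in components)


# Hypothetical sample values: user-agent, language, color depth,
# pixel ratio, screen resolution.
sample = ["Mozilla/5.0 (example)", "en-US", 24, 2, "1920x1080"]
raw = join_components(sample)
print(raw)
```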

The resulting string is very long, and the output of two functions, canvasKey(e) and webglKey(e), accounts for most of it. Looking into canvasKey(e), I found a smiley face symbol inside. Hey, interesting:

canvasKey(e) produces a base64-encoded string. I decoded it and got a PNG image.
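Decoding such a string takes only a few lines of Python. This is a sketch under assumptions: I assume the string may carry a "data:image/png;base64," prefix (the usual form of a canvas data URL), and the sample below is a stand-in built from the PNG signature, not the site's real output.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def decode_data_url(data_url: str) -> bytes:
    """Decode a base64 string, dropping an optional data-URL prefix."""
    # Everything after the last comma is the payload; a bare base64
    # string contains no comma, so it passes through unchanged.
    _, _, payload = data_url.rpartition(",")
    return base64.b64decode(payload)


def is_png(blob: bytes) -> bool:
    """Check the magic bytes to confirm the decoded data is a PNG."""
    return blob.startswith(PNG_SIGNATURE)


# Stand-in input; a real string would come from the debugger.
fake = "data:image/png;base64," + base64.b64encode(
    PNG_SIGNATURE + b"placeholder"
).decode("ascii")

blob = decode_data_url(fake)
print(is_png(blob))
# To view a real image: open("canvas.png", "wb").write(blob)
```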

The other function, webglKey(e), also renders an image and converts it to base64. The picture looks like this:

Haha, so to pile on the layers, the code draws a picture, turns it into base64 text, and then hashes that again. As a JS novice, I stared at this for a long time.

It doesn't end there. As noted above, r = e. The last element of e is a list of font names obtained by the fontskey(e) function. Inside the function, u holds a long list of candidate font names. The function checks which fonts are installed on the current system, keeps the names from u that actually exist, and joins them into a string of font names.
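The filtering idea can be sketched in Python as below. This only illustrates the "keep the candidates that are installed" step. How the browser script actually detects an installed font is not reproduced, the font names are examples, and the case-insensitive comparison is my own assumption.

```python
def detect_fonts(candidates, installed):
    """Return the candidate font names that are present on the system,
    preserving the order of the candidate list."""
    installed_lower = {name.lower() for name in installed}
    return [name for name in candidates if name.lower() in installed_lower]


# Hypothetical data: u would be the long candidate list from the JS.
u = ["Arial", "Comic Sans MS", "Wingdings"]
system_fonts = ["arial", "Wingdings", "Helvetica"]
print(detect_fonts(u, system_fonts))
```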

In short, bid is built from the system environment information, plus two images rendered on the fly and converted to base64, plus a list of font names, all fed through x64hash128().

x64hash128() in turn calls x64Add, x64Multiply, x64Xor, x64Rotl, and other helper functions for the combined calculation... scared yet?
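For the curious, here is roughly what a port could look like. In the open-source fingerprint2 library, x64hash128 is the 128-bit x64 variant of MurmurHash3, so this sketch implements that algorithm. I have not verified that usercollect.min.js uses the stock version, and the seed and the way the JS turns characters into bytes are assumptions here, so don't expect it to reproduce the site's bid as-is. Python's big integers make the x64Add / x64Multiply / x64Rotl helpers unnecessary: masking to 64 bits does the same job.

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def fmix64(k: int) -> int:
    """Final avalanche mix of MurmurHash3."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_x64_128(data: bytes, seed: int = 0) -> str:
    """MurmurHash3 x64 128-bit, returned as 32 hex characters."""
    h1 = h2 = seed & 0xFFFFFFFF
    nblocks = len(data) // 16

    # Body: process the input 16 bytes at a time.
    for i in range(nblocks):
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        k1 = (rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64
        h1 = (rotl64(h1 ^ k1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        k2 = (rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
        h2 = (rotl64(h2 ^ k2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: up to 15 leftover bytes; empty slices decode to 0 and
    # mixing 0 leaves the state unchanged.
    tail = data[nblocks * 16:]
    k1 = int.from_bytes(tail[:8], "little")
    k2 = int.from_bytes(tail[8:], "little")
    h2 ^= (rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
    h1 ^= (rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64

    # Finalization.
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = fmix64(h1)
    h2 = fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return f"{h1:016x}{h2:016x}"


digest = murmur3_x64_128("some~~~joined~~~string".encode("utf-8"))
print(len(digest), digest)
```

Whatever the input, the result is always 32 hex characters, which matches the shape of the qgqp_b_id value.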

Just looking at it took my breath away. Faced with porting all of this code to Python, I believe 99% of people would give up. I certainly didn't port it.

Later in debugging, I tried setting the qgqp_b_id parameter to random characters, put it in the headers and cookies, and checked whether requests.get() would succeed. Hey! The returned status_code was 200! It connected!

I then tried arbitrary values, for example: qgqp_b_id = '9527-3547-709394' (Cantonese speakers will get it). That connected too! After that, fetching the data was straightforward.
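The experiment boils down to something like the sketch below. The URL is a placeholder (I am not naming the site), and I use the standard library's urllib here so the snippet runs without extra packages. With requests, the same cookie can be passed as requests.get(url, cookies={"qgqp_b_id": bid}). The snippet only builds the request; the line that would actually send it is left commented out.

```python
import secrets
import urllib.request


def random_bid() -> str:
    """Make up a 32-character hex string shaped like a real bid."""
    return secrets.token_hex(16)


def build_request(url, bid=None):
    """Build a GET request that carries qgqp_b_id in the Cookie header."""
    bid = bid or random_bid()
    headers = {
        "User-Agent": "Mozilla/5.0",  # placeholder browser UA
        "Cookie": f"qgqp_b_id={bid}",
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the link being crawled.
    req = build_request("https://example.com/api/data", "9527-3547-709394")
    print(req.get_header("Cookie"))
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```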

What does this mean?

1. After a big round of JS reverse engineering, fierce as a tiger, it turned out that all this "encryption" code was a smokescreen meant to scare off beginners. The carefully computed qgqp_b_id variable is sent to the server, but the server does not validate it at all. It only requires that each subsequent request carry the variable, so a made-up string like "9527" gets through just as well. What a cheat!

2. Dumb luck. This time it really was pure luck: the whole series of "encryption" code turned out to be a paper tiger. The next time I reverse-engineer the JS of another website, I may not be so lucky. You have to analyze, debug, and judge carefully, otherwise you will be tormented by 400, 403, 500, and other error codes at every turn.

3. Why some websites take so long to load. Of course, websites carry much richer content than before, and high-resolution pictures and videos take time to load. On top of that, a lot of JS code has to run. Even on a high-end computer, loading a page with a large amount of JS takes about as long as a low-end computer took to load a page with a little JS more than ten years ago. To improve security, websites make their JS more and more complex, and a considerable share of load time goes to running it.

4. The round of information collection described in this article is known as a fingerprint2 user fingerprint. See the following article to learn more:

The user fingerprint generated by fingerprint2 repeatedly steps on the pit - Nuggets

Origin blog.csdn.net/Scott0902/article/details/129058285