A practical analysis of CSS anti-crawling on a certain novel website

Since I have only just started writing JS reverse-engineering articles, some inaccuracies are inevitable; please bear with me.

This time the target is the novel interface of hongshu.com. After opening the official website and picking a novel at random, we open the browser's network panel and analyze the requests.
As shown in the figure, one interface, bookajax.do, looks suspicious, and it is requested three times. Our preliminary guess is that these requests depend on one another. Let's open them first and check the data. In the second request we see the following: one field, content, holds a very long encrypted string, which is probably the chapter text. So this is most likely front-end encryption, with the page's JavaScript doing the decryption. Let's follow the call stack and look at the process.

First, near the request, we find where the other interfaces call this one: once the getchptkey method returns successfully, its result is passed on to the getchpcontent call. This confirms our earlier guess.
Following it further, we find three consecutive decryption calls. The argument being decrypted is exactly data.content, and the key passed along with it comes from the first interface. We set a breakpoint here and print the decrypted result directly in the console, which gives a large block of HTML tags mixed with Chinese text. From here, locating the decryption method is straightforward.
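The three-pass flow described above can be sketched as follows. The article does not reproduce the site's decryption routine, so the sketch takes it as a parameter; `decryptRounds`, `DecryptFn` and the toy shift cipher in the usage example are hypothetical stand-ins for illustration, not hongshu.com's actual algorithm. The point is only the data flow: each pass feeds its output into the next, always with the key from the first interface.

```typescript
// Hypothetical signature of one decryption pass. The site's real routine is
// not reproduced in the article, so the caller injects it.
type DecryptFn = (cipherText: string, key: string) => string;

// Apply the same routine `rounds` times, feeding each output into the next
// pass, mirroring the three consecutive calls seen in the debugger.
function decryptRounds(
  cipherText: string,
  key: string,
  decryptOnce: DecryptFn,
  rounds = 3
): string {
  let text = cipherText;
  for (let i = 0; i < rounds; i++) {
    text = decryptOnce(text, key);
  }
  return text;
}

// Toy stand-in cipher for demonstration only (NOT the site's algorithm):
// shift every UTF-16 code unit by `delta`, modulo 65536.
function toyShift(text: string, delta: number): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode((text.charCodeAt(i) + delta + 65536) % 65536);
  }
  return out;
}
const toyEncrypt = (text: string, key: string) => toyShift(text, key.length);
const toyDecrypt: DecryptFn = (text, key) => toyShift(text, -key.length);

// Usage: encrypt three times with the toy cipher, then undo all three passes.
const chapterKey = "k3y";
const plainText = "<p>第一章</p>";
const encryptedText = toyEncrypt(
  toyEncrypt(toyEncrypt(plainText, chapterKey), chapterKey),
  chapterKey
);
console.log(decryptRounds(encryptedText, chapterKey, toyDecrypt) === plainText); // true
```

Taking the routine as a parameter keeps the sketch honest about what is unknown: once the real method has been extracted from the page, it can be dropped in without changing the surrounding flow.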
Just extract that method, and the first layer of encryption is cracked. Next comes the second layer, the HTML tags, so let's look at the following steps in the process.
Further down, we find that data.other is decrypted as well. I won't go into detail here, since it is decrypted the same way as data.content. We set a breakpoint after data.other is decrypted and print it in the console: it is a large block of JS code. Continuing the analysis, the page then runs a JS method asynchronously, and that JS is exactly this block of code. Once it finishes executing, the page displays normally, so we suspect this code is what decodes the content obtained in the previous step. After copying the code and running it with Node.js, we found that the document environment was missing; once we filled it in, everything ran normally. Now let's analyze the code.
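Filling in the missing `document` can be sketched like this: instead of running the extracted code as-is in Node.js, where `document` is undefined, we hand it a small stub. The stub is hypothetical: it only implements `createElement`, `setAttribute`, and `appendChild` on `head` and `body`, and records what the script appends. The article does not say which DOM APIs the real script touches, so the stub may need to grow; the sample script below is likewise made up for illustration.

```typescript
// A minimal, hypothetical DOM element: just enough for a script that builds
// nodes, sets their markup and attributes, and appends them.
interface FakeElement {
  tagName: string;
  innerHTML: string;
  textContent: string;
  children: FakeElement[];
  attributes: { [name: string]: string };
  appendChild(child: FakeElement): FakeElement;
  setAttribute(name: string, value: string): void;
}

function createFakeElement(tagName: string): FakeElement {
  const el: FakeElement = {
    tagName: tagName.toUpperCase(),
    innerHTML: "",
    textContent: "",
    children: [],
    attributes: {},
    appendChild(child) {
      el.children.push(child); // record appended nodes for inspection
      return child;
    },
    setAttribute(name, value) {
      el.attributes[name] = value;
    },
  };
  return el;
}

interface FakeDocument {
  head: FakeElement;
  body: FakeElement;
  createElement(tagName: string): FakeElement;
}

function createFakeDocument(): FakeDocument {
  return {
    head: createFakeElement("head"),
    body: createFakeElement("body"),
    createElement: createFakeElement,
  };
}

// Run the decrypted data.other script with the stub bound to the name
// `document`. new Function does NOT sandbox the code: it still sees the real
// globals, so only run code that has already been inspected.
function runWithFakeDocument(scriptSource: string): FakeDocument {
  const doc = createFakeDocument();
  const run = new Function("document", scriptSource);
  run(doc);
  return doc;
}

// Usage with a made-up script in the spirit of CSS anti-crawling.
const sampleScript =
  'var s = document.createElement("style");' +
  's.innerHTML = ".kw0::before{content:\\"的\\"}";' +
  "document.head.appendChild(s);";
const docAfter = runWithFakeDocument(sampleScript);
console.log(docAfter.head.children[0].innerHTML); // prints .kw0::before{content:"的"}
```

Passing the stub as a parameter of `new Function` keeps the example free of Node-specific imports. Node's built-in `vm` module (for example `vm.runInNewContext`) is a common alternative when the script expects several browser globals, since one context object can supply them all.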
The end of the code is the part that restores the text on the web page: it substitutes the entries of the words variable into the corresponding tags one by one. Let's print words.
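Once `words` has been extracted, the restoration step can be approximated offline. A minimal sketch, assuming each hidden character is marked by an empty tag whose class name ends in an index into `words`, such as `<span class="kw12"></span>`. That tag format, the `PLACEHOLDER` pattern, and `restoreText` are hypothetical; the real markup was only shown in the article's screenshots.

```typescript
// Hypothetical placeholder format: <span class="kw{index}"></span>.
// Adjust the pattern to whatever tags the real chapter HTML contains.
const PLACEHOLDER = /<span class="kw(\d+)"><\/span>/g;

// Replace every placeholder tag with words[index]. Out-of-range indices are
// left untouched so that gaps stay visible instead of silently disappearing.
function restoreText(html: string, words: string[]): string {
  return html.replace(PLACEHOLDER, (tag: string, index: string) => {
    const i = parseInt(index, 10);
    return i < words.length ? words[i] : tag;
  });
}

// Usage with made-up chapter HTML.
const chapterHtml =
  '<p>他<span class="kw0"></span>手里拿着<span class="kw1"></span>本书</p>';
console.log(restoreText(chapterHtml, ["的", "一"]));
// prints <p>他的手里拿着一本书</p>
```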

With that, the mechanism is essentially clear. Let's put the code together.
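The overall flow is: take the key from the first interface, decrypt data.content and data.other with it, run the decrypted script to recover `words`, and substitute the words into the chapter HTML. The sketch below wires those steps together. The step functions are passed in because the article does not reproduce their bodies, and the response shapes are assumptions: only `content` and `other` are named in the article, so `data.key` is a placeholder for wherever the first interface actually puts the key.

```typescript
// Assumed response shapes; `data.key` is a hypothetical field name.
interface KeyResponse {
  data: { key: string };
}
interface ContentResponse {
  data: { content: string; other: string };
}

// The steps described in the article, supplied by the caller.
interface Steps {
  decrypt(cipherText: string, key: string): string; // the three-pass decryption
  extractWords(script: string): string[]; // run data.other, read `words`
  restore(html: string, words: string[]): string; // fill the placeholder tags
}

function rebuildChapter(
  keyRes: KeyResponse,
  contentRes: ContentResponse,
  steps: Steps
): string {
  const key = keyRes.data.key;
  const html = steps.decrypt(contentRes.data.content, key); // layer 1: text + tags
  const script = steps.decrypt(contentRes.data.other, key); // layer 2: the JS
  const words = steps.extractWords(script);
  return steps.restore(html, words);
}

// Usage with trivial stand-ins: identity "decryption", a script that only
// defines `words`, and "[n]" as a toy placeholder format.
const demo = rebuildChapter(
  { data: { key: "k" } },
  { data: { content: "<p>[0]好</p>", other: "var words = ['你'];" } },
  {
    decrypt: (text) => text,
    extractWords: (script) => new Function(script + "; return words;")(),
    restore: (html, words) => html.replace(/\[(\d+)\]/g, (_m, i) => words[+i]),
  }
);
console.log(demo); // prints <p>你好</p>
```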

The original text is successfully captured and displayed!


Source: blog.csdn.net/qq_36551453/article/details/132875668