1. Case Analysis
- As shown in the figure, this case studies the data returned by this request, that is, the data on the details page. Case URL: aHR0cHM6Ly9zcy5jb2RzLm9yZy5jbi9tb2JpbGUvc2hhcmVEZXRhaWwvYmE4Yzc0YmJiODY4Nzc1NjM4NGExMDkyMzdlN2NjNmYvNGQ=
- This case is about JS reverse engineering of the anti-scraping protection from a certain captcha vendor (某验). It was introduced in an earlier article, so the repeated parts are not covered again. "Insensitive" (无感) verification means that clicking the button passes verification directly, and no further captcha pops up. This article covers a real-world slider deployment rather than the test case on the official website, so it adds some pitfalls that were missed previously (thanks to 时光 for the ideas and testing suggestions offered while I was working through these pitfalls).
- The approximate request flow is shown in the figure:
- Most people probably only obtain a validate by passing the demo test on the official website. Verification in this case is stricter, yet there is a confusing phenomenon: even if you do not generate all 3 w values as in the figure above, but instead leave the w value of the first ajax request blank, skip the insensitive step, and go straight to the slider logic, you can still obtain a validate. But then you may be puzzled: why is there a validate, yet no data?
- "Unable to get data" refers to the situation in the figure: the generated validate has a value and a response is returned, but the response contains no data.
- After many tests, the pitfalls are roughly these:
- ① Only the last of the 3 w values is generated, and the flow for the earlier w values is skipped. In fact, aes_key plays a key role across all 3 w values: the aes_key must be carried in the w value of the first get request as an activation step.
- ② The second w value (the request of the insensitive verification step, that is, the first ajax request) has an encrypted parameter called captcha_token. captcha_token is tied to the fullpage.js code (it is generated from part of the code in that file), so in theory it is fixed for a given version of the file.
- ③ The third w value has a random entry among its encrypted parameters that cannot be hard-coded, for example {"rm1y":"1346065000"}. Both its name and its value change, because the gct.js code changes dynamically, so the entry differs from day to day.
2. AST deobfuscation
- Why deobfuscate: it is possible to analyze the code without deobfuscating it, but that takes longer. Ready-made deobfuscation scripts are available online, so we use one directly to speed up the analysis. An article on AST deobfuscation is recommended reading.
- The predecessor's AST code has a small problem and raises an error when run. As shown in the second picture below, I added some code as a workaround, more or less by guesswork.
- Following the logic of that AST deobfuscation script, the general process is as shown in the figure. First prepare the js file to be deobfuscated, here fullpage.9.0.9.js, then replace and modify the parts in the boxes accordingly, and output the restored code.
- Of course, if the restored fullpage.9.0.9.js is swapped into the web page with Fiddler, verification cannot pass. The captcha_token used when encrypting the w value of the first ajax request is produced by converting function code into a string, and the restored js no longer has the original code structure, so verification always fails and the slider pops up.
- The solution is to change the encrypted value of this part in fullpage.js to the correct one, as shown in the figure, and then replace the file on the web page. Verification then passes.
- This completes the AST deobfuscation of the js. The restoration logic for slide.js, click.js, and gct.js is the same.
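Why the restored file fails can be illustrated with a minimal sketch. It assumes, purely for illustration, that the token is a digest of the function's source text. The real algorithm in fullpage.js is not reproduced here, and `token_from_source` and the MD5 digest are hypothetical stand-ins.

```python
import hashlib


def token_from_source(source: str) -> str:
    """Hypothetical stand-in: derive a token from a function's source text."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


# The same function, once as shipped (minified) and once after AST restoration.
original_src = "function f(a,b){return a+b}"
restored_src = "function f(a, b) {\n  return a + b;\n}"

original_token = token_from_source(original_src)
restored_token = token_from_source(restored_src)

# Same behaviour, different text, so the derived token differs.
print(original_token == restored_token)  # False
```

Any value derived from the source text changes as soon as formatting changes, which is consistent with the restored file needing the original value patched back in.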
3. Locations of the 3 w values
- First w value: carried in the first get request.
- First w value: generated in fullpage.9.0.9.js. Searching for "\u0077": locates it. The encrypted parameters come from the response of the gettype request.
- First w value: in the deobfuscated js, search directly for "w" to locate it.
- First w value: the result looks roughly like the screenshot below.
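The two search strings above point at the same key, since "\u0077" is just an escaped spelling of the letter w. A minimal sketch of searching a js file for either form follows; `find_w_key_offsets` and the sample snippet are hypothetical and not taken from the real files.

```python
import re

# In JavaScript source, "\u0077" is an escaped spelling of the letter w.
ESCAPED_W = r'"\u0077":'   # raw string: the backslash is kept literally
PLAIN_W = '"w":'


def find_w_key_offsets(js_source: str) -> list[int]:
    """Return character offsets where a "w" key appears, escaped or plain."""
    pattern = re.compile(re.escape(ESCAPED_W) + "|" + re.escape(PLAIN_W))
    return [m.start() for m in pattern.finditer(js_source)]


# Hypothetical snippet standing in for a line of obfuscated js.
sample = r'var o = {"gt": g, "\u0077": r + i};'
print(find_w_key_offsets(sample))

# The escape and the plain letter are the same character once decoded.
print("\u0077" == "w")  # True
```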
- Second w value: sent in the first ajax request. The request parameters depend on the response of the first get request.
- Second w value: generated in fullpage.9.0.9.js, roughly at the position shown in the figure, where 1432 is fetched.
- Second w value: search for (), new Date()), or search directly for 1432, to locate the position of 1432. The encrypted parameters come from the response of the first get request.
- Second w value: in the deobfuscated js, search directly for "captcha_token" to locate it.
- Second w value: the key code where captcha_token is encrypted. Note that the encryption result here must be checked in the file that has not been deobfuscated. The result produced by the restored js in the picture is wrong, because captcha_token is computed by converting the unformatted function code into a string and encrypting it.
- Second w value: the result looks roughly like the screenshot below. The captcha_token here is tied to the fullpage.9.0.9.js code. In general it can be hard-coded as long as the file is not revised.
- Third w value (slider): sent in the second ajax request. The request parameters depend on the response of the second get request.
- Third w value (slider): generated in slide.7.8.6.js. Searching for "\u0077": locates it. The encrypted parameters come from the response of the second get request.
- Third w value (slider): in the deobfuscated js, search directly for "w" to locate it.
- Third w value (slider): the key code where the random value is generated is as follows.
- Third w value (slider): the result looks roughly like the screenshot below. There is a random entry rm1y here, which may also appear as xaof. The key name and value change every day, so it cannot simply be hard-coded, otherwise no data is returned. This is mainly tied to the code of gct.js.
- Third w value (click): sent in the second ajax request. The request parameters depend on the response of the second get request.
- Third w value (click): generated in click.3.0.4.js. Searching for "\u0077": locates it. The encrypted parameters come from the response of the second get request.
- Third w value (click): the key code at the point where the trajectory is generated is as follows. s is the value after the trajectory is encrypted.
- Third w value (click): the key code where the random value is generated is as follows.
- Third w value (click): the result looks roughly like the screenshot below. There is a random entry fp0u here, which may also appear as qs48. The key name and value change every day, so it cannot simply be hard-coded, otherwise no data is returned. This is mainly tied to the code of gct.js.
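Because the random entry changes daily, one way to avoid hard-coding it is to pass it in as data. The sketch below is only an illustration: `build_payload` and the field names are hypothetical, the single-key check is an assumption based on the {"rm1y":"1346065000"} example, and deriving the entry from gct.js is outside its scope.

```python
import json


def build_payload(base_fields: dict, dynamic_entry: dict) -> str:
    """Merge the current dynamic entry into the payload before encryption.

    dynamic_entry is assumed to hold exactly one key/value pair,
    e.g. {"rm1y": "1346065000"}.
    """
    if len(dynamic_entry) != 1:
        raise ValueError("expected exactly one dynamic key/value pair")
    payload = dict(base_fields)      # copy so the caller's dict is untouched
    payload.update(dynamic_entry)
    return json.dumps(payload, separators=(",", ":"))


# Hypothetical base fields; only the rm1y example comes from the text above.
fields = {"field_a": "value_a", "field_b": 123}
print(build_payload(fields, {"rm1y": "1346065000"}))
```

Keeping the entry as an argument means a change of key name (rm1y, xaof, fp0u, qs48) needs no code change in the payload builder.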
4. Analysis of the dynamic woff2 font
- The previous analysis used the sharing URL from the app side. The corresponding web URL is aHR0cHM6Ly93d3cuY29kcy5vcmcuY24v. The list page has font anti-scraping but the detail page does not, so the font anti-scraping on the list page has little effect. Here I just try out how to handle this kind of font anti-scraping. As shown in the figure, Chinese characters use one obfuscated font set, and digits and letters use another.
- As shown in the figure, the anti-scraping fonts are in woff2 format. The figure shows that the Chinese character woff2 is a static file, while the digit-and-letter woff2 is a dynamic file.
- A woff2 file cannot be opened by loading it directly into the online FontEditor tool. Convert the woff2 to ttf first, then open the ttf in the online FontEditor tool.
- To parse a woff2 font file in Python, import decompress with from fontTools.ttLib.woff2 import decompress and use it to convert the woff2 file into a ttf file. Detailed usage of fontTools, including the code used in the picture below, is covered in a separate article.

    # woff2 support in fontTools needs the brotli package installed
    from fontTools.ttLib import TTFont
    from fontTools.ttLib.woff2 import decompress

    woff2_path = "./woff/704224.woff2"
    ttf_path = "./woff/704224.ttf"
    xml_path = "./woff/704224.xml"

    decompress(woff2_path, ttf_path)  # convert the woff2 file to a ttf file
    font = TTFont(ttf_path)           # load the converted ttf
    font.saveXML(xml_path)            # dump the font tables as XML for inspection
- Chinese character ttf analysis: for the font anti-scraping in this case, the approach is to look up each garbled character directly in its corresponding ttf file, find the glyph, and render the glyph as a picture.
- Chinese character ttf analysis, general recognition approach: add OCR to recognize the text in the picture. OCR may misrecognize characters, so you can build a mapping to proofread the recognition results, collect fonts, and so on.
- In this case, observation reveals a special solution. As shown in the figure, 事 actually corresponds to the garbled character 㑁, ord('㑁') is 13377, and 13377 is exactly the cmap_code recognized in the ttf. In other words, cmap_code = ord('㑁'), and every garbled Chinese character seen on the web page is obtained as chr(cmap_code).
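The relationship between the garbled character and its cmap code can be checked directly in Python. In this sketch `glyph_name_for`, the `sample_cmap` excerpt, and the glyph name uni3441 are hypothetical; a real table would come from the converted ttf, for example through fontTools' TTFont.getBestCmap().

```python
garbled = "㑁"              # the garbled character shown on the page
cmap_code = ord(garbled)    # its code point, which is also the cmap code

print(cmap_code)                   # 13377
print(chr(cmap_code) == garbled)   # True


def glyph_name_for(char: str, cmap: dict) -> str:
    """Look up the glyph name for a page character in a cmap table.

    cmap maps code points to glyph names, the shape returned by
    fontTools' TTFont.getBestCmap().
    """
    return cmap[ord(char)]


# Hypothetical cmap excerpt standing in for the real font's table.
sample_cmap = {13377: "uni3441"}
print(glyph_name_for(garbled, sample_cmap))  # uni3441
```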
- Looking at the ttf file in this case, you will find that it stores not only the garbled characters but also the real characters. The real characters are in the second half and the garbled characters in the first half, so even if the ttf file is dynamic, this method can still match each garbled character to its real value.
- After that, replacing the garbled characters in the page source response.text yields the recognition results below.
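The final replacement step can be sketched as a character-level translation. This is a minimal sketch: `restore_text` is a hypothetical helper, the HTML fragment is made up, and the mapping holds only the single 㑁 to 事 pair mentioned above. A full mapping would have to be built from the ttf.

```python
def restore_text(text: str, garbled_to_real: dict[str, str]) -> str:
    """Replace every garbled character in text with its real character."""
    table = str.maketrans(garbled_to_real)  # keys must be single characters
    return text.translate(table)


# Only this pair comes from the article; extend it from the font analysis.
mapping = {"㑁": "事"}

html = "<td>㑁项</td>"   # hypothetical fragment of response.text
print(restore_text(html, mapping))  # <td>事项</td>
```

Characters that are not in the mapping pass through unchanged, so the function can be applied to the whole page source at once.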