JS reverse case: jy/woff2 of zzjg

1. Case Analysis

  • As shown in the figure, this case studies the data returned by this request, that is, the data on the details page. Case URL (Base64-encoded): aHR0cHM6Ly9zcy5jb2RzLm9yZy5jbi9tb2JpbGUvc2hhcmVEZXRhaWwvYmE4Yzc0YmJiODY4Nzc1NjM4NGExMDkyMzdlN2NjNmYvNGQ=
    insert image description here

  • This case is a JS reverse of the 某验 anti-crawling verification. It was already introduced in an earlier article, so the overlapping parts are not repeated here. In seamless (无感) mode, clicking the button passes verification directly and no further captcha pops up. This article covers a real slider deployment rather than the test case on the official website, so I will fill in some pitfalls that were missed before (thanks to 时光 for the ideas and test suggestions given while I was working through these pitfalls).

  • The approximate request flow is shown in the figures:
    insert image description here
    insert image description here

  • Most people probably stop once they get validate from the demo test on the official website, but the verification in this case is stricter. If you do not generate all 3 w values as in the figure above, and instead leave the w value of the first ajax request empty to skip the seamless step and go straight to the slider logic, you can still get a validate. However, you may then run into a puzzle: why do you have a validate but still cannot get the data?

  • "Cannot get the data" refers to the situation in the figure: the generated validate has a value and a response comes back, but the response contains no data.
    insert image description here

  • After many tests, the pitfalls are roughly these:

    • ① Only the last of the 3 w values is generated, and the flow for the earlier w values is skipped. In fact aes_key plays a key role across the 3 w values: the aes_key must be carried in the w value of the first get request as an activation step.
    • ② The encrypted parameters of the second w value (the w value in the first ajax request, which is the seamless verification step) include one called captcha_token. captcha_token is generated from part of the code in the fullpage.js file, so in theory it is fixed for a given version of that file.
    • ③ The encrypted parameters of the third w value include a random entry that cannot be hard-coded, for example {"rm1y":"1346065000"}. Both its name and its value change dynamically because the gct.js file code changes dynamically, so the entry is different every day.
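
The three pitfalls above can be summarized in a minimal sketch. Everything named here (WSession, build_w, run_flow, the step names, and the 16-character hex key) is a hypothetical stand-in, not the site's real code, and the actual encryption of w is not reproduced. It only illustrates two points: one aes_key is created once and reused by all three w values, and the dynamic gct.js entry is passed in at run time instead of being hard-coded.

```python
import json
import secrets


class WSession:
    """Holds the state shared by the three w values (hypothetical helper)."""

    def __init__(self):
        # Assumption: the key is a random 16-character hex string.
        # What matters is that it is generated once per verification.
        self.aes_key = secrets.token_hex(8)
        self.steps = []

    def build_w(self, step, payload):
        # Placeholder for the real encryption: only record which key and
        # payload each step would use, serialized as a JSON string.
        self.steps.append(step)
        return json.dumps(
            {"step": step, "key": self.aes_key, "payload": payload},
            sort_keys=True,
        )


def run_flow(captcha_token, gct_entry):
    """Build the three w values in order with one shared key."""
    session = WSession()
    # First get request: carries the aes_key (the activation step).
    w1 = session.build_w("first_get", {})
    # First ajax request: the seamless step, which includes captcha_token.
    w2 = session.build_w("first_ajax", {"captcha_token": captcha_token})
    # Second ajax request: gct_entry is e.g. {"rm1y": "1346065000"} and must
    # be recomputed from the current gct.js, never hard-coded.
    w3 = session.build_w("second_ajax", dict(gct_entry))
    return session, [w1, w2, w3]
```

Calling run_flow("token", {"rm1y": "1346065000"}) returns the session and the three placeholder w strings, all built from the same aes_key.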

2. AST deobfuscation

  • Why deobfuscate: the code can be analyzed without deobfuscating it, but that takes longer. There are also ready-made deobfuscation scripts on the Internet, so we use one directly to speed up the analysis. The AST deobfuscation article is recommended reading.
  • The predecessor's AST code reports an error when run because of a small problem. As shown in the second picture below, I added some code on a guess as a workaround.
    insert image description here
    insert image description here
  • Following the predecessor's AST deobfuscation logic, the general process is as shown in the figure: first prepare the JS file to be deobfuscated, then replace and modify the parts in the boxes accordingly, and finally output the restored fullpage.9.0.9.js
    insert image description here
  • Of course, if the restored fullpage.9.0.9.js is swapped into the web page with fiddler, verification cannot pass. The captcha_token in the w value of the first ajax request is computed by converting function code into a string, and the restored JS changes the original code structure, so verification always fails and the slider pops up.
    insert image description here
    insert image description here
  • The solution is to change this encrypted value in fullpage.js to the correct one, as shown in the figure. After that, the replaced file passes verification on the web page.
    insert image description here
  • This completes the AST deobfuscation of the JS. The restoration logic for slide.js/click.js/gct.js is the same.
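
Why the restored file fails verification can be shown with a small sketch. The two JS snippets and the use of MD5 are assumptions for illustration only: the source does not say which computation produces captcha_token, only that its input is function code converted to a string. Any digest of that string changes as soon as the formatting changes.

```python
import hashlib


def token_from_source(function_source: str) -> str:
    """Stand-in for captcha_token: a digest of the function's source text."""
    return hashlib.md5(function_source.encode("utf-8")).hexdigest()


# Hypothetical function as it appears in the original, unformatted file.
original = "function f(a,b){return a+b}"
# The same function after AST restoration and pretty-printing.
restored = "function f(a, b) {\n  return a + b;\n}"

original_token = token_from_source(original)
restored_token = token_from_source(restored)

# Same behaviour, different source text, therefore a different token.
print(original_token == restored_token)  # False
```

This is why the value has to be read from the original file and patched into the restored one.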

3. Locations of the 3 w values

  • First w value: carried in the first get request
    insert image description here

  • First w value: generated in fullpage.9.0.9.js; searching for "\u0077": locates it. The encrypted parameters come from the response of the gettype request.
    insert image description here

  • First w value: in the deobfuscated JS, search for "w" directly to locate it
    insert image description here

  • First w value: the result looks roughly like the screenshot below
    insert image description here

  • Second w value: sent in the first ajax request; the request parameters depend on the response parameters of the first get request
    insert image description here

  • Second w value: generated in fullpage.9.0.9.js, roughly at the position shown in the figure, at the point where 1432 is obtained
    insert image description here
    insert image description here

  • Second w value: search for (), new Date()), or search for 1432 directly, to locate the position of 1432. The encrypted parameters come from the response of the first get request.
    insert image description here

  • Second w value: in the deobfuscated JS, search for "captcha_token" directly to locate it
    insert image description here

  • Second w value: the key code that computes captcha_token. Note that the result here must be checked in the original, unrestored JS. The value produced by the restored JS in the picture is wrong, because captcha_token is computed by converting the unformatted function code into a string and encrypting it.
    insert image description here

  • Second w value: the result looks roughly like the screenshot below. The captcha_token here is tied to the fullpage.9.0.9.js code, so as long as the file is not revised it can generally be hard-coded.
    insert image description here

  • Third w value (slider): sent in the second ajax request; the request parameters depend on the response parameters of the second get request
    insert image description here

  • Third w value (slider): generated in slide.7.8.6.js; searching for "\u0077": locates it. The encrypted parameters come from the response of the second get request.
    insert image description here

  • Third w value (slider): in the deobfuscated JS, search for "w" directly to locate it
    insert image description here

  • Third w value (slider): the key code where the random value is generated is as follows
    insert image description here

  • Third w value (slider): the result looks roughly like the screenshot below. There is a random entry here, rm1y, which may also be xaof. The key name and value change every day, so it cannot simply be hard-coded, otherwise no data is returned. This is mainly tied to the gct.js code.
    insert image description here

  • Third w value (click): sent in the second ajax request; the request parameters depend on the response parameters of the second get request
    insert image description here

  • Third w value (click): generated in click.3.0.4.js; searching for "\u0077": locates it. The encrypted parameters come from the response of the second get request.
    insert image description here

  • Third w value (click): the key code where the trajectory is generated is as follows; s is the value after encrypting the trajectory
    insert image description here
    insert image description here

  • Third w value (click): the key code where the random value is generated is as follows
    insert image description here

  • Third w value (click): the result looks roughly like the screenshot below. There is a random entry here, fp0u, which may also be qs48. The key name and value change every day, so it cannot simply be hard-coded, otherwise no data is returned. This is mainly tied to the gct.js code.
    insert image description here
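
The searches used above for fullpage, slide and click can also be run offline on a downloaded copy of each script. The sketch below assumes the script text is already in a Python string; find_w_positions and the sample text are hypothetical. In the obfuscated file the key is written with the literal escape \u0077, which is simply the letter w.

```python
from typing import List


def find_w_positions(js_text: str, needle: str = r'"\u0077":') -> List[int]:
    """Return every offset at which the needle occurs in the script text."""
    positions = []
    start = js_text.find(needle)
    while start != -1:
        positions.append(start)
        start = js_text.find(needle, start + 1)
    return positions


# The raw string keeps the backslash, matching the obfuscated source text.
sample = r'var o = {"gt": t, "\u0077": e}; var p = {"\u0077": s};'
positions = find_w_positions(sample)

# Without the raw prefix, Python decodes the escape to the letter w.
assert "\u0077" == "w"
```

For a deobfuscated file, pass needle='"w"' instead, since the restored code spells the key out.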

4. Analysis of the dynamic woff2 font

  • The analysis above used the sharing URL from the app side. The corresponding web URL is aHR0cHM6Ly93d3cuY29kcy5vcmcuY24v. The list page has font anti-crawling but the detail page does not, so the font anti-crawling on the list page has little practical effect. Here I simply try out how to handle it. As shown in the figure, Chinese characters are protected by one anti-crawling font, and digits and letters by another.
    insert image description here

  • As shown in the figure, the anti-crawling fonts are in woff2 format. The Chinese-character woff2 is a static file, while the digit/letter woff2 is a dynamic file.
    insert image description here

  • A woff2 file cannot be opened by loading it directly into the online FontEditor tool. Convert woff2 to ttf first, then open the ttf in the online FontEditor tool.
    insert image description here
    insert image description here

  • To parse a font file with the .woff2 suffix in Python, use from fontTools.ttLib.woff2 import decompress to convert the woff2 file into a ttf file. The detailed usage of fontTools, including the code below, is covered in this article

    from fontTools.ttLib import TTFont
    from fontTools.ttLib.woff2 import decompress  # requires the brotli package

    woff2_path = "./woff/704224.woff2"
    ttf_path = "./woff/704224.ttf"
    xml_path = "./woff/704224.xml"

    # Convert the woff2 file into a ttf file
    decompress(woff2_path, ttf_path)

    # Load the ttf and dump it as XML to inspect its tables (cmap, glyphs)
    font = TTFont(ttf_path)
    font.saveXML(xml_path)
    
  • Chinese-character font ttf analysis: for the font anti-crawling in this case, our approach is as follows: take the garbled character, look up its glyph in the corresponding ttf file, and render that glyph as a picture
    insert image description here

  • General recognition approach for the Chinese-character ttf: run OCR on the picture to recognize the text. OCR may misrecognize characters, so build a mapping to proofread the recognition results, collect the fonts, and so on.
    insert image description here

  • In this case, observation reveals a special shortcut. As shown in the figure, the garbled character on the page is 㑁; ord('㑁') is 13377, and 13377 is exactly the cmap code found in the ttf. In other words, cmap_code = ord('㑁'), and every garbled Chinese character seen on the web page is produced by chr(cmap_code).
    insert image description here

  • Looking at the ttf file in this case, you will find that it stores not only the garbled characters but also the real ones. The garbled characters are in the first half and the real characters in the second half, so even if the ttf file were dynamic, this method could still match each garbled character to its real value.
    insert image description here

  • After that, replacing the garbled characters in the page source response.text gives the following recognition result
    insert image description here
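
The ord/chr relationship described above, and the final replacement in response.text, can be sketched as follows. The mapping from cmap code to real character is a hypothetical input here (it would be built from the ttf file, for example via the OCR route); restore_text, the real character 中 and the sample HTML are invented for the demonstration.

```python
def restore_text(page_source: str, code_to_real: dict) -> str:
    """Replace each garbled character with its real character.

    code_to_real maps a cmap code (an int such as 13377) to the real
    character; str.translate accepts exactly this kind of table.
    """
    return page_source.translate(code_to_real)


# The garbled character seen on the page and its cmap code in the ttf.
garbled = "㑁"
cmap_code = ord(garbled)          # 13377
assert chr(cmap_code) == garbled  # the page shows chr(cmap_code)

# Hypothetical mapping entry; the real character is made up for the demo.
code_to_real = {cmap_code: "中"}
html = "<td>㑁国</td>"
print(restore_text(html, code_to_real))  # <td>中国</td>
```

Keying the table by integer code means the cmap codes read from the font can be used directly, without converting them to characters first.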

Origin blog.csdn.net/weixin_43411585/article/details/124124736