Python JS reverse engineering: an entry-level Baidu Translate case

You will probably need JS reverse engineering sooner or later. When I have time, I will post walkthroughs of how to reverse the JS on some sites that encrypt their requests. Let's make progress together!

Some readers may not know what JS reverse engineering is, so here is a brief explanation. The first time you hear the term it can sound intimidating, and you may wonder whether it is difficult; my answer is that it is not, once you have the technique. When analysing a website for a crawler, some data cannot be obtained directly from the page response. To get it, you have to construct the request parameters yourself and send a request to the site's server, imitating what the site does. Constructing those parameters is the JS reverse engineering step: many of them are generated behind the scenes by the site's JavaScript, and our job is to work out how they are generated.
Whether or not that makes sense yet, follow the analysis below. Learning by doing is the most efficient way.

1. Analysis
1. Open the Baidu Translate website and go through the usual routine: press F12, select "XHR", type any text into the translation box, and look at the request that is captured.
The form data of that request (highlighted with a red box in the original screenshot) is the goal of this analysis: construct this parameter list, then POST it to the corresponding URL.
At first glance both sign and token look like values that need to be worked out, but in fact only sign does.
Type a different text into the translation box and the token value stays the same, so token is a constant. The later analysis of the JS script confirms this.
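One way to double-check which fields really change between requests is to compare two captured form bodies. The sketch below is only an illustration: the two bodies are invented placeholders rather than real captures, and changed_fields is a hypothetical helper name.

```python
from urllib.parse import parse_qsl


def changed_fields(body_a: str, body_b: str) -> set:
    """Return the names of form fields whose values differ between two bodies."""
    a = dict(parse_qsl(body_a, keep_blank_values=True))
    b = dict(parse_qsl(body_b, keep_blank_values=True))
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


# Placeholder bodies shaped like the captured form data (all values are invented).
first = "from=zh&to=en&query=hello&sign=111.222&token=abc"
second = "from=zh&to=en&query=world&sign=333.444&token=abc"

print(sorted(changed_fields(first, second)))
```

Fields that never show up in the result across several captures, such as token here, are good candidates for constants.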

2. Search the scripts for a fragment of the request URL that carries the "sign" parameter in order to find the code that generates it. Why search for a URL fragment? In scripts like this the URL usually appears just before the function and variable names that build the request, and sign is one of those variables, so searching for the URL fragment is an easy way to locate it.
The search returns only one candidate. Click into that script and press "{}" to format the code, which is hard to read otherwise.
You can then see where the parameters we need are defined. In my case a breakpoint was already set on that line, as the panel on the right shows, but to be on the safe side also put a breakpoint on the line highlighted in yellow.
Next, refresh the page. Execution stops at the breakpoint and some data is loaded. Hover the mouse over the y next to sign and a pop-up appears.
Click the "e" JS link in the pop-up to jump to another JS script.
This function e is the JS code that generates the sign parameter. Copy the code (marked with a red box in the original screenshot) into a JS file, named code.js here. The next job is to write the Python code and debug it.
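The DevTools search used in this step can also be reproduced offline on a saved copy of a script. This is a rough sketch under stated assumptions: the sample string is invented in the style of minified code and is not Baidu's actual script, find_key_sites is a hypothetical helper name, and the regex only catches the simple object-literal form where the key is followed by a colon.

```python
import re


def find_key_sites(js_text: str, key: str, context: int = 30) -> list:
    """Return short snippets around each place where `key:` appears as an object key."""
    pattern = re.compile(r"\b" + re.escape(key) + r"\s*:")
    snippets = []
    for match in pattern.finditer(js_text):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(js_text))
        snippets.append(js_text[start:end])
    return snippets


# Invented one-line sample, not taken from the real site.
sample = 'var p={from:a,to:b,query:t,sign:y(t),token:window.common.token};'
for snippet in find_key_sites(sample, "sign", context=12):
    print(snippet)
```

Each snippet shows what is assigned to the key, which is the name to follow in the debugger.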

2. Code

import requests
import execjs


class BaiduTranslater:
    def __init__(self, sources):
        self.sources = sources  # text to translate
        self.url = 'https://fanyi.baidu.com/v2transapi?from=zh&to=en'  # URL to POST to
        self.headers = {
            # request headers
            'origin': 'https://fanyi.baidu.com',
            'referer': 'https://fanyi.baidu.com/?aldtype=16047',
            'user-agent': '',  # fill in your own browser's User-Agent
            'cookie': '',  # fill in the cookie from your own browser session
        }

    def data_creater(self):  # build the form-data parameter list
        with open("code.js", 'r', encoding='utf-8') as f:  # load the JS copied from the site
            content = execjs.compile(f.read())  # compile the script
        sign = content.call("e", self.sources)  # compute the sign value
        self.data = {
            'from': 'zh',
            'to': 'en',
            'query': self.sources,
            'transtype': 'translang',
            'simple_means_flag': 3,
            'sign': sign,
            'token': '1c65e5489209deafd9e0302de91a0010',  # constant
            'domain': 'common'
        }

    def crawler(self):
        self.data_creater()  # build the data first
        res = requests.post(self.url, data=self.data, headers=self.headers)
        res.encoding = 'utf-8'
        # the translated text sits under trans_result -> data -> dst
        print("Translation result:", res.json()['trans_result']['data'][0]['dst'])


if __name__ == '__main__':
    while True:
        str_input = input("Enter the text to translate (q to quit): ")
        if str_input == 'q':
            break
        baidu = BaiduTranslater(str_input)  # create a translator for this input
        baidu.crawler()  # run the translation

Note that execjs is not part of the standard library. Install it by typing "pip install PyExecJS" in cmd; it also needs a JavaScript runtime, such as Node.js, available on the machine.
At this point code.js contains only the function e copied from the site (shown in a screenshot in the original post).
The JSON that comes back holds more than the translated text, and you can extract whatever fields you need from it; the code above simply pulls out the translation result.
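The chained indexing in crawler() raises an exception as soon as the response does not have the expected shape, for example when the request is rejected. A more defensive extraction could look like the sketch below. The payloads are hand-written to mimic the fields the code above reads, the rejected-request shape is an assumption, and extract_translations is a hypothetical helper name.

```python
def extract_translations(payload: dict) -> list:
    """Collect every 'dst' string under trans_result -> data; return [] otherwise."""
    trans_result = payload.get("trans_result")
    if not isinstance(trans_result, dict):
        return []
    items = trans_result.get("data")
    if not isinstance(items, list):
        return []
    return [item["dst"] for item in items if isinstance(item, dict) and "dst" in item]


# Hand-written payloads, not real responses.
ok = {"trans_result": {"data": [{"dst": "Hello"}]}}
rejected = {"errno": 1}  # assumed shape of a failed request

print(extract_translations(ok))
print(extract_translations(rejected))
```

An empty list is then a signal to print the raw response and check sign, token, and the cookie.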
code.js is a bit long, so part of it is collapsed in the screenshot. When you copy it, watch the structure of the JS code and make sure the braces match.
But if you run the program as it stands, an error is reported!
Don't panic. The error says that i in code.js is not defined: its value cannot be found when the script is executed. Go back to the website and continue the analysis.

3. Debugging
Set a breakpoint inside function e, remove the other breakpoints in the Breakpoints panel, and refresh. Then click the step button (↓) repeatedly to execute the code step by step and see each result.
You can clearly see that i holds a constant value. How can we be sure it is a constant? Hover over window[l], or look at the statement marked with the red box in the original screenshot: i is a value that has already been computed.
Hover the mouse over the variable i, copy the content of the pop-up, and add the constant i to code.js.
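Adding the constant can be done by hand in an editor, or scripted as in this minimal sketch. It assumes the copied value is a string (drop the quoting if it turns out to be a number); the function body and the value shown are placeholders, and with_constant is a hypothetical helper name.

```python
import json


def with_constant(js_source: str, name: str, value: str) -> str:
    """Return js_source with a `var <name> = "<value>";` line added at the top."""
    declaration = "var {} = {};".format(name, json.dumps(value))
    return declaration + "\n" + js_source


# Placeholder function body and value; use the real ones copied from DevTools.
original = "function e(r) { return r + i; }"
patched = with_constant(original, "i", "PASTE_VALUE_FROM_DEVTOOLS")
print(patched)
```

json.dumps is used so that quotes or backslashes in the copied value are escaped properly inside the JS string literal.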
Running the program again reports a new error, so the analysis continues. After all, we are reusing the site's JS code rather than working out how it produces sign and reimplementing that in Python, so we depend on whatever the site's script needs. If you have a JS background you can rewrite the logic in Python yourself, but that takes more time.
Keep going step by step with ↓.
This reveals the n that is missing, and n is a function. Copy its code (marked with a red box in the original screenshot) into code.js and run the program again.
This time it works!

4. Summary
As this case shows, the real difficulty in JS reverse engineering is debugging on the website: tracking down the code and parameters you need by setting breakpoints, stepping through execution, and inspecting variables. It takes patience, time, and frankly perseverance. If you don't know where to start when you first begin learning, look for an existing video tutorial rather than giving up because you don't know what to do next. The most important thing is the learning method, which is what turns knowledge into skill. Skill is not built in a day; it only accumulates over time.
Reference for this article: [Python crawler] Baidu translation JS reverse

Origin blog.csdn.net/weixin_43594279/article/details/107216474