1. Introduction
Website: http://fanyi.youdao.com/
Goal: Simulate the site's form submission to get real-time translations
Library used: requests
Difficulty: ✩✩✩
2. Tutorial
1. Introduction
Youdao, a well-known translation service provider in China, also runs an online translation website. The goal of this crawler is to simulate the form submission behind Youdao's translation page and get translations in real time.
2. Website Analysis
Website homepage
Try a translation
Capture the network request
Searching the captured requests shows that one of them contains the result we need, so if we can reproduce this request ourselves, we get the desired effect.
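Analyze the form
Analyze form parameters across different requests
Comparing several requests shows which form fields are generated dynamically:
- i : the text to translate
- salt : the millisecond timestamp with one random digit appended
- sign : an MD5 hash
- lts : the millisecond timestamp (one digit shorter than salt)
- bv : an MD5 hash of the browser's navigator.appVersion (it stays the same as long as the browser does)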
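The comparison above can also be done mechanically. The sketch below is a minimal illustration, assuming the form data of two requests has been copied out of the browser's network panel into dictionaries; the helper name changed_fields and all sample values are placeholders made up for this example, not real captures.

```python
def changed_fields(first, second):
    """Return the sorted names of fields whose values differ between two forms."""
    keys = set(first) | set(second)
    return sorted(k for k in keys if first.get(k) != second.get(k))


# Placeholder values only: they show the shape of the data, not real traffic.
first_capture = {
    "i": "爬虫",
    "client": "fanyideskweb",
    "salt": "16000000000001",
    "lts": "1600000000000",
    "sign": "a" * 32,
    "bv": "c" * 32,
}
second_capture = {
    "i": "翻译",
    "client": "fanyideskweb",
    "salt": "16000000050007",
    "lts": "1600000005000",
    "sign": "b" * 32,
    "bv": "c" * 32,
}

print(changed_fields(first_capture, second_capture))
```

With two captures from the same browser, bv does not show up in the result because it depends only on the browser; it still has to be computed when forging the form from Python.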
Find the form data with breakpoint debugging
Use Ctrl + Shift + F to search for the JS file that contains the keyword.
Once the file is found, use Ctrl + F to locate the keyword inside it, then set a breakpoint at the matches.
Go back to the page and trigger the request again. The page pauses, which shows that the breakpoint has taken effect.
Back in the debugger, nothing has changed: no salt value has been generated by the time execution reaches this breakpoint. Put plainly, the breakpoint is in the wrong place, so the next step is to repeat the process at another match.
When the breakpoint is set at this position, every value we need is visible, so the JS code here is what we need to reproduce:
Analyze the JS code
var r = function(e) {
    var t = n.md5(navigator.appVersion) // navigator.appVersion is the User-Agent without the leading "Mozilla/"; take its MD5 hash
      , r = "" + (new Date).getTime() // current timestamp in milliseconds, as a string
      , i = r + parseInt(10 * Math.random(), 10); // append one random digit (0-9) to the timestamp string
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "]BjuETDhU)zqSxf-=B#7m") // MD5 hash of the concatenated string
    }
};
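Before writing the full client, it can be worth checking this reading of the JS against a request captured in the browser. The sketch below is one way to do that, assuming the captured form is available as a dictionary of strings and that n.md5 hashes the UTF-8 bytes of its input. The helper names expected_sign and check_captured_form are made up for this example.

```python
from hashlib import md5

CLIENT = "fanyideskweb"            # prefix taken from the JS above
SECRET = "]BjuETDhU)zqSxf-=B#7m"   # suffix taken from the JS above


def expected_sign(text, salt):
    """Recompute sign the way the JS does: md5(client + text + salt + secret)."""
    # Assumption: the JS md5 helper works on the UTF-8 encoding of the string.
    return md5(f"{CLIENT}{text}{salt}{SECRET}".encode("utf-8")).hexdigest()


def check_captured_form(form):
    """Return a list of inconsistencies in a captured form (empty if none)."""
    problems = []
    if form["sign"] != expected_sign(form["i"], form["salt"]):
        problems.append("sign is not md5(client + i + salt + secret)")
    # salt should be lts followed by exactly one decimal digit
    if form["salt"][:-1] != form["lts"] or not form["salt"][-1:].isdigit():
        problems.append("salt is not lts plus one digit")
    return problems
```

If check_captured_form returns an empty list for a real capture, the prefix, the secret suffix, and the concatenation order have been read correctly. A non-empty list usually means the site's script has changed and the breakpoint step needs repeating.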
Python code implementation
import time
import random
from hashlib import md5
data = {
    "i": "爬虫",
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME"
}
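# bv: MD5 hash of navigator.appVersion (the User-Agent without the leading "Mozilla/")
app_version = (
    "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 "
    "Safari/537.36"
)
data['bv'] = md5(app_version.encode()).hexdigest()
# lts: current time in milliseconds as a string, like "" + (new Date).getTime()
data['lts'] = str(int(time.time() * 1000))
# salt: lts with one random digit appended (string concatenation, not addition)
data['salt'] = data['lts'] + str(random.randint(0, 9))
# sign: MD5 hash of client + text to translate + salt + fixed secret
sign = f"fanyideskweb{data['i']}{data['salt']}]BjuETDhU)zqSxf-=B#7m"
data['sign'] = md5(sign.encode()).hexdigest()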
Once the form has been forged, we can send the request. That part is relatively simple, so it is not covered in detail here. The specific code can be viewed below.
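For orientation, the request step might look roughly like the sketch below. Several things in it are assumptions that do not come from the text above and should be checked against your own network capture: the endpoint path in TRANSLATE_URL, the need for a Referer header and for cookies from the homepage, and the shape of the JSON response. The function names are made up for this example.

```python
HOME_URL = "http://fanyi.youdao.com/"
# Assumption: the POST target seen in the network panel; verify before use.
TRANSLATE_URL = "http://fanyi.youdao.com/translate_o"

# The User-Agent should be "Mozilla/" plus the appVersion string hashed into bv.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
    "Referer": HOME_URL,  # assumption: the server checks where the request came from
}


def extract_translation(payload):
    """Join the translated segments of a decoded JSON response.

    Assumes a shape like {"translateResult": [[{"src": ..., "tgt": ...}]]}.
    """
    rows = payload.get("translateResult") or []
    return "".join(segment["tgt"] for row in rows for segment in row)


def translate(form, url=TRANSLATE_URL):
    """POST a forged form and return the translated text."""
    # Third-party dependency, imported here so extract_translation stays
    # usable without it.
    import requests

    with requests.Session() as session:
        session.headers.update(HEADERS)
        # Assumption: visiting the homepage first collects the cookies
        # the translation endpoint expects.
        session.get(HOME_URL, timeout=10)
        response = session.post(url, data=form, timeout=10)
        response.raise_for_status()
        return extract_translation(response.json())
```

Calling translate(data) with the form built above would then return the translated string, provided these assumptions still match the live site.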