Build your own translation dictionary with a Python crawler

      A Python crawler can do many things, depending on how you use it. Today, Xiaoqian will show you how to use a Python crawler to build your own translation dictionary.

      First, open the Youdao translation page and type hello. The translation appears automatically, without reloading the page. Some readers will write a crawler that simply requests the page URL above, and that is a mistake, because the translation is fetched through Ajax. So how do we find the real request? In Google Chrome, press F12, or right-click and select Inspect.

[Image 1]

      Click on the request and you will see its details, as in the picture below:

[Image 2]

      Now comes the key point, and also the hard part! What do salt, sign, ts, and bv mean? Can we leave them out of the request?

      These values are generated in JavaScript and sent along with every request. How do we find out how they are computed? We have to look at the page source: right-click and choose View Page Source:

[Image 3]

      This JavaScript file is the one we are looking for. Click to open it and see what is inside:

[Image 4]

      Wow, what on earth is this? It is enough to scare anyone away, and this is where your patience and confidence are tested. Fortunately there are tools that format minified code: open the webmaster tool, paste in all of the code, and format it. Then copy the formatted code into PyCharm and press Ctrl+F to search for 'sign'. (Remember to follow the steps in order.)

[Image 5]

[Image 6]

[Image 7]

      From the formatted code we can work out the following:

      ts: the timestamp. A JavaScript timestamp is a 13-digit number of milliseconds, such as 1583934479084, while Python's time.time() returns a floating-point number of seconds, such as 1583934480.832445. To get a JavaScript-style timestamp, use: r = str(int(time.time() * 1000))
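      The conversion can be tried on its own. This is a small sketch; the function name js_timestamp is just an illustrative choice, not something from the Youdao code:

```python
import time


def js_timestamp() -> str:
    """Return the current time as a millisecond string, like JavaScript's getTime()."""
    # time.time() is in seconds (float); multiply by 1000 and drop the fraction.
    return str(int(time.time() * 1000))


ts = js_timestamp()
print(ts)  # a 13-digit string of digits
```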

      bv: n.md5(navigator.appVersion) is the MD5 hash of the browser version string

      salt: r + parseInt(10 * Math.random(), 10); this is the timestamp with one random digit from 0 to 9 appended

      The Python equivalent is: f = r + str(random.randint(0, 9))
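      As a quick check, the salt can be generated like this. It is a minimal sketch; make_salt is a hypothetical helper name used only for illustration:

```python
import random
import time


def make_salt(ts: str) -> str:
    """Append one random digit (0-9) to the millisecond timestamp string."""
    # random.randint(0, 9) includes both 0 and 9.
    return ts + str(random.randint(0, 9))


ts = str(int(time.time() * 1000))
salt = make_salt(ts)
print(ts, salt)  # salt is ts plus one extra digit
```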

      sign: n.md5("fanyideskweb" + e + i + "Nw(nmmbP%A-r6U3EUn]Aj")

      Here e is the text to be translated. If we want to look up Great Wall, then e = 'Great Wall'

      i is the salt value (f in the Python line above). Concatenate the constant "fanyideskweb", then e, then i, then the constant "Nw(nmmbP%A-r6U3EUn]Aj", and take the MD5 hash of the result.

      As a supplement, here is how to compute an MD5 hash in Python:

[Image 8]
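      The hashing step could be sketched with the standard hashlib module as follows. The helper names md5_hex, make_sign, and make_bv are illustrative, and the app_version value is only an example of what navigator.appVersion might look like, not a value taken from the site:

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a 32-character hex string."""
    # hashlib works on bytes, so the string is encoded as UTF-8 first.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_sign(query: str, salt: str) -> str:
    """Mirror n.md5("fanyideskweb" + e + i + "Nw(nmmbP%A-r6U3EUn]Aj")."""
    return md5_hex("fanyideskweb" + query + salt + "Nw(nmmbP%A-r6U3EUn]Aj")


def make_bv(app_version: str) -> str:
    """Mirror n.md5(navigator.appVersion)."""
    return md5_hex(app_version)


# Example values only; the salt and app_version below are assumptions.
sign = make_sign("Great Wall", "15839344790843")
bv = make_bv("5.0 (Windows NT 10.0; Win64; x64)")
print(sign, bv)
```

      Note that MD5 is a one-way hash rather than encryption: the server recomputes the same hash and compares, it never decrypts anything.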

      At this point, all the content we need is ready!

      The response to the request is in JSON format. A tool such as bejson.com can format it so that it is easier to read.

[Image 9]

      So we can easily extract the data with the json module; the 'tgt' field holds the translated text.

[Image 10]
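      Extracting 'tgt' might look like the sketch below. The sample response is hypothetical: only the 'tgt' field is confirmed by the text, while the translateResult nesting and the 'src' field are assumptions, so check them against your own formatted response.

```python
import json

# Hypothetical sample; the real response layout may differ.
sample = '{"errorCode": 0, "translateResult": [[{"src": "hello", "tgt": "你好"}]]}'


def extract_translations(response_text: str) -> list:
    """Collect every 'tgt' value from an (assumed) translateResult structure."""
    data = json.loads(response_text)
    results = []
    # Assumed layout: a list of sentence groups, each a list of dicts.
    for sentence_group in data.get("translateResult", []):
        for item in sentence_group:
            results.append(item["tgt"])
    return results


print(extract_translations(sample))
```

      Using data.get with an empty default means a failed response such as {"errorCode":50} yields an empty list instead of raising an exception.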

      Enough talk, here is the code:

[Image 11]
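      Putting the pieces together, a crawler could be sketched roughly as below. This is not the article's original listing. It uses only the standard library; the form fields other than salt, sign, ts, and bv (here i, from, to, doctype), the User-Agent, and the APP_VERSION string are assumptions that should be copied from the real request shown in the browser's developer tools, and the service may have changed since this article was written.

```python
import hashlib
import json
import random
import time
import urllib.parse
import urllib.request

URL = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
# Assumed example values; copy the real ones from your own browser.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
APP_VERSION = "5.0 (Windows NT 10.0; Win64; x64)"


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_form(word: str) -> dict:
    """Build the POST form, computing ts, salt, sign, and bv as analyzed above."""
    ts = str(int(time.time() * 1000))          # 13-digit JavaScript-style timestamp
    salt = ts + str(random.randint(0, 9))      # timestamp plus one random digit
    sign = md5_hex("fanyideskweb" + word + salt + "Nw(nmmbP%A-r6U3EUn]Aj")
    return {
        "i": word,           # assumed field name for the query text
        "from": "AUTO",      # assumed
        "to": "AUTO",        # assumed
        "doctype": "json",   # assumed
        "salt": salt,
        "sign": sign,
        "ts": ts,
        "bv": md5_hex(APP_VERSION),
    }


def translate(word: str, url: str = URL) -> dict:
    """POST the form to url and return the decoded JSON response."""
    body = urllib.parse.urlencode(build_form(word)).encode("utf-8")
    request = urllib.request.Request(url, data=body, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a real HTTP request, so it is left commented out):
# print(translate("hello"))
```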

      Run the code and you will see {"errorCode":50}: the translation did not succeed! Why is this?

      The URL that everyone crawls is: url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'

      Remember to remove the '_o' in the middle. This is one of the site's anti-crawling measures. (It is probably the step that trips up many readers.)
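      In code, the fix is a one-line change to the URL. The variable names here are illustrative:

```python
signed_url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"

# Drop the "_o" suffix from the path; the query string stays the same.
plain_url = signed_url.replace("translate_o", "translate", 1)

print(plain_url)
# http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule
```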

I hope this article helps those of you who are learning Python. If you want to know more, leave a comment below.

This article is from Qianfeng Education; please credit the source when reprinting.


Origin blog.51cto.com/15128702/2679107