How a Python crawler can deal with Cloudflare email address obfuscation

Recently I wrote a small crawler that needed to collect email addresses, but I couldn't get them from the page, and they weren't loaded through an Ajax endpoint either. After some research I found that Cloudflare had encrypted them with its email address obfuscation feature. If something is encrypted, there must be a way to decrypt it.

Lemon indifference:

This decryption method is ported from the JavaScript that Cloudflare uses to decode the addresses in the browser.

The key line is:

for (e = '', r = '0x' + a.substr(0, 2) | 0, n = 2; a.length - n; n += 2) e += '%' + ('0' + ('0x' + a.substr(n, 2) ^ r).toString(16)).slice(-2);

The first two characters, parsed as a hexadecimal number, are the key. Each following pair of hex characters is parsed as one byte, XORed with the key, and written back out as a two-digit hex escape (%XX).
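To make the loop easier to follow, here is a rough Python mirror of it that stops at the intermediate percent-encoded string (the JS variable `e`). The function name `to_percent_string` is my own, hypothetical choice; it is only a sketch for illustration.

```python
def to_percent_string(encoded: str) -> str:
    """Mirror the JS loop: return the %XX string before URL-decoding."""
    # The first two hex characters are the XOR key
    key = int(encoded[0:2], 16)
    parts = []
    # Every following pair of hex characters is one obfuscated byte
    for i in range(2, len(encoded), 2):
        byte = int(encoded[i:i + 2], 16) ^ key  # undo the XOR
        # Two-digit lowercase hex, same as ('0' + x.toString(16)).slice(-2) in JS
        parts.append('%' + format(byte, '02x'))
    return ''.join(parts)


print(to_percent_string('30515253705152531e535f5d'))
# → %61%62%63%40%61%62%63%2e%63%6f%6d
```

Here the key is 0x30, and for example 0x51 ^ 0x30 = 0x61, which is the letter "a".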

Finally, join all the decoded characters together and you get the email address. For example, the sample string 30515253705152531e535f5d used later in this article decodes to abc@abc.com.

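Note that the JS relies on URL encoding: instead of turning each byte into a character directly, it writes every decoded byte as a %XX escape and then URL-decodes the whole string with decodeURIComponent.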
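Why go through URL encoding at all? Presumably so that non-ASCII addresses survive: percent-decoding treats the bytes as UTF-8, so a multi-byte character is reassembled correctly. A small sketch using Python's `urllib.parse.unquote`, with a made-up input (the bytes of "é" followed by "@"), shows the difference:

```python
from urllib.parse import unquote

# Percent-encoded UTF-8 bytes for "é" (0xC3 0xA9) followed by "@"
percent = '%c3%a9%40'

# unquote() collects the bytes and decodes them as UTF-8, like decodeURIComponent
print(unquote(percent))  # → é@

# Turning each byte into a character on its own garbles the multi-byte "é"
naive = ''.join(chr(int(percent[i + 1:i + 3], 16)) for i in range(0, len(percent), 3))
print(naive)  # → Ã©@
```

One difference to keep in mind: decodeURIComponent throws on invalid UTF-8, while `unquote` replaces bad bytes with U+FFFD by default (pass `errors='strict'` to make it raise instead).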

Here is the decryption code reproduced as a standalone JS function (save it as jiemi.js):

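function jiemi(val) {
    // The first two hex characters are the XOR key
    var r = parseInt(val.substr(0, 2), 16);
    var e = '';
    for (var n = 2; n < val.length; n += 2) {
        // XOR each following byte with the key and write it as a %XX escape
        e += '%' + ('0' + (parseInt(val.substr(n, 2), 16) ^ r).toString(16)).slice(-2);
    }
    // URL-decode the escapes (treated as UTF-8) to get the address
    return decodeURIComponent(e);
}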
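If you'd rather not depend on execjs and a JavaScript runtime, the same logic can also be ported to plain Python. This is a sketch of that alternative, not the author's approach: `cf_decode` is a hypothetical name, and decoding the XORed bytes as UTF-8 stands in for building the %XX string and calling decodeURIComponent. The hypothetical `cf_encode` helper only exists to produce test inputs and to show that XOR with the same key undoes itself.

```python
def cf_decode(encoded: str) -> str:
    """Pure-Python port of jiemi(): XOR-decode a Cloudflare-obfuscated hex string."""
    key = int(encoded[:2], 16)  # first byte is the XOR key
    # XOR every following byte with the key
    data = bytes(int(encoded[i:i + 2], 16) ^ key for i in range(2, len(encoded), 2))
    # decodeURIComponent on %XX escapes amounts to UTF-8 decoding the raw bytes
    return data.decode('utf-8')


def cf_encode(email: str, key: int) -> str:
    """Hypothetical inverse, for testing only: key byte, then each UTF-8 byte XORed with it."""
    return format(key, '02x') + ''.join(format(b ^ key, '02x') for b in email.encode('utf-8'))


print(cf_decode('30515253705152531e535f5d'))  # → abc@abc.com
print(cf_encode('abc@abc.com', 0x30))         # → 30515253705152531e535f5d
```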


Then call the JS function from Python with execjs (PyExecJS), which needs a JavaScript runtime such as Node.js to be installed:

import execjs


def get_js():
    # Read the JS file that defines jiemi()
    with open("./jiemi.js", 'r', encoding='utf-8') as f:
        return f.read()


def get_des_psswd(e):
    js_str = get_js()
    ctx = execjs.compile(js_str)  # Compile the JS source
    # The first argument to call() is the JS function name; the rest are passed to it
    return ctx.call('jiemi', e)


if __name__ == '__main__':
    print(get_des_psswd(e='30515253705152531e535f5d'))
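In a real crawl you first have to find the encoded strings in the page. A rough sketch, assuming (this is not stated in the article) that Cloudflare leaves them in `data-cfemail="<hex>"` attributes or in `href="/cdn-cgi/l/email-protection#<hex>"` links. The regex, `CFEMAIL_RE` and `extract_emails` are my own hypothetical names, and the decoder is repeated so the block runs on its own without execjs.

```python
import re
from typing import List

# Assumption: obfuscated addresses appear as data-cfemail="<hex>" attributes
# or as /cdn-cgi/l/email-protection#<hex> links in the page HTML
CFEMAIL_RE = re.compile(
    r'data-cfemail="([0-9a-fA-F]+)"|/cdn-cgi/l/email-protection#([0-9a-fA-F]+)')


def cf_decode(encoded: str) -> str:
    key = int(encoded[:2], 16)  # first byte is the XOR key
    return bytes(int(encoded[i:i + 2], 16) ^ key
                 for i in range(2, len(encoded), 2)).decode('utf-8')


def extract_emails(html: str) -> List[str]:
    """Find every obfuscated address in the page and decode it."""
    emails = []
    for m in CFEMAIL_RE.finditer(html):
        encoded = m.group(1) or m.group(2)  # whichever alternative matched
        emails.append(cf_decode(encoded))
    return emails


sample = ('<a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
          'data-cfemail="30515253705152531e535f5d">[email&#160;protected]</a>')
print(extract_emails(sample))  # → ['abc@abc.com']
```

A regex is enough for a sketch; for messier pages an HTML parser is the more robust choice.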





Origin blog.51cto.com/14825302/2542688