How to call Youdao Translate "elegantly" with Python

Foreword

In fact, I had my eye on Youdao Translate a long time ago, but for lack of time I never got around to studying it (my flashy tricks are still to come, so remember to follow). This article explains how to call Youdao Translate from Python, and walks through the "battle" between this crawler and Youdao Translate's JS!

Of course, this is only for learning and exchange, and for doing small things for your own amusement; commercial use is prohibited! If you repost, please credit the WeChat official account: bigsai. Project GitHub address: https://github.com/javasmall/python

Analysis

For any website, the first step is analysis: working out the rules the page follows.

Analyzing the URL

Enter some text on the Youdao Translate page and you will find that the URL does not change, which means the request is made through asynchronous Ajax interaction. Press F12 and it is easy to find this request under XHR. Click it to view the details and you will see a bunch of parameters, several of them encrypted, such as the oddly named salt. Keep these in mind for now.

Parameter analysis 01

We can boldly guess that this must be a key parameter. Search for salt, click a normal-looking result, expand the JS with the formatter, and search for salt again. Look at the code near each salt-related match and set breakpoints to debug. You will end up with 11 related matches, and you can set a breakpoint near each one. That way you will happily find the location and the function where the key fields are encrypted.

Parameter analysis 02

This time, use the browser's call stack feature to see which JS functions were executed. Click the corresponding module and set a breakpoint to observe it. Eventually you will find the generateSaltSign(n) function, inside which the main encryption is performed.

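Encryption analysis

In fact, Youdao Translate's encryption is fairly simple. Take a look:

  • Not sure what navigator.appVersion is? I printed it to have a look. It is just the browser header, which then gets MD5-hashed. It can stay fixed, which means the bv(t) parameter can stay fixed too.
  • ts is simply the current 13-digit timestamp!
  • salt is just the timestamp with a random number below 100 appended; pick any one.
  • sign is just the MD5 hash of the string "fanyideskweb" + the text to translate + salt + "n%A-rKaT5fb[Gy?;N5@Tj"!

Later analysis showed that these parameters do not change, so the values generated for a given request are uniquely determined. One precondition is that the text stays within 5000 characters: if it exceeds 5000 characters, only the first 5000 are taken, so keep that in mind.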
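
The four fields described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name make_sign_params and its now_ms and rand arguments are hypothetical (they exist so the result can be reproduced), and the constants are the ones quoted in the list above.

```python
import hashlib
import random
import time

# Constants quoted in the analysis above.
CLIENT = "fanyideskweb"
SECRET = "n%A-rKaT5fb[Gy?;N5@Tj"


def md5_hex(text):
    """Return the hex MD5 digest of a string encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_sign_params(text, app_version, now_ms=None, rand=None):
    """Build ts, salt, sign and bv for one request (hypothetical helper)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # 13-digit millisecond timestamp
    if rand is None:
        rand = random.randint(0, 99)      # any number below 100 will do
    ts = str(now_ms)
    salt = ts + str(rand)
    return {
        "ts": ts,
        "salt": salt,
        "sign": md5_hex(CLIENT + text + salt + SECRET),
        "bv": md5_hex(app_version),       # fixed as long as the header is fixed
    }


# Example call with fixed inputs so the output is reproducible.
params = make_sign_params("hello", "5.0 (Windows NT 10.0; WOW64)",
                          now_ms=1577000000000, rand=7)
print(params["salt"], params["sign"])
```

Passing the timestamp and the random suffix in explicitly makes the function easy to check against values captured in the browser.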

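Simulating the request

Points to note

With the rules above, we can combine them with the captured request information and use Python to reproduce what the JS does and send the request. There are a few points to note.

  • Firstly, you need to handle MD5 hashing and timestamps in Python and produce equivalent conversions. Conveniently, Python's hashlib and time modules cover this, so that problem is solved.
  • In addition, the data dictionary that forms the POST body needs to be URL-encoded before it can be sent as data.
  • Last but not least, once the encryption is solved, the most important thing is the headers, so do not get careless. As for Content-Length, my experience is that a wrong value causes an error, while packet capture shows that it is generated automatically when omitted. So do not compute the body length yourself; leave this parameter out. Omitting the cookie causes an error. With the cookie included, testing shows that some entries can be changed or are even optional, while others must follow their format. The entry that must follow its format is [email protected], in the form number + @ + IP-style address. It is probably used for validation, and it can simply be simulated.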
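
As a small illustration of the URL-encoding point, here is a sketch using only the standard library. The form dictionary is a hypothetical subset of the real fields, chosen just to show what happens to non-ASCII text.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical subset of the form fields, for illustration only.
form = {"i": "你好", "from": "AUTO", "to": "AUTO", "doctype": "json"}

# urlencode percent-encodes non-ASCII characters as UTF-8 bytes,
# producing an ASCII string that can be sent as the POST body.
body = urlencode(form)
print(body)

# Decoding the body again recovers the original fields.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
print(decoded == form)
```

Because the encoded body is plain ASCII, its length in bytes is easy for the HTTP client to work out, which fits the advice above to leave Content-Length to the library.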

Request code

The response is a JSON string, so just read the result straight from it!

import hashlib
import time
import urllib.parse

import requests


def nmd5(text):
    """Return the hex MD5 digest of a string."""
    m = hashlib.md5()
    # Tips:
    # The string must be encoded here.
    # Writing m.update(text) raises: Unicode-objects must be encoded before hashing,
    # because str in Python 3 is Unicode by default.
    # b = bytes(text, encoding='utf-8') works the same way; both encode to bytes.
    b = text.encode(encoding='utf-8')
    m.update(b)
    return m.hexdigest()


def formdata(transtr):
    """Build the form fields for translating transtr."""
    # String to be hashed for bv (the browser header)
    headerstr = '5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    bv = nmd5(headerstr)
    ts = str(round(time.time() * 1000))  # 13-digit millisecond timestamp
    salt = ts + '90'  # timestamp plus a number below 100, fixed here
    strexample = 'fanyideskweb' + transtr + salt + 'n%A-rKaT5fb[Gy?;N5@Tj'
    sign = nmd5(strexample)
    return {'i': transtr, 'from': 'AUTO', 'to': 'AUTO', 'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': salt,
            'sign': sign,
            'ts': ts,
            'bv': bv,
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME'
    }


url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
# Content-Length is deliberately left out; it is generated automatically.
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
 'Referer': 'http://fanyi.youdao.com/',
 'Origin': 'http://fanyi.youdao.com',
 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
 'X-Requested-With': 'XMLHttpRequest',
 'Accept': 'application/json, text/javascript, */*; q=0.01',
 'Accept-Encoding': 'gzip, deflate',
 'Accept-Language': 'zh-CN,zh;q=0.9',
 'Connection': 'keep-alive',
 'Host': 'fanyi.youdao.com',
 'cookie': '_ntes_nnid=937f1c788f1e087cf91d616319dc536a,1564395185984; OUTFOX_SEARCH_USER_ID_NCOO=; [email protected]; JSESSIONID=; ___rl__test__cookies=1'
 }

text = input("Enter the text to translate: ")
# The form fields must be URL-encoded before being sent as the POST body.
data = urllib.parse.urlencode(formdata(text))

req = requests.post(url, data=data, headers=header)
val = req.json()
print(val['translateResult'][0][0]['tgt'])
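
The last line of the script reads only the first segment of the result. As a sketch of a slightly more defensive way to read the response, the snippet below joins every segment and returns None when the key is missing. The sample payload is hypothetical: its shape is inferred only from the val['translateResult'][0][0]['tgt'] lookup above, and extract_translation is an illustrative name, not part of any API.

```python
import json

# Hypothetical response text, shaped after the lookup used in the script above.
sample = ('{"translateResult": [[{"src": "你好", "tgt": "hello"}], '
          '[{"src": "世界", "tgt": "world"}]]}')


def extract_translation(payload):
    """Join the tgt text of every segment; return None if the key is absent."""
    result = payload.get("translateResult")
    if not result:
        return None
    # Each outer item is treated as one paragraph made of segments.
    return "\n".join(
        "".join(segment.get("tgt", "") for segment in paragraph)
        for paragraph in result
    )


print(extract_translation(json.loads(sample)))
print(extract_translation({"errorCode": 50}))  # hypothetical failure payload
```

Returning None instead of raising KeyError makes it easier to notice when the request was rejected, for example because a header or the sign was wrong.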

Execution result

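Conclusion

And that is how, starting from zero, we elegantly lifted the veil on Youdao Translate! You can use this to do some fun things (to be continued------)

Of course, this is probably not very difficult, and for veterans it is simple (please be gentle), but it is especially good practice for beginners. If something seems wrong or unclear, you can reach me through the official account! I do not know how long this code will keep working before it breaks, so bookmark it and try it soon! If you find it useful, please show some love with a like! And this is only the beginning of my wild ideas; the fun stuff is still to come!

The project and crawler repository are on GitHub; stars and forks are welcome!

You are welcome to follow the official account bigsai. Let's learn and improve together! More fun will be shared over the long term!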

Origin blog.csdn.net/qq_40693171/article/details/103717189