Reverse-engineering the Google Translate API by capturing packets

First attempt

  • Google's translation API changes frequently, so the most reliable approach is to locate the current endpoint ourselves

  • First, open the Google Translate page in Google Chrome to see how it requests data

  • Right-click the page, choose Inspect to open the developer tools, and select the Network tab, as shown in the figure

  • Click Clear to remove all previous requests from the list, so that the real API request is easier to spot later

  • The Google Translate page sends a request about once per second, and the translation result comes back from the Google server almost immediately. After some investigation, the request whose name begins with batchexecute turns out to be the one we need.

  • Click it for further analysis; it is a POST request

  • The Payload tab shows the parameters sent with the POST request, which are submitted as form data

  • The Preview tab shows the translation result, which is returned as JSON-formatted data

  • 2022.12.26

  • The method above still works, but the interface is no longer as easy to find as it used to be. Instead, we use the Google Translate widget that appears on the Google search results page to find the interface

  • First, test a translation on the web page; the result is returned as expected

  • This is a POST request: you send the text to be translated to the URL below, and the server returns the corresponding result

Google Translate API related information

send url

### https://www.google.com/async/translate?vet=12ahUKEwjp-9mwmZf8AhXT0GEKHQc7Cs8QqDh6BAgFECw..i&ei=6YepY6njDdOhhwOH9qj4DA&yv=3&cs=0
  • Although this URL is long, the core part is just
### https://www.google.com/async/translate
  • The query parameters that follow do not appear to matter much, but we keep them for now and check later whether they can be trimmed
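
To make the split between the core endpoint and the trailing parameters concrete, here is a minimal sketch that uses only Python's standard `urllib.parse` module on the captured URL. The variable names are illustrative, and the meaning of `vet`, `ei`, `yv` and `cs` is not documented anywhere in this article, so the sketch only lists them.

```python
from urllib.parse import urlsplit, parse_qs

# The URL exactly as captured in the developer tools
captured = (
    "https://www.google.com/async/translate"
    "?vet=12ahUKEwjp-9mwmZf8AhXT0GEKHQc7Cs8QqDh6BAgFECw..i"
    "&ei=6YepY6njDdOhhwOH9qj4DA&yv=3&cs=0"
)

parts = urlsplit(captured)

# Scheme + host + path is the "core part" of the URL
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Everything after "?" becomes a dict of parameter name -> list of values
params = parse_qs(parts.query)

print(endpoint)
for name, values in params.items():
    print(name, "=", values[0])
```

Seeing the parameters listed one by one makes it easier to remove them individually when testing which ones the server really needs.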

submitted data

### async=translate,sl:zh-TW,tl:zh-CN,st:1111,id:1672054875193,qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc
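
The submitted data is a single form field named `async` whose value is a comma-separated list of `name:value` pairs. The sketch below splits it into a dictionary for inspection. The helper name `parse_async_payload` is hypothetical, and reading `sl`, `tl` and `st` as source language, target language and source text is an assumption based on the values seen here, not on any official documentation.

```python
def parse_async_payload(body: str) -> dict:
    """Split 'async=translate,name:value,...' into a dict.

    Assumes that no value contains a raw comma (hypothetical helper).
    """
    key, _, value = body.partition("=")
    if key != "async":
        raise ValueError("expected a body starting with 'async='")
    action, *pairs = value.split(",")
    fields = {"action": action}
    for pair in pairs:
        # Split on the first ':' only, so values may contain ':' themselves
        name, _, val = pair.partition(":")
        fields[name] = val
    return fields


body = ("async=translate,sl:zh-TW,tl:zh-CN,st:1111,id:1672054875193,"
        "qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc")
fields = parse_async_payload(body)
for name, val in fields.items():
    print(name, "->", val)
```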

  • Use Postman to construct a POST request
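
For readers without Postman, roughly the same request can be built with Python's standard `urllib.request` module. This is a sketch of a bare request with only a `Content-Type` header, not the article's own method; the `send` helper is hypothetical and is deliberately not called here, since the result depends on the live server.

```python
import urllib.error
import urllib.request

URL = "https://www.google.com/async/translate"
BODY = ("async=translate,sl:zh-TW,tl:zh-CN,st:1111,id:1672054875193,"
        "qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc")

# Build the POST request without sending it yet
request = urllib.request.Request(
    URL,
    data=BODY.encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"},
    method="POST",
)


def send(req):
    """Send the request and return (status code, body text).

    urlopen raises HTTPError for 4xx/5xx replies, so a 404 page is
    read from the exception instead of being lost.
    """
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


print(request.get_method(), request.full_url)
# status, text = send(request)  # uncomment to actually hit the server
```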

But it failed unexpectedly

  • What came back was a 404 page, not the translation result I wanted

  • My guess was that the missing cookies were the cause, so let's add the cookies to the request
  • Unexpectedly, it still failed after the cookies were added

experiment

Remove parameters

  • After removing all the URL parameters, the result is still returned normally

Remove Headers

  • After removing Content-Length, no result is returned, which shows that the request fails when the headers are incomplete

  • Conclusion: include as many headers as possible; ideally, send every header from the original request.

  • Write the corresponding Python code
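
The header-removal experiment above can also be automated. The following is a minimal sketch, assuming a caller-supplied `works` function that sends the request with a given set of headers and reports whether a translation came back. Both `find_required_headers` and `fake_works` are hypothetical names, and `fake_works` is only a stand-in so the sketch runs offline; it says nothing about which headers Google actually requires.

```python
from typing import Callable, Dict, List


def find_required_headers(headers: Dict[str, str],
                          works: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Drop one header at a time and collect those whose removal breaks the request."""
    required = []
    for name in headers:
        trimmed = {k: v for k, v in headers.items() if k != name}
        if not works(trimmed):
            required.append(name)
    return required


def fake_works(h: Dict[str, str]) -> bool:
    # Stand-in for a real request: pretends two headers are mandatory
    return "User-Agent" in h and "Cookie" in h


sample = {"User-Agent": "...", "Cookie": "...", "DNT": "1", "Accept": "*/*"}
print(find_required_headers(sample, fake_works))
```

In real use, `works` would send the POST request with the trimmed headers and check whether the response contains the translation.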

the code

import requests 
 
url = "https://www.google.com.hk/async/translate" 
 
payload = "async=translate,sl:en,tl:zh-CN,st:1111,id:1672056488960,qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc" 
headers = { 
  'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"', 
  'DNT': '1', 
  'sec-ch-ua-mobile': '?0', 
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36', 
  'sec-ch-ua-arch': '"x86"', 
  'sec-ch-ua-full-version': '"108.0.5359.125"', 
  'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8', 
  'sec-ch-ua-platform-version': '"10.0.0"', 
  'sec-ch-ua-full-version-list': '"Not?A_Brand";v="8.0.0.0", "Chromium";v="108.0.5359.125", "Google Chrome";v="108.0.5359.125"', 
  'sec-ch-ua-bitness': '"64"', 
  'sec-ch-ua-model': '', 
  'sec-ch-ua-wow64': '?0', 
  'sec-ch-ua-platform': '"Windows"', 
  'Accept': '*/*', 
  'X-Client-Data': 'CKW1yQEIhbbJAQiktskBCMS2yQEIqZ3KAQjb08oBCLD+ygEIlaHLAQjv8swBCN75zAEI5PrMAQjxgM0BCLKCzQEI7ILNAQjIhM0BCO+EzQEIt4XNAQ==', 
  'Sec-Fetch-Site': 'same-origin', 
  'Sec-Fetch-Mode': 'cors', 
  'Sec-Fetch-Dest': 'empty', 
  'host': 'www.google.com.hk', 
  'Cookie': '1P_JAR=2022-12-26-12; NID=511=eVLI1bG9nhyOZtqU14JBHm5Be00epdxfR4XmfQeehYyIkzgpXi6dbpNY75ZMVyS7aOjoM2oZ5WdoR8eNq6wi1-e_J0NeoyI0dtsHW-_8Ik4PGrqvuGHdcvVC03zTOEK2TY1FZL85Wimo_ZPIE3hGIrmGPSiel6-rRRW9lD30UPs' 
} 
 
response = requests.request("POST", url, headers=headers, data=payload) 
 
print(response.text)
  • The request now returns a response normally
  • But the raw response is not yet what we want, so we need to parse it

Parsing the returned results

  • In fact, the result is easy to locate: just take the text between <span id="tw-answ-target-text"> and </span>
  • The result looks clean after extraction
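
The extraction step can be tried offline on a small sample. In this sketch the HTML fragment is made up for illustration; only the `tw-answ-target-text` id comes from the real response. Decoding HTML entities with `html.unescape` is an added assumption, in case the translated text contains characters such as `&`.

```python
import html
import re


def extract_translation(page: str):
    """Return the text inside <span id="tw-answ-target-text">, or None if absent."""
    match = re.search(r'<span id="tw-answ-target-text">(.*?)</span>', page, re.DOTALL)
    if match is None:
        return None
    # Turn entities such as &amp; back into plain characters
    return html.unescape(match.group(1))


# Made-up fragment standing in for the real response body
sample = '<div><span id="tw-answ-target-text">Tom &amp; Jerry</span></div>'
print(extract_translation(sample))
```

Returning `None` when the span is missing makes it easy to tell a changed page layout or a 404 page apart from an empty translation.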

full code

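def Google_Translate(origin_string):
  import requests
  from urllib.parse import quote

  url = "https://www.google.com.hk/async/translate"

  # Percent-encode the text so that commas, spaces and non-ASCII characters do not break the form body
  payload = "async=translate,sl:en,tl:zh-CN,st:{},id:1672056488960,qc:true,ac:true,_id:tw-async-translate,_pms:s,_fmt:pc".format(quote(origin_string, safe=""))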
  headers = { 
    'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"', 
    'DNT': '1', 
    'sec-ch-ua-mobile': '?0', 
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36', 
    'sec-ch-ua-arch': '"x86"', 
    'sec-ch-ua-full-version': '"108.0.5359.125"', 
    'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8', 
    'sec-ch-ua-platform-version': '"10.0.0"', 
    'sec-ch-ua-full-version-list': '"Not?A_Brand";v="8.0.0.0", "Chromium";v="108.0.5359.125", "Google Chrome";v="108.0.5359.125"', 
    'sec-ch-ua-bitness': '"64"', 
    'sec-ch-ua-model': '', 
    'sec-ch-ua-wow64': '?0', 
    'sec-ch-ua-platform': '"Windows"', 
    'Accept': '*/*', 
    'X-Client-Data': 'CKW1yQEIhbbJAQiktskBCMS2yQEIqZ3KAQjb08oBCLD+ygEIlaHLAQjv8swBCN75zAEI5PrMAQjxgM0BCLKCzQEI7ILNAQjIhM0BCO+EzQEIt4XNAQ==', 
    'Sec-Fetch-Site': 'same-origin', 
    'Sec-Fetch-Mode': 'cors', 
    'Sec-Fetch-Dest': 'empty', 
    'host': 'www.google.com.hk', 
    'Cookie': '1P_JAR=2022-12-26-12; NID=511=eVLI1bG9nhyOZtqU14JBHm5Be00epdxfR4XmfQeehYyIkzgpXi6dbpNY75ZMVyS7aOjoM2oZ5WdoR8eNq6wi1-e_J0NeoyI0dtsHW-_8Ik4PGrqvuGHdcvVC03zTOEK2TY1FZL85Wimo_ZPIE3hGIrmGPSiel6-rRRW9lD30UPs' 
  } 
 
  response = requests.request("POST", url, headers=headers, data=payload) 
 
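  def find_string_between_A_and_B(string, string_A, string_B):  # Find the text between two marker strings
    import re

    # Escape the markers so they are matched literally, and let '.' match newlines too
    regular = '{}(.*?){}'.format(re.escape(string_A), re.escape(string_B))
    result = re.findall(regular, string, re.DOTALL)
    return result

  # The translated text sits inside this span in the returned HTML
  result = find_string_between_A_and_B(response.text, '<span id="tw-answ-target-text">', '</span>')
  return result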
 
 
result = Google_Translate('222') 
print("result:", result)
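
The payload in the full code hardcodes the language pair and the `id` value. A possible refinement is sketched below, under two assumptions that the article does not confirm: that `id` is simply a millisecond timestamp (the captured value 1672056488960 is consistent with the capture date), and that the server accepts any language codes in `sl` and `tl`. The helper name `build_payload` is hypothetical.

```python
import time
from urllib.parse import quote


def build_payload(text, source="en", target="zh-CN", request_id=None):
    """Build the form body for /async/translate (hypothetical helper)."""
    if request_id is None:
        # Assumption: the id field is the current time in milliseconds
        request_id = int(time.time() * 1000)
    return (
        "async=translate,sl:{},tl:{},st:{},id:{},qc:true,ac:true,"
        "_id:tw-async-translate,_pms:s,_fmt:pc"
    ).format(source, target, quote(text, safe=""), request_id)


print(build_payload("hello, world", request_id=1672056488960))
```

Percent-encoding the text keeps a comma inside the input from being read as a field separator.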



Origin blog.csdn.net/u014723479/article/details/128449232