Method 1: use Selenium to drive Google Translate and translate keywords automatically
1. Use Selenium to open Google Translate
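from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
kw_text = '我爱你'
# sl=auto detects the source language, tl=en sets the target language
driver.get('https://translate.google.cn/#view=home&op=translate&sl=auto&tl=en&text=' + kw_text)
time.sleep(3)  # wait for the translation to render
# find_element_by_css_selector was removed in Selenium 4; use find_element with By
ele = driver.find_element(By.CSS_SELECTOR, 'span[jsname="W297wb"]')
print(ele.text)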
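The script above appends the keyword to the URL as-is, which can break once the text contains spaces, line breaks, `&` or `#`. As a minimal sketch, the URL could be built with percent-encoding from the standard library. The helper name `build_translate_url` and the constant `BASE_URL` are illustrative assumptions, not part of the original script.

```python
from urllib.parse import quote

# Hypothetical helper (not in the original script): builds the same
# Google Translate URL as above, with the text percent-encoded.
BASE_URL = 'https://translate.google.cn/#view=home&op=translate'


def build_translate_url(text, target='en', source='auto'):
    # safe='' makes quote() encode every reserved character, so spaces
    # become %20, newlines %0A, and '&' or '#' cannot split the fragment.
    encoded = quote(text, safe='')
    return '%s&sl=%s&tl=%s&text=%s' % (BASE_URL, source, target, encoded)
```

With this in place, the call would become `driver.get(build_translate_url(kw_text))`, and the target language can be switched with `target='vi'` and so on.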
2. Translate in batches in a loop
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import quote
import time
import random

with open('en.txt', encoding='utf-8') as f:
    # Strip the trailing newline from every keyword
    lines = [line.strip() for line in f]

driver = webdriver.Chrome()
# driver.maximize_window()
n = 0
every_time_trans_nums = 100  # number of keywords sent per request
kw_num = len(lines)
while n < kw_num:
    keywords = lines[n:n + every_time_trans_nums]
    # One keyword per line, percent-encoded (space -> %20, newline -> %0A)
    kw_text = quote('\n'.join(keywords))
    try:
        # tl can be changed to es, en, id, fr, etc.
        driver.get('https://translate.google.cn/#view=home&op=translate&sl=auto&tl=vi&text=' + kw_text)
        time.sleep(3)
        ele = driver.find_element(By.CSS_SELECTOR, 'span[jsname="W297wb"]')
        print(ele.text)
    except Exception as e:
        print(e)
        time.sleep(5)
    else:
        with open('vn.txt', 'a', encoding='utf-8') as f:
            f.write(ele.text + '\n')
    finally:
        # Advance by the number of keywords just sent so batches never overlap
        n += len(keywords)
        if n % 500 == 0:
            print('Translated [%s]' % n)
        time.sleep(random.random())
driver.quit()
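The batching in the loop above is easy to get wrong, because the index has to advance by exactly the batch size or keywords get sent twice. A small sketch of the slicing on its own, as a generator, is shown below. The name `iter_batches` is an assumption made for illustration and does not appear in the original script.

```python
def iter_batches(items, batch_size):
    """Yield consecutive, non-overlapping slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    # range() steps by batch_size, so each item lands in exactly one slice;
    # the last slice is simply shorter when the list does not divide evenly.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

The main loop could then read `for keywords in iter_batches(lines, 100):`, with no manual index bookkeeping.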
Method 2: use the googletrans library
1. Install googletrans
pip install googletrans
2. Translate keywords in a loop
from googletrans import Translator

translator = Translator()
lange = 'en'  # target language code
with open('Google搜索字词包含sensor.txt', encoding='utf-8') as f:
    lines = f.readlines()
for line in lines:
    try:
        result = translator.translate(line.strip(), dest=lange)
        with open(lange + '_Google搜索字词包含sensor.txt', 'a', encoding='utf-8') as f:
            f.write(result.text + '\n')
    except Exception as e:
        # Save keywords that failed so they can be retried later
        with open(lange + '_Google搜索字词包含sensor_error.txt', 'a', encoding='utf-8') as f:
            f.write(line.strip() + '\n')
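The loop above gives up on a keyword after a single failure and writes it to the error file. One possible refinement, sketched here under the assumption that most failures are temporary network errors, is to retry a few times with a growing delay first. `translate_with_retry` and its parameters are hypothetical names; the function takes any callable, so it does not depend on a particular translation library.

```python
import time


def translate_with_retry(translate, text, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call translate(text); on failure wait and try again, doubling the delay.

    `translate` is any callable that returns the translated string or raises.
    Returns the translation, or None once every attempt has failed.
    """
    for attempt in range(retries):
        try:
            return translate(text)
        except Exception:
            # Do not sleep after the final attempt; there is nothing left to wait for
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

In the script above it could be called as `translate_with_retry(lambda t: translator.translate(t, dest=lange).text, line.strip())`, writing the keyword to the error file only when the result is `None`.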
Method 3: use the pygtrans library (this requires access to an overseas network)
1. Install pygtrans
pip install pygtrans
2. Translate keywords in a loop
# coding=utf8
from pygtrans import Translate
import sys
import os

client = Translate()
# Target language = last two characters of the directory that holds this script.
# abspath() keeps this working when the script is started with a relative path.
lange = os.path.dirname(os.path.abspath(sys.argv[0]))[-2:]
print('Translating into language: ' + lange)
keywords = 'dingqinghua.txt'
with open(keywords, encoding='utf-8') as f:
    lines = f.readlines()
for line in lines:
    try:
        text = client.translate(line.strip(), target=lange)
        with open(lange + '_' + keywords, 'a', encoding='utf-8') as f:
            f.write(text.translatedText + '\n')
    except Exception as e:
        # Save keywords that failed so they can be retried later
        with open('error_' + lange + '_' + keywords, 'a', encoding='utf-8') as f:
            f.write(line.strip() + '\n')
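The script above takes its target language from the name of the directory it lives in. The sketch below isolates that step so it can be checked on its own; `language_from_script_path` is a hypothetical helper name. It assumes the directory name ends in a two-letter language code such as `vi` or `es`, so longer codes such as `zh-CN` would not be recovered correctly.

```python
import os


def language_from_script_path(script_path):
    """Return the last two characters of the directory containing script_path."""
    # abspath() resolves a bare 'translate.py' against the current working
    # directory, so the directory part is never an empty string.
    directory = os.path.dirname(os.path.abspath(script_path))
    return directory[-2:]
```

Inside the script this would be used as `lange = language_from_script_path(sys.argv[0])`.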
After testing, the three methods have the following advantages and disadvantages.
Method 1: This is the fastest, but Selenium has to be installed and it takes some technical skill. Google can also detect that a script is doing the translating. The result is a literal translation that differs somewhat from a human translation, so it is usable only when translation quality is not critical.
Method 2: It needs access to an overseas network. If that connection drops, the result falls back to a direct machine translation, which is not what I want. After running for a long time the translation also stops, so this method does not meet my requirements.
Method 3: This library appears to have been written by a developer in China. It calls the Google.cn interface to translate Chinese into English, whereas our need is to translate English into other, less widely used languages, so the source code needs some changes. It also needs an overseas network, so I added the proxy directly in the source code, as follows.
proxies = {
    'http': '127.0.0.1:10809', 'https': '127.0.0.1:10809'}

def __init__(
        self,
        target: str = 'en',
        source: str = 'auto',
        _format='html',
        user_agent: str = None,
        domain: str = 'com',
        proxies: Dict = proxies
):
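Since the constructor shown above already takes a `proxies` argument, it may be possible to pass the proxy when creating the client instead of editing the library source. The sketch below only builds the mapping. `build_proxies` is a hypothetical helper, and the host and port are the local proxy values from the snippet above; the explicit `http://` scheme is an addition made here for clarity.

```python
def build_proxies(host='127.0.0.1', port=10809, scheme='http'):
    """Return a requests-style mapping that sends HTTP and HTTPS through one proxy."""
    # Both protocols go through the same local proxy endpoint
    proxy_url = '%s://%s:%d' % (scheme, host, port)
    return {'http': proxy_url, 'https': proxy_url}
```

Assuming the installed pygtrans version accepts the `proxies` parameter as in the signature above, the client could then be created with `client = Translate(proxies=build_proxies())`.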
Summary
I currently use Method 3 for translation. A copy of the script is stored in a directory named for each target language, so running the script directly makes it translate into the language taken from its directory name.