Batch translation of English keywords into other less common languages

Method 1: Use Selenium to drive Google Translate and translate keywords automatically

1. Use Selenium to open Google Translate

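from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
kw_text = '我爱你'
# sl=auto detects the source language; tl=en sets English as the target
driver.get('https://translate.google.cn/#view=home&op=translate&sl=auto&tl=en&text=' + kw_text)
time.sleep(3)  # wait for the translation to render
# find_element_by_css_selector was removed in Selenium 4; use find_element with By
ele = driver.find_element(By.CSS_SELECTOR, 'span[jsname="W297wb"]')
print(ele.text)
driver.quit()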
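
The URL in the example above embeds the text to translate directly. A minimal sketch of a helper that percent-encodes the text first, so spaces, line breaks and non-ASCII characters survive intact, could look like this. The function name `build_translate_url` and the sample keywords are hypothetical; only the base URL and the `sl`, `tl` and `text` parameters come from the example above.

```python
from urllib.parse import quote

# Base URL copied from the example above; the helper itself is hypothetical.
BASE_URL = 'https://translate.google.cn/#view=home&op=translate'


def build_translate_url(text, target='en', source='auto'):
    """Return a Google Translate page URL with the text percent-encoded."""
    return '%s&sl=%s&tl=%s&text=%s' % (BASE_URL, source, target, quote(text))


print(build_translate_url('我爱你'))
# Several keywords can share one request when joined with line breaks
print(build_translate_url('pressure sensor\ntemperature sensor', target='vi'))
```

The result can be passed straight to `driver.get()`.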

2. Translate in batches in a loop

from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import quote
import time
import random

# Read one keyword per line, skipping blank lines
with open('en.txt', encoding='utf-8') as f:
    lines = [line.strip() for line in f if line.strip()]

driver = webdriver.Chrome()
# driver.maximize_window()
n = 0
every_time_trans_nums = 100  # keywords sent per request
kw_num = len(lines)

while n < kw_num:
    keywords = lines[n:n + every_time_trans_nums]
    # Join the batch with line breaks and percent-encode it for the URL
    kw_text = quote('\n'.join(keywords))

    try:
        # tl can be changed to es, en, id, fr, etc.
        driver.get('https://translate.google.cn/#view=home&op=translate&sl=auto&tl=vi&text=' + kw_text)
        time.sleep(3)
        ele = driver.find_element(By.CSS_SELECTOR, 'span[jsname="W297wb"]')
        print(ele.text)

    except Exception as e:
        print(e)
        time.sleep(5)

    else:
        with open('vn.txt', 'a', encoding='utf-8') as f:
            f.write(ele.text + '\n')

    finally:
        # Advance by a full batch so consecutive batches do not overlap
        n += every_time_trans_nums

    if n % 500 == 0:
        print('Translated [%s]' % n)
    time.sleep(random.random())

driver.quit()
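
The batching step in the loop above can be isolated into a small generator, which makes it easier to check that every keyword is sent exactly once. This is only a sketch: the name `batches` and the sample data are made up for illustration.

```python
def batches(items, size):
    """Yield consecutive, non-overlapping slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 made-up keywords split into batches of 100
keywords = ['kw%d' % i for i in range(250)]
for batch in batches(keywords, 100):
    print(len(batch), batch[0], batch[-1])
```

The last batch is simply shorter, so no special handling is needed for the tail of the file.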

Method 2: Use the googletrans library

1. Install googletrans

pip install googletrans

2. Translate keywords in a loop

from googletrans import Translator

translator = Translator()

lange = 'en'  # target language code

# Read all keywords first, then translate them one by one
with open('Google搜索字词包含sensor.txt', encoding='utf-8') as f:
    lines = f.readlines()

for line in lines:
    try:
        result = translator.translate(line.strip(), dest=lange)
        with open(lange + '_Google搜索字词包含sensor.txt', 'a', encoding='utf-8') as out:
            out.write(result.text + '\n')
    except Exception as e:
        # Log keywords that failed so they can be retried later
        print(e)
        with open(lange + '_Google搜索字词包含sensor_error.txt', 'a', encoding='utf-8') as err:
            err.write(line.strip() + '\n')
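
Since network calls like the one above can fail intermittently, one option is to retry a keyword a few times before writing it to the error file. The sketch below is library-agnostic: `translate_with_retry` is a hypothetical helper that accepts any callable, and `flaky` is a stand-in translator used only for demonstration, not part of googletrans.

```python
import time


def translate_with_retry(translate, text, retries=3, delay=1.0):
    """Call translate(text); on failure wait and retry, doubling the delay."""
    for attempt in range(retries):
        try:
            return translate(text)
        except Exception:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2


# Stand-in translator that fails twice and then succeeds (demonstration only)
calls = {'n': 0}


def flaky(text):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporary failure')
    return text.upper()


print(translate_with_retry(flaky, 'sensor', delay=0.01))
```

In the loop above, the callable could be something like `lambda t: translator.translate(t, dest=lange).text`.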

Method 3: Use the pygtrans library (this requires access to an overseas network)

1. Install pygtrans

pip install pygtrans

2. Translate keywords in a loop

# coding=utf8
from pygtrans import Translate
import sys
import os

client = Translate()

# The target language code is the last two characters of the script's directory name
script_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
lange = script_dir[-2:]
print('Translating into language: ' + lange)

keywords = 'dingqinghua.txt'

with open(keywords, encoding='utf-8') as f:
    lines = f.readlines()

for line in lines:
    try:
        text = client.translate(line.strip(), target=lange)
        with open(lange + '_' + keywords, 'a', encoding='utf-8') as out:
            out.write(text.translatedText + '\n')
    except Exception as e:
        # Record keywords that could not be translated
        print(e)
        with open('error_' + lange + '_' + keywords, 'a', encoding='utf-8') as err:
            err.write(line.strip() + '\n')

After testing, the three methods have the following advantages and disadvantages.
Method 1: It is the fastest, but Selenium must be installed and some technical skill is required. Google can also detect that a script is doing the translating, in which case the result is a literal translation that differs somewhat from what you get when translating by hand. It is usable if translation quality is not critical.

Method 2: It requires an overseas network connection. If that connection drops, the result falls back to a literal machine translation, which is not what I want. After running for a long time the translation also stops, so it does not meet my needs.

Method 3: This library appears to have been written by a developer in China. It calls the Google.cn interface to translate Chinese into English, whereas our need is to translate English into other less common languages, so the source code needs some modification. It also requires an overseas network connection, so I added the proxy directly in the source code. The modification is as follows.

proxies = {
    'http': '127.0.0.1:10809',
    'https': '127.0.0.1:10809',
}

    def __init__(
            self,
            target: str = 'en',
            source: str = 'auto',
            _format='html',
            user_agent: str = None,
            domain: str = 'com',
            proxies: Dict = proxies
    ):
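
A possible alternative to editing the library source is to export proxy environment variables before the translation client is created. Many Python HTTP clients, including requests, read `HTTP_PROXY` and `HTTPS_PROXY` by default. Whether pygtrans picks them up depends on how it builds its HTTP session, which is an assumption here and should be verified. The helper name `use_local_proxy` is hypothetical, and the address reuses the local port from the snippet above with an explicit `http://` scheme added.

```python
import os


def use_local_proxy(address='http://127.0.0.1:10809'):
    """Export proxy variables for the current process only."""
    # Set both spellings because tools differ in which one they read
    for name in ('HTTP_PROXY', 'HTTPS_PROXY', 'http_proxy', 'https_proxy'):
        os.environ[name] = address


use_local_proxy()
print(os.environ['HTTPS_PROXY'])
```

This keeps the installed package untouched, so upgrading it does not discard the change.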

Summary

I currently use the third method. Each copy of the script is stored in a directory whose name ends with the two-letter code of the target language. Run the script directly and it automatically uses that directory name as the target language.
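
The directory convention can be sketched as a small function that mirrors the `[-2:]` slice in the Method 3 script. The function name and the directory names `keywords`, `vi` and `lang_fr` are hypothetical examples, not the author's actual layout.

```python
import os


def lang_from_script_path(script_path):
    """Return the last two characters of the directory containing the script."""
    directory = os.path.dirname(os.path.abspath(script_path))
    return os.path.basename(directory)[-2:]


print(lang_from_script_path(os.path.join('keywords', 'vi', 'translate.py')))       # vi
print(lang_from_script_path(os.path.join('keywords', 'lang_fr', 'translate.py')))  # fr
```

Resolving the absolute path first means the result does not depend on the directory the script is launched from.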

Origin blog.csdn.net/cll_869241/article/details/122167183