[Python] Using Python's urllib and urllib2 modules to call the Baidu Translate API for batch automatic translation

1. Problem description

Text data often mixes several languages, such as English, Japanese, Russian, and French. Before further processing, the non-Chinese text needs to be translated into Chinese in batch, which can be done by calling Baidu's translation API directly from Python.

For full details of the API, see the Baidu Translate API documentation.


2. Solution

Development environment: Linux, Python 2

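The approach is to separate the Chinese and non-Chinese parts of the text and translate only the non-Chinese part. The script below assumes the input file already holds the text to translate: it sends each line to the API with automatic source-language detection and writes the Chinese translation to the output file.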
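The article does not show how the separation is done. One simple option is to check each line for characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The Python 3 sketch below is an assumption, not the author's method: the helper name `split_by_chinese` and the rule "a line with any ideograph counts as Chinese" are hypothetical. Japanese kanji share that Unicode block, so Japanese lines containing kanji would be classified as Chinese by this rule.

```python
import re

# CJK Unified Ideographs block; shared by Chinese hanzi and Japanese kanji
CHINESE_CHAR = re.compile(r'[\u4e00-\u9fff]')


def split_by_chinese(lines):
    """Split lines into (chinese, non_chinese) lists.

    Hypothetical helper: a line counts as Chinese if it contains at least
    one CJK ideograph; every other non-blank line is queued for translation.
    """
    chinese, non_chinese = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if CHINESE_CHAR.search(text):
            chinese.append(text)
        else:
            non_chinese.append(text)
    return chinese, non_chinese


if __name__ == '__main__':
    sample = ['你好，世界', 'Hello, world', 'Bonjour le monde', '']
    zh, other = split_by_chinese(sample)
    print(zh)     # ['你好，世界']
    print(other)  # ['Hello, world', 'Bonjour le monde']
```

The non-Chinese list can then be written to the input file that translate.py reads.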

The Python code (translate.py) is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: urllib2, reload() and the print statement do not exist in Python 3.
import sys
reload(sys)
sys.setdefaultencoding("utf8")  # make UTF-8 the default for implicit str/unicode conversions

import json  # parse the JSON response
import urllib  # urllib.quote: URL-encode the text to translate
from urllib2 import urlopen, URLError, HTTPError  # send the HTTP request

def translate(inputFile, outputFile):
	fin = open(inputFile, 'r')  # open the input file for reading
	fout = open(outputFile, 'w')  # open the output file for writing

	for eachLine in fin:  # read the input file line by line
		line = eachLine.strip()  # remove leading/trailing whitespace and the newline
		quoteStr = urllib.quote(line)  # URL-encode the line so it can go into the query string
		url = 'http://openapi.baidu.com/public/2.0/bmt/translate?client_id=WtzfFYTtXyTocv7wjUrfGR9W&q=' + quoteStr + '&from=auto&to=zh'
		outStr = line.decode('utf-8')  # fallback: output the original text if translation fails
		try:
			resultPage = urlopen(url)  # call the Baidu Translate API
			resultJson = resultPage.read().decode('utf-8')  # the result is returned as JSON
			js = json.loads(resultJson)  # convert the JSON into a Python dictionary
			if u"trans_result" in js:
				outStr = js["trans_result"][0]["dst"]  # the translated text
		except HTTPError as e:  # must come before URLError, its parent class
			print 'The server couldn\'t fulfill the request. Error code:', e.code
		except URLError as e:
			print 'We failed to reach a server. Reason:', e.reason
		except ValueError as e:  # json.loads raises ValueError on malformed JSON
			print 'loads Json error:', e
		except Exception as e:  # any other failure: keep the original line
			print 'translate error:', e

		fout.write(outStr.strip().encode('utf-8') + '\n')  # write the translation (or the original line)

	fin.close()
	fout.close()

if __name__ == '__main__':
	translate(sys.argv[1], sys.argv[2])  # input and output file names come from the command line


To run the script, enter the following on the Linux command line:

python translate.py myinput.txt myoutput.txt

The translation results are written to the output file myoutput.txt.
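translate.py runs only under Python 2. In Python 3, urllib2 was merged into urllib.request and urllib.error, and urllib.quote moved to urllib.parse.quote, so a port has to change those imports. The sketch below is one possible Python 3 port under stated assumptions: it keeps the article's endpoint and client_id unchanged (Baidu may no longer accept them), it assumes the same response shape the script reads (`trans_result[0]['dst']`), and it separates the HTTP call from the response handling so the fallback logic can be tested without network access. The names `fetch`, `extract_translation`, and `translate_file` are hypothetical.

```python
import json
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and client_id copied from the original Python 2 script (assumption: may no longer work).
API_URL = 'http://openapi.baidu.com/public/2.0/bmt/translate'
CLIENT_ID = 'WtzfFYTtXyTocv7wjUrfGR9W'


def fetch(text, src='auto', dst='zh'):
    """Send one line to the API and return the raw JSON response text."""
    query = urlencode({'client_id': CLIENT_ID, 'q': text, 'from': src, 'to': dst})
    with urlopen(API_URL + '?' + query, timeout=10) as page:
        return page.read().decode('utf-8')


def extract_translation(response_text, fallback):
    """Return trans_result[0]['dst'], or `fallback` if the response has no usable result."""
    try:
        js = json.loads(response_text)
        return js['trans_result'][0]['dst']
    except (ValueError, KeyError, IndexError, TypeError):
        return fallback


def translate_file(input_file, output_file, fetcher=fetch):
    """Translate input_file line by line; lines that fail are written unchanged."""
    with open(input_file, encoding='utf-8') as fin, \
            open(output_file, 'w', encoding='utf-8') as fout:
        for each_line in fin:
            line = each_line.strip()
            out = line
            if line:
                try:
                    out = extract_translation(fetcher(line), line)
                except OSError as e:  # URLError, HTTPError and socket timeouts are OSError subclasses
                    print('request failed:', e, file=sys.stderr)
            fout.write(out.strip() + '\n')


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print('usage: python3 SCRIPT INPUT OUTPUT', file=sys.stderr)
    else:
        translate_file(sys.argv[1], sys.argv[2])
```

Passing a different `fetcher` (for example a stub that returns canned JSON) lets the batch loop be checked offline before any real API calls are made.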


3. Notes

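(1) The first few lines of the program (reload(sys) and sys.setdefaultencoding("utf8")) are a common Python 2 idiom for avoiding Chinese encoding errors: they make UTF-8 the default encoding for implicit conversions between str and unicode. They do not work in Python 3, where reload is no longer a builtin and sys.setdefaultencoding does not exist.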

(2) In the line quoteStr = urllib.quote(line), the text to be translated is URL-encoded with the quote function, so that spaces, non-ASCII characters, and characters such as & in the text do not break the request URL.

(3) In the "&from=auto&to=zh" part of the URL, from is followed by the code of the source language and to by the code of the target language. For example, zh means Chinese, en means English, and auto lets the API detect the source language automatically.
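Notes (2) and (3) can be checked with a short Python 3 sketch (in Python 3, urllib.quote became urllib.parse.quote). It shows why quoting matters, since an unescaped space or & in the text would be read as part of the query string, and it uses urlencode, which encodes every parameter and joins them, so the from/to codes can be passed as ordinary arguments. The helper `build_query` is hypothetical; it builds only the q/from/to part and leaves out the client_id.

```python
from urllib.parse import quote, urlencode


def build_query(text, src='auto', dst='zh'):
    """Build the q/from/to part of the query string (hypothetical helper).

    urlencode percent-encodes each value as UTF-8 and turns spaces into '+',
    so characters such as '&' and '=' in the text cannot break the URL.
    """
    return urlencode({'q': text, 'from': src, 'to': dst})


if __name__ == '__main__':
    print(quote('Tom & Jerry'))              # Tom%20%26%20Jerry
    print(build_query('Tom & Jerry'))        # q=Tom+%26+Jerry&from=auto&to=zh
    print(build_query('Hello', 'en', 'zh'))  # q=Hello&from=en&to=zh
```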



Hope it helps everyone, thank you.
