A Python web crawler for Youdao translation

2016.12.27

 

Even before I started learning Python, I had heard that it is well suited to web crawling. So what is a crawler?

A web crawler is a program, used mainly by search engines, that reads the content and links of a website, stores a full-text index of it in a database, and then follows the links to other websites, much like a spider moving across its web.

 

1. What is JSON?

JSON stands for JavaScript Object Notation.

JSON is a syntax for storing and exchanging text data, similar to XML.

JSON is smaller, faster, and easier to understand than XML.

JSON is a lightweight text data interchange format.

JSON is language independent.

JSON is self-describing and easy to read.
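
To make the points above concrete, here is a minimal sketch using the json module from Python's standard library. The record (its name, pages, and tags fields) is invented purely for illustration.

```python
import json

# A hypothetical record, invented for illustration only.
record = {"name": "crawler", "pages": 3, "tags": ["python", "web"]}

# Serialize the Python dict to JSON text.
text = json.dumps(record)
print(text)

# Parse the JSON text back into Python objects.
restored = json.loads(text)
print(restored["tags"][0])
```

The round trip gives back an equal dict, which is what makes JSON convenient for passing data between a crawler and a web service.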

2. Two common HTTP methods: GET and POST

What is HTTP?

The Hypertext Transfer Protocol (HTTP) was designed to enable communication between clients and servers. It works as a request-response protocol: the client sends a request and the server returns a response.

GET - Requests data from the specified resource.

POST - Submits data to be processed to the specified resource.
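
The difference between the two methods can be sketched with urllib.request.Request, which builds a request without sending it. The URL and the q parameter below are placeholders chosen for illustration, not part of any real service.

```python
import urllib.parse
import urllib.request

# Placeholder URL and form field, used only for illustration.
url = "http://example.com/search"
params = {"q": "python"}

# GET: the parameters travel in the URL's query string.
get_request = urllib.request.Request(url + "?" + urllib.parse.urlencode(params))

# POST: the parameters travel in the request body, which must be bytes.
body = urllib.parse.urlencode(params).encode("utf-8")
post_request = urllib.request.Request(url, data=body)

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```

With urllib, supplying a data argument is what switches a request from GET to POST.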

Here is a short Python script that sends your input to the Youdao translation endpoint in a POST request and prints the translated text.

import urllib.request
import urllib.parse
import json

content = input("Please enter the content to be translated: ")

url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=http://www.youdao.com/"

# Form fields sent in the POST body.
data = {}
data['type'] = 'AUTO'
data['i'] = content
data['doctype'] = 'json'
data['xmlVersion'] = '1.6'
data['keyfrom'] = 'fanyi.web'
data['ue'] = 'UTF-8'
data['typoResult'] = 'true'
# urlopen() expects the POST body as bytes, so URL-encode the form and encode it as UTF-8.
data = urllib.parse.urlencode(data).encode('utf-8')

# Passing data makes urlopen() send a POST request instead of a GET.
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
# The response body is JSON text; parse it into Python objects.
target = json.loads(html)

print("Translation result: %s" % (target['translateResult'][0][0]['tgt']))
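
The indexing in the last line assumes the response always has the expected shape. As a hedged sketch, the helper below collects every tgt value and raises a clear error when the translateResult key is missing. The name extract_translation is hypothetical, and the sample payload is invented to mirror the structure the script indexes into; it was not captured from the real service.

```python
import json


def extract_translation(payload):
    """Join every 'tgt' value found under 'translateResult'."""
    if "translateResult" not in payload:
        raise ValueError("unexpected response: %r" % (payload,))
    # Assumed shape: a list of lists of dicts, as implied by the
    # [0][0]['tgt'] indexing in the script above.
    return "\n".join(
        "".join(segment["tgt"] for segment in group)
        for group in payload["translateResult"]
    )


# Invented sample, shaped like the structure the script indexes into.
sample = json.loads('{"translateResult": [[{"src": "你好", "tgt": "hello"}]]}')
print(extract_translation(sample))
```

In the script, print(extract_translation(target)) could then stand in for the direct indexing.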

There are two ways to get the HTTP status code of a response:

The first is to use the urllib module from the standard library. Here is the code:

import urllib.request
status = urllib.request.urlopen("http://www.jb51.net").getcode()
print(status)

The second is to use the third-party requests module. Here is the code:

import requests
code=requests.get("http://www.jb51.net").status_code
print(code)
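
One caveat is worth a sketch: urllib.request.urlopen raises urllib.error.HTTPError for error statuses such as 404 instead of returning a response, whereas requests.get returns the response either way. The function below returns the code in both cases. The name fetch_status is hypothetical, and the opener parameter is an assumption added only so the function can be tried without a network connection.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error statuses."""
    try:
        with opener(url) as response:
            return response.getcode()
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for statuses such as 404 or 500;
        # the status code is available as error.code.
        return error.code


# Example call (needs network access):
# print(fetch_status("http://www.jb51.net"))
```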

 

 
