2016.12.27
I heard that Python has a good role in crawling before learning Python. Next, what is a reptile?
A web crawler is a program, mainly used by search engines, that reads all the content and links of a website, builds a relevant full-text index into a database, and then jumps to another website, looking like a big spider.
1. What is JSON?
JSON stands for JavaScript Object Notation
JSON is a syntax for storing and exchanging textual information, similar to XML
JJSON is smaller, faster and easier to understand than xml.
JSON is a lightweight text data interchange format
JJSON is language independent.
JSON is self-describing and easier to understand.
2. Two common HTTP methods are: GET and POST
What is HTTP?
The Hypertext Transfer Protocol (HTTP) was designed to ensure communication between clients and servers, and HTTP works as a request-response protocol between clients and servers.
GET - request data from the specified resource
POST - Submits data to be processed to the specified resource.
Here's a piece of code in Python that results in a class that translates your input.
import urllib.request import urllib.parse import json content = input("Please enter the content to be translated: ") url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=http://www.youdao.com/" data = {} data['type'] = 'AUTO' data['i'] = content data['doctype'] = 'json' data['xmlVersion'] = '1.6' data['keyfrom'] = 'fanyi.web' data['ue'] = 'UTF-8' data['typoResult'] = 'true' data = urllib.parse.urlencode(data).encode('utf-8') response = urllib.request.urlopen(url, data) html = response.read().decode('utf-8') target = json.loads(html) print("Translation result: %s" % (target['translateResult'][0][0]['tgt']))
There are two ways to get the status code:
The first is to use the urllib module. Here is the code for the display:
import request.urllib status=request.urllib.urlopen("http://www.jb51.net").get_code print status
The second is to use the requests module, the following is the listing code:
import requests code=requests.get("http://www.jb51.net").status_code print(code)