Python urllib: the urlopen() function

Python's urllib library fetches the data at a given URL, such as a web page, so that you can then parse it and extract the data you need.

I. The urlopen() function in urllib.request:

urllib.request.urlopen(url, data=None, [timeout, ]*, context=None)

  • urlopen() creates a file-like object that represents the remote URL; you then work with that object as you would with a local file to retrieve the remote data.

  • url: the location of the remote data, usually a URL string; it may also be a urllib.request.Request object.

  • data: defaults to None, in which case a GET request is sent; when data is supplied, a POST request is sent instead.

  • Of the protocols urllib implements (http, https, ftp and others), only HTTP uses the data parameter; in other words, custom data only has an effect when the URL is an http or https URL.

  • data must be byte data (typically a Python bytes object); passing a str raises TypeError.

  • data must be in a standard format; use urllib.parse.urlencode() to convert custom data into that format. The function accepts a mapping object (key/value pairs, such as a dict) or a sequence of two-element tuples, and returns a str that must then be encoded to bytes.

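  • timeout: the timeout, in seconds, for blocking operations such as the connection attempt; if it is omitted, the global default timeout is used.

  • proxies: urlopen() in Python 3 does not accept a proxies parameter (it belonged to the Python 2 urllib.urlopen()). Proxies are picked up from environment variables such as http_proxy, or configured explicitly with urllib.request.ProxyHandler and build_opener().

  • context: an ssl.SSLContext instance describing the SSL options, used for encrypted (HTTPS) transmission.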
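The rules for data can be checked without sending anything over the network. The sketch below is only illustrative: the endpoint and the form fields are made up, and it relies on Request.get_method() to show which HTTP method would be used.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, used only for illustration.
URL = "http://example.com/form"

# urlencode() accepts a mapping or a sequence of two-element tuples.
from_dict = urlencode({"q": "hello world", "lang": "en"})
from_pairs = urlencode([("q", "hello world"), ("lang", "en")])
print(from_dict)    # q=hello+world&lang=en
print(from_pairs)   # q=hello+world&lang=en

# urlencode() returns a str; encode it to bytes before using it as data.
body = from_dict.encode("utf-8")

# Request objects are built locally, so no request is sent here.
get_req = Request(URL)               # no data  -> GET
post_req = Request(URL, data=body)   # with data -> POST
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Passing either Request object (or the URL plus body) to urlopen() would then send the corresponding request.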

urlopen() returns a file-like object that provides the following methods:

  • read(), readline(), readlines(), fileno(), close(): these work exactly as they do on a file object;
  • info(): returns an http.client.HTTPMessage object holding the headers returned by the remote server;
  • getcode(): returns the HTTP status code; 200 means the request completed successfully and 404 means the URL was not found (for error statuses such as 404, urlopen() raises urllib.error.HTTPError, whose code attribute carries the status);
  • geturl(): returns the URL of the resource that was retrieved, which may differ from the requested URL after a redirect.
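A minimal sketch of these methods follows. As an assumption made purely so that it runs offline, it opens a data: URL instead of a real web page; the object returned for an http URL is used in the same way, though a data: URL carries no HTTP status code.

```python
from urllib.request import urlopen

# Assumption: a data: URL stands in for a real page so no network is needed.
DEMO_URL = "data:text/plain;charset=utf-8,first%20line%0Asecond%20line%0A"

with urlopen(DEMO_URL) as res:
    first = res.readline()      # one line, as bytes
    rest = res.read()           # everything that is left, as bytes
    headers = res.info()        # header object of the response
    final_url = res.geturl()    # URL the data was retrieved from

print(first)
print(rest)
print(headers["Content-Type"])
print(final_url == DEMO_URL)
```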

II. Experiment ①

1. Open a web page and read its entire content

# urlopen() is a function, not a class: calling it sends the request and
# returns a response object (an http.client.HTTPResponse for HTTP URLs),
# so res below is that response object, not an instance of urlopen.
from urllib.request import urlopen
res = urlopen("http://www.baidu.com")
doc = res.read()
print(doc)

# Alternative import style
import urllib.request
res = urllib.request.urlopen('http://www.baidu.com')
doc = res.read()
print(doc)
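The two snippets above assume that the request always succeeds. A more defensive variant might look like the sketch below; the helper name fetch and the 10-second default are assumptions, not part of the original example.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        # The with block closes the response even if read() raises.
        with urlopen(url, timeout=timeout) as res:
            return res.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error %s for %s" % (err.code, url))
    except URLError as err:
        # No usable answer: DNS failure, refused connection, unknown scheme...
        print("Could not reach %s: %s" % (url, err.reason))
    except socket.timeout:
        # Raised when reading the body takes longer than the timeout.
        print("Timed out after %s seconds: %s" % (timeout, url))
    return None

# Usage: doc = fetch("http://www.baidu.com")
```

Returning None on failure keeps the caller simple; re-raising the exception would be an equally reasonable design.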

2. Get the HTTP headers (encoding and format information)

from urllib.request import urlopen
res = urlopen("http://www.baidu.com")

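# info() returns an http.client.HTTPMessage object holding the headers sent by the remote server
print(res.info())

# A single response header
print(res.getheader('Content-Type'))

# The URL of the resource that was retrieved
print(res.geturl())

# The HTTP status code: 200 = request completed successfully; 404 = URL not found
print(res.getcode())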
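One practical use of the header information is choosing the right encoding when decoding the body. The sketch below is an illustration under assumptions: read_text and its utf-8 fallback are hypothetical, and a data: URL replaces a real site so that it runs offline. get_content_charset() is a method of the header object that info() returns.

```python
from urllib.request import urlopen

def read_text(res, fallback="utf-8"):
    """Decode a response body using the charset named in Content-Type."""
    # res.headers is the same header object that res.info() returns.
    charset = res.headers.get_content_charset() or fallback
    return res.read().decode(charset)

# Assumption: a data: URL replaces a real site so the sketch runs offline.
# %C3%A9 is the UTF-8 encoding of a single accented character.
with urlopen("data:text/plain;charset=utf-8,caf%C3%A9") as res:
    text = read_text(res)

print(len(text))  # 4 characters, decoded from 5 bytes
```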

3. Call an API and process the returned JSON

import json
import os
from urllib.request import urlopen

def start_pack(real_match):
    # real_match maps a version ID to the list of IPs to start it on.
    # The endpoint URLs below are kept as they appear in the original example.
    for version_id, ips in real_match.items():
        start_url = 'http://www.google.com/start.do?ips=%s&versionId=%s&operator=dw_%s' % (
            ",".join(ips), version_id, os.getlogin())
        # The with block closes each response when it is done
        with urlopen(start_url) as start_html:
            start_json = json.loads(start_html.read().decode('utf-8'))

        task_url = 'http://www.google.com/TaskId.do?task_id=%s' % start_json['object']['taskId']
        with urlopen(task_url) as task_html:
            task_json = json.loads(task_html.read().decode('utf-8'))

        package_name = task_json['object'][0]['package_name']
        if start_json['code'] == 0:
            print("package %s start succeeded!" % package_name)
        else:
            print("package %s start error!" % package_name)
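The JSON handling in this example can be tried separately from the network calls. In the sketch below, the function name summarize and both payloads are hypothetical; the payloads are only shaped the way the code above expects its responses to look.

```python
import json

def summarize(start_raw, task_raw):
    """Build the status message from two raw JSON response bodies (bytes)."""
    start_json = json.loads(start_raw.decode("utf-8"))
    task_json = json.loads(task_raw.decode("utf-8"))
    name = task_json["object"][0]["package_name"]
    # A code of 0 is treated as success, as in the example above.
    status = "succeeded" if start_json.get("code") == 0 else "error"
    return "package %s start %s!" % (name, status)

# Hypothetical payloads, not real API output.
start_raw = b'{"code": 0, "object": {"taskId": 42}}'
task_raw = b'{"object": [{"package_name": "demo-pkg"}]}'

message = summarize(start_raw, task_raw)
print(message)  # package demo-pkg start succeeded!
```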

Experiment ②

# coding=utf-8
# Python 3.4.3, OS: Windows 7 (32-bit)

'''Online translation using Youdao Translate'''

import urllib.request
import urllib.parse
import json

def translate(words):
    # Target URL
    targetURL = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null'
    # Custom form data; words is the text the user wants translated.
    # A dict is used here, but a list of tuples also works (tested).
    data = {}
    data['type'] = 'AUTO'
    data['i'] = words
    data['doctype'] = 'json'
    data['xmlVersion'] = '1.8'
    data['keyfrom'] = 'fanyi.web'
    data['ue'] = 'UTF-8'
    data['action'] = 'FY_BY_CLICKBUTTON'
    data['typoResult'] = 'true'

    # Convert the custom data into the standard format, then into bytes
    data = urllib.parse.urlencode(data).encode('utf-8')

    # Send the request (a POST, because data is supplied)
    html = urllib.request.urlopen(targetURL, data)

    # Read and decode the response
    rst = html.read().decode("utf-8")
    rst_dict = json.loads(rst)

    return rst_dict['translateResult'][0][0]['tgt']

if __name__ == "__main__":
    print("Enter q to quit")
    while True:
        words = input("Enter the word or sentence to translate:\n")
        if words == 'q':
            break
        result = translate(words)
        print("Translation: %s" % result)

Origin blog.csdn.net/sinat_38682860/article/details/103937238