Python - use urllib.request module

The official documentation: http://cn.python-requests.org/zh_CN/latest/

Fetching pages with urllib.request

The simplest way to use it is urllib.request.urlopen, which is called as follows:
urllib.request.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])
According to the official documentation, urllib.request.urlopen can open URLs for the HTTP, HTTPS and FTP protocols; it is mainly used with HTTP.
The parameters whose names begin with ca relate to certificate verification and are rarely used.
The data parameter is how data is supplied when submitting a POST request to the URL.
Most of the time only the url and timeout parameters are used.
The url parameter is the full network address (the protocol name is required, the trailing port is optional, e.g. http://192.168.1.1:80), and timeout is how long to wait for the request before it times out.
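
For illustration, a minimal sketch (not from the original post) of the two common calling patterns: a plain GET with only url and timeout, and a POST made by passing the data parameter (the URLs below are placeholder examples):

import urllib.parse
import urllib.request

# GET: only url and timeout are supplied
response = urllib.request.urlopen('http://www.example.com', timeout=3)
html = response.read().decode('utf-8')

# POST: supplying the data parameter (as bytes) turns the request into a POST
post_data = urllib.parse.urlencode({'key': 'value'}).encode('utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=post_data, timeout=3)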

  • The object returned by the function has three additional methods in common use (see the short sketch after this list):
  1. geturl() returns the URL of the response; commonly used when the requested URL was redirected.
  2. info() returns the basic meta-information of the response.
  3. getcode() returns the status code of the response; the most common codes are 200 (the server returned the page successfully), 404 (the requested page does not exist) and 503 (the server is temporarily unavailable).
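
As a quick sketch before the full example (reusing the Baidu URL from the code below):

import urllib.request

response = urllib.request.urlopen('http://www.baidu.com', timeout=3)
print(response.geturl())    # URL of the response (after any redirect)
print(response.getcode())   # HTTP status code, e.g. 200
print(response.info())      # basic meta-information of the response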

[Example D]

import urllib.request

__author__ = 'ling.'

def linkBaidu():
    # Target URL
    url = 'http://www.baidu.com'

    # A request made through urllib carries a default header, User-Agent: Python-urllib/version,
    # which tells the server the request was sent by urllib. Some sites validate the user-agent,
    # so we forge our own headers; this requires the urllib.request.Request object.
    headers = {'User-Agent': 'Mozilla/5.0'}  # example browser-like user-agent

    try:
        req = urllib.request.Request(url, headers=headers)

        # Send the request to the given url and get back a file-like object holding the
        # server's response; if the request takes longer than the timeout we set,
        # a timeout error is raised.
        response = urllib.request.urlopen(req, timeout=3)

        # response.read() returns the response body as bytes; decode it as utf-8
        result = response.read().decode('utf-8')
    except Exception as e:
        print("Network address error")
        exit()

    with open('baidu.txt', 'w') as fp:
        fp.write(result)

    print("URL info      : response.geturl()  : %s" % response.geturl())
    print("Return code   : response.getcode() : %s" % response.getcode())
    print("Response info : response.info()    : %s" % response.info())
    print("The fetched page content has been saved to baidu.txt in the current directory; please check it yourself.")

if __name__ == '__main__':
    linkBaidu()

The same code without comments:

import urllib.request

__author__ = 'ling.'

def linkBaidu():
    url = 'http://www.baidu.com'
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        req = urllib.request.Request(url, headers=headers)
        response = urllib.request.urlopen(req, timeout=3)
        result = response.read().decode('utf-8')
    except Exception as e:
        print("Network address error")
        exit()
    with open('baidu.txt', 'w') as fp:
        fp.write(result)
    print("URL info      : response.geturl()  : %s" % response.geturl())
    print("Return code   : response.getcode() : %s" % response.getcode())
    print("Response info : response.info()    : %s" % response.info())
    print("The fetched page content has been saved to baidu.txt in the current directory; please check it yourself.")

if __name__ == '__main__':
    linkBaidu()