Python network requests --- the urllib library

The urllib library:
urllib is a basic network request library built into Python. It can simulate a browser sending a request to a specified server and retrieve the data the server returns.
urllib is part of Python's standard library, so it can be used directly without installing anything.
urlopen function: in Python 3's urllib library, all of the network-request-related methods are collected in the urllib.request module. Let's look at some basic usage of the urlopen function:
The urlopen function in detail
urlopen creates a file-like object representing the remote URL; you can then read the remote data as if operating on a local file.
url: the URL to request.
data: the request body; if this value is set, the request becomes a POST request.
Return value: an http.client.HTTPResponse object, which is a file-like object with methods such as read(size), readline, readlines, and getcode.
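A minimal sketch of urlopen in action, using the Baidu home page requested elsewhere in this post (passing a bytes data= argument would instead send a POST request):

```python
from urllib import request

# Open the remote URL; the result is a file-like HTTPResponse object
resp = request.urlopen('http://www.baidu.com/')

print(resp.getcode())    # HTTP status code
print(resp.readline())   # first line of the response body (bytes)
print(resp.read(100))    # next 100 bytes of the body
```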
urlretrieve function:
This function makes it easy to save a file from the web to the local machine. For example, the following code downloads the Baidu home page to a local file:

from urllib import request

# Download the Baidu home page and save it locally as baidu.html
r = request.urlretrieve('http://www.baidu.com/', 'baidu.html')
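urlretrieve also returns a (local_filename, headers) tuple, which is handy for checking what was saved; a small sketch using the same Baidu URL:

```python
from urllib import request

# urlretrieve returns the local filename and the response headers
filename, headers = request.urlretrieve('http://www.baidu.com/', 'baidu.html')

print(filename)                      # the local path the file was saved to
print(headers.get('Content-Type'))   # the server-reported content type
```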

urlencode function
urlencode converts dictionary data into URL-encoded query-string data:

from urllib import parse

# Define a dictionary of data
data = {'name': '光头强', 'age': '18'}
# Convert the data with the parse.urlencode method
qs = parse.urlencode(data)
print(qs)
***
# The non-ASCII name comes back percent-encoded
name=%E5%85%89%E5%A4%B4%E5%BC%BA&age=18
from urllib import request, parse

# Create a dictionary of data
data = {'word': '古天乐'}
# Build the query string with parse.urlencode
result = parse.urlencode(data)
# Splice it into the URL with .format
url = 'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&{}'.format(result)
# Fetch the page source
html = request.urlopen(url)
# Print the page source, decoding it as utf-8
print(html.read().decode('utf-8'))

***
# The result is the page's HTML source

parse_qs function
parse_qs can decode URL-encoded query parameters back into a dictionary:

from urllib import parse

data = {'name': '古天乐', 'age': 18}
result = parse.urlencode(data)  # Encode the data in the dictionary
print(result)  # The result is the URL-encoded query string
####
name=%E5%8F%A4%E5%A4%A9%E4%B9%90&age=18
result_list = parse.parse_qs(result)  # Decode the encoded query string back
print(result_list)  # The result is a dictionary (with list values)
###
{'name': ['古天乐'], 'age': ['18']}

urlparse function and urlsplit function
Sometimes, given a URL, you want to split it into its component parts; urlparse or urlsplit can do that.

urlparse and urlsplit are almost exactly the same.
The only difference: the result of urlparse has a params attribute, while the result of urlsplit does not.

from urllib import parse

url = 'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%E5%8F%A4%E5%A4%A9%E4%B9%90'

result = parse.urlparse(url)
print(result)  # Split the URL into its component parts

result_list = parse.urlsplit(url)
print(result_list)  # Also splits the URL; the only difference is that the result lacks the params attribute
#####
ParseResult(scheme='http', netloc='image.baidu.com', path='/search/index', params='', query='tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%E5%8F%A4%E5%A4%A9%E4%B9%90', fragment='')
SplitResult(scheme='http', netloc='image.baidu.com', path='/search/index', query='tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%E5%8F%A4%E5%A4%A9%E4%B9%90', fragment='')
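The named tuples shown above expose each component as an attribute; a quick sketch with a shortened query string (no network needed):

```python
from urllib import parse

url = 'http://image.baidu.com/search/index?tn=baiduimage&word=abc'

result = parse.urlparse(url)
# Each URL component is available as an attribute of the result
print(result.scheme)   # 'http'
print(result.netloc)   # 'image.baidu.com'
print(result.path)     # '/search/index'
print(result.params)   # '' -- present on ParseResult, absent on SplitResult
print(result.query)    # 'tn=baiduimage&word=abc'
```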

from urllib import parse

url = 'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%E5%8F%A4%E5%A4%A9%E4%B9%90'

result = parse.urlparse(url)
print(result)  # Split the URL into its component parts
data = result.query
data_list = parse.parse_qs(data)   # Decode the encoded query string
print(data_list)
####
ParseResult(scheme='http', netloc='image.baidu.com', path='/search/index', params='', query='tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%E5%8F%A4%E5%A4%A9%E4%B9%90', fragment='')
{'tn': ['baiduimage'], 'ps': ['1'], 'ct': ['201326592'], 'lm': ['-1'], 'cl': ['2'], 'nc': ['1'], 'ie': ['utf-8'], 'word': ['古天乐']}

The request.Request class
If we want to add request headers (for example, a User-Agent) to a request, we must use the request.Request class to build the request:

from urllib import request

url = 'https://maoyan.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
    'Referer': 'https://maoyan.com/'
}
# Build a Request object carrying the custom headers, then open it
req = request.Request(url, headers=headers)
html = request.urlopen(req)
print(html.read().decode('utf-8'))
Origin blog.csdn.net/qq_37662827/article/details/102951559