Web crawler library: urllib

1. urllib.request.urlopen

import urllib.request

# Positional arguments: url, data=None (a plain GET), timeout=2 seconds
response = urllib.request.urlopen("https://movie.douban.com/", None, 2)

# decode: bytes -> str; encode: str -> bytes
html = response.read().decode("utf-8")

# print(html)
with open("html.txt", "w", encoding="utf-8") as file:
    file.write(html)

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

To send parameters through data, build them as a dictionary and convert that dictionary to bytes with urllib.parse (see section 4). When data is not None, the request is sent as a POST instead of a GET.
This article explains it well: https://www.jianshu.com/p/4c3e228940c8
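As a rough sketch of the conversion described above, the snippet below turns a dictionary into bytes and shows how the presence of data changes the HTTP method. The field names and the example.com URL are placeholders, not taken from the article. A Request object (covered in the next section) is used only because it can be inspected without any network access.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields, used only for illustration.
params = {"keyword": "python", "page": 1}

# dict -> percent-encoded str -> bytes
query = urllib.parse.urlencode(params)
body = query.encode("utf-8")

# Nothing is sent here; the objects are only built and inspected.
get_req = urllib.request.Request("https://example.com/search")
post_req = urllib.request.Request("https://example.com/search", data=body)

print(query)                  # keyword=python&page=1
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Passing the same body to urlopen(url, data=body) would send the request as a POST in the same way.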

2. urllib.request.Request

import urllib.request

url = "https://movie.douban.com/"
# Browser-like headers; the adjacent string literals are joined into one User-Agent value
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
                         " AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/75.0.3770.90 "
                         "Safari/537.36",
           "Referer": "https://movie.douban.com/",
           "Connection": "keep-alive"}
req = urllib.request.Request(url, headers=headers)
print(req)
html = urllib.request.urlopen(req).read().decode("utf-8")
with open("htmlhtml.txt", "w", encoding="utf-8") as file:
    file.write(html)

If the request needs no extra parameters, you can call urllib.request.urlopen() directly with the URL. If the request needs further customization, such as headers, build a urllib.request.Request object first.
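To see what a Request object adds, the sketch below builds one without sending it and inspects its headers. The URL and the "demo-client/0.1" User-Agent value are made up for illustration. It also prints the default headers that an opener sends when no custom User-Agent is supplied.

```python
import urllib.request

url = "https://example.com/"  # placeholder URL, not from the article
req = urllib.request.Request(url, headers={"User-Agent": "demo-client/0.1"})

# Headers can also be added after construction.
req.add_header("Referer", url)

# Request stores header names via str.capitalize(), so "User-Agent"
# is kept as "User-agent" and must be looked up with that spelling.
user_agent = req.get_header("User-agent")
referer = req.get_header("Referer")
exact_case_found = req.has_header("User-Agent")

# Without a custom header, urllib identifies itself as Python-urllib/<version>.
default_headers = urllib.request.build_opener().addheaders

print(user_agent)        # demo-client/0.1
print(referer)           # https://example.com/
print(exact_case_found)  # False
print(default_headers)
```

Some sites reject the default Python-urllib identifier, which is one common reason to pass browser-like headers through Request.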

3. Use of Cookies

Cookies are mainly used to keep a user's login state. After a user logs in, the server issues cookies that carry the login status. These cookies can be saved to a local file, and the next time the program runs it can read the cookie file directly to log in again. This is especially useful for sites with complicated logins, such as CAPTCHA or SMS verification, because it avoids repeating the login.

urllib provides HTTPCookieProcessor() to handle cookies, while reading and writing cookie files is done by MozillaCookieJar(). The following example writes cookies to a file:

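import urllib.request
from http import cookiejar

filename = 'cookie.txt'
# MozillaCookieJar stores cookies and can save them to a file
cookie = cookiejar.MozillaCookieJar(filename)
# HTTPCookieProcessor creates the cookie handler
handler = urllib.request.HTTPCookieProcessor(cookie)
# Build a custom opener that uses the handler
opener = urllib.request.build_opener(handler)
# Open the page with the opener's open() method
response = opener.open('https://movie.douban.com/')
# Save the cookies to the file (session cookies are skipped unless ignore_discard=True)
cookie.save()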

In this code, cookiejar is the module that handles HTTP cookies automatically, and MozillaCookieJar() writes cookies to a file. When the program runs, it first creates a MozillaCookieJar() object, passes that object to HTTPCookieProcessor(), and builds an opener from the resulting handler. Finally, the opener accesses the URL, and the cookies generated during the visit are written to the text file by cookie.save().
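What the cookie handler does around each request can be imitated offline. HTTPCookieProcessor calls the jar's extract_cookies() on every response and add_cookie_header() on every outgoing request. The sketch below calls those two methods by hand. FakeResponse, the example.com URLs, and the sid cookie are hypothetical stand-ins for a real server exchange.

```python
import email.message
import urllib.request
from http import cookiejar


class FakeResponse:
    """Stand-in for an HTTP response; CookieJar only needs info()."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


jar = cookiejar.CookieJar()

# Step 1: a response arrives and its Set-Cookie header is stored in the jar.
first = urllib.request.Request("https://example.com/login")
jar.extract_cookies(FakeResponse("sid=abc123; Path=/"), first)

# Step 2: before the next request goes out, matching cookies are attached.
second = urllib.request.Request("https://example.com/profile")
jar.add_cookie_header(second)

stored = [(c.name, c.value) for c in jar]
cookie_header = second.get_header("Cookie")
print(stored)         # [('sid', 'abc123')]
print(cookie_header)  # sid=abc123
```

With a real opener these two steps happen automatically, which is why logging in once through the opener is enough for later requests made with it.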

Read cookies

import urllib.request
from http import cookiejar

filename = 'cookie.txt'
# MozillaCookieJar stores the cookies
cookie = cookiejar.MozillaCookieJar(filename)
# Load the cookie file contents into the jar
cookie.load(filename)
# HTTPCookieProcessor creates the cookie handler
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
# Open the page with the opener's open() method
response = opener.open('https://movie.douban.com/')
# Print the cookies held in the jar
print(cookie)
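One detail that often surprises people when saving and loading cookie files: by default, save() and load() skip session cookies (those with no expiry time), and login cookies are frequently of that kind. The sketch below shows the difference offline with two hand-built cookies. The make_cookie helper, the cookie names, and the example.com domain are assumptions for illustration. Real cookies would normally come from a server response.

```python
import os
import tempfile
import time
from http import cookiejar


def make_cookie(name, value, expires):
    """Build a cookie by hand; expires=None makes it a session cookie."""
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def round_trip(ignore_discard):
    """Save two cookies, reload the file, and return the surviving names."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cookie.txt")
        jar = cookiejar.MozillaCookieJar(path)
        jar.set_cookie(make_cookie("persistent", "1", int(time.time()) + 3600))
        jar.set_cookie(make_cookie("session", "2", None))
        jar.save(ignore_discard=ignore_discard)

        loaded = cookiejar.MozillaCookieJar(path)
        loaded.load(ignore_discard=ignore_discard)
        return sorted(c.name for c in loaded)


default_names = round_trip(ignore_discard=False)
all_names = round_trip(ignore_discard=True)
print(default_names)  # ['persistent']
print(all_names)      # ['persistent', 'session']
```

If a saved cookie file turns out to be empty after a successful login, passing ignore_discard=True to both save() and load() is worth trying.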

4. urllib.parse

When a request sends data to the server, the content has to be encoded first. urllib.parse.urlencode() converts a mapping or a sequence of two-element tuples, whose items may be str or bytes objects, into a percent-encoded ASCII text string. If that string is to be used as POST data, it must then be encoded to bytes; otherwise urlopen() raises a TypeError. A POST request is sent with urllib as follows:

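import urllib.request
import urllib.parse

url = 'https://movie.douban.com/'
data = {'value': 'true'}
# Convert the dict to a percent-encoded string, then to bytes
data = urllib.parse.urlencode(data).encode('utf-8')
# Passing data makes urlopen() send a POST request
response = urllib.request.urlopen(url, data=data)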

In this code, urllib.parse.urlencode(data) converts the dictionary into a percent-encoded string, and encode('utf-8') converts that string to bytes. Note that the encoding should match the one the site uses. urlencode() only converts the format of the request parameters.
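A small offline sketch may help show what the conversion produces and where the character encoding actually matters. The field names and values below are invented for illustration. For non-ASCII text, the percent-escapes depend on the encoding argument given to urlencode(), which defaults to UTF-8.

```python
import urllib.parse

# Hypothetical query; the space and "&" must be escaped to survive transport.
fields = {"q": "a b&c", "page": 2}
encoded = urllib.parse.urlencode(fields)
decoded = urllib.parse.parse_qs(encoded)
print(encoded)  # q=a+b%26c&page=2
print(decoded)  # {'q': ['a b&c'], 'page': ['2']}

# Non-ASCII text: the escapes differ between character encodings.
title = {"title": "电影"}
as_utf8 = urllib.parse.urlencode(title)                 # UTF-8 by default
as_gbk = urllib.parse.urlencode(title, encoding="gbk")  # for a GBK site
print(as_utf8)  # title=%E7%94%B5%E5%BD%B1
print(as_gbk)

# Both results are plain ASCII, so this step only changes str to bytes.
body = as_gbk.encode("ascii")
```

For a site that expects a non-UTF-8 charset, the choice is therefore made in the urlencode() call rather than in the final .encode() step.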

Origin blog.csdn.net/weixin_42233120/article/details/101376389