[Python web crawler] Notes on the paid course "150 Lectures to Easily Master Python Web Crawlers", Part 5: Loading and saving cookies

1. Saving

Introduce a new class:

MozillaCookieJar ()

MozillaCookieJar is derived from FileCookieJar. It creates a FileCookieJar instance that reads and writes the cookies.txt format used by the Mozilla browser.
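As a quick check of that inheritance chain, the short sketch below prints the method resolution order of MozillaCookieJar. It uses only classes from the standard library's http.cookiejar module.

```python
from http.cookiejar import CookieJar, FileCookieJar, MozillaCookieJar

# Walk the inheritance chain from the most specific class to object
class_names = [cls.__name__ for cls in MozillaCookieJar.__mro__]
print(class_names)

# MozillaCookieJar can be used anywhere a plain CookieJar is expected,
# for example as the argument to request.HTTPCookieProcessor
print(issubclass(MozillaCookieJar, FileCookieJar))
print(issubclass(FileCookieJar, CookieJar))
```

Because MozillaCookieJar is still a CookieJar, it behaves like any other jar in an opener and only adds the ability to save to and load from a file.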

from urllib import request
from http.cookiejar import MozillaCookieJar

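# Save
cookiejar = MozillaCookieJar('cookie.txt')
handler = request.HTTPCookieProcessor(cookiejar)
opener = request.build_opener(handler)
# httpbin responds by setting a cookie named "course" with the value "abc"
resp = opener.open('http://www.httpbin.org/cookies/set/course/abc')

# Write the jar to cookie.txt using the default arguments
cookiejar.save()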

We create a cookiejar with the MozillaCookieJar() class, then create the handler and the opener in turn. We then open the URL of the website we want to request, and finally save the cookies we received.

However, we found that the saved file did not contain the cookie information we wanted. Why?

The file name can be passed to save(), or it can be given when the MozillaCookieJar() is created.
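The two ways of supplying the file name can be sketched as follows. The temporary folder and the names a.txt and b.txt are only illustrative; the jars are left empty, so each file contains just the header.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

folder = tempfile.mkdtemp()
path_a = os.path.join(folder, "a.txt")  # illustrative file names
path_b = os.path.join(folder, "b.txt")

# Option 1: give the file name when the jar is created
jar_a = MozillaCookieJar(path_a)
jar_a.save()

# Option 2: create the jar without a name and pass it to save()
jar_b = MozillaCookieJar()
jar_b.save(path_b)

# With no file name in either place, save() raises ValueError
try:
    MozillaCookieJar().save()
    missing_name_error = None
except ValueError as exc:
    missing_name_error = exc

print(os.path.exists(path_a), os.path.exists(path_b))
print(type(missing_name_error).__name__)
```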

Looking at the source code of this save function, we found:

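The ignore_discard and ignore_expires parameters of save() both default to False.

ignore_discard=False means that cookies marked to be discarded (session cookies, which have no expiry date) are not saved.

ignore_expires=False means that cookies which have already expired are not saved.

The cookie set by httpbin in this request has no expiry date, so it is treated as a session cookie and the default save() skips it.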

Therefore, when we need to save the cookies of a website we have logged in to, it is best to set both parameters to True, so that the cookies we need are saved even if they are marked to be discarded or have expired. That is the following line of code:

cookiejar.save(ignore_discard=True,ignore_expires=True)

Note what the two parameters do:

ignore_discard=True means that cookies are saved even if they are set to be discarded.

ignore_expires=True means that cookies are saved even if they have expired. In either case, the file is overwritten if it already exists.
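The effect of these flags can be reproduced without a network request. The sketch below is an illustration under assumptions: it builds a session cookie by hand with http.cookiejar.Cookie (the helper names make_session_cookie and saved_text are hypothetical), saves it once with the defaults and once with both flags set, and compares the resulting files.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_session_cookie(name, value):
    """Build a cookie with no expiry date, so it is flagged discard=True."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="www.httpbin.org", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def saved_text(**save_kwargs):
    """Save a jar holding one session cookie and return the file contents."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
    jar = MozillaCookieJar(path)
    jar.set_cookie(make_session_cookie("course", "abc"))
    jar.save(**save_kwargs)
    with open(path) as f:
        return f.read()


default_text = saved_text()
full_text = saved_text(ignore_discard=True, ignore_expires=True)

# The default save writes only the file header; the session cookie is skipped
print("course\tabc" in default_text)
# With both flags set, the cookie line is written
print("course\tabc" in full_text)
```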

 

2. Loading

Loading cookies means reading them back from the saved file with load(). The rest of the code is similar to saving.

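# Load

cookiejar = MozillaCookieJar('cookie.txt')
# load() takes the same flags as save(); without them, session cookies in the file are skipped
cookiejar.load(ignore_discard=True, ignore_expires=True)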
handler = request.HTTPCookieProcessor(cookiejar)
opener = request.build_opener(handler)
resp = opener.open('http://www.httpbin.org/cookies/set/course/abc')
for cookie in cookiejar:
    print(cookie)
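Since load() accepts the same ignore_discard and ignore_expires parameters as save(), a session cookie that was written to disk is only read back when ignore_discard=True is passed again. A minimal offline sketch of that round trip, using a hand-built cookie and a temporary file (both illustrative assumptions):

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar

path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

# Write one session cookie (no expiry date, discard=True) to disk
saved = MozillaCookieJar(path)
saved.set_cookie(Cookie(
    version=0, name="course", value="abc",
    port=None, port_specified=False,
    domain="www.httpbin.org", domain_specified=False,
    domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))
saved.save(ignore_discard=True)

# Default load(): the session cookie in the file is skipped
default_jar = MozillaCookieJar(path)
default_jar.load()

# load(ignore_discard=True): the session cookie is read back
full_jar = MozillaCookieJar(path)
full_jar.load(ignore_discard=True)

print(len(default_jar), len(full_jar))
print([(c.name, c.value) for c in full_jar])
```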

 


Origin blog.csdn.net/weixin_44566432/article/details/108559759