Python crawler learning (1): simple cookie handling

Simple handling of cookies with the Python urllib module

1. Fetch directly (the cookies are printed to the console):

import http.cookiejar,urllib.request

cookie = http.cookiejar.CookieJar()                    # declare a CookieJar object
handler = urllib.request.HTTPCookieProcessor(cookie)   # build a cookie handler
opener = urllib.request.build_opener(handler)
response = opener.open('url')                          # open the target page ('url' is a placeholder; replace it with a real address)
for item in cookie:
    print(item.name + "=" + item.value)

2. Save the cookies to a specified file:

import http.cookiejar,urllib.request

filename = 'cookies.txt'                               # the file to save cookies to (a .txt file is typical)

cookie = http.cookiejar.MozillaCookieJar(filename)     # save in the Mozilla browser's cookies.txt format
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('url')                          # 'url' is a placeholder; replace it with a real address

cookie.save(ignore_discard=True, ignore_expires=True)  # write the cookies to the file



3. LWPCookieJar storage:

cookie = http.cookiejar.LWPCookieJar(filename)
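
For completeness, here is a minimal sketch of the full LWPCookieJar workflow; apart from the jar class, the steps match section 2. The file name 'cookies_lwp.txt' and the placeholder 'url' are only illustrative.

import http.cookiejar, urllib.request

filename = 'cookies_lwp.txt'                           # illustrative file name
cookie = http.cookiejar.LWPCookieJar(filename)         # save in LWP (libwww-perl) format instead of Mozilla format
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('url')                          # 'url' is a placeholder; replace it with a real address
cookie.save(ignore_discard=True, ignore_expires=True)  # write the cookies to the file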

Simple notes:

1. CookieJar:

An object that manages HTTP cookie values: it stores the cookies received in HTTP responses and adds them to outgoing HTTP requests.

2. MozillaCookieJar:

A subclass of CookieJar that can read cookies from and save them to disk in the Mozilla browser's cookies.txt format (see the load sketch after these notes).

3. cookie.save() parameters:

ignore_discard=True saves cookies even if they are marked to be discarded (session cookies); ignore_expires=True saves cookies even if they have already expired. If the file already exists, it is overwritten.
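
As a companion to notes 2 and 3, here is a minimal sketch of reading the saved cookies back and attaching them to a new request. It assumes a file 'cookies.txt' produced by the save step in section 2, and 'url' is again a placeholder.

import http.cookiejar, urllib.request

cookie = http.cookiejar.MozillaCookieJar()
# load the cookies written earlier by cookie.save(); the flags mirror the ones used when saving
cookie.load('cookies.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('url')                          # 'url' is a placeholder; replace it with a real address
print(response.read().decode('utf-8'))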

Reference and recommended reading: https://cuiqingcai.com/5052.html
