requests.session(): session persistence

Most of us are already familiar with sessions and have some understanding of how they work. When writing crawlers, though, we also need to know how to actually use one, and that brings us to session persistence.

First, why do we need to keep a session at all?

The Session object in the requests library persists certain parameters across requests. Put plainly: once you have logged in to a site through a session, any later request you make to other pages of that site with the same session object will by default reuse what the session already holds, such as its cookies.

This is most useful when crawling sites or apps that require you to stay logged in. Some force a login outright; others, when you are not logged in, return fake or incomplete data. Logging in again before every request is not practical, so this is where session persistence comes in: log in once, then keep that session and use it for all the other requests.
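To make the "log in once, reuse afterwards" idea concrete, here is a minimal sketch that runs entirely on your own machine. The local server, the DemoSite handler, the new_session helper and the sessionid cookie are all made up for the demonstration (they are not part of any real site); the only point is that a cookie received on one request is sent back automatically on the next request made through the same session.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a site that requires login."""

    def do_POST(self):
        # "Log in": consume the form body, then hand back a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Echo back whatever Cookie header the client sent.
        body = self.headers.get("Cookie", "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def new_session():
    s = requests.session()
    s.trust_env = False  # ignore proxy environment variables for this loopback demo
    return s


server = HTTPServer(("127.0.0.1", 0), DemoSite)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

session = new_session()
session.post(base + "/login", data={"user": "XXX", "password": "XXX"})

with_session = session.get(base + "/profile").text          # same session: cookie is sent
without_session = new_session().get(base + "/profile").text  # fresh session: no cookie

server.shutdown()
server.server_close()
print(repr(with_session), repr(without_session))
```

The second request made through the logged-in session carries the cookie without us doing anything, while a brand-new session starts with an empty cookie jar.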

Second, how do we use session persistence? Here is an example:

# requests.session(): maintains a session, letting us persist certain parameters across requests

import requests

# Instantiate the session
session = requests.session()

# Target URL
url = 'https://www.douban.com/accounts/login'

form_data = {
    'source': 'index_nav',
    'form_email': 'XXX',
    'form_password': 'XXX',
    'captcha-solution': 'stamp',
    'captcha-id': 'b3dssX515MsmNaklBX8uh5Ab:en'
}

# Set the request headers
req_header = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
}

# Send the login request through the session
response = session.post(url, headers=req_header, data=form_data)

if response.status_code == 200:

    # Visit the profile page with the same session
    url = 'https://www.douban.com/people/175417123/'

    response = session.get(url, headers=req_header)

    if response.status_code == 200:

        # Write as UTF-8 so the page saves regardless of the platform's default encoding
        with open('douban3.html', 'w', encoding='utf-8') as file:

            file.write(response.text)
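The example above passes req_header on every call. Since headers are also among the parameters a session can persist, one possible simplification is to set them once on session.headers. The sketch below uses a shortened User-Agent string and checks the result with prepare_request(), which builds the request the session would send without touching the network; treat it as an illustration, not as what the login flow above requires.

```python
import requests

session = requests.session()

# Set the header once on the session instead of passing headers= on every call.
# (Shortened User-Agent, chosen only for this illustration.)
session.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})

# prepare_request() merges the session's settings into a request without sending it.
login = session.prepare_request(
    requests.Request("POST", "https://www.douban.com/accounts/login",
                     data={"form_email": "XXX", "form_password": "XXX"})
)
profile = session.prepare_request(
    requests.Request("GET", "https://www.douban.com/people/175417123/")
)

print(login.headers["User-Agent"])
print(profile.headers["User-Agent"])
```

Both prepared requests carry the same User-Agent even though neither call mentions it.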

 


Next, how do we manually set a cookie on the session?

import requests
import time

mycookie = {"PHPSESSID": "56v9clgo1kdfo3q5q8ck0aaaaa"}
x = requests.session()
# Store a cookie on the session itself
requests.utils.add_dict_to_cookiejar(x.cookies, {"PHPSESSID": "07et4ol1g7ttb0bnjmbiqjhp43"})
# Pass a different cookie for this one request only
x.get("http://127.0.0.1:80", cookies=mycookie)
time.sleep(5)
# Capture the traffic of this later request to check whether the session cookie was added
x.get("http://127.0.0.1:80")
 



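In this way, a cookie set on the session object with requests.utils.add_dict_to_cookiejar is automatically added to every request made afterwards. The cookie passed through cookies= on the first get(), by contrast, applies to that one request only and is not stored on the session.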
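If you would rather not set up a packet capture, the same check can be sketched offline with prepare_request(), which shows the headers the session would send. The URL is the local address from the example; the cookie values "session-value" and "per-request-value" are placeholders invented here to make the two sources easy to tell apart.

```python
import requests

url = "http://127.0.0.1:80"

x = requests.session()
# Cookie stored on the session: should accompany every request.
requests.utils.add_dict_to_cookiejar(x.cookies, {"PHPSESSID": "session-value"})

# First request: a per-request cookie with the same name is passed in.
first = x.prepare_request(
    requests.Request("GET", url, cookies={"PHPSESSID": "per-request-value"})
)
# Second request: nothing passed, so only the session's cookie is available.
second = x.prepare_request(requests.Request("GET", url))

print(first.headers["Cookie"])
print(second.headers["Cookie"])
print(x.cookies.get_dict())
```

The per-request value is used for the first request only; the session's own cookie is untouched and is what later requests send.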

You can also build a CookieJar first with requests.utils.cookiejar_from_dict and then assign it to session.cookies. It also appears that session.cookies.set() or session.cookies.update() can be used.
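As a rough side-by-side of these alternatives, the sketch below sets the same cookie in three ways and prints each session's jar as a plain dict. The variable names s1, s2 and s3 are only for illustration.

```python
import requests
from requests.utils import cookiejar_from_dict

cookie = {"PHPSESSID": "07et4ol1g7ttb0bnjmbiqjhp43"}

# Option 1: build a jar from a dict and assign it to the session.
s1 = requests.session()
s1.cookies = cookiejar_from_dict(cookie)

# Option 2: set a single cookie by name and value.
s2 = requests.session()
s2.cookies.set("PHPSESSID", "07et4ol1g7ttb0bnjmbiqjhp43")

# Option 3: merge a dict into the session's existing jar.
s3 = requests.session()
s3.cookies.update(cookie)

for s in (s1, s2, s3):
    print(s.cookies.get_dict())
```

Note that option 1 replaces whatever the session's jar already contained, while options 2 and 3 add to it.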

One more thing: how to take a raw Cookie header string on its own and convert it into a dictionary.

# Convert a raw cookie string into a dictionary
cookie = "SINAGLOBAL=821034395211.0111.1522571861723; wb_cmtLike_1850586643=1; [email protected]; wb_timefeed_1850586643=1; UOR=,,login.sina.com.cn; wvr=6; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WWsNeq71O_sXkkXNnXFHgOW5JpX5KMhUgL.Fo2RSK5f1hqcShe2dJLoI0qLxK-L12qLB-zLxKqL1hnL1K2LxK-LBo5L12qLxKqL1hML1KzLxKnL1K.LB-zLxK-L1K-LBKqt; YF-V5-G0=c99031715427fe982b79bf287ae448f6; ALF=1556795806; SSOLoginState=1525259808; SCF=AqTMLFzIuDI5ZEtJyAEXb31pv1hhUdGUCp2GoKYvOW0LQTInAItM-ENbxHRAnnRUIq_MR9afV8hMc7c-yVn2jI0.; SUB=_2A2537e5wDeRhGedG7lIU-CjKzz-IHXVUm1i4rDV8PUNbmtBeLVrskW9NUT1fPIUQGDKLrepaNzTEZxZHOstjoLOu; SUHB=0IIUWsCH8go6vb; _s_tentry=-; Apache=921830614666.5322.1525261512883; ULV=1525261512916:139:10:27:921830614666.5322.1525261512883:1525239937212; YF-Page-G0=b5853766541bcc934acef7f6116c26d1"
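# Split each pair on the first "=" only (values may contain "="), and skip
# fragments that have no "=" at all, such as the "[email protected]" entry above
cookie_dict = dict(i.split("=", 1) for i in cookie.split("; ") if "=" in i)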
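The same idea can be wrapped in a small function and fed straight into a session. In the sketch below, cookie_header_to_dict is a hypothetical helper name and the raw string is a shortened, made-up sample (including one deliberately malformed fragment), not the Weibo cookie above.

```python
import requests


def cookie_header_to_dict(header):
    """Hypothetical helper: turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    result = {}
    for part in header.split(";"):
        # partition() splits on the first "=" only, so values keep any later "=".
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip fragments without "=" or without a name
            result[name] = value
    return result


raw = "wvr=6; SUB=_2A25==; malformed; ALF=1556795806"
cookies = cookie_header_to_dict(raw)

session = requests.session()
session.cookies.update(cookies)  # later requests through this session carry them

print(cookies)
```

Splitting on ";" and stripping each piece also tolerates strings where the separator is not exactly "; ".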
 

 

----------------
Disclaimer: This is an original article by CSDN blogger "Danker01", released under the CC 4.0 BY-SA license. When reposting, please include the original source link and this statement.
Original link: https://blog.csdn.net/weixin_42575020/article/details/95179840

Origin www.cnblogs.com/cangqinglang/p/11991130.html