Cookies in web scraping

Carrying a login cookie with requests

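  1. Put the cookie string in headers and request the logged-in page directly.
  2. Take the cookie string out, split it into a dictionary, and pass that dictionary to the request through the cookies parameter.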
import requests

cookie_dict = {}  # filled from the cookie string, one key per cookie name
requests.get(url, headers=headers, cookies=cookie_dict)
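Option 2 needs the cookie string split into a dictionary first. A minimal sketch, assuming the string was copied from the browser's Cookie request header; the helper name cookie_str_to_dict is made up for illustration, and the sample values are taken from the Request Headers listed later in these notes:

```python
def cookie_str_to_dict(cookie_str):
    """Split a raw Cookie header value into a {name: value} dict."""
    return {
        # Split on the first "=" only, because cookie values may contain "=".
        pair.split("=", 1)[0].strip(): pair.split("=", 1)[1].strip()
        for pair in cookie_str.split(";")
        if "=" in pair
    }


raw_cookie = "BAIDUID=282741C42EDBBD44E074BE59757F0CFE:FG=1; delPer=0; plus_cv=1::m:d03af37f"
cookie_dict = cookie_str_to_dict(raw_cookie)
print(cookie_dict)
```

The resulting cookie_dict can then be passed as cookies=cookie_dict.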

Ternary operator

a = b if b else c  # if b is truthy then a = b, otherwise a = c
This is equivalent to:

if b:
    a = b
else:
    a = c

Dict comprehensions

In [8]: {i:i+10 for i in range(10)}
Out[8]: {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16, 7: 17, 8: 18, 9: 19}

In [9]: {i:i+10 for i in range(10) if i%2==0}
Out[9]: {0: 10, 2: 12, 4: 14, 6: 16, 8: 18}

# Example 1: merge keys that differ only in case
mcase = {'a': 10, 'b': 34, 'A': 7, 'Z': 3}
mcase_frequency = {
    k.lower(): mcase.get(k.lower(), 0) + mcase.get(k.upper(), 0)
    for k in mcase.keys()
    if k.lower() in ['a','b']
}
print(mcase_frequency)
# Output: {'a': 17, 'b': 34}

# Example 2: swap keys and values

mcase = {'a': 10, 'b': 34}
mcase_frequency = {v: k for k, v in mcase.items()}
print(mcase_frequency)
# Output: {10: 'a', 34: 'b'}

Checking which cookie takes effect

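  1. If cookie_dict has a value but the Cookie in headers is empty, the request fails.
  2. If cookie_dict is empty but the Cookie in headers has a value, the request succeeds.
  3. Conclusion: the Cookie in headers always takes precedence.
  4. Keep the request as close as possible to the Request Headers the browser sends.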
Request Headers

Accept: text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Connection: keep-alive
Cookie: BAIDUID=282741C42EDBBD44E074BE59757F0CFE:FG=1; BIDUPSID=282741C42EDBBD44E074BE59757F0CFE; PSTM=1547171768; delPer=0; BD_HOME=0; H_PS_PSSID=26522_1461_21082_28131_28267_22159; BD_UPN=12314753; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_WISE_SIDS=124610_127694_114550_109776_114744_127492_128456_120151_127471_123018_128619_118893_118876_118847_118833_118793_127181_128037_128363_107316_126995_127772_127405_127769_117330_117429_128451_128402_127836_128589_127807_127027_128790_128448_128497_128247_128005_124938_126720_128527_127871_127764_125873_128240_124030_128245_110085_124868_123289_127123_128763_127318_127226_127380_128558_127417; FEED_SIDS=400689_0111_8; plus_lsv=0124f216e0a61014; Hm_lvt_12423ecbc0e2ca965d84259063d35238=1547171778; plus_cv=1::m:d03af37f; SE_LAUNCH=5%3A25786196_0%3A25786196; Hm_lpvt_12423ecbc0e2ca965d84259063d35238=1547171910
Host: www.baidu.com
Referer: https://www.baidu.com/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
X-Requested-With: XMLHttpRequest
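To stay close to the browser, a block like the one above can be pasted into the script and converted into a headers dictionary. A minimal sketch, assuming one "Name: value" pair per line; the helper name parse_raw_headers is made up for illustration, and only a few of the headers are repeated here to keep it short:

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first ":" only, because values such as URLs contain ":".
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


raw = """
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Connection: keep-alive
Host: www.baidu.com
Referer: https://www.baidu.com/
X-Requested-With: XMLHttpRequest
"""

headers = parse_raw_headers(raw)
print(headers)
```

The resulting dictionary can be passed as headers=headers.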

How to find the login endpoint

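-The URL in the action attribute of the form

-A dictionary used as the POST data: the name attributes of the username and password input tags are the keys, and the username and password are the values

-Capture the traffic to locate the URL

-Read the POST fields from the Form Data of the captured request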
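The first approach (form action plus input names) can be sketched with the standard library's html.parser. The HTML snippet, the class name LoginFormParser, and the credentials are all hypothetical and only serve as illustration; a real login page will usually have more fields, such as hidden tokens:

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the first form's action URL and the names of all named <input> tags."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.input_names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            self.input_names.append(attrs["name"])


# Hypothetical login page, for illustration only.
html = """
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""

parser = LoginFormParser()
parser.feed(html)

credentials = {"username": "alice", "password": "secret"}  # placeholder values
# The input names become the keys of the POST data.
post_data = {name: credentials.get(name, "") for name in parser.input_names}

print(parser.action)
print(post_data)
```

The action value is often a relative path, so it has to be joined with the page URL before post_data is sent to it in a POST request.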

Reposted from blog.csdn.net/weixin_44090435/article/details/86291224