[Python Crawler Road Day 3]: basics of the requests library

Earlier we learned to crawl page source code with the urllib library; today we introduce the more user-friendly requests library.


import requests

response=requests.get("https://baidu.com/")
print(response.text)  # get the source code, method 1
print(response.content.decode("utf-8"))  # get the source code, method 2

**Note:** response.content is the raw, undecoded source code, of type bytes; response.text is decoded with an encoding that the requests library guesses on its own, which can sometimes produce garbled output. In that case, use response.content.decode("utf-8") instead.

Basic properties:

print(response.url)  # the URL of the request
print(response.encoding)  # the character encoding requests chose from the response headers
print(response.status_code)  # the HTTP status code
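To see why response.text can come out garbled, here is a minimal local sketch (no network needed; it builds a Response by hand via the internal _content attribute, which is an implementation detail used here only for demonstration):

```python
import requests

# Simulate a server that sends UTF-8 bytes but declares the wrong charset.
resp = requests.models.Response()
resp._content = "深圳".encode("utf-8")  # the raw bytes of the page
resp.encoding = "ISO-8859-1"           # a wrong encoding guess

print(resp.text)                       # garbled (mojibake)
print(resp.content.decode("utf-8"))    # 深圳 — decoding explicitly fixes it
```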
# A GET request passes parameters via params; a POST request passes them via data.
Note: requests can be used like this:
1. requests.get("https://baidu.com/")
2. requests.post("https://baidu.com/")

Example 1: fetch the source code of a Baidu search for "深圳" (GET request example):

params={"wd":"深圳"}
headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}
response=requests.get("https://baidu.com/s",params=params,headers=headers)
with open("baidu.html","w",encoding='utf-8') as fp:  # save to a local file
    fp.write(response.content.decode("utf-8"))
print(response.url)
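The URL printed above contains the percent-encoded form of the query. You can inspect this encoding without sending anything, by preparing the request instead of sending it (a local sketch using requests' Request/PreparedRequest API):

```python
import requests

# Prepare (but do not send) the GET request to see how `params`
# are percent-encoded into the query string.
req = requests.Request("GET", "https://baidu.com/s", params={"wd": "深圳"})
prepared = req.prepare()
print(prepared.url)  # https://baidu.com/s?wd=%E6%B7%B1%E5%9C%B3
```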

Example 2: fetch the "python" job listings from Lagou (POST request example):

import requests
data={"first": "true" ,                                                                                                                                
      "pn": "1"   ,                                                                                                                                    
      "kd": "python" }                                                                                                                                 
headers={"Referer":" https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=",                                                    
         "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}                
response=requests.post("https://www.lagou.com/jobs/positionAjax.json?city=%E6%B7%B1%E5%9C%B3&needAddtionalResult=false",data=data,headers=headers)     
print(response.json)****#response.json可以将json源代码转换成字典或者列表的形式。****
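What response.json() does is simply parse the response body as JSON. A local sketch of that behavior (again building a Response by hand via the internal _content attribute, with made-up JSON, so no network is needed):

```python
import requests

resp = requests.models.Response()
resp.status_code = 200
# A made-up JSON body standing in for the Lagou response:
resp._content = b'{"success": true, "positions": ["python dev", "crawler dev"]}'

data = resp.json()  # parses the JSON body into a Python dict
print(data["positions"])
```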

Using a proxy IP with the requests library

import requests
proxy={"http":"114.226.246.144:9999"}  # scheme keys in the proxies dict should be lowercase
response=requests.get("http://httpbin.org/ip",proxies=proxy)
print(response.text)

# Note: while testing this, the proxy kept failing even though the code is correct; after several attempts the cause turned out to be the free proxy itself. Free proxy IPs die quickly, so replace the IP here with a fresh one or use a paid proxy IP.
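Since free proxies die quickly, one practical pattern is to loop over a list of candidates and keep the first one that responds. A sketch (the helper name and the proxy list are made up for illustration):

```python
import requests

def fetch_via_proxies(url, proxy_addrs, timeout=5):
    """Try each proxy in turn; return the first successful response, else None."""
    for addr in proxy_addrs:
        proxies = {"http": addr, "https": addr}
        try:
            return requests.get(url, proxies=proxies, timeout=timeout)
        except requests.exceptions.RequestException:
            continue  # this proxy is dead or refused the connection; try the next
    return None

# e.g. fetch_via_proxies("http://httpbin.org/ip", ["114.226.246.144:9999"])
```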
Handling cookie information with requests

import requests
response=requests.get("https://baidu.com/")
print(response.cookies)  # the cookie information can be obtained like this
print(response.cookies.get_dict())  # the specific cookies, as a dict
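get_dict() works on any RequestsCookieJar, which can also be built locally without a request (a sketch; the cookie name and value are made up):

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("BAIDUID", "ABC123", domain=".baidu.com", path="/")
print(jar.get_dict())  # {'BAIDUID': 'ABC123'}
```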

**If cookie information needs to be shared across multiple requests, use a session.**

url="http://www.renren.com/"
data={"email":"135*********","password":"***08***"}
headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}
session=requests.session()
session.post(url,data=data,headers=headers)
response=session.get("http://www.renren.com/973687886/profile")
with open("renren.html","w",encoding="utf-8") as fp:
    fp.write(response.text)
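What the session adds: cookies and headers set once are merged into every later request automatically. This can be checked locally by preparing a request instead of sending it (a sketch; the token value and User-Agent are made up):

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})
session.cookies.set("token", "abc123")  # e.g. a cookie set by a login response

# Prepare (but do not send) a follow-up request: the session's cookies
# and headers are merged into it automatically.
req = requests.Request("GET", "http://www.renren.com/973687886/profile")
prepared = session.prepare_request(req)
print(prepared.headers["Cookie"])  # token=abc123
```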

Handling untrusted SSL certificates (for pages that otherwise cannot be opened while crawling):

response=requests.get("http://*******",verify=False)
print(response.content.decode("utf-8"))
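With verify=False, requests emits an InsecureRequestWarning on every call; once the risk is understood, the warning can be silenced through urllib3 (a sketch; the commented-out URL is a hypothetical placeholder):

```python
import requests
import urllib3

# Skipping certificate validation triggers InsecureRequestWarning;
# silence it explicitly if you accept the risk.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# response = requests.get("https://site-with-bad-cert.example", verify=False)
```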




Origin blog.csdn.net/dinnersize/article/details/104292413