requests
What's requests Module
python encapsulated in a web-based request module
effect
Used to simulate the browser sends a request
Environment Installation
pip install requests
Coding process
- Specify the url
- Initiate a request
- Fetch response data
- Persistent storage
Sogou crawling home page source data
#爬取搜狗首页的页面源码数据
import requests
#1.指定url
url = 'https://www.sogou.com/'
#2.请求发送get:get返回值是一个响应对象
response = requests.get(url=url)
#3.获取响应数据
page_text = response.text #返回的是字符串形式的响应数据
#4.持久化存储
with open('sogou.html','w',encoding='utf-8') as fp:
fp.write(page_text)
#实现一个简易的网页采集器
#需要让url携带的参数动态化
url = 'https://www.sogou.com/web'
#实现参数动态化
wd = input('enter a key:')
params = {
'query':wd
}
#在请求中需要将请求参数对应的字典作用到params这个get方法的参数中
response = requests.get(url=url,params=params)
page_text = response.text
fileName = wd+'.html'
with open(fileName,'w',encoding='utf-8') as fp:
fp.write(page_text)
- The code execution found:
- 1. garbled
- 2. Data on the order right
#解决乱码
url = 'https://www.sogou.com/web'
#实现参数动态化
wd = input('enter a key:')
params = {
'query':wd
}
#在请求中需要将请求参数对应的字典作用到params这个get方法的参数中
response = requests.get(url=url,params=params)
response.encoding = 'utf-8' #修改响应数据的编码格式
page_text = response.text
fileName = wd+'.html'
with open(fileName,'w',encoding='utf-8') as fp:
fp.write(page_text)
- UA detection: Portal request by detecting the identity of the carrier determine whether the request is a request to change the reptile launched
- UA伪装:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36
#解决UA检测
url = 'https://www.sogou.com/web'
#实现参数动态化
wd = input('enter a key:')
params = {
'query':wd
}
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
#在请求中需要将请求参数对应的字典作用到params这个get方法的参数中
response = requests.get(url=url,params=params,headers=headers)
response.encoding = 'utf-8' #修改响应数据的编码格式
page_text = response.text
fileName = wd+'.html'
with open(fileName,'w',encoding='utf-8') as fp:
fp.write(page_text)
Crawling IMDb movie data details
- https://movie.douban.com/typerank?type_name=%E7%88%B1%E6%83%85&type=13&interval_id=100:90&action=
- Analysis: when the scroll bar is slid to the bottom of the current page occurs partial refresh (Ajax request)
url = 'https://movie.douban.com/j/chart/top_list'
start = input('您想从第几部电影开始获取:')
limit = input('您想获取多少电影数据:')
dic = {
'type': '13',
'interval_id': '100:90',
'action': '',
'start': start,
'limit': limit,
}
response = requests.get(url=url,params=dic,headers=headers)
page_text = response.json() #json()返回的是序列化好的实例对象
for dic in page_text:
print(dic['title']+':'+dic['score'])
KFC inquiry
#肯德基餐厅查询http://www.kfc.com.cn/kfccda/storelist/index.aspx
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
for page in range(1,5):
data = {
'cname': '',
'pid': '',
'keyword': '西安',
'pageIndex': str(page),
'pageSize': '10',
}
response = requests.post(url=url,headers=headers,data=data)
print(response.json())