[Reptile] reptiles medium-specific anti

Reptile learn first day

Tools: pycharm, Edge

Learning Address: https://www.luffycity.com/free/128

Ideas:

Step crawler base (Requests encoding process) 
- Specifies the URL
- initiating a request
- fetch response data
- persistent data

Case:

  - Requirement 1: crawling specify the entry corresponding to the search results page

  - 2 Demand: Baidu Translation

  - Demand 3: IMDb

  - Demand 4: KFC Case (work)

pit:

Two kinds of ways request page: get / post

  requests.get(url,params,headers) vs requests.post(url,data,headers)

  Crawling process, data or params must be complete, but also with empty ''.

  

Job Code:

 1 import requests
 2 import json
 3 
 4 if __name__ == '__main__':
 5     url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
 6     headers = {
 7      'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.29 Safari/537.36 Edg/79.0.309.18'
 8     }
 9     kw = '北京'
10     file_name = kw + '.json'
11     fp = open(file_name, 'w', encoding='utf-8')
12     for page in range(10):
13         data = {
14             'cname':'',
15             'pid':'',
16             'keyword': kw,
17             'pageIndex': page,
18             'pageSize': '10',
19         }
20         respone = requests.post(url=url,data=data,headers=headers)
21         obj = respone.json()
22         json.dump(obj=obj,fp=fp,ensure_ascii=False)
23 
24     print('over')

 

 

 

Guess you like

Origin www.cnblogs.com/watalo/p/11877687.html