Web crawler learning: day one
Tools: PyCharm, Edge
Learning Address: https://www.luffycity.com/free/128
Approach:
Basic crawler steps (the requests coding workflow)
- Specify the URL
- Send the request
- Get the response data
- Persist the data
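The four steps above can be sketched as one small function. This is a minimal sketch, not the course's code: crawl_page and its parameters are hypothetical names, and session is assumed to be a requests.Session (or any object with a compatible get method).

```python
def crawl_page(session, url, out_path, headers=None):
    """Fetch one page and save its text; return the number of characters written."""
    # Step 1: specify the URL (passed in by the caller)
    # Step 2: send the request
    response = session.get(url, headers=headers, timeout=10)
    # Step 3: get the response data as text
    page_text = response.text
    # Step 4: persist the data
    with open(out_path, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    return len(page_text)
```

In real use this would probably be called as crawl_page(requests.Session(), 'https://example.com/', 'page.html') after import requests; the URL and file name there are placeholders.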
Cases:
- Requirement 1: crawl the search results page for a specified search term
- Requirement 2: Baidu Translate
- Requirement 3: IMDb
- Requirement 4: KFC case (homework)
Pitfalls:
There are two ways to request a page: GET and POST.
requests.get(url, params=..., headers=...) vs requests.post(url, data=..., headers=...)
(headers must be passed by keyword in both calls.)
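To make the difference concrete, here is a standard-library sketch of where the fields end up in each case. requests does this encoding itself; describe_request is a hypothetical helper, the example.com URL is a placeholder, and the sketch assumes the URL has no query string of its own.

```python
from urllib.parse import urlencode


def describe_request(method, url, fields):
    """Show where the fields travel for a GET versus a form-encoded POST."""
    encoded = urlencode(fields)
    if method.upper() == 'GET':
        # GET (params=...): the fields are appended to the URL as a query string
        return {'url': url + '?' + encoded, 'body': None}
    # POST (data=...): the URL is unchanged and the fields go in the request body
    return {'url': url, 'body': encoded}


get_view = describe_request('GET', 'https://example.com/search', {'query': 'python'})
post_view = describe_request('POST', 'https://example.com/search', {'query': 'python'})
```

So a page that the browser loads with the fields visible in the address bar usually wants params, while one that submits a form in the request body wants data.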
When crawling, data or params must be complete: send every field the page itself sends, and pass fields that have no value as the empty string ''.
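One way to avoid forgetting an empty field is to build the form from a fixed list of names. This is only a sketch: build_form is a hypothetical helper, and the field names are the ones used in the homework code for the KFC store search.

```python
from urllib.parse import urlencode

# Field names taken from the homework request; empty ones must still be sent
FORM_FIELDS = ('cname', 'pid', 'keyword', 'pageIndex', 'pageSize')


def build_form(**values):
    """Return every expected field, defaulting the missing ones to ''."""
    unknown = set(values) - set(FORM_FIELDS)
    if unknown:
        raise KeyError(f'unexpected fields: {sorted(unknown)}')
    return {name: values.get(name, '') for name in FORM_FIELDS}


form = build_form(keyword='Beijing', pageIndex=1, pageSize=10)
encoded = urlencode(form)
print(encoded)  # cname=&pid=&keyword=Beijing&pageIndex=1&pageSize=10
```

The empty fields still appear in the encoded form (cname=&pid=), which is what the server expects to receive.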
Homework code:
import json

import requests

if __name__ == '__main__':
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    # Send a browser User-Agent so the request looks like normal traffic
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.29 Safari/537.36 Edg/79.0.309.18'
    }
    kw = '北京'
    file_name = kw + '.json'
    pages = []
    # Page numbering is assumed to start at 1, as in the site's own requests
    for page in range(1, 11):
        # Send every form field, including the empty ones
        data = {
            'cname': '',
            'pid': '',
            'keyword': kw,
            'pageIndex': page,
            'pageSize': '10',
        }
        response = requests.post(url=url, data=data, headers=headers)
        pages.append(response.json())
    # Write all pages as one JSON array so the file stays valid JSON
    with open(file_name, 'w', encoding='utf-8') as fp:
        json.dump(pages, fp, ensure_ascii=False)
    print('over')
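A point worth checking when several pages are saved to one file: calling json.dump once per page writes the objects back to back, and the result cannot be read back with json.loads, whereas a single list of pages can. A small sketch with made-up page data (the dictionaries below are placeholders, not real responses):

```python
import io
import json

pages = [{'page': 1}, {'page': 2}]  # placeholder data for illustration

# One dump per page: the objects are concatenated with nothing between them
buf = io.StringIO()
for page in pages:
    json.dump(page, buf)
concatenated = buf.getvalue()

try:
    json.loads(concatenated)
    concatenated_ok = True
except json.JSONDecodeError:
    # The parser stops after the first object and reports extra data
    concatenated_ok = False

# One dump of the whole list: a valid JSON array that round-trips
as_array = json.dumps(pages)
round_trip = json.loads(as_array)
```

Collecting the pages in a list and dumping once is the simplest fix; writing one JSON object per line would be another option.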