- Crawler categories:
  - General-purpose
  - Focused
  - Incremental: monitors a site for updated data
- requests
  - Role: simulates a browser sending requests
  - get/post: url, data/params, headers
- Anti-crawling mechanisms:
  - robots.txt
  - UA detection
- Coding workflow:
  - Specify the URL
  - Send the request
  - Get the response data
  - Store it persistently
- Return value of get/post: a response object
  - text: the response data as a string
  - json(): the response data parsed from JSON into Python objects (dict or list)
  - content: the response data in binary form
  - encoding: the encoding used to decode the response data into text
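
How the four response attributes relate to each other can be shown without any network access. The following is a minimal sketch, not the requests implementation: `raw` and `encoding` are stand-ins for `response.content` and `response.encoding`, and the payload is invented for illustration.

```python
import json

# Stand-in for response.content: the raw bytes of a response body.
# The payload is invented for this illustration.
raw = '{"city": "北京", "stores": 3}'.encode("utf-8")

# Stand-in for response.encoding: the codec used to turn bytes into text.
encoding = "utf-8"

# Comparable to response.text: the bytes decoded with that codec.
text = raw.decode(encoding)

# Comparable to response.json(): the text parsed into Python objects.
parsed = json.loads(text)

# Decoding with the wrong codec is what produces garbled Chinese text,
# which is why the encoding is sometimes set by hand before reading text.
garbled = raw.decode("latin-1")

print(type(raw).__name__, type(text).__name__, type(parsed).__name__)
print(parsed["stores"], garbled != text)
```

The last line shows that the same bytes give a different, unreadable string when the codec is wrong.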
# Simple web page collector
import requests

wd = input('Enter a word: ')
url = 'https://www.sogou.com/web'
# Pass the dynamic request parameter as a dict
param = {'query': wd}
# UA spoofing: send a browser User-Agent instead of the default one
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'
}
response = requests.get(url=url, params=param, headers=headers)
# Manually set the response encoding (fixes garbled Chinese text)
response.encoding = 'utf-8'
# text returns the response data as a string
page_text = response.text
fileName = wd + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print(fileName, 'download successful!')
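
For a GET request, the parameter dictionary ends up as the query string of the URL. The sketch below reproduces that step with the standard library's `urllib.parse.urlencode` rather than with requests itself, so it runs offline. `build_get_url` is a hypothetical helper name used only here, and the search term is an arbitrary example.

```python
from urllib.parse import urlencode


def build_get_url(url: str, params: dict) -> str:
    """Append params to url as a percent-encoded query string."""
    return url + "?" + urlencode(params)


# The search term is an arbitrary example value.
full_url = build_get_url("https://www.sogou.com/web", {"query": "python crawler"})
print(full_url)  # https://www.sogou.com/web?query=python+crawler
```

Passing the values as a dictionary, instead of pasting them into the URL by hand, leaves the escaping of spaces and non-ASCII characters to the library.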
User-Agent: the user agent, part of the HTTP protocol. It is a special string in the request headers that identifies the client making the request to the server.
UA detection: the portal's server checks the User-Agent of every request. If the UA shows that the request comes from a crawler, the request fails.
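
What such a check might look like on the server side can be sketched in a few lines. This is an assumption for illustration only: real sites use their own rules, which are not public, and both `looks_like_crawler` and the marker list are hypothetical.

```python
# Substrings that appear in the default User-Agent of some HTTP tools.
# The list is illustrative, not exhaustive.
CRAWLER_MARKERS = ("python-requests", "python-urllib", "curl", "scrapy")


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent is empty or contains a known marker."""
    ua = user_agent.lower()
    return ua == "" or any(marker in ua for marker in CRAWLER_MARKERS)


# requests identifies itself as python-requests/<version> by default;
# the version number here is only an example.
default_ua = "python-requests/2.31.0"
browser_ua = ("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
              "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
              "Mobile/15A372 Safari/604.1")

print(looks_like_crawler(default_ua))  # True
print(looks_like_crawler(browser_ua))  # False
```

This is why the collectors in these notes send a browser User-Agent in headers: under a check of this kind, the default identifier would be rejected.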
# Crawl KFC restaurant location information
import requests

url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
city = input('Enter a city name: ')
# Form data sent in the body of the POST request
data = {
    "cname": "",
    "pid": "",
    "keyword": city,
    "pageIndex": "1",
    "pageSize": "10",
}
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'
}
response = requests.post(url=url, data=data, headers=headers)
# json() returns the parsed JSON as a Python object
page_text = response.json()
# Write one address per line
with open('./kfc.txt', 'w', encoding='utf-8') as fp:
    for dic in page_text['Table1']:
        address = dic['addressDetail']
        fp.write(address + '\n')
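
Of the two anti-crawling mechanisms listed earlier, robots.txt is the one a crawler is expected to check for itself. A minimal sketch using the standard library's `urllib.robotparser` follows. The rules and the `example.com` URLs are invented so that the snippet runs offline; a real crawler would read the target site's own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for this sketch.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(useragent, url) reports whether the rules allow the URL.
allowed_public = parser.can_fetch("*", "https://example.com/index.html")
allowed_private = parser.can_fetch("*", "https://example.com/private/data")

print(allowed_public)   # True
print(allowed_private)  # False
```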