Using the requests library for Python web crawlers

First, install the requests library

(1) pip3 install requests

(2) Or install it from within PyCharm
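To confirm that either install route worked, a quick check like the one below may help. It is only a sketch: the helper name `is_installed` is made up for this example, and it relies on nothing beyond the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found by the import system."""
    # find_spec returns None when the top-level package is not importable.
    return importlib.util.find_spec(package_name) is not None


if is_installed("requests"):
    import requests

    print("requests version:", requests.__version__)
else:
    print("requests is not installed; run: pip3 install requests")
```

If the second message prints inside PyCharm, the project interpreter is probably not the one that pip3 installed into.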

         

 

Second, requests sends its requests over the HTTP protocol

 1, What an HTTP request contains (using Baidu as an example)
  (1) Request URL:
      https://www.baidu.com/
  (2) Request method:
      GET
  (3) Request headers:
      Cookie: some sites require it, so keep an eye on it.
      User-Agent: tells the server that the client is a browser.
      Note: copy the value from your browser's request headers, for example:
      Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36
      Host: www.baidu.com
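The three parts listed above (URL, method, headers) can be inspected without sending anything over the network. The sketch below builds a request with `requests.Request` and calls `prepare()` on it; the constant `USER_AGENT` and the function name `build_baidu_request` are names chosen for this illustration only. Host is normally derived from the URL when the request is actually sent, so it is not set by hand here.

```python
import requests

# User-Agent string copied from the browser request headers shown above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36"
)


def build_baidu_request() -> requests.PreparedRequest:
    """Build (but do not send) a GET request for the Baidu home page."""
    request = requests.Request(
        method="GET",
        url="https://www.baidu.com/",
        headers={"User-Agent": USER_AGENT},
    )
    return request.prepare()


prepared = build_baidu_request()
print(prepared.method)                 # request method
print(prepared.url)                    # request URL
print(prepared.headers["User-Agent"])  # one of the request headers
```

A prepared request can later be sent with `requests.Session().send(prepared)` if you want to see the response as well.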









  

 2, Using the browser to find the request headers

 

         

 3, Crawling the Baidu home page

import requests

response = requests.get(url='https://www.baidu.com/')
response.encoding = 'utf-8'
print(response)  # <Response [200]>
# Response status code
print(response.status_code)  # 200
# Response body as text
# print(response.text)
print(type(response.text))  # <class 'str'>
# Write the crawled content to baidu.html
with open('baidu.html', 'w', encoding='utf-8') as f:
    f.write(response.text)
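The script above sends no browser headers and does not check for failures. A slightly more defensive variant might look like the sketch below. The function names `fetch_html` and `save_html`, the `BROWSER_HEADERS` constant, the optional `session` parameter, and the 10-second timeout are all assumptions made for this example rather than part of the original script.

```python
import requests

# Headers that make the request look like it comes from a browser.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36"
    )
}


def fetch_html(url, session=None, timeout=10.0):
    """Fetch a page as text, raising an exception on HTTP error statuses."""
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=BROWSER_HEADERS, timeout=timeout)
    response.raise_for_status()   # raises requests.HTTPError on 4xx/5xx
    response.encoding = "utf-8"   # same encoding choice as the script above
    return response.text


def save_html(html, path):
    """Write the fetched page to a file as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
```

Calling `save_html(fetch_html('https://www.baidu.com/'), 'baidu.html')` should produce the same file as the original script, provided the site is reachable.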
 

 

Third, crawling a video from Pear Video (梨视频)

# Crawl a video from Pear Video
import requests

url = 'https://video.pearvideo.com/mp4/adshort/20190613/cont-1565846-14013215_adpkg-ad_hd.mp4'
res = requests.get(url)
# Write the downloaded video to a file in binary mode
with open('梨视频.mp4', 'wb') as f:
    f.write(res.content)
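Reading `res.content` loads the whole video into memory before writing it. For larger files, streaming the body in chunks is a common alternative. The sketch below uses `stream=True` and `iter_content`, both of which are part of requests; the function name `download_file`, the optional `session` parameter, the chunk size, and the timeout are assumptions chosen for this example.

```python
import requests


def download_file(url, path, session=None, chunk_size=64 * 1024, timeout=30.0):
    """Stream a remote file to disk and return the number of bytes written."""
    if session is None:
        session = requests.Session()
    written = 0
    # stream=True defers downloading the body until it is iterated over.
    with session.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks
                    f.write(chunk)
                    written += len(chunk)
    return written
```

With the video URL from the script above, `download_file(url, '梨视频.mp4')` would write the same file without holding it all in memory, assuming the link is still valid.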

 

  

 


Origin www.cnblogs.com/lweiser/p/11033005.html