First, install the library requests
(1)、pip3 install requests
(2), installed in the pycharm
Second, the mechanism is based on the request of the requests HTTP protocol
1, HTTP protocol to request :( Baidu, for example)
(1) request url:
https://www.baidu.com/ (2) request method: GET (3) request header: Cookie: may need attention. User-Agent: to prove that you are a browser NOTE: The browser request headers to find Mozilla / 5.0 (Windows NT 10.0; WOW64) AppleWebKit / 537.36 (KHTML, like Gecko) Chrome / 65.0.3325.146 Safari / 537.36 Host: www.baidu.com
2, browser use
3, Baidu crawl home page
1 import requests 2 3 response = requests.get(url='https://www.baidu.com/') 4 response.encoding = 'utf-8' 5 print(response) # <Response [200]> 6 # 返回响应状态码 7 print(response.status_code) # 200 8 # 返回响应文本 9 # print(response.text) 10 print(type(response.text)) # <class 'str'> 11 #将爬取的内容写入xxx.html文件 12 with open('baidu.html', 'w', encoding='utf-8') as f: 13 f.write(response.text)
三、爬取“梨视频”中的视频
1 # 爬取梨视频 2 import requests 3 url='https://video.pearvideo.com/mp4/adshort/20190613/cont-1565846-14013215_adpkg-ad_hd.mp4' 4 res = requests.get(url) 5 #将爬取的视频写入文件 6 with open('梨视频.mp4', 'wb') as f: 7 f.write(res.content)