One, the page to crawl
1. Website link
This example uses the Chinese website of National Geographic.
Website link: http://www.ngchina.com.cn/
2. Picture to be saved
Two, programming ideas
1. Saving the picture as binary data
A picture is a binary file, so it has to be written to disk as raw bytes rather than as text. The steps are as follows. First, import the os library, which is used later to check whether the target folder and file already exist.
import os
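Then take the content attribute of the response (r is the response object returned by requests.get). It holds the downloaded picture as bytes, which are written to a file opened in binary write mode ('wb'). The with statement closes the file automatically, so no explicit f.close() is needed.
with open(path, 'wb') as f:
    f.write(r.content)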
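As a small offline illustration of the binary write above, the sketch below writes a few bytes with 'wb' and reads them back with 'rb' to confirm nothing is altered. The helper name save_binary and the fake_image bytes are assumptions made for this example; fake_image only stands in for r.content and is not a real picture.

```python
import os
import tempfile

def save_binary(path, data):
    """Write raw bytes to path; 'wb' means write + binary, so no text encoding is applied."""
    with open(path, 'wb') as f:
        f.write(data)  # the with block closes the file when it ends

# Stand-in for r.content: 20 bytes that begin with the JPEG signature FF D8 FF
fake_image = b'\xff\xd8\xff\xe0' + bytes(range(16))

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.jpg')
save_binary(demo_path, fake_image)

# Read the file back in binary mode to check the bytes are unchanged
with open(demo_path, 'rb') as f:
    round_trip = f.read()
```

If the file were opened in text mode ('w') instead, f.write would reject the bytes object with a TypeError, which is why the binary flag is required here.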
2. Naming the picture with its original file name
path = root + url.split('/')[-1]
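The naming step can be tried on its own. The sketch below first applies the same url.split('/')[-1] idea, then shows a slightly more defensive variant for URLs that carry a query string. The helper name filename_from_url and the '?size=large' suffix are hypothetical, added only for illustration.

```python
import posixpath
from urllib.parse import urlsplit

def filename_from_url(url):
    """Return the last path segment of url, ignoring any ?query or #fragment."""
    return posixpath.basename(urlsplit(url).path)

url = 'http://image.ngchina.com.cn/2020/0812/20200812011133978.jpg'

# The approach used in this article: split on '/' and take the last piece
simple_name = url.split('/')[-1]

# Hypothetical URL with a query string, where plain split() would keep '?size=large'
robust_name = filename_from_url(url + '?size=large')
```

For the picture URL used in this article both approaches give the same name, so the plain split is enough; the second form only matters when the link has extra parameters after the file name.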
Three, the complete code
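import requests
import os

url = 'http://image.ngchina.com.cn/2020/0812/20200812011133978.jpg'
root = "E://pics//"
# split('/') breaks the URL at each '/'; the last element is the picture's file name
path = root + url.split('/')[-1]
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # send a browser-like User-Agent header
    if not os.path.exists(root):
        os.mkdir(root)  # create the target folder on the first run
    if not os.path.exists(path):
        r = requests.get(url, headers=kv)
        r.raise_for_status()  # treat HTTP error codes as failures
        with open(path, 'wb') as f:
            f.write(r.content)  # the with block closes the file automatically
        print("File saved successfully")
    else:
        print("File already exists")
except Exception:
    print("Crawl failed")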
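The same steps can also be wrapped in a function so they can be reused for several pictures. This is only a sketch under some assumptions: the names fetch_with_requests and download_image, the fetch parameter, and the 10 second timeout are not from the original code. The fetch hook is there so the saving logic can be tried without network access.

```python
import os

def fetch_with_requests(url):
    """Default fetcher: download url with requests and return the raw bytes."""
    import requests  # imported here so download_image can be tested offline
    r = requests.get(url, headers={'user-agent': 'Mozilla/5.0'}, timeout=10)
    r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    return r.content

def download_image(url, root, fetch=fetch_with_requests):
    """Save the picture at url into the folder root, keeping its original name.

    Returns 'saved' if a new file was written, 'exists' if it was already there.
    fetch is a hypothetical hook: any callable that maps a URL to bytes.
    """
    os.makedirs(root, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(root, url.split('/')[-1])
    if os.path.exists(path):
        return 'exists'
    data = fetch(url)
    with open(path, 'wb') as f:
        f.write(data)
    return 'saved'
```

A call such as download_image('http://image.ngchina.com.cn/2020/0812/20200812011133978.jpg', 'E:/pics/') would then mirror the script above, with errors raised as exceptions for the caller to handle instead of being printed.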
That is the end of this article. Please point out any errors~
Reference
中国大学MOOC (China University MOOC), Python网络爬虫与信息提取 (Python Web Crawler and Information Extraction)
https://www.icourse163.org/course/BIT-1001870001