Python Chapter 6 Exercises (3)

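Open the campus culture page of China University of Petroleum (East China) at http://www.upc.edu.cn/xygk/xywh.htm and, at run time, create a directory named images in the current folder. Use a crawler to download the images for the school name, the school emblem and so on, and save each one in the images folder under its corresponding name. When you are done, the directory should look like this: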

images
├── 石大精神.png
├── 学风.png
├── 校名.png
├── 校旗.png
├── 校标.png
├── 校歌.png
├── 校训.png
└── 校风.png
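Note: the images folder must be created by the code at run time; a folder created by hand does not count. Before creating it, always check whether it already exists, and do not create it if it does.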
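A minimal sketch of the check-then-create step, using only the standard library. The function name `ensure_dir` is hypothetical; the folder name `images` comes from the task, and a relative name is resolved against the current working directory.

```python
import os

def ensure_dir(name="images"):
    """Create the directory only if nothing with that name exists yet.

    A relative `name` is resolved against the current working directory.
    Returns True if this call created it, False if it was already there.
    """
    if os.path.exists(name):
        return False
    os.mkdir(name)
    return True
```

Calling `ensure_dir()` twice creates ./images on the first call and returns False on the second, which is exactly the "check first, create only if missing" rule.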

Hints:

1) The image addresses you get are relative, so prepend http://www.upc.edu.cn to form a full link. For example, the school song image link is /__local/8/2D/0D/5CA8B3D8EAE84524B99DA08A212_05DA246B_299BE.png, which must be converted to http://www.upc.edu.cn/__local/8/2D/0D/5CA8B3D8EAE84524B99DA08A212_05DA246B_299BE.png

2) Rename each image after its corresponding name, keeping the original file extension. For example, the school song image is renamed "校歌.png".

3) The anniversary day section (校庆日) has no image. This exception can be handled with a try/except structure; look up how to use it yourself.
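The three hints can be sketched together as below. This is a minimal stdlib-only sketch, not the code that was submitted: it assumes the page has already been parsed into a hypothetical list of dicts, one per section, where a section without a picture simply has no 'src' key. It uses `urllib.parse.urljoin` rather than string concatenation to build the full link (with the page URL as base, a root-relative src such as /__local/... resolves to http://www.upc.edu.cn/__local/...), `Path(src).suffix` to keep the extension, and try/except KeyError to skip 校庆日.

```python
from pathlib import Path
from urllib.parse import urljoin

PAGE_URL = "http://www.upc.edu.cn/xygk/xywh.htm"

def plan_downloads(sections):
    """Return (full_url, file_name) pairs for every section that has an image.

    `sections` is a hypothetical pre-parsed structure: a list of dicts with a
    'name' key and, only when the section contains a picture, a 'src' key.
    """
    plan = []
    for sec in sections:
        try:
            src = sec["src"]  # raises KeyError for a section without a picture
        except KeyError:
            print(f"{sec['name']}: no image, skipped")
            continue
        full_url = urljoin(PAGE_URL, src)             # hint 1: relative -> absolute
        file_name = f"{sec['name']}{Path(src).suffix}"  # hint 2: name + original suffix
        plan.append((full_url, file_name))
    return plan

sections = [
    {"name": "校歌", "src": "/__local/8/2D/0D/5CA8B3D8EAE84524B99DA08A212_05DA246B_299BE.png"},
    {"name": "校庆日"},  # like the real page: this section has no image
]
for url, fname in plan_downloads(sections):
    print(fname, url)
```

With BeautifulSoup, if each section's heading and picture share one container (not verified for this page), the analogous failure is `section.find('img')` returning None, so `section.find('img')['src']` raises TypeError and that is the exception to catch instead.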

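Points to watch:
1. First check whether the folder exists
2. How to scrape the images and their corresponding names
3. The scraped addresses need the prefix added in front
4. 校庆日 (anniversary day) has no corresponding image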

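I have to say, I was stuck on this for a very long time.
I finally squeezed out some code whose output was correct, but the submission scored 0. The code: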

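import os
import requests
from bs4 import BeautifulSoup as BS
from pathlib import Path

# Create images/ in the current working directory only if it is missing
file=Path('images')
if not os.path.exists(file):
    os.mkdir('images')

url="http://www.upc.edu.cn/xygk/xywh.htm"
response=requests.get(url)
soup=BS(response.content,'lxml')

# All content images on the page, in page order
img_list=soup.find_all('img',class_='img_vsb_content')

# Section headings; the first whitespace-separated word is the section name
name=[]
name_list=soup.find_all('div',class_='hd')
for con in name_list:
    name.append(con.text.split()[0])
# 校庆日 has no image; dropping it keeps name[i] aligned with img_list[i]
name.remove('校庆日')

for index,img in enumerate(img_list):
    image=img.get('src')                      # relative address, e.g. /__local/...
    newurl=f'http://www.upc.edu.cn{image}'    # prepend the site prefix
    content=requests.get(newurl).content
    # Save as <name><original suffix>, e.g. 校歌.png
    file=Path('images') /f'{name[index]}{Path(image).suffix}'
    file.write_bytes(content)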

This version does not use a try/except structure: since I knew 校庆日 has no image, I removed it from the name list by hand.
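Removing 校庆日 by hand works because, once it is gone, the i-th heading and the i-th image line up. A small sketch that makes that assumption explicit and fails loudly if the page layout changes; the function name and the `skip` parameter are hypothetical.

```python
def pair_names_with_images(names, srcs, skip=("校庆日",)):
    """Drop names that have no picture, then pair the rest with image srcs in page order.

    Raises ValueError if the counts differ, since index-based pairing
    would then silently attach images to the wrong names.
    """
    kept = [n for n in names if n not in skip]
    if len(kept) != len(srcs):
        raise ValueError(f"{len(kept)} names but {len(srcs)} images")
    return list(zip(kept, srcs))
```

The result can be iterated directly (`for n, src in pair_names_with_images(name_list, src_list): ...`) in place of `enumerate` plus `name[index]`.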

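After asking classmates and the teacher, I got the code below, which finally passed /(ㄒoㄒ)/~~
It was a path error: after submission, the grader deletes the images folder, so it could not find the files. Building every path under `{path}/images` instead of the bare relative `images` fixes this.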

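import os
import requests
from bs4 import BeautifulSoup as BS
from pathlib import Path

# `path` is not defined in this snippet; it is assumed to be provided by
# the grading environment. Fall back to the current directory when run locally.
try:
    path
except NameError:
    path = '.'

# Create <path>/images only if it is missing
file=Path(f'{path}/images')
if not os.path.exists(file):
    os.mkdir(file)

url="http://www.upc.edu.cn/xygk/xywh.htm"
response=requests.get(url)
soup=BS(response.content,'lxml')

# All content images on the page, in page order
img_list=soup.find_all('img',class_='img_vsb_content')

# Section headings; the first whitespace-separated word is the section name
name=[]
name_list=soup.find_all('div',class_='hd')
for con in name_list:
    name.append(con.text.split()[0])
# 校庆日 has no image; dropping it keeps name[i] aligned with img_list[i]
name.remove('校庆日')

for index,img in enumerate(img_list):
    image=img.get('src')                      # relative address, e.g. /__local/...
    newurl=f'http://www.upc.edu.cn{image}'    # prepend the site prefix
    content=requests.get(newurl).content
    # Save under <path>/images as <name><original suffix>, e.g. 校歌.png
    file=Path(f'{path}/images') /f'{name[index]}{Path(image).suffix}'
    file.write_bytes(content)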
