Educode--Crawling Experiment of Forbidden City Wallpaper Webpage Pictures

Level 1: Web crawler multiple choice questions

Level 2: Obtain the information of the first picture on the Forbidden City wallpaper page

Implementation code:

import requests

from bs4 import BeautifulSoup

url = "https://www.dpm.org.cn/lights/royal/p/1.html"

# code start

r = requests.get(url)

r.encoding = "utf-8"

soup = BeautifulSoup(r.text,"html.parser")

pic=soup.find("div",class_="pic")

x=pic.find("img")

imglj=x.attrs["src"]

imgmz=x.attrs["title"].strip()

# end of code

print(imglj)

print(imgmz)

f1=open("tpxx.txt","w")

f1.write(imglj+"\n")

f1.write(imgmz+"\n")

f1.close()

Level 3: Download the first image from the Forbidden City wallpaper page

Implementation code:

import requests

from bs4 import BeautifulSoup

url = "https://www.dpm.org.cn/lights/royal/p/1.html"

r=requests.get(url)

r.encoding = 'utf-8'

soup=BeautifulSoup(r.text,"html.parser")

pic=soup.find("div", class_="pic")

x=pic.find("img")

imglj=x.attrs["src"]

imgmz=x.attrs["title"].strip()

# code start

r=requests.get(imglj)

cpmc="image//"+imgmz+".jpg"

f1=open(cpmc,"bw")

f1.write(r.content)

f1.close()

# end of code

Level 4: Download multiple images from a single page of the Forbidden City wallpaper page

Implementation code:

import requests

from bs4 import BeautifulSoup

url = "https://www.dpm.org.cn/lights/royal/p/1.html"

# code start

r=requests.get(url)

r.encoding = 'utf-8'

soup=BeautifulSoup(r.text,"html.parser")

pics=soup.find_all("div", class_="pic")

i=1

for pic in pics:

x=pic.find("img")

imglj=x.attrs["src"]

imgmz=x.attrs["title"].strip()

r=requests.get(imglj)

f=open("image/"+str(i)+imgmz+".jpg","wb")

i=i+1

f.write(r.content)

f.close()

# end of code

Level 5: Obtain the information of multiple pictures of the first five pages of the Forbidden City wallpaper

Implementation code:

import requests

from bs4 import BeautifulSoup

# code start

i=1

for j in range(1,6):

url = "https://www.dpm.org.cn/lights/royal/p/"+str(j)+".html"

r=requests.get(url)

r.encoding = 'utf-8'

soup=BeautifulSoup(r.text,"html.parser")

pics=soup.find_all("div", class_="pic")

for pic in pics:

x=pic.find("img")

imglj=x.attrs["src"]

imgmz=x.attrs["title"].strip()

r=requests.get(imglj)

f1=open("image/"+str(i)+imgmz+".jpg",'wb')

f1.write(r.content)

f1.close()

i=i+1

# end of code