Using Python's Requests and BeautifulSoup libraries, write a crawler that scrapes ancient poems and saves them to a text file

The following Python crawler scrapes three classic ancient poems from gushiwen.org and writes them into a text file on the desktop. It uses the Requests library to fetch pages and BeautifulSoup to parse them:

# Import the required libraries
import requests
from bs4 import BeautifulSoup
import os

# Target URL for the crawler
url = 'https://www.gushiwen.org/'

# Send a GET request to the target URL (with a timeout so a stalled
# connection cannot hang the script)
response = requests.get(url, timeout=10)

# Parse the returned HTML
soup = BeautifulSoup(response.content, 'html.parser')

# Select the poem links via a CSS selector
poem_list = soup.select('.main3 .left .sons .cont a')

# Collect the titles and contents of the first three poems
poem_titles = []
poem_contents = []

for link in poem_list[:3]:
    # Poem title
    poem_titles.append(link.text.strip())

    # Build the poem's absolute URL; urljoin handles both relative and
    # absolute hrefs correctly, unlike plain string concatenation
    poem_url = requests.compat.urljoin(url, link.get('href'))

    # Fetch and parse the poem page
    poem_response = requests.get(poem_url, timeout=10)
    poem_soup = BeautifulSoup(poem_response.content, 'html.parser')

    # Extract the poem text
    poem_content = poem_soup.select('.main3 .left .sons .contson')[0].text
    poem_contents.append(poem_content.strip())
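Building each poem's URL by string concatenation is fragile: the site's hrefs may be root-relative (starting with '/') or fully absolute, and naive concatenation then produces a malformed link. The standard library's urljoin resolves every case correctly (the .aspx paths below are illustrative, not real pages):

```python
from urllib.parse import urljoin

base = 'https://www.gushiwen.org/'

# Naive concatenation produces a double slash
print(base + '/shiwenv.aspx')                  # https://www.gushiwen.org//shiwenv.aspx

# Root-relative href: urljoin drops the duplicate slash
print(urljoin(base, '/shiwenv.aspx'))          # https://www.gushiwen.org/shiwenv.aspx

# Plain relative href: resolved against the base
print(urljoin(base, 'shiwenv.aspx'))           # https://www.gushiwen.org/shiwenv.aspx

# Absolute href (e.g. pointing at a sibling domain): left untouched
print(urljoin(base, 'https://so.gushiwen.cn/shiwenv.aspx'))
```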

# Write the poems to a text file on the desktop
desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop')
file_path = os.path.join(desktop_path, 'poems.txt')

with open(file_path, 'w', encoding='utf-8') as f:
    for title, content in zip(poem_titles, poem_contents):
        f.write(title + '\n\n')
        f.write(content + '\n\n\n')
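If each poem should instead land in its own file, the title needs sanitizing first, because poem titles can contain characters that are illegal in filenames. A minimal sketch (the `safe_filename` rule and helper names are assumptions, not part of the original program):

```python
import os
import re

def safe_filename(title: str) -> str:
    """Replace characters that are illegal in Windows/Unix filenames (assumed rule)."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip() or 'untitled'

def save_poems(poems, out_dir):
    """Write each (title, content) pair to its own UTF-8 text file."""
    os.makedirs(out_dir, exist_ok=True)
    for title, content in poems:
        path = os.path.join(out_dir, safe_filename(title) + '.txt')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(title + '\n\n' + content + '\n')

# Usage with dummy data:
# save_poems([('静夜思', '床前明月光…')], os.path.expanduser('~/Desktop/poems'))
```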

This code first sends a GET request to gushiwen.org and parses the returned HTML with BeautifulSoup. It then extracts the list of poem links with a CSS selector, fetches the title and content of the first three poems, and finally writes them into a text file saved on the desktop.
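The two CSS selectors are the fragile part of this crawler: they encode assumptions about gushiwen.org's markup (classes `main3`, `left`, `sons`, `cont`, `contson`) and silently return empty lists if the site's layout changes. Their behavior can be checked offline against a small HTML snippet that mimics the assumed structure (the snippet below is invented for illustration; the real page should be re-inspected in a browser):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the assumed gushiwen.org markup
html = '''
<div class="main3"><div class="left">
  <div class="sons">
    <div class="cont"><a href="/shiwenv_1.aspx">静夜思</a></div>
    <div class="contson">床前明月光，疑是地上霜。</div>
  </div>
</div></div>
'''

soup = BeautifulSoup(html, 'html.parser')
links = soup.select('.main3 .left .sons .cont a')
print([a.text for a in links])                                 # ['静夜思']
print(soup.select('.main3 .left .sons .contson')[0].text.strip())
```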

Origin blog.csdn.net/ximu__l/article/details/131696952