Crawling tourist attractions with Python

Today's goal is to crawl tourist attraction data. The example below uses the China city list on place.qyer.com.

Without further ado, let's get started.

Because the crawled data is saved to Excel, the required libraries must be installed in advance. I use pip here.

Press Win+R, type cmd to open a command prompt, and enter the following commands (make sure your Python environment is working first):

pip install requests lxml

pip install openpyxl
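
Before moving on, it can help to confirm that the libraries the script imports are actually visible to your interpreter. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is my own and not part of pip or any library.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three third-party libraries the crawler imports.
    missing = missing_packages(["requests", "lxml", "openpyxl"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported as not installed, run pip install for that name and try again.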


Now we can write the code:

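# -*- coding: utf-8 -*-
import requests
from lxml import html
from openpyxl import Workbook

# Create the Excel workbook and select its active worksheet
wb = Workbook()
ws = wb.active

# URL of the first list page (the loop at the bottom builds the URL for each page)
url = 'https://place.qyer.com/china/citylist-0-0-1/'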

def getpage(url):
    # Request headers that imitate a browser visit
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'}

    # Request the page and get its HTML (the timeout keeps a stalled request from hanging forever)
    r = requests.get(url, headers=headers, timeout=10)
    retext = r.text

    # Parse the HTML
    ht = html.fromstring(retext)

    # Select each city entry with XPath
    city = ht.xpath('/html/body/div[5]/div/div[1]/ul/li')
    for i in city:
        name = i.xpath('./h3/a/text()')[0]
        beento = i.xpath('./p[@class="beento"]/text()')[0]
        # Names of the city's attractions, stripped and joined with commas
        pois = [place.strip() for place in i.xpath('./p[@class="pois"]/a/text()')]
        datalist = [name, beento, ','.join(pois)]
        # Append one row per city: name, beento text, attractions
        ws.append(datalist)

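# Crawl list pages 1 to 9
for i in range(1, 10):
    url = 'https://place.qyer.com/china/citylist-0-0-{}/'.format(i)
    getpage(url)

# Save the Excel file. A relative name is saved to the current working directory
# (for me, D:\Python\Project\test4, next to the Python source file); use any path you like.
wb.save("旅游景点.xlsx")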
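
One fragile spot in the code above is the `[0]` index. `xpath()` returns a list, and if a city entry lacks one of the fields, the list is empty and the script stops with an IndexError. Here is a minimal sketch of a guard, assuming you would rather record an empty cell than abort the run. `first_text` is a hypothetical helper of my own, not an lxml function.

```python
def first_text(results, default=""):
    """Return the first non-blank string in an XPath result list, stripped.

    `results` is expected to be a list of strings, as returned by an XPath
    expression ending in text(). Falls back to `default` when nothing usable
    is present.
    """
    for value in results:
        text = value.strip()
        if text:
            return text
    return default


# Plain lists stand in for xpath() results here.
print(first_text(["  Beijing ", "ignored"]))   # Beijing
print(first_text([]))                          # prints an empty line
print(first_text(["   "], default="n/a"))      # n/a
```

Inside the loop, each lookup could then be wrapped, for example `name = first_text(i.xpath('./h3/a/text()'))`.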

Run the code, and the results are saved to 旅游景点.xlsx.
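
When running against the live site, a page may fail to load, or the site may throttle rapid requests. The sketch below shows one way to loop over the pages with a pause between requests, so that a single failure does not abort the whole run. It assumes a one-second delay is acceptable. `crawl_pages` and its parameters are illustrative names, not part of the original script.

```python
import time

BASE_URL = "https://place.qyer.com/china/citylist-0-0-{}/"


def crawl_pages(handle_page, pages, delay=1.0, sleep=time.sleep):
    """Call handle_page(url) for each page number; return the URLs that failed."""
    failed = []
    for number in pages:
        url = BASE_URL.format(number)
        try:
            handle_page(url)
        except Exception as exc:  # broad on purpose: report the error and move on
            print("Skipping {}: {}".format(url, exc))
            failed.append(url)
        sleep(delay)  # pause between requests to go easy on the server
    return failed
```

With the script above, `failed = crawl_pages(getpage, range(1, 10))` could stand in for the final `for` loop. `wb.save(...)` still has to be called afterwards, and any URLs in `failed` can be retried later.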


Origin blog.csdn.net/Lorrey_/article/details/124132050