A simple crawler that scrapes housing listings from Anjuke and saves them as a spreadsheet (CSV) file

The code is below; the comments explain each step:

# Python 3, Firefox browser
import requests
from bs4 import BeautifulSoup
import csv

# Custom request headers, copied from the browser (see Appendix 1 for how to find them)
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0',
}

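# URL of the page to scrape
link = 'https://beijing.anjuke.com/sale/'
# Request the page (timeout is in seconds)
r = requests.get(link, headers=headers, timeout=100)
# Raise an exception if the server answered with an HTTP error status (4xx/5xx)
r.raise_for_status()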

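# Use BeautifulSoup to extract content from the HTML
# BeautifulSoup official Chinese documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#id37
# The 'lxml' parser requires the lxml package (pip install lxml)
soup = BeautifulSoup(r.text, 'lxml')
# Each listing is an <li class="list-item"> element; the class names used here
# reflect the page's markup when this was written and may have changed since
house_list = soup.find_all('li', class_="list-item")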

# Append the scraped rows to test.csv, encoded as UTF-8
# ('a' = append mode: running the script again adds rows to the end of the file)
with open('test.csv', 'a', encoding='UTF-8', newline='') as csvfile:
    w = csv.writer(csvfile)

    for house in house_list:
        # Listing title, total price and price per unit area
        name = house.find('div', class_="house-title").a.text.strip()
        price = house.find('span', class_='price-det').text.strip()
        price_area = house.find('span', class_='unit-price').text.strip()
        # Layout, floor area, floor and year built all sit in the first details-item div;
        # the positional indices depend on its exact markup and break if it changes
        details = house.find('div', class_='details-item')
        no_room = details.span.text       # layout (number of rooms)
        area = details.contents[3].text   # floor area
        floor = details.contents[5].text  # floor
        year = details.contents[7].text   # year built
        # Broker name; [1:] drops the leading character that precedes the name in this span
        broker = house.find('span', class_='brokername').text[1:]
        # Address: collapse every run of whitespace (newlines, \xa0, indentation) into one space
        address = ' '.join(house.find('span', class_='comm-address').text.split())
        # Feature tags, joined into one cell instead of writing a Python list literal
        tags = ','.join(t.text for t in house.find_all('span', class_='item-tags'))
        temp = [name, price, price_area, no_room, area,
                floor, year, broker, address, tags]
        print(temp)
        # Write one row to the spreadsheet (test.csv)
        w.writerow(temp)
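Because the file is opened in append mode, running the script repeatedly keeps adding rows to test.csv, and the loop never writes column names. A minimal sketch of one way to write a header row only when the file is new or empty, assuming the ten fields collected in the loop above; the column names in `FIELDS` and the function name `append_rows` are hypothetical, chosen for illustration.

```python
import csv
import os

# Hypothetical column names, in the same order as `temp` in the loop above
FIELDS = ['name', 'total_price', 'unit_price', 'layout', 'area',
          'floor', 'year_built', 'broker', 'address', 'tags']


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header first if the file is new or empty."""
    # Check before opening, because opening in 'a' mode creates a missing file
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', encoding='UTF-8', newline='') as csvfile:
        w = csv.writer(csvfile)
        if need_header:
            w.writerow(FIELDS)
        w.writerows(rows)


# Usage: collect each `temp` list into `rows`, then
# append_rows('test.csv', rows)
```

If the file will be opened in Excel, writing with `encoding='utf-8-sig'` (UTF-8 with a byte-order mark) often helps Excel recognise the encoding of the Chinese text.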

Appendix 1: finding the request headers

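Open the site you want to scrape in the browser --> right-click and choose Inspect Element --> click the Network tab (see Figure 1) -->

reload the page --> right-click in the developer tools panel --> click Raw Headers (see Figure 2) to see the request headers.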

Figure 1: the Network tab in the developer tools

Figure 2: the Raw Headers view
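The Raw Headers view shows the request as plain `Name: value` lines. A minimal sketch, assuming you paste that text into a string, of turning it into a dict that can be passed as `headers=` to `requests.get`; the function name `parse_raw_headers` is hypothetical. The request line (for example `GET /sale/ HTTP/1.1`) and any HTTP/2 pseudo-headers such as `:authority` are skipped, since they are not ordinary header fields.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from the browser into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        # Split on the first colon only; values such as the User-Agent contain more colons
        name, sep, value = line.partition(':')
        name = name.strip()
        # Skip blank lines, the request line (its "name" has spaces or no colon at all)
        # and pseudo-headers like ':authority' (empty name)
        if not sep or not name or ' ' in name:
            continue
        headers[name] = value.strip()
    return headers
```

The result can replace the hand-written dict, e.g. `requests.get(link, headers=parse_raw_headers(raw), timeout=100)`; the script above sends only `User-Agent`, which is often enough.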

Reference: Tang Song (唐松) et al., Python网络爬虫从入门到实践 (Python Web Scraping: From Beginner to Practice), September 2017.
