Python web crawler: source code for scraping short-term rental listings in Beijing

Other 2019-03-23 19:00:52

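Copyright notice: this article may not be reproduced without the original author's permission; doing so will be treated as infringement. https://blog.csdn.net/springhammer/article/details/88649999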

# -*- coding:UTF-8 -*-
from bs4 import BeautifulSoup
import requests
import time

# Send a browser-like User-Agent with every request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'
}

def judgment_sex(class_name):
    # class_name is the list BeautifulSoup returns for the "class" attribute.
    if class_name == ['member_ico1']:
        return '女'  # female
    else:
        return '男'  # male

def get_links(url):
    # Fetch one search-result page and visit every listing linked from it.
    wb_data = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    links = soup.select('#page_list > ul > li > a')
    for link in links:
        href = link.get("href")
        get_info(href)

def get_info(url):
    # Fetch one listing page and print its title, address, price and host details.
    wb_data = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('div.pho_info > h4')
    addresses = soup.select('span.pr5')
    prices = soup.select('#pricePart > div.day_l > span')
    imgs = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > a > img')
    names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
    sexes = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > div')
    # zip() stops at the shortest list, so a selector that matches nothing yields no output.
    for title, address, price, img, name, sex in zip(titles, addresses, prices, imgs, names, sexes):
        data = {
            'title': title.get_text().strip(),
            'address': address.get_text().strip(),
            'price': price.get_text(),
            'img': img.get("src"),
            'name': name.get_text(),
            'sex': judgment_sex(sex.get("class"))
        }
        print(data)

if __name__ == '__main__':
    # Search-result pages 1 through 13.
    urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(number) for number in range(1, 14)]
    for single_url in urls:
        get_links(single_url)
        time.sleep(2)  # pause between result pages
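
The following is a minimal offline sketch, not part of the original script, that illustrates three details of the code above without touching the network: which page URLs `range(1, 14)` produces, why `judgment_sex` compares against a list, and how `zip()` behaves when one selector matches nothing. The sample lists and the class name `some_other_class` are made up for illustration.

```python
# Page URLs: range(1, 14) covers pages 1 through 13 inclusive.
BASE = 'http://bj.xiaozhu.com/search-duanzufang-p{}-0/'
urls = [BASE.format(number) for number in range(1, 14)]


def judgment_sex(class_name):
    # BeautifulSoup returns the multi-valued "class" attribute as a list,
    # so the comparison is against ['member_ico1'], not a plain string.
    return '女' if class_name == ['member_ico1'] else '男'


# zip() stops at the shortest input. If one selector matched nothing,
# the loop body in get_info would never run and nothing would be printed.
titles = ['sample title']   # made-up sample value
prices = []                 # pretend the price selector matched nothing
rows = list(zip(titles, prices))

print(len(urls), urls[0], urls[-1])
print(judgment_sex(['member_ico1']), judgment_sex(['some_other_class']))
print(rows)
```

If a listing page silently produces no output, checking the length of each selector result before the `zip()` loop is one way to find which selector no longer matches the page.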

Reprinted from blog.csdn.net/springhammer/article/details/88649999