Foreword
Today I'll show how to scrape second-hand housing data with a Python crawler.
1. Development tools
Python version: 3.6
Related modules:
requests module
parsel module
csv module
re module
2. Environment setup
Install Python, add it to your environment variables, and use pip to install the required modules.
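Assuming Python 3.6+ is already on your PATH, the third-party modules used in this article can be installed in one command (the csv and re modules ship with Python and need no installation):

```shell
# Install the two third-party modules the crawler depends on
pip install requests parsel
```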
3. Analyzing the data source
Open the page we want to scrape in a browser, then press F12 to open the developer tools and locate the second-hand housing data we need. In this case, the data is rendered directly in the page HTML, so we can extract it from the page source.
4. Code implementation
```python
import requests
import parsel

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}

for page in range(1, 11):
    print(f'Scraping data from page {page}')
    url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
    response = requests.get(url=url, headers=headers)
    selector = parsel.Selector(response.text)
    # Collect the detail-page link of every listing on the current page
    href = selector.css('.sellListContent li .title a::attr(href)').getall()
    for index in href:
        # Request each detail page and parse the fields we want
        html_data = requests.get(url=index, headers=headers).text
        select = parsel.Selector(html_data)
        title = select.css('.main::text').get()
        price = select.css('.price .total::text').get()        # total price
        price_1 = select.css('.unitPrice span::text').get()    # price per square metre
        unit_type = select.css('.room .mainInfo::text').get()  # layout, e.g. 3室2厅
        # Floor info comes as e.g. "中楼层/共33层"; split it into position and total
        floor = select.css('.room .subInfo::text').get().split('/')
        floor_1 = floor[0].replace('楼层', '')
        floor_2 = floor[1].replace('共', '').replace('层', '')
        face = select.css('.type .mainInfo::text').get()       # orientation
        furnish = select.css('.type .subInfo::text').get().split('/')
        furnish_1 = furnish[0]
        furnish_2 = furnish[1]
        # Strip the "平米" (square metres) suffix from the floor area
        acreage = select.css('.area .mainInfo::text').get().replace('平米', '')
        community = select.css('.communityName .info::text').get()
        area_list = select.css('.areaName .info::text').getall()
        area_list_1 = select.css('.areaName .info a::text').getall()
        area_str = ''.join(area_list).strip()
        area = '-'.join(area_list_1) + '-' + area_str          # district-neighbourhood
        print(title, price, price_1, unit_type, floor, face, furnish, acreage, community, area, index, sep=' | ')
```
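The tools section lists the csv module, but the loop above only prints each row. A minimal sketch of how the scraped fields could be written to a CSV file instead (the filename, field names, and sample row here are hypothetical, not from the original post):

```python
import csv

# Hypothetical column names for the fields extracted in the loop above
fieldnames = ['title', 'total_price', 'unit_price', 'layout', 'floor',
              'orientation', 'decoration', 'area', 'community', 'district', 'link']

# utf-8-sig keeps Chinese text readable when the file is opened in Excel
with open('ershoufang.csv', mode='w', encoding='utf-8-sig', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    # Inside the scraping loop you would call writer.writerow() once per listing;
    # this sample row just shows the expected shape of the data
    writer.writerow({
        'title': 'Example listing', 'total_price': '120', 'unit_price': '13483元/平米',
        'layout': '3室2厅', 'floor': '中楼层', 'orientation': '南', 'decoration': '精装',
        'area': '89', 'community': 'Example community', 'district': 'Yuelu-Meixihu',
        'link': 'https://cs.lianjia.com/ershoufang/xxx.html',
    })
```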