[Python Crawler Project in Practice] Scraping Second-Hand Housing Data with Python and Saving It Locally


Foreword

Today I'll show you how to write a Python crawler that scrapes second-hand housing listings from Lianjia (cs.lianjia.com) and saves the data locally.


1. Development tools

Python version: 3.6

Related modules:

requests module

parsel module

csv module

re module
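The re module appears in this list, but the crawler code in section 4 does not actually use it. One place it could help is the floor field. The crawler splits text like 中楼层/共18层 on '/' and strips characters with replace(), which raises an error when the text is missing or has no '/'. Below is a minimal sketch of the regular-expression alternative. It assumes the floor text has that format (inferred from the crawler's split and replace calls), and parse_floor is a hypothetical helper name:

```python
import re

# Assumed floor text format, e.g. "中楼层/共18层":
# position + "楼层", a slash, then "共" + total floors + "层"
FLOOR_PATTERN = re.compile(r'^\s*(.*?)楼层\s*/\s*共(\d+)层\s*$')


def parse_floor(text):
    """Return (position, total_floors), or None if the text doesn't match."""
    match = FLOOR_PATTERN.match(text or '')  # treat None as an empty string
    if match is None:
        return None
    position, total = match.groups()
    return position, int(total)


print(parse_floor('中楼层/共18层'))  # ('中', 18)
print(parse_floor('暂无数据'))        # None
```

Returning None instead of raising lets the crawler skip or blank out a malformed listing rather than stopping the whole run.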

2. Environment setup

Install Python and add it to the PATH environment variable, then use pip to install the third-party modules (requests and parsel). The csv and re modules ship with Python's standard library.
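Before running the crawler, you can quickly confirm that the environment is ready by asking importlib which modules cannot be found. This is a small sketch. missing_modules is a hypothetical helper, and the pip hint assumes each missing module name matches its pip package name (true for requests and parsel):

```python
import importlib.util

# Modules used in this project: requests and parsel come from pip,
# csv and re ship with the standard library
REQUIRED = ['requests', 'parsel', 'csv', 're']


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print('Install with: pip install ' + ' '.join(missing))
else:
    print('All required modules are available.')
```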

3. Data source analysis

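Open the page we want to scrape in the browser, then press F12 to open the developer tools and locate the second-hand housing data we want.
The data we need is in the page's HTML. Each listing on a list page links to a detail page, and the detail page holds the title, price, layout, floor, and other fields.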
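The crawler requests list pages with URLs of the form https://cs.lianjia.com/ershoufang/pg{n}/, where n is the page number. Here is a minimal sketch of generating those URLs. Making the city subdomain a parameter is an assumption added for illustration: this article only uses "cs", and other subdomains are not verified to use the same layout. list_page_urls is a hypothetical name:

```python
# List pages follow the pattern https://cs.lianjia.com/ershoufang/pg{n}/
BASE = 'https://{city}.lianjia.com/ershoufang/pg{page}/'


def list_page_urls(pages, city='cs'):
    """Build the URLs of list pages 1..pages for the given city subdomain."""
    return [BASE.format(city=city, page=page) for page in range(1, pages + 1)]


urls = list_page_urls(10)
print(urls[0])   # https://cs.lianjia.com/ershoufang/pg1/
print(urls[-1])  # https://cs.lianjia.com/ershoufang/pg10/
```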


4. Code implementation

```python
import requests
import parsel

# A browser User-Agent so the site serves the normal page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}

for page in range(1, 11):
    print(f'Crawling page {page}...')
    # List pages are paginated as .../ershoufang/pg1/, pg2/, ...
    url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
    response = requests.get(url=url, headers=headers, timeout=10)
    selector = parsel.Selector(response.text)

    # Links to the detail page of every listing on this list page
    href = selector.css('.sellListContent li .title a::attr(href)').getall()
    for index in href:
        # Fetch and parse one listing's detail page
        html_data = requests.get(url=index, headers=headers, timeout=10).text
        select = parsel.Selector(html_data)

        title = select.css('.main::text').get()               # listing title
        price = select.css('.price .total::text').get()       # total price
        price_1 = select.css('.unitPrice span::text').get()   # unit price (per square meter)
        unit_type = select.css('.room .mainInfo::text').get() # layout

        # Floor text such as "中楼层/共18层" -> floor position and total floors
        floor = select.css('.room .subInfo::text').get().split('/')
        floor_1 = floor[0].replace('楼层', '')
        floor_2 = floor[1].replace('共', '').replace('层', '')

        face = select.css('.type .mainInfo::text').get()      # orientation
        # Two attributes separated by '/'
        furnish = select.css('.type .subInfo::text').get().split('/')
        furnish_1 = furnish[0]
        furnish_2 = furnish[1]

        # Floor area with the '平米' (square meters) suffix removed
        acreage = select.css('.area .mainInfo::text').get().replace('平米', '')
        community = select.css('.communityName .info::text').get()  # community name
        area_list = select.css('.areaName .info::text').getall()
        area_list_1 = select.css('.areaName .info a::text').getall()

        # Join the plain-text pieces and drop surrounding whitespace
        area_str = ''.join(area_list).strip()
        # Link texts (e.g. district names) joined with '-', then the remaining text
        area = '-'.join(area_list_1) + '-' + area_str

        print(title, price, price_1, unit_type, floor_1, floor_2, face,
              furnish_1, furnish_2, acreage, community, area, index, sep=' | ')
```
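The title promises saving the data locally, and csv is in the module list, but the code above only prints each record. Below is a minimal sketch of writing the records to a CSV file with csv.DictWriter. The column names (which mirror the crawler's variable names), the save_listings helper, and the file name in the usage comment are assumptions for illustration, not the author's original code:

```python
import csv

# Column names mirroring the variables the crawler extracts (assumed naming)
FIELDS = ['title', 'price', 'price_1', 'unit_type', 'floor_1', 'floor_2', 'face',
          'furnish_1', 'furnish_2', 'acreage', 'community', 'area', 'href']


def save_listings(path, rows):
    """Write a list of dicts to a CSV file with a header row.

    newline='' is what the csv module expects for files it writes, and
    utf-8-sig adds a BOM so Excel shows the Chinese text correctly.
    """
    with open(path, mode='w', encoding='utf-8-sig', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Usage inside the crawler (hypothetical): create rows = [] before the loops,
# append one dict per listing in the inner loop, e.g.
#   rows.append({'title': title, 'price': price, ..., 'href': index})
# and after both loops finish call:
#   save_listings('ershoufang.csv', rows)
```

Collecting rows and writing once at the end keeps the sketch simple. If a long crawl might be interrupted, opening the file before the loops and calling writer.writerow per listing would keep the rows already saved.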
        


Source: blog.csdn.net/Modeler_xiaoyu/article/details/128600887