[Python web crawler] Notes on the paid course "150 Lectures to Easily Master Python Web Crawlers", Chapter 12: Regex in Practice, Scraping Ganji.com Rental Listings

Others 2020-09-23 10:24:25

import requests
import re

def parse_page(page_url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
        'Cookie': 'ganji_uuid=3984569194922329389162; _gl_tracker=%7B%22ca_source%22%3A%22www.baidu.com%22%2C%22ca_name%22%3A%22-%22%2C%22ca_kw%22%3A%22-%22%2C%22ca_id%22%3A%22-%22%2C%22ca_s%22%3A%22seo_baidu%22%2C%22ca_n%22%3A%22-%22%2C%22ca_i%22%3A%22-%22%2C%22sid%22%3A35065526370%7D; ganji_xuuid=a9e45a92-73d5-4e3f-d7bf-278ee97c1527.1600652665525; GANJISESSID=p0u4fb9s622s632ur98hrcaqfp; citydomain=tj; ganji_login_act=1600652969366'
    }

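    resp = requests.get(page_url, headers=headers, timeout=10)  # timeout so a stalled request cannot block forever
    # print(resp.text)  # uncomment to inspect the raw HTML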

    text = resp.text
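    houses = re.findall(r"""
        <div.+?ershoufang-list"
        .+?                                      # skip ahead lazily; '.' matches any character
        <a.+?js-title.+?>
        (.+?)                                    # group 1: listing title
        </a>                                     # end of the title link
        .+?<dd.+?dd-item.+?<span>(.+?)</span>    # group 2: room layout
        .+?<span.+?<span>(.+?)</span>            # group 3: floor area
        .+?<div.+?price.+?<span.+?>(.+?)</span>  # group 4: rental price
    """, text, re.VERBOSE|re.DOTALL)  # '|' is bitwise OR, combining both flags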
    for house in houses:
        print(house)

def main():
    base_url = 'http://tj.ganji.com/zufang/pn{}/'
    for i in range(1, 10):
        page_url = base_url.format(i)
        parse_page(page_url)
        break  # stop after the first page while testing; remove to crawl pages 1-9

if __name__ == '__main__':
    main()


'''
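1. For . to match every character, pass re.DOTALL to the function; otherwise . does not match \n.
2. Capture data with non-greedy matching by adding ? after the quantifier.
3. If the regex is wrong, nothing is printed and the program can appear to hang.
4. If the regex will not work, do not get stuck on it; try a different approach.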

'''
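
Notes 1 and 2 are easy to check in isolation. The snippet below is a minimal sketch on a made-up two-line HTML string (the sample text is an assumption, not real Ganji.com markup): it shows that a pattern spanning a line break only matches with re.DOTALL, and how a greedy `.+` differs from a non-greedy `.+?`.

```python
import re

# Hypothetical two-line fragment; not taken from the real page.
html = "<span>2 rooms</span>\n<span>60 m2</span>"
pattern = r"<span>(.+?)</span>.+?<span>(.+?)</span>"

# Without re.DOTALL, '.' cannot cross the newline, so nothing matches.
without_dotall = re.findall(pattern, html)

# With re.DOTALL, '.' also matches '\n' and both groups are captured.
with_dotall = re.findall(pattern, html, re.DOTALL)

# Greedy '.+' runs to the last </span>; non-greedy '.+?' stops at the first one.
greedy = re.findall(r"<span>(.+)</span>", html, re.DOTALL)
lazy = re.findall(r"<span>(.+?)</span>", html, re.DOTALL)

print(without_dotall)
print(with_dotall)
print(greedy)
print(lazy)
```

The greedy version returns a single string that swallows the markup between the two spans, which is why the crawler above uses `.+?` throughout.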

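
For notes 3 and 4, one possible change of approach is to cut the page into one chunk per listing first and then run a small regex per field. A long pattern with many `.+?` parts can backtrack heavily across the whole page when any single part fails, which is one likely cause of the apparent hang; short patterns on short chunks avoid that, and a missing field yields None instead of dropping the whole listing. This is only a sketch: `parse_listings` and the sample markup are hypothetical, with class names borrowed from the pattern above.

```python
import re

def parse_listings(text):
    """Return (title, price) tuples, one per listing chunk."""
    # Split on the listing container; the first piece precedes any listing.
    chunks = re.split(r'<div[^>]+ershoufang-list"', text)[1:]
    rows = []
    for chunk in chunks:
        # Each field has its own small pattern, searched only inside this chunk.
        title = re.search(r'<a[^>]+js-title[^>]*>(.+?)</a>', chunk, re.DOTALL)
        price = re.search(r'<div[^>]+price.+?<span[^>]*>(.+?)</span>', chunk, re.DOTALL)
        rows.append((
            title.group(1).strip() if title else None,
            price.group(1).strip() if price else None,
        ))
    return rows

# Hypothetical markup: the second listing has no price block.
sample = (
    '<div class="f-list-item ershoufang-list">'
    '<a class="js-title" href="#">Sunny flat</a>'
    '<div class="price"><span class="num">1500</span></div>'
    '</div>\n'
    '<div class="f-list-item ershoufang-list">'
    '<a class="js-title" href="#">Quiet studio</a>'
    '</div>'
)

rows = parse_listings(sample)
for row in rows:
    print(row)
```

The room layout and floor area could be added the same way with one more `re.search` each, once the real markup for those fields has been checked in the browser.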
Origin blog.csdn.net/weixin_44566432/article/details/108707568