Learning Python: Common Built-in Modules (HTMLParser)

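Python's HTMLParser class, found in the standard-library html.parser module, makes it possible to extract text, image references, and other content from a web page. You subclass it and override callbacks such as handle_starttag and handle_data, which the parser invokes as it walks through the markup.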
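
As a rough illustration of that callback model, the sketch below collects visible text and image sources from a small inline HTML string. The class name TextAndImageCollector and the sample markup are made up for this example; only HTMLParser and its handle_starttag / handle_data hooks come from the standard library.

```python
from html.parser import HTMLParser


class TextAndImageCollector(HTMLParser):
    """Hypothetical helper: gathers visible text and <img> sources."""

    def __init__(self):
        super().__init__()
        self.texts = []   # non-empty text fragments, in document order
        self.images = []  # values of the src attribute on <img> tags

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.images.append(src)

    def handle_data(self, data):
        # Skip whitespace-only fragments between tags.
        text = data.strip()
        if text:
            self.texts.append(text)


collector = TextAndImageCollector()
collector.feed('<p>Hello <b>world</b></p><img src="logo.png" alt="logo">')
collector.close()
print(collector.texts)
print(collector.images)
```

Storing results on the instance instead of printing them inside the callbacks keeps the parser reusable and easier to test.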

Example

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

' HTMLParser '

__author__ = 'Kevin Gong'

from html.parser import HTMLParser
from urllib import request

class EventSearchParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.flag = 0  # state: 1 = inside a target tag, 0 = not a target tag

    def handle_starttag(self, tag, attrs):
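        if tag == 'h3' and ('class', 'event-title') in attrs:  # event name
            self.flag = 1
        # Check every attribute name; attrs[0] raises IndexError on a bare <time> tag.
        elif tag == 'time' and any(name == 'datetime' for name, _ in attrs):  # event date
            self.flag = 1
        elif tag == 'span' and ('class', 'event-location') in attrs:  # event location
            self.flag = 1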

    def handle_data(self, data):
        if self.flag:
            print(data)
            self.flag = 0  # reset the state after printing

with request.urlopen('https://www.python.org/events/python-events/') as f:
    data = f.read().decode('utf-8')

parser = EventSearchParser()
parser.feed(data)

Output:

PyCon CZ 2020 (canceled)
05 June – 07 June
Ostrava, Czech Republic
PyLondinium 2020 (postponed)
05 June – 07 June
London, UK
PyCon Odessa 2020
13 June – 14 June
Odessa, Ukraine
Python Web Conference 2020 (Online-Worldwide)
17 June – 19 June
https://2020.pythonwebconf.com
Better Python Unit Tests
23 June
Online
FlaskCon (online)
04 July – 05 July
Online
Python fwdays'20
23 May
Online
Python fwdays'20
16 May
Online
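
Because the script above depends on the live python.org page, its output changes over time. The following is a minimal offline sketch of the same flag technique, run against an inline HTML snippet and collecting results into a list instead of printing them. SAMPLE_HTML is hypothetical markup that only mimics the tag and class names matched above, and EventCollector is a name chosen for this example. Grouping the fields three at a time assumes every event supplies a name, a date, and a location, which the real page may not guarantee.

```python
from html.parser import HTMLParser

# Hypothetical markup that only mimics the tag and class names used above.
SAMPLE_HTML = '''
<ul>
  <li>
    <h3 class="event-title"><a href="/events/1/">PyCon Example</a></h3>
    <p><time datetime="2020-06-05T00:00:00+00:00">05 June – 07 June</time>
       <span class="event-location">Ostrava, Czech Republic</span></p>
  </li>
</ul>
'''


class EventCollector(HTMLParser):
    """Hypothetical variant of the parser above that stores fields in a list."""

    def __init__(self):
        super().__init__()
        self.capture = False  # True while waiting for the text of a target tag
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == 'h3' and attr_map.get('class') == 'event-title':
            self.capture = True
        elif tag == 'time' and 'datetime' in attr_map:
            self.capture = True
        elif tag == 'span' and attr_map.get('class') == 'event-location':
            self.capture = True

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace-only data so it does not consume the flag.
        if self.capture and text:
            self.fields.append(text)
            self.capture = False


collector = EventCollector()
collector.feed(SAMPLE_HTML)
collector.close()

# Assumption: fields arrive as (name, date, location) triples.
events = [tuple(collector.fields[i:i + 3])
          for i in range(0, len(collector.fields), 3)]
print(events)
```

Skipping whitespace-only data is a small change from the original script: it keeps indentation between a target tag and its text from resetting the flag before the real content arrives.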

Reposted from blog.csdn.net/duoduo_11011/article/details/106506261