Python: parsing HTML pages (HTMLParser)

HTMLParser: class definition and common methods

Class definition

  • HTMLParser (from the html.parser module) is mainly used to parse HTML documents, including documents whose markup is invalid.
  • The convert_charrefs parameter controls whether character references (such as &amp; or &#62;) are automatically converted to the corresponding Unicode characters. Since Python 3.5 the default is True.
  • HTMLParser is fed HTML text and, whenever it encounters a tag, text, or comment, automatically calls the matching handler method. To use it, create a subclass of HTMLParser and override the handler methods you need.
  • HTMLParser does not check whether start tags and end tags are paired.
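The points above can be sketched as follows. The subclass name EventLogger and its events list are made up for illustration; only HTMLParser, feed(), close(), and the handle_* methods come from the standard library. The input deliberately contains an unclosed <p> and a stray </span> to show that no pairing check takes place.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical subclass that records every handler call as a tuple."""

    def __init__(self):
        # convert_charrefs defaults to True since Python 3.5
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = EventLogger()
# <p> is never closed and </span> was never opened; no error is raised.
parser.feed('<div class="box"><p>Hello</span></div>')
parser.close()

for event in parser.events:
    print(event)
```

The parser simply reports each tag in the order it appears, so any check that tags are balanced would have to be written in the subclass, for example with a stack.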

Common methods
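A minimal sketch of some frequently used methods: feed() and close() drive the parser, while handle_startendtag(), handle_data(), handle_comment(), and handle_entityref() are handlers meant to be overridden. The class name Collector and its calls list are assumptions made for this example. Running the same input with convert_charrefs set to True and to False shows how the parameter changes which handlers are called.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Hypothetical subclass; the name and the `calls` list are illustrative."""

    def __init__(self, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.calls = []

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <br/>
        self.calls.append(("startend", tag))

    def handle_data(self, data):
        self.calls.append(("data", data))

    def handle_comment(self, data):
        self.calls.append(("comment", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False
        self.calls.append(("entityref", name))


source = "a &amp; b<br/><!-- note -->"

converting = Collector()              # convert_charrefs=True
converting.feed(source)
converting.close()

raw = Collector(convert_charrefs=False)
raw.feed(source)
raw.close()

print(converting.calls)
print(raw.calls)
```

With convert_charrefs=True the entity arrives already decoded inside the text passed to handle_data(); with False it is reported separately through handle_entityref().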

Examples of applications
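One possible application, given here as a sketch and not necessarily the example the original post used: collecting the href values of all links on a page. The class name LinkExtractor and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper that collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value is not None:
                self.links.append(value)


html = """
<ul>
  <li><a href="https://www.python.org/">Python</a></li>
  <li><a href="/docs?a=1&amp;b=2">Docs</a></li>
  <li><a name="anchor-only">No link</a></li>
</ul>
"""

extractor = LinkExtractor()
extractor.feed(html)
extractor.close()
print(extractor.links)
```

Character references inside attribute values are decoded before the handler sees them, so the second link is stored with a plain & character.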

Origin www.cnblogs.com/liuhaidon/p/12060184.html