Python third-party library: using html2text to convert HTML to Markdown

I had tried many libraries for converting HTML to Markdown, but several of them did not produce good results. Later I switched to html2text, and the output was noticeably better.

html2text works by using HTMLParser to parse the HTML tags one by one and emit the Markdown equivalent of each tag.
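To make the tag-by-tag idea concrete, here is a minimal sketch built on the standard library's html.parser.HTMLParser. It is not html2text's actual code: the names MiniMarkdownParser and to_markdown are made up for illustration, and it only handles a handful of tags (h1 to h3, p, b/strong, i/em, a).

```python
from html.parser import HTMLParser


class MiniMarkdownParser(HTMLParser):
    """Toy converter (hypothetical, not html2text itself)."""

    def __init__(self):
        super().__init__()
        self.out = []     # output fragments, joined at the end
        self.href = None  # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # <h2> becomes "## ", and so on
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in ("b", "strong"):
            self.out.append("**")
        elif tag in ("i", "em"):
            self.out.append("_")
        elif tag == "a":
            # Remember the target until the closing </a> arrives
            self.href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            # Block-level elements end with a blank line
            self.out.append("\n\n")
        elif tag in ("b", "strong"):
            self.out.append("**")
        elif tag in ("i", "em"):
            self.out.append("_")
        elif tag == "a":
            self.out.append("](%s)" % (self.href or ""))
            self.href = None

    def handle_data(self, data):
        # Plain text between tags is copied through unchanged
        self.out.append(data)


def to_markdown(html):
    parser = MiniMarkdownParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out).strip()


print(to_markdown('<h1>Title</h1><p>Hello <b>world</b>, see '
                  '<a href="https://example.com">this</a>.</p>'))
```

A real converter also has to deal with lists, code blocks, nested formatting, escaping and line wrapping, which is where most of the code in html2text.py goes.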

html2text installation

The html2text home page is:

http://www.aaronsw.com/2002/html2text/

That page hosts an online tool that converts web pages to Markdown. To use it in your own code, download html2text.py and put it into your project.

The GitHub repository is:

https://github.com/aaronsw/html2text

html2text usage

It is also simpler to use than the other libraries:

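import html2text

# HTML to convert (replace with your own article HTML)
article_content = "<h1>Title</h1><p>Hello, <b>world</b>!</p>"
markdown = html2text.html2text(article_content)
print(markdown)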

Sometimes html2text does not produce exactly the Markdown we need. Fortunately, html2text.py is not very complicated, so we can modify the source code to suit our needs.

Origin blog.csdn.net/weixin_40425640/article/details/124074494#comments_28535927