Preface
This is the 58th article in this column. I will keep sharing practical knowledge about Python crawlers, so stay tuned.
Anyone who has worked on crawler projects has probably scraped document data at some point, for example from government websites, news websites, or novel websites. I will not go into detail here about how to crawl such data. Instead, this article focuses on one specific problem: when the body of a crawled document contains tables, how to remove the tables and save only the body text.
For the implementation approach, read on. (Complete code attached.)
Main text
Address (Base64-encoded): aHR0cDovL2Znay5tb2YuZ292LmNuL3VpL3NyYy92aWV3cy9sYXdfaHRtbC82NDU0Ny5odG1s
Goal: Remove the tables from the body text and save the text content locally
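Before looking at the real page, the goal can be illustrated with a minimal sketch. It is not the complete code attached to this article, only one possible approach that uses the standard-library `html.parser` module. The names `TableStripper`, `strip_tables`, `save_text`, the file name `body.txt`, and the sample HTML are all hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TableStripper(HTMLParser):
    """Collect page text while skipping everything inside <table> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.table_depth = 0  # greater than 0 while inside one or more tables
        self.chunks = []      # non-empty text fragments found outside tables

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.table_depth == 0:
            self.chunks.append(text)


def strip_tables(html: str) -> str:
    """Return the text of `html`, one fragment per line, without table content."""
    parser = TableStripper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_text(text: str, path: str) -> None:
    """Write the extracted text to a local UTF-8 file."""
    Path(path).write_text(text, encoding="utf-8")


# Hypothetical sample standing in for the downloaded page source.
sample_html = """
<div>
  <p>Article 1. General provisions.</p>
  <table><tr><td>Item</td><td>Amount</td></tr></table>
  <p>Article 2. Scope.</p>
</div>
"""

body_text = strip_tables(sample_html)

if __name__ == "__main__":
    save_text(body_text, "body.txt")  # hypothetical output file name
```

The depth counter is used so that nested tables are skipped as a whole. This sketch only filters `<table>` elements; a real page may also need `<script>` and `<style>` content filtered out, and the text should be taken from the element that holds the document body rather than from the whole page.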
1. Problem description
As shown below: