BeautifulSoup's get_text() method returns no text or an empty string

Problem:

Today, while writing a crawler, I used BeautifulSoup to parse a page's HTML and ran into something strange: the output of .prettify() looked normal and contained the text data, but .get_text() returned nothing. I then tried .string and .text, and neither of them returned the text either.
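Before blaming the parser, it may help to rule out the normal differences between the three accessors. The following is a minimal sketch, not the code from my crawler: the sample HTML is made up, and it uses the built-in "html.parser" only so that it runs without extra installs.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the page from the original crawler.
html = "<div><p>Hello</p><p>World</p></div>"
soup = BeautifulSoup(html, "html.parser")

# get_text() concatenates every string found under the tag.
all_text = soup.div.get_text()

# .text is a property that returns the same thing as get_text().
text_prop = soup.div.text

# .string is None when a tag has more than one child,
# so a None here does not by itself mean the parse failed.
div_string = soup.div.string

# For a tag with exactly one string child, .string returns that string.
p_string = soup.p.string

print(repr(all_text), repr(text_prop), repr(div_string), repr(p_string))
```

If get_text() comes back empty even on a tag that .prettify() shows as containing text, as happened in my case, the accessors are probably not the cause.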

 

Solution:

After a long search I found no effective solution online. After a lot of trial and error I re-read the entire bs4 documentation and began to suspect that the parser was the problem, so I decided to try a different one. I switched to html5lib, which has to be installed with pip (pip install html5lib), and re-ran the code. This time the text came out normally. The whole thing cost me a few hours, so I am recording it here for later reference.
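The parser switch can be sketched roughly as follows. This is an illustration under assumptions, not my original code: the helper name first_nonempty_text, the sample HTML, and the order in which parsers are tried are all made up for the example. Parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def first_nonempty_text(html, parsers=("html.parser", "lxml", "html5lib")):
    """Try each parser in turn and return (parser_name, text) for the
    first one whose get_text() is non-empty.

    Hypothetical helper; the parser order is an assumption.
    """
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser is not installed (e.g. pip install html5lib).
            continue
        text = soup.get_text(strip=True)
        if text:
            return name, text
    # No available parser produced any text.
    return None, ""


# Hypothetical sample markup; a real page would come from the crawler.
sample_html = "<div><p>Hello</p></div>"
parser_name, text = first_nonempty_text(sample_html)
print(parser_name, repr(text))
```

In my case passing "html5lib" as the second argument to BeautifulSoup was enough. The loop is only a convenience for finding out which installed parser copes with a given page.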


Origin blog.csdn.net/THMAIL/article/details/108250466