Simple use of BeautifulSoup

The official documentation loads slowly (probably because of my own network connection):

https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html#find-parents-find-parent

1. Install BeautifulSoup4
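The package is published on PyPI as beautifulsoup4, so it is usually installed with `pip install beautifulsoup4`. As a rough sketch (the variable name `is_installed` is only for illustration), you can check from Python whether the `bs4` module is importable before going further:

```python
import importlib.util

# find_spec returns None when the module cannot be located,
# so this checks for bs4 without raising ImportError.
is_installed = importlib.util.find_spec("bs4") is not None

if is_installed:
    print("bs4 is available")
else:
    print("bs4 is missing; try: pip install beautifulsoup4")
```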

2. Import the module

from bs4 import BeautifulSoup

3. Use BeautifulSoup to get the text inside a tag

from bs4 import BeautifulSoup

# Sample rich-text fragment to parse
s1 = """
<p class="ql-align-justify">On the record sheet, the Rocket Girls members, including Meng Meiqi, Wu Xuanyi, Yang Chaoyue, Duan Aojuan, Yamy, Lai Meiyun, Zhang Zining and Li Ziting, all have at least height and weight data. Apart from Lai Meiyun, the others, such as Meng Meiqi and Wu Xuanyi, are 1-3 cm shorter than their official figures. For example, Yang Chaoyue's official height is 168 cm, but her measured height was 166.5 cm.</p>
"""
# Parse with Python's built-in HTML parser
bs = BeautifulSoup(s1, "html.parser")
# .text returns all text in the document with the tags stripped
print(bs.text)

Purpose: extract the plain text from an HTML document.
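When the HTML contains several tags, `get_text()` gives a bit more control than `.text` over how the pieces are joined. The following is a minimal sketch; the HTML string and the variable names are made up for illustration and do not come from the example above:

```python
from bs4 import BeautifulSoup

# Hypothetical input with nested and sibling tags
html = "<div><p>Hello, <b>world</b></p><p>Second paragraph</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Default: text nodes are concatenated with nothing between them
plain = soup.get_text()

# separator is placed between text nodes; strip=True trims each node
joined = soup.get_text(separator=" ", strip=True)

print(plain)
print(joined)
```

The second form is often easier to read when adjacent block-level tags would otherwise run together.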

4. Remove specified tags. Purpose: prevent XSS attacks through content submitted from a rich text box

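from bs4 import BeautifulSoup

# s1 is the HTML string defined in step 3
bs = BeautifulSoup(s1, "html.parser")
ret = bs.text
# print(ret)
# find_all() with no arguments returns every tag in the document
ret = bs.find_all()
print(ret)
for tag in bs.find_all():
    print(tag.name)
    # Remove <script> tags and their contents from the tree
    if tag.name == "script":
        tag.decompose()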
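The same idea can be wrapped in a small helper that returns the cleaned HTML. This is only a sketch: the function name `strip_dangerous`, its `blocked` parameter, and the sample input are hypothetical, and the extra pass that drops `on*` event attributes is my own addition rather than part of the code above. Removing `<script>` alone does not cover every XSS vector, so treat this as a starting point and not a complete sanitizer.

```python
from bs4 import BeautifulSoup


def strip_dangerous(html, blocked=("script",)):
    """Return html with blocked tags and inline event handlers removed."""
    soup = BeautifulSoup(html, "html.parser")

    # First pass: drop every blocked tag together with its contents
    for tag in soup.find_all(list(blocked)):
        tag.decompose()

    # Second pass (an assumption, not in the original post):
    # delete attributes such as onclick or onerror from the remaining tags
    for tag in soup.find_all(True):
        for attr in list(tag.attrs):
            if attr.lower().startswith("on"):
                del tag[attr]

    return str(soup)


# Hypothetical rich-text input
dirty = '<p onclick="steal()">Hello</p><script>alert("xss")</script><p>Bye</p>'
cleaned = strip_dangerous(dirty)
print(cleaned)
```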

 


Origin www.cnblogs.com/wt7018/p/11361183.html