BeautifulSoup module details

Installation: lxml, which serves as the parsing engine (parser)

soup = BeautifulSoup(html_doc, features="lxml")
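A minimal sketch of building the soup object. The sample `html_doc` is invented for illustration, and the fallback to the standard-library `html.parser` when lxml is not installed is an assumption added here, not part of the original notes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html_doc = """
<html><body>
<a id="link1" class="sister" href="/a">First</a>
<a id="link2" class="sister" href="/b">Second</a>
</body></html>
"""

# Prefer lxml as in the notes; fall back to the built-in parser if it is missing.
try:
    import lxml  # noqa: F401
    parser = "lxml"
except ImportError:
    parser = "html.parser"

soup = BeautifulSoup(html_doc, features=parser)
first_id = soup.a["id"]  # soup.a is the first <a> tag in the document
print(first_id)
```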

tag = soup.select('#link2'): look up tags with a CSS selector (returns a list of matches)

tag.name: gets the tag name
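A short sketch of the selector lookup and `tag.name`, assuming a small invented document. Note that `select` returns a list, while `select_one` returns the first match or `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html_doc = '<div><a id="link1" href="/a">First</a><a id="link2" href="/b">Second</a></div>'
soup = BeautifulSoup(html_doc, "html.parser")

tags = soup.select("#link2")     # list of every tag matching the CSS selector
tag = soup.select_one("#link2")  # first match only, or None if nothing matches

print(len(tags), tag.name, tag.get_text())
```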

children: direct children only; child tags and text content are different types (Tag vs. NavigableString)

descendants: all descendants (children, grandchildren, and so on)
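The difference between the two iterators can be sketched as follows, using an invented fragment. The type check shows that text nodes and tags come back as different classes.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical fragment: a text node followed by a nested tag.
soup = BeautifulSoup("<div>hello<p>one<b>two</b></p></div>", "html.parser")
div = soup.div

children = list(div.children)        # direct children: "hello" and <p>
descendants = list(div.descendants)  # "hello", <p>, "one", <b>, "two"

for child in children:
    kind = "tag" if isinstance(child, Tag) else "text"
    print(kind, repr(child))
```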

clear: empties the tag's contents but keeps the tag itself

decompose: deletes the tag together with its contents; the tag itself is not kept

extract: removes the tag from the tree and returns it
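A small sketch contrasting the three removal methods on an invented fragment with three paragraphs, one per method.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration.
html_doc = '<div><p id="a">x<b>y</b></p><p id="b">z</p><p id="c">w</p></div>'
soup = BeautifulSoup(html_doc, "html.parser")

soup.find(id="a").clear()              # contents removed, <p id="a"> stays
soup.find(id="b").decompose()          # tag and contents destroyed
removed = soup.find(id="c").extract()  # tag detached from the tree and returned

remaining = str(soup.div)
print(remaining)  # <div><p id="a"></p></div>
print(removed)    # <p id="c">w</p>
```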

encode: converts the object to bytes

decode: converts the object to a string
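For illustration, a minimal sketch of both conversions on a single invented tag. `encode()` uses UTF-8 unless another encoding is passed.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>hi</p>", "html.parser")
tag = soup.p

as_bytes = tag.encode()  # bytes, UTF-8 by default
as_text = tag.decode()   # str

print(type(as_bytes).__name__, type(as_text).__name__)
```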

recursive=True: search all descendants recursively (the default); recursive=False searches direct children only

soup.find(class_=''): class can also be written out via attrs; the trailing underscore in class_ avoids a conflict with Python's class keyword
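The two points above can be sketched like this, assuming an invented fragment in which the `<p>` is a grandchild of the `<div>`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: <p> is nested two levels below <div>.
html_doc = '<div><section><p class="note">deep</p></section></div>'
soup = BeautifulSoup(html_doc, "html.parser")
div = soup.div

deep = div.find("p")                      # recursive search finds the grandchild
shallow = div.find("p", recursive=False)  # direct children only, so nothing is found

by_keyword = soup.find(class_="note")           # class_ sidesteps the reserved word
by_attrs = soup.find(attrs={"class": "note"})   # equivalent lookup through attrs

print(deep, shallow, by_keyword == by_attrs)
```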

. is the wildcard (in regular expressions); it matches any character except the newline \n
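This note appears to refer to the regular-expression dot, which is how the sketch below reads it. It shows the behaviour with the standard `re` module and then a compiled pattern passed to `find_all`. The sample ids are invented.

```python
import re
from bs4 import BeautifulSoup

# "." matches any single character except a newline (unless re.DOTALL is set).
plain = re.fullmatch("a.c", "abc")
newline = re.fullmatch("a.c", "a\nc")
dotall = re.fullmatch("a.c", "a\nc", flags=re.DOTALL)

# A compiled pattern can be used as an attribute filter.
html_doc = '<a id="link1"></a><a id="link2"></a><a id="other"></a>'
soup = BeautifulSoup(html_doc, "html.parser")
matches = soup.find_all(id=re.compile(r"^link.$"))
ids = [t["id"] for t in matches]

print(ids)
```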

tag.get('id'): gets an attribute of the tag
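A brief sketch of attribute access on an invented tag. `get` behaves like `dict.get`, returning `None` (or a supplied default) when the attribute is absent.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a id="link2" href="/b">Second</a>', "html.parser")
tag = soup.a

tag_id = tag.get("id")               # existing attribute
missing = tag.get("title")           # absent attribute gives None
fallback = tag.get("title", "n/a")   # absent attribute with a default
all_attrs = tag.attrs                # every attribute as a dict

print(tag_id, missing, fallback, all_attrs)
```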

www.cnblogs.com/wupeiqi/articles/6283017.html

is_empty_element: whether the tag is an empty (self-closing) tag

tag.string: can be used not only to read but also to modify the tag's content
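Both properties can be sketched on an invented fragment containing a self-closing `<br/>` and an ordinary `<p>`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><br/><p>old</p></div>", "html.parser")

br_empty = soup.br.is_empty_element  # <br/> is a void element with no contents
p_empty = soup.p.is_empty_element    # <p> is a normal tag

before = str(soup.p.string)  # read the text content
soup.p.string = "new"        # assigning replaces the content
after = str(soup.p)

print(br_empty, p_empty, before, after)
```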

Create a tag: obj = Tag(name='div', attrs={'id': 'it'}) (Tag is imported from bs4.element)

jquery.cuishifeng.cn: a complete jQuery reference

tag.wrap(obj): wraps the tag inside obj

tag.unwrap(): removes the current tag but keeps the content it wraps
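A minimal sketch of creating a tag, wrapping, and unwrapping, on an invented fragment. It constructs the tag with `Tag(...)` as in the notes; `soup.new_tag('div', attrs={'id': 'it'})` is the factory method the library also offers and should behave the same way here.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment for illustration.
soup = BeautifulSoup('<p><a href="/b">link</a></p>', "html.parser")

# Create a standalone <div id="it"> tag.
obj = Tag(name="div", attrs={"id": "it"})

# Wrap the <a> tag inside the new <div>.
soup.a.wrap(obj)
after_wrap = str(soup)

# Remove the <a> tag itself while keeping its text in place.
soup.a.unwrap()
after_unwrap = str(soup)

print(after_wrap)
print(after_unwrap)
```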

 

Origin www.cnblogs.com/jintian/p/11403120.html