Python crawler - bs4 - BeautifulSoup

Usage workflow:
The core idea: convert the HTML document into a BeautifulSoup object, then call that object's
properties and methods to locate and search for specific content in the document.
Import the package: from bs4 import BeautifulSoup
Create the BeautifulSoup object:
- If the HTML comes from a local file:
BeautifulSoup(open('local html file'), 'lxml')
- If the HTML comes from the network:
BeautifulSoup('page data returned by the network request', 'lxml')
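
The two ways of building the object can be sketched as follows. This is a minimal example under a few assumptions: the HTML string and the temporary file are made up for illustration, a literal string stands in for the body of a real network response, and the built-in 'html.parser' is used in place of 'lxml' so the snippet runs without an extra install (pass 'lxml' instead if it is available).

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical page content used for both cases.
html = "<html><body><a href='/home' title='Home'>Home</a></body></html>"

# Case 1: local file. A throwaway file keeps the example self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(html)
    path = tmp.name

with open(path, encoding="utf-8") as fp:
    soup_from_file = BeautifulSoup(fp, "html.parser")
os.remove(path)

# Case 2: page data already fetched from the network.
# Here the literal string stands in for the response text.
soup_from_text = BeautifulSoup(html, "html.parser")

print(soup_from_file.a["href"])
print(soup_from_text.a.string)
```

Passing an open file handle or a string both work because BeautifulSoup accepts either as its first argument.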
- Properties and methods:
(1) Find by tag name
- soup.a returns only the first tag that matches
(2) Get attributes
- soup.a.attrs gets all of the tag's attributes and their values, returned as a dictionary
- soup.a.attrs['href'] gets the href attribute
- soup.a['href'] shorthand for the same lookup
(3) Get content
- soup.a.string similar to XPath /text()
- soup.a.text similar to XPath //text()
- soup.a.get_text() similar to XPath //text()
[Note] If the tag contains other tags alongside its text, string returns None,
while the other two still return the text content.
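
A small sketch of items (1) to (3), assuming a made-up HTML fragment and the built-in 'html.parser' (the notes use 'lxml'; either should behave the same for this input). It shows the difference between string and text when a tag mixes text with nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: <p> holds text plus a nested <a>.
html = '<p>intro <a href="/first" title="one">first link</a> outro</p>'
soup = BeautifulSoup(html, "html.parser")

link = soup.a                    # (1) first <a> in the document
link_attrs = link.attrs          # (2) all attributes as a dict
link_href = link["href"]         #     shorthand for link.attrs["href"]

link_string = link.string        # (3) <a> holds only text, so this is a string
para_string = soup.p.string      #     <p> mixes text and a tag, so this is None
para_text = soup.p.text          #     all text under <p>, nested tags included
para_get_text = soup.p.get_text()

print(link_attrs, link_href)
print(repr(link_string), repr(para_string), repr(para_text))
```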
(4) find: returns the first tag that matches
- soup.find('a') finds the first a tag
- soup.find('a', title="xxx")
- soup.find('a', alt="xxx")
- soup.find('a', class_="xxx")
- soup.find('a', id="xxx")
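
The find calls above can be tried on a small made-up fragment. This is only a sketch: the tag contents and attribute values are hypothetical, the alt filter is shown on an img tag because that is where alt usually appears, and 'html.parser' is used so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with attributes to filter on.
html = """
<a href="/a" title="first" id="top">A</a>
<a href="/b" class="nav" title="second">B</a>
<img alt="logo" src="logo.png">
"""
soup = BeautifulSoup(html, "html.parser")

first_a = soup.find("a")                    # first <a> in document order
by_title = soup.find("a", title="second")   # filter on the title attribute
by_class = soup.find("a", class_="nav")     # class_ because class is a keyword
by_id = soup.find("a", id="top")
by_alt = soup.find("img", alt="logo")
missing = soup.find("a", title="nope")      # no match returns None

print(first_a["href"], by_title["href"], by_class["href"], by_id["href"])
print(by_alt["src"], missing)
```

Since find returns None when nothing matches, it is worth checking the result before indexing into it.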
(5) find_all: returns all tags that match
- soup.find_all('a')
- soup.find_all(['a', 'b']) finds all a and b tags
- soup.find_all('a', limit=2) returns only the first two matches
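
A minimal sketch of the three find_all forms, again on a made-up fragment parsed with 'html.parser':

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: three <a> tags and one <b> tag.
html = (
    "<p><a href='/1'>one</a> <b>bold</b> "
    "<a href='/2'>two</a> <a href='/3'>three</a></p>"
)
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")              # every <a>
a_and_b = soup.find_all(["a", "b"])         # every <a> or <b>, in document order
first_two = soup.find_all("a", limit=2)     # stop after two matches

link_hrefs = [tag["href"] for tag in all_links]
mixed_names = [tag.name for tag in a_and_b]
first_two_hrefs = [tag["href"] for tag in first_two]

print(link_hrefs, mixed_names, first_two_hrefs)
```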
(6) select: select content with a CSS selector
- soup.select('#feng')
- Common selectors: tag selector (a), class selector (.), id selector (#), hierarchy selectors
- Hierarchy selectors:
div .dudu #lala .meme .xixi matches descendants at any depth, like XPath div//img
div > p > a > .lala matches direct children only, like XPath div/img
[Note] select always returns a list; use an index to extract a specific object.
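
The difference between the two hierarchy selectors can be sketched like this. The fragment is made up (the id feng and the classes dudu and lala are borrowed from the notes purely as placeholder names), and 'html.parser' is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one .lala nested inside <p>, one directly under <div>.
html = """
<div id="feng">
  <p class="dudu"><a class="lala" href="/x">x</a></p>
  <a class="lala" href="/y">y</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_id = soup.select("#feng")             # id selector, still returns a list
descendants = soup.select("div .lala")   # space: any depth below <div>
children = soup.select("div > .lala")    # >: direct children of <div> only
chain = soup.select("div > p > a")       # each step is a direct child

descendant_hrefs = [tag["href"] for tag in descendants]
child_hrefs = [tag["href"] for tag in children]
first_chain_href = chain[0]["href"]      # index into the list to get one tag

print(len(by_id), descendant_hrefs, child_hrefs, first_chain_href)
```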

Origin www.cnblogs.com/person1-0-1/p/11320392.html