9 XPath *** (the preferred parsing method; XPath is also available in other languages)
9.1 Fetch the page: data_text = requests.get(url, params=param, headers=header).text
9.2 Build the tree: tree = etree.parse(file_path, etree.HTMLParser()) for a local HTML file, or tree = etree.HTML(data_text) for page text already in memory.
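9.3 Query with tree.xpath(), which always returns a list: tree.xpath('//tag[@attr="value"]/a/text()')[0] takes the text of the first matching <a>, tree.xpath('//tag/@attr') returns attribute values, and | combines two expressions in one query, e.g. '//tag/a/text() | //tag/@attr'.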
9.4 For paging, pass the page number through the param dict (the params argument of requests.get).
9.5 A link starting with "./" is relative to the current directory, so the new URL has to be built by joining it onto the previous URL (URL1 + URL2).
9.6 If the fetched data is garbled: at the spot where the garbled text appears, re-decode it with img_name.encode('iso-8859-1').decode('gbk'). When that has no effect, set response.encoding = "utf-8" (or whatever charset the page actually uses) globally on the response instead. Do not apply both fixes at once.
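Step 9.2 can be sketched as follows. It contrasts the two entry points: etree.HTML() takes HTML text that is already in memory (such as data_text from step 9.1), while etree.parse() takes a file path or file object and, unless given a parser, treats the input as XML. The sample markup and the temporary file are made up purely for illustration.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample page, used for both the in-memory and the on-disk case
html_text = "<html><body><div class='song'><p>hello</p></div></body></html>"

# Text already in memory (e.g. response.text): etree.HTML parses a string
tree_from_text = etree.HTML(html_text)

# A file on disk: etree.parse takes a path; its default parser is the XML parser,
# so pass etree.HTMLParser() to tolerate real-world, non-well-formed HTML
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
    f.write(html_text)
    path = f.name
try:
    tree_from_file = etree.parse(path, etree.HTMLParser())
finally:
    os.remove(path)

print(tree_from_text.xpath("//p/text()"))  # ['hello']
print(tree_from_file.xpath("//p/text()"))  # ['hello']
```

Both objects support .xpath(), so the queries in step 9.3 work the same way on either.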
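A minimal sketch of the query forms in step 9.3, run against a hypothetical snippet (the class name, link text and URLs are invented). It shows that xpath() returns a list, that text() and @attr select text and attribute values, and that | merges two queries.

```python
from lxml import etree

# Hypothetical snippet; tag names, classes and URLs are made up for illustration
HTML = """
<div class="song">
  <a href="/poem/1">Jing Ye Si</a>
  <img src="/img/moon.jpg" alt="moon"/>
</div>
"""

tree = etree.HTML(HTML)

# xpath() always returns a list, even when there is only one match
titles = tree.xpath('//div[@class="song"]/a/text()')
# Guard the [0]: indexing an empty result list raises IndexError
first_title = titles[0] if titles else None

# "@name" selects attribute values instead of elements
src = tree.xpath('//div[@class="song"]/img/@src')

# "|" unions two expressions; the results come back in document order
links = tree.xpath('//div[@class="song"]/a/@href | //div[@class="song"]/img/@src')

print(first_title)  # Jing Ye Si
print(src)          # ['/img/moon.jpg']
print(links)        # ['/poem/1', '/img/moon.jpg']
```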
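A rough sketch of steps 9.4 and 9.5 using only the standard library. BASE_URL, the "page" parameter name and the link detail_42.html are hypothetical. page_url() builds the URL that requests should produce from a params dict, and join_manual() is the "URL1 + URL2" concatenation, compared with urllib.parse.urljoin, which also resolves "../" and absolute links.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical listing URL and query parameter name; a real site defines its own
BASE_URL = "https://example.com/news/index.html"


def page_url(page: int) -> str:
    # requests.get(BASE_URL, params={"page": page}) requests this same URL
    return BASE_URL + "?" + urlencode({"page": page})


def join_manual(current: str, href: str) -> str:
    # "URL1 + URL2": directory part of the current URL plus the link without "./"
    # Only correct for plain relative links such as "./a.html" or "a.html"
    directory = current[: current.rfind("/") + 1]
    return directory + (href[2:] if href.startswith("./") else href)


print(page_url(2))                                # https://example.com/news/index.html?page=2
print(join_manual(BASE_URL, "./detail_42.html"))  # https://example.com/news/detail_42.html
print(urljoin(BASE_URL, "./detail_42.html"))      # https://example.com/news/detail_42.html
```

Manual concatenation is fine for simple "./" links; urljoin is the safer choice once links may start with "../" or "/".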
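The two fixes in step 9.6 can be reproduced offline. The sketch assumes a GBK-encoded page whose Content-Type header carries no charset, in which case requests decodes response.text as ISO-8859-1; the sample string is invented. The global fix only helps when the charset you set matches the page, so for this GBK example it would be "gbk" rather than "utf-8".

```python
# Simulate the problem: GBK bytes decoded as ISO-8859-1, the charset requests
# falls back to for text/* responses whose Content-Type has no charset
raw = "风景图片".encode("gbk")        # bytes as the server sends them (sample text)
img_name = raw.decode("iso-8859-1")  # what response.text would hold: garbled


def fix_garbled(text: str, wrong: str = "iso-8859-1", right: str = "gbk") -> str:
    # Local fix: turn the garbled str back into the original bytes, then decode correctly
    return text.encode(wrong).decode(right)


# Global fix: decode the raw bytes with the right charset up front; setting
# response.encoding before reading response.text has this effect
decoded_globally = raw.decode("gbk")

print(fix_garbled(img_name))  # 风景图片
print(decoded_globally)       # 风景图片
```

This also shows why the fixes should not be combined: once the text is decoded with the right charset, encoding it back to ISO-8859-1 fails on the Chinese characters.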
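```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from lxml import etree

if __name__ == "__main__":
    # HTMLParser tolerates HTML that is not well-formed XML
    tree = etree.parse('test.html', etree.HTMLParser())
    # Each query below is a separate example; r is overwritten, so only the last is printed
    # Absolute path: div children of <body>
    r = tree.xpath('/html/body/div')
    # "//" in the middle: div elements at any depth below <html>
    r = tree.xpath('/html//div')
    # Leading "//": div elements anywhere in the document
    r = tree.xpath('//div')
    # Attribute predicate: divs whose class is exactly "song"
    r = tree.xpath('//div[@class="song"]')
    # Text of the first <a> under an li that is the 5th li of its parent (indices start at 1)
    r = tree.xpath('//div[@class="tang"]//li[5]/a/text()')[0]
    # All text nodes, including nested tags, under any li that is the 7th li of its parent
    r = tree.xpath('//li[7]//text()')
    # All text nodes anywhere under div.tang
    r = tree.xpath('//div[@class="tang"]//text()')
    # "/@src" returns attribute values instead of elements
    r = tree.xpath('//div[@class="song"]/img/@src')
    print(r)
```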
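The positional queries in the script above are easy to misread, so here is a small sketch against hypothetical markup (two <ul> lists of three items each, standing in for test.html). li[5] counts among siblings, not across the whole document; wrapping the path in parentheses changes that. It also contrasts /text() with //text().

```python
from lxml import etree

# Hypothetical markup standing in for test.html: two <ul> lists of three <li> each
HTML = """
<div class="tang">
  <ul><li><a>one</a></li><li><a>two</a></li><li><a>three</a></li></ul>
  <ul><li><a>four</a></li><li><a>five</a></li><li>six <b>bold</b></li></ul>
</div>
"""

tree = etree.HTML(HTML)

# li[5] means "an li that is the 5th li child of its parent": no <ul> here has five
per_parent = tree.xpath('//div[@class="tang"]//li[5]/a/text()')
# Parentheses apply the index to the whole result, in document order
in_document = tree.xpath('(//div[@class="tang"]//li)[5]/a/text()')

# /text() returns only direct text children; //text() also descends into <b>
direct = tree.xpath('(//li)[6]/text()')
nested = tree.xpath('(//li)[6]//text()')

print(per_parent, in_document)  # [] ['five']
print(direct, nested)           # ['six '] ['six ', 'bold']
```

An empty list from a positional query usually means the index counts siblings, not all matches; calling [0] on it would raise IndexError.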