    tree = etree.parse(page_text,parser=parser)
  File "src\lxml\etree.pyx", line 3521, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1859, in lxml.etree._parseDocument
  File "src\lxml\parser.pxi", line 1885, in lxml.etree._parseDocumentFromURL
  File "src\lxml\parser.pxi", line 1789, in lxml.etree._parseDocFromFile
  File "src\lxml\parser.pxi", line 1177, in lxml.etree._BaseParser._parseDocFromFile
  File "src\lxml\parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 652, in lxml.etree._raiseParseError
OSError: Error reading file
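Cause
Calling etree.parse(page_text) directly on HTML fetched from the web, rather than on a file, raises this error. etree.parse() expects a file name, URL, or file-like object, so it treats the HTML string as a path and fails to read it: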
tree = etree.parse(page_text,parser=parser)
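The failure can be reproduced without a network request. The following is a minimal sketch, assuming a short hard-coded string as a stand-in for the crawled page (the sample markup and the names error and root_tag are illustrative, not from the original code). It also shows that the same text is accepted once it is wrapped in a file-like object:

```python
from io import StringIO

from lxml import etree

# Hypothetical stand-in for HTML downloaded from the web.
page_text = "<html><body><ul class='house-list-wrap'><li>A</li></ul></body></html>"
parser = etree.HTMLParser()

# A plain str is interpreted as a file name or URL, so parse() fails to read it.
try:
    etree.parse(page_text, parser=parser)
    error = None
except OSError as exc:
    error = exc

# Wrapping the same text in a file-like object gives parse() something it can read.
tree = etree.parse(StringIO(page_text), parser=parser)
root_tag = tree.getroot().tag
```

Here error holds the OSError from the first call, while the second call returns an element tree whose root is the html element.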
Solution:
Parse the HTML fetched from the web with etree.HTML() first, then use xpath() on the result to extract the data:
html = etree.HTML(page_text)
content_list = html.xpath("//ul[@class='house-list-wrap']/li")
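To check the fix end to end, here is a small self-contained sketch. The sample markup, the h2 elements inside each li, and the titles variable are assumptions made for illustration; only the etree.HTML() call and the XPath expression come from the snippet above.

```python
from lxml import etree

# Hypothetical sample standing in for the crawled page.
page_text = """
<html><body>
  <ul class="house-list-wrap">
    <li><h2>Listing A</h2></li>
    <li><h2>Listing B</h2></li>
  </ul>
</body></html>
"""

# etree.HTML() parses a string directly and returns the root element.
html = etree.HTML(page_text)

# Select every <li> under the listing container.
content_list = html.xpath("//ul[@class='house-list-wrap']/li")

# Relative XPath on each <li>; string() returns the text of the first <h2>.
titles = [li.xpath("string(./h2)").strip() for li in content_list]
```

Each item in content_list is an element, so further xpath() calls starting with ./ can be run on it to pull out individual fields.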