While learning Matplotlib, I have also been slowly working through Cui Qingcai's crawler tutorial.
One of its examples uses the etree.parse method of the lxml parsing library to load the local text file ./test.html.
(Why is the file name in the example written as ./test.html? What is the extra ./ for? Searching Baidu turned up nothing, so please answer if you know. Thank you!)
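On the question of the extra ./: in a file path, ./ simply means "the current working directory", so ./test.html and test.html refer to the same file. Which file that is depends on the directory the script is run from, not on where the script itself is stored. A minimal sketch using only the standard library (the helper name resolve is made up for this illustration):

```python
import os


def resolve(relative_path):
    """Return the absolute path that a relative path currently refers to."""
    return os.path.abspath(relative_path)


with_dot = resolve("./test.html")
without_dot = resolve("test.html")

# Both forms are resolved against the current working directory.
print("working directory:", os.getcwd())
print("./test.html ->", with_dot)
print("test.html   ->", without_dot)
print("same file:", with_dot == without_dot)  # prints "same file: True"
```

If the printed working directory is not the folder that holds test.html, a relative path like ./test.html will not be found.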
The reference code is as follows:
    from lxml import etree

    html = etree.parse('./test.html', etree.HTMLParser())
    result = etree.tostring(html)
    print(result.decode('utf-8'))
Running it produced a file load error.
This reminded me of my earlier post, "Using Python's with open function to load and read py's local current directory file problem".
So I put the file's full path into the code, but the load error still occurred.
It seemed that the way I had written the directory path was wrong, so I entered the HTML file's path again, this time copying the path style shown in the error message.
    from lxml import etree

    html = etree.parse('D:/python3.6/scrapy/./test.html', etree.HTMLParser())
    result = etree.tostring(html)
    print(result.decode('utf-8'))
This time the file loaded and parsed successfully.
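One way to avoid typing the full path by hand is to build it from the script's own location, so the result no longer depends on the working directory. The following is only a sketch, assuming test.html sits in the same folder as the script; the function names resolve_beside and load_html are hypothetical and not part of lxml or the tutorial.

```python
from pathlib import Path


def resolve_beside(anchor, name="test.html"):
    """Return the absolute path of `name` in the folder that contains `anchor`."""
    return Path(anchor).resolve().parent / name


def load_html(path):
    """Parse an HTML file with lxml and return the serialized document as text."""
    # Imported here so the path helper above can be used without lxml installed.
    from lxml import etree

    path = Path(path)
    if not path.is_file():
        # Fail early with the exact path that was tried.
        raise FileNotFoundError(f"HTML file not found: {path}")
    tree = etree.parse(str(path), etree.HTMLParser())
    return etree.tostring(tree).decode("utf-8")


# Typical use inside a script saved next to test.html:
#     print(load_html(resolve_beside(__file__)))
```

When a Windows path is written by hand, forward slashes (as in the working example above) or a raw string such as r'D:\python3.6\scrapy\test.html' keep Python from treating backslashes as escape characters, which may be what went wrong with the first full-path attempt.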
The test.html text code for local testing is as follows:
    <div>
        <ul>
            <li class="item-0"><a href="link1.html">first item</a></li>
            <li class="item-1"><a href="link2.html">second item</a></li>
            <li class="item-inactive"><a href="link3.html">third item</a></li>
            <li class="item-1"><a href="link4.html">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a>
        </ul>
    </div>