08.06 self-summary
Python web crawler: the lxml parsing module
I. Installing the module
Windows installation:
Method one: pip3 install lxml
Method two: download the wheel that matches your system and Python version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml, then install it:
pip3 install lxml-4.2.1-cp36-cp36m-win_amd64.whl  # give the path where the file is located
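Linux installation:
Method one: pip3 install lxml
Method two (if method one fails to build): install the development libraries first, then run pip3 install lxml again:
yum install -y epel-release libxslt-devel libxml2-devel openssl-devel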
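To confirm that either installation method worked, a quick check is to import the module and print its version. This is only a minimal sketch: it assumes lxml is installed for the same interpreter that pip3 targets, and the version printed depends on what was installed.

```python
from lxml import etree

# LXML_VERSION and LIBXML_VERSION are tuples of integers exposed by lxml.etree.
# If the import itself fails, the installation did not succeed for this interpreter.
lxml_version = etree.LXML_VERSION
libxml_version = etree.LIBXML_VERSION

print("lxml version:", lxml_version)
print("libxml2 version:", libxml_version)
```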
II. Using the module
from lxml import etree
Example:
import requests
from lxml import etree
# Fetch the page, then parse the HTML text into an element tree
rp = requests.get('http://www.baidu.com')
html = etree.HTML(rp.text)
# The parsed object can be queried with XPath to match content
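As an illustration of XPath matching on a parsed object, the sketch below parses a small inline HTML string instead of a live page, so it runs without network access. The markup, the `SAMPLE_HTML` name, and the XPath expressions are made up for this example and are not taken from any real site.

```python
from lxml import etree

# Hypothetical sample markup, standing in for rp.text from a real request.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="links">
      <a href="/news">News</a>
      <a href="/maps">Maps</a>
    </div>
  </body>
</html>
"""

html = etree.HTML(SAMPLE_HTML)

# xpath() returns a list for these expressions, even when only one node matches.
title = html.xpath('//title/text()')                 # text inside <title>
hrefs = html.xpath('//div[@id="links"]/a/@href')     # href attribute of each link
labels = html.xpath('//div[@id="links"]/a/text()')   # visible text of each link

print(title)   # ['Sample page']
print(hrefs)   # ['/news', '/maps']
print(labels)  # ['News', 'Maps']
```

With a real page, the same calls apply to the object returned by etree.HTML(rp.text). Only the XPath expressions need to change to fit that page's structure.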