Notes on parsing libraries for Python web scraping

一、Using the XPath library (lxml):

  1、Basic rules:
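    The rule table that originally appeared here did not survive. As a rough substitute, the short sketch below runs the standard XPath selectors (nodename, /, //, .., @, text()) against a small hand-written fragment; the markup is invented purely for illustration.

from lxml import etree

text = '<div><ul><li class="item-0"><a href="link1.html">first</a></li></ul></div>'
html = etree.HTML(text)                 # build an element tree from the string

print(html.xpath('//li'))               # //  selects all <li> descendants anywhere in the tree
print(html.xpath('//li/a'))             # /   selects <a> elements that are direct children of <li>
print(html.xpath('//a/@href'))          # @   selects the href attribute of each <a>
print(html.xpath('//a/text()'))         # text() selects the text inside each <a>
print(html.xpath('//a/..'))             # ..  selects the parent of each <a>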

    

  2、Converting a file into an HTML object:

    

from lxml import etree

html = etree.parse('./test.html', etree.HTMLParser())   # parse a local file with the HTML-tolerant parser
result = etree.tostring(html)                            # serialize the repaired tree back to bytes
print(result.decode('utf-8'))                            # decode to UTF-8 text before printing
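    The object returned by etree.parse() accepts the same .xpath() queries as a tree built from a string. A small hedged example; the element names are only a guess about what ./test.html contains.

from lxml import etree

html = etree.parse('./test.html', etree.HTMLParser())
# The expression below assumes test.html has <li>/<a> elements; adjust it to the real file.
print(html.xpath('//li/a/text()'))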

  3、Matching multi-valued attributes:

    //a[contains(@class,'li')]
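    A self-contained sketch of the contains() form above; the sample markup is invented for illustration.

from lxml import etree

# class carries two values, so an exact @class="li" test would fail;
# contains() matches as long as "li" appears in the attribute value.
text = '<div><a class="li li-first" href="link.html">first item</a></div>'
html = etree.HTML(text)
print(html.xpath('//a[contains(@class, "li")]/text()'))   # -> ['first item']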

  4、Matching on multiple attributes:

      //a[@class="a" and @font="red"]
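    With and, every condition must hold for an element to be selected. A minimal sketch, again with invented markup; the non-standard font attribute is kept only to mirror the expression above.

from lxml import etree

text = '<div><a class="a" font="red">matches</a><a class="a" font="blue">skipped</a></div>'
html = etree.HTML(text)
print(html.xpath('//a[@class="a" and @font="red"]/text()'))   # -> ['matches']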

  5、Selecting by position:
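    The examples that originally followed this heading are missing. The sketch below shows the usual positional predicates; note that XPath indexing starts at 1, and the markup is invented for illustration.

from lxml import etree

text = '<ul><li>1</li><li>2</li><li>3</li><li>4</li></ul>'
html = etree.HTML(text)
print(html.xpath('//li[1]/text()'))               # first <li> (counting starts at 1)
print(html.xpath('//li[last()]/text()'))          # last <li>
print(html.xpath('//li[position() < 3]/text()'))  # first two <li>
print(html.xpath('//li[last() - 1]/text()'))      # second-to-last <li>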

    

二、Using the BeautifulSoup library:

  1、Basic initialization:

    

    This parses the HTML string with the lxml parser, completes any missing or unclosed tags, and creates a soup object for further processing.
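    The initialization code itself is missing here; a minimal sketch of what the description refers to, using a deliberately broken HTML string as a stand-in.

from bs4 import BeautifulSoup

html = "<html><head><title>The Dormouse's story</title></head><body><p class='title'>story"
soup = BeautifulSoup(html, 'lxml')   # parse with the lxml parser, repairing the unclosed tags
print(soup.prettify())               # the completed, re-indented document
print(soup.title.string)             # -> The Dormouse's story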

  2、Extracting information:

    (1)Getting the name of the title tag:

      soup.title.name
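      Here .name returns the tag's own name (the string 'title'), not an HTML attribute. A quick self-contained check:

from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><head><title>The Dormouse's story</title></head></html>", 'lxml')
print(soup.title.name)     # -> title
print(soup.title.string)   # -> The Dormouse's story (the tag's text, for comparison)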

    (2)Getting multiple attributes:
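      The example that belonged here is missing; a sketch of the usual ways to read a tag's attributes, with markup invented for illustration. Note that multi-valued attributes such as class come back as a list.

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="title body" name="intro">text</p>', 'lxml')
print(soup.p.attrs)           # all attributes as a dict: {'class': ['title', 'body'], 'name': 'intro'}
print(soup.p.attrs['name'])   # -> intro
print(soup.p['class'])        # -> ['title', 'body']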

      

    



Reposted from www.cnblogs.com/monty12/p/9960572.html