XPath is a way to search an HTML page and filter out the data we need. The result of a query is a list.
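To illustrate the list result, here is a minimal sketch. It assumes lxml is installed and parses a hypothetical two-item snippet from a string with etree.HTML; the snippet and the variable names are made up for this illustration.

```python
from lxml import etree

# Hypothetical two-item snippet, used only for this illustration.
html = '<ul><li class="a">first</li><li class="b">second</li></ul>'
tree = etree.HTML(html)  # lenient HTML parser, returns the root <html> element

items = tree.xpath('//li')                  # a list of element objects
texts = tree.xpath('//li/text()')           # a list of strings
missing = tree.xpath('//li[@class="zzz"]')  # no match gives an empty list, not None

print(len(items), texts, missing)

# Because the result is a list, check it before indexing into it.
first_text = texts[0] if texts else None
print(first_text)
```

Indexing an empty result raises IndexError, so a guard like the last two lines is usually worth having when a page may not contain the node being looked for.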
The HTML page to be filtered:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8"/>
    <title>XPath test</title>
</head>
<body>
    <div class="song">
        Gunpowder
        <b>Compass</b>
        <b>Papermaking</b>
        <b>Printing</b>
    </div>
    <div class="tang">
        <ul>
            <li class="balove">Parking Fenglin love to sit late, leaves red flowers in February</li>
            <li id="hua">Business subjugation of women do not know hate, across the rear garden flowers still sing</li>
            <li class="love" name="yang">A ride Red Feizixiao, no one knows is to litchi</li>
            <li id="bei">Magic Cup grape wine, to drink immediately pipa reminder</li>
            <li><a href="http://www.baidi.com">Baidu</a></li>
        </ul>
        <ol>
            <li class="lucy">Searching, desolate, desolately sad</li>
            <li class="balily">Ye warm also when cold, most difficult to rate</li>
            <li class="lilei">Cups light wine</li>
            <li>How the enemy he later arrivals, winds</li>
            <li>Yan had also, is sad, but it is old acquaintance</li>
            <li>Love is a word, I only say this once</li>
            <li>loved, do not regret it, Bao Dai</li>
        </ol>
    </div>
</body>
</html>
XPath example:
# Look up content in the local file xpath.html
from lxml import etree

# Build the tree object
tree = etree.parse('xpath.html')
# print(tree)

# First li under the ul inside div.tang
ret = tree.xpath('//div[@class="tang"]/ul/li[1]')
# ret is a list
print(ret[0].text)

ret = tree.xpath('//div[@class="tang"]/ul/li[1]/text()')
print(ret)

# The href attribute of the Baidu link
baidu = tree.xpath('//div[@class="tang"]/ul/li[5]/a/@href')
print(baidu)

# Logical and
luoji = tree.xpath('//div[@class="tang"]/ul/li[@class="love" and @name="yang"]/text()')
print(luoji)

# Fuzzy matching with contains
mohu = tree.xpath('//li[contains(@class,"l")]')
print(mohu)
mohu1 = tree.xpath('//li[contains(text(),"love")]/text()')
print(mohu1)

# Prefix matching with starts-with
start = tree.xpath('//li[starts-with(@class,"ba")]/text()')
print(start)

# Fetch all the text inside div.song
text = tree.xpath('//div[@class="song"]')
string = text[0].xpath('string(.)')
print(string.replace('\n', '').replace('\t', ''))  # remove all line breaks and tabs
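A note on the last step: text() and string(.) do not return the same thing. The sketch below assumes lxml is installed and uses a hypothetical one-line fragment modelled on the "song" div, parsed from a string with etree.HTML rather than from a file. It is only meant to show the difference.

```python
from lxml import etree

# Hypothetical fragment modelled on the "song" div; not the original file.
html = '<div class="song">Gunpowder, <b>Compass</b>, <b>Papermaking</b></div>'
tree = etree.HTML(html)
div = tree.xpath('//div[@class="song"]')[0]

direct = div.xpath('./text()')   # only the text nodes that are direct children of the div
nested = div.xpath('.//text()')  # text nodes at any depth, still returned as a list
joined = div.xpath('string(.)')  # a single string with all descendant text concatenated

print(direct)
print(nested)
print(joined)
```

Here ./text() skips the text inside the b tags, .//text() returns every piece as a separate list item, and string(.) returns one string, which is why the replace calls can be applied to it directly.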
Filtered results: