xpath html document processing

Xpath deal with html files, you need to first convert html files to an XML file, and then find a node or HTML elements with xpath. xpath resolved object is <class 'lxml.etree._Element'> type
crawlers pages processed in two ways:
1 then crawlers, data acquisition and data cleaning issues with the HTML ()
2 separate data acquisition and data cleaning, the parse () and the local html file cleaning

Data acquisition and data cleansing issues with HTML ()

import requests
from lxml import etree

page_info = requests.get("https://www.liepin.com/zhaopin/?init=1&imscid=R000000058&d_sfrom=search_fp_bar&key=%E8%B4%A8%E9%87%8F").content.decode()
# print(type(page_info))
p

Guess you like

Origin blog.csdn.net/Mwyldnje2003/article/details/103543877