Fixing garbled Chinese text in lxml XPath results

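Building an element tree with lxml's etree and querying it with XPath makes matching elements efficient, but etree.tostring() serializes to ASCII by default. Chinese characters therefore come out as numeric character references (for example &#20013; instead of 中), which looks garbled. The fix is to pass an explicit encoding to tostring() and decode the resulting bytes.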
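The symptom can be reproduced without a network request. To keep the demo self-contained, this sketch swaps in Python's standard-library xml.etree.ElementTree instead of lxml (an assumption for illustration); its tostring() also defaults to US-ASCII and escapes every non-ASCII character as a numeric character reference. The <p> snippet is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet containing Chinese text (中文 = "Chinese")
elem = ET.fromstring('<p>中文</p>')

# Default encoding is US-ASCII: each Chinese character becomes a &#NNNNN; reference
garbled = ET.tostring(elem)

# An explicit UTF-8 encoding keeps the characters; decode the bytes to get a str
fixed = ET.tostring(elem, encoding='utf-8').decode('utf-8')

print(garbled)  # b'<p>&#20013;&#25991;</p>'
print(fixed)    # <p>中文</p>
```

The references are not lost data, only an ASCII-safe spelling of the same characters, which is why changing the serialization encoding is enough to fix the output.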

Solution:

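```python
import requests
from requests.exceptions import RequestException
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15',
}


def get_one_page(url, headers):
    """Fetch a page and return its text, or None on any failure."""
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            # Let requests guess the real charset (e.g. GBK or UTF-8) from the body
            response.encoding = response.apparent_encoding
            return response.text
        return None
    except RequestException:
        return None


url = 'https://example.com/'  # hypothetical: replace with the page to scrape
exp = '//div'                 # hypothetical: replace with your XPath expression

html = get_one_page(url, headers)
if html is not None:
    tree = etree.HTML(html)
    aim = tree.xpath(exp)
    for i in aim:
        # encoding='utf-8' keeps Chinese as real characters instead of &#...; references;
        # tostring() then returns bytes, so decode them back to str
        content = etree.tostring(i, encoding='utf-8', pretty_print=True, method="html").decode('utf-8')
        print(content)
```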

Original post: www.cnblogs.com/Rhythm-/p/11374832.html