Completely solving garbled text when writing crawlers or websites in Python 3

This is my first post; the writing may be poor or unclear, so please bear with me.

When working with Python 3 you will often run into encoding problems. If you process a page whose encoding is unknown, and the text was not decoded as UTF-8, it comes out garbled. The following describes a way to convert a string of unknown encoding to UTF-8 and avoid garbled text.

The same approach can be used in many transcoding scenarios in Python.
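
As one such scenario outside of crawling, here is a minimal sketch for raw bytes whose encoding is unknown. This is a different, simpler technique than the one in this post: it just tries a list of candidate encodings in order. The function name to_utf8_text and the candidate list ('utf-8', 'gbk') are my own assumptions for illustration; adjust them to the data you actually expect.

```python
def to_utf8_text(raw, candidates=('utf-8', 'gbk')):
    """Decode bytes by trying each candidate encoding in order.

    Hypothetical helper: the name and the default candidates are assumptions.
    """
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD
    return raw.decode('utf-8', errors='replace')


gbk_bytes = '编码'.encode('gbk')
utf8_bytes = '编码'.encode('utf-8')
from_gbk = to_utf8_text(gbk_bytes)
from_utf8 = to_utf8_text(utf8_bytes)
```

Order matters here: strict codecs such as UTF-8 should come first, because a permissive codec placed early would accept almost any bytes and hide the real encoding.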

An excerpt from a crawler I wrote:

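import requests
from bs4 import BeautifulSoup

# Assumed placeholders: the original excerpt does not show how s and header are defined
s = requests.Session()
header = {'User-Agent': 'Mozilla/5.0'}

# Request the page and convert its encoding to UTF-8
def getHtmlAndDealCode(url):
    #html=requests.get(url,verify=False)
    html = s.get(url, headers=header)
    code = html.encoding or 'utf-8'  # encoding requests took from the headers; may be None
    html = html.text                 # text decoded by requests
    html = html.encode(code)         # re-encode to get the original bytes back
    html = html.decode('utf-8')      # decode those bytes as UTF-8
    parser = 'html.parser'
    soup = BeautifulSoup(html, parser)
    return soup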

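The principle: encoding gives the encoding that was used to decode the string, encode with that encoding turns the string back into its original bytes, and decode('utf8') then decodes those bytes as UTF-8, after which the rest of the processing can go ahead. Note that this assumes the page is actually UTF-8; if it is not, decode('utf8') will raise UnicodeDecodeError.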
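
The round trip can be tried without any network request. The sketch below simulates a UTF-8 page that was wrongly decoded as ISO-8859-1 (a common fallback when the server sends no charset) and then repairs it the same way as the function above. The helper name redecode and the sample text are assumptions for illustration.

```python
def redecode(text, wrong_encoding, target='utf-8'):
    """Undo a wrong decode: recover the raw bytes, then decode them as target.

    Hypothetical helper mirroring the encode/decode steps in the crawler excerpt.
    """
    raw = text.encode(wrong_encoding)  # back to the original bytes
    return raw.decode(target)          # decode with the encoding the page really uses


original = '中文网页'
raw_bytes = original.encode('utf-8')      # what the server actually sent
garbled = raw_bytes.decode('iso-8859-1')  # what you get after a wrong guess
fixed = redecode(garbled, 'iso-8859-1')
```

This works because ISO-8859-1 maps every byte to exactly one character, so encoding the garbled string with it gives back the original bytes unchanged.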

Simple and practical, isn't it?

Origin www.cnblogs.com/pozhu15/p/11306335.html