Crawler error: 'gbk' codec can't encode character

Source:

'''Crawl Baidu Tieba: different bars and different pages'''

from urllib import request
from urllib import parse

# Common variables
base_url = "https://tieba.baidu.com/f?kw="
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0'}

# Build the URL (encode first, then concatenate, then request)
tb_name = input("Please enter the Tieba name: ")
key = parse.quote(tb_name)
url = base_url + key

print(url)

# Three steps
# Build the request object, wrapping the request headers
req = request.Request(url, headers=headers)
# Send the request with urlopen
res = request.urlopen(req)
# Get the response
html = res.read().decode('utf-8')

# print(html)

# Save to a file
with open('贴吧.txt', 'w') as f:
    f.write(html)
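
To make the URL-building step concrete, here is a minimal sketch of what parse.quote does to the Tieba name before it is appended to the base URL. The helper name build_url is hypothetical (it is not in the original script), and the sample name is the one used in the run shown in this post.

```python
from urllib import parse

BASE_URL = "https://tieba.baidu.com/f?kw="


def build_url(tb_name):
    """Percent-encode the name (UTF-8 by default) and append it to the base URL."""
    return BASE_URL + parse.quote(tb_name)


# Hypothetical helper, used here only to show the encoding step in isolation.
print(build_url("美女吧"))
```

Encoding the name first matters because non-ASCII characters are not valid in a URL as-is; parse.unquote reverses the step if the original text is needed again.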

While crawling the data, the following error occurred:

 

Please enter the Tieba name: 美女吧
https://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3%E5%90%A7
Traceback (most recent call last):
  File "D:/AID1812/Spider/day01/05_ _ Baidu Post bar to practice .py", line 29, in <module>
    f.write(html)
UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f236' in position 166141: illegal multibyte sequence
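
The failure can be reproduced without any network access. The sketch below is only an illustration, not part of the original script: it takes the single character named in the traceback and tries to encode it with both codecs. The variable names are made up for the example.

```python
ch = '\U0001f236'  # the character named in the traceback

# GBK has no mapping for this character, so encoding raises UnicodeEncodeError.
try:
    ch.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError as e:
    gbk_ok = False
    print('gbk failed:', e.reason)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = ch.encode('utf-8')
print(utf8_bytes)
```

Writing a str to a text-mode file performs the same encoding step, which is why the error only appears at f.write(html) and not when the page is decoded.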

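Solution:

When no encoding is given, open() uses the platform default encoding, which is GBK on a Chinese-language Windows system, and GBK cannot represent the character '\U0001f236'. Add encoding="utf-8" inside the parentheses of open() and the error goes away.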

# Save to a file
with open('贴吧.txt', 'w', encoding='utf-8') as f:
    f.write(html)
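
To check which default encoding a given machine would use, and to confirm that the explicit encoding round-trips the problem character, something like the following can be run. This is a sketch under assumptions: the sample string stands in for the downloaded page, and the temporary file name tieba_demo.txt is hypothetical.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (platform dependent).
print(locale.getpreferredencoding(False))

html = 'sample page text \U0001f236'  # stand-in for the downloaded page

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'tieba_demo.txt')

# Write with an explicit encoding so the result does not depend on the platform.
with open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Read back with the same encoding and compare.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip == html)
```

The file has to be read back with the same encoding; opening it later without encoding='utf-8' on the same machine would hit the GBK default again, this time on decode.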

 

 

 

 




Origin www.cnblogs.com/tianxiong/p/10929704.html