Python encoding errors: solutions and examples


  Lately I have been writing web crawlers, and parsing and extracting content often fails because of encoding problems. To avoid repeating these mistakes, here is a summary of what I ran into over the past few days:

  1. Special symbols, emoji, etc.

    Background: while crawling a cooking-tutorial website and parsing its pages with BeautifulSoup, I got this error:

    UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f44d' in position 0: Non-BMP character not supported in Tk

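    The "Tk" in the message shows that the error is raised when the text is displayed in a Tk-based shell such as IDLE, which cannot display characters outside the Basic Multilingual Plane (BMP). U+1F44D is a thumbs-up emoji, which lies outside the BMP.

    Solution: replace every non-BMP character with U+FFFD, the Unicode replacement character: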

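    import sys

    # Map every code point outside the BMP (U+10000 and above)
    # to U+FFFD, the Unicode replacement character
    non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)

    # str.translate looks up each character's code point in the map
    targetText = targetText.translate(non_bmp_map)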

    Here targetText is the text you need to convert.
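The dictionary above has over a million entries, one per non-BMP code point. A minimal sketch of an alternative, which is my own substitution and not from the original notes: a regular expression over the same range (U+10000 to U+10FFFF) gives the same result without building that dictionary. The helper name replace_non_bmp and the sample string are hypothetical.

```python
import re
import sys

# Same mapping as above: every code point above the BMP -> U+FFFD
NON_BMP_MAP = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xFFFD)

# Alternative (an assumption, not from the original post): a regex
# character class covering the same range of code points
NON_BMP_RE = re.compile('[\U00010000-\U0010FFFF]')


def replace_non_bmp(text: str) -> str:
    """Hypothetical helper: replace every non-BMP character with U+FFFD."""
    return NON_BMP_RE.sub('\uFFFD', text)


sample = 'Great recipe \U0001f44d!'
# ascii() keeps the output printable on any console
print(ascii(sample.translate(NON_BMP_MAP)))  # 'Great recipe \ufffd!'
print(ascii(replace_non_bmp(sample)))        # 'Great recipe \ufffd!'
```

If you would rather delete such characters than replace them, map them to None (dict.fromkeys with no second argument already does this) or pass an empty string as the replacement to sub. BMP characters such as Chinese text are left untouched either way.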

 

  2. Chinese text garbled when written to CSV

    Background: the csv module is the standard way to read and write CSV files, and such files are usually opened with the 'utf-8' encoding, as follows:

    

import csv

targetText = ['abc', 'efg']

# newline='' lets the csv module control line endings itself
with open('mycsv.csv', 'a+', newline='', encoding='utf-8') as csv_target:
    writer = csv.writer(csv_target)
    writer.writerow(targetText)

When the row contains Chinese (e.g. targetText = ['張三', '李四']), the text shows up garbled, for example when the file is opened in Excel.
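The bytes Python writes with 'utf-8' are correct. The garbling happens in the program that reads the file. A minimal sketch, assuming the reader falls back to a legacy Chinese code page when the file has no BOM (GBK is used here purely as an illustrative assumption; the variable names are hypothetical):

```python
# The bytes written with encoding='utf-8' are valid UTF-8
row_text = '張三,李四'
utf8_bytes = row_text.encode('utf-8')

# Assumption: the reading program decodes them as GBK instead of UTF-8
misread = utf8_bytes.decode('gbk', errors='replace')

print(utf8_bytes)
print(ascii(misread))                          # mojibake, not the original names
print(utf8_bytes.decode('utf-8') == row_text)  # True: the data itself is intact
```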

 

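    Solution: change the encoding to 'utf-8-sig', which writes a byte order mark (BOM) at the start of the file so that programs such as Excel recognize it as UTF-8: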

import csv

targetText = ['張三', '李四']

# 'utf-8-sig' writes the BOM (EF BB BF) when writing at the start of the file
with open('mycsv.csv', 'a+', newline='', encoding='utf-8-sig') as csv_target:
    writer = csv.writer(csv_target)
    writer.writerow(targetText)
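To check what 'utf-8-sig' actually changes, here is a small self-contained sketch that appends two rows to a temporary file (the temporary path stands in for mycsv.csv) and then inspects the raw bytes. It relies on Python's text layer skipping the BOM when a file is opened for appending at a non-zero position, so repeated appends leave a single BOM at the start.

```python
import csv
import os
import tempfile

rows = [['張三', '李四'], ['王五', '趙六']]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mycsv.csv')

    # Append the rows one at a time in 'a+' mode, as in the example above
    for row in rows:
        with open(path, 'a+', newline='', encoding='utf-8-sig') as f:
            csv.writer(f).writerow(row)

    with open(path, 'rb') as f:
        raw = f.read()

    # Reading back with 'utf-8-sig' strips the BOM again
    with open(path, newline='', encoding='utf-8-sig') as f:
        read_back = list(csv.reader(f))

# Decoding with plain 'utf-8' keeps the BOM as U+FEFF in the first cell
first_cell_plain = raw.decode('utf-8').split(',')[0]

print(raw[:3])                     # b'\xef\xbb\xbf'
print(raw.count(b'\xef\xbb\xbf'))  # 1: only one BOM despite two appends
print(read_back == rows)           # True
```

So when reading such a file back in Python, use 'utf-8-sig' as well; otherwise the first cell starts with an invisible '\ufeff'.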

  

  Encoding problems in Python are a real pitfall.

  This is the first version of these notes; I will update it as I run into more cases.

Source: www.cnblogs.com/riocasture/p/11237197.html