UnicodeEncodeError: 'gbk' codec can't encode character u'\uXXXX' in position XX

The local environment is cmd on Windows 7, whose default code page is CP936, i.e. the GBK encoding. To display the Unicode string titleUni mentioned above in cmd, it must first be encoded to GBK. Because titleUni contains some characters that GBK cannot represent, the encoding step fails with the error "'gbk' codec can't encode".
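The failure can be reproduced without a console by encoding to GBK explicitly. This is only a minimal sketch: the string below is a hypothetical stand-in for titleUni, and the copyright sign (U+00A9) is assumed as the offending character because Python's gbk codec has no mapping for it.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for titleUni; U+00A9 has no GBK mapping.
title_uni = u"Python \xa9 2012"

try:
    title_uni.encode("gbk")
    error_message = None
    error_position = None
except UnicodeEncodeError as exc:
    # Save the details here; in Python 3 `exc` is unbound after this block.
    error_message = str(exc)
    error_position = exc.start

print(error_message)
print(error_position)
```

The reported position is the index of the first character that could not be encoded, which is what the "in position XX" part of the message refers to.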

【Summary】

For this class of problem:

(1) UnicodeEncodeError -> the problem occurs while encoding a Unicode string;

(2) 'gbk' codec can't encode character -> the problem occurs specifically when encoding Unicode characters to GBK;

In this case, the most likely cause is that the Unicode string contains some characters that cannot be converted to GBK.
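To confirm that diagnosis, it can help to list exactly which characters are the problem. The helper below is a hypothetical sketch (the name find_unencodable and the sample string are not from the original code); it simply tries to encode each character on its own.

```python
# -*- coding: utf-8 -*-
def find_unencodable(text, encoding="gbk"):
    """Return (index, character) pairs that `encoding` cannot represent."""
    bad = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, ch))
    return bad


# Hypothetical sample: two Chinese characters, a copyright sign, ASCII text.
title_uni = u"\u4e2d\u6587 \xa9 title"
problem_chars = find_unencodable(title_uni)
print(problem_chars)
```

For this sample only the copyright sign at index 3 is reported; the Chinese characters and the ASCII text encode to GBK without trouble.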

There are two possible solutions:

  • Solution 1:

When encoding the Unicode string, pass the ignore parameter so that characters that cannot be encoded are skipped and the rest is encoded to GBK normally. Note that the skipped characters are silently dropped from the result.

The corresponding code is:

gbkTypeStr = unicodeTypeStr.encode("GBK", "ignore")

  • Solution 2:

Alternatively, encode it to GB18030, which is a superset of GBK (i.e., GBK is a subset of GB18030):

gb18030TypeStr = unicodeTypeStr.encode("GB18030")

The resulting byte string is encoded in GB18030.
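The difference between the two approaches can be seen by decoding the results again. This is a rough sketch with a hypothetical sample string; the "replace" handler is not part of the original solutions and is shown only for comparison, as another standard error handler of the encode method.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample containing one character (U+00A9) that GBK lacks.
unicode_type_str = u"\u4e2d\u6587 \xa9 title"

# Solution 1: skip characters that GBK cannot represent.
gbk_ignored = unicode_type_str.encode("gbk", "ignore")

# For comparison: substitute a "?" instead of dropping the character.
gbk_replaced = unicode_type_str.encode("gbk", "replace")

# Solution 2: GB18030 can represent every character in the string.
gb18030_bytes = unicode_type_str.encode("gb18030")

# Only booleans are printed so the demo itself cannot hit a console
# encoding error.
print(gbk_ignored.decode("gbk") == u"\u4e2d\u6587  title")     # character dropped
print(gbk_replaced.decode("gbk") == u"\u4e2d\u6587 ? title")   # character replaced
print(gb18030_bytes.decode("gb18030") == unicode_type_str)     # nothing lost
```

With ignore or replace the encoding succeeds but the original character is lost, while GB18030 round-trips the full string. Whether a console set to CP936 can actually display characters outside GBK is a separate question that this sketch does not address.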
