Fixing garbled Chinese output from Python json.dumps

json.dumps(var, ensure_ascii=False) does not by itself fix garbled Chinese characters

json.dumps behaves differently in different versions of Python. Note that the garbled Chinese problem described below does not occur in Python 3.

Note: the following code was tested under Python 2.7.

# -*- coding: utf-8 -*-
import json

odata = {'a': '你好'}
print odata

result:

{'a': '\xe4\xbd\xa0\xe5\xa5\xbd'}

print json.dumps(odata)

result:

{"a": "\u4f60\u597d"}

print json.dumps(odata,ensure_ascii=False)

result:

{"a": "浣犲ソ"}

print json.dumps(odata,ensure_ascii=False).decode('utf8').encode('gb2312')

result:

{"a": "你好"}
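The garbled result 浣犲ソ can be reproduced by taking the UTF-8 bytes of 你好 and decoding them with the wrong codec. The following is a minimal sketch written for Python 3, where bytes and text are separate types; the variable names are illustrative and not from the original post.

```python
# -*- coding: utf-8 -*-
text = u'你好'

# Encode the text as UTF-8: six bytes, three per character.
utf8_bytes = text.encode('utf-8')
print(repr(utf8_bytes))

# Misread those UTF-8 bytes as GBK: this is the garbled form.
garbled = utf8_bytes.decode('gbk')

# Reversing the wrong step recovers the original text.
repaired = garbled.encode('gbk').decode('utf-8')
print(repaired == text)
```

The bytes never change in this sketch; only the codec used to interpret them does, which is why the damage is reversible here.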

To fix the Chinese encoding problem, you need to understand how Python 2.7 handles strings:

Because of the # -*- coding: utf-8 -*- declaration, the content of the file is treated as UTF-8, so print odata outputs the UTF-8 encoded bytes: {'a': '\xe4\xbd\xa0\xe5\xa5\xbd'}

By default, json.dumps produces ASCII-only output when serializing and escapes Chinese characters, so print json.dumps(odata) outputs the Unicode escape sequences \u4f60\u597d.
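The default escaping can be checked directly. This is a small sketch for Python 3 with illustrative variable names; it shows that the \uXXXX form is only a notation and that json.loads restores the original characters.

```python
# -*- coding: utf-8 -*-
import json

data = {u'a': u'你好'}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)
print(escaped)

# Parsing the escaped form gives back the original text.
restored = json.loads(escaped)
print(restored[u'a'] == u'你好')
```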

With print json.dumps(odata, ensure_ascii=False), the ASCII escaping is turned off, so the UTF-8 bytes are output unchanged and are then displayed as GBK.

"你好" encoded in UTF-8 is the byte sequence %E4%BD%A0%E5%A5%BD, and decoding those bytes as GBK gives 浣犲ソ.

Internally, Python represents text as Unicode.

Therefore, an encoding conversion usually goes through Unicode as the intermediate form: first decode the string from its current encoding into Unicode, then encode the Unicode into the target encoding.

decode converts a string in some other encoding into Unicode. For example, decode('utf-8') converts a UTF-8 encoded string into Unicode.

encode converts Unicode into a string in some other encoding. For example, encode('gb2312') converts a Unicode string into GB2312.
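The two-step conversion described above might look like this in practice. This is a sketch for Python 3, where the encoded forms are bytes objects and the Unicode form is str; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
utf8_bytes = b'\xe4\xbd\xa0\xe5\xa5\xbd'  # "你好" encoded as UTF-8

# Step 1: decode the source bytes into Unicode text.
text = utf8_bytes.decode('utf-8')

# Step 2: encode the Unicode text into the target encoding.
gb_bytes = text.encode('gb2312')

# UTF-8 uses three bytes per character here, GB2312 uses two.
print(len(utf8_bytes), len(gb_bytes))
```

There is no direct UTF-8 to GB2312 call in this sketch; the Unicode text in the middle is what makes the conversion independent of either byte encoding.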

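Python 3 does not have this problem, so the easiest fix is to use the __future__ module to bring the newer behavior into the current version. The import must be at the top of the file, before odata is defined, so that its string literals become unicode objects.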

from __future__ import unicode_literals

print json.dumps(odata,ensure_ascii=False)

result:

{"a": "你好"}

The UnicodeEncodeError ('ascii' codec can't encode) exception raised in Python 2.7 when writing to a file

A solution from an expert:

Do not open the file with the built-in open; use codecs.open instead:

from __future__ import unicode_literals
import codecs
import json

# m is the object to serialize (defined elsewhere)
fp = codecs.open('output.txt', 'a+', 'utf-8')
fp.write(json.dumps(m, ensure_ascii=False))
fp.close()
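The same idea, a file object that encodes to UTF-8 on write, can also be sketched with io.open, which accepts an encoding argument and is the same function as the built-in open in Python 3. Note that this swaps io.open in for the codecs.open call used above. The sketch targets Python 3; record stands in for the article's m, and the temporary path is a hypothetical placeholder.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile

record = {'a': '你好'}  # hypothetical stand-in for the article's m

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# The file object encodes text to UTF-8 on write,
# so no implicit ASCII encoding step takes place.
with io.open(path, 'a', encoding='utf-8') as fp:
    fp.write(json.dumps(record, ensure_ascii=False))
    fp.write('\n')

# Read the line back with the same encoding to confirm the round trip.
with io.open(path, 'r', encoding='utf-8') as fp:
    loaded = json.loads(fp.readline())

print(loaded == record)
```

Using a with block also closes the file when an exception interrupts the write, which the explicit fp.close() call does not guarantee.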

Origin blog.csdn.net/kexin178/article/details/112761101