[Getting Started] 3-7 Unicode strings in Python

Strings also come with an encoding problem.

Because a computer can only process numbers, text must first be converted to numbers before it can be processed. The earliest computers were designed with 8 bits to a byte, so the largest integer one byte can represent is 255 (binary 11111111 = decimal 255). Values in this range were used to represent uppercase and lowercase letters, digits, and some symbols. This encoding table is called ASCII (strictly, ASCII itself only defines codes 0 to 127); for example, the code for uppercase A is 65 and the code for lowercase z is 122.
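The numbers mentioned above can be checked directly. This is a minimal sketch written for Python 3; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Largest value that fits in one 8-bit byte: binary 11111111.
max_byte = int('11111111', 2)
print(max_byte)   # 255

# ord() maps a character to its numeric code; chr() goes the other way.
code_A = ord('A')
code_z = ord('z')
print(code_A)     # 65
print(code_z)     # 122
print(chr(65))    # A
```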

To represent Chinese, one byte is clearly not enough. At least two bytes are needed, and the encoding must not conflict with ASCII, so China developed the GB2312 encoding to encode Chinese characters.
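As a rough illustration of the two-byte idea, the sketch below encodes one Chinese character and one ASCII letter with Python's built-in gb2312 codec. It is written for Python 3, and the variable names are hypothetical.

```python
# -*- coding: utf-8 -*-
# One Chinese character takes two bytes in GB2312.
gb_bytes = u'中'.encode('gb2312')
print(len(gb_bytes))      # 2

# An ASCII letter keeps its single-byte ASCII code.
ascii_bytes = u'A'.encode('gb2312')
print(len(ascii_bytes))   # 1

# Both bytes of the Chinese character are >= 0x80, outside the ASCII range,
# which is how the two encodings avoid conflicting.
high_bits_set = all(b >= 0x80 for b in bytearray(gb_bytes))
print(high_bits_set)      # True
```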

Similarly, Japanese, Korean, and other languages have the same problem. To unify the encodings of all these languages, Unicode came into being. Unicode brings all languages into a single encoding, so garbled text is no longer a problem.

Unicode usually represents a character with two bytes. English text, originally encoded with one byte per character, becomes two bytes per character; the high byte is simply filled with 0.
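The zero high byte can be seen by encoding characters in a two-byte form. The sketch below assumes UTF-16 big-endian as that form, which is my choice for illustration and is not named in the text above; characters outside the basic range need more than two bytes in it.

```python
# -*- coding: utf-8 -*-
# Assumption: UTF-16 big-endian stands in for the "two-byte" representation.
english = u'A'.encode('utf-16-be')
chinese = u'中'.encode('utf-16-be')

# 'A' is 65 in ASCII; in two bytes the high byte is 0 and the low byte is 65.
print(list(bytearray(english)))   # [0, 65]

# '中' is U+4E2D, so its two bytes are 0x4E (78) and 0x2D (45).
print(list(bytearray(chinese)))   # [78, 45]
```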

Because Python was born even before the Unicode standard was released, the earliest versions of Python supported only ASCII, and an ordinary string such as 'ABC' is ASCII-encoded inside Python.

Python later added support for Unicode. A Unicode string is written as u'...', for example:

print u'中文'

中文
Note: without the u prefix, the Chinese text may not be displayed properly.
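One way to see the difference between a Unicode string and its encoded bytes is to compare their lengths. This is a minimal sketch written for Python 3, where the u prefix is accepted but optional; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
text = u'中文'

# A Unicode string counts characters.
print(len(text))         # 2

# The same text encoded as UTF-8 takes three bytes per character here.
utf8_bytes = text.encode('utf-8')
print(len(utf8_bytes))   # 6

# Decoding with the same encoding gives the original string back.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)   # True
```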

Apart from the extra u prefix, Unicode strings are no different from ordinary strings. Escape characters and multi-line notation still work:

Escape:

u'中文\n日文\n韩文'

Multi-line:

u'''第一行
第二行'''
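The escape form and the multi-line form are two ways of writing the same kind of value. The sketch below, with illustrative variable names, checks that a \n escape and a literal line break produce equal strings.

```python
# -*- coding: utf-8 -*-
# Line break written as an escape character.
escaped = u'第一行\n第二行'

# Line break written literally inside a triple-quoted string.
multi_line = u'''第一行
第二行'''

print(escaped == multi_line)   # True

# Both contain exactly two lines.
lines = multi_line.splitlines()
print(len(lines))              # 2
```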

Raw + multi-line:

ur'''Python的Unicode字符串支持"中文",
"日文",
"韩文"等多种语言'''

If you encounter a UnicodeDecodeError with Chinese strings in your Python environment, the cause is the encoding in which the .py file was saved. You can add a comment on the first line:

# -*- coding: utf-8 -*-

Its purpose is to tell the Python interpreter to read the source code as UTF-8. Then, in Notepad++, choose Save As... and save the file in UTF-8 format.
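The same kind of failure can be reproduced at the byte level. This sketch is only an analogy for what happens when UTF-8 bytes are read as ASCII, not the interpreter's own source-loading code; it is written for Python 3 and the names are hypothetical.

```python
# -*- coding: utf-8 -*-
# The bytes that a UTF-8 editor would write to disk for this text.
utf8_bytes = u'中文'.encode('utf-8')

# Reading those bytes as ASCII fails, because every byte is above 127.
try:
    utf8_bytes.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False
print(decoded_as_ascii)   # False

# Reading them as UTF-8 recovers the original text.
decoded_as_utf8 = utf8_bytes.decode('utf-8')
print(decoded_as_utf8 == u'中文')   # True
```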

Task:

Write the following Tang poem as a Unicode string and print it on multiple lines:

静夜思

床前明月光,
疑是地上霜。
举头望明月,
低头思故乡。

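Code I wrote:

# -*- coding: utf-8 -*-
import sys
# Python 2 only: site.py removes sys.setdefaultencoding at startup,
# and reload(sys) brings it back.
reload(sys)
# Changes the process-wide default encoding from ASCII to UTF-8.
# This is a workaround that is generally discouraged.
sys.setdefaultencoding('utf-8')
# Print the poem as a multi-line Unicode string.
print u'''静夜思

床前明月光,
疑是地上霜。
举头望明月,
低头思故乡。
'''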

Origin blog.csdn.net/yipyuenkay/article/details/103871619