Character Encoding: A Walkthrough

Computer Basics

  1. CPU: controls the running of programs (it fetches the text editor's data from memory and processes it)
  2. Memory: runs programs (it holds the text editor's data while the CPU works on it)
  3. Hard disk: stores data permanently (the file saved by the text editor)

How a text editor accesses files

  1. Opening an editor starts a process, which lives in memory, so whatever you write in the editor is also held in memory and is lost if the power fails.
  2. To keep the data permanently, you click the Save button: the editor flushes the data in memory to the hard disk.
  3. Writing a py file (without executing it) is no different from writing any other document: we are only writing a bunch of characters.

  4. Role: reading and writing data, and saving data.

Three stages of the Python interpreter executing a py file

  • Stage one: the Python interpreter starts, which is equivalent to starting a text editor.
  • Stage two: the Python interpreter, acting like a text editor, reads the contents of test.py from the hard disk into memory (character encoding is involved here).
  • Stage three: the interpreter executes the contents it has just read from the hard disk. Creating a variable allocates memory space to hold its value, and storing that value in memory also involves character encoding.
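
The last two stages can be sketched in Python itself. This is only an illustration of the idea, not how the interpreter is actually implemented: the file contents, the utf-8 encoding, and the temporary path are all assumptions made for the example.

```python
import os
import tempfile

# Hypothetical stand-in for the contents of test.py.
source_text = "a = 1 + 2\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.py")

    # Put the file on "disk" as bytes (utf-8 is an assumption here).
    with open(path, "wb") as f:
        f.write(source_text.encode("utf-8"))

    # Stage two: read the raw bytes from disk and decode them into characters.
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode("utf-8")

    # Stage three: compile and execute the text that was just read.
    namespace = {}
    exec(compile(text, path, "exec"), namespace)

result = namespace["a"]
print(result)  # 3
```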

Similarities and differences between the Python interpreter and a text editor

  • Similarity: the Python interpreter interprets the contents of a file, so it must be able to read a py file, just as a text editor does.
  • Difference: a text editor reads the file contents into memory in order to display or edit them and pays no attention to Python syntax. The Python interpreter reads the file contents into memory not to show you what the code says but to execute it, which means it must recognize Python syntax.

Character Encoding

Character encoding is the process of converting between the characters people can read and the binary that computers store.
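
A minimal sketch of that two-way conversion, using Python 3's built-in `str.encode` and `bytes.decode`. The sample text and the choice of utf-8 are assumptions made for the example.

```python
text = "中文"

# Characters -> binary (encode).
encoded = text.encode("utf-8")
print(encoded)  # b'\xe4\xb8\xad\xe6\x96\x87'

# The same bytes written out as 0s and 1s.
bits = " ".join(f"{byte:08b}" for byte in encoded)
print(bits)

# Binary -> characters (decode).
decoded = encoded.decode("utf-8")
print(decoded == text)  # True
```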

In the early days, files on the hard disk were stored in gbk, utf8, ascii and other encodings. When those files were read into memory to be processed, the encodings were not uniform, so text written in one encoding could not be read as another. Unicode was introduced to solve this: it covers the characters of all those encodings, so memory can use unicode to read whatever is on the hard disk. Unicode in a fixed-width form takes up more space, however, so when data is written back to the hard disk it is converted to utf8 to save space.
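
The space difference can be checked by counting bytes. In this sketch, utf-32 is used as a stand-in for a fixed-width unicode form; that choice, and the sample text, are assumptions for illustration only.

```python
text = "abc中文"

# Bytes needed for the same five characters in each encoding.
sizes = {
    name: len(text.encode(name))
    for name in ("utf-32-be", "utf-8", "gbk")
}
print(sizes)  # {'utf-32-be': 20, 'utf-8': 9, 'gbk': 7}

# For pure ASCII text the saving from utf-8 is even larger.
ascii_fixed = len("abc".encode("utf-32-be"))
ascii_utf8 = len("abc".encode("utf-8"))
print(ascii_fixed, ascii_utf8)  # 12 3
```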

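Two cases of garbled text

Encoding: an editor whose encoding only covers Chinese cannot store the Japanese you type into it, so the text is garbled when it is saved (encode).

Decoding: a text editor saves a file in a Chinese encoding, but you open the file with an editor that uses a Japanese encoding, so the text is garbled when it is read (decode).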
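
Both cases can be reproduced in memory. Note that gbk happens to cover Japanese kana, so this sketch uses ascii as a stand-in for "an encoding that does not cover the characters you typed"; that substitution, the sample text, and the use of shift_jis as the "Japanese editor" encoding are assumptions for illustration.

```python
text = "中文"

# Case 1 (encode): the target encoding has no way to represent the characters.
try:
    text.encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Case 2 (decode): bytes stored as gbk are read back with a Japanese encoding.
stored = text.encode("gbk")
garbled = stored.decode("shift_jis", errors="replace")

print(encode_failed)    # True
print(garbled != text)  # True
```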

Fixing garbled text

Read a file with the same encoding it was stored in, and it will not be garbled!

On Chinese-locale Windows, Notepad has traditionally defaulted to gbk; most other software defaults to utf8.
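
A minimal file-based sketch of that rule, assuming a hypothetical file `note.txt` saved as gbk. Passing the matching `encoding` to `open` reads it back correctly, while a mismatched one fails.

```python
import os
import tempfile

text = "中文"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")  # hypothetical file name

    # Store the text as gbk.
    with open(path, "w", encoding="gbk") as f:
        f.write(text)

    # Read with the same encoding: no garbling.
    with open(path, "r", encoding="gbk") as f:
        same_encoding = f.read()

    # Read with a different encoding: these gbk bytes are not valid utf-8.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        mismatch_failed = False
    except UnicodeDecodeError:
        mismatch_failed = True

print(same_encoding == text)  # True
print(mismatch_failed)        # True
```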

Python3 (good to know)

What you see printed is actually unicode.

Python takes those unicode 0s and 1s and converts them into the encoding the terminal can recognize, and the terminal then displays them as Chinese.

# coding:gbk
a = '中文'  # these 0s and 1s are stored in memory as unicode
print(a)  # the unicode 0s and 1s (010101010) are converted for the terminal

Suppose the terminal's default encoding is gbk: the unicode-encoded variable is recognized.

Suppose the terminal's default encoding is utf8: the unicode-encoded variable is recognized as well.
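
A small sketch of why both terminals work in Python 3: the variable is held as unicode code points, and those can be converted to either encoding on output. The loop below only simulates the conversion with `str.encode`; it does not inspect a real terminal.

```python
a = "中文"

# In memory the string is a sequence of unicode code points.
code_points = [hex(ord(ch)) for ch in a]
print(code_points)  # ['0x4e2d', '0x6587']

# Simulate output to a gbk terminal and to a utf8 terminal.
round_trips = {}
for terminal_encoding in ("gbk", "utf-8"):
    sent = a.encode(terminal_encoding)             # what would be sent out
    shown = sent.decode(terminal_encoding)         # what the terminal displays
    round_trips[terminal_encoding] = (shown == a)

print(round_trips)  # {'gbk': True, 'utf-8': True}
```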

Python2 (good to know)

A str is stored in the encoding named in the coding header; there is also a separate unicode type (written u'...').

# coding:gbk
a = '中文'  # str: these 0s and 1s are stored as gbk
a = u'中文'  # unicode: these 0s and 1s are stored as unicode
print(a)

The terminal behaves like a text editor and has its own default encoding.

Suppose the terminal's default encoding is gbk: the gbk-encoded variable is recognized.

Suppose the terminal's default encoding is utf8: the gbk-encoded variable is not recognized, and the output is garbled.

Origin www.cnblogs.com/aden668/p/11316186.html