Understanding encoding in Python 2 and Python 3 in one article

Foreword:

  Encoding problems plague every programmer along the way. If you never get them completely clear, the road will be exceptionally hard, and for Python programmers the problem is even more pronounced,

  because Python has two versions whose handling of encoding is completely different, yet we often need to support both, so the chance of running into problems is much higher.

  So in this article I try to sort out the encoding question for the Python language as a whole, to minimize the chance of problems in this area later on.

  P.S. This article draws on and cites alex's blog to some extent: "https://www.cnblogs.com/alex3714/articles/7550940.html"

Before talking about encoding, we first need to know what encoding is and why it exists:

  Baidu Encyclopedia explains it as: "encoding is the process of converting information from one form or format into another." So encoding is actually a process, and when we talk about "encoding problems" we usually mean "encoding format problems".

  Common encoding formats are:

    ASCII: 1 byte, supports English only

    GB2312: 2 bytes, supports 6,700+ Chinese characters

    GBK: an upgraded version of GB2312 that supports more Chinese characters (21,000+)

    Shift-JIS: Japanese characters
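
  As a rough illustration of the list above, the following sketch encodes one sample character with each encoding and counts the bytes. The sample characters and the names samples, sizes, and ascii_ok are chosen here for illustration and are not from the original article; the codec names are the ones Python's standard library uses.

```python
# Encode one sample character with each encoding listed above and count the bytes.
samples = [
    ("ascii", "A"),            # English letter
    ("gb2312", "\u4e2d"),      # a common Chinese character
    ("gbk", "\u4e2d"),         # the same character under GBK
    ("shift_jis", "\u3042"),   # a Japanese hiragana character
]

sizes = {}
for codec, char in samples:
    sizes[codec] = len(char.encode(codec))
    print(codec, sizes[codec])

# ASCII only covers English, so a Chinese character cannot be encoded with it.
try:
    "\u4e2d".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ascii can encode the Chinese character:", ascii_ok)
```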

  Computers only understand binary, so for a character to be recognized by a computer there must be a mapping between characters and binary values. Each country made its own character encoding, but each one covered only that country's characters,

  so software written in one country showed garbled text when used in another. To solve this problem, Unicode appeared: it contains the mapping between every writing system in the world and binary values.
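
  The garbling described here can be reproduced in a few lines. This is only a sketch: the sample text and the choice of Shift-JIS and GBK as the two mismatched national encodings are assumptions made for the example.

```python
text = "\u65e5\u672c\u8a9e"  # a short piece of Japanese text

# Bytes as a system using Shift-JIS would store them.
raw = text.encode("shift_jis")

# A system that assumes GBK maps the same bytes to different characters.
garbled = raw.decode("gbk", errors="replace")

# Decoding with the encoding that was actually used restores the text.
restored = raw.decode("shift_jis")

print("garbled equals original:", garbled == text)
print("restored equals original:", restored == text)
```

  The bytes never change; only the table used to interpret them does, which is why agreeing on one shared table removes the problem.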

  Unicode: characters are stored using 2 to 4 bytes each; it is a collection of 136,690+ characters and is still being expanded.

  It supports every language in the world, so countries no longer need their original encodings; with Unicode everything works.

  Unicode solves the mapping between characters and binary, but one problem remains: space. Unicode uses 2 to 4 bytes to represent a character, whereas the original ASCII, although it supports only English,

  needs just one byte per letter. The string "Python" used to occupy 6 bytes in ASCII but now takes 12 bytes with Unicode, a heavy burden for storage and network transmission. This pushed another family of encodings into existence:
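
  The 6 versus 12 byte comparison can be checked directly. In this sketch the codec "utf-16-le" stands in for a fixed 2-bytes-per-character Unicode representation without a byte-order mark; that choice is an assumption made for the example.

```python
word = "Python"

ascii_size = len(word.encode("ascii"))          # 1 byte per letter
two_byte_size = len(word.encode("utf-16-le"))   # 2 bytes per letter, no byte-order mark

print(ascii_size, two_byte_size)
```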

  "UTF" (Unicode Transformation Format), i.e. the conversion of Unicode, the purpose is to store and save space during transport.

  UTF-8: uses 1, 2, 3, or 4 bytes to represent all characters. It prefers 1 byte and adds a byte whenever that is not enough, up to 4 bytes: English takes 1 byte, European languages 2 bytes, East Asian languages (such as Chinese) 3 bytes, and other special characters 4 bytes.

  UTF-16: uses 2 or 4 bytes to represent all characters. It prefers 2 bytes and otherwise uses 4 bytes.

  UTF-32: uses 4 bytes to represent every character.

  (The UTF encodings are Unicode encoding formats designed to save space in storage and transmission.)
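
  The byte counts above can be compared side by side. This is a minimal sketch: the sample characters and the helper encoded_sizes are made up for illustration, and the "-le" codec variants are used so that no byte-order mark is counted.

```python
chars = {
    "English letter": "A",
    "European letter": "\u00e9",       # e with an acute accent
    "Chinese character": "\u4e2d",
    "emoji": "\U0001F600",
}

def encoded_sizes(char):
    """Return the byte length of char in UTF-8, UTF-16 and UTF-32."""
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(char.encode(codec)) for codec in codecs)

for label, char in chars.items():
    print(label, encoded_sizes(char))
```

  For text that is mostly English, UTF-8 is the most compact of the three, which fits the space-saving purpose described above.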

With the general encoding background covered, let's talk about encoding in Python:

  Since Python 2 appeared first, let's start with it:

  When Guido van Rossum developed Python, he probably did not expect it to become as popular as it is today, so he made ASCII the default encoding. The default encoding of Python 2 is therefore ASCII.

  After we assign a Chinese string to s in Python 2 and print it with print, the Chinese text is displayed as expected. But when we evaluate s directly, we get a sequence of hexadecimal escapes, one per byte; we call this type bytes (the byte type).

  Printing its type indeed gives "str"; in Python 2, bytes == str. Python 2 also has a separate type, unicode: a str becomes unicode after decoding, and when you want to convert that unicode into another encoding such as gbk,

  you only need to encode it. So we must remember that Unicode is the bridge: to convert between any two encodings, first decode (for example decode('utf-8')) to get unicode, and then encode() it into the desired encoding.
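
  The "Unicode as a bridge" rule can be sketched as a small helper. The function name transcode and the sample bytes are hypothetical and chosen for illustration; the example is written for Python 3, though the same decode() and encode() calls exist on Python 2's str and unicode types.

```python
def transcode(data, source, target):
    """Convert bytes from one encoding to another by way of Unicode."""
    text = data.decode(source)    # bytes -> Unicode text
    return text.encode(target)    # Unicode text -> bytes in the target encoding

# UTF-8 bytes of a two-character Chinese string, written as escapes.
utf8_bytes = b"\xe4\xb8\xad\xe6\x96\x87"

gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")
print(repr(gbk_bytes))

# Going back through the bridge restores the original bytes.
print(transcode(gbk_bytes, "gbk", "utf-8") == utf8_bytes)
```

  There is no direct utf-8 to gbk step: every conversion passes through the decoded Unicode text in the middle.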

  

The arrival of Python 3:

  In 2008 Python 3 was released. It is not compatible with Python 2: strings became unicode and the default file encoding became utf-8, which means that a program written in Python 3 can display its text on any computer, whatever encoding the machine it was developed on uses.
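
  A quick way to observe this on Python 3 is shown below. Note that sys.getdefaultencoding() reports the default used when converting between str and bytes, which is a separate setting from the source file encoding, although both are UTF-8 on Python 3. The variable names are chosen for illustration.

```python
import sys

# Default encoding used when converting between str and bytes.
default_encoding = sys.getdefaultencoding()
print(default_encoding)

# A str holds Unicode characters, so its length is counted in characters, not bytes.
text = "\u4e2d\u6587"  # a two-character Chinese string
char_count = len(text)
byte_count = len(text.encode("utf-8"))
print(char_count, byte_count)
```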

  From this point on, str and bytes are different things: str is a Unicode string, while bytes is pure binary data.

  A string is already of Unicode type at this point, so it cannot be decode()d.

  bytes now simply represents raw binary data.
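
  The split between the two types can be checked on Python 3 with a minimal sketch like the following; the variable names and the sample text are assumptions made for the example.

```python
text = "\u4e2d\u6587"          # str: Unicode text
data = text.encode("utf-8")    # bytes: raw binary data

print(type(text).__name__, type(data).__name__)

# str has encode() but no decode(); bytes has decode().
str_has_decode = hasattr(text, "decode")
bytes_has_decode = hasattr(data, "decode")
print(str_has_decode, bytes_has_decode)

# Decoding the bytes with the right encoding gives the text back.
round_trip = data.decode("utf-8")
print(round_trip == text)
```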

 


Origin www.cnblogs.com/ss-py/p/11742448.html