Encoding and decoding of characters in Python

1 Text and byte sequences

  We all know that a string is a sequence made up of characters. But what exactly is a character? A computer only understands binary, so how does it manage to display characters such as letters?

  The earliest computers were developed and used largely in the United States. To solve the problem of representing English on a computer, a standard was developed: ASCII (American Standard Code for Information Interchange), mainly used to display modern English and other Western European languages. It defines 128 characters in total, and each of the binary values 0-127 corresponds to one character. This links real-world characters to the binary values inside the computer, so that the characters can be displayed.
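  The mapping between ASCII characters and the numbers 0-127 can be checked directly in Python. This is only a small sketch; the variable names are made up for illustration.

```python
# ASCII assigns each of its 128 characters a number from 0 to 127.
code = ord("A")              # character -> number
char = chr(97)               # number -> character
bits = format(code, "07b")   # 7 bits are enough to cover 0..127

# Encoding pure ASCII text yields one byte per character.
ascii_bytes = "Hello".encode("ascii")

print(code, char, bits)      # 65 a 1000001
print(list(ascii_bytes))     # [72, 101, 108, 108, 111]
```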

  As computers spread around the world, every country wanted to use them with its own language, so representing each language on a computer became a problem. As a result, different countries developed character sets suited to their own languages. China, for example, has GBK (the "Chinese Internal Code Specification"), which likewise maps each character to a binary value. The trouble is that when each country uses its own standard, text exchanged across borders shows up garbled. Unicode was created as a unified mechanism to solve this: it assigns every character of every language a single, unique binary code, so that text can be converted and processed across languages and platforms.

1.1 Characters and bytes

  The characters in use today are essentially those defined by Unicode, and the elements of a Python 3 str object are Unicode characters. How a particular character is represented as bytes depends on the encoding. An encoding is the algorithm used to convert between code points and byte sequences. The most widely used one is UTF-8; text files encoded with it can be displayed across platforms. Converting code points into a byte sequence is called encoding; converting a byte sequence back into code points is called decoding.
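  To make the difference between code points and bytes concrete, here is a short sketch. The sample string is arbitrary; the same character has one code point but different byte representations under different encodings.

```python
text = "a中"

# Each element of a str is a Unicode character with a code point.
code_points = [ord(ch) for ch in text]

# The byte representation depends on the encoding that is chosen.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(code_points)   # [97, 20013]
print(utf8_bytes)    # b'a\xe4\xb8\xad'  (4 bytes)
print(gbk_bytes)     # b'a\xd6\xd0'      (3 bytes)
```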

 

  Decoding must use the same algorithm that was used for encoding; otherwise the text comes out garbled.
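  A mismatch between the two sides can be reproduced in a few lines. This sketch uses GBK, UTF-8 and Latin-1 purely as examples: sometimes the wrong codec raises an error, and sometimes it silently produces garbled text.

```python
text = "中文"
gbk_bytes = text.encode("gbk")

# Same codec on both sides: the round trip is lossless.
round_trip = gbk_bytes.decode("gbk")

# Different codec: these bytes are not valid UTF-8, so strict decoding fails.
try:
    gbk_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# A mismatch can also "succeed" and simply yield the wrong characters.
garbled = "é".encode("utf-8").decode("latin-1")

print(round_trip == text, failed, garbled)   # True True Ã©
```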

1.2 Converting between characters and bytes

  A string is an ordered sequence of characters, and as explained above, characters can be encoded. Encoding a string with a given character set (str.encode) returns a bytes byte sequence.

  Decoding a byte sequence with a given character set returns a string:

bytes.decode(encoding="utf-8", errors="strict") -> str
bytearray.decode(encoding="utf-8", errors="strict") -> str
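  A round trip through these methods might look like the following sketch. The sample text is arbitrary, and the two errors values shown ("replace" and "ignore") are only a selection of the handlers Python offers.

```python
text = "Python 编码"

# str -> bytes, then bytes -> str, with the default arguments written out
data = text.encode(encoding="utf-8", errors="strict")
back = data.decode(encoding="utf-8", errors="strict")

# bytearray offers the same decode method
back_from_bytearray = bytearray(data).decode("utf-8")

# errors controls what happens to input the codec cannot handle
ascii_lossy = text.encode("ascii", errors="replace")       # b'Python ??'
ignored = b"abc\xff".decode("utf-8", errors="ignore")      # 'abc'

print(back == text, back_from_bytearray == text, ascii_lossy, ignored)
```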

1.3 bytes and bytearray

  Python has two basic built-in binary sequence types: the immutable bytes type introduced in Python 3, and the mutable bytearray type (byte array) added in Python 2.6.
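  The difference between immutable and mutable can be seen with a quick experiment. This is just an illustrative sketch with made-up variable names.

```python
b = b"abc"
try:
    b[0] = 65                 # bytes does not support item assignment
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

ba = bytearray(b"abc")
ba[0] = 65                    # bytearray can be changed in place

print(bytes_is_mutable, ba)   # False bytearray(b'Abc')
```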

1.3.1 Defining bytes

  bytes can be defined in the following ways:

Definition                          Function
bytes()                             Empty bytes
bytes(int)                          bytes of the given length, filled with zeros
bytes(iterable_of_ints)             bytes built from an iterable of ints in [0, 255]
bytes(string, encoding[, errors])   Equivalent to string.encode()
bytes(bytes_or_buffer)              A new immutable bytes object copied from a byte sequence or buffer

  Defining with the b prefix:

  • ASCII characters only, in the form b'abc9'
  • Hexadecimal representation, such as b"\x41\x61"
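  The definitions above can be tried out as follows. The values are arbitrary examples chosen for illustration.

```python
empty = bytes()                       # b''
zeros = bytes(3)                      # b'\x00\x00\x00'
from_ints = bytes([97, 98, 99])       # b'abc'
from_str = bytes("abc", "utf-8")      # same as "abc".encode("utf-8")
copied = bytes(bytearray(b"abc"))     # immutable copy of a buffer

literal = b"abc9"                     # b prefix, ASCII characters only
hex_literal = b"\x41\x61"             # the same bytes as b'Aa'

# Values outside [0, 255] are rejected.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(empty, zeros, from_ints, from_str, copied, hex_literal, out_of_range_ok)
```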

1.3.2 bytes operations

  Like str, bytes is immutable, so many of its methods are the same. The difference is that the methods of bytes take bytes as input and return bytes as output. Some basic bytes operations:

  • b'abcdef'.replace(b'f',b'k')
  • b'abc'.find(b'b')  
  • bytes.fromhex(string): the string must consist of two hexadecimal digits per byte, for example bytes.fromhex('6162 09 6a 6b00'); spaces are ignored
  • 'abc'.encode().hex() returns a string of hexadecimal digits
  • b'abcdef'[2] returns the value of the byte at that index, as an int
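  Run together, the operations above give the following results. The slice on the last line is an extra shown for comparison: indexing returns an int, while slicing returns bytes.

```python
replaced = b"abcdef".replace(b"f", b"k")       # b'abcdek'
index = b"abc".find(b"b")                      # 1
from_hex = bytes.fromhex("6162 09 6a 6b00")    # b'ab\tjk\x00'
hex_text = "abc".encode().hex()                # '616263'
third = b"abcdef"[2]                           # 99, an int
sliced = b"abcdef"[2:3]                        # b'c', still bytes

# Arguments must be bytes; passing str raises TypeError.
try:
    b"abcdef".replace("f", "k")
    str_accepted = True
except TypeError:
    str_accepted = False

print(replaced, index, from_hex, hex_text, third, sliced, str_accepted)
```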

1.3.3 Defining bytearray

Definition                              Function
bytearray()                             Empty bytearray
bytearray(int)                          bytearray of the given length, filled with zeros
bytearray(iterable_of_ints)             bytearray built from an iterable of ints in [0, 255]
bytearray(string, encoding[, errors])   Similar to string.encode(), but returns a mutable object
bytearray(bytes_or_buffer)              A new mutable bytearray object copied from a byte sequence or buffer
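  A short sketch of these constructors, with arbitrary example values. The last lines show that the copy is independent of the bytes object it was made from.

```python
empty = bytearray()                        # bytearray(b'')
zeros = bytearray(3)                       # bytearray(b'\x00\x00\x00')
from_ints = bytearray([97, 98, 99])        # bytearray(b'abc')
from_str = bytearray("abc", "utf-8")       # mutable counterpart of "abc".encode()

source = b"abc"
copied = bytearray(source)                 # mutable copy
copied[0] = 122                            # changes the copy only

print(empty, zeros, from_ints, from_str, copied, source)
```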

1.3.4 bytearray operations

  The methods are the same as those of the bytes type:

  • bytearray(b'abcdef').replace(b'f',b'k')
  • bytearray(b'abc').find(b'b')
  • bytearray.fromhex('6162 09 6a 6b00')
  • bytearray('abc'.encode()).hex()
  • bytearray(b'abcdef')[2] returns the value of the byte at that index, as an int
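  The same calls on a bytearray behave as sketched below. One detail worth noticing is that replace returns a new bytearray and leaves the original untouched, even though bytearray itself is mutable.

```python
ba = bytearray(b"abcdef")

replaced = ba.replace(b"f", b"k")                  # new object: bytearray(b'abcdek')
index = bytearray(b"abc").find(b"b")               # 1
from_hex = bytearray.fromhex("6162 09 6a 6b00")    # bytearray(b'ab\tjk\x00')
hex_text = bytearray("abc".encode()).hex()         # '616263'
third = ba[2]                                      # 99, an int

print(replaced, ba, index, from_hex, hex_text, third)
```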

  A bytearray is a byte array that behaves much like a list, except that its contents are stored as a sequence of bytes. It also supports some of the same operations as a list.
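  The original post does not list which list operations are meant, so the selection below is an assumption: it shows a few standard mutable-sequence methods that bytearray shares with list. Every element passed in must be an int in [0, 255].

```python
ba = bytearray(b"abc")

ba.append(100)            # bytearray(b'abcd')
ba.insert(0, 65)          # bytearray(b'Aabcd')
ba.extend([101, 102])     # bytearray(b'Aabcdef')
last = ba.pop()           # removes and returns 102
ba.remove(65)             # removes the first byte equal to 65
ba.reverse()              # reverses in place
snapshot = bytes(ba)      # b'edcba'

ba.clear()                # empties the bytearray

print(last, snapshot, ba)   # 102 b'edcba' bytearray(b'')
```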

 


Origin www.cnblogs.com/dabric/p/11718428.html