7.5 Character Encoding and File Handling

Character Encoding

Character encoding concerns text only, so there is no need to consider video, audio, or other kinds of files; it applies only to text files.

Entering and displaying text in a text editor actually involves two processes:

The first process: when we operate the computer we enter characters that we can understand but the computer cannot, because the computer only recognizes binary electrical signals such as 10100. The characters we enter are therefore converted into binary numbers.

The second process is the reverse: when text is displayed, the binary numbers are converted back into characters we can read.

This conversion requires a tool, the character encoding table:

1.ASCII code table

This is the earliest character encoding table, invented in the United States. It uses 8 binary digits to represent one character, from 0000 0000 to 1111 1111. 1111 1111 is the largest 8-bit value (255), so 8 bits can represent at most 256 values; standard ASCII defines 128 characters (English letters, digits, and symbols).

1 binary digit = 1 bit, 8 bits = 1 byte
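
As a quick check of the numbers above, here is a minimal sketch using only Python 3 built-ins. The variable names are illustrative and not from the original notes.

```python
# Each ASCII character maps to a number below 128, which fits in one byte.
ch = 'A'
code = ord(ch)                  # numeric code of the character
bits = format(code, '08b')      # the same number written as 8 binary digits
encoded = ch.encode('ascii')    # the single byte stored for 'A'

print(code, bits, encoded, len(encoded))
```

The point of the sketch is only that one English character occupies exactly one byte under ASCII.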

2.GBK

A character encoding table created in China. It uses 2 bytes to represent a Chinese character and 1 byte to represent an English character. Two bytes (1111 1111 1111 1111) can represent up to 65,536 values (0 to 65,535), which matters because Chinese has a very large number of characters.
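
The byte counts can be checked with Python's built-in `gbk` codec. This is a small sketch; the sample characters are chosen only for illustration.

```python
# GBK stores an English letter in 1 byte and a Chinese character in 2 bytes.
english = 'a'.encode('gbk')
chinese = '上'.encode('gbk')

print(len(english), len(chinese))
print(chinese.decode('gbk'))    # decoding with the same table restores the character
```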

3. Japan: Shift_JIS; Korea: EUC-KR; ......

In summary, for a computer to support a country's language, that country had to create its own mapping between characters and numbers, and that mapping is a character encoding table.

4.Unicode

At this point there was an urgent need for a worldwide standard character encoding table (one that can contain all the world's languages), and so Unicode came into being. In its original form it represented every character uniformly with 2 bytes.

Advantages:

1. Whatever characters the user enters, Unicode can recognize them

2. When data encoded with another country's encoding is read from the hard disk into memory, it can be converted, because Unicode has a mapping to each of those encodings

Disadvantages:

The garbled-text problem is now gone and we can use every document, but a new problem arises: if a document is entirely in English, Unicode takes twice as much space as ASCII, which is very inefficient for storage and transmission. So UTF-8 appeared.

5.UTF-8

UTF-8 changes English characters from the original 2 bytes in Unicode to 1 byte, and Chinese characters from the original 2 bytes to 3 bytes.
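
The size difference can be sketched as follows. As an assumption for illustration, `utf-16-le` stands in for the fixed 2-byte representation described above; the `sizes` dictionary is a hypothetical helper.

```python
# Compare a 2-byte-per-character encoding with UTF-8 for one English
# and one Chinese character.
sizes = {}
for ch in ('a', '上'):
    fixed = ch.encode('utf-16-le')   # 2 bytes for each of these characters
    variable = ch.encode('utf-8')    # 1 byte for 'a', 3 bytes for '上'
    sizes[ch] = (len(fixed), len(variable))

print(sizes)
```

English text shrinks by half under UTF-8, at the cost of one extra byte per Chinese character.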

6. Important points:

In today's computers

  Memory uses Unicode

  Hard disks use UTF-8

1. Saving data from memory to the hard disk

Process: binary data in Unicode format in memory >>>>> encoding (encode) >>>>> binary data in UTF-8 format on the hard disk

2. Reading data from the hard disk into memory

Process: binary data in UTF-8 format on the hard disk >>>>> decoding (decode) >>>>> binary data in Unicode format in memory

3. Ensuring no garbled text

Whatever encoding a text file was saved with, decode it with that same encoding!
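
A minimal sketch of what goes wrong when the rule is broken. The sample text is illustrative, and `errors='replace'` is used only so the mismatched decode cannot raise an exception.

```python
# Encode with UTF-8, then decode once with the right table and once with the wrong one.
text = '上海'
data = text.encode('utf-8')

right = data.decode('utf-8')                   # same encoding: text is restored
wrong = data.decode('gbk', errors='replace')   # different encoding: garbled text

print(right == text)
print(wrong == text)
```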

7. File header

In Python 2

  The interpreter reads a .py file as a text file using ASCII by default (because Unicode was not yet prevalent when the Python 2 interpreter was developed)

In Python 3

  The interpreter reads a .py file as a text file using UTF-8 by default

coding:

To keep the Python interpreter from running into decoding problems, add a file header to the .py file, such as: # coding: utf-8

The file header also determines how the interpreter stores string data: in Python 2, if no file header is specified, strings are stored as ASCII by default, and if a file header is specified they are stored in the encoding it names; in Python 3, strings are stored as Unicode by default.
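
For illustration, a file with such a header might look like the sketch below. It assumes Python 3; the `-*-` decoration is one commonly used form of the header, and the variable names are hypothetical.

```python
# -*- coding: utf-8 -*-
# The header above must appear on the first or second line of the file.
import sys

default_encoding = sys.getdefaultencoding()   # 'utf-8' on Python 3
greeting = '你好'                              # non-ASCII literal in the source

print(default_encoding, len(greeting))
```

On Python 3 the header is optional for UTF-8 files, since UTF-8 is already the default.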

Supplement:

The PyCharm terminal uses UTF-8; the Windows terminal uses GBK (on a Chinese-language system)

8.encode and decode

x = '上'
res1 = x.encode('utf-8')  # encode Unicode into UTF-8 binary data for storage and transmission
print(res1)  # b'\xe4\xb8\x8a'; the b prefix marks the bytes type (a byte string), which you can treat as binary data
res2 = res1.decode('utf-8')  # decode UTF-8 binary data from the hard disk back into Unicode
print(res2)

File Handling

1. What is a file?

A file is a simple interface that the operating system provides so that users can operate on complicated hardware (the hard disk)

2. Why manipulate files?

People or applications need to save data permanently

3. How?

with open(r'F:\python\day07\代码\day07\a.txt', encoding='utf-8') as f, \
        open(r'F:\python\day07\代码\day07\b.txt', encoding='utf-8') as f1:
    # f is just a variable name; think of it as a remote control. The r prefix cancels escaping,
    # it is followed by the path of the file to open, and encoding= names the encoding used to decode the file
    print(f)
    print(f.read())  # read the file contents
    print(f1)
    print(f1.read())  # read the file contents

4. File opening modes

Units of file operation

  t: text mode. Specify the encoding parameter when using it; if you do not, the operating system's default encoding is used

  b: binary mode. The encoding parameter must not be specified
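
The difference between the two units can be sketched with a scratch file. The temporary path and file name below are hypothetical and exist only for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('上')

with open(path, mode='rt', encoding='utf-8') as f:
    as_text = f.read()      # str, decoded from UTF-8

with open(path, mode='rb') as f:    # no encoding parameter in binary mode
    as_bytes = f.read()     # bytes, exactly what is stored on disk

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes)
```

Text mode hands back decoded characters, while binary mode hands back the raw bytes without any decoding.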

1. r: read-only mode

with open(r'F:\python\day07\代码\day07\a.txt', mode='rt', encoding='utf-8') as f:
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    print(">>> 1:")
    print(f.read())  # read all of the file's contents at once
    print('>>> 2:')
    print(f.read())  # after one read the cursor is at the end of the file, so there is nothing left to read
# The mode parameter may be omitted; it defaults to rt (read-only text), which needs an encoding for decoding

If the file is an image:

with open(r'F:\python\day07\代码\day07\1.jpeg', 'rb') as f:  # the mode keyword may be omitted
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    print(f.read())  # read all of the file's contents at once
# The mode parameter can also be rb, which can read images and other file types; the result is binary data,
# and the encoding parameter must not be specified

readline and readlines

with open(r'F:\python\day07\代码\day07\a.txt', encoding='utf-8') as f:  # the mode keyword may be omitted
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    print(f.readline())  # read only one line of the file
    print(f.readlines())  # return a list whose elements are the remaining lines of the file
    for i in f:  # f supports for loops, reading one line per iteration (nothing is left here after readlines())
        print(i)  # this approach avoids using too much memory when reading a large file all at once

In r mode, opening a file that does not exist raises an error immediately
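
A minimal sketch of that error and one way to handle it. The missing path is hypothetical, built inside a fresh temporary directory so that it is guaranteed not to exist.

```python
import os
import tempfile

# Hypothetical path inside a new, empty temporary directory.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')

try:
    with open(missing, mode='rt', encoding='utf-8') as f:
        content = f.read()
except FileNotFoundError:
    content = None    # r mode never creates the file
    print('file not found:', os.path.basename(missing))
```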

The file path can be a relative path, but note that the file must then be in the same directory as the script being executed (more precisely, the path is resolved against the current working directory)

2. w: write-only mode; use with caution

(1) When the file does not exist, it is created automatically

(2) When the file exists, its contents are emptied first and then the new content is written

with open(r'F:\python\day07\代码\day07\c.txt', 'w', encoding='utf-8') as f:  # the mode keyword may be omitted
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    f.write('first line\n')
    f.write('second line\n')

(3)writelines

with open(r'F:\python\day07\代码\day07\c.txt', 'w', encoding='utf-8') as f:  # the mode keyword may be omitted
    l = ['first line\n', 'second line\n', 'third line\n']
    f.writelines(l)

Equivalent to

with open(r'F:\python\day07\代码\day07\c.txt', 'w', encoding='utf-8') as f:  # the mode keyword may be omitted
    l = ['first line\n', 'second line\n', 'third line\n']
    for i in l:
        f.write(i)
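
The "empty first, then write" behavior of w mode can be seen in a short sketch. The scratch file below is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'w_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('old content\n')

# Opening the same file in w mode again empties it before writing.
with open(path, 'w', encoding='utf-8') as f:
    f.writelines(['first line\n', 'second line\n'])

with open(path, encoding='utf-8') as f:
    lines = f.readlines()

print(lines)
```

The old content does not survive the second open, which is why w mode should be used with caution.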

3. a: append mode

(1) When the file does not exist, it is created automatically

(2) When the file exists, its contents are not emptied; the cursor moves to the end of the file

with open(r'F:\python\day07\代码\day07\c.txt', 'a', encoding='utf-8') as f:  # the mode keyword may be omitted
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    f.write('fourth line\n')
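
For contrast with w mode, here is a sketch of a mode keeping what is already in the file. Again the scratch file is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'a_demo.txt')

with open(path, 'a', encoding='utf-8') as f:   # file does not exist yet: it is created
    f.write('first line\n')

with open(path, 'a', encoding='utf-8') as f:   # file exists: writing starts at the end
    f.write('second line\n')

with open(path, encoding='utf-8') as f:
    appended = f.readlines()

print(appended)
```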

Origin www.cnblogs.com/francis1/p/11140999.html