2019-07-05 Character encoding and file operations

First, character encoding

  Character encoding concerns text only; other kinds of files, such as video and audio, do not need to be considered here.

  Humans type and read characters, but a computer can only process binary data such as 010101. A character encoding table therefore defines the correspondence between characters and numbers.

  The first such table was ASCII, which stores one English character in 1 byte (8 bits). It can only express English letters and some symbols, so it is not suitable for Chinese. China therefore created its own GBK encoding, which represents a Chinese character with 2 bytes. By the same reasoning, every country has its own script and its own encoding, which makes exchanging text between them very difficult. Unicode was introduced to solve this: a single table covering all characters, which in its original fixed-width form uses 2 bytes per character.

  Unicode in this form has two drawbacks: 1. Because every character takes 2 bytes, text that is mostly English wastes a lot of storage space. 2. The number of I/O operations increases, which reduces program efficiency (this is fatal).

  These two drawbacks were later addressed: when unicode data in memory is saved to the hard disk, it is encoded as utf-8. An English character goes from 2 bytes to 1 byte, and a Chinese character goes from 2 bytes to 3 bytes.
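
The byte counts above can be checked directly in Python 3. This is only a minimal sketch: the sample characters 'a' and '中' are chosen here for illustration and are not from the original notes, and 'utf-16-le' is used as a stand-in for a fixed 2-byte unicode encoding.

```python
# Compare how many bytes one English and one Chinese character take
# in different encodings. The sample characters are illustrative.
english = 'a'
chinese = '中'

sizes = {
    name: (len(english.encode(name)), len(chinese.encode(name)))
    for name in ('utf-16-le', 'utf-8', 'gbk')
}

for name, (en_bytes, zh_bytes) in sizes.items():
    print(f'{name}: English={en_bytes} byte(s), Chinese={zh_bytes} byte(s)')
```

With utf-8 the English character shrinks to 1 byte while the Chinese character grows to 3, which is the trade-off described above.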

  On a modern computer:

    Memory uses unicode

    The hard disk uses utf-8

  You need to know:

    unicode has two features:

      1. Whatever characters the user types, unicode is compatible with the characters of all nations

      2. When data stored in another country's encoding is read from the hard disk into memory, unicode has a mapping to that country's encoding.

    Moving data between memory and the hard disk:

      1. unicode binary data in memory >>> encoding (encode) >>> utf-8 binary data on the hard disk

      2. utf-8 binary data on the hard disk >>> decoding (decode) >>> unicode binary data in memory

    Why files come out garbled: the encoding used to read the file is not the same as the encoding used to store it (******)
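
The garbling described above can be reproduced in a few lines. This is a sketch under assumptions: the sample text '你好' is chosen here for illustration, and `errors='replace'` is passed only so that the mismatched decode cannot raise an exception.

```python
text = '你好'  # illustrative sample text, not from the original notes

stored = text.encode('utf-8')                    # bytes as they would be saved on disk
wrong = stored.decode('gbk', errors='replace')   # read back with a different encoding
right = stored.decode('utf-8')                   # read back with the matching encoding

print(repr(wrong))  # garbled characters
print(right)        # the original text
```

The bytes on disk are identical in both cases; only the table used to interpret them differs.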

 

  python2 and python3 differ in one respect:

  python2: the interpreter reads a py file as a text file using ASCII by default

  python3: the interpreter reads a py file as a text file using utf-8 by default

File header: writing # coding: utf-8 at the top of a py file tells the interpreter to read the file using that encoding

 

Added: 1. The PyCharm terminal uses utf-8

   2. The Windows terminal uses gbk (on a Chinese-language system)

Character Encoding Summary:

  A code example:

    x = 'on'

    res1 = x.encode('utf-8')  # encode unicode into utf-8 binary data that can be stored and transmitted

    res2 = res1.decode('utf-8')  # decode utf-8 binary data from the hard disk back into unicode

 

Second, file operations

    File processing comprises three steps: open (open) / read or write (read, write) / close (close)

# Manipulating files with Python code
# The r prefix disables escaping of the backslashes in the path
f = open(r'D:\Python项目\day07\a.txt', encoding='utf-8')  # open a file: this sends a request to the operating system
# An application must go through the operating system in order to operate computer hardware
print(f)  # f is a file object
print(f.read())  # the default encoding on Windows is gbk, which is why encoding is passed above
f.read()  # sends another request to the operating system to read the contents of the file
f.close()  # tell the operating system to close the open file
 

    When you open a file with open, you must remember to call close

    The approach above is rather cumbersome, so files are usually handled with a context manager

    With the with open approach you do not need to write close, and you can also open several files at the same time.

with open(r'D:\Python项目\day07\a.txt', encoding='utf-8') as f, \
        open(r'D:\Python项目\day07\b.txt', encoding='utf-8') as f1:  # f is just a variable name; think of it as a remote control for the file
    print(f)
    print(f.read())
    print(f1)
    print(f1.read())
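
That with closes the file for you can be confirmed through the file object's `closed` attribute. A minimal sketch, using a throwaway temporary file (the file name and contents are assumptions made for this example) so that it does not depend on the paths above:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any real path.
path = os.path.join(tempfile.mkdtemp(), 'a.txt')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('hello\n')

with open(path, encoding='utf-8') as f:
    content = f.read()
    closed_inside = f.closed   # the file is still open inside the block

closed_after = f.closed        # leaving the block has closed the file
print(repr(content), closed_inside, closed_after)
```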

    File open modes:

      r: read-only mode, the default mode

      w: write-only mode

      a: append mode (write-only)
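
The three modes can be compared side by side. This is a rough sketch: the temporary file name modes.txt and the written strings are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file name

# r: opening a file that does not exist raises FileNotFoundError
try:
    open(path, mode='r', encoding='utf-8')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# w: creates the file if needed and empties it on every open
with open(path, mode='w', encoding='utf-8') as f:
    f.write('first\n')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('second\n')

# a: keeps the existing content and writes at the end
with open(path, mode='a', encoding='utf-8') as f:
    f.write('third\n')

with open(path, mode='r', encoding='utf-8') as f:
    result = f.read()

print(missing_raises, repr(result))
```

Only 'second' and 'third' survive, because the second open in w mode discarded 'first'.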

    Unit of file operations:

      t: text mode, the default, so it can be omitted. When working with text files, be sure to specify the encoding parameter; if you leave it out, the operating system's default encoding is used (gbk on a Chinese-language Windows system)

      b: binary mode, for binary data such as video and audio files. In this mode the encoding parameter must not be passed
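
The difference between the two units shows up in the type of data returned. A small sketch, assuming a temporary file and the sample character '中' (both chosen for this example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'units.txt')  # hypothetical file name
with open(path, mode='wt', encoding='utf-8') as f:
    f.write('中')  # illustrative sample character

with open(path, mode='rt', encoding='utf-8') as f:
    as_text = f.read()    # str, decoded with the given encoding

with open(path, mode='rb') as f:
    as_bytes = f.read()   # bytes, exactly as stored on disk

# Binary mode does not accept an encoding argument.
try:
    open(path, mode='rb', encoding='utf-8')
    encoding_rejected = False
except ValueError:
    encoding_rejected = True

print(repr(as_text), as_bytes, encoding_rejected)
```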

    r read-only mode

      When a file is opened in r mode and the file does not exist, an error is raised immediately

      There are three ways to read: read, readline, readlines

        read: reads the whole file at once and returns it as a string; the disadvantage is that a very large file may exhaust memory

        readline: reads one line of the file and returns it as a string

        readlines: reads every line, each as a string, and returns them all in one list
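
The three read methods can be compared on the same small file. A minimal sketch; the temporary file and its three lines are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical file name
with open(path, mode='w', encoding='utf-8') as f:
    f.write('one\ntwo\nthree\n')

with open(path, mode='r', encoding='utf-8') as f:
    whole = f.read()            # the entire file as one string

with open(path, mode='r', encoding='utf-8') as f:
    first = f.readline()        # one line per call, newline included
    second = f.readline()

with open(path, mode='r', encoding='utf-8') as f:
    all_lines = f.readlines()   # a list with one string per line

print(repr(whole), repr(first), repr(second), all_lines)
```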

 
# Read a text file in rt mode
with open(r'D:\Python项目\day07\a.txt', mode='rt', encoding='utf-8') as f:
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    print(f.read())  # read all the contents of the file at once
    f.seek(0)  # read() left the cursor at the end of the file, so move it back to the start
    for i in f:  # f can be iterated with a for loop, which reads one line per iteration
        print(i)  # this avoids the memory problem of reading a large file all at once

# Read a binary file in rb mode
with open(r'D:\Python项目\day07\1.jpeg', mode='rb') as f:
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    print(f.read())  # read all the contents of the file at once
    print(f.read())  # after the first read the cursor is already at the end of the file, so reading again returns nothing
    print(f.readlines())  # returns a list whose elements are the lines of the file

    w write-only mode (use this mode with caution: if the file already has content, it is cleared when the file is opened)

        1. When the file does not exist, it is created automatically

        2. When the file exists, its contents are emptied first and then the new data is written

        There are two ways to write: write / writelines (file objects have no writeline method)

with open(r'xxx.txt', mode='w', encoding='utf-8') as f:
    print(f.readable())  # is it readable
    print(f.writable())  # is it writable
    f.write('No, no, you do not turn ~\n')
    f.write('No, no, you do not turn ~\n')
    f.write('No, no, you do not turn ~\n')
    f.write('No, no, you do not turn ~\r')
    f.write('No, no, you do not turn ~')

    l = ['no sdffs, sdfs has turned ~\n', 'does not sdfsdf, you turn sdfsf ~\n', 'no sfad not, you did not turn sa ~\n']
    f.writelines(l)
    # the call above is equivalent to
    for i in l:
        f.write(i)
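
One detail worth checking is that writelines does not add line breaks by itself; each string must carry its own '\n'. A short sketch with made-up data and a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'xxx.txt')  # temporary stand-in for xxx.txt
lines = ['a', 'b\n', 'c\n']  # illustrative data; the first item has no newline

with open(path, mode='w', encoding='utf-8') as f:
    f.writelines(lines)  # writes the strings back to back, nothing is inserted between them

with open(path, mode='r', encoding='utf-8') as f:
    written = f.read()

print(repr(written))
```

Here 'a' and 'b' end up on the same line because 'a' had no trailing newline.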

    a append mode

      1. When the file does not exist, it is created

      2. When the file exists, its contents are not emptied; the cursor moves to the end of the file and new data is appended after the original content

    A file opened in a mode is not readable; it can only be written

with open(r'yyy.txt', mode='a', encoding='utf-8') as f:
    print(f.readable())  # is it readable: False
    print(f.writable())  # is it writable: True
    f.write("I'm a little tail\n")
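
Both properties of a mode, appending at the end and refusing reads, can be seen in a short sketch. The temporary file and its contents are assumptions made for this example.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'yyy.txt')  # temporary stand-in for yyy.txt
with open(path, mode='w', encoding='utf-8') as f:
    f.write('body\n')

with open(path, mode='a', encoding='utf-8') as f:
    f.write('tail\n')  # lands after the existing content
    try:
        f.read()       # a mode is write-only, so reading is refused
        read_failed = False
    except io.UnsupportedOperation:
        read_failed = True

with open(path, mode='r', encoding='utf-8') as f:
    result = f.read()

print(read_failed, repr(result))
```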

 

For more details, see: http://note.youdao.com/noteshare?id=17303322fcbc09a85d9bd5195dec25ae&sub=F9D47851E02B4CB7A341119637EC6BAE

 


Origin www.cnblogs.com/wangcuican/p/11140055.html