Character encoding and file operations

First: character encoding

1. What is character encoding: converting characters that people can read into the binary 0s and 1s that a computer can read is called character encoding. A character encoding table is the set of rules for that conversion.

2. Common encoding tables: ASCII, GBK, Unicode, UTF-8
    For background: the history of encoding tables

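    1. ASCII: the correspondence between letters, digits, and symbols and the binary codes a computer uses
     Think about it: how many bits does it take to label all 128 characters? 7 bits are enough (2^7 = 128); in practice each ASCII character is stored in one byte
     binary: 11111111 => 255 => 1 byte => 8 bits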
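
The arithmetic above can be checked with a few Python 3 built-ins. This is only a minimal sketch; the variable names are made up for illustration.

```python
# ASCII defines 128 characters, so the codes run from 0 to 127.
ascii_count = 128
bits_needed = (ascii_count - 1).bit_length()  # bits required to write the code 127

# One byte is 8 bits; its largest value is binary 11111111.
max_byte = int('11111111', 2)

# ord() gives a character's numeric code; format() shows it as 8 binary digits.
code_of_a = ord('a')
bits_of_a = format(code_of_a, '08b')

print(bits_needed, max_byte, code_of_a, bits_of_a)
```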

    2. China: the correspondence between Chinese characters and binary codes: GB2312 => GBK (***) => GB18030
     Japan: Shift_JIS
     Korea: EUC-KR

    3. Unicode
    Advantage: one unified table that covers the characters of all languages (any country can use it). In its original fixed-width form every character takes 2 bytes; characters added later fall outside that 2-byte range and need more
    Example: a => 0000 0000 0110 0001
    Disadvantages of the fixed 2-byte form:
    1. It wastes storage space (English text doubles in size)
    2. More bytes mean more I/O, which lowers program efficiency (fatal)
    In today's computers:
    memory holds text as Unicode
    the hard disk holds text as UTF-8
    (need to know)
    Two characteristics of Unicode:
    1. It is compatible with the characters of all languages
    2. It has a mapping to each country's own encoding, so data stored on disk in those encodings can be converted to Unicode when it is read into memory
    (must master)
    Data going from memory to the hard disk:
    1. Unicode data in memory >>> encode >>> UTF-8 bytes on disk
    Data going from the hard disk into memory:
    1. UTF-8 bytes on disk >>> decode >>> Unicode data in memory
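
The role of Unicode as the common form between encodings can be sketched in Python 3, where a `str` holds Unicode text and `bytes` holds encoded data. The sample text and variable names below are illustrative assumptions, not part of the original notes.

```python
text = '你好'  # a Python 3 str holds Unicode characters

gbk_bytes = text.encode('gbk')      # the bytes a GBK-encoded file would contain
utf8_bytes = text.encode('utf-8')   # the bytes a UTF-8-encoded file would contain

# Decoding each byte sequence with its own codec yields the same Unicode text.
from_gbk = gbk_bytes.decode('gbk')
from_utf8 = utf8_bytes.decode('utf-8')

# Converting GBK data to UTF-8 goes through Unicode: decode first, then encode.
converted = gbk_bytes.decode('gbk').encode('utf-8')

print(gbk_bytes, utf8_bytes, converted == utf8_bytes)
```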


    Python 2
    (the interpreter reads a .py file as a text file) the default encoding is ASCII (Unicode was not yet widespread when Python 2 was developed)
    Python 3
    (the interpreter reads a .py file as a text file) the default encoding is UTF-8

    # Ponder: unicode and utf-8 What is the relationship
    unicode: use two bytes to store characters, use two bytes to store letters, occupy more space, to read high efficiency
    utf-8: with 3-6 bytes to store characters, one byte to store letters, less occupied space, low efficiency of reading
    summarized: data memory is stored in unicode press, using a hard disk and cpu utf-8 to access the data
    # "abc you good "

    unicode and utf-8 uses an encoding table unicode, utf-8 is unicode coding table reflecting patterns, variable-length data is stored.
    The advantage of large amounts of data becomes long :( English are present, so less space utf-8) and faster


3. Encoding operations: encode(), decode()
    s = '123呵呵'

    n_b = bytes(s, encoding='utf-8')  # str -> bytes
    print(n_b)

    b = b'123\xe5\x91\xb5\xe5\x91\xb5'
    n_s = str(b, encoding='utf-8')  # bytes -> str; decode with the encoding the bytes were written in
    print(n_s)

    When it is already clear whether the value is a string or bytes, call the methods directly:

    # encode a u string (text) into a b string (bytes)

    print(u'你好'.encode('utf-8'))
    # decode a b string (bytes) into a u string (text)
    print(b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode('utf-8'))
    File header:
    # coding: utf-8
    1. Because every encoding supports English characters, the file header can always be read correctly and take effect

    Storing data in Python 2:
    x = u'上'  # prefix the literal with u to store the data as Unicode
    print type(x)
    print x
    For software developed on the Python 2 interpreter, every string that contains Chinese needs a u prefix.
    The reason lies in Python 2: without a file header, string data is stored as ASCII by default; with a file header, it is stored in the encoding the header specifies.

    In Python 3, strings are Unicode by default.

    Additional notes:
    1. The PyCharm terminal uses UTF-8
    2. The Windows terminal uses GBK (on Chinese-language systems)

    Garbled text: characters fail to display properly because the encoding used to decode differs from the one used to encode
    (******)
    The key to avoiding garbled text:
    decode a text file with the same encoding it was written in
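
A minimal sketch of how garbled text arises, assuming Python 3. The sample string is illustrative; `errors='replace'` is passed only so the mismatched decode cannot raise on other inputs.

```python
original = '你好'
data = original.encode('utf-8')  # written with UTF-8

# Decoding with a different codec does not fail here, it silently yields wrong characters.
garbled = data.decode('gbk', errors='replace')

# Decoding with the codec that was used for writing restores the text.
correct = data.decode('utf-8')

print(repr(garbled), repr(correct))
```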
    A binary digit is called a bit
    8 bits = 1 byte
    1024 bytes = 1 KB
    1024 KB = 1 MB
    1024 MB = 1 GB
    1024 GB = 1 TB
    1024 TB = 1 PB
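
The unit ladder above amounts to repeated division by 1024. The helper below is a hypothetical illustration (the name `human_readable` is not from the original notes).

```python
UNITS = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB']

def human_readable(num_bytes):
    """Express a byte count in the largest unit from UNITS that keeps the value at 1 or above."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when no larger unit is left.
        if value < 1024 or unit == UNITS[-1]:
            return f'{value:.1f} {unit}'
        value /= 1024

bits_in_one_kb = 1024 * 8  # 8 bits per byte

print(human_readable(1536), human_readable(1024 ** 5), bits_in_one_kb)
```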


Second: file operations

1. What is a file?
A simple interface that the operating system provides so users can work with complex hardware (the hard disk)

2. Why manipulate files?
People and applications need to save data permanently

3. How?
Operate on files through Python code
An r prefix on the path string cancels escaping (raw string)
f = open(r'D:\day 06\db.txt', encoding='utf-8')
  Modes for opening a file:
  r read-only mode
  w write-only mode
  a append mode
  Modes for handling file content:
  t text mode (the default); an encoding should be specified, and if it is not, the operating system's default encoding is used
  b binary mode; the encoding parameter must not be specified

  1. An application has to go through the operating system to operate the computer's hardware
    f = open(r'D:\day 06\db.txt', encoding='utf-8')  # ask the operating system to open the file
    print(f)  # f is a file object
    print(f.read())  # the default encoding on (Chinese-language) Windows is GBK, which is why encoding is passed to open()
    f.read()  # ask the operating system to read the file's content
    f.close()  # tell the operating system to close the file
    print(f.read())  # raises ValueError, because the file is already closed
  2. File operations with a context manager
    with open(r'D:\day 06\db.txt', encoding='utf-8') as f:  # f is just a variable name
        print(f)
        print(f.read())
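
What the `with` form adds over calling `close()` by hand can be sketched like this, assuming Python 3 and a hypothetical scratch file: the file is closed automatically when the block ends, and reading afterwards fails.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'db.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

with open(path, encoding='utf-8') as f:
    content = f.read()
    closed_inside = f.closed   # still open inside the block

closed_after = f.closed        # closed automatically on leaving the block

# Reading from a closed file raises ValueError.
try:
    f.read()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(closed_inside, closed_after, error_message)
```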

    The mode parameter may be omitted; the default is rt, read-only text. Writing r alone means the same thing, because t is the default
    with open(r'D:\day 06\db.txt', mode='rt', encoding='utf-8') as f:
        print(f.readable())  # is it readable
        print(f.writable())  # is it writable
        print(f.read())  # read all of the file's content at once

    with open(r'D:\day 06\1.jpg', mode='rb') as f:
        print(f.readable())  # is it readable
        print(f.writable())  # is it writable
        print(f.read())  # read all of the file's content at once
    3. r mode: opening a file that does not exist raises an error immediately (*****)
    The path may be relative; note that the file must then be in the directory the script is run from (normally the same directory as the script)
    with open(r'db.txt', 'r', encoding='utf-8') as f:
        print(f.readable())  # is it readable
        print(f.writable())  # is it writable
        print('>>>>>: 1')
        print(f.read())  # read all of the file's content at once
        print('>>>>>: 2')
        print(f.read())  # after one full read the cursor is at the end of the file, so there is nothing left to read
        print(f.readlines())  # returns a list whose elements are the lines of the file
        for i in f:
            print(i)
        print(f.readline())  # reads only one line of the file
        print(f.readline())
        print(f.readline())
        print(f.readline())
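
Because every read moves the cursor, the calls above give different results depending on their order. The sketch below, assuming Python 3 and a hypothetical three-line scratch file, reopens the file for each approach so each one starts from the beginning.

```python
import os
import tempfile

# Hypothetical scratch file with three lines.
path = os.path.join(tempfile.mkdtemp(), 'db.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('line 1\nline 2\nline 3\n')

with open(path, 'r', encoding='utf-8') as f:
    first_read = f.read()    # whole file
    second_read = f.read()   # cursor is at the end, so this is an empty string

with open(path, 'r', encoding='utf-8') as f:
    one_line = f.readline()  # first line only, including its newline
    rest = f.readlines()     # list of the remaining lines

with open(path, 'r', encoding='utf-8') as f:
    stripped = [line.rstrip('\n') for line in f]  # iterate line by line

# r mode never creates a file: a missing path raises FileNotFoundError.
missing = os.path.join(os.path.dirname(path), 'no_such_file.txt')
try:
    open(missing, 'r', encoding='utf-8')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

print(repr(second_read), one_line, rest, stripped, missing_raises)
```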
    4. w mode: use with caution
      1. If the file does not exist, it is created automatically
      2. If the file exists, its content is cleared first and then rewritten
    with open(r'jason.txt', 'w', encoding='utf-8') as f:
        print(f.readable())  # is it readable
        print(f.writable())  # is it writable
        print(f.write('Brother Chicken is the most handsome in Zhangjiang\n'))
        print(f.write('Brother Chicken is the most handsome in Hongqiao\n'))
        print(f.write('Brother Chicken is the most handsome in Shanghai\n'))
        l = ['Brother Chicken is the most handsome in Zhangjiang\n', 'Brother Chicken is the most handsome in Hongqiao\n', 'Brother Chicken is the most handsome in Shanghai\n']
        print(f.writelines(l))
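
The reason w mode needs caution is that opening an existing file discards its content at once. A minimal sketch, assuming Python 3 and a hypothetical scratch file:

```python
import os
import tempfile

# Hypothetical scratch file that already has content.
path = os.path.join(tempfile.mkdtemp(), 'jason.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('old content\n')

# Opening again in w mode empties the file before anything is written.
with open(path, 'w', encoding='utf-8') as f:
    can_read = f.readable()
    can_write = f.writable()
    written = f.write('first\n')                      # returns the number of characters written
    result = f.writelines(['second\n', 'third\n'])    # returns None; no newlines are added for you

with open(path, encoding='utf-8') as f:
    final = f.read()

print(can_read, can_write, written, result, repr(final))
```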

    5. a mode
      1. If the file does not exist, it is created automatically
      2. If the file exists, its content is not cleared; the cursor moves to the end of the file
    with open(r'jason.txt', 'a', encoding='utf-8') as f:
        print(f.readable())  # is it readable
        print(f.writable())  # is it writable
        f.write('Brother Chicken says: I am the most handsome\n')
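
In contrast to w mode, a mode keeps what is already there. A minimal sketch, assuming Python 3 and a hypothetical scratch file:

```python
import os
import tempfile

# Hypothetical scratch file; a mode creates it on first use.
path = os.path.join(tempfile.mkdtemp(), 'jason.txt')
existed_before = os.path.exists(path)

with open(path, 'a', encoding='utf-8') as f:
    can_read = f.readable()
    f.write('first\n')

# Opening in a mode again leaves the existing line in place and writes after it.
with open(path, 'a', encoding='utf-8') as f:
    f.write('second\n')

with open(path, encoding='utf-8') as f:
    final = f.read()

print(existed_before, can_read, repr(final))
```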

  4. Cursor

    1. How to use the cursor: the cursor-related methods
    2. Read and write operations that depend on the cursor
    Using the cursor position, read a specified number of bytes from a large file

    seek(offset, whence)
    offset: the number of bytes to move
    whence:
    0 - the beginning of the file
    1 - the current position
    2 - the end of the file

 


Origin www.cnblogs.com/linxidong/p/11139604.html