7-5 Character encoding and file handling

I. Character Encoding

1.1 What is a character encoding

A computer runs on electricity, so it only recognizes binary numbers (0 and 1). For the computer to handle human language, every character has to be converted to a number. The standard that defines which character corresponds to which number is called a character encoding.
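As a minimal sketch of this character-to-number mapping, Python's built-in ord() and chr() convert between a character and its number. The sample character is chosen here only for illustration.

```python
# Map a character to its number and back with the built-ins ord() and chr().
char = 'A'
number = ord(char)               # character -> number
binary = format(number, '08b')   # the same number written as 8 binary digits
restored = chr(number)           # number -> character

print(number)    # 65
print(binary)    # 01000001
print(restored)  # A
```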

1.2 History of character encoding

1. ASCII code table

One English character is stored in 1 byte (an 8-bit binary number). A byte can hold at most 256 values (0-255 / 00000000-11111111), and ASCII itself defines the first 128 of them (0-127).
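A small sketch of the one-byte-per-character rule, using Python's built-in 'ascii' codec. The sample strings are assumptions made for the example.

```python
# ASCII stores each English character in exactly one byte.
data = 'hello'.encode('ascii')
print(len(data))    # 5
print(list(data))   # [104, 101, 108, 108, 111]

# A character outside the ASCII table cannot be encoded with it.
try:
    '上'.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)  # True
```

The failure on a Chinese character is the reason other code tables had to be developed.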

2. Countries develop their own code tables

To handle both Chinese and English, China developed GBK

GBK: 2 bytes represent a Chinese character, 1 byte represents an English character

Japan created Shift_JIS

Korea developed EUC-KR
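The byte counts above can be checked with the codecs that ship with Python ('gbk', 'shift_jis', 'euc_kr'). This is only a sketch, and the sample characters are chosen for illustration.

```python
# GBK: 2 bytes for a Chinese character, 1 byte for an English character.
chinese = '上'.encode('gbk')
english = 'a'.encode('gbk')
print(len(chinese), len(english))   # 2 1

# Shift_JIS and EUC-KR do the same job for Japanese and Korean characters.
japanese = 'あ'.encode('shift_jis')
korean = '가'.encode('euc_kr')
print(len(japanese), len(korean))   # 2 2
```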

3. Unified standard

Unicode: a single table for the characters of all countries. In its original fixed-width form, every character is represented with 2 bytes.

Unicode disadvantages:

1. Wastes storage space

2. More I/O, so programs run less efficiently (fatal)

Unicode advantages:

1. Compatible with the characters of all countries

2. Unicode has a mapping to every other country's encoding, so data stored in any of those encodings can be read from the hard disk into memory as Unicode

For an article written entirely in English, storing it as Unicode doubles the space needed, which is too wasteful, so UTF-8 appeared. In UTF-8, an English character takes 1 byte and a Chinese character typically takes 3 bytes.
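A rough way to see the difference in size is to encode the same text both ways. In this sketch 'utf-16-le' is used as a stand-in for fixed 2-byte Unicode (that choice is an assumption made for the example, since it adds no byte-order mark).

```python
text_en = 'hello'
text_zh = '上'

# Two bytes per character versus UTF-8's variable width.
en_two_byte = len(text_en.encode('utf-16-le'))
en_utf8 = len(text_en.encode('utf-8'))
zh_two_byte = len(text_zh.encode('utf-16-le'))
zh_utf8 = len(text_zh.encode('utf-8'))

print(en_two_byte, en_utf8)   # 10 5
print(zh_two_byte, zh_utf8)   # 2 3
```

English text halves in size under UTF-8, while a Chinese character grows from 2 bytes to 3.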

Status quo:

Memory uses Unicode

The hard disk uses UTF-8

4. Encoding and decoding process

Encoding (encode):

Data is saved from memory to the hard disk

1. Unicode binary data in memory >>>> encoding (encode) >>>> UTF-8 binary data on the hard disk

Decoding (decode):

Data is read from the hard disk into memory

1. UTF-8 binary data on the hard disk >>>> decoding (decode) >>>> Unicode binary data in memory

ps: 1. To avoid garbled text, read a file with the same encoding that was used to write it

    2. Differences between Python 2 and Python 3

  The Python 2 interpreter reads a source file as ASCII by default (Unicode was not yet widespread)

  The Python 3 interpreter reads a source file as UTF-8 by default

File header: # coding: <character encoding>, e.g. # coding: utf-8

Written at the beginning of the file, it tells the interpreter which character encoding to use when reading the file

x = '上'
res1 = x.encode('utf-8')  # encode Unicode into UTF-8 binary data for storage and transmission
print(res1)  # b'\xe4\xb8\x8a'
# bytes is the byte-string type; you can treat it as binary data
res2 = res1.decode('utf-8')  # decode the UTF-8 binary data from the hard disk back into Unicode
print(res2)
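The rule that a file must be read with the same encoding it was written in can be sketched without touching the disk. The sample text and the choice of GBK as the "wrong" encoding are assumptions for illustration.

```python
original = '上海'
stored = original.encode('utf-8')    # what would be written to the hard disk

# Decoding with a different code table either fails or gives garbled text.
try:
    wrong = stored.decode('gbk')
except UnicodeDecodeError:
    wrong = None

# Decoding with the encoding used for writing restores the text.
right = stored.decode('utf-8')

print(wrong == original)   # False
print(right == original)   # True
```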

II. File Handling

2.1 What is a file

A file is a simple interface that the operating system exposes to us for operating complex hardware (the hard disk)

2.2 Why manipulate files

People and applications need to store data permanently

2.3 How to operate on a file

f=open()

f.read()

f.close()

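2.4 How to operate on files in Python code

Use the open function, for example:

The r prefix marks a raw string, so the backslashes in the path are not treated as escape characters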
f = open(r'D:\Python项目\day07\a.txt', encoding='utf-8')  # ask the operating system to open the file
# an application must go through the operating system to operate computer hardware
print(f)  # f is a file object
print(f.read())  # the Windows default encoding (GBK on a Chinese-language system) is not UTF-8, so encoding is passed explicitly
f.read()  # sends another request to the operating system to read the file contents
f.close()  # tell the operating system to close the open file
print(f)
print(f.read())  # raises ValueError because the file is already closed

ps: To open a.txt you can pass an absolute path, that is, the full path to the file. A relative path also works: if a.txt is in the 'day07' folder and the file you are running is also under 'day07', you can pass r'a.txt', encoding='utf-8' directly to open it
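A sketch of the two kinds of path. A temporary folder stands in for 'day07' (a hypothetical layout), and os.chdir is used to make it the current working directory, which is what a relative path is resolved against.

```python
import os
import tempfile

# A throwaway folder standing in for 'day07'.
with tempfile.TemporaryDirectory() as day07:
    abs_path = os.path.join(day07, 'a.txt')
    with open(abs_path, mode='wt', encoding='utf-8') as f:
        f.write('hello')

    old_cwd = os.getcwd()
    os.chdir(day07)                      # work from inside the folder
    try:
        with open(r'a.txt', encoding='utf-8') as f:   # relative path
            via_relative = f.read()
    finally:
        os.chdir(old_cwd)                # restore before the folder is removed

    with open(abs_path, encoding='utf-8') as f:       # absolute path
        via_absolute = f.read()

print(via_relative == via_absolute)   # True
```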

2.5 File operations with a context manager (with)

with open(r'D:\Python项目\day07\a.txt', encoding='utf-8') as f, \
        open(r'D:\Python项目\day07\b.txt', encoding='utf-8') as f1:  # f is just a variable name; think of it as a remote control
    print(f)
    print(f.read())
    print(f1)
    print(f1.read())
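The point of the with statement is that close() is called for us when the block ends. A minimal sketch, with a temporary file standing in for a.txt (the name and contents are hypothetical):

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.txt')

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('hello')
    closed_inside = f.closed    # False: still open inside the block

closed_outside = f.closed       # True: closed automatically on leaving the block
print(closed_inside, closed_outside)   # False True

# Clean up the temporary file and folder.
os.remove(path)
os.rmdir(tmp_dir)
```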

2.6 File open mode

t: text mode. File contents are handled as strings and decoded for us automatically, so the encoding parameter should be specified

b: binary mode. File contents are handled as bytes; whatever is stored on the hard disk is returned as is, and the encoding parameter must not be specified
ps: t and b cannot be used alone; they must be combined with r, w or a, as in "rt". t mode works for text files only, while b mode can be used for any file
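The difference between t and b can be sketched by reading the same file twice. The temporary file and its contents are assumptions made for the example.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.txt')   # hypothetical file name

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('上')

with open(path, mode='rt', encoding='utf-8') as f:
    as_text = f.read()     # str: decoded for us
with open(path, mode='rb') as f:
    as_bytes = f.read()    # bytes: exactly what is on the hard disk

print(type(as_text).__name__)    # str
print(type(as_bytes).__name__)   # bytes
print(as_bytes)                  # b'\xe4\xb8\x8a'

# Passing encoding together with b mode is rejected.
try:
    open(path, mode='rb', encoding='utf-8')
    rejected = False
except ValueError:
    rejected = True
print(rejected)   # True

os.remove(path)
os.rmdir(tmp_dir)
```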

2.7 Ways to open a file

r: read-only mode

w: write-only mode

a: append mode

r mode: read-only. If the file exists, it is opened with the cursor at the beginning of the file; if the file does not exist, an error is raised

with open(r'D:\python\python练习\a.txt', mode='rt', encoding='utf-8') as f:
    print(f.readable())  # is it readable? True
    print(f.writable())  # is it writable? False
    print(f.read())  # read the entire contents of the file at once
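The error case for r mode can be sketched as follows. The file name is hypothetical and is chosen so that the file does not exist.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
missing = os.path.join(tmp_dir, 'no_such_file.txt')   # never created

try:
    with open(missing, mode='rt', encoding='utf-8') as f:
        f.read()
    error_name = None
except FileNotFoundError as exc:
    error_name = type(exc).__name__

print(error_name)   # FileNotFoundError
os.rmdir(tmp_dir)
```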
   

ps: the mode parameter can be omitted; the default mode is rt (read-only text). The t can also be left out, so r means rt

with open(r'D:\python\python练习\a.txt') as f:
      pass

Relative path:

with open(r'data type classification.jpg', mode='rb') as f:
    pass

rb mode:

with open(r'C:\Users\Xiaodong\Desktop\theme class\data type classification.jpg', mode='rb') as f:
    print(f.readable())  # is it readable? True
    print(f.writable())  # is it writable? False
    print(f.read())  # read the entire contents of the file at once

with open(r'D:\python\python练习\a.txt', mode='rt', encoding='utf-8') as f:
    print(f.readable())  # is it readable?
    print(f.writable())  # is it writable?
    print(">>>1:")
    print(f.read())  # read the entire contents of the file at once
    print('>>>2:')
    print(f.read())  # after one full read the cursor is at the end of the file, so nothing more is read
    print(f.readlines())  # returns a list with one element per line; the cursor is at the end, so it returns []
    print(f.readline())  # reads only one line of the file
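To read again after the cursor has reached the end, it can be moved back with f.seek(0). A sketch with a temporary file (its name and contents are assumptions for the example):

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.txt')
with open(path, mode='wt', encoding='utf-8') as f:
    f.write('line1\nline2\nline3\n')

with open(path, mode='rt', encoding='utf-8') as f:
    first = f.read()          # whole file; the cursor is now at the end
    second = f.read()         # '' because nothing is left to read
    f.seek(0)                 # move the cursor back to the beginning
    one_line = f.readline()   # only the first line
    rest = f.readlines()      # the remaining lines as a list

print(repr(second))     # ''
print(repr(one_line))   # 'line1\n'
print(rest)             # ['line2\n', 'line3\n']

os.remove(path)
os.rmdir(tmp_dir)
```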
 

w mode: write-only. If the file does not exist, a new file is created; if it exists, it is opened and its contents are emptied before writing (use with caution)

with open(r'D:\python\python练习\a.txt', mode='wt', encoding='utf-8') as f:
    print(f.readable())  # is it readable? False
    print(f.writable())  # is it writable? True
    f.write('learning makes me happy, I love to learn\n')
    f.write('learning makes me happy, I love to learn\n')
    f.write('learning makes me happy, I love to learn\n')
    f.write('learning makes me happy, I love to learn\n')
    f.write('learning makes me happy, I love to learn\n')
    lines = ['learn music, I learned\n', 'learning to make me happy, I Xi\n', 'learn music, I love to learn\n']
    f.writelines(lines)  # write multiple lines at once
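Why w mode needs caution can be sketched by opening the same file twice. The temporary file and its contents are hypothetical.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.txt')

existed_before = os.path.exists(path)    # False: nothing there yet
with open(path, mode='wt', encoding='utf-8') as f:
    f.write('old contents\n')            # w mode created the file

with open(path, mode='wt', encoding='utf-8') as f:
    # The old contents were emptied as soon as the file was opened.
    f.writelines(['new line 1\n', 'new line 2\n'])

with open(path, mode='rt', encoding='utf-8') as f:
    contents = f.read()

print(existed_before)   # False
print(repr(contents))   # 'new line 1\nnew line 2\n'

os.remove(path)
os.rmdir(tmp_dir)
```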
  

a mode: append mode. If the file does not exist, it is created automatically; if it exists, it is opened without emptying its contents and the cursor moves to the end

with open(r'D:\python\python练习\a.txt', mode='a', encoding='utf-8') as f:
    print(f.readable())  # is it readable? False
    print(f.writable())  # is it writable? True
    f.write('I love learning\n')
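For comparison with w mode, here is a sketch showing that a mode keeps what is already in the file. The temporary file is again a hypothetical stand-in.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.txt')

with open(path, mode='a', encoding='utf-8') as f:
    f.write('first\n')     # the file did not exist, so it was created

with open(path, mode='a', encoding='utf-8') as f:
    f.write('second\n')    # existing contents are kept; this goes at the end

with open(path, mode='rt', encoding='utf-8') as f:
    appended = f.read()

print(repr(appended))   # 'first\nsecond\n'

os.remove(path)
os.rmdir(tmp_dir)
```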

 


Origin www.cnblogs.com/z929chongzi/p/11140410.html