First, character encoding
1. What is character encoding: people recognize characters, but a computer only recognizes binary 0s and 1s. Converting characters into binary is called character encoding, and a character encoding table is the set of conversion rules.
2. Common encoding tables: ASCII, GBK, Unicode, UTF-8
For background: the history of encoding tables
1. ASCII: the correspondence between letters, digits, and symbols and the binary codes the computer stores
Think about it: how can 128 characters be fully labeled with 0s and 1s?
Binary: 11111111 => 255 => 1 byte => 8 bits (7 bits already cover 128 values, and each ASCII character is stored in one byte)
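The arithmetic above can be checked with a few built-ins. This is only a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
# 128 distinct values need 7 bits; one 8-bit byte can hold 256 values (0-255).
ascii_count = 128
bits_needed = (ascii_count - 1).bit_length()   # bits required to number 0..127
byte_max = int('11111111', 2)                  # largest value that fits in one byte

print(bits_needed)   # 7
print(byte_max)      # 255

# Each ASCII character maps to a number, and that number is stored as one byte.
code_a = ord('a')                   # 97
bits_a = format(code_a, '08b')      # '01100001'
print(code_a, bits_a)
```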
2. China: the correspondence between Chinese characters and binary codes: GB2312 => GBK (***) => GB18030
Japan: Shift_JIS
Korea: EUC-KR
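To see that each national table assigns its own bytes to the same character, here is a small sketch. It assumes the codec names 'gbk', 'shift_jis', and 'euc_kr' that ship with CPython; the sample character is chosen only for illustration.

```python
# The same character gets different bytes under different national encodings.
char = '中'
encoded = {name: char.encode(name) for name in ('gbk', 'shift_jis', 'euc_kr')}
for name, data in encoded.items():
    print(name, data)

# Decoding with the matching codec restores the original character.
restored = {name: data.decode(name) for name, data in encoded.items()}
print(restored)
```

Because the byte values differ from table to table, bytes written with one table cannot be read reliably with another.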
3. Unicode
Advantages: uniformly uses 2 bytes to represent characters, so every country can use it (this is the fixed-width form; characters outside the basic range need more than 2 bytes)
Example: a 0000 0000 0110 0001
Cons:
1. wastes storage space
2. increases the amount of I/O, which lowers program efficiency (fatal)
Computers today:
memory uses Unicode
the hard disk uses UTF-8
(need to know)
Two characteristics of Unicode:
1. compatible with the characters of all nations
2. has a mapping to the encodings of the various countries, so data stored in those encodings can be converted to Unicode when it is read from the hard disk into memory
(must master)
Saving data from memory to the hard disk:
1. Unicode binary data in memory >>> encoding (encode) >>> UTF-8 binary data on the hard disk
Reading data from the hard disk into memory:
1. UTF-8 binary data on the hard disk >>> decoding (decode) >>> Unicode binary data in memory
Python 2
(the .py file is read into the interpreter as an ordinary text file) the default encoding is ASCII (when Python 2 was developed, Unicode was not yet prevalent)
Python 3
(the .py file is read into the interpreter as an ordinary text file) the default encoding is UTF-8
# Ponder: unicode and utf-8 What is the relationship
unicode: use two bytes to store characters, use two bytes to store letters, occupy more space, to read high efficiency
utf-8: with 3-6 bytes to store characters, one byte to store letters, less occupied space, low efficiency of reading
summarized: data memory is stored in unicode press, using a hard disk and cpu utf-8 to access the data
# "abc you good "
unicode and utf-8 uses an encoding table unicode, utf-8 is unicode coding table reflecting patterns, variable-length data is stored.
The advantage of large amounts of data becomes long :( English are present, so less space utf-8) and faster
3. Encoding operations: encode with encode(), decode with decode()
s = '123呵呵'
n_b = bytes(s, encoding='utf-8')  # str >>> bytes (encode)
print(n_b)
b = b'123\xe5\x91\xb5\xe5\x91\xb5'
n_s = str(b, encoding='utf-8')  # bytes >>> str (decode); these bytes are UTF-8, so decoding them as GBK would give garbled text
print(n_s)
When the literal itself makes clear whether it is a string or bytes:
# encode a u string into a b string
print(u'你好'.encode('utf-8'))
# decode a b string into a u string
print(b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode('utf-8'))
File header:
# coding: utf-8
1. Because every encoding supports English characters, the file header can always be read correctly and take effect
Storing data in Python 2:
x = u'上'  # add u in front so the data is stored as Unicode
print type(x)
print x
In software developed on the Python 2 interpreter, every string that contains Chinese needs a u in front of it.
The reason is that in Python 2, when no file header is specified, string data is stored as ASCII by default; if a file header is specified, the data is stored in the encoding named in the header.
In Python 3, strings are stored as Unicode by default.
Addendum:
1. the PyCharm terminal uses UTF-8
2. the Windows terminal uses GBK
Garbled text: characters do not display properly because the encodings are inconsistent
(******)
The key to avoiding garbled text:
decode a text file with the same encoding that was used to encode it
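A small sketch of how garbled text arises, assuming UTF-8 as the encoding used when saving and GBK as the mismatched one used when reading (the variable names are illustrative):

```python
original = '你好'
stored = original.encode('utf-8')     # the bytes as they would be saved on disk

# Decoding with a different table gives garbled text (or, for some inputs, an error;
# errors='replace' is used here so the sketch never raises).
garbled = stored.decode('gbk', errors='replace')

# Decoding with the same table that was used for encoding restores the text.
correct = stored.decode('utf-8')

print(garbled)
print(correct)
```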
One binary digit is called a bit
8 bits = 1 byte
1024 bytes = 1KB
1024KB = 1MB
1024MB = 1GB
1024GB = 1TB
1024TB = 1PB
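The unit ladder above can be turned into a small helper. The function name human_size is hypothetical and introduced only for this sketch.

```python
def human_size(num_bytes):
    """Convert a byte count to a readable string using 1024-based units."""
    units = ['Bytes', 'KB', 'MB', 'GB', 'TB', 'PB']
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at PB, the largest unit listed in the notes.
        if size < 1024 or unit == units[-1]:
            return f'{size:g}{unit}'
        size /= 1024

print(human_size(1024))        # 1KB
print(human_size(1536))        # 1.5KB
print(human_size(1024 ** 3))   # 1GB
```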
Second, file operations
1. What is a file?
A simple interface that the operating system provides to users for operating complex hardware (the hard disk)
2. Why operate on files?
People and applications need to save data permanently
3. How?
By operating on files from Python code
The r prefix cancels escaping (raw string)
f = open(r'D:\day 06\db.txt', encoding='utf-8')
Modes for opening a file:
r read-only mode
w write-only mode
a append mode
How the file content is handled:
t text mode; the encoding parameter should be specified, and if it is not, the operating system's default encoding is used
b binary mode; the encoding parameter must not be specified
1. An application must go through the operating system in order to operate on computer hardware, and the file is how it does so
f = open(r'D:\day 06\db.txt', encoding='utf-8')  # send the operating system a request to open the file
print(f)  # f is a file object
print(f.read())  # the Windows default encoding is GBK, which is why encoding is passed explicitly above
f.read()  # sends the operating system a request to read the file content
f.close()  # tell the operating system to close the open file
print(f.read())  # raises ValueError: the file has already been closed
2. Operating on files with a context manager
with open(r'D:\day 06\db.txt', encoding='utf-8') as f:  # f is only a variable name
    print(f)
    print(f.read())
The mode parameter can be omitted; if it is, the default is rt, read-only text. The t can also be left out, so r alone means rt.
with open(r'D:\day 06\db.txt', mode='rt', encoding='utf-8') as f:
    print(f.readable())  # whether the file is readable
    print(f.writable())  # whether the file is writable
    print(f.read())  # read the whole content of the file at once
with open(r'D:\day 06\1.jpg', mode='rb') as f:
    print(f.readable())  # whether the file is readable
    print(f.writable())  # whether the file is writable
    print(f.read())  # read the whole content of the file at once, as bytes
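One benefit of the with form is that the file is closed automatically when the block ends. The following sketch checks that, using a scratch file from tempfile as a hypothetical stand-in for db.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'db.txt')   # hypothetical scratch file
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

with open(path, encoding='utf-8') as f:   # mode omitted, so the file is opened as read-only text
    mode_used = f.mode
    content = f.read()
    closed_inside = f.closed              # False while still inside the block

closed_after = f.closed                   # True: leaving the block closed the file
print(mode_used, closed_inside, closed_after)
```

No explicit f.close() call is needed, even if an exception is raised inside the block.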
3. r mode: if the file does not exist, opening it raises an error immediately (*****)
The file path can be a relative path. Note that a relative path is resolved from the current working directory, so the file should sit at the same level as the script being executed.
with open(r'db.txt', 'r', encoding='utf-8') as f:
    print(f.readable())  # whether the file is readable
    print(f.writable())  # whether the file is writable
    print('>>>>>:1')
    print(f.read())  # read the whole content of the file at once
    print('>>>>>:2')
    print(f.read())  # after a full read the cursor is at the end of the file, so there is nothing left to read
    print(f.readlines())  # returns a list; each element is one line of the text
    for i in f:  # a file object can be iterated line by line
        print(i)
    print(f.readline())  # reads only one line of the file
    print(f.readline())
    print(f.readline())
    print(f.readline())
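The two r-mode behaviours described above, the error on a missing file and the cursor moving as lines are read, can be reproduced on a scratch file. The folder and file names here are hypothetical and created with tempfile.

```python
import os
import tempfile

folder = tempfile.mkdtemp()   # hypothetical scratch folder

# Opening a missing file in r mode fails immediately.
missing = os.path.join(folder, 'no_such_file.txt')
try:
    open(missing, 'r', encoding='utf-8')
    error_name = None
except FileNotFoundError as exc:
    error_name = type(exc).__name__
    print('r mode failed:', error_name)

path = os.path.join(folder, 'db.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('line1\nline2\nline3\n')

with open(path, 'r', encoding='utf-8') as f:
    first = f.readline()    # one line; the cursor moves past it
    rest = f.readlines()    # the remaining lines as a list
    at_end = f.read()       # the cursor is at the end, so this is an empty string

print(repr(first), rest, repr(at_end))
```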
4. w mode: must be used with caution
1. If the file does not exist, it is created automatically
2. If the file exists, its content is cleared first and then the new content is written
with open(r'jason.txt', 'w', encoding='utf-8') as f:
    print(f.readable())  # whether the file is readable
    print(f.writable())  # whether the file is writable
    print(f.write('Brother Ji is the most handsome in Zhangjiang\n'))  # write returns the number of characters written
    print(f.write('Brother Ji is the most handsome in Hongqiao\n'))
    print(f.write('Brother Ji is the most handsome in Shanghai\n'))
    l = ['Brother Ji is the most handsome in Zhangjiang\n', 'Brother Ji is the most handsome in Hongqiao\n', 'Brother Ji is the most handsome in Shanghai\n']
    print(f.writelines(l))  # writes every string in the list; returns None
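Why w mode needs caution is easiest to see by opening the same file twice. This sketch uses a scratch copy of jason.txt under a tempfile folder, which is an assumption made so that no real file is overwritten.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'jason.txt')   # hypothetical scratch file

existed_before = os.path.exists(path)        # False: w mode will create the file
with open(path, 'w', encoding='utf-8') as f:
    written = f.write('first version\n')     # number of characters written
    result = f.writelines(['a\n', 'b\n'])    # writelines returns None

# Reopening in w mode clears everything written above.
with open(path, 'w', encoding='utf-8') as f:
    f.write('second version\n')

with open(path, 'r', encoding='utf-8') as f:
    final = f.read()

print(existed_before, written, result)
print(repr(final))
```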
5. a mode
1. If the file does not exist, it is created automatically
2. If the file exists, its content is not cleared; the cursor is moved to the end of the file
with open(r'jason.txt', 'a', encoding='utf-8') as f:
    print(f.readable())  # whether the file is readable
    print(f.writable())  # whether the file is writable
    f.write('Brother Ji says: I am the handsome one\n')
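For contrast with w mode, the sketch below opens a scratch file twice in a mode and shows that the earlier content survives. The path is hypothetical and created with tempfile.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'jason.txt')   # hypothetical scratch file

with open(path, 'a', encoding='utf-8') as f:   # the file does not exist yet, so it is created
    can_read = f.readable()
    can_write = f.writable()
    f.write('first\n')

with open(path, 'a', encoding='utf-8') as f:   # existing content is kept; writing starts at the end
    f.write('second\n')

with open(path, 'r', encoding='utf-8') as f:
    lines = f.readlines()

print(can_read, can_write, lines)
```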
4. Cursor
1. How to use the cursor: the cursor-related methods
2. Cursor-related read and write operations
Use the cursor position to take out a specified portion of a large file, measured in bytes
seek(offset, whence)
offset: the number of bytes to move
whence: the reference position
0 - the beginning of the file
1 - the current position
2 - the end of the file