I. Reading and writing text data
(1) Use the open() function with mode 'rt' to read the contents of a text file ('t', text mode, is the default).
(2) To write, use mode 'wt'. If the file already exists, its existing contents are cleared and overwritten.
(3) To append content to the end of an existing file, use mode 'at'.
(4) To write only when the file does not already exist, use mode 'x' ('xt'). If the file exists, open() raises FileExistsError.
(5) If no encoding is given, open() uses the platform's preferred encoding, which you can query with locale.getpreferredencoding(False). sys.getdefaultencoding() is a different setting: the default for str/bytes conversions, which is always 'utf-8' in Python 3.
(6) If you don't use a with statement to manage the file as a context, remember to close the file manually with f.close().
(7) Newline conventions: UNIX uses \n, Windows uses \r\n, and classic Mac OS uses \r. With newline=None (the default), universal newlines mode is enabled: on reading, each of these line endings is translated to a single \n character; on writing, \n is translated to the platform's default line separator. If you do not want this translation, pass newline=''.
(8) The optional errors parameter is a string that specifies how encoding and decoding errors are handled, for example 'strict' (the default, which raises an exception), 'ignore', or 'replace'. It cannot be used in binary mode.
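The modes and parameters above can be exercised together in one short sketch. It works in a throwaway directory from tempfile.mkdtemp(), and the file names (notes.txt, crlf.txt, bad.txt) are made up for illustration:

```python
import os
import tempfile

# Work in a throwaway directory so the sketch doesn't touch real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'notes.txt')

# 'wt' creates the file, or truncates it if it already exists.
with open(path, 'wt', encoding='utf-8') as f:
    f.write('first line\n')

# 'at' appends to the end of the existing file.
with open(path, 'at', encoding='utf-8') as f:
    f.write('second line\n')

# 'rt' reads it back as str.
with open(path, 'rt', encoding='utf-8') as f:
    text = f.read()

# 'xt' refuses to touch a file that already exists.
try:
    with open(path, 'xt', encoding='utf-8') as f:
        f.write('never written\n')
except FileExistsError:
    x_mode_refused = True
else:
    x_mode_refused = False

# Write raw Windows line endings, then compare translated vs untranslated reads.
crlf_path = os.path.join(workdir, 'crlf.txt')
with open(crlf_path, 'wb') as f:
    f.write(b'a\r\nb\r\n')
with open(crlf_path, 'rt', encoding='utf-8') as f:
    translated = f.read()            # newline=None: '\r\n' becomes '\n'
with open(crlf_path, 'rt', encoding='utf-8', newline='') as f:
    untranslated = f.read()          # newline='': line endings kept as-is

# errors='replace' substitutes U+FFFD for bytes that are not valid UTF-8.
bad_path = os.path.join(workdir, 'bad.txt')
with open(bad_path, 'wb') as f:
    f.write(b'caf\xe9')              # a Latin-1 byte, invalid in UTF-8
with open(bad_path, 'rt', encoding='utf-8', errors='replace') as f:
    replaced = f.read()
```

Passing encoding explicitly, as here, avoids depending on the platform default described in (5).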
II. Redirecting print() output to a file
Pass a file opened in text mode as the file keyword argument of print():
with open('./hello.txt', 'wt', encoding='utf8') as f:
    print('Hello World!', file=f)
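A small sketch showing why the file must be in text mode: print() writes str objects, so a binary-mode file rejects them. The temporary path stands in for ./hello.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# print() writes str objects, so the target must be a text-mode file.
with open(path, 'wt', encoding='utf-8') as f:
    print('Hello World!', file=f)
    print('Second line', 'with two parts', file=f)

with open(path, 'rt', encoding='utf-8') as f:
    content = f.read()

# A binary-mode file only accepts bytes, so print() raises TypeError.
try:
    with open(path, 'wb') as f:
        print('Hello World!', file=f)
except TypeError:
    binary_rejected = True
else:
    binary_rejected = False
```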
III. Printing with a different separator or line ending
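Use the sep keyword argument of print() to change the separator placed between items, and the end keyword argument to change the line ending printed after the last item.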
>>> print('ACME', 50, 99, sep=',')
ACME,50,99
>>> print('ACME', 50, 99, sep=',', end='!!\n')
ACME,50,99!!
>>> row = ('ACME', 50, 91.5)
>>> print(*row, sep=',')
ACME,50,91.5
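As a sketch of why print(*row, sep=',') is handy: str.join() only accepts strings, whereas print() converts each item itself. Output is captured in io.StringIO objects here purely so it can be inspected:

```python
import io

row = ('ACME', 50, 91.5)

# str.join() only accepts strings, so non-str items must be converted first.
try:
    ','.join(row)
except TypeError:
    join_needs_str = True
else:
    join_needs_str = False
joined = ','.join(str(x) for x in row)

# print(*row, sep=',') does the conversion itself.
buf = io.StringIO()
print(*row, sep=',', file=buf)

# end=' ' replaces the newline, so successive prints stay on one line.
line = io.StringIO()
for i in range(5):
    print(i, end=' ', file=line)
```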
IV. Reading and writing binary data
Use open() with mode 'rb' or 'wb' to read or write binary data.
When you index or iterate over a byte string, you get integer byte values rather than one-character strings.
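A minimal illustration of the difference between indexing a str and indexing bytes, plus the explicit decode/encode step needed to move between them:

```python
# Indexing a str gives a one-character str ...
t = 'Hello World'
first_char = t[0]

# ... but indexing bytes gives the integer value of that byte.
b = b'Hello World'
first_byte = b[0]                 # same as ord('H')
byte_values = [c for c in b[:5]]  # iteration also yields ints

# To move between text and binary data, decode/encode explicitly.
text = b.decode('ascii')
data = text.encode('ascii')
```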
(1) For binary I/O, objects such as arrays and C structures can be written directly, without first converting them to a bytes object.
This works with any object that implements the buffer interface.
import array

nums = array.array('i', [1, 2, 3, 4, 5, 6])
with open('./data.bin', 'wb') as f:
    f.write(nums)
(2) To read binary data directly into the underlying memory of an existing object, use the file object's readinto() method.
nums = array.array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
with open('./data.bin', 'rb') as f:
    f.readinto(nums)
# nums is now array('i', [1, 2, 3, 4, 5, 6, 0, 0, 0, 0])
readinto() fills an existing buffer instead of allocating a new object and returning it. It returns the number of bytes actually read, so check that value when the file may be shorter than the buffer.
import os

def read_into_buffer(filename):
    # Preallocate a mutable buffer the size of the file, then fill it in place
    buf = bytearray(os.path.getsize(filename))
    with open(filename, 'rb') as f:
        f.readinto(buf)
    return buf
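Putting the pieces above together, here is a sketch of a full round trip: write an array, read it back into a larger preallocated array, and check the byte count readinto() returns. The temporary path replaces ./data.bin so the example is self-contained:

```python
import array
import os
import tempfile

def read_into_buffer(filename):
    # Preallocate a mutable buffer the size of the file, then fill it in place.
    buf = bytearray(os.path.getsize(filename))
    with open(filename, 'rb') as f:
        f.readinto(buf)
    return buf

path = os.path.join(tempfile.mkdtemp(), 'data.bin')

# array.array supports the buffer protocol, so f.write() accepts it directly.
nums = array.array('i', [1, 2, 3, 4, 5, 6])
with open(path, 'wb') as f:
    f.write(nums)

# readinto() fills an existing (larger) array and returns the bytes read.
target = array.array('i', [0] * 10)
with open(path, 'rb') as f:
    nbytes = f.readinto(target)

raw = read_into_buffer(path)
```

Here nbytes is smaller than the 10-element target can hold, which is exactly the case where checking the return value matters.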
VI. Performing I/O operations on strings
When you need an object that behaves like a regular file, io.StringIO (for text) and io.BytesIO (for binary data) are the most suitable classes.
>>> import io
>>> s = io.StringIO()
>>> s.write('Hello World!!!\r\n')
16
>>> print('This is a test', file=s)
>>> s.getvalue()
'Hello World!!!\r\nThis is a test\n'
>>> s = io.StringIO('Hexxxx HHHH')
>>> s.read(4)
'Hexx'
>>> s = io.BytesIO()
>>> s.write(b'Hello World')
11
>>> s.getvalue()
b'Hello World'
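A typical use of StringIO is feeding a function that expects a file object, for example in a unit test. The count_nonblank_lines() helper below is hypothetical, made up for this sketch:

```python
import io

def count_nonblank_lines(f):
    # Hypothetical helper: works with any object that iterates over text lines.
    return sum(1 for line in f if line.strip())

# StringIO stands in for a real text file.
fake_file = io.StringIO('alpha\n\nbeta\ngamma\n')
n = count_nonblank_lines(fake_file)

# BytesIO does the same for binary data and supports seek()/read().
b = io.BytesIO()
b.write(b'\x00\x01\x02\x03')
b.seek(0)
header = b.read(2)
```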
VII. Reading and writing compressed data files
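The gzip and bz2 modules handle compressed files. Their open() functions default to binary mode, so pass 'rt' or 'wt' to work with text. The compression level is set with the compresslevel keyword argument; the default, 9, gives the highest level of compression, while lower levels are faster but compress less.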
import gzip
import bz2

# Reading compressed text
with gzip.open('./somefile.gz', 'rt') as f:
    text = f.read()
with bz2.open('./somefile.bz2', 'rt') as f:
    text = f.read()

# Writing compressed text
with gzip.open('./somefile.gz', 'wt') as f:
    f.write(text)
with bz2.open('./somefile.bz2', 'wt') as f:
    f.write(text)
These functions can also be layered on top of an existing file that has been opened in binary mode:
import gzip

f = open('somefile.gz', 'rb')
with gzip.open(f, 'rt') as g:
    text = g.read()
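A self-contained sketch of both ideas: a text-mode round trip through a .gz file with a non-default compresslevel, and bz2.open() layered on an in-memory binary stream (an io.BytesIO standing in for a file opened with 'rb'/'wb'). The sample text and file name are made up:

```python
import bz2
import gzip
import io
import os
import tempfile

text = 'Hello compressed world\n' * 100
gz_path = os.path.join(tempfile.mkdtemp(), 'somefile.gz')

# Text-mode round trip; compresslevel=5 trades some ratio for speed.
with gzip.open(gz_path, 'wt', compresslevel=5, encoding='utf-8') as f:
    f.write(text)
with gzip.open(gz_path, 'rt', encoding='utf-8') as f:
    gz_text = f.read()

# Layering: bz2.open() wraps an existing binary file object. Closing the
# bz2 wrapper does not close a file object that was passed in.
raw = io.BytesIO()
with bz2.open(raw, 'wt', encoding='utf-8') as f:
    f.write(text)
compressed_size = len(raw.getvalue())
raw.seek(0)
with bz2.open(raw, 'rt', encoding='utf-8') as f:
    bz_text = f.read()
```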
VIII. Iterating over fixed-size records
Iterate over a file in fixed-size records or data blocks instead of lines:
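from functools import partial

RECORD_SIZE = 32

# Binary mode is required: at end of file read() returns b'', the sentinel.
# In text mode it would return '', which never equals b'', so the loop would not end.
with open('somefile.txt', 'rb') as f:
    records = iter(partial(f.read, RECORD_SIZE), b'')
    for r in records:
        ...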
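A runnable sketch of the same iter(callable, sentinel) pattern, using an io.BytesIO in place of a file opened with 'rb' and a small record size so the result is easy to read. Note that the last record can be shorter than RECORD_SIZE:

```python
import io
from functools import partial

RECORD_SIZE = 4

# BytesIO stands in for a file opened with 'rb'.
f = io.BytesIO(b'AAAABBBBCCCCDD')

# iter(callable, sentinel) calls f.read(4) until it returns b'' (end of file).
records = list(iter(partial(f.read, RECORD_SIZE), b''))
```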
IX. Memory-mapping binary files
(1) First, create a binary file to map (100,000 zero bytes):
size = 100000
with open('data', 'wb') as f:
    f.seek(size - 1)
    f.write(b'\x00')
(2) A helper function that memory-maps a file:
import os
import mmap

def memory_map(filename, access=mmap.ACCESS_WRITE):
    size = os.path.getsize(filename)
    fd = os.open(filename, os.O_RDWR)
    try:
        return mmap.mmap(fd, size, access=access)
    finally:
        # mmap keeps its own duplicate of the descriptor, so ours can be closed
        os.close(fd)
(3) Read and write operations:
>>> m = memory_map('data')
>>> len(m)
100000
>>> m[0:10]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m[0]
0
>>> m[0:11] = b'Hello World'
>>> m[0:11]
b'Hello World'
>>> m.close()
(4) The mmap object returned by mmap.mmap() can also be used as a context manager, which closes the mapping automatically:
with memory_map('data') as m:
    print(len(m))
    print(m[0:10])
(5) For read-only access, pass access=mmap.ACCESS_READ. To modify the data only locally, without writing the changes back to the original file, pass access=mmap.ACCESS_COPY.
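(6) Memory-mapping a file does not read the entire file into memory; the file is not copied into a memory buffer or array. Instead, the operating system reserves a region of virtual memory for the file and loads its contents as different parts are accessed.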
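The three access modes can be compared in one sketch. It uses a temporary file instead of 'data' and a memory_map() variant that closes its own descriptor after mapping, relying on CPython's mmap keeping a duplicate of it:

```python
import mmap
import os
import tempfile

def memory_map(filename, access=mmap.ACCESS_WRITE):
    size = os.path.getsize(filename)
    fd = os.open(filename, os.O_RDWR)
    try:
        return mmap.mmap(fd, size, access=access)
    finally:
        os.close(fd)  # mmap holds its own duplicate of the descriptor

path = os.path.join(tempfile.mkdtemp(), 'data')
with open(path, 'wb') as f:
    f.seek(100000 - 1)
    f.write(b'\x00')

# ACCESS_WRITE: slice assignment changes the file itself.
with memory_map(path) as m:
    m[0:11] = b'Hello World'
    m.flush()

# ACCESS_COPY: changes stay in this mapping and are not written back.
with memory_map(path, mmap.ACCESS_COPY) as m:
    m[0:5] = b'HELLO'
    copy_view = m[0:11]

# ACCESS_READ: read-only; assignment raises TypeError.
with memory_map(path, mmap.ACCESS_READ) as m:
    file_view = m[0:11]
    try:
        m[0:1] = b'X'
    except TypeError:
        read_only = True
    else:
        read_only = False

# Ordinary file reads see the ACCESS_WRITE change but not the ACCESS_COPY one.
with open(path, 'rb') as f:
    head = f.read(11)
```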
XI. Manipulating path names
Use the os.path module to get the base file name, directory name, absolute path, and other information about a path name.
(1) Get the last component of the path:
>>> import os
>>> os.path.basename(path)   # path = '/User/firefly/Data' => 'Data'
(2) Get the directory name:
>>> os.path.dirname(path)    # => '/User/firefly'
(3) Join path components:
>>> os.path.join('tmp', 'data')   # => 'tmp/data' on Unix
(4) On Unix and Windows, replace a leading ~ or ~user in the path with that user's home directory:
>>> os.path.expanduser(path)   # path = '~/Data/firefly/Data/data.csv' => '/Users/beazley/...'
(5) Split off the file extension:
>>> os.path.splitext(path)   # path = '~/Data/data.csv' => ('~/Data/data', '.csv')
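A sketch that runs the same calls on a concrete, made-up path. It uses posixpath (the POSIX implementation behind os.path on Unix) so the results don't vary by operating system; expanduser() and abspath() go through os.path because their results depend on the current user and working directory anyway:

```python
import os
import posixpath  # POSIX flavour of os.path, so results are the same on every OS

path = '/Users/firefly/Data/data.csv'

base = posixpath.basename(path)          # last component
parent = posixpath.dirname(path)         # everything before it
joined = posixpath.join('tmp', 'data', base)
root, ext = posixpath.splitext(path)

# These depend on the current user and working directory.
home_relative = os.path.expanduser('~/Data/data.csv')
absolute = os.path.abspath('data.csv')
```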
XII. Testing whether a file exists
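Test whether a file or directory exists:
>>> os.path.exists('/etc/passwd')
True
For more specific checks, use os.path.isfile(), os.path.isdir() and os.path.islink(); os.path.realpath() returns the path with any symbolic links resolved.
To get a file's size or modification time, use os.path.getsize() (in bytes) and os.path.getmtime() (seconds since the epoch).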
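A small sketch of these checks on a file created for the purpose in a temporary directory; sample.txt and no-such-file are made-up names:

```python
import os
import tempfile
import time

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('12345')

exists = os.path.exists(path)
missing = os.path.exists(os.path.join(workdir, 'no-such-file'))
is_file = os.path.isfile(path)
is_dir = os.path.isdir(workdir)
size = os.path.getsize(path)      # in bytes
mtime = os.path.getmtime(path)    # seconds since the epoch
modified = time.ctime(mtime)      # human-readable timestamp
```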
XIII. Getting a directory listing
Use os.listdir() to get the list of names (files and subdirectories) in a directory, then filter the result as needed:
import os

# Regular files only
names = [name for name in os.listdir('somedir')
         if os.path.isfile(os.path.join('somedir', name))]

# Directories only
names = [name for name in os.listdir('somedir')
         if os.path.isdir(os.path.join('somedir', name))]

# Filter on the file name
names = [name for name in os.listdir('somedir') if name.endswith('.py')]

# Shell-style wildcards with glob ...
import glob
pyfiles = glob.glob('somedir/*.py')

# ... or with fnmatch
from fnmatch import fnmatch
pyfiles = [name for name in os.listdir('somedir') if fnmatch(name, '*.py')]
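A runnable version of the filters above against a temporary directory populated with made-up files. Note that os.listdir() returns bare names in arbitrary order (hence sorted()), while glob.glob() returns paths that include the directory prefix:

```python
import glob
import os
import tempfile
from fnmatch import fnmatch

somedir = tempfile.mkdtemp()
for name in ('a.py', 'b.py', 'notes.txt'):
    open(os.path.join(somedir, name), 'w').close()
os.mkdir(os.path.join(somedir, 'pkg'))

# os.listdir() returns bare names in arbitrary order, so sort them.
files = sorted(n for n in os.listdir(somedir)
               if os.path.isfile(os.path.join(somedir, n)))
dirs = sorted(n for n in os.listdir(somedir)
              if os.path.isdir(os.path.join(somedir, n)))
py_by_fnmatch = sorted(n for n in os.listdir(somedir) if fnmatch(n, '*.py'))

# glob.glob() returns paths that include the directory prefix.
py_by_glob = sorted(glob.glob(os.path.join(somedir, '*.py')))
```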
XIV. Bypassing file name encoding
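Passing a byte string to os.listdir() returns the file names as raw, undecoded bytes:
>>> os.listdir(b'.')
Likewise, passing a byte-string file name to open() bypasses file name decoding:
>>> with open(b'jalapen\xcc\x83o.txt') as f:
...     print(f.read())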
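A sketch comparing str and bytes arguments. It assumes the file system accepts a non-ASCII name (true with the usual UTF-8 file system encoding) and uses os.fsencode()/os.fsdecode() to convert between the two forms; the file name and contents are made up:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'jalape\u00f1o.txt'), 'w', encoding='utf-8') as f:
    f.write('Spicy!')

# A str argument gives names decoded with the file system encoding ...
str_names = os.listdir(workdir)
# ... a bytes argument gives the raw, undecoded names.
raw_names = os.listdir(os.fsencode(workdir))

# A bytes path can be opened directly, without decoding the name.
raw_path = os.path.join(os.fsencode(workdir), raw_names[0])
with open(raw_path, 'r', encoding='utf-8') as f:
    content = f.read()
```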
XVI. Adding or changing the encoding of an open file
(1) To add Unicode encoding and decoding to a file object that is already open in binary mode, wrap it with io.TextIOWrapper:
import urllib.request
import io

u = urllib.request.urlopen('http://www.baidu.com')
f = io.TextIOWrapper(u, encoding='utf8')
text = f.read()
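(2) To change the encoding of sys.stdout, detach its existing text layer and wrap the underlying binary stream in a new io.TextIOWrapper: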
>>> import sys, io
>>> sys.stdout.encoding
'UTF-8'
>>> sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='latin-1')
>>> sys.stdout.encoding
'latin-1'
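The same wrap-and-detach technique can be tried on an in-memory stream, so it doesn't disturb the real sys.stdout. In this sketch an io.BytesIO holds Latin-1 bytes, which are first decoded correctly and then re-read as UTF-8 with errors='replace':

```python
import io

raw = io.BytesIO('caf\u00e9\n'.encode('latin-1'))

# Wrap a binary stream in a text layer with an explicit encoding.
f = io.TextIOWrapper(raw, encoding='latin-1')
first = f.read()

# detach() removes the text layer and returns the underlying binary stream
# without closing it; the old wrapper can no longer be used.
b = f.detach()
b.seek(0)
g = io.TextIOWrapper(b, encoding='utf-8', errors='replace')
second = g.read()
```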
XVII. Writing bytes to a text file
(1) To write raw bytes to a file opened in text mode, write them directly to the file's underlying binary buffer (its buffer attribute):
>>> import sys
>>> sys.stdout.buffer.write(b'Hello\n')
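A sketch of the buffer technique on an in-memory text file (io.TextIOWrapper over io.BytesIO) instead of sys.stdout. It also shows flushing pending text before writing to .buffer, so the two kinds of writes come out in order:

```python
import io

raw = io.BytesIO()
f = io.TextIOWrapper(raw, encoding='utf-8')

# A text-mode file rejects bytes ...
try:
    f.write(b'Hello\n')
except TypeError:
    text_rejects_bytes = True
else:
    text_rejects_bytes = False

# ... but its .buffer attribute (the binary layer underneath) accepts them.
f.write('text first, ')
f.flush()                     # push pending text before bypassing the text layer
f.buffer.write(b'then raw bytes\n')
result = raw.getvalue()
```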
XIX. Creating temporary files and directories
(1) Create a temporary file, which is deleted automatically when it is closed:
from tempfile import TemporaryFile

with TemporaryFile('w+t') as f:
    f.write('Hello World\n')
    f.write('Testing\n')
    f.seek(0)
    data = f.read()
(2) To keep the temporary file after it is closed, use NamedTemporaryFile with delete=False; its path is available as f.name:
from tempfile import NamedTemporaryFile

with NamedTemporaryFile('w+t', delete=False) as f:
    print('filename is:', f.name)
(3) Create a temporary directory, which is removed together with its contents at the end of the with block:
from tempfile import TemporaryDirectory

with TemporaryDirectory() as dirname:
    print('dirname is:', dirname)
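A combined sketch checking the lifetime of each kind of temporary object. The suffix argument and the scratch.txt name are illustrative choices, and the kept NamedTemporaryFile is removed by hand at the end because delete=False makes cleanup the caller's job:

```python
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory, TemporaryFile

# TemporaryFile: readable and writable in 'w+t', gone once closed.
with TemporaryFile('w+t') as f:
    f.write('Hello World\n')
    f.write('Testing\n')
    f.seek(0)
    data = f.read()

# NamedTemporaryFile(delete=False): the file survives the with block.
with NamedTemporaryFile('w+t', delete=False, suffix='.txt') as f:
    f.write('keep me\n')
    kept_name = f.name
kept_exists = os.path.exists(kept_name)
os.remove(kept_name)          # caller is responsible for cleanup

# TemporaryDirectory: the directory and everything in it are removed on exit.
with TemporaryDirectory() as dirname:
    inner = os.path.join(dirname, 'scratch.txt')
    with open(inner, 'w') as g:
        g.write('temporary')
    inner_existed = os.path.exists(inner)
dir_removed = not os.path.exists(dirname)
```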
XXI. Serializing Python objects
(1) Object => file (the file must be opened in binary mode, e.g. 'wb'):
>>> import pickle
>>> pickle.dump(obj, f)
(2) File => object:
>>> obj = pickle.load(f)
(3) Object => string (a bytes object):
>>> s = pickle.dumps(obj)
(4) String => object:
>>> obj = pickle.loads(s)
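A round-trip sketch of the four conversions. An io.BytesIO stands in for a file opened with 'wb'/'rb', and the sample data is made up. Several objects can be dumped to one file and are loaded back in the order they were written:

```python
import io
import pickle

data = {'name': 'ACME', 'shares': 50, 'prices': [91.5, 92.0]}

# Object => bytes => object
s = pickle.dumps(data)
copy1 = pickle.loads(s)

# Object => file => object; pickle needs a binary-mode file.
f = io.BytesIO()               # stands in for open('somefile', 'wb')
pickle.dump(data, f)
pickle.dump([1, 2, 3], f)      # several objects can go into one file
f.seek(0)
first = pickle.load(f)         # objects come back in the order written
second = pickle.load(f)
```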
Some types of objects cannot be pickled. In general they involve some kind of external system state, such as open files, open network connections, threads, processes, and stack frames.
A user-defined class can work around these limitations by providing __getstate__() and __setstate__() methods.
# countdown.py
import time
import threading

class Countdown:
    def __init__(self, n):
        self.n = n
        self.thr = threading.Thread(target=self.run)
        self.thr.daemon = True
        self.thr.start()

    def run(self):
        while self.n > 0:
            print('T-minus', self.n)
            self.n -= 1
            time.sleep(5)

    def __getstate__(self):
        # Pickle only the counter, not the (unpicklable) thread
        return self.n

    def __setstate__(self, n):
        # Rebuild the object from the saved counter, which restarts the thread
        self.__init__(n)
Test:
>>> import countdown
>>> c = countdown.Countdown(30)
>>> T-minus 30
...
>>> f = open('cstate.p', 'wb')
>>> import pickle
>>> pickle.dump(c, f)
>>> f.close()
Exit Python, start it again, and load the object from the file; the countdown resumes from the saved value:
>>> f = open('cstate.p', 'rb')
>>> pickle.load(f)
<countdown.Countdown object at 0x10069e2d0>
T-minus 19
T-minus 18
...
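The same __getstate__/__setstate__ idea, sketched on a hypothetical Counter class (a swapped-in example, not the Countdown class above) whose unpicklable member is a threading.Lock. The lock is dropped when pickling and recreated when unpickling:

```python
import pickle
import threading

class Counter:
    """Hypothetical class holding an unpicklable lock."""

    def __init__(self, n=0):
        self.n = n
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the lock, which pickle rejects.
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

c = Counter(42)
restored = pickle.loads(pickle.dumps(c))

# Without __getstate__, pickling the lock itself fails with TypeError.
try:
    pickle.dumps(threading.Lock())
except TypeError:
    lock_unpicklable = True
else:
    lock_unpicklable = False
```

Returning a copy of __dict__ (rather than a single value, as Countdown does) keeps the approach working if more attributes are added later.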