[Advanced] Python files and I/O

 

 

First, reading and writing text data

  (1) Use the open() function with rt mode to read the contents of a text file (t, text mode, is the default).

  (2) Use wt mode to write. If the file already exists, its original contents are cleared and overwritten.

  (3) To append content to the end of an existing file, use at mode.

  (4) To write only when the file does not already exist, use x mode; open() raises FileExistsError if the file exists.

  (5) Query the system default text encoding with sys.getdefaultencoding(). Note that when no encoding argument is given, open() uses the locale's preferred encoding, which may differ.

  (6) If you do not use a with statement to manage the context, remember to close the file manually.

  (7) Newline handling: Unix uses \n, Windows uses \r\n, and classic Mac OS used \r. With newline=None (the default), universal newlines mode is enabled: on reading, every line ending is converted to a single \n character; on writing, \n is converted to the platform's default line ending. To disable this translation, pass newline=''.

  (8) errors is an optional string that specifies how encoding and decoding errors are handled, for example 'ignore' or 'replace'. It cannot be used in binary mode.
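  The modes and options listed above can be tried out together in a small sketch. The scratch directory and the file names (notes.txt, crlf.txt, bad.txt) are assumptions made up for the illustration; they do not come from the original notes.

```python
import os
import tempfile

# Hypothetical scratch location; any writable directory would do.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'notes.txt')

# 'wt' creates the file, or truncates it if it already exists.
with open(path, 'wt', encoding='utf-8') as f:
    f.write('first line\n')

# 'at' appends to the end of the existing file.
with open(path, 'at', encoding='utf-8') as f:
    f.write('second line\n')

# 'xt' refuses to overwrite: open() raises FileExistsError here.
try:
    with open(path, 'xt', encoding='utf-8') as f:
        f.write('never written\n')
    created = True
except FileExistsError:
    created = False

with open(path, 'rt', encoding='utf-8') as f:
    lines = f.readlines()

# Newline handling: write CRLF line endings explicitly, then read them back
# with and without universal newline translation.
crlf_path = os.path.join(workdir, 'crlf.txt')
with open(crlf_path, 'wt', encoding='utf-8', newline='\r\n') as f:
    f.write('a\nb\n')                  # each '\n' is written as '\r\n'
with open(crlf_path, 'rt', encoding='utf-8') as f:
    translated = f.read()              # line endings converted to '\n'
with open(crlf_path, 'rt', encoding='utf-8', newline='') as f:
    untranslated = f.read()            # line endings left untouched

# errors='replace' substitutes U+FFFD for bytes that cannot be decoded.
bad_path = os.path.join(workdir, 'bad.txt')
with open(bad_path, 'wb') as f:
    f.write(b'caf\xe9\n')              # Latin-1 bytes, not valid UTF-8
with open(bad_path, 'rt', encoding='utf-8', errors='replace') as f:
    repaired = f.read()
```

  Passing encoding explicitly, as done here, avoids depending on whatever default the platform happens to use.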

 

Second, redirecting print() output to a file

with open('./hello.txt','wt',encoding='utf8') as f:
    print('Hello World!',file=f)

 

Third, printing with a different separator or line ending

  The sep and end parameters of print() control the separator and the line ending; end can also be used to suppress the trailing newline.

  >>> print('ACME', 50, 99, sep=',')

  ACME,50,99

  >>> print('ACME', 50, 99, sep=',', end='!!\n')

  ACME,50,99!!

  >>> row = ('ACME', 50, 99)

  >>> print(*row, sep=',')

  ACME,50,99
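  As a small sketch of why print(*row, sep=',') is convenient: str.join() accepts only strings, so mixed-type data has to be converted first, while print() converts each argument itself. The sample row and the StringIO buffer used to capture the output are assumptions for the illustration.

```python
import io

row = ('ACME', 50, 99.5)   # hypothetical sample row with mixed types

# str.join() only accepts strings, so each item is converted explicitly.
joined = ','.join(str(x) for x in row)

# print() converts every argument itself; capture its output for comparison.
buf = io.StringIO()
print(*row, sep=',', end='', file=buf)   # end='' suppresses the newline
```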

  

Fourth, reading and writing binary data

  Use open() with rb or wb mode to read or write binary data.

  When you index or iterate over a byte string, you get integer byte values back rather than one-character strings.
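  A minimal sketch of the indexing and iteration behaviour described above, using a made-up byte string:

```python
data = b'Hello'

first = data[0]               # an int (72), not b'H'
values = list(data)           # iteration also yields ints
piece = data[0:1]             # slicing still returns a bytes object
text = data.decode('ascii')   # decode explicitly to get a str
```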

  (1) In binary I/O, objects such as arrays and C structures can be written directly, without first converting them to a bytes object.

  This applies to any object that implements the buffer interface.

import array
nums = array.array('i',[1,2,3,4,5,6])
with open('./data.bin','wb') as f:
    f.write(nums)

  (2) Binary data can also be read directly into an object's underlying memory by using the file object's readinto() method.

nums = array.array('i',[0,0,0,0,0,0,0,0,0,0])
with open('./data.bin','rb') as f:
    f.readinto(nums)
# nums is now array('i', [1, 2, 3, 4, 5, 6, 0, 0, 0, 0])

  readinto() fills an existing buffer in place, rather than allocating a new object and returning it.

import os

def read_into_buffer(filename):
    # Allocate a buffer the size of the file, then fill it in place.
    buf = bytearray(os.path.getsize(filename))
    with open(filename, 'rb') as f:
        f.readinto(buf)
    return buf
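  One detail worth checking when using readinto() is its return value, the number of bytes actually stored, because the buffer may be larger than the data available. The sketch below assumes a throwaway temporary file with 11 bytes of sample content.

```python
import os
import tempfile

# Hypothetical sample file holding 11 bytes.
fd, sample = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'Hello World')

buf = bytearray(16)                  # deliberately larger than the file
with open(sample, 'rb') as f:
    nbytes = f.readinto(buf)         # number of bytes actually stored

payload = bytes(buf[:nbytes])        # trim the unused tail
os.remove(sample)
```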

 

Sixth, performing I/O operations on strings

  When you need to simulate a regular file, the StringIO and BytesIO classes are the most suitable tools.

>>> import io
>>> s = io.StringIO()
>>> s.write('Hello World!!!\r\n')
16
>>> print('This is a test',file=s)
>>> s.getvalue()
'Hello World!!!\r\nThis is a test\n'

>>> s = io.StringIO('Hexxxx HHHH')
>>> s.read(4)
'Hexx'

>>> s = io.BytesIO()
>>> s.write(b'Hello World')
11
>>> s.getvalue()
b'Hello World'
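  A typical use is feeding in-memory text to code that expects a file object, for instance in a unit test. The count_words helper below is hypothetical and exists only to show the idea.

```python
import io

def count_words(stream):
    """Hypothetical helper: count whitespace-separated words in a text stream."""
    return sum(len(line.split()) for line in stream)

# StringIO stands in for a file opened in text mode.
fake_file = io.StringIO('one two\nthree\n')
total = count_words(fake_file)
```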

 

Seventh, reading and writing compressed data files

  The gzip and bz2 modules handle compressed files. Their open() functions default to binary mode. The compression level is set with the compresslevel keyword; the default is 9, the highest level of compression.

import gzip,bz2
with gzip.open('./somefile.gz','rt') as f:
    text = f.read()

with bz2.open('./somefile.bz2','rt') as f:
    text = f.read()

with gzip.open('./somefile.gz','wt') as f:
    f.write(text)

with bz2.open('./somefile.bz2','wt') as f:
    f.write(text)
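  The compresslevel keyword mentioned above can be tried with a simple round trip. This is only a sketch: the file name sample.gz, the level 5, and the repeated sample text are assumptions chosen for the illustration.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.gz')   # hypothetical path
text = 'Hello World\n' * 100

# Lower levels trade compression ratio for speed; 9 is the default.
with gzip.open(path, 'wt', compresslevel=5, encoding='utf-8') as f:
    f.write(text)

# Text mode decodes the decompressed bytes using the given encoding.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    restored = f.read()

compressed_size = os.path.getsize(path)
```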

  gzip.open() and bz2.open() can also be layered on top of an existing file object that was opened in binary mode.

import gzip
f = open('somefile.gz', 'rb')
with gzip.open(f, 'rt') as g:
    text = g.read()

 

Eighth, iterating over fixed-size records

  Iterate over fixed-size records or data blocks instead of lines.

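from functools import partial
RECORD_SIZE = 32
# Open in binary mode so that read() returns b'' at end of file,
# which matches the sentinel passed to iter().
with open('somefile.txt', 'rb') as f:
    records = iter(partial(f.read, RECORD_SIZE), b'')
    for r in records:
        ...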
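  The iter(callable, sentinel) pattern can be checked without a real file. In the sketch below, a BytesIO object with made-up contents stands in for the binary file; note that the last record may be shorter than RECORD_SIZE when the data length is not an exact multiple.

```python
import io
from functools import partial

RECORD_SIZE = 4
stream = io.BytesIO(b'AAAABBBBCC')   # hypothetical data: 2 full records + 2 bytes

# iter() keeps calling stream.read(4) until it returns the sentinel b''.
records = list(iter(partial(stream.read, RECORD_SIZE), b''))
```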

 

Ninth, memory mapping binary files

  (1) First, prepare a binary file

size = 100000
with open('data', 'wb') as f:
    f.seek(size-1)
    f.write(b'\x00')

  (2) A helper function that maps the file

import os
import mmap

def memory_map(filename, access=mmap.ACCESS_WRITE):
    size = os.path.getsize(filename)
    fd = os.open(filename, os.O_RDWR)
    return mmap.mmap(fd, size, access=access)

  (3) Read and write operations

>>> m = memory_map('data')
>>> len(m)
100000
>>> m[0:10]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m[0]
0
>>> m[0:11] = b'Hello World'
>>> m.close()

  (4) The mmap object returned by mmap() can also be used as a context manager.

with memory_map('data') as m:
    print(len(m))
    print(m[0:10])

  (5) For read-only access, use mmap.ACCESS_READ. To modify the data locally without writing the changes back to the original file, use mmap.ACCESS_COPY.

  (6) Memory mapping a file does not cause the entire file to be read into memory. That is, the file is not copied into a memory buffer or an array.
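  The difference between ACCESS_COPY and ACCESS_WRITE can be observed with a small scratch file. This is a sketch under assumptions: the temporary file and its 64 zero bytes are made up, and the file is mapped through f.fileno() with length 0 (map the whole file) rather than through a helper function.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()          # hypothetical scratch file
with os.fdopen(fd, 'wb') as f:
    f.write(b'\x00' * 64)

# ACCESS_COPY: writes affect only the in-memory copy.
with open(path, 'r+b') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
        m[0:5] = b'local'
        seen_in_map = bytes(m[0:5])
with open(path, 'rb') as f:
    after_copy = f.read(5)             # the file itself is unchanged

# ACCESS_WRITE: writes go through to the file.
with open(path, 'r+b') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
        m[0:5] = b'saved'
with open(path, 'rb') as f:
    after_write = f.read(5)

os.remove(path)
```

  Slice assignment on a mapping must keep the length unchanged, which is why both replacements are exactly five bytes long.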

 

Eleventh, manipulating path names

  Use os.path to find the base file name, directory name, absolute path, and related information.

  (1) Get the last component of the path:

  >>> os.path.basename(path)  #  path = /User/firefly/Data => 'Data'

  (2) Get the directory name:

  >>> os.path.dirname(path)

  (3) Join path components:

  >>> os.path.join('tmp', 'data')

  (4) On Unix and Windows, replace a leading ~ or ~user with that user's home directory:

  >>> os.path.expanduser(path)  # path = '~/Data/firefly/Data/data.csv' => '/Users/beazley/...'

  (5) Split off the file extension:

  >>> os.path.splitext(path)  # path = '~/Data/data.csv' => ('~/Data/data', '.csv')
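  The functions above are often combined, for example to derive a sibling file name. The sample path and the .bak suffix in this sketch are assumptions for the illustration.

```python
import os.path

path = '/Users/firefly/Data/data.csv'      # hypothetical sample path

base = os.path.basename(path)              # last component of the path
parent = os.path.dirname(path)             # everything before it
root, ext = os.path.splitext(base)         # name and extension

# Build a sibling file name from the pieces.
backup = os.path.join(parent, root + '.bak')

# expanduser() only touches paths that start with '~'.
unchanged = os.path.expanduser('Data/data.csv')
```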

 

Twelfth, testing whether a file exists

  Test whether a file or directory exists:

  >>> os.path.exists('/etc/passwd')

  Related functions in os.path: isfile(), isdir(), islink(), realpath().

  Get the file size or modification time:

  getsize(), getmtime()
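  A minimal sketch of these tests and metadata calls, assuming a throwaway temporary file with ten bytes of content. getmtime() returns seconds since the epoch, which time.ctime() turns into a readable string.

```python
import os
import tempfile
import time

fd, path = tempfile.mkstemp()          # hypothetical scratch file
with os.fdopen(fd, 'wb') as f:
    f.write(b'x' * 10)

if os.path.isfile(path):
    size = os.path.getsize(path)       # size in bytes
    mtime = os.path.getmtime(path)     # seconds since the epoch (float)
    stamp = time.ctime(mtime)          # human-readable local time

# exists() simply returns False for a path that is not there.
missing = os.path.exists(path + '.does-not-exist')
os.remove(path)
```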

 

Thirteenth, getting a directory listing

  Use the os.listdir() function to obtain a list of the entries in a directory.

import os

names = [ name for name in os.listdir('somedir') if os.path.isfile(os.path.join('somedir',name))]

names = [ name for name in os.listdir('somedir') if os.path.isdir(os.path.join('somedir',name))]

names = [ name for name in os.listdir('somedir') if name.endswith('.py')]

import glob
pyfiles = glob.glob('somedir/*.py')

from fnmatch import fnmatch
pyfiles = [name for name in os.listdir('somedir') if fnmatch(name, '*.py')] 
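  Because os.listdir() returns bare names in arbitrary order, each name has to be joined with the directory before it is passed to os.path functions. The sketch below builds a temporary directory with made-up entries (a.py, b.py, notes.txt, pkg) to show this end to end.

```python
import os
import tempfile
from fnmatch import fnmatch

with tempfile.TemporaryDirectory() as somedir:     # stands in for 'somedir'
    for name in ('a.py', 'b.py', 'notes.txt'):
        open(os.path.join(somedir, name), 'w').close()
    os.mkdir(os.path.join(somedir, 'pkg'))

    # Sort the names for a stable order, then join each one with the
    # directory before asking questions about it.
    names = sorted(os.listdir(somedir))
    pyfiles = [n for n in names if fnmatch(n, '*.py')]
    dirs = [n for n in names if os.path.isdir(os.path.join(somedir, n))]
    sizes = {n: os.path.getsize(os.path.join(somedir, n)) for n in pyfiles}
```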

  

Fourteenth, bypassing file name encoding

  >>> os.listdir(b'.')  # file names are returned as bytes

  >>> with open(b'jalapen\xcc\x83o.txt') as f:
  ...     print(f.read())
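  A small sketch of the same idea with a file that is guaranteed to exist. The temporary directory and the name data.txt are assumptions; os.fsencode() is used here only to turn the directory path into bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'data.txt'), 'w').close()

    as_text = os.listdir(d)                    # names as str
    as_bytes = os.listdir(os.fsencode(d))      # names as raw bytes

    # Raw byte names can be passed straight back to open().
    raw_path = os.path.join(os.fsencode(d), as_bytes[0])
    with open(raw_path, 'rb') as f:
        content = f.read()
```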

 

Sixteenth, adding or changing the encoding of an open file

  (1) Add text encoding and decoding to a file object that is already open in binary mode

import urllib.request
import io

u = urllib.request.urlopen('http://www.baidu.com')
f = io.TextIOWrapper(u, encoding='utf8')
text = f.read()

  (2) Change the encoding of sys.stdout

>>> import sys,io
>>> sys.stdout.encoding
'UTF-8'
>>> sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='latin-1')
>>> sys.stdout.encoding
'latin-1'
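  Both steps can be reproduced without a network connection or touching sys.stdout. In this sketch a BytesIO object with made-up UTF-8 content stands in for the binary stream; decoding the same bytes as latin-1 afterwards deliberately produces mojibake, which shows that the second wrapper really uses a different codec.

```python
import io

# Stands in for a binary stream such as a socket or urlopen() response.
raw = io.BytesIO('Spicy Jalapeño\n'.encode('utf-8'))

# Layer a decoder over the binary stream.
f = io.TextIOWrapper(raw, encoding='utf-8')
as_utf8 = f.readline()

# detach() removes the text layer and hands back the binary stream,
# which can then be wrapped again with a different encoding.
raw = f.detach()
raw.seek(0)
f = io.TextIOWrapper(raw, encoding='latin-1')
as_latin1 = f.readline()
```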

 

Seventeenth, writing bytes to a text file

  (1) To write raw bytes to a file opened in text mode, simply write the bytes to the file's underlying buffer.

  >>> sys.stdout.buffer.write(b'Hello\n')
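  The same buffer attribute exists on ordinary text files. The sketch below assumes a throwaway temporary file; the flush() call matters because text written through the text layer may still be buffered when the raw bytes are written underneath it.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()          # hypothetical scratch file
os.close(fd)

with open(path, 'wt', encoding='utf-8', newline='') as f:
    f.write('text line\n')
    f.flush()                          # push buffered text down first
    f.buffer.write(b'raw bytes\n')     # bypass the text encoding layer

with open(path, 'rb') as f:
    contents = f.read()
os.remove(path)
```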

 

Nineteenth, creating temporary files and directories

  (1) Create a temporary file

from tempfile import TemporaryFile
with TemporaryFile('w+t') as f:
    f.write('Hello World\n')
    f.write('Testing\n')
    
    f.seek(0)
    data = f.read()

  (2) Keep a named temporary file on disk after it is closed

from tempfile import NamedTemporaryFile
with NamedTemporaryFile('w+t',delete=False) as f:
    print('filename is:', f.name)

  (3) Create a temporary directory

from tempfile import TemporaryDirectory
with TemporaryDirectory() as dirname:
    print('dirname is:', dirname)
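  With delete=False the file outlives the with block, so it can be reopened by name and must be removed by the caller. A minimal sketch, with the .txt suffix and the sample text as assumptions:

```python
import os
from tempfile import NamedTemporaryFile

# delete=False keeps the file on disk after the with block closes it.
with NamedTemporaryFile('w+t', delete=False, suffix='.txt') as f:
    f.write('keep me\n')
    saved_name = f.name

still_there = os.path.exists(saved_name)
with open(saved_name, 'rt') as g:
    saved_text = g.read()

os.remove(saved_name)                  # cleanup is now the caller's job
```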

 

Twenty-first, serializing Python objects

  (1) Object => file

  >>> pickle.dump(obj, f)

  (2) File => object

  >>> obj = pickle.load(f)

  (3) Object => byte string

  >>> s = pickle.dumps(obj)

  (4) Byte string => object

   >>> obj = pickle.loads(s)
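  The four calls above can be seen working together in a short round trip. The sample record is made up, and a BytesIO object stands in for a file opened in wb or rb mode. Keep in mind that unpickling can execute arbitrary code, so pickle.load() should only be used on data from a trusted source.

```python
import io
import pickle

record = {'name': 'ACME', 'shares': 50, 'prices': [99.5, 100.25]}  # sample data

# Object <-> bytes
blob = pickle.dumps(record)
copy_from_bytes = pickle.loads(blob)

# Object <-> file (BytesIO stands in for a file opened with 'wb' / 'rb')
f = io.BytesIO()
pickle.dump(record, f)
f.seek(0)
copy_from_file = pickle.load(f)
```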

  Certain types of objects cannot be pickled. These generally involve some kind of external system state, such as open files, open network connections, threads, processes, stack frames, and so on.

  These limitations can be worked around by providing __getstate__() and __setstate__() methods.

import time
import threading

class Countdown:
    
    def __init__(self,n):
        self.n = n
        self.thr = threading.Thread(target=self.run)
        self.thr.daemon = True
        self.thr.start()
        
    def run(self):
        while self.n >0:
            print('T-minus', self.n)
            self.n -= 1
            time.sleep(5)
    
    def __getstate__(self):
        return self.n
    
    def __setstate__(self,n):
        self.__init__(n)

  Test:

>>> import countdown
>>> c = countdown.Countdown(30)
>>> T-minus 30
...
...
...

>>> f = open('cstate.p', 'wb')
>>> import pickle
>>> pickle.dump(c, f)
>>> f.close()

  Exit Python, start it again, and reload the object from the file

>>> f = open('cstate.p', 'rb')
>>> pickle.load(f)
<countdown.Countdown object at 0x10069e2d0>
T-minus 19
T-minus 18
...
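  The same technique can be shown without threads or timing. The Counter class below is hypothetical: it holds a threading.Lock, which pickle cannot serialize, and uses __getstate__() and __setstate__() to save only the plain data and rebuild the lock on load. It is written to be run as a script, since pickle looks the class up by module and name.

```python
import pickle
import threading

class Counter:
    """Hypothetical class holding a lock, which cannot be pickled."""

    def __init__(self, n):
        self.n = n
        self.lock = threading.Lock()

    def __getstate__(self):
        # Save only the picklable part of the state.
        return {'n': self.n}

    def __setstate__(self, state):
        # Rebuild the object, creating a fresh lock.
        self.__init__(state['n'])

c = Counter(7)
restored = pickle.loads(pickle.dumps(c))
```

  Returning a dict from __getstate__() also sidesteps a corner case: if the returned state is a false value such as 0, pickle does not call __setstate__() when loading.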

 

Origin www.cnblogs.com/5poi/p/11512955.html