Python basics: reading and writing files

Reading and Writing Files

Reading and writing files is the most common I/O operation. Python has built-in functions for reading and writing files, and their usage is compatible with C.

Before reading or writing a file, we need to understand that reading and writing files on disk is a service provided by the operating system. Modern operating systems do not allow ordinary programs to access the disk directly. To read or write a file, a program asks the operating system to open a file object (usually called a file descriptor), and then uses the interface provided by the operating system to read data from that file object (reading a file) or write data to it (writing a file).

Reading a file

To open a file object in read mode, use Python's built-in open() function, passing in the file name and a mode identifier:

>>> f = open('/user/demo/test.txt','r')

The identifier 'r' means read. With it, we have successfully opened a file.

If the file does not exist, open() raises a FileNotFoundError (a subclass of OSError; in Python 3, IOError is an alias of OSError), with an error code and a detailed message telling you that the file does not exist:

>>> f=open('/Users/michael/notfound.txt', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/Users/michael/notfound.txt'

If the file is opened successfully, calling the read() method reads the entire contents of the file at once. Python reads the contents into memory and represents them as a str object:

>>> f.read()
'hello world!'

The last step is to call the close() method to close the file. A file must be closed after use, because file objects occupy operating system resources, and the number of files the operating system can have open at the same time is limited.

>>> f.close()

Reading and writing files may raise an IOError, and once an error occurs the later f.close() will not be called. To make sure the file is closed correctly whether or not an error occurs, we can use try ... finally:

# start with None so the finally block can test f even if open() fails
f = None
try:
    f = open('/path/demo/file', 'r')
    print(f.read())
finally:
    if f:
        f.close()

But writing this every time is too cumbersome, so Python introduced the with statement, which calls the close() method for us automatically:

with open('/path/demo/file', 'r') as f:
    print(f.read())

This is the same as the try ... finally above, but the code is more concise and there is no need to call f.close().
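
As a quick check of the behaviour described above, here is a minimal sketch showing that the file is closed when the with block exits, even when an error is raised inside it. The temporary file and its contents are assumptions made only for this illustration.

```python
import os
import tempfile

# Create a throwaway file to read (hypothetical demo content).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hello world!")

# Normal exit: the file is closed as soon as the block ends.
with open(path, "r") as f:
    content = f.read()
closed_after_with = f.closed

# Exit through an exception: the file is still closed.
try:
    with open(path, "r") as g:
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass
closed_after_error = g.closed

print(content, closed_after_with, closed_after_error)
```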

Note: read() reads the entire contents of the file at once. If the file is very large, say 5 GB, memory will be exhausted. To be safe, you can call read(size) repeatedly, which reads at most size characters each time (size bytes in binary mode). In addition, readline() reads one line at a time, and readlines() reads all the content at once and returns it as a list of lines. Choose the method according to your needs.

If the file is small, read() is the most convenient. If you cannot determine the file size, calling read(size) repeatedly is safer. If it is a configuration file, readlines() is the most convenient:

for line in f.readlines():
    # strip the trailing '\n'
    print(line.strip())
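
The reading strategies described above can be sketched as follows. The helper name read_in_chunks, the chunk size of 8 and the temporary file are assumptions chosen for illustration, not part of the original article.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

def read_in_chunks(file_path, size=8):
    """Yield the file's text in pieces of at most `size` characters."""
    with open(file_path, "r") as f:
        while True:
            chunk = f.read(size)
            if chunk == "":      # an empty string means end of file
                break
            yield chunk

chunks = list(read_in_chunks(path))

# Iterating over the file object reads one line at a time.
with open(path, "r") as f:
    lines = [line.strip() for line in f]

print(chunks)
print(lines)
```

Iterating over the file object directly is usually preferable to readlines() for large files, because it does not build the whole list of lines in memory.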

Binary files

What we have covered so far reads text files, assumed here to be UTF-8 encoded (when no encoding is given, the actual default depends on the platform). To read binary files such as pictures and videos, open the file in 'rb' mode:

>>> f = open('/users/demo/test.jpg',"rb")
>>> f.read()
b'\xff\xd8\xff\xe1\x00\x18Exif\x00\x00...' # bytes shown in hexadecimal
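
A minimal sketch of binary reading, assuming a small made-up payload in place of a real image (the file name fake.jpg and the payload are hypothetical):

```python
import os
import tempfile

# Hypothetical stand-in for an image: a few bytes starting with the JPEG marker.
payload = b"\xff\xd8\xff\xe1\x00\x18Exif\x00\x00"
path = os.path.join(tempfile.mkdtemp(), "fake.jpg")
with open(path, "wb") as f:
    f.write(payload)

with open(path, "rb") as f:
    data = f.read()

print(type(data))                  # the result is bytes, not str
print(data[:2] == b"\xff\xd8")     # JPEG data starts with the bytes FF D8
```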

Character Encoding

To read a text file that is not UTF-8 encoded, pass the encoding parameter to open(). For example, to read a GBK encoded file:

>>> f = open('/user/demo/gbk.txt','r',encoding = 'gbk')
>>> f.read()
'测试'

With files that are not encoded consistently, you may run into a UnicodeDecodeError, because the text may contain some illegally encoded characters. For this case, open() also accepts an errors parameter, which says what to do when an encoding error is encountered. The simplest approach is to ignore the error:

>>> f = open('/users/demo/gbk.txt','r',encoding = 'gbk',errors = 'ignore')
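
To make the effect of errors='ignore' concrete, here is a small sketch. It assumes a file containing the GBK bytes for '测试' followed by one stray byte (0xFF) that cannot be decoded; the file itself is created only for this demonstration.

```python
import os
import tempfile

# GBK bytes for "测试" followed by one stray byte that GBK cannot decode.
raw = "测试".encode("gbk") + b"\xff"
path = os.path.join(tempfile.mkdtemp(), "gbk.txt")
with open(path, "wb") as f:
    f.write(raw)

# Strict decoding (the default) raises UnicodeDecodeError.
try:
    with open(path, "r", encoding="gbk") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops the undecodable byte.
with open(path, "r", encoding="gbk", errors="ignore") as f:
    cleaned = f.read()

print(strict_failed, cleaned)
```

Ignoring errors loses data without any warning, so it is best kept for cases where a few missing characters do not matter.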

Writing a file

Writing a file works the same way as reading one. The only difference is that when calling open() you pass the identifier 'w' or 'wb' to write a text file or a binary file:

>>> f = open("/users/demo/test.txt",'w')
>>> f.write('hello, world!')
>>> f.close()

You can call write() repeatedly to write to the file, but be sure to call f.close() to close the file.

When we write to a file, the operating system often does not write the data to disk immediately. Instead it caches the data in memory and writes it out later when idle. Only when close() is called does the operating system guarantee that all unwritten data is written to disk. If you forget to call close(), only part of the data may reach the disk and the rest is lost. So it is safer to use the with statement:

with open('/users/demo/test.txt', 'w') as f:
    f.write('hello, world')

To write a text file in a specific encoding, pass the encoding parameter to open(), and the string will be converted to that encoding automatically.

When writing in 'w' mode, an existing file is overwritten (equivalent to deleting it and writing a new one). What if we want to append to the end of the file? Pass 'a' to write in append mode:

with open('/users/demo/test.txt', 'a') as f:
    f.write('hello, world')
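
The difference between 'w' and 'a' can be seen in a short sketch. The temporary path is an assumption for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w") as f:      # creates the file
    f.write("first\n")
with open(path, "w") as f:      # 'w' again: the previous content is discarded
    f.write("second\n")
with open(path, "a") as f:      # 'a': new text goes after the existing content
    f.write("third\n")

with open(path, "r") as f:
    result = f.read()
print(repr(result))
```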

StringIO

In many cases, data does not have to be read from or written to a file. It can also be read and written in memory.

StringIO, as the name suggests, reads and writes str in memory.

To write a str to a StringIO, first create a StringIO, then write to it as you would to a file:

from io import StringIO
f = StringIO()
f.write("hello")
f.write("   ")
f.write('world')
# get the str that has been written
print(f.getvalue())

To read from a StringIO, initialize it with a str and then read it as you would read a file:

from io import StringIO
f = StringIO("Hello\nHi\nGoodBye!")
while True:
    s = f.readline()
    if s == '':
        break
    print(s.strip())
# Output:
# Hello
# Hi
# GoodBye!

BytesIO

StringIO can only work with str. To work with binary data, use BytesIO.

BytesIO reads and writes bytes in memory. We create a BytesIO and then write some bytes:

from io import BytesIO
f = BytesIO()
f.write("中文".encode('utf-8'))
print(f.getvalue())

Note: what is written is not a str, but bytes encoded as UTF-8.

As with StringIO, you can initialize a BytesIO with bytes and then read it as you would read a file:

from io import BytesIO
f = BytesIO(b'\xe4\xb8\xad\xe6\x96\x87')
print(f.read())
# b'\xe4\xb8\xad\xe6\x96\x87'

StringIO and BytesIO operate on str and bytes in memory and give them the same interface as reading and writing files.
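
One practical consequence of the shared interface is that a function written for file objects also works on in-memory objects. The helper count_lines below is a hypothetical example, not something from the standard library.

```python
from io import BytesIO, StringIO

def count_lines(fileobj):
    """Count lines in any object that can be iterated like an open file."""
    return sum(1 for _ in fileobj)

text_lines = count_lines(StringIO("Hello\nHi\nGoodBye!"))
byte_lines = count_lines(BytesIO(b"a\nb\n"))
print(text_lines, byte_lines)
```

This is handy in tests, where a StringIO can stand in for a real file on disk.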

Serialization

While a program is running, all its variables live in memory. For example, define a dict:

dict1 = {'name': "lili", 'age': 18}

We can modify the name at any time, say to "leilei", but once the program ends, the memory occupied by its variables is reclaimed by the operating system. If the modified name is not saved to disk, the next time the program runs the name is initialized to "lili" again.

The process of turning a variable in memory into something that can be stored or transmitted is called serialization, known in Python as pickling. After serialization, the serialized content can be written to disk or sent over the network to other machines. Conversely, reading the serialized content back into a variable in memory is called deserialization, i.e. unpickling.

Python provides the pickle module to implement serialization.

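import pickle
d = dict({'name': "lili", 'age': 18})
# pickle.dumps() serializes an object into bytes, which can then be written to a file
print(pickle.dumps(d))

# write the serialized object to a file
f = open("dump.txt", 'wb')
# argument 1: the object to write, argument 2: the file object to write to
pickle.dump(d, f)
f.close()

# read the serialized object back from the file
f2 = open("dump.txt", "rb")
# pickle.load() deserializes the object
d = pickle.load(f2)
f2.close()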

Note: pickle can only be used with Python, and different Python versions may not be compatible with each other. So only save unimportant data with pickle, data for which it does not matter if deserialization fails.
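
For a quick look at the round trip without touching the disk, pickle.dumps() and pickle.loads() can be paired in memory. This is only a sketch using the same sample dict as above.

```python
import pickle

d = {"name": "lili", "age": 18}

data = pickle.dumps(d)          # serialize to bytes
restored = pickle.loads(data)   # rebuild an equal object from the bytes

print(type(data))
print(restored == d, restored is d)
```

The restored object is equal to the original but is a separate object. Also keep in mind that unpickling data from an untrusted source can execute arbitrary code, so only load pickles you created yourself.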

JSON

If we need to pass objects between different programming languages, we must serialize them into a standard format such as XML. A better approach is JSON, because JSON is represented as a string that can be read by all languages and is easy to store on disk or send over the network. JSON is not only a standard format but is also generally faster to process than XML, and it can be read directly in web pages, which is very convenient.

JSON type      Python type
{}             dict
[]             list
"string"       str
1234.56        int or float
true/false     True/False
null           None
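
The type mapping can be checked directly with json.loads(). The JSON text below is a made-up sample for illustration.

```python
import json

parsed = json.loads('{"a": [1, 2.5, "x"], "ok": true, "nothing": null}')

print(type(parsed))               # JSON object  -> dict
print(type(parsed["a"]))          # JSON array   -> list
print(type(parsed["a"][0]))       # whole number -> int
print(type(parsed["a"][1]))       # real number  -> float
print(parsed["ok"], parsed["nothing"])
```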

Converting a Python dict into JSON:

import json

d = dict({'name': "lili", 'age': 19})
# json.dumps() returns a str containing standard JSON
print(json.dumps(d))

# deserialize JSON into a Python object
jsonStr = '{"name": "lili", "age": 19}'
print(json.loads(jsonStr))
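
Like pickle, the json module also has file-based counterparts, json.dump() and json.load(). A minimal sketch, assuming a temporary file named data.json:

```python
import json
import os
import tempfile

d = {"name": "lili", "age": 19}
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w", encoding="utf-8") as f:
    json.dump(d, f)              # dump() writes JSON to a file object

with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)        # load() reads JSON from a file object

print(loaded)
```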

Serializing an instance of a class into JSON:

import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

# convert a Student object to a dict
def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }
s = Student('Bob', 20, 88)
# argument 1: the object to serialize, argument 2: a function that converts the object to a dict
print(json.dumps(s, default=student2dict))

# convert a dict to a Student object
def dict2student(d):
    return Student(d['name'], d['age'], d['score'])
jsonStr = '{"age": 20, "score": 88, "name": "Bob"}'
# deserialize JSON into an object
# argument 1: the JSON string, argument 2: a function that converts a dict to an object
print(json.loads(jsonStr, object_hook=dict2student))

Reading and writing CSV files

Reading a CSV file

A CSV file is a plain text file. The format is often used to exchange data between different programs.

Demo:

Requirement: read the file 001.csv

Note: you can print each row directly, or collect the rows in a list as done here.

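import csv

def readCsv(path):
    # list that collects the rows
    infoList = []
    # open the file read-only (newline='' is recommended when using the csv module)
    with open(path, 'r', newline='') as f:
        # read the contents of the file
        allFileInfo = csv.reader(f)
        # append the contents to the list row by row
        for row in allFileInfo:
            infoList.append(row)
    return infoList
path = r"C:\Users\xlg\Desktop\001.csv"
info = readCsv(path)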

Writing a CSV file

Demo:

Requirement: write data to the file 002.csv

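import csv

# open the file for writing and write the data row by row
def writeCsv(path, data):
    # newline='' prevents extra blank lines between rows on Windows
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for rowData in data:
            print("rowData =", rowData)
            # write one row
            writer.writerow(rowData)
path = r"C:\Users\xlg\Desktop\002.csv"
writeCsv(path, [[1,2,3],[4,5,6],[7,8,9]])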
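
Putting writing and reading together shows one detail worth knowing: csv.reader returns every cell as a string. The sketch below uses a temporary file (an assumption for illustration) instead of the desktop paths above.

```python
import csv
import os
import tempfile

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "r", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back)
# Convert the strings back to integers if numbers are needed.
as_ints = [[int(cell) for cell in row] for row in read_back]
print(as_ints)
```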

Reading a PDF file

pip is Python's package installation and management tool.

Before showing the code, first install the tools needed for working with PDF files.

a. Enter the following command in cmd: pip list (lists all the packages installed with pip)

b. Install pdfminer3k by entering the following command: pip install pdfminer3k

c. Code demo

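# import system libraries
import sys
import importlib
# reload sys via importlib (a Python 2 era workaround, generally not needed in Python 3)
importlib.reload(sys)

from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter  # interpreter
from pdfminer.converter import PDFPageAggregator  # converter
from pdfminer.layout import LTTextBoxHorizontal, LAParams  # layout

from pdfminer.pdfinterp import PDFTextExtractionNotAllowed  # raised when text extraction is not allowed

# read the text of the PDF at path and write it to the file at toPath
def readPDF(path, toPath):
    # open the PDF file in binary mode
    with open(path, 'rb') as f:
        # create a PDF parser
        parser = PDFParser(f)
        # create a PDF document object
        pdfFile = PDFDocument()
        # connect the parser and the document to each other
        # (the set_parser() call is added on the assumption that pdfminer3k needs both links)
        parser.set_document(pdfFile)
        pdfFile.set_parser(parser)
        # initialize the document
        pdfFile.initialize()
        # check whether the document allows text extraction
        if not pdfFile.is_extractable:
            # extraction is not allowed
            raise PDFTextExtractionNotAllowed
        # resource manager
        manager = PDFResourceManager()
        # layout parameters and a PDF device object
        laparams = LAParams()
        device = PDFPageAggregator(manager, laparams=laparams)
        # interpreter object
        interpreter = PDFPageInterpreter(manager, device)
        # process the document one page at a time
        for page in pdfFile.get_pages():
            interpreter.process_page(page)
            layout = device.get_result()
            for x in layout:
                if isinstance(x, LTTextBoxHorizontal):
                    with open(toPath, 'a') as out:
                        text = x.get_text()
                        #print(text)
                        out.write(text + "\n")
path = r"C:\Users\xlg\Desktop\001.pdf"
# write to a separate text file rather than back into the PDF itself
toPath = r"C:\Users\xlg\Desktop\001.txt"
readPDF(path, toPath)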

Error Handling

While a program is running, if an error occurs, one option is to agree on an error code in advance, so that the caller can tell whether something went wrong and why. Returning error codes is very common in operating system calls. For example, the function open() for opening a file returns a file descriptor (an integer) on success and -1 on failure. But signaling errors with error codes is very inconvenient, because the normal result a function should return gets mixed up with the error code, so the caller has to write a lot of code to check whether an error occurred.

Therefore, high-level languages usually have a built-in try ... except ... finally ... error-handling mechanism, and Python is no exception.

# how try works
try:
    print("try...")
    r = 10/0
    print('result', r)
except ZeroDivisionError as e:
    print("except:", e)
finally:
    print("finally...")
print("END")

When we think some code may go wrong, we can run it inside try. If an error occurs during execution, the remaining code in the try block is skipped and execution jumps directly to the error-handling code, that is, the except block. After the except block has run, the finally block, if there is one, is executed. That completes the handling.

As the output shows, when the error occurs the subsequent statement print('result', r) is not executed. Because except captures the ZeroDivisionError, the except block is executed, and finally the finally block is executed. The program then continues and prints END.

There can be many types of errors. If different types of errors may occur, each should be handled by a different except block. In Python, errors are in fact classes, and all error types inherit from BaseException. So when using except, note that it catches not only errors of the given type but also errors of all its subclasses.
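
The point about subclasses can be illustrated with ValueError and its subclass UnicodeDecodeError. The helper classify is a hypothetical function written only for this sketch.

```python
def classify(func):
    """Run func() and report which except clause handled its error."""
    try:
        func()
    except ValueError:
        # UnicodeDecodeError is a subclass of ValueError, so it lands here too.
        return "ValueError branch"
    except UnicodeError:
        # Never reached: the ValueError clause above already matches.
        return "UnicodeError branch"
    return "no error"

handled_plain = classify(lambda: int("abc"))
handled_unicode = classify(lambda: b"\xff".decode("utf-8"))
print(handled_plain, handled_unicode)
print(issubclass(UnicodeDecodeError, ValueError))
```

Because except clauses are tried in order, the more specific error types should be listed before their base classes.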

A huge advantage of using try ... except to catch errors is that it works across multiple levels of calls. In other words, we do not need to catch errors at every place that could go wrong. It is enough to catch them at the appropriate level, which greatly reduces the trouble of writing try ... except ... finally.

Call Stack

If an error is not caught, it keeps propagating upward until it is finally caught by the Python interpreter, which prints an error message and then exits the program.

def foo(s):
    return 10/int(s)

def bar(s):
    return foo(s)*2

def main():
    bar('0')
main()

When something goes wrong, we must analyze the call stack information of the error in order to locate where it occurred.
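
To inspect the call stack without ending the program, the standard traceback module can return it as text. This is a sketch built on the same foo, bar and main functions; the variable name stack_text is an assumption.

```python
import traceback

def foo(s):
    return 10 / int(s)

def bar(s):
    return foo(s) * 2

def main():
    bar('0')

try:
    main()
except ZeroDivisionError:
    # format_exc() returns the same text the interpreter would print.
    stack_text = traceback.format_exc()

print(stack_text)
```

Read the stack from the bottom up: the last line names the error, the frame just above it (foo) is where the error occurred, and the frames further up (bar, then main) show how the program got there.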

Logging errors

If we do not catch an error, the Python interpreter will of course print the error stack, but the program also ends. Since we can catch errors, we can print the error stack ourselves, analyze the cause of the error, and let the program continue to run.

Python's built-in logging module makes it very easy to record error information:

import logging

def foo(s):
    return 10 / int(s)

def bar(s):
    return foo(s) * 2

def main():
    try:
        bar('0')
    except Exception as e:
        # log the error together with its stack trace
        logging.exception(e)
main()
print("END")

The error is the same, but after printing the error information the program continues to run and exits normally.

Raising errors

Because errors are classes, catching an error means catching an instance of that class. Errors therefore do not appear out of thin air. They are created and raised on purpose. Python's built-in functions raise many types of errors, and functions we write ourselves can raise errors too.

To raise an error, first define an error class if necessary, choosing a suitable class to inherit from, and then use the raise statement to raise an instance of it:

class FooError(ValueError):
    pass

def foo(s):
    n = int(s)
    if n == 0:
        raise FooError("invalid value: %s" % s)
    return 10 / n
foo('0')

Define our own error types only when necessary. If a suitable built-in Python error type exists (such as ValueError or TypeError), use the built-in type.
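
Following that advice, the same function can be written with the built-in ValueError instead of a custom class. This is only a sketch of the alternative, not a replacement for the example above.

```python
def foo(s):
    n = int(s)
    if n == 0:
        # A built-in type is enough here; no custom class is needed.
        raise ValueError("invalid value: %s" % s)
    return 10 / n

try:
    foo('0')
except ValueError as e:
    message = str(e)

print(message)
print(foo('5'))
```

Callers that already handle ValueError (for example from int()) then handle this case with no extra code.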

Origin blog.csdn.net/qq_29074261/article/details/80016832