Python I/O Operations

The ability to read and write files on disk is provided by the operating system. Modern operating systems do not allow ordinary programs to operate on the disk directly, so reading or writing a file means asking the operating system to open a file object (usually called a file descriptor) and then using the interface the operating system provides to read data from that file object (reading a file) or write data to it (writing a file).

1. Reading files

read():

1. Reads the entire file and returns a single string containing all of the file's content.
2. The lines are not separated from one another, so this method is not suitable when each line needs to be processed individually.
3. The whole file is read into memory at once, so use this method only if enough memory is available.

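f = None
try:
    f = open('file.txt', 'r')  # the mode 'r' means read
    print(f.read())
finally:
    if f:
        f.close()  # always close a file after use, because file objects occupy operating system resources

## Output
# hello world
# 浣犲ソ 涓栫晫\n  (garbled; the original text is '你好 世界')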

There is a problem here: my file is stored as UTF-8, but the Chinese text comes out garbled in Python. After I changed the encoding of file.txt to ASCII, the output was no longer garbled. Why is this?

This happens because, when no encoding is given, open() decodes the file with the platform's default encoding (on Chinese-language Windows this is typically GBK), while file.txt is stored as UTF-8. The mismatch between the two encodings produces the garbled text. You can fix this by specifying the encoding when opening the file.

f = open('file.txt', 'r', encoding='utf-8') 
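
The encoding mismatch can be reproduced without any file at all. The following is a minimal sketch, assuming GBK stands in for the platform default encoding; the variable names are illustrative only:

```python
text = '你好 世界'
raw = text.encode('utf-8')   # the bytes as they would be stored in a UTF-8 file

# Decoding with a different codec (GBK is assumed here) gives garbled text
garbled = raw.decode('gbk', errors='replace')

# Decoding with the codec the data was written in recovers the original
restored = raw.decode('utf-8')

print(garbled)
print(restored)
```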

But writing try ... finally every time is too cumbersome, so Python provides the with statement, which calls the close() method for us automatically:

with open('file.txt', 'r', encoding='utf-8') as f:
    print(f.read())

This is equivalent to the earlier try ... finally, but the code is more concise, and there is no need to call f.close().

readline():

1. Reads the next line of the file on each call.
2. Each line is returned separately, so lines can be processed one at a time.
3. The main use case is when memory is limited: readline() reads one line at a time and needs very little memory.

# Using the readline() method; 'r' means read
with open('file.txt', 'r', encoding='UTF-8') as f:
    line = f.readline()    # read the first line
    while line:            # readline() returns '' at end of file
        print(line.strip())
        line = f.readline()    # read the next line
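
A file object can also be iterated over directly, which reads one line at a time much like repeated readline() calls. A minimal sketch, assuming a throwaway file created in a temporary directory (the path and contents are illustrative, not from the original):

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello world\n你好 世界\n')

lines = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:                  # the file object yields one line per iteration
        lines.append(line.strip())  # strip the trailing newline

print(lines)
```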

readlines():

1. Reads all the lines in the file at once.
2. Each line is a separate element of the returned list, so if each line needs to be processed, you can iterate over the result of readlines(), as the code below shows.
3. The whole file is read into memory at once, so use this method only if enough memory is available.

If the file is small, calling read() once is the most convenient; if you cannot determine the file size, calling read(size) repeatedly is safer; if it is a configuration file, calling readlines() is the most convenient:

with open('file.txt', 'r', encoding='utf-8') as f:  # the mode 'r' means read
    for line in f.readlines():  # f.readlines(): ['hello world\n', '你好 世界\\n \n']
        print(line.strip())  # strip the trailing '\n'
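
The advice about calling read(size) repeatedly could look roughly like the sketch below. The chunk size of 16 characters and the temporary file are assumptions chosen only to keep the example small:

```python
import os
import tempfile

# Create a 100-character throwaway file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'big.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('abcdefghij' * 10)

chunks = []
with open(path, 'r', encoding='utf-8') as f:
    while True:
        chunk = f.read(16)   # read at most 16 characters per call
        if not chunk:        # an empty string means end of file
            break
        chunks.append(chunk)

print(len(chunks), len(''.join(chunks)))
```

Only one chunk is held in memory at a time if each chunk is processed and discarded instead of being collected in a list.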

If readlines() is called twice in succession, as in the following code:

f = open('file.txt', 'r', encoding='utf-8')  # the mode 'r' means read
print(f.readlines())
print(f.readlines())

## Result
# ['hello world\n', '你好 世界\\n ']
# []

Reason: a file object keeps track of its current position. The first readlines() call returns the file's lines as a list of strings, and once it finishes, the file pointer is left after the last line of the file. A second readlines() call therefore returns an empty list, because there is nothing left to read. Take special note that methods such as read(), write(), and readlines() all move the file pointer.
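
If the content really does need to be read twice from the same file object, the pointer can be moved back to the start with seek(0). A minimal sketch, assuming a throwaway file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello world\n你好 世界\n')

with open(path, 'r', encoding='utf-8') as f:
    first = f.readlines()    # reads everything; the pointer is now at the end
    second = f.readlines()   # nothing left to read, so this is empty
    f.seek(0)                # move the pointer back to the beginning
    third = f.readlines()    # reads everything again

print(first)
print(second)
print(third)
```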

2. Writing files

with open('/Users/michael/test.txt', 'w') as f:
    f.write('Hello, world!')

To write a text file in a specific encoding, pass the encoding parameter to open(); strings are then automatically converted to the specified encoding.

Careful readers will notice that when a file is opened in 'w' mode and already exists, it is overwritten directly (equivalent to deleting the file and writing a new one). What if we want to append to the end of the file? Pass 'a' to write in append mode.
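
The difference between 'w' and 'a', together with an explicit encoding, can be sketched as follows. The file name and contents are illustrative, and a temporary directory is assumed so that no real file is touched:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('Hello, world!\n')

with open(path, 'w', encoding='utf-8') as f:   # 'w' discards the old content
    f.write('first line\n')

with open(path, 'a', encoding='utf-8') as f:   # 'a' appends to the end
    f.write('second line\n')

with open(path, 'r', encoding='utf-8') as f:
    content = f.read()

print(content)
```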

3. Reading and writing in memory

3.1 StringIO

In many cases, data is not necessarily read from or written to a file; it can also be read and written in memory.

StringIO, as the name suggests, reads and writes str in memory. To write a str to a StringIO, first create a StringIO object, then write to it just like a file:

from io import StringIO

f = StringIO()  # no real file is needed; the buffer lives in memory
f.write('hello')
print(f.getvalue())  # getvalue() returns the str written so far
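
Reading from a StringIO works like reading from a file. A minimal sketch, assuming the buffer is initialized with an illustrative multi-line string:

```python
from io import StringIO

# Initialize the in-memory buffer with a str, then read it line by line
f = StringIO('Hello!\nHi!\nGoodbye!')

lines = []
while True:
    s = f.readline()
    if s == '':          # an empty string means the end of the buffer
        break
    lines.append(s.strip())

print(lines)
```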

3.2 BytesIO

StringIO can only handle str; if you are working with binary data, you need to use BytesIO.

BytesIO implements reading and writing bytes in memory. We create a BytesIO and then write some bytes:

from io import BytesIO

f = BytesIO()  # no real file is needed; the buffer lives in memory
f.write('中文'.encode('utf-8'))
f.write('english'.encode('utf-8'))

print(f.getvalue())  # result: b'\xe4\xb8\xad\xe6\x96\x87english'

Note that what is written is not a str but UTF-8 encoded bytes.
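
Reading works the same way in the other direction: a BytesIO can be initialized with bytes and read like a binary file. A minimal sketch, with illustrative variable names:

```python
from io import BytesIO

# Initialize the in-memory buffer with UTF-8 encoded bytes
f = BytesIO('中文'.encode('utf-8'))

data = f.read()               # read() returns bytes, not str
text = data.decode('utf-8')   # decode explicitly to get a str back

print(data)
print(text)
```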

4. Manipulating files and directories

import os

print(os.path.abspath('.'))  # absolute path of the current directory
print(os.path.join('/Users/michael', 'testdir'))  # join path components
print(os.path.split('/Users/michael/testdir/file.txt'))  # split into directory and file name
print(os.path.splitext('/path/to/file.txt'))  # split off the file extension
os.rename('test.txt', 'test.py')  # rename a file

## Result
# E:\CODE\python
# /Users/michael\testdir
# ('/Users/michael/testdir', 'file.txt')
# ('/path/to/file', '.txt')
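
Directories can be created, listed, and removed with the same os module. The sketch below is not from the original; it assumes a temporary base directory and illustrative names so that it can run anywhere without touching real files:

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # throwaway base directory
testdir = os.path.join(base, 'testdir')
os.mkdir(testdir)                         # create a directory

src = os.path.join(testdir, 'test.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write('hello')

dst = os.path.join(testdir, 'test.py')
os.rename(src, dst)                       # rename the file
names = os.listdir(testdir)               # list the directory's entries
ext = os.path.splitext(dst)[1]            # extension of the renamed file

os.remove(dst)                            # delete the file
os.rmdir(testdir)                         # delete the now-empty directory
exists_after = os.path.exists(testdir)

print(names, ext, exists_after)
```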

5. Serialization

5.1 What is serialization

While a program is running, all of its variables are in memory. For example, define a dict:

d = dict(name='Bob', age=20, score=88)

Variables can be modified at any time, for example changing name to 'Bill', but once the program ends, all memory occupied by the variables is reclaimed by the operating system. If the modified 'Bill' is not stored on disk, the next run of the program initializes the variable to 'Bob' again.

We call the process of turning a variable in memory into something that can be stored or transmitted serialization. In Python it is called pickling; in other languages it is also called serialization, marshalling, flattening, and so on, all meaning the same thing.

After serialization, the serialized content can be written to disk or transmitted to other machines over a network.

Conversely, reading a variable's content back from the serialized object into memory is called deserialization, that is, unpickling.

Python provides the pickle module to implement serialization.

5.2 Using pickle

First, we try to serialize an object and write it to a file:

import pickle
d = dict(name='Bob', age=20, score=88)
print(pickle.dumps(d))
# b'\x80\x04\x95$\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\x03Bob\x94\x.'       

The pickle.dumps() method serializes any object into bytes, which you can then write to a file. Alternatively, pickle.dump() serializes the object and writes it directly to a file-like object:

import pickle
d = dict(name='Bob', age=20, score=88)

f = open('dump.txt', 'wb')
pickle.dump(d, f)
f.close()

Look at the dump.txt file that was written: its contents look like gibberish, which is the object information Python saved.

When we want to read the object from disk back into memory, we can first read the content into a bytes object and then deserialize it with pickle.loads(), or we can call pickle.load() to deserialize the object directly from a file-like object. We open another Python command line to deserialize the object just saved:

import pickle
d = dict(name='Bob', age=20, score=88)

f = open('dump.txt', 'wb')
pickle.dump(d, f)
f.close()

f2 = open('dump.txt', 'rb')
d2 = pickle.load(f2)
f2.close()
print(d2)

The contents of the variable are back!

Of course, this variable and the original variable are completely unrelated objects; they merely have the same content.
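
That the deserialized object is an independent copy can be checked directly. A minimal sketch using pickle.dumps() and pickle.loads() in memory, so no file is needed:

```python
import pickle

d = dict(name='Bob', age=20, score=88)

data = pickle.dumps(d)      # serialize to bytes
d2 = pickle.loads(data)     # deserialize into a new object

same_content = (d2 == d)    # equal content
same_object = (d2 is d)     # but not the same object

d2['name'] = 'Bill'         # changing the copy leaves the original untouched

print(same_content, same_object, d['name'], d2['name'])
```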

Pickle has the same problem as the language-specific serialization mechanisms of all other programming languages: it can only be used with Python, and different versions of Python may not even be compatible with each other. Therefore, use pickle only to save unimportant data, where it does not matter if deserialization fails.

5.3 JSON

If we want to pass objects between different programming languages, we must serialize them into a standard format such as XML. A better approach is to serialize to JSON, because JSON is represented as a string that can be read by all languages and can easily be stored on disk or transmitted over a network. JSON is not only a standard format but is also faster than XML, and it can be read directly in a web page, which is very convenient.

The objects JSON represents are standard JavaScript objects. JSON types and Python's built-in data types correspond as follows:

JSON type      Python type
{}             dict
[]             list
"string"       str
1234.56        int or float
true/false     True/False
null           None

5.3.1 Serializing Python objects to JSON

Python's built-in json module provides very complete conversion of Python objects to JSON format. Let's first look at how to turn a Python object into JSON:

import json
d = dict(name='Bob', age=20, score=88)
print(json.dumps(d)) # {"name": "Bob", "age": 20, "score": 88}

The dumps() method returns a str whose content is standard JSON. Similarly, the dump() method writes JSON directly to a file-like object.

5.3.2 Deserializing JSON into Python objects

To deserialize JSON into a Python object, use loads() or the corresponding load() method: the former deserializes a JSON string, and the latter reads a string from a file-like object and deserializes it:

import json

json_str = '{"age": 20, "score": 88, "name": "Bob"}'
print(json.loads(json_str)) # {'age': 20, 'score': 88, 'name': 'Bob'}
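
The file-like variants dump() and load() can be tried without a real file. A minimal sketch, assuming a StringIO buffer stands in for the file-like object:

```python
import json
from io import StringIO

d = dict(name='Bob', age=20, score=88)

buf = StringIO()      # in-memory stand-in for a text file
json.dump(d, buf)     # write JSON directly to the file-like object
written = buf.getvalue()

buf.seek(0)           # move back to the start before reading
d2 = json.load(buf)   # read JSON from the file-like object

print(written)
print(d2)
```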

5.3.3 Advanced JSON

A Python dict can be serialized directly into a JSON {}. Often, however, we prefer to represent objects with a class, for example defining a Student class and then serializing it:

import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

s = Student('Bob', 20, 88)
print(json.dumps(s))

Running the code produces a merciless TypeError:

Traceback (most recent call last):
  ...
TypeError: <__main__.Student object at 0x10603cc50> is not JSON serializable

The cause of the error is that a Student object is not an object that can be serialized to JSON. If even instances of a class could not be serialized to JSON, that would certainly be unreasonable! Do not worry: a closer look at the parameter list of the dumps() method shows that, besides the required first argument obj, dumps() also provides many optional parameters:

See https://docs.python.org/3/library/json.html#json.dumps

These optional parameters let us customize JSON serialization. The preceding code cannot serialize the Student instance to JSON because, by default, the dumps() method does not know how to turn a Student instance into a JSON {} object.

The optional parameter default turns any object into one that can be serialized to JSON. We only need to write a conversion function specifically for Student and then pass that function in:

def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }

This way, the Student instance is first converted to a dict by student2dict() and then serialized to JSON successfully:

print(json.dumps(s, default=student2dict))
# {"name": "Bob", "age": 20, "score": 88}

However, the next time we encounter an instance of a Teacher class, it still cannot be serialized to JSON. We can take a shortcut and turn an instance of any class into a dict:

print(json.dumps(s, default=lambda obj: obj.__dict__))

This works because instances of a class usually have a __dict__ attribute, which is a dict used to store instance variables. There are a few exceptions, such as classes that define __slots__.
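
One possible way to cover the __slots__ exception is a fallback that builds the dict from the slot names. This is only a sketch: the Point class and the to_dict helper are hypothetical names introduced for illustration, and the fallback only handles slots declared on the instance's own class:

```python
import json

class Point(object):
    __slots__ = ('x', 'y')   # instances of this class have no __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

def to_dict(obj):
    # Hypothetical helper: prefer __dict__, fall back to the slot names
    if hasattr(obj, '__dict__'):
        return obj.__dict__
    return {name: getattr(obj, name) for name in obj.__slots__}

p = Point(1, 2)
has_dict = hasattr(p, '__dict__')
text = json.dumps(p, default=to_dict)

print(has_dict)
print(text)
```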

By the same token, to deserialize JSON into a Student instance, the loads() method first converts it to a dict, and then the object_hook function we pass in is responsible for converting the dict into a Student instance:

def dict2student(d):
    return Student(d['name'], d['age'], d['score'])

## The result is as follows:

>>> json_str = '{"age": 20, "score": 88, "name": "Bob"}'
>>> print(json.loads(json_str, object_hook=dict2student))
<__main__.Student object at 0x10cd3c190>

What is printed is the deserialized Student instance.
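
Putting the two directions together, a complete round trip might look like the sketch below. It reuses the Student class and the two conversion functions from this section so that it can run on its own:

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def student2dict(std):
    # Convert a Student into a dict that json.dumps() can handle
    return {'name': std.name, 'age': std.age, 'score': std.score}

def dict2student(d):
    # Convert a decoded JSON dict back into a Student
    return Student(d['name'], d['age'], d['score'])

s = Student('Bob', 20, 88)
json_str = json.dumps(s, default=student2dict)       # Student -> JSON str
s2 = json.loads(json_str, object_hook=dict2student)  # JSON str -> Student

print(json_str)
print(s2.name, s2.age, s2.score)
```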
