python learning [file reading and writing]

Preface

In the previous article, python learning - [14th installment], I learned about packages and built-in modules in Python. In this article I continue with file reading and writing in Python.

Encoding

Before learning to read and write files, let's first look at how text encoding works in Python:

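A byte is a unit for measuring the amount of data; computers use it to measure storage capacity. Normally, one byte equals eight bits.
A character is a letter, digit, word or symbol used in a computer, such as 'A', 'B', '$', '&'.
How many bytes a character occupies depends on the encoding: in GBK (the common legacy Chinese encoding), an English letter or symbol takes one byte and a Chinese character takes two bytes.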


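In ASCII, an English letter (upper or lower case) takes one byte; ASCII cannot represent Chinese characters at all.
In UTF-8, an English letter takes one byte, and a common Chinese character takes three bytes.
Unicode itself is a character set: it assigns every character a code point, and the number of bytes a character occupies depends on the encoding form used (UTF-8, UTF-16 or UTF-32).
Punctuation: an English punctuation mark takes one byte, while a Chinese punctuation mark takes two bytes in GBK and three bytes in UTF-8. For example, the English period . takes 1 byte, and the Chinese period 。 takes 2 bytes in GBK.
In UTF-16, an English letter and a Chinese character each take 2 bytes (some Chinese characters in the Unicode extension areas take 4 bytes).
In UTF-32, every character in the world takes 4 bytes.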
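The byte counts above can be checked directly with str.encode(). The sketch below is only an illustration: byte_sizes is a hypothetical helper, not part of the original article, and the "-le" codecs are used so that UTF-16/UTF-32 do not prepend a byte-order mark (which would add extra bytes).

```python
def byte_sizes(ch):
    """Hypothetical helper: number of bytes `ch` needs in several encodings."""
    # "utf-16-le"/"utf-32-le" are used so no byte-order mark (BOM) is counted
    encodings = ("utf-8", "gbk", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


for ch in ["A", "你", ".", "。"]:
    print(repr(ch), byte_sizes(ch))

# ASCII has no code for Chinese characters, so encoding fails
try:
    "你".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ascii can encode '你':", ascii_ok)
```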

At run time, the Python interpreter represents text (str objects) as Unicode in memory, and Python source files (.py) are stored on disk encoded as UTF-8.

In Python 3, the default encoding is UTF-8.

We can obtain the default encoding with getdefaultencoding() from the sys module:

import sys
print(sys.getdefaultencoding())
# utf-8
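One detail worth knowing: sys.getdefaultencoding() is the encoding used by str.encode() and bytes.decode() when none is passed, but open() without an encoding= argument falls back to the platform's preferred locale encoding (for example GBK/cp936 on Simplified Chinese Windows), unless Python's UTF-8 mode is enabled. A small sketch to compare the two on your own machine; the locale value it prints will differ between systems:

```python
import locale
import sys

# Used by str.encode() / bytes.decode() when no encoding is passed
print("sys.getdefaultencoding():", sys.getdefaultencoding())

# What open() falls back to when no encoding= argument is given
# (depends on the operating system and locale settings)
preferred = locale.getpreferredencoding(False)
print("locale.getpreferredencoding(False):", preferred)

# str.encode() with no argument uses the default encoding, i.e. UTF-8
assert "你好".encode() == "你好".encode("utf-8")
```

Passing encoding= explicitly to open(), as done later in this article, removes the dependence on the platform.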

In memory, text is handled uniformly as Unicode; when it needs to be saved to the hard disk or transmitted, it is converted (encoded) to UTF-8:

When we edit a file in Notepad, the UTF-8 text read from the file is converted to Unicode and held in memory; when we finish editing and save, the Unicode text is converted back to UTF-8 and written to the file.

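The same Unicode-in-memory / UTF-8-on-disk round trip can be seen in Python with str.encode() and bytes.decode(). A minimal sketch, assuming UTF-8 is the encoding used for storage:

```python
text = "你好"                        # str: Unicode text held in memory
data = text.encode("utf-8")         # encode to UTF-8 bytes before saving or sending
print(type(data), data)             # <class 'bytes'> b'\xe4\xbd\xa0\xe5\xa5\xbd'

restored = data.decode("utf-8")     # decode back to str after reading
print(restored == text)             # True
```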
Let's look at the default encoding Notepad uses when we open a Python file and choose "Save As":

In the version of Notepad shown here, the default encoding for saving is ANSI. (Strictly speaking, there is no encoding called ANSI; it is just an alias on Windows for the system's legacy code page. On Simplified Chinese Windows, ANSI means GBK.) If we simply create a new .txt file from Python, the encoding used to write the file and the encoding used to read it do not match, and the text appears garbled:

To avoid this encoding mismatch, we can explicitly set the encoding of the written data to UTF-8 in the code:

f = open("编码.txt", "w", encoding='UTF-8')  # write the data using UTF-8 encoding
f.write("你好!世界!")
f.close()

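To see the mismatch and the fix in one place, the sketch below writes Chinese text as UTF-8, reads it back with the same encoding, and then reads the same bytes as GBK (the "ANSI" code page on Simplified Chinese Windows). It assumes a temporary directory instead of the article's working directory, and uses errors="replace" so the mismatched read produces garbled text instead of raising an exception.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "编码.txt")

    # Write Chinese text explicitly as UTF-8
    f = open(path, "w", encoding="utf-8")
    f.write("你好!世界!")
    f.close()

    # Reading with the same encoding gives back the original text
    f = open(path, "r", encoding="utf-8")
    correct = f.read()
    f.close()

    # Reading the same bytes as GBK garbles the text;
    # errors="replace" substitutes invalid byte sequences instead of raising
    f = open(path, "r", encoding="gbk", errors="replace")
    garbled = f.read()
    f.close()

print(correct)
print(garbled)
```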
Readers who want to learn more about encodings can refer to this article: Click to view.

Common file opening modes

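Depending on how their data is organized, files fall into two categories:

  Text files: store ordinary “character” text, using the Unicode character set by default; they can be opened with Notepad.

  Binary files: store the data as “bytes”; they cannot be opened with Notepad and require dedicated software, e.g. image files (.png, .jpeg, etc.) and .doc documents.

Common opening modes:

  r  Open the file read-only. The file pointer is placed at the beginning of the file.

  w  Open the file write-only. If the file does not exist, it is created; if it exists, its original content is overwritten. The file pointer is at the beginning of the file.

  a  Open the file in append mode. If the file does not exist, it is created and the file pointer is at the beginning of the file; if the file exists, content is appended to the end, and the file pointer is at the end of the original file.

  b  Open the file in binary mode. It cannot be used alone and must be combined with another mode, e.g. rb (open a binary file read-only) or wb (open a binary file write-only).

  +  Open the file for both reading and writing. It cannot be used alone and must be combined with another mode, e.g. a+.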
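Two of these rules are easy to check: r never creates a file, and a+ both appends and allows reading. A minimal sketch, assuming a temporary directory and a hypothetical file name modes_demo.txt (with open(...) is used here only so each file is closed automatically):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")  # hypothetical file name

    # "r" does not create files: opening a missing file raises FileNotFoundError
    try:
        with open(path, "r", encoding="utf-8"):
            pass
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True
    exists_after_r = os.path.exists(path)

    # "a+" creates the file if needed, appends, and also allows reading
    with open(path, "a+", encoding="utf-8") as f:
        f.write("hello")
        f.seek(0)          # after writing, the pointer is at the end; go back to read
        content = f.read()

print(missing_raises, exists_after_r, content)
```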

Note that when we read or write files, the code must finish by closing the file with close(): close() writes the contents of the buffer to the file, closes the file, and releases the resources associated with the file object.
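A common alternative to calling close() by hand (standard Python, though not used in the original article) is the with statement, which closes the file automatically when the block ends, even if an exception is raised inside it. A sketch assuming a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")

    # The file is closed automatically when the with block ends
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello world")
    closed_after_block = f.closed   # True: close() was called for us

    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(closed_after_block, text)  # True hello world
```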
r read only

Open the file in read-only mode

file_r=open('a.txt','r')
print(file_r.read())  # read the file contents
file_r.close()  # close the file: whenever we open a file, we should close it when the code is done, to release resources

Since we are on read-only mode, here are several commonly used methods of file objects; we can try each of them in read-only mode:

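read([size]) reads up to size characters (in text mode) or bytes (in binary mode) from the current position and returns them. If size is omitted, it reads everything up to the end of the file at once;

readline()  reads one line of a text file, starting from the current position (so the first line for a freshly opened file);

readlines()  returns a list in which every line of the text file is a separate string object;

seek(offset,[whence])  moves the file pointer to a new position; offset is counted relative to whence (0 = beginning of the file, the default; 1 = current position; 2 = end of the file; in text mode only offset 0 is allowed with whence 1 or 2);

tell()    returns the current position of the file pointer;

flush()  writes the contents of the buffer to the file without closing the file.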

We first read the entire contents of a.py:

Then read a.txt using the common reading methods of file objects:

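# read([size]): reads up to size characters (text mode) or bytes (binary mode); if size is omitted, reads everything up to the end of the file at once
file_r=open('a.txt','r')
print(file_r.read(5)) # returns the first 5 characters (text mode counts characters, not bytes)
file_r.close()

print('------------------------\n')

# readline(): reads one line (the first line, since the file was just opened)
file_rl=open('a.txt')
print(file_rl.readline())
file_rl.close()

print('------------------------\n')

# readlines(): returns a list with each line of the text file as a separate string
file_rls=open('a.txt','r')
print(file_rls.readlines())
file_rls.close()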

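Two details of these methods are easy to miss: in text mode read(size) counts characters rather than bytes, and readline() continues from wherever the pointer currently is. The sketch below shows both with a hypothetical file lines.txt created in a temporary directory (with open(...) closes each file automatically):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("你好世界\nsecond line\n")

    # Text mode: read(2) returns 2 characters, although each one is 3 bytes in UTF-8
    with open(path, "r", encoding="utf-8") as f:
        two_chars = f.read(2)
        rest_of_line = f.readline()   # continues from the current position
        next_line = f.readline()      # then reads the following line

    # Binary mode: read(2) returns 2 raw bytes
    with open(path, "rb") as f:
        two_bytes = f.read(2)

print(two_chars, repr(rest_of_line), repr(next_line), two_bytes)
```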

seek()

Pass a number of bytes to seek(), and the file pointer moves that many bytes forward from the beginning of the file.

file_seek=open('a.txt','r')
file_seek.seek(2)
print(file_seek.read())  # llo world
file_seek.close()

a.txt:
Note: the number of bytes passed to seek() must land on a character boundary. For example, in UTF-8 a Chinese character takes 3 bytes, so if the file starts with Chinese text and we pass 2, the pointer lands in the middle of a character and the following read() raises an error (UnicodeDecodeError).
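The character-boundary rule can be reproduced with a small sketch. It assumes a hypothetical UTF-8 file zh.txt in a temporary directory: seek(3) skips exactly one Chinese character, while seek(2) lands inside one and makes read() fail.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "zh.txt")  # hypothetical file
    with open(path, "w", encoding="utf-8") as f:
        f.write("你好世界")  # 4 characters, 3 bytes each in UTF-8

    with open(path, "r", encoding="utf-8") as f:
        f.seek(3)              # skips exactly one character ("你")
        after_one_char = f.read()

    with open(path, "r", encoding="utf-8") as f:
        f.seek(2)              # lands inside the 3-byte sequence of "你"
        try:
            f.read()
            error = None
        except UnicodeDecodeError as exc:
            error = exc

print(after_one_char)          # 好世界
print(type(error).__name__)    # UnicodeDecodeError
```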

tell()

Returns the current position of the file pointer

file=open('a.txt','r')
file.seek(2)
print(file.read())
print(file.tell()) # return the current position of the file pointer
file.close()

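The position reported by tell() relates to bytes in the file, not to characters. The sketch below, assuming a hypothetical UTF-8 file in a temporary directory, reads two Chinese characters and sees the pointer at 6. Note that the Python documentation describes text-mode tell() values as opaque numbers; the byte offset shown here is what CPython reports for a UTF-8 file read to the end, so treat it as an illustration rather than a guarantee for every encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tell_demo.txt")  # hypothetical file
    with open(path, "w", encoding="utf-8") as f:
        f.write("你好")  # 2 characters, 6 bytes in UTF-8

    with open(path, "r", encoding="utf-8") as f:
        start = f.tell()     # 0 at the beginning of the file
        text = f.read()      # reads both characters
        end = f.tell()       # now at the end of the file

print(start, len(text), end)  # 0 2 6
```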

flush()

Write the contents of the buffer to the file, but do not close the file

file=open('b.txt','a')  # open the file in append mode
file.write('hello')
file.flush()  # write the buffer contents to the file without closing it
file.write('world')
file.close()

Note the difference between flush() and close(): flush() does not close the file, so we can keep writing data to it after calling flush(); after close(), no more data can be written to the file, and any attempt to do so raises an error (ValueError: I/O operation on closed file).

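The flush()/close() difference can be checked in one script. This sketch assumes b.txt lives in a temporary directory: after flush() a second reader already sees "hello", writing still works afterwards, and a write after close() raises ValueError.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "b.txt")

    file = open(path, "a", encoding="utf-8")
    file.write("hello")
    file.flush()                       # push buffered data to the file; it stays open
    with open(path, "r", encoding="utf-8") as reader:
        seen_after_flush = reader.read()   # another reader already sees "hello"
    file.write("world")                # still allowed after flush()
    file.close()

    try:
        file.write("!")                # not allowed after close()
        error = None
    except ValueError as exc:
        error = exc

    with open(path, "r", encoding="utf-8") as reader:
        final = reader.read()

print(seen_after_flush, final, error)
```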

w write only

Open the file in write-only mode. If the file does not exist, it is created; if it exists, its original content is overwritten. The file pointer is at the beginning of the file. After writing to a.txt in w mode and then opening a.txt in read-only mode, we find that the original content of a.txt has been overwritten.
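Since the original code for this step was shown only as a screenshot, here is a minimal sketch of the same experiment, assuming a.txt is created in a temporary directory: the file is written once, reopened in w mode and written again, and only the second text remains.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello world")

    # Reopening in "w" mode truncates the file before writing
    with open(path, "w", encoding="utf-8") as f:
        f.write("python")

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(content)  # python
```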

Here we used the file object's write method: write(str) writes the string str to the file.

Another commonly used writing method of file objects:

writelines(s_list) writes the list of strings s_list to a text file without adding newlines.

s_lis1=['hello','world','hello','python']
file_wl=open('a.txt','w')
file_wl.writelines(s_lis1)
file_wl.close()

When using this method, note that the argument must be a list of strings (more generally, any iterable of strings); if an element is not a string, a TypeError is raised:
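Because writelines() adds no separators, one item per line means adding "\n" to each string yourself. The sketch below does that and also triggers the TypeError with a non-string element; it assumes a temporary directory, and bad.txt is a hypothetical file used only for the error case.

```python
import os
import tempfile

lines = ["hello", "world", "hello", "python"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")

    # writelines() adds no separators, so append "\n" to each item ourselves
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

    # A non-string element raises TypeError (hypothetical separate file)
    with open(os.path.join(tmp, "bad.txt"), "w", encoding="utf-8") as f:
        try:
            f.writelines(["hello", 123])
            error = None
        except TypeError as exc:
            error = exc

print(repr(content))
print(type(error).__name__)
```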

a opens the file in append mode

Open the file in append mode. If the file does not exist, it is created and the file pointer is at the beginning of the file. If the file exists, content is appended to the end of the file, and the file pointer is at the end of the original file.

file_a=open('a.txt',"a")
file_a.write('python') # write 'python'
file_a.close()

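To confirm that append mode starts at the end of the existing content, the sketch below checks tell() right after opening an existing file in a mode. It assumes a.txt is created in a temporary directory with "hello world" (11 bytes) first.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello world")     # 11 bytes

    with open(path, "a", encoding="utf-8") as f:
        start = f.tell()           # pointer starts at the end of the existing content
        f.write("python")

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(start, content)  # 11 hello worldpython
```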

b Open binary file

Binary file: The data content is stored in "bytes" and cannot be opened with Notepad. It must be opened with special software, such as picture files (.png, .jpeg, etc.) and .doc documents, etc.

b cannot be used alone; it must be combined with another mode, such as rb or wb.

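# open the source file for reading
src_file=open('befor.png','rb')
# open the target file for writing
target_file=open('after.png','wb')
# write the data read from the source file into the target file
target_file.write(src_file.read())
# close the target file and the source file
target_file.close()
src_file.close()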

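src_file.read() loads the whole source file into memory at once, which is fine for a small picture. For large files, one option is to copy in fixed-size chunks. The sketch below is a hypothetical helper (copy_binary, with an assumed 64 KiB default chunk size), tested on generated sample data in a temporary directory rather than on the article's image files:

```python
import os
import tempfile


def copy_binary(src_path, dst_path, chunk_size=64 * 1024):
    """Hypothetical helper: copy a file byte-for-byte, chunk_size bytes at a time."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:        # b"" means end of file
                break
            dst.write(chunk)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "before.bin")   # hypothetical file names
    dst = os.path.join(tmp, "after.bin")
    with open(src, "wb") as f:
        f.write(bytes(range(256)) * 1000)   # 256,000 bytes of sample data
    copy_binary(src, dst, chunk_size=4096)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()

print(same)  # True
```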

One sentence per article

Don't fantasize when the sun sets, work hard when the sun rises.

If there are any shortcomings, corrections are welcome!


Origin blog.csdn.net/weixin_64122448/article/details/133106231