Problems with text files and binary files caused by the file.seek() method

The cause of the problem

The Runoob tutorial explains the file.seek() method. Here is a brief summary of seek():

  • seek(offset, whence) moves the file's read/write pointer to the specified position
  • Parameter offset: the offset to move by, measured in bytes
  • Parameter whence: optional, default 0. 0 counts from the beginning of the file, 1 from the current position, 2 from the end of the file
  • Return value: in Python 3, the new absolute position (in Python 2, file.seek() returned None)
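
As a minimal sketch of the three whence values, the snippet below writes ten bytes to a temporary file and seeks in binary mode, where all three reference points are permitted. The file name seek_demo.bin and its contents are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical demo file; the name and contents are for illustration only.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    pos_start = f.seek(3)      # whence=0 (default): 3 bytes from the start
    byte_a = f.read(1)         # reads one byte; position advances to 4
    pos_cur = f.seek(2, 1)     # whence=1: 2 bytes forward from position 4
    byte_b = f.read(1)
    pos_end = f.seek(-1, 2)    # whence=2: 1 byte before the end of the file
    byte_c = f.read(1)

print(pos_start, byte_a)   # 3 b'3'
print(pos_cur, byte_b)     # 6 b'6'
print(pos_end, byte_c)     # 9 b'9'
```

The constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END can be used instead of the literal values 0, 1 and 2.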

Example

The content of the file runoob.txt is as follows:

1:www.runoob.com
2:www.runoob.com
3:www.runoob.com
4:www.runoob.com
5:www.runoob.com

Read the contents of the file in a loop:

# Open the file in text mode for reading and writing
fo = open("runoob.txt", "r+")
print("File name: ", fo.name)

line = fo.readline()
print("The read data is: %s" % line)

# Try to move the file pointer 2 bytes forward from the current position
fo.seek(2, 1)
line = fo.readline()
print("The read data is: %s" % line)

# Close the file
fo.close()

Running it produces an error:

D:\Program\python34\python.exe D:/python_workshop/python6/study_file.py
File name:  runoob.txt
The read data is: 1:www.runoob.com

Traceback (most recent call last):
  File "D:/python_workshop/python6/study_file.py", line 158, in <module>
    fo.seek(2, 1)
io.UnsupportedOperation: can't do nonzero cur-relative seeks

Process finished with exit code 1

Analyzing the cause

Text files and binary files

To understand the error, we need to start with the difference between text files and binary files.

In a broad (physical) sense, text files are binary files too, because everything a computer stores is ultimately binary, so physically the two are the same thing. In a narrow (logical) sense, however, the two encode their content differently.

Text files are also called ASCII files. When such a file is stored on disk, each character occupies one byte (8 bits) that holds the character's ASCII code, for example:

ASCII code: 00110101 00110110 00110111 00111000
               |        |        |        |
Character:     5        6        7        8

A total of 4 bytes are occupied, and the ASCII file can be displayed by characters on the screen.

Binary files store values directly in their binary representation. For example, the number 5678 is stored as 00010110 00101110, which occupies only 2 bytes. A binary file can still be displayed on the screen, but its content is not human-readable. When C processes these files, it does not distinguish between the two types: both are treated as character streams and processed byte by byte. The start and end of the input and output streams are controlled only by the program, not by physical symbols (such as carriage returns), so this kind of file is also called a "stream file".
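
The size difference can be checked with a short sketch. It assumes the binary form is a 16-bit unsigned integer in big-endian byte order, which matches the bit pattern quoted above; the helper bits() is a hypothetical name introduced only for display.

```python
import struct

# Text form: one byte per character, each holding an ASCII code.
text_bytes = "5678".encode("ascii")

# Binary form (assumption: 16-bit unsigned integer, big-endian).
binary_bytes = struct.pack(">H", 5678)

def bits(data):
    """Render each byte as an 8-digit binary string (illustrative helper)."""
    return " ".join(format(b, "08b") for b in data)

print(len(text_bytes), bits(text_bytes))      # 4 00110101 00110110 00110111 00111000
print(len(binary_bytes), bits(binary_bytes))  # 2 00010110 00101110
```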

Differences between reading and writing

When a file is read in text mode on Windows, the C runtime treats 0x1A (26, Ctrl-Z) as the end of file (EOF). Python 2's "r" mode inherits this behaviour, so reading a binary file with "r" can stop early. Python 3's io layer does not stop at 0x1A, but text mode still decodes the bytes and translates line endings, so binary data should be opened with "rb" in either version. For example:

A binary file contains the following bytes, from the lowest offset to the highest: 7F 32 1A 2F 3D 2C 12 2E 76
With 'r' (Python 2 on Windows), reading stops at the third byte, which is taken as the end of the file.
With 'rb', the bytes are read exactly as stored and are not converted into characters, which avoids the problem above.
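
A quick check under Python 3 is sketched below. It assumes that the early EOF at 0x1A comes from the Windows C runtime's text mode, which Python 2 exposed and Python 3's io layer bypasses. The file name ctrl_z_demo.bin is hypothetical, and latin-1 is chosen only so that every byte decodes to exactly one character.

```python
import os
import tempfile

# The nine bytes from the example above.
data = bytes([0x7F, 0x32, 0x1A, 0x2F, 0x3D, 0x2C, 0x12, 0x2E, 0x76])

# Hypothetical demo file name.
path = os.path.join(tempfile.mkdtemp(), "ctrl_z_demo.bin")
with open(path, "wb") as f:
    f.write(data)

# Binary mode: the bytes come back exactly as stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode in Python 3: the bytes are decoded, but 0x1A does not end the read.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()

print(len(raw), len(text))
```

Even where text mode reads every byte, the result is a decoded str rather than the original bytes, so "rb" remains the right choice for binary data.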

When writing in text mode on Windows, '\n' is implicitly converted to "\r\n" before it is written to the file; when reading, "\r\n" is implicitly converted back to '\n' before the data reaches your variable. Binary mode does no interpretation: bytes are read and written exactly as they are, with no conversion.
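
The translation can be observed on any platform with the sketch below. Passing newline="\r\n" to open() is used here to force the Windows-style conversion on write; the file name newline_demo.txt is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical demo file name.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# newline="\r\n" makes text mode write "\r\n" for every "\n", as on Windows.
with open(path, "w", newline="\r\n") as f:
    f.write("line1\nline2\n")

# Binary mode shows the bytes exactly as stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Default text mode translates "\r\n" back to "\n" when reading.
with open(path, "r") as f:
    text = f.read()

print(raw)         # b'line1\r\nline2\r\n'
print(repr(text))  # 'line1\nline2\n'
```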

r+ and rb+

  • "r+" opens a file in text mode for reading and writing. The file pointer is placed at the beginning of the file
  • "rb+" opens a file in binary mode for reading and writing. The file pointer is placed at the beginning of the file
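
A small sketch of the two modes side by side: both start at position 0, but "r+" yields str objects while "rb+" yields bytes. The file name mode_demo.txt and its single line of content are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical demo file with one line of ASCII text.
path = os.path.join(tempfile.mkdtemp(), "mode_demo.txt")
with open(path, "wb") as f:
    f.write(b"1:www.runoob.com\n")

with open(path, "r+") as f:
    start_text = f.tell()      # position right after opening
    text_line = f.readline()   # a str

with open(path, "rb+") as f:
    start_bin = f.tell()       # position right after opening
    bin_line = f.readline()    # a bytes object

print(start_text, type(text_line).__name__, repr(text_line))
print(start_bin, type(bin_line).__name__, repr(bin_line))
```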

The official Python 3.4 documentation

To change the file object’s position, use f.seek(offset, from_what). The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.

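In other words, the final position equals the reference point plus the offset, and the from_what argument selects the reference point: 0 is the beginning of the file, 1 is the current position, and 2 is the end of the file. When from_what is omitted, it defaults to 0.

The official Python documentation also gives an example:

>>> f = open('workfile', 'rb+')  # note that workfile is opened for reading and writing in binary mode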
>>> f.write(b'0123456789abcdef')
16
>>> f.seek(5)     # Go to the 6th byte in the file
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'

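In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)).

In other words, in a text file only offsets from the beginning of the file (from_what=0, the default) are allowed. Nonzero offsets from the current position (from_what=1) or from the end of the file (from_what=2) are not allowed.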
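
The restriction can be sketched as follows. The snippet recreates a two-line text file (the name text_seek_demo.txt and the contents are assumptions), catches the exception raised by a nonzero relative seek, and then shows the forms that text mode does accept: a zero offset with from_what 1 or 2, and an absolute seek to a value previously returned by tell().

```python
import io
import os
import tempfile

# Hypothetical two-line text file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "text_seek_demo.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("1:www.runoob.com\n2:www.runoob.com\n")

with open(path, "r+", encoding="utf-8") as f:
    f.readline()
    try:
        f.seek(2, 1)              # nonzero offset from the current position
        error = None
    except io.UnsupportedOperation as exc:
        error = str(exc)          # text mode rejects this seek

    f.seek(0, 1)                  # zero offset from the current position: allowed
    saved = f.tell()              # opaque position, valid for a later seek()
    f.seek(0, 2)                  # jump to the very end of the file: allowed
    f.seek(saved)                 # absolute seek to a value from tell(): allowed
    second = f.readline()

print(error)
print(repr(second))
```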

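Now look back at our first code sample and its error message. We opened the file with mode "r+", so it is a text file, and the call to fo.seek() set from_what to 1, which is why the error was raised.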

# Open the file in text mode for reading and writing
fo = open("runoob.txt", "r+")
print("File name: ", fo.name)

line = fo.readline()
print("The read data is: %s" % line)

# Try to move the file pointer 2 bytes forward from the current position
fo.seek(2, 1)
line = fo.readline()
print("The read data is: %s" % line)

# Close the file
fo.close()

D:\Program\python34\python.exe D:/python_workshop/python6/study_file.py
File name:  runoob.txt
The read data is: 1:www.runoob.com

Traceback (most recent call last):
  File "D:/python_workshop/python6/study_file.py", line 158, in <module>
    fo.seek(2, 1)
io.UnsupportedOperation: can't do nonzero cur-relative seeks

Process finished with exit code 1

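Going further

Would it work if we changed the mode to "rb+" and read and wrote the file in binary? It would, but there is one small issue that should not be overlooked: binary mode does not translate Windows line endings (\r\n, 0x0D 0x0A), so the result we see is:

File name:  runoob.txt
The read data is: b'1:www.runoob.com\r\n'
The read data is: b'www.runoob.com\r\n'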
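
One possible way to get clean text back while keeping the relative seek is sketched below: read in binary mode, then decode and strip the line ending yourself. The file name binary_seek_demo.txt, the Windows-style contents, and the choice of ASCII decoding are assumptions based on the example file above.

```python
import os
import tempfile

# Hypothetical file with Windows-style line endings, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "binary_seek_demo.txt")
with open(path, "wb") as f:
    f.write(b"1:www.runoob.com\r\n2:www.runoob.com\r\n")

with open(path, "rb+") as f:
    first = f.readline()     # the raw first line, ending in b"\r\n"
    f.seek(2, 1)             # allowed in binary mode: skips the bytes b"2:"
    second = f.readline()    # the rest of the second line

# Decode the bytes and strip the line ending manually.
clean_first = first.decode("ascii").rstrip("\r\n")
clean_second = second.decode("ascii").rstrip("\r\n")

print(clean_first)    # 1:www.runoob.com
print(clean_second)   # www.runoob.com
```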

References

https://blog.csdn.net/timberwolf_2012/article/details/28499615

https://www.zhihu.com/question/19971994

https://docs.python.org/3.4/tutorial/inputoutput.html

http://bbs.fishc.com/thread-60449-1-1.html

https://www.cnblogs.com/kingleft/p/5142469.html

https://www.cnblogs.com/xisheng/p/7636736.html

https://bbs.csdn.net/wap/topics/350127738

https://www.cnblogs.com/pengwangguoyh/articles/3223072.html

https://blog.csdn.net/seu_xuxueqi/article/details/621904

The difference between 'r' and 'rb' when reading and writing files (学步园)

 

 
