Python file read() and readline() counter?

Kapish M :

It looks like Python keeps track of each call to read() and readline(). The position advances with every call, and once the end of the file is reached, the calls only return an empty string. How can I find this counter, and read a specific line at any time?

EDIT: My goal is to read a large file of a few GB in size, with hundreds of thousands of lines. If this is an iterator, then it is insufficient, since I do not want to load the whole file into memory. How do I jump to a specific line without having to read unnecessary lines?

A text file with just 3 lines.

# cat sample.txt
This is a sample text file. This is line 1
This is line 2
This is line 3

# python
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('sample.txt', 'r')
>>> file.readline()
'This is a sample text file. This is line 1\n'
>>> file.readline()
'This is line 2\n'
>>> file.readline()
'This is line 3\n'
>>> file.readline()
''
>>> file.readline()
''
>>> file.read()
''
>>> file.read(0)
''
>>> file.read()
''
>>>

# python
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('sample.txt', 'r')
>>> file.read()
'This is a sample text file. This is line 1\nThis is line 2\nThis is line 3\n'
>>> file.read()
''
>>> file.readline()
''
>>>

tobias_k :

A file object in Python is an iterator over the lines of the file. You can use readlines() to read all the (remaining) lines at once into a list, or read() to read characters (by default all remaining ones; pass a parameter to limit the number of characters). If you iterate the file directly, the behaviour is the same as with readline(), i.e. each step yields the next line from the file. Once the end of the file is reached, all of these return an empty result, which is what you see in your session.

You can combine that with enumerate to get another iterator yielding the line number along with each line (the first line having number 0 unless you specify enumerate's start parameter), or to get a specific line:

>>> f = open("test.txt")
>>> lines = enumerate(f)
>>> next(lines)
(0, 'first line\n')
>>> next(lines)
(1, 'second line\n')
>>> next(lines)
(2, 'third line\n')

>>> f = open("test.txt")
>>> lines = enumerate(f)
>>> next(l for i, l in lines if i == 3)
'fourth line\n'
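
The same lookup can also be written with itertools.islice, which skips the leading lines lazily so that only one line is held in memory at a time. This is only a sketch: read_line_at is a hypothetical helper name, and the temporary file just stands in for test.txt.

```python
import itertools
import os
import tempfile


def read_line_at(path, index):
    """Return the line at 0-based `index`, or None if the file has fewer lines."""
    with open(path) as f:
        # islice consumes and discards the first `index` lines one by one,
        # then yields exactly one line; nothing else is kept in memory.
        return next(itertools.islice(f, index, index + 1), None)


# Throwaway demo file (stand-in for test.txt)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\nfourth line\n")
    demo_path = tmp.name

fourth = read_line_at(demo_path, 3)
missing = read_line_at(demo_path, 10)
os.remove(demo_path)

print(repr(fourth))
print(missing)
```

Note that this still has to read through all the lines before the requested one; it only avoids keeping them around.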

There's also the seek method, which can be used to jump to a specific position in the file. This is useful for "resetting" the file to the first position (as an alternative to re-opening it), but it does not help much in finding a specific line unless you know the exact length of each line. (see below)
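
As for the "counter" from the question: the current position is available through tell(). A minimal sketch, assuming a small throwaway file opened in binary mode so that the position is a plain byte count:

```python
import os
import tempfile

# Throwaway demo file; written in binary mode so the byte counts are exact
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"This is line 1\nThis is line 2\nThis is line 3\n")
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    start = f.tell()          # position before reading anything
    first = f.readline()
    after_first = f.tell()    # position just past the first line
    f.read()                  # consume the rest of the file
    at_end = f.read()         # nothing left to read
    f.seek(0)                 # "reset" to the beginning
    again = f.readline()

os.remove(demo_path)

print(start, after_first)
print(at_end)
print(again == first)
```

In text mode tell() also works after read() or readline(), but the value it returns should be treated as an opaque number that is only meaningful when passed back to seek().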

If you want to "read any line in any order", the simplest way is to read all the lines into a list using readlines and then access items in that list (provided that your file is not too large).

>>> f = open("test.txt")
>>> lines = f.readlines()
>>> lines[3]
'fourth line\n'
>>> lines[1]
'second line\n'
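
The standard library's linecache module wraps the same idea: linecache.getline(filename, lineno) reads the file once, caches its lines in memory, and returns the requested one. A short sketch with a throwaway file standing in for test.txt; note that line numbers start at 1 here, and that the whole file still ends up in memory.

```python
import linecache
import os
import tempfile

# Throwaway demo file (stand-in for test.txt)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\nfourth line\n")
    demo_path = tmp.name

fourth = linecache.getline(demo_path, 4)   # 1-based line numbers
second = linecache.getline(demo_path, 2)
beyond = linecache.getline(demo_path, 99)  # out of range: empty string, no exception

linecache.clearcache()                     # drop the cached lines again
os.remove(demo_path)

print(repr(fourth))
print(repr(second))
print(repr(beyond))
```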

"My goal is to read a large file of a few Gb in size, hundreds of thousands of lines."

Since the only way for Python to know where a line ends, and thus where the next line starts, is to count the \n characters it encounters, there's no way around reading the file at least once, up to the line you want. If the file is very large, and you have to repeatedly read lines out of order, it might make sense to read the file once, one line at a time, storing the starting position of each line in a dictionary. Afterwards, you can use seek to quickly jump to and then read a particular line.

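# Open in binary mode so that len(line) is a byte count that matches
# the positions seek() expects, even with non-ASCII text.
f = open("test.txt", "rb")
offset = 0
lines = {}
for i, line in enumerate(f):
    lines[i] = offset      # byte offset at which line i starts
    offset += len(line)
# jump to and read individual lines
f.seek(lines[3])
print(f.readline().decode())   # decode() assumes the file is UTF-8
f.seek(lines[0])
print(f.readline().decode())
f.close()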
