fileinput: a built-in module better suited to reading files than open

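1. Read from standard input

When your script is run without any file names on the command line, fileinput uses stdin as the input source by default. (With no files argument, fileinput.input() reads the files named in sys.argv[1:] and falls back to stdin only when that list is empty.)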

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput

for line in fileinput.input():
    print(line)

The effect is as follows: whatever you type, the program reads it and prints it back, like a repeater. Because print(line) adds its own newline after the one already in line, each echoed line is followed by a blank line.

$ python demo.py 
hello
hello

python
python
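
The fallback to standard input can be tried without typing anything. The following is a minimal sketch, not part of the original demo: the helper name numbered_lines is made up for illustration, and sys.stdin is temporarily swapped for an in-memory buffer purely so the example runs unattended.

```python
import fileinput
import io
import sys


def numbered_lines(files=None):
    """Return each input line prefixed with its cumulative line number.

    With files=None, fileinput reads the files named in sys.argv[1:],
    or standard input when no file names were given.
    """
    with fileinput.input(files=files) as stream:
        return [f"{fileinput.lineno()}: {line.rstrip()}" for line in stream]


# Simulate two typed lines by temporarily swapping sys.stdin (demo only).
original_stdin = sys.stdin
sys.stdin = io.StringIO("hello\npython\n")
try:
    # "-" is fileinput's explicit name for standard input.
    result = numbered_lines(files=("-",))
finally:
    sys.stdin = original_stdin

print(result)  # ['1: hello', '2: python']
```

In a real script you would simply call numbered_lines() and let fileinput choose between the command-line file names and stdin.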

2. Open a single file

To open a single file, pass its name in the files argument

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput

with fileinput.input(files=('a.txt',)) as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')

where a.txt contains

hello
world

After execution, the output is as follows

$ python demo.py
a.txt line 1: hello
a.txt line 2: world

Note that fileinput.input() reads files with mode='r' by default. If your files are binary, use mode='rb'. In current Python versions, these are the only two reading modes fileinput accepts.
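
To see what mode='rb' changes, here is a small sketch under the assumption that a throwaway file in a temporary directory is acceptable for the demo (the file name a.bin is invented). In binary mode each line comes back as a bytes object and line endings are left untouched.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file, created only for this demonstration.
    path = os.path.join(tmp, "a.bin")
    with open(path, "wb") as f:
        f.write(b"hello\r\nworld\n")

    # mode="rb" makes fileinput yield bytes instead of str.
    with fileinput.input(files=(path,), mode="rb") as stream:
        binary_lines = list(stream)

print(binary_lines)  # [b'hello\r\n', b'world\n']
```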

3. Open multiple files in a batch

As the example above shows, fileinput.input takes a files argument. It accepts a list or tuple of file names: pass one name to read a single file, or several names to read multiple files one after another.

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput

with fileinput.input(files=('a.txt', 'b.txt')) as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')

The contents of a.txt and b.txt are

$ cat a.txt
hello
world
$ cat b.txt
hello
python

The output after running is shown below. Because the contents of a.txt and b.txt are merged into a single file object, file, fileinput.lineno() matches the real line number in the original file only when a single file is read. Across several files it keeps counting cumulatively.

$ python demo.py
a.txt line 1: hello
a.txt line 2: world
b.txt line 3: hello
b.txt line 4: python

If you want the real line number within each original file while reading multiple files, use the fileinput.filelineno() method

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput

with fileinput.input(files=('a.txt', 'b.txt')) as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.filelineno()}: {line}', end='')

After running, the output is as follows

$ python demo.py
a.txt line 1: hello
a.txt line 2: world
b.txt line 1: hello
b.txt line 2: python
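
The difference between the two counters is easiest to see side by side. The sketch below is illustrative only: the helper line_numbers is a made-up name, and the two sample files are recreated in a temporary directory so the example is self-contained.

```python
import fileinput
import os
import tempfile


def line_numbers(paths):
    """Return (file name, cumulative number, per-file number, text) per line."""
    rows = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            rows.append((
                os.path.basename(fileinput.filename()),
                fileinput.lineno(),       # keeps counting across files
                fileinput.filelineno(),   # restarts at 1 for each file
                line.rstrip("\n"),
            ))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.txt")
    b_path = os.path.join(tmp, "b.txt")
    with open(a_path, "w") as f:
        f.write("hello\nworld\n")
    with open(b_path, "w") as f:
        f.write("hello\npython\n")
    rows = line_numbers([a_path, b_path])

for row in rows:
    print(row)
```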

This usage pairs perfectly with the glob module

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput
import glob

for line in fileinput.input(glob.glob("*.txt")):
    if fileinput.isfirstline():
        print('-'*20, f'Reading {fileinput.filename()}...', '-'*20)
    print(str(fileinput.lineno()) + ': ' + line.upper(), end="")

The result is as follows (glob.glob() returns files in no guaranteed order, so the order may differ on your machine)

$ python demo.py
-------------------- Reading b.txt... --------------------
1: HELLO
2: PYTHON
-------------------- Reading a.txt... --------------------
3: HELLO
4: WORLD
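
If a predictable file order matters, one option is to sort the glob result first. The sketch below is an illustration rather than part of the original demo; read_sorted is a hypothetical helper. It also guards against an empty match list, because an empty files sequence makes fileinput fall back to standard input.

```python
import fileinput
import glob
import os
import tempfile


def read_sorted(pattern):
    """Read all files matching pattern in sorted order; return (name, line) pairs."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        # Avoid the silent fallback to stdin when nothing matches.
        return []
    pairs = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            pairs.append((os.path.basename(fileinput.filename()), line.rstrip("\n")))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    # Create b.txt before a.txt to show that sorting decides the order.
    with open(os.path.join(tmp, "b.txt"), "w") as f:
        f.write("hello\npython\n")
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello\nworld\n")
    pairs = read_sorted(os.path.join(tmp, "*.txt"))

print(pairs)
```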

4. Back up files while reading

fileinput.input has a backup parameter that sets the suffix of the backup file, such as .bak. It only takes effect together with inplace=True (described in the next section); without inplace, no backup file is created.

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput


with fileinput.input(files=("a.txt",), inplace=True, backup=".bak") as file:
    for line in file:
        # With inplace=True, print() writes into a.txt instead of the terminal,
        # so echoing each line leaves the file content unchanged
        print(line, end='')

After running, nothing is printed to the terminal, and there is an extra a.txt.bak file holding the original content

$ ls a.txt*
a.txt

$ python demo.py

$ ls a.txt*
a.txt  a.txt.bak
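
As far as the standard library behaves, backup is only consulted when inplace=True, so the two are combined in the sketch below. It is a self-contained illustration using a temporary directory (the upper-casing is an arbitrary edit chosen for the demo) and checks that the .bak file keeps the original text while a.txt receives the new text.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w") as f:
        f.write("hello\nworld\n")

    # inplace=True redirects print() into a.txt; backup=".bak" keeps the original.
    with fileinput.input(files=(path,), inplace=True, backup=".bak") as stream:
        for line in stream:
            print(line.upper(), end="")

    with open(path) as f:
        rewritten = f.read()
    with open(path + ".bak") as f:
        backed_up = f.read()

print(repr(rewritten))
print(repr(backed_up))
```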

5. Redirect standard output back into the file

fileinput.input also has an inplace parameter, which controls whether standard output is written back to the file being read. The default is False, so the file is not replaced.

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import fileinput

with fileinput.input(files=("a.txt",), inplace=True) as file:
    print("[INFO] task is started...") 
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')
    print("[INFO] task is closed...")

After running, you will find that the content printed inside the for loop is written back to the original file, while the print calls outside the for loop still go to the terminal.

$ cat a.txt
hello
world

$ python demo.py
[INFO] task is started...
[INFO] task is closed...

$ cat a.txt
a.txt line 1: hello
a.txt line 2: world

Using this mechanism, text replacement can be easily achieved.

#!/usr/bin/env python
#-*- coding:utf-8 -*-
#name: demo.py

import sys
import fileinput

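for line in fileinput.input(files=('a.txt', ), inplace=True):
    # Convert a Windows/DOS text file to Linux format.
    # In Python 3 text mode, "\r\n" is normally translated to "\n" when the
    # line is read, so this check only fires if a "\r\n" is still present.
    if line[-2:] == "\r\n":
        line = line[:-2] + "\n"
    sys.stdout.write(line)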
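
The same mechanism gives a simple search-and-replace. The following sketch is an illustration under stated assumptions: replace_in_file is a hypothetical helper name, and the sample file lives in a temporary directory. Without a backup argument, the temporary backup that fileinput creates is removed again when the file is closed.

```python
import fileinput
import os
import sys
import tempfile


def replace_in_file(path, old, new):
    """Replace every occurrence of old with new in the file at path, in place."""
    with fileinput.input(files=(path,), inplace=True) as stream:
        for line in stream:
            # While the loop runs, sys.stdout points at the file being rewritten.
            sys.stdout.write(line.replace(old, new))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w") as f:
        f.write("hello world\nhello python\n")

    replace_in_file(path, "hello", "goodbye")

    with open(path) as f:
        updated = f.read()
    backup_left = os.path.exists(path + ".bak")

print(repr(updated))
print(backup_left)
```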

Appendix: to switch a file between DOS and UNIX formats for testing, enter the following commands in vim

DOS to UNIX: :set fileformat=unix
UNIX to DOS: :set fileformat=dos

6. Common methods

If you only want to use fileinput as an alternative to open for reading files, the material above, together with the basic methods below, is enough.

fileinput.filename()
Returns the name of the file currently being read. Before the first line is read, None is returned.
fileinput.fileno()
Returns the integer "file descriptor" of the current file. When no file is open (before the first line is read, and between files), -1 is returned.
fileinput.lineno()
Returns the cumulative line number of the line that has just been read. Before the first line is read, 0 is returned. After the last line of the last file has been read, the line number of that line is returned.
fileinput.filelineno()
Returns the line number within the current file. Before the first line is read, 0 is returned. After the last line of the last file has been read, the line number of that line within its file is returned.
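
The "before the first line" and "after the last line" values described above can be checked with a short sketch. It uses a FileInput instance, which offers the same methods as the module-level functions; the sample file is a throwaway created for the demo.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w") as f:
        f.write("hello\nworld\n")

    with fileinput.FileInput(files=(path,)) as fi:
        # Nothing has been read yet: no name, no descriptor, counters at 0.
        before = (fi.filename(), fi.fileno(), fi.lineno(), fi.filelineno())
        for _ in fi:
            pass
        # After the last line: counters still report that last line.
        after = (os.path.basename(fi.filename()), fi.lineno(), fi.filelineno())

print(before)  # (None, -1, 0, 0)
print(after)   # ('a.txt', 2, 2)
```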

But if you want to build more complex logic on top of fileinput, you may need the following methods

fileinput.isfirstline()
Returns True if the line just read is the first line of its file, otherwise False.
fileinput.isstdin()
Returns True if the last line was read from sys.stdin, otherwise False.
fileinput.nextfile()
Closes the current file so that the next iteration reads the first line of the next file (if there is one); lines not read from the current file are not counted in the cumulative line count. The file name does not change until the first line of the next file has been read. Before the first line has been read, this function has no effect, so it cannot be used to skip the first file. After the last line of the last file has been read, it has no effect either.
fileinput.close()
Closes the sequence.
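
As one possible use of nextfile(), the sketch below collects only the first line of each file, a bit like head -n 1. The helper name first_lines is hypothetical, and the sample files are created in a temporary directory for the demo.

```python
import fileinput
import os
import tempfile


def first_lines(paths):
    """Return (file name, first line) for each file, skipping the rest of it."""
    heads = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            heads.append((os.path.basename(fileinput.filename()), line.rstrip("\n")))
            # Close the current file; the next iteration starts the next file.
            fileinput.nextfile()
    return heads


with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.txt")
    b_path = os.path.join(tmp, "b.txt")
    with open(a_path, "w") as f:
        f.write("hello\nworld\n")
    with open(b_path, "w") as f:
        f.write("python\nrocks\n")
    heads = first_lines([a_path, b_path])

print(heads)  # [('a.txt', 'hello'), ('b.txt', 'python')]
```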

7. Advanced methods

fileinput.input() has an openhook parameter that lets you pass in your own function for opening files.
If you do not pass in a hook, fileinput uses the open function by default.

fileinput ships with two built-in hooks
1. fileinput.hook_compressed(filename, mode)
Transparently opens files compressed with gzip or bzip2 (recognized by the extensions '.gz' and '.bz2') using the gzip and bz2 modules. If the file extension is neither '.gz' nor '.bz2', the file is opened normally (that is, with open() and no decompression). Example of use: fi = fileinput.FileInput(openhook=fileinput.hook_compressed)

2. fileinput.hook_encoded(encoding, errors=None)

Returns a hook that opens each file with open(), using the given encoding and errors to read the file. Example of use: fi = fileinput.FileInput(openhook=fileinput.hook_encoded("utf-8", "surrogateescape"))
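
Both built-in hooks can be exercised with a few throwaway files. This is a sketch under stated assumptions: the file names are invented, and the compressed example reads in mode="rb" and decodes by hand, a choice made here so that the result does not depend on how a given Python version handles text mode for compressed files.

```python
import fileinput
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "a.txt")
    gz_path = os.path.join(tmp, "b.txt.gz")
    latin_path = os.path.join(tmp, "c.txt")
    with open(plain_path, "wb") as f:
        f.write(b"hello\nworld\n")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"hello\npython\n")
    with open(latin_path, "wb") as f:
        f.write("caf\u00e9\n".encode("latin-1"))

    # hook_compressed picks gzip/bz2/open based on the file extension.
    with fileinput.input(files=(plain_path, gz_path), mode="rb",
                         openhook=fileinput.hook_compressed) as stream:
        compressed_lines = [raw.decode("utf-8").rstrip("\n") for raw in stream]

    # hook_encoded opens every file with the given text encoding.
    with fileinput.input(files=(latin_path,),
                         openhook=fileinput.hook_encoded("latin-1")) as stream:
        decoded_lines = [line.rstrip("\n") for line in stream]

print(compressed_lines)
print(decoded_lines)
```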

If your scenario is more unusual and neither of the two hooks above meets your needs, you can write your own.

3. Custom hooks
Suppose I want to use fileinput to read a file from the network. I can define a hook like this.

① First use requests to download the file to the local disk
② Then use open to read it

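import requests  # third-party package: pip install requests

def online_open(url, mode):
    # Download the file and fail early on HTTP errors
    r = requests.get(url)
    r.raise_for_status()
    # Save it locally under the last path component of the URL
    filename = url.split("/")[-1]
    with open(filename, 'w', encoding='utf-8') as f1:
        f1.write(r.content.decode("utf-8"))
    # fileinput expects an open file object and closes it itself
    # (the mode argument passed in by fileinput is ignored here)
    return open(filename, 'r', encoding='utf-8')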

③ Pass this function directly to openhook

import fileinput

file_url = 'https://www.csdn.net/robots.txt'
with fileinput.input(files=(file_url,), openhook=online_open) as file:
    for line in file:
        print(line, end="")

④ After running, the robots file of CSDN is printed as expected

User-agent: * 
Disallow: /scripts 
Disallow: /public 
Disallow: /css/ 
Disallow: /images/ 
Disallow: /content/ 
Disallow: /ui/ 
Disallow: /js/ 
Disallow: /scripts/ 
Disallow: /article_preview.html* 
Disallow: /tag/
Disallow: /*?*
Disallow: /link/

Sitemap: https://www.csdn.net/sitemap-aggpage-index.xml
Sitemap: https://www.csdn.net/article/sitemap.txt
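
A hook does not have to touch the network or the disk at all: it only needs to accept (filename, mode) and return an open file-like object. The sketch below illustrates that contract with in-memory text; the "mem://" names and the FAKE_FILES table are invented for the example and are not a real URL scheme.

```python
import fileinput
import io

# Hypothetical in-memory "files", keyed by made-up names.
FAKE_FILES = {
    "mem://a": "hello\nworld\n",
    "mem://b": "hello\npython\n",
}


def memory_open(name, mode):
    """openhook: receives (filename, mode) and returns a file-like object."""
    return io.StringIO(FAKE_FILES[name])


with fileinput.input(files=("mem://a", "mem://b"), openhook=memory_open) as stream:
    seen = [
        (fileinput.filename(), fileinput.filelineno(), line.rstrip("\n"))
        for line in stream
    ]

for item in seen:
    print(item)
```

The same shape works for any source you can wrap in a file-like object, which is why the network example above only has to return the result of open().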

8. Case

Case 1: Read all lines of a file

#!/usr/bin/env python
#-*- coding:utf-8 -*-

import fileinput
for line in fileinput.input('data.txt'):
  print(line, end="")

Case 2: Read all lines of multiple files

#!/usr/bin/env python
#-*- coding:utf-8 -*-

import fileinput
import glob

for line in fileinput.input(glob.glob("*.txt")):
    if fileinput.isfirstline():
        print('-'*20, f'Reading {fileinput.filename()}...', '-'*20)
    print(str(fileinput.lineno()) + ': ' + line.upper(), end="")

Case 3: Use fileinput to convert CRLF files to LF

#!/usr/bin/env python
#-*- coding:utf-8 -*-

import sys
import fileinput

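for line in fileinput.input(files=('a.txt', ), inplace=True):
    # Convert a Windows/DOS text file to Linux format.
    # In Python 3 text mode, "\r\n" is normally translated to "\n" when the
    # line is read, so this check only fires if a "\r\n" is still present.
    if line[-2:] == "\r\n":
        line = line[:-2] + "\n"
    sys.stdout.write(line)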

Case 4: Combine with re for log analysis: keep only the lines that contain a date

#-- Sample file --: error.log
aaa
1970-01-01 13:45:30  Error: **** Due to System Disk spacke not enough...
bbb
1970-01-02 10:20:30  Error: **** Due to System Out of Memory...
ccc

#--- Test script ---
#!/usr/bin/env python
#-*- coding:utf-8 -*-

import re
import fileinput
import sys

pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

for line in fileinput.input('error.log', backup='.bak', inplace=True):
    if re.search(pattern,line):
        sys.stdout.write("=> ")
        sys.stdout.write(line)

#--- Test result (contents of error.log after running) ---
=> 1970-01-01 13:45:30  Error: **** Due to System Disk spacke not enough...
=> 1970-01-02 10:20:30  Error: **** Due to System Out of Memory...

Case 5: Use fileinput to implement something similar to grep

#!/usr/bin/env python
#-*- coding:utf-8 -*-

import sys
import re
import fileinput

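pattern = re.compile(sys.argv[1])
# sys.argv[2:] holds every file name the shell expanded from *.py
for line in fileinput.input(sys.argv[2:]):
    # search() matches anywhere in the line, as grep does
    if pattern.search(line):
        print(fileinput.filename(), fileinput.filelineno(), line, end='')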

$ ./demo.py 'import.*re' *.py
# Find the lines containing "import re" in all .py files
addressBook.py  2   import re
addressBook1.py 10  import re
addressBook2.py 18  import re
test.py         238 import re
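
The grep idea can also be wrapped in a reusable function instead of a script. The following is a minimal sketch, not the original author's code: grep here is a hypothetical helper, and the two sample files are created in a temporary directory so the example runs on its own.

```python
import fileinput
import os
import re
import tempfile


def grep(pattern, paths):
    """Yield (file name, line number in that file, line text) for matching lines."""
    regex = re.compile(pattern)
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # search() looks for the pattern anywhere in the line.
            if regex.search(line):
                yield fileinput.filename(), fileinput.filelineno(), line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, "one.py")
    two = os.path.join(tmp, "two.py")
    with open(one, "w") as f:
        f.write("import os\nimport re\n")
    with open(two, "w") as f:
        f.write("x = 1\n")
    hits = [(os.path.basename(name), number, text)
            for name, number, text in grep(r"import.*re", [one, two])]

print(hits)  # [('one.py', 2, 'import re')]
```

Note that paths should not be empty, since fileinput would then read from standard input instead.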