[Python] Chapter 11: Files

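Table of Contents

11.1 Opening Files: open()
11.2 Basic File Methods

11.2.1 Reading and Writing (read/write)
11.2.2 Redirecting Output with Pipes
11.2.3 Reading and Writing Lines
11.2.4 Closing Files
11.2.5 Using the Basic File Methods

11.3 Iterating over File Contents

11.3.1 One Character (or Byte) at a Time
11.3.2 One Line at a Time
11.3.4 Lazy Line Iteration with fileinput
11.3.5 File Iterators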

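So far, our programs have interacted with the outside world very little, and only through input and print. This chapter lets programs interact with a larger world: files and streams. The functions and objects introduced here let you store data permanently and process data that comes from other programs.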

11.1 Opening Files: open()

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
#Open file and return a stream.  Raise IOError upon failure.

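Files are opened with the function open. It lives in the io module and is available automatically as a built-in, so no import is needed.
file:
Open a file in the current directory: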

>>> f = open('somefile.txt')

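If the file is located elsewhere, specify the full path. (Note the forward slashes / in the path. They work on Windows too. Backslashes \ would have to be doubled, or written in a raw string such as r'G:\Anaconda3\text.txt'.)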

f=open('G:/Anaconda3/text.txt')
type(f)
_io.TextIOWrapper

mode:

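Value	Description
r	Read mode (default)
w	Write mode: lets you write to the file, and creates it if it doesn't exist. Existing contents are deleted (truncated).
x	Exclusive write mode: goes one step further and raises FileExistsError if the file already exists.
a	Append mode: write mode w truncates the existing contents and starts writing at the beginning of the file. Append mode instead continues writing at the end of the existing file.
b	Binary mode (combined with other modes): if the file contains non-text binary data, such as sound clips or images, you don't want any automatic conversion to be performed. To disable the text-related functionality, simply use binary mode (e.g., 'rb').
t	Text mode (default, combined with other modes)
+	Read/write mode (combined with other modes): can be combined with any other mode to indicate that both reading and writing are allowed. For example, to open a text file for reading and writing, use 'r+'.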
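A small sketch of the w, a, x and b modes in action, using only the standard library. The file is created in a temporary directory made by tempfile.mkdtemp(), and the name modes_demo.txt is invented for the example.

```python
import os
import tempfile

# a throwaway directory so the example never touches real files
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or truncates an existing one)
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')

# 'a' keeps the existing contents and writes at the end
with open(path, 'a', encoding='utf-8') as f:
    f.write('second\n')

# 'x' refuses to open a file that already exists
try:
    with open(path, 'x', encoding='utf-8') as f:
        f.write('never written')
    x_refused = False
except FileExistsError:
    x_refused = True

# 'rb' returns the raw bytes instead of decoded text
with open(path, 'rb') as f:
    raw = f.read()

print(x_refused)  # True
print(raw)        # b'first\nsecond\n' on Linux/macOS; Windows stores b'\r\n' line endings
```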

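The default mode is 'rt'. It treats the file as encoded Unicode text, and decoding and encoding happen automatically. The default encoding depends on the platform (whatever locale.getpreferredencoding(False) returns). It is usually UTF-8 on Linux and macOS, but on Windows it is often a legacy code page. Pass encoding='utf-8' explicitly when it matters.
encoding and errors:
To specify a different encoding and a Unicode error-handling strategy, use the keyword arguments encoding and errors.
Text mode also converts newline characters automatically. By default, lines end with '\n'. When reading, other line endings ('\r' or '\r\n') are automatically replaced with '\n'. When writing, '\n' is replaced with the system's default line separator (os.linesep).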
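newline:
Normally Python uses universal newlines mode. In this mode, methods such as readlines (discussed later) recognize all legal line endings ('\n', '\r', and '\r\n'). To keep this mode but disable the automatic conversion, set the keyword argument newline to the empty string, as in open(name, newline=''). To treat only '\r' or only '\r\n' as the valid line ending, set newline to that line ending. Line endings are then not translated when reading, but '\n' is replaced with the specified line ending when writing.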
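The newline settings can be compared directly. This sketch writes a file with newline='\r\n' and then reads the same bytes back three ways: in binary mode, with the default universal newlines, and with newline=''. The file name is made up, and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

# newline='\r\n': every '\n' written is translated to '\r\n'
with open(path, 'w', encoding='utf-8', newline='\r\n') as f:
    f.write('one\ntwo\n')

# binary mode shows the bytes actually stored on disk
with open(path, 'rb') as f:
    stored = f.read()

# default newline=None (universal newlines): '\r\n' comes back as '\n'
with open(path, encoding='utf-8') as f:
    translated = f.read()

# newline='': lines are still split, but endings are returned untranslated
with open(path, encoding='utf-8', newline='') as f:
    untranslated = f.readlines()

print(stored)             # b'one\r\ntwo\r\n'
print(repr(translated))   # 'one\ntwo\n'
print(untranslated)       # ['one\r\n', 'two\r\n']
```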

11.2 Basic File Methods

This section covers some basic methods of file objects and of other file-like objects (sometimes called streams).

11.2.1 Reading and Writing (read/write)

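# Open in write mode: w
>>>f=open('file.txt','w')#the file is created if it doesn't exist
>>>f.write('hello,')#returns the number of characters written
6
>>>f.write('world')#while the file is still open, this continues after the previous write instead of overwriting it
5
>>>f.close()#the text is now written; opening the file shows its contents
text: hello,world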

# Open in read mode: r (the default)
>>>f=open('file.txt')
>>>f.read(4)
'hell'

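# Open in append mode: a
# If you reopen the file after close() in 'w' mode, the previous contents are erased
# To add to the end instead, open it in append mode 'a' (the previous contents are kept)
>>>f=open('file.txt','a')
>>>f.write('\n你好')#returns 3: '\n' plus the two characters of 你好 ("hello")
3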
>>>f.close()
text:   hello,world
		你好

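# Open in read/write mode: r+ (the file is not truncated, and the position starts at the beginning)
>>>f=open('file.txt','r+')
>>>f.read(4)#running this again continues from where the last read stopped instead of starting over
'hell'
>>>pos=f.seek(0,2)#move to the end before writing; writing at the start would overwrite existing text
>>>f.write('\n世界')#世界 means "world"
3
>>>f.close()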
text: hello,world
      你好
      世界

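Notice: read() works only on a file opened for reading, and write() only on a file opened for writing. Otherwise they raise io.UnsupportedOperation.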
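A sketch of what happens when the mode does not allow an operation. The readable() and writable() methods report what a file supports, and the unsupported call raises io.UnsupportedOperation. The file name is made up and stored in a temporary directory.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'ops_demo.txt')

# a file opened with 'w' cannot be read
with open(path, 'w') as f:
    print(f.readable(), f.writable())   # False True
    try:
        f.read()
    except io.UnsupportedOperation as e:
        read_error = str(e)

# a file opened with 'r' cannot be written
with open(path) as f:
    try:
        f.write('text')
    except io.UnsupportedOperation as e:
        write_error = str(e)

print(read_error)    # not readable
print(write_error)   # not writable
```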

11.2.2 Redirecting Output with Pipes

$ cat somefile.txt | python somescript.py | sort 
(On Windows: type somefile.txt | python somescript.py | sort)

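The pipe character (|) connects the standard output of one command to the standard input of the next.
This pipeline contains three commands:

cat somefile.txt: writes the contents of somefile.txt to standard output (sys.stdout).
python somescript.py: runs the Python script somescript. The script reads from its standard input and writes its result to standard output.
sort: reads all the text from standard input (sys.stdin), sorts the lines alphabetically, and writes the result to standard output.

somescript.py reads data from its sys.stdin (data written there by cat somefile.txt) and writes its result to its sys.stdout, where sort picks it up.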

# somescript.py 
import sys
text=sys.stdin.read()
wordcount=len(text.split())
print(wordcount,':',text)
#type(sys.stdin)
#_io.TextIOWrapper

# somefile.txt
we are family!

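# Run (Windows):
> type somefile.txt | python somescript.py
# Output:
3 : we are family!

Notice: the command python somescript.py somefile.txt reads from console input instead, because it sets up no pipe between programs. somefile.txt only ends up in sys.argv, and the script never opens it.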
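To try the pipeline without a shell, the sketch below feeds text to the standard input of the same somescript.py code. It uses subprocess.run (Python 3.7 or later for capture_output) and runs the script with python -c, so neither somescript.py nor somefile.txt has to exist. This is only an illustration and not part of the original workflow.

```python
import subprocess
import sys

# the body of somescript.py from above, passed with -c so no script file is needed
script = (
    "import sys\n"
    "text = sys.stdin.read()\n"
    "wordcount = len(text.split())\n"
    "print(wordcount, ':', text)\n"
)

# input= plays the role of `cat somefile.txt |`: it becomes the child's stdin
result = subprocess.run(
    [sys.executable, '-c', script],
    input='we are family!\n',
    capture_output=True,
    text=True,
)
print(result.stdout)   # 3 : we are family!
```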


EXTEND: random access with seek() and tell()
seek moves to a given position:

>>>em=open('emp.txt','w')
>>>em.write('0123456789')
10
>>>em.seek(5)#move to the position where writing should start
5
>>>em.write('****')#overwrites from that position instead of appending at the end
4
>>>em.close()
>>>em=open('emp.txt')
>>>em.read()
'01234****9'

tell returns the current position:

>>>em=open('emp.txt')
>>>em.read(3)
'012'
>>>em.read(2)
'34'
>>>em.tell()
5
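In text mode, seek only accepts offsets previously returned by tell(), or a zero offset relative to the current position or the end. Binary mode allows arbitrary byte offsets together with the whence argument (os.SEEK_SET, os.SEEK_CUR, os.SEEK_END). A sketch, with a made-up file name in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(3)                  # absolute: 3 bytes from the start (os.SEEK_SET)
    two = f.read(2)            # b'34', position is now 5
    f.seek(2, os.SEEK_CUR)     # relative: skip 2 bytes forward, to position 7
    one = f.read(1)            # b'7'
    f.seek(-2, os.SEEK_END)    # relative to the end of the file
    tail = f.read()            # b'89'
    pos = f.tell()             # 10, the end of the file
```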

11.2.3 Reading and Writing Lines

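To read a line (the text from the current position up to and including the next newline), use the method readline.
To read all the lines and return them as a list, use the method readlines.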

f=open('cookie.txt')
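#with no argument, readline reads one full line
f.readline()
'# Netscape HTTP Cookie File\n'
#pass a non-negative integer to read at most that many characters (continuing from the previous read)
f.readline(5)
'# htt'
#readlines reads all the remaining lines and returns them as a list (again continuing from the previous read)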
f.readlines()
['p://curl.haxx.se/rfc/cookie_spec.html\n',
 '# This is a generated file!  Do not edit.\n',
 '\n', '.baidu.com\tTRUE\t/\tFALSE\t3682520781\tBAIDUID\tE539BADFFA31FDF49E09921ACDA3EDFE:FG=1\n',
'.baidu.com\tTRUE\t/\tFALSE\t3682520781\tBIDUPSID\tE539BADFFA31FDF49E09921ACDA3EDFE\n',
 '.baidu.com\tTRUE\t/\tFALSE\t\tH_PS_PSSID\t1448_21117_26350_26922_20927\n',
 '.baidu.com\tTRUE\t/\tFALSE\t3682520781\tPSTM\t1535037133\n',
 'www.baidu.com\tFALSE\t/\tFALSE\t\tBDSVRTM\t0\n',
 'www.baidu.com\tFALSE\t/\tFALSE\t\tBD_HOME\t0\n',
 'www.baidu.com\tFALSE\t/\tFALSE\t2481117075\tdelPer\t0\n']

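read(): reads the entire file (or whatever remains of it) and returns it as a string
readline(): reads a single line and returns it as a string
readlines(): reads all the (remaining) lines and returns them as a list, one line per element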
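The three reading methods can be tried without cookie.txt by using io.StringIO. It is an in-memory file-like object that supports the same methods. The sample text is made up.

```python
import io

# io.StringIO behaves like a text file held in memory
f = io.StringIO('line one\nline two\nline three\n')

first = f.readline()         # 'line one\n'
partial = f.readline(4)      # 'line' (at most 4 characters)
rest_of_line = f.readline()  # ' two\n' (continues where the last call stopped)
remaining = f.readlines()    # ['line three\n']
at_end = f.read()            # '' (nothing left to read)
```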

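The method writelines is the opposite of readlines. It takes a list of strings (in fact, any sequence or iterable will do) and writes all of them to the file (or stream).
No newlines are added, so you have to add them yourself.

ff=open('file.txt','a')
ff.writelines(['hello\n', 'world'])#a list of strings; the '\n' must be included by hand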
ff.close()
text: hello
      world

There is no writeline method, since you can simply use write.
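A sketch of writelines with an iterable other than a list. The generator expression adds the newline to each item, since writelines adds none itself. The file name is made up and lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'writelines_demo.txt')
words = ['hello', 'world']

with open(path, 'w', encoding='utf-8') as f:
    # writelines adds no separators, so each item gets its '\n' here;
    # any iterable of strings works, including this generator expression
    f.writelines(word + '\n' for word in words)

with open(path, encoding='utf-8') as f:
    lines = f.readlines()

print(lines)  # ['hello\n', 'world\n']
```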

11.2.4 Closing Files

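To make sure a file gets closed, you can use a try/finally statement and call close in the finally clause.

# Open your file here
file = open('somefile.txt', 'w')
try:
  # Write your data to the file
  file.write('some data')
finally:
  file.close()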

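In fact, there is a statement designed specifically for this purpose: the with statement.

with open("somefile.txt") as somefile:
	do_something(somefile)  # do_something is a placeholder for your own code

The with statement lets you use context managers. A file object is one: the file is closed automatically when the block is exited, even if an exception is raised inside it.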
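A sketch showing that the with statement closes the file even when the block fails halfway. The ValueError is raised on purpose to simulate an error, and the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('partial data')
        raise ValueError('simulated failure')  # an error in the middle of the block
except ValueError:
    pass

print(f.closed)   # True: leaving the with block closed the file anyway

# closing also flushed the buffer, so the text written before the error is on disk
with open(path) as g:
    content = g.read()
print(content)    # partial data
```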
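CASE: avoiding long path input
If copying the path is inconvenient, viewing a file means typing a long path every time:
$ less D:\ProgramData\Anaconda3\Lib\sqlite3\test\afile
Instead, you can write a small script that reads the file for you: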

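#refi.py
import sys

def main(file_name):
	# map a short name to its full path; the r prefix stops \t and \a from being read as escape sequences
	if file_name=='afile':
		dictionary=r'D:\ProgramData\Anaconda3\Lib\sqlite3\test\afile'
	else:
		sys.exit('unknown file name: '+file_name)

	with open(dictionary) as f:
		print(f.read())

if __name__=='__main__':
	main(sys.argv[1])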

Then each call is simply:
$ python refi.py afile|less

11.2.5 Using the Basic File Methods

read() readline() readlines() write() writelines()

#write() 
>>>f=open('emp.txt','w')
>>>f.write('hello,\nworld')
12
>>>f.close()
text: hello,
      world

#read() 
>>>f=open('emp.txt')
>>>print(f.read())
hello,
world
>>>f.close()

#readline()
>>>f=open('emp.txt')
>>>for i in range(2):
...    print(i,f.readline())
...
0 hello,

1 world
>>>f.close()

#readlines()
>>>f=open('emp.txt')
>>>print(f.readlines())
['hello,\n', 'world']
>>>f.close()

#writelines()
>>>f=open('emp.txt','r+')
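>>>lines=f.readlines()#the file is not closed and reopened here, so readlines() leaves the position at the end and the write below lands after the existing text; closing and reopening the file first would start writing from the beginning and overwrite it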
>>>lines[0]='\nWelcome,\n'
>>>f.writelines(lines)
>>>f.close()
text:
	hello,
	world
	Welcome,
	world

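Notice: suppose you open a file with 'r+', read its contents into a variable, write the variable back, and only then close the file. The new text ends up after the existing text. The mode itself does not turn into append mode: writing simply continues from the current position, and reading has already moved that position to the end of the file.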

11.3 Iterating over File Contents

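So far, each function call has produced one result. Reading one line, for example, takes one call to readline(). To process a file piece by piece, you need iteration. The examples below use a fictitious function named process to represent whatever is done with each character or line: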

def process(string): 
	print('Processing:', string)

11.3.1 One Character (or Byte) at a Time

def process(string):
    print('processing:',string)

with open('emp.txt') as f:
    char=f.read(1)
    while char:
        process(char)
        char=f.read(1)

A more concise version:

def process(string):
    print('processing:',string)

with open('emp.txt') as f:    
    while True:
        char=f.read(1)
        if not char:break
        process(char)
processing: h
processing: e
processing: l
processing: l
processing: o

A newline in the file is also processed as a character. It just shows up as an empty-looking line in the output.
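Another way to read one character at a time is the two-argument form of iter. iter(callable, sentinel) keeps calling the callable until it returns the sentinel, which for read(1) is the empty string at end of file. This sketch uses io.StringIO in place of emp.txt, with made-up sample text. Printing repr makes the newline character visible instead of a blank line.

```python
import io

def process(string):
    print('processing:', string)

f = io.StringIO('ab\nc')   # stands in for open('emp.txt')

chars = []
# iter(callable, sentinel): call f.read(1) repeatedly until it returns ''
for char in iter(lambda: f.read(1), ''):
    chars.append(char)
    process(repr(char))    # repr shows the newline as '\n' instead of a blank line
```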

11.3.2 One Line at a Time

def process(string):
    print('processing:',string)

with open('emp.txt') as f:    
    while True:
        line=f.readline()
        if not line:break
        process(line)
processing: hello,

processing: world

processing: Welcome,

processing: world
#the blank lines in between come from the newline at the end of each line, in addition to the one print adds

11.3.4 Lazy Line Iteration with fileinput

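To read line by line without readlines(), which can use too much memory, you could combine while with readline(). A for loop is usually preferable. Here the loop is driven by fileinput.input(filename), which reads lines only as they are needed: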

def process(string):
    print('processing:',string)
import fileinput
for line in fileinput.input('file.txt'):
	process(line) 
fileinput.close()

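The output has the same form as in the previous section.
Notice: without fileinput.close(), a later call to fileinput.input() may raise RuntimeError: input() already active.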
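fileinput can also chain several files and report where each line came from. This sketch uses two made-up files in a temporary directory. The object returned by fileinput.input() is used as a context manager (supported since Python 3.2), so it is closed automatically and no explicit fileinput.close() is needed.

```python
import fileinput
import os
import tempfile

tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [('a.txt', 'alpha\nbeta\n'), ('b.txt', 'gamma\n')]:
    path = os.path.join(tmpdir, name)
    with open(path, 'w') as f:
        f.write(text)
    paths.append(path)

records = []
# the with block closes the FileInput object when it ends
with fileinput.input(files=paths) as lines:
    for line in lines:
        # filename() and filelineno() describe where the current line came from
        records.append((os.path.basename(lines.filename()), lines.filelineno(), line.rstrip('\n')))

for record in records:
    print(record)
# ('a.txt', 1, 'alpha')
# ('a.txt', 2, 'beta')
# ('b.txt', 1, 'gamma')
```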

11.3.5 File Iterators

In fact, files themselves are iterable:

def process(string):
    print('processing:',string)

with open('file.txt') as f:    
    for line in f:#the stream itself is iterable
        process(line)
processing: you are my

processing:  frienshello

processing:  world

processing: hello

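If you are not writing to the file, you can skip closing it and do without the with statement. (CPython closes the file once the file object is garbage-collected.)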

def process(string):
    print('processing:',string)
for line in open('file.txt'):
        process(line)

sys.stdin is iterable as well:

def process(string):
    print('processing:',string)
    
import sys 
for line in sys.stdin: 
	process(line)

Almost anything you can do with an iterator, you can also do with a file:

>>>list(fileinput.input('emp.txt'))
['hello,\n', 'world\n', 'Welcome,\n', 'world']
>>>list(open('emp.txt'))#a plain file object gives the same list
['hello,\n', 'world\n', 'Welcome,\n', 'world']
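A file is an iterator over its lines, so standard-library tools that consume iterators work on it directly. This sketch uses a temporary file whose contents are similar to emp.txt above.

```python
import itertools
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'iter_demo.txt')
with open(path, 'w') as f:
    f.write('hello,\nworld\nWelcome,\nworld\n')

with open(path) as f:
    # enumerate numbers lines lazily; start=1 gives human-style line numbers
    numbered = [(n, line.rstrip('\n')) for n, line in enumerate(f, start=1)]

with open(path) as f:
    # islice takes just the first two lines without building a list of the whole file
    first_two = list(itertools.islice(f, 2))

with open(path) as f:
    # a generator expression inside sum() counts matching lines one at a time
    world_count = sum(1 for line in f if line.strip() == 'world')
```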

CASE: sequence unpacking with a file

>>>f=open('emp.txt','w')
>>>print('first',file=f)#nothing appears on screen; the text is written to the file
>>>print('second',file=f)
>>>print('finally',file=f)
>>>f.close()
>>>lines=list(open('emp.txt'))
>>>lines
['first\n', 'second\n', 'finally\n']
>>>first,second,third=open('emp.txt')#序列解包
>>>first
'first\n'
text:
	first
	second
	finally

Notice:
print() actually has a file parameter:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.

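Sequence unpacking stores the lines in variables: first,second,third=open('emp.txt'). This only works because the file has exactly three lines.
Close the file after writing to it. Otherwise the text may still be sitting in a buffer when you read the file back.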
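print sends its output to any stream passed through the file parameter, so the sep and end options work the same when writing to a file. This sketch uses io.StringIO as the target, so the result can be inspected without creating a file.

```python
import io

buffer = io.StringIO()   # an in-memory stream standing in for a file

# sep and end behave the same whether print targets the console or a file
print('first', 'second', sep=', ', end=';\n', file=buffer)
print('finally', file=buffer)

contents = buffer.getvalue()
print(repr(contents))   # 'first, second;\nfinally\n'
```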

CASE: processing documents

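import sys,os

def convertline(line):
	# remove newlines and tabs, then collapse runs of spaces into single spaces
	new_line=line.replace('\n','').replace('\t','')
	linelist=new_line.split(' ')
	new_linelist=[]
	for word in linelist:
		if len(word)!=0:
			new_linelist.append(word)
	# the trailing space separates this line from the next once they are joined
	return ' '.join(new_linelist)+' '

def main(deal):
	#setup files
	# deal is expected to look like TICKER_SERIES (exactly one underscore)
	ticker,series=deal.split('_')
	scriptpath=os.getcwd()
	path_to_text=os.path.join(scriptpath,ticker,series)

	int_file=os.path.join(path_to_text,'int.txt')
	prin_file=os.path.join(path_to_text,'prin.txt')
	eod_file=os.path.join(path_to_text,'eod.txt')

	files=[int_file,prin_file,eod_file]
	#write the files
	for file in files:
		sl=[]
		# read everything first; the file is closed before it is reopened for writing
		with open(file) as r:
			for line in r:
				line=line.strip()
				if len(line)!=0:
					sl.append(convertline(line))
		# 'w' truncates the file, then all converted lines are written back
		with open(file,'w') as w:
			w.writelines(sl)

if __name__=='__main__':
	main(sys.argv[1])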
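The rewrite logic can be checked in isolation. The sketch below copies convertline as written above, restated with a list comprehension, and compares it with a hypothetical shorter variant based on str.split(). The sample lines are invented, since the post does not show what int.txt, prin.txt or eod.txt contain.

Two consequences of the code are worth noticing. First, tabs are deleted rather than replaced with a space, so words separated only by a tab get glued together. Second, no converted line ends in '\n', so writelines puts the whole file on one line. The two variants agree on spaces, tabs and newlines. They are not identical for every input, though, because str.split() also splits on other whitespace such as '\r'.

```python
def convertline(line):
    # same logic as the CASE above: drop newlines and tabs, collapse runs of spaces
    new_line = line.replace('\n', '').replace('\t', '')
    words = [word for word in new_line.split(' ') if len(word) != 0]
    return ' '.join(words) + ' '

def convertline_compact(line):
    # hypothetical shorter variant: str.split() with no argument drops empty strings itself
    return ' '.join(line.replace('\t', '').split()) + ' '

samples = ['  foo   bar\t baz \n', 'Interest\tpaid  monthly', 'single']
results = [convertline(s) for s in samples]
same = results == [convertline_compact(s) for s in samples]

print(results)           # ['foo bar baz ', 'Interestpaid monthly ', 'single ']
print(same)              # True
# writelines adds no newlines, so the rewritten file is a single line
print(''.join(results))
```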