1 Directory and File Operations

1.1 Traversing nested directories with os.walk()

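'''
Note: os.walk() recurses on its own and visits every level of the tree.
In each iteration, root is the path of the directory currently being visited,
files holds the names (not full paths) of the files directly inside it, and
dirs holds the names of its immediate subdirectories.
'''
import os
path = r'c:\code\py'

for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))

    for name in dirs:
        print(os.path.join(root, name))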
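A common follow-up is filtering the walk. The sketch below is an illustration, not part of the original snippet: `list_py_files` and `skip_dirs` are hypothetical names, and the temporary directory tree exists only to make the example runnable. It relies on the documented os.walk() behavior that, with the default topdown=True, editing `dirs` in place stops the walk from descending into the removed directories.

```python
import os
import tempfile

def list_py_files(top, skip_dirs=('.git', '__pycache__')):
    """Hypothetical helper: return paths of all .py files under top."""
    found = []
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the same list object os.walk() holds,
        # so the skipped directories are never visited (topdown=True).
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if name.endswith('.py'):
                found.append(os.path.join(root, name))
    return sorted(found)

# Build a small throwaway tree and walk it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'pkg', 'sub'))
    os.makedirs(os.path.join(tmp, '__pycache__'))
    for rel in ('a.py',
                os.path.join('pkg', 'b.py'),
                os.path.join('pkg', 'sub', 'c.py'),
                os.path.join('pkg', 'notes.txt'),
                os.path.join('__pycache__', 'x.py')):
        with open(os.path.join(tmp, rel), 'w', encoding='utf-8'):
            pass  # create an empty file
    result = [os.path.relpath(p, tmp) for p in list_py_files(tmp)]
    print(result)
```

Writing `dirs = [...]` instead of `dirs[:] = [...]` would only rebind the local name and would not prune anything.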

1.2 Replacing multi-line text with regular expressions: re.sub()

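A regex-based string replacement, more powerful than str.replace(old, new), which can only replace literal substrings.

1. A detailed explanation of the re.sub() parameters is available here; the official English documentation for the function is here.

2. A concise yet comprehensive introduction to regular expressions.

3. The Python documentation's description of regular expressions: Regular Expression Syntax.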

# Multi-line replacement with re.sub

import re

inputStr = '''
	<tag>
		<hello>
			<this is a string>
		</hello>
	</tag>
	'''

# Replace the hello element together with its children
# More robust approach
pattern = re.compile(r'<hello>.*</hello>', re.S)
# Less robust approach: pattern = r'<hello>.*\n.*\n.*</hello>'
newTxt = r'<hello class="456"></hello>'
rst = re.sub(pattern, newTxt, inputStr)
print(rst)

'''
Output:
	<tag>
		<hello class="456"></hello>
	</tag>
'''

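'''
Warning:
if <hello> ... </hello> appears more than once further on,
this pattern matches from the first <hello> all the way to the last </hello>,
because .* is greedy. Keep this in mind.
'''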
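One standard way around the greedy-match problem described in the warning is the non-greedy quantifier `.*?`, which the original snippet does not use. The sketch below compares both on a made-up input with two hello elements (the `<keep>` element is invented for the demonstration).

```python
import re

text = '''<hello>first</hello>
<keep>middle</keep>
<hello>second</hello>'''

new = '<hello class="456"></hello>'

# Greedy .* runs from the first <hello> to the LAST </hello>,
# so the <keep> element in between is swallowed as well.
greedy = re.sub(re.compile(r'<hello>.*</hello>', re.S), new, text)

# Non-greedy .*? stops at the nearest </hello>,
# so each hello element is replaced on its own.
lazy = re.sub(re.compile(r'<hello>.*?</hello>', re.S), new, text)

print(greedy)
print(lazy)
```

Non-greedy matching still does not understand nesting: a `<hello>` inside another `<hello>` would end the match at the inner `</hello>`. For real markup, an XML/HTML parser is usually the safer tool.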

A brief explanation of the regex:

By default, . does not match a newline. To make it match, use the re.S (re.DOTALL) flag:
pattern = re.compile(r'<hello>.*</hello>', re.S)


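This approach is more robust: it matches no matter how many lines the hello element contains.

By contrast, pattern = r'<hello>.*\n.*\n.*</hello>' is rigid. Without re.S, .* cannot cross a line break, so the pattern only matches when exactly two line breaks separate <hello> from </hello>. It fails as soon as the hello element contains more (or fewer) than one inner line.

In .*, the . matches any character on the current line (anything except a newline) and * means zero or more repetitions of the preceding item; \n matches a newline character.
So <hello>.*\n.*\n.*</hello> looks for text of this shape:
<hello> + the rest of that line + a line break + everything on the second line + the second line's line break + everything at the start of the third line up to </hello>.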
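To make the difference concrete, here is a small check, assuming made-up inputs with one and two inner lines: the rigid pattern only matches the single-line case, while the re.S pattern matches both.

```python
import re

one_inner_line = '''<hello>
    <this is a string>
</hello>'''

two_inner_lines = '''<hello>
    <line one>
    <line two>
</hello>'''

rigid = r'<hello>.*\n.*\n.*</hello>'                # needs exactly two line breaks
flexible = re.compile(r'<hello>.*</hello>', re.S)   # . also matches \n under re.S

print(re.search(rigid, one_inner_line) is not None)    # True
print(re.search(rigid, two_inner_lines) is not None)   # False: three line breaks
print(flexible.search(two_inner_lines) is not None)    # True
```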

import re

inputStr = '''
	<tag>
		<hello>
			<this is a string>
		</hello>
	</tag>
	'''

# Replace only the word string in <this is a string> with newstr

pattern = re.compile(r'(<hello>.*<this is a )string(>.*</hello>)', re.S)
newTxt = r'\g<1>newstr\g<2>'
rst = re.sub(pattern, newTxt, inputStr)
print(rst)


'''
Output:

	<tag>
		<hello>
			<this is a newstr>
		</hello>
	</tag>
	
'''

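A brief explanation of the regex:
Parentheses in pattern create capturing groups, numbered automatically from 1 in the order they appear. The replacement string (the second argument, newTxt) can refer to these groups to keep parts of the matched text.
In newTxt, \g<1> inserts the text captured by group 1 (the first parenthesized part of pattern), and \g<2> inserts the text captured by group 2 (the second parenthesized part).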

An easy way to think about it: first write a regex that matches the whole span, then wrap the parts you want to keep in parentheses.
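Two related features of the replacement string may help here. The sketch below is illustrative and reuses the `<this is a string>` text from the example above; the group names `head` and `tail` are arbitrary. It shows why the `\g<1>` form is handy when a digit follows the reference, and how named groups `(?P<name>...)` can be referenced as `\g<name>`.

```python
import re

text = '<this is a string>'

# Numbered groups: \g<1> stays unambiguous even when a digit follows it.
# Written as r'\12nd' instead, Python would read \12 as a reference to group 12.
numbered = re.sub(r'(<this is a )string(>)', r'\g<1>2nd\g<2>', text)

# Named groups: (?P<name>...) in the pattern, \g<name> in the replacement.
named = re.sub(r'(?P<head><this is a )string(?P<tail>>)',
               r'\g<head>newstr\g<tail>', text)

print(numbered)  # <this is a 2nd>
print(named)     # <this is a newstr>
```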

re.sub() takes five parameters: three required (pattern, repl, string) and two optional (count, flags). See the documentation for details.
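A short sketch of the two optional parameters, on a made-up input string: count limits the number of replacements (0, the default, means replace all), and flags modifies how the pattern is interpreted. The arguments are passed by keyword here; as far as I know, Python 3.13 deprecates passing count and flags positionally.

```python
import re

s = 'Hello hello HELLO'

# flags=re.IGNORECASE makes the match case-insensitive;
# count=1 stops after the first replacement.
first_only = re.sub(r'hello', 'hi', s, count=1, flags=re.IGNORECASE)
all_of_them = re.sub(r'hello', 'hi', s, flags=re.IGNORECASE)

# With a compiled pattern, the flags belong in re.compile();
# also passing flags to re.sub() raises ValueError.
pat = re.compile(r'hello', re.IGNORECASE)
compiled = re.sub(pat, 'hi', s, count=2)

print(first_only)   # hi hello HELLO
print(all_of_them)  # hi hi hi
print(compiled)     # hi hi HELLO
```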

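1.3 Reading Chinese (non-ASCII) text files

Using a with statement is recommended: the file is closed automatically, even if an exception occurs, so there is no need to call close() by hand.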

f = open(filename, 'r', encoding='utf-8')
cnt = f.read()
f.close()

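# Note: pass the encoding as a keyword (encoding='utf-8'). Passed positionally, it would land in
# the buffering parameter and raise a TypeError. If encoding is omitted, a platform-dependent
# default is used (e.g. cp936/GBK on Chinese-locale Windows), which can fail on UTF-8 files.
# Signature: open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)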

#-----------------------------------------------------------

'''
Read the file using the with keyword: the file is closed automatically,
with no need to call close() manually.
'''
with open(filename, 'r', encoding='utf-8') as f:
    cnt = f.read()

Reference: the documentation for the built-in open() function on Python.org.
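The sketch below is a self-contained illustration, not from the original. It writes Chinese text to a temporary file (the name `demo.txt` is hypothetical) and reads it back with the same explicit encoding. It also shows what happens with the wrong encoding, and how the errors parameter of open() can replace undecodable bytes with U+FFFD instead of raising.

```python
import os
import tempfile

text = '你好，世界\n第二行\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # Write and read back with the same explicit encoding.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        cnt = f.read()

    # Decoding the UTF-8 bytes as ASCII fails...
    try:
        with open(path, 'r', encoding='ascii') as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # ...unless errors='replace' substitutes U+FFFD for the bad bytes.
    with open(path, 'r', encoding='ascii', errors='replace') as f:
        lossy = f.read()

print(cnt == text)       # True
print(decode_failed)     # True
print('\ufffd' in lossy) # True
```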

1.4 Replacing part of a file's content in Python


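with open(filename, 'r+', encoding='utf-8') as f:
    cnt = f.read()

    replaceTxt = cnt.replace(.....)

    f.seek(0)       # move the file pointer back to the start of the file
    f.truncate()    # cut the file at the current position, i.e. empty it

    f.write(replaceTxt)

# Note: seek(0) is required. Without it, read() leaves the pointer at the end of the file,
# truncate() removes nothing, and write() appends the new text after the old content.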

If seek(0) is omitted, the result will not match what you expect; see this article.
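To see the effect of seek(0) directly, here is a minimal sketch that runs the read/seek/truncate/write sequence from above on a temporary file, once with and once without the rewind. The helper `replace_in_file`, its `rewind` switch and the file contents are hypothetical; the switch exists only to reproduce the mistake.

```python
import os
import tempfile

def replace_in_file(path, old, new, rewind=True):
    """Hypothetical helper: replace old with new inside the file at path."""
    with open(path, 'r+', encoding='utf-8') as f:
        cnt = f.read()
        if rewind:
            f.seek(0)   # back to the start of the file
        f.truncate()    # cut the file at the current position
        f.write(cnt.replace(old, new))

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    for rewind in (True, False):
        with open(path, 'w', encoding='utf-8') as f:
            f.write('colour: red\n')
        replace_in_file(path, 'red', 'blue', rewind=rewind)
        with open(path, encoding='utf-8') as f:
            results[rewind] = f.read()

print(repr(results[True]))   # 'colour: blue\n'
print(repr(results[False]))  # 'colour: red\ncolour: blue\n'  (appended, not replaced)
```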

huanqing2010


[Python] Practical Python Code Snippets: Scaffolding
