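I. The difflib module
difflib module: provides classes and functions for comparing sequences. It can compare files and output the differences either as plain text or as an HTML comparison page.
1) Differ: displays the result as text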
import difflib
text1 = '''
1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)
text2 = '''
1. Beautifu is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)
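# Show the differences between the two texts as plain text:
d = difflib.Differ()
result = list(d.compare(text1, text2))
# Each line already ends with '\n', so join with an empty string
result = "".join(result)
print(result)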
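Lines starting with '- ' appear only in the first text, and lines starting with '+ ' appear only in the second. Lines starting with '? ' are hint lines: their markers (such as ^, - and +) point at the changed characters in the line just above.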
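To make the prefix codes concrete, the sketch below runs Differ on two small made-up lists (the sample lines are hypothetical) and groups the output by its two-character prefix. Hint lines ('? ') are simply ignored here.

```python
import difflib

# Hypothetical sample input: one line changed, one line removed
a = ['apple\n', 'banana\n', 'cherry\n']
b = ['apple\n', 'bananas\n']

delta = list(difflib.Differ().compare(a, b))

# Every Differ line starts with a two-character code:
# '- ' only in a, '+ ' only in b, '  ' in both, '? ' intraline hint
only_in_a = [line[2:] for line in delta if line.startswith('- ')]
only_in_b = [line[2:] for line in delta if line.startswith('+ ')]
common = [line[2:] for line in delta if line.startswith('  ')]

print('only in a:', only_in_a)
print('only in b:', only_in_b)
print('common:', common)
```

Because 'banana' and 'bananas' are very similar, Differ pairs them and emits a '? ' line marking the added 's'; 'cherry' has no similar counterpart and is reported as removed.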
2) HtmlDiff: displays the result as HTML
import difflib
text1 = '''
1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)
text2 = '''
1. Beautifu is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
'''.splitlines(keepends=True)
# Show the differences between the two texts as an HTML page (open it in a browser):
d = difflib.HtmlDiff()
with open("passwd.html", 'w', encoding='utf-8') as f:
    f.write(d.make_file(text1, text2))
Added, deleted and changed text is highlighted in color.
3) context_diff: returns a generator of diff lines in context-diff format
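If the diff should be embedded in an existing page rather than written as a standalone file, HtmlDiff also has make_table(), which returns only the table fragment. A minimal sketch with hypothetical sample lines; context=True limits the table to changed lines plus numlines lines of surrounding context.

```python
import difflib

# Hypothetical sample input
old = ['alpha\n', 'beta\n', 'gamma\n']
new = ['alpha\n', 'BETA\n', 'gamma\n']

hd = difflib.HtmlDiff()

# make_table() returns just the <table> fragment, for embedding in your own page;
# fromdesc/todesc label the two columns, context=True hides unchanged lines
# except numlines lines around each change
table = hd.make_table(old, new, fromdesc='old', todesc='new',
                      context=True, numlines=1)

# make_file() returns a complete HTML document (styles, legend and table)
page = hd.make_file(old, new, fromdesc='old', todesc='new')

print(table[:200])
```

Note that make_table() output has no stylesheet, so the highlight colors only appear if the embedding page supplies CSS for the diff classes.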
from difflib import context_diff
import sys
s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
    sys.stdout.write(line)
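Here two lists of strings are compared. Only the fourth element, 'guido\n', is the same in both, so it is the only line shown without the '! ' change marker.
If s1 is instead the three-element list ['eggs\n', 'ham\n', 'guido\n'], 'guido\n' is still reported as a common line, even though it now sits at index 2 in s1 and at index 3 in s2.
The elements are matched in sequence order, not compared index by index.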
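The three-element case can be checked directly. In this sketch, the common line gets the two-space prefix '  ' in both halves of the output, while the replaced lines get '! '.

```python
import sys
from difflib import context_diff

# The three-element s1 described above; s2 is unchanged
s1 = ['eggs\n', 'ham\n', 'guido\n']
s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']

lines = list(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
sys.stdout.writelines(lines)
```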
4) get_close_matches: returns a list of the best matches
from difflib import get_close_matches
d=get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
print(d)
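It returns at most n (default 3) candidates whose similarity score to the word is at least cutoff (default 0.6), with the most similar first; here the result is ['apple', 'ape'].
5) ndiff: returns the differences as a Differ-style delta, i.e. a generator of text lines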
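The n and cutoff parameters control how many results come back and how similar they must be. A short sketch reusing the same word list; the loop at the end prints the underlying score, which get_close_matches computes with SequenceMatcher.ratio().

```python
from difflib import SequenceMatcher, get_close_matches

words = ['ape', 'apple', 'peach', 'puppy']

# n caps how many matches are returned (default 3); the best match comes first
top_one = get_close_matches('appel', words, n=1)

# cutoff (default 0.6) is the minimum similarity score a candidate needs
strict = get_close_matches('appel', words, cutoff=0.9)

print(top_one)
print(strict)

# The score is SequenceMatcher.ratio(): 2.0 * matched_chars / total_chars
for w in words:
    print(w, round(SequenceMatcher(None, w, 'appel').ratio(), 2))
```

'apple' scores 0.8 against 'appel', so it survives the default cutoff of 0.6 but not a cutoff of 0.9.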
from difflib import ndiff
diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
             'ore\ntree\nemu\n'.splitlines(keepends=True))
print(''.join(diff))
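Often only the changed lines are of interest. A minimal sketch that filters the same ndiff output by prefix, dropping common lines ('  ') and hint lines ('? '):

```python
from difflib import ndiff

a = 'one\ntwo\nthree\n'.splitlines(keepends=True)
b = 'ore\ntree\nemu\n'.splitlines(keepends=True)

# Keep only lines unique to a ('- ') or unique to b ('+ ')
changes = [line for line in ndiff(a, b) if line.startswith(('- ', '+ '))]
print(''.join(changes), end='')
```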
6) restore: rebuilds one of the two compared sequences from a delta
from difflib import ndiff, restore
diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
             'ore\ntree\nemu\n'.splitlines(keepends=True))
diff = list(diff) # materialize the generated delta into a list
print(''.join(restore(diff, 1)))
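Calling print(''.join(restore(diff, 2))) instead prints the second sequence.
The second argument selects which of the two compared sequences to rebuild. Since only two sequences were compared, it can only be 1 or 2; any other value raises ValueError.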
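A quick round-trip check of that behavior: restoring with 1 and 2 gives back the original inputs exactly, and 3 is rejected. Since restore is a generator, the ValueError only surfaces once the result is consumed, here via list().

```python
from difflib import ndiff, restore

a = 'one\ntwo\nthree\n'.splitlines(keepends=True)
b = 'ore\ntree\nemu\n'.splitlines(keepends=True)
delta = list(ndiff(a, b))  # materialize so the delta can be reused

# which=1 rebuilds the first input, which=2 the second
first = list(restore(delta, 1))
second = list(restore(delta, 2))
print(first == a, second == b)

# Any other value is rejected when the generator is consumed
try:
    list(restore(delta, 3))
    rejected = False
except ValueError:
    rejected = True
print('which=3 rejected:', rejected)
```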
II. A simple application
Compare the contents of the two files /etc/passwd and /tmp/passwd:
import difflib
file1 = '/etc/passwd'
file2 = '/tmp/passwd'
with open(file1) as f1, open(file2) as f2:
    text1 = f1.readlines()
    text2 = f2.readlines()
d = difflib.HtmlDiff()
with open("passwd.html", 'w', encoding='utf-8') as f:
    f.write(d.make_file(text1, text2))
This uses the difflib module to compare the two files and highlights the differences in an HTML page.
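The same steps can be wrapped in a reusable function. This is a sketch only: the name html_diff_files is hypothetical, it assumes both inputs are UTF-8 text, and it labels the two columns with the file paths via fromdesc/todesc. The demo at the end uses two throwaway temporary files instead of /etc/passwd and /tmp/passwd.

```python
import difflib
import os
import tempfile


def html_diff_files(path1, path2, out_path, context=True, numlines=3):
    """Write a side-by-side HTML diff of two text files to out_path.

    Hypothetical helper: it only wraps difflib.HtmlDiff.make_file().
    """
    with open(path1, encoding='utf-8') as f1, open(path2, encoding='utf-8') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # fromdesc/todesc label the columns; context=True hides unchanged
    # lines except numlines lines around each change
    html = difflib.HtmlDiff().make_file(lines1, lines2,
                                        fromdesc=path1, todesc=path2,
                                        context=context, numlines=numlines)
    with open(out_path, 'w', encoding='utf-8') as out:
        out.write(html)


# Demo on two throwaway files (assumed sample content)
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'old.txt')
    new_path = os.path.join(tmp, 'new.txt')
    out_path = os.path.join(tmp, 'diff.html')
    with open(old_path, 'w', encoding='utf-8') as f:
        f.write('root\nalice\n')
    with open(new_path, 'w', encoding='utf-8') as f:
        f.write('root\nbob\n')
    html_diff_files(old_path, new_path, out_path)
    with open(out_path, encoding='utf-8') as f:
        demo_html = f.read()

print(len(demo_html), 'characters of HTML written')
```

For the passwd comparison this would be called as html_diff_files('/etc/passwd', '/tmp/passwd', 'passwd.html'). With context=True, large files with few changes produce a much shorter page.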