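difflib is a module in Python's standard library, so there is nothing to install.
It compares texts and reports the differences between them, can render the result as a readable HTML document, and works much like the Linux diff command.
The classes and functions in difflib compare sequences. With them you can compare files and produce either a plain-text diff or an HTML side-by-side comparison page.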
1. Comparing text and generating a text diff
How to read the symbols in the diff output
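Symbol | Meaning |
---|---|
'-' | The line is in the first sequence but not in the second |
'+' | The line is in the second sequence but not in the first |
' ' | The line is identical in both sequences |
'?' | Hint line marking differences inside a line; it is not present in either input |
'^' | Used inside a '?' line, placed under a character that was changed |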
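To make the table concrete, the sketch below runs difflib.Differ on two tiny lists and groups the output lines by their two-character prefix. The helper name group_by_prefix and the sample lines are made up for illustration.

```python
import difflib
from collections import defaultdict


def group_by_prefix(old_lines, new_lines):
    """Group Differ output lines by their two-character prefix."""
    groups = defaultdict(list)
    for line in difflib.Differ().compare(old_lines, new_lines):
        # Each output line starts with a marker and one space: '- ', '+ ', '  ' or '? '
        groups[line[:2]].append(line[2:])
    return dict(groups)


old = ["apple\n", "banana\n"]
new = ["apple\n", "cherry\n"]
groups = group_by_prefix(old, new)
print(groups)
```

The two changed lines here have almost nothing in common, so Differ reports them as a plain removal and addition and emits no '?' hint line.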
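First, split the text into lines (text is any multi-line string):
text1 = text.splitlines(keepends=False)  # split into a list of lines, dropping the line endings
text2 = text.splitlines(keepends=True)   # split into a list of lines, keeping the line endings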
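A quick way to see the difference between the two settings, using a made-up two-line string:

```python
text = "first line\nsecond line\n"

without_endings = text.splitlines(keepends=False)
with_endings = text.splitlines(keepends=True)

print(without_endings)  # ['first line', 'second line']
print(with_endings)     # ['first line\n', 'second line\n']
```

Keeping the line endings is convenient when the diff lines are later glued together with ''.join(...), because every line then already ends in a newline.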
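Generate the diff result
import difflib  # import the module
diff = difflib.Differ()  # create a Differ object
result = diff.compare(text1, text2)  # compare the two lists of lines; returns a generator of diff lines
The result is a generator, so it cannot be inspected directly. Join its lines into a single string to print it: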
import difflib

text1 = '''  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
'''.splitlines(keepends=True)
print(text1)
text2 = '''  1. Beautiful is better than ugly.
  3.   Simple is better than complex.
  4. Complicated is better than complex.
  5. Flat is better than nested.
'''.splitlines(keepends=True)
print(text2)
diff = difflib.Differ()
# compare() yields lines that already end in '\n', so join them without a separator
result = ''.join(diff.compare(text1, text2))
print(result)
Output:
['  1. Beautiful is better than ugly.\n', '  2. Explicit is better than implicit.\n', '  3. Simple is better than complex.\n', '  4. Complex is better than complicated.\n'] # text1 after splitting into lines
['  1. Beautiful is better than ugly.\n', '  3.   Simple is better than complex.\n', '  4. Complicated is better than complex.\n', '  5. Flat is better than nested.\n'] # text2 after splitting into lines
Diff text
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.
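Since difflib is compared to the Linux diff command, it may help to see the closest equivalent of diff -u. This is a sketch using difflib.unified_diff, a function the walkthrough itself does not use; the labels 'before' and 'after' are arbitrary placeholders.

```python
import difflib

old = '''  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
'''.splitlines(keepends=True)
new = '''  1. Beautiful is better than ugly.
  3.   Simple is better than complex.
  4. Complicated is better than complex.
  5. Flat is better than nested.
'''.splitlines(keepends=True)

# unified_diff returns a generator of lines, just like Differ.compare
unified = list(difflib.unified_diff(old, new, fromfile='before', tofile='after'))
print(''.join(unified))
```

Unlike Differ, this format has no '?' hint lines: a changed line simply shows up as a '-' line and a '+' line.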
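Comparing text and generating an HTML comparison page
1. Split the text into lines
2. Generate the diff result
diff = difflib.HtmlDiff()  # create an HtmlDiff object
result = diff.make_file(text1, text2)  # compare and build a complete HTML page as a string
3. Write the result to an HTML file
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(result)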
import difflib

text1 = '''  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
'''
text2 = '''  1. Beautiful is better than ugly.
  3.   Simple is better than complex.
  4. Complicated is better than complex.
  5. Flat is better than nested.
'''
text1 = text1.splitlines(keepends=True)
text2 = text2.splitlines(keepends=True)
diff = difflib.HtmlDiff()
result = diff.make_file(text1, text2)
# make_file declares utf-8 in the page by default, so write the file as utf-8 too
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(result)
Open the generated HTML file in a browser. Added, changed, and deleted text is highlighted in different colors.
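make_file also accepts a few optional arguments that are handy for longer inputs. The sketch below, with made-up sample lines and placeholder column titles, labels the two columns and shows only the changed lines plus a little surrounding context.

```python
import difflib

old = ["line %d\n" % i for i in range(1, 41)]
new = list(old)
new[19] = "line twenty\n"  # change a single line in the middle

html = difflib.HtmlDiff().make_file(
    old,
    new,
    fromdesc="before",  # title above the left column
    todesc="after",     # title above the right column
    context=True,       # show only changed lines and their neighbours
    numlines=2,         # number of context lines around each change
)
print(len(html))
```

The returned string can be written to a file in the same way as the full page above.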
Comparing files
Read the contents of both files first, then compare them exactly as in the text examples above.
import difflib

filename1 = '/tmp/passwd'
filename2 = '/tmp/passwd1'
with open(filename1) as f1, open(filename2) as f2:  # open both files
    content1 = f1.read().splitlines(keepends=True)  # read file 1 and split it into lines
    content2 = f2.read().splitlines(keepends=True)  # same for file 2
d = difflib.HtmlDiff()
htmlContent = d.make_file(content1, content2)
with open('passwdDiff.html', 'w', encoding='utf-8') as f:
    f.write(htmlContent)
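The same steps can be wrapped in a small reusable function. This is only a sketch: the function name write_html_diff and its parameters are invented here, and it assumes both inputs are UTF-8 text files unless another encoding is passed.

```python
import difflib
from pathlib import Path


def write_html_diff(path_a, path_b, out_path, encoding="utf-8"):
    """Compare two text files and write an HTML comparison page to out_path."""
    path_a, path_b = Path(path_a), Path(path_b)
    lines_a = path_a.read_text(encoding=encoding).splitlines(keepends=True)
    lines_b = path_b.read_text(encoding=encoding).splitlines(keepends=True)
    html = difflib.HtmlDiff().make_file(
        lines_a,
        lines_b,
        fromdesc=str(path_a),  # use the file paths as column titles
        todesc=str(path_b),
    )
    out_path = Path(out_path)
    out_path.write_text(html, encoding="utf-8")
    return out_path


# Example call, using the paths from the script above:
# write_html_diff('/tmp/passwd', '/tmp/passwd1', 'passwdDiff.html')
```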