Given a folder, use Python to check whether it contains duplicate files, and delete any duplicates that are found
The main knowledge points involved are:
Comprehensive application of the os module
Comprehensive application of the glob module
Using the filecmp module to compare two files
Step analysis
The program's logic can be described as follows:
Traverse the given folder to get all of its files, then compare the files pairwise to see whether they are the same. If two files are the same, delete the latter.
The key question then becomes:
How do we judge whether two files are the same?
Here we can use the filecmp module. Its official documentation describes the function as follows:
filecmp.cmp(f1, f2, shallow=True)
Compare the files named f1 and f2, returning True if they seem to be equal and False otherwise.
If shallow is true, files with identical os.stat() signatures (file type, size, and modification time) are considered equal without reading their contents. Otherwise, the contents of the files are compared.
So it can be used like this:
import filecmp

# Assume x and y are the paths of two identical files
print(filecmp.cmp(x, y))
# True
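Because the default shallow=True can report two files as equal purely from their os.stat() signatures, a deletion script may prefer a byte-level check. The following is a minimal sketch, assuming a hypothetical helper named same_content and throwaway files created in a temporary folder; it is an illustration, not part of the original program.

```python
import filecmp
import os
import tempfile


def same_content(path_a, path_b):
    """Return True only if the two files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


# Build three small throwaway files to compare (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")
    for path, text in ((a, "hello"), (b, "hello"), (c, "world")):
        with open(path, "w") as fh:
            fh.write(text)

    print(same_content(a, b))  # True: same bytes
    print(same_content(a, c))  # False: same size, different bytes
```

The content comparison is slower than the shallow check because it may read both files, but it avoids treating different files as duplicates.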
With this problem solved, we can start writing the code!
Python implementation
Import the required libraries and set the target folder path
import os
import glob
import filecmp
dir_path = r'C:\xxxx'
Next, traverse the folder to obtain the absolute path of every file. The wildcard support of the glob module, combined with the recursive parameter, does the job. The skeleton is as follows:
for file in glob.glob(dir_path + '/**/*', recursive=True):
    pass
Each item returned by the traversal may be a file or a folder, so we need to check whether it is a file. If it is, its absolute path is stored in a list. Two points are involved here:
First create an empty list, then use list.append(i) to add each file path
Use os.path.isfile(i) to check whether the item is a file, and append it only when the result is True
The specific code is as follows:
file_lst = []
for i in glob.glob(dir_path + '/**/*', recursive=True):
    if os.path.isfile(i):
        file_lst.append(i)
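The collection step can also be wrapped in a function so it is easy to reuse and test. This is a minimal sketch, assuming a hypothetical function name list_files; note that the returned paths are absolute only when dir_path itself is absolute, and that glob's * wildcard skips names starting with a dot by default.

```python
import glob
import os


def list_files(dir_path):
    """Return the paths of all regular files under dir_path, recursively."""
    # glob.escape protects folder names that contain glob characters like [ or ].
    pattern = os.path.join(glob.escape(dir_path), "**", "*")
    # "**" with recursive=True also descends into subfolders; keep files only.
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

Sorting the result gives a stable order, which matters later because the file that appears first in the list is the one that is kept.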
The previous step gave us the paths of all files in the target folder. Now we can iterate over the path list with two nested loops, using filecmp.cmp to compare the files and os.remove to delete a duplicate:
for x in file_lst:
    for y in file_lst:
        if x != y:
            if filecmp.cmp(x, y):
                os.remove(y)
This code implements the general logic, but one detail needs attention: the loop may reach a file that an earlier comparison has already deleted, causing filecmp.cmp or os.remove to raise an error because the file no longer exists.
Therefore, we can use os.path.exists to check that both files still exist, as shown below:
for x in file_lst:
    for y in file_lst:
        if x != y and os.path.exists(x) and os.path.exists(y):
            if filecmp.cmp(x, y):
                os.remove(y)
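The nested loops visit every pair twice and query the disk for each pair. One possible variation, shown here as a sketch under assumptions rather than as the article's method, uses itertools.combinations to visit each pair once and a set to remember what has already been deleted. The function name remove_duplicates is hypothetical, and shallow=False is an added choice to force a content comparison.

```python
import filecmp
import itertools
import os


def remove_duplicates(paths):
    """Delete later duplicates in paths; return the sorted list of deleted paths."""
    removed = set()
    # combinations yields each pair once, with the earlier list entry first.
    for keep, other in itertools.combinations(paths, 2):
        # Skip pairs involving a file that an earlier match already deleted.
        if keep in removed or other in removed:
            continue
        if filecmp.cmp(keep, other, shallow=False):
            os.remove(other)
            removed.add(other)
    return sorted(removed)
```

As in the nested-loop version, the first file in the list survives and every later copy is deleted.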
This completes a simple file deduplication program. The complete code is as follows:
import os
import glob
import filecmp

dir_path = r'C:\xxxx'

# Collect the path of every file under dir_path, including subfolders
file_lst = []
for i in glob.glob(dir_path + '/**/*', recursive=True):
    if os.path.isfile(i):
        file_lst.append(i)

# Compare every pair of files and delete the latter one if they match
for x in file_lst:
    for y in file_lst:
        if x != y and os.path.exists(x) and os.path.exists(y):
            if filecmp.cmp(x, y):
                os.remove(y)
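Since deletion cannot be undone, it can be useful to list the duplicates first and delete only after reviewing them. The sketch below is one possible way to do that and is not part of the original script: the function name find_duplicates is hypothetical, it groups files by size before comparing (files of different sizes cannot be identical), and it deletes nothing.

```python
import collections
import filecmp
import glob
import os


def find_duplicates(dir_path):
    """Return sorted (kept, duplicate) path pairs under dir_path; delete nothing."""
    # Group files by size so that only same-sized files are compared.
    by_size = collections.defaultdict(list)
    pattern = os.path.join(glob.escape(dir_path), "**", "*")
    for path in sorted(glob.glob(pattern, recursive=True)):
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(path)

    pairs = []
    for group in by_size.values():
        originals = []  # one representative per distinct content in this group
        for path in group:
            match = next(
                (o for o in originals if filecmp.cmp(o, path, shallow=False)),
                None,
            )
            if match is None:
                originals.append(path)
            else:
                pairs.append((match, path))
    return sorted(pairs)
```

After checking the returned pairs, calling os.remove on the second element of each pair would perform the actual deletion.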
Closing remarks
Writing this automation script step by step shows once again how powerful Python is for office automation.