Use Python to automatically clean up duplicate files on your computer in about 10 lines of code

Given a folder, use Python to check whether it contains duplicate files, and delete the duplicates if there are any.

The main knowledge points involved are:

Comprehensive application of the os module

Comprehensive application of the glob module

Using the filecmp module to compare two files

Step analysis

The program's logic can be summarized as:

Traverse the given folder to get all of its files, then compare the files with one another. If two files are the same, delete the latter.

The key question then becomes:

How do we judge whether two files are the same?

Here we can use the filecmp module. Its official documentation describes the function as follows:

filecmp.cmp(f1, f2, shallow=True)

Compare files named f1 and f2 and return True if they seem to be equal, otherwise return False

If shallow is true and the os.stat() signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal. Otherwise, the files are treated as different if their sizes or contents are different.

So it can be used like this:

# Assume the two files x and y are identical
print(filecmp.cmp(x, y))
# True
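
To make the effect of the shallow parameter concrete, here is a minimal sketch that compares small temporary files. The file names and contents are made up for illustration, and passing shallow=False is a choice made for this example (it forces a content comparison), not something the original snippet does.

```python
import filecmp
import os
import tempfile

# Create three throwaway files: two identical, one different but the same size.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")
    for path, text in [(a, "hello"), (b, "hello"), (c, "world")]:
        with open(path, "w") as fh:
            fh.write(text)

    # shallow=False makes filecmp compare the file contents
    # instead of trusting matching os.stat() signatures.
    same_ab = filecmp.cmp(a, b, shallow=False)
    same_ac = filecmp.cmp(a, c, shallow=False)
    print(same_ab, same_ac)  # True False
```

Because a wrong "equal" verdict leads to a deleted file, comparing contents is probably the safer setting for a deduplication script, at the cost of reading more data.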

After solving this problem, we can start writing code!

Python implementation

Import the required libraries and set the target folder path

import os
import glob
import filecmp

dir_path = r'C:\xxxx'

Next, traverse the folder to get the full path of every file. The ** wildcard of the glob module, combined with the recursive parameter, does this. The skeleton is as follows:

for file in glob.glob(dir_path + '/**/*', recursive=True):
    pass
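
As a quick check of what this pattern returns, the sketch below builds a small temporary folder tree and lists the matches. The folder and file names are invented for the example. Note that the result contains sub-folders as well as files, which is why the next step filters with os.path.isfile.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build root/top.txt, root/sub/mid.txt and root/sub/deeper/low.txt.
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel in ["top.txt",
                os.path.join("sub", "mid.txt"),
                os.path.join("sub", "deeper", "low.txt")]:
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x")

    # With recursive=True, '**' matches zero or more nested directories.
    matches = glob.glob(os.path.join(root, "**", "*"), recursive=True)
    found = sorted(os.path.relpath(m, root) for m in matches)
    print(found)  # three files plus the 'sub' and 'sub/deeper' folders
```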

Each entry returned by the traversal may be a file or a folder, so we need to check which it is. If it is a file, its path is stored in a list. Two things are involved here:

First create an empty list, then use list.append(i) to add each file path

Then use os.path.isfile(i) to check whether the entry is a file, and add it to the list only when it returns True

The specific code is as follows:

file_lst = []

for i in glob.glob(dir_path + '/**/*', recursive=True):
    if os.path.isfile(i):
        file_lst.append(i)

In the previous step we obtained the paths of all files in the target folder. Now we can traverse the path list with a nested loop, where filecmp.cmp judges whether two files are the same and os.remove deletes the duplicate:

for x in file_lst:
    for y in file_lst:
        if x != y:
            if filecmp.cmp(x, y):
                os.remove(y)

This code implements the general logic, but one detail needs attention: the loop may reach a file that an earlier comparison has already deleted, in which case filecmp.cmp or os.remove(y) raises an error because the file no longer exists.

Therefore, use os.path.exists to check that both files still exist, as shown below:

for x in file_lst:
    for y in file_lst:
        if x != y and os.path.exists(x) and os.path.exists(y):
            if filecmp.cmp(x, y):
                os.remove(y)
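
One possible variation, sketched here under assumptions, avoids the repeated existence checks by visiting each pair only once and remembering which paths were already removed. The function name remove_duplicates, the use of itertools.combinations, and shallow=False are choices made for this illustration and are not part of the original script. The demo runs on a temporary folder so no real data is touched.

```python
import filecmp
import glob
import itertools
import os
import tempfile


def remove_duplicates(dir_path):
    """Delete later copies of identical files under dir_path; return removed paths."""
    files = sorted(
        p for p in glob.glob(os.path.join(dir_path, "**", "*"), recursive=True)
        if os.path.isfile(p)
    )
    removed = set()
    # combinations() yields each unordered pair once, x before y in `files`.
    for x, y in itertools.combinations(files, 2):
        if x in removed or y in removed:
            continue  # one side is already gone, nothing to compare
        if filecmp.cmp(x, y, shallow=False):
            os.remove(y)
            removed.add(y)
    return removed


# Try it on a throwaway folder rather than on real data.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "same"), ("b.txt", "same"), ("c.txt", "other")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    removed_names = sorted(os.path.basename(p) for p in remove_duplicates(tmp))
    remaining = sorted(os.listdir(tmp))
    print(removed_names, remaining)  # ['b.txt'] ['a.txt', 'c.txt']
```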

With that, a simple file deduplication program is complete. The full code is as follows:

import os
import glob
import filecmp

dir_path = r'C:\xxxx'

file_lst = []

for i in glob.glob(dir_path + '/**/*', recursive=True):
    if os.path.isfile(i):
        file_lst.append(i)

for x in file_lst:
    for y in file_lst:
        if x != y and os.path.exists(x) and os.path.exists(y):
            if filecmp.cmp(x, y):
                os.remove(y)
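
Since the script deletes files permanently, it may be worth previewing what it would remove first. The following is a rough dry-run sketch, not the author's method: the hypothetical helper find_duplicates groups files by size with os.path.getsize (files of different sizes cannot be identical), compares contents only within each group, and reports pairs without deleting anything.

```python
import filecmp
import glob
import os
import tempfile
from collections import defaultdict


def find_duplicates(dir_path):
    """Return (kept, duplicate) path pairs without deleting anything."""
    by_size = defaultdict(list)
    for p in sorted(glob.glob(os.path.join(dir_path, "**", "*"), recursive=True)):
        if os.path.isfile(p):
            by_size[os.path.getsize(p)].append(p)

    pairs = []
    for group in by_size.values():
        originals = []  # first file seen for each distinct content
        for p in group:
            for kept in originals:
                if filecmp.cmp(kept, p, shallow=False):
                    pairs.append((kept, p))
                    break
            else:
                originals.append(p)  # no match found, so p is new content
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    samples = [("a.txt", "data"), ("b.txt", "data"),
               ("c.txt", "datb"), ("d.txt", "longer")]
    for name, text in samples:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    report = [(os.path.basename(k), os.path.basename(d))
              for k, d in find_duplicates(tmp)]
    print(report)  # [('a.txt', 'b.txt')]
```

Once the reported pairs look right, the second path of each pair could be passed to os.remove.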

Final words

Working through the Python automation script in this article, we can once again experience the power of Python for office automation.

Origin blog.csdn.net/chinaherolts2008/article/details/112910790