How to find duplicate files on your system and quickly free up disk space?

Whether you use a Windows PC or a Linux computer, duplicate files tend to accumulate over time. These files not only take up disk space but also weigh down the system, so it is worth getting rid of them.

This article introduces six ways to find duplicate files on your system, so you can quickly free up hard disk space!

1. Use the diff command to compare files

In day-to-day work, the easiest way to compare two files is probably the diff command. Its output uses the < and > symbols to show the differences between the two files, and we can use this feature to find identical files.

When two files differ, the diff command prints the points of difference:

$ diff index.html backup.html
2438a2439,2441
> <pre>
> That's all there is to report.
> </pre>

If the diff command produces no output, the two files are identical:

$ diff home.html index.html
$

The disadvantage, however, is that diff can only compare two files at a time. If we want to compare many files, checking them pair by pair like this is very inefficient.
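If you do need to compare a handful of files with diff, a small shell loop can at least automate the pairwise checks. This is only a rough sketch (it assumes bash, and *.html is just a placeholder pattern):

for a in *.html; do
  for b in *.html; do
    # Compare each pair only once, and skip comparing a file with itself
    [[ "$a" < "$b" ]] || continue
    # diff -q only reports whether the files differ, without printing the details
    diff -q "$a" "$b" > /dev/null && echo "$a and $b are identical"
  done
done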

2. Use checksums

The cksum command calculates a checksum from a file's contents using a fixed algorithm, producing a long number (e.g. 2819078353). Although the result is not absolutely unique, the chance of two files with different contents ending up with the same checksum is roughly as high as the Chinese national soccer team making it into the World Cup.

$ cksum *.html
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html

In the output above, the second and third files have the same checksum (and size), so we can conclude that these two files are identical.
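To avoid eyeballing the checksums, we can sort the cksum output numerically so that identical checksums land next to each other and let awk print only the repeated ones. A minimal sketch, assuming the standard cksum, sort and awk utilities:

$ cksum *.html | sort -n | awk '$1 == prev { if (!shown) print prevline; print; shown = 1; next } { prev = $1; prevline = $0; shown = 0 }'
4073570409 227985 home.html
4073570409 227985 index.html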

3. Use the find command

Although the find command has no option for finding duplicate files directly, it can search for files by name or type and run the cksum command on each one. For example:

$ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./home.html
2819078353 228029 ./backup.html
4073570409 227985 ./index.html
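We can go one step further and let sort and uniq do the comparison for us. The sketch below uses md5sum instead of cksum because its 32-character hashes have a fixed width, which lets uniq compare just that prefix; it assumes GNU coreutils (for the -w and --all-repeated options of uniq):

$ find . -type f -name "*.html" -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate

Each group of lines printed (groups are separated by blank lines) is a set of files with identical content.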

4. Use the fslint command

The fslint command can be used specifically to find duplicate files (along with other kinds of file system "lint"). Note that we need to give it a starting directory; if it has to scan a large number of files, the search may take a long time to complete.

$ fslint .
-----------------------------------file name lint
-------------------------------Invalid utf8 names
-----------------------------------file case lint
----------------------------------DUPlicate files   <==
home.html
index.html
-----------------------------------Dangling links
--------------------redundant characters in links
------------------------------------suspect links
--------------------------------Empty Directories
./.gnupg
----------------------------------Temporary Files
----------------------duplicate/conflicting Names
------------------------------------------Bad ids
-------------------------Non Stripped executables

Tip: fslint must be installed on the system, and you also need to add its directory to the search path:

$ export PATH=$PATH:/usr/share/fslint/fslint
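fslint is actually a collection of small scripts, and the one dedicated to duplicate files is called findup. On a typical installation, once the directory above is on the PATH, you can run it directly against a starting directory (shown here as the current directory):

$ findup .

It reports only the duplicate files, without the other lint checks.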

5. Use the rdfind command

The rdfind command also finds duplicate files (files with identical content). Its name stands for "redundant data find". The command uses file dates to determine which file is the original, which is helpful when deleting duplicates because it removes the newer copies.

$ rdfind ~
Now scanning "/home/alvin", found 12 files.
Now have 12 files in total.
Removed 1 files due to nonunique device and inode.
Total size is 699498 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
It seems like you have 2 files that are not unique
Totally, 223 KiB can be reduced.
Now making results file results.txt

We can also run it in dryrun mode:

$ rdfind -dryrun true ~
(DRYRUN MODE) Now scanning "/home/alvin", found 12 files.
(DRYRUN MODE) Now have 12 files in total.
(DRYRUN MODE) Removed 1 files due to nonunique device and inode.
(DRYRUN MODE) Total size is 699352 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
(DRYRUN MODE) It seems like you have 2 files that are not unique
(DRYRUN MODE) Totally, 223 KiB can be reduced.
(DRYRUN MODE) Now making results file results.txt

The rdfind command also has options to ignore empty files (-ignoreempty), follow symbolic links (-followsymlinks), and so on. Its common options are explained in the table below, with an example after the table.

Option               Meaning
-ignoreempty         Ignore empty files
-minsize             Ignore files smaller than a given size
-followsymlinks      Follow symbolic links
-removeidentinode    Remove files that refer to the same device and inode (hard links)
-checksum            Which checksum type to use
-deterministic       Determines how to sort the files
-makesymlinks        Replace duplicate files with symbolic links
-makehardlinks       Replace duplicate files with hard links
-makeresultsfile     Create a results file in the current directory
-outputname          Provide a name for the results file
-deleteduplicates    Delete (unlink) duplicate files
-sleep               Set the sleep time (in ms) between reading files
-n, -dryrun          Display what would be done, but do not do it
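For example, instead of deleting the duplicates outright, a gentler approach is to preview the result first and then let rdfind replace the copies with hard links, which frees the space while keeping every file path valid. The directory below is just an example:

$ rdfind -dryrun true ~/Pictures
$ rdfind -makehardlinks true ~/Pictures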

Note that rdfind provides a -deleteduplicates true option. As the name suggests, with this option rdfind will delete the duplicate files automatically.

$ rdfind -deleteduplicates true .
...
Deleted 1 files.    <==

Of course, rdfind must also be installed on the system.
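On most distributions it is available from the standard repositories, for example:

$ sudo apt install rdfind     # Debian / Ubuntu
$ sudo dnf install rdfind     # Fedora (on CentOS/RHEL it comes from the EPEL repository)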

6. Use the fdupes command

The fdupes command also makes it very easy to identify duplicate files, and it provides a number of useful options. In its simplest form, it groups duplicate files together, as follows:

$ fdupes ~
/home/alvin/UPGRADE
/home/alvin/mytwin

/home/alvin/lp.txt
/home/alvin/lp.man

/home/alvin/penguin.png
/home/alvin/penguin0.png
/home/alvin/hideme.png

The -r option stands for recursion: fdupes will descend into each subdirectory looking for duplicate files. Be careful, though: on Linux many duplicate files are important (such as users' .bashrc and .profile files), and deleting them will cause problems.

# fdupes -r /home
/home/shark/home.html
/home/shark/index.html

/home/dory/.bashrc
/home/eel/.bashrc

/home/nemo/.profile
/home/dory/.profile
/home/shark/.profile

/home/nemo/tryme
/home/shs/tryme

/home/shs/arrow.png
/home/shs/PNGs/arrow.png

/home/shs/11/files_11.zip
/home/shs/ERIC/file_11.zip

/home/shs/penguin0.jpg
/home/shs/PNGs/penguin.jpg
/home/shs/PNGs/penguin0.jpg

/home/shs/Sandra_rotated.png
/home/shs/PNGs/Sandra_rotated.png

Common fdupes options are shown in the following table:

Option                 Meaning
-r --recurse           Recurse into subdirectories
-R --recurse:          Recurse into the directories specified after this option
-s --symlinks          Follow symlinked directories
-H --hardlinks         Treat hard-linked files as duplicates
-n --noempty           Ignore empty files
-f --omitfirst         Omit the first file in each set of matches
-A --nohidden          Ignore hidden files
-1 --sameline          List each set of matches on a single line
-S --size              Show the size of duplicate files
-m --summarize         Summarize duplicate file information
-q --quiet             Hide the progress indicator
-d --delete            Prompt the user for which files to keep, deleting the others
-N --noprompt          Used with --delete: keep the first file in each set without prompting
-I --immediate         Delete duplicates as they are encountered
-p --permissions       Do not treat files with different owner/group or permission bits as duplicates
-o --order=WORD        Order files according to the specification WORD
-i --reverse           Reverse the order while sorting
-v --version           Display the fdupes version
-h --help              Display help
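Combining a few of these options gives a reasonably safe workflow: summarize first, then list the duplicates with their sizes, and only delete interactively once you have reviewed them. The directory below is just an example:

$ fdupes -r -m ~/Downloads     # summarize how much space the duplicates take up
$ fdupes -r -S ~/Downloads     # list each duplicate set together with file sizes
$ fdupes -r -d ~/Downloads     # prompt for which copy to keep in each set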

Summary

Linux provides us with many tools for locating and removing duplicate files. With the tools above, you can quickly find the duplicate files on your disk and delete them. I hope this article helps you!

-----------------

Xu Liang, Linux development engineer at a Fortune 500 company and Linux evangelist. You are welcome to follow my public account "Liang Xu Linux", which is full of practical content!
→ practical technical articles
→ exclusive resources
→ a community for questions and answers
If you are interested in my topics, you can also follow my blog: lxlinux.net
