Whether on a Windows PC or a Linux machine, everyday use more or less leaves a lot of duplicate files behind. These files not only take up disk space but can also drag the system down, so it is worth getting rid of them.
This article introduces 6 ways to find duplicate files on your system, so you can quickly free up hard disk space!
1. Use the diff command to compare files
The simplest way to compare two files is probably the diff command. Its output uses the < and > symbols to mark the differences between the two files, and we can use this feature to find identical ones.
When two files differ, diff prints the points of difference:
$ diff index.html backup.html
2438a2439,2441
> <pre>
> That's all there is to report.
> </pre>
If diff produces no output, the two files are identical:
$ diff home.html index.html
$
However, diff has the disadvantage that it can only compare two files at a time; if we want to check many files, comparing them pair by pair is very inefficient.
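Still, the pairwise approach can be automated for small sets of files. A minimal sketch using `cmp -s` (silent mode, exit status 0 when the files match); the temporary directory and the `.html` sample files are just for demonstration:

```shell
#!/bin/sh
# Demo: pairwise comparison. Sample files are created in a temporary
# directory so the script is self-contained.
dir=$(mktemp -d)
cd "$dir"
printf 'same content\n' > home.html
printf 'same content\n' > index.html
printf 'other content\n' > backup.html

# Compare every pair of files; quadratic in the number of files,
# so only practical for small sets.
set -- *.html
while [ "$#" -gt 1 ]; do
  a=$1; shift
  for b in "$@"; do
    cmp -s "$a" "$b" && echo "identical: $a $b"
  done
done
# prints: identical: home.html index.html
```

This is exactly the "compare two by two" strategy the text describes, which is why checksum-based methods below scale much better.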
2. Use checksums
The checksum command cksum reduces a file's content to a long number (e.g. 2819078353) with a fixed algorithm. Although the result is not absolutely unique, the chance of two files with different content ending up with the same checksum is about the same as the Chinese soccer team making it into the World Cup.
$ cksum *.html
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html
In the output above, the checksums of the second and third files are the same, so we can say these two files are identical.
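Spotting identical checksums by eye does not scale, but a short pipeline can do the grouping for us: sort the cksum output and print only lines whose "checksum size" prefix repeats. A sketch (the temporary directory and sample files are just for demonstration):

```shell
#!/bin/sh
# Demo: group files by cksum output. Sample files are created in a
# temporary directory so the script is self-contained.
dir=$(mktemp -d)
cd "$dir"
printf 'same content\n' > home.html
printf 'same content\n' > index.html
printf 'other content\n' > backup.html

# Sort by checksum, then print only the lines whose "checksum size"
# prefix repeats -- those are the duplicate candidates.
cksum *.html | sort -n | awk '
{
  key = $1 " " $2
  if (key == prev) {
    if (!shown) print prevline
    print
    shown = 1
  } else {
    shown = 0
  }
  prev = key
  prevline = $0
}'
# prints the home.html and index.html lines
```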
3. Use the find command
Although find has no option for locating duplicate files directly, it can search for files by name or type and run the cksum command on each one. The operation looks like this:
$ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./home.html
2819078353 228029 ./backup.html
4073570409 227985 ./index.html
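A variation on this idea uses a fixed-width hash instead of cksum: md5sum prints a 32-character digest, so GNU `uniq -w32` can group lines by the digest column alone, and `-D` prints every member of each duplicate group. A sketch (the temporary directory and sample files are just for demonstration):

```shell
#!/bin/sh
# Demo: find + md5sum, grouped with uniq. md5sum's digest is always
# 32 characters wide, so uniq -w32 compares just that column; -D
# prints every line that belongs to a duplicate group (GNU uniq).
dir=$(mktemp -d)
cd "$dir"
printf 'same content\n' > home.html
printf 'same content\n' > index.html
printf 'other content\n' > backup.html

find . -name "*.html" -exec md5sum {} + | sort | uniq -w32 -D
# prints the ./home.html and ./index.html lines
```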
4. Use the fslint command
The fslint command can be used specifically to find duplicate files. Note that we have to give it a starting directory; if it has to scan a large number of files, it may take quite a while to finish.
$ fslint .
-----------------------------------file name lint
-------------------------------Invalid utf8 names
-----------------------------------file case lint
----------------------------------DUPlicate files <==
home.html
index.html
-----------------------------------Dangling links
--------------------redundant characters in links
------------------------------------suspect links
--------------------------------Empty Directories
./.gnupg
----------------------------------Temporary Files
----------------------duplicate/conflicting Names
------------------------------------------Bad ids
-------------------------Non Stripped executables
Tip: fslint must be installed on the system, and it also needs to be added to your search path:
$ export PATH=$PATH:/usr/share/fslint/fslint
5. Use the rdfind command
The rdfind command, short for "redundant data find", also finds duplicate files (files with identical content). It uses a set of ranking rules to decide which copy counts as the original, which is helpful when deleting duplicates, because only the other copies are removed.
$ rdfind ~
Now scanning "/home/alvin", found 12 files.
Now have 12 files in total.
Removed 1 files due to nonunique device and inode.
Total size is 699498 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
It seems like you have 2 files that are not unique
Totally, 223 KiB can be reduced.
Now making results file results.txt
We can also do a dry run first:
$ rdfind -dryrun true ~
(DRYRUN MODE) Now scanning "/home/alvin", found 12 files.
(DRYRUN MODE) Now have 12 files in total.
(DRYRUN MODE) Removed 1 files due to nonunique device and inode.
(DRYRUN MODE) Total size is 699352 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
(DRYRUN MODE) It seems like you have 2 files that are not unique
(DRYRUN MODE) Totally, 223 KiB can be reduced.
(DRYRUN MODE) Now making results file results.txt
The rdfind command also offers options such as -ignoreempty (ignore empty files) and -followsymlinks (follow symbolic links). Its common options are explained below.
Option | Meaning |
---|---|
-ignoreempty | Ignore empty files |
-minsize | Ignore files smaller than a given size |
-followsymlinks | Follow symbolic links |
-removeidentinode | Remove files that refer to the same inode |
-checksum | Identify the checksum type to use |
-deterministic | Decide how to sort files |
-makesymlinks | Replace duplicate files with symbolic links |
-makehardlinks | Replace duplicate files with hard links |
-makeresultsfile | Create a results file in the current directory |
-outputname | Provide a name for the results file |
-deleteduplicates | Delete/unlink duplicate files |
-sleep | Set the sleep time (in ms) between reading files |
-n, -dryrun | Display what would be done, without actually doing it |
Note that rdfind provides the option -deleteduplicates true for removing duplicate files. As the name suggests, with this option set it will delete the duplicates automatically.
$ rdfind -deleteduplicates true .
...
Deleted 1 files. <==
Of course, rdfind must also be installed on the system.
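Before letting rdfind delete anything, a cautious workflow is to dry-run, review results.txt, and only then act. A sketch, guarded so it does nothing when rdfind is not installed; the throwaway directory and file names are just for demonstration:

```shell
#!/bin/sh
# Demo in a throwaway directory; skipped entirely if rdfind is absent.
dir=$(mktemp -d)
printf 'same content\n' > "$dir/a.txt"
printf 'same content\n' > "$dir/b.txt"

if command -v rdfind >/dev/null 2>&1; then
  cd "$dir"
  rdfind -dryrun true .            # report only; nothing is touched
  cat results.txt                  # review the planned changes
  rdfind -deleteduplicates true .  # now delete the duplicates for real
fi
```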
6. Use the fdupes command
The fdupes command also identifies duplicate files very easily, and it provides a number of useful options. In the simplest invocation, it groups duplicate files together, as follows:
$ fdupes ~
/home/alvin/UPGRADE
/home/alvin/mytwin
/home/alvin/lp.txt
/home/alvin/lp.man
/home/alvin/penguin.png
/home/alvin/penguin0.png
/home/alvin/hideme.png
The -r option stands for recursion: fdupes will recursively look for duplicate files under each directory. Be careful, though; on Linux many duplicated files are important (such as users' .bashrc and .profile files), and deleting them will break things.
# fdupes -r /home
/home/shark/home.html
/home/shark/index.html
/home/dory/.bashrc
/home/eel/.bashrc
/home/nemo/.profile
/home/dory/.profile
/home/shark/.profile
/home/nemo/tryme
/home/shs/tryme
/home/shs/arrow.png
/home/shs/PNGs/arrow.png
/home/shs/11/files_11.zip
/home/shs/ERIC/file_11.zip
/home/shs/penguin0.jpg
/home/shs/PNGs/penguin.jpg
/home/shs/PNGs/penguin0.jpg
/home/shs/Sandra_rotated.png
/home/shs/PNGs/Sandra_rotated.png
Common fdupes options are shown in the following table:
Option | Meaning |
---|---|
-r --recurse | Recurse into subdirectories |
-R --recurse: | Recurse into the directories given after this option |
-s --symlinks | Follow symlinked directories |
-H --hardlinks | Treat hard-linked files as duplicates |
-n --noempty | Ignore empty files |
-f --omitfirst | Omit the first file in each set of matches |
-A --nohidden | Ignore hidden files |
-1 --sameline | List each set of matches on a single line |
-S --size | Show the size of the duplicate files |
-m --summarize | Summarize duplicate file information |
-q --quiet | Hide the progress indicator |
-d --delete | Prompt the user for which files to keep, deleting the others |
-N --noprompt | Together with --delete: keep the first file in each set without prompting |
-I --immediate | Delete duplicates as they are encountered |
-p --permissions | Do not treat files with different owner/group or permission bits as duplicates |
-o --order=WORD | Order files according to WORD |
-i --reverse | Reverse the order while sorting |
-v --version | Display the fdupes version |
-h --help | Display help |
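fdupes is easy to try out safely in a scratch directory before pointing it at real data. A sketch, guarded so it does nothing when fdupes is not installed; the throwaway directory and file names are just for demonstration:

```shell
#!/bin/sh
# Demo: group duplicates with fdupes in a throwaway directory;
# skipped entirely if fdupes is not installed.
dir=$(mktemp -d)
printf 'same content\n' > "$dir/a.txt"
printf 'same content\n' > "$dir/b.txt"
printf 'other content\n' > "$dir/c.txt"

if command -v fdupes >/dev/null 2>&1; then
  fdupes "$dir"       # lists a.txt and b.txt as one duplicate group
  fdupes -m "$dir"    # summary of how much space the duplicates waste
fi
```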
Summary
Linux provides us with plenty of tools for locating and removing duplicate files. With these tools we can quickly find the duplicates on the disk and delete them. I hope this share helps you!
-----------------
Xu Liang, Linux development engineer at a Fortune Global 500 company and Linux evangelist. You are welcome to follow my public account "Liang Xu Linux", full of solid content: technical tips, exclusive material, and community Q&A. If you are interested in my topics, you can also follow my blog: lxlinux.net