Sometimes a copy of a file amounts to a huge waste of disk space and causes problems when you want to update the file. The following six commands can help you identify such files.
In a recent post, we looked at how to identify and locate hard-linked files (that is, files that point to the same inode and share content and disk space). In this article, we will look at commands for finding files that have the same content but are not linked.
Hard links are useful because they let a file appear in multiple places in the file system without taking up any extra disk space. Copies of files, on the other hand, can represent a big waste of disk space and pose a risk of confusion when you want to update a file. In this article, we look at several ways to identify these files.
Comparing files with the diff command
The easiest way to compare two files is to use the diff command. The output will show you the differences between the files. The < and > symbols indicate whether the extra lines are in the first (<) or second (>) file. In this example, the extra lines are in backup.html.
$ diff index.html backup.html
2438a2439,2441
> <pre>
> That's all there is to report.
> </pre>
If diff produces no output, it means the two files are identical.
$ diff home.html index.html
$
The only drawback of diff is that it can only compare two files, and you must specify the files to compare; some of the commands in this post can find multiple duplicate files for you.
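If all you want to know is whether two files differ, not how, diff's -q (brief) option reduces the output to a single line, which is handy in scripts. Using the same sample files as above:
$ diff -q home.html index.html
$ diff -q index.html backup.html
Files index.html and backup.html differ
The first command prints nothing because the files are identical; the second reports the mismatch.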
Using checksums
The cksum (checksum) command computes checksums for files. A checksum is a mathematical reduction of a file's contents to a long number (such as 2819078353). While checksums are not entirely unique, the chance of files with different contents having the same checksum is extremely small.
$ cksum *.html
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html
In the example above, you can see that the second and third files yield the same checksum, so we can assume they are identical.
Using the find command
While the find command has no option for finding duplicate files, it can still be used to search for files by name or type and to run the cksum command on them. For example:
$ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./home.html
2819078353 228029 ./backup.html
4073570409 227985 ./index.html
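find by itself only lists a checksum for each file; to have the duplicates picked out for you, you can sort the output and let uniq group the matching hashes. Here is a minimal sketch using md5sum, whose fixed-width 32-character hashes make uniq's -w option reliable (the path and name pattern are just placeholders):
$ find . -name "*.html" -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate
Each set of identical files is printed together, with blank lines separating the sets.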
Using the fslint command
The fslint command can be used to specifically find duplicate files. Note that we give it a starting location. It can take a while to complete if it has to run through a large number of files. Notice how it lists the duplicate files and also looks for other problems, such as empty directories and bad IDs.
$ fslint .
-----------------------------------file name lint
-------------------------------Invalid utf8 names
-----------------------------------file case lint
----------------------------------DUPlicate files <==
home.html
index.html
-----------------------------------Dangling links
--------------------redundant characters in links
------------------------------------suspect links
--------------------------------Empty Directories
./.gnupg
----------------------------------Temporary Files
----------------------duplicate/conflicting Names
------------------------------------------Bad ids
-------------------------Non Stripped executables
You may need to install fslint on your system. You may also need to add it to your command search path:
$ export PATH=$PATH:/usr/share/fslint/fslint
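fslint is actually a collection of scripts, so if duplicates are all you care about, you can call its findup component directly rather than running the full battery of tests. Assuming the same install location as in the export above, and the sample files used earlier, it should report the same pair:
$ /usr/share/fslint/fslint/findup .
home.html
index.html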
Using the rdfind command
The rdfind command will also look for duplicate (same content) files. The name stands for "redundant data find", and the command is able to determine, based on file dates, which files are the originals. This is helpful if you choose to delete the duplicates, since it will remove the newer files.
$ rdfind ~
Now scanning "/home/shark", found 12 files.
Now have 12 files in total.
Removed 1 files due to nonunique device and inode.
Total size is 699498 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
It seems like you have 2 files that are not unique
Totally, 223 KiB can be reduced.
Now making results file results.txt
You can also run the command in dryrun mode (in other words, only report the changes that would otherwise be made).
$ rdfind -dryrun true ~
(DRYRUN MODE) Now scanning "/home/shark", found 12 files.
(DRYRUN MODE) Now have 12 files in total.
(DRYRUN MODE) Removed 1 files due to nonunique device and inode.
(DRYRUN MODE) Total size is 699352 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
(DRYRUN MODE) It seems like you have 2 files that are not unique
(DRYRUN MODE) Totally, 223 KiB can be reduced.
(DRYRUN MODE) Now making results file results.txt
The rdfind command also provides options for things such as ignoring empty files (-ignoreempty) and following symbolic links (-followsymlinks). Check the man page for explanations.
-ignoreempty ignore empty files
-minsize ignore files smaller than specified size
-followsymlinks follow symbolic links
-removeidentinode remove files referring to identical inode
-checksum identify checksum type to be used
-deterministic determines how to sort files
-makesymlinks turn duplicate files into symbolic links
-makehardlinks replace duplicate files with hard links
-makeresultsfile create a results file in the current directory
-outputname provide name for results file
-deleteduplicates delete/unlink duplicate files
-sleep set sleep time between reading files (milliseconds)
-n, -dryrun display what would have been done, but don't do it
Note that the rdfind command also offers the option to delete duplicate files with the -deleteduplicates true setting. Hopefully the command's slightly quirky syntax won't annoy you. ;-)
$ rdfind -deleteduplicates true .
...
Deleted 1 files. <==
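If you would rather reclaim the space without removing any file names, the -makehardlinks option from the list above replaces each duplicate with a hard link to the original instead of deleting it. A sketch, previewed first in dryrun mode:
$ rdfind -dryrun true -makehardlinks true .
$ rdfind -makehardlinks true .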
You will probably need to install the rdfind command on your system. It is probably a good idea to experiment with it to become familiar with how to use it.
Using the fdupes command
The fdupes command also makes it easy to identify duplicate files and provides a number of useful options, such as -r for recursing into subdirectories. In its simplest form, it groups duplicate files together, like this:
$ fdupes ~
/home/shs/UPGRADE
/home/shs/mytwin

/home/shs/lp.txt
/home/shs/lp.man

/home/shs/penguin.png
/home/shs/penguin0.png
/home/shs/hideme.png
Here is an example using the -r (recurse) option. Note that many of the duplicate files are important (such as users' .bashrc and .profile files) and should clearly not be deleted.
# fdupes -r /home
/home/shark/home.html
/home/shark/index.html

/home/dory/.bashrc
/home/eel/.bashrc

/home/nemo/.profile
/home/dory/.profile
/home/shark/.profile

/home/nemo/tryme
/home/shs/tryme

/home/shs/arrow.png
/home/shs/PNGs/arrow.png

/home/shs/11/files_11.zip
/home/shs/ERIC/file_11.zip

/home/shs/penguin0.jpg
/home/shs/PNGs/penguin.jpg
/home/shs/PNGs/penguin0.jpg

/home/shs/Sandra_rotated.png
/home/shs/PNGs/Sandra_rotated.png
Many of fdupes' options are listed below. Use the fdupes -h command, or read the man page, for details.
-r --recurse recurse
-R --recurse: recurse through specified directories
-s --symlinks follow symlinked directories
-H --hardlinks treat hard links as duplicates
-n --noempty ignore empty files
-f --omitfirst omit the first file in each set of matches
-A --nohidden ignore hidden files
-1 --sameline list matches on a single line
-S --size show size of duplicate files
-m --summarize summarize duplicate files information
-q --quiet hide progress indicator
-d --delete prompt user for files to preserve
-N --noprompt when used with --delete, preserve the first file in set
-I --immediate delete duplicates as they are encountered
-p --permissions don't consider files with different owner/group or
permission bits as duplicates
-o --order=WORD order files according to specification
-i --reverse reverse order while sorting
-v --version display fdupes version
-h --help displays help
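For example, combining -r with -m from the list above gives a quick measure of how much space your duplicates are wasting, while -d prompts you about which copy in each set to keep (the paths here are only illustrative):
$ fdupes -r -m /home
$ fdupes -r -d ~/PNGs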
fdupes is another command that you may have to install and experiment with for a while to become familiar with its many options.
Summary
Linux systems provide a good collection of tools for locating and (potentially) removing duplicate files, along with options that let you specify where to search and what to do with duplicate files when you find them.
via: www.networkworld.com/article/339…
Author: Sandra Henry-Stocker Topic: lujun9972 Translator: tomjlw Proofreader: wxy
This article was originally translated by LCTT and is proudly presented by Linux China.
Reprinted from: https://juejin.im/post/5cfe74985188254ee433c032