Linux Command uniq: Removing Duplicate Entries

This article introduces the uniq command. uniq is one of the Linux commands commonly used in pipelines, and its main function is to remove duplicate lines.

Before introducing the uniq command, let's create a file, /tmp/uniq.txt, to use in the following examples. Its content is as follows:

 

By default, uniq only removes duplicate lines that are adjacent to each other. In /tmp/uniq.txt, "onmpw web site" appears three times, but one occurrence is not adjacent to the other two, so only one copy is removed. The same adjacency rule applies to "error php function".

Because of this mechanism, uniq is normally used together with the sort command.
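The listing of /tmp/uniq.txt is missing from the text above. The sketch below rebuilds a plausible version of it, assuming the file is the one printed later by cat 1.txt minus the "Error PHP function" line that is added at that point. That content is a guess, not the author's confirmed file, and a temporary file stands in for /tmp/uniq.txt.

```shell
# Assumed content, reconstructed from the later "cat 1.txt" listing.
f=$(mktemp)
cat > "$f" <<'EOF'
alpha css web
cat linux command
error php function
hello world
onmpw web site
onmpw web site
wello web site
recruise page site
error php function
repeat no data
onmpw web site
EOF

# Plain uniq collapses only runs of identical neighbouring lines.
deduped=$(uniq "$f")
printf '%s\n' "$deduped"

# Count how many copies of each repeated line survive.
onmpw_left=$(printf '%s\n' "$deduped" | grep -c '^onmpw web site$')
error_left=$(printf '%s\n' "$deduped" | grep -c '^error php function$')
echo "onmpw web site: $onmpw_left, error php function: $error_left"
rm -f "$f"
```

Under that assumption, two copies of each repeated line remain, because only the two neighbouring "onmpw web site" lines are merged.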

sort 1.txt | uniq

alpha css web
cat linux command
error php function
hello world
onmpw web site
recruise page site
repeat no data
wello web site

Now all the duplicates have been removed.

After this quick warm-up, let's briefly go through the options of the uniq command.

-c  count the number of times each line is repeated
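As a side note, sort can do this deduplication itself with its -u option. The sketch below uses a small made-up sample (not the article's file) in a temporary file and checks that both pipelines give the same lines; LC_ALL=C is set only to make the sort order predictable.

```shell
f=$(mktemp)
# Hypothetical sample data with non-adjacent duplicates.
printf '%s\n' 'onmpw web site' 'hello world' 'onmpw web site' 'hello world' > "$f"

# LC_ALL=C pins the sort order so the result does not depend on the locale.
via_uniq=$(LC_ALL=C sort "$f" | uniq)
via_sort_u=$(LC_ALL=C sort -u "$f")

printf '%s\n' "$via_uniq"
[ "$via_uniq" = "$via_sort_u" ] && echo "same result"
rm -f "$f"
```

sort -u is enough when only the deduplicated lines are needed; the separate uniq step is still required for the options described below, such as -c or -d.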

sort 1.txt | uniq -c

1 alpha css web
1 cat linux command
2 error php function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site

We can see that "error php function" appears twice and "onmpw web site" appears three times. The remaining lines have no duplicates, so their count is 1.
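A common use of -c is to rank lines by how often they occur. The following is a minimal sketch on made-up sample data: the counts from uniq -c are sorted numerically in reverse order, and sed strips the leading padding that uniq puts in front of each count.

```shell
f=$(mktemp)
# Hypothetical sample data.
printf '%s\n' 'onmpw web site' 'hello world' 'onmpw web site' \
    'error php function' 'onmpw web site' 'error php function' > "$f"

# Count each distinct line, then list the most frequent first.
ranked=$(LC_ALL=C sort "$f" | uniq -c | LC_ALL=C sort -rn | sed 's/^ *//')
printf '%s\n' "$ranked"
rm -f "$f"
```

The most frequent line comes first in the result.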

-i  ignore case

Add the line "Error PHP function" to 1.txt:

cat 1.txt

alpha css web
cat linux command
error php function
hello world
onmpw web site
onmpw web site
wello web site
Error PHP function
recruise page site
error php function
repeat no data
onmpw web site

sort 1.txt | uniq -c

1 alpha css web
1 cat linux command
2 error php function
1 Error PHP function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site

As the results show, uniq is case-sensitive by default. The -i option makes it ignore case.

 

sort 1.txt | uniq -c -i

1 alpha css web
1 cat linux command
3 error php function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site

Now the difference in case is ignored.

-u  output only the lines that are not repeated
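One thing to watch for: uniq -i still needs the lines that differ only in case to be adjacent, and whether plain sort puts them next to each other depends on the locale. The sketch below uses made-up data and forces the C locale, where uppercase letters sort before lowercase ones, to show the difference that sort -f (fold case while sorting) makes.

```shell
f=$(mktemp)
# Hypothetical sample: two lines differing only in case, plus one that
# sorts between them in the C locale.
printf '%s\n' 'error php function' 'Foo bar' 'Error PHP function' > "$f"

# Plain sort in the C locale separates "Error ..." and "error ...".
plain=$(LC_ALL=C sort "$f" | uniq -i | wc -l | tr -d ' ')
# sort -f compares case-insensitively, so the two lines end up adjacent.
folded=$(LC_ALL=C sort -f "$f" | uniq -i | wc -l | tr -d ' ')

echo "lines after plain sort: $plain, after sort -f: $folded"
rm -f "$f"
```

In many UTF-8 locales plain sort already places such lines together, which is presumably why the article's example works without -f.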

sort 1.txt | uniq -iu

alpha css web
cat linux command
hello world
recruise page site
repeat no data
wello web site

As expected, "error php function" and "onmpw web site" do not appear in the output.

-w N  compare only the first N characters of each line when checking for duplicates.

sort 1.txt | uniq -iw 2
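The same behaviour on a smaller, made-up sample, as a sketch: every line that occurs more than once disappears completely, not just its extra copies.

```shell
f=$(mktemp)
# Hypothetical sample data: "hello world" occurs twice.
printf '%s\n' 'hello world' 'onmpw web site' 'hello world' 'repeat no data' > "$f"

# -u keeps only the lines that occur exactly once after sorting.
only_once=$(LC_ALL=C sort "$f" | uniq -u)
printf '%s\n' "$only_once"
rm -f "$f"
```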

alpha css web
cat linux command
error php function
hello world
onmpw web site
recruise page site
wello web site

Here uniq compares only the first two characters of each line. "repeat" and "recruise" both start with "re", so these two lines are also treated as duplicates.

-f N  skip the first N fields and start comparing from field N+1. Fields are separated by tabs or spaces.
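A typical use of -w is grouping lines by a fixed-width prefix. The sketch below assumes hypothetical log lines that start with a 10-character date and counts the entries per day. Note that -w is a GNU coreutils extension and may be missing from other uniq implementations.

```shell
f=$(mktemp)
# Hypothetical log lines: a 10-character date, then a message.
printf '%s\n' '2024-01-02 logout' '2024-01-01 login' '2024-01-01 error' \
    '2024-01-02 login' '2024-01-02 error' > "$f"

# Compare only the first 10 characters (the date) and count each group;
# awk then prints the date followed by its count.
per_day=$(LC_ALL=C sort "$f" | uniq -c -w 10 | awk '{print $2, $1}')
printf '%s\n' "$per_day"
rm -f "$f"
```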

 

sort 1.txt | uniq -icf 2

1 alpha css web
1 cat linux command
3 error php function
1 hello world
4 onmpw web site
1 repeat no data
1 wello web site

As the results show, uniq skips the first two fields and compares lines from the third field onward. "recruise page site" and "onmpw web site" have the same third field, so they are treated as the same data. But "wello web site" matches "onmpw web site" in both the second and the third field, so why is it not counted as a duplicate of "onmpw web site"? The answer goes back to the point made earlier: uniq only detects duplicates among adjacent lines, and after sorting, "repeat no data" sits between them.

 

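To solve this problem we need to change the sort order. Remember the -k option of the sort command? That is what we will use.

sort -k 2 1.txt | uniq -icf 2

1 alpha css web
1 cat linux command
1 repeat no data
1 recruise page site
3 error php function
4 onmpw web site
1 hello world

As we can see, "wello web site" is now counted together with "onmpw web site". Note that "recruise page site" is no longer adjacent to those lines, so it is counted separately; sorting on the third field (sort -k 3) would bring all lines ending in "site" together.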
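To group strictly by the third field, the sort key has to start at that same field. The following sketch uses a few lines modelled on the article's data (a made-up subset, not the full file) and sorts with -k 3 before counting with -f 2.

```shell
f=$(mktemp)
# Hypothetical subset: three lines whose third field is "site", one with "web".
printf '%s\n' 'onmpw web site' 'alpha css web' 'recruise page site' 'wello web site' > "$f"

# Sort from the third field onward so equal third fields become adjacent,
# then skip two fields when comparing and count each group.
by_third=$(LC_ALL=C sort -k 3 "$f" | uniq -c -f 2 | sed 's/^ *//')
printf '%s\n' "$by_third"
rm -f "$f"
```

uniq prints the first line of each group, so the group of three is represented by whichever "site" line sorts first.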

-s N  skip the first N characters. No example is given here, because this option works almost the same way as -f N. The only difference is that -f N skips the first N fields, while -s N skips the first N characters.

-d  print only duplicated lines, showing one line for each group of duplicates.
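Since no example is given for -s, here is a minimal sketch on made-up data: each line starts with a one-digit ID and a space, and -s 2 tells uniq to ignore those two characters when comparing.

```shell
f=$(mktemp)
# Hypothetical data: a one-digit ID and a space, then a name.
printf '%s\n' '1 apple' '2 apple' '3 pear' '4 apple' > "$f"

# Skip the first 2 characters, so only the names are compared.
# The input is deliberately left unsorted to keep the IDs in order.
skipped=$(uniq -s 2 "$f")
printf '%s\n' "$skipped"
rm -f "$f"
```

"2 apple" is dropped as a duplicate of "1 apple", while "4 apple" stays because it is not adjacent to the other "apple" lines.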

sort -k 2 1.txt | uniq -idw 2

repeat no data
error php function
onmpw web site

Only these three lines are output. "repeat no data" appears because of -w 2: it shares its first two characters with "recruise page site", so the two lines count as duplicates.

-D  print all duplicated lines
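The options -d and -u are complementary: every distinct line is reported by exactly one of them. The sketch below checks this on a small made-up sample, without -i or -w so that only exact duplicates are involved.

```shell
f=$(mktemp)
# Hypothetical sample data.
printf '%s\n' 'hello world' 'onmpw web site' 'hello world' \
    'repeat no data' 'onmpw web site' 'onmpw web site' > "$f"

sorted=$(LC_ALL=C sort "$f")
# One representative line per group of duplicates.
dups=$(printf '%s\n' "$sorted" | uniq -d)
printf '%s\n' "$dups"

# Line counts for plain uniq, -d and -u.
n_all=$(printf '%s\n' "$sorted" | uniq | wc -l | tr -d ' ')
n_dup=$(printf '%s\n' "$sorted" | uniq -d | wc -l | tr -d ' ')
n_uni=$(printf '%s\n' "$sorted" | uniq -u | wc -l | tr -d ' ')
echo "distinct: $n_all, duplicated: $n_dup, unique: $n_uni"
rm -f "$f"
```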

sort -k 2 1.txt | uniq -iDw 2

repeat no data
recruise page site
error php function
error php function
Error PHP function
onmpw web site
onmpw web site
onmpw web site
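For comparison with -d, here is a sketch of -D on a small made-up sample. It assumes GNU coreutils, since -D is not part of the POSIX uniq options and may not exist elsewhere.

```shell
f=$(mktemp)
# Hypothetical sample data.
printf '%s\n' 'hello world' 'onmpw web site' 'hello world' \
    'repeat no data' 'onmpw web site' 'onmpw web site' > "$f"

# -D prints every member of each duplicate group, not just one line per group.
all_dups=$(LC_ALL=C sort "$f" | uniq -D)
printf '%s\n' "$all_dups"
n_lines=$(printf '%s\n' "$all_dups" | wc -l | tr -d ' ')
rm -f "$f"
```

All copies of "hello world" and "onmpw web site" are printed, while "repeat no data" is left out because it occurs only once.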

That covers all the commonly used options of the uniq command. For more detailed information, run info uniq.

I hope this article is helpful.

Origin www.cnblogs.com/lee-qi/p/11440518.html