Linux uniq command usage

uniq command:
  Checks a text file (or standard ASCII input) for duplicate lines. It is commonly used when analyzing logs, for example to count TCP connections in each state or to rank IPs or domain names by number of connections, and it is generally used together with the sort command.
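For example, the TCP connection-state ranking mentioned above is usually built from a pipeline like this (a rough sketch; the position of the state field depends on your netstat output):

[root@bqh-118 ~]# netstat -ant | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn    # count connections per state, highest first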
 
Format:
uniq [option]... [file1] [file2]
uniq removes duplicate lines from the already-sorted text file file1 and writes the result to file2 or to standard output; it is often used as a filter together with a pipe. Before using the uniq command, you must make sure the text file has been sorted with sort, because uniq only compares adjacent lines; run with no options, uniq simply removes adjacent duplicate lines.
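As a minimal sketch of the two-file form (the file names here are made up for illustration): sort the file first, then let uniq write the deduplicated result into a second file instead of standard output.

[root@bqh-118 ~]# sort access.txt -o access.txt      # sort the file in place first
[root@bqh-118 ~]# uniq access.txt access.uniq        # deduplicated lines are written to access.uniq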
 
Common parameters:
[root@bqh-118 ~]# uniq --help

Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines (lines that occur 2 or more times)
  -D, --all-repeated[=delimit-method]  print all duplicate lines
                          delimit-method={none(default),prepend,separate}
                          delimiting is done with blank lines
  -f, --skip-fields=N   avoid comparing the first N fields
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -z, --zero-terminated  end lines with 0 byte, not newline
  -w, --check-chars=N   compare no more than N characters in lines
      --help            display this help and exit
      --version         display version information and exit

A field is a run of whitespace (usually spaces and/or TABs) followed by non-blank characters; fields are skipped before characters.

Tip: uniq does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
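One option that is easy to overlook and is not demonstrated in the tests below is -w; a quick sketch (assuming GNU uniq, with the input generated inline by printf purely for illustration):

[root@bqh-118 ~]# printf 'abc1\nabc2\nabd3\n' | uniq -w3 -c   # compare only the first 3 characters of each line
      2 abc1
      1 abd3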

Tests:

With no arguments, uniq only deduplicates adjacent identical lines:

[root@bqh-118 ~]# cat qc.log
192.168.43.117
192.168.43.119
192.168.43.118
192.168.43.118
192.168.43.117
192.168.43.117
192.168.43.119
192.168.43.110
[root@bqh-118 ~]# uniq qc.log
192.168.43.117
192.168.43.119
192.168.43.118
192.168.43.117
192.168.43.119
192.168.43.110

Now sort the file so that duplicate lines become adjacent:

[root@bqh-118 ~]# sort qc.log 
192.168.43.110
192.168.43.117
192.168.43.117
192.168.43.117
192.168.43.118
192.168.43.118
192.168.43.119
192.168.43.119

Combine sort with uniq to deduplicate:

[root@bqh-118 ~]# sort qc.log |uniq
192.168.43.110
192.168.43.117
192.168.43.118
192.168.43.119
[root@bqh-118 ~]# sort -u qc.log 
192.168.43.110
192.168.43.117
192.168.43.118
192.168.43.119

Of course, as shown above, we can also deduplicate directly with sort -u file.

Deduplicate and count occurrences:

[root@bqh-118 ~]# sort qc.log |uniq -c
      1 192.168.43.110
      3 192.168.43.117
      2 192.168.43.118
      2 192.168.43.119
[root@bqh-118 ~]# sort qc.log 
192.168.43.110
192.168.43.117
192.168.43.117
192.168.43.117
192.168.43.118
192.168.43.118
192.168.43.119
192.168.43.119

Show only the lines that are duplicated (-d):

[root@bqh-118 ~]# sort qc.log |uniq -d
192.168.43.117
192.168.43.118
192.168.43.119

Show every occurrence of the duplicated lines (-D):

[root@bqh-118 ~]# sort qc.log |uniq -D
192.168.43.117
192.168.43.117
192.168.43.117
192.168.43.118
192.168.43.118
192.168.43.119
192.168.43.119
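Two related options from the help above, sketched here on the same data: -u prints only the lines that occur exactly once, and --all-repeated=separate separates each group of duplicates with a blank line.

[root@bqh-118 ~]# sort qc.log |uniq -u
192.168.43.110
[root@bqh-118 ~]# sort qc.log |uniq --all-repeated=separate
192.168.43.117
192.168.43.117
192.168.43.117

192.168.43.118
192.168.43.118

192.168.43.119
192.168.43.119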

Remove duplicate entries ignoring case (-i):

[root@bqh-118 ~]# cat qc1.log 
apple
APple
BANAN
banan
grape
orange
Orange
bqh  jyw
bqh1 jyw
[root@bqh-118 ~]# uniq -i qc1.log 
apple
BANAN
grape
orange
bqh  jyw
bqh1 jyw

Skip the first field when comparing (-f1). The single-word lines have nothing left to compare once that field is skipped, so they all merge into apple; bqh  jyw and bqh1 jyw stay separate because the whitespace before jyw differs:

[root@bqh-118 ~]# uniq -f1 qc1.log 
apple
bqh  jyw
bqh1 jyw

Skip the first character of each line when comparing (-s1); here only orange and Orange become identical ("range") once the first character is dropped, so they are merged:

[root@bqh-118 ~]# uniq -s1 qc1.log 
apple
APple
BANAN
banan
grape
orange
bqh  jyw
bqh1 jyw

Example: process the contents of the qc2.log file, extract the domain name from each URL, count the occurrences, and sort the results by count.

[root@bqh-118 ~]# cat  qc2.log
http://www.baidu.com
http://www.xiaobai.com
http://www.etiantian.org
http://www.jyw.com
http://www.jyw.com
http://www.xiaobai.com
http://www.etiantian.org
http://www.jyw.com
http://www.baidu.com
http://www.baidu.com
http://www.jyw.com
http://www.etiantian.org
[root@bqh-118 ~]# awk -F / '{print $3}' qc2.log|sort|uniq -c|sort -r
      4 www.jyw.com
      3 www.etiantian.org
      3 www.baidu.com
      2 www.xiaobai.com
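A small caveat: sort -r here compares the lines as plain text, which happens to work because all the counts are single digits; sorting numerically on the count is more robust once the counts have different widths (a sketch of the same pipeline):

[root@bqh-118 ~]# awk -F / '{print $3}' qc2.log|sort|uniq -c|sort -rn    # -n sorts by the numeric count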

Method two: using cut

[root@bqh-118 ~]# cut -d / -f3 qc2.log |sort -r|uniq -c
      2 www.xiaobai.com
      4 www.jyw.com
      3 www.etiantian.org
      3 www.baidu.com
[root@bqh-118 ~]# cut -d / -f3 qc2.log |sort -r|uniq -c|sort -r
      4 www.jyw.com
      3 www.etiantian.org
      3 www.baidu.com
      2 www.xiaobai.com

Of course, there are other ways to do this; the above are just the commonly used methods, described briefly.
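One such alternative, sketched here assuming GNU awk and sort (the relative order of lines with equal counts may differ), is to count inside awk itself and skip uniq entirely:

[root@bqh-118 ~]# awk -F / '{cnt[$3]++} END{for (d in cnt) print cnt[d], d}' qc2.log | sort -rn
4 www.jyw.com
3 www.etiantian.org
3 www.baidu.com
2 www.xiaobai.com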

Source: www.cnblogs.com/su-root/p/10994482.html