Whether you use it at work or need it for an interview, sort is one of the basic Linux commands worth mastering. The -k option in particular often causes confusion, so this post takes a careful look at the sort command.
sort sorts the contents of files.
Syntax:
sort [-bcdfimMnr][-o<output file>][-t<separator>][+<start field>-<end field>][--help][--version][file]
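Options:
-b: ignore leading blanks; when the number of spaces varies, this option is almost always needed (the numeric-style options -n, -h and -M skip leading blanks on their own; plain character comparison does not)
-c: check whether the file is already sorted; if it is not, report the first line that is out of order
-C: like "-c", but print no diagnostics; an exit status of 1 indicates that the file is not sorted
-d: consider only letters, digits and blanks, ignoring all other characters
-f: treat lowercase letters as uppercase
-h: compare human-readable numbers (e.g. 2K, 1G)
-i: ignore characters outside the ASCII range 040 to 176 octal, i.e. nonprinting characters such as backspace, form feed and carriage return
-k: which field (key) to sort by
-m: merge several already-sorted files; this only merges and does not sort
-M: compare the first three letters as month abbreviations
-n: sort by numeric value
-o<output file>: write the sorted result to the given file
-r: descending order
-u: output only one of each run of lines that compare equal
-t<separator>: set the field separator; by default, fields are separated by the empty string between a non-blank character and a blank character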
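The interaction between the default field separation and -b can be seen on a tiny input. The following is a minimal sketch, not part of the original post: the two-line input is made up, and LC_ALL=C is set as an assumption so that comparison is byte by byte regardless of the reader's locale.

```shell
# Sketch only: the two-line input below is invented for illustration.
export LC_ALL=C                  # byte-wise comparison, independent of the locale

input=$(printf 'x  b\ny a\n')    # line 1 has two spaces before its second field

# Without -t, a field keeps its leading blanks: field 2 is "  b" on line 1
# and " a" on line 2, and a space sorts before the letter "a".
without_b=$(printf '%s\n' "$input" | sort -k2,2)

# The b modifier on the key start skips those blanks, so "b" is compared with "a".
with_b=$(printf '%s\n' "$input" | sort -k2b,2)

printf 'without b:\n%s\nwith b:\n%s\n' "$without_b" "$with_b"
```

Without the modifier the line "x  b" comes first; with it, "y a" does.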
Rather than walking through each option in the abstract, go straight to examples. First, the raw data (the columns are tab-separated, as the -t $'\t' in the later commands assumes): cat sort.log
a mac 2000 500 2K
d winxp 4000 300 3G
e bsd 1000 600 4M
b linux 1000 200 5K
f SUSE 4000 300 6M
g winxp 500 300 3G
c win7 2000 100 7G
c Debian 600 200 8K
1. Report where the file first goes out of order:
sort -c sort.log; echo $?
sort: sort.log:4: disorder: b linux 1000 200 5K
1
sort -C sort.log; echo $?
1
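In a script, the exit status is usually tested directly rather than echoed. A minimal sketch, assuming a throwaway temp file in place of sort.log (the file contents and variable names are made up for illustration):

```shell
# Sketch only: a made-up four-line file stands in for sort.log.
export LC_ALL=C
tmp=$(mktemp)
printf 'a\nd\ne\nb\n' > "$tmp"

# sort -C prints nothing; only its exit status says whether the input is sorted.
if sort -C "$tmp"; then before=sorted; else before=unsorted; fi

# -o may name the input file itself, so this sorts the file in place.
sort -o "$tmp" "$tmp"

if sort -C "$tmp"; then after=sorted; else after=unsorted; fi

echo "before: $before, after: $after"
rm -f "$tmp"
```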
An exit status of 1 indicates that the file is not sorted.
2. Default sort (whole lines in ascending ASCII order):
sort sort.log
a mac 2000 500 2K
b linux 1000 200 5K
c Debian 600 200 8K
c win7 2000 100 7G
d winxp 4000 300 3G
e bsd 1000 600 4M
f SUSE 4000 300 6M
g winxp 500 300 3G
3. Now for the part that confuses people: the -k syntax. First, the syntax itself:
[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ][ Modifier ] ]
The comma (",") splits this syntax into a Start part and an End part, each made up of three pieces. Modifier is an option letter such as n or r and may be omitted. FStart and FEnd are field numbers. CStart is the character position within FStart at which the key begins, and CEnd is the character position within FEnd at which it ends. .CStart and .CEnd may both be omitted, in which case the key starts at the beginning of FStart and ends at the end of FEnd. Setting CEnd to 0 also means the end of the field. If the End part is left out entirely, the key runs to the end of the line. Enough theory; here are some examples.
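The pieces of the key syntax can be checked against each other on a small input. This is a minimal sketch under stated assumptions: the three-line file, the `tab` variable (a portable stand-in for bash's $'\t') and the `first_col` helper are made up for illustration, and -s is used so that only the key decides the order.

```shell
# Sketch only: hypothetical three-line, tab-separated input.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\t%s\n' x beta 2  y alpha 1  z beta 1 > "$data"

# Helper (made up for this sketch): print the first column on one line.
first_col() { cut -f1 | paste -s -d ' ' -; }

# -s turns off the whole-line tie-breaker, so ties keep their input order.
field_only=$(sort -s -t "$tab" -k2,2     "$data" | first_col)  # key = field 2
explicit=$(  sort -s -t "$tab" -k2.1,2.0 "$data" | first_col)  # same key, spelled out
to_eol=$(    sort -s -t "$tab" -k2       "$data" | first_col)  # key = field 2 to end of line

echo "-k2,2     : $field_only"
echo "-k2.1,2.0 : $explicit"
echo "-k2       : $to_eol"
rm -f "$data"
```

The first two spellings give the same order (y x z), while the open-ended key also compares the third column and yields y z x.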
3.1 Sort on the third column; without n, the comparison is character by character in ASCII order:
sort -t $'\t' -k 3 sort.log
b linux 1000 200 5K
e bsd 1000 600 4M
c win7 2000 100 7G
a mac 2000 500 2K
d winxp 4000 300 3G
f SUSE 4000 300 6M
g winxp 500 300 3G
c Debian 600 200 8K
3.2 With n added, the column is sorted by numeric value:
sort -t $'\t' -k 3n sort.log
g winxp 500 300 3G
c Debian 600 200 8K
b linux 1000 200 5K
e bsd 1000 600 4M
a mac 2000 500 2K
c win7 2000 100 7G
d winxp 4000 300 3G
f SUSE 4000 300 6M
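The difference between the two comparisons is easiest to see side by side. A minimal sketch, assuming a three-line subset of the sample data written to a temp file (the `tab` variable and the variable names are illustrative):

```shell
# Sketch only: three rows taken from the sample data, written to a temp file.
export LC_ALL=C
tab=$(printf '\t')                # portable stand-in for bash's $'\t'
data=$(mktemp)
printf '%s\t%s\t%s\n' g winxp 500  d winxp 4000  b linux 1000 > "$data"

# Character comparison: "500" sorts after "4000" because "5" > "4".
as_text=$(sort -t "$tab" -k3,3 "$data" | cut -f3 | paste -s -d ' ' -)

# Numeric comparison: 500 < 1000 < 4000.
as_number=$(sort -t "$tab" -k3,3n "$data" | cut -f3 | paste -s -d ' ' -)

echo "text   : $as_text"
echo "numeric: $as_number"
rm -f "$data"
```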
3.3 When FEnd is not specified, multiple -k options work if the keys run from a later column to an earlier one, but not the other way round, because a key without FEnd extends to the end of the line. Back to front with multiple -k, the result is as expected:
sort -t $'\t' -k 3n -k 1 sort.log
g winxp 500 300 3G
c Debian 600 200 8K
b linux 1000 200 5K
e bsd 1000 600 4M
a mac 2000 500 2K
c win7 2000 100 7G
d winxp 4000 300 3G
f SUSE 4000 300 6M
Back to front again, this time sorting the first column in descending order wherever the third column is equal; the result is as expected:
sort -t $'\t' -k 3n -k 1r sort.log
g winxp 500 300 3G
c Debian 600 200 8K
e bsd 1000 600 4M
b linux 1000 200 5K
c win7 2000 100 7G
a mac 2000 500 2K
f SUSE 4000 300 6M
d winxp 4000 300 3G
Now from front to back:
sort -t $'\t' -k 1 -k 3n sort.log
a mac 2000 500 2K
b linux 1000 200 5K
c Debian 600 200 8K
c win7 2000 100 7G
d winxp 4000 300 3G
e bsd 1000 600 4M
f SUSE 4000 300 6M
g winxp 500 300 3G
sort -t $'\t' -k 1 -k 3nr sort.log
a mac 2000 500 2K
b linux 1000 200 5K
c Debian 600 200 8K
c win7 2000 100 7G
d winxp 4000 300 3G
e bsd 1000 600 4M
f SUSE 4000 300 6M
g winxp 500 300 3G
Comparing the output of sort -t $'\t' -k 1 -k 3n sort.log with that of sort -t $'\t' -k 1 -k 3nr sort.log shows that, where the first column is equal, the result is the same whether the third column is sorted in ascending or descending order, so the later -k has no effect. With FEnd specified, it does:
sort -t $'\t' -k 1,1 -k 3nr sort.log
a mac 2000 500 2K
b linux 1000 200 5K
c win7 2000 100 7G
c Debian 600 200 8K
d winxp 4000 300 3G
e bsd 1000 600 4M
f SUSE 4000 300 6M
g winxp 500 300 3G
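The effect of bounding the first key can be reproduced on just the rows that tie in column 1. A minimal sketch, assuming a three-line subset of the sample data in a temp file (the `tab` variable and the `names` helper are made up for illustration):

```shell
# Sketch only: three rows from the sample data, two of which share column 1.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\t%s\n' a mac 2000  c win7 2000  c Debian 600 > "$data"

# Helper (made up for this sketch): print the second column on one line.
names() { cut -f2 | paste -s -d ' ' -; }

# -k1 runs to the end of the line, so the whole line is the first key
# and the second key is never reached.
unbounded=$(sort -t "$tab" -k1 -k3nr "$data" | names)

# -k1,1 stops after column 1, so ties fall through to column 3 (numeric, descending).
bounded=$(sort -t "$tab" -k1,1 -k3,3nr "$data" | names)

echo "-k1 -k3nr     : $unbounded"
echo "-k1,1 -k3,3nr : $bounded"
rm -f "$data"
```

Only the bounded form puts win7 (2000) ahead of Debian (600).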
3.4 Options written immediately after a field (such as the "n" in "-k3n", or the "n" and "r" in "-k2nr") are called private options of that key. Options written outside the key with a dash (such as "-n" and "-r") are global options. A key with no private options inherits the global ordering options, including but not limited to "bfnrhM". Except for "b", an option has the same effect whether it is attached to FStart or to FEnd; "b" attached to FStart applies to FStart, and attached to FEnd applies to FEnd.
sort -t $'\t' -k1r,2 sort.log
As the output shows, both columns are sorted in reverse order:
g winxp 500 300 3G
f SUSE 4000 300 6M
e bsd 1000 600 4M
d winxp 4000 300 3G
c win7 2000 100 7G
c Debian 600 200 8K
b linux 1000 200 5K
a mac 2000 500 2K
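The inheritance rule has a corollary worth checking: a key that carries any private ordering option no longer picks up the global ones. The following is a minimal sketch, assuming a three-line subset of the sample data (the `tab` variable and the `col1` helper are illustrative names, not part of the original post):

```shell
# Sketch only: three rows from the sample data in a temp file.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\t%s\n' g winxp 500  d winxp 4000  b linux 1000 > "$data"

# Helper (made up for this sketch): print the first column on one line.
col1() { cut -f1 | paste -s -d ' ' -; }

# The key has no private options, so it inherits both global -n and -r.
inherited=$(sort -n -r -t "$tab" -k3,3 "$data" | col1)

# The key has its own "n", so the global -r is not applied to it.
private=$(sort -r -t "$tab" -k3,3n "$data" | col1)

# "r" gives the same result on the start or the end of the key.
on_start=$(sort -t "$tab" -k3r,3 "$data" | col1)
on_end=$(  sort -t "$tab" -k3,3r "$data" | col1)

echo "inherited: $inherited"
echo "private  : $private"
echo "r on start/end: $on_start / $on_end"
rm -f "$data"
```

Here the inherited form sorts 4000, 1000, 500, while the key with a private n stays ascending despite the global -r.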
3.5 A note on sorting by value with the n option: because "n" recognizes essentially only digits and a leading minus sign "-", the key effectively ends at the first character it does not recognize, so n never compares across fields. By default, when all keys compare equal, sort performs a "last-resort comparison" of the entire line using the default rules.
sort -t $'\t' -k3n sort.log
Where the third column is equal, the last-resort comparison arranges the whole lines in ascending ASCII order:
g winxp 500 300 3G
c Debian 600 200 8K
b linux 1000 200 5K
e bsd 1000 600 4M
a mac 2000 500 2K
c win7 2000 100 7G
d winxp 4000 300 3G
f SUSE 4000 300 6M
sort -t $'\t' -k3,4n -s sort.log
With -s added, the last-resort comparison is skipped and the original input order is kept (for the tied value 1000, e stays ahead of b):
g winxp 500 300 3G
c Debian 600 200 8K
e bsd 1000 600 4M
b linux 1000 200 5K
a mac 2000 500 2K
c win7 2000 100 7G
d winxp 4000 300 3G
f SUSE 4000 300 6M
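The last-resort comparison and its suppression with -s can be isolated on the two rows that tie. A minimal sketch, assuming a three-line subset of the sample data in a temp file (variable names are illustrative):

```shell
# Sketch only: e and b tie on column 3; e comes first in the input.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\t%s\n' e bsd 1000  b linux 1000  g winxp 500 > "$data"

# Default: ties on the key are broken by comparing whole lines, so b precedes e.
with_last_resort=$(sort -t "$tab" -k3,3n "$data" | cut -f1 | paste -s -d ' ' -)

# -s (stable): ties keep their input order, so e stays ahead of b.
stable=$(sort -s -t "$tab" -k3,3n "$data" | cut -f1 | paste -s -d ' ' -)

echo "default: $with_last_resort"
echo "stable : $stable"
rm -f "$data"
```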
3.6 Sort by the n-th character of a field:
sort -t $'\t' -k2.3,2.3 sort.log
This sorts by the third character of the second column:
c Debian 600 200 8K
a mac 2000 500 2K
e bsd 1000 600 4M
b linux 1000 200 5K
c win7 2000 100 7G
d winxp 4000 300 3G
g winxp 500 300 3G
f SUSE 4000 300 6M
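A single-character key also makes locale effects visible. The sketch below assumes LC_ALL=C, where every uppercase letter sorts before every lowercase one, so the S of SUSE comes first; the listing above puts SUSE last, which suggests it was produced under a different locale. The five-line input and the `tab` variable are made up for illustration.

```shell
# Sketch only: two columns taken from the sample data, compared byte by byte.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\n' a mac  e bsd  f SUSE  c Debian  b linux > "$data"

# Key = third character of field 2: c, d, S, b, n for the five rows.
# In the C locale "S" (uppercase) sorts before "b", "c", "d" and "n".
third_char=$(sort -t "$tab" -k2.3,2.3 "$data" | cut -f2 | paste -s -d ' ' -)

echo "$third_char"
rm -f "$data"
```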
4. -h compares human-readable numbers (e.g. 2K, 1G):
sort -t $'\t' -k5h sort.log
a mac 2000 500 2K
b linux 1000 200 5K
c Debian 600 200 8K
e bsd 1000 600 4M
f SUSE 4000 300 6M
d winxp 4000 300 3G
g winxp 500 300 3G
c win7 2000 100 7G
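Comparing h with n on the same column shows what the suffix handling adds. A minimal sketch, assuming a sort that supports the h modifier (it is an extension, present in GNU sort) and a made-up two-column subset of the sample data:

```shell
# Sketch only: first and fifth columns of four sample rows.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\n' a 2K  d 3G  e 4M  b 5K > "$data"

# h takes the suffix into account: K < M < G, then the number.
human=$(sort -t "$tab" -k2,2h "$data" | cut -f2 | paste -s -d ' ' -)

# n stops at the suffix and compares only 2, 3, 4, 5.
numeric=$(sort -t "$tab" -k2,2n "$data" | cut -f2 | paste -s -d ' ' -)

echo "h: $human"
echo "n: $numeric"
rm -f "$data"
```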
5. The difference between sort -u and sort | uniq: when -k specifies the sort key, the two are not equivalent, because uniq by default deduplicates on the entire line.
sort -t $'\t' -k2,2 -u sort.log
e bsd 1000 600 4M
c Debian 600 200 8K
b linux 1000 200 5K
a mac 2000 500 2K
f SUSE 4000 300 6M
c win7 2000 100 7G
d winxp 4000 300 3G
sort -t $'\t' -k2,2 sort.log|uniq
e bsd 1000 600 4M
c Debian 600 200 8K
b linux 1000 200 5K
a mac 2000 500 2K
f SUSE 4000 300 6M
c win7 2000 100 7G
d winxp 4000 300 3G
g winxp 500 300 3G
sort -t $'\t' -k2,2 -u sort.log deduplicates on the second column, whereas sort -t $'\t' -k2,2 sort.log|uniq deduplicates on whole lines (uniq can be told to skip leading fields with -f, in which case it compares the rest of the line).
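Counting the surviving lines makes the difference concrete. A minimal sketch, assuming a three-line subset of the sample data in which two rows share the second column (the `tab` variable and the variable names are illustrative):

```shell
# Sketch only: d and g share "winxp" in column 2 but differ elsewhere.
export LC_ALL=C
tab=$(printf '\t')
data=$(mktemp)
printf '%s\t%s\t%s\n' d winxp 4000  g winxp 500  a mac 2000 > "$data"

# -u with a key keeps one line per distinct key value.
by_key=$(sort -t "$tab" -k2,2 -u "$data" | wc -l | tr -d ' ')

# uniq compares whole lines, and no two whole lines are identical here.
by_line=$(sort -t "$tab" -k2,2 "$data" | uniq | wc -l | tr -d ' ')

echo "sort -u: $by_key lines, sort | uniq: $by_line lines"
rm -f "$data"
```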
That wraps up these notes on sort; corrections from more experienced readers are welcome.