Linux sort: multi-column sorting in ascending and descending order

Reposted from https://segmentfault.com/a/1190000005713784 (posted 2016-06-14)

sort is a very commonly used Linux command; its job is sorting.

sort treats each line of a file as a unit and compares the lines with one another. Lines are compared character by character, starting from the first character, by ASCII code value, and are finally output in ascending order.
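The comparison rule above can be sketched with a few lines of input. This is a minimal example that assumes the C locale (forced here with LC_ALL=C), where comparison is by byte value; other locales may collate differently. The variable name byte_order is made up for illustration.

```shell
# In the C locale, uppercase letters (65-90) have smaller byte values than
# lowercase letters (97-122), so "Apple" sorts before "apple".
byte_order=$(printf 'banana\nApple\ncherry\napple\n' | LC_ALL=C sort)
printf '%s\n' "$byte_order"
```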

Usage: sort [option] ... [file] ...

Mandatory arguments to long options are mandatory for short options too. Ordering options:

-b, --ignore-leading-blanks ignore leading blanks
-d, --dictionary-order consider only blanks and alphanumeric characters
-f, --ignore-case fold lower case to upper case characters
-g, --general-numeric-sort compare according to general numerical value
-i, --ignore-nonprinting consider only printable characters
-M, --month-sort compare (unknown) < `JAN' < ... < `DEC'
-n, --numeric-sort compare according to string numerical value
-r, --reverse reverse the result of comparisons

Other options:

-c, --check check whether input is sorted; do not sort
-k, --key=POS1[,POS2] start a key at POS1, end it at POS2 (origin 1)
-m, --merge merge already sorted files; do not sort
-o, --output=FILE write result to FILE instead of standard output
-s, --stable stabilize sort by disabling last-resort comparison
-S, --buffer-size=SIZE use SIZE for main memory buffer
-t, --field-separator=SEP use SEP instead of non-blank to blank transition
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp;
multiple options specify multiple directories
-u, --unique with -c, check for strict ordering;
without -c, output only the first of an equal run
-z, --zero-terminated end lines with 0 byte, not newline
--help display this help and exit
--version output version information and exit

 

sort -u option

Removes duplicate lines from the output.

[ericshenMacPro@root duweixin]$ cat duweixin.net.txt
banana
apple
pear
orange
pear
[ericshenMacPro@root duweixin]$ sort duweixin.net.txt
apple
banana
orange
pear
pear
[ericshenMacPro@root duweixin]$ sort -u duweixin.net.txt
apple
banana
orange
pear

 

The duplicate pear line is ruthlessly removed by the -u option.
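As a rough cross-check, for whole-line comparison sort -u gives the same result as piping plain sort through uniq. The sketch below reuses the fruit list from the session above; the variable names are illustrative only.

```shell
fruits=$(printf 'banana\napple\npear\norange\npear\n')

# -u drops repeated lines while sorting.
unique=$(printf '%s\n' "$fruits" | LC_ALL=C sort -u)

# The classic two-step equivalent: sort first, then collapse adjacent duplicates.
piped=$(printf '%s\n' "$fruits" | LC_ALL=C sort | uniq)

printf '%s\n' "$unique"
```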

sort -r option

sort sorts in ascending order by default; to get descending order, add -r.

[ericshenMacPro@root duweixin]$ cat duweixin.net.txt
1
3
5
2
4
[ericshenMacPro@root duweixin]$ sort duweixin.net.txt
1
2
3
4
5
# add -r for descending order
[ericshenMacPro@root duweixin]$ sort -r duweixin.net.txt
5
4
3
2
1

 

sort -o option

Because sort writes to standard output by default, you need redirection to write the result to a file, in the form sort oldfile > newfile.
However, if you want to write the sorted result back to the original file, redirection will not work.

[ericshenMacPro@root duweixin]$ sort -r duweixin.txt > duweixin.txt
[ericshenMacPro@root duweixin]$ cat duweixin.txt
[ericshenMacPro@root duweixin]$

Look, duweixin.txt has been emptied. The shell truncates the redirection target before sort even starts reading it.

The -o option solves this problem and lets you safely write the result back to the original file.

[ericshenMacPro@root duweixin]$ cat number.txt
1
3
5
2
4
[ericshenMacPro@root duweixin]$ sort -r number.txt -o number.txt
[ericshenMacPro@root duweixin]$ cat number.txt
5
4
3
2
1
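The two behaviours can be compared side by side in a scratch directory. This is a minimal sketch; the directory comes from mktemp and the file name lost.txt is made up for the demonstration.

```shell
workdir=$(mktemp -d)

# Safe: sort reads all of its input before it opens the -o output file.
printf '1\n3\n5\n2\n4\n' > "$workdir/number.txt"
sort -n -r -o "$workdir/number.txt" "$workdir/number.txt"
in_place=$(cat "$workdir/number.txt")

# Unsafe: the shell truncates lost.txt before sort gets to read it.
printf '1\n3\n5\n2\n4\n' > "$workdir/lost.txt"
sort -r "$workdir/lost.txt" > "$workdir/lost.txt"
if [ -s "$workdir/lost.txt" ]; then lost=no; else lost=yes; fi

printf '%s\n' "$in_place"
echo "redirected file emptied: $lost"
rm -r "$workdir"
```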

 

sort -n option

Have you ever seen 10 sorted before 2? I certainly have. This happens because sort compares the numbers as characters: it first compares 1 with 2, 1 is obviously smaller, so 10 is placed before 2. That is simply how sort works by default.

To change this, use the -n option to tell sort to compare numerically.

[ericshenMacPro@root duweixin]$ cat duweixin.net.txt
1
10
19
11
2
5
[ericshenMacPro@root duweixin]$ sort duweixin.net.txt
1
10
11
19
2
5
[ericshenMacPro@root duweixin]$ sort -n duweixin.net.txt
1
2
5
10
11
19
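The same contrast holds for negative and decimal values. A small sketch, assuming the C locale (where the decimal point is "."); the input values are made up:

```shell
# Character comparison: "-" < "." < digits, and "10" sorts before "2".
lexical=$(printf '10\n2\n-3\n1.5\n' | LC_ALL=C sort)

# Numeric comparison: the values are ordered by magnitude.
numeric=$(printf '10\n2\n-3\n1.5\n' | LC_ALL=C sort -n)

printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"
```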

sort -t and -k options

If the content of a file like this:

[ericshenMacPro@root duweixin]$ cat facebook.txt
banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4

This file has three columns separated by colons: the first column is the kind of fruit, the second is the quantity, and the third is the price.

Now suppose I want to sort by quantity, that is, by the second column. How can sort do that?

Fortunately, sort provides the -t option, which is followed by the field separator. (Does that remind you of the -d option of cut and paste?)

Once the separator is specified, -k selects the column to sort on.

[ericshenMacPro@root duweixin]$ sort -n -k 2 -t : facebook.txt
apple:10:2.5
orange:20:3.4
banana:30:5.5
pear:90:2.3

 

We used the colon as the separator and sorted the second column numerically in ascending order; the result is just what we wanted.
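The same approach works for the price column. The sketch below reuses the fruit data and restricts the key to the third field with -k 3,3n, which is slightly stricter than the -k 2 form used above; the variable names are illustrative.

```shell
fruit_table=$(printf 'banana:30:5.5\napple:10:2.5\npear:90:2.3\norange:20:3.4\n')

# -t : sets the separator; -k 3,3n sorts on field 3 only, numerically.
by_price=$(printf '%s\n' "$fruit_table" | LC_ALL=C sort -t : -k 3,3n)

printf '%s\n' "$by_price"
```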

Other common sort options

-f converts lowercase letters to uppercase for comparison, i.e. it ignores case

-c checks whether the file is already sorted; if it is not, it reports the first out-of-order line and returns 1

-C checks whether the file is already sorted; if it is not, it prints nothing and only returns 1

-M sorts by month, e.g. JAN is less than FEB, and so on

-b ignores all leading blanks on each line and starts comparing from the first visible character
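A quick way to see -c, -C and -f in action is to look at exit statuses and output. This is a sketch with made-up input; the diagnostic that -c prints goes to standard error and is discarded here.

```shell
# Sorted input: -c succeeds with exit status 0.
sorted_status=0
printf '1\n2\n3\n' | sort -n -c || sorted_status=$?

# Unsorted input: -c reports the first out-of-order line and returns 1.
check_status=0
printf '1\n3\n2\n' | sort -n -c 2>/dev/null || check_status=$?

# -C performs the same check silently.
quiet_status=0
printf '1\n3\n2\n' | sort -n -C || quiet_status=$?

# -f folds case, so "A" and "a" group together ahead of "B" and "b".
folded=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)

echo "$sorted_status $check_status $quiet_status"
printf '%s\n' "$folded"
```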

Sometimes, when studying scripts, you will find sort followed by things like -k1,2 or -k1.2 -k3.4, which look baffling. Today we will get to the bottom of the -k option!

 

Multi-column sorting

First, prepare the material

The first field is the company name, the second field is the number of employees, and the third field is the average wage.

$ cat duweixin.net.txt
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500

Second, I want to sort this file alphabetically by company name, that is, by the first field (duweixin.net.txt has three fields):

$ sort -t ' ' -k 1 duweixin.net.txt
baidu 100 5000
google 110 5000
guge 50 3000
sohu 100 4500

Simply using -k 1 does the job. (Strictly speaking this is not quite right; you will see why later.)

Third, I want to sort duweixin.net.txt by number of employees

$ sort -n -t ' ' -k 2 duweixin.net.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

However, there is a problem here: baidu and sohu have the same number of employees, 100. What happens then? By default, ties are broken by comparing in ascending order from the first field onward, so baidu comes before sohu.

Fourth, I want to sort duweixin.net.txt by number of employees and, where the number of employees is the same, by average wage in ascending order:

$ sort -n -t ' ' -k 2 -k 3 duweixin.net.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000

Adding -k 2 -k 3 solves the problem. sort supports this kind of setup, which assigns priorities to the sort fields: sort by the second field first and, if it is equal, by the third field.
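The priority rule can be sketched with each key confined to its own field. The form -k 2,2n -k 3,3n is a stricter spelling chosen for this illustration, not the command used above; on this data both give the same order.

```shell
companies=$(printf 'google 110 5000\nbaidu 100 5000\nguge 50 3000\nsohu 100 4500\n')

# First key: field 2 (employees), numeric. Second key: field 3 (wage), numeric.
by_staff_then_wage=$(printf '%s\n' "$companies" | LC_ALL=C sort -t ' ' -k 2,2n -k 3,3n)

printf '%s\n' "$by_staff_then_wage"
```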

Fifth, I want to sort duweixin.net.txt by wage in descending order and, where the wage is the same, by number of employees in ascending order

$ sort -n -t ' ' -k 3r -k 2 duweixin.net.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

A small trick is used here: look closely and you will see a lowercase r quietly appended after -k 3. Think about it in light of the earlier sections; can you work out the answer? Here it is: r does the same thing as the -r option, i.e. it reverses the order. Because sort defaults to ascending order, the r is needed to sort the third field (average wage) in descending order. You can also append n here, which means this field is compared numerically. For example:

$ sort -t ' ' -k 3nr -k 2n duweixin.net.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

Here the leading -n option has been removed and n has been added to each -k option instead.
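Attaching n to the key itself matters more than it may seem: per POSIX, ordering options attached to a key override the global ordering options for that key, so a key written as 3r no longer inherits a global -n. The sketch below uses two made-up rows whose wages differ in length to make the difference visible.

```shell
rows=$(printf 'a 1 900\nb 1 5000\n')

# Key "3r" carries its own modifier, so the global -n is not applied to it:
# "900" and "5000" are compared as text, and "9" > "5" puts 900 first.
text_reverse=$(printf '%s\n' "$rows" | LC_ALL=C sort -n -t ' ' -k 3r)

# Key "3nr" compares numerically, so 5000 comes first in descending order.
numeric_reverse=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 3nr)

printf 'text:\n%s\nnumeric:\n%s\n' "$text_reverse" "$numeric_reverse"
```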

Sixth, the exact syntax of the -k option

To go any further we need a little theory. You need to understand the syntax of the -k option, which is:

[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ][ Modifier ] ]

The comma (",") splits the syntax into two parts, the Start part and the End part.

Let me plant one idea first: "if you do not set the End part, End is taken to be the end of the line." This is important, yet it is often overlooked.

The Start part itself has three components. The Modifier component is the option-like n and r we discussed earlier. Here we focus on FStart and .CStart.

.CStart can be omitted; if it is, the key starts at the beginning of the field. The earlier -k 2 and -k 3 are examples with .CStart omitted.

In FStart.CStart, FStart is the field to use and CStart is the character position within that field at which the sort key starts.

Similarly, in the End part you can set FEnd.CEnd. If .CEnd is omitted, the key ends at the last character of the field (the "field tail"). Setting CEnd to 0 (zero) also means the end of the field.
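The effect of leaving out the End part can be sketched with two made-up rows that agree on the first field but differ in the second and third:

```shell
rows=$(printf 'a 1 z\na 2 y\n')

# "-k 1" runs from field 1 to the end of the line, so the whole line is the
# first key and "a 1 z" wins before "-k 3" is ever consulted.
open_ended=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 1 -k 3)

# "-k 1,1" stops at the end of field 1; the tie is then broken by field 3,
# where "y" sorts before "z".
bounded=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ' ' -k 1,1 -k 3,3)

printf 'open-ended:\n%s\nbounded:\n%s\n' "$open_ended" "$bounded"
```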

Seventh, sort starting from the second letter of the company name:

$ sort -t ' ' -k 1.2 duweixin.net.txt
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000

With -k 1.2 the key starts at the second character of the first field and, since no End part is given, runs to the end of the line. baidu comes out on top because its second letter is a. sohu and google both have o as their second character, but the h in sohu comes before the second o in google, so they take second and third place respectively. guge can only be fourth.

Eighth, sort only by the second letter of the company name and, where that is the same, by wage in descending order:

$ sort -t ' ' -k 1.2,1.2 -k 3,3nr duweixin.net.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

Because we sort only by the second letter, we write -k 1.2,1.2, which means "only" the second letter. (You may ask, "why can't I just use -k 1.2?" Of course not: omitting the End part means the key runs from the second letter all the way to the end of the line.) For the wage we likewise use -k 3,3, which is the most precise form and means "only" this field; if you omit the trailing 3, the key becomes "from the start of the third field to the end of the last field".
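The difference between the bounded and unbounded key can be checked on the same four rows. A minimal sketch, with the data supplied inline rather than read from duweixin.net.txt:

```shell
companies=$(printf 'google 110 5000\nbaidu 100 5000\nguge 50 3000\nsohu 100 4500\n')

# Key 1 is exactly one character (the second letter); ties fall through to
# the wage, highest first.
second_letter_only=$(printf '%s\n' "$companies" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr)

# Without the End part the first key already separates "ohu ..." from
# "oogle ...", so the wage key never gets to decide.
from_second_letter=$(printf '%s\n' "$companies" | LC_ALL=C sort -t ' ' -k 1.2 -k 3,3nr)

printf 'bounded:\n%s\nunbounded:\n%s\n' "$second_letter_only" "$from_second_letter"
```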

Ninth, which modifiers can be used in the Modifier part?

b, d, f, i, n or r can be used.

You are surely already familiar with n and r.

b ignores the leading blanks of the field.

d sorts the field in dictionary order (i.e. only blanks, letters and digits are considered).

f ignores case when sorting the field.

i ignores non-printable characters and sorts on printable characters only. (Some ASCII characters are non-printable: \a is the alarm, \b is backspace, \n is newline, \r is carriage return, and so on.)
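The b modifier is easiest to see when no -t is given, because the default fields then include their leading blanks. The sketch below uses two made-up rows, one of which has two spaces before its second field, and assumes the C locale.

```shell
rows=$(printf 'x  b\ny a\n')

# Without b, field 2 is "  b" versus " a"; the extra space sorts before "a".
with_blanks=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2)

# With b on the start position, the leading blanks are skipped: "b" versus "a".
without_blanks=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2b,2)

printf 'plain:\n%s\nwith b:\n%s\n' "$with_blanks" "$without_blanks"
```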

Tenth, think about an example that combines -u and -k:

$ cat duweixin.net.txt
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500

This is the original duweixin.net.txt file.

$ sort -n -k 2 duweixin.net.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000
$ sort -n -k 2 -u duweixin.net.txt
guge 50 3000
baidu 100 5000
google 110 5000

When we sort numerically on the employee-count field and then add -u, the sohu line is deleted! It turns out that -u looks only at the fields set by -k; when those are identical, the later lines with the same key are deleted.

$ sort -k 1 -u duweixin.net.txt
baidu 100 5000
google 110 5000
guge 50 3000
sohu 100 4500
$ sort -k 1.1,1.1 -u duweixin.net.txt
baidu 100 5000
google 110 5000
sohu 100 4500

The same reasoning applies here: guge, which also begins with g, does not survive.

$ sort -n -k 2 -k 3 -u duweixin.net.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000

Look! With two levels of sort priority set, -u does not delete any line. It turns out that -u weighs all the -k options: a line is deleted only if all of its keys match another line, and as long as one key differs it is kept :) (If you don't believe it, add a line sina 100 4500 and try it yourself.)
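The suggested experiment can be sketched as follows, with the extra sina row appended to the inline data. Only the number of surviving lines is checked here, since which of the two equal lines is kept is left to the implementation.

```shell
companies=$(printf 'google 110 5000\nbaidu 100 5000\nguge 50 3000\nsohu 100 4500\nsina 100 4500\n')

# sohu and sina agree on both keys (100 employees, wage 4500), so -u keeps
# only one of them; the other three rows each differ in at least one key.
deduped=$(printf '%s\n' "$companies" | LC_ALL=C sort -n -k 2 -k 3 -u)
kept=$(printf '%s\n' "$deduped" | grep -c .)

printf '%s\n' "$deduped"
echo "lines kept: $kept"
```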

Eleventh, the strangest sort:

$ sort -n -k 2.2,3.1 duweixin.net.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

The sort key runs from the second character of the second field to the first character of the third field.

The first line extracts 0 3, the second line 00 5, the third line 00 4, and the fourth line 10 5.

And because sort considers 0 less than 00, less than 000, less than 0000, and so on,

0 3 is certainly first and 10 5 is certainly last. But why does 00 5 come before 00 4? (You can experiment and think about it yourself.)

The answer: the cross-field setting is an illusion. sort only compares from the second character of the second field to the last character of the second field, and does not include the first character of the third field in the comparison. When it finds that 00 and 00 are the same, sort automatically falls back to comparing from the first field, so of course baidu comes before sohu. This can be confirmed with an example:

$ sort -n -k 2.2,3.1 -k 1,1r duweixin.net.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000

Origin www.cnblogs.com/zhangmingda/p/12469739.html