Sorting commands sort and uniq, the character conversion command tr, and regular expressions (in exhaustive detail)

One, sort command

Sorts the lines of a file, comparing them as text by default or as another data type (numbers, month names) when an option asks for it

  • format:
sort [options] file
cat file | sort [options]
  • Common options and their functions
Option  Function
-f      Ignore case: a and A are treated as the same character
-b      Ignore leading blanks at the start of each line or field
-M      Sort by three-letter month abbreviation, such as JAN, DEC
-n      Sort by numeric value
-r      Reverse the sort order
-u      Unique: output only one line from each group of identical lines
-t      Specify the field separator; by default fields are separated by blanks (spaces or Tabs)
-k      Specify the field (column) to sort on
-o      Write the sorted result to the specified file

Example 1:

①User accounts are recorded in /etc/passwd. Running cat /etc/passwd | sort sorts the lines as text, comparing them character by character from the first character of each line, so the account names come out in alphabetical order from a onward

② Sort by the third column (the UID) with ':' as the separator: sort -t ':' -k 3 /etc/passwd. Without -n the field is compared as text rather than by numeric value, so the numbers come out jumbled (for example, 10 before 2)

③To write the sorted result to a file, use -o [file]. There is no need to create the file beforehand; sort creates 11.txt automatically

sort -t ':' -k 3 -n /etc/passwd -o 11.txt

Check the output result by viewing 11.txt

④With -n added, sort compares the third column by numeric value and the order becomes clear: sort -t ':' -k 3 -n /etc/passwd

⑤Add -r as well to reverse the order: sort -t ':' -k 3 -n -r /etc/passwd
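The effect of -n and -r on a field can be checked on a small sample. The sketch below is only an illustration under stated assumptions: the four account lines are made up, LC_ALL=C is set so that text comparison is byte-wise, and -k 3,3 (rather than -k 3) is used so that the sort key is limited to the third field alone.

```shell
# Hypothetical sample in /etc/passwd layout (name:password:UID:GID)
sample=$(mktemp)
printf '%s\n' 'root:x:0:0' 'alice:x:1000:1000' 'bin:x:2:2' 'daemon:x:10:10' > "$sample"

# Helper (illustrative): keep only the UID column and join it on one line
uids() { cut -d ':' -f 3 | paste -sd ' ' -; }

text_order=$(LC_ALL=C sort -t ':' -k 3,3 "$sample" | uids)        # field compared as text
num_order=$(LC_ALL=C sort -t ':' -k 3,3 -n "$sample" | uids)      # field compared as numbers
rev_order=$(LC_ALL=C sort -t ':' -k 3,3 -n -r "$sample" | uids)   # numbers, largest first

echo "text:    $text_order"   # 0 10 1000 2
echo "numeric: $num_order"    # 0 2 10 1000
echo "reverse: $rev_order"    # 1000 10 2 0
rm -f "$sample"
```

As text, 1000 sorts before 2 because the comparison stops at the first differing character; -n is what makes sort look at the value.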

Example 2:

① Create a file 1.txt and sort it with sort. In the locale used here the resulting priority is: blank line > number > alphabetical order > lowercase letter > uppercase letter, and among whitespace, space > Tab. The exact order depends on the locale (LANG/LC_ALL) setting

Use cat -A to make the special characters visible (a Tab is shown as ^I and the end of each line as $)

②With sort -f the priority becomes: blank line > Tab > space > number > alphabetical order > uppercase letter > lowercase letter

③With sort -u duplicates are removed, but only ABC and 123 are de-duplicated, because the lines with a leading space or Tab are not identical to the lines without one
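Because the ordering above depends on the locale, it can help to pin the locale while experimenting. The following sketch assumes the C locale (byte order) and uses made-up input lines; other locales may order the same lines differently.

```shell
# Plain sort in the C locale: uppercase letters sort before lowercase ones
plain=$(printf '%s\n' b A a B | LC_ALL=C sort | paste -sd ' ' -)

# -f folds case, so A/a and B/b end up next to each other
folded=$(printf '%s\n' b A a B | LC_ALL=C sort -f | paste -sd ' ' -)

# -u keeps one copy of identical lines; " ABC" (leading space) is a different line
unique_count=$(printf '%s\n' 'ABC' ' ABC' 'ABC' '123' '123' | LC_ALL=C sort -u | wc -l | tr -d ' ')

echo "$plain"          # A B a b
echo "$folded"         # A a B b
echo "$unique_count"   # 3
```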

Supplement: Count the size of all files in the directory

①To sort all files in the folder by size, and only look at the 5 largest ones, you can use the following command

du -a | sort -rn | head -5

Command explanation:

du -a reports the size of every file (not just every directory) under the current directory

sort -rn sorts by numeric value (-n) in reverse order (-r), so the largest entries come first

head shows the first 10 lines by default; adding -5 shows only the first 5

Note: -n compares only the leading number and ignores the unit. The numbers in this example are file sizes in du's default unit of KB, so this command must not be combined with du -ah: with human-readable sizes, sort -n would rank 5M below 20K.
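The unit problem can be reproduced without touching the file system. This is a minimal sketch with two made-up lines in du -ah style; it also tries sort -h, a GNU sort option (an assumption about the installed sort) that compares human-readable sizes.

```shell
# Two hypothetical "size<Tab>path" lines as du -ah might print them
sizes=$(printf '5M\tbig.iso\n20K\tnotes.txt\n')

# -n sees only the numbers 5 and 20, so the 20K file wrongly comes first
by_n=$(printf '%s\n' "$sizes" | sort -rn | head -1 | cut -f 2)

# -h (GNU sort) understands the K/M suffixes, so the 5M file comes first
by_h=$(printf '%s\n' "$sizes" | sort -rh | head -1 | cut -f 2)

echo "largest according to -n: $by_n"   # notes.txt
echo "largest according to -h: $by_h"   # big.iso
```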

②If you need to display in MB, you can use the command

du -am | sort -rn | head -5

③If you need to save the statistical results in other files, you can use the command

du -am | sort -rn | head > du4.txt

④Save in du5.txt file, and then view

du -am | sort -rn -o du5.txt 
head du5.txt

Two, uniq command

Reports or filters out consecutive repeated lines in a file; usually combined with sort so that identical lines become adjacent

  • format:
uniq [options] file
cat file | uniq [options]
  • Common options and their functions
Option  Function
-c      Collapse consecutive repeated lines and prefix each line with its number of occurrences
-d      Show only repeated lines (one copy of each)
-u      Show only lines that are not repeated
-i      Ignore case when comparing lines

Example:

①uniq -c demo

ABC actually appears three times in the file, but uniq -c reports a count of only 2 for it, because the third ABC is not adjacent to the other two; uniq compares neighboring lines only.

②uniq -d and uniq -u demo
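The adjacency rule is easy to verify. The sketch below uses made-up input in which ABC appears three times with another line in between, and compares uniq on the raw input with uniq after sort; the fmt helper is only there to normalize the padded count column.

```shell
# Hypothetical content: the third ABC is separated from the first two
data=$(printf '%s\n' ABC ABC 123 ABC)

# Helper (illustrative): turn "      2 ABC" into "2:ABC" and join on one line
fmt() { awk '{print $1 ":" $2}' | paste -sd ' ' -; }

unsorted=$(printf '%s\n' "$data" | uniq -c | fmt)
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -c | fmt)
dups=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -d)      # repeated lines only
singles=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq -u)   # non-repeated lines only

echo "$unsorted"   # 2:ABC 1:123 1:ABC
echo "$sorted"     # 1:123 3:ABC
echo "$dups"       # ABC
echo "$singles"    # 123
```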

Three, tr command

Commonly used to replace, squeeze and delete characters read from standard input

  • format:
tr [options] [arguments]

Common options and their functions

Option  Function
-c      Use the complement of character set 1: characters in character set 1 are kept, and all other characters (including the newline \n) are replaced with character set 2
-d      Delete all characters belonging to character set 1
-s      Squeeze each run of a repeated character into a single character; if character set 2 is given, first replace character set 1 with character set 2
-t      Truncate character set 1 to the length of character set 2 before replacing; when both sets have the same length the result is the same as with no option

Parameters and their functions

Parameter        Function
Character set 1  The original character set to be converted or deleted. A conversion also requires character set 2 as the target; a delete operation does not need character set 2
Character set 2  The target character set for the conversion
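The -t option only makes a visible difference when character set 2 is shorter than character set 1. The sketch below assumes GNU tr, which by default extends a short set 2 by repeating its last character; other implementations may behave differently.

```shell
# Without -t, GNU tr pads set 2 ("xy") with its last character: c, d, e all become y
default_result=$(echo 'abcde' | tr 'a-e' 'xy')

# With -t, set 1 is cut down to "ab", so c, d, e are left untouched
truncated=$(echo 'abcde' | tr -t 'a-e' 'xy')

echo "$default_result"   # xyyyy
echo "$truncated"        # xycde
```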

Example:

①Use the command below to turn all lowercase letters into uppercase letters in the information output by last.

last | tr 'a-z' 'A-Z'

②tr -c uses the complement of character set 1: the characters listed in set 1 (here a, b and the newline) are left unchanged, and every other character is replaced with the last character of character set 2

echo -e "abc\nabcdab" | tr -c "ab\n" "0"

③tr -d deletes the specified characters; character set 2 is not needed.

The set can be written in double quotation marks "", in single quotation marks '' or with no quotation marks at all; for a simple set like this one the result is the same

echo "hello world" | tr -d "od"

④tr -s squeezes each run of a repeated character into a single character; if character set 2 is given, the characters are first replaced with those of set 2 and the result is then squeezed

echo "hellllo worrrllllld" | tr -s 'lr'
echo "hellllo worrrllllld" | tr -s 'lr' '78'

For example, you can also replace the colon ":" in the path variable PATH with a newline character "\n", so that each directory is printed on its own line

echo $PATH | tr -s ":" "\n"
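For reference, the results of the tr examples above can be captured as follows. This is a sketch that reuses the same input strings; printf replaces echo -e for portability, and the PATH-like string is a made-up stand-in for the real variable.

```shell
# -c: everything except a, b and newline becomes 0 (output lines joined for display)
kept=$(printf 'abc\nabcdab\n' | tr -c 'ab\n' '0' | paste -sd ' ' -)

# -d: delete every o and d
deleted=$(echo 'hello world' | tr -d 'od')

# -s: squeeze runs of l and r; with a second set, translate first, then squeeze
squeezed=$(echo 'hellllo worrrllllld' | tr -s 'lr')
replaced=$(echo 'hellllo worrrllllld' | tr -s 'lr' '78')

# Colon-separated list split into one entry per line (hypothetical two-entry value)
dir_count=$(echo '/usr/bin:/bin' | tr -s ':' '\n' | wc -l | tr -d ' ')

echo "$kept"        # ab0 ab00ab
echo "$deleted"     # hell wrl
echo "$squeezed"    # helo world
echo "$replaced"    # he7o wo87d
echo "$dir_count"   # 2
```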

⑤Combine tr with the sort command to sort an array

arr=(8 5 2 9 7 1)
echo ${arr[*]} | tr ' ' '\n' | sort -n 

Supplement: Remove the "^M" carriage-return characters from a Windows file after it is moved to a Linux system

DOS/Windows text files end each line with a carriage return followed by a line feed, so when such a file is used on Linux every line carries a trailing ^M. This is one of the most common annoyances when moving text files between the two systems. tr can remove it; in tr, ^M is written as \r.

Example:

Drag the file in the Windows system to xshell, and then use cat -A or cat -e to view its special symbols

To see which special characters (escape sequences) the tr command can handle, consult its manual page

man tr

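Delete the "^M" characters from the DOS file. The first form turns every \r into \n and then squeezes repeated newlines, which also removes any blank lines in the file; the second form only deletes the \r characters

cat tr.txt | tr -s "\r" "\n" > tr1.txt
or
cat tr.txt | tr -d "\r" > tr1.txt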
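The two forms do not produce the same file when the input contains blank lines. The sketch below uses a made-up three-line DOS text (the middle line is empty) instead of tr.txt and counts the lines that survive each variant.

```shell
# Hypothetical DOS-style text: "first", an empty line, "second", all ending in \r\n
dos_text() { printf 'first\r\n\r\nsecond\r\n'; }

# tr -d removes only the carriage returns: all three lines survive
lines_d=$(dos_text | tr -d '\r' | wc -l | tr -d ' ')

# tr -s turns \r into \n and squeezes newline runs: the empty line is lost
lines_s=$(dos_text | tr -s '\r' '\n' | wc -l | tr -d ' ')

# Count carriage returns remaining after tr -d (keep only \r bytes, then count them)
cr_left=$(dos_text | tr -d '\r' | tr -cd '\r' | wc -c | tr -d ' ')

echo "$lines_d"   # 3
echo "$lines_s"   # 2
echo "$cr_left"   # 0
```

If blank lines matter, tr -d "\r" is the safer choice of the two.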

Four, regular expressions

1. What is a regular expression

Regular expressions are usually used in conditional tests to check whether a string matches a given format

Regular expressions are composed of ordinary characters and metacharacters

Ordinary characters include uppercase and lowercase letters, digits, punctuation marks and some other symbols

Metacharacters are characters with a special meaning in regular expressions; they can specify how the preceding character (that is, the character before the metacharacter) may occur in the target string

2. Basic regular expressions

  • Supported tools: grep, egrep, sed, awk
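Common metacharacters in basic regular expressions:
\ : escape character, cancels the special meaning of the following symbol, e.g. \!, \n, \$

^ : matches the start of the string, e.g. ^a, ^the, ^#, ^[a-z]

$ : matches the end of the string, e.g. word$; ^$ matches an empty line

. : matches any single character except \n, e.g. go.d, g..d

* : matches the preceding subexpression zero or more times, e.g. goo*d, go.*d

[list] : matches one character from list, e.g. go[ola]d, [abc], [a-z], [a-z0-9]; [0-9] matches any single digit

[^list] : matches one character that is not in list, e.g. [^0-9], [^A-Z0-9]; [^a-z] matches any single character that is not a lowercase letter

\{n\} : matches the preceding subexpression exactly n times, e.g. go\{2\}d; '[0-9]\{2\}' matches two digits

\{n,\} : matches the preceding subexpression at least n times, e.g. go\{2,\}d; '[0-9]\{2,\}' matches two or more digits

\{n,m\} : matches the preceding subexpression n to m times, e.g. go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits

Note: with egrep and awk, the braces in {n}, {n,} and {n,m} are written without the preceding "\"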
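The metacharacters above can be tried directly with grep, which uses basic regular expressions by default. The word list in this sketch is made up for illustration; grep -c prints the number of matching lines.

```shell
# Hypothetical test words, one per line
words() { printf '%s\n' gd god good goood; }

star=$(words | grep -c 'goo*d')          # o followed by zero or more o: god, good, goood
exact=$(words | grep -c 'go\{2\}d')      # exactly two o: good
at_least=$(words | grep -c 'go\{2,\}d')  # two or more o: good, goood
dots=$(words | grep -c 'g..d')           # any two characters between g and d: good
empty=$(printf 'a\n\nb\n' | grep -c '^$')  # empty lines

echo "$star $exact $at_least $dots $empty"   # 3 1 2 1 1
```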

3. Extended regular expressions

  • Supported tools: egrep, awk
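Metacharacters in extended regular expressions:
+ : matches the preceding subexpression one or more times, e.g. go+d matches at least one o, such as god, good, goood

? : matches the preceding subexpression zero or one time, e.g. go?d matches gd or god

() : treats the string inside the parentheses as a single unit, e.g. g(oo)+d matches oo as a whole one or more times, such as good, gooood

| : matches either alternative (or), e.g. g(oo|la)d matches good or glad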
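A quick check of the extended metacharacters, as a sketch with made-up word lists. It uses grep -E, which accepts the same extended syntax as egrep.

```shell
plus=$(printf '%s\n' gd god good goood | grep -E -c 'go+d')           # god, good, goood
optional=$(printf '%s\n' gd god good | grep -E -c 'go?d')             # gd, god
group=$(printf '%s\n' god good goood gooood | grep -E -c 'g(oo)+d')   # good, gooood
either=$(printf '%s\n' good glad gold | grep -E -c 'g(oo|la)d')       # good, glad

echo "$plus $optional $group $either"   # 3 2 2 2
```

Note that g(oo)+d rejects goood: three o cannot be split into whole oo pairs.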

Example:

① Use regular expressions to match email addresses

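Username@ : ^([a-zA-Z0-9_.+-]+)@
Subdomain : ([a-zA-Z0-9_.-]+)
.Top-level domain (usually 2 to 5 characters long) : \.([a-zA-Z]{2,5})$

Inside [ ] the hyphen is placed last so that it is taken literally. grep treats a backslash inside brackets as an ordinary character, so escapes such as \- and \. are not used there

egrep '^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$' youxiang.txt
awk '/^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$/{print $0}' youxiang.txt

②To require that the character immediately before @ is a letter or digit rather than a special symbol:

egrep '^([a-zA-Z0-9_.+-]+)([a-zA-Z0-9])@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$' youxiang.txt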
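The pattern can be exercised without a youxiang.txt file. The sketch below is an illustration, not a complete email validator: the addresses are made up, the grouping parentheses are dropped because nothing refers to the groups, and the hyphen is written last inside each bracket expression so that it is matched literally.

```shell
# Variant of the email pattern with a literal hyphen at the end of each bracket list
pattern='^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}$'

# Hypothetical test addresses: the first two are well-formed, the rest are not
addresses() {
  printf '%s\n' 'user.name+tag@example.com' 'first-last@mail.example.org' \
                'no-at-sign.example.com' 'user@example' 'user@example.toolongtld'
}

matched=$(addresses | LC_ALL=C grep -E -c "$pattern")
first_match=$(addresses | LC_ALL=C grep -E "$pattern" | head -1)

echo "$matched"       # 2
echo "$first_match"   # user.name+tag@example.com
```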

③Match the relevant mobile phone number

A mobile phone number has 11 digits. To match numbers starting with 187, the remaining 8 digits can be any combination of digits

With egrep and awk, the braces in {n}, {n,} and {n,m} need no preceding "\"
egrep "^187[0-9]{8}$" haoma.txt
With grep, the "\" is required
grep "^187[0-9]\{8\}$" haoma.txt
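Both spellings select the same lines. The sketch below feeds four made-up numbers through each form (grep -E standing in for egrep); only the 11-digit number starting with 187 should pass.

```shell
# Hypothetical candidates: valid, too short, wrong prefix, too long
numbers() { printf '%s\n' 18712345678 1871234567 18612345678 187123456789; }

ere=$(numbers | grep -E '^187[0-9]{8}$')   # extended syntax: bare braces
bre=$(numbers | grep '^187[0-9]\{8\}$')    # basic syntax: escaped braces

echo "$ere"   # 18712345678
echo "$bre"   # 18712345678
```

The ^ and $ anchors are what reject the 12-digit number; without them the pattern would match a substring of it.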

Origin blog.csdn.net/qq_35456705/article/details/112133474