sort: sorting; uniq: removing duplicates; tr: editing standard input; cut: extracting content; Shell regular expressions

1. The sort command

  • Sorts the lines of a file, either as text or according to a data type such as numbers

① Format

sort [options] file
cat file | sort [options]

②Common options

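-f: ignore case; without it, uppercase letters may sort ahead of lowercase ones (they do in the C locale)
-b: ignore leading blanks at the start of each line
-n: sort by numeric value
-r: reverse the sort order
-u: like uniq, output only one copy of identical lines
-t: specify the field separator; by default, any run of spaces or tabs separates fields (the blanks stay attached to the start of the following field)
-k: specify the field(s) to sort on, e.g. -k2 or -k2,2
-o <output file>: write the sorted result to the given file, which may be the input file itself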

③Example

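The original screenshots cannot be recovered, so here is a minimal sketch of the options above instead. The sample data is a made-up name:number list. LC_ALL=C is set so that the ordering does not depend on the system locale.

```shell
# Byte-wise collation, so the ordering does not depend on the locale
export LC_ALL=C

# Hypothetical sample data in "name:number" form
data='bob:1001
alice:20
carol:300
alice:20'

# Default: whole lines are compared as text
by_text=$(printf '%s\n' "$data" | sort)

# -t ':' sets the field separator, -k 2,2 uses field 2 as the key,
# -n compares it numerically and -r reverses the order
by_num_desc=$(printf '%s\n' "$data" | sort -t ':' -k 2,2 -n -r)

# -u keeps one copy of identical lines; -o may name the input file itself
tmp=$(mktemp)
printf '%s\n' "$data" > "$tmp"
sort -u -o "$tmp" "$tmp"
unique=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n---\n' "$by_text" "$by_num_desc" "$unique"
```

Without -n, the text sort would place "1001" before "20" and "300", because it compares character by character.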

2. The uniq command

  • Reports or omits repeated lines that are adjacent in a file; it is usually combined with sort, which places identical lines next to each other

① Format

uniq [options] file
cat file | uniq [options]

②Common options

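-c: prefix each output line with the number of times it occurred, collapsing the repeats into one line
-d: print only lines that are repeated (adjacent duplicates), one copy of each
-u: print only lines that are not repeated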

③Example

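The following minimal sketch uses made-up data. It shows why uniq is normally placed after sort: uniq only merges identical lines that are next to each other.

```shell
export LC_ALL=C

# Hypothetical input: the last "apple" is not adjacent to the first two
data='apple
apple
pear
apple'

# uniq alone only merges adjacent duplicates
plain=$(printf '%s\n' "$data" | uniq)

# Sorting first brings identical lines together
counts=$(printf '%s\n' "$data" | sort | uniq -c)   # count per distinct line
dups=$(printf '%s\n' "$data" | sort | uniq -d)     # lines that occur more than once
singles=$(printf '%s\n' "$data" | sort | uniq -u)  # lines that occur exactly once

printf '%s\n---\n' "$plain" "$counts" "$dups" "$singles"
```

A common idiom built on this is sort | uniq -c | sort -rn, which lists the most frequent lines first. The width of the count column printed by uniq -c differs between implementations, so scripts should not depend on its exact spacing.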

3. The tr command

  • Translates (replaces), squeezes and deletes characters read from standard input

① Format

tr [options] character-set-1 [character-set-2]
cat file | tr [options] character-set-1 [character-set-2]

②Common options

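-c: use the complement of character set 1: characters in set 1 are kept, and every other character (including the newline \n) is replaced with character set 2
-d: delete every character that belongs to character set 1
-s: squeeze each run of a repeated character into a single occurrence; if character set 2 is given, set 1 is first translated to set 2 and repeats of the set 2 characters are then squeezed
-t: truncate character set 1 to the length of character set 2 before translating; when both sets have the same length, the result is the same as with no option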

③Parameters

  • Character set 1:
    The characters to be translated or deleted. A translation also needs character set 2 to supply the target characters; a deletion (-d) needs only character set 1.
  • Character set 2:
    The target characters that character set 1 is translated into, matched position by position.

④Example

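Here is a sketch of each option on made-up input. The -t line, and the padding behaviour shown without it, assume GNU tr (coreutils).

```shell
export LC_ALL=C

# Translate: set 1 (a-z) is mapped position by position onto set 2 (A-Z)
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')

# -d: delete every character of set 1; no set 2 is needed
no_digits=$(echo 'abc123def' | tr -d '0-9')

# -s: squeeze runs of spaces into a single space
squeezed=$(echo 'a    b  c' | tr -s ' ')

# -c: replace every character NOT in set 1 (the trailing newline included) with '_'
marked=$(printf 'a1b\n' | tr -c 'a-z' '_')

# -c combined with -d: keep only the digits
digits=$(echo 'tel: 138-0000' | tr -cd '0-9')

# -t (GNU tr): set 1 is truncated to the length of set 2, so only 'a' is replaced
truncated=$(echo 'abc' | tr -t 'abc' 'x')
# Without -t, GNU tr pads set 2 with its last character, so all three become 'x'
padded=$(echo 'abc' | tr 'abc' 'x')

printf '%s\n' "$upper" "$no_digits" "$squeezed" "$marked" "$digits" "$truncated" "$padded"
```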

4. The cut command

  • Prints selected parts (such as fields) of each line, or omits the specified fields; the file itself is not modified

① Format

cut [options] file
cat file | cut [options]

②Common options

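-f: select the field(s) to extract, e.g. -f 1, -f 1,3 or -f 2-4; cut uses the tab character as its default field delimiter
-d: change the field delimiter from the default tab to another single character
--complement: invert the selection and print every field except the specified ones (GNU cut)
--output-delimiter: set the delimiter placed between fields in the output (GNU cut)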

③Example

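Here is a sketch on a single line in /etc/passwd format (name:password:UID:GID:comment:home:shell). The line itself is a hypothetical example. --complement and --output-delimiter are GNU options and may be missing from BSD or BusyBox cut.

```shell
# A line in /etc/passwd format
line='root:x:0:0:root:/root:/bin/bash'

# -d ':' sets the delimiter, -f 1 selects the first field
user=$(echo "$line" | cut -d ':' -f 1)

# A field list: fields 1 and 7
user_shell=$(echo "$line" | cut -d ':' -f 1,7)

# --complement: print every field EXCEPT fields 1-5
rest=$(echo "$line" | cut -d ':' --complement -f 1-5)

# --output-delimiter: join the selected fields with ' -> ' instead of ':'
pretty=$(echo "$line" | cut -d ':' -f 1,7 --output-delimiter=' -> ')

# Without -d, the delimiter is a tab
second=$(printf 'a\tb\tc\n' | cut -f 2)

printf '%s\n' "$user" "$user_shell" "$rest" "$pretty" "$second"
```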

5. Regular expressions

Regular expressions are usually used in conditional tests to check whether a string matches a certain format.

Regular expressions are composed of ordinary characters and metacharacters

Ordinary characters include uppercase and lowercase letters, digits, punctuation marks and some other symbols.

Metacharacters are characters with a special meaning in a regular expression. Some of them specify how many times the leading character (the character just before the metacharacter) may appear in the target text; others mark positions or sets of characters.

①Common metacharacters in basic regular expressions (supported by: egrep, awk, sed, grep)

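\ : escape character; removes the special meaning of the following symbol, e.g. \$, \., \*

^ : matches the start of a line, e.g. ^a, ^the, ^#, ^[a-z]

$ : matches the end of a line, e.g. word$; ^$ matches an empty line

. : matches any single character except the newline \n, e.g. go.d, g..d

* : matches the preceding subexpression zero or more times, e.g. goo*d, go.*d

[list] : matches one character from list, e.g. go[ola]d, [abc], [a-z], [a-z0-9]; [0-9] matches any single digit

[^list] : matches any one character that is not in list, e.g. [^0-9], [^A-Z0-9]; [^a-z] matches any single character that is not a lowercase letter

\{n\} : matches the preceding subexpression exactly n times, e.g. go\{2\}d; '[0-9]\{2\}' matches two digits

\{n,\} : matches the preceding subexpression at least n times, e.g. go\{2,\}d; '[0-9]\{2,\}' matches two or more digits

\{n,m\} : matches the preceding subexpression n to m times, e.g. go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits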

Note: egrep and awk use extended regular expressions, so {n}, {n,} and {n,m} are written there without the "\" before "{" and "}".
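The following minimal sketch runs the basic metacharacters above through grep on a made-up word list. Each pattern is anchored with ^ and $ so that it must match the whole line, which makes the effect of each metacharacter easier to see.

```shell
# Byte-wise matching, independent of the locale
export LC_ALL=C

# Hypothetical test words, one per line
words='gd
god
good
goood
gold
go-d'

dot=$(printf '%s\n' "$words" | grep '^go.d$')           # . = any one character
star=$(printf '%s\n' "$words" | grep '^goo*d$')         # one o, then zero or more extra o's
list=$(printf '%s\n' "$words" | grep '^go[ol]d$')       # third character is o or l
not_o=$(printf '%s\n' "$words" | grep '^go[^o]d$')      # third character is anything but o
exactly2=$(printf '%s\n' "$words" | grep '^go\{2\}d$')  # exactly two o's
two_plus=$(printf '%s\n' "$words" | grep '^go\{2,\}d$') # two or more o's
# The same interval in extended syntax: no backslashes before the braces
two_plus_ere=$(printf '%s\n' "$words" | grep -E '^go{2,}d$')

# ^$ matches empty lines; -c prints the number of matching lines
blank_count=$(printf 'a\n\nb\n\n' | grep -c '^$')

printf '%s\n---\n' "$dot" "$star" "$list" "$not_o" "$exactly2" "$two_plus" "$two_plus_ere" "$blank_count"
```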

②Extended regular expression metacharacters (supported by: egrep, i.e. grep -E, and awk)

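+ : matches the preceding subexpression one or more times, e.g. go+d matches at least one o, such as god, good, goood

? : matches the preceding subexpression zero times or once, e.g. go?d matches gd or god

() : treats the enclosed string as a single unit, e.g. g(oo)+d matches oo as a whole one or more times, such as good, gooood

| : matches either alternative (logical OR), e.g. g(oo|la)d matches good or glad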
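This sketch applies the extended metacharacters to a made-up word list, using grep -E (equivalent to egrep).

```shell
export LC_ALL=C

# Hypothetical test words, one per line
words='gd
god
good
goood
gooood
glad'

plus=$(printf '%s\n' "$words" | grep -E '^go+d$')         # one or more o's
optional=$(printf '%s\n' "$words" | grep -E '^go?d$')     # zero or one o
group=$(printf '%s\n' "$words" | grep -E '^g(oo)+d$')     # "oo" repeated as a unit
either=$(printf '%s\n' "$words" | grep -E '^g(oo|la)d$')  # "oo" or "la"

printf '%s\n---\n' "$plus" "$optional" "$group" "$either"
```

Note that goood is matched by go+d but not by g(oo)+d, because three o's cannot be split into whole "oo" units.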

③Example

①: Match email address

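egrep "^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]{1,}[@][A-Za-z0-9]+[.][a-zA-Z]+[.]*[a-zA-Z]*$" yx.txt

Inside a bracket expression a backslash is an ordinary character. A literal dot is therefore written [.], and a literal hyphen is placed last, as in [a-zA-Z0-9.-]. Writing [\.] would also accept a backslash.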
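This sketch checks the email pattern against a few made-up addresses. It uses grep -E, which is equivalent to egrep; newer GNU grep releases report egrep as obsolescent. The pattern is single-quoted so the shell passes it through unchanged.

```shell
export LC_ALL=C

# Hypothetical addresses; the last three should be rejected
addrs='zhangsan01@qq.com
li.si-2021@163.com.cn
wangwu@mail.example.com
ab@qq.com
1abcdef@qq.com
abcde\f@qq.com'

# Letter, 4+ of [a-zA-Z0-9.-], 1+ letters/digits, @, domain, dot, suffix, optional ".xx"
pattern='^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]{1,}[@][A-Za-z0-9]+[.][a-zA-Z]+[.]*[a-zA-Z]*$'

valid=$(printf '%s\n' "$addrs" | grep -E "$pattern")
printf '%s\n' "$valid"
```

ab@qq.com fails because the part before @ needs at least six characters (one letter, four or more characters, then one more letter or digit). 1abcdef@qq.com fails because it starts with a digit, and abcde\f@qq.com fails because a backslash is not in any of the bracket expressions.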

②: Match mobile phone numbers starting with 13 or 15

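grep "^1[35][0-9]\{9\}$" ph.txt

The pattern requires 1, then 3 or 5, then exactly nine digits (eleven digits in total). A bracket expression such as [0-9|?] would also accept the characters | and ?, because they are ordinary characters inside brackets.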
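This sketch runs the phone-number pattern, in both basic and extended syntax, against made-up numbers. Only 11-digit numbers beginning with 13 or 15 should be printed.

```shell
export LC_ALL=C

# Hypothetical numbers; only the first two should match
numbers='13812345678
15900001111
18612345678
1381234567
138123456789
13|12345678'

# 1, then 3 or 5, then exactly nine digits
valid=$(printf '%s\n' "$numbers" | grep '^1[35][0-9]\{9\}$')
# The same pattern in extended syntax (grep -E / egrep)
valid_ere=$(printf '%s\n' "$numbers" | grep -E '^1[35][0-9]{9}$')

printf '%s\n' "$valid"
```

13|12345678 is rejected because | is not in [0-9]; a pattern using [0-9|?] would have accepted it.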



Origin blog.csdn.net/weixin_53496478/article/details/114872263