Regular expressions in the shell
Table of Contents
- 1. Regular expressions in the shell
  - 1. The sort command: sort the contents of a file line by line, optionally according to different data types
  - 2. The uniq command: report or omit consecutive repeated lines in a file; often combined with sort
  - 3. The tr command: replace, squeeze, and delete characters from standard input
- 2. Regular expressions
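1. Regular expressions in the shell
1. The sort command: sort the contents of a file line by line, optionally according to different data types
Syntax:
sort [options] file
cat file | sort [options]
Common options:
-f: ignore case
-b: ignore leading blanks on each line
-n: sort numerically
-r: sort in reverse order
-u: like uniq, print only one of each set of identical lines
-t: specify the field separator; by default, fields are separated at the transition from a non-blank character to a blank (space or tab)
-k: specify the field to sort on
-o <output file>: write the sorted result to the given file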
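sort -n test.txt    # sort the lines numerically
sort -u test.txt    # sort and print each distinct line only once
sort -t ":" -k3 -n /etc/passwd    # sort numerically, starting at the third colon-separated field (the UID)
du -ah | sort -hr -o du.txt    # du -h prints human-readable sizes (4.0K, 1.5G), so use GNU sort's -h instead of -n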
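The way -t, -k, -n, and -r combine can be tried without touching system files. The following is a minimal sketch that uses made-up passwd-style records (the names and UIDs are illustrative only); `-k3,3` restricts the sort key to the third field alone.

```shell
# Hypothetical sample records in the form name:password:uid
data='carol:x:1002
alice:x:0
bob:x:100'

# Sort numerically on the third colon-separated field only
by_uid=$(printf '%s\n' "$data" | sort -t ':' -k3,3 -n)
printf '%s\n' "$by_uid"

# Same key, reversed: largest UID first
by_uid_desc=$(printf '%s\n' "$data" | sort -t ':' -k3,3 -nr)
printf '%s\n' "$by_uid_desc"
```

Without -n the UIDs would be compared as text, so 1002 would sort before 100.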
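2. The uniq command: report or omit consecutive repeated lines in a file; often combined with sort
Syntax:
uniq [options] file
cat file | uniq [options]
Common options:
-c: collapse consecutive repeated lines and prefix each line with its number of occurrences
-d: print only the repeated lines, one per group
-u: print only the lines that are not repeated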
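Because uniq only compares adjacent lines, it is usually fed from sort. The sketch below shows the three options on a small made-up list; the variable names and the sample data are assumptions for illustration.

```shell
# Hypothetical sample input with duplicates that are not adjacent
fruits='apple
pear
apple
fig
pear
apple'

# Sort first so identical lines become adjacent, count them, then rank by count
counts=$(printf '%s\n' "$fruits" | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"

# Lines that occur more than once
dups=$(printf '%s\n' "$fruits" | sort | uniq -d)
printf '%s\n' "$dups"

# Lines that occur exactly once
singles=$(printf '%s\n' "$fruits" | sort | uniq -u)
printf '%s\n' "$singles"
```

Running uniq on the unsorted list would leave every line in place, since no two identical lines are next to each other.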
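3. The tr command: replace, squeeze, and delete characters from standard input
Syntax:
tr [options] set1 [set2]
Common options:
-c: use the complement of set1: characters in set1 are kept, and every other character (including the newline \n) is replaced with set2
-d: delete every character that belongs to set1
-s: squeeze each run of a repeated character in set1 into a single occurrence; when set2 is also given, set1 is first translated to set2 and the repeats are squeezed afterwards
-t: replace set1 with set2 after truncating set1 to the length of set2; when both sets have the same length, the result is the same as with no option
Arguments:
set1: the source character set to translate or delete. A translation also requires set2 as the target character set; a deletion does not.
set2: the target character set to translate to.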
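The -t option only makes a visible difference when set2 is shorter than set1. A small sketch, assuming GNU tr (other implementations may handle a short set2 differently, and some do not offer -t at all); the input string is arbitrary:

```shell
# Default behavior in GNU tr: set2 is padded with its last character,
# so c and d are also mapped to y
padded=$(echo abcd | tr 'abcd' 'xy')
echo "$padded"

# With -t, set1 is truncated to the length of set2,
# so only a and b are translated and c and d pass through unchanged
truncated=$(echo abcd | tr -t 'abcd' 'xy')
echo "$truncated"
```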
[root@localhost ~]#echo "abc" | tr 'a-z' 'A-Z'
ABC
# Convert abc to uppercase ABC
[root@localhost ~]#echo abccabacca | tr -c "ab\n" "0"
ab00aba00a
# Keep a, b, and the newline; replace every other character with 0
[root@localhost ~]#echo 'hello world' | tr -d 'od'
hell wrl
# Delete the characters o and d
[root@localhost ~]#echo "thissss is a text linnnnnnne." | tr -s 'sn'
this is a text line.
# Squeeze repeated s and n characters into a single character
[root@localhost ~]#cat 123.txt
aa
bb
[root@localhost ~]#cat 123.txt | tr -s "\n"
aa
bb
# Squeeze repeated newlines into one, which removes the empty lines
[root@localhost ~]#echo $PATH | tr -s ":" "\n"
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/root/bin
# Replace each colon ":" in the PATH variable with a newline "\n"
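# Remove the '^M' characters introduced by Windows files (this form also squeezes away empty lines):
cat 22.txt | tr -s "\r" "\n" > new_22.txt
or, keeping empty lines intact: cat 22.txt | tr -d "\r" > new_22.txt
On Linux, a line feed ("\n") alone ends the current line and starts a new one; a carriage return is only displayed as a control character ("^M") and does not start a new line. On Windows, a new line requires a carriage return followed by a line feed ("\r\n"); if either control character is missing or the order is wrong, the line break is not handled correctly.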
[root@localhost ~]#rz -E
rz waiting to receive.
[root@localhost ~]#cat -A aa.txt
aa^M$
^M$
^M$
^M$
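# A file created on Windows and transferred to Linux shows ^M$ at the end of every line, including the empty ones, under cat -A: the line-ending format is different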
yum install -y dos2unix
dos2unix 33.txt # This tool can also convert the line endings; it has to be installed first
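The effect of stripping carriage returns can be checked without a Windows machine. The sketch below builds a small CRLF file with printf (the temporary file and its contents are made up for illustration) and counts the \r bytes before and after tr -d.

```shell
# Create a temporary file with Windows (CRLF) line endings
tmp=$(mktemp)
printf 'aa\r\nbb\r\n' > "$tmp"

# tr -cd '\r' deletes everything except carriage returns, so wc -c counts them
before=$(tr -cd '\r' < "$tmp" | wc -c | tr -d ' ')
after=$(tr -d '\r' < "$tmp" | tr -cd '\r' | wc -c | tr -d ' ')

# The cleaned text keeps its line feeds and loses only the carriage returns
cleaned=$(tr -d '\r' < "$tmp")

echo "CR bytes before: $before, after: $after"
rm -f "$tmp"
```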
Sorting an array
abc=(3 5 8 7 9 2 1)
echo "${abc[*]}" | tr ' ' '\n' | sort -n
2. Regular expressions
1. Function
Regular expressions are typically used in conditional statements to check whether a string matches a given format.
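A minimal sketch of such a check follows. The helper name is_mobile and the 11-digit pattern are hypothetical, chosen only to illustrate the idea; grep -q prints nothing and reports the match through its exit status, which is what the if statement tests.

```shell
# is_mobile is a hypothetical helper; the pattern (a 1 followed by ten digits)
# is only an illustrative format
is_mobile() {
    # -E: extended regular expression, -q: quiet, exit status only
    # ^ and $ anchor the pattern to the whole string
    printf '%s\n' "$1" | grep -Eq '^1[0-9]{10}$'
}

for n in 13812345678 1381234 abc; do
    if is_mobile "$n"; then
        echo "$n: valid"
    else
        echo "$n: invalid"
    fi
done
```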
2. Composition of a regular expression
A regular expression is made up of ordinary characters and metacharacters.
Ordinary characters include uppercase and lowercase letters, digits, punctuation marks, and some other symbols.
Metacharacters are characters that carry a special meaning in a regular expression. They can specify how the preceding character (the one just before the metacharacter) may appear in the target text.
3. Common metacharacters in basic regular expressions (supported tools: grep, egrep, sed, awk)
Common basic regular expression metacharacter | Description |
---|---|
\ | Escape character; removes the special meaning of the symbol that follows. Examples: \!, \n, \$ |
^ | Matches the position where the string begins. Examples: ^a, ^the, ^#, ^[a-z] |
$ | Matches the position where the string ends. Examples: word$; ^$ matches blank lines |
. | Matches any single character except \n. Examples: go.d, g...d |
* | Matches the preceding subexpression zero or more times. Examples: goo*d, go.*d |
[list] | Matches any one character in list. Examples: go[ola]d, [abc], [a-z], [a-z0-9]; [0-9] matches any digit |
[^list] | Matches any one character that is not in list. Examples: [^0-9], [^A-Z0-9]; [^a-z] matches any character that is not a lowercase letter |
\{n\} | Matches the preceding subexpression exactly n times. Examples: go\{2\}d; '[0-9]\{2\}' matches two digits |
\{n,\} | Matches the preceding subexpression at least n times. Examples: go\{2,\}d; '[0-9]\{2,\}' matches two or more digits |
\{n,m\} | Matches the preceding subexpression n to m times. Examples: go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits |
Note: when egrep and awk use {n}, {n,}, or {n,m}, the braces "{}" do not need a preceding "\".
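The difference in brace escaping, along with a few of the basic metacharacters, can be seen in a short sketch. The word list is made up for illustration, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical word list, one word per line
words='gd
god
good
goood
gold'

# Basic regular expression (plain grep): interval braces must be escaped
bre=$(printf '%s\n' "$words" | grep 'go\{2,3\}d')
printf '%s\n' "$bre"

# Extended regular expression (grep -E): braces are written without a backslash
ere=$(printf '%s\n' "$words" | grep -E 'go{2,3}d')
printf '%s\n' "$ere"

# * applies to the preceding o: one o followed by zero or more further o characters
star=$(printf '%s\n' "$words" | grep 'goo*d')
printf '%s\n' "$star"

# . matches any single character; ^ and $ anchor the whole line
dot=$(printf '%s\n' "$words" | grep '^go.d$')
printf '%s\n' "$dot"
```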
4. Extended regular expression metacharacters (supported tools: egrep, awk)
Extended regular expression metacharacter | Description |
---|---|
+ | Matches the preceding subexpression one or more times. Example: go+d matches at least one o, such as god, good, goood |
? | Matches the preceding subexpression zero or one time. Example: go?d matches gd or god |
() | Treats the string in parentheses as a single unit. Example: g(oo)+d matches oo as a unit one or more times, such as good, gooood |
| | Matches either alternative (logical or). Example: g(oo|la)d matches good or glad |
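Each row of the table can be tried with grep -E, the equivalent of egrep. The following sketch uses a made-up word list and anchors every pattern with ^ and $ so that only whole-line matches are reported.

```shell
# Hypothetical word list, one word per line
words='gd
god
good
gooood
glad
gold'

# + : one or more o characters
plus=$(printf '%s\n' "$words" | grep -E '^go+d$')
printf '%s\n' "$plus"

# ? : zero or one o character
opt=$(printf '%s\n' "$words" | grep -E '^go?d$')
printf '%s\n' "$opt"

# () with + : the pair oo repeated one or more times
group=$(printf '%s\n' "$words" | grep -E '^g(oo)+d$')
printf '%s\n' "$group"

# | : either oo or la between g and d
alt=$(printf '%s\n' "$words" | grep -E '^g(oo|la)d$')
printf '%s\n' "$alt"
```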