Regular expressions (RE) and detailed usage of the grep, awk and sed tools

1. What is a regular expression

Put simply, a regular expression is a way of processing strings. It works on text one line at a time and, with the help of a handful of special symbols, lets you easily search for, delete or replace a particular string. A regular expression is essentially a notation: any tool that supports the notation can use it for string processing. vi, grep, awk, sed and many other tools support regular expressions, so you can use the regular expression special characters in them. Commands such as cp and ls do not support regular expressions, so with those you can only use the wildcards provided by Bash itself.

2. RE characters and special symbols

1. Special symbols
Special symbol   Meaning
[:alnum:]        Uppercase and lowercase letters and digits, i.e. 0-9, A-Z, a-z
[:alpha:]        Any letter, upper or lower case, i.e. A-Z, a-z
[:lower:]        Lowercase letters, i.e. a-z
[:upper:]        Uppercase letters, i.e. A-Z
[:digit:]        Digits, i.e. 0-9
[:blank:]        The space and Tab characters
[:cntrl:]        Control characters, such as CR, LF, Tab, Del
[:graph:]        All visible characters, i.e. everything except space, Tab and the control characters
[:print:]        Any printable character, i.e. the [:graph:] characters plus the space
[:punct:]        Punctuation symbols, such as " ' ; : # $ ! ?
[:space:]        Any character that produces whitespace, such as space, Tab, CR
[:xdigit:]       Hexadecimal digits, i.e. 0-9, A-F, a-f

Note: the first five are the commonly used ones; the rest are rarely needed or can be replaced by other means.

2. Special symbol examples

We will use some sample text found online as the data:

Hello! / Hi!
Good-bye, "Mike".
See you tomorrow.
It’s time for class.
Open your books and turn to page 20.
Could you say it again? 
Where’s the company? 
Which is the right size? 
Do you know where I’ve put my glasses? 
Is this your pen? I found it under the desk. 
Which is your bag? 
The one on your right. 
Are these books all yours? 
She must be a model, isn’t she? 
I really don’t known.
#I have no idea about it.
What’s your family name?
Rose, let me introduce my friend to you.
Nice to meet you, too.
toooooot 
What day is it today?
It’s January the 15th, 1999.
It’s the year of 1999.
However, this dress is about $ 3183 dollars.
Write this data into a file:
[root@localhost tmp]# vim test.text

Start matching

[root@localhost tmp]# grep "[:alnum:]" test.text         # these special symbols cannot be used on their own
grep: character class syntax is [[:space:]], not [:space:]
[root@localhost tmp]# grep "[[:alnum:]]" test.text        # they have to be wrapped in another pair of []
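
As a rough illustration of the bracketed form, here is a small sketch assuming GNU grep. The sample lines and the variable names (`sample`, `digit_lines`, `upper_lines`) are made up for this example and only stand in for test.text.

```shell
# Hypothetical sample data standing in for test.text.
sample=$(printf 'Hello!\npage 20\nIt is 1999.\nno numbers here\n')

# Count the lines that contain at least one digit.
digit_lines=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# Count the lines that start with an uppercase letter.
upper_lines=$(printf '%s\n' "$sample" | grep -c '^[[:upper:]]')

echo "digit lines: $digit_lines, uppercase-start lines: $upper_lines"
```

The class name always sits inside an outer bracket expression, so it can be mixed with other characters, for example `[[:digit:]_]` for "a digit or an underscore".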


3. RE characters and usage
RE character   Meaning and usage example
^              Matches the beginning of a line
[root@localhost tmp]# grep -n '^#' test.text      # match the lines that begin with # and print their line numbers
The -n option prints line numbers.


RE character   Meaning and usage example
$              Matches the end of a line
[root@localhost tmp]# grep -n '!$' test.text      # match the lines that end with ! and print their line numbers
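
The two anchors can also be combined. The sketch below assumes GNU grep and uses invented sample lines; `^$` leaves nothing between the start and the end of the line, so it matches empty lines only.

```shell
# Hypothetical sample: a comment, a greeting, an empty line, a sentence.
sample=$(printf '#comment\nHello!\n\nSee you tomorrow.\n')

# Lines that begin with #, with line numbers.
hash_lines=$(printf '%s\n' "$sample" | grep -n '^#')

# Lines that end with !, with line numbers.
bang_lines=$(printf '%s\n' "$sample" | grep -n '!$')

# Number of empty lines: nothing between ^ and $.
empty_count=$(printf '%s\n' "$sample" | grep -c '^$')

echo "$hash_lines | $bang_lines | $empty_count"
```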


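RE character   Meaning and usage example
.              Matches any single character
[root@localhost tmp]# grep -n 'e.e' test.text
# Matched strings include (ere) and (eve): there must be exactly one character between the two e's, and a space counts too.
# (ee) does not match. For example, "See you tomorrow." is not listed: See contains two e's, but nothing sits between them.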


RE character   Meaning and usage example
\              Escape character: removes the special meaning of the symbol that follows
[root@localhost tmp]# grep -n '\$' test.text
# $ normally anchors the end of a line; once escaped it matches a literal $ character


RE character   Meaning and usage example
*              Repeats the previous character zero or more times
[root@localhost tmp]# grep -n 'to*' test.text
# Matches a lone t (zero o's) as well as too (several o's)
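
To see exactly what `*` consumes, it can help to print only the matched text. This is a small sketch assuming GNU grep; the input string and variable names are invented for the example.

```shell
text='t to too tooo'

# -o prints each match on its own line; paste joins them for display.
star_matches=$(echo "$text" | grep -o 'to*' | paste -s -d ' ' -)

# Writing the o once before the starred o requires at least one o.
one_or_more=$(echo "$text" | grep -o 'too*' | paste -s -d ' ' -)

echo "to*  -> $star_matches"
echo "too* -> $one_or_more"
```

The first pattern also returns the bare t, because zero repetitions are allowed; the second one does not.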


RE character   Meaning and usage example
[]             Matches any one of the characters listed inside the brackets
[root@localhost tmp]# grep -n 'o[nmt]' test.text
# [nmt] stands for n, m or t, so this matches (on), (om) and (ot)


[root@localhost tmp]# grep -n '[0-9]' test.text
# [0-9] is equivalent to [0123456789]


RE character   Meaning and usage example
[^]            Matches any one character that is not listed inside the brackets
[root@localhost tmp]# grep -n 'on[^e]' test.text
# [^e] stands for any character other than e, so one is not matched
# A space counts as a character too
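
One detail that is easy to miss: a negated bracket still has to match a character. The sketch below (GNU grep assumed, word list invented) shows that a line consisting of just "on" is not matched by `on[^e]`.

```shell
# Hypothetical word list, one word per line.
words=$(printf 'on\nom\not\nox\none\nonly\n')

# o followed by n, m or t: on, om, ot, one and only all qualify.
set_hits=$(printf '%s\n' "$words" | grep -c 'o[nmt]')

# on followed by one character that is not e. "one" fails because of the e,
# and the bare "on" fails because there is no following character at all.
neg_hits=$(printf '%s\n' "$words" | grep 'on[^e]')

echo "$set_hits / $neg_hits"
```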


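RE character   Meaning and usage example
{n,m}          Repeats the previous character n to m times
[root@localhost tmp]# grep -n 'to\{1,3\}' test.text
# Repeats o one to three times (in basic regular expressions the braces must be escaped as \{ \})
# Do not confuse this with *, which repeats zero or more times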
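
The same interval can be written in basic or in extended syntax. A quick sketch, assuming GNU grep and an invented input string:

```shell
text='t to too tooo toooo'

# Basic regular expression (plain grep): the braces are escaped.
bre=$(echo "$text" | grep -o 'to\{2,3\}' | paste -s -d ' ' -)

# Extended regular expression (grep -E): the braces are written bare.
ere=$(echo "$text" | grep -oE 'to{2,3}' | paste -s -d ' ' -)

echo "BRE: $bre"
echo "ERE: $ere"
```

Both forms give the same matches. For toooo only the first three o's are taken, because the upper bound is 3.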


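3. grep

1. grep options commonly used together with regular expressions
-A n    Also list the n lines after each matching line. A is short for "after".
-B n    Also list the n lines before each matching line. B is short for "before".
-C n    Also list the n lines before and after each matching line.
-o      Print only the matched text
-c      Count the matching lines (usually combined with other options)
-n      Show line numbers (covered earlier)
-l      Print only the names of the files that contain a match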
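
A short sketch of these options on a throwaway file, assuming GNU grep and `mktemp`. The file content and variable names are invented for the example.

```shell
tmpfile=$(mktemp)
printf 'alpha\nbeta\ntarget one\ngamma\ndelta\ntarget two\n' > "$tmpfile"

after=$(grep -A 1 'target one' "$tmpfile")     # the match plus 1 line after it
before=$(grep -B 1 'target one' "$tmpfile")    # the match plus 1 line before it
count=$(grep -c 'target' "$tmpfile")           # number of matching lines
only=$(grep -o 'target [a-z]*' "$tmpfile" | head -n 1)   # matched text only
names=$(grep -l 'target' "$tmpfile")           # file name only

rm -f "$tmpfile"
printf '%s\n---\n%s\n---\n%s %s\n' "$after" "$before" "$count" "$only"
```
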
2. grep extended regular expressions

Extended regular expressions require egrep or grep -E.

RE character   Meaning
+              The previous character is repeated one or more times
?              The previous character is repeated zero or one time
|              Alternation ("or")
()             Groups a string
()+            The preceding group is repeated one or more times
I will not go into + and ? here; just compare them with * and note the differences.
The usage of | is also very simple, for example:
[root@localhost tmp]# echo "tooooabsdsadooo"  | grep -E 'too|ab'     # match too or ab
tooooabsdsadooo             # too and ab are highlighted in red
[root@localhost tmp]# echo "goodaaaglad"  | grep -E 'g(oo|la)d'          # match good or glad
goodaaaglad                  # good and glad are highlighted in red
[root@localhost tmp]# echo "goodgabcabcabcglad"  | grep -E 'g(abc)+g'  # matches gabcabcabcg
goodgabcabcabcglad        # looks for a g, then one or more abc, then another g
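
For the comparison between `*`, `+` and `?`, a side-by-side run may be the quickest way to see the difference. This is a sketch assuming GNU grep; the input string is invented.

```shell
text='gd god good goood'

star=$(echo "$text" | grep -oE 'go*d' | paste -s -d ' ' -)   # zero or more o
plus=$(echo "$text" | grep -oE 'go+d' | paste -s -d ' ' -)   # one or more o
opt=$(echo "$text"  | grep -oE 'go?d' | paste -s -d ' ' -)   # zero or one o

echo "go*d -> $star"
echo "go+d -> $plus"
echo "go?d -> $opt"
```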

Write a regular expression that matches the date format YYYY-MM-DD (take your time to digest it)

[root@localhost tmp ~]# echo "2019-12-30" |grep -E '[1-9][0-9]{3}-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))'
2019-12-30
[root@localhost tmp ~]# echo "1919-12-30" |grep -E '[1-9][0-9]{3}-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))'
1919-12-30
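
One thing to keep in mind with the pattern above: it is not anchored, so it also matches a date buried inside a longer string. The sketch below assumes GNU grep; the helper function `check` and the test strings are hypothetical. It uses -x to require that the whole line matches.

```shell
date_re='[1-9][0-9]{3}-((0[1-9])|(1[0-2]))-((0[1-9])|([12][0-9])|(3[01]))'

# Hypothetical helper: prints yes if the whole line is a date, no otherwise.
check() {
    # -x: the pattern must match the entire line; -q: only set the exit status.
    if echo "$1" | grep -Eqx "$date_re"; then echo yes; else echo no; fi
}

good=$(check '2019-12-30')
bad_month=$(check '2019-13-30')

# Without -x the pattern still finds a date inside a longer string.
loose=$(echo 'x2019-12-301' | grep -Eo "$date_re")
strict=$(check 'x2019-12-301')

echo "$good $bad_month $loose $strict"
```

Note that the pattern only checks the shape of the date. A value such as 2019-02-31 still passes, since the day range is not tied to the month.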

Advanced regular expression topic: greedy vs. non-greedy matching (good to understand)
Greedy matching matches as much text as possible.
Non-greedy matching matches as little as possible. It is requested by appending ? to a quantifier (a symbol that expresses a repeat count), for example .*? or +?
grep and egrep are greedy by default and do not support non-greedy mode.
Non-greedy matching requires the -P option, which switches grep to Perl-compatible regular expressions.
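
A minimal sketch of the difference, assuming a GNU grep that was built with PCRE support (the -P option is not available in every build). The input string is invented.

```shell
text='<b>one</b><b>two</b>'

# Greedy: .* runs to the last </b>, so the whole string is a single match.
greedy=$(echo "$text" | grep -oP '<b>.*</b>')

# Non-greedy: .*? stops at the first </b>, giving two separate matches.
lazy=$(echo "$text" | grep -oP '<b>.*?</b>' | paste -s -d ' ' -)

echo "greedy: $greedy"
echo "lazy:   $lazy"
```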

4. awk

1. awk overview

awk processes text and data by scanning the input line by line, from the first line to the last, looking for lines that match a given pattern and performing the action you specify on them. If no action is specified, the matching lines are printed to the screen; if no pattern is specified, the action is applied to every line.

2. awk usage

awk [options] 'commands' filenames
awk [options] -f awk-script-file filenames      (-f reads the commands from a script file)
options:
-F   sets the field separator used to split each line; the default separator is whitespace (spaces or tabs)
commands:

BEGIN{}    actions run before any input is processed
{}         actions run for each line while the input is processed
END{}      actions run after all input has been processed
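
The three kinds of block can be sketched with a small running total. The numbers fed in and the variable names (`total`, `result`) are invented for the example.

```shell
result=$(printf '3\n4\n5\n' | awk '
    BEGIN { print "start"; total = 0 }   # runs once, before any input is read
    { total += $1 }                      # runs for every input line
    END { print "sum=" total }           # runs once, after the last line
')
echo "$result"
```
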
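3. How it works

(1) awk processes input one line at a time. Each line, terminated by a newline, is read as input and assigned to the internal variable $0.
(2) Each line is split by the separator into fields, which are stored in numbered variables starting at $1 (some older awk implementations allow at most 100 fields).
(3) FS initially holds a single space, which awk treats as "any run of spaces or tabs", so fields are separated by whitespace by default.
(4) When awk prints fields with the built-in print statement, it puts a space between them. That space comes from the internal variable OFS, the output field separator: each comma in the print statement is mapped to OFS, so setting OFS controls the output separator.
(5) After producing its output, awk reads the next line from the file into $0, overwriting the previous content, then splits the new string into fields and processes it. This continues until all lines have been processed.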

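4. Built-in variables related to fields
$0  : the content of the line currently being processed
NR  : the number of the current line, counted across everything awk has read so far
FNR : the number of the current line within its own file
NF  : the number of fields in the line being processed
$NF : the value of the last field of the current line after splitting
FS  : the input field separator, whitespace by default
OFS : the output field separator, a single space by default
ORS : the output record separator, a newline by default
[root@localhost tmp ~]# awk 'BEGIN{FS=":"} {print $NF}' /etc/passwd     # use : (colon) as the separator and print the last field of every line
[root@localhost tmp]# awk 'BEGIN{FS=":"; OFS="+++"} /^root/{print $1,$2,$3,$4}' /etc/passwd
root+++x+++0+++0                  # the fields are joined with +++ in the output
[root@localhost tmp]# awk 'BEGIN{ORS="  "} {print $0}' /etc/passwd # join all lines of the file into a single line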
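
The difference between NR and FNR only shows up when awk reads more than one file. A small sketch with two throwaway files (created with `mktemp`, contents invented):

```shell
f1=$(mktemp)
f2=$(mktemp)
printf 'a b\nc d e\n' > "$f1"
printf 'x\n' > "$f2"

# For every line print: overall line number, per-file line number,
# number of fields, and the last field.
counters=$(awk '{ print NR, FNR, NF, $NF }' "$f1" "$f2")

rm -f "$f1" "$f2"
echo "$counters"
```

NR keeps counting across the two files, while FNR starts again at 1 for the second file.
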
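5. awk patterns and actions

An awk statement consists of a pattern and an action.
The pattern can be a regular expression, a comparison expression and so on; the action is usually print.
1. Regular expressions:

awk '/regex/' filename             match against the whole line: /regex/, !/regex/ or $0 ~ /regex/
awk 'field ~ /regex/' filename     match against one field: field ~ /regex/ or field !~ /regex/
[root@localhost tmp]# awk -F: '/alice/' /etc/passwd      # any line that contains alice
[root@localhost tmp]# awk -F: '$NF !~ /alice/' /etc/passwd      # lines whose last field does not contain alice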

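2. Comparison expressions

== equal to
!= not equal to
<  less than
>  greater than
<= less than or equal to
>= greater than or equal to

The operands of == and != can be numbers or strings.

[root@localhost tmp]# awk -F: '$NF == "alice" ' /etc/passwd      # the last field is alice (remember to quote strings)
[root@localhost tmp]# awk -F: '$3== 0' /etc/passwd      # the third field is 0
[root@localhost tmp]# awk -F: '$3 > 0 ' /etc/passwd      # the third field is greater than 0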

3. Conditional expressions: if, if else
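
A minimal if/else sketch inside an action block. The two passwd-like input lines are invented and carry only three colon-separated fields.

```shell
labels=$(printf 'root:x:0\nalice:x:1000\n' | awk -F: '{
    if ($3 == 0)
        print $1, "is the superuser"
    else
        print $1, "is a regular user"
}')
echo "$labels"
```
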
4. Logical expressions:
&& and: both conditions must hold
|| or: at least one condition must hold
!  not: negates the condition
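
The three operators can be sketched on a few passwd-like lines. The user names and IDs below are invented for the example.

```shell
users=$(printf 'root:x:0\ndaemon:x:2\nalice:x:1000\nbob:x:1001\n')

# && : ID of at least 1000 and the name is not bob.
both=$(printf '%s\n' "$users" | awk -F: '$3 >= 1000 && $1 != "bob" { print $1 }')

# || : ID is 0 or the name is bob.
either=$(printf '%s\n' "$users" | awk -F: '$3 == 0 || $1 == "bob" { print $1 }' | paste -s -d ' ' -)

# !  : everything that does not have an ID of at least 1000.
negated=$(printf '%s\n' "$users" | awk -F: '!($3 >= 1000) { print $1 }' | paste -s -d ' ' -)

echo "$both | $either | $negated"
```
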
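5. Range patterns
start expression, end expression

[root@localhost tmp]# awk -F: '/^bin/,/adm/ {print $0 }' /etc/passwd
# Matching starts at the line that begins with bin and ends at the line that contains adm (the lines in between are printed too)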
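
Since the content of /etc/passwd differs from machine to machine, here is the same idea on a fixed input. The START and END markers are invented for the sketch.

```shell
lines=$(printf 'one\nSTART\ntwo\nEND\nthree\n' |
    awk '/START/,/END/ { print NR ": " $0 }')
echo "$lines"
```

Both boundary lines are included in the range. If the end expression never matches, the range simply runs to the end of the input.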

6. Referencing external variables
This is usually done with the awk -v option.

[root@localhost tmp]# test=hello
[root@localhost tmp]# echo "hello world" | awk -v var=$test '$1 == var {print $1}'
hello

5. sed

sed works the same way as awk: both process their input line by line.
sed also supports regular expressions, including extended ones, but extended regular expressions have to be enabled with the -r option. In practice -r is usually given every time; it causes no error even if you do not use any extended syntax.
By default sed prints every line of the file, whether or not it matches the pattern. If a matching line is also printed explicitly (for example with the p command), it appears a second time. Use the -n option to suppress the default output.
The -i option modifies the file itself. Without it the changes appear only in the output and the file is left unchanged.
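
The effect of the default output is easiest to see with the p command. A small sketch, assuming GNU sed and an invented two-line input:

```shell
input=$(printf 'root\nalice\n')

# Without -n: every line is printed once by default, and the matching
# line is printed a second time by p.
default_out=$(printf '%s\n' "$input" | sed '/root/p' | paste -s -d ' ' -)

# With -n: the default output is suppressed, so only p prints.
quiet_out=$(printf '%s\n' "$input" | sed -n '/root/p')

echo "default: $default_out"
echo "quiet:   $quiet_out"
```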

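1. Most common in practice: search and replace
# Search every line; where root is found, replace the first occurrence with yjssjm
sed  -r  's/root/yjssjm/'   filename
# Search every line and replace every occurrence of root with yjssjm (global replacement)
sed  -r  's/root/yjssjm/g'   filename
# i additionally ignores case
sed  -r  's/root/yjssjm/gi'    filename # so Root, ROot and so on are replaced with yjssjm as well
# Delete the lines that contain root
sed  -r  '/root/d'   filename
# A different character can serve as the delimiter; in an address it must be introduced with a backslash
sed  -r  '\#root#d'  filename
# Delete lines 1 through 3
sed  -r  '1,3  d'    filename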
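
The substitution flags can be compared on a single line of input. This sketch assumes GNU sed (the i flag of s is a GNU extension); the input strings are invented.

```shell
line='root:Root:root'

first=$(echo "$line" | sed -r 's/root/yjssjm/')      # first occurrence only
all=$(echo "$line" | sed -r 's/root/yjssjm/g')       # every occurrence
nocase=$(echo "$line" | sed -r 's/root/yjssjm/gi')   # every occurrence, any case

# A different delimiter avoids escaping slashes in paths.
swapped=$(echo '/usr/bin/env' | sed -r 's#/usr/bin#/opt#')

printf '%s\n%s\n%s\n%s\n' "$first" "$all" "$nocase" "$swapped"
```
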
2. Commonly used sed commands

1. Substitute command: s

# Append .5 to a two-digit number (& stands for the matched text)
[root@localhost tmp]# echo "77      1"|sed -r 's/[0-9][0-9]/&.5/'
77.5      1
# s/(...)/\1/: \1 refers to the text captured by the first group
[root@localhost tmp]# echo "nowrite" | sed -r 's/(no)write/\1写/'
no写

2. Append command: a (adds the text after the matching line)

[root@localhost tmp]# echo "aaaaaaa" | sed -r 'a\111111111'
aaaaaaa
111111111

3. Insert command: i (inserts the text before the matching line)

[root@localhost tmp]# echo "aaaaaaa" | sed -r 'i\111111111'
111111111
aaaaaaa

4. Change command: c (replaces the matching line with the text)

[root@localhost tmp]# echo "aaaaaaa" | sed -r 'c\111111111'
111111111
3. Multiple edit commands: -e
sed -re '1,3 d' -re 's/root/yjssjm/' filename
is equivalent to
sed -r '1,3 d; s/root/yjssjm/' filename
or a script spread over several lines
sed -r '1,3 d
s/root/yjssjm/' filename
4. Common operations

Running :set list inside a file opened in vim reveals special symbols that are normally hidden.
For example, a line that shows only $ is an empty line, and the $ at the end of each line marks the line ending.

Delete the commented lines of a configuration file
sed -ri '/^#/d' filename
Delete the empty lines of a configuration file
sed -ri '/^$/d' filename
Delete comment lines that are indented by any number of spaces or Tabs
sed -ri '/^[ \t]*#/d' filename
Comment out lines 3 to 7 of a file
sed -r '3,7s/^/#/' filename
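
Because -i rewrites the file, it can be worth keeping a backup while experimenting. The sketch below assumes GNU sed, whose -i option accepts a backup suffix, and works on a throwaway file made with `mktemp`. The configuration content is invented, and `[[:blank:]]` is used as a portable way of writing "space or Tab".

```shell
conf=$(mktemp)
printf '# a comment\n\nport=80\n   # indented comment\nhost=example\n' > "$conf"

# -i.bak edits the file in place and keeps the original as "$conf.bak".
# The first command drops comment lines (indented or not), the second empty lines.
sed -r -i.bak -e '/^[[:blank:]]*#/d' -e '/^$/d' "$conf"

cleaned=$(cat "$conf")
original_lines=$(wc -l < "$conf.bak")

rm -f "$conf" "$conf.bak"
echo "$cleaned"
```

After the run the file holds only the two setting lines, while the .bak copy still has all five original lines.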

Origin blog.csdn.net/baidu_38803985/article/details/105023004