[My Linux, I call the shots!] Wildcards and regular expressions explained

Contents:
(I) Understanding the role of wildcards and regular expressions
(II) Using wildcards
(III) Using regular expressions
(IV) Using extended regular expressions


(I) Understanding the role of wildcards and regular expressions
(1.1) In daily work we constantly use wildcards and regular expressions. Wildcards are special characters, mainly the asterisk (*) and the question mark (?), used to search for files by a fuzzy name. When you are looking for a file or folder and do not know its full name, or do not want to type all of it, a wildcard stands in for one or more real characters. A regular expression is a concept from computer science that is normally used to find or replace text matching a pattern. It is a logical formula for operating on strings: predefined special characters, alone or combined, form a "rule string", and that rule string expresses the logic used to filter strings.
(1.2) Both wildcards and regular expressions perform fuzzy matching: they match a whole category of strings rather than one specific value. Wildcards are generally used in the shell (to match file names), while regular expressions are generally used in other tools and programming languages.


(II) Using wildcards
(2.1) The first is the bracket expression "[list]", which matches any single character in the list. For example, in "a[xyz]b" there must be exactly one character between a and b, and it can only be x, y or z, so it matches axb, ayb and azb.
(2.2) The second is "[c1-c2]", which denotes a range of characters and matches any single character from c1 to c2, as in [0-9] or [a-z]. For example, "a[0-9]b" also requires exactly one character between a and b, and it must be a digit from 0 to 9: a0b, a1b, a2b, a3b, a4b, a5b, a6b, a7b, a8b, a9b.
Note: to match a single letter regardless of case, use "[a-zA-Z]".
(2.3) The third is "[!c1-c2]" or "[^c1-c2]", which matches any single character that is not in the range c1 to c2. For example, a[!0-9]b and a[^0-9]b both require exactly one character between a and b that is not a digit from 0 to 9; acb and adb meet the requirement.
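To see the three bracket forms side by side, here is a minimal sketch in POSIX sh (an assumption: the article itself only shows commands typed at a bash prompt). It uses "case", which applies the same pattern rules as file-name globbing, so no files are needed; "matches" is a hypothetical helper name. LC_ALL=C is set so that ranges such as [a-z] follow plain byte order.

```shell
LC_ALL=C; export LC_ALL   # byte-order ranges, so [a-z] covers lowercase only

# usage: matches PATTERN STRING; prints "yes" if STRING matches the glob PATTERN
matches() {
    case $2 in
        $1) echo yes ;;
        *) echo no ;;
    esac
}

for s in axb azb aXb a5b acb; do
    printf '%s: a[xyz]b=%s a[0-9]b=%s a[!0-9]b=%s\n' "$s" \
        "$(matches 'a[xyz]b' "$s")" \
        "$(matches 'a[0-9]b' "$s")" \
        "$(matches 'a[!0-9]b' "$s")"
done
```

Note that aXb fails "a[xyz]b": the list is case-sensitive, which is why (2.2) suggests "[a-zA-Z]" when case should not matter.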
(2.4) Example: on the host vms002 we create a directory rh124 and, inside it, the files 11111, a111, a_111, a22, lwang, lWang and rh124. We then list the files whose first character is between a and z, whose second character is not a digit, and whose remaining characters are arbitrary.
# mkdir rh124
# cd rh124
# touch 11111 a111 a_111 a22 lwang lWang rh124
# ls [a-z][^0-9]* --- first character between a and z, second character not a digit, the rest arbitrary
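The (2.4) example can be reproduced in a throwaway directory with the sketch below. It assumes a POSIX shell with mktemp available; "printf '%s\n'" is used instead of ls only because it prints the shell's expansion one name per line. The negation is written "[!0-9]", the POSIX spelling; bash also accepts "[^0-9]" as used in the article.

```shell
LC_ALL=C; export LC_ALL        # byte-order ranges: [a-z] means lowercase only
work=$(mktemp -d) && cd "$work" || exit 1
mkdir rh124 && cd rh124 || exit 1
touch 11111 a111 a_111 a22 lwang lWang rh124

# First char a-z, second char not a digit, anything after that.
found=$(printf '%s\n' [a-z][!0-9]* | sort)
printf '%s\n' "$found"
```

With LC_ALL=C this lists a_111, lWang, lwang and rh124: 11111 fails on its first character, and a111 and a22 fail because their second character is a digit.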
(2.5) Example: next, in the rh124 directory, we look for files whose first character is between a and z, whose second character is one of the three characters "-", "a" or "z", and whose remaining characters are arbitrary. We first create a-1, a file name that meets this requirement.
# touch a-1
# ls [a-z][-az]* --- the "-" comes first inside the brackets, so it is a literal dash rather than a range
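A small sketch, under the same assumptions as before (POSIX shell, throwaway directory, LC_ALL=C), contrasting the literal dash in "[-az]" with the range in "[a-z]":

```shell
LC_ALL=C; export LC_ALL
work=$(mktemp -d) && cd "$work" || exit 1
touch 11111 a111 a_111 a22 lwang lWang rh124 a-1

# A "-" placed first inside [...] is literal: [-az] is "-", "a" or "z".
dash_set=$(printf '%s\n' [a-z][-az]* | sort)
# A "-" between two characters builds a range: [a-z] is a through z.
range_set=$(printf '%s\n' [a-z][a-z]* | sort)
printf 'dash set:\n%s\nrange set:\n%s\n' "$dash_set" "$range_set"
```

Only a-1 matches "[a-z][-az]*", whereas "[a-z][a-z]*" selects lwang and rh124 (not lWang, because in the C locale W is outside a-z).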
(2.6) The fourth kind names a class of characters precisely, such as "[[:upper:]]" and "[[:lower:]]". A range like [a-z] does not always distinguish case exactly: depending on the locale's collation order, the characters "between a and z" can include uppercase letters too. To be exact, use "[[:upper:]]" for uppercase letters only and "[[:lower:]]" for lowercase letters only.
# ls [[:upper:]]* --- list all file names that begin with an uppercase letter
# ls [[:lower:]]* --- list all file names that begin with a lowercase letter
(2.7) Besides these, there are other classes for particular kinds of characters: "[[:alpha:]]" matches letters only, "[[:alnum:]]" matches letters and digits, and "[[:digit:]]" matches digits only.
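The named classes from (2.6) and (2.7) can be tried with the sketch below. It reuses the rh124 file names plus a hypothetical extra file, Readme, so that "[[:upper:]]" has something to match; it assumes a POSIX shell and LC_ALL=C.

```shell
LC_ALL=C; export LC_ALL
work=$(mktemp -d) && cd "$work" || exit 1
touch 11111 a111 a_111 a22 lwang lWang rh124 Readme   # Readme is a made-up extra

upper=$(printf '%s\n' [[:upper:]]*)         # names starting with an uppercase letter
lower=$(printf '%s\n' [[:lower:]]* | sort)  # names starting with a lowercase letter
digit=$(printf '%s\n' [[:digit:]]*)         # names starting with a digit
# A letter followed by a letter or digit, then anything.
alpha_alnum=$(printf '%s\n' [[:alpha:]][[:alnum:]]* | sort)
printf '%s\n--\n' "$upper" "$lower" "$digit" "$alpha_alnum"
```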
(2.8) Example: the system currently has no user named marry. We create the user marry with the home directory /marry at the root of the file system (Figure 1-5). We then delete marry's home directory. When we switch to marry, no home directory is found, so the session after switching is in an abnormal state (Figure 1-6). In this situation we check the configuration file /etc/default/useradd, which shows that the template files for new home directories are kept in /etc/skel (Figure 1-7). We copy all the template files in /etc/skel into marry's home directory and change their owner and group; after that we can switch to marry normally (Figure 1-8).
# useradd -d /marry marry --- create the user marry with the home directory /marry at the root
# rm -rf /marry/ --- delete marry's home directory
# vim /etc/default/useradd --- check the SKEL setting, which points to /etc/skel
# mkdir /marry --- recreate an empty home directory to copy the templates into
# cp -a /etc/skel/.[^.]* /marry/ --- copy every file in /etc/skel whose name starts with "." and whose second character is not "." (which excludes "." and "..") into /marry; skel stands for skeleton (Figure 1-8)
# chown -R marry:marry /marry/ --- make marry the owner and group of /marry and its contents (Figure 1-8)
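The key piece of (2.8) is the ".[^.]*" pattern. This sketch runs its POSIX spelling ".[!.]*" in a scratch directory whose file names are hypothetical stand-ins for the contents of /etc/skel, so nothing under /etc or /marry is touched:

```shell
LC_ALL=C; export LC_ALL
skel=$(mktemp -d) && cd "$skel" || exit 1
# Hypothetical stand-ins for typical /etc/skel contents, plus one visible file.
touch .bash_logout .bash_profile .bashrc notes.txt

# A dot, then any character except a dot, then anything: excludes . and ..
hidden=$(printf '%s\n' .[!.]* | sort)
printf '%s\n' "$hidden"
```

The pattern skips "." and "..", which is the point: copying ".." would drag in the parent directory. It also skips a name such as "..foo", which is rare but worth knowing. Plain ".*" is not a safe substitute, because depending on the shell and its version it may or may not expand to "." and "..".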
(2.9) The fifth is the question mark "?", which matches exactly one arbitrary character. For example, in the rh124 directory, "[a-z]????" matches file names made of a letter followed by exactly four arbitrary characters.
Note: "?" cannot match the "." at the start of a hidden file name. If the system contains a file named ".aa", the pattern "???" will not match it. To let wildcards match hidden files, enable bash's dotglob option (shopt -s dotglob).
# ls [a-z]???? --- list file names made of a letter followed by exactly four arbitrary characters
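A sketch of the "?" rules from (2.9), in a scratch directory with the rh124 file names plus the hidden file .aa; it assumes a POSIX shell, where a leading "." must always be matched explicitly:

```shell
LC_ALL=C; export LC_ALL
work=$(mktemp -d) && cd "$work" || exit 1
touch 11111 a111 a_111 a22 lwang lWang rh124 .aa

five=$(printf '%s\n' [a-z]???? | sort)   # a letter plus exactly four more characters
three=$(printf '%s\n' ???)               # exactly three characters; .aa is skipped
printf '%s\n--\n' "$five" "$three"
```

"five" lists a_111, lWang, lwang and rh124, while "three" is just a22: .aa is also three characters long, but "?" cannot match its leading dot.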
(2.10) The sixth is the asterisk "*", which matches any number of arbitrary characters, including none. For example, we delete every file whose name begins with the letter a, whatever follows it.
# touch aaa bb cc aa2 --- create these four files
# rm -rf a* --- delete every file whose name begins with a
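A sketch of (2.10) that previews the expansion before deleting, a habit suggested here rather than taken from the original; it runs in a scratch directory under a POSIX shell:

```shell
LC_ALL=C; export LC_ALL
work=$(mktemp -d) && cd "$work" || exit 1
touch aaa bb cc aa2

printf 'will delete: %s\n' a*   # preview what a* expands to
rm -f a*
left=$(printf '%s\n' * | sort)
printf '%s\n' "$left"
```

Note that "rm -rf a *" with a space would be a very different command: the shell would pass "a" and then every file in the directory to rm.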
(2.11) The seventh is the backslash "\", the escape character. Suppose we want to install vsftpd with "# yum install vsftp*". Before yum searches its repositories for matching packages, the shell processes the command line and tries to match "vsftp*" against file names in the current directory. If the current directory happens to contain a file named vsftp123, the shell replaces "vsftp*" with "vsftp123", and yum then looks for and tries to install a package named vsftp123, which is not what we want. (If nothing matches, the shell passes the pattern through unchanged, which is why the unescaped form usually appears to work.) When a package-installation command contains a wildcard, it is therefore better to escape it, as in "# yum install vsftp\*", so the shell cannot expand it.
# yum install vsftp\* --- the backslash escapes the wildcard so the shell does not expand it
# yum install 'vsftp*' --- single quotes also prevent the shell from expanding the wildcard
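The effect described in (2.11) can be observed without yum. In this sketch printf stands in for yum, so we see exactly which argument the program would receive; vsftp123 is the article's hypothetical stray file, and a POSIX shell with default options (no nullglob) is assumed:

```shell
work=$(mktemp -d) && cd "$work" || exit 1

before=$(printf '%s\n' vsftp*)    # nothing matches yet: passed through literally
touch vsftp123                    # a stray file that happens to match
after=$(printf '%s\n' vsftp*)     # now the shell substitutes the file name
escaped=$(printf '%s\n' vsftp\*)  # backslash: the * reaches the program as-is
quoted=$(printf '%s\n' 'vsftp*')  # single quotes: same effect
printf '%s\n' "$before" "$after" "$escaped" "$quoted"
```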
(2.12) Note that a file name cannot contain "/", because "/" separates directory names in a path.
# touch rh124/cc --- "rh124/cc" is not a file name; this creates a file named cc inside the rh124 directory


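(III) Using regular expressions
(3.1) Regular expressions are used to match strings, and most tools that filter text by its content, such as vim, grep, awk and sed, use them. They serve the same purpose as the wildcards above, fuzzy matching when searching for information, but they are applied to text rather than to file names.
(3.2) The first is "^", which anchors the start of a line. For example, we copy /etc/passwd into the current directory and search passwd for the lines that begin with root, using "^" to mark the start of the line.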
# grep ^root passwd
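(3.3) The second is "$", which anchors the end of a line. After adjusting a few lines of passwd for the demonstration, we search for the lines whose last characters are "bash".
# grep 'bash$' passwd --- list every line that ends with bash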
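A sketch of the two line anchors from (3.2) and (3.3), assuming GNU grep. The three-line passwd file is made up for the demonstration rather than copied from a real system:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
# Made-up stand-in for a copy of /etc/passwd.
cat > passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
tom:x:1000:1000:root helper:/home/tom:/bin/bash
EOF

starts_root=$(grep -c '^root' passwd)  # "root" at the start of a line
ends_bash=$(grep -c 'bash$' passwd)    # "bash" at the end of a line
any_root=$(grep -c 'root' passwd)      # "root" anywhere, for comparison
printf '%s %s %s\n' "$starts_root" "$ends_bash" "$any_root"
```

The third line contains "root" but does not start with it, so only the anchored search leaves it out.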
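(3.4) The third is "\<" or "\b", which anchors the start of a word. We create a file aa.txt and then search with "\<tom" for all lines containing a word that begins with tom.
# grep '\<tom' aa.txt --- list every line containing a word that begins with tom
# grep '\btom' aa.txt --- the same search written with \b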
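(3.5) The fourth is "\>" or "\b", which anchors the end of a word. In aa.txt we search with "tom\>" for all lines containing a word that ends with tom (Figure 1-17). To find the lines in which tom appears as a standalone word, combine "\<" and "\>" (Figure 1-18).
# grep 'tom\>' aa.txt --- list every line containing a word that ends with tom
# grep 'tom\b' aa.txt --- the same search written with \b
# grep '\<tom\>' aa.txt --- list every line in which tom is a standalone word
# grep '\btom\b' aa.txt --- the same search written with \b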
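A sketch of the word anchors from (3.4) and (3.5), assuming GNU grep (which supports "\<", "\>" and "\b" as extensions to POSIX). The aa.txt contents are hypothetical, since the article's file only appears in its figures:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
cat > aa.txt <<'EOF'
tom
tomcat
atom
my tom is here
EOF

starts=$(grep -c '\<tom' aa.txt)     # a word beginning with tom
ends=$(grep -c 'tom\>' aa.txt)       # a word ending with tom
whole=$(grep -c '\<tom\>' aa.txt)    # tom as a standalone word
whole_b=$(grep -c '\btom\b' aa.txt)  # the same, written with \b
printf '%s %s %s %s\n' "$starts" "$ends" "$whole" "$whole_b"
```

"tomcat" only passes the start anchor and "atom" only passes the end anchor; the lines "tom" and "my tom is here" pass both.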
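(3.6) Example: we want to view the SELinux context settings for ports. To filter for one specific port number, use "\<\>" to require a standalone word. For example, to show only the lines containing port 80, "grep 80" gives wrong results because it also matches numbers such as 8080 (Figure 1-19); "grep '\<80\>'" is correct (Figure 1-20).
# semanage port -l | grep 80 --- every port context line that contains the characters 80
# semanage port -l | grep '\<80\>' --- only the port context lines that contain port 80 itself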
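Because semanage needs an SELinux system, this sketch feeds grep a few made-up lines in the style of "semanage port -l" output instead; only the grep behavior is the point, and GNU grep is assumed:

```shell
# Made-up lines shaped like `semanage port -l` output (not real system data).
sample='http_port_t          tcp   80, 81, 443, 8008, 8443
http_cache_port_t    tcp   8080, 8118, 8123
ssh_port_t           tcp   22'

loose=$(printf '%s\n' "$sample" | grep -c 80)         # "80" anywhere, e.g. inside 8080
strict=$(printf '%s\n' "$sample" | grep -c '\<80\>')  # 80 as a standalone number
printf '%s %s\n' "$loose" "$strict"
```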
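(3.7) The fifth is ".", which matches any single character, like the "?" wildcard. For example, we match all lines of aa.txt in which "to" is followed by any one character (Figure 1-21). If we want "." to stand for a literal dot instead, we escape it with "\", which finds exactly the lines containing "to." (Figure 1-22).
# grep 'to.' aa.txt --- every line in which "to" is followed by one arbitrary character
# grep 'to\.' aa.txt --- with the escape character: every line containing "to."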
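A sketch of the dot and its escaped form, using a hypothetical three-line aa.txt and GNU grep:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' 'to' 'tom' 'go to.' > aa.txt

any=$(grep -c 'to.' aa.txt)    # "to" plus one more character of any kind
dot=$(grep -c 'to\.' aa.txt)   # "to" followed by a literal dot
printf '%s %s\n' "$any" "$dot"
```

The bare line "to" fails both searches, because "." must consume one character and there is none left.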
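(3.8) The sixth is "[]", which matches any single character within the specified set or range.
(3.9) The seventh is "[^]", which matches any single character outside the specified set or range.
(3.10) Grouping
(3.10.1) The eighth is "\(\)", which forms a group. For example, "\(ab\)*" means the string ab may occur zero times, once, or any number of times. We create a file test.txt (Figure 1-22-1) and then filter it with "\(ab\)*" (Figure 1-22-2). Because zero occurrences are allowed, every line of the file matches. In basic regular expressions the parentheses must be escaped; otherwise grep treats them as ordinary characters.
# grep '\(ab\)*' test.txt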
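A sketch of grouping in basic regular expressions, using a hypothetical test.txt and GNU grep. It also shows what the unescaped form "(ab)*" actually matches:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' xy xaby xababy xabbay > test.txt

all=$(grep -c '\(ab\)*' test.txt)         # ab zero or more times: even "" matches
framed=$(grep -c 'x\(ab\)*y' test.txt)    # x, then ab repeated 0+ times, then y
literal=$(grep -c '(ab)*' test.txt)       # unescaped: literal "(ab" then any number of ")"
printf '%s %s %s\n' "$all" "$framed" "$literal"
```

Since "\(ab\)*" can match the empty string, it matches every line; placing it between x and y makes the repetition visible (xabbay is rejected). Without backslashes, plain grep looks for the text "(ab" followed by any number of ")".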
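(3.10.2) We create a file test3.txt, enter a few lines of text, and search for lines containing "l..e", then any characters any number of times, then "l..e" again. Lines 1 to 4 of test3.txt are all selected (Figure 1-22-4). If we want only the lines in which the second word repeats the first one exactly, that is, lines 1 and 3 of test3.txt, we need a back-reference (Figure 1-22-5).
Grouping: \(\)
Back-references:
\1: refers to the text matched between the first left parenthesis and its matching right parenthesis
\2: refers to the text matched between the second left parenthesis and its matching right parenthesis
\3: refers to the text matched between the third left parenthesis and its matching right parenthesis
# grep 'l..e.*l..e' test3.txt --- lines with "l..e", then anything, then "l..e" again
# grep '\(l..e\).*\1' test3.txt --- lines in which the same l..e word appears twice, for example like ... like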
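The contents of test3.txt are only shown in the article's figures, so this sketch uses four hypothetical lines built to behave as (3.10.2) describes: the loose pattern matches all four, and the back-reference keeps only lines 1 and 3. GNU grep is assumed:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
cat > test3.txt <<'EOF'
He love his lover.
He like his lover.
She like her liker.
She love her liker.
EOF

loose=$(grep -c 'l..e.*l..e' test3.txt)   # any l..e, anything, any l..e
same=$(grep '\(l..e\).*\1' test3.txt)     # the second l..e must repeat the first
printf '%s\n%s\n' "$loose" "$same"
```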
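(3.10.3) Example: we create a file named inittab in the current directory and display the lines in which a digit appears and the same digit also appears at the end of the line.
# grep '\([0-9]\).*\1$' inittab --- "\([0-9]\)" captures a digit in the line, and "\1$" requires that same digit at the end of the line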
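A sketch of (3.10.3) with a hypothetical inittab in the style of the old SysV file; GNU grep is assumed:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
cat > inittab <<'EOF'
id:3:initdefault:
l1:1:wait:/etc/rc.d/rc 1
l3:3:wait:/etc/rc.d/rc 3
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
l5:5:wait:/etc/rc.d/rc 3
EOF

hits=$(grep '\([0-9]\).*\1$' inittab)
printf '%s\n' "$hits"
```

The last line shows that "\1" refers to the digit that was captured, not to any digit: it contains 5 earlier but ends with 3, and no 3 appears before that final one, so it is rejected.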
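(3.11) grep also accepts the -A, -B and -C options: -A (after) shows lines after each matching line, -B (before) shows lines before it, and -C (context) shows lines on both sides.
# grep -A 2 '^core id' /proc/cpuinfo --- each line starting with "core id" plus the 2 lines after it
# grep -B 2 '^core id' /proc/cpuinfo --- each line starting with "core id" plus the 2 lines before it
# grep -C 2 '^core id' /proc/cpuinfo --- each line starting with "core id" plus 2 lines of context on each side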
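A sketch of the context options that uses seq output in place of /proc/cpuinfo, so the result is the same on any machine; GNU grep and coreutils seq are assumed:

```shell
nums=$(seq 1 10)   # stand-in for /proc/cpuinfo

after=$(printf '%s\n' "$nums" | grep -A 2 '^5$')    # the match plus 2 lines after
before=$(printf '%s\n' "$nums" | grep -B 2 '^5$')   # the match plus 2 lines before
context=$(printf '%s\n' "$nums" | grep -C 2 '^5$')  # 2 lines on each side
printf '%s\n--\n' "$after" "$before" "$context"
```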


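(IV) Using extended regular expressions
(4.1) The regular expressions above work with plain grep: "grep pattern file". Some operators, however, such as an unescaped ?, + or |, are not understood by plain grep's basic syntax; for those we enable extended regular expressions with "grep -E pattern file" or "egrep pattern file" (recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer spelling). Some expressions are beyond even extended regular expressions; for those we use "grep -P pattern file", which uses Perl-compatible regular expressions. The following sections cover extended regular expressions.
Note: egrep -o prints only the matched part of each line instead of the whole line.
(4.2) The first is "?", which means the preceding item occurs zero times or once. "to.?" means "to" followed by one arbitrary character that may occur zero times or once, so every line of aa.txt that contains "to" matches. Because "?" is used, we run the search with extended regular expressions (egrep).
# egrep 'to.?' aa.txt --- "to" followed by an optional arbitrary character (zero or one occurrence)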
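(4.3) The second is "+", which means the preceding item occurs once or more. "to.+" means "to" followed by at least one arbitrary character, possibly more, so every line of aa.txt except the first matches. Again we use extended regular expressions (egrep).
# egrep 'to.+' aa.txt --- also called greedy matching: "to" followed by one or more arbitrary characters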
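(4.4) The third is "*", which means the preceding item occurs any number of times, including zero. "to.*" matches "to" followed by any number of arbitrary characters (zero, one or more), so every line of aa.txt matches. Unlike "?" and "+", "*" also belongs to basic regular expressions, so plain grep would work here as well; we use egrep for consistency.
# egrep 'to.*' aa.txt --- "to" followed by zero or more arbitrary characters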
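The three quantifiers from (4.2) to (4.4) are easiest to compare with -o, which prints only the matched text. This sketch uses three hypothetical input lines and "grep -E" rather than egrep; GNU grep is assumed:

```shell
lines=$(printf '%s\n' to tom tommy)

q=$(printf '%s\n' "$lines" | grep -oE 'to.?')      # to + zero or one character
p=$(printf '%s\n' "$lines" | grep -oE 'to.+')      # to + one or more characters
s=$(printf '%s\n' "$lines" | grep -oE 'to.*')      # to + zero or more characters
s_basic=$(printf '%s\n' "$lines" | grep -o 'to.*') # * works in basic regex too
printf '%s\n--\n' "$q" "$p" "$s"
```

With "+", the bare line "to" produces no match at all, which is the "every line except the first" effect described in (4.3).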
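(4.5) Pattern matching has two modes, greedy and lazy (non-greedy); the default is greedy. Greedy matching extends the match as far to the right as possible. For example, in "to.+" the arbitrary character after "to" must occur at least once and may occur any number of times, so each match runs as far toward the end of the line as it can; this works with egrep (Figure 1-26). Lazy matching extends the match as little as possible while still satisfying the pattern. In "to.+?", the "+" still requires at least one arbitrary character, but a "?" placed directly after a quantifier does not mean "zero or one" here: it switches that quantifier to lazy mode, so the match stops at the shortest length that satisfies the pattern. Lazy quantifiers come from Perl, so this needs "grep -P" (Perl-compatible regular expressions) (Figure 1-27). The distinction matters in practice. When scraping a web page we may want to capture text from a <p> tag to the following </p> tag. With greedy matching, the capture runs to the last </p> and contains both paragraphs a and b; with lazy matching it contains only paragraph a (Figure 1-28).
# egrep 'to.+' aa.txt --- greedy: "to" followed by one or more arbitrary characters, as in (4.3)
# grep -P 'to.+?' aa.txt --- lazy: uses Perl-compatible regular expressions and matches as few characters as possible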
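A sketch of the <p> scraping case from (4.5), assuming a GNU grep built with PCRE support (needed for -P):

```shell
html='<p>a</p><p>b</p>'

greedy=$(printf '%s\n' "$html" | grep -oP '<p>.+</p>')   # runs on to the last </p>
lazy=$(printf '%s\n' "$html" | grep -oP '<p>.+?</p>')    # stops at the first </p>
printf 'greedy:\n%s\nlazy:\n%s\n' "$greedy" "$lazy"
```

The greedy match is one string covering both paragraphs; the lazy pattern yields two separate matches, one per paragraph.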
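(4.6) The fourth is the interval "{n,m}", written "{n,m}" with grep -P or grep -E and "\{n,m\}" with plain grep. It means the preceding item occurs at least n and at most m times, boundaries included. "{n}" (plain grep: "\{n\}") means exactly n times, and "{n,}" (plain grep: "\{n,\}") means n or more times.
# grep -P 'tom{2}' aa.txt --- lines containing "to" followed by two m's ("tomm"); a line containing "tommm" also contains "tomm"
# grep -P 'tom{2,}' aa.txt --- lines in which "to" is followed by two or more m's
# grep 'tom\{2,\}' aa.txt --- the same search with plain grep, which needs the escaped form "\{2,\}"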
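A sketch of interval counts in both spellings, assuming GNU grep; the input words are hypothetical. The -x option, which requires the whole line to match, shows the difference between "contains tomm" and "is exactly tomm":

```shell
words=$(printf '%s\n' tom tomm tommm)

contains=$(printf '%s\n' "$words" | grep -cE 'tom{2}')       # lines containing "tomm"
at_least=$(printf '%s\n' "$words" | grep -c 'tom\{2,\}')     # basic regex: escaped braces
exactly=$(printf '%s\n' "$words" | grep -cxE 'tom{2}')       # -x: the whole line is tomm
one_to_two=$(printf '%s\n' "$words" | grep -cxE 'tom{1,2}')  # whole line is tom or tomm
printf '%s %s %s %s\n' "$contains" "$at_least" "$exactly" "$one_to_two"
```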
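(4.7) The fifth is a set of named classes, each standing for one kind of character:
[[:alpha:]]: all letters
[[:alnum:]]: letters and digits
[[:ascii:]]: ASCII characters (Perl syntax: use it with grep -P; plain grep and grep -E reject it)
[[:blank:]]: space or tab
[[:cntrl:]]: ASCII control characters
[[:digit:]]: digits
[[:graph:]]: visible characters, that is, neither control characters nor spaces
[[:lower:]]: lowercase letters
[[:print:]]: printable characters (visible characters plus the space)
[[:punct:]]: punctuation characters
[[:space:]]: whitespace characters, including the vertical tab
[[:upper:]]: uppercase letters
[[:xdigit:]]: hexadecimal digits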
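A sketch that exercises several of these classes against a few hypothetical lines, assuming GNU grep and LC_ALL=C. "[[:ascii:]]" is left out because it is Perl syntax; "whole" and "has" are hypothetical helper names:

```shell
LC_ALL=C; export LC_ALL
sample=$(printf '%s\n' abc ABC 123 'a1!' 'a b')

whole() { printf '%s\n' "$sample" | grep -cxE "$1"; }  # lines made only of PATTERN
has()   { printf '%s\n' "$sample" | grep -cE "$1"; }   # lines containing PATTERN

printf 'alpha=%s alnum=%s digit=%s upper=%s punct=%s blank=%s\n' \
    "$(whole '[[:alpha:]]+')" "$(whole '[[:alnum:]]+')" \
    "$(whole '[[:digit:]]+')" "$(whole '[[:upper:]]+')" \
    "$(has '[[:punct:]]')" "$(has '[[:blank:]]')"
```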
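(4.8) Query examples
(4.8.1) Example: find IP addresses. The file /var/log/messages mainly holds system log messages, some of which contain IP addresses, and we want to filter out everything in IP address format. Since an IP address can look like 192.168.26.101 or like 1.1.1.1, we use "[0-9]{1,3}" for one part of the address, "\." for the literal dot after it (an unescaped "." would match any character), "{3}" to repeat the digits-and-dot group three times, and one more group of digits at the end. This gives a regular expression for the IP address format: ([0-9]{1,3}\.){3}[0-9]{1,3}
# less /var/log/messages --- view the file that holds the system log
# egrep '([0-9]{1,3}\.){3}[0-9]{1,3}' /var/log/messages --- list every log line that contains a string in IP address format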
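A sketch comparing the IP pattern with and without the escaped dot, using made-up log lines rather than a real /var/log/messages; GNU grep is assumed:

```shell
log='bound to 192.168.26.101
build 1a2b3c4 started
accepted from 1.1.1.1'

loose=$(printf '%s\n' "$log" | grep -oE '([0-9]{1,3}.){3}[0-9]{1,3}')    # . = any char
strict=$(printf '%s\n' "$log" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}')  # \. = a dot
printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

With an unescaped dot, "1a2b3c4" passes as an address because "." accepts any character. Even the escaped version still accepts numbers such as 999, which is what (4.8.2) to (4.8.4) tighten.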
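(4.8.2) Example: find the numbers between 1 and 255 in /boot/grub2/grub.cfg. The regular expression is \<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>, where [1-9] covers 1-9, [1-9][0-9] covers 10-99, 1[0-9][0-9] covers 100-199, 2[0-4][0-9] covers 200-249, 25[0-5] covers 250-255, and \< and \> stop a match from starting or ending in the middle of a longer number.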
# egrep '\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>' /boot/grub2/grub.cfg
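A sketch of the 1-255 pattern on a made-up line of numbers, assuming GNU grep (for "\<" and "\>" in -E mode). The alternation is kept in a shell variable, "octet", a name introduced here:

```shell
octet='([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'   # 1-255

found=$(printf '%s\n' '0 7 42 199 255 256 300 1000' | grep -oE "\<$octet\>")
printf '%s\n' "$found"
```

0, 256, 300 and 1000 are all rejected; without the word anchors, 256 would still yield the partial match 25.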
(4.8.3) Example: roughly find every string in IP address format, from 0.0.0.0 to 255.255.255.255 (each part may be 0-255), as follows.
# ifconfig | egrep -o '\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>\.\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>\.\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>\.\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>'
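The same octet alternation makes the (4.8.3) command much shorter if it is kept in a variable. This sketch is a condensed rewrite, not the article's command: since a literal dot already separates the octets, it assumes the word anchors are only needed at the two ends. The input imitates ifconfig output but is made up; GNU grep is assumed:

```shell
o='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'   # one octet, 0-255
text='inet 192.168.26.101  netmask 255.255.255.0  broadcast 192.168.26.255
inet 999.1.1.1  netmask 255.0.0.0'

ips=$(printf '%s\n' "$text" | grep -oE "\<$o(\.$o){3}\>")
printf '%s\n' "$ips"
```

999.1.1.1 is skipped: 999 is not a valid octet, and the remaining 1.1.1 has only three parts.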
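(4.8.4) Example: precisely find every string in IP address format that is a class A, B or C address (three of the five IP address classes). The class is determined by the first octet:
Class A: 1-127
Class B: 128-191
Class C: 192-223
In the command below, the first octet is limited to 1-223, the two middle octets to 0-254, and the last octet to 1-254.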
# ifconfig | egrep '\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[01][0-9]|22[0-3])\>(\.\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])\>){2}\.\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])\>'
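A sketch of the class A/B/C pattern split into named pieces ("first", "mid" and "last" are names introduced here) and run with -o on made-up addresses; GNU grep is assumed:

```shell
first='([1-9]|[1-9][0-9]|1[0-9]{2}|2[01][0-9]|22[0-3])'     # 1-223
mid='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])'     # 0-254
last='([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])'    # 1-254
text='10.0.0.1 172.16.5.4 192.168.1.1 224.0.0.1 10.0.0.255'

abc=$(printf '%s\n' "$text" | grep -oE "\<$first(\.$mid){2}\.$last\>")
printf '%s\n' "$abc"
```

224.0.0.1 is rejected because its first octet is above 223, and 10.0.0.255 because the last octet is limited to 1-254.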
(4.9) In general, it is recommended to put a regular expression in single quotation marks. Suppose a file named toz already exists in the current directory. If the expression is not quoted, the shell sees "to?" first, treats it as a wildcard, expands it to the matching file name "toz", and passes "toz" to egrep; the search of aa.txt then finds nothing. Quoting the expression in single quotes prevents the shell from interpreting it.
# egrep 'to?' aa.txt --- lines of aa.txt containing "t" optionally followed by "o" (the "?" applies only to the "o")
# egrep to\? aa.txt --- escaping the ? with a backslash gives the same result as the single quotes
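A sketch of the toz trap from (4.9), in a scratch directory with a hypothetical one-line aa.txt; a POSIX shell and GNU grep are assumed:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' tom > aa.txt
touch toz                            # a file whose name fits the wildcard to?

unquoted=$(grep -cE to? aa.txt)      # the shell rewrites to? as toz first
quoted=$(grep -cE 'to?' aa.txt)      # grep receives the regex to?
escaped=$(grep -cE to\? aa.txt)      # the backslash also protects the ?
printf '%s %s %s\n' "$unquoted" "$quoted" "$escaped"
```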

------ This concludes the article, thanks for reading ------

Origin blog.51cto.com/13613726/2460788