Linux: Regular Expressions and the grep Command

Basic Syntax
A regular expression, often called a pattern, describes or matches a set of strings that follow a certain syntactic rule.

1. Alternation: |

  • | The vertical bar separates alternatives. For example, "boy|girl" matches either "boy" or "girl".
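A minimal sketch of alternation with grep, assuming GNU grep; the sample words are made up. In grep's default basic syntax (BRE) a bare | is an ordinary character, so alternation needs -E (GNU grep also accepts \| as an extension in basic syntax).

```shell
# printf repeats its format for each argument, giving one sample word per line
alt=$(printf '%s\n' boy girl bird | grep -E 'boy|girl')
echo "$alt"

# Without -E, | is literal: no line contains the text "boy|girl", so -c prints 0
# (|| true keeps the zero-match exit status from aborting a "set -e" script)
literal=$(printf '%s\n' boy girl bird | grep -c 'boy|girl' || true)
echo "$literal"
```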

2. Quantifiers: + ? *

  • + means the preceding character must appear at least once (one or more times). For example, "goo+gle" matches "google", "gooogle", "goooogle", and so on.
  • ? means the preceding character appears at most once (zero or one time). For example, "colou?r" matches "color" or "colour".
  • * means the preceding character may be absent or may appear any number of times (zero, one, or more). For example, "0*42" matches "42", "042", "0042", "00042", and so on.
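The three quantifiers can be tried out with grep -E. This is a sketch assuming GNU grep, with invented inputs; -x makes each pattern match the whole line, since otherwise "0*42" would also match the "42" inside "142".

```shell
# + : one or more; "gogle" has only one o before the g, so it is rejected
plus=$(printf '%s\n' gogle google gooogle | grep -xE 'goo+gle')
# ? : zero or one; "colouur" has two u's, so it is rejected
opt=$(printf '%s\n' color colour colouur | grep -xE 'colou?r')
# * : zero or more; "142" starts with 1, which 0* cannot absorb
star=$(printf '%s\n' 42 042 0042 142 | grep -xE '0*42')
printf '%s\n---\n' "$plus" "$opt" "$star"
```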

3. Grouping and precedence
Parentheses () define the scope and precedence of part of a pattern: the text inside them is treated as a single unit. For example, "gr(a|e)y" is equivalent to "gray|grey". This shows precedence: the vertical bar chooses between a and e, not between "gra" and "ey". Likewise, "(grand)?father" matches both "father" and "grandfather". This shows scope: the ? applies to the whole parenthesized group.
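A small sketch of both effects with grep -E (assuming GNU grep; the inputs are invented): the group confines the alternation, and the ? applies to the whole group.

```shell
# The group limits the alternation to the single letter a or e
grp=$(printf '%s\n' gray grey grab hey | grep -E 'gr(a|e)y')
# Without parentheses, | splits the whole pattern: "gra" OR "ey" anywhere
nogrp=$(printf '%s\n' gray grey grab hey | grep -E 'gra|ey')
# ? applies to the whole group "grand"; -x requires the entire line to match
fam=$(printf '%s\n' father grandfather grfather | grep -xE '(grand)?father')
printf '%s\n---\n' "$grp" "$nogrp" "$fam"
```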

4. Syntax (partial)
Regular expressions come in many flavors. Below are some commonly used matching rules from Perl and Python; the regular expressions accepted by grep and egrep are, roughly speaking, a subset of PCRE.

PCRE (Perl Compatible Regular Expressions) is a regular expression library written in C by Philip Hazel. It is lightweight, much smaller than regular expression libraries such as Boost's, easy to use, and also powerful; its performance is described as better than that of POSIX regular expression libraries and some other classic regular expression libraries.

5. Metacharacters

  • \ Marks the next character as either a special character or a literal. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".
  • ^ matches the beginning of the input string.
  • $ matches the end of the input string.
  • {n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
  • {n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
  • {n,m} m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". No spaces are allowed between the comma and the two numbers. (In grep's basic syntax the braces must be escaped, as in "o\{1,3\}".)
  • * Matches the preceding subexpression zero or more times. For example, "zo*" matches "z", "zo", and "zoo". * is equivalent to {0,}.
  • + Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
  • ? Matches the preceding subexpression zero or one time. For example, "do(es)?" matches both "do" and "does". ? is equivalent to {0,1}.
  • ? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy: it consumes as little of the searched string as possible, whereas the default greedy mode consumes as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's. (grep supports this only with -P.)
  • . matches any single character except '\n'. To match any character including "\n", use a pattern like "(.|\n)".
  • (pattern) Matches pattern and captures the matched substring, which can then be referred to by a backreference such as \1. To match literal parentheses, use "\(" or "\)".
  • x|y matches x or y. For example, "z|food" can match "z" or "food". "(z|f)ood" matches "zood" or "food".
  • [xyz] Character class. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". Inside the brackets only the backslash \ keeps a special meaning, as an escape character (this is Perl/Python behavior; in POSIX grep a backslash inside brackets is literal). Other special characters, such as asterisks, plus signs, and brackets, are treated as ordinary characters. A caret ^ in the first position negates the class; anywhere else it is an ordinary character. A hyphen - between two characters denotes a range; in the first (or last) position it is an ordinary character.
  • [^xyz] Negated character class. Matches any character not listed. For example, "[^abc]" matches the "p", "l", "i", and "n" in "plain".
  • [a-z] Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
  • [^a-z] Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character outside the range "a" to "z".
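The interval and lazy quantifiers above can be checked with grep -o, which prints only the matched part of each line. This sketch assumes GNU grep; lazy quantifiers need -P, which some builds lack, so that part is guarded.

```shell
# {n}: exactly n; "Bob" has a single o, "food" has the pair
two=$(printf '%s\n' Bob food | grep -oE 'o{2}')
# {n,}: at least n, and greedy, so all five o's of "foooood" are taken at once
atleast=$(echo foooood | grep -oE 'o{2,}')
# {n,m}: at most m per match, so the six o's come out as two matches of three
upto=$(echo fooooood | grep -oE 'o{1,3}')

# Greedy: o+ swallows every o in one match
greedy=$(echo oooo | grep -oE 'o+')
if echo x | grep -qP 'x' 2>/dev/null; then
  # Lazy: o+? stops after one o; -o prints all four such matches, head keeps the first
  lazy=$(echo oooo | grep -oP 'o+?' | head -n 1)
else
  lazy='(grep -P not available)'
fi
printf '%s\n---\n' "$two" "$atleast" "$upto" "$greedy" "$lazy"
```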
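Capture groups and bracket expressions can be sketched the same way, assuming GNU grep and made-up inputs. grep's basic syntax spells a capturing group \(...\) rather than (...), and LC_ALL=C is used so that [a-z] means exactly the ASCII lowercase letters.

```shell
# \(...\) captures a group in basic syntax; \1 must repeat exactly what it captured.
# Here: any lowercase letter immediately followed by the same letter
doubled=$(printf '%s\n' food fold book bake | LC_ALL=C grep '\([a-z]\)\1')

# [abc] matches one listed character; -o prints each match on its own line
inset=$(echo plain | grep -o '[abc]')
# [^abc] matches one character that is NOT listed
notin=$(echo plain | grep -o '[^abc]')
# Inside brackets * and + are ordinary characters
lits=$(echo 'a*b+c' | grep -o '[*+]')
# Whole-line checks (-x) with a range and a negated range
lower=$(printf '%s\n' abc ABC 123 | LC_ALL=C grep -x '[a-z]*')
notlower=$(printf '%s\n' abc ABC 123 | LC_ALL=C grep -x '[^a-z]*')
printf '%s\n---\n' "$doubled" "$inset" "$notin" "$lits" "$lower" "$notlower"
```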

6. Precedence
Precedence decreases from top to bottom; operators of equal precedence are evaluated from left to right:

Operator  Description

  • \ Escape
  • (), (?:), (?=), [] Parentheses and square brackets
  • *, +, ?, {n}, {n,}, {n,m} Quantifiers
  • ^, $, \metacharacter Anchors and sequences
  • | Alternation
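Two consequences of this ordering, sketched with grep -E under the assumption of GNU grep (inputs invented): alternation binds loosest, and a quantifier binds tighter than the sequence it follows.

```shell
# | has the lowest precedence, so ^ab|cd means "(^ab) OR (cd anywhere)"
loose=$(printf '%s\n' abx xcd xab | grep -E '^ab|cd')
# Parentheses bind tighter than |, so ^(ab|cd) anchors both alternatives
tight=$(printf '%s\n' abx xcd xab | grep -E '^(ab|cd)')
# Quantifiers bind tighter than sequences: ab+ repeats only b, (ab)+ repeats ab
single=$(echo abbb | grep -oE 'ab+')
group=$(echo ababab | grep -oE '(ab)+')
printf '%s\n---\n' "$loose" "$tight" "$single" "$group"
```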

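The grep pattern-matching command

1. Basic operations
The grep command prints the lines of text that match a pattern, using a regular expression as the matching condition. grep supports three regular expression engines, each selected with its own option: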

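Option  Description

  • -E POSIX extended regular expressions (ERE)
  • -G POSIX basic regular expressions (BRE); this is the default
  • -P Perl-compatible regular expressions (PCRE); a GNU grep option that may be missing from builds without PCRE support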
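A sketch of the same "exactly two a's" rule in each engine, assuming GNU grep; the inputs are made up and the -P part is guarded because that option may be unavailable.

```shell
# -x makes the pattern match the whole line
bre=$(printf '%s\n' a aa aaa | grep -xG 'a\{2\}')   # BRE: braces must be escaped
ere=$(printf '%s\n' a aa aaa | grep -xE 'a{2}')     # ERE: braces work unescaped
# Pitfall: in BRE an unescaped { is an ordinary character
pitfall=$(printf '%s\n' aa 'a{2}' | grep -G 'a{2}')
# PCRE adds Perl shorthands such as \d (a digit)
if echo x | grep -qP 'x' 2>/dev/null; then
  pcre=$(printf '%s\n' a1 b c3 | grep -P '\d')
else
  pcre='(grep -P not available)'
fi
printf '%s\n---\n' "$bre" "$ere" "$pitfall" "$pcre"
```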

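Before using regular expressions with grep, here are its commonly used options:

Option  Description

  • -a Treat binary files as text when matching (-b, by contrast, prints the byte offset of each output line)
  • -c Count the matching lines instead of printing them
  • -i Ignore case
  • -n Show the line number of each matching line
  • -v Invert the match: print the lines that do not match
  • -r Search directories recursively
  • -A n n is a positive integer ("after"): besides each matching line, also print the n lines that follow it
  • -B n n is a positive integer ("before"): besides each matching line, also print the n lines that precede it
  • --color=auto Highlight the matched text in color (only when the output goes to a terminal)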
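A sketch exercising several of these options on a small throwaway file. The file contents are hypothetical, and the script assumes GNU grep and mktemp.

```shell
# Hypothetical sample log, written to a temporary file
tmp=$(mktemp)
printf '%s\n' 'Error: disk full' 'ok' 'error: retry' 'ok' 'done' > "$tmp"

count=$(grep -ci 'error' "$tmp")      # -c counts matching lines; -i ignores case
numbered=$(grep -n 'ok' "$tmp")        # -n prefixes each match with its line number
inverted=$(grep -vi 'error' "$tmp")    # -v keeps only the lines that do NOT match
after=$(grep -A 1 'retry' "$tmp")      # -A 1: the match plus one following line
before=$(grep -B 1 'retry' "$tmp")     # -B 1: one preceding line plus the match

rm -f "$tmp"
printf '%s\n---\n' "$count" "$numbered" "$inverted" "$after" "$before"
```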


Using regular expressions

Using basic regular expressions (BRE)

Position

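Find the lines in /etc/group that begin with "shiyanlou":

# Without an anchor: matches "shiyanlou" anywhere in a line
$ grep 'shiyanlou' /etc/group

# With ^: matches only lines that start with "shiyanlou"
$ grep '^shiyanlou' /etc/group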
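Since /etc/group differs from system to system, here is a self-contained sketch of ^ and $ on a few hypothetical lines in the same name:password:GID:members format (assuming GNU grep).

```shell
# Hypothetical stand-in for /etc/group
sample=$(printf '%s\n' 'shiyanlou:x:1000:' 'sudo:x:27:shiyanlou' 'users:x:100:')

contains=$(echo "$sample" | grep 'shiyanlou')   # "shiyanlou" anywhere in the line
starts=$(echo "$sample" | grep '^shiyanlou')    # only at the start of the line
ends=$(echo "$sample" | grep 'shiyanlou$')      # only at the end of the line
printf '%s\n---\n' "$contains" "$starts" "$ends"
```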

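Quantity

# Matches strings that start with 'z' and end with 'o' (every line here contains one)

$ printf 'zero\nzo\nzoo\n' | grep 'z.*o'

# Matches 'z', then exactly one arbitrary character, then 'o' (only "zoo")

$ printf 'zero\nzo\nzoo\n' | grep 'z.o'

# Matches 'z' followed by any number of 'o's, including none, so every line containing 'z' matches

$ printf 'zero\nzo\nzoo\n' | grep 'zo*'

# Note: \n is a newline. printf interprets it in every shell; echo does so only in some shells (in bash it needs echo -e).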
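Because grep prints whole lines, it can hide what a pattern actually matched. A small sketch, assuming GNU grep, that adds -o to the three patterns above so only the matched text is printed:

```shell
# -o prints only the part of each line that matched, one match per output line
any_between=$(printf 'zero\nzo\nzoo\n' | grep -o 'z.*o')   # zero, zo, zoo
one_between=$(printf 'zero\nzo\nzoo\n' | grep -o 'z.o')    # zoo only
zero_or_more=$(printf 'zero\nzo\nzoo\n' | grep -o 'zo*')   # z (from zero), zo, zoo
printf '%s\n---\n' "$any_between" "$one_between" "$zero_or_more"
```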

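Character selection

# grep is case-sensitive by default; this prints the lines that contain a lowercase letter

$ printf '1234\nabcd\n' | grep '[a-z]'

# Prints the lines that contain a digit

$ printf '1234\nabcd\n' | grep '[0-9]'

# Same as above, using a named class

$ printf '1234\nabcd\n' | grep '[[:digit:]]'

# Prints the lines that contain a lowercase letter

$ printf '1234\nabcd\n' | grep '[[:lower:]]'

# Prints the lines that contain an uppercase letter (nothing here, since the input has none)

$ printf '1234\nabcd\n' | grep '[[:upper:]]'

# Prints the lines that contain a letter or digit (0-9, a-z, A-Z)

$ printf '1234\nabcd\n' | grep '[[:alnum:]]'

# Prints the lines that contain a letter

$ printf '1234\nabcd\n' | grep '[[:alpha:]]'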

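The complete list of named character classes follows; like [[:digit:]] above, each is written inside an outer pair of brackets:

Class  Description

  • [:alnum:] Uppercase and lowercase English letters and digits, i.e. 0-9, A-Z, a-z
  • [:alpha:] Any uppercase or lowercase English letter, i.e. A-Z, a-z
  • [:blank:] The space and Tab characters
  • [:cntrl:] Control characters, such as CR, LF, Tab, and Del
  • [:digit:] Digits only, i.e. 0-9
  • [:graph:] Visible characters: every printable character except space (control characters such as Tab are excluded too)
  • [:lower:] Lowercase letters, i.e. a-z
  • [:print:] Any printable character, including space
  • [:punct:] Punctuation symbols, i.e. " ' ? ! ; : # $ ...
  • [:upper:] Uppercase letters, i.e. A-Z
  • [:space:] Any character that produces whitespace, including space, Tab, CR, etc.
  • [:xdigit:] Hexadecimal digits, i.e. 0-9, A-F, a-f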

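Note: named classes are needed because [a-z] does not behave the same way everywhere; it depends on the host's current locale, i.e. the value of the LANG environment variable. Under zh_CN.UTF-8, [a-z] covers exactly the lowercase letters, but other locales may sort letters with cases interleaved ("a A b B ... z Z"), in which case [a-z] can also include uppercase letters. So when using [a-z], keep the current locale in mind; [:lower:] does not have this problem.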
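One way to sidestep the locale issue, sketched under the assumption of GNU grep: either use the named class, or pin the locale for a single command with LC_ALL=C, where characters are ordered by their byte values so [a-z] is exactly the ASCII lowercase letters.

```shell
# Show the current locale setting (varies by system)
echo "LANG=${LANG:-unset}"

# Forcing the C locale makes [a-z] mean exactly the ASCII letters a to z
c_range=$(printf '%s\n' abc ABC | LC_ALL=C grep '[a-z]')
# [[:lower:]] asks for lowercase letters in whatever locale is active
named=$(printf '%s\n' abc ABC | grep '[[:lower:]]')
printf '%s\n---\n' "$c_range" "$named"
```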

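# Excluding characters: [^o] matches any single character other than 'o', so both lines are printed (each contains at least one such character)

$ printf 'geek\ngood\n' | grep '[^o]'

Note: inside square brackets, a leading ^ means exclusion; outside brackets, ^ anchors the start of the line.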
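To actually keep only the lines without an 'o', the exclusion has to cover the whole line, or -v can be used instead. A sketch assuming GNU grep; the extra "oo" line is invented so that [^o] has something to reject.

```shell
# [^o] only needs ONE non-o character somewhere, so "geek" and "good" both pass
any_non_o=$(printf '%s\n' geek good oo | grep '[^o]')
# -x with [^o]* requires every character of the line to be a non-o
no_o=$(printf '%s\n' geek good oo | grep -x '[^o]*')
# Equivalent and simpler: -v drops every line that contains an o
no_o_v=$(printf '%s\n' geek good oo | grep -v 'o')
printf '%s\n---\n' "$any_non_o" "$no_o" "$no_o_v"
```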


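Using extended regular expressions (ERE)

To use extended regular expressions with grep, add the -E option or use egrep (recent GNU grep releases mark egrep as obsolescent and recommend grep -E).

Quantity

# Matches exactly one 'o' after 'z', i.e. the text "zo"; grep prints both "zo" and "zoo", since each contains it

$ printf 'zero\nzo\nzoo\n' | grep -E 'zo{1}'

# Matches 'z' followed by one or more 'o's, i.e. every word starting with "zo"

$ printf 'zero\nzo\nzoo\n' | grep -E 'zo{1,}'

Note: mastering {n,m} is enough; +, ?, and * are shorthands for {1,}, {0,1}, and {0,} that are less intuitive and easy to confuse.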
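A sketch confirming that each shorthand selects the same lines as its {n,m} spelling, assuming GNU grep. The helper name sample_words is hypothetical, introduced only to avoid repeating the input.

```shell
# Hypothetical helper that prints the sample words, one per line
sample_words() { printf '%s\n' z zo zoo zooo; }

# Each shorthand and its {n,m} spelling select the same lines (-x: whole line)
plus=$(sample_words | grep -xE 'zo+');   plus_n=$(sample_words | grep -xE 'zo{1,}')
opt=$(sample_words | grep -xE 'zo?');    opt_n=$(sample_words | grep -xE 'zo{0,1}')
star=$(sample_words | grep -xE 'zo*');   star_n=$(sample_words | grep -xE 'zo{0,}')
# A bounded range such as "one or two o's" has no single-symbol shorthand
range=$(sample_words | grep -xE 'zo{1,2}')
printf '%s\n---\n' "$plus" "$opt" "$star" "$range"
```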


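Alternation

# Matches "www.shiyanlou.com" and "www.google.com"

$ printf 'www.shiyanlou.com\nwww.baidu.com\nwww.google.com\n' | grep -E 'www\.(shiyanlou|google)\.com'

# Or print the lines that do not contain "www.baidu.com"

$ printf 'www.shiyanlou.com\nwww.baidu.com\nwww.google.com\n' | grep -Ev 'www\.baidu\.com'

Note: . has a special meaning (any single character), so it must be escaped as \. to match a literal dot.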
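What goes wrong without the escape can be sketched with a deliberately malformed input line (hypothetical, assuming GNU grep); -F is shown as a way to avoid escaping entirely when the pattern is a plain string.

```shell
# Hypothetical helper printing two real-looking hosts and one malformed string
sites() { printf '%s\n' www.google.com wwwXgoogleXcom www.baidu.com; }

# Unescaped: each . matches ANY character, so the malformed line slips through
loose=$(sites | grep -E 'www.(shiyanlou|google).com')
# Escaped: \. matches only a literal dot
strict=$(sites | grep -E 'www\.(shiyanlou|google)\.com')
# -F treats the pattern as a fixed string, so no escaping is needed at all
fixed=$(sites | grep -F 'www.baidu.com')
printf '%s\n---\n' "$loose" "$strict" "$fixed"
```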


Origin http://43.154.161.224:23101/article/api/json?id=326339366&siteId=291194637