Regular Expressions in Linux grep

6. Regular expressions in grep

1. From the self-study book

  1. In regular expressions, a space is treated as an ordinary character, just like any other.
  2. The special characters recognized by regular expressions are .*[]^${}\+?|(). To use one of them as a literal text character, you must escape it with a backslash \.
  3. Regular expression patterns are all case sensitive.
  4. By default, a regular expression pattern matches wherever the pattern appears in the data stream. Two special characters can anchor the pattern to the beginning or the end of a line in the data stream.
  5. The caret ^ anchors the pattern to the beginning of a line of text in the data stream. If the pattern appears anywhere other than at the start of the line, it does not match.
  6. The opposite of anchoring at the start of a line is anchoring at the end. The dollar sign $ defines the end-of-line anchor: put it after the text pattern to require that the line end with that pattern.
  7. The dot . matches any single character except a newline. It must match exactly one character: if there is no character at the dot's position, the pattern does not match.
  8. Square brackets [] define a character class (character group) containing every character you want to allow at that position. A bracket expression works like a dot in that it matches exactly one character, but that character is restricted to the ones listed inside the brackets. Character classes are very useful when you are unsure of a character's case, e.g. [Yy]es.
  9. The meaning of a character class can also be inverted: instead of looking for characters that are in the class, you look for characters that are not. To do this, put a caret ^ at the beginning of the class, e.g. [^ch].
  10. A single dash - represents a range of characters inside a character class. Specify the first character of the range, the dash, and the last character of the range, e.g. [0-9] or [a-ch-m].
  11. Placing an asterisk * after a character means that the character may appear zero or more times in the matching text.
  12. Remember that the sed editor and the gawk program use different regular expression engines. gawk supports most extended regular expression (ERE) pattern symbols and provides some extra filtering features that sed lacks, but because of this, gawk is usually slower when processing data streams.
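The basic rules above can be tried directly with grep, which uses the same BRE syntax. This is a minimal sketch assuming GNU grep and a POSIX shell; the helper name sample_lines and its input lines are made up for illustration.

```shell
# Hypothetical helper: print a few sample lines, one per line.
sample_lines() {
  printf '%s\n' "The cat sat" "hat trick" "that" "at home" "Book club"
}

# ^ anchors at the start of a line: prints "Book club".
sample_lines | grep '^Book'

# $ anchors at the end of a line: prints "The cat sat".
sample_lines | grep 'sat$'

# . must consume exactly one character, so "at home" (nothing before "at") is skipped.
sample_lines | grep '.at'

# [ch]at: c or h right before "at".
sample_lines | grep '[ch]at'

# [^ch]at: any character except c or h before "at" (only "sat" qualifies).
sample_lines | grep '[^ch]at'

# [a-ch-m]: one character from a-c or h-m; "fat" fails, "bat" matches.
printf '%s\n' "I'm getting too fat." "a bat" | grep '[a-ch-m]at'

# e*: zero or more e's, so ik, iek and ieek all match.
printf '%s\n' ik iek ieek | grep '^ie*k$'
```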

ERE

  1. The question mark ? is similar to the asterisk, with one difference: it means the preceding character may appear 0 or 1 times and no more, so it does not match that character repeated several times at that position.
  2. The plus sign + is another pattern symbol similar to the asterisk, but different from the question mark: the preceding character may appear one or more times, but must appear at least once. If the character does not appear, the pattern does not match.
  3. Curly braces {} in ERE let you put a limit on how many times a regular expression may repeat. This is usually called an interval. The interval can be written in two formats:
  • {m}: the regular expression appears exactly m times.
  • {m,n}: the regular expression appears at least m times and at most n times.
  • Older versions of gawk do not recognize interval expressions by default; you must pass gawk's --re-interval command-line option for them to work. Since gawk 4.0, intervals are recognized by default.
  4. The pipe symbol | lets you join two or more patterns with logical OR for the regular expression engine to check against the data stream. If any of the patterns matches the text, the text passes the test; if none matches, the match fails.
    Note: There must be no spaces between the regular expressions and the pipe symbol, otherwise the spaces are treated as part of the pattern. The regular expressions on either side of the pipe can use any regular expression pattern (including character classes) to define the text.
  5. Regular expression patterns can also be grouped with parentheses (). A group is treated like a single standard character, so you can apply special characters such as ? or * to the whole group just as you would to a normal character.
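The ERE symbols above can also be tried with grep -E, which switches grep to its extended engine. This sketch assumes GNU grep; the sample words are illustrative, and ^...$ pins each pattern to the whole line so the effect of each quantifier is easy to see.

```shell
# ? : the preceding e may appear 0 or 1 times -> bt, bet
printf '%s\n' bt bet beet | grep -E '^be?t$'

# + : the preceding e must appear at least once -> bet, beet
printf '%s\n' bt bet beet | grep -E '^be+t$'

# {m,n} interval: 1 to 2 e's -> bet, beet
printf '%s\n' bt bet beet beeet | grep -E '^be{1,2}t$'

# | : logical OR, no spaces around the pipe -> the cat and dog lines
printf '%s\n' "The cat is asleep" "The dog is awake" "The fish swims" | grep -E 'cat|dog'

# () : ? applies to the whole group "urday" -> sat, saturday
printf '%s\n' sat saturday sun | grep -E '^sat(urday)?$'
```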

Practical example

echo "The book store" | sed -n '/^book/p'  # prints nothing
echo "Books are great" | sed -n '/^Book/p' # prints this line
echo "This is a good book" | sed -n '/book$/p'  # prints this line
echo "This book is good" | sed -n '/book$/p'    # prints nothing
sed -n '/^this is a test$/p' test.txt   # prints only the lines that are exactly "this is a test"
sed '/^$/d' test.txt   # deletes blank lines and prints all other lines
sed -n '/.at/p' test.txt   # prints lines containing any character followed by "at"
sed -n '/[ch]at/p' test.txt  # prints lines containing cat or hat
sed -n '/[^ch]at/p' test.txt  # prints lines where "at" is preceded by a character other than c or h
echo "I'm getting too fat." | sed -n '/[a-ch-m]at/p'  # prints nothing: f is in neither a-c nor h-m
echo "iek" | sed -n '/ie*k/p'  # iek
echo "bt" | gawk --re-interval '/be{1}t/{print $0}'  # prints nothing
echo "bet" | gawk --re-interval '/be{1}t/{print $0}' # bet
echo "beet" | gawk --re-interval '/be{1}t/{print $0}' # prints nothing
echo "The cat is asleep" | gawk '/cat|dog/{print $0}'  # The cat is asleep
echo "sat" | gawk '/sat(urday)?/{print $0}'  # sat

2. From the course slides

grep (global regular expression print)
A regular expression is a formula that uses a pattern to match a class of strings.
• Regular expressions are typically used for searching, replacing, and similar operations.
• Used in the right situations, regular expressions can greatly improve productivity.
• There are two styles of regular expressions:

  • POSIX-style regular expressions, processed by two engines:
    • the POSIX basic regular expression (BRE) engine
    • the POSIX extended regular expression (ERE) engine
  • Perl-style regular expressions (Perl-Compatible Regular Expressions, PCRE)
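The practical difference between the two POSIX engines shows up directly in grep: plain grep uses BRE, and grep -E uses ERE. A small sketch assuming GNU grep: in a BRE, + is an ordinary character (GNU grep offers \+ as an extension), while in an ERE it means "one or more". GNU grep also has -P for Perl-compatible syntax, but only when built with PCRE support, so it is not used here.

```shell
# BRE: + is a literal plus sign -> prints only "a+"
printf '%s\n' 'aa' 'a+' | grep 'a+'

# ERE: + means "one or more a" -> prints both lines
printf '%s\n' 'aa' 'a+' | grep -E 'a+'

# GNU extension: \+ in a BRE behaves like + in an ERE -> prints both lines
printf '%s\n' 'aa' 'a+' | grep 'a\+'
```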
Regular expressions are composed of ordinary characters and metacharacters.

  • Ordinary characters include upper- and lowercase letters and digits (that is, every character that is not a metacharacter)
  • Metacharacters have special meanings
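Common metacharacters (meaning, engine type, then an example and what it matches):

  • ^: matches the start of a line (BRE). Example: ^x, a string starting with x
  • $: matches the end of a line (BRE). Example: x$, a string ending with x
  • .: matches any single character (BRE). Example: l..e matches love, life, live, ...
  • ?: matches the preceding character zero or one time (ERE). Example: xy? matches x, xy
  • *: matches the preceding character zero or more times (BRE). Example: xy* matches x, xy, xyy, xyyy, ...
  • +: matches the preceding character one or more times (ERE). Example: xy+ matches xy, xyy, xyyy, ...
  • [...]: matches any one of the enclosed characters (BRE). Example: [xyz] matches x, y or z
  • (): groups a regular expression (ERE). Example: (xy)+ matches xy, xyxy, xyxyxy, ...
  • \{n\}: matches exactly n times (BRE). Example: go\{2\}gle matches google
  • \{n,\}: matches at least n times (BRE). Example: go\{2,\}gle matches google, gooogle, goooogle, ...
  • \{n,m\}: matches n to m times (BRE). Example: go\{2,4\}gle matches google, gooogle, goooogle
  • {n}: matches exactly n times (ERE). Example: go{2}gle matches google
  • {n,}: matches at least n times (ERE). Example: go{2,}gle matches google, gooogle, ...
  • {n,m}: matches n to m times (ERE). Example: go{2,4}gle matches google, gooogle, goooogle
  • |: joins multiple alternatives with logical OR (ERE). Example: good|bon matches good or bon
  • \: escape character (BRE). Example: \* matches a literal *
  • ^ inside brackets: negation (BRE). Example: [^xyz] matches any character except x, y and z
  • - inside brackets: specifies a character range (BRE). Example: [a-zA-Z] matches any letter
  • [.]: inside brackets a dot is literal, an alternative to escaping it with \ (BRE). Example: [.] matches a literal .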
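The BRE and ERE interval rows, and the two ways of matching a literal dot, can be checked side by side. A sketch assuming GNU grep; the helper words and its sample strings are hypothetical.

```shell
# Hypothetical sample words, one per line.
words() {
  printf '%s\n' gogle google gooogle goooogle gooooogle 'a.b' 'axb'
}

# BRE interval: braces are escaped -> google, gooogle, goooogle
words | grep '^go\{2,4\}gle$'

# ERE interval: the same pattern without backslashes -> the same three words
words | grep -E '^go{2,4}gle$'

# \. and [.] both match a literal dot -> a.b
words | grep '^a\.b$'
words | grep '^a[.]b$'

# A bare . matches any character -> a.b, axb
words | grep '^a.b$'
```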

POSIX character classes (commonly used ranges, usable in both BRE and ERE; the equivalences below hold in the C locale):

  • [[:digit:]]: all digits, equivalent to [0-9]
  • [[:lower:]]: all lowercase letters, equivalent to [a-z]
  • [[:upper:]]: all uppercase letters, equivalent to [A-Z]
  • [[:alpha:]]: all letters, equivalent to [a-zA-Z]
  • [[:alnum:]]: all letters and digits, equivalent to [a-zA-Z0-9]
  • [[:graph:]]: all visible characters (printable characters excluding the space)
  • [[:cntrl:]]: all control characters
  • [[:punct:]]: all punctuation characters
  • [[:print:]]: all printable characters (visible characters plus the space)
  • [[:space:]]: whitespace characters (space, tab, newline, carriage return, vertical tab, form feed)
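A quick check of a few classes with grep. Note the double brackets: the class name goes inside a bracket expression. This is a sketch assuming GNU grep; the helper samples and its lines are hypothetical.

```shell
# Hypothetical sample input: letters, digits, punctuation, blank lines.
samples() {
  printf '%s\n' 'abc' 'ABC' 'a1' 'x,y' '   ' ''
}

samples | grep '[[:digit:]]'             # a1
samples | grep '^[[:upper:]]\{1,\}$'     # ABC
samples | grep '[[:punct:]]'             # x,y
samples | grep -c '^[[:space:]]*$'       # 2 (the spaces-only line and the empty line)
```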

grep [options] 'pattern' file...
Options:

  • -v: invert the selection; list the lines that do not match the string or regular expression
  • -c: count the matching lines
  • -l: only display the names of files that contain a match
  • -h: suppress the file names in the output
  • -n: prefix each matching line with its line number in the input
  • -i: match case-insensitively (matching is case-sensitive by default)
  • -o: only display the matched part of each line
  • -A n: matching line plus the n lines after it
  • -B n: matching line plus the n lines before it
  • -C n: matching line plus n lines before and after it
  • --color=auto: highlight the matched text; this option can go at the end of the option list
  • -f FILE: read the patterns to match from FILE, one per line, e.g. grep -f pattern_file grep_test_file (I haven't figured this one out yet)
  • -e PATTERN: match multiple patterns (one -e per pattern), similar to -e in sed
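A few of these options applied to the same input, as a sketch assuming GNU grep; the helper sample and its text are hypothetical. Input is piped in rather than read from a file, which is why -l and -h (file-name options) are not shown.

```shell
# Hypothetical sample text.
sample() {
  printf '%s\n' 'Hello world' 'hello grep' 'goodbye'
}

sample | grep -c 'hello'        # 1 (matching is case-sensitive)
sample | grep -ci 'hello'       # 2 (-i ignores case)
sample | grep -v 'hello'        # Hello world, goodbye
sample | grep -n 'grep'         # 2:hello grep
sample | grep -o 'wor[a-z]*'    # world
```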

.* matches any string: . is any character and * repeats it zero or more times.
Some handy examples of grep usage:

Regular expression, meaning, and an example for each:

  • ^this: lines starting with "this". Example: echo -e "this testing\nhello" | grep '^this'
  • testing$: lines ending in "testing". Example: echo -e "this testing\nhello" | grep 'testing$'
  • ^hello$: lines consisting of exactly "hello". Example: echo -e "this testing\nhello" | grep '^hello$'
  • this: lines containing "this" anywhere. Example: echo -e "testing this\nhello" | grep 'this'
  • tes*: "te" followed by zero or more "s", anywhere in the line. Example: echo -e "testing this\nhello" | grep 'tes*'
  • te.: "te" followed by any one character, anywhere in the line. Example: echo -e "test\nstet\ntt" | grep 'te.'
  • [st]t: s or t followed by t (st or tt) anywhere, so stt matches too. Example: echo -e "test\nabc\ntt" | grep '[st]t'
  • [s|t]t: inside brackets | is a literal character, not OR, so this matches st, tt or |t. Example: echo -e "test\nattb\nct\nstt" | grep '[s|t]t'
  • ab\{2\}: a followed by 2 b (abb). Example: echo -e "ab\nabb\nabc" | grep 'ab\{2\}'
  • ab\{1,2\}: a followed by 1 or 2 b (ab, abb). Example: echo -e "ab\nabb\nabc" | grep 'ab\{1,2\}'
  • ab[0-9]: ab followed by a digit from 0 to 9. Example: echo -e "ab\nab1\nab2" | grep 'ab[0-9]'
  • \(ab\).*\1: ab, then anything, then ab again (abab, ab ab, abxxxab). Example: echo -e "abab\nab ab\nabc" | grep '\(ab\).*\1'
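The [s|t]t entry is a common trap, so here is a sketch contrasting it with a real "s or t", assuming GNU grep (the sample lines are illustrative).

```shell
# Inside brackets | is literal -> test, attb, stt, |t
printf '%s\n' test attb ct stt '|t' | grep '[s|t]t'

# For "s or t", use a plain bracket list or ERE alternation -> test, attb, stt
printf '%s\n' test attb ct stt '|t' | grep '[st]t'
printf '%s\n' test attb ct stt '|t' | grep -E '(s|t)t'
```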

grep usage: position anchors

  • ^: start-of-line anchor; used at the leftmost side of the pattern, ^PATTERN
  • $: end-of-line anchor; used at the rightmost side of the pattern, PATTERN$
  • ^PATTERN$: PATTERN must match the entire line
  • ^$: an empty line
  • ^[[:space:]]*$: an empty line or a line containing only whitespace
  • \<: start-of-word anchor, used on the left side of the word pattern, in the form \<PATTERN
  • \>: end-of-word anchor, used on the right side of the word pattern, in the form PATTERN\>
  • \<PATTERN\>: matches PATTERN as a whole word
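The anchors above in action, as a sketch assuming GNU grep (\< and \> are GNU extensions rather than strict POSIX); the helper lines and its input are hypothetical.

```shell
# Hypothetical sample lines, including an empty line and a spaces-only line.
lines() {
  printf '%s\n' 'root:x:0' 'chroot jail' 'root' '' '   ' 'the root user'
}

lines | grep '\<root\>'            # root:x:0, root, the root user ("chroot" is skipped)
lines | grep '^root$'              # root
lines | grep -c '^$'               # 1
lines | grep -c '^[[:space:]]*$'   # 2
```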

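grep usage: grouping and backreferences

  • \( \): group part of a pattern so it is matched as a unit
  • \(PATTERN\): treat the characters matched by PATTERN as a whole
  • Note: the text matched by the pattern inside the grouping parentheses is automatically recorded by the regular expression engine in internal variables
  • Backreferences: if a pattern uses \( \) for grouping and, while a line is being checked, the group matches some text, that text can be referred to later in the same pattern
  • The symbols that refer to earlier groups are \1, \2, \3, and so on, numbered by the order of the opening parentheses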
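Two more backreference sketches, assuming GNU grep; the sample lines are illustrative. The first finds a doubled word, a common typo check; the second uses two groups to find five-letter palindromes.

```shell
# \1 repeats what group 1 matched -> "this is the the end", "go go"
printf '%s\n' 'this is the the end' 'no repeats here' 'go go' | grep '\<\([a-z][a-z]*\) \1\>'

# \2 and \1 mirror the first two characters -> level, radar
printf '%s\n' level radar hello | grep '^\(.\)\(.\).\2\1$'
```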

A handy example:

echo -e "\"ab\"\n'cd'\nef\n\"gh'\n'ij\"\nthis\nit's\n\"'sgg\"'" 
echo -e "\"ab\"\n'cd'\nef\n\"gh'\n'ij\"\nthis\nit's\n\"'sgg'\"" | grep "\([\"']\).*\1"
#"ab"
#'cd'
#"'sgg'"  (swapping the final ' and " gives the same result, but the final ' is no longer highlighted)
echo -e "\"ab\"\n'cd'\nef\n\"gh'\n'ij\"\nthis\nit's\n\"'sgg\"'" | grep "\([\"']\).*\1" | sed "/\"'.*\"'/d"  # prints "ab" and 'cd'

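Explanation:
The first command prints the long test string so you can see which lines it contains.
The second command searches for lines in which a quote character (" or ') appears and the same quote character appears again later in the line.
The third command feeds the search results into sed, which deletes the lines matching a given condition (here, lines containing "' followed later by "').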


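3. Summary

The notes above are a bit messy, so here is a summary of the main points:
1. gawk can use most extended regular expression pattern symbols and provides some extra filtering features that the sed editor lacks. Precisely because of this, gawk is usually slower when processing data streams.
2. .* is extremely handy: it matches any string.
3. Metacharacters: .*[]^${}\+?|()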

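Metacharacter, meaning, type:

  • ^: matches the start of a line (BRE)
  • $: matches the end of a line (BRE)
  • .: matches any single character (BRE)
  • ?: matches the preceding character zero or one time (ERE)
  • *: matches zero or more repetitions (BRE)
  • +: matches one or more repetitions (ERE)
  • [...]: matches any one of the enclosed characters (BRE)
  • (): groups a regular expression (ERE)
  • \{n\}: matches exactly n times (BRE)
  • \{n,\}: matches at least n times (BRE)
  • \{n,m\}: matches n to m times (BRE)
  • {n}: matches exactly n times (ERE)
  • {n,}: matches at least n times (ERE)
  • {n,m}: matches n to m times (ERE)
  • |: joins multiple alternatives with logical OR (ERE)
  • \: escape character (BRE)
  • ^ inside brackets: negation, e.g. [^xyz] (BRE)
  • -: specifies a character range inside brackets (BRE)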

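4. grep [options] 'pattern' file...
Options:

  • -v: invert the selection; list the lines that do not match the string or regular expression
  • -c: count the matching lines
  • -l: only display the names of files that contain a match
  • -h: suppress the file names in the output
  • -n: prefix each matching line with its line number in the input
  • -i: match case-insensitively (matching is case-sensitive by default)
  • -o: only display the matched part of each line
  • -A n: matching line plus the n lines after it
  • -B n: matching line plus the n lines before it
  • -C n: matching line plus n lines before and after it
  • --color=auto: highlight the matched text; this option can go at the end of the option list
  • -f: read patterns from a file, e.g. grep -f pattern_file grep_test_file (I haven't figured this one out yet)
  • -e: match multiple patterns, similar to -e in sed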
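A sketch of the context options together with -e and -f, assuming GNU grep; the helper log_lines and the temporary pattern file are hypothetical. With -f, each line of the file is one pattern, and a line is printed if it matches any of them.

```shell
# Hypothetical log-like sample.
log_lines() {
  printf '%s\n' one two ERROR three four
}

log_lines | grep -A 1 ERROR         # ERROR, three
log_lines | grep -B 1 ERROR         # two, ERROR
log_lines | grep -C 1 ERROR         # two, ERROR, three
log_lines | grep -e one -e four     # one, four

# -f: read patterns (one per line) from a temporary file.
pattern_file=$(mktemp)
printf '%s\n' '^t' 'four' > "$pattern_file"
log_lines | grep -f "$pattern_file" # two, three, four
rm -f "$pattern_file"
```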


Origin blog.csdn.net/Gou_Hailong/article/details/109470583