Chapter 1 What are Regular Expressions
- Regular expressions are a set of rules and methods for processing large amounts of text and strings.
- Using the special symbols they define, system administrators can quickly filter, replace, or output the strings they need. On Linux, regular expressions are generally applied one line at a time.
Put simply:
- A set of rules and methods for processing large amounts of text and strings
- Processing is line-based: one line is handled at a time
A regular expression is a pattern that describes a set of strings. Like arithmetic expressions, regular expressions are built by combining smaller expressions with various operators.
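The line-at-a-time behaviour can be seen with a minimal sketch like the following. It assumes grep is available; the sample text and the variable names are invented for illustration.

```shell
# Sample text is made up: three lines, two of which contain "error".
sample=$(printf 'disk error on sda\nall good\nnetwork error\n')

# grep tests the pattern against each line separately and prints whole matching lines.
matched=$(printf '%s\n' "$sample" | grep 'error')
printf '%s\n' "$matched"

# -c counts matching lines, not individual matches.
count=$(printf '%s\n' "$sample" | grep -c 'error')
printf 'matching lines: %s\n' "$count"
```

The line "all good" is dropped because the pattern is checked against it independently of its neighbours.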
Chapter 2 Why Regular Expressions Are Used
Linux operations work involves a great deal of log filtering, and regular expressions turn that complex task into a simple one.
They are simple and efficient.
Regular expressions are an advanced tool, and all of the "Three Musketeers" (grep, sed, awk) support them.
Chapter 3 Two Common Points of Confusion
- Regular expressions are widely used and exist in many languages and tools: php, perl, grep, sed, and awk all support them. The * in ls * is a wildcard, not a regular expression.
- These notes cover regular expressions on Linux, where the most commonly used regex commands are grep (egrep), sed, and awk.
- There is a fundamental difference between regular expressions and wildcards:
Regular expressions search file content, text, and strings. Among common commands, generally only the Three Musketeers support them.
Wildcards match file names. They work with most ordinary commands.
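The contrast between the two can be sketched as below. The directory, file names, and contents are all made up for illustration, and the example assumes a POSIX shell with mktemp and grep.

```shell
# Work in a throwaway directory; file names and contents are invented.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/notes.txt"
printf 'gamma\n' > "$workdir/todo.txt"
printf 'alpha\n' > "$workdir/data.log"

# Wildcard: the shell expands *.txt to file NAMES before ls even runs.
txt_files=$(cd "$workdir" && ls *.txt)
printf '%s\n' "$txt_files"

# Regular expression: grep searches file CONTENT for lines starting with "al".
# -l prints only the names of the files that contain a match.
content_hits=$(cd "$workdir" && grep -l '^al' *)
printf '%s\n' "$content_hits"

rm -rf "$workdir"
```

Note that the second command uses both: the unquoted * is a wildcard choosing which files to read, while the quoted '^al' is the regular expression applied to their content.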
Chapter 4 Notes on Using Regular Expressions
- On Linux, regular expressions process strings one line at a time.
- To make the matched strings easy to pick out, learn regular expressions together with the grep/egrep command.
- Pay attention to the character set: run export LC_ALL=C, and keep the character set in mind whenever you work with regular expressions.
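One reason the character set matters is that, in some locales, a range such as [a-z] may not follow plain ASCII order. A minimal sketch, assuming grep and made-up sample words:

```shell
# Force the plain C locale so that ranges such as [a-z] follow ASCII byte order.
export LC_ALL=C

# With LC_ALL=C, [a-z] matches lowercase ASCII letters only, so "Banana" is skipped.
lower_only=$(printf 'apple\nBanana\ncherry\n' | grep '^[a-z]')
printf '%s\n' "$lower_only"
```

Setting the variable at the top of a script keeps the behaviour of bracket ranges predictable across machines.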
Chapter 5 Classification of Regular Expressions
The POSIX specification divides regular expressions into two types:
- Basic regular expressions (BRE)
- Extended regular expressions (ERE), which add advanced features
5.1 BRE and ERE differ only in their metacharacters:
- BRE (Basic Regular Expression) recognizes only the metacharacters ^ $ . [ ] *. All other characters are treated as ordinary characters unless they are escaped, for example \( \).
- ERE (Extended Regular Expression) adds ( ) { } ? + | and so on.
- In BRE, the characters ( ) { } are treated as metacharacters only when they are escaped with a backslash "\". In ERE it is the opposite: a backslash in front of any metacharacter makes it an ordinary character.
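The effect of the backslash in each flavour can be checked with a short sketch. It assumes GNU grep; the input lines are invented, and -x is used only to require that the whole line matches.

```shell
lines=$(printf 'ab\naab\naaab\na{2}b\n')

# BRE: braces need a backslash to act as a repetition operator.
bre_interval=$(printf '%s\n' "$lines" | grep -x 'a\{2\}b')

# BRE: unescaped braces are ordinary characters, so this matches the literal text a{2}b.
bre_literal=$(printf '%s\n' "$lines" | grep -x 'a{2}b')

# ERE (grep -E): braces are operators without a backslash.
ere_interval=$(printf '%s\n' "$lines" | grep -E -x 'a{2}b')

# ERE: a backslash turns the braces back into ordinary characters.
ere_literal=$(printf '%s\n' "$lines" | grep -E -x 'a\{2\}b')

printf 'BRE interval: %s\nBRE literal: %s\n' "$bre_interval" "$bre_literal"
printf 'ERE interval: %s\nERE literal: %s\n' "$ere_interval" "$ere_literal"
```

The same pattern text therefore means opposite things depending on whether -E is given.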
Chapter 6 How to distinguish between wildcards and regular expressions
- Rule of thumb: inside the Three Musketeers (awk, sed, grep, egrep) the pattern is a regular expression; everywhere else it is a wildcard.
- The easiest way to distinguish between wildcards and regular expressions:
(1) file and directory names ===> wildcard
(2) file content (strings, text, the content of a file) ===> regular expression
- Both wildcards and regular expressions use "*", "?", and "[]". In a wildcard, "*" and "?" stand for characters by themselves. In a regular expression, "*" and "?" only say how many times the item in front of them may repeat. "[]" matches one character from a set in both.
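The different meaning of "*" can be sketched without touching any files. The case statement is used here only as a convenient way to test a wildcard pattern against a string; the sample strings are made up.

```shell
# In a wildcard (glob), * stands for any string by itself.
glob_result=no
case "abc" in
  a*) glob_result=yes ;;   # "a" followed by anything
esac
printf 'glob a* matches abc: %s\n' "$glob_result"

# In a regular expression, * repeats the item before it:
# "ab*c" means a, then zero or more b, then c.
regex_hits=$(printf 'ac\nabc\nabbc\naxc\n' | grep -x 'ab*c')
printf '%s\n' "$regex_hits"
```

The line "axc" is rejected by the regular expression because "*" there cannot stand in for an arbitrary character such as "x".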
Chapter 7 Basic Regular Expressions
7.1 Basic Regular Expressions
|Symbol|Meaning|
|---|---|
|^|^word matches lines that start with word|
|[abc] [0-9] [\.,/]|Matches any single character from the set, here a, b, or c. [a-z] matches any lowercase letter. The whole bracket expression stands for one character, whichever member of the set it is. [abc] can also be written as the range [a-c]|
|a\{n\}|Repeats the preceding a exactly n times. With egrep or sed -r, drop the backslashes: a{n}|
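The three basic symbols shown here can be tried out with a small sketch. It assumes grep in its default (BRE) mode; the sample lines are invented.

```shell
text=$(printf 'word one\na word\ncat\ncbt\ncdt\nhello\nhelo\n')

# ^word: lines that start with "word".
starts=$(printf '%s\n' "$text" | grep '^word')

# [abc]: exactly one character out of a, b, c between "c" and "t".
set_hits=$(printf '%s\n' "$text" | grep -x 'c[abc]t')

# l\{2\}: the preceding "l" repeated twice (BRE needs the backslashes).
twice=$(printf '%s\n' "$text" | grep 'l\{2\}')

printf 'starts: %s\nset: %s\ntwice: %s\n' "$starts" "$(printf '%s' "$set_hits" | tr '\n' ' ')" "$twice"
```

"a word" is not selected by ^word because the anchor ties the match to the start of the line, and "cdt" is not selected because d is not in the set.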
Chapter 8 Extended Regular Expressions ERE
|Symbol|Meaning|
|---|---|
|+|Repeats the preceding character one or more times, so a run of one or more consecutive identical characters is matched as a single piece of text|
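A quick way to see what + picks up is to combine grep -E with -o, which prints each matched piece on its own line. A minimal sketch with a made-up input string:

```shell
# -E enables ERE so that + is an operator; -o prints only the matched text.
runs=$(printf 'goood food gd\n' | grep -E -o 'o+')
printf '%s\n' "$runs"
```

Each run of consecutive "o" characters comes out as one match, and "gd" contributes nothing because + requires at least one occurrence.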
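Chapter 9 Regular Expression Recap
- Basic regular expressions (BRE): ^  $  .  *  .*  [abc]  [^abc]
- Extended regular expressions (ERE): +  |  ?  ()  {}  a{n,m}  a{n,}  a{n}
- Escape character \: changes the meaning of a character. A character that is not a regex operator by default becomes one, and a character that is an operator by default becomes an ordinary character.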
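Notes:
- By default grep uses basic regular expressions, so extended symbols such as { } are ordinary characters to it. To have grep treat them as operators, escape them: \{ \}.
- grep -E makes grep recognize the extended symbols directly, with no escaping needed.
- egrep is equivalent to grep -E and recognizes the extended symbols out of the box.
- For everyday backups, cp filename{,.bak} saves typing the file name twice. (This is shell brace expansion, not a regular expression.)
- sed -r: makes sed use extended regular expressions.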
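The effect of sed -r can be sketched with a substitution that swaps two words. This assumes GNU sed, where -r and -E are equivalent spellings; the input text is made up.

```shell
# Without -r, sed uses BRE, so groups must be written \( \).
bre_swap=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# With -r, sed uses ERE, so plain ( ) form groups.
ere_swap=$(printf 'hello world\n' | sed -r 's/(hello) (world)/\2 \1/')

printf 'BRE: %s\nERE: %s\n' "$bre_swap" "$ere_swap"
```

Both commands do the same job; the ERE form is simply easier to read once a pattern has several groups.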
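Chapter 10 Differences Between Basic and Extended Regular Expressions
- \? in BRE corresponds to ? in ERE
- \+ in BRE corresponds to + in ERE
- \{ \} in BRE corresponds to { } in ERE
- \( \) in BRE corresponds to ( ) in ERE
- \| in BRE corresponds to | in ERE
In other words, a basic regular expression needs the escape character to express these operators, while an extended regular expression lets the command recognize the regex symbols directly (egrep, sed -r, and awk support them directly).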
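The correspondence can be checked by running the same match in both flavours. This sketch assumes GNU grep, where \? and \| are accepted in basic mode as GNU extensions; other grep implementations may not support them. The word list is invented.

```shell
input=$(printf 'color\ncolour\ncolouur\ncat\ndog\n')

# Optional character: \? in BRE, ? in ERE.
bre_opt=$(printf '%s\n' "$input" | grep -x 'colou\?r')
ere_opt=$(printf '%s\n' "$input" | grep -E -x 'colou?r')

# Alternation: \| in BRE, | in ERE.
bre_alt=$(printf '%s\n' "$input" | grep -x 'cat\|dog')
ere_alt=$(printf '%s\n' "$input" | grep -E -x 'cat|dog')

printf '%s\n' "$bre_opt" "$bre_alt"
```

Each BRE/ERE pair selects the same lines; only the spelling of the operator differs.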
Chapter 11 Supplementary Notes
11.1 Some predefined character classes:
|Class|Meaning|Example|
|---|---|---|
|[:alnum:]|Any letter or digit, same as [a-zA-Z0-9]|[[:alnum:]]+|
|[:alpha:]|Any letter, upper or lower case|[[:alpha:]]{4}|
|[:blank:]|Space and horizontal tab|[[:blank:]]*|
|[:digit:]|Any digit|[[:digit:]]?|
|[:lower:]|Any lowercase letter|[[:lower:]]{5,}|
|[:upper:]|Any uppercase letter|([[:upper:]]+)?|
|[:punct:]|Any punctuation character|[[:punct:]]|
|[:space:]|Any whitespace character, including newline, carriage return, etc.|[[:space:]]+|
|[:graph:]|Any visible character (printable, excluding space)|[[:graph:]]|
|[:xdigit:]|Any hexadecimal digit|[[:xdigit:]]+|
|[:cntrl:]|Any control character (ASCII 0 to 31, plus 127)|[[:cntrl:]]|
|[:print:]|Any printable character, including space|[[:print:]]|
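A few of these classes in action, as a minimal sketch with made-up input and grep -E. Note that a class name such as [:digit:] is only valid inside a bracket expression, which is why the patterns use double brackets.

```shell
data=$(printf 'abc123\nABC\n  indented\n!!!\n')

# [[:digit:]]+ : runs of digits; -o prints only the matched part.
digits=$(printf '%s\n' "$data" | grep -E -o '[[:digit:]]+')

# [[:upper:]]+ with -x: lines made up entirely of uppercase letters.
upper_only=$(printf '%s\n' "$data" | grep -E -x '[[:upper:]]+')

# [[:punct:]]+ with -x: lines made up entirely of punctuation.
punct_only=$(printf '%s\n' "$data" | grep -E -x '[[:punct:]]+')

# ^[[:blank:]]+ : count lines that begin with spaces or tabs.
blank_count=$(printf '%s\n' "$data" | grep -E -c '^[[:blank:]]+')

printf 'digits=%s upper=%s punct=%s blank-led lines=%s\n' \
  "$digits" "$upper_only" "$punct_only" "$blank_count"
```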
11.2 Metacharacters
These escape sequences come from Perl-style regular expressions. Only some text-processing tools support them, not all.
|Symbol|Meaning|Example|
|---|---|---|
|\b|Word boundary|\bcool\b matches the word cool, not the cool inside coolant|
|\B|Non-word boundary|cool\B matches the cool inside coolant, not the standalone word cool|
|\d|Single digit|b\db matches b2b, not bcb|
|\D|Single non-digit character|b\Db matches bcb, not b2b|
|\w|Single word character (letter, digit, or _)|\w matches 1 or a, not &|
|\W|Single non-word character|\W matches &, not 1 or a|
|\n|Newline|\n matches a newline|
|\s|Single whitespace character|x\sx matches x x, not xx|
|\S|Single non-whitespace character|x\Sx matches xkx, not x x|
|\r|Carriage return|\r matches a carriage return|
|\t|Horizontal tab|\t matches a horizontal tab|
|\v|Vertical tab|\v matches a vertical tab|
|\f|Form feed|\f matches a form feed|
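Support for these sequences varies by tool, so the following is only a sketch under stated assumptions. It assumes GNU grep, which accepts \b and \B in its default mode as extensions; \d is tried only when the local grep offers the -P (Perl-compatible) option, which is not always compiled in. The word list is made up.

```shell
words=$(printf 'cool\ncoolant\nb2b\nbcb\n')

# \b is a word boundary: "cool" must stand alone as a word.
whole_word=$(printf '%s\n' "$words" | grep '\bcool\b')

# \B is a non-boundary: "cool" must be followed by another word character.
inside_word=$(printf '%s\n' "$words" | grep 'cool\B')

printf 'whole word: %s\ninside a word: %s\n' "$whole_word" "$inside_word"

# \d needs a Perl-compatible engine, so this part is guarded by a capability check.
if printf '1\n' | grep -P '\d' >/dev/null 2>&1; then
  digit_between=$(printf '%s\n' "$words" | grep -P 'b\db')
  printf 'digit between b and b: %s\n' "$digit_between"
fi
```

Where -P is unavailable, a POSIX class such as [[:digit:]] is the portable replacement for \d.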
Chapter 12 Regular Expression Summary
- Learn regular expressions with egrep/grep, where the effect of a pattern is easy to see in the output.
- Use the -o option of egrep/grep to see exactly what the regex matched.
- Practice a lot; regular expressions make grep, egrep, sed -r, and awk far more powerful.
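As a practice starting point, the sketch below compares plain grep output with grep -o and then applies a similar pattern in awk. The log lines and field layout are invented for illustration; awk is assumed to be any POSIX awk.

```shell
log=$(printf 'user=alice id=1000\nuser=bob id=42\n')

# Without -o, grep prints every line that contains a match.
whole=$(printf '%s\n' "$log" | grep -E 'id=[0-9]+')

# With -o, only the part that the regex matched is printed.
only=$(printf '%s\n' "$log" | grep -E -o 'id=[0-9]+')

# awk uses ERE directly: print the first field of lines whose id has three or more digits.
names=$(printf '%s\n' "$log" | awk '/id=[0-9][0-9][0-9]+/ {print $1}')

printf '%s\n' "$only" "$names"
```

Checking a pattern with -o first, and only then moving it into sed or awk, is a cheap way to catch patterns that match more or less than intended.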