Linux Basics 11: Regular Expressions on the Linux Operating System

Chapter 1 What are Regular Expressions

  1. Regular expressions are a set of rules and methods defined for processing large amounts of text and strings.
  2. With the help of these specially defined symbols, system administrators can quickly filter, replace or output the strings they need. On Linux, regular expressions are generally applied one line at a time.

Put simply:

  • A set of rules and methods defined for processing large amounts of text and strings
  • Text is handled in units of lines, one line at a time

A regular expression is a pattern that describes a set of strings. Like arithmetic expressions, regular expressions are built by using various operators to combine smaller expressions.

Chapter 2 Why Regular Expressions Are Used

Linux operations and maintenance work involves a great deal of log filtering, and regular expressions reduce that complexity.
They are simple and efficient.
Regular expressions are an advanced tool supported by all of the Three Musketeers (grep, sed and awk).

Chapter 3 Two Common Points of Confusion

  • Regular expressions are widely used and exist in many languages and tools, such as php, perl, grep, sed and awk. The * in ls *, by contrast, is a wildcard.
  • Here we are learning regular expressions on Linux, where the most commonly used commands for them are grep (egrep), sed, and awk.
  • There is a fundamental difference between regular expressions and wildcards

Regular expressions are used to find file content, text and strings. Generally, only the Three Musketeers support them.
Wildcards are used to find file names, and are supported by most common commands.
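
The split between file names and file content can be seen in a minimal sketch. The file names app.log and notes.txt and their contents are made up for illustration, and the whole thing runs in a throwaway directory created with mktemp:

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
printf 'error: disk full\n' > "$demo_dir/app.log"
printf 'all good\n' > "$demo_dir/notes.txt"

# Wildcard: the shell expands *.log against FILE NAMES.
names=$(cd "$demo_dir" && ls *.log)
echo "$names"

# Regular expression: grep tests ^error against file CONTENT, line by line.
content=$(cd "$demo_dir" && grep '^error' app.log notes.txt)
echo "$content"

rm -rf "$demo_dir"
```

The first command only ever looks at names, the second only at the text inside the files.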

Chapter 4 Notes on Using Regular Expressions

  1. Linux regular expressions process strings in units of lines.

  2. To make the filtered strings easy to tell apart, learn regular expressions together with the grep/egrep command.

  3. Pay attention to the character set (export LC_ALL=C): keep it in mind whenever and wherever you use regular expressions.
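
As a small sketch of the character-set advice: in the C locale a range such as [a-z] covers exactly the ASCII lowercase letters, whereas in other locales bracket ranges may behave differently. The sample input is made up, and LC_ALL is set for the single command here rather than exported:

```shell
# In the C locale, [a-z] covers exactly the ASCII lowercase letters.
lower_only=$(printf 'a\nB\nz\n' | LC_ALL=C grep '[a-z]')
echo "$lower_only"
```

Running export LC_ALL=C once, as the note above suggests, has the same effect for every later command in the session.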

Chapter 5 Classification of Regular Expressions

The POSIX specification divides regular expressions into two types

  • Basic regular expression (BRE, basic regular expression)
  • Extended regular expression (ERE), which adds advanced features

5.1 BRE and ERE differ only in their metacharacters:

  • BRE (Basic Regular Expression) recognizes only ^ $ . [ ] * as metacharacters. All other characters, including ( ) { }, are treated as ordinary characters unless they are escaped, as in \( \).
  • ERE (Extended Regular Expression) adds ( ) { } ? + | and so on.
  • In BRE the characters ( ) { } act as metacharacters only when escaped with a backslash "\", whereas in ERE a backslash in front of any metacharacter makes it an ordinary character.
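
The escaping rule can be checked with a short sketch using plain grep (BRE) and grep -E (ERE). The two sample lines are made up: one contains two a's, the other the literal text a{2}:

```shell
sample='aa
a{2}'

# BRE (plain grep): unescaped braces are ordinary characters.
bre_literal=$(printf '%s\n' "$sample" | grep 'a{2}')
# BRE: escaped braces form the interval "a repeated twice".
bre_interval=$(printf '%s\n' "$sample" | grep 'a\{2\}')
# ERE (grep -E): bare braces are the interval...
ere_interval=$(printf '%s\n' "$sample" | grep -E 'a{2}')
# ...and escaping turns them back into ordinary characters.
ere_literal=$(printf '%s\n' "$sample" | grep -E 'a\{2\}')

echo "BRE a{2}      -> $bre_literal"
echo "BRE a\\{2\\}  -> $bre_interval"
echo "ERE a{2}      -> $ere_interval"
echo "ERE a\\{2\\}  -> $ere_literal"
```

The same pattern text selects a different line depending on whether -E is given, which is the whole difference between the two flavours.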

Chapter 6 How to distinguish between wildcards and regular expressions

  1. A rule of thumb that needs no thinking: patterns given to the Three Musketeers (awk, sed, grep/egrep) are regular expressions, and everything else is a wildcard
  2. The easiest way to distinguish between wildcards and regular expressions:

(1) file and directory names ===> wildcard
(2) file content (strings, text, the content of a file) ===> regular expression

  3. Wildcards and regular expressions both use "*", "?" and "[]". In a wildcard, * and ? stand for characters by themselves. In a regular expression, * and ? are quantifiers that only act on the item in front of them. "[]" matches one character from a set in both.
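
The different meaning of * can be sketched as follows. glob_match is a hypothetical helper written for this example (it uses the shell's case statement to test a wildcard against a string), and the sample strings are made up:

```shell
# Hypothetical helper: does STRING ($1) match the shell wildcard PATTERN ($2)?
glob_match() {
    case "$1" in
        $2) echo yes ;;   # $2 is left unquoted on purpose so it acts as a pattern
        *)  echo no ;;
    esac
}

# Wildcard: in ab* the star stands for any run of characters on its own.
glob_long=$(glob_match 'abbbc' 'ab*')     # yes: * covers "bbc"
glob_short=$(glob_match 'a' 'ab*')        # no: the literal b is required

# Regex: in ab* the star only repeats the b in front of it (zero or more times).
regex_long=$(printf 'abbbc\n' | grep -o 'ab*')   # matches "abbb", stops at c
regex_short=$(printf 'a\n' | grep -o 'ab*')      # matches "a" (zero b's)

echo "$glob_long $glob_short $regex_long $regex_short"
```

The identical text ab* accepts different strings depending on whether the shell or grep interprets it.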

Chapter 7 Basic Regular Expressions

7.1 Basic Regular Expressions

character            description
^                    ^word matches lines that start with word
$                    word$ matches lines that end with word
^$                   Matches an empty line (no characters at all, not even a space)
.                    Matches exactly one arbitrary character (so it never matches an empty line)
\                    Escape character: strips a special character of its special meaning, e.g. \. matches only a literal dot
*                    Matches the preceding character 0 or more times in a row
.*                   Matches any number of arbitrary characters
^.*                  Matches any number of characters from the start of the line; .* takes as many as possible (greedy)
[abc] [0-9] [\.,/]   Bracket expression: matches any one character from the set, e.g. a or b or c. [a-z] matches any lowercase letter. The bracket counts as a single unit that can stand for many possible characters. [abc] can also be written [a-c]
[^abc]               Matches any one character other than a, b or c. Inside brackets ^ negates the set, which is different from the ^ anchor
a\{n,m\}             Matches the preceding a n to m times (with egrep or sed -r, drop the backslashes)
a\{n,\}              Matches the preceding a at least n times (with egrep or sed -r, drop the backslashes)
a\{n\}               Matches the preceding a exactly n times (with egrep or sed -r, drop the backslashes)
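
The basic symbols above can be tried out with plain grep. This is only a sketch: the four-line sample text is made up for illustration, and -o, -c and -n are used to show what was matched, how many lines matched, and on which line:

```shell
sample='I am a teacher.

my qq is 49000448
goood'

starts=$(printf '%s\n' "$sample" | grep '^my')            # lines starting with "my"
ends=$(printf '%s\n' "$sample" | grep 'r\.$')             # lines ending with "r" and a literal dot
empty_at=$(printf '%s\n' "$sample" | grep -n '^$')        # line number of the empty line
non_empty=$(printf '%s\n' "$sample" | grep -c '.')        # count of lines with at least one character
digits=$(printf '%s\n' "$sample" | grep -o '[0-9][0-9]*') # each run of digits
run=$(printf '%s\n' "$sample" | grep -o 'o\{2,3\}')       # 2 to 3 consecutive o's, as many as possible
not_o=$(printf 'goood\n' | grep -o '[^o]')                # every character that is not o

echo "$starts"
echo "$ends"
echo "$empty_at"
echo "$non_empty"
echo "$digits"
echo "$run"
echo "$not_o"
```

With grep -o each match is printed on its own line, which makes it easy to see exactly which part of the text a pattern picked up.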

Chapter 8 Extended Regular Expressions ERE

Special character   Meaning and examples
+                   Matches the preceding character one or more times in a row; useful for pulling out runs of the same character
?                   Matches the preceding character 0 or 1 times (by contrast, . matches exactly one character)
|                   Alternation ("or"): filters for several strings at the same time
()                  Grouping: whatever is enclosed is treated as a single unit (like one character); also used for back-references
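
A short sketch of the extended symbols with grep -E follows. The sample strings are made up, and -o is used so that only the matched text is shown:

```shell
# + needs at least one o, so "gd" is skipped.
plus=$(printf 'goood gd\n' | grep -E -o 'go+d')
# ? allows zero or one o, so "good" is skipped.
optional=$(printf 'gd god good\n' | grep -E -o 'go?d')
# | selects lines that are either root or sync.
either=$(printf 'root\nbin\nsync\n' | grep -E '^(root|sync)$')
# () makes "ab" a single unit that + can repeat.
group=$(printf 'ababab abc\n' | grep -E -o '(ab)+')

echo "$plus"
echo "$optional"
echo "$either"
echo "$group"
```

Without the parentheses, ab+ would mean "a followed by one or more b", which is why grouping matters for repetition.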

Chapter 9 Regex Recap

  • Basic regular expressions (BRE):
    ^  $  .  *  .*  [abc]  [^abc]

  • Extended regular expressions (ERE):
    +  |  ?  ()  {}  a{n,m}  a{n,}  a{n}

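  • Escape character \: changes the meaning of a character (where a symbol is not recognized as a regex operator, escaping gives it its regex meaning; where it is recognized, escaping turns it into an ordinary character)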

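Note:

  • By default grep understands only basic regular expressions, so extended symbols such as { } are just ordinary characters to it. To make plain grep treat them as regex operators they must be escaped, as in \{\}.
  • grep -E makes grep recognize the extended regex symbols directly, so no escaping is needed
  • egrep is equivalent to grep -E and recognizes the extended symbols natively
  • For everyday backups we can write cp filename{,.bak} to avoid typing the file name a second time (this is shell brace expansion, not a regular expression)
  • sed -r: makes sed support extended regular expressions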
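
The effect of sed -r can be sketched with a substitution that swaps two words using groups and back-references. The input "hello world" is made up for illustration:

```shell
# BRE (plain sed): groups must be written \( \).
swapped_bre=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')
# ERE (sed -r): bare parentheses group directly.
swapped_ere=$(echo 'hello world' | sed -r 's/(hello) (world)/\2 \1/')

echo "$swapped_bre"
echo "$swapped_ere"
```

Both commands do the same job; -r only changes how the grouping symbols have to be written.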

Chapter 10 Differences Between Basic and Extended Regular Expressions

Basic regex (BRE)    Extended regex (ERE)
\?                   ?
\+                   +
\{\}                 {}
\( \)                ()
\|                   |

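So-called basic regular expressions are those that need the escape character to express these operators, while extended regular expressions let the command recognize the regex symbols directly (egrep, sed -r and awk support them directly). Note that \?, \+ and \| in basic regular expressions are GNU extensions rather than part of POSIX BRE.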

Chapter 11 Supplementary Notes

11.1 Some predefined character classes:

regular expression   description                                                                      example
[:alnum:]            matches any one letter or digit, like [a-zA-Z0-9]                                [[:alnum:]]+
[:alpha:]            matches any one letter (upper or lower case)                                     [[:alpha:]]{4}
[:blank:]            space and (horizontal) tab                                                       [[:blank:]]*
[:digit:]            matches any one digit                                                            [[:digit:]]?
[:lower:]            matches a lowercase letter                                                       [[:lower:]]{5,}
[:upper:]            matches an uppercase letter                                                      ([[:upper:]]+)?
[:punct:]            matches a punctuation character                                                  [[:punct:]]
[:space:]            matches any whitespace character, including newline, carriage return, etc.       [[:space:]]+
[:graph:]            matches any visible printable character (not a space)                            [[:graph:]]
[:xdigit:]           matches any hexadecimal digit                                                    [[:xdigit:]]+
[:cntrl:]            matches any control character (in ASCII, the first 32 characters plus DEL)       [[:cntrl:]]
[:print:]            matches any printable character, including the space                             [[:print:]]
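
A brief sketch of these classes with grep. The sample line is made up; note that a class name such as [:digit:] only works inside a bracket expression, hence the double brackets:

```shell
line='User42 logged in at 09:15, OK!'

# Each run of digits, one per output line.
digits=$(printf '%s\n' "$line" | grep -E -o '[[:digit:]]+')
# Each run of uppercase letters.
upper=$(printf '%s\n' "$line" | grep -E -o '[[:upper:]]+')
# Each single punctuation character.
punct=$(printf '%s\n' "$line" | grep -o '[[:punct:]]')

echo "$digits"
echo "$upper"
echo "$punct"
```

Classes can be combined with other items in the same bracket, for example [[:alpha:]_] for a letter or an underscore.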

11.2 Metacharacters

These metacharacters come from Perl-style regular expressions. Only some text processing tools support them, not all.

regular expression   description                                       example
\b                   word boundary                                     \bcool\b matches cool, not coolant
\B                   non-word boundary                                 cool\B matches coolant, not cool
\d                   single digit character                            b\db matches b2b, not bcb
\D                   single non-digit character                        b\Db matches bcb, not b2b
\w                   single word character (letters, digits and _)     \w matches 1 or a, not &
\W                   single non-word character                         \W matches &, not 1 or a
\n                   newline                                           \n matches a newline
\s                   single whitespace character                       x\sx matches x x, not xx
\S                   single non-whitespace character                   x\Sx matches xkx, not x x
\r                   carriage return                                   \r matches a carriage return
\t                   horizontal tab                                    \t matches a horizontal tab
\v                   vertical tab                                      \v matches a vertical tab
\f                   form feed                                         \f matches a form feed
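
A few of these can be tried with grep, assuming GNU grep, which accepts \b, \B and \w as extensions in its ordinary patterns. Other escapes such as \d typically need a Perl-compatible mode (for example grep -P, where available), so they are left out of this sketch. The sample words are made up:

```shell
# \b on both sides: "cool" must be a whole word.
word=$(printf 'cool\ncooler\n' | grep '\bcool\b')
# \B after cool: "cool" must be followed by more word characters.
inside=$(printf 'cool\ncooler\n' | grep 'cool\B')
# \w: every single word character (letters, digits and _).
wordchars=$(printf 'a_1 &\n' | grep -o '\w')

echo "$word"
echo "$inside"
echo "$wordchars"
```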

Chapter 12 Regular Expression Summary

  • Learn regular expressions with egrep/grep: it is the simplest way to see the effect and the result
  • Use the -o option of egrep/grep to see exactly what the regex matches
  • Practice a lot; regular expressions become more powerful when combined with grep, egrep, sed -r and awk

