Regular Expressions:
A regular expression is a powerful, convenient, and efficient text-processing tool. The regular expression itself, together with a general pattern notation that works like a pocket-sized programming language, gives users the ability to describe and analyze text. With the additional support of specific tools, regular expressions can add, delete, separate, stack, trim, and insert text and various types of data.
Practical problems that regular expressions solve:
- Applying complex filter criteria across multiple documents.
- Text filtering
- Matching conditions
The text retrieval tool egrep:
- Example of use: egrep '^(From|Subject):' mailbox-file prints the lines of the mail file that begin with From: or Subject:. Adding Date to the alternation also picks up the Date: line. The purpose is to generate a summary list of the messages.
- Mail content: mail-01.txt
  From: Link
  Subject: Dragon Boat Festival blessing mail
  Date: 2019-06-07
  Here is the body of the message ...
- Actual result:
  ZBMAC-7ac72269c:Documents Link$ egrep '^(From|Subject|Date):' mail-01.txt
  From: Link
  Subject: Dragon Boat Festival blessing mail
  Date: 2019-06-07
  ZBMAC-7ac72269c:Documents Link$
- Reading the expression: the caret ^ is a regular expression metacharacter that matches the beginning of a line. The parentheses () limit the scope of the alternation. The vertical bar | means 'or'.
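The same filter can be sketched in Python. This is only an illustration, assuming Python's `re` module in place of egrep (the two flavors agree on `^`, `(...)` and `|`). The names `MAIL`, `HEADER` and `header_lines` are made up for this sketch and are not part of the original notes.

```python
import re

# Sample mail, copied from the mail-01.txt example.
MAIL = """From: Link
Subject: Dragon Boat Festival blessing mail
Date: 2019-06-07
Here is the body of the message ...
"""

# ^ anchors at the start of the line, (...) limits the alternation, | means "or".
HEADER = re.compile(r"^(From|Subject|Date):")


def header_lines(text):
    """Return the lines that begin with one of the header names (hypothetical helper)."""
    return [line for line in text.splitlines() if HEADER.search(line)]


for line in header_lines(MAIL):
    print(line)
```

Checking line by line mirrors how egrep works: each line is tested on its own, and the body line is dropped because it does not start with any of the three names.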
Metacharacter basics:
Start and end of the line:
- When checking a line of text, ^ represents the beginning of the line and $ represents the end of the line.
- A good habit is to read a regular expression character by character: for example, ^cat matches a line whose first character is c, followed immediately by a, followed immediately by t.
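A small sketch of the two anchors, assuming Python's `re` module rather than egrep. The sample strings and variable names are invented for illustration.

```python
import re

# Each string stands for one line of text.
lines = ["cat food", "concatenate", "tomcat", "cat"]

# ^cat : c, a, t must be the first three characters of the line.
starts_with_cat = [s for s in lines if re.search(r"^cat", s)]

# cat$ : c, a, t must be the last three characters of the line.
ends_with_cat = [s for s in lines if re.search(r"cat$", s)]

# ^cat$ : the line consists of exactly c, a, t and nothing else.
only_cat = [s for s in lines if re.search(r"^cat$", s)]

print(starts_with_cat, ends_with_cat, only_cat)
```

Note that "concatenate" contains cat but is selected by none of the three patterns, because the anchors pin the match to a position in the line.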
Character groups:
- Matching one of several characters: [abc] matches a single character that is a, b, or c.
- Example: '<H[123456]>' matches the tags <H1>, <H2>, <H3>, and so on through <H6>.
- Inside a character group, the hyphen '-' indicates a range, so the example above can be written as '<H[1-6]>'. A hyphen at the very beginning of a character group is a literal character, not a metacharacter.
- [a-zA-Z] expresses ranges in the same way. Note: '-' is a metacharacter only inside a character group.
- Negated character groups: [^...]. When the caret ^ is the first character inside a character group, the group matches any single character that is not listed.
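The character group rules above can be tried out as follows. This is a sketch that assumes Python's `re` module; the sample text and variable names are hypothetical.

```python
import re

html = "<H1>Title</H1> <H3>Section</H3> <H7>bogus</H7>"

# [1-6] is a range inside a character group, so only H1 through H6 match.
heading_tags = re.findall(r"<H[1-6]>", html)

# A leading hyphen is a literal: this group matches a hyphen or a digit.
hyphen_or_digit = re.findall(r"[-0-9]", "a-1")

# A leading caret negates the group: any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(heading_tags, hyphen_or_digit, non_digits)
```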
Matching any character:
- The dot '.' matches any single character.
- Inside a character group, however, the dot is not a metacharacter. Note: the set of metacharacters and their meanings are different inside and outside a character group.
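To make the inside/outside difference concrete, here is a short sketch assuming Python's `re` module. The date strings are invented for illustration.

```python
import re

dates = ["2019-06-07", "2019/06/07", "2019.06.07", "2019x06x07"]

# Outside a character group the dot is a metacharacter: any separator is accepted.
any_separator = [d for d in dates if re.fullmatch(r"2019.06.07", d)]

# Inside a character group the dot is an ordinary character, and the leading
# hyphen is a literal too, so only -, . and / are accepted as separators.
listed_separator = [d for d in dates if re.fullmatch(r"2019[-./]06[-./]07", d)]

print(any_separator)
print(listed_separator)
```

The first pattern is shorter but looser; the second states exactly which separators are allowed.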
Word delimiters:
- '\<' and '\>' match the position at the start and at the end of a word.
- The less-than and greater-than signs are not metacharacters on their own; they become metacharacters only when preceded by a backslash.
- Not all versions of egrep support these metacharacters.
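Python's `re` module is one flavor that does not support `\<` and `\>`. It offers `\b`, which matches at either edge of a word. The sketch below uses `\b` as a stand-in for both delimiters; the sample sentence and variable names are assumptions made for illustration.

```python
import re

text = "the cat sat near a tomcat and a concatenated catalog"

# \bcat\b : cat as a whole word (roughly \<cat\> in an egrep that supports it).
whole_word = re.findall(r"\bcat\b", text)

# \bcat\w* : words that start with cat (roughly \<cat).
starts_with = re.findall(r"\bcat\w*", text)

# \w*cat\b : words that end with cat (roughly cat\>).
ends_with = re.findall(r"\w*cat\b", text)

print(whole_word, starts_with, ends_with)
```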
Metacharacter summary:
Metacharacters that match a single character:
Metacharacter | Name | Matches
. | Dot | Any single character
[...] | Character group | Any single character listed
[^...] | Negated character group | Any single character not listed
\char | Escape | If char is a metacharacter, or the escape sequence has no special meaning, the literal character char
Metacharacters that provide counting (quantifiers):
Metacharacter | Name | Matches
? | Question mark | One match allowed, but not required
* | Asterisk | Any number of matches allowed, including none
+ | Plus | At least one match required, any number allowed beyond that
{min,max} | Interval quantifier (not supported by all versions of egrep) | At least min matches required, at most max allowed
Metacharacters that match a position:
Metacharacter | Name | Matches
^ | Caret | The position at the beginning of a line
$ | Dollar sign | The position at the end of a line
\< | Word delimiter (not supported by all versions of egrep) | The position at the start of a word
\> | Word delimiter (not supported by all versions of egrep) | The position at the end of a word
Other metacharacters:
Metacharacter | Name | Matches
| | Alternation | Any one of the expressions it separates
(...) | Parentheses | Limit the scope of alternation, group elements for a quantifier, and "capture" text for backreferences
\1, \2, ... | Backreference (not supported by all versions of egrep) | The text previously matched by the first, second, and later sets of parentheses
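The quantifiers and the backreference from the summary can be exercised with a short sketch. It assumes Python's `re` module, which supports all of them. The helper `full` and the sample strings are hypothetical.

```python
import re


def full(pattern, text):
    """True when the whole string matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# ? : the preceding u is allowed once, but not required.
optional_u = [full(r"colou?r", w) for w in ("color", "colour", "colouur")]

# * : the preceding space may appear any number of times, including zero.
star = [full(r"<H[1-6] *>", t) for t in ("<H1>", "<H2   >")]

# + : the preceding digit group must match at least once.
plus = [full(r"[0-9]+", t) for t in ("7", "2019", "")]

# {min,max} : between two and four digits.
interval = [full(r"[0-9]{2,4}", t) for t in ("7", "2019", "20190")]

# \1 : must repeat the text captured by the first parentheses (a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "this is the the text")

print(optional_u, star, plus, interval, doubled.group(0))
```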
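This article is a set of notes from reading the book Mastering Regular Expressions. The notes are fairly rough, whereas the book is much more detailed: the introduction alone takes up a whole chapter, which makes it very suitable for readers with no background at all. The metacharacters of regular expressions are listed here; once you have mastered them, they are basically enough for writing everyday regular expressions.
The most useful idea is this: a regular expression works in units of characters, and every quantifier metacharacter applies to the single character or expression immediately before it.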
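The point about what a quantifier applies to can be checked directly. This is a minimal sketch assuming Python's `re` module; the helper `full` and the test strings are made up for illustration.

```python
import re


def full(pattern, text):
    """True when the whole string matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


samples = ("ab", "abbb", "abab")

# ab+ : the + applies only to the b, the single character right before it.
single_char = {t: full(r"ab+", t) for t in samples}

# (ab)+ : the parentheses make ab one expression, so the + repeats the pair.
grouped = {t: full(r"(ab)+", t) for t in samples}

print(single_char)
print(grouped)
```

Reading the pattern one unit at a time explains the difference: without parentheses the unit before `+` is `b`; with them it is the whole subexpression `(ab)`.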
Glossary of regular expression terms (Chinese term: English term):
正则: regex (regular expression)
匹配: matching
元字符: metacharacter
流派: flavor
子表达式: subexpression
字符: character