Learning Regular Expressions: Getting Started

Regular Expressions:

   Regular expressions are a powerful, convenient and efficient text-processing tool. The expressions themselves, together with a general pattern notation that works like a pocket-sized programming language, give users the ability to describe and analyze text. With additional support from a specific tool, regular expressions can add, delete, separate, stack, trim and insert all kinds of text and data.

Practical problems regular expressions solve:

  1. Filtering documents with complex criteria involving multiple words.
  2. Text filtering.
  3. Matching text against conditions.

The text search tool egrep:

  1. Example: egrep '^(From|Subject|Date):' mailbox-file prints the lines of the file that begin with From:, Subject: or Date:. The purpose is to produce a quick list of the messages in a mailbox.
  2. Mail content (mail-01.txt):
    From: Link
    Subject: Dragon Boat Festival blessing Mail
    Date: 2019-06-07

    Here is the body of the message ...
  3. Actual result:
    ZBMAC-7ac72269c:Documents Link$ egrep '^(From|Subject|Date):' mail-01.txt
    From: Link
    Subject: Dragon Boat Festival blessing Mail
    Date: 2019-06-07
    ZBMAC-7ac72269c:Documents Link$

  4. Explanation of the expression: the caret ^ is a regular-expression metacharacter that matches the beginning of a line. The parentheses () limit the scope of the alternation, and | means "or".
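For readers without egrep at hand, here is a minimal sketch of the same header filter using Python's re module, assuming its syntax behaves like egrep's for ^, () and | in this pattern. The mail text is inlined, and the names MAIL, HEADER and header_lines are hypothetical, chosen only for illustration.

```python
import re

# Contents of mail-01.txt from the example, inlined so the sketch is self-contained.
MAIL = """From: Link
Subject: Dragon Boat Festival blessing Mail
Date: 2019-06-07

Here is the body of the message ...
"""

# Same pattern as the egrep command: one of three header names at the start of a line, then a colon.
HEADER = re.compile(r"^(From|Subject|Date):")

def header_lines(text: str) -> list[str]:
    """Return the lines that begin with From:, Subject: or Date:, as egrep would print them."""
    return [line for line in text.splitlines() if HEADER.search(line)]

for line in header_lines(MAIL):
    print(line)
```

Like egrep, the sketch tests each line separately, so ^ always refers to the start of the current line.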

Metacharacter basics:

Start and end of a line:

  1. When checking a line of text, ^ represents the start of the line and $ represents the end of the line.
  2. It is a good habit to read a regular expression character by character: for example, ^cat matches if the line begins with c, followed by a, followed by t.
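A short sketch of the line anchors, using Python's re.search as a stand-in for egrep's "does this line match" test. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, line: str) -> bool:
    """True if the pattern is found anywhere in the line (egrep-style search)."""
    return re.search(pattern, line) is not None

print(matches(r"^cat", "catalog"))      # True: c, a, t at the start of the line
print(matches(r"^cat", "concatenate"))  # False: "cat" appears, but not at the start
print(matches(r"cat$", "bobcat"))       # True: the line ends with c, a, t
print(matches(r"^cat$", "cat"))         # True: the line is exactly "cat"
print(matches(r"^$", ""))               # True: an empty line
```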

Character group:

  1. Matching one of several characters: [abc] matches a, b or c.
  2. Example: '<H[123456]>' matches the tags <H1>, <H2>, <H3> through <H6>.
  3. Inside a character group, '-' indicates a range, so the example above can be written as '<H[1-6]>'. A hyphen at the beginning of the group is not a metacharacter.
  4. [a-zA-Z] expresses a similar range (all lowercase and uppercase letters). Note: '-' is a metacharacter only inside a character group.
  5. Negated character groups: [^...]. When the caret ^ appears at the start of a character group, the group matches any character that is not listed.
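The character-group rules can be checked with a small sketch in Python's re module, which treats ranges, a leading hyphen and a leading caret inside [...] the same way for these simple cases. The sample strings are made up for illustration.

```python
import re

# A range inside a character group: <H1> through <H6>, but not <H7>.
tags = re.findall(r"<H[1-6]>", "<H1>Title</H1> <H3>Part</H3> <H7>No</H7>")
# A hyphen placed first in the group is a literal '-', not a range operator.
hyphen = re.findall(r"[-a]", "a-b")
# Two ranges in one group: each run of ASCII letters.
words = re.findall(r"[a-zA-Z]+", "Regex 101 Rocks")
# Negated group: each single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(tags)        # ['<H1>', '<H3>']
print(hyphen)      # ['a', '-']
print(words)       # ['Regex', 'Rocks']
print(non_digits)  # ['a', 'b']
```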

Matching any character:

  1. The dot '.' matches any character.
  2. Inside a character group, however, the dot is not a metacharacter. Note: which characters are metacharacters, and what they mean, differs inside and outside a character group.
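To see the dot behave differently inside and outside a group, here is a sketch with Python's re module and a date-like string chosen only for illustration. Outside a group the dot accepts any separator. Inside [-./] it is a literal dot, and the leading hyphen is literal too.

```python
import re

# Outside a group, '.' matches any character, so "03x19x76" is also accepted.
loose = re.compile(r"03.19.76")
# Inside a group, '.' is literal: only '-', '.' or '/' may separate the numbers.
strict = re.compile(r"03[-./]19[-./]76")

results = {}
for text in ["03.19.76", "03/19/76", "03x19x76"]:
    results[text] = (bool(loose.search(text)), bool(strict.search(text)))
    print(text, results[text])
```

Only "03x19x76" separates the two patterns: loose accepts it and strict rejects it.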

Word delimiters:

  1. '\<' and '\>' match the start and the end of a word.
  2. The less-than and greater-than signs are not metacharacters by themselves. They become metacharacters only when preceded by a backslash.
  3. Not all versions of egrep support these metacharacters.
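Python's re module does not support \< and \>. The closest equivalent is \b, which matches a word boundary on either side. The sketch below uses \b on both sides of the word to approximate \<cat\>. The sample text is made up.

```python
import re

# \b on both sides plays the role of \< ... \> here: "cat" must be a whole word.
whole_word = re.compile(r"\bcat\b")
TEXT = "cat category bobcat concat cat."

print(whole_word.findall(TEXT))                       # ['cat', 'cat']
print([m.start() for m in whole_word.finditer(TEXT)])  # positions of the two whole-word hits
```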

Metacharacter summary:

Metacharacters that match a single character

  Metachar   Name                      Matches
  .          Dot                       Any single character
  [...]      Character group           Any single character that is listed
  [^...]     Negated character group   Any single character that is not listed
  \char      Escape                    The literal character char, if char is a metacharacter or the escape sequence has no special meaning
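A quick sketch of escaping in Python's re module: a backslash turns the dot back into an ordinary character. One caveat: Python is stricter than the table's last clause, because since Python 3.7 an unknown escape made of a backslash and an ASCII letter is an error rather than a literal.

```python
import re

# Unescaped, '.' is a metacharacter, so "3x14" is accepted.
unescaped = re.fullmatch(r"3.14", "3x14") is not None
# Escaped, '\.' only matches a literal dot.
escaped_rejects = re.fullmatch(r"3\.14", "3x14") is not None
escaped_accepts = re.fullmatch(r"3\.14", "3.14") is not None

print(unescaped, escaped_rejects, escaped_accepts)  # True False True
```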

Metacharacters that provide counting (quantifiers)

  Metachar    Name                  Meaning
  ?           Question mark         Allows one match, but does not require it
  *           Asterisk              Matches any number of times, including zero
  +           Plus                  Requires at least one match, allows any number
  {min,max}   Interval quantifier   At least min and at most max times (not supported by all versions of egrep)
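The four quantifiers can be compared side by side in a sketch with Python's re module. It uses re.fullmatch so that each pattern must cover the whole string. egrep, by contrast, accepts a line if the pattern matches anywhere in it. The helper name whole and the sample words are hypothetical.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire string, not just part of it."""
    return re.fullmatch(pattern, text) is not None

# ?  : the preceding 'u' may appear once or not at all.
optional = [whole(r"colou?r", s) for s in ["color", "colour", "colouur"]]
# *  : any number of 'o', including none.
star = [whole(r"go*gle", s) for s in ["ggle", "gogle", "gooogle"]]
# +  : at least one 'o', then any number more.
plus = [whole(r"go+gle", s) for s in ["ggle", "gogle", "gooogle"]]
# {min,max}: at least 2 and at most 3 'o'.
interval = [whole(r"go{2,3}gle", s) for s in ["gogle", "google", "gooogle", "goooogle"]]

print(optional)  # [True, True, False]
print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(interval)  # [False, True, True, False]
```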

Metacharacters that match positions

  Metachar   Name             Matches
  ^          Caret            The beginning of a line
  $          Dollar sign      The end of a line
  \<         Word delimiter   The start of a word (not supported by all versions of egrep)
  \>         Word delimiter   The end of a word (not supported by all versions of egrep)

Other metacharacters

  Metachar      Name            Meaning
  |             Alternation     Matches any one of the expressions it separates
  (...)         Parentheses     Limit the scope of alternation, mark the element a quantifier applies to, and "capture" text for backreferences
  \1, \2, ...   Backreference   Matches the text previously matched by the first, second, ... pair of parentheses (not supported by all versions of egrep)
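A sketch of alternation, grouping and backreferences with Python's re module, which supports \1 in patterns. The first part shows why the mail filter needs parentheses. Without them, | splits the entire pattern, so "Subject:" may match anywhere in the line. The doubled-word search uses \b in place of \< and \>, and the sample strings are made up.

```python
import re

line = "Re: Subject: hello"
# Without parentheses: "^From" OR "Subject:" anywhere, so this line matches.
no_group = re.search(r"^From|Subject:", line) is not None
# With parentheses: the caret and the colon apply to both alternatives.
with_group = re.search(r"^(From|Subject):", line) is not None

# Backreference: \1 must repeat exactly the text captured by the first group.
doubled = re.findall(r"\b([A-Za-z]+) +\1\b", "It is the the best, not not a test.")

print(no_group, with_group)  # True False
print(doubled)               # ['the', 'not']
```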

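  This article is my notes from reading the book Mastering Regular Expressions. The notes are fairly rough, and the book describes everything in much more detail: the introduction alone fills a whole chapter. Even so, the book is very well suited to readers with no background at all. The regular-expression metacharacters are listed here. Once you have mastered them, you know enough to write most everyday regular expressions.

  The most useful idea: a regular expression works character by character, and a quantifying metacharacter always applies to the character or expression immediately before it.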
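A minimal sketch of that idea in Python's re module: without parentheses, + repeats only the one character before it. With parentheses, it repeats the whole group.

```python
import re

# + applies only to the single character before it, the 'b'.
char_plus = [re.fullmatch(r"ab+", s) is not None for s in ["abbb", "abab"]]
# With parentheses, + applies to the whole expression "ab".
group_plus = [re.fullmatch(r"(ab)+", s) is not None for s in ["abbb", "abab"]]

print(char_plus)   # [True, False]
print(group_plus)  # [False, True]
```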

 

Glossary of regular-expression terms

  regex (正则): regular expression
  matching (匹配)
  metacharacter (元字符)
  flavor (流派)
  subexpression (子表达式)
  character (字符)

 


Source: www.cnblogs.com/sunlightlee/p/10993773.html