Getting started with regular expressions (regex) (work in progress)

Prologue

  • "Think in regular expressions" (from Mastering Regular Expressions, third edition)
  • A regular expression is a formal way of describing the structural pattern of a string.

In the early stages of its development, this method was limited to describing regular texts, hence the name "regular expression".

With further research and development, and in particular through the practice and exploration of the Perl language, the capabilities of regular expressions have far exceeded their traditional mathematical restrictions. They have become a powerful practical tool, supported in almost all mainstream languages.

Not only that: even text editing tools with somewhat more powerful features (IDEA, VS Code) support regular expressions.

Especially since the rise of the Web, most development tasks involve processing strings. Compared with simple string comparison, search, and replacement, regular expressions provide much more powerful processing capabilities; most importantly, they can process strings that "conform to a certain abstract pattern" rather than one fixed, concrete string.

  • Using them skillfully can save a great deal of development time.

Advantages

  1. On the one hand, regular expressions operate on strings or, put more abstractly, on a sequence of objects. This is precisely the fundamental data structure of today's computer systems: most of the work we do around computers comes down to operations on such sequences. Regular expressions therefore have a wide range of uses.

  2. On the other hand, unlike most other technologies, regular expressions have an exceptionally strong ability to describe structure. In a computer, it is differing structure that organizes undifferentiated bytes into vastly different software objects, which are then combined into all-purpose software systems. To describe the structure is therefore to describe the system; in this respect, the status of regular expressions is unique.

What is a regular expression?

  • A regular expression is a group of special text composed of letters and symbols, which can be used to find passages in a text that match the format you want.

  • A regular expression can match a given pattern against a base string, for tasks such as replacing strings in text, validating forms, and extracting strings.

  • A regular expression is a pattern that matches the subject string from left to right.

The term "regular expression" is a bit of a mouthful, so we often use the abbreviations "regex" or "regexp".

Suppose we have a user-naming rule: a user name may contain letters, digits, underscores, and hyphens, and its length is limited so that the name does not look ugly.
We use the following regular expression to validate a user name:
[Image: the user-name regular expression (image-20210224103743288)]

The regular expression above accepts john_doe, jo-hn_doe, and john12_as.
It does not match Jo, because that name contains an uppercase letter and is too short.
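The image with the original expression is not available here, so the following is only a sketch in Python with a hypothetical pattern that fits the description: lowercase letters, digits, underscores, and hyphens, with an assumed length limit of 3 to 15 characters. The pattern and the helper name is_valid_username are assumptions, not the article's own.

```python
import re

# Hypothetical pattern reconstructed from the description; the original image
# is unavailable, and the 3-15 length limit is an assumption.
USERNAME_PATTERN = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True if the name fits the assumed user-name rule."""
    return USERNAME_PATTERN.match(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

With this assumed rule, the first three names are accepted and Jo is rejected, both for its uppercase J and for its length.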

Getting started

  • "/" marks the start and end of a regular expression.


1 Basic matching

  • A regular expression is really just the format to look for when performing a search; it is composed of some combination of letters and digits.
    • For example, the regular expression the stands for a rule: start with the letter t, followed by h, followed by e.

“the” => The fat cat sat on the mat.

  • The regular expression 123 matches the string 123. The input is compared with the regular expression character by character.
  • Regular expressions are case sensitive, so The will not match the.

“The” => The fat cat sat on the mat.
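As a small illustration of literal, case-sensitive matching, here is a sketch using Python's re module (the choice of Python is an assumption; the article itself does not tie these examples to a language).

```python
import re

text = "The fat cat sat on the mat."

# A literal pattern is compared character by character and is case sensitive.
lower_matches = re.findall(r"the", text)   # only the lowercase occurrence
upper_matches = re.findall(r"The", text)   # only the capitalised occurrence

# finditer also reports where each match starts.
lower_positions = [m.start() for m in re.finditer(r"the", text)]

print(lower_matches, upper_matches, lower_positions)
```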

2 Metacharacters

  • Regular expressions rely mainly on metacharacters.
  • Metacharacters do not stand for themselves; each of them has a special meaning.
    • Some metacharacters take on a special meaning when they are written inside square brackets. The following is an introduction to some metacharacters:
Metacharacter  Description
.              Matches any single character except a line break
[ ]            Character class. Matches any character contained in the brackets
[^ ]           Negated character class. Matches any character not contained in the brackets
*              Matches 0 or more repetitions of the element before the *
+              Matches 1 or more repetitions of the element before the +
?              Makes the element before the ? optional
{n,m}          Matches num repetitions of the character or character set before the braces (n <= num <= m)
(xyz)          Group. Matches the characters xyz in exactly that order
|              OR operator. Matches the expression before or after the symbol
\              Escape character. Used to match the reserved characters [ ] ( ) { } . * + ? ^ $ \ |
^              Matches at the beginning of the input
$              Matches at the end of the input

2.1 Anchors

2.1.1 ^ sign

  • ^ is used to check whether the match is at the beginning of the string being searched.
    • For example, applying the expression ^a to the string abc gives the result a. Using ^b, however, matches nothing, because in the string abc the b is not at the beginning.
    • For example, ^(T|t)he matches The or the at the beginning of the string.

“(T|t)he” => The car is parked in the garage.

“^(T|t)he” => The car is parked in the garage.
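The effect of ^ can be tried out with a short Python sketch. The helper all_matches is made up for this example; it returns the full text of each match, because re.findall would return only the content of the (T|t) group.

```python
import re


def all_matches(pattern: str, text: str) -> list:
    """Hypothetical helper: full text of every match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "The car is parked in the garage."

unanchored = all_matches(r"(T|t)he", text)   # matches anywhere in the text
anchored = all_matches(r"^(T|t)he", text)    # matches only at the very start

a_at_start = all_matches(r"^a", "abc")
b_at_start = all_matches(r"^b", "abc")       # b is not at the beginning

print(unanchored, anchored, a_at_start, b_at_start)
```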

2.1.2 $ sign

  • Like the ^ sign, the $ sign is an anchor; it is used to check whether the match is at the end of the string.
  • For example, (at\.)$ matches at. at the end of the string.

“(at\.)” => The fat cat. sat. on the mat.

“(at\.)$” => The fat cat. sat. on the mat.
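A comparable Python sketch for $, again with a made-up helper (all_matches) that returns the whole text of each match:

```python
import re


def all_matches(pattern: str, text: str) -> list:
    """Hypothetical helper: full text of every match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "The fat cat. sat. on the mat."

anywhere = all_matches(r"(at\.)", text)     # every "at." in the text
at_end = all_matches(r"(at\.)$", text)      # only the one that ends the text

# Start index of the anchored match: it is the last three characters.
end_start = re.search(r"(at\.)$", text).start()

print(anywhere, at_end, end_start)
```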

2.2 The . operator

  • . is the simplest example of a metacharacter.

  • . matches any single character (including a space) but does not match a line break.

    • For example, the expression .ar matches any single character followed by a and then r.

“.ar” => The car parked in the garage.
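A minimal Python sketch of the dot, including the line-break exception. The re.DOTALL flag shown at the end is specific to Python's re module and is mentioned only as a contrast; it is not part of the article.

```python
import re

text = "The car parked in the garage."

# "." stands for exactly one arbitrary character before "ar".
dot_matches = re.findall(r".ar", text)

# By default "." does not match a line break ...
across_newline = re.findall(r".ar", "\nar")

# ... unless the engine is told otherwise (Python: re.DOTALL).
with_dotall = re.findall(r".ar", "\nar", re.DOTALL)

print(dot_matches, across_newline, with_dotall)
```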

2.3 Character set (character group)

  • A character set is also called a character class.

  • Square brackets [] are used to specify a character set.

  • Inside [], a hyphen specifies a range of characters.

    • The order of the characters inside [] does not matter.
  • For example, the expression [Tt]he matches the and The.

“[Tt]he” => The car parked in the garage.

  • A period inside square brackets, [.], means a literal period.
  • The expression ar[.] matches the string ar. (a, then r, then a literal period).

“ar[.]” => A garage is a good place to park a car.
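The character-set rules above can be checked with a short Python sketch (Python's re module is an assumption here; the sample string "abcdef" is made up for the range example).

```python
import re

# Either T or t, followed by "he".
the_matches = re.findall(r"[Tt]he", "The car parked in the garage.")

# Inside brackets the period is a literal period.
period_matches = re.findall(r"ar[.]", "A garage is a good place to park a car.")

# A hyphen inside brackets gives a range of characters.
range_matches = re.findall(r"[a-c]", "abcdef")

# The order of characters inside the brackets does not matter.
order_irrelevant = re.findall(r"[ht]", "the") == re.findall(r"[th]", "the")

print(the_matches, period_matches, range_matches, order_irrelevant)
```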

2.3.1 Negated character set

  • Normally, ^ denotes the beginning of a string.
    • However, when it is the first character inside square brackets, [^], it means that the character set is negated.
  • For example, the expression [^c]ar matches any character other than c, followed by ar, such as sar, gar, par, #ar, &ar...
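A small Python sketch of the negated set; the list of sample strings simply reuses the ones named in the bullet above, plus car as a counterexample.

```python
import re

text = "The car parked in the garage."

# Any character except c, followed by "ar": "car" is skipped.
negated_matches = re.findall(r"[^c]ar", text)

# Check the sample strings one by one against the whole pattern.
samples = ["sar", "gar", "par", "#ar", "&ar", "car"]
accepted = [s for s in samples if re.fullmatch(r"[^c]ar", s)]

print(negated_matches, accepted)
```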

" [^c]ar" => The car parked in the garage.

2.4 Number of repetitions

  • The metacharacters +, * or ? placed after an element specify how many times the preceding subpattern may be matched.
    • These metacharacters have different meanings in different situations.

The * and + quantifiers are greedy: they match as much text as possible. Only when they are followed by a ? do they perform a minimal (non-greedy) match.

2.4.1 * sign

  • * matches the element before the * zero or more times.

  • For example, the expression a* matches zero or more consecutive a characters.
  • The expression [a-z]* matches any run of lowercase letters within a line.

“[a-z]*” => The car parked in the garage #21.

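  • Combining * with . gives .*, which matches any sequence of characters (other than line breaks).
  • * can be combined with \s (the symbol that matches whitespace). For example, the expression \s*cat\s* matches the string cat preceded by zero or more whitespace characters and followed by zero or more whitespace characters.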

“\s*cat\s*” => The fat cat sat on the concatenation.
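A hedged Python sketch of the * examples. Because * also allows zero repetitions, [a-z]* produces empty matches between the words; the sketch filters those out to show only the runs of lowercase letters. The string "any text\nnext line" is made up for the .* example.

```python
import re

# Runs of lowercase letters; empty matches (zero repetitions) are dropped.
lowercase_runs = [
    m for m in re.findall(r"[a-z]*", "The car parked in the garage #21.") if m
]

# "cat" with optional surrounding whitespace, also inside "concatenation".
cat_matches = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")

# ".*" takes everything up to, but not including, the line break.
first_line = re.match(r".*", "any text\nnext line").group(0)

print(lowercase_runs, cat_matches, first_line)
```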

2.4.2 + sign

  • + matches the element before the + one or more times.
    • For example, the expression c.+t matches a string that starts with c, ends with t, and has at least one character in between.

“c.+t” => The fat cat sat on the mat.
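In Python this looks as follows (a sketch; the string "ct" is made up to show the "at least one character" requirement). Because + is greedy, the match runs from the first c to the last t in the sentence.

```python
import re

text = "The fat cat sat on the mat."

# Greedy: from the first "c" to the last "t".
plus_matches = re.findall(r"c.+t", text)

# "ct" has nothing between c and t, so ".+" cannot be satisfied.
no_middle = re.search(r"c.+t", "ct")

print(plus_matches, no_middle)
```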

2.4.3 ? sign

  • In a regular expression, the metacharacter ? marks the element before it as optional, that is, it occurs 0 or 1 times.
    • For example, the expression [T]?he matches the strings he and The.

“[T]he” => The car is parked in the garage.

“[T]?he” => The car is parked in the garage.
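A short Python sketch of the two expressions above, for comparison:

```python
import re

text = "The car is parked in the garage."

# The T is required: only "The" matches.
required_t = re.findall(r"[T]he", text)

# The T is optional: "The" still matches, and so does the "he" inside "the".
optional_t = re.findall(r"[T]?he", text)

print(required_t, optional_t)
```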

2.5 {} sign

  • In a regular expression, {} is a quantifier, often used to limit how many times a character or a group of characters may repeat.
    • For example, the expression [0-9]{2,3} matches a number with at least 2 and at most 3 digits from 0 to 9.

“[0-9]{2,3}” => The number was 9.9997 but we rounded it off to 10.0.

  • The second parameter can be omitted.
    • For example, [0-9]{2,} matches a number with at least two digits from 0 to 9.

“[0-9]{2,}” => The number was 9.9997 but we rounded it off to 10.0.

Note the difference between these two expressions: the one above is {2,}, the one below is {2}.

“[0-9]{2}” => The number was 9.9997 but we rounded it off to 10.0.

  • If the comma is omitted as well, the expression matches a fixed number of repetitions.
    • For example, [0-9]{3} matches a number with exactly three digits from 0 to 9.

“[0-9]{3}” => The number was 9.9997 but we rounded it off to 10.0.
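The four variants can be compared side by side with a Python sketch:

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", text)  # 2 or 3 digits, as many as possible
two_or_more = re.findall(r"[0-9]{2,}", text)    # at least 2 digits
exactly_two = re.findall(r"[0-9]{2}", text)     # exactly 2 digits at a time
exactly_three = re.findall(r"[0-9]{3}", text)   # exactly 3 digits at a time

print(two_to_three, two_or_more, exactly_two, exactly_three)
```

The single digits 9 and 0 never match, because every variant requires at least two digits in a row.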

2.6 | or operator

  • The OR operator | expresses "or" and is used as a condition.
    • For example, (T|t)he|car matches (T|t)he or car.

“(T|t)he|car” => The car is parked in the garage.
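A Python sketch of the alternation. The helper all_matches is made up for this example and returns the whole match rather than the content of the (T|t) group.

```python
import re


def all_matches(pattern: str, text: str) -> list:
    """Hypothetical helper: full text of every match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "The car is parked in the garage."

# Either "The"/"the" or "car", in the order they appear in the text.
either = all_matches(r"(T|t)he|car", text)

print(either)
```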

2.7 (…) Capturing group

  • A capturing group is a subpattern written inside (...).
  • Whatever (...) contains is treated as a single unit, just like parentheses in mathematics.
    • For example, the expression (ab)* matches 0 or more consecutive occurrences of ab.
    • Without (...), the expression ab* matches a followed by 0 or more consecutive b.
  • As said before, {} specifies how many times the single preceding character is repeated.
  • However, when {} follows a capturing group (...), the whole content of the group is repeated n times.
  • We can also use the OR character | inside () to express "or".
    • For example, (c|g|p)ar matches car, gar, or par.

“(c|g|p)ar” => The car is parked in the garage.
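The difference between (ab)* and ab*, and what a group captures, can be sketched in Python. The sample strings "ababab", "abbb", and "abab" are made up for the illustration.

```python
import re

text = "The car is parked in the garage."

# Whole matches of (c|g|p)ar ...
whole = [m.group(0) for m in re.finditer(r"(c|g|p)ar", text)]
# ... versus what the group itself captured (re.findall returns the group).
captured = re.findall(r"(c|g|p)ar", text)

# With the group, * repeats "ab"; without it, * repeats only "b".
group_repeats = re.fullmatch(r"(ab)*", "ababab") is not None
no_group_on_abab = re.fullmatch(r"ab*", "ababab") is not None
no_group_on_abbb = re.fullmatch(r"ab*", "abbb") is not None

# {} after a group repeats the whole group.
group_twice = re.fullmatch(r"(ab){2}", "abab") is not None

print(whole, captured, group_repeats, no_group_on_abab, no_group_on_abbb, group_twice)
```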

2.8 \ escape symbol

  • A backslash \ in an expression escapes the character that immediately follows it. It is used to match the special characters { } [ ] / \ + * . $ ^ | ? literally.

  • To match one of these special characters, put a backslash \ in front of it.

    • But note!
    • Inside a character set [], most of these characters are already literal, so the backslash is not needed there.
  • For example, . matches any character except a line break. To match a literal period, write \.

  • In the following example, \.? matches an optional period.

“(f|c|m)at\.?” => The fat cat sat on the mat.

“ega.att.com” => megawatt.compu ting ww ega.att.comzz

“ega\.att\.com” => megawatt.compu ting ww ega.att.comzz
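The effect of escaping can be seen in a Python sketch. The sample text follows the example line above as far as it can be read, and re.escape is a convenience of Python's re module, shown only as an aside.

```python
import re

sentence = "The fat cat sat on the mat."
# An optional literal period after fat, cat, or mat.
optional_period = [m.group(0) for m in re.finditer(r"(f|c|m)at\.?", sentence)]

text = "megawatt.compu ting ww ega.att.comzz"
# Unescaped dots match any character, so "egawatt.com" matches too.
unescaped = re.findall(r"ega.att.com", text)
# Escaped dots match only literal periods.
escaped = re.findall(r"ega\.att\.com", text)
# re.escape builds the escaped pattern from a plain string.
auto_escaped = re.findall(re.escape("ega.att.com"), text)

print(optional_period, unescaped, escaped, auto_escaped)
```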

Advanced


1 Shorthand character sets

Regular expressions provide abbreviations for some commonly used character sets, as follows:

Shorthand  Description
.          Any character except a line break (including spaces)
\w         Matches letters, digits, and the underscore; equivalent to [a-zA-Z0-9_]
\W         Matches everything that is not a letter, digit, or underscore (i.e. symbols); equivalent to [^\w]
\d         Matches a digit; equivalent to [0-9]
\D         Matches a non-digit; equivalent to [^\d]
\s         Matches any whitespace character; equivalent to [\t\n\f\r\p{Z}]
\S         Matches any non-whitespace character; equivalent to [^\s]
\f         Matches a form feed
\n         Matches a line feed (newline)
\r         Matches a carriage return
\t         Matches a tab
\v         Matches a vertical tab
\p         Matches CR/LF (equivalent to \r\n), used to match DOS line endings; support varies, and many engines reserve \p{...} for Unicode properties
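A few of these shorthands in a Python sketch. The sample strings other than the "number" sentence are made up, and note as a caveat that in Python \w, \d, and \s are Unicode-aware by default, so they match slightly more than the ASCII-only equivalents listed in the table.

```python
import re

# \d+ : runs of digits.
digit_runs = re.findall(r"\d+", "The number was 9.9997 but we rounded it off to 10.0.")

# \w+ : runs of letters, digits, and underscores; the hyphen splits them.
word_runs = re.findall(r"\w+", "john_doe-42")

# \W : single characters that are not letters, digits, or underscores.
non_word = re.findall(r"\W", "a-b c!")

# \s+ : split on any run of whitespace (space, tab, newline).
fields = re.split(r"\s+", "a b\tc\nd")

print(digit_runs, word_runs, non_word, fields)
```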

2 Flags

  • Using a regular expression object in JS: new RegExp("pattern"[, "flags"])
    • pattern: the text of the regular expression
    • flags:
      • i (ignore case)
      • g (global: find all matches in the full text)
      • m (multi-line search)
      • gi (global search, ignoring case)
      • ig (global search, ignoring case)

2.1 i Case Insensitive

  • The modifier i is used to ignore case.
  • For example, the expression /The/gi means: search globally for The,
    • where i makes the match ignore case, so the search finds both the and The, and g stands for a global search.

“/The/” => The fat cat sat on the mat.

“/The/gi” => The fat cat sat on the mat.

2.2 g Global search

  • The modifier g is commonly used to perform a global match, that is, "return not just the first match but all of them".
    • For example, the expression /.(at)/g means: search for any character (except a line break) followed by at, and return all results.

“/.(at)/” => The fat cat sat on the mat.

“/.(at)/g” => The fat cat sat on the mat.
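The notation in this part is JavaScript. As an assumption for illustration, the sketch below shows rough Python counterparts: re.IGNORECASE plays the role of i, and Python has no g flag; instead, re.search stops at the first match while re.findall returns all of them. The parentheses around at are dropped so that re.findall returns whole matches.

```python
import re

text = "The fat cat sat on the mat."

# Without "i": case sensitive, only the capitalised "The".
case_sensitive = re.findall(r"The", text)
# Counterpart of /The/gi: all matches, ignoring case.
ignore_case = re.findall(r"The", text, re.IGNORECASE)

# Counterpart of /.at/ (no g): only the first match.
first_only = re.search(r".at", text).group(0)
# Counterpart of /.at/g: every match.
all_found = re.findall(r".at", text)

print(case_sensitive, ignore_case, first_only, all_found)
```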

2.3 m Multiline search (Multiline)

  • The multi-line modifier m is used to perform a multi-line match.
  • As described before, (^, $) check whether the pattern is at the beginning or the end of the string being tested. If we want them to take effect at the beginning and end of each line, we need the multi-line modifier m.
    • For example, the expression /.at(.)?$/gm means: any character except a line break, then the lowercase character a, followed by the lowercase character t, optionally followed by any character except a line break, at the end of a line.
  • Thanks to the m modifier, the expression in the following example matches at the end of each line:

“/.at(.)?$/” => The fat
cat sat
on the mat.

“/.at(.)?$/gm” => The fat
cat sat
on the mat.
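In Python, the counterpart of the m flag is re.MULTILINE (an assumption for illustration, since the notation above is JavaScript). The helper all_matches is made up for this sketch and returns the whole text of each match.

```python
import re


def all_matches(pattern: str, text: str, flags: int = 0) -> list:
    """Hypothetical helper: full text of every match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


text = "The fat\ncat sat\non the mat."

# Without the flag, $ only matches at the end of the whole text.
single_line = all_matches(r".at(.)?$", text)

# With re.MULTILINE, $ also matches at the end of every line.
multi_line = all_matches(r".at(.)?$", text, re.MULTILINE)

print(single_line, multi_line)
```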

3 Greedy vs lazy matching

  • Regular expressions use greedy matching by default: quantifiers such as * and + match the longest possible substring.
  • Appending ? to a quantifier turns greedy matching into lazy matching, which matches as little as possible.

“/(.*at)/” => The fat cat sat on the mat.

“/(.*?at)/” => The fat cat sat on the mat.
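The two expressions can be compared with a Python sketch:

```python
import re

text = "The fat cat sat on the mat."

# Greedy: .* takes as much as possible, so the match ends at the last "at".
greedy = re.search(r"(.*at)", text).group(1)

# Lazy: .*? takes as little as possible, so the match ends at the first "at".
lazy = re.search(r"(.*?at)", text).group(1)

# Collecting every lazy match splits the sentence at each "at".
lazy_all = re.findall(r"(.*?at)", text)

print(repr(greedy), repr(lazy), lazy_all)
```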

End (recommended regular expression websites)

Regular expressions can become a headache in more complicated situations, so practice a lot. Here are 3 websites that I usually use for regular expressions:

Websites for learning regular expressions:

  1. https://regexone.com/

Platforms for testing regular expressions:

  1. https://regex101.com/r/dmRygT/1
  2. https://regexr.com/


Java's regular expression flavor

//TODO to learn...


Origin blog.csdn.net/weixin_43438052/article/details/114014822