Prologue
“Think regular expressions.”
(Mastering Regular Expressions, 3rd edition)
- A regular expression is a formal notation for describing the structural patterns of strings.
In its early days this notation was limited to describing "regular" text, hence the name "regular expression".
As regular expressions were studied and developed further, and in particular through the practice and exploration of the Perl language, their capabilities went far beyond the traditional mathematical limits. They became an enormously powerful practical tool, supported in almost every mainstream language.
Not only that: even moderately capable text editors and IDEs (IDEA, VS Code) support regular expressions.
Especially since the rise of the Web, a large share of development work is string processing. Compared with simple string comparison, search, and replacement, regular expressions offer far more powerful processing: most importantly, they can process strings that "conform to a certain abstract pattern" rather than one fixed, concrete string.
Using them fluently can save a great deal of development time.
Advantages
- On the one hand, regular expressions operate on strings, or more abstractly, on sequences of objects. Sequences are precisely the essential data structure of today's computer systems: most of the work we do with computers comes down to operations on sequences. Therefore, regular expressions have a very wide range of uses.
- On the other hand, unlike most other technologies, regular expressions have an exceptionally strong ability to describe structure. In a computer, it is structure that organizes undifferentiated bytes into vastly different software objects and then combines those objects into a versatile software system. Describing the structure therefore amounts to describing the system, and in this respect regular expressions hold a unique position.
What is a regular expression?
A regular expression is a special string made up of letters and symbols that can be used to find text in the format you want.
A regular expression matches a given pattern against a subject string; typical uses include replacing strings in text, validating forms, and extracting substrings.
- A regular expression is a pattern that is matched against the subject string from left to right.
The term "regular expression" is a mouthful, so we often use the abbreviations "regex" or "regexp".
Suppose we set a naming rule for users: a user name may contain lowercase letters, digits, underscores and hyphens, and its length is limited so that names don't look ugly.
We can use a regular expression to validate such a user name.
Such a regular expression accepts john_doe, jo-hn_doe and john12_as.
But it does not match Jo, because Jo contains an uppercase letter and is too short.
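The rule can be sketched in JavaScript. The pattern used here, ^[a-z0-9_-]{3,15}$, is an assumption (a common way to write this rule; the limits 3 and 15 are illustrative), not necessarily the author's original pattern:

```javascript
// ASSUMPTION: hypothetical pattern, not necessarily the author's original.
// Allowed: lowercase letters, digits, "_" and "-"; length 3 to 15.
const usernamePattern = /^[a-z0-9_-]{3,15}$/;

// ^ and $ force the whole string to match, not just a part of it.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

for (const name of ["john_doe", "jo-hn_doe", "john12_as", "Jo"]) {
  console.log(name, isValidUsername(name)); // the first three print true, "Jo" prints false
}
```

The pattern has no g flag on purpose: test() on a global regex remembers lastIndex between calls, which would make repeated validation unreliable.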
Getting Started
- "/" marks the start and end of a regular expression (for example, /the/ in JavaScript).
1 Basic matching
- A regular expression is simply the pattern used when searching, made up of a combination of characters and digits.
- For example, the regular expression the describes the rule: the letter t, followed by h, followed by e.
“the” => The fat cat sat on the mat.
- The regular expression 123 matches the string 123: the input is compared with the regular expression character by character.
- Regular expressions are case-sensitive, so The does not match the.
“The” => The fat cat sat on the mat.
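A quick way to see literal, case-sensitive matching is to run the example sentence through JavaScript's String.prototype.match (a sketch, runnable in Node.js or a browser console):

```javascript
const sentence = "The fat cat sat on the mat.";

// Literal patterns are compared with the input character by character.
const lowerMatch = sentence.match(/the/); // finds "the" in "on the mat"
const upperMatch = sentence.match(/The/); // finds "The" at the start

console.log(lowerMatch[0], lowerMatch.index); // the 19
console.log(upperMatch[0], upperMatch.index); // The 0
console.log(/123/.test("abc123")); // true
```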
2 Metacharacters
- Regular expressions rely mainly on metacharacters.
- Metacharacters do not stand for themselves; each has a special meaning.
- Some metacharacters have a special meaning when written inside square brackets. The main metacharacters are:
Metacharacter | Description |
---|---|
. | The period matches any single character except a line break |
[ ] | Character set. Matches any character inside the brackets |
[^ ] | Negated character set. Matches any character not inside the brackets |
* | Matches 0 or more repetitions of the preceding character |
+ | Matches 1 or more repetitions of the preceding character |
? | Makes the preceding character optional (0 or 1 times) |
{n,m} | Matches num repetitions of the preceding character or character set, where n <= num <= m |
(xyz) | Group. Matches the characters xyz in exactly that order, as one unit |
\| | OR operator. Matches the expression before or after the symbol |
\ | Escape character, used to match the reserved characters [ ] ( ) { } . * + ? ^ $ \ \| |
^ | Matches at the beginning of the input |
$ | Matches at the end of the input |
2.1 Anchors
2.1.1 ^ sign
- ^ checks whether the match is at the beginning of the input string.
- For example, applying ^a to the string abc matches a. But ^b matches nothing, because abc does not begin with b.
- For example, ^(T|t)he matches The or the only at the beginning of the string.
“(T|t)he” => The car is parked in the garage.
“^(T|t)he” => The car is parked in the garage.
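The effect of ^ can be checked with a small JavaScript sketch (with the g flag, match returns every full match as an array):

```javascript
const text = "The car is parked in the garage.";

// Without an anchor, (T|t)he can match anywhere in the string.
const anywhere = text.match(/(T|t)he/g);
// With ^, only a match at the very beginning of the input counts.
const atStart = text.match(/^(T|t)he/g);

console.log(anywhere); // [ 'The', 'the' ]
console.log(atStart); // [ 'The' ]
console.log("abc".match(/^a/)[0]); // a
console.log("abc".match(/^b/)); // null: "abc" does not start with b
```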
2.1.2 $ sign
- Similar to ^, $ checks whether the matched characters are the last ones in the string.
- For example, (at\.)$ matches "at." only at the end of the string.
“(at\.)” => The fat cat. sat. on the mat.
“(at\.)$” => The fat cat. sat. on the mat.
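A matching JavaScript sketch for $; String.prototype.search is used to get the position of the anchored match:

```javascript
const text = "The fat cat. sat. on the mat.";

// \. is an escaped, literal period.
const everywhere = text.match(/(at\.)/g);
// With $, only an occurrence at the very end of the input matches.
const atEnd = text.match(/(at\.)$/g);
const atEndIndex = text.search(/(at\.)$/);

console.log(everywhere.length); // 3
console.log(atEnd, atEndIndex); // [ 'at.' ] 26
```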
2.2 The . operator
- . is the simplest metacharacter.
- . matches any single character (including a space) but does not match a line break.
- For example, the expression .ar matches any single character followed by a and then r.
“.ar” => The car parked in the garage.
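A JavaScript sketch of the . operator, including the line-break case:

```javascript
const text = "The car parked in the garage.";

// . stands for exactly one arbitrary character in front of "ar".
const matches = text.match(/.ar/g);
console.log(matches); // [ 'car', 'par', 'gar' ]

// By default . does not match a line break, so nothing is found here.
const acrossNewline = "a\nar".match(/.ar/);
console.log(acrossNewline); // null
```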
2.3 Character sets (character classes)
- A character set is also called a character class.
- Square brackets [] specify a character set.
- Inside [], a hyphen specifies a range of characters, for example [a-z].
- The order of the characters inside [] does not matter.
- For example, the expression [Tt]he matches the and The.
“[Tt]he” => The car parked in the garage.
- A period inside square brackets, [.], means a literal period.
- The expression ar[.] matches the string "ar.".
“ar[.]” => A garage is a good place to park a car.
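The character-set examples as a JavaScript sketch (the "abc123xyz" string is an illustrative input, not from the article):

```javascript
// [Tt] accepts either T or t at that position.
const both = "The car parked in the garage.".match(/[Tt]he/g);
// Inside [], the period is literal, so ar[.] needs a real "." after "ar".
const withPeriod = "A garage is a good place to park a car.".match(/ar[.]/g);
// A hyphen inside [] defines a range; the order inside [] does not matter.
const range = "abc123xyz".match(/[0-9a-c]/g);

console.log(both); // [ 'The', 'the' ]
console.log(withPeriod); // [ 'ar.' ]
console.log(range); // [ 'a', 'b', 'c', '1', '2', '3' ]
```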
2.3.1 Negated character sets
- Normally ^ denotes the beginning of a string.
- However, when it appears right after the opening square bracket, as in [^], it means the character set is negated.
- For example, the expression [^c]ar matches any character other than c followed by ar, such as sar, gar, par, #ar, &ar...
“[^c]ar” => The car parked in the garage.
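A JavaScript sketch of the negated set, using the article's sentence and its list of sample strings:

```javascript
// [^c] accepts any single character except c.
const inSentence = "The car parked in the garage.".match(/[^c]ar/g);
const samples = "sar gar par #ar car".match(/[^c]ar/g);

console.log(inSentence); // [ 'par', 'gar' ]
console.log(samples); // [ 'sar', 'gar', 'par', '#ar' ]
```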
2.4 Number of repetitions
- The metacharacters +, * and ?, placed after a subpattern, specify how many times that subpattern may match. Each of them specifies a different count, as described below.
- The * and + quantifiers are greedy: they match as much text as possible. Only when followed by ? do they perform a minimal (non-greedy) match.
2.4.1 * sign
- * matches the preceding character 0 or more times.
- For example, the expression a* matches 0 or more consecutive a characters.
- The expression [a-z]* matches any run of consecutive lowercase letters in a line.
“[a-z]*” => The car parked in the garage #21.
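- Combining * with . gives .*, which matches any sequence of characters (except line breaks).
- * can also be combined with \s, which matches whitespace. For example, the expression \s*cat\s* matches cat preceded by 0 or more whitespace characters and followed by 0 or more whitespace characters.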
“\s*cat\s*” => The fat cat sat on the concatenation.
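A JavaScript sketch of *; note that a pattern that can match zero characters also produces empty matches, which the sketch filters out:

```javascript
// a* is greedy and may also match zero characters.
const manyA = "aaab".match(/a*/)[0]; // "aaa"
const noA = "baaa".match(/a*/)[0]; // "" (an empty match at index 0)

// [a-z]* also yields empty matches between words, so drop those.
const runs = "The car parked in the garage #21."
  .match(/[a-z]*/g)
  .filter((s) => s.length > 0);

// \s* absorbs surrounding whitespace when there is any.
const cats = "The fat cat sat on the concatenation.".match(/\s*cat\s*/g);

console.log(manyA, JSON.stringify(noA)); // aaa ""
console.log(runs); // [ 'he', 'car', 'parked', 'in', 'the', 'garage' ]
console.log(cats); // [ ' cat ', 'cat' ]
```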
2.4.2 + sign
- + matches the preceding character 1 or more times.
- For example, the expression c.+t matches a string that starts with c, ends with t, and has at least one character in between.
“c.+t” => The fat cat sat on the mat.
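A JavaScript sketch of +; because .+ is greedy, the match runs to the last t in the sentence:

```javascript
const text = "The fat cat sat on the mat.";

// .+ needs at least one character; being greedy it stretches to the last "t".
const span = text.match(/c.+t/)[0];
// Nothing sits between c and t here, so there is no match.
const tooShort = "ct".match(/c.+t/);

console.log(span); // cat sat on the mat
console.log(tooShort); // null
```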
2.4.3 ? sign
- In regular expressions, the metacharacter ? makes the preceding character optional: it may occur 0 or 1 times.
- For example, the expression [T]?he matches the strings The and he.
“[T]he” => The car is parked in the garage.
“[T]?he” => The car is parked in the garage.
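The two ? examples side by side in JavaScript:

```javascript
const text = "The car is parked in the garage.";

// Without ?, the T is required.
const required = text.match(/[T]he/g);
// [T]? makes the T optional, so the "he" inside "the" also matches.
const optional = text.match(/[T]?he/g);

console.log(required); // [ 'The' ]
console.log(optional); // [ 'The', 'he' ]
```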
2.5 {} sign
- In regular expressions, {} is a quantifier, commonly used to limit how many times a character or a group of characters may repeat.
- For example, the expression [0-9]{2,3} matches at least 2 and at most 3 digits (0 to 9).
“[0-9]{2,3}” => The number was 9.9997 but we rounded it off to 10.0.
- The second number can be omitted. For example, [0-9]{2,} matches 2 or more digits.
“[0-9]{2,}” => The number was 9.9997 but we rounded it off to 10.0.
Compare the two expressions: the one above uses {2,}, the one below uses {2}.
“[0-9]{2}” => The number was 9.9997 but we rounded it off to 10.0.
- If the comma is also omitted, the number gives an exact count of repetitions.
- For example, [0-9]{3} matches exactly 3 digits.
“[0-9]{3}” => The number was 9.9997 but we rounded it off to 10.0.
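All four {} variants applied to the same sentence in JavaScript, which makes the differences easy to compare:

```javascript
const text = "The number was 9.9997 but we rounded it off to 10.0.";

const twoToThree = text.match(/[0-9]{2,3}/g); // greedy, so "999" rather than "99"
const twoOrMore = text.match(/[0-9]{2,}/g);
const exactlyTwo = text.match(/[0-9]{2}/g);
const exactlyThree = text.match(/[0-9]{3}/g);

console.log(twoToThree); // [ '999', '10' ]
console.log(twoOrMore); // [ '9997', '10' ]
console.log(exactlyTwo); // [ '99', '97', '10' ]
console.log(exactlyThree); // [ '999' ]
```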
2.6 | OR operator
- The OR operator | means "or" and is used to express alternatives.
- For example, (T|t)he|car matches (T|t)he or car.
“(T|t)he|car” => The car is parked in the garage.
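The alternation example in JavaScript; matches are returned in the order they occur in the text:

```javascript
const text = "The car is parked in the garage.";

// | has the lowest precedence: this means "(T|t)he" OR "car".
const matches = text.match(/(T|t)he|car/g);
console.log(matches); // [ 'The', 'car', 'the' ]
```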
2.7 (...) Capturing groups
- A capturing group is a subpattern written inside (...).
- The content inside (...) is treated as a single unit, just like parentheses () in mathematics.
- For example, the expression (ab)* matches 0 or more consecutive repetitions of ab.
- Without the (...), the expression ab* matches an a followed by 0 or more consecutive b characters.
- As mentioned before, {} specifies how many times the preceding character may appear. However, when {} follows a group (...), the whole group is repeated that many times.
- We can also use the OR operator | inside () to express alternatives. For example, (c|g|p)ar matches car, gar, or par.
“(c|g|p)ar” => The car is parked in the garage.
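A JavaScript sketch of groups: repetition of a whole group, {} applied to a group, and the text a group captures (the short "ab" strings are illustrative inputs):

```javascript
// The group (ab) repeats as a unit; without it, * applies only to b.
const grouped = "ababx".match(/(ab)*/)[0]; // "abab"
const ungrouped = "abbbx".match(/ab*/)[0]; // "abbb"
// {2} after a group repeats the whole group twice.
const twice = "abababab".match(/(ab){2}/)[0]; // "abab"

// A group also captures the text it matched.
const first = /(c|g|p)ar/.exec("The car is parked in the garage.");
console.log(first[0], first[1]); // car c

const all = "The car is parked in the garage.".match(/(c|g|p)ar/g);
console.log(all); // [ 'car', 'par', 'gar' ]
```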
2.8 \ escape character
- The backslash \ escapes the character immediately after it. It is used to specify the special characters { } [ ] / \ + * . $ ^ | ? literally.
- To match one of these special characters, you must put a backslash \ in front of it.
- Note: inside a character set [], most metacharacters lose their special meaning (for example, [.] is a literal period), so they need no escaping. Whether \ itself still escapes inside [] depends on the flavor: it does in JavaScript, but not in POSIX bracket expressions.
- For example, . matches any character except a line break. To match a literal period, write \. instead.
- In the following example, \.? optionally matches a period.
“(f|c|m)at\.?” => The fat cat sat on the mat.
“ega.att.com” => megawatt.computing ww ega.att.comzz
“ega\.att\.com” => megawatt.computing ww ega.att.comzz
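A JavaScript sketch of escaping. The subject string "megawatt.computing ega.att.com" is a hypothetical input chosen for this sketch:

```javascript
// \.? matches an optional literal period after "at".
const withOptionalDot = "The fat cat sat on the mat.".match(/(f|c|m)at\.?/g);
console.log(withOptionalDot); // [ 'fat', 'cat', 'mat.' ]

// Hypothetical subject string for this sketch.
const subject = "megawatt.computing ega.att.com";
// Unescaped, each . matches any character, so "egawatt.com" also matches.
const loose = subject.match(/ega.att.com/g);
// Escaped, \. only matches a real period.
const strict = subject.match(/ega\.att\.com/g);
console.log(loose); // [ 'egawatt.com', 'ega.att.com' ]
console.log(strict); // [ 'ega.att.com' ]

// In JavaScript, \ still escapes inside []: [\]] matches a literal "]".
console.log("a]b".match(/[\]]/)[0]); // ]
```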
Advanced
1 Shorthand character sets
Regular expressions provide shorthands for some commonly used character sets:
Shorthand | Description |
---|---|
. | Any character except a line break (spaces included) |
\w | Matches a word character (letter, digit or underscore); equivalent to [a-zA-Z0-9_] |
\W | Matches any non-word character, such as a symbol; equivalent to [^\w] |
\d | Matches a digit; equivalent to [0-9] |
\D | Matches a non-digit; equivalent to [^\d] |
\s | Matches a whitespace character; roughly equivalent to [\t\n\v\f\r\p{Z}] (details vary by flavor) |
\S | Matches a non-whitespace character; equivalent to [^\s] |
\f | Matches a form feed |
\n | Matches a line feed (newline) |
\r | Matches a carriage return |
\t | Matches a tab |
\v | Matches a vertical tab |
\p{...} | Matches a character with the given Unicode property, such as \p{Z} (separators); in JavaScript this needs the u flag |
\r\n | Matches CR/LF, the DOS/Windows line ending |
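A JavaScript sketch exercising several shorthands; the sample strings are illustrative inputs, not from the article:

```javascript
const text = "Order #42 ships_in 3 days";

const digits = text.match(/\d+/g); // runs of digits
const words = text.match(/\w+/g); // letters, digits and underscore
const spaces = text.match(/\s/g).length;

console.log(digits); // [ '42', '3' ]
console.log(words); // [ 'Order', '42', 'ships_in', '3', 'days' ]
console.log(spaces); // 4

// A DOS/Windows line ending is the two characters \r\n.
console.log("line1\r\nline2".split(/\r\n/)); // [ 'line1', 'line2' ]
// \p{Z} (Unicode separators) needs the u flag in JavaScript.
console.log(/\p{Z}/u.test("\u00a0")); // true (no-break space)
```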
2 Flags
- Creating a regular expression object in JS: new RegExp("pattern"[, "flags"])
- pattern: the text of the regular expression.
- flags: i (ignore case), g (find all matches in the whole text), m (multi-line search). Flags can be combined: gi and ig both mean a global search that ignores case.
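A JavaScript sketch comparing a regex literal with the RegExp constructor; note the doubled backslash needed inside a string:

```javascript
// A literal and the constructor can build the same regular expression.
const literal = /the/gi;
const built = new RegExp("the", "ig");

console.log(built.source); // the
console.log(built.flags); // gi (flags are always reported in a fixed order)
console.log(literal.flags === built.flags); // true

// In a string, a backslash must be doubled: "\\d+" is the pattern \d+.
const digits = new RegExp("\\d+", "g");
console.log("a1b22".match(digits)); // [ '1', '22' ]
```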
2.1 i Case Insensitive
- The i modifier makes matching case-insensitive.
- For example, the expression /The/gi searches globally for The. Because of the i modifier, case is ignored, so it finds both the and The; g means global search.
“/The/” => The fat cat sat on the mat.
“/The/gi” => The fat cat sat on the mat.
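The same two searches in JavaScript:

```javascript
const text = "The fat cat sat on the mat.";

const caseSensitive = text.match(/The/g);
const caseInsensitive = text.match(/The/gi);

console.log(caseSensitive); // [ 'The' ]
console.log(caseInsensitive); // [ 'The', 'the' ]
```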
2.2 g Global search
- The g modifier performs a global search, that is, it returns all matches rather than only the first one.
- For example, the expression /.(at)/g searches for any character (except a line break) followed by at, and returns all results.
“/.(at)/” => The fat cat sat on the mat.
“/.(at)/g” => The fat cat sat on the mat.
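In JavaScript the g flag also changes what String.prototype.match returns, which this sketch shows (matchAll requires the g flag and keeps the groups):

```javascript
const text = "The fat cat sat on the mat.";

// Without g, match() returns the first match together with its groups.
const first = text.match(/.(at)/);
console.log(first[0], first[1], first.index); // fat at 4

// With g, match() returns every full match but drops the groups.
const all = text.match(/.(at)/g);
console.log(all); // [ 'fat', 'cat', 'sat', 'mat' ]

// matchAll (which requires g) keeps the groups of every match.
const groups = [...text.matchAll(/.(at)/g)].map((m) => m[1]);
console.log(groups); // [ 'at', 'at', 'at', 'at' ]
```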
2.3 m Multiline search
- The multiline modifier m performs multi-line matching.
- As described before, the anchors ^ and $ check whether a pattern is at the beginning or end of the input string. To make them work at the beginning and end of each line, we need the multiline modifier m.
- For example, the expression /at(.)?$/gm means a lowercase a followed by a lowercase t, optionally followed by any character except a line break, at the end of a line.
- With the m modifier, the expression in the following example matches at the end of each line:
“/.at(.)?$/” => The fat
cat sat
on the mat.
“/.at(.)?$/gm” => The fat
cat sat
on the mat.
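The three-line example as a JavaScript sketch, with "\n" standing in for the line breaks:

```javascript
const text = "The fat\ncat sat\non the mat.";

// Without m, $ only matches at the very end of the input.
const endOfInput = text.match(/.at(.)?$/g);
// With m, ^ and $ also match at the start and end of every line.
const endOfEachLine = text.match(/.at(.)?$/gm);

console.log(endOfInput); // [ 'mat.' ]
console.log(endOfEachLine); // [ 'fat', 'sat', 'mat.' ]
```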
3 Greedy vs lazy matching
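- By default, regular expressions use greedy matching, which means they match the longest possible substring.
- Adding ? after a quantifier turns greedy matching into lazy matching, which matches the shortest possible substring.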
“/(.*at)/” => The fat cat sat on the mat.
“/(.*?at)/” => The fat cat sat on the mat.
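Greedy and lazy matching compared in JavaScript:

```javascript
const text = "The fat cat sat on the mat.";

// Greedy: .* grabs as much as possible, then backs off to the last "at".
const greedy = text.match(/(.*at)/)[0];
// Lazy: .*? grabs as little as possible, stopping at the first "at".
const lazy = text.match(/(.*?at)/)[0];

console.log(greedy); // The fat cat sat on the mat
console.log(lazy); // The fat
```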
End (recommended regular expression websites)
Regular expressions can become a headache in more complicated situations, so practice often. Here are 3 websites I usually use for regular expressions:
Websites for learning regular expressions:
Platforms for testing regular expressions:
Java regular expressions
//TODO: to learn...