1. What is a regular expression
A regular expression is a pattern, written as text, that is used to find the parts of a string that match a specific format.
Regular expressions match a string from left to right. "Regular expression" is usually abbreviated to "regex" or "regexp". Regular expressions can be used to replace text in strings, validate form input, extract substrings based on pattern matching, and so on.
2. Why learn regular expressions
Regular expressions come up constantly in day-to-day development, yet many people cannot write them. What do you do when you need one? The universal solution is to search Baidu, and that is a perfectly good approach. But what if Baidu does not have the answer you want? Then you have to learn as you write. This article will help you write regular expressions while learning, and a list of commonly used regular expressions is attached at the end of the article.
3. Metacharacters
Metacharacters are the basic building blocks of regular expressions. A metacharacter is not read as its literal character but is given a special meaning. Some metacharacters take on a different meaning when written inside square brackets. The metacharacters are as follows:
Metacharacter | Description |
---|---|
. | Matches any single character except a newline. |
[] | Character class. Matches any character contained in the square brackets. |
[^] | Negated character class. Matches any character not contained in the square brackets. |
* | Matches the preceding subexpression zero or more times. |
+ | Matches the preceding subexpression one or more times. |
? | Matches the preceding subexpression zero or one time. |
{n,m} | Braces. Matches the preceding subexpression at least n times, but not more than m times. |
(xyz) | Group. Matches the characters xyz in that exact order. |
| | Branch structure (alternation). Matches either the expression before the symbol or the expression after it. |
\ | Escape character. Restores the literal meaning of a metacharacter, allowing you to match the reserved characters [ ] ( ) { } . * + ? ^ $ \ and the vertical bar. |
^ | Matches the start of a line. |
$ | Matches the end of a line. |
3.0. Examples and test methods
For example, the regular expression [Hh]ello matches the strings Hello and hello. (Inside square brackets, | is a literal character, so no | is needed between the alternatives.) So how do we test a regular expression after writing it?
We can use an online regular expression tester, for example the "rookie" (菜鸟) online tool.
We can also test in the programming language we are using.
JavaScript
// The string to match against
var str = 'Hello';
// The first argument of replace is a regular expression literal
var s = str.replace(/[Hh]ello/, 'matched');
console.log(s); // prints "matched"
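Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        // The string to validate
        String str = "hello";
        // The validation rule
        String regEx = "[Hh]ello";
        // Compile the regular expression
        Pattern pattern = Pattern.compile(regEx);
        // Case-insensitive variant
        // Pattern pat = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(str);
        // Whether the entire string matches the regular expression
        boolean rs = matcher.matches();
        System.out.println(rs);
    }
}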
Golang
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "hello"
	// MatchString reports whether str contains any match of the pattern
	matched, _ := regexp.MatchString("[Hh]ello", str)
	fmt.Println(matched)
}
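Python
import re

# Compile the pattern once so it can be reused
pattern = re.compile(r'[Hh]ello')
text = 'hello'
# search returns a match object, or None when nothing matches
print(pattern.search(text))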
3.1. Period
The period . is the simplest example of a metacharacter. The metacharacter . matches any single character, except that it does not match a newline character.
For example, the regular expression .ar matches par, 6ar, zar, and so on.
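The behaviour of the period can be checked with a short Python sketch. The sample text is made up for illustration, and `re.DOTALL` is Python's own flag for letting `.` match a newline as well; it is not something the pattern syntax above requires.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "par 6ar zar\nar"

# "." matches any single character except a newline,
# so the "ar" that follows the newline is not matched
default_hits = re.findall(r".ar", text)
print(default_hits)  # ['par', '6ar', 'zar']

# With re.DOTALL, "." also matches the newline character
dotall_hits = re.findall(r".ar", text, flags=re.DOTALL)
print(dotall_hits)  # ['par', '6ar', 'zar', '\nar']
```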
3.2. Character set
A character set is also called a character class. Square brackets are used to define a character set, and a hyphen inside the brackets specifies a character range. The order of the characters and ranges inside the square brackets does not matter. For example, the regular expression [tTs]he matches the, The, and she.
Note: a period inside square brackets has its literal meaning. In other words, [.]ar matches only .ar, not par, 6ar, zar, and so on.
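As a rough illustration of the three points above (a list of characters, a hyphen range, and a literal period), here is a minimal Python sketch. The input strings are invented for the example.

```python
import re

# [tTs] matches exactly one of the listed characters
set_hits = re.findall(r"[tTs]he", "the The she ahe")
print(set_hits)  # ['the', 'The', 'she']

# A hyphen defines a range: [a-c] is the same as [abc]
range_hits = re.findall(r"[a-c]x", "ax bx cx dx")
print(range_hits)  # ['ax', 'bx', 'cx']

# Inside square brackets the period is a literal period
literal_hits = re.findall(r"[.]ar", ".ar par 6ar")
print(literal_hits)  # ['.ar']
```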
3.3. Negated character set
The caret ^ normally indicates the start of a string, but when it is the first character inside square brackets it negates the character set. For example, the regular expression [^tTs]he matches any single character other than t, T, or s, followed by he.
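A small Python sketch of the negated set, using an invented input string: only the occurrences of "he" preceded by a character outside t, T, s are returned.

```python
import re

# "the", "The" and "she" are rejected because their first character is excluded
negated_hits = re.findall(r"[^tTs]he", "the The she ahe 1he")
print(negated_hits)  # ['ahe', '1he']
```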
3.4. Repeat
The metacharacters +, *, and ? specify how many times a subpattern may occur. Each of them behaves differently.
Asterisk
The symbol * matches the preceding item zero or more times. If it follows a character set or character class, it repeats the whole character set.
For example, the regular expression [a-z]* matches any number of lowercase letters in a row.
The * symbol can also be combined with the metacharacter . as in .* to match any string of characters.
Plus
The symbol + matches the preceding character one or more times. For example, the regular expression a+t matches at, aat, aaat, and so on.
Question mark
The metacharacter ? makes the preceding character optional: it matches the preceding character zero or one time. For example, the regular expression [Tts]?he matches he, the, The, and she.
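The three quantifiers can be compared side by side in Python. This is only a sketch with made-up inputs; `re.fullmatch` is used for the asterisk so that the whole string has to match, which makes the "zero times" case visible.

```python
import re

# * : zero or more lowercase letters, so even the empty string matches
star_empty = re.fullmatch(r"[a-z]*", "") is not None
star_word = re.fullmatch(r"[a-z]*", "regex") is not None
print(star_empty, star_word)  # True True

# + : one or more "a" followed by "t"; a lone "t" does not match
plus_hits = re.findall(r"a+t", "t at aat aaat")
print(plus_hits)  # ['at', 'aat', 'aaat']

# ? : the leading letter is optional
optional_hits = re.findall(r"[Tts]?he", "he the The she")
print(optional_hits)  # ['he', 'the', 'The', 'she']
```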
3.5. Braces
In regular expressions, curly braces specify how many times a character or a group of characters may be repeated. For example, the regular expression [0-9]{2,5} matches the digits 0-9 repeated at least 2 times and at most 5 times.
The upper bound can be omitted: [0-9]{2,} matches the digits 0-9 repeated at least 2 times.
With a single number, [0-9]{5} matches the digits 0-9 repeated exactly 5 times.
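To see the three brace forms side by side, the following Python sketch checks a few sample strings against each one. The helper `matching` and the sample list are hypothetical names introduced for this example; `re.fullmatch` is used so that the repetition count applies to the whole string.

```python
import re

# Hypothetical sample strings of 1, 2, 5 and 6 digits
samples = ["1", "22", "55555", "666666"]

def matching(pattern):
    """Return the samples that match the pattern from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

between_2_and_5 = matching(r"[0-9]{2,5}")
at_least_2 = matching(r"[0-9]{2,}")
exactly_5 = matching(r"[0-9]{5}")

print(between_2_and_5)  # ['22', '55555']
print(at_least_2)       # ['22', '55555', '666666']
print(exactly_5)        # ['55555']
```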
3.6. Branch structure
The metacharacter | defines a branch structure (alternation), which works like a condition (switch) between multiple expressions. For example, the regular expression (T|t|s)he matches The, the, and she.
After testing, you may think there is no difference between a branch structure and a character set, but that is not the case. The biggest difference is that a character set works only at the character level, whereas a branch structure works at the expression level. For example, the regular expression [Tts]he|car matches The, the, she, or car.
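The difference between the two levels can be sketched in Python. The input string is invented, and the `(?:...)` non-capturing group is used here only so that `re.findall` returns the whole match rather than the group contents; it is a Python convenience, not part of the explanation above.

```python
import re

text = "The the she car"

# Single-character alternatives: a group and a character set behave the same
group_hits = re.findall(r"(?:T|t|s)he", text)
set_hits = re.findall(r"[Tts]he", text)
print(group_hits)  # ['The', 'the', 'she']
print(set_hits)    # ['The', 'the', 'she']

# | can also choose between whole expressions, which a character set cannot
either_hits = re.findall(r"[Tts]he|car", text)
print(either_hits)  # ['The', 'the', 'she', 'car']
```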
3.7. Escaping special characters
In regular expressions, the backslash \ escapes the next character. This lets you match reserved characters such as { } [ ] / \ + * . $ ^ | ? literally: put a \ in front of the special character to use it as an ordinary matching character. For example, the regular expression (c|m)at\. matches "mat." and "cat.", each including the trailing period.
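A brief Python sketch of the effect of the backslash, with an invented input string. `re.finditer` is used so the full match is returned even though the pattern contains a capturing group, and `re.escape` is Python's standard helper for escaping a whole literal string.

```python
import re

text = "cat. mat. cats"

# Escaped period: only a literal "." may follow "cat" or "mat"
escaped_hits = [m.group(0) for m in re.finditer(r"(c|m)at\.", text)]
print(escaped_hits)  # ['cat.', 'mat.']

# Unescaped period: any character may follow, so "cats" matches too
unescaped_hits = [m.group(0) for m in re.finditer(r"(c|m)at.", text)]
print(unescaped_hits)  # ['cat.', 'mat.', 'cats']

# re.escape builds a pattern that matches the given text literally
literal = re.escape("a.b")
matches_itself = re.fullmatch(literal, "a.b") is not None
matches_other = re.fullmatch(literal, "axb") is not None
print(matches_itself, matches_other)  # True False
```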
3.8. Anchors
In regular expressions, anchors check whether a match sits at the start or the end of the input: ^ checks that the match begins at the first character, and $ checks that the match ends at the last character of the string.
Start anchor
The caret ^ checks whether the matching characters are at the start of the string. For example, the regular expression ^[Tt]he matches any string beginning with the or The.
End anchor
The dollar sign $ checks whether a string ends with a given sequence of characters. For example, the regular expression end$ matches any string ending with end.
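The two anchors can be tried out with a few invented strings. This is a minimal sketch that filters each list down to the strings in which the anchored pattern is found.

```python
import re

# ^ : the match must begin at the start of the string
start_candidates = ["The end", "the end", "At the end"]
starts_with_the = [s for s in start_candidates if re.search(r"^[Tt]he", s)]
print(starts_with_the)  # ['The end', 'the end']

# $ : the match must finish at the end of the string
end_candidates = ["The end", "endless", "weekend"]
ends_with_end = [s for s in end_candidates if re.search(r"end$", s)]
print(ends_with_end)  # ['The end', 'weekend']
```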
4. Shorthand character set
Regular expressions provide shorthand for commonly used character sets. The shorthand character sets are as follows:
Shorthand | Description |
---|---|
. | Matches any character other than a newline. |
\w | Matches letters, digits, and the underscore: [a-zA-Z0-9_] |
\W | Matches any character that is not a letter, digit, or underscore: [^\w] |
\d | Matches digits: [0-9] |
\D | Matches non-digits: [^\d] |
\s | Matches whitespace characters: [\t\n\f\r\p{Z}] |
\S | Matches non-whitespace characters: [^\s] |
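The shorthand sets in the table can be exercised with a short Python sketch. The sample text is invented. One assumption to note: in Python, \w and \d also match non-ASCII letters and digits by default, so `re.ASCII` is passed here to keep them equal to the bracket expressions shown in the table.

```python
import re

text = "user_01: 42 items"

# \w+ : runs of letters, digits and underscores
words = re.findall(r"\w+", text, flags=re.ASCII)
print(words)  # ['user_01', '42', 'items']

# \d+ : runs of digits
digits = re.findall(r"\d+", text, flags=re.ASCII)
print(digits)  # ['01', '42']

# \S+ : runs of non-whitespace characters
chunks = re.findall(r"\S+", text)
print(chunks)  # ['user_01:', '42', 'items']
```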
5. Flags
Flags are also called modifiers because they modify the output of a regular expression. Flags can be used in any order or combination, and they are part of the regular expression.
Flag | Description |
---|---|
i | Case insensitive: makes the match ignore case. |
g | Global search: searches for all matches in the string instead of stopping at the first. |
m | Multi-line: the anchors ^ and $ match at the start and end of each line of the input. |
Flags are written after the regular expression as follows:
/regular expression/flags
Note: in online testers the flags are set in the options, so you do not need to write them out.
Case-insensitive global search for word characters:
/\w/gi
Global multi-line search for word characters:
/\w/gm
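The /pattern/flags notation is not used in Python, so the sketch below shows rough equivalents under that assumption: `re.IGNORECASE` for i, `re.MULTILINE` for m, and `re.findall` in place of g, since it already returns every match. The sample text is invented.

```python
import re

text = "Hello\nhello\nHELLO"

# i + g: ignore case and collect every match
all_hellos = re.findall(r"hello", text, flags=re.IGNORECASE)
print(all_hellos)  # ['Hello', 'hello', 'HELLO']

# Without m, ^ matches only at the very start of the input
first_line_only = re.findall(r"^h", text, flags=re.IGNORECASE)
print(first_line_only)  # ['H']

# With m, ^ matches at the start of every line
every_line = re.findall(r"^h", text, flags=re.IGNORECASE | re.MULTILINE)
print(every_line)  # ['H', 'h', 'H']
```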
6. Assertions
An assertion checks that a certain condition holds. Assertions are also called zero-width assertions, because they check the text before or after a position without including it in the match. They are used to match (or not match) characters that come before or after certain other characters. There are several types of assertions:
Symbol | Description |
---|---|
(?=exp) | Positive lookahead: matches the content in front of ?= only when it is followed by exp. |
(?<=exp) | Positive lookbehind: matches the content that comes after exp. |
(?!exp) | Negative lookahead: matches content that is not followed by exp. |
(?<!exp) | Negative lookbehind: matches content that is not preceded by exp. |
Positive lookahead
A positive lookahead finds certain content and matches what comes before it. For example, the regular expression (H|h)(?=ello) finds H or h and returns it only if it is followed by ello.
Positive lookbehind
A positive lookbehind finds certain content and matches what comes after it. For example, the regular expression (?<=[Hh])ello finds H or h, checks whether the content that follows is ello, and if so returns ello.
Negative lookahead
The regular expression (H|h)(?!ello) finds H or h and returns it only if it is not followed by ello.
Negative lookbehind
The regular expression (?<!(H|h))ello finds ello and returns it only if it is not preceded by H or h.
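The four assertion types can be compared on one invented sentence. In this sketch the alternation (H|h) is written as the equivalent character set [Hh], so that `re.findall` returns the matched text directly.

```python
import re

text = "Hello hello jello Help"

# Positive lookahead: H or h, only when followed by "ello"
pos_ahead = re.findall(r"[Hh](?=ello)", text)
print(pos_ahead)  # ['H', 'h']

# Positive lookbehind: "ello", only when preceded by H or h
pos_behind = re.findall(r"(?<=[Hh])ello", text)
print(pos_behind)  # ['ello', 'ello']

# Negative lookahead: H or h, only when NOT followed by "ello" (the H of "Help")
neg_ahead = re.findall(r"[Hh](?!ello)", text)
print(neg_ahead)  # ['H']

# Negative lookbehind: "ello", only when NOT preceded by H or h (from "jello")
neg_behind = re.findall(r"(?<![Hh])ello", text)
print(neg_behind)  # ['ello']
```

In every case the asserted text itself is left out of the result, which is what "zero-width" means.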
7. Commonly used regular expressions
7.1. Email address
[email protected]
Only English letters, digits, underscores, hyphens, and periods are allowed
^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$
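As a quick check of how this pattern behaves, here is a Python sketch. The helper name `is_valid_email` and the sample addresses are hypothetical. Note that the pattern accepts periods only in the domain part, so an address with a period before the @ is rejected.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$")

def is_valid_email(address):
    """Return True when the address matches the pattern above."""
    return EMAIL.search(address) is not None

print(is_valid_email("user_01@example.com"))     # True
print(is_valid_email("user@example"))            # False: no ".xxx" after the domain name
print(is_valid_email("first.last@example.com"))  # False: period before the @
```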
7.2. Mobile phone number
13012345678
Phone number: 11 digits, starting with 1 followed by a digit from 3 to 9
^1(3|4|5|6|7|8|9)\d{9}$
7.3. Domain name
https://google.com/
^((http:\/\/)|(https:\/\/))?([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}(\/)
7.4. IP address
127.0.0.1
((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))
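The IP pattern has no ^ or $ anchors, so a plain search can succeed on part of a longer string. The Python sketch below therefore uses `re.fullmatch`, which is an assumption about how the pattern is meant to be used for validation; the helper name `is_ipv4` is hypothetical.

```python
import re

IPV4 = re.compile(
    r"((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))"
)

def is_ipv4(text):
    """Return True when the whole string is a dotted IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4("127.0.0.1"))        # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is above 255
print(is_ipv4("1.2.3"))            # False: only three parts
```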
7.5. Account verification
pikachues_001
Starts with a letter, 5-16 characters long; letters, digits, and underscores are allowed
^[a-zA-Z][a-zA-Z0-9_]{4,15}$
7.6. Character verification
Chinese characters
^[\u4e00-\u9fa5]{0,}$
English letters and digits
A string consisting of digits and the 26 English letters
^[A-Za-z0-9]+$
A string consisting of the 26 English letters
^[A-Za-z]+$
A string consisting of the 26 uppercase English letters
^[A-Z]+$
A string consisting of the 26 lowercase English letters
^[a-z]+$
Any characters, with a length of 3-20
^.{3,20}$
Chinese characters, English letters, digits, and underscores
^[\u4E00-\u9FA5A-Za-z0-9_]+$
Chinese characters, English letters, and digits, excluding underscores and other symbols
^[\u4E00-\u9FA5A-Za-z0-9]+$
7.7. Number validation
Integer (excluding 0)
^-?[1-9]\d*$
Positive integer
^[1-9]\d*$
Negative integer
^-[1-9]\d*$
Non-negative integer
^([1-9]\d*|0)$
Non-positive integer
^(-[1-9]\d*|0)$
Floating-point number
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
Positive floating-point number
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$
Negative floating-point number
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$
Non-negative floating-point number
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
Non-positive floating-point number
^((-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0)$
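One detail matters when a validation pattern combines | with the anchors ^ and $: the | operator has the lowest precedence, so without a surrounding group each anchor applies to one branch only. The Python sketch below compares an ungrouped and a grouped non-negative integer pattern; the variable names are chosen just for this example.

```python
import re

# Ungrouped: read as "^[1-9]\d*"  OR  "0$"
ungrouped = re.compile(r"^[1-9]\d*|0$")
# Grouped: both anchors apply to every alternative
grouped = re.compile(r"^([1-9]\d*|0)$")

# Both accept real non-negative integers
both_accept = all(
    ungrouped.search(s) and grouped.search(s) for s in ["0", "7", "120"]
)
print(both_accept)  # True

# Only the ungrouped pattern lets invalid input through
ungrouped_leaks = [s for s in ["12abc", "abc0"] if ungrouped.search(s)]
grouped_leaks = [s for s in ["12abc", "abc0"] if grouped.search(s)]
print(ungrouped_leaks)  # ['12abc', 'abc0']
print(grouped_leaks)    # []
```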