How to write regular expressions

1. What is a regular expression

A regular expression is a pattern, written as text, that is used to find text matching that pattern within a larger body of text.

Regular expressions match a string from left to right. In English, "regular expression" is usually abbreviated to "regex" or "regexp". Regular expressions can be used to replace text in strings, validate forms, extract substrings based on pattern matching, and so on.

2. Why learn regular expressions

Regular expressions are used very often in development, yet many people cannot write them. What do you do when you need one? The usual answer is to search Baidu, which is a perfectly good approach. But what if Baidu does not have the answer you need? Then you have to write the expression yourself, learning as you go. This article helps you do exactly that, and a list of common regular expressions is attached at the end.

3. Metacharacters

Metacharacters are the basic building blocks of regular expressions. A metacharacter does not stand for itself; it is interpreted with a special meaning. Some metacharacters take on a different meaning when written inside square brackets. The metacharacters are as follows:

Metacharacter    Description
.                Matches any single character except a newline.
[ ]              Character class. Matches any one character contained in the square brackets.
[^ ]             Negated character class. Matches any one character not contained in the square brackets.
*                Matches the preceding subexpression zero or more times.
+                Matches the preceding subexpression one or more times.
?                Matches the preceding subexpression zero or one time.
{n,m}            Braces. Matches the preceding subexpression at least n times, but not more than m times.
(xyz)            Group. Matches the characters xyz in exactly that order.
|                Branch structure (alternation). Matches either the expression before the | or the expression after it.
\                Escape character. Restores the literal meaning of the next character, so you can match the reserved characters [ ] ( ) { } . * + ? ^ $ \ |
^                Matches the start of the line.
$                Matches the end of the line.

3.0. Examples and test methods

For example, the regular expression [H|h]ello matches the strings Hello and hello. (Inside square brackets | is a literal character, so this pattern also matches |ello; [Hh]ello is the stricter form.) So how do we test a regular expression after writing it?

We can use an online regular expression tester, for example the Runoob ("rookie") online tester.


We can also use the corresponding programming language for testing.

JavaScript

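// The string to match against
var str = 'Hello';
// The first argument to replace() is a regular expression literal;
// the second is the replacement text
var s = str.replace(/[H|h]ello/, 'matched');
console.log(s); // the replacement text appears only if the pattern matched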

Java

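import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name RegexDemo is arbitrary
public class RegexDemo {
    public static void main(String[] args) {
        // The string to validate
        String str = "hello";
        // The validation rule
        String regEx = "[H|h]ello";
        // Compile the regular expression
        Pattern pattern = Pattern.compile(regEx);
        // Case-insensitive variant
        // Pattern pat = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(str);
        // Whether the entire string matches the regular expression
        boolean rs = matcher.matches();
        System.out.println(rs);
    }
}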

Golang

package main

import (
	"fmt"
	"regexp"
)

func main() {
	str := "hello"
	// MatchString reports whether str contains any match of the pattern
	matched, _ := regexp.MatchString("[H|h]ello", str)
	fmt.Println(matched)
}

Python

import re

pattern = re.compile(r'[H|h]ello')
text = 'hello'
# search() returns a match object, or None if there is no match
print(pattern.search(text))

3.1. Period

The period . is the simplest example of a metacharacter. It matches any single character, except that it does not match a newline (and in some engines, such as JavaScript, it does not match a carriage return either).

For example, the regular expression

.ar

matches par, 6ar, zar, and so on.
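
As a minimal sketch, the period can be tried out with Python's standard re module. The sample text below is made up for illustration and is not from the article.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The car parked in the garage."

# "." matches any single character except a newline, so ".ar" matches
# any one character followed by "ar".
matches = re.findall(r".ar", text)
print(matches)  # ['car', 'par', 'gar']
```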

3.2. Character set

A character set is also called a character class. Square brackets define a character set, and a hyphen inside the brackets specifies a range of characters. The order of the characters and ranges inside the brackets does not matter. For example, the regular expression

[tTs]he

matches the strings the, The, and she.

Note: a period inside square brackets has its literal meaning. In other words, [.]ar matches only the string .ar, not par, 6ar, zar, and so on.
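
The two points above can be checked with a short Python sketch. The sample strings are assumptions made for illustration.

```python
import re

# [tTs]he matches "the", "The", or "she".
words = re.findall(r"[tTs]he", "The cat saw the dog and she ran.")
print(words)  # ['The', 'the', 'she']

# Inside square brackets the period is literal: [.]ar matches only ".ar".
dots = re.findall(r"[.]ar", "par 6ar .ar zar")
print(dots)  # ['.ar']
```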

3.3. Negative character set

The caret ^ normally indicates the start of a string, but when it appears as the first character inside square brackets it negates the character set. For example, the regular expression

[^tTs]he

matches any character other than t, T, or s, followed by he. It matches Ahe and 1he, for instance, but not the, The, or she.
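
A small Python sketch of the negated set follows. The test words are illustrative assumptions, and fullmatch is used so that the whole word has to match.

```python
import re

# [^tTs]he: one character that is not t, T, or s, followed by "he".
pattern = re.compile(r"[^tTs]he")

results = {word: bool(pattern.fullmatch(word))
           for word in ["the", "The", "she", "Ahe", "1he"]}
print(results)
```

Only Ahe and 1he are accepted, because their first character is outside the excluded set.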

3.4. Repeat

The metacharacters +, *, and ? specify how many times a subpattern may occur. Each behaves differently, as described below.

Asterisk

The symbol * matches the preceding rule zero or more times. If it appears after a character set or character class, it repeats the entire set.

For example, the regular expression

[a-z]*

matches any number of lowercase letters in a row.

* can also be used with ., as in .*, to match any string of characters (excluding newlines).

Plus

The symbol + matches the preceding character one or more times. For example, the regular expression

a+t

matches at, aat, aaat, and so on.

Question mark

The metacharacter ? makes the preceding character optional: it matches the preceding character zero or one time. For example, the regular expression

[Tts]?he

matches the strings he, the, The, and she.
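
The three repetition metacharacters can be compared side by side in a short Python sketch. The input strings and variable names are assumptions chosen for illustration.

```python
import re

# * : zero or more lowercase letters (fullmatch requires the whole string to match).
empty_ok = bool(re.fullmatch(r"[a-z]*", ""))         # zero letters is allowed
letters_ok = bool(re.fullmatch(r"[a-z]*", "hello"))

# + : one or more "a" before "t"; a lone "t" does not match.
plus_matches = re.findall(r"a+t", "t at aat aaat")

# ? : the leading letter is optional.
optional_matches = re.findall(r"[Tts]?he", "he the The she")

print(empty_ok, letters_ok)   # True True
print(plus_matches)           # ['at', 'aat', 'aaat']
print(optional_matches)       # ['he', 'the', 'The', 'she']
```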

3.5. Braces

In regular expressions, curly braces specify how many times a character or a group of characters may be repeated. For example, the regular expression

[0-9]{2,5}

matches a run of digits 0-9 that is at least 2 and at most 5 characters long.

It can also be written as

[0-9]{2,}

which matches digits 0-9 repeated at least 2 times, or as

[0-9]{5}

which matches digits 0-9 repeated exactly 5 times.
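
As a sketch, the brace forms {2,5}, {2,}, and {5} can be compared in Python on a few made-up digit strings. Note that {5} means exactly five repetitions. The sample values and the results dictionary are illustrative assumptions.

```python
import re

samples = ["7", "42", "12345", "123456"]

# {2,5}: 2 to 5 digits; {2,}: at least 2 digits; {5}: exactly 5 digits.
results = {}
for quantifier in ["{2,5}", "{2,}", "{5}"]:
    pattern = re.compile(r"[0-9]" + quantifier)
    results[quantifier] = [s for s in samples if pattern.fullmatch(s)]

for quantifier, accepted in results.items():
    print(quantifier, accepted)
```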

3.6. Branch structure

The metacharacter | defines a branch structure (alternation), which works like a condition (switch) between multiple expressions. For example, the regular expression

(T|t|s)he

matches the, The, and she. After testing it, you may think that there is no difference between a branch structure and a character set, but this is not the case. The biggest difference is that a character set works only at the character level, while a branch structure works at the expression level. For example, the regular expression

[Tts]he|car

matches The, the, she, or car.
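
A brief Python sketch of the branch structure, using a made-up sentence: each side of | is a whole subexpression, so the pattern accepts either a [Tts]he word or the word car.

```python
import re

# Hypothetical sample text.
text = "The car is near the bar, she said."

# Either branch may match: [Tts]he on the left, car on the right.
found = re.findall(r"[Tts]he|car", text)
print(found)  # ['The', 'car', 'the', 'she']
```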

3.7. Escaping special characters

In a regular expression, the backslash \ escapes the next character. This lets you use the reserved characters { } [ ] / \ + * . $ ^ | ? as ordinary characters to match: put \ in front of the special character and it is matched literally. For example, the regular expression

(c|m)at\.

matches mat. and cat. (including the trailing period).
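
The sketch below tries the escaped period in Python. It uses a non-capturing group (?:...) instead of (...) only so that re.findall returns the whole match. It also shows re.escape, a standard-library helper that adds the backslashes when the text to match comes from data. The sample strings are assumptions.

```python
import re

# \. matches a literal period, so only "mat." and "cat." (with the dot) match.
with_dot = re.findall(r"(?:c|m)at\.", "The cat sat on the mat. Then the cat.")
print(with_dot)  # ['mat.', 'cat.']

# re.escape() escapes the special characters in a literal string for you.
literal = "1+1=2?"
escaped = re.escape(literal)
print(escaped)
found_literal = re.search(escaped, "Is 1+1=2? Yes.") is not None
print(found_literal)  # True
```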

3.8. Anchors

To check whether a match falls at the start or the end of the input, use anchors: ^ checks that the match begins at the start of the string, and $ checks that the match ends at the end of the string.

Start anchor

The caret ^ checks whether the match begins at the first character of the string. For example, the regular expression

^[Tt]he

matches the or The only at the beginning of the string.

End anchor

The dollar sign $ checks whether the string ends with a given sequence of characters. For example, the regular expression

end$

matches end only at the end of the string.
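
A short Python sketch of both anchors, applied to three made-up strings (the list and variable names are illustrative):

```python
import re

lines = ["The end", "At the weekend", "the end is near"]

# ^[Tt]he: the string must start with "The" or "the".
starts = [s for s in lines if re.search(r"^[Tt]he", s)]
# end$: the string must finish with "end".
ends = [s for s in lines if re.search(r"end$", s)]

print(starts)  # ['The end', 'the end is near']
print(ends)    # ['The end', 'At the weekend']
```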

4. Shorthand character set

Regular expressions provide shorthand for commonly used character sets. The shorthand character sets are as follows:

Shorthand    Description
.            Matches any character other than a newline.
\w           Matches letters, digits, and the underscore: [a-zA-Z0-9_]
\W           Matches any character that is not a letter, digit, or underscore: [^\w]
\d           Matches digits: [0-9]
\D           Matches non-digits: [^\d]
\s           Matches whitespace characters: [\t\n\f\r\p{Z}]
\S           Matches non-whitespace characters: [^\s]
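
A minimal Python sketch of \d, \w, and \s on an ASCII sample string (the text is an assumption). In Python 3 these shorthands also cover non-ASCII digits, letters, and whitespace in str patterns unless re.ASCII is given, so the bracket equivalents in the table hold only for ASCII input.

```python
import re

# Hypothetical sample text.
text = "Order #42: 3 apples, 10 pears\n"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of letters, digits, underscore
fields = re.split(r"\s+", text.strip())  # split on runs of whitespace

print(digits)  # ['42', '3', '10']
print(words)   # ['Order', '42', '3', 'apples', '10', 'pears']
print(fields)  # ['Order', '#42:', '3', 'apples,', '10', 'pears']
```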

5. Flags

Flags are also called modifiers because they modify how a regular expression matches. They can be used in any order or combination and are part of the regular expression.

Flag    Description
i       Case insensitive: matching ignores letter case.
g       Global search: finds all matches in the string rather than stopping at the first.
m       Multi-line: the anchors ^ and $ match at the start and end of each line of the input.

Flags are written after the regular expression, as follows:

/regular expression/flags

Note: in the online tester, flags are set in the options panel, so you do not need to write them out.


Case-insensitive, global search for word characters:

/\w/gi

Global, multi-line search for word characters:

/\w/gm
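
Python does not use the /pattern/flags notation; as a sketch, the same behaviour is requested by passing flag constants to the re functions. re.IGNORECASE corresponds to i and re.MULTILINE to m. There is no g flag, because re.findall already returns every match. The sample text is an assumption.

```python
import re

text = "Hello\nhello\nHELLO"

# i flag: ignore case. findall returns all matches, like a global search.
any_case = re.findall(r"hello", text, flags=re.IGNORECASE)

# m flag: ^ and $ also match at the start and end of each line.
per_line = re.findall(r"^hello$", text, flags=re.IGNORECASE | re.MULTILINE)

# Without the m flag, ^ and $ anchor to the whole string, so nothing matches.
whole_string = re.findall(r"^hello$", text, flags=re.IGNORECASE)

print(any_case)      # ['Hello', 'hello', 'HELLO']
print(per_line)      # ['Hello', 'hello', 'HELLO']
print(whole_string)  # []
```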

6. Assertions

An assertion checks that a condition holds at the current position without consuming any characters, which is why assertions are also called zero-width assertions. They are used to match text only when it is (or is not) preceded or followed by certain other text. There are four types of assertions:

Symbol      Description
(?=exp)     Positive lookahead: matches only where the text that follows matches exp.
(?<=exp)    Positive lookbehind: matches only where the text that precedes matches exp.
(?!exp)     Negative lookahead: matches only where the text that follows does not match exp.
(?<!exp)    Negative lookbehind: matches only where the text that precedes does not match exp.

Positive lookahead

A positive lookahead looks for certain content and matches the text immediately before it. For example, the regular expression

(H|h)(?=ello)

matches H or h only when it is immediately followed by ello. The match returned is just H or h; ello is not part of it.

Positive lookbehind

A positive lookbehind looks for certain content and matches the text immediately after it. For example, the regular expression

(?<=[h|H])ello

finds H or h and then checks whether the text that follows is ello. If it is, the match returned is ello.

Negative lookahead

(H|h)(?!ello)

matches H or h only when it is not followed by ello, and returns H or h.

Negative lookbehind

(?<!(H|h))ello

matches ello only when it is not preceded by H or h, and returns ello.
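
The four assertions can be compared on one made-up string in Python. This sketch writes the alternatives as the character set [Hh], which is equivalent to (H|h) here and keeps re.findall returning the whole match. Python requires lookbehind patterns to have a fixed width, which [Hh] satisfies.

```python
import re

# Hypothetical sample text.
text = "Hello hello jello Hi"

# Positive lookahead: H or h only when followed by "ello".
pos_ahead = re.findall(r"[Hh](?=ello)", text)
# Positive lookbehind: "ello" only when preceded by H or h.
pos_behind = re.findall(r"(?<=[Hh])ello", text)
# Negative lookahead: H or h not followed by "ello" (the H in "Hi").
neg_ahead = re.findall(r"[Hh](?!ello)", text)
# Negative lookbehind: "ello" not preceded by H or h (the one in "jello").
neg_behind = re.findall(r"(?<![Hh])ello", text)

print(pos_ahead)   # ['H', 'h']
print(pos_behind)  # ['ello', 'ello']
print(neg_ahead)   # ['H']
print(neg_behind)  # ['ello']
```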


7. Commonly used regular expressions

7.1. Email address

Only English letters, digits, underscores, and hyphens are allowed, with periods separating the parts of the domain.

^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$
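
A minimal Python sketch of how this pattern might be used for a format check. The helper name is_email and the sample addresses are hypothetical. fullmatch is used so that the entire string must match. The check says nothing about whether the address actually exists.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$")

def is_email(value):
    """Return True if value has the shape accepted by the pattern above."""
    return EMAIL.fullmatch(value) is not None

print(is_email("user_01@example.com"))     # True
print(is_email("first.last@example.com"))  # False: "." is not allowed before the @
print(is_email("user@example"))            # False: the domain needs at least one "."
```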

7.2. Mobile phone number

Example: 13012345678

^1(3|4|5|6|7|8|9)\d{9}$

7.3. Domain name

Example: https://google.com/

^((http:\/\/)|(https:\/\/))?([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}(\/)

7.4. IP address

Example: 127.0.0.1

((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))
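
The pattern has no ^ or $ anchors, so it can either validate a whole string or find an address inside longer text. The Python sketch below shows both uses; the helper name is_ipv4 and the sample inputs are assumptions for illustration.

```python
import re

IPV4 = re.compile(r"((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))")

def is_ipv4(value):
    """Return True if the whole string is a dotted IPv4 address."""
    return IPV4.fullmatch(value) is not None

print(is_ipv4("127.0.0.1"))    # True
print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("256.1.1.1"))    # False: 256 is out of range

# search() finds an address embedded in other text.
embedded = IPV4.search("ping 127.0.0.1 ok").group(0)
print(embedded)                # 127.0.0.1
```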

7.5. Account validation

Example: pikachues_001. The account name must start with a letter, be 5 to 16 characters long, and contain only letters, digits, and underscores.

^[a-zA-Z][a-zA-Z0-9_]{4,15}$
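
As a sketch, the account rule can be wrapped in a small Python helper. The function name is_account and the rejected samples are hypothetical.

```python
import re

ACCOUNT = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{4,15}$")

def is_account(name):
    """Return True if name starts with a letter and is 5 to 16 allowed characters."""
    return ACCOUNT.fullmatch(name) is not None

print(is_account("pikachues_001"))  # True
print(is_account("1abc_def"))       # False: starts with a digit
print(is_account("abcd"))           # False: only 4 characters
```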

7.6. Character validation

Chinese characters

^[\u4e00-\u9fa5]{0,}$

English letters and digits

A string consisting only of digits and the 26 English letters

^[A-Za-z0-9]+$

A string consisting only of the 26 English letters

^[A-Za-z]+$

A string consisting only of the 26 uppercase English letters

^[A-Z]+$

A string consisting only of the 26 lowercase English letters

^[a-z]+$

Any characters, with a length of 3 to 20

^.{3,20}$

Chinese characters, English letters, and digits, including the underscore

^[\u4E00-\u9FA5A-Za-z0-9_]+$

Chinese characters, English letters, and digits, excluding the underscore and other symbols

^[\u4E00-\u9FA5A-Za-z0-9]+$

7.7. Number validation

Integer

^(-?[1-9]\d*|0)$

Positive integer

^[1-9]\d*$

Negative integer

^-[1-9]\d*$

Non-negative integer

^([1-9]\d*|0)$

Non-positive integer

^(-[1-9]\d*|0)$

Floating-point number

^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$

Positive floating-point number

^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$

Negative floating-point number

^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$

Non-negative floating-point number

^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$

Non-positive floating-point number

^((-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0)$
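
When a validation pattern combines ^ and $ with |, the alternatives need to be wrapped in a group, because | has the lowest precedence: without the group, ^ applies only to the first branch and $ only to the last. The Python sketch below compares an ungrouped and a grouped non-negative integer pattern on a few made-up inputs; the variable names are illustrative.

```python
import re

ungrouped = re.compile(r"^[1-9]\d*|0$")   # parsed as (^[1-9]\d*) | (0$)
grouped = re.compile(r"^([1-9]\d*|0)$")   # both anchors apply to both branches

rows = []
for value in ["0", "42", "42abc", "abc0", "007"]:
    rows.append((value,
                 bool(ungrouped.search(value)),
                 bool(grouped.search(value))))

for value, loose, strict in rows:
    print(value, loose, strict)
```

The ungrouped pattern accepts 42abc and abc0, while the grouped pattern accepts only 0 and 42.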

8. More extensions

Extended book

Origin blog.csdn.net/qq_41262903/article/details/111249140